Modelling the M*-SFR relation at high redshift: untangling factors driving biases in the intrinsic scatter measurement

We present a method to self-consistently propagate M? and SFR (Ψ) uncertainties onto intercept, slope and intrinsic scatter estimates for a simple model of the main sequence of star forming galaxies where Ψ = α+βM?+N (0, σ). From simple idealised models set up with broad-band photometry from NIRCam filters at z ∼ 5, we test the method and compare to methods in the literature. Simplifying the Ψ estimate by basing it on dust-corrected MUV can help to reduce the impact of template set degeneracies on slope and intercept estimates, but act to bias the intrinsic scatter estimate. Specifically, the Ψ estimate employed in Santini et al. (2017) significantly under-estimates the intrinsic scatter present in our scenarios. Our test scenarios employ standard star-formation histories used in SED-fitting in the literature. We find that priors in age for constant SFHs, and age and τsfr for delayed histories, act to corral constraints in the M?-Ψ plane. This means that one might infer better Ψ constraints than are actually possible because of the underlying priors. These priors are also incompatible with searching for possible mass-dependence of intrinsic scatter in the main sequence, even for the M? estimates being used with independent Ψ estimates. Ψ estimates obtained from Hα with dust attenuation correction using Hβ are subject to uncertain stellar Hβ absorption if SFHs are stochastic. Strong constraints on rest-frame UV slope are therefore required in addition to Hα and Hβ fluxes when investigating an increase in scatter to low stellar masses. These can be obtained with NIRSpec R100 spectra. We show that, for simple exposure time calculations assuming point sources, without dust, we should be able to probe to log(m?/M ) ∼ 8.5 with JWST at z ∼ 5.


INTRODUCTION
Trends and correlations between different properties of observed galaxies provide vital clues for decoding the underlying physical processes that govern galaxy evolution. One such trend is the tight correlation between star formation rate (SFR) and stellar mass (M ) for 'normal' star forming galaxies (e.g. Brinchmann et al. 2004 at z ∼ 0, Elbaz et al. 2007 at z ∼ 1 and Daddi et al. 2007 at z ∼ 2). This is the socalled 'main sequence of star-forming galaxies', labelled as such by Noeske et al. (2007), which we will refer to simply as the 'main sequence', or M − Ψ relation throughout this paper. The physical origin of this trend is still an active area of Email: ec296@cam.ac.uk debate (e.g., Kelson 2014;Lin et al. 2019;Matthee & Schaye 2019). Perhaps it is only a population average, with objects above and below the relation having very different evolutionary histories. Or perhaps physical processes (e.g. gas infall and dynamics, feedback from stars and active galactic nuclei) draw galaxies back to the main sequence if they venture too far from it in either direction before the eventual cessation of star formation that causes them to drop off the main sequence altogether. Either of these scenarios would produce a trend between M and SFR but the origin of the scatter about the relation would be very different. In the first scenario the distribution of SFR values at given stellar mass and redshift would be determined by the range of possible evolutionary paths, while in the second scenario it would be shaped by feedback duty-cycles that produce short-timescale variations in SFR. Whatever the physical reason for the relation, it must simultaneously explain the existence of the correlation as well as its tightness, with an observed scatter of σ(log SFR/M yr −1 )∼ 0.20-0.35 dex (Daddi et al. 2007;Speagle et al. 2014;Shivaei et al. 2015;Salmon et al. 2015;Kurczynski et al. 2016;Santini et al. 2017).
A major challenge in understanding the physical origin of the main-sequence is the difficulty of linking progenitor and descendent galaxy populations at different epochs. This has motivated the use of galaxy formation models to guide the interpretation of the main sequence. For example, the results of Dutton, van den Bosch & Dekel (2010, using the semi-analytic models of Dutton & Van Den Bosch 2009) support the first scenario where the position of a galaxy within the main sequence is set by the gas accretion history of its parent halo. Halos which started to accrete gas early lie above the relation, and vica-versa. A limitation of the approach of Dutton, van den Bosch & Dekel (2010) is their simple treatment of the mass accretion histories, which are modelled to be smoothly evolving and neglect the impact of mergers. By introducing halo merger driven fluctuations in gas accretion in the analytical equilibrium model of Mitra, Davé & Finlator (2015), Mitra et al. (2017) show that the scatter produced is larger than that found by Dutton, van den Bosch & Dekel (2010) (∼ 0.25 dex compared to ∼ 0.12 dex), more in line with observations (see also Forbes et al. 2014). Mitra et al. (2017) predict only a modest redshift dependence of the scatter, increasing to higher redshifts (∼ 0.2 dex at z ∼ 0.5 to ∼ 0.25 dex at z ∼ 4).
The second scenario is supported by Tacchella et al. (2016), who used cosmological zoom-in simulations to follow the star formation and gas accretion histories of 26 simulated galaxies. By tracking galaxy positions on the main sequence across cosmic time, Tacchella et al. (2016) find that their simulated galaxies oscillate about the main sequence. In their simulations, events such as mergers, disc instabilities and accretion from counter-rotating streams increase the central gas densities and central SFR, leading to shorter gas depletion times. The central gas disc shrinks, suppressing the replenishment of gas to the centre and feedbackdriven outflows cause the SFR to decrease back toward the main sequence. Once the central gas is depleted, the SFR decreases, but if the replenishment timescale is shorter than the depletion timescale, the gas inflow recovers and a subsequent compaction event can occur. Although zoom-in simulations allow smaller scale processes to be resolved, such as the intricate gas dynamical effects seen in the Tacchella et al. (2016) simulations, they lack the statistics to compare the impact of these processes to those driven by halo mass accretion histories (the main driver of scatter in the Dutton et al. 2010 SAM andMitra et al. 2017 semi-analytic model).
Most likely the main sequence is shaped by a combination of the two scenarios, but to disentangle their relative contributions requires cosmological hydrodynamic simulations of galaxy formation that include prescriptions of sub-grid physics that cannot be resolved in the simulation itself. For instance, Matthee & Schaye (2019) find that SFR variations over both long and short timescales contribute to the scatter about the main sequence within the EAGLE simulation of galaxy formation and evolution (Crain et al. 2015;Schaye et al. 2015). They find that galaxies may cross the main sequence many times over their lifetime (as in the simulations studied in Tacchella et al. 2016), but galaxies fluctuate about tracks that depend on the halo properties, most noticeably the halo formation time (reminiscent of the Dutton et al. 2010 results).
While at z ∼ 0 the scatter in the EAGLE simulation is shown to modestly increase with decreasing stellar mass, at 0 z 4, Matthee & Schaye (2019) find a greater decrease in scatter with increasing redshift at ∼ 10 9 M than at higher masses. The higher scatter at ∼ 10 10 M compared to ∼ 10 9 M at ∼ 2 − 4 is attributed to the enhanced influence of AGN feedback at the higher stellar masses. At lower stellar masses, it is likely that stellar feedback prescriptions have a greater impact on the scatter. Sparre et al. (2015) investigate the form of the main-sequence in the Illustris simulations (Vogelsberger et al. 2014), finding constant scatter of σ ∼ 0.2-0.3 dex at M 10 10.5 M at 0 z 4. The FIRE simulations (Feedback in Realistic Environments, Hopkins et al. 2014) use zoom-in simulations to resolve the ISM in galaxies down to scales required to properly model stellar feedback. Sparre et al. (2017) show that these simulations predict an increase in scatter to low stellar masses at z ∼ 2 (see their figs 3 and 4), and predict much higher scatter in SFR than seen in the EAGLE simulation at these stellar masses (∼ 0.4 dex compared to ∼ 0.2 dex in the EA-GLE simulations at 10 9 M ). These simulations suggest that differences in the treatment of stellar feedback, a stochastic process, can have large impact on the predicted scatter about the main sequence.
Cosmological hydrodynamic simulations modelling the high redshift Universe are generally run over much smaller volumes than the simulations designed to study present-day galaxy populations. This is because they aim to model the formation and feedback processes of low-mass galaxies, considered to be the dominant source of hydrogen-ionising photons in the high-redshift Universe (see e.g. Dayal & Ferrara 2018, for a review). High-redshift simulations reveal bursty SFHs of low-mass galaxies (e.g. Dayal et al. 2013;Yajima et al. 2017;Rosdahl et al. 2018), which would likely lead to higher scatter in the main-sequence to low stellar masses, as seen in the FIRE simulations (Sparre et al. 2017).
Currently, all SAMs and cosmological hydrodynamic simulations have problems fitting to the evolution of the low-mass end of the stellar mass function at high to intermediate redshifts (z ∼ 1-4), as well as the evolution of the main sequence over the same redshift range (see figs. 4 and 5 Somerville & Davé 2015). This is likely because the stellar feedback prescriptions are not yet optimal in any of the simulations. This highlights the possibility that providing accurate second-order measurements of the main-sequence, like the intrinsic scatter, may help to constrain physical processes (e.g. gas dynamics and stellar feedback) that govern the evolution and form first-order measurements (slope and intercept).
Measuring the evolution of the main sequence and its intrinsic scatter across a wide range of masses and redshifts can therefore constrain different galaxy evolution scenarios, and help to disentangle what physical mechanisms shape the main sequence (e.g. the gas dynamical properties, stellar and AGN feedback prescriptions or halo mass accretion rate distributions). In practice, the measurements are limited by the significant uncertainties affecting SFR and stellar-mass estimates, especially at high redshifts. Robust measurements of galaxy SFR can be obtained from measurements of Hα line flux (using Hβ to help correct for dust attenuation), which traces the emission of massive stars re-processed by ionized gas; or through the sum of ultra-violet (UV) and infra-red (IR) emission, which probes both the direct (unobscured) and attenuated (reprocessed by dust) UV emission from these young stars. At high redshifts, both types of measurements become difficult as Hα is no-longer visible from the ground at z 2.5, and bolometric IR measurements are challenging except in the most luminous objects (e.g. with Herschel ; Gruppioni et al. 2013). The Atacama Large Millimeter Array (ALMA) provides the sensitivity to search for dust emission from 'normal' star-forming galaxies at z 3, but direct detections were only acquired for a very small fraction of the z 3 objects in the Hubble Ultra-Deep field (Bouwens et al. 2016;Dunlop et al. 2017). Given these limitations, one must rely on fitting broad-band UV-to-near-IR photometry with galaxy spectral evolution models to derive redshifts, SFRs and stellar masses for large samples of galaxies at high redshifts. This approach has pushed constraints of the main sequence to redshifts as high as z ∼ 7, with studies showing that such a relation may have been in place since ∼ 900 Myr after the Big Bang (e.g. González et al. 2011;Salmon et al. 2015;Santini et al. 2017).
Various studies have attempted to constrain the intrinsic scatter of the main sequence. For example, Kurczynski et al. (2016, hereafter K16) derived the slope, intercept and intrinsic scatter of the main sequence at 0.5 < z < 3 by fitting the broad-band photometry from the Hubble Ultra Deep Field (HUDF12; Ellis et al. 2013;Koekemoer et al. 2013), Ultraviolet Ultra Deep Field (UVUDF;Teplitz et al. 2013) and CANDELS/GOODS-S (Grogin et al. 2011;Koekemoer et al. 2011) campaigns. They find a moderate increase of the intrinsic scatter with increasing cosmic time, but no significant dependence on stellar mass. At higher redshifts, Salmon et al. (2015, hereafter Sal15) used CANDELS data to provide the first estimates of the intrinsic scatter in 3 redshift bins at 3.5 < z < 6.5, finding a scatter of ∼0.2-0.3 dex (once deconvolved from measurement uncertainties). Santini et al. (2017, hereafter San17) exploited the effect of gravitational lensing in the HFF to reach lower mass-completeness limits than those reached in standard blank-field surveys. Adopting a sophisticated modelling approach to account for the complicated selection function of their sample, they find tantalising, but uncertain, evidence that the intrinsic scatter of the main sequence increases at low stellar masses.
In practice, comparing the constraints on the main sequence obtained in the various studies mentioned above is complicated since they rely on different SED modelling prescriptions: K16 and San17 adopt delayed exponential starformation histories (SFHs), while Sal15 use a constant SFH; K16 derive SFRs directly from SED fits, while Sal15 and San17 estimate SFRs using rest-frame UV magnitudes cor-rected with different dust-attenuation prescriptions. Finally, all three studies adopt different approaches to model the galaxy main sequence in the presence of uncertainties on stellar mass and SFR.
Regardless of the models used, the stellar masses and SFRs estimated in the above studies suffer from significant uncertainties arising from the faintness of high-redshift sources, the sparse sampling of their SEDs (especially at z 3.5, where often the only detections redward of the Balmer break are supplied by the 3.6 and 4.5 µm Spitzer IRAC filters) and emission-line contamination to photometry (Schaerer & de Barros 2009;Curtis-Lake et al. 2013;de Barros et al. 2013;Smit et al. 2014). Emission-line contamination is particularly harmful, since it affects the photometric bands sampling the SED redward of the Balmer break, which more closely traces emission from stars accounting for the bulk of a galaxy's stellar mass. Intrinsicscatter measurements are highly sensitive to this myriad of uncertainties, as well as to the methods employed to account for them.
In recent years, there has been a significant drive toward developing Bayesian spectral-modelling tools able to provide robust uncertainties on estimates of galaxy physical parameters (e.g. beagle Chevallard & Charlot 2016, bagpipes Carnall et al. 2018and prospector Leja et al. 2017Johnson et al. 2019). In addition, efforts to formalise the inclusion of nebular (both continuum and line) emission in spectral models (e.g. Gutkin, Charlot & Bruzual 2016;Byler et al. 2017;Plat et al. 2019, following the method of Charlot & Longhetti 2001) have led to the possibility of explicitly accounting for variations in nebular parameters (e.g, interstellar metallicity, ionization parameter, metal depletion onto dust grains) and thus to self-consistently include the effect of emission lines on broad-band fluxes in the fitting process. These new spectral modelling techniques allow a better quantification of the uncertainties affecting physical-parameter estimates in individual galaxies, while the propagation of these uncertainties on population-wide parameters (e.g., intercept, slope and intrinsic scatter of the main sequence) still requires the development of innovative modelling approaches.
In this paper we present an original Bayesian Hierarchical approach to model the main sequence based on SEDfitting results obtained with beagle. This modelling allows us to self-consistently propagate the uncertainties of stellar mass and SFR estimates of individual galaxies to the measurement of the slope, intercept and intrinsic scatter of the main sequence. We also use idealised mock galaxy catalogues to compare our approach to other methods employed in the literature. These tests allow us to understand how common SED-fitting approaches can affect estimates of the intercept, slope and intrinsic scatter of the main sequence. In Section 2 we introduce the idealized scenarios that we set up to test different approaches to model the main sequence. These approaches are described in Section 3. The results of our analysis are presented in Section 4, while in Section 5 we discuss our findings with respect to the prospects of determining the mass-dependence of the intrinsic scatter at high redshifts. Throughout the paper we adopt a Chabrier (2003) initial mass function with upper-mass cutoff of 100M and employ a flat ΛCDM cosmology with ΩΛ = 0.7, ΩM = 0.3, and H0 = 70 h70 kms 1 Mpc 1 .

SIMULATING AND FITTING GALAXY SEDS WITH BEAGLE
In this paper we present a Hierarchical Bayesian approach to measuring slope, intercept and scatter of the main sequence and compare it to other methods employed in the literature. To provide a basis for these comparisons, we construct a number of simple scenarios consisting of populations of mock galaxies with a known input main sequence. In this section, we describe the set of scenarios we consider, and in the next section, we will provide details about the different methods used to measure the slope, intercept and scatter. This approach allows us to compare directly the strengths and weaknesses of these methods. We set up four different scenarios with known input Mtot − Ψ relation (Mtot is the integrated star formation history, see Table 1). 1 We model Mtot − Ψ as a linear relation with Gaussian scatter in the dependent variable, Ψ: where: In equation 1, α, β and σ are the intercept, slope and intrinsic scatter, respectively. Throughout this paper we use the notation N (0, σ 2 ) to denote a Gaussian distribution with zero mean and standard deviation, σ (where σ 2 denotes the variance). The four scenarios increase in complexity by progressively including more of the effects that we see in real galaxy samples. The first scenario (scenario i) is designed as an ideal test-case, where the measurement errors on Mtot and Ψ are modelled to be correlated, bi-variate Gaussians: , where N2 denotes a bi-variate Gaussian. The values of this covariance matrix were chosen to provide uncertainties that co-vary with maximum variance perpendicular to the input relation (which is chosen to have a slope, β = 1) in order to resemble the uncertainties found by Sal15 (see their fig. 7). The constant covariance matrix means that the measurement errors in this scenario are 'homoskedastic'. We produce 10 realizations, each containing 100 objects with Mtot and Ψ values drawn from the input Mtot − Ψ relation. These Mtot and Ψ values are then perturbed by the Gaussian uncertainties described by the 1 Although M is the physical property of interest, for the sake of these tests we use Mtot (the integrated SFH, see Table 1) for the mass measurements, as it is possible to set Mtot and Ψ exactly for each galaxy from the SFH parameters (τsfr, t, etc., see Table 1), while M depends on the mass fraction returned to the interstellar medium by evolved stars, which in turn depends on other physical parameters (e.g., the metallicity).  covariance matrix. The other three scenarios (scenarios iiiv) use mock photometry of 1000 objects with known input physical properties that follow the chosen input Mtot − Ψ relation, before performing SED-fitting in order to provide posterior probabilities of Mtot and Ψ. To produce and fit the mock photometry, we use the new-generation Bayesian galaxy spectral modelling tool beagle (Chevallard & Charlot 2016). This employs the most recent version of the Bruzual & Charlot (2003) stellar population synthesis models, incorporated self-consistently in the photoionization models of Gutkin, Charlot & Bruzual (2016). These stellar plus photoionization models allow us to explore a wide range of galaxy SEDs, which are described by the physical parameters reported in Table 1. We do not exploit the full range of nebular parameters allowed by the models, choosing to keep the hydrogen density fixed to nh = 100 cm −3 , which is close to the typical value measured for galaxies at z ∼ 2 − 3 (e.g. Sanders et al. 2016;Strom et al. 2017). We also fix C/O ratio to the solar value of (C/O) = 0.44 since we are primarily interested with broad-band photometric fits and the only lines strong enough to significantly affect the broad-band fluxes are unaffected by C/O. We include dust attenuation following the physically-motivated Charlot & Fall (2000, hereafter CF00) dust prescription, which accounts for the enhanced attenuation suffered by young stars still embedded in their stellar birth clouds, as well as the effect of absorption by the intergalactic medium following the model of Inoue et al. (2014).
In order to produce mock photometry with beagle for scenarios ii-iv, we are required to choose a set of photometric filters and a redshift. Given the difficulties of obtaining stellar mass estimates at high redshifts using current facilities, mainly because of the lower sensitivity of Spitzer compared to HST, we produce the mock photometry using a set of JWST /NIRCam broad-band filters. We fix the redshift to z = 5, since this is the lowest redshift for which the Near-Infrared Camera (NIRCam) on board JWST will cover the entire wavelength range from UV (λ ∼ 1100Å) to optical (λ ∼ 8300Å). At higher redshifts, objects will be fainter and thus constraints on M and Ψ will be more challenging, while at lower redshifts constraints on M and Ψ will depend on the variable depths of complementary HST observations sampling the rest-frame ultra-violet of JWST -observed sources. In practice, we consider a set of JWST /NIRCam broad-band filters to produce noiseless mock photometry, to which we add Gaussian noise to mimic the depths summarised in Table 3. These depths correspond to the predicted 5σ limits of the NIRCam GTO medium survey (e.g. Williams et al. 2018). Another way that the scenarios increase in complexity is by the distribution of Mtot values. The Mtot values in scenarios i-iii are Gaussian-distributed with mean (µM) and standard-deviation (σM) of [µM, σM] = [9.5, 0.5]. In scenarios ii and iii, where we produce mock photometry using beagle, we choose the input distribution of Mtot to allow all objects to be detected at the chosen photometric depths in all bands in Table 3. For scenario iv, however, Mtot values are drawn from the z ∼ 5 Duncan et al. (2014) measured stellar mass function and in this case we must also account for survey selection effects. In scenario iv, we therefore keep only galaxies with at least 3 bands detected with > 3σ. This simple approach neglects the photometric redshift uncertainty which is larger at low stellar masses and which can scatter objects into different redshift bins (see, e.g. Kemp et al. 2019 for a quantification of this effect on stellar mass function estimates).
In all scenarios, Ψ values are drawn from the corresponding Mtot − Ψ relation defined by the set of values [α, β, σ], (the value for each parameter is given in Table 2). To produce the mock SEDs in scenario ii and iv, we adopt a constant SFH, for which a combination of the model parameters Mtot and galaxy age, t, determine the Ψ of each galaxy. We choose each input Mtot − Ψ relation such that the required values of t would be less than the age of the Universe at z ∼ 5. In scenario iii we adopt a delayed SFH, for which Ψ depends on the combination of τsfr (the timescale of star formation for a delayed SFH as defined in Table 1) and t. To produce the mock SEDs, we also have to set other key physical parameters in beagle, which are summarised in Table 4 (their physical meaning is reported in Table 1). We then fit each scenario using the same model assumptions with which it is built.
Throughout the rest of this paper we will refer to each of these scenarios using a label. Scenario i will be referred to as the ideal-G scenario, scenario ii the const-G scenario, scenario iii the delayed-G scenario and scenario iv the const-MF scenario. The first part of the label refers to the way in which Mtot and Ψ estimates and uncertainties are derived (an ideal model of homoskedastic, Gaussian uncertainties, or derived by fitting to mock photometry with beagle using either constant or delayed SFHs) while the second part of the label refers to the distribution of Mtot values created in the scenario (either Gaussian, G, or z ∼ 5 mass function, MF).

MEASURING THE M − Ψ RELATION
Given a sample of galaxies with stellar mass and SFR estimates, the simplest way to obtain the slope and intercept of the main sequence would be to use ordinary linear regression. In this case, the scatter can be estimated from the standard deviation of the offsets in Ψ from the relation. However, ordinary linear regression assumes that the 'true' values of the dependent and independent variables only deviate from the line because of random Gaussian fluctuations of the dependent variable, and that these fluctuations have a constant variance (i.e. they are homoskedastic). 2 If these random fluctuations are due to homoskedastic measurement errors then the underlying model requires that the 'true' values are drawn from a very tight linear relationship. A further assumption of ordinary linear regression is that the distribution of the independent variable is uniform. The above assumptions do not hold in our case, since uncertainties in the measurements of Ψ and M can be correlated, and non-Gaussian. Moreover, the uncertainties tend to be heteroskedastic, as they are larger at low M where objects Table 2. Summary of four test scenarios. The columns give, in turn, the test label, the form of Ψ − M uncertainties, whether-or-not beagle is used to produce and fit to mock photometry in the test, the SFH used to produce the mock photometry and subsequently fitted to it, the parameters of the Mtot − Ψ distribution in the test, the distribution in Mtotand whether-or-not selection effects are accounted for in the test. are fainter. Also, the distribution of M values from a fluxlimited survey is not uniform, but rather a convolution of the stellar mass function and the mass completeness of the survey. Finally, it would be a strange Universe indeed if M and Ψ lay along a very tight linear relationship. Given these considerations, SFR values are therefore better modelled as distributed about the linear relationship with some scatter while also explicitly modelling the measurement uncertainties. We address this in the present work by implementing a fully Bayesian Hierarchical method for constraining α, β, σ in the presence of heteroskedastic, co-varying errors and a non-uniform distribution of stellar mass values. Whether or not the intrinsic scatter is uniform with stellar mass at high redshift is still unknown, although San17 find tantalising evidence of an increase of the intrinsic scatter toward low stellar masses. Some simulations also suggest that the intrinsic scatter increases toward low stellar masses because of short timescale variations of star-formation (e.g. Sparre et al. 2017;Matthee & Schaye 2019). Here, we choose to model the relation with a scatter that is constant with stellar mass but will discuss the prospects for extending that model in section 5. We compare the results obtained with our hierarchical method with those obtained using ordinary linear regression, and also study the impact of different methods used in the literature to estimate Ψ. We summarise the assumptions held by each of these methods in Table 5.

Method I: Linear regression to joint posterior medians (the PM method)
The simplest way to derive α, β and σ is to perform ordinary linear regression (via ordinary least-squares) on point-wise estimates of M and Ψ, e.g. on their posterior median values, which we will refer to as M med and Ψ med respectively. 3 As described above, linear regression assumes that the true values deviate from a straight line with random Gaussian fluctuations in the dependent variable only. Given a sample of galaxies with stellar mass and SFR estimates, we provide an estimate of σ from the standard deviation of residuals from the best-fit linear relation. In essence this assumes that any deviation from the straight line is due to intrinsic scatter and as such should provide an estimate of the convolution of the intrinsic scatter and the scatter due to measurement uncertainties in M and Ψ. As such, this method does not rigorously treat uncertainties in either Ψ or M , and does not allow for covarying errors or a non-uniform M distribution. We will refer to this method throughout the paper as the PM (Posterior Median) method.

Method Ia: Linear regression with Sal15 Ψ measurements (the PM-Sal15 method)
Sal15 provided the first estimates of intrinsic scatter in the M − Ψ relation to high redshift (z 4). They derive M adopting a constant SFH, and Ψ with an updated form of the Kennicutt (1998) UV-luminosity to SFR conversion (adjusted to a Chabrier 2003 IMF). The original Kennicutt (1998) conversion assumes there has been at least 100 Myr of constant star formation in the galaxy. By 100 Myrs the relative contribution to the rest-frame UV of very young, hot and luminous stars to slightly older, cooler stars becomes fairly stable. At younger ages, however, there are fewer intermediate-age stars and therefore a higher SFR is needed to produce the same UV luminosity. The updated conversion by Reddy et al. (2012) takes account of the higher Table 4. The parameters used to produce mock photometry, and then fitted to it in scenarios ii-iv. Mock photometry and photometric fits are produced with beagle. The central column describes the distribution of the parameter (right column) used to produce the mock photometry while the left column describes the prior set on that parameter when fitting.

Parameter
Mock distribution Fitted prior Fixed 0.4 a Parameter only used with delayed SFHs. Table 5. Summary of the assumptions held by the different methods employed here to measure the main parameters of the M − Ψ relation. The table on the left describes each method while the table on the right gives details of which assumptions are taken into account with each method.

I -PM
Ordinary linear regression to joint posterior medians with intrinsic scatter estimated from deviations from relation.

Ia -PM-Sal15
As for I but with Ψ estimates provided using the method of Sal15.

Ib -PM-San17
As for I but with Ψ estimates provided using the method of San17.

II -BH
Kelly (2007)  SFR-to-UV ratio found in young galaxies by including a dependence on age. However, this calibration neglects the dependence of the UV-to-SFR conversion on stellar metallicity as well as any contribution from nebular continuum emission, which is itself dependent on the metallicity and ionisation parameter of the ionised gas. As such, use of the Reddy et al. (2012) UV-to-SFR conversion may artificially reduce the scatter in measured Ψ values.
To test the SFR estimation employed by Sal15, we take the observed flux closest to rest-frame 1500Å for each simulated galaxy and correct for dust using AUV med , the median of the marginalised attenuation at 1500Å. Sal15 uses both the Calzetti et al. (2000) and an SMC-like dust attenuation curve in their analysis, while here we are using the CF00 two component dust model, which implies a dependence of the effective AUV on a galaxy's SFH. Once the dust-corrected 1500Å flux is converted to luminosity (using z med ), we use t med of each galaxy to estimate Ψ using the Reddy et al. (2012) UV-to-SFR conversion.
To determine the parameters α, β and σ describing the M − Ψ relation, Sal15 apply a weighted linear regression algorithm to Ψ values binned according to M . The weights and reported scatter, σ, are given by the standard deviation of Ψ values in each bin. This definition of σ differs from that in equation 1, as the variation of Ψ as a function of M within each M bin will provide an additional scatter component in their measurements. Sal15 also provide estimates of the intrinsic scatter de-convolved from measurement uncertainties in M and Ψ. Here, we do not attempt to reproduce the exact measurement procedure employed in Sal15, but estimate α, β and σ using the approach described in Section 3.1. The only difference with respect to the method of Section 3.1 is hence in the way we estimate Ψ. In this way, we can study to what extent the Ψ estimation affects derived properties. We will refer to this method throughout the paper as the PM-Sal15 method.

Method Ib: Linear regression with San17 Ψ measurements (the PM-San17 method)
San17 use data from the Hubble Frontier Fields (Lotz et al. 2017) to probe the M − Ψ relation to lower masses than the Sal15 study by exploiting the magnification of galaxies by foreground clusters. Their method to measure the main-sequence parameters also relies on a dust-corrected Ψ estimate. The attenuation-corrected UV-luminosity is esti-mated using the Meurer, Heckman & Calzetti (1999) relation. This correction essentially assumes that every galaxy has the same intrinsic UV-slope of −2.23, and any deviation from this slope is caused by dust. Using the Calzetti et al. (2000) dust curve, they derive a dust-corrected UVluminosity value, which they then convert to Ψ using the Kennicutt & Evans (2012) conversion. This conversion essentially updates the Kennicutt (1998) calibration given new stellar models and is based on a Chabrier (2003) IMF, but unlike the Reddy et al. (2012) calibration adopted by Sal15, it does not include the effect of stellar population age on the UV-to-SFR conversion.
As in the case of the PM-Sal15 method, we estimate α β and σ in the same way as for the PM method, but using Ψ estimated with the San17 approach. We will refer to this method as the PM-San17 method from now on. This method differs from the full method of San17, who provide estimates of the intrinsic scatter after accounting for the measurement uncertainties. Their method also carefully accounts for the bias introduced when relatively more numerous, low mass objects are scattered into the selected sample, because of the measurement uncertainties, than the less populous, higher mass objects that are scattered out of it (Eddington bias). Our approach of only varying the method used to derive Ψ between the PM, PM-Sal15 and PM-San17 methods, allows us to isolate the effects of Ψ estimation on the derived Mtot − Ψ relation.

Method II: The Kelly (2007) model (the Bayesian Hierarchical, or BH method)
As discussed at the beginning of this section, the assumptions of ordinary linear regression are not appropriate in the case of measuring the main sequence. Kelly (2007, hereafter K07) presented a Bayesian Hierarchical model that self-consistently derives the posterior probability of the intercept, slope and intrinsic scatter between variables with correlated, heteroskedastic errors and a non-uniform distribution of covariate values. We therefore adapt this model to work with output joint posteriors on Mtot-Ψ derived from SED-fitting with beagle. The K07 model is general and not specifically written for modelling the main sequence, but we will summarize key components of the model here using M as the independent variable, and Ψ as the dependent variable.
Before describing the K07 model, it may be helpful to refer to Bayes theorem, in particular to the definitions of posterior probability, likelihood and prior probabilities. The Bayes equation can be written as: where P(Θ | D, H) is the posterior probability distribution of the model parameters Θ given a set of data, D and a model (or hypothesis) H. In this equation: P(D | Θ, H) is the likelihood function, which quantifies the statistical agreement between model and data for a fixed set of parameters; P(Θ | H) expresses the prior probability, which encapsulates our knowledge on the model parameters before analysing the observations; the denominator is the evidence (or marginal likelihood), a normalization factor often written as P(D | H). A Bayesian Hierarchical model is one that is structured over different 'levels'. The highest level of the K07 model describes the distribution of M values which do not have to be uniform. This can be written as: which denotes that stellar masses have some distribution given by P(M | η), where η is a set of variables specifying the shape of that distribution. In the K07 sampler, this distribution is a weighted linear combination of a set of Gaussians, or a Gaussian mixture model (GMM). The second level describes the distribution of Ψ values given M , which is modelled as a linear relation with some Gaussian scatter: (note that this describes the same relationship as in equation 1 but written in a different way here for clarity).
The lowest level of the model describes the measurements of M and Ψ of the individual objects. The K07 model assumes that one has point-wise estimates of the two variables of interest, and some Gaussian error on these estimates that may be correlated and is described by covariance matrix Σ. Taking a measurement in this way can be thought of as drawing the measured values from a Gaussian distribution (with some covariance Σ) centred on the 'true' values of M and Ψ. We will label these point-wise estimates x and y: Modelling the M and Ψ estimates in this way is not wholly appropriate for our needs as we derive posterior probability distributions of Ψ and M using beagle. We will describe how we incorporate the beagle estimates at the end of this section but proceed to describe the model assuming we can take direct measurements of M and Ψ (following the model presented in K07).
Given this model, we wish to calculate the posterior probabilities of the main-sequence parameters (α, β, σ), as well as the parameters describing the distribution in M (η from equation 4) given the measurements and the 'true' M and Ψ values. We can write this posterior probability using the chain rule as: where the first conditional probability on the left-hand side is described in equation 6, the second conditional probability describes the linear relation with Gaussian scatter given in equation 5, the third describes the distribution of M values (equation 4) and P(α, β, σ, η) describes the prior probabilities assumed for the parameters of interest. However, we must note that under normal circumstances we do not know the 'true' M and Ψ values for individual objects which means that we cannot define these conditional probabilities. K07 deals with this by treating the true values as missing data that must be marginalised over, thus obtaining the observed data likelihood function: This likelihood function can be used to compute the maximum-likelihood. However, computing this integral is non-trivial and K07 proposes a Gibbs sampler. A Gibbs sampler is a Markov Chain Monte Carlo (MCMC) algorithm that generates posterior samples by performing random draws from the full conditional probability distribution of each free parameter in turn. In our case, each free parameter from equation 7 will be updated in turn within each MCMC iteration. In practice, treating the 'true' values of M and Ψ as missing data means that these are treated as free parameters which will also be updated in each iteration of the sampler. We refer the reader to K07 for further details, in particular section 6.2.1 which lists the steps performed in each iteration. We have employed the Python implementation of the K07 Gibbs sampler, as implemented by Josh Melior 4 . As stated above, rather than having direct measurements of M and Ψ (which we referred to as x and y), we have beagle-derived joint M -Ψ posterior distributions that we do not want to assume to be described by a single Gaussian. We therefore extend the K07 Gibbs sampler to accept Gaussian mixture models of the beagle-derived object M -Ψ posterior probabilities. 5 Essentially, these models are given by the sum of K bi-variate Gaussians, each with weights given by π k , mean µ k = [M ,Ψ], and covariance matrix ζ k : In this equation, F = [f1, f2, ..., fN ] and E = [e1, e2, ..., eN ] are the set of N broad-band fluxes and flux errors, respectively, and ] is the set of free parameters describing the GMM. These joint posterior probability distributions are not the same thing as direct measurements of M and Ψ as written in Equation 6. Our data are flux estimates and our posterior probabilities give the probability of the 'true' M and Ψ values given the data, rather than the other way around. However, within the Gibbs sampler, when we update the 'true' M and Ψ values we sample directly from the full conditional probabilities. For M this will look like:  where we can derive P(M | F, E, Ψ) from the joint M -Ψ posterior defined in equation 9. We fit GMM models to the beagle-derived joint Ψ − M posterior distributions using sklearn.mixture.GaussianMixture from the python package scikit-learn Pedregosa et al. (2011). After some experimentation we opted to fit 3 Gaussians to each posterior, finding that this prevented over-fitting to individual features, but was flexible enough to represent the range of mophologies in the output posterior distributions. An example GMM fit is shown in figure 2, where we display samples from the joint Ψ − Mtot posterior from the beagle fit to a single object from the const-G scenario, which was fit to with a constant SFH. Over-plotted are the samples from the GMM fit to the Ψ − Mtot posterior.
As constructed, the Gibbs sampler self-consistently propagates the uncertainties on individual object Ψ and M estimates onto the uncertainties on main sequence parameters α, β and σ. By using beagle-derived Ψ estimates, the full uncertainties in dust attenuation, intrinsic UV-slope and UV-to-SFR conversion are self-consistently accounted for. Additionally, the model explicitly accounts for a nonuniform distribution of M values.

RESULTS
The four test scenarios described in Section 2 allow us to test the four methods outlined in Section 3. The ideal-G scenario was constructed to demonstrate that the BH method works as expected in the case of homoskedastic measurement uncertainties that are described by single, covariant Gaussians before considering the added complexity of using Mtot and Ψ estimates derived with beagle. We show the results in Fig. 3 where the posterior median and 95 % credible in-   terval of the parameters α, β and σ are plotted for the ten independent realizations of the ideal-G scenario. The results demonstrate that the BH method allows us to recover the input parameters (intercept, slope and scatter of the Mtot − Ψ relation) in an unbiased way. The const-G, delayed-G and const-MF scenarios employ beagle to fit to noisy mock photometry to obtain Mtot and Ψ estimates. These scenarios hence naturally account for the fact that SED fitting provides poorer constraints on lower mass, fainter objects (i.e. the uncertainties on Mtot and Ψ are heteroskedastic). Fig. 4 shows a 'heatplot' of samples randomly drawn from the joint posterior probability of (Mtot, Ψ) for 100 of the objects produced for the const-G scenario. Fig. 4 clearly illustrates the effect of the heteroskedastic uncertainties. Additionally from this plot we see that the Ψ − Mtot uncertainties now include an imprint of the priors associated with other physical parameters that were sampled over in the beagle fits. In particular, limits in the prior on galaxy age t impose sharp cutoffs in allowed Ψ (shown as white dashed lines in Fig. 4).
The heteroskedastic nature of the Ψ − Mtot uncertainties leads to constraints on α, β and σ that depend on which objects are included in the analysis. In practical situations stellar mass cuts are imposed on samples of galaxies to ensure that either the samples are complete in stellar mass, or that only objects with good enough constraints on stellar mass and Ψ are used to derive population-wide relations. In Fig. 5 we hence adopt a similar approach and show the constraints on α, β and σ obtained for the const-G scenario when imposing different minimum stellar mass cuts. The results obtained with each of the methods described in Section 3 are shown with different symbols, as indicated in the figure legend. The top two panels of Fig. 5 display the constraints on intercept and slope, showing that the BH method derives a systematically steeper Mtot − Ψ relation with a lower intercept when the lowest mass cuts are applied. This is because the Ψ and Mtot estimates are biased to low and high values, respectively, at low stellar mass. These biases are caused by degeneracies between galaxy age and dust attenuation, which can make a dusty, young galaxy look similar to an older galaxy. The effect of these degeneracies becomes more prominent at low stellar masses where the photometry has lower S/N. The effect can be seen in Fig. 4 where the joint posterior has higher probability below the input relation than above it at low stellar mass. The uncertainties are asymmetric across the relation because the age-dust degeneracy is most severe at old ages. At young ages the upper limit on the prior onτv will limit the number of templates that can reasonably fit to the SED of an older object by employing dust attenuation. Additionally, the younger templates do not show a strong Balmer-break and, even attenuated, will be a poor fit to an observed galaxy that shows evidence of a Balmer-break. The estimates of σ with the BH method are less biased than those of α and β where mock photometry are made and fitted to using constant SFHs. The panels display the derived intercept, α, slope, β and intrinsic scatter, σ using each method described in section 3. For the results derived using the BH method, the median and 68% confidence limits of the posterior distribution for each parameter is plotted while the results derived using the PM, PM-Sal15 and PM-San17 methods are displayed as stars and are plotted with small offsets in the x−axis for clarity. The results are shown as a function of lower-limit in Mtot, where this limit is applied based on measured median Mtot value, and the true input values are shown as dashed horizontal lines in each panel.
at low stellar masses. Estimates of σ are somewhat biased to low values for the samples with low stellar mass cuts, but they agree within the 95% confidence contours (not displayed here). Fig. 5 also shows that the constraints obtained with the PM method generally follow the same trends as those obtained with the BH method, with the σ estimates recovering the input value well. We might have expected the PM method to significantly over-estimate σ, since it gives a mea- emergent prior 6 on Ψ respectively, and we would measure a very small scatter from the posterior median values. The PM-Sal15 method produces estimates with steeper β and lower α than the input relation, even for the highest Mtot limits. This is because the Reddy et al. (2012) UV-to-SFR calibration is systematically offset from the relation inherent in the Gutkin, Charlot & Bruzual (2016) stellar+nebular models that we use. Part of this offset is due to the contribution of nebular continuum emission to the rest-frame UV in the Gutkin, Charlot & Bruzual (2016) models that is not accounted for in any of the standard UV-to-SFR calibrations. Not only would a UV-to-SFR calibration based on the Gutkin, Charlot & Bruzual (2016) models have an age and stellar metallicity dependence, but it also has a significant dependence on the metallicity, Z, and ionization parameter, log Us, of the ionized gas. We further discuss the impacts of the dependencies of the UV-to-SFR ratio in Section 5.1.1 (see Fig. 9).
The PM-Sal15 method also biases the measured σ to low values. This bias is driven by the somewhat contrived scenario whereby we vary the age of the constant SFH at given stellar mass to produce the desired relation, meaning that objects above the relation will have younger ages than objects below the relation. The discrepancy in UVto-SFR ratio between the Reddy et al. (2012) relation and that inherent in the Gutkin, Charlot & Bruzual (2016) models varies with stellar age, and therefore varies systematically across the relation, biasing the estimate of the intrinsic scatter. Although we do not expect either UV-to-SFR calibration to be a perfect representation of the 'true' ratio present in the galaxy population, it is worth noting that the Reddy et al. (2012) calibration used in the PM-Sal15 method does not account for the stellar metallicity dependence, or any dependence on nebular parameters, which vary in the true galaxy population.
The α and β estimates derived using the PM-San17 method are less biased than those from the BH and PM methods, but σ is significantly under-estimated. The Kennicutt & Evans (2012) calibration used in the PM-San17 neglects the stellar age dependence of the calibration thus avoiding the age-dust degeneracy that is inherent in t andτv estimates derived from SED-fitting. Additionally the Kennicutt & Evans (2012) calibration neglects any dependence on stellar metallicity and nebular gas parameters, as for the Reddy et al. (2012) calibration, which would likely cause σ to be under-estimated. Moreover, the extreme assumption of the PM-San17 method that each galaxy has the same intrinsic UV slope causes estimates of σ to be biased to even lower values. Fig. 6 shows the same results as Fig. 5, but for the delayed-G scenario. Fig. 6 shows that the constraints on β and α are biased towards a steeper relation with lower α at all mass cuts. We also display the measurements using the BH and PM methods for a similar scenario but with an Mtot distribution described by a Gaussian centred on Mtot = 10 rather than Mtot = 9.5 [Mtot ∼ N (10, 0.25)]. These results are displayed as grey symbols and show unbiased estimates of α and β. This indicates that when the majority of objects included in the sample have strong photometric constraints, the results of α and β are not biased. The reason for the bias is similar to that for the constant SFH where at low stellar masses, when objects are fainter, t becomes harder to constrain and the results are heavily influenced by the age-dust degeneracy in the templates. However, comparing the results of the scenarios with Mtot ∼ N (9.5, 0.25) between Figs. 6 and 5 we see that the results of the delayed-G Incomplete Figure 8. The results of fitting to the const-MF scenario where mock photometry are made and fitted to using constant SFHs, and mass values are drawn from a mass function (see Table 2). The median and 68% confidence contours of the posterior distribution for each parameter are plotted for the K07 sampler results as filled circles with error bars whereas the measurements performed with individual object Ψ − Mtot posterior medians and alternative Ψ estimates are plotted as stars. The results are shown as a function of lower-limit in Mtot, where this limit is applied based on measured median Mtot value. The blue horizontal lines show the true input values while the grey pentagons show the measured relation for only galaxies that meet our selection criteria above the given Mtot limit. The orange shaded region delimits the measurements that affected by incompleteness (e.g. see Figure  1). scenario are biased in samples with higher Mtot limits than for the const-G scenario. This is because the ages are less well constrained on average in the delayed-G scenario than in the const-G scenario, because of the added free parameter, τsfr. Additionally the delayed SFH does not impose a mass-dependent lower-limit in Ψ, in contrast to the constant SFH, meaning that the Ψ uncertainties can extend to much lower values when the photometric constraints are poor, further biasing the fit.
Comparing the results for the two different input distributions (Gaussian distributions centred on Mtot = 9.5 and Mtot = 10), we see that σ is under-estimated in all cases, even if α and β are un-biased. This is because the delayed SFH includes the extra τsfr parameter. This parameter describes the peak of star formation, but is unconstrained for an object that has not yet reached the peak of its star formation rate because τsfr negligibly affects the shape of the model SED during the rising portion of the delayed history. However, τsfr still affects the normalisation of the SED, which effectively means that the prior on τsfr will significantly affect the inferred Mtot and Ψ values when objects are best fitted by a rising SFH. The prior employed here is uniform on log(τsfr) in the range 7 < log(τsfr/yr) < 10.5 (see Table 4), which at z ∼ 5 will mean that for a significant fraction of the prior space, the templates describe rising SFHs. This is best appreciated from the right-hand panel of Fig. 7 which displays how the uniform prior on log(τsfr) is distributed over the derived specific star formation rate (sSFR). At z ∼ 5, values of log(τsfr) above the horizontal dashed line will always describe a rising SFH, whereas for values below the dashed line, the SFH will be in the rising portion only if τsfr > t. In the delayed-G scenario Ψ is well recovered but Mtot is often systematically biased to high values. This can be traced to the dependence of the normalisation of a 1 M spectral template on the τsfr parameter. The integrated SFH is given by τsfr[τsfr − e −t/τsfr (τsfr + t)]. Therefore, the template normalisation will only be independent of τsfr in the limit τsfr t. For a uniform prior on log(τsfr), high τsfr values are favoured, and for a given t, a higher τsfr will give templates with lower luminosity per solar mass thereby over-estimating the mass. We show the emergent prior on Ψ as a function of Mtot in the left-hand panel of Fig. 7. There is non-zero probability that objects can be fitted with low Ψ values below the lower solid line which delimits the edge of the prior space occupied by rising SFHs (whereas that is not true for a constant SFH), but there is still significantly more probability that the resulting fits will be constrained to a restricted region of the Ψ − Mtot parameter space. Given that τsfr is unconstrained in most fits, the under-estimated σ values are due to the prior on τsfr constraining the fits to the limited region displayed with high probability in the left-hand panel of Fig. 7. We further demonstrate the dependence of the results on the prior on τsfr in Appendix A.
The dependence of derived physical parameters on an unconstrained parameter, τsfr is clearly a problem for extending the use of the delayed SFH to high redshifts where we expect a large population of objects to be experiencing a rising SFH. An exponentially rising SFH, described by ψ ∝ exp(t/τsfr) will also provide biased Mtot and/or Ψ estimates since a change in τsfr changes the normalisation of a 1M template, yet hardly changes the shape of the SED, and so will not be constrained from broad-band photometry. We note that Carnall et al. (2019) and Leja et al. (2019) demonstrated the dependence of deriving SFHs on the priors on various SFH parameters by fitting to galaxies in the Galaxy and Mass Assembly (GAMA) survey (?). Here, we demonstrate the types of effects on the derived M − Ψ relation given the chosen priors on analytical SFH parameters.
In the const-MF scenario we consider a more physicallymotivated distribution of Mtot values, i.e. with a larger pro-portion of objects with low stellar masses, and thus poor physical parameter constraints, compared to the previous scenarios. We also include selection effects, since some objects are too faint to be detected at the chosen survey depths. Fig. 1 shows the input Mtot − Ψ distribution, as well as the objects entering our selection. The distribution of selected objects is characterised by SFRs that are higher than the average population at low Mtot. We show in Fig. 8 the results of our analysis for different stellar mass cutoffs. Unlike in Figs. 5 and 6, in Fig. 8 we also show the result of ordinary linear regression to the 'true' Mtot and Ψ values of the galaxies selected at each mass cut (grey pentagons). This allows us to identify how the selection effects themselves are biasing the measurements of the Mtot − Ψ relation. In the const-MF scenario we also consider a shallower slope for the Mtot − Ψ relation (0.8 vs 1), to ensure that we explore a scenario with an intrinsic slope which differs from the slope of the limits in Ψ imposed by the prior on t (white dashed lines in Fig. 4). Choosing a shallow slope maximises the effect of the asymmetric uncertainties that result from the age-dust degeneracy, as well as being very close to the z ∼ 5 value provided by the Speagle et al. (2014) redshiftdependent fit to the main sequence. Fig. 8 shows that the constraints on α and β obtained with the BH method for stellar mass cutoffs 10 8 − 10 8.5 M agree to within the uncertainties with the underlying input relation, once selection effects are taken into account. For lower mass cutoffs, however, the BH method overestimates the β and underestimates α compared to the relation measured for selected objects only (grey histograms), but agrees well with the input relation. It is likely the agreement with the input relation is somewhat by chance, because degeneracies among physical parameters push the derived M − Ψ relation into the region of highest posterior probability close to the lower limit in Ψ even when objects with low Ψ are on average missing from the selected objects (see Fig. 1). Fig. 8 also shows that σ measured from the 'true' input values and the BH method are biased low even before the effect of mass incompleteness becomes significant (i.e., to the right of the pink-shaded region). The reason is that galaxies with poor physical parameter constraints can have estimated stellar masses above the cutoff, i.e. galaxies can be 'scattered' into our mass-limited sample, even though the 'true' masses would be below it. When objects have poor physical parameter constraints, the uncertainties in M and Ψ tend to co-vary with higher posterior probabilities below the input relation due to parameter degeneracies. Therefore objects close to or below the relation are preferentially scattered into the sample. We discuss how to deal with these types of selection effects in section 5.2. Fig. 8 shows that the Mtot − Ψ relation estimated with the PM method closely follows the grey pentagons at all mass limits, while the PM-Sal15 and PM-San17 methods do reasonably well at measuring β and α. As for the const-G and delayed-G scenarios, the PM-Sal15 method underestimates the scatter when constraints are good. The PM-San17 method, on the other hand, underestimates the scatter when the constraints are good, but in the samples with lower Mtot limits, it overestimates the scatter because the intrinsic UV slope estimates used to correct the rest-frame UV for dust become quite uncertain. Although this uncertainty is formally accounted for in the original San17 work, our analysis demonstrates that the uncertainties in the UV slope estimation can dominate the overall uncertainty affecting the Ψ estimation.
A limitation of our analysis is that we account for dust attenuation adopting a uniform distribution of V -band optical depths between 0 and 2 (see Table 4). Current observations suggest that UV-bright galaxies have, on average, redder UV slopes and hence likely more dust attenuation than UV-faint galaxies (e.g. Stark, Ellis & Ouchi 2011;Bouwens et al. 2014;Rogers et al. 2014). Given the measured relationship between M and UV absolute magnitude (Duncan et al. 2014;Salmon et al. 2015;Song et al. 2016) this likely translates into lower mass galaxies having lower dust attenuation. Hence, in the const-MF scenario we likely overestimate the effect of dust on low-mass galaxies. This, in turn, decreases the fraction low-mass galaxies that are detected, and leads to poorer Ψ constraints at low Mtot than we might expect from current observations. We therefore repeated the analysis of the const-MF scenario assuming a uniform distribution ofτv in the range [0, 0.2]. We do not display the results here but describe briefly the differences compared to Fig. 8. We find a better agreement between α and β estimated with the BH method and those obtained from linear regression to selected galaxies down to a mass cutoff of 10 7.5 M . This effectively indicates that objects with less dust will have less biased estimates of Mtot and Ψ due to the age-dust degeneracy at lower stellar mass limits than seen in our fiducial scenarios, because the rest-frame UV photometry has higher S/N when subjected to lower levels of dust attenuation. The PM-San17 method also only becomes dominated by photometric uncertainties at ∼ 10 8 M .
The similarity between the results derived using the BH and PM methods in both the const-G and const-MF scenarios suggest that the assumptions of the underlying distribution of Mtot values implicit in the chosen method does not greatly affect the measurements of the Mtot − Ψ relation. Other effects, in fact, dominate with respect to the distribution of stellar masses, in particular the fraction of galaxies with poor physical parameter constraints, where emerging priors and degeneracies bias estimates of Mtot and Ψ, as well as the selection function of the survey.
We stress that the scenarios that we tested do not allow us to constrain the absolute Mtot limits at which we can constrain the Mtot − Ψ relation at z ∼ 5, but they rather indicate how the Mtot − Ψ relation can be biased when including faint objects with poor physical parameter constraints. As shown in Fig. 5, we may obtain tighter constraints on α, β and σ (i.e. smaller error bars) by including more low-mass galaxies, but when these galaxies have poor Ψ constraints, then we will bias the derived Mtot − Ψ relation, boosting the (hard-to-identify) effects of emerging priors and parameter degeneracies.

DISCUSSION
In this paper we have presented a Bayesian Hierarchical approach to modelling the main sequence that self-consistently propagates the uncertainties on M and Ψ estimates onto the uncertainties about the intercept, α, slope, β and scatter, σ of the Mtot − Ψ relation. We considered a set of four scenarios with increasing complexity to test key aspects of this approach and compared the results to three other methods based on ordinary linear regression. In addition, these scenarios allowed us to test standard SED-fitting procedures and their impact on main sequence parameter retrieval.
Our results show that, when fitting to broad-band photometric data, the age-dust degeneracy will cause asymmetric uncertainties when photometric constraints are poor, and these will bias measurements of the main sequence. Thus, the fraction of objects entering into the sample with poor M and Ψ constraints is one of the most important factors in main-sequence parameter estimation. Simple estimates of Ψ based on UV-to-SFR calibrations can be used to avoid these template degeneracies (but may be biased themselves if the calibrations are not suitable for the populations under study) but the simplifying assumptions will bias the measurement of intrinsic scatter. Of similar importance to considerations of bias due to template degeneracies is the treatment of sample selection and the effects of Eddington bias. Explicitly modelling the distribution of M values is of secondary importance.
In this section we further discuss our results in the context of previous measurements of the main sequence at high redshifts from photometric data. We explore prospects for investigating any mass dependence in the intrinsic scatter of this relation, both generally and specifically with JWST, as well as best practices for modelling the main sequence in order to derive estimates of the intrinsic scatter.

Measurements of intrinsic scatter at high
redshift (z 3.5) from the literature

Salmon et al. (2015)
We find that σ is somewhat under-estimated when using the method employed in Sal15 to estimate Ψ. This is likely due to the difference in age-dependence of the Reddy et al. (2012) calibration, compared to the UV-to-SFR agedependence of the Gutkin, Charlot & Bruzual (2016) stellar plus nebular models we use. However, we must note that the Reddy et al. (2012) calibration does not account for the stellar-metallicity dependence of the calibration, which would likely further under-estimate σ for a population of galaxies with a range of metallicities. We plot the agedependent calibration of Reddy et al. (2012), converted to a Chabrier (2003) IMF, compared to the calibration derived for the Gutkin, Charlot & Bruzual (2016) models over the full range of the parameter space in Fig. 9. Carton et al. (2017) found an anti-correlation between log Us and log(Z) from SDSS galaxies, with a large scatter in log Us at low metallicities. This correlation would minimise the effects of the nebular contribution to the UV-to-SFR conversion as for given log Us the ratio between the SFR and luminosity  Figure 9. Logarithm of the ratio between SFR and luminosity at 1700Å plotted as a function of age for a constant SFH. We show the conversion for the stellar models used in this work at solar metallicity as the dashed black line, as well as the Reddy et al. (2012) conversion, converted to a Chabrier (2003) IMF as the dashed red line. The red star shows the Kennicutt & Evans (2012) conversion. Additionally, we display the variation in the conversion introduced by accounting for the contribution from nebular emission. At solar metallicity, the dependence on log Us is shown as the orange lines, as indicated in the legend. The orange shaded region shows the variation in the UV to SFR conversion due to metallicity at fixed log Us = −2, where the limits in the shaded region are given by log(Z/Z ) = −2.2 (lower limit) and log(Z/Z ) = 0.4 (upper limit). The solid orange line displays the ratio of SFR to UV luminosity for our fiducial scenarios, with log Us = −2 and log(Z/Z ) = 0. at 1700Å is lower at lower metallicity. However, the results of Sanders et al. (2016) suggest that high redshift galaxies (z ∼ 2.3) have systematically higher ionisation parameters than low redshift galaxies (see also Hirschmann et al. 2017), and Fig. 9 shows that raising log Us at given metallicity increases the ratio between SFR and UV luminosity. The contribution of nebular continuum emission to the restframe UV should be accounted for in calibrations of SFR to UV luminosity, in particular when comparing samples over a wide redshift range.
As mentioned in Section 3.2, Sal15 employ a method to estimate the scatter introduced by uncertainties in M and Ψ measurements, and subtract this in quadrature from the measured scatter. Subtracting the estimated scatter due to uncertainties in M and Ψ in this way would be appropriate if these uncertainties introduce Gaussian scatter in the measured values. However, we have shown that the PM method (which should by construction provide an estimate of σ that is the convolution of the intrinsic scatter and scatter due to measurement uncertainties) does not significantly overestimate σ in the results of the const-G scenario for samples with the lowest Mtot cuts (where the fraction of objects with poor constraints in Mtot and Ψ is the highest). This is likely due to the effects of template degeneracies biasing the measurements and the imprint of the priors on t clipping the emergent prior on Ψ (e.g. Fig. 4). Sal15 use a constant SFH for fitting to the object photometry and employ t med and AUV med estimates to derive Ψ. Thus, their method is subject to the same age-dust degeneracies as our full SED-fitting approach. Subtracting M and Ψ uncertainties in quadrature is therefore likely to significantly under-estimate the intrinsic scatter about the main sequence.

Santini et al. (2017)
San17 employ a sophisticated forward modelling approach for mitigating the effects imposed by their complicated selection function resulting from selecting magnified galaxies within the Hubble Frontier Fields. When using the method of San17 for estimating Ψ, we find that σ is significantly under-estimated when the UV-slope (required to correct the UV-luminosity for dust) is well constrained (see Fig. 5), while it is over-estimated when flux errors dominate UVslope measurements (Fig. 8). San17 take these increased uncertainties in Ψ estimates toward low-S/N UV measurements into account in their analysis. However, we find that any increase in intrinsic scatter to low stellar masses is unlikely to be reliably measured with this method as the underlying assumption that all objects have the same intrinsic UV slope will break down in practice. The distribution of β slopes is taken into account to some extent in their analysis by adding an uncertainty in quadrature to other uncertainties, but this approach is appropriate only if the distribution of values arises from measurement errors rather than intrinsic properties of the population. Kurczynski et al. (2016) measure the intrinsic scatter at redshifts 0.5 < z < 3 using the same model of the main sequence that we employ (equation 1). They estimate the output joint uncertainties on M and Ψ as single, multivariate Gaussians (whereas we find that single Gaussians are often not a good representation of the uncertainties on M and Ψ, see Fig. 2), and take them into account when fitting the model parameters α, β and σ. Kurczynski et al. (2016) adopt a delayed SFH when measuring M and Ψ for their galaxies, but they do not explicitly state whether t and τsfr are effectively sampled in a uniform way, either linearly or logarithmically, in their priors. As shown in Section 4 above, as well as in Appendix A, the priors on these parameters can affect the measured Mtot − Ψ relation, biasing the intrinsic scatter to low values, and potentially masking any increase in intrinsic scatter to low stellar masses where the constraints are poorer and more influenced by the prior.

Selection effects
We have quantified two different types of selection effects on the recovery of M − Ψ parameters in the results of the const-MF scenario (shown in Fig. 8). Mass incompleteness in the selection biases the measured slope, β, to shallower values and the intercept, α, to higher ones. This is because the galaxies preferentially detected below the masscompleteness limit have high SFR (Fig. 1). At the same time, adopting a mass limit based on estimated stellar masses introduces additional biases above this mass completeness limit, in particular by biasing the scatter measurements, σ, to low values. This is an imprint of the Eddington bias, whereby the underlying mass function means that lower mass objects are more numerous and so are preferentially scattered into the selected sample compared to those being scattered out. Further to the simple imprint of Eddington bias, the age-dust degeneracy causes Mtot values to be overestimated and Ψ to be under-estimated when photometric constraints are poor, meaning that objects with 'true' values below the Mtot limit are preferentially scattered into the selection if they are below, or very close to the relation.
One can account for these biases by explicitly modelling the effects of the selection criteria and measurement uncertainties. The K07 full model includes the possibility to model selection effects (see K07, section 5.1), however, the implemented model is only appropriate for the case when the two measurements are independent, which is not useful for our purpose as the selection function depends on both Mtot and Ψ, the uncertainties are correlated between these two parameters and subject to template degeneracies. Extending the model to allow for correlated errors in Mtot and Ψ is discussed in K07, section 5.1.2, but implementation of this model is beyond the scope of the present paper, although we will investigate this type of modelling, as well as accounting for the effects of template degeneracies, in future work. San17 do explicitly account for the full complexity of the selection function but their method for deriving Ψ biases the measurement of σ. To obtain unbiased constraints on the intrinsic scatter we either need to explicitly model the selection effects, Eddington bias and bias due to template degeneracies, or we need to minimise this effect by choosing a mass cutoff that selects galaxies with tight constraints on Mtot and Ψ.

Stellar mass and SFR estimates and their impact on the search for mass-dependence of the intrinsic scatter
We have seen from the results of fitting to the const-G, delayed-G and const-MF scenarios in Section 4 that the priors imposed in the SED fitting, as well as templateset degeneracies, can significantly bias measurements of the M − Ψ relation. How can we avoid these biases while also allowing for the possibility of measuring any massdependence of the intrinsic scatter, in particular if the massdependence is due to stochastic star formation in galaxies at high redshifts? We found that employing the UV-based Ψ estimate following the method of San17 mitigates the effects of the age-dust degeneracy and the associated biases to α and β estimates, but that the assumptions it entails cause σ to be biased. One alternative would be to use stellar mass estimates from SED-fitting along with completely independent Ψ estimates, e.g. from Hα corrected for dust using the Hα/Hβ ratio, as used in the work of e.g. Shivaei et al. (2015), however even this approach may prove problematic. Considering first the stellar mass estimate, even if the photometry has high-enough S/N to avoid biases due to the age-dust degeneracy, the emergent priors on SFR imposed by a constant or delayed SFH can still affect the mass estimates. For example, if an object with a somewhat bursty SFH sits (in between bursts) below the lower limit imposed by the age of the Universe at the time of observation (lower dashed line shown in Fig. 4) and is fitted to with a constant SFH, the derived stellar mass will be biased low and/or the SFR biased high. The extent of either bias will depend on the relative S/N between rest-frame optical and UV bands. A similar scenario occurs if we fit the objects with a delayed history employing a uniform prior on log(τsfr) and log(t). There is no hard lower-limit in SFR imposed by this SFH but the emergent prior has much lower probability at low SFR (see Fig.7), meaning that when constraints on the observed fluxes are poor, the prior will influence the derived properties. These effects would mask any increase in scatter to low mass present in the population.
The possibilities of mitigating the effects of priors on stellar mass estimates are sparse. One option is to decouple SFR from the previous SFH. For example, one can construct a SFH in which the last 10 Myrs have constant SFR while the previous history be described by a burst, delayed or constant history (e.g. Curtis-Lake et al. 2013;Chevallard et al. 2018). However, this added degree of freedom leads to much poorer constraints. We demonstrate this in Fig. 10, which shows the heatplot for the joint Mtot − Ψ posterior when fitting to objects produced for the const-MF scenario but fitting to them with a constant SFH plus a recent 10 Myr burst of star formation. The priors on physical parameters set in the beagle fits are identical to those used to fit to the const-MF and const-G scenarios as described in Section 2, except that we add a recent 10 Myr of constant star formation, with a uniform prior on Ψ between −4 < log(ψ/M yr −1 ) < 4. This setup leads to much poorer constraints on measured Ψ and to M estimates that are biased high to compensate for the lack of luminous, young stars to the model fluxes when the Ψ is under-estimated.
Considering the requirement for an un-biased estimate of the SFR, correcting Hα for dust attenuation using the Hα/Hβ flux ratio is a standard procedure. However, to obtain such measurements we must account for Hβ and Hα stellar absorption. Under the assumption of smoothlyvarying SFHs, the stellar absorption varies little, and the line emission may be corrected for stellar absorption using an average correction factor. However, Hβ absorption, in particular, increases from type O to B to A stars, and so increases significantly after the first population of O-type stars die. If we relax the assumption of a smoothly-varying SFH, Hβ absorption becomes highly uncertain. Correcting Hβ nebular emission for stellar absorption in this case is impossible without spectra that have high S/N in the continuum and high-enough resolution to resolve the underlying absorption feature, in order to fit the absorption and line emission simultaneously. Even with NIRSpec, this will not be feasible at high redshifts in low-mass galaxies where stochastic star formation may dominate. Hα cannot, therefore, be corrected for dust attenuation using Hβ plus some average stellar absorption correction. The measured Hα/Hβ flux ratio will depend not only on dust attenuation, which it is being used to correct for, but also on the ratio of recent to longer-term SFR, as well as the duration of the most recent burst.
In principle, fitting to the broad-band fluxes together with the measured Hα and Hβ line fluxes with a twocomponent SFH would improve the constraints on stellar mass while self-consistently accounting for the uncertainty in Hβ and Hα stellar absorption. Including Hα fluxes alone in the fitting can already improve the Mtot estimates. This is demonstrated in the top panel of Fig. 11, which shows the distribution of differences between measured and input Mtot values (for all objects with log(mtot,in/M ) > 8.5) when fitting to the const-MF scenario with a two-component SFH. When fitting to broad-band fluxes we show that the masses are biased high, due to the large uncertainties in Ψ, which can be seen in Fig. 10. When we include Hα fluxes in the fitting (where the input Hα flux for each galaxy has been perturbed by Gaussian errors such that S/N∼ 5) we find that the constraints on Ψ and Mtot improve dramatically, with the measured Mtot values only marginally biased compared to the input values. However, even when including Hα and Hβ in the fitting at high S/N, there still exists a degeneracy between luminosity-weighted stellar age at λrest = 4861Å and dust-attenuation on the Hα/Hβ ratio, which biases the SFR estimates. This is demonstrated in the bottom panel of Fig. 11 where we now perturbed Hα fluxes to meet a S/N of 25, and assumed a flat noise model to add noise to the corresponding Hβ measurement. Including Hα and Hβ significantly improve the constraints on Ψ, but the estimates are still biased to a value of order the scatter that we wish to measure. One would therefore further require good constraints on the rest-frame ultraviolet spectral slope, either through very high S/N broad-band photometry or by fitting to NIRSpec R100 spectra. Chevallard et al. (2018) find unbiased recovery of current SFR derived by fitting to R100 spectra with two-component SFHs (see their fig. 13 and table 4).

What can we measure with JWST
As discussed in Section 4, spectral template degeneracies act to bias estimates of M and Ψ. The S/N of the photometry fitted to has a direct impact on the tightness of derived constraints on M and Ψ, and hence, on whether or not the estimates will be biased. For example, the difference between output and input Mtot and Ψ in the SED fits to the objects in the const-G scenario described above depends on the S/N of the NIRCam F090W and F444W photometry, the estimates starting to become biased when the S/N falls below ∼ 20. We use the python version of pandeia (Pontoppidan et al. 2016), the JWST exposure time calculation (ETC) tool, to translate these required S/N values to required exposure times in the two NIRCam filters. The tool provides S/N for a given exposure setup, so we invert the calculation to provide the exposure time at given S/N by calculating the S/N on grids of flux and exposure time. One is required to set various exposure settings within the tool, which we summarise in Table 6. 7 The top two panels in Fig. 12 display the M − Ψ sequence colour-coded by the exposure time to reach a S/N of 20 in the F090W and F444W bands. This main sequence was constructed from a mock catalogue of 1000 objects with M drawn from the Duncan et al. (2014) z ∼ 5 stellar mass function, and Ψ drawn from the M − Ψ relation measured by San17 (α = −7.748, β = 0.94) with a constant intrinsic scatter of σ = 0.3. The mock galaxies were constructed with constant SFHs without any dust. We see that in the absence of dust, NIRCam should reach masscomplete samples with the required S/N down to M ∼ 8.5 in 30 hours per filter. It is likely that a similar exposure time is required in the full complement of NIRCam filters, although we have not investigated the goodness-of-fit with variable depths in each filter. We indicate how this exposure time is affected by the presence of dust with the dotted white line which shows which objects would require 30 hours of integration to reach S/N∼ 20 when dust is added to the SEDs using the CF00 dust law withτv = 1 and µ d = 0.4.
As discussed in Section 5.3, fitting to SEDs with a constant SFH would mask any increase in scatter to low stellar masses. Fitting with a two-component SFH is more appropriate for assessing any mass dependence of the intrinsic 7 the readout patterns for NIRCam https://jwstdocs.stsci.edu/display/JTI/NIRCam+Detector+Readout+Patterns and NIRSpec https://jwst-docs.stsci.edu/display/JTI/NIRSpec+ Detector+Recommended+Strategies are described in the respective jdocs pages. scatter, but this requires extra constraints. Incorporating Hα constraints in the beagle fitting can improve M constraints, but using Hα as a SFR estimate after correcting for dust with Hβ is complicated by the uncertain contribution of stellar absorption if the SFH is not smooth. Alternatively, we can incorporate Hα and Hβ fluxes into the beagle fitting, although this will still require observations to break the degeneracy between the effects of luminosity-weighted stellar age at λrest = 4861Å and dust-attenuation on the Hα/Hβ ratio. Such observations can be provided by NIR-Spec R ∼ 100 continuum-spectroscopy over the rest-frame UV. Chevallard et al. (2018) fitted two-component SFHs to simulated NIRSpec R ∼ 100 spectroscopy. A per-pixel S/N of ∼ 7 within the region of λrest = 1500Å provides un-biased estimates of Ψ (private communication). The bottom three panels in Fig 12 show the exposure times required to reach S/N∼ 5 in Hα and Hβ at R ∼ 1000 and S/N∼ 7 per pixel at R ∼ 100 in the spectral continuum. The exposure times were estimated assuming the sources are point sources and centred on the central shutter of a 1x3 array of open slitlets in the micro-shutter array (MSA). These plots indicate that the constraints required to probe the intrinsic scatter of M − Ψ down to M ∼ 8.5 − 9 could be obtained with NIRSpec within 30 hours of integration. The R100 mode on NIRSpec observes the full 0.7 − 5µm range in a single exposure, but at R1000 the full spectral range is covered with three different filters. Thus, reaching the required S/N on both Hα and Hβ requires ∼ 30 hours of integration in two different NIRSpec filters. We note that, although this is a highly idealised scenario, we are most interested in how far we can push measurements of intrinsic scatter down the mass function. At low stellar masses, we are more likely to meet the conditions of low amounts of dust and small objects that are only partially resolved with JWST.

Modelling complexity in SED fitting
Increasing modelling complexity in SED fitting introduces degeneracies that can hamper physical parameter determination. It can be advantageous to avoid these degeneracies by, for example, decoupling the estimation of Ψ from the attenuation and age estimates derived from SED-fitting. Doing so requires simplifying assumptions, as for the method used by San17 to estimate SFR. However, our results show that when searching to constrain the level of variety in a population, rather than population mean relations, this approach is flawed. We must instead include in our models the variety that we hope to constrain in the population, and then carefully take account of the uncertainties and biases that this approach will introduce. Attenuation by dust is a key uncertainty in galaxy spectral fitting. Reddy et al. (2012) find that using the Calzetti et al. (2000) attenuation curve when fitting to photometry of high redshift galaxies can lead to un-physically low ages. More physically plausible ages are obtained by using a steeper attenuation curve similar to the SMC extinction 8 8 We follow the standard nomenclature and refer by attenuation curve (Pei 1992). Recent studies constraining dust properties based on the relation between ratio of far-infrared to ultraviolet luminosities and ultraviolet spectral slope (the IRX-β relation) appear to give contradictory results, favouring either an SMC-like curve (Reddy et al. 2018) or a Calzetti-like curve (McLure et al. 2018) to describe the dust attenuation of z ∼ 2 galaxies. McLure et al. (2018) suggest that the difference can be attributed to whether galaxies are stacked by stellar mass or UV slope, with stacks according to UV slope yielding a steeper UV attenuation curve. Nevertheless, fig. 5 of Reddy et al. (2018) clearly shows how strongly inferences on the dust attenuation curve from the IRX-β relation depend on the assumed intrinsic ultraviolet slope of galaxies.
Here we propose using the CF00 dust prescription to provide a way to marginalise over the uncertainty in dust attenuation curves in a physically-consistent way. As described in Section 2, the CF00 two-component model accounts for the extra attenuation of light from young stars that still reside in their birth clouds compared to that from older stars floating in the diffuse ISM. The ISM has an attenuation curve which is modelled with a power law slope of 0.7, while the power law slope of modelled attenuation curve of the birth clouds is 1.3. A result of this prescription is an effective, galaxy-wide attenuation curve that is dependent on the distribution of stellar ages in the galaxy. This is best appreciated in Fig.13, showing that varying current-to-past star-formation rate provides effective, galaxy-wide attenuation curves that approximate both the Calzetti et al. (2000) and SMC dust curves. This plot was produced by sampling 1000 random objects from within the prior space used to fit the two-component SFHs shown in Fig. 10. Salmon et al. (2016) find a correlation between dust colour excess, E(B − V ), and slope of the attenuation curve derived from SED-fitting to broad-band photometric data while allowing the slope of the dust law to vary. This confirms the quasi-universal relation identified by Chevallard et al. (2013) between slope of the attenuation curve and V -band attenuation optical depth in the diffuse ISM, at all galaxy inclinations, from the analysis of a range of sophisticated radiative transfer models. Based on this finding, Chevallard et al. (2013) extend the two-component dust law of CF00 by making the power-law slope of the diffuse ISM depend on the V -band optical depth, in such a way that the geometry of dust with respect to stars and galaxy inclination effects are accounted for. We have not employed the Chevallard et al. (2013) model in this work as it was based on simulations of galaxies that resemble local galaxies with thin and thick stellar discs as well as bulges. These simulations are unlikely to resemble star-forming galaxies at very high redshift (z 3). However, given the results of Salmon et al. (2016), we highlight that it would be important to include these geometric effects if studying galaxies at lower redshifts z 3.
to the combined effects of absorption and scattering in and out of the line of sight to a galaxy caused by both local and global geometric effects, while the term extinction is reserved for photon absorption along and scattering out of a single line of sight ( Figure 13. The effective galaxy wide attenuation curve optical depth of 1000 galaxy templates with two-component SFHs (constant SFH with recent 10 Myr burst of star formation). The templates were drawn from the prior space used to fit the const-MF scenario (see Fig. 10, and description in section 5.3). Each line is colour-coded by Ψ 10 /Ψ 100 , where Ψ 10 is Ψ averaged over the last 10 Myr, and Ψ 100 is the Ψ averaged over the last 100 Myr. Also plotted in the thick, solid, orange line is the SMC attenuation curve as measured by Pei (1992), and the dashed, orange line shows the Calzetti et al. (2000) attenuation curve.
Another key uncertainty is the calibration between Hα and SFR. Hα is a direct tracer of the ionizing flux of a stellar population, but which depends on the demographics and properties of the population of hot stars, and in turn on the stellar initial mass function, the stellar tracks and atmospheres, stellar rotation physics and whether or not binaries are accounted for in the stellar population synthesis models. In these tests we marginalise over the variation in Hα due to stellar metallicity and log Us only.
We note that including more uncertain physics introduces a risk of over-fitting to the data and also diluting the information the data contain in the sea of uncertainties. We should be looking, therefore, to include enough physics to encompass the underlying probable variety within the population, to then find what observables are required to constrain the physical properties we are interested in (as in the approach described in Section 5.3). We should include here at least enough variation to allow us to investigate the intrinsic scatter in the main sequence, and possible dependence on stellar mass, namely: (i) Variation in past to present SFR; (ii) Variation in stellar and nebular emission due to metallicity, log Us (and ξ d if using constraints from lines other than Hydrogen); (iii) Variation in the dust attenuation curve.
Different spectral synthesis models allow one to explore different underlying physical models such as: inclusion or not of binary stars, which have been incorporated into the BPASS models (Binary Population and Spectral Synthesis; Eldridge et al. 2017;Stanway & Eldridge 2018); stellar rotation which can be explored using starburst99 (Leitherer et al. 1999), using the Geneva stellar evolutionary tracks for single rotating stars (Ekström et al. 2012;Georgy et al. 2013); treatment of hot stellar atmospheres, in particular those of hot O-and B-type stars, and Wolf-Rayet stars for which the tlusty model grid of Hubeny & Lanz (1995, for O-and B-type stars), and powr library of Hamann & Gräfener (2004, for Wolf-Rayet stars) are incorporated into the stellar plus nebular models of Gutkin, Charlot & Bruzual (2016). The differences between evolutionary stellar spectral synthesis models likely cause systematic differences in our inferences. However, the underlying physical models are not generally parameterized in such a way that they can be sampled over continuously, which would allow the associated uncertainties to be subsumed in our measurement uncertainties. The best way to encompass those uncertainties currently would be to compare main-sequence determination obtained with different model assumptions and see to what extent they provide the same qualitative evolution and intrinsic scatter estimates.

Suitability of the chosen model of the main sequence
The model adopted in this paper to describe the main sequence (equation 1) is quite simple, primarily because current data at high redshift have had limited constraining power for more complicated models. In this context, the shape of the main sequence at low redshift as measured from SDSS or from simulations of galaxy formation and evolution can provide insight into how our model may need to be improved once more complete high-redshift samples and better Ψ constraints are available. For example, at low redshift, there is evidence for a flattening of the relation (Whitaker et al. 2014), and potentially a rise in the intrinsic scatter toward high stellar masses (Guo, Zhong Zheng & Fu 2013), where feedback from the growth of a central super-massive black hole can quench star-formation (Matthee & Schaye 2019). Our current model does not account for this type of mass dependence in the intrinsic scatter, and we have only discussed the possibility of measuring and increase in scatter at the low-mass end.
To a certain extent, the inferred form of the main sequence at high mass is sensitive to how one selects starforming vs. non star-forming galaxies as well as how the main sequence is defined. For example, Renzini & Peng (2015) offer an objective definition of the main sequence as the ridge in the mass-sfr-number density plot. By this definition the relation is measured from the mode in the SFR distribution at given stellar mass, which is less sensitive to the definition of what constitutes a star-forming galaxy than the median or mean. Renzini & Peng (2015) see no flattening of the main sequence to high stellar masses when it is measured in this way. In essence our model is consistent with this definition of the main sequence. High mass sources with lower number-density that sit below the main relation will not strongly impact the derived M − Ψ properties unless the mass completeness limits themselves are very high and so only these objects are included. As such, the model we employ is not suitable for tracing the flattening of the main sequence or the mass-dependence of the scatter at high stellar mass due to AGN feedback. An extension to the model would have to allow for asymmetric scatter at the high mass end.
It is also worth noting that we have investigated here a model with intrinsic scatter only in the direction of Ψ, which is standard for studies of the main-sequence at high redshift. At low redshift, however, it is often standard to measure the scatter in specific SFR, or even perpendicular to the main-sequence. For a population with uniform distribution in the independent variable and Gaussian scatter about a linear relation in the dependent variable, the corresponding scatter perpendicular to the relation will also be Gaussian. As such, modelling the scatter in the y-direction or perpendicular to the relation are essentially equivalent to each other. However, this is no longer the case when including the number density distribution imposed by a stellar mass function. Specifically, if the intrinsic scatter is Gaussian perpendicular to the relation, the scatter will be skewed to large SFR values in the y-direction. One way to mitigate the biases introduced by an incorrect assumption of the form of the intrinsic scatter would be to allow the scatter to be non-symmetric, e.g. by introducing skew to the Gaussian scatter.

Limitations of this analysis
We have not considered so far the photometric-redshift quality in the selection criteria and ignored its influence on physical parameter uncertainties (we only allow redshift to vary over the range 4.5 < z < 5.5 in the SED fitting). As shown by Kemp et al. (2019), photometric redshift uncertainties impact stellar mass estimations significantly for faint objects or objects with significant dust attenuation. Yet, as discussed in Section 5.3 (see also Section 5.4), spectroscopic information is required to investigate the mass-dependence of intrinsic scatter in the relation, implying that all objects will have to have spectroscopic redshifts to perform this style of analysis.
All the tests presented in this paper were performed measuring M and Ψ with the 'correct' SFH (i.e. adopting the same SFH in the fitting as used to produce the spectra). We also fit with the same stellar and nebular emission models that were used to produce the spectra. We will break this criteria in future work.
Finally, we note that the nebular-emission models of Gutkin, Charlot & Bruzual (2016) assume ionizationbounded nebulae opaque to hydrogen-ionizing photons (fesc = 0 where fesc is the escape fraction of ionizing photons). Plat et al. (2019) investigate the effects of densitybounded nebulae allowing for some escape of ionizing photons. Incorporating these models into our analysis is beyond the scope of the present paper.

CONCLUSIONS
We present a Bayesian Hierarchical approach, based on the model of Kelly (2007), to self-consistently propagate the full measurement uncertainties on stellar mass and SFR derived from SED fitting to the derived population-wide main sequence parameters (intercept, α, slope, β and intrinsic scatter, σ). The Bayesian Hierarchical model takes as input the joint posterior probabilities on M and Ψ from beagle, or any other Bayesian SED-fitting code, and in turn provides self-consistent posterior probability distributions on α, β and σ. In this work we model the main sequence as a linear relation with uniform, Gaussian intrinsic scatter (see equation 1) and test the Bayesian Hierarchical approach on a set of idealised scenarios while comparing to standard methods employed in the literature. It is possible to extend the model to allow mass-dependence of the intrinsic scatter and we will explore this in future work.
With idealised scenarios for which we know the input main sequence, we show that the Bayesian Hierarchical method provides robust determinations of the main sequence parameters and their uncertainties when fitting to data using a constant SFH and when photometric constraints are good. However, the intercept and slope in particular become biased by template set degeneracies when photometric constraints are poor. Specifically, the degeneracy between age and dust attenuation causes Ψ and M estimates to be biased low and high respectively, which leads to measurements of the main sequence that are steeper and with lower intercept than the input relation. We also find that the intrinsic scatter is under-estimated when fitting to mock photometry using a delayed SFH. This is true, even when the photometric constraints are robust, because the τsfr parameter is unconstrained in the case of rising histories. The results on all main-sequence parameters are sensitive to the form of prior set on τsfr, especially when fitting to objects with rising SFHs.
Decoupling the stellar mass and SFR estimates by estimating the SFR from a dust-corrected rest-frame UV luminosity may reduce the impact of template set degeneracies on the measurements of intercept and slope, but the simplifying assumptions used lead the estimates on intrinsic scatter to be under-estimated. When investigating variation within the population, the modelling must include enough complexity to replicate the likely variation we expect to see. Specifically, correcting MUV for dust using a Meurer, Heckman & Calzetti (1999) relation will attribute variation in UV-slope due to age, present-to-past star formation activity or metallicity, to dust attenuation.
The biases on individual object M and Ψ constraints, introduced by template degeneracies, also impact which galaxies would be selected when imposing a given cut on measured M . Thus, sample selection based on expected mass completeness alone is not sufficient to prevent bias of the intrinsic scatter measurement. The degeneracy acts to bias M high and Ψ low, which in turn means that objects scattered into the selection based on measured M are either below or close to the true relation. This will bias the measured scatter to low values, which could mask any in-crease in intrinsic scatter to low masses present in the galaxy population.
In addition to the above considerations, standard SEDfitting assumptions of constant, rising or delayed SFHs are not appropriate for investigating whether the intrinsic scatter increases to low stellar masses. In particular, the underlying prior on age (and τsfr for the case of delayed histories) restricts M and Ψ estimates to an increasingly constricted region of the M − Ψ plane with increasing redshift. For this reason it is not advised to use these SFHs to provide the stellar mass estimates if aiming to investigate any potential mass dependence of the intrinsic scatter, even if they were to be used with independent SFR estimates. We propose the use of two-component star formation histories (e.g. constant SFH with a 10 Myr burst of recent star formation) teamed with Hα fluxes to provide stellar mass constraints free from the biases imposed by standard SFH priors. However, in order to obtain un-biased Ψ estimates, we find that correcting Hα for dust attenuation using measured Hβ fluxes is insufficient. This is because once the possibility of stochastic star formation is accounted for, the level of Hβ stellar absorption becomes incredibly uncertain, and this uncertainty must be marginalised over in the fitting. This may only be avoided if high S/N and high resolution spectroscopy is available so that the Hβ absorption and emission lines can be individually measured. The uncertainty in Hβ absorption means that Hα/Hβ is dependent on dust attenuation as well as the luminosity-weighted age of stars at Hβ, which is in turn dependent on present to past star formation activity. This added degeneracy can be broken by fitting to NIRSpec R100 spectra using two-component SFHs.
We find that for the filter set and corresponding depths probed with our idealised scenarios, a S/N∼ 20 in NIR-Cam F090W and F444W filters is sufficient to avoid biases introduced by the age-dust degeneracy. Based on simple point-source exposure time estimates calculated with pandeia (Pontoppidan et al. 2016), we find that NIRCam should reach S/N∼ 20 down to log(m /yr) ∼ 8 at z ∼ 5 in F090W and F444W within 30 hours of integration per filter. Hα and Hβ are required to investigate any mass-dependence of the intrinsic scatter. R1000 NIRSpec spectroscopy will reach S/N∼ 5 down to log(m /yr) ∼ 8 in Hα within 30 hours, and down to log(m /yr) ∼ 8.5 − 9 with Hβ (assuming point sources and no dust attenuation). R100 NIRSpec spectra may help to break degeneracies between dust attenuation and luminosity-weighted stellar age at Hβ introduced by allowing for SFHs to be stochastic, and can reach S/N∼ 7 per pixel at log(m /yr) ∼ 8.5 within 30 hours of integration. These results suggest it will be possible to investigate the form of the main-sequence and its intrinsic scatter to high redshifts, while avoiding template-set degeneracies and restrictive priors imposed by standard SED-fitting assumptions, by using JWST /NIRCam photometry teamed with JWST /NIRSpec. employed in the delayed-G scenario, which is namely a uniform prior on log(τsfr) between 7 < log(τsfr/yr) < 10.5. The second prior is a uniform prior on 1 τsfr (as explored in Carnall et al. (2018) investigating how well parametric SFHs can be constrained from broad-band photometry) between 10 −10.5 < 1 τsfr /yr −1 < 10 −7 . We display the emergent priors on Ψ as a function of Mtot in the top two panels of Fig. A1, with the emergent priors due to the uniform prior on log(τsfr/yr) displayed on the top left, and those due to the uniform prior on 1/τsfr on the top right panel.
We fit to the delayed-G scenario with input Mtot distribution given by Mtot ∼ N (10, 0.25) with each of these two priors. The two-dimensional joint posterior distribution for the population of objects are displayed for each prior in the middle two panels of Fig. A1, as well as the constraints on α, β and σ derived for samples with different defined cuts in Mtot (bottom two panels). The middle two panels show that where the constraints are poor (Mtot 10), the joint posterior resembles the emergent priors displayed in the top two panels. A uniform prior on 1/τsfr leads to higher posterior probability to lower Ψ than in the fits using the uniform prior on log(τsfr), and as a results σ is over-estimated, whereas it is under-estimated for the uniform prior on log(τsfr). These tests show the extent to which inferences on population-wide parameters can be affected by the choice of prior on the τsfr parameter, which clearly shows that when fitting any kind of SFH to data, we need to understand which parameters are constrained by the data, and where the priors are influencing our results when there are parameters that are not constrained by the data. Plotting emergent priors and joint posterior probability distribution functions like those shown in Fig. A1 is an effective way of investigating these effects.  Figure A1. Fitting to the delayed-G scenario for an Mtot distribution given by Mtot ∼ N (10, 0.25) with a uniform prior on log(τsfr) (left column) and with a uniform prior on 1/τsfr (right column). The top row displays the emergent prior on Ψ for each prior as a function of Mtot. The middle row displays the 'heatplot' of samples from the posterior probabilities for fits performed with each prior to mock photometry. The bottom row displays the constraints on α, β and σ derived using these fits for samples with different lower limits in Mtot.