Scalable hierarchical BayeSN inference: Investigating dependence of SN Ia host galaxy dust properties on stellar mass and redshift

We apply the hierarchical probabilistic SED model BayeSN to analyse a sample of 475 SNe Ia (0.015<z<0.4) from Foundation, DES3YR and PS1MD to investigate the properties of dust in their host galaxies. We jointly infer the dust law $R_V$ population distributions at the SED level in high- and low-mass galaxies simultaneously with dust-independent, intrinsic differences. We find an intrinsic mass step of $-0.049\pm0.016$ mag, at a significance of 3.1$\sigma$, when allowing for a constant intrinsic, achromatic magnitude offset. We additionally apply a model allowing for time- and wavelength-dependent intrinsic differences between SNe Ia in different mass bins, finding $\sim$2$\sigma$ differences in magnitude and colour around peak and 4.5$\sigma$ differences at later times. These intrinsic differences are inferred simultaneously with a difference in population mean $R_V$ of $\sim$2$\sigma$ significance, demonstrating that both intrinsic and extrinsic differences may play a role in causing the host galaxy mass step. We also consider a model which allows the mean of the $R_V$ distribution to linearly evolve with redshift but find no evidence for any evolution - we infer the gradient of this relation $\eta_R = -0.38\pm0.70$. In addition, we discuss in brief a new, GPU-accelerated Python implementation of BayeSN suitable for application to large surveys which is publicly available and can be used for future cosmological analyses; this code can be found here: https://github.com/bayesn/bayesn.


INTRODUCTION
Type Ia supernovae (SNe Ia) are bright, thermonuclear explosions of carbon-oxygen white dwarfs in a binary system (e.g., see Maguire 2017).They are excellent distance indicators, were used for the discovery of the accelerated expansion of the universe (Riess et al. 1998;Perlmutter et al. 1999), and play a central role in precisely constraining the properties of dark energy, for example its equation of state () and time dependence (  ) (Brout et al. 2022;Dark Energy Survey Collaboration et al. 2024).SNe Ia are also crucial for local measurements of the Hubble Constant ( 0 ) via the distance ladder (Riess et al. 2022), which is currently in ∼ 5 tension with the inference from the early universe (Planck Collaboration et al. 2020).
While the current best constraints on  are dominated by statistical errors (Vincenzi et al. 2024), the order-of-magnitude larger samples expected from upcoming surveys such as the Legacy Survey of Space and Time at Vera Rubin Observatory (LSST; Ivezić et al. 2019)as well as complementary low-redshifts surveys such as the Young Supernova Experiment (YSE; Jones et al. 2021;Aleo et al. 2023) and Zwicky Transient Facility (ZTF; Bellm et al. 2019a,b) -means obtaining more precise cosmological constraints will increasingly rely on better understanding of systematics.Improving the constraints on ★ Contact e-mail: mg2102@cam.ac.uk cosmological parameters from SNe Ia hinges on improved physical understanding of the nature of these events.In this paper, we present analysis of a combined sample of 475 SNe Ia using the hierarchical Bayesian spectral energy distribution (SED) model BayeSN (Mandel et al. 2022;Thorp et al. 2021), focusing on how their properties vary as a function of both host galaxy stellar mass and redshift.In addition, we present a new GPU-accelerated Python implementation of BayeSN which we make publicly available.
A key open question within the field at present is the nature of the host galaxy 'mass step', whereby supernovae in higher mass galaxies are on average brighter than those in lower mass galaxies after standardisation (Sullivan et al. 2010;Kelly et al. 2010).This effect has been consistently observed in optical samples with a mass step of ∼0.04 − 0.1 mag (e.g. Kelly et al. 2010;Sullivan et al. 2010;Lampeitl et al. 2010;Gupta et al. 2011;Childress et al. 2013;Betoule et al. 2014;Jones et al. 2018Jones et al. , 2019;;Roman et al. 2018;Smith et al. 2020;Kelsey et al. 2021), with the split between high-and lowmass hosts typically placed between 10 10 M ⊙ (e.g.Sullivan et al. 2010) and 10 10.8 M ⊙ (e.g. Kelly et al. 2010).Relations have also been demonstrated between SN luminosity and other environmental properties, including star formation rate (SFR), specific star formation rate (sSFR) and rest-frame colour (e.g.Lampeitl et al. 2010;Sullivan et al. 2010;Briday et al. 2022).Subsequent studies have demonstrated that SN luminosity often show a stronger relation with local properties, derived from the region in which each SN exploded rather than from the whole galaxy (e.g.Rigault et al. 2013Rigault et al. , 2015Rigault et al. , 2020;;Jones et al. 2015;Moreno-Raya et al. 2016a,b;Jones et al. 2018;Kim et al. 2018;Roman et al. 2018;Kim et al. 2019;Rose et al. 2019;Kelsey et al. 2021).For example, Rigault et al. (2020) found a step of 0.163±0.029mag when looking at local sSFR, compared with a mass step of 0.119±0.032mag on the same sample.Kelsey et al. (2021) found that the maximum step size increased by ∼ 40 per cent when considering the mass enclosed in an aperture around the SN position rather than the total mass of the host galaxy.
The exact physical cause of the mass step remains open for debate.It could, for example, be a result of intrinsic differences in the SN populations in high-and low-mass galaxies; a number of works have posited that this effect could result from a difference in SN progenitors in different environments (e.g.Sullivan et al. 2010;Childress et al. 2014;Kim et al. 2018;Rigault et al. 2020;Briday et al. 2022).Alternatively, the mass step could arise from extrinsic differences; Brout & Scolnic (2021) proposed that it can be explained entirely by differences in host galaxy dust properties between high-and lowmass SN hosts, requiring no difference in intrinsic SN populations.
Several works have found results consistent with the view that extrinsic effects can explain the mass step.The aforementioned study presented in Brout & Scolnic (2021) finds that a dust parameter   distribution with a mean   of 1.50 ± 0.25 for high-mass hosts and 2.75 ± 0.35 for low-mass hosts, along with a wide standard deviation of   = 1.3 ± 0.2, provides the best match to the SALT2 (Guy et al. 2007) fits of 1445 SNe Ia from the Carnegie Supernova Project I (CSP-I; Contreras et al. 2010;Stritzinger et al. 2011;Krisciunas et al. 2017), the CfA (Hicken et al. 2009(Hicken et al. , 2012)), Foundation (Foley et al. 2018;Jones et al. 2019), the Pan-STARRS-1 Medium Deep Survey (Rest et al. 2014;Scolnic et al. 2018), Supernova Legacy Survey (Astier et al. 2006;Betoule et al. 2014), SDSS-II (Frieman et al. 2008;Sako et al. 2011Sako et al. , 2018) ) and the Dark Energy Survey (DES; Dark Energy Survey Collaboration et al. 2016;Brout et al. 2019).Popovic et al. (2023) updates the methods of Brout & Scolnic (2021) to perform inference using an MCMC sampler, finding   for highand low-mass hosts of 2.138±0.25 and 3.026±0.375respectively for the Pantheon+ sample of 1701 SNe Ia (Brout et al. 2022).Wiseman et al. (2023) finds that a model with a galaxy age-varying   and no intrinsic luminosity difference replicates the observed SN population well, although does not rule out that intrinsic differences may play a role in the mass step.Meldorf et al. (2023) analysed the dust laws of 1100 SN host galaxies in DES and found a significant difference in   between high-and low-mass galaxies of ∼0.7, albeit this was based on analysis of the attenuation law of the full galaxy rather than focusing on extinction along the line-of-sight to SN positions.
In contrast, a number of other works have instead been consistent with the view that the mass step results at least in part from intrinsic differences.For example, González-Gaitán et al. (2021) finds evidence for a mass step alongside a difference in   values between high-and low-mass galaxies, and posits that this may result from a difference in intrinsic colour.In a separate analysis of attenuation laws of SN host galaxies in DES, Duarte et al. (2023) found that host galaxy dust differences cannot fully explain the mass step.Jones et al. (2023) presented a version of the SALT3 model (Kenworthy et al. 2021) which incorporates a relation between host galaxy stellar mass and the SN Ia SED, finding evidence for differences in spectroscopic and photometric properties of SNe Ia between each bin, although in the latter case the model could not discern intrinsic differences from those caused by dust.Taylor et al. (2024) applied the SALT3 model to two separate samples split based on host galaxy stellar mass, also finding some differences between the two although not discerning between intrinsic colour and dust reddening.Recent analysis of the DES-SN 5YR sample of SNe Ia from DES (Dark Energy Survey Collaboration et al. 2024) found inconsistencies between parameters inferred from simulations based on a dust-only mass step and those inferred from real data, suggesting that dust alone cannot explain the mass step (Vincenzi et al. 2024).
Additionally, if the mass step were solely a result of host galaxy dust, one would expect that no mass step would be observed in nearinfrared (NIR) wavelengths where dust -particularly the value of   -has little effect.Despite this, several works have found evidence for a mass step in NIR wavelengths.Ponder et al. (2021) analysed 143 SNe Ia in -band and found a significant step of 0.13±0.04 at a mass of 10 10.43  ⊙ ; omitting notable outliers, the most significant step is 0.08 ± 0.04 at 10 10.65  ⊙ .Uddin et al. (2020) analysed a sample of 113 SNe Ia from CSP-I, fitting them using the max_model method within SNoopy (Burns et al. 2011) across   bands.Based on a split at the median stellar mass of the sample, 10 10.65  ⊙ , they find an -band step of 0.093 ± 0.043 mag and a -band step of 0.090 ± 0.043 mag -overall, they find NIR steps of comparable significance to the optical steps and that the relation between step size and wavelength was inconsistent with a dust-driven step.Jones et al. (2022a) analysed a sample of 79 SNe, including 42 low redshift SNe (z < 0.1) from CSP-I and 37 higher redshift (0.2 < z < 0.7) SNe with rest-frame NIR, and found a mass step from NIR light curves of 0.072 ± 0.041 mag at mass of 10 10.44  ⊙ .
However, it should be noted that other NIR analyses have not found evidence for a mass step.Johansson et al. (2021) analysed a sample of ∼240 SNe Ia using the colour_model method within SNooPy, assuming a fixed host galaxy   = 2.0.This study finds NIR mass steps in  and −bands of 0.021 ± 0.033 mag and −0.020 ± 0.036 respectively, both consistent with zero, although not inconsistent with the steps found in Uddin et al. (2020) considering the uncertainties.They also find the mass step to be consistent with zero across optical and NIR bands when allowing   to be a free parameter for each SN.
Most recently, Uddin et al. (2023) analysed a sample of 325 SNe from CSP-I and CSP-II (Phillips et al. 2019), including the sample of 113 SNe analysed in Uddin et al. (2020).Using a more recent version of the max_model method within SNooPy, they find no significant mass step in any optical or NIR band, with the possible exception of -band which shows a step of −0.151 ± 0.069 mag.They also demonstrate that their inferred correlations between host mass and luminosity do not vary with wavelength.Both of these findings are consistent whether using only the CSP-I sample from Uddin et al. (2020) or the combined CSP-I and CSP-II sample.Additionally, Karchev et al. (2023a) applied Bayesian model comparison using simulation-based inference (SBI) and found results disfavouring both intrinsic and extrinsic differences.
The debate between a mass step driven by extrinsic effects and intrinsic variation in the SN population closely relates to a general challenge in SNe Ia -disentangling the intrinsic distribution from the effects of host galaxy dust extinction.It is challenging, for example, to tell apart an intrinsically red SN from an intrinsically typical SN which happened to explode in a dusty environment.SALT (Guy et al. 2007;Kenworthy et al. 2021) is the most widely used model for SN Ia analyses, however a key limitation is that it compresses all colour information into a single  parameter, which roughly corresponds to peak  − apparent colour.As demonstrated in Mandel et al. (2017), with this approach care must be taken to avoid confounding intrinsic variation with dust.While some SALT-based analyses do break down the  parameter into separate intrinsic and dust components (e.g.Mandel et al. 2017;Brout & Scolnic 2021;Popovic et al. 2023), SALT is trained with a single colour law describing the effect that the apparent colour parameter  has on the underlying SN SED, meaning that it cannot accommodate the possibility of different dust laws when fitting observed light curves.
Unlike SALT, SNooPy does include an explicit treatment of dust extinction.However, in addition to host galaxy dust it is also important to incorporate the intrinsic colour distribution of SNe, including variations beyond those correlated with light curve shape.Excluding this effect can bias estimates of   (Mandel et al. 2017) and lead to overestimates of the colour variation caused by an   distribution (Thorp & Mandel 2022).
The challenge of disentangling dust from intrinsic SN variation has led to the development of hierarchical Bayesian models for analysing samples of SNe Ia.Hierarchical modelling was first applied to this end in Mandel et al. (2009Mandel et al. ( , 2011) ) and again in Burns et al. (2014); the former method has since been developed further into the hierarchical SED model BayeSN (Thorp et al. 2021;Mandel et al. 2022).The advantage of a hierarchical approach is that it allows for joint inference of population level and individual SN parameters.Specific to this problem, it allows for explicit, separate treatments of the intrinsic and host galaxy dust distributions of SNe Ia, with the populationlevel parameters inferred while marginalising over all individual SN (latent) parameters.Hierarchical modelling also provides more precise inference of the width of a population, when compared with simply taking the standard deviation of a number of individual estimates; the latter approach will lead to overestimates as it fails to take into account errors on the individual estimates (see discussion in Section 5 of Thorp & Mandel 2022).Wojtak et al. (2023) presents a recent analysis of SNe Ia using hierarchical Bayesian modelling, finding evidence for two separate populations although not splitting the population directly based on host galaxy stellar mass.
Both Thorp et al. (2021) and Thorp & Mandel (2022) apply BayeSN to study the dust distributions of SNe in high-and lowmass hosts and find no evidence for a difference in   distributions between different environments.Notably, Thorp & Mandel (2022) finds evidence for a NIR mass step even when allowing for different dust distributions between high-and low-mass hosts.
In this work, we use a new GPU-accelerated implementation of BayeSN which has increased performance by factors of ∼50-100, making large-scale hierarchical analyses of SNe Ia more than feasible; we make this code publicly available.We build on previous analysis in Thorp et al. (2021) in applying BayeSN to optical data to study the host galaxy dust properties of SNe Ia.We apply the model trained in Thorp et al. (2021) to higher redshift SNe from the Dark Energy Survey three-year sample (DES3YR) and the Pan-STARRS Medium Deep survey (PS1MD), increasing both the sample size and redshift range covered.We investigate the host galaxy dust distributions in both high-and low-mass hosts in tandem with dust-independent intrinsic differences, allowing us to comment on the debate between intrinsic and extrinsic causes.We also perform the first hierarchical Bayesian analysis with a redshift-dependent   distribution to test whether there is any evidence that SN host galaxy dust properties evolve with redshift.
In Section 2, we detail the SN samples used in this work and describe redshift cuts applied to mitigate for selection effects.In Section 3, we summarise the BayeSN model and our new GPUaccelerated implementation, as well as detailing the specific models and analysis variants used within this work.We then describe our results, looking separately at a single dust population (Section 4.1), multiple dust populations split by host galaxy stellar mass (Section 4.2) and a redshift-dependent dust distribution (Section 4.3), before concluding in Section 5.

DATA
For this work we analyse three separate spectroscopically-confirmed optical samples of SNe Ia observed in  bands: the Foundation DR1 sample (Foley et al. 2018), the Dark Energy Survey (DES; Dark Energy Survey Collaboration et al. 2016) 3-year sample (DES3YR; Brout et al. 2019;Smith et al. 2020) and the sample from the Pan-STARRS Medium Deep survey (PS1MD; Kaiser et al. 2010) as compiled in the Pantheon sample (Scolnic et al. 2018).For all three samples, we use photometric data taken from the Pantheon+ compilation (Scolnic et al. 2022), which corrects for cross-calibration systematics between surveys.We include SNe from PS1MD present in the Pantheon compilation, rather than the Pantheon+ compilation, to increase our sample size as there were a few objects which were removed in the latter compilation that are suitable for our analysis.
All of the SNe in this analysis pass standard cosmology cuts (e.g.Scolnic et al. 2018Scolnic et al. , 2022) ) and have global host galaxy masses measured from SED fits to  and  photometry for Foundation (Jones et al. 2019) and PS1MD (Scolnic et al. 2018), and from DES  photometry for DES3YR (Brout et al. 2019;Smith et al. 2020;Wiseman et al. 2020).
Within our analysis, we take care to mitigate for selection effects such as Malmquist bias (Malmquist 1922).Foundation DR1 is a local sample; we apply the same selection cuts as used in Thorp et al. (2021) for a total of 157 SNe Ia which span a redshift range of 0.015 < z < 0.08, where the sample has minimal impact from Malmquist bias (Foley et al. 2018, see further discussion in Appendix A).In contrast, DES3YR and PS1MD are higher redshift meaning that they are more susceptible to impact from selection effects.DES3YR comprises 207 SNe Ia over a redshift range of 0.077 < z < 0.85, while PS1MD comprises 279 SNe Ia over a range of 0.026 < z < 0.63.To mitigate for this, in this work we take volume limited subsets of the full samples -we apply a cut at the point where the redshift distribution begins to decrease, indicating that a large number of SNe have been missed due to selection effects.For DES3YR, we apply a redshift cut at 0.4 to give a sample of 119 SNe Ia, while for PS1MD we apply a cut at 0.35 for a final sample of 199 SNe Ia.When combined, these samples include a total of 475 SNe Ia. Figure 1 shows the redshift distributions of the three samples used in this work, with dashed lines indicating the upper redshift cuts applied to DES3YR and PS1MD.For discussion of these redshift cuts, please see Appendix A. In brief, we investigate how the total -band extinction varies with redshift and find no evidence that these volume limited sub-samples are significantly impacted by selection effects, although cannot rule out that some small effects may remain.
It should be noted that in this work we utilise the version of the BayeSN model presented in Thorp et al. (2021), which is defined down to a rest-frame wavelength of 3500 Å, while the redshift range spanned by DES3YR and PS1MD means that rest-frame -band often covers wavelengths below this lower limit.As such, for the majority of SNe in these two samples we disregard -band data.

METHOD
For this work, we utilise the hierarchical Bayesian SN Ia SED model BayeSN (Thorp et al. 2021;Mandel et al. 2022).Hierarchical Bayes provides an ideal framework for joint inference of populations and the individual objects which comprise them, for example looking in tandem at individual SN host galaxies and the overall population.In this section we detail the model used for our analysis and discuss the different variants used.

Numpyro Implementation of BayeSN
Compared with previous BayeSN analyses (e.g.Mandel et al. 2022;Thorp et al. 2021;Thorp & Mandel 2022;Ward et al. 2023b;Dhawan et al. 2023) which implemented the model in Stan (Carpenter et al. 2017;Stan Development Team 2023), we use a version implemented using the probabilistic programming Python library numpyro (Phan et al. 2019;Bingham et al. 2019).All computations in numpyro are carried out using jax (Bradbury et al. 2018), meaning that this implementation of BayeSN supports just-in-time (JIT) compilation and GPUs, and similarly to the Stan version is fully differentiable with autodiff.By combining vectorised likelihood evaluation across the SN population with the use of GPU acceleration, the speed of the inference has increased by factors of ∼50-100, enabling hierarchical analyses of much larger SN samples.The numpyro implementation of the BayeSN model can be found here: https://github.com/bayesn/bayesn.
BayeSN uses the numpyro implementation of the No U-Turn Sampler (NUTS; Hoffman & Gelman 2014), an adaptive variant of Hamiltonian Monte Carlo (HMC; Neal 2011; Betancourt 2017), to sample from target posterior distributions.To give an example of typical performance, for the hierarchical SED-and populationlevel dust models presented in this work HMC typically converges and generates sufficient effective sample sizes in ∼15-30 minutes; these values are quoted for 4 chains run in parallel across 4 NVIDIA A100 GPUs, using resources from the Cambridge Service for Data Driven Discovery (CSD3).This runtime allows for quick exploration of different model variants.
While this work focuses on the physical properties of the SN Ia population and does not touch on cosmological analysis (as discussed in Section 3.2.2our results are conditioned on a fixed cosmology), the BayeSN code we make available can also be used to infer distance.A model is first 'trained' by inferring the population-level parameters of the model (as in Mandel et al. 2022;Thorp et al. 2021;Ward et al. 2023b).These population-level parameters are then fixed and the model can be applied to infer a cosmology-independent distance modulus   simultaneously with all the other latent SN parameters, conditioned on those fixed population parameters.Using this approach, distance is jointly inferred with SN latent parameters rather than being estimated via the Tripp formula (Tripp 1998) -a post-hoc linear relation with those SN parameters -as in SALT-based analyses.The advantage of this method is that it allows the full SN light curve to be used when estimating distance rather than compressing it to a single colour at peak, and also provides a distance estimate separately marginalised over the intrinsic and extrinsic variation observed in the population.Work is underway to integrate our code within the SN analysis framework SNANA (Kessler et al. 2009), and future cosmological analyses will be able to use BayeSN-derived distances.

The BayeSN Model
BayeSN is an SED model for SNe Ia, with the full time-and wavelength-varying SED given by, where  is the rest-frame phase relative to B-band maximum and   is rest-frame wavelength. 0 (,   ) is the optical-NIR SN Ia SED template of Hsiao et al. (2007) and is fixed a priori as a zeroth-order template, along with the normalisation constant  0 which is set to −19.5.All parameters denoted with  are latent SN parameters, with unique values for each SN , while all other parameters are global hyperparameters shared across the population.The different components which make up the model are described as follows: •  0 (,   ) is a function that warps and normalises the zerothorder SED template to create a mean intrinsic SED for the population, while  1 (,   ) is a functional principal component (FPC) that describes the first mode of intrinsic SED variation for SNe Ia.Both of these are implemented as cubic spline surfaces.
•   1 is a coefficient quantifying the effect of the  1 FPC for each SN.This is modelled as a normal distribution, with   1 ∼  (0, 1) 1 .Combined,  1 (,   ) and   1 capture the 'broader-brighter' relation observed in SNe Ia where intrinsically brighter light curves evolve over longer timescales around peak (Phillips 1993).The parameter   1 can therefore be thought of as analogous to a stretch parameter as present in models such as SALT and SNooPy.
•   is an achromatic, time-independent magnitude offset for each SN, drawn from a normal distribution with   ∼  (0, 2 0 ). 0 is a hyperparameter describing the width of this distribution and is inferred when the model is trained.
•   (,   ) is a time-and wavelength-dependent function that describes the time-varying residual intrinsic colour variations in the SED not captured by   1  1 (,   ).This parameter is represented by a cubic spline function over time and wavelength defined by a matrix of knots E  .These are drawn from a multivariate Gaussian e  ∼  (0,   ), where e  is the vectorisation of the E  matrix.The covariance matrix   is a hyperparameter of the model which describes the distribution of residual scatter across the population of SNe Ia and is inferred during training.
•    and  () describe the host galaxy extinction law for each SN;    is the total amount of V-band extinction, while parameterises the Fitzpatrick (1999) dust extinction law assumed by the model, here denoted by    ;  ()

𝑉 . 𝑅 (𝑠)
can be treated as either a shared parameter across the population or a unique value for each SN (see Section 3.2.3 for discussion of different assumed    distributions).For    , we assume an exponential prior described by a scale parameter (the population mean)   ; that is, for   ≥ 0, and zero otherwise.
The rest-frame, host galaxy dust-extinguished SED model   (,   ) is then scaled based on distance modulus   , redshifted and corrected for Milky Way dust extinction using a Fitzpatrick (1999) dust law with   = 3.1 and dust maps from Schlafly & Finkbeiner (2011).This produces an observer-frame SED, which can be integrated through photometric filters to produce model photometry that can in turn be compared with observed photometry to compute a likelihood.This hierarchical model is trained by inferring all global parameters and latent parameters for a population of SNe Ia (see Mandel et al. (2022) and Thorp et al. (2021) for full discussion of model training).The latent parameters are marginalised over and posterior estimates are obtained for the global parameters.
Previous SALT-based analyses which aimed to disentangle the intrinsic colour of SNe Ia from dust reddening (e.g.Mandel et al. 2017;Brout & Scolnic 2021;Popovic et al. 2021) have separated out the typical linear relationship between peak  −  colour and peak magnitude, characterised by a slope , into an intrinsic colour relation with slope  int and dust relation with slope   .Within the BayeSN model, there is a degree of colour dependence on  1 i.e. the stretch-like parameter.The dependence of the SED on the component of colour which is independent of both dust and stretch is parameterised by   , which as discussed above is drawn from a multivariate Gaussian e  ∼  (0,   ).Within our framework, we do not compress this to a simple relation between peak magnitude and intrinsic colour; we find that much of the residual intrinsic variation of SNe Ia SEDs does not correlate with − colour at peak.However, we emphasise that a relation between peak magnitude and intrinsic colour, independent of light curve stretch, is present in the model. 1 Throughout this paper, we use the notation  ∼  ( ,  2 ) where  is the mean and  2 is the variance.

Hierarchical Dust Model
The focus of this analysis is to study the host galaxy dust distributions of the sample.In this work, we modify the BayeSN model discussed previously by fixing all global parameters not related to host galaxy dust extinction ( 0 ,  1 ,   ).These parameters are fixed to the values for the T21 model presented in Thorp et al. (2021), which was trained on the sample of 157 SNe from Foundation DR1 also included in this work.
The hyperparameters inferred in our hierarchical model are the mean and standard deviation of the host galaxy   population distribution (  and   ), the population mean extinction (  ), which defines the scale of the exponential prior on   , and the intrinsic achromatic SN scatter ( 0 ).We place a uniform hyperprior on   of   ∼  (1.2, 6).For   we use a half-normal hyperprior with a scale factor of 2 [  ∼ Half- (0, 2 2 )], following Thorp et al. (2021) 2 .For   and  0 , we adopt half-Cauchy priors with scale parameters of 1.0 mag and 0.1 mag respectively to reflect the expected scales of these parameters, as in Mandel et al. (2022).

Conditioning on Distance
There are two approaches that can be taken with regard to distance in this type of hierarchical model, as discussed in Thorp & Mandel (2022).The first is to condition on distance based on redshifts and an assumed cosmology, while the second is to keep photometric distance as a free parameter and marginalise over the distance to each SN   when inferring dust hyperparameters.The latter can be advantageous as it provides a cosmology-independent dust estimate, but effectively means that dust properties are inferred using colour information alone rather than colour and magnitude information.For an optical plus NIR analysis such as Thorp & Mandel (2022), this is sufficient as optical-NIR colours provide additional constraints on the dust law.However, for an optical-only analysis it is necessary to condition on redshift-based distances to obtain reasonable dust constraints.
In this work, we condition on the redshift-distance relation such that the external constraint on   is, where   is the redshift of each SN,  ΛCDM () is the distance modulus of redshift  assuming a flat ΛCDM cosmology with Ω  = 0.28 and  0 = 73.24km s −1 Mpc −1 and   ext is the uncertainty in  ΛCDM (  ).This is based on propagating the uncertainty in the spectroscopic redshift   through to an uncertainty in .The redshift uncertainty is given by the individual uncertainty    added in quadrature with a peculiar velocity dispersion  pec .In this work, we assume  pec = 150 km s −1 (Carrick et al. 2015) 3 .All posteriors presented in this work are conditioned on this fixed cosmology.

Impact of Truncated Dust 𝑅 𝑉 Population Distribution
As discussed in Section 1, a number of previous studies have applied a population distribution to model host galaxy   for supernovae.One approach would be to model the population as a normal distribution, However, previous works have typically used a truncated normal distribution to ensure that   cannot go to unphysically low values.For example, Brout & Scolnic (2021) used a distribution truncated at 0.5 at the lower end.A more physically-motivated lower bound on the value of   is 1.2, based on the Rayleigh scattering limit (Draine 2003).We adopt this value for this analysis, modelling the host galaxy   distribution as where   (,  2 , , ) denotes a truncated normal distribution with  and  2 representing the mean and variance of a normal distribution, and  and  representing the upper and lower bounds of truncation applied to that normal distribution.This is the   distribution we use throughout our analysis.However, it is important to note that using a truncated distribution will impact inference in two ways: (i) Depending on the values of   and   , it is important to consider them as fitting parameters rather than directly as the population mean and standard deviation, which we will refer to hereafter as E[  ] and √︁ Var[  ].Using Eq. 4 with   = 1.2 will give a half-normal distribution; E[  ] will be significantly different from   .Considering a different case, for example where   = 3.0 and   = 0.2, the truncation can also have minimal effect -  and   will be practically identical to (ii) A truncated   distribution will impact inference of   .In general, unphysically low values of   will lead to poor quality light curve fits since they do not represent realistic extinction laws.Using a normal distribution without truncation rules out larger values of   to ensure that these low   values are not included in the distribution.With a truncated   distribution, these larger   values are not ruled out since low   values are already excluded -a large   will only impact the upper end of the distribution.Overall, the use of a truncated distribution can lead to higher inferred values of   , compared with an untruncated distribution.
The extent to which both of these effects will impact the analysis depends on the values of   and   .If the true   distribution is far above the lower limit, the impact will be negligible.If the true distribution overlaps significantly with this lower bound, the truncation will have a significant impact.

Assessing Quality of MCMC Chains
We use a number of standard diagnostics to assess the quality of our MCMC chains.We initialise 4 independent chains at different points of parameter space and run them independently, and verify that the chains have mixed and converged using the R (Gelman-Rubin) statistic as well as assessing the effective sample size (Gelman & Rubin 1992;Vehtari et al. 2021).In addition, we check that our chains do not contain any divergent transitions (e.g.Betancourt et al. 2014;Betancourt 2017).

Analysis Variations
Throughout this analysis, we perform a number of variations of our dust inference model, which are detailed as follows.

Single Dust Population
We first consider that all SNe are drawn from the same host galaxy dust distribution, with the same priors on    and    used across all SNe.For this approach, we consider each of the Foundation, DES3YR and PS1MD subsamples separately as well as considering one combined sample including all SNe.

Binned Populations
Motivated by ongoing debate regarding the cause of the mass step, we also consider a variation of the model where the dust population hyperparameters are binned based on global host galaxy mass.For example, using this binned approach and assuming a truncated normal distribution, the host galaxy   distributions become where   * is the host stellar mass of each SN  and  split is some reference stellar mass at which the split point is located.Such a split is also applied to give separate   distributions for high-and lowmass galaxies described by different   parameters.In this work, we set  split at 10 10  ⊙ .
In addition, we include parameters which represent a possible intrinsic difference between the populations of SNe Ia in each mass bin.This is done so that the model is flexible enough to allow either a mass step driven by differences in dust properties, some other intrinsic effect, or some combination thereof.We consider three separate forms for this intrinsic mass step, described as follows: (i) A model including mass step parameters Δ 0,HM and Δ 0,LM which act as a constant achromatic shift in magnitude for each bin applied to the mean of the   distributions, such that (ii) A model including mass step parameters Δ 0,HM and Δ 0,LM , which allow for a time-and wavelength-dependent difference in the baseline intrinsic ( 1 = 0) SED, independent of stretch, between SNe in different environments.Throughout our analysis, we use the term baseline to refer to the intrinsic SED with  1 = 0, since for a population which has a non-zero mean value of  this does not necessarily correspond to the population mean.In this model, where  0 describes the baseline intrinsic SED as in Equation 1,  T21 0 represents the  0 matrix inferred in Thorp et al. (2021) across SNe Ia in both high-and low-mass galaxies and Δ 0,HM and Δ 0,LM encode a shift in the baseline intrinsic SED in each environment. 0 and Δ 0,HM/LM are matrices defining a 2D cubic spline surface in wavelength and time, which warp the Hsiao template to match the baseline intrinsic colours of the training sample.The matrices are of shape 6 × 6 as detailed in Thorp et al. (2021); there are 6 knots in phase every 10 rest-frame days between -10 and +40 days relative to B-band maximum, and 6 knots in wavelength with one at the effective wavelength of each of  bands and an additional 2 knots at the high and low end acting as an anchor at the edge of the wavelength coverage of the model.Δ 0,HM/LM therefore each consist of 36 parameters.This more general model in principle also allows for the previous case of a constant, achromatic magnitude shift if that is what the data supports.
(iii) A model which does not include any parameters representing an intrinsic mass step and includes splits only on dust parameters.
The first case allows for the possibility of an intrinsic, achromatic mass step between SNe in high-and low-mass galaxies, as explored in Thorp et al. (2021); Thorp & Mandel (2022).However, this form carries with it the assumption that the baseline intrinsic colour of SNe Ia is the same between high-and low-mass environments.While this is a possibility, the most general and flexible model we explore is one that allows for the baseline intrinsic SED of SNe Ia to vary between high-and low-mass galaxies, which is the second case we explore.It is important to jointly consider intrinsic colour along with the dust distribution since dust extinction is inferred with respect to intrinsic properties.
In the first case, for Δ 0,HM and Δ 0,LM we assume a uniform hyperprior in the range [−0.2, 0.2] as in Thorp et al. (2021) to safely capture the full range of mass steps observed in previous studies.The total achromatic mass step Δ 0 is then given by Δ 0,HM −Δ 0,LM .This must be parameterised as a separate parameter for each mass bin to avoid arbitrarily setting one bin to the mean absolute magnitude of the T21 BayeSN model.
In the second case, we assume that the shift in  0 is a small variation around the overall population baseline intrinsic SED; for all of the elements that compose the Δ 0,HM/LM matrix, we use the hyperprior Δ 0,HM/LM ∼  (0, 0.1 × I). ( where I is the identity matrix.The factor of 0.1 represents our expectation that this effect is small 4 .In the third case, we only include splits on dust parameters between different environments with no parameters representing an intrinsic mass step, effectively enforcing a mass step which is explained by dust properties.This is the approach used by Brout & Scolnic (2021); Popovic et al. (2021) and we include this case for comparison with these works and to examine the effect omitting these parameters has on dust inference.

Redshift Evolution
The samples we consider in this analysis range in redshift from 0.015 up to 0.4, factoring in the redshift cuts applied to mitigate for selection effects.The range spanned means that these SNe provide an opportunity to test whether there is any evidence that the SN Ia host galaxy dust distribution varies with redshift.Our approach of using volume-limited samples, based on redshift cuts, to test for redshift evolution is similar to that of Nicolas et al. (2021).We apply a simple linear model for   , adapting Eq. 4 by setting 4 Relaxing this expectation and allowing a wider prior does not impact our conclusions.
where  is redshift,   is the gradient of the relation between redshift and   , and  ,0 is the value of   at  = 0. Within the inference, the prior on a SN at redshift   is then    ∼   (  (  ),  2  , 1.2, ∞).We choose a linear model for this relation as the gradient parameter   provides a simple way to assess whether the data provides evidence for evolution with redshift i.e. that   is non-zero.We also apply a similar linear relation with redshift to   , with the value of   at  = 0 denoted by  ,0 and the gradient of the relation given by   .For   , we are often only able to obtain upper limits and judge that we would not be able to constrain a   redshift relation.We elect to keep   as a fixed population parameter and only allow   and   to vary with redshift.
We use a joint prior on  ,0 and   .For  ,0 we use the same prior as applied for   previously.For   , we apply a uniform prior, This enforces the condition that   is restricted to the range [1.2, 6] below  = 1, regardless of  ,0 , to rule out unphysical redshift evolution in the   distribution.For  ,0 , we use the same hyperprior as for   in Section 3.2.1.For   , we place a wide uninformative hyperprior such that   ∼  (−0.5, 0.5) mag, chosen to prevent the posterior distribution from approaching the prior bounds.

RESULTS AND DISCUSSION
In this section we present and discuss our results for each of our model variations.To demonstrate typical light curve fits obtained using BayeSN, Figure 2 shows examples of a fit to a SN from each of the three surveys which make up our combined sample.

Population Sub-sample Dust Properties
We begin by considering each of the DES, Foundation and PS1MD samples separately, as well as a combined sample containing all SNe across the three samples.The full results are shown in Table 1.Individually, we obtain   values of 2.66 ± 0.13, 3.14 ± 0.59 and 1.69 ± 0.33 for Foundation, DES3YR and PS1MD respectively.The differences seen here perfectly highlight the importance of considering the population mean E[  ] as well as just   when using a truncated distribution.The inferred value of   = 1.69 ± 0.33 for PS1MD is close to the lower truncation bound of 1.2, and therefore a difference between E[  ] and   is expected.When we calculate E[  ] for these samples, we instead infer 2.66 ± 0.13, 3.41 ± 0.57 and 2.44 ± 0.29 respectively for Foundation, DES3YR and PS1MD, showing PS1MD to be much more consistent with the others.While there is a numerical difference between DES3YR and PS1MD close to 1, considering the uncertainties all three of these samples are statistically consistent with each other and also with the inferred E[  ] value of the combined population, 2.58 ± 0.14.
The effect of the truncated distribution is also demonstrated in Figure 3 mag for Foundation, DES3YR and PS1MD respectively.The higher value for Foundation is intriguing; although we have applied redshift cuts to both the DES3YR and PS1MD samples to mitigate for selection effects, it is possible that the sample is lacking higher   objects at higher redshift which is reducing the inferred   values for these samples.Nevertheless, all three individual samples are consistent with the combined sample value of 0.16 ± 0.01 mag.Finally, considering  0 we obtain values of 0.10 ± 0.01, 0.11 ± 0.01 and 0.12 ± 0.01 mag for Foundation, DES3YR and PS1MD respectively, and a combined sample value of 0.11 ± 0.01 mag.
It should be noted that our inferred parameters for the SN Ia host galaxy   population distribution are consistent with those from previous hierarchical Bayesian analyses which incorporated both optical and NIR data.Previous analysis using BayeSN in Thorp & Mandel (2022) of 75 nearby SNe from CSP-I (Krisciunas et al. 2017) found   = 2.59 ± 0.14 and   = 0.62 ± 0.16, while a light curve model-independent analysis of dust reddening based on peak optical and NIR apparent colours of 65 low-redshift SNe Ia presented in Ward et al. (2023a) found   = 2.61 +0.38  −0.35 and placed 68th (95th) percentile upper limits of   < 0.92(1.96).

Dust Properties Split by Host Mass
We next consider the dust properties binned based on global host galaxy mass.As discussed in Section 3.3.2,we consider three separate variants of this model: an achromatic intrinsic mass step, a difference in baseline intrinsic SED between SNe in high-and lowmass galaxies, and a less flexible model which only incorporates differences in dust properties.The results obtained from these models are summarised in Table 2, but discussed here in more detail.

Intrinsic Achromatic Mass Step
We first consider dust properties under the assumption of an achromatic mass step, with a wavelength-independent intrinsic magnitude offset between SNe in high-and low-mass host galaxies.In this case, we find evidence in favour of an intrinsic offset, inferring Δ 0 = −0.049±0.016at a significance of 3.1.Moreover, using this model we find no evidence for a difference in   as a function of host galaxy stellar mass; for E[  ] we infer 2.51±0.16and 2.74±0.35respectively across high-and low-mass host galaxies, with ΔE[  ] = 0.32 ± 0.58, consistent with zero.
In addition to the   distributions and Δ 0 , we consider   and  0 separately for high-and low-mass hosts.  and Δ 0 have similar effects in that increasing both corresponds to a fainter observed SN sample, however the key difference is that Δ 0 is achromatic while increasing   will also cause the sample to appear redder.We find evidence that   is higher for high-mass hosts than low-mass hosts with a difference of 0.068±0.018mag at a significance of ∼3.7, indicating that there is on average more dust along the line of sight to SNe Ia in high-mass galaxies than low-mass galaxies.This result is consistent with expectations from analysis of galaxy attenuation (e.g.Salim et al. 2018;Nagaraj et al. 2022;Alsing et al. 2024), although it should be noted that we do not necessarily expect consistent findings when considering line-of-sight SN extinction as opposed to galaxy attenuation.
It is noticeable that the posterior distributions for the parameters relating to   are wider for the low-mass bin than for the high-mass bin.In part, this can be simply understood as a consequence of there being more SNe in the high-mass bin.However, we can see from the inferred   values that SNe in the high-mass bin have more dust along the line-of-sight on average and therefore provide greater constraint on the properties of the dust.
Finally, concerning  0 we infer a slightly lower value for highmass hosts, 0.10 ± 0.01 mag as opposed to 0.12 ± 0.01 mag with a difference of -0.031±0.012mag at a significance of ∼2.6.

Intrinsic SED Difference
We next consider a more flexible variation of the model, with the baseline ( 1 = 0) intrinsic SED able to vary between high-and lowmass galaxies.As mentioned previously, any reference to the baseline intrinsic properties refers to the case with  1 = 0, i.e. independent of light curve stretch.
In this case, the parameters comprising Δ 0 are less directly interpretable in a physical sense compared with a specific mass step term Δ 0 .However,  0 corresponds to the baseline intrinsic SED of the SN population; for each posterior sample of Δ 0,HM/LM we can integrate the SED through a given set of passbands in order to calculate intrinsic baseline light and colour curves.In this analysis, we consider derived posterior distributions on mean light and colour curves for SNe in high-and low-mass galaxies separately.We do this comparison by setting   1 =    =   (, ) = 0 5 to assess the baseline dust-and stretch-independent properties of SNe Ia in different Table 1.Inferred parameter values when applying our hierarchical dust model to each sub-sample separately, as well as for the combined sample, assuming a truncated normal   distribution.E[  ] and √︁ Var[  ] respectively denote the population mean and square-root of the population variance, as opposed to the fitting parameters   and   .Values quoted as  ±  represent the mean and standard deviation of the posterior, while those quoted as X (Y) denote the 68th (95th) percentiles for parameters where only an upper limit could be constrained.environments, after post-processing our MCMC chains to ensure a consistent definition of  1 between high-and low-mass galaxies as discussed in Appendix D.

Sample
Figures 5 and 6 respectively show the posterior distributions of baseline dust-and stretch-independent  light curves and −,  − and  −  colour curves for SNe in high-and low-mass galaxies, along with the differences between the two.Figure 7, meanwhile, shows joint and marginal posterior distributions of E[  ], √︁ Var[  ],   and  0 as well as derived distributions of baseline intrinsic peak band absolute magnitude and  −  colour for each mass bin (Figure C2 in Appendix C shows the posteriors for the differences in inferred parameter values between each mass bin. Compared with the Δ 0 case presented in Section 4.2.1, we are now analysing intrinsic differences as a function of both wavelength and time.Regarding an intrinsic mass step, in the conventional Tripp formula (Tripp 1998) approach for estimating distances to SNe Ia we expect that an intrinsic mass step corresponds to a difference in peak magnitude in some reference band.For this model we will define Δ 0 =  int peak,g,HM −  int peak,g,LM and infer Δ 0 = −0.049± 0.027 mag.
Compared with the achromatic mass step model, we observe a similar magnitude offset but at a lower significance of 1.8.Of course, in isolation a 1.8 offset is not significant but it is unsurprising that our uncertainties increase when switching to a more flexible model.
Beyond a simple mass step parameter, we should consider how the difference in baseline light and colour curves varies with both time and wavelength.We can consider the intrinsic magnitude difference at peak in different bands, inferring Δ int  ,peak = −0.027± 0.022, Δ int ,peak = −0.032± 0.019 and Δ int ,peak = −0.016± 0.030.The relatively large uncertainties for these values makes it challenging to comment on the wavelength dependence of any magnitude offset at peak.The most significant difference in magnitude is at 20 days postpeak in -band, where Δ int ,=20 = −0.099± 0.022 at a significance of 4.5.Concerning colour, at peak we see a difference in baseline intrinsic  −  colour between high-and low-mass of −0.022 ± 0.010 at a significance of 2.2, weak evidence that SNe Ia in high-mass galaxies are bluer around peak than those in low-mass galaxies.However, we can also consider other bands and phases; 20 days postpeak, the difference in baseline intrinsic  −  colour is 0.067 ± 0.015 at a significance of 4.5.Overall, our results do indicate statistically significant intrinsic differences between SNe Ia in high-and lowmass host galaxies.
It is interesting to note that our results suggest that intrinsic colour differences between SNe Ia in each mass bin are larger at later times and longer wavelengths, most notably around the second peak in band.Other works have suggested that the effect of dust can itself we verify that the posterior samples of  (,  ) for both SNe in high-and lowmass galaxies separately do not shift the population mean intrinsic colours.vary in time, perhaps because of circumstellar dust or nearby dust clouds (e.g.Förster et al. 2013;Bulla et al. 2018a,b).Compared with these analyses, we assume that dust properties are constant with time and allow for spectrotemporal perturbations around the mean intrinsic SED.The possibility that differences in intrinsic colour between different environments vary in size with time would have significant implications for studies considering time-varying dust, and should be taken into consideration in future work.
In terms of host galaxy dust properties, for the population mean of the   distribution for high-mass bin we infer E[  ] = 2.26 ± 0.14 while for the low-mass bin we infer E[  ] = 3.36 ± 0.51; the difference between the values is ΔE[  ] = −1.10 ± 0.53 at a significance of 2.1.As in Section 4.2.1, we see a difference in   between SNe Ia in high-and low-mass hosts, although this difference is reduced to -0.047±0.024mag at a significance of just under 2.Regarding  0 , compared with the achromatic mass step model we infer the same value for the high mass bin of 0.10 ± 0.01 mag and a slightly lower value for low-mass galaxies of 0.12 ± 0.01 mag, with the difference between high-and low-mass reduced to −0.022±0.012mag.
Beyond considering the posterior distributions of the parameters, we can also compare SNe in high-and low-mass hosts by summarising their physical properties.In Figure 8, we present the mean and standard error on the mean in Hubble residual binned as a function of rest-frame peak  −  apparent colour (derived from the posterior samples of the latent SN parameters).These values were obtained by fitting the SN sample with the BayeSN model with the values of   ,   ,   ,  0 , and  0 fixed to the medians of the posterior samples for high-and low-mass hosts separately -for these fits, no external distance constraint based on redshift was included.Based on this result, we see no discernible trend between Hubble residual and apparent colour nor a difference in this trend between each bin, although the lack of redder SNe in the low-mass bin makes this comparison challenging.

Dust-only Difference
The final case we consider is a model that does not include either of the intrinsic differences assessed previously (or, equivalently, forces these differences to be equal to zero).Under the strong assumption that the SN Ia population is intrinsically identical between highand low-mass galaxies, for the host galaxy   distribution we infer E[  ] of 2.39±0.13 and 3.14±0.39for high-and low-mass galaxies respectively, with a difference ΔE[  ] = −0.74± 0.41.As for the other models which incorporate intrinsic differences, in this case we still infer a difference in   between the different mass bins; 0.18 ± 0.01 mag for high-mass and 0.13 ± 0.01 mag for low-mass, with Δ  = 0.047 ± 0.017 mag.There is also a difference in inferred   0 between the two bins, with Δ 0 = −0.029± 0.012 mag in this case.

Intrinsic or Extrinsic?
The aim of this binned population analysis is to investigate whether the mass step is driven by intrinsic differences, differences in the host galaxy dust properties or some combination of those two effects.However, before considering the relative contribution of each of these effects we start with a simpler question: are there intrinsic differences between SNe Ia on either side of the mass step?
We have considered two separate model variations which jointly infer intrinsic properties in each bin simultaneously with host galaxy dust properties.The important thing to emphasise is that both of these models are flexible and allow for the possibility of a mass step driven purely by dust; nevertheless, when we condition our models on observed data the inferred parameter values support the existence of non-zero intrinsic differences.When we allow for an intrinsic  achromatic magnitude offset between each mass bin, we infer a mass step Δ 0 = −0.049±0.016mag at a significance of 3.1.In the more flexible case of a difference in baseline intrinsic SED between mass bins, we infer a similar magnitude offset in -band of −0.049 ± 0.027 mag although at a lower significance of 1.8, unsurprising given the increased complexity of the model.When looking at intrinsic SED differences we can additionally examine differences as a function of both wavelength and time.Our analysis shows a difference in intrinsic peak  −  colour between mass bins of 2.2 significance, providing weak evidence that SNe Ia in high-mass galaxies are intrinsically bluer than their low-mass counterparts in these bands.Looking to other wavelengths and phases, we also find much more significant differences.For example, around the time of the second maximum 20 days post-peak we find that there are differences in baseline intrinsic -band magnitude and  −  colour of 4.5 significance.Overall, our results do support the existence of intrinsic differences.Brout & Scolnic (2021); Popovic et al. (2023) posit that the mass step is driven solely by differences in dust properties between highand low-mass galaxies.Our analysis, however, does find significant intrinsic differences and therefore contradicts this idea.Our results also differ from those of Karchev et al. (2023a), which disfavours both intrinsic and extrinsic differences between SNe Ia in different mass bins.We note that the main difference between our approach and that of Karchev et al. (2023a) is that we have also treated the population mean -band extinction (  ) as a separate parameter for each mass bin.We next consider what our results can tell us about the relative contributions of intrinsic and extrinsic effects in driving the mass step.
As part of our analysis, we do consider a model which assumes  no intrinsic differences between mass bins and only allows the dust properties to vary.We find a lower population mean   of E[  ] = 2.39±0.13for SNe Ia in high-mass galaxies compared with low-mass galaxies, for which we infer E[  ] = 3.14 ± 0.39.This is similar to the trend found in Brout & Scolnic (2021); Popovic et al. (2023).In this case, it is not surprising that we infer differences in the host galaxy dust properties; by definition, differences in dust properties were the only effect included in the model which were allowed to explain the mass step.
When we increase the flexibility of our model to jointly infer an achromatic magnitude offset between mass bins alongside dust properties, our results no longer provide any evidence to support a difference in   distribution; for high-and low-mass bins respectively, we infer 2.51±0.16and 2.74±0.35for E[  ].When we allow for a magnitude offset, our results support the idea of a mass step driven solely by intrinsic differences.However, while this model is more flexible than the previous case it still carries the assumption that the baseline intrinsic colours of SNe Ia in each mass bin are identical, only allowing for a magnitude shift.For the final case, which allows for both time and wavelengthdependent differences in the baseline intrinsic SED between each mass bin, we infer E[  ] values of 2.26±0.14 and 3.36±0.51for high-and low-mass bins respectively.The difference between these two values is ΔE[  ] = −1.10 ± 0.53 at a significance of 2.1.Our interpretation of these results is somewhat limited by the weak constraint we are able to place on E[  ] for the low-mass bin, given the lower overall extinction of SNe Ia in this bin.Considering Hubble residuals and apparent peak  −  colours, we see no evidence for a relation between the two nor any difference in this relation between each mass bin.However, this analysis is again limited by the lack of redder SNe in the low-mass bin.Our results do not provide strong evidence in favour of a mass step driven in part by extrinsic differences, but nor can we rule it out.González-Gaitán et al. (2021) speculates that the mass step may in part result from a difference in intrinsic colour between SNe Ia in high-and low-mass bins.The Tripp formula used to estimate distance in SALT-based analyses involves a linear correction based on  −  colour at peak, .González-Gaitán et al. ( 2021) therefore conjecture that a difference in intrinsic colour  int between each mass bin would lead to an overall magnitude offset ∼  int  int , leading to an observed mass step.We again emphasise that the BayeSN model does contain a stretch-independent colour term within  (, ), which aims to capture the intrinsic variation across the SED as a function of both time and wavelength.Nevertheless, if we think of the intrinsic differences we see in terms of a linear relation this would imply  int, ∼ 2.2, where  denotes the fact that this is based off -band magnitude and  −  colour rather than  − .
The flexible model we have used, equitably treating differences in the intrinsic baseline SED of SNe Ia in each mass bin alongside differences in dust properties, has a number of advantages.However, the complexity of the effects described by the model and their relative interplay makes this a challenging problem to solve.We observe intrinsic differences in peak -band magnitude and  −  colour with significances of 1.8-2.2(albeit with intrinsic differences of 4.5 at other phases), as well as a difference in ΔE[  ] with 2.1 significance.The combined sample of 475 SNe Ia used in this work has provided evidence in favour of intrinsic differences between SNe Ia in different mass bins, but further analysis of larger samples is necessary to allow for more confident conclusions about the relative contributions of intrinsic and extrinsic effects in explaining the mass step.
We have not considered different metrics for model comparison, for example using Bayes factors or the Bayesian Information Criterion, as part of this analysis.These metrics can be challenging to calculate for hierarchical models.However, we emphasise that our more complex models always allow for the possibility of the simpler cases, if that is what the data supports.For example, if the data were consistent with a mass step driven solely by differences in dust properties, we would expect to infer differences relating only to dust, rather than any intrinsic effect, using our intrinsic SED difference model.In this work we have considered a variety of models with increasing flexibility, rather than comparing a number of distinct models.The approach of considering multiple separate effects within the same model is referred to as 'continuous model expansion' (Gelman & Shalizi 2013).
Within this work, we have focused on SN Ia properties split based on their host galaxy stellar mass.However, as mentioned in Section 1 previous work has established relations between SN Ia luminosity and other galaxy properties besides stellar mass, for example SFR, sSFR and rest-frame colour.These relations are also stronger when considering the properties of the local environment in which the SN exploded rather than global properties of the entire host galaxy.The BayeSN model we have used for this work can be modified to incorporate population splits on other parameters, and future studies can analyse intrinsic and extrinsic differences as a function of any host property using our approach.

Potential Cause of Intrinsic Differences
Both Jones et al. (2023) and Taylor et al. (2024) previously examined the mass step using SALT; the former incorporated an SED surface to capture the variation in the apparent SN SED as a function of host galaxy stellar mass, while the latter applied the standard SALT model to separate training samples of SNe in high-and low-mass hosts.These works found opposite conclusions, albeit with different methodologies and samples; Jones et al. (2023) found that the SEDs of SNe Ia in high-mass galaxies were bluer than those in low-mass galaxies, while Taylor et al. (2024) found the inverse.However, neither of these works aims to disentangle intrinsic colour from dust reddening, making a direct comparison with our results difficult.We find that SNe Ia in high-mass bins are on average bluer than their low-mass counterparts, but also that they typically have more dust Table 2. Inferred parameter values for our hierarchical dust model with separate population parameters for SNe in host galaxies above and below 10 10  ⊙ for the combined sample of 475 SNe Ia.We present results for cases allowing for an intrinsic magnitude difference between bins, an intrinsic SED difference and no intrinsic difference.In some cases, Δ values are presented which correspond to the difference between posterior samples for SNe in high-mass hosts and low-mass hosts.along the line-of-sight as demonstrated by the differences in inferred   values.

Method
Given that our results have indicated intrinsic differences between SNe Ia in high-and low-mass galaxies, it is important to consider what could cause such differences.It has previously been suggested that the luminosity distribution of SNe Ia may be explained in part by differences in metallicities of the progenitor stars affecting the mass of 56 Ni produced (e.g.Timmes et al. 2003).Kasen et al. (2009, see Fig. 4) predicts that metallicity is expected to influence both the luminosity and decline rate of SNe Ia; for an otherwise unchanged progenitor star, an increase in metallicity will decrease the production of 56 Ni, causing a fainter SN which declines more quickly.While higher metallicity will lead to an intrinsically fainter SN population overall, the relative impact of metallicity on both luminosity and decline rate means that we expect SNe Ia with higher progenitor metallicities to be intrinsically brighter, for a given value of stretch.
Our results indicate that SNe Ia in high-mass galaxies are intrinsically brighter than those in low-mass galaxies, independently of light curve stretch.High-mass galaxies will also on average have higher metallicities (e.g.Tremonti et al. 2004).Our findings are therefore consistent with the suggestion from Kasen et al. (2009) that SNe Ia in higher metallicity environments will be brighter than those of equivalent light curve stretch in lower metallicity environments.
Additionally, previous analysis of the ejecta velocities of SNe Ia around peak in the Si II 6355 line have provided evidence in favour of two separate populations of SNe Ia (Wang et al. 2009(Wang et al. , 2013)), consisting of 'normal-velocity' SNe Ia and 'high-velocity' SNe Ia.While we cannot comment on ejecta velocity in this work, given that the implementation of BayeSN is applied only to photometry6 , we can consider if the intrinsic differences we see might relate to this idea of normal-velocity and high-velocity SNe Ia.Some previous studies have indicated that ejecta velocities of SNe Ia correlate with apparent and intrinsic SN colour (e.g.Foley & Kasen 2011;Foley et al. 2011;Foley 2012;Mandel et al. 2014, although see Dettman et al. 2021 for a counterexample based on the Foundation survey); specifically, these works suggest that high-velocity SNe are intrinsically redder than their normal-velocity counterparts.Additionally, Siebert et al. (2020) finds that composite spectra of SNe Ia with negative Hubble residuals -those that are brighter -have higher ejecta velocities than those constructed from SNe Ia with positive Hubble residuals at phases around peak.Further analysis has found that these two SN Ia populations demonstrate an environmental dependence; while normal-velocity SNe occur across a full range of hosts, high-velocity SNe occur specifically in high-mass, high-metallicity environments (Wang et al. 2013;Pan et al. 2015;Pan 2020).Wang et al. (2013) postulated that high-velocity SNe Ia are associated with young progenitor stars, though Pan (2020) found that these objects were not associated with particularly young environments and concluded that metallicity was the main driver behind these objects.On this basis, one might expect to observe that SNe Ia in the high-mass bin are on average intrinsically redder than those in the low-mass bin.However, our analysis finds the opposite -SNe Ia in the high-mass bin seem to be intrinsically bluer.It is important to note that our findings do not contradict previous results regarding the environmental dependence and nature of high-velocity SNe Ia.High-velocity SNe only comprise a subset of the population, and it may well be the case that there is an environmental dependence in the population of normal-velocity SNe Ia which acts to counter the colour difference expected from high-velocity SNe Ia.However, this does mean that the observed association of high-velocity SNe Ia with high-mass galaxies cannot explain the trends found in our analysis.One proposed explanation for the mass step is of two different populations of SNe Ia originating from different progenitor systems.These are typically thought of being divided into young/prompt SNe Ia resulting from single degenerate systems comprising a white dwarf and massive companion star, and old/delayed SNe Ia resulting from double degenerate systems comprising two white dwarfs (e.g.Mannucci et al. 2006;Rigault et al. 2020;Nicolas et al. 2021).In this case, the mass step arises as a result of differences in the ages of the progenitor stars between each mass bin; more massive host galaxies are associated in general with older progenitor stars.The size of any step therefore increases with significance when considering environmental properties that more closely depend on progenitor age (e.g. −  colour, local sSFR; Rigault et al. 2020;Kelsey et al. 2021).
The intrinsic differences we find in this work may be the result of two different populations of SNe Ia originating from progenitor systems of different ages.However, the physical nature of SN Ia explosions and how they relate to their progenitors is not yet established enough to allow us to comment on whether our results regarding intrinsic differences in luminosity and colour are consistent with a split between young and old progenitor systems.3.
For the gradient of the   redshift relation, we infer   = −0.22 ± 0.06 mag, non-zero at ∼3.7 significance.A challenge in interpreting this result is that Malmquist bias would be expected to lead to smaller inferred values of   at higher redshift.While we have mitigated for this by applying selection effects to the DES3YR and PS1MD samples, some small effect may remain -overall, we caution that this result should be interpreted as an evolution in the effective   in the samples we are looking at with redshift, rather than necessarily a physical evolution with redshift in the distribution of   .Nevertheless, it is important that   is free to evolve with redshift in the model in order to fairly assess any evolution with redshift of   .
Considering the   distribution itself, our results do not provide any evidence that it evolves with redshift across the Foundation, DES3YR and PS1MD samples -we find the gradient of the  redshift relation to be   = −0.38 ± 0.70, consistent with zero.While we have performed this analysis with   also free to evolve with redshift, it should be noted that our conclusions are unchanged when we fix   to be constant with redshift.However, we emphasise that the samples included in this work are all below  = 0.4 and that further analyses with higher redshift objects are required to investigate the possibility of redshift evolution further.

CONCLUSIONS AND FUTURE WORK
In this work we apply BayeSN, a hierarchical Bayesian SED model for SNe Ia, to infer the global dust distributions for samples from Foundation, DES3YR and PS1MD.We have investigated the possibility that the host galaxy mass step can be explained solely by differences in the dust population between high-and low-mass galaxies, and also perform the first hierarchical analysis allowing for a redshift evolution in the dust population.Our findings can be summarised as follows: • For a combined sample of 475 SNe Ia from Foundation, DES3YR and PS1MD (with redshift cuts applied to mitigate for selection effects), we infer the population mean of the line-of-sight   distribution to be E[  ] = 2.58 ± 0.14, the square root of the   population variance to be √︁ Var[  ] = 0.59 ± 0.20, the population mean   to be   = 0.16 ± 0.01 mag and the intrinsic dispersion to be  0 = 0.11 ± 0.01 mag.Considering each sub-sample separately, we infer values consistent with these.
• When jointly inferring differences in intrinsic properties simultaneously with differences in host galaxy properties, we find evidence for intrinsic differences between SNe in host galaxies with stellar masses above and below 10 10 M ⊙ .Allowing for a constant, achromatic magnitude offset between each mass bin, we infer an intrinsic mass step of -0.049±0.016mag and no evidence for any difference in the   distributions.However, we also apply a more flexible model which allows for time-and wavelength-dependent differences in the intrinsic SED between the two populations.In this case, we infer differences in peak stretch and dust-independent intrinsic -band magnitude and  −  colour of -0.049±0.027and -0.022±0.010,and more significant differences in intrinsic -band magnitude and  −  colour 20 days after peak of -0.099±0.022and 0.067±0.015.For this model we also infer a difference in the population mean of the   distribution for each mass bin of ΔE[  ] = −1.10 ± 0.53.These results suggest that extrinsic effects, in addition to intrinsic effects, may contribute to the host galaxy mass step.Future analyses of this type on larger samples will allow for better constraints of each of these effects to better understand their relative contributions.
• Across all model variants we infer consistently larger line-ofsight population mean   (  ) values for SNe Ia in high-mass galaxies than for SNe Ia in low-mass galaxies.When allowing for a constant intrinsic magnitude offset between each mass bin we infer a difference of Δ  = 0.068 ± 0.018 mag, although this is reduced to Δ  = 0.047 ± 0.024 mag when allowing the baseline, stretchindependent intrinsic SED to vary with both time and wavelength between each mass bin.
• We apply a model which allows   and   to vary with redshift, modelling these relations as a straight line and constraining both the value at  = 0 and the gradient of the relation with redshift (  and   ).We find no evidence that   evolves with redshift over the range covered by the samples used for this work, inferring   = −0.38 ± 0.70, consistent with 0. Concerning   , we infer   = −0.22 ± 0.06 mag, however we emphasise that some small selection effects may remain in spite of our redshift cuts and that this result should be interpreted as an evolution in the effective   in these samples rather than necessarily suggesting a physical evolution with redshift in the distribution of   .Future studies involving higher redshift objects should investigate this further.
Our flexible BayeSN, SED-based approach to studying intrinsic and extrinsic properties of SNe Ia provides an ideal framework for studying how the population varies with different host galaxy properties.As discussed in Section 3.1, this analysis was performed using a new, GPU-accelerated implementation of BayeSN; this code has improved performance by a factor of ∼100, making it suitable for application to large SN samples, and is made publicly available.Further performance increases (up to an additional factor of ∼10) are possible through the use of Variational Inference (VI; Blei et al. 2016), which is planned for inclusion within BayeSN (Uzsoy 2022).
Alternatively, it is also possible to use the BayeSN model in an SBI framework.Karchev et al. (2024) presents dust inference using BayeSN with truncated marginal neural ratio estimation (TMNRE; Miller et al. 2021 for the method itself, Karchev et al. 2023b for a specific application to SN cosmology), an SBI approach which uses neural networks to derive posterior distributions where an analytic likelihood is not possible.Such an approach has the potential to fully incorporate survey selection effects -which can be simulated but not expressed analytically -within the hierarchical framework rather than requiring a redshift cut as we have used for this analysis.
In this work, we have focused on physical properties of the population of SNe Ia, conditioned on a fixed cosmology.However, the BayeSN code we make available can also be used to infer cosmologyindependent distances whilst marginalising over the intrinsic and extrinsic variations in the population, and is being integrated within SNANA.Additionally, the SBI approach to dust inference using BayeSN presented in Karchev et al. (2024) provides a path towards a fully hierarchical cosmological analysis of SNe Ia all the way from light curves to cosmological parameters, taking survey selection effects into account.Moving forward, BayeSN can therefore be a key component of cosmological analyses.

APPENDIX A: ASSESSING COMPLETENESS
To mitigate for the impact of selection effects such as Malmquist bias on our analysis, we have applied upper redshift cuts to the PS1MD and DES3YR samples at  < 0.35 and  < 0.4 respectively.Considering our model, selection effects are most likely to impact inference of the population mean   ,   -at higher redshifts, higher extinction objects with large   values are less likely to be observed as they will be fainter in the observer-frame.Without mitigating for selection effects, lower values of   will be inferred.
To assess our redshift cuts, we re-run our hierarchical model on the Foundation, DES3YR and PS1MD samples with lower redshift cuts to examine the impact on the results.The results of this analysis are shown in Table A1.For the DES3YR sample, lowering the upper redshift limit from 0.4 to 0.35 or 0.3 does not impact the inferred value of   .This provides reassurance that with our chosen redshift limit of 0.4, the sample is not significantly impacted by selection effects.Our findings are similar for the PS1MD sample; a redshift cut at 0.3 does not impact the inferred   value, and while an even lower cut at 0.25 does increase the inferred value by ∼0.01 mag this difference is not statistically significant.Regarding Foundation, this sample focused on targets at  < 0.08 with minimal impact from Malmquist bias (Foley et al. 2018).Nevertheless, we do consider lower redshift cuts to see how this affects   .We find that   does increase with lower redshift cuts, however these changes are only of ∼ 1 significance.We ultimately opt to use the same Foundation sample as in Thorp et al. (2021) for consistency and because the sample was gathered with minimising Malmquist bias in mind.In addition, we did explore repeating the analysis presented in this paper with lower redshift cuts for the Foundation sample, but found that it did not impact our conclusions.
To further verify that our redshift cuts are reasonable, we also consider the inferred   values when fitting these samples with the BayeSN model trained in Thorp et al. (2021) as a function of redshift.These results are shown in Figure A1.For each redshift bin, we combine all posterior samples on   across all SNe in the bin, calculate the mean and standard error on the mean for each posterior sample, and then calculate the mean across all posterior samples for these two statistics; these are the plotted values and uncertainties (with the exception of the lowest redshift bin for DES3YR, denoted by a star, in which there is only one SN and hence the plotted error bar corresponds to the posterior standard deviation on that single value.).It should be noted that many of these posteriors will only be upper limits meaning that a point estimate may not be the most appropriate summary statistic, but the mean in each bin remains useful as a first order tool to examine how   varies with redshift.Considering the DES3YR sample, shown in the middle panel of Figure A1, we do not see evidence for a clear decrease in   with redshift below  = 0.4; as mentioned previously, the single, higher   bin denoted by a star at the lowest redshift corresponds to a single object.For the PS1MD sample, shown in lower panel of Figure A1,   remains consistent with redshift between 0.15 <  < 0.35.While some of the lowest redshift bins do have higher   values they also contain fewer SNe.
Overall, we are confident that our samples are not significantly impacted by selection effects, although we cannot rule out the possibility that some small effect may remain.

APPENDIX E: RELATING THE 15 DAY DECLINE IN MAG TO BAYESN'S SHAPE PARAMETER
In this appendix we will derive an expression for the derivative of Δ ,15 =   ( = 15 d) −   ( = 0 d), with respect to BayeSN's light curve shape parameter  1 .Although we will write down this expression for the -band, it can be applied to any passband.
The -band apparent magnitude of a supernova at a rest frame phase  is given by   () = −2.5 log 10 [   ()] +  (E1) where  is the zero-point, and   () is the BayeSN model flux (see Mandel et al. 2022 eq. 4 and 6).The decline in rest-frame -band − 3 .This paper has been typeset from a T E X/L A T E X file prepared by the author.

Figure 1 .
Figure 1.Redshift distributions for the 3 samples used in this work; Foundation, DES3YR and PS1MD.Dashed lines for DES3YR and PS1MD indicate the upper redshift limit of the volume limited sub-samples included in our analysis.

Figure 2 .
Figure 2.Example BayeSN light curve fits to SN2016W from the Foundation sample, 1303883 from the DES3YR sample and 440008 from the PS1MD sample.Bands have been offset arbitrarily for clarity.Note that while the data presented is in magnitude space, all fitting is performed in flux space.Light curves are calculated for all posterior samples of SN latent parameters, and lines and shaded regions represent the mean and standard deviation of all these light curves.
Figure 4 shows joint and marginal posterior distributions of E[  ], √︁ Var[  ],   , Δ 0 and  0 for each mass bin (Figure C1 in Appendix C shows the posteriors for the differences in inferred parameter values between each mass bin).

Figure 3 .
Figure 3. Corner plot showing joint and marginal posteriors on   ,   ,   and  0 as well as E[  ] and √︁ Var[  ] for the PS1MD sample.This demonstrates the importance of considering E[  ] and √︁ Var[  ] rather than just   and   when using truncated distributions.

Figure 4 .
Figure 4. Corner plot showing joint and marginal posteriors on   ,   ,   , Δ 0 and  0 for high-and low-mass hosts for 475 SNe across Foundation, DES3YR and PS1MD, with the split between high-and low-mass hosts at 10 10 M ⊙ .These posteriors are for a model which allows for a constant intrinsic magnitude offset between each mass bin.

Figure 5 .
Figure 5. Left panels: Posterior inference of the baseline dust-and stretch-independent intrinsic absolute SN Ia light curve in each of the  bands, evaluated from the posterior samples of  0,HM/LM .Solid lines and shaded regions represent the posterior mean and standard deviation (uncertainty) of the baseline light curve in each mass bin and band.Right panels: Posterior inference of the difference between baseline light curves depicted in left panels.Solid lines and shaded regions again represent the posterior mean and standard deviation (uncertainty) of the difference.

Figure 6 .
Figure 6.Left panels: Posterior inference of the baseline dust-and stretch-independent intrinsic SN Ia colour curve for  − ,  −  and  −  colours, evaluated from the posterior samples of  0,HM/LM .Solid lines and shaded regions represent the posterior mean and standard deviation (uncertainty) of the baseline colour curve in each mass bin and band.Right panels: Posterior inference of the difference between baseline intrinsic colour curves depicted in left panels.Solid lines and shaded regions again represent the posterior mean and standard deviation (uncertainty) of the difference.

Figure 7 .
Figure 7. Corner plot showing joint and marginal posteriors on E[  ], √︁ Var[  ],   and  0 , as well as derived baseline intrinsic peak -band absolute magnitude and  −  colour, for high-and low-mass hosts for 475 SNe across Foundation, DES3YR and PS1MD.This model allows for a difference in baseline intrinsic SED between each mass bin.

Figure 8 .
Figure8.Upper panel: Distributions of rest-frame apparent  −  colour at peak in both high and low-mass bins.Lower panel: Binned Hubble residual mean and standard error on the mean obtained from fitting SNe in high-and low-mass hosts with the BayeSN model, based on the population parameters inferred when treating these samples separately, binned as a function of restframe apparent  −  colour at peak.Bins are from -0.25 to 0.25 with widths of 0.05 -note that the binning used was identical for high-and low-mass hosts and the bins are offset slightly for clarity.Please note that we have restricted the y-axis range for presentation purposes; there are two data points that lie outside this range which are for bins containing single objects that have large Hubble residuals.

Figure C1 .
Figure C1.Joint and marginal posterior distributions for the difference between inferred parameter values of   ,   ,   , Δ 0 and  0 in high-and low-mass bins.These are for the model which allows for an intrinsic magnitude offset between each mass bin in addition to different host galaxy dust distributions.

Figure C2 .B∫
Figure C2.Joint and marginal posterior distributions for the difference between inferred parameter values of E[  ],√︁ Var[  ],   and  0 , as well as derived baseline intrinsic peak -band absolute magnitude and  −  colour, in high-and low-mass bins.These are for the model which allows for a difference in baseline intrinsic SED between each mass bin in addition to different host galaxy dust distributions.
E[  ] and √︁ Var[  ].The relations between   ,   , E[  ] and √︁ Var[  ] are detailed in Appendix B. We use these relations to calculate posterior distributions on the population E[  ] and √︁ Var[  ] from samples of   and   along our MCMC chains.

Table 3 .
Salim et al. 2018;Nagaraj et al. 2022;Alsing et al. 2024) which allows   and   to vary with redshift.We next consider the possibility of a dust distribution which varies as a function of redshift, using the linear model as outlined in Section 3.3.3.Previous work considering redshift evolution of SNe Ia host galaxy dust properties is scarce.Nordin et al. (2008)considered how such an evolution might lead to systematics in inferred cosmology but did not try to infer how these properties might evolve.Recently,Thorp et al. (2024)used the BayeSN model to compare dust properties between a low-redshift sample of CSP SNe Ia and a higher redshift sample from RAISIN (SN IA in the IR;Jones et al. 2022b), and constrained the size of the shift in   between the two to be −1.16<Δ< 1.38 at 95 per cent posterior probability.Although there is literature concerning relations between galaxy dust properties and other galaxy properties such as SFR and sSFR which are known to evolve with redshift (e.g.Salim et al. 2018;Nagaraj et al. 2022;Alsing et al. 2024), we note that these are based on galaxy attenuation and will not necessarily be the same when considering SN line-of-sight extinction.As such, we have no strong prior expectation on what we expect from this analysis.Our results are shown in Table

Table A1 .
Inferred   values using our hierarchical dust model for the Foundation, DES3YR and PS1MD samples with different upper redshift cuts.