A Hierarchical Bayesian SED Model for Type Ia Supernovae in the Optical to Near-Infrared

While conventional Type Ia supernova (SN Ia) cosmology analyses rely primarily on rest-frame optical light curves to determine distances, SNe Ia are excellent standard candles in near-infrared (NIR) light, which is significantly less sensitive to dust extinction. A SN Ia spectral energy distribution (SED) model capable of fitting rest-frame NIR observations is necessary to fully leverage current and future SN Ia datasets from ground- and space-based telescopes including HST, LSST, JWST, and RST. We construct a hierarchical Bayesian model for SN Ia SEDs, continuous over time and wavelength, from the optical to NIR ($B$ through $H$, or $0.35 -1.8\, \mu$m). We model the SED as a combination of physically-distinct host galaxy dust and intrinsic spectral components. The distribution of intrinsic SEDs over time and wavelength is modelled with probabilistic functional principal components and the covariance of residual functions. We train the model on a nearby sample of 79 SNe Ia with joint optical and NIR light curves by sampling the global posterior distribution over dust and intrinsic latent variables, SED components, and population hyperparameters. The photometric distances of SNe Ia with NIR data near maximum light obtain a total RMS error of 0.10 mag with our BayeSN model, compared to 0.14 mag with SNooPy and SALT2 for the same sample. Jointly fitting the optical and NIR data of the full sample for a global host dust law, we find $R_V = 2.9 \pm 0.2$, consistent with the Milky Way average.


INTRODUCTION
Type Ia supernovae (SNe Ia) are effective cosmological probes as "standardiseable candles": their peak luminosities can be inferred from their optical light curve shapes and colours, so their distances can be estimated from their apparent brightnesses.Precise and accurate SN Ia distances with small systematic errors are essential to accurate constraints on the cosmic expansion history, including local measurements of the Hubble constant (Burns et al. 2018;Riess et al. 2019), the late-time cosmic acceleration (Riess et al. 1998;Perlmutter et al. 1999), and the properties of the dark energy driving it, in particular, its equation-of-state parameter w (e.g.Scolnic et al. 2018;Abbott et al. 2019).Currently, E-mail: kmandel@ast.cam.ac.uk there is a significant 4.4σ tension between the value of H0 locally inferred from SNe Ia and the local distance ladder (74.03±1.42km s −1 Mpc −1 ; Riess et al. 2019) and the value derived from Planck CMB analysis and the ΛCDM cosmological model (67.4 ± 0.5 km s −1 Mpc −1 ; Planck Collaboration et al. 2018).Since this tension could potentially be a sign of new physics, it is imperative to test for systematic errors with empirical cross-checks (e.g.Dhawan et al. 2018).With increasing sample sizes, nearby SNe Ia will be able to constrain the growth of structure as probes of the peculiar velocity field (e.g.Huterer et al. 2017;Howlett et al. 2017;Graziani et al. 2020).Further cosmological constraints can also be derived from strong or weak lensing of SNe Ia (e.g.Goldstein et al. 2018;Dhawan et al. 2020;Macaulay et al. 2020).In this paper, we present a new data-driven statistical model, BayeSN, for SNe Ia light curves to extract more pre-cise and accurate SN Ia distances from current and future surveys by exploiting the advantageous properties of SNe Ia in the near-infrared (NIR).
The current global sample used for cosmology, derived from the SDSS-II, SNLS, Pan-STARRS (PS1), low-z and HST surveys, has grown to over a thousand SNe Ia (Pantheon; Scolnic et al. 2018).Future surveys, such as the Legacy Survey of Space and Time (LSST, Ivezić et al. 2019) with the Vera Rubin Observatory, will boost that number by orders of magnitude.The constraints on dark energy with the current sample are already limited, not by statistical uncertainties from the numbers of SNe, but by systematic errors.In recent analyses, photometric calibration and SN model uncertainties dominate the systematic error budget (PS1: Scolnic et al. 2014bScolnic et al. , 2018;;DES: Brout et al. 2019).The calibration systematics are now being tamed by better cross-calibration between surveys (Scolnic et al. 2015), better networks of photometric standards (Narayan et al. 2016(Narayan et al. , 2019)), and by ongoing efforts to replace the heterogenous low-redshift sample with a large, unbiased, homogenous sample obtained on a precisely-calibrated photometric system (PS1, Foundation Survey; Foley et al. 2018b;Jones et al. 2019).LSST will increase the cosmologically useful SN Ia sample to ∼ 10 5 over its 10 year duration.It will further diminish cross-survey calibration systematics by replacing previous high-redshift SN Ia surveys with a single, homogeneous, and large SN Ia sample taken on a single system.However, systematic errors due to the statistical models and methods used to analyse SN Ia light curve data will remain.
Observations probing the rest-frame near-infrared (NIR, particularly λ 1 µm, e.g.Y JH-bands) are a route to more precise and accurate distances.NIR observations of SNe Ia significantly improve their cosmological utility.Unlike in the optical, where they must be standardised via correlations of optical luminosity with light-curve shape and colour, SNe Ia are excellent, nearly-standard candles in the NIR, showing little intrinsic luminosity variation (∼ 0.1 mag) at peak (e.g.Krisciunas et al. 2004c;Wood-Vasey et al. 2008;Mandel et al. 2009;Contreras et al. 2010;Kattner et al. 2012;Phillips 2012;Barone-Nugent et al. 2012;Stanishev et al. 2018;Burns et al. 2018;Avelino et al. 2019).The NIR also has significantly reduced sensitivity to dust extinction relative to the optical (by factors of 4 − 8, comparing NIR Y JH to optical B).Dhawan et al. (2018) showed how a small set of SNe Ia, used as NIR standard candles to measure H0, can replace a much larger optical sample, while still providing a 4.3% measurement (consistent with Riess et al. 2019), without any light-curve shape or colour corrections as are required in the optical.We recently compiled a sample of 89 nearby SNe Ia with optical and NIR light curves passing standard quality cuts (Avelino et al. 2019).Using 56 SNe Ia with NIR data near peak brightness, where the luminosity dispersion is minimal, we found a 35% reduction in Hubble Diagram scatter (i.e. more precise distances) when using SNe Ia as NIR standard candles, relative to conventional optical-only fits to the same SNe.
The combination of optical and NIR data better constrains the host galaxy dust extinction and the shape of the dust law as a function of λ (parametrized by RV ) (Krisciunas et al. 2007;Burns et al. 2014), and significantly improves the accuracy and precision of SN Ia distances (Mandel et al. 2011).The nature of the dust in SN Ia host galaxies is fun-damental to the largest "correction" in standardising SNe Ia, that due to colour.Incorrect modelling interpretation of the SN Ia colour-magnitude relation is therefore a major source of systematic error in SN distances.However, the correct values(s) of the RV parametrizing the dust extinction law has long been a matter of confusion, and its proper estimation is fraught with statistical subtleties.
Very early analyses that found unphysically low values RV 1 (Branch & Tammann 1992) did not account for correlations between the luminosity, colour and light curve shape (later modelled by e.g.Phillips 1993;Riess et al. 1996a;Phillips et al. 1999).Riess et al. (1996b) noted that confusing intrinsic colour-luminosity variation with dust effects would lead to mistakenly lower estimated RV values.Simple linear regression analyses of SN (extinguished) absolute magnitudes against (apparent) colours and light curve shapes have led to apparent colour-magnitude slopes (e.g.β in the Tripp formula) that have sometimes been interpreted as low dust RV values (Tripp 1998;Tripp & Branch 1999;Guy et al. 2005;Astier et al. 2006;Conley et al. 2007;Freedman et al. 2009).Scolnic et al. (2014a) highlighted the relevance of colour dispersion to estimating a Milky Way dustlike colour-magnitude slope.Mandel et al. (2017) showed that statistical confounding of the intrinsic color-luminosity correlation and dispersion with the extrinsic effects of dust leads to estimates of β that are biased low relative to the true dust RV , and a probabilistic generative model with explicit parameters for these physically-distinct effects led to a Bayesian estimate of RV = 2.8 ± 0.3, consistent with the Milky Way average.
Anomalously low RV ≈ 1.5 − 1.8 values have been estimated for a few very highly reddened SNe Ia (E(B −V ) > 1) (e.g.Elias-Rosa et al. 2006, 2008;Wang et al. 2008;Amanullah et al. 2014).While the origin of these low RV estimates is still under investigation (Wang 2005;Goobar 2008;Amanullah & Goobar 2011;Phillips et al. 2013;Amanullah et al. 2015;Johansson et al. 2017;Bulla et al. 2018a,b), these very red SNe are not present in the cosmological sample, due to the standard cut on peak apparent SN colour (B − V 0.3).When only low-to moderately-reddened normal SNe Ia with apparent colours consistent with the cosmological sample are considered, values of RV ≈ 2.5 − 3 have generally been estimated in nearby samples, often by utilising spectroscopic or NIR data to break the degeneracy between intrinsic colours and dust in the optical (Folatelli et al. 2010;Mandel et al. 2011;Foley & Kasen 2011;Chotard et al. 2011;Phillips 2012;Burns et al. 2014;Mandel et al. 2017;Léget et al. 2020).
The excellent properties of the NIR have not been fully integrated into and leveraged by the statistical models routinely used for SN Ia cosmology.We have constructed a new, hierarchical Bayesian model, BayeSN, for time-dependent SN Ia spectral energy distributions (SEDs) from the optical to NIR wavelengths.With NIR coverage, our model leverages the low luminosity dispersion in the NIR, while its wide optical-to-NIR wavelength range enables it to more stringently constrain the host galaxy dust, and the dust law, affecting the SNe Ia.These two advantages enable us to more accurately improve our model of the intrinsic SED coherently across all wavelengths.While it produces the best distance estimates when fitting complete light curves across the full wavelength range, as a Bayesian model, it also makes the most effective use of the observations available in any par-tial dataset, e.g.optical-only, NIR-only, while marginalising over the unobserved parts of the SED.
BayeSN is an essential tool, not only for properly analysing current datasets, but also extracting optimal distances and robust cosmological constraints from future optical and NIR SNe Ia observations.Beyond the datasets analysed in the present work, the ability to effectively leverage joint optical and NIR observations is crucial for fully exploiting a number of recent and current surveys and forthcoming datasets, including the Carnegie Supernova Project-II (CSP-II; Phillips et al. 2019), the Foundation Supernova Survey (Foley et al. 2018b) and Young Supernova Experiment with Pan-STARRS, RAISIN (GO-13046, GO-14216) and SIRAH (GO-15889) with the Hubble Space Telescope (HST), the ESO VISTA Extragalactic Infrared Legacy Survey (VEILS), and the DEHVILS Survey using UKIRT.This is also important for LSST, which will observe SNe Ia in ugrizy, and will therefore probe rest-frame z or y to redshifts z 0.3.The Nancy Grace Roman Space Telescope (RST, formerly WFIRST) will have a dedicated SN survey and its wide NIR filters will probe rest-frame Y JH out to redshifts z 1, 0.7, 0.4 respectively.

Comparison to existing models
The models used to analyse SN Ia light curves and estimate distances are entirely empirical and are learned from the data.The conventional approach has a number of shortcomings that need to be addressed to exploit fully the data and to control astrophysical and modelling systematics.The model most commonly used for fitting optical SN Ia light curves is SALT2 (Guy et al. 2007(Guy et al. , 2010;;Betoule et al. 2014).It models the SN Ia SED in phase (rest-frame time since peak luminosity) and wavelength, as a function of optical light curve shape (x1) and apparent colour (c) at peak.SN Ia light curve fits estimate these parameters and the optical peak apparent magnitude mB.Photometric distances are obtained from a fitted linear dependence of SN Ia absolute magnitude on light curve shape and colour (Tripp 1998): where µs is the distance modulus of an individual SN s, (mB,s, x1,s, cs) are parameters obtained from the SALT2 fit of the individual SN s, and (α, β, MB) are global (or population) parameters describing the luminosity trends with light curve shape and colour, and the absolute magnitude intercept at x1 = c = 0, respectively.Major shortcomings of the conventional approach are: (i) Residual ("Intrinsic")1 scatter systematic error: Spectral variations of SN Ia light curve data around the best-fit SED model in excess of measurement error are accounted for by an "intrinsic scatter model."This characterises the covariance of SN Ia spectral residuals unaccounted for by the SALT2 light curve shape and apparent colour parameters.It is not well constrained, and currently there are two options: one with 30% chromatic variation and 70% achromatic variation (Guy et al. 2010), and the other, based on Chotard et al. (2011), with a 75% : 25% split.Scolnic et al. (2014a) showed that both models are consistent with the cosmological SN Ia data, therefore the current optical data alone cannot discriminate between the two.However, the impact of changing the assumed model for the residual scatter in a cosmological analysis results in a shift ∆w ∼ 0.04, and thus is a dominant systematic error.
Employing the correct residual covariances across phase and wavelength is crucial to the proper quantification of uncertainties and weighting of the SN data.Our BayeSN SED model coherently estimates the intrinsic residual covariance across phase and wavelength simultaneously with the training of the entire hierarchical model, and this covariance is employed when fitting SN light curves to estimate dust and distance, while marginalising over the SED residuals.
(ii) Degeneracy between intrinsic vs. dust colourluminosity variations: The largest "correction" in Eq. 1 is due to colour, but the conventional analysis treats it in a simplistic way.Fundamentally, intrinsic variation and dust have physically-distinct effects on the SN Ia SED.However, the SALT2 model assumes that all colour variation can be described by the peak apparent B − V colour parameter c and a single, effective colour law, CL(λ).The conventional approach of fitting a single linear function for (extinguished) absolute magnitude versus apparent colour confounds the two effects.
In contrast, our BayeSN SED model allows for a probabilistic, physically-motivated combination of different spectral effects from intrinsic SN variation and dust across time and wavelength.
(iii) Lack of NIR coverage: The most recent SALT2.4 model is only specified over rest-frame wavelengths of 0.2 − 0.9 µm, though the colour law for λ > 0.7 µm is an extrapolation.Although optical surveys, such as Foundation (Foley et al. 2018b), routinely obtain z-band data, they cannot be fit by SALT2 for nearby SNe Ia.SALT2.4 is incapable of leveraging the useful properties of SNe Ia in the rest-frame NIR at λ 1µm.
In contrast, our BayeSN SED model is trained on data covering optical to NIR wavelengths extending from B through H-band (0.35−1.8 µm) and uses Bayesian inference to combine information over the full phase and wavelength range for optimal estimates of dust and distance.
Although SALT2 is the most common SN model used in cosmology, there are alternatives.SNooPy is an optical-NIR model for SN Ia light curves defined in discrete restframe uBV griY JH passbands (Burns et al. 2011).It is not a model for the continuous SED; rather, for each discrete restframe passband it has a template light curve that varies as a function of a shape parameter (either ∆m15(B) or sBV ).It requires the calculation of K-corrections of the photometry between each observer-frame passband into a corresponding rest-frame model passband as a preprocessing step.The template light curve model is then fit to the K-corrected data in the rest-frame bands.This 1-to-1 mapping is not ideal, as there are redshifts at which, for example, wide HST WFC3 NIR filters significantly cover two rest-frame model passbands, so the observed light curves are actually sensitive to the statistical properties of the underlying SED in both rest-frame bands.Furthermore, the K-correction calculation employs an ad-hoc "mangling" procedure to match a spectral template to observed colours independently at each epoch.This is prone to overfitting, its uncertainties are difficult to propagate, and is not viable for the noisy, sparse data typical of high-z light curves, in which the light curves in different passbands may be irregularly and asynchronously sampled.We compare the results from BayeSN to those from applying SALT2 and SNooPy to the same SNe Ia in §5.
SNEMO (Saunders et al. 2018;Rose et al. 2020) and SUGAR (Léget et al. 2020) are recent empirical models built from optical spectrophotometric time series.Whereas SNEMO is a principal components model for the optical SED, SUGAR models the spectral dependence on factors composed of spectral line characteristics.However, they only cover rest-frame 0.33 < λ < 0.86 µm, and so they cannot leverage the valuable NIR at λ 1 µm.

Outline of paper
The outline of this paper is as follows.In §2, we describe our new hierarchical Bayesian model for SN Ia SEDs in the optical to NIR.In §3 we describe the compilation of optical and NIR SN Ia light curve data that we analyse.In §4, we describe our computational implementation for training the BayeSN model and fitting SN Ia light curves.In §5, we present our results, including a Hubble diagram showing the improvement in distances (to 0.10 mag total RMS error) obtained from BayeSN fits to optical and NIR data compared to current methods applied to the same sample.We also describe our inferences about host galaxy dust, for which we constrain a global value of RV = 2.9 ± 0.2 for our sample with E(B − V ) host 0.4.In §6, we conclude.
In Appendix §A, we describe technical aspects of modelling our SED surfaces, and in Appendix §B, we describe the extension of our model to a second functional component.

THE STATISTICAL MODEL
To construct and train our SN Ia SED model, we employ a hierarchical Bayesian approach.Hierarchical Bayes provides a principled, coherent framework for modelling multiple uncertain and random effects underlying the data described via a probabilistic generative model.It is a natural strategy for probabilistic modelling and inference of populations as well as their constituent individuals (Gelman et al. 2013;Loredo & Hendry 2010, 2019).The first applications of hierarchical Bayes to supernova analyses were demonstrated by Mandel et al. (2009Mandel et al. ( , 2011)), who developed probabilistic models for SN Ia optical and NIR light curves in discrete passbands.Mandel et al. (2014) constructed a hierarchical Bayesian model to disentangle dust reddening from intrinsic colours in the optical by leveraging the velocity-colour relation (VCR; Foley & Kasen 2011).
Other hierarchical Bayesian models for SN Ia analysis have focused exclusively on analysing the 3-parameter output from SALT2 fits to SN Ia light curves (March et al. 2011;Rubin et al. 2015;Shariff et al. 2016b;Mandel et al. 2017;Hinton et al. 2019), rather than the observed data itself.Since they do not attempt to directly model the irregularly and asynchronously sampled multivariate, multi-band light curve (time series) data, they are dependent on the internal shortcomings of SALT2 described in §1.1.
In constrast, our BayeSN SED model combines the hierarchical Bayesian strategy with techniques from functional data analysis (e.g.Ramsay & Silverman 2005) to deal with the full complexity of observed photometric time series, and to perform probabilistic inference on the multiple time-and wavelength-dependent latent functions underlying the observed data.In particular, we model the modes of variation of the intrinsic SED in terms of a Bayesian formulation of functional principal components.While principal components analysis (PCA) is a standard tool for dimensionality reduction of multivariate data, in its conventional use, however, it lacks a probabilistic framework.Probabilistic and Bayesian formulations of PCA for multivariate vectorial data were described by Roweis (1998); Tipping & Bishop (1999); Bishop (1999Bishop ( , 2006)).In particular, Tipping & Bishop (1999) constructed a probabilistic PCA as a special case of a Gaussian latent variable model for factor analysis, with an associated likelihood function mapping between a low-dimensional latent space and the high-dimensional data space, and a prior distribution over the latent variables.Bishop (1999) further developed Bayesian PCA by introducing priors on the principal components and residual variance.These useful probabilistic formulations enable us to embed a principal components model within our hierarchical Bayesian framework while simultaneously accounting for multiple random effects and sources of uncertainty, such as dust, distance, and measurement error.Thus, we can determine the intrinsic principal components while marginalising over the other uncertainties in the global inference problem.
A primary goal of BayeSN is to model populations of latent SED functions over time and wavelength, so we extend these concepts to functional data, by incorporating continuity and smoothness constraints on the functional principal components, and by modelling the time-and wavelengthdependent covariance structure of the residual functions.In this paper, we deal mainly with photometric flux data, which are essentially integral constraints (under the passband throughput and with measurement errors) on the latent SED component functions.Embedding the functional inference within a hierarchical Bayesian structure naturally effects regularisation to solve the inverse problem.
A schematic depiction of the probabilistic forward model of the SED for a single supernova's light curve data is shown in Fig. 1.We construct a log intrinsic SN SED across time and and optical to NIR wavelengths by modifying a mean intrinsic SED function with functional principal components scaled by latent SED shape parameters.This is further modified by the dust extinction law as a function of wavelength, scaled by the dust extinction parameter.A random function described by a covariance matrix models the SED residuals, as a function of time and wavelength, that are not captured by the previous main modes of variation.The combination of these effects yields the latent hostreddened SED in the SN rest-frame.Finally, the effects of distance, redshifting and time dilation, integration of the flux under the observer's filter functions, the observational cadence of the survey, and photometric measurement error yields the observed multi-band optical and NIR time series (light curves) of a SN Ia.

Flux Data Model
Suppose supernova s with spectroscopic redshift zs has a distance modulus µs.The ith photometric observation of SN s is taken at observer-frame Modified Julian Date (MJD) T i s through a filter with a transmission function Ts,i(λo) as a function of observed wavelength λo.The calibration standard has an SED F std (λo), which defines the reference magnitude in the passband.The calibrated flux ("FluxCal" in SNANA; Kessler et al. 2009) is the ratio of the SN flux at the observer F s,i obs integrated over the passband to the flux of the standard star through the same passband: fs,i = 10 0.4×Z s,i × F s,i obs (λo) Ts,i(λo)λo dλo F std (λo) Ts,i(λo)λo dλo = 10 0.4×Z s,i F s,i obs (λo) Bs,i(λo)λo dλo. ( where Zs,i is the zeropoint for this observation 2 .The passbands used in this analysis are described in §3.2.We define 2 Z s,i = 27.5+mstd s,i , where m std s,i is the reference magnitude in the passband, and 27.5 is a conventional scaling applied in SNANA files, i.e the flux ratios are multiplied by 10 0.4×27.5 = 10 11 for convenience.

×
Ts,i(λo)λo dλo F std (λo) Ts,i(λo)λo dλo . (3) The model flux value can be converted to an apparent magnitude, on the system of the standard, like so: ms,i = −2.5 log 10 [fs,i] + Zs,i = −2.5 log 10 F s,i obs (λo) Bs,i(λo)λo dλo. (4) Now we model the observable flux density F s,i obs (λ0) (per unit wavelength) for observation i of SN s.If the MJD date of B-band maximum is T max s , then we define the rest-frame phase of this observation as t i s ≡ (T i s − T max s )/(1 + zs).We denote the effective SED in the SN rest-frame, extinguished by host galaxy dust, as Ss(t, λr).The flux density of the light from SN s at observed wavelength λo and at time T i s at the Earth is: The last term is the attenuation of flux by dust along the line-of-sight within the Milky Way Galaxy.The V -band Milky Way extinction is obtained from the reddening map (Schlafly & Finkbeiner 2011), A s MW = E(B − V ) s MW × RMW, and we adopt RMW = 3.1 and the Fitzpatrick (1999) × Bs,i[λr(1 + zs)] λr dλr. (6) This calibrated flux is measured with some photometric noise with a given variance σ 2 s,i , and we assume a Gaussian sampling distribution for the measured flux fs,i: For all the observations i (across all observation times and filters) of SN s, the measurement likelihood is assuming independence of the flux measurement errors.

Dust and Intrinsic Supernova SED Model
The host-dust-extinguished SED is obtained from the intrinsic SED S int s (t, λr) in the SN rest-frame via where A s V is the host galaxy dust extinction and ξ(λr; RV ) is the extinction law with parameter RV .We adopt the extinction law of Fitzpatrick (1999).
Our model intrinsic SN spectral energy distribution is a function of rest-frame phase t and λr.We decompose it into a global spectral template modified by individual effects that vary per SN s.

S int
s (t, λr) = S0(t, λr) × 10 −0.4M 0 × 10 −0.4 W 0 (t,λr ) × 10 −0.4 δMs × 10 −0.4 δWs(t,λr ) (10) where M0 ≡ −19.5 is fixed normalisation factor and the fixed function S0(t, λr) is the updated spectral template of Hsiao (2009).This template spans 0.1 µm to 2.5 µm from −20d to +85d past B-band maximum, and was constructed from over 1,000 spectra, including NIR spectra from Marion et al. (2009), using the procedure described in Hsiao et al. (2007).It is arbitrarily normalised to have a B-band magnitude of zero at peak phase t = 0.The blue terms on the top line altogether describe the global spectral template.They model the baseline mean intrinsic SED as the Hsiao (2009) spectral template, normalised by M0 and smoothly warped by the function W0(t, λr) to match the inferred mean intrinsic colours of the training sample.
The red terms on the bottom line describe the individual effects, the modifications to the global SED that are specific to each supernova s.The δMs term corresponds to an overall shift of the log SED that is independent of phase and wavelength.The function δWs(t, λr) corresponds to phase-and wavelength-dependent effects.We further decompose this function as: Note that M0, W0(t, λr), δMs, and δWs(t, λr) are in units of magnitude, like µs and A s V .The advantage of modelling the logarithm of the SED is that we can easily preserve positive flux at all phases and wavelengths while having priors on the functional components W k (t, λr) and the latent principal component scores that span positive and negative reals.

Magnitude Approximation
For the vast majority of the nearby training set used in this work, the flux data have high signal-to-noise.Therefore, it is a good approximation to convert these data to magnitudes.The magnitude measurement and the variance of its measurement error are ms,i = −2.5 log 10 [ fs,i] + Zs,i and Transformation of the model flux (Eq.6) to the model magnitude ms,i yields ms,i = µs + M0 + δMs − 2.5 log 10 (1 + zs) Using this, we can change the measurement likelihood function, Eq. 7, to: This form is useful since the model magnitude inside the likelihood is linear in some of the parameters.However, the full flux model (Eq.7) allows us to use low signal-to-noise, or even negative, flux measurements which cannot be reliably converted into magnitudes with Gaussian errors.

2D SED Surface Models
We model the unknown functions {W k (t, λr) : k = 0, . . ., K}, and { s(t, λr) : s = 1, . . ., NSN} in a flexible, data-driven manner.Each function is represented as a surface defined by a 2D grid of knots.We specify a 2D grid as the Cartesian product of a 1D grid in rest-frame phase, τ , and a 1D grid in rest-frame wavelength l.Each 1D grid can be irregularly spaced.Algorithmic details are described briefly in Appendix A. The essential idea is that a generic, smooth surface g(t, λr) at any point (t, λr) in the 2D domain of the SED can be modelled as g(t, λr) = s(λr; l) T G s(t; τ ), where s(x; ξ) denotes the 1D natural cubic spline smoother (column) vector for knots ξ at evaluated at point x.The knots matrix G has elements Gij = g(t = τj, λr = li), which define the values the surface must pass through at the knot locations, and are parameters for inference.
Using this, we model the functions of phase and wavelength in terms of knot matrices {W k : k = 0, . . ., K} and {Es : s = 1, . . ., NSN}, like so.For the global correction to the mean template: For the functional components (k = 1, . . ., K), For the residual SED functions of each SN s, s(t, λr) = s(λr; l) T Es s(t; τ ).( 18) These latent functions are determined by the unknown matrices {W k : k = 0, . . ., K}, and {Es : s = 1, . . ., NSN}, which are inferred as hyperparameters and latent variables.We specify a set of knots on a grid in rest-frame phase and wavelength.The phase coordinates are, e.g.τ = (−10, 0, 10, 20, 30, 40) days.The phase spacing is justified since we know that SN Ia light curves vary smoothly on ∼ 10 day timescales.The wavelength coordinates are l.We place a knot at the central wavelengths of the filters BV riY JH plus two outer knots bracketting these: l = (0.3, 0.43, 0.54, 0.62, 0.77, 1.04, 1.24, 1.65, 1.85) µm.The purpose of the first and last knots in wavelength is to ensure that our spline surfaces are defined throughout the entire first (B) and last (H) broadband filters.To avoid degeneracies, we "tie down" the residual knot matrices at the first and last wavelength knots for every phase knot: Es,ij = 0 if i = 1 or i = dim(l), ∀ j.

Population Distributions and Hyperpriors
We specify the population distributions on the latent parameters of individual supernovae.
For the latent functional SED effects, following the probabilistic PCA formulation (Tipping & Bishop 1999), we adopt the standard Gaussian prior θ k s ∼ N (0, 1) for the individual score of each SN s in each component k = 1, . . ., K. Thus, the resulting functions W k (t, λr)(k ≥ 1) are not scaled to have unit norm, as they would be in standard PCA.
Rather, because the latent scores θ k s are normalised to have a population variance of one, W k (t, λr) absorbs a factor of the population standard deviation in that component.A "1σ" effect of the k-th component on the SED is thus computed from θ k W k (t, λr) by varying ∆θ k ± 1 around the mean.
For the W k matrices that parametrize our functional components, we use an independent standard normal hyperprior on the value of each knot: W k,ij ∼ N (0, 1).This is a weak constraint, since we have scaled the problem to expect these variations to be of order tenths of a magnitude.
For the residual SED perturbations, we assume a multivariate Gaussian distribution on the column-wise vectorisation of each residual matrix Es: A matrix Γ(t, λr; τ , l) can be constructed so that Eq. 18 can be written equivalently as, While Eq. 18 and Eq.20 are equivalent, Eq. 18 is the more compact representation, since Γ(t, λr; τ , l) tends to a very large (but sparse) matrix.However, Eq. 20 is useful, because, together with the residual distribution Eq. 19, it implies that the residual functions s(t, λr) are realisations of a Gaussian process (GP; Rasmussen & Williams 2005): with a zero prior mean and a non-stationary kernel for the covariance of the residuals at any two coordinates: We adopt this non-stationary covariance structure rather than the more popular stationary kernels, such as squared exponential, since we do not expect the complex physical mechanisms of SN Ia explosions to generate statistical properties that are invariant to phase or wavelength shifts.GPs have been previously used to model spectra, e.g. by Czekala et al. (2015Czekala et al. ( , 2017)).
The covariance matrix Σ encodes the variances and correlation structures of the residual functions: Σ = diag(σ ) R diag(σ ).Following the separation strategy proposed by Barnard, McCulloch & Meng (2000), we specify separate priors on the standard deviation parameters σ and the correlation matrix R .For each q-th element σ ,q ≥ 0, we adopt a weakly-informative half-Cauchy hyperprior with unit scale (Gelman 2006;Polson & Scott 2012), i.e.P (σ ,q ) = HC(σ ,q | a = 1), with probability density for x ≥ 0, and zero otherwise.This hyperprior is proper, and relatively flat for small x.It is sensible because we have scaled the problem to expect σ ,q to be less than a magnitude.For the correlation matrix, we adopt the LKJ hyperprior as implemented in Stan and derived from Lewandowski, Kurowicka & Joe (2009).
with η = 1.This places a uniform prior on positive semidefinite correlation matrices.
The δMs terms model a phase-and wavelengthindependent shift of the SED in overall log luminosity.Since these shifts are indistinguishable from the effect of distance on the apparent light curves, this propagates into an uncertainty floor on photometric distance estimates.We model the population of these shifts as δMs ∼ N (0, σ 2 0 ) and estimate their variance σ 2 0 as a hyperparameter.We use a weak half-Cauchy prior (Eq 23) on σ0 with scale a = 0.1, since we expect this to be of order a tenth of magnitude.
We assume that host galaxy extinction A s V is drawn from an exponential distribution with mean extinction hyperparameter τA: for AV ≥ 0 and zero otherwise.This is a sensible choice, since the true A s V must be non-negative, and we expect the most lines of sight through the host galaxies to pass through little dust, with the probability density decreasing with increasing column density.This model distribution has been used before by, e.g.Jha et al. (2007) and Mandel et al. (2009).The hyperprior we adopt for τA is also a unit half-Cauchy, P (τA) = HC(τA, 1), reflecting our expectations that the typical τA is on the order of tenths of a magnitude.For the unknown RV , we assume a single global value with a uniform hyperprior RV ∼ U (1, 5) reflecting a wide range of possible values.In the future, we can expand our framework to allow per-SN variation in R s V by modelling and inferring their population distribution, as was done previously by Mandel et al. (2011).

External Distance Constraints
In the training phase, we use estimates μext,s of the SN distance moduli that are external to the photometric SN data, as described in Avelino et al. (2019).We assume they have Gaussian errors around the true distance modulus.
For the vast majority of the training set, we utilise the redshift as an indicator of distance conditional on the fiducial cosmological model μext,s = µΛCDM (zs) with ΩM = 0.28 and ΩΛ = 0.72.However, at these redshifts z < 0.04, these distances are relatively insensitive to the cosmological parameters, other than H0 which only sets an overall scale for all absolute magnitudes, and for which we adopt 73.24 km s −1 Mpc −1 (Riess et al. 2016).These redshifts are corrected to the CMB frame and corrected for bulk flows.The distance modulus uncertainty, due to errors in observed redshift zs as estimates for the cosmological redshift z c s , from redshift and peculiar velocity uncertainties is where we have adopted σpec = 150 km s −1 (Carrick et al. 2015).The external distance constraint can be expressed as P (µs| zs) ∝ N (μext,s| µs, σ2 ext,s ) after marginalising out the unknown z c s .For eight SNe Ia in our training set at z < 0.01, we use external distance estimates μext,s from available redshiftindependent measures (e.g.Cepheids), and their uncertainties σ2 ext,s , as listed in Table 4 of Avelino et al. (2019).These external distance constraints can be expressed as ext,s ).
The global posterior distribution of all the latent variables of individual supernovae and the population hyperparameters given the data, external distance constraints, and redshifts is This global posterior distribution is the objective function for training our model to learn the population hyperparameters, covariance structure, and SED components while marginalising over the latent variables of individual SNe Ia.It provides a coherent, probabilistic quantification of uncertainty of over all parameters and hyperparameters.

Photometric Distance Estimation
The training process gives us posterior estimates of the hyperparameters Ĥ ≡ ( Ŵ0:K , Σ , σ2 0 , τA, RV ) marginalised over all latent variables in the sample.For simplicity, we take the posterior means of these hyperparameters as point estimates.Under distance-fitting mode, we condition on the hyperparameters, and the posterior density of the latent parameters of any given SN s is P (θs, es, δMs, A s V , µs| fs; Ĥ) ∝ P ( fs| θs, es, δMs, A s V , µs; Ŵ0:K , RV ) where we omit any external distance constraint.By sampling this joint posterior, we can approximate the marginal posterior density of the photometric distance modulus, P (µs| fs; Ĥ) = P (θs, es, δMs, A s V , µs| fs; Ĥ) dθs des dδMs dA s V , as well as its posterior summaries such as the mean and variance via marginalisation.However, this distribution is not necessarily Gaussian, nor is it required to be, since dust effects introduce some asymmetry.V of a SN s is drawn from a population distribution of extinction values, and the dust law is parametrized by an unknown R V .The effects of dust and distance modulus µs on the intrinsic SED combine (with appropriate redshifting and time dilation) to yield the apparent SN SED.This is observed with some observational cadence and noise through the observer's filter functions to yield the optical and NIR light curve data.During training, the distance is constrained externally to the light curve by the cosmological redshift and the fiducial cosmological parameters (fixed in this low-z analysis).The redshift is observed with some error, including uncertainties in local peculiar velocities.The hierarchical global posterior density (Eq.28) estimates the unknown latent variables and hyperparameters conditional on the observed data of the entire SN Ia sample.
In principle, instead of fixing the hyperparameters to their posterior means from training, we could use the samples of the joint posterior over the hyperparameters (Eq.28) to incorporate their uncertainties into the photometric distance estimates.However, this is a more computationally burdensome process, and we have found the posterior means to be sufficient for our purposes.

Probabilistic Graphical Model
Our hierarchical Bayesian model can be depicted with a type of probabilistic graphical model called a directed acyclic graph, shown in Fig. 2. Each open box presents a set of unknown parameters or hyperparameters, each grey-shaded box represents observed data, and the arrows indicate relations of conditional probability.Parameters within the plate, labelled s = 1, . . ., NSN, are repeated for every SN s, whereas parameters outside the plate represent global or population parameters.The intrinsic SED of a single SN Ia s is constructed from the mean SED and functional principal components W0:K (t, λr), a draw of the FPC scores θs from its population distribution, and a draw of an intrinsic residual SED function from its population distribution described by a covariance function over time and wavelength.The host galaxy dust extinction A s V of a SN s is drawn from a population distribution of extinction values, whereas the unknown RV , parametrizing the dust law, is given a wide prior.The effects of dust and distance modulus µs on the intrinsic SED combine (with appropriate redshifting and time dilation) to yield the apparent SN SED.This is observed with some cadence and noise through the observer's filter functions to yield the optical and NIR light curve data.During training, the distance is constrained externally to the light curve by the cosmological redshift and the fiducial cosmological parameters (fixed in this low-z analysis).The redshift is observed with some uncertainty due to local peculiar velocities.Bayesian inference with the hierarchical model solves the inverse problem through the computation of the posterior probability density (Eq.28) of the unknown latent variables and hyperparameters conditional on the observed data of the entire SN Ia sample.

Optical and NIR Light Curve Data
We use the compilation of low-z SNe Ia with joint optical and NIR light curves described in Avelino et al. (2019).For our purposes, we define the optical as the BV RI filters and the NIR as the Y JH.In our various analyses below, we fit either the available optical (BV RI) or optical+NIR (BV RIY JH) for a given SNe Ia.The selection criteria and cuts were described in Avelino et al. (2019) and detailed information on the specific SNe is listed in their Tables 2  & 3.In particular, a colour excess cut E(B − V ) host ≤ 0.4 was applied for consistency with the cosmological sample.For each SN Ia, we use published optical and NIR data only from the same survey; we do not mix data sources within a single SN.Consequently, SN 2005bo, SNF20080514-002, SN 2010iw, SN 2010kg, SN 2011ao, SN 2011B, SN 2011by, SN 2011df were removed because they had NIR data, but no published optical data, from the CfA.SN 2006bt was removed because it is a known peculiar supernova (Foley et al. 2010).We failed to fit the light curves of SN 2000E (Valentini et al. 2003) with our current model (although it was not an outlier in the Hubble Diagram), so we have omitted it to avoid biasing the training.
The resulting sample comprises 79 SNe Ia with joint optical and NIR light curves.Avelino et al. (2019) further defined a subset with NIR data near maximum light (see their Table 13).To enter into this subset, a SN Ia was required to have at least one NIR observation at least 2.5 days before maximum light.We have a total of 48 SNe Ia in this cut, which we refer to as "NIR@max".All SNe Ia in the full sample have some NIR data available regardless of the phase of the first NIR observation.These SNe Ia and cuts are listed in Table 1.Additional information can be found in Avelino et al. (2019).The full dataset consists of 22 SN from the CfA Supernova Program (CfA; Jha et al. 1999;Wood-Vasey et al. 2008;Hicken et al. 2009Hicken et al. , 2012;;Friedman et al. 2015), 44 from the Carnegie Supernova Project (CSP; Krisciunas et al. 2017), 8 from the Las Campanas Observatory (K04a,b: Krisciunas et al. 2004a,b), as well as 5 others from individual papers in the literature (K03: Krisciunas et al. 2003;P08: Pignata et al. 2008;St07: Stanishev et al. 2007; L09: Leloudas et al. 2009;K07: Krisciunas et al. 2007).
The size of our training set reflects the recent progress of ground-based surveys in accumulating quality joint optical and NIR SN Ia light curve data (Friedman et al. 2015;Krisciunas et al. 2017).The number of SNe Ia in our current compilation more than doubles those used to train previous NIR-capable light curve models.The training set for the first BayeSN models included 37 SNe Ia with both optical and NIR coverage (Mandel et al. 2009(Mandel et al. , 2011)), and the training set for SNooPy comprised 30 SNe Ia (Burns et al. 2011).Further increases in the training set will soon be possible with forthcoming data from CSP-II (Phillips et al. 2019) and the Supernovae in the Infrared avec Hubble (SIRAH) program (HST GO-15889, P.I. S. Jha).

Passband Throughput
For each observation in the data compilation, we specify a model for the passband throughput.A model passband throughput is needed to forward model the observed flux, regardless of whether that flux is reported in the natural system of the telescope or transformed on to a "standard" system such as SDSS ugriz (Fukugita et al. 1996).For a measurement reported on the natural system of a telescope, the total passband throughput must include all terrestrial elements of the measurement chain -site atmospheric transmission, mirror reflectivity, filter transmission, transmission of camera optics, and detector quantum efficiency.For measurements reported on a standard system, the passband throughput must reflect the original measurement chain used to observe the standard stars that were employed in calibrating the SN flux, rather than the measurement chain of the facility used to observe the SN itself.
For the Carnegie Supernova Project and related objects observed at Las Campanas Observatory (Krisciunas et al. 2017(Krisciunas et al. , 2004a,b),b), we use the total natural system passband throughputs3 as defined in the implementation of the SNooPy (Burns et al. 2011).We take care to include any changes in the CSP passband throughputs when a filter was replaced.For the NIR SNe observed by the CfA using the 1.3m PAIRITEL telescope at Mt. Hopkins (Wood-Vasey et al. 2008;Friedman et al. 2015), we use the natural system passband throughputs measured by the 2MASS project4 , which used the same facility.Finally, for objects observed by the CfA Supernova Program (Jha et al. 1999;Hicken et al. 2009Hicken et al. , 2012) ) and remaining literature objects (K03, K07, St07, P08, and L09) we use published the standard system photometry and model the passband throughput using the shifted Bessell filters described in Stritzinger et al. (2005).While the CfA SN program published both natural and standard system photometry, and the former is generally preferred as it avoids some potential systematic errors in transforming the flux, using the natural system photometry relies on having a good description of the passband throughput of the natural system.Unfortunately, there are no determinations of all the elements in the measurement chain for objects observed by the CfA SN survey, and the current model for passband throughput included in the SNDATA repository5 does not include any model for the site atmosphere at all.The CfA Supernova measurements were the result of an extensive effort over almost two decades with four separate cameras, through a variety of filters, using a telescope that underwent numerous mirror coatings, and the provenance of each measurement cannot easily be determined retrospectively.By contrast, the standard system photometry for CfA objects is known to be consistent with standard system photometry measured by the CSP and LOSS (Ganeshalingam et al. 2010).Thus, we prefer to use the standard system photometry over the natural system photometry in this work.
Ultimately, we plan on training a version of BayeSN exclusively on SNe Ia observed by the Foundation Survey and the Young Supernova Experiment (YSE; D. Jones et al. 2020, in prep.), which have well-determined measurements of the natural system passband throughput.

BayeSN
We have implemented our Bayesian model in the Stan probabilistic programming language (Carpenter et al. 2017) to specify and sample the global posterior density over all latent variables and hyperparameters conditional on the training set data.Stan implements a variant of dynamic Hamiltonian Monte Carlo (HMC; Neal 2011; Betancourt 2017), originally based on the No-U-Turn Sampler (NUTS) (Hoffman & Gelman 2014).Stan utilises automatic differentiation to compute gradients of the log posterior (Eqs.28, 29) and guide efficient exploration and convergence to the target density in high-dimensional parameter spaces.We typically run 4 chains in parallel, each initialised with random jitter to start at a different point in parameter space.We follow standard procedures to assess convergence and mixing of the chains (Gelman & Rubin 1992;Gelman et al. 2013).The first half of the iterations, which are used for adaptation of the HMC algorithm and burn-in, are discarded.The algorithm adapts the integration time to yield samples that are nearly serially uncorrelated, and we run it long enough so that the effective sample size is approximately 1,000.
We discretise the integrals over wavelength (Eq.6) as numerical Riemann sums with resolution ∆λr = 20 Å.This provides sufficient precision for evaluating the model fluxes (with discretisation error < 0.2% and therefore much smaller than typical photometric error).
We can employ the model and Bayesian inference code in two modes.In training mode, we condition on the external distance estimates and their uncertainties, along with the SN Ia light curves and redshifts, to sample the joint posterior of all hyperparameters and latent variables.Trying to find the single optimal point of the global posterior in the high-dimensional parameter space is vulnerable to overfitting.Instead, we use the Bayesian approach to sample the global joint posterior Eq. 28, which allows us to marginalise over the posterior uncertainties in the latent variables when estimating the hyperparameters, including the SED components.In distance-fitting mode, we use posterior estimates of the hyperparameters of the already-trained model, and we remove the external distance constraint.Redshifts are only used to shift the SED between the rest-frame and observerframe and to account for time-dilation.We then compute posterior inference on the latent parameters of individual SNe, and marginalise to obtain the posterior the photometric distance from the SN Ia light curve (Eq.30).
For the purposes of this analysis, we define optical as the BV RI bands and the NIR as Y JH.For BayeSN and SNooPy we fit the BV RIY JH bands, where RI includes ri and r i filters, where applicable.The version of BayeSN described here has not been trained on U -band data; preliminary analysis with a BayeSN prototype including the Uband does not show a significant improvement in results on this sample.We apply our current model either to the available BV RI (optical) or BV RIY JH (optical+NIR) data.

SALT2 and SNooPy fitting
We used the SALT2.4model of Betoule et al. (2014) and use the calibration therein of the Tripp estimator (Eq. 1) for distances.The specific implementation of SALT2 used is available in the sncosmo package (Barbary et al. 2016).
For each object, we use literature estimates of the time of B-band maximum to select observations between -10d and +40d in phase with S/N > 3.This ensures that the same observations are used by both SALT2 and BayeSN.SALT2 has a range of 2000-9000 Å and therefore can fit the U BV RI bands, but as with BayeSN, we do not fit the U -band and restrict the comparison to BV RI.We compared the SALT2 results with or without U -band, and found that the U -band data did not improve the results for our sample.The limited BayeSN 13 template range of SALT2 also prevents us from any comparison with BV RIY JH fits.Pierel et al. (2018) created a NIR extension to the SALT2 model that is suitable for simulations, but that did not use the same training procedure as that used to create the SALT2.4model templates.Therefore, it is not suitable for fitting real light curves and does not yield calibrated distances.
For each object, we begin with an initial guess for the parameters, which we refine with Minuit (James & Roos 1975).We use the result from Minuit to set the initial positions of 32 walkers used to sample the posterior distribution with the emcee Markov Chain Monte Carlo package (Foreman-Mackey et al. 2013).We generate 2000 samples per walker after discarding the first 500 steps as burn-in.We visually inspect the parameter chains and 2D marginalized posterior distributions.We report the median value of the samples as the "best-fit" estimate and use the 16 th and 84 th percentiles of the samples as a credible interval.The official procedure for SALT2 light curve fitting weights the fit by the photometric errors only; although a residual scatter model exists (Guy et al. 2010), when we have tried to use it for fitting, we obtained a much larger Hubble diagram scatter and results with substantially larger uncertainties than plausible for these low-redshift, high S/N objects.For consistent comparisons, we have adjusted the the SALT2 distance estimates to a scale of H0 = 73.24km s −1 Mpc −1 .
We use the SNooPy EBV_model26 to fit the observations using templates parametrized by the light curve stretch, sBV .The EBV_model2 uses the same algorithm as Prieto et al. (2006) to build the templates together with the updated calibration of 24 CSP supernovae presented in Burns et al. (2011).The resulting EBV_model2 rest-frame light curves templates cover uBgV riY JH.To be consistent with our comparison to SALT2, which is restricted to modeling only the optical observations, we fit BV RI (optical-only) as well as BV RIY JH (optical+NIR) data with SNooPy.We use the same initial guesses for the SNooPy fit parameters as used for the SALT2 fits.SNooPy uses a nonlinear least-squares Levenberg-Marquadt algorithm to minimize the variance weighted residuals to the model.As with SALT2, we report the statistical uncertainties on the fit parameters derived from inverting the Hessian matrix at the best-fit parameters, and we have adjusted the SNooPy distance estimates to a scale of H0 = 73.24km s −1 Mpc −1 .Our low-redshift SNe have well-sampled light curves with high S/N and we do not find any significant differences between the Levenberg-Marquadt results and those using MCMC sampling.The SNooPy light curve fitting procedure weights the light curve fit only by the photometric errors; there is no residual covariance model.In the 2D contour plots, the black contours contan 68% and 95% of the marginal posterior probability, and the mode is indicated.The 1D marginal plots depict a kernel density estimate applied to the MCMC samples for each parameter.The SED shape parameter θ 1 and host galaxy dust extinction A V are marginalised over to obtained the posterior distribution of the photometric distance modulus µs.

Light Curve Inference for Individual SNe Ia
As an example, Fig. 3 demonstrates a BayeSN light curve fit to optical and NIR observations of SN 2005iq (CSP, z = 0.034).It also shows the posterior distribution of the latent parameters (θ1, AV , µ) obtained under distance-fitting mode.To obtain the marginal distribution of the photometric distance modulus µ phot s , the other latent parameters of the SN (including the residuals es) are integrated over.The photometric distance modulus is well constrained to ±0.09 mag using the joint optical and NIR data at all phases.
In Fig. 4, we show a visual comparison between the BayeSN and SALT2 parameter estimates.In the top panel, we plot the SED shape parameter θ1, which is the score of the first functional component, against the SALT2 x1 "stretch" parameter for the same SNe Ia.The sign of θ1 has been chosen to be in the same sense as the decline rate ∆m15(B) of Phillips (1993), which is the magnitude change in B-band between B-band peak and 15 days afterwards.Larger values of θ1 correspond to faster (larger) postmaximum optical decline rates.Larger x1 values correspond to broader optical light curves, which have slower (smaller) optical decline rates.There is a fairly tight, slightly nonlinear correlation between θ1 and x1, suggesting that they are capturing the same underlying major mode of variation.
In the bottom panel of Fig. 4, we compare the SALT2 colour parameter c and the BayeSN fitted value of the apparent B − V colour at peak t = 0.The latter is determined by evaluating the rest-frame SED model (at redshift zs = 0) with the fit parameters (θ s 1 , A s V , es) for each SN, and integrating it under reference B and V bandpasses, which we take to be those of the CSP.There is a strong but not exactly 1-to-1 correlation between the two.The BayeSN model is able to leverage the optical and NIR data of the full light curve to probabilistically decompose the apparent colour into an intrinsic B − V colour and dust reddening E(B − V ).The former is estimated from the light curve fit by evaluating the rest-frame SED with the light curve fit parameters (θ s 1 , es) and setting A s V = 0, and integrating it under the reference passbands, and the latter is determined by E(B − V )s = A s V /RV .Our model finds that the apparent colours are the sum of two different effects and captures these two different sources of variation, which are each correlated with the rest of the SED (and thus luminosity as a function of wavelength) in different ways.In contrast, the conventional Tripp formula (Eq. 1) assumes that the apparent color-magnitude relation is described by a single factor depending on the SALT2 apparent colour parameter c.

Population Inference
The statistical properties of the latent SED, captured by the intrinsic FPC, residual covariance, and dust distribution, are learned during the BayeSN model training phase by sampling the global posterior density, Eq. 28.

Intrinsic SED components
The baseline intrinsic SED depicted in Fig. 1 is obtained with θ s 1 = A s V = es = δMs = 0, and is equal to S0(t, λr)10 −0.4[M 0 +W 0 (t,λr )] .The first functional principal component (FPC) W1(t, λ) is also shown in Fig. 1.The top panel of Fig. 5 shows the effect of our first functional component W1(t, λ) on the baseline intrinsic SED at phases t = 0 and t = 20 as one changes the coefficient θ1.In the bottom panel, for comparison, we show the effect of the dust extinction on the SED.An interesting difference between the two is the sign flip of the effect of θ1 in the NIR at phase t = 20.Under this effect, SNe Ia that are dimmer in the optical are actually brighter in the NIR Y J bands at this later phase.This is an indication of the correlation of dimmer SNe Ia having earlier rises to the secondary NIR maximum.In contrast, the effect of dust is to make SNe Ia dimmer at all phases.This sign-flip distinction may help break the degeneracy between intrinsic SN and extrinsic dust effects.
The figure also shows a reference set of filter passband functions (with arbitrary scaling for visualisation purposes).We visualise the effect of our functional components on rest-frame photometric light curves by integrating our SED model with various parameter values under this reference set.The reference set we choose for illustration are the CSP BV riY JH passbands and the z-band filter from Pan-STARRS1 (PS1).The rest-frame z-band (at ≈ 0.9 µm, between i and Y ) region of SN Ia SEDs is regularly probed by low-z surveys such as Foundation and YSE, but is not modelled by either SALT2 or SNooPy.In §5.4,we demonstrate an example of BayeSN fitting of a rest-frame z-band SN Ia light curve from Foundation DR1 (Foley et al. 2018b).
By integrating the SED model under these reference optical and near-infrared passbands, we show in Fig. 6 the effect of the 1st FPC W1(t, λr) on the intrinsic optical and and NIR light curves.We see that this intrinsic component captures the optical width-luminosity relation (Phillips 1993): intrinsically brighter supernovae have more slowly- Effect of Dust at phase t = 0 Variation in the optical and NIR intrinsic SED captured by the first functional component W 1 (t, λ) at t = 0 and 20 days.We vary the value of θ 1 by θ1 ± 2σ, holding all other SN parameters to zero.(bottom) The effect of dust extinction on the optical and NIR SED.We apply dust extinction to the baseline mean intrinsic SED with different combinations of A V , R V that produce the same optical colour excess declining (or broader) light curves, whereas dimmer ones decline faster.This effect is seen most clearly in the B and V bands.In the redder optical bands (r and i) and into the NIR zY JH bands, we see that this same effect is also correlated with the timing of the second peak at t = 20 − 30 days: brighter supernovae tend to have later secondary NIR peaks, while dimmer SNe Ia have earlier ones, which is a further reflection of the trend seen in Fig. 5.In iY JH bands, the effect also correlates to more pronounced second peaks.The empirical relation we capture correlates strongly with the theoretical models of Kasen (2006), who found that brighter SNe Ia should have more pronounced NIR secondary maxima at later phases due to role of the ionisation evolution of iron group elements in the SN ejecta in redistributing energy from the optical to the NIR.Similar trends have been seen by Dhawan et al. (2015), and Shariff et al. (2016a) ex-plored the use of the phase of the secondary NIR maximum for standardising SN Ia optical magitudes.
The first NIR peak typically occurs a few days before the optical (B) peak (t = 0).Estimation of the 1st FPC at early pre-maximum phases in the NIR is somewhat limited by the relative scarcity of quality NIR observations there in the current dataset (particularly in the H-band).Future data releases with greater NIR coverage at early phases will help us improve the model.
In Fig. 7, we illustrate the dependence of optical and NIR absolute magnitudes on the SED shape parameter θ1 of the FPC.The extinguished absolute magnitudes of a SN s are obtained by evaluating the model SED with its fitted parameters (θ s 1 , es, δMs, A s V ), setting µs = 0, and integrating it under the reference passbands in the SN rest-frame.The intrinsic absolute magnitudes are obtained in the same way but by setting A s V = 0.In the optical B-band the av--10 0 10 20 30 Rest-Frame Phase erage dust extinction correction is a 0.40 mag shift in absolute magnitude for the sample.In the NIR Y, H-bands, the mean shift is 0.10 and 0.06 mag, respectively, which reflects the much diminished effect of dust extinction in the NIR compared to the optical (Fig. 1).The relatively steep mean dependence of the B intrinsic absolute magnitude on θ1 captures the optical width-luminosity relation (Phillips 1993).In the NIR, the slopes of the dependence of Y and H intrinsic absolute magnitudes with θ1 are consistent with zero, after marginalising over the posterior uncertainties.
The scatter about the mean intrinsic relation due to the SED residual functions is approximately 0.10 mag.We note that the scatter around the mean intrinsic relation is not necessarily identical to the photometric distance uncertainty nor the expected scatter in the Hubble diagram.This is because the SED shape θ1 and the dust extinction AV factors must themselves be estimated from the data, and their uncertainties are themselves influenced by the intrinsic residual covariance.Instead, proper inference of the photometric distance uncertainty comes from the marginalisation in Eq. 30.However, the diminished effect of dust AV and the insensitivity to θ1 in the NIR do significantly reduce their contributions to the derived photometric distance uncertainties.
Colour curves, derived from flux ratios or magnitude differences between different filters, provide a useful window for understanding SNe Ia, since they are independent of the distance estimate and its errors.In the top panel of Fig. 8, we illustrate the effect of the W1(t, λr) on the intrinsic optical-NIR colour curves by varying θ1.At each epoch t, these are obtained by integrating the resulting rest-frame SED under each passband taking the difference with respect to the Vband magnitude.The general trend is that the intrinsically brighter, and more slowly declining, SNe Ia (more negative θ1) tend to have bluer (more negative) colour curves in each of the colours shown.The first FPC W1(t, λr) modulates the colour curves in a time-dependent fashion.While there are fixed points in phase when particular intrinsic colours are fairly insensitive to θ1, at phases 10 < t < 20 days, there is significant intrinsic colour variation in all optical-NIR colours relative to V -band.
In the bottom panel of Fig. 8, we compare this with the impact of host galaxy dust reddening on the optical-NIR colour curves.In contrast to the intrinsic FPC, the effect of dust on colour curves is relatively constant in phase7 , and the main effect is across different colours.We show the mean intrinsic colour curves with no dust AV = 0 (thin blue), as well as two combinations of the dust parameters [(AV , RV ) = (0.75, 3) or (0.50, 2)] that result in the same colour excess E(B − V ) = AV /RV = 0.25.The plot demonstrates that, with apparent colour information in optical BV r data alone, it is very difficult to distinguish between the two possibilities.In contrast, the optical-NIR V − Y JH colour information helps to break the degeneracy and distinguish between the two values of the dust law RV .

Intrinsic SED Residual Distribution
The model captures the population distribution of residual SED variations that are unexplained by the intrinsic FPC, the host galaxy dust extinction, peculiar velocities or other external distance uncertainties, or measurement error, through the residual covariance.The total residual SED function of a SN Ia s is ηs(t, λr) = δMs + s(t, λr).An example of an SED residual function is shown in Fig. 1.Fig. 9 shows the effect of intrinsic SED residuals on rest-frame optical and NIR light and colour curves.We hold θ1 = AV = 0, and we compute the impact of the distribution of SED residuals on the light curves and colour curves by integrating through the reference passbands.We compute the ±1σ range at each epoch t.We do not unrealistically assume the residuals are statistically independent at each phase or in each filter; rather the residuals manifest as continuous perturbations around the main effects.The model captures continuous residual SED functions correlated across phase and wavelength.To illustrate this, we show the effect of three realisations of the intrinsic residual functions on the light curves.The residual variance is generally narrow at phases around the first peak.In later phases, particularly in the NIR, there is more intrinsic residual variation because the 1st FPC does not capture the full range of variation of the second peak.In §B, we show that some of the additional structure there may be captured with higher order FPCs.

Host Galaxy Dust Population
Fig. 10 shows the distribution of posterior mean estimates of the individual dust extinction AV values.It is well described by an exponential distribution with an average value of τA = 0.329 ± 0.045 mag.Fig. 11 shows posterior inferences of the average τA and the global value of the dust law slope RV .The posterior constraints are determined during the training phase, and thus are obtained by marginalising over all other components and hyperparameters of the hierarchical model.These posterior estimates are well constrained fairly independently.In particular, for this sample with colour excess E(B − V ) host 0.4, the estimated global RV = 2.89 ± 0.20 is consistent with the average for normal Milky Way dust.This is in good agreement with previous analyses of nearby samples, which have found, at these relatively low-to-moderate values of reddening (which are similar to those found in the cosmological sample), average values of the host dust RV ≈ 2.5 − 3 (Mandel et al. 2011;Chotard et al. 2011;Foley & Kasen 2011;Burns et al. 2014;Mandel et al. 2017;Léget et al. 2020).
The hierarchical model constrains RV by analysing and weighing the entire distribution of SEDs over phase and optical to NIR wavelengths using the entire training set of SNe Ia.For visualisation purposes, however, it is useful to inspect a "slice" of this inference by examining a low-dimensional summary.Multi-dimensional colour information is useful as it provides constraints on the dust distribution while being insensitive to the distance estimate (and its errors).We exploit the fact that the optical and NIR data allows us to constrain the dust effects over a much larger wavelength range than is possible conventionally with the optical data alone.The plot of the dust law in Fig. 2 shows that the extinction at NIR H-band (≈ 1.6 µm, cf.Fig. 5) is only 16% of that in optical V -band (≈ 0.54 µm), and very insensitive to RV .Thus, the differential extinction (the colour excess) between V -and H-bands probes a large net dust effect (≈ 0.83 AV ), while itself being insensitive to RV .Meanwhile, the colour excess between B (≈ 0.43 µm) and H-bands similarly covers a large wavelength range and therefore a large dust effect, but because of the high sensitivity of AB to RV (for a given AV ), this colour excess is very sensitive to RV .The complementary optical B − V colours cover only a narrow range in the optical, and therefore captures a smaller differential effect of dust, but is also highly sensitive to RV .The advantage of measurements spanning optical to NIR is that we can leverage the joint colour information in these SNe Ia to constrain and break the degeneracy between AV and RV in the optical-only colours (Fig. 8).
Fig. 12 shows a "slice" of the constraints on RV in these colours from training the BayeSN model (Eq.28).The top left panel shows posterior realisations of the SNe Ia peak (t = 0) apparent colours (red) and intrinsic colours (blue) inferred by the model, and corrected for the inferred intrinsic colour-shape relation (Fig. 8) to θ1 = 0.The blue contours indicate the (68%, 95%) contours of the inferred intrinsic colour distribution, which is anchored by the SNe Ia with the least inferred amount of dust.The red solid (dashed) lines have the slope of the reddening vector for RV = 3, and intercept the mean (are tangent to the 95% contour) of the intrinsic distribution.Nearly all of the SNe Ia apparent colours should lie within the dashed lines under the correct -10 0 10 20 30 Rest-Frame Phase Intrinsic variation in optical and NIR colour curves captured by the first functional component W 1 (t, λ).We vary the value of θ 1 by θ1 ± 1σ, while fixing A V and other SN parameters to zero.(bottom) Effect of host galaxy dust extinction on optical and NIR colour curves.We show unreddened, intrinsic colour curves (blue), and two apparent colour curves with the same amount of optical E(B − V ) colour excess due to dust, but two different values of the dust law R V = 2 or 3. We fix θ 1 and other SN parameters to zero.The phase-dependence of the W 1 (, t, λ) component on intrinsic colour curves makes it distinguishable from dust.The effect of dust reddening on colour curves is approximately constant with phase.(bottom left) With optical data only, it is difficult to distinguish between two different combinations of host dust A V , R V that produce the same colour excess E(B − V ) = A V /R V .(bottom right) Since the dust extinction in the NIR is smaller and less dependent on R V , the optical-NIR colour curves help to break this degeneracy.For visual clarity, the colour curves have been shifted vertically by arbitrary constants (B−V : 0, -10 0 10 20 30 Rest-Frame Phase dust reddening law.The arrows indicate the dust reddening vector for each colour pair for a given dust law RV and illustrate a shift corresponding to AV = 0.57 mag from the centre of the intrinsic colour distribution.In the other panels, we show that the apparent colour distribution in these colour pairs are inconsistent with the dust reddening vectors for RV = 2 or RV = 1.5.That is, given the apparent colours of the low-reddening set (low B − H), assuming a low RV would predict bluer (more negative) average V −H apparent colours for a given B − H for more reddened SNe Ia (e.g.B − H > 0.5) than is observed.Conversely, the apparent colours of the high-reddening set (high B − H) would imply that the apparent V − H colours of the low-reddening set ought to be redder (more positive) than is observed, when assuming a low RV .For B − V colours, the same inconsistencies persist but in the opposite sense.The high-and low-reddening ends of the apparent colour distribution are most consistent with each other for RV ≈ 3.Because the estimation of RV hinges on the comparison of the colours of high-reddening SNe to those of lowreddening SNe, the most highly reddened SNe have the most leverage.In our sample, SN 1998bu has the largest extinction estimate (AV = 1.15±0.08).To test that our RV estimate is not entirely driven by this SN, we retrained the full hierarchical model omitting SN 1998bu.We found RV = 2.83 ± 0.19, indicating that our estimate is robust to the reddest SN.

Covariance Structure of Optical and NIR Peak Absolute Magnitudes
During the training phase, we estimate the population covariance structure of SN Ia SEDs.The covariance structure is implied by the model Eq. 12 and the population distribution of the latent parameters.where k (t, λr; t , λ r ) is given by Eq. 22, and we invoke the statistical independence and normalisation of the FPC scores: Cov[θi, θj] = δij.On the right-hand side, the top line describes the covariance across rest-frame wavelength induced by the dust extinction and the dust law ξ(λ), which depends on RV .The second line describes covariance across both phase and wavelength induced by the K intrinsic functional principal components of the SED.The third line describes the covariance of the intrinsic residual terms ηs(t, λr) = δMs + s(t, λr).Because the absolute magnitude in any one passband at some phase t is obtained by exponentiating Eq. 12 and then performing an integral of the SED under the transmission function, the covariance between any pair of absolute magnitudes in different filters at different phases is not analytic and must be computed numerically.
The full covariance structure over rest-frame phase and wavelength learned by the model is complex, and we defer a detailed discussion to future work.Here, we distill some of its key aspects.Fig. 13 depicts the population cross-correlation structure between peak (at t = 0) optical and NIR absolute magnitudes.The variation in absolute magnitudes is generated by the combination of the various latent component effects on the SED, and is obtained by integrating the SED through the reference filters.The map shows the correlation of the peak extinguished absolute magnitudes across optical and NIR passbands, inclusive of dust, intrinsic θ1 SED variation, and residual covariance.The peak absolute magnitudes in the optical have a very strong total correlation, whereas the cross-correlation between optical and NIR peak absolute magnitudes is as low as ≈ 40%.This is caused in part by the strong, coherent wavelength-dependence of the host galaxy dust extinction.However, the dust extinction is significantly diminished in the NIR.This reduced crosscorrelation indicates there is additional information in the NIR magnitudes that helps to improve distance estimates.

Hubble Diagram Analysis
After training the model by sampling Eq. 28, we obtain posterior estimates of the FPC and population hyperparameter Ĥ ≡ ( Ŵ0:K , Σ , σ2 0 , τA, RV ).We then use these to evaluate the photometric distances, derived from the light curves alone, using Eq.30.We take the posterior mean and standard deviation of the posterior probability density of the photometric distances.Table 1 lists the redshifts, external distance estimates, and BayeSN photometric distance moduli for the sample.
We assess the accuracy and precision of our photometric distance estimate by comparison to the external distance estimates, via the Hubble residuals, μphot s − μext s .We compare them using two summary statistics, listed in Table 2. First, we report the simple total RMS the differences between our posterior mean estimate photometric distance modulus μphot These include all modelled sources of latent SED variation, including dust extinction, the intrinsic FPC θ 1 W 1 (t, λr), and the residual SED covariance.Dust effects induce significant wavelengthdependent correlations in the optical, but have significantly diminished effect in the NIR.While the optical magnitudes are significantly correlated with themselves, they are less so with the NIR magnitudes, with optical-NIR cross-correlations as low as ≈ 40%.This indicates there is additional information in the NIR that helps improve distance estimates.
is dominated by the peculiar velocity uncertainty σpec = 150 km s −1 for the vast majority of this low-z sample.
It is conventional in the SALT2 analysis to compute an "intrinsic dispersion"8 σint of the Hubble residuals, by estimating the amount of scatter in the Hubble residuals in excess of the expected contributions of "measurement error" (which is really the estimated uncertainty on the fit parameters mB, x1, c), and the peculiar velocity uncertainties.This is necessary because only the light curve fitting uncertainties on the SALT2 parameters are propagated through the Tripp formula, Eq. 1, to compute the distance modulus uncertainties, and the results are typically much smaller than the total RMS in the Hubble diagram.Similarly, SNooPy only uses the photometric measurement uncertainties in the light curve fit.In contrast, BayeSN produces distance uncertainties via Bayesian marginalisation of the SED fit to the light curve data, coherently incorporating θ1 and AV uncertainties and the residual covariance over phase and wavelength (Eq.30).Since each method has a different way of reporting the distance errors, we do not "subtract" the reported distance errors from the total RMS.Instead, to ensure consistent comparisons across methods, we use σ-pv to remove from the total RMS only the expected contribution from external distance errors (e.g.peculiar velocities), which are the same for each method applied to the same set of SNe Ia.
Table 2 lists these Hubble residual dispersion measures for different subsets of the SN Ia sample.The vast majority comes from two large surveys with homogeneously reduced data, the CfA (Hicken et al. 2009(Hicken et al. , 2012;;Friedman et al. 2015) and CSP-I (Krisciunas et al. 2017).We label this set "CfA+CSP".Including the minority of other SNe Ia drawn from the more heterogenous data sources in the literature results in the "All" sample.Furthermore, a subset of the full "AnyNIR" sample with NIR observations near maximum light is labelled "NIR@max."We run BayeSN and SNooPy on either optical-only (BV RI) or optical+NIR (BV RIY JH) light curve data, while SALT2 is only run on optical BV RI data.

Resubstitution or Training Error
The resubstitution, or training error, is obtained by training the model on the entire dataset, and then applying it to determine the photometric distance estimates to the SNe Ia that were in the training set.In We assess the significance of the difference between the RMS Hubble residual of distance from our model compared to those from SALT2 using bootstrap.From the full training set SNe Ia, we construct a bootstrapped set by sampling with replacement.For each method, we compute the Hubble residual RMS of the SNe Ia within the bootstrapped set.We compute the difference in RMS between the two methods within the bootstrapped set.We repeat this 1,000 times and then compute the variance of the differences in RMS across the bootstraps.This procedure accounts for the fact that each method is analysing the same set of SNe Ia, and therefore the joint sampling distribution of both methods' RMS over bootstraps is correlated.For the CfA+CSP NIR@max subset, we compare SALT2 using optical (which has the lowest RMS of the alternate methods) versus BayeSN using optical+NIR, and we find a ∆RMS = 0.040 ± 0.012 (3.3σ).
Table 2 summarises of Hubble diagram dispersions of the other subsets of the SN Ia sample.We find that the addition of the literature sample to the CfA+CSP sample (to constitute All) increases the dispersion slightly in nearly all cases, which is to be expected since these SNe Ia come from more heterogenous data sources.BayeSN optical+NIR distances are still more precise than SNooPY and SALT2 in the AnyNIR sample, when we do not require NIR measurements near maximum light, but the advantage is reduced.
On optical-only data (BV RI), all three methods perform similarly, with total RMS ≈ 0.15 − 0.16 mag.

Cross-Validation
Cross-validation techniques to test the sensitivity of SN Ia models and their distance estimates to the finite training set have been previously employed by Mandel et al. (2009Mandel et al. ( , 2011) ) and Blondin, Mandel & Kirshner (2011).These procedures address the double use of the data inherent in resubstitution.We performed 10-fold cross-validation to assess the out-oftraining sample distance error.We equally divided the full training set into 10 folds, each with a roughly similar redshift distribution.First, we hold out one fold, and train a new BayeSN SED model on the SNe Ia in the other 9 folds.Then we used the new trained model to estimate the photometric distances of the SNe Ia in the held-out fold.We repeated this procedure 10 times, each time holding out a different fold, training a new model on the complement, and using it to evaluate the photometric distances of the held-out SNe.The Hubble residual summaries of the cross-validation outof-training sample photometric distances thus obtained are listed in Table 2 as "BayeSN-cv".
In the best case, for the CfA+CSP NIR@max subset, the total RMS of the photometric distances relative to the external distances is 0.108 mag.As expected, this is slightly higher than the RMS training error (0.096 mag) because the cross-validated distance of each SN is obtained using a model trained on a set that excludes that SN.This is an overestimate of the true error of the fully-trained model, since each model under CV is trained on a 10% smaller training set than the full sample.We expect the difference between the training and cross-validation error to narrow as more training data becomes available.Still, the difference between the two numbers is already small (0.012 mag), so it is reasonable to conclude that the typical distance error for similar optical and NIR light curves with peak NIR data is ≈ 0.10 mag.
A large fraction of our training set SNe Ia were also used in the training sets for both SNooPy (Burns et al. 2011) and SALT2 (Guy et al. 2010).To our knowledge, there has been no equivalent cross-validation analysis, including hold-out and iterative retraining, for these other models.Since we are unable to retrain these other models on partitions or resampled subsets, it is difficult to make equivalent, direct comparisons of these models to our cross-validation results.
Our cross-validation runs demonstrate the capability of our training code to straightforwardly and repeatedly train new models on different SN Ia datasets automatically without human intervention.We will be able to use this modularity to train and compare new BayeSN SED models based on datasets partitioned by survey or by astrophysical classes (e.g.SN Ia host galaxy properties or spectroscopic subclasses) to investigate the statistical and physical differences in the learned SED components and latent variables.

Application to Foundation SN Ia light curves
The optical and NIR light curves in our training set listed in Table 1 are mainly from the Carnegie Supernova Project and CfA Supernova Program, which typically measured highquality light curves with relatively frequent time sampling (c.f.Fig. 3).However, most SN Ia light curves used for cosmology are not sampled as well in phase or wavelength.To test our model on SN Ia light curves outside of our training set with more typical sampling, we have fit griz light curves obtained by the Foundation Supernova Survey using the Pan-STARRS1 (PS1) telescope (Foley et al. 2018b) .
Fig. 15 demonstrates a BayeSN fit to Foundation observations of the Type Ia SN 2016gou / ATLAS16cxr.It shows the well-constrained joint posterior distribution of the parameters obtained from the MCMC fit: the θ1 coefficient of the 1st FPC, the dust extinction AV , and the photometric distance µ.Because BayeSN is a model for the continuous SED spanning 0.35 to 1.8 µm, we are able to integrate the model SED under the griz PS1 passbands to fit this data, even though these exact passbands were not used in the training phase.Our SED model does not require K-corrections to be computed as preprocessing step to map observer-frame to rest-frame passbands.Notably, the SALT2.4model cannot properly fit rest-frame z-band due to the wavelength limits of its SED template, and SNooPy lacks a rest-frame z-band light curve template.However, proper modelling of the rest-frame z-band is important for fully utilising griz data from low-z surveys such as Foundation (Foley et al. 2018b) and the Young Supernova Experiment.In future work, we will present a full analysis of the Foundation DR1 dataset using our new BayeSN model.

Improvements over current models
We have constructed a new hierarchical Bayesian model, BayeSN, for SN Ia spectral energy distributions (SEDs) from the optical through NIR.This is the first statistical model for continuous SN Ia SEDs designed for fitting observed optical and NIR light curve data, and is crucial for properly analysing NIR observations from current and future SN Ia surveys.Our model is capable of statistically leveraging the powerful properties of SN Ia in the NIR, in particular the narrow dispersion in NIR luminosities at peak, and the much diminished effect of dust in the SN Ia host galaxies.
BayeSN jointly leverages the optical and NIR data to constrain the dust extinction AV and the reddening law RV more stringently, thereby controlling systematic errors due to the dust correction.BayeSN coherently estimates the covariance of the residual SED functions across time and wavelength, and incorporates them into the dust and distance estimates in a principled, probabilistic manner.By generalising the previous hierarchical Bayesian framework of Mandel et al. (2009of Mandel et al. ( , 2011) ) from modelling light curves in fixed discrete rest-frame filters to modelling a continuous SED function in phase and wavelength, we obviate the need for ad-hoc K-correction pre-processing procedures to compute 1-to-1 mappings between observer-frame and rest-frame filters, which is required by SNooPy.Instead, observed data are compared directly against the model fluxes implied by the redshifted SED model integrated under the observer's passbands.Redshifting effects are thereby incorporated directly into the statistical model.Furthermore, BayeSN has a number of advantages over the SALT2.4model conventionally used in cosmological analyses.Whereas the SALT2.4spectral template has coverage only up to rest-frame 0.9 µm (inclusive of restframe i-band), our BayeSN SED model extends to 1.8 µm (i.e. through rest-frame H-band).The SALT2 model cannot internally discern distinct SED components separately describing the effects of SN Ia intrinsic variation versus host galaxy dust extinction.Instead, it uses a single colour law to fit a single apparent colour parameter, effectively confounding the two physically-distinct sources of spectral variation.
In contrast, our BayeSN SED model internally encodes the continuous wavelength-dependent host galaxy dust reddening and extinction on the SN Ia SED, as effects physically distinct from the time-dependent intrinsic components of SED variation.Our model leverages the photometric constraints on the entire continuous SED to determine the dust properties, fit for the intrinsic modes of variation, and coherently weigh the uncertainties and combine information from across phase and wavelength to compute the probability distribution of the photometric distance modulus.With the low-z compilation analysed here, BayeSN can determine the distance moduli for SNe Ia with optical and NIR coverage near maximum light to ≈ 0.10 mag precision (total RMS), compared to 0.14 mag using SALT2 or SNooPy on the same SNe Ia.Combining optical and NIR data across the entire phase and wavelength range, we used BayeSN to derive tight constraints on the host galaxy dust law.For this sample with colour excess E(B − V ) host 0.4, we found RV = 2.9 ± 0.2, consistent with the Milky Way average.

Applications to current and future datasets
Beyond the data compilation analysed here, our BayeSN SED model will be broadly applicable for analysing optical and NIR SN Ia light curve data from more recent and current surveys.Forthcoming data from the Carnegie Supernova Project-II (Phillips et al. 2019) will enable us to expand our nearby training set with high-quality optical and NIR light curves of SNe Ia further into the Hubble flow (limiting the impact of peculiar velocity uncertainties).We are using Foundation DR1 griz light curves obtained with the wellcalibrated Pan-STARRS telescope for training and analysis with BayeSN.
BayeSN is essential for fully analysing data from recent and ongoing programs that use the Hubble Space Telescope to observe SNe Ia in the rest-frame NIR at high-z (RAISIN) and low-z (SIRAH), in conjunction with optical data from ground-based surveys.The ESO VISTA Extragalactic In-frared Legacy Survey (VEILS)9 recently concluded a timedomain survey that observed SNe Ia in the observer-frame J-band up to z ≈ 0.6, in conjunction with the Dark Energy Survey and the ESO VOILETTE survey in griz.
LSST's observer-frame y filter will probe the rest-frame NIR z or y bands to redshifts z 0.3.The Nancy Grace Roman Space Telescope (RST)'s wide imaging filters will extend to 2.0 µm (e.g.Hounsell et al. 2018), and thus will probe rest-frame H to z 0.4, J to z 0.7, and Y to z 1. BayeSN will be crucial for properly leveraging the full wavelength range of these surveys both to constrain the host galaxy dust properties and to produce optimal distance estimates.It will also be important for fully analysing any potential simultaneous observations of SNe Ia by LSST and RST (e.g.Foley et al. 2018a) or Euclid (Rhodes et al. 2017).

Future analyses and model extensions
Our hierarchical Bayesian SED modelling and inference framework is modular and flexible and will enable us to expand upon the SED model presented here to explore in greater depth various aspects of SNe Ia.In future work, we will investigate dust distributions by allowing R s V to vary for each SN Ia within a population governed by hyperparameters to be inferred, as was done previously by Mandel et al. (2011).We will also be able to test alternative forms of the dust extinction law (e.g.Goobar 2008;Amanullah et al. 2015).We will further probe the statistical properties of the intrinsic SED residuals over phase and wavelength, through the modelling and assessment of additional K > 2 functional components and improved estimation of residual covariance.
A further shortcoming of current SN Ia models is the lack of incorporation of astrophysical correlations at the fundamental level of the SED.A "host mass step" captures an apparent correlation between host galaxy stellar masses and SN Ia optical luminosities controlling for light curve shape and colour (Kelly et al. 2010;Sullivan et al. 2010;Smith et al. 2020).While the astrophysical nature of this correlation is still under active investigation (Jones et al. 2018;Rigault et al. 2018), it is typically addressed simplistically by correcting derived distances, or equivalently splitting the scalar absolute magnitude constant in Eq 1, according to the host mass.The correlation of SN Ia NIR absolute magnitudes with host mass has been investigated recently by Burns et al. (2018), Ponder et al. (2020), and Uddin et al. (2020).Our current low-z training set has roughly an average log host mass log 10 (M * /M ) ≈ 10.3 and approximately 80% lie in the "high-mass" category log 10 (M * /M ) > 10.In future work, we will apply BayeSN to a broader set of SNe Ia to conduct a Bayesian statistical analysis of this effect.
Similarly, SN Ia ejecta velocities, measured from spectral lines, are correlated with SN Ia intrinsic colour, and can be used to gain leverage on dust estimation and improve the accuracy of distances (Foley & Kasen 2011;Foley 2012;Mandel et al. 2014). Recently, Siebert et al. (2020) found correlations between ejecta velocity and SALT2 Hubble residuals.However, these astrophysical correlations should be accounted for at the fundamental physical level of the SN Ia SED functions, rather than by correcting derived distances.In future work, we will expand our BayeSN framework to explore, estimate, and incorporate the impact of these astrophysical effects on the full SED function S(t, λr) in a coherent statistical model.We will do this by adding functional regression terms proportional to f (θM * ) WM * (t, λr) or f (θv) Wv(t, λr) to our SED model (Eq.12), and by modelling potential correlations with host dust population parameters.
In the present work, we have leveraged joint optical and NIR broadband photometry of SNe Ia to learn the statistical properties of the latent intrinsic and dust components of SN Ia SEDs, while using the Hsiao (2009) template as a baseline "skeleton" to model spectral features at finer resolutions than the typical passband.Some of the residual SED covariance and scatter in the Hubble residuals indeed may be caused by per-SN variation in spectral features on wavelength scales much smaller than the typical filter.In future development, we will increase the wavelength resolution of our model, so that we can train simultaneously on spectroscopic sequences and photometric light curves of SNe Ia to improve the latent SED model.We will be able to leverage databases of optical spectra (Blondin et al. 2012;Silverman et al. 2012;Folatelli et al. 2013;Siebert et al. 2019), as well as forthcoming ground-based NIR spectra from the Magellan FIRE instrument obtained by the CSP-II and CfA Supernova Group (Hsiao et al. 2019), and space-based NIR spectra from the ongoing Hubble Space Telescope SIRAH program (GO-15889).
In future work, our BayeSN SED model will serve as the centrepiece of a fully hierarchical Bayesian statistical model for principled supernova cosmology analysis.

Figure 1 .
Figure 1.Schematic of the BayeSN forward generative model for the optical and NIR light curve (time series) data of a single SN.The log SN SED across time and wavelength comprises a mean intrinsic SED function modified by intrinsic functional principal components scaled by latent SED shape parameters, and extinguished and reddened by the host galaxy dust law, parametrized by the optical slope R V and scaled by the visual extinction A V .Variations not captured by these major modes are modelled by residual SED functions whose statistical properties across time and wavelength are captured by a covariance function.The resulting latent host-dust-reddened rest-frame SED undergoes the effects of distance, redshifting and time dilation, integration of the flux under the observer's filter functions, the survey cadence, and measurement error to yield the observed multi-band optical and NIR time series (light curves) of a SN Ia.
Intr Mean & FPCs W0(t, λ), W1:K(t, λ) Pop Dist'n of host galaxy dust Dust ⌦ M , w, . . .t e x i t s h a 1 _ b a s e 6 4 = " u t J I B Q Q 5 c p D t V S 7 z U r J w x m c w y V 4 c A 0 1 u I c 6 + M B A w C u 8 w b u D z o f z 6 X w t o j k n + 3 M K S 3 C + / w D P a J E O < / l a t e x i t > Pop Dist'n of Intrinsic SED(t, λ)s t e x i t s h a 1 _ b a s e 6 4 = " 0 a b H H E Y w D G j 2 n 6 e 8 3 V F X z g o n p z c = " > A A A B 8 n i c b

Figure 2 .
Figure 2. Probabilistic graphical model depicting the hierarchical BayeSN model for optical-NIR SN Ia light curve data.Each open box presents a set of unknown parameters or hyperparameters, each grey-shaded box represents observed data, and the arrows indicate relations of conditional probability.Parameters within the plate, labelled s = 1, . . ., N SN , are repeated for every SN s, whereas parameters outside the plate represent global or population parameters.The intrinsic SED of a single SN Ia s is constructed from the mean SED and functional principal components W 0:K (t, λr), a draw of the FPC scores θs from its population distribution, and a draw of an intrinsic residual SED function from its population distribution described by a covariance function over time and wavelength.The host galaxy dust extinction A sV of a SN s is drawn from a population distribution of extinction values, and the dust law is parametrized by an unknown R V .The effects of dust and distance modulus µs on the intrinsic SED combine (with appropriate redshifting and time dilation) to yield the apparent SN SED.This is observed with some observational cadence and noise through the observer's filter functions to yield the optical and NIR light curve data.During training, the distance is constrained externally to the light curve by the cosmological redshift and the fiducial cosmological parameters (fixed in this low-z analysis).The redshift is observed with some error, including uncertainties in local peculiar velocities.The hierarchical global posterior density (Eq.28) estimates the unknown latent variables and hyperparameters conditional on the observed data of the entire SN Ia sample.

Figure 3 .
Figure 3. (top) Example BayeSN light curve fit of optical and NIR BV riY JH CSP observations of the Type Ia SN 2005iq.(bottom) Posterior distribution of latent parameters of light curve fit to CSP observations of SN 2005iq.In the 2D contour plots, the black contours contan 68% and 95% of the marginal posterior probability, and the mode is indicated.The 1D marginal plots depict a kernel density estimate applied to the MCMC samples for each parameter.The SED shape parameter θ 1 and host galaxy dust extinction A V are marginalised over to obtained the posterior distribution of the photometric distance modulus µs.

Figure 4 .
Figure 4. Comparison of BayeSN and SALT2 parameters.(upper panel) Strong correlation between the θ 1 coefficient of the first principal SED component and the SALT2 light curve shape "stretch" parameter x 1 .(lower panel, top) correlation between the peak (t = 0) B − V apparent colour from BayeSN light curve fit and the SALT2 colour parameter c. (lower panel, bottom) BayeSN models the apparent colour as the sum of two latent components: the intrinsic colour (blue) and the positive reddening due to dust, E(B −V ) = A V /R V (red).We plot the SNe with B and V measurements within ±5 days of B maximum light.

Figure 6 .
Figure6.Intrinsic variation in optical and NIR intrinsic absolute light curves captured by the first functional component W 1 (t, λ).Variation in θ 1 W 1 (t, λ) captures the width-luminosity relation in the optical(Phillips 1993).Variation in this component simultaneously modulates the amplitude and timing of the second peak in the near-infrared.For visual clarity, the absolute light curves have been shifted vertically by arbitrary constants (B : 0, V : −1, r : −2.5, i : −4, z : 0.5, Y : −1, J : −3, H : −4).

Figure 7 .
Figure 7. Intrinsic variation and host galaxy dust effects on peak absolute magnitudes at T B,max (phase t = 0) in the rest-frame optical B and NIR Y, H bands.Each point is a posterior realisation of the intrinsic absolute magnitude M int s (blue) or host dust-extinguished absolute magnitude M ext s (red) of each SN.In each panel, we plot the SNe with data in a given filter.The solid line indicates the mean effect of the intrinsic W 1 (t, λ) model component on the intrinsic absolute magnitude through the coefficient θ 1 .The slope of this line is indicated as b.The dashed lines indicate ±1 standard deviation captured by the intrinsic residual covariance.The mean effect of host galaxy dust extinction in each band, quantified by the sample average difference between each SN's extinguished and intrinsic absolute magnitude, is shown.

Figure 9 .Figure 10 .Figure 11 .
Figure9.Effects of the covariance of phase-and wavelength-dependent intrinsic SED residuals on optical and NIR light curves (top) and colour curves (bottom).We fix the main effects θ 1 = A V = 0.The black solid lines represent the light curves or colour curves generated from the mean intrinsic SED model.The dashed lines correspond to ±1 population standard deviation around the mean curves captured by the intrinsic SED residual covariance.The light curves or colour curves corresponding to the effects of three sample realisations of intrinsic SED residual functions ηs(t, λ) are shown as blue, yellow, or red curves.For example, the red curves in all the panels correspond to the effect of a single realisation of a intrinsic SED residual function.

Figure 12 .
Figure12.Constraints on the host galaxy dust R V from the optical and NIR colour-colour diagram of SNe Ia observed in B, V , and H near peak (t = 0).(top left) Each point is a posterior realisation of the peak apparent colours (red) or intrinsic colours (blue) of a SN, corrected for the inferred intrinsic colour-shape relation to θ 1 = 0. Blue ellipses are (68%, 95%) contours of the intrinsic colour population distribution inferred during the training phase, which estimated a global dust law parameter R V = 2.9 ± 0.2.For comparison, the red solid (dashed) lines have the slope of the reddening vector for R V = 3 in these colours, and intercept the mean (are tangent to the 95% contour) of the intrinsic distribution.Nearly all of the SNe Ia apparent colours should lie within the dashed lines under the correct dust reddening law.(top right) Comparison of the apparent colour distribution with the inconsistent dust reddening vector for R V = 2. (bottom left) Comparison of the apparent colour distribution with the inconsistent dust reddening vector for R V = 1.5.

sFigure 13 .
Figure13.Map of population correlations between peak (t = 0) extinguished absolute magnitudes in optical and NIR passbands.These include all modelled sources of latent SED variation, including dust extinction, the intrinsic FPC θ 1 W 1 (t, λr), and the residual SED covariance.Dust effects induce significant wavelengthdependent correlations in the optical, but have significantly diminished effect in the NIR.While the optical magnitudes are significantly correlated with themselves, they are less so with the NIR magnitudes, with optical-NIR cross-correlations as low as ≈ 40%.This indicates there is additional information in the NIR that helps improve distance estimates.

Figure 14 .
Figure 14.Comparison of Hubble diagrams and Hubble residuals from BayeSN, SNooPy, and SALT2, applied to the same set of CfA and CSP SNe Ia with NIR data near maximum light.(top left) Hubble Diagram of photometric distances obtained by fitting the optical and NIR light curves, compared to the local distance-redshift relation under standard cosmological parameters.(bottom left) Hubble residuals for BayeSN.The simple total RMS is 0.096 mag.After removing the expected variance due to peculiar velocity uncertainty (dashed, σpec = 150 km s −1 ), the remaining dispersion is σ-pv = 0.078 mag.The distance uncertainties are determined via marginalisation accounting for the residual covariance (Eq.30).(top right) Hubble residuals from SALT2 applied to the optical-only data (BV RI) of the same sample.(bottom right) Hubble residuals from SNooPy applied to the optical and NIR data of the same sample.

Figure 15 .
Figure 15.(top) BayeSN light curve fit of Foundation DR1 griz observations of ATLAS16cxr.(bottom) Posterior distribution of BayeSN parameters from the light curve fit.

Table 1 .
Table of supernovae External distance estimate and standard deviation, either from redshift-independent distance estimate or from redshift and assumed H 0 = 73.24kms−1 Mpc −1 .See Avelino et al. (2019) Tables2 and 4.
Avelino et al. (2019)ctions for local flows and CMB as described inAvelino et al. (2019).For 8 SN with available redshift-independent distance estimates from Cepheids, Tully-Fisher, or surface brightness fluctuations, this is an effective redshift as described inAvelino et al. (2019).b c Optical+NIR BayeSN photometric distance estimate obtained by resubstitution.d Optical+NIR BayeSN photometric distance estimate obtained by cross-validation.

Table 2
BayeSN achieves a low total RMS = 0.096 mag on this set.Removing the expected contribution from external distance error and peculiar velocities, we obtain σ-pv = 0.077 mag.Meanwhile, on the same set of SNe Ia, SNooPy and SALT2 have larger RMS ≈ 0.14 mag, with σ-pv ≈ 0.10−0.12mag.Notably, the photometric distance modulus uncertainties of individual SNe Ia from SNooPy or SALT2 with the standard procedure are small in comparison to the total RMS, because they only propagate the uncertainties due to photometric light curve errors.In contrast, the BayeSN photometric distance uncertainties are obtained in a principled manner by marginalisation of the latent components including the residual covariance (Eq.30).The individual photometric distance uncertainties from BayeSN listed in Table1already reflect the scatter in the Hubble diagram.
, these estimates are labelled "BayeSN-tr".Fig.14shows the Hubble diagram obtained with BayeSN fits of optical and NIR data of the CfA+CSP NIR@max sample.With joint optical and NIR data,