Cong Ma, Pier-Stefano Corasaniti, Bruce A. Bassett, Application of Bayesian graphs to SN Ia data analysis and compression, Monthly Notices of the Royal Astronomical Society, Volume 463, Issue 2, 01 December 2016, Pages 1651–1665, https://doi.org/10.1093/mnras/stw2069
Abstract
Bayesian graphical models are an efficient tool for modelling complex data and deriving self-consistent expressions of the posterior distribution of model parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia supernova (SN Ia) luminosity distance measurements from the joint light-curve analysis (JLA) data set. In contrast to the χ2 approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with χ2 analysis results, we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters with a maximal 6σ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4σ. Systematic offsets on the cosmological parameters remain small, but may increase by combining constraints from complementary cosmological probes. The bias of the χ2 analysis is due to neglecting the parameter-dependent log-determinant of the data covariance, which gives more statistical weight to larger values of the standardization parameters. We find a similar effect on compressed distance modulus data. To this end, we implement a fully consistent compression method of the JLA data set that uses a Gaussian approximation of the posterior distribution for fast generation of compressed data. Overall, the results of our analysis emphasize the need for a fully consistent Bayesian statistical approach in the analysis of future large SN Ia data sets.
1 INTRODUCTION
Over the past decade, observational programmes dedicated to Type Ia supernovae (SN Ia) have significantly enlarged the original data set that led to the pioneering discovery of the cosmic acceleration (Riess et al. 1998; Perlmutter et al. 1999). To date, these systematic searches have detected about a thousand SN Ia across a large redshift range (see Astier et al. 2006; Riess et al. 2007; Wood-Vasey et al. 2007; Frieman et al. 2008; Hicken et al. 2009; Contreras et al. 2010; Suzuki et al. 2012; Tonry et al. 2012; Campbell et al. 2013). Thanks to this new generation of SN surveys, it has been possible to achieve unprecedented statistical precision on luminosity distance measurements. In fact, there is a widespread consensus that current cosmological constraints from SN Ia are limited by systematic uncertainties (see Conley et al. 2011; Scolnic et al. 2014). Potential sources of bias arise from variations of SN magnitudes that correlate with host galaxy properties (see Kelly et al. 2010; Sullivan et al. 2010; Maguire et al. 2012) as well as model assumptions in the light-curve fitting methods that are used to standardize the SN sample.
Recently, in an effort to bring SN Ia observations from different data sets on a common ground, Betoule et al. (2014, hereafter B14) have performed a joint light-curve analysis (JLA) of data from the Supernova Legacy Surveys (SNLS), the Sloan Digital Sky Survey-II supernova survey (SDSS-II) and a variety of programmes that targeted low- and high-redshift SNe. The full data set has been made publicly available, including light-curve fitting parameters with their covariance matrices and a compressed set of distance modulus data, thus providing all elements necessary to perform statistically robust cosmological data analyses.
SN Ia magnitudes are standardized using an empirical relation between the absolute peak magnitude and the time width (Phillips 1993; Hamuy et al. 1996; Phillips et al. 1999) of the light curve and the SN colour (Tripp 1998). These parameters are first extracted for each SN by fitting the observed light curves, then they are used in the standard-candle relation to estimate the distance moduli from which cosmological parameter constraints are finally inferred. This is the operational mode of the SALT2 light-curve fitting model (Mosher et al. 2014) originally introduced in Guy et al. (2007) and used to derive the measurements of B14. A critical aspect of this process concerns the propagation of uncertainties in the standardization parameters that parametrize light-curve features in the standard-candle relation. In the context of Bayesian statistics, this problem is addressed unambiguously by assigning priors to the standardization parameters. More generally, Bayesian methods can handle all the complexity of large SN data sets while providing a self-consistent probabilistic modelling of the data. As an example, in χ2 analyses the residual SN intrinsic magnitude scatter is usually fitted together with the cosmological parameters under the (unphysical) requirement that the χ2 per degree of freedom of the best-fitting model is ∼1. This is not needed in the Bayesian framework, where it is possible to derive the full posterior probability distribution of the intrinsic scatter. March et al. (2011, 2014) have shown this to be the case using a Bayesian hierarchical (or graphical/network) model of the SN data. Recently, Shariff et al. (2016) have also applied a similar formalism to the analysis of the JLA data set to simultaneously infer constraints on cosmological and standardization parameters. Bayesian graphs, also known as Bayesian networks, have a twofold advantage over χ2 statistical methods (see Jensen & Nielsen 2007 for a review).
On the one hand, they provide a better understanding of the data through a graphical representation of the causal and probabilistic connections among all of the problem's variables. On the other hand, the graphical model allows one to directly derive the factorized form of joint probability distributions for the parameters of interest, thus providing a (numerical) solution even when the problem is extremely complex.
Here, we use Bayesian graphical models for the JLA data set to perform self-consistent cosmological parameter inference that accounts for the light-curve fitting parameter dependence of the data covariance. This is an important point that has been overlooked in previous SN studies. We will show that such a dependence impacts not only the cosmological parameter constraints, but also the estimation of the standard-candle parameters. Neglecting such a dependence is even more problematic in the case of compressed SN data sets. Due to the statistical nature of the compression method, the effect of the parameter-dependent covariance can lead to biased results. Once the compression is done, there is no simple method to amend the inconsistency using compressed data alone.
The paper is organized as follows. In Section 2, we introduce the basic concepts of SN cosmology and describe the JLA data. In Section 3, we introduce Bayesian graphical models and discuss their application to the JLA data set, while in Section 4 we present the results of the cosmological parameter inference. In Section 5, we describe the statistical compression method applied to the JLA distance modulus data and discuss the results of various tests in Section 6. Finally, Section 7 presents our conclusions.
2 COSMOLOGY WITH SN IA DATA
In this section, we will briefly review the main concepts of SN cosmology and introduce the JLA data set. We will do so from our Bayesian point of view, which refers to the fact that (1) all parameters are considered random variables to which we assign prior probability distributions based on available information or the lack thereof; (2) data are described by random variables with only one realization deduced from the observations, and thus formally appear in probabilistic expressions as variables that are conditioned upon.
The possibility of this standardization was first pointed out by Phillips (1993) who showed that the SN peak luminosity correlates with the rate of brightness decline (or stretch) of the light curve. Subsequently, Tripp (1998) showed an additional correlation with the SN colour. Further correlations have been found with the host galaxy properties, such as the star formation rate and metallicity (Gallagher et al. 2005; Rigault et al. 2015), stellar mass (Kelly et al. 2010) and galaxy morphology (Hicken et al. 2009). SN samples such as the JLA data set include such corrections as we shall describe next.
2.1 Description of the JLA data set
These data are provided with a set of metadata containing non-random information such as the SN redshift (for which errors are negligible), the stellar mass of the host galaxy Mstellar (in units of solar mass M⊙) and a tag specifying the sample of origin of each SN. The panels in Fig. 1 summarize the redshift distribution of the JLA observables.
Visualization of the JLA data set coloured by the four component source surveys (low z, SDSS, SNLS and HST). Displayed are the apparent peak magnitude |${m}_{B}^{\star }$| (top-left panel), the light-curve stretch X1 (top-right panel) and the colour C (bottom-left panel). The bottom-right panel shows the SN redshift empirical distribution (left axis, filled histograms) and the cumulative distribution (right axis, unfilled histograms).
Notice that the standardization parameters α, β and ΔM are again random variables. In general, their joint distribution is assumed to be the same for all individual SNe in the sample. Being random variables, in the Bayesian approach they can be assigned a prior probability based on prior information (or the lack thereof), and their posterior probability can be obtained by the inference process. Therefore, in this paper, we will not attempt to fix them at any particular values.
2.2 Cosmological model of the luminosity distance
3 GRAPHICAL MODELS
Here, we introduce Bayesian graphical models (or networks) and describe their application to SN Ia data. The literature on graphical representations of Bayesian statistical models is quite vast; we refer the interested reader to review papers by Jordan (2004) and D'Agostini (2005) for a first introduction and to Kjærulff & Madsen (2013) for an extended treatment of the subject.
3.1 Statistical inference and graphical representations
Graphical model representing deterministic and probabilistic relations between model parameters (|$\boldsymbol {\Theta }$|), model predictions (|$\boldsymbol {\mu }_{{\rm t}}$|) and variables with evidence (|$\boldsymbol {\mu }, \boldsymbol{\sf C}$|), in this case deduced from observational data.
In Fig. 2, we can already identify different kinds of nodes. First, the grey nodes represent the random variables on which we have evidence. The evidence may come in the form of observational data or other considerations specified probabilistically. Henceforward we will denote these nodes as ‘evident’ ones, which is more general than the term ‘observed’, thus avoiding confusion with purely observed data. Secondly, a node may be marked by a double-circled boundary if it is deterministic, i.e. its conditional distribution is a Dirac δ distribution. In this paper, our notations for the nodes follow that of Shachter (1998).
Notice that both the nodes for |$\boldsymbol{\sf C}$| and |$\boldsymbol {\Theta }$| have no parents, i.e. there are no edges from other nodes leading to them. In this sense, both may be said to be ‘unconditioned’. However, they take different roles in the statistical reasoning. The theoretical model parameter |$\boldsymbol {\Theta }$| is directly specified by its prior probability |$P(\boldsymbol {\Theta })$|; on the other hand, |$\boldsymbol{\sf C}$| is an evident variable. In fact, though it may not be a direct observable, it can be derived by a data-processing pipeline that propagates the statistical distributions from a variety of observables. We can thus imagine another graphical model in which edges that flow from the observables ultimately arrive at |$\boldsymbol{\sf C}$|. However, once this upstream analysis is performed and |$\boldsymbol{\sf C}$| is given its evident value, its use in a subsequent analysis severs the links to the original observables in the ‘upstream’ of the pipeline (Kjærulff & Madsen 2013, chapter 2.5.1). Thus, while |$\boldsymbol{\sf C}$| has evidence, the parameter |$\boldsymbol {\Theta }$| has none, which justifies our convention in denoting their respective nodes. The data variable |$\boldsymbol {\mu }$| occupies the root node of the graph. If there are more data sets available, these will appear as multiple root nodes at the bottom of the graph.
Starting from the graphical model in Fig. 2, one can easily construct the factorized joint probability distribution by traversing the graph, which is equivalent to the chain rule. Each non-observed starting node contributes with a prior PDF [e.g. |$\boldsymbol {\Theta } \rightarrow P(\boldsymbol {\Theta })$|], while non-starting nodes contribute with conditional probabilities that are conditional to the variables associated with the connected nodes [e.g. |$\boldsymbol {\mu } \rightarrow P({\boldsymbol {\mu }} \, \vert \, {\boldsymbol {\mu }_{{\rm t}}, \boldsymbol{\sf C}})$|]. The grey, ‘observed’ nodes provide evidence, usually in the form of the available realization provided by the data set, which constrains the randomness of the parameters.
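The traversal just described can be sketched in a few lines of code: given each node's parent set, the chain rule emits a prior factor for every parentless node and a conditional factor for every other node. The node names follow Fig. 2; the dictionary encoding below is our own illustrative convention, not part of the paper.

```python
# Sketch: derive the factorized joint PDF of a DAG by the chain rule.
# Node names follow Fig. 2: Theta -> mu_t -> mu <- C.
parents = {
    "Theta": [],            # model parameters: no parents, contributes a prior
    "C": [],                # evident covariance node: no parents
    "mu_t": ["Theta"],      # deterministic prediction node, mu_t = f(Theta)
    "mu": ["mu_t", "C"],    # data node, conditioned on prediction and covariance
}

def factorization(parents):
    """Build the factorized joint PDF string by traversing the graph."""
    factors = []
    for node, pa in parents.items():
        if pa:
            factors.append(f"P({node} | {', '.join(pa)})")
        else:
            factors.append(f"P({node})")
    return " ".join(factors)

print(factorization(parents))
# -> P(Theta) P(C) P(mu_t | Theta) P(mu | mu_t, C)
```

The same traversal generalizes directly to the larger graph of Section 3.2: each extra node simply contributes one more factor.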
Graphical model as in Fig. 2 with the addition of a deterministic node |$\Delta \boldsymbol {\mu } = \boldsymbol {\mu } - \boldsymbol {\mu }_{{\rm t}}$| and an evident node |$\boldsymbol {\mu }_0$| whose value is to be fixed at |$\boldsymbol {0}$|.
The extended graphical model may appear trivial. However, the addition of the extra nodes is indeed the key to handling the fact that the distance modulus data from SN Ia depend on additional standardization parameters. In particular, both |$\boldsymbol {\mu }$| and |$\boldsymbol{\sf C}$| now occupy similar positions with no parents, and our discussion about such data-derived evident variables, exemplified by |$\boldsymbol{\sf C}$| in the context of Fig. 2, now applies symmetrically to both.
3.2 Graphical model of inference with JLA data
Graphical model of cosmological analysis using the JLA data set.
The above expressions show an evident fallacy of the χ2 analysis, namely neglecting the contribution of the parameter-dependent covariance. This term cannot be dismissed as an implied constant even if one argues for the use of χ2 statistics as motivated by the non-Bayesian theory of least squares, which yields the minimum variance unbiased estimator. In fact, this theory requires that the covariance (or dispersion) matrix be known up to a constant multiplier (Rao 1945). However, if the covariance is parameter dependent, then in order to apply the classical least-squares approach, the covariance must be approximated quadratically so that the parameter-dependent contribution can be absorbed into a quadratic form with constant dispersion matrix. Hence, such dependence does need to be properly propagated in the final parameter inference. As we will show next, neglecting this term can lead to biased results since maximizing the posterior is not equivalent to minimizing the χ2.
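The mechanism can be illustrated with a one-parameter toy model (not the JLA likelihood; all numbers below are illustrative). When the covariance grows with a parameter b, as the JLA covariance grows with (α, β), the χ2 alone can always be lowered by inflating the variance, while the full Gaussian likelihood balances this against the ln det C penalty:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy data: zero-mean Gaussian scatter whose variance depends on b,
# mimicking how C_d depends on the standardization parameters.
rng = np.random.default_rng(0)
n = 200
b_true = 2.0
sigma0, sigma1 = 0.1, 0.05
y = rng.normal(0.0, np.sqrt(sigma0**2 + (b_true * sigma1)**2), size=n)

def chi2(b):
    """chi^2 only: monotonically rewards a larger variance, hence larger b."""
    var = sigma0**2 + (b * sigma1)**2
    return np.sum(y**2 / var)

def neg2_log_like(b):
    """Full Gaussian -2 ln L = chi^2 + ln det C (up to a constant);
    for a diagonal covariance, ln det C = n * ln(var)."""
    var = sigma0**2 + (b * sigma1)**2
    return np.sum(y**2 / var) + n * np.log(var)

b_chi2 = minimize_scalar(chi2, bounds=(0.01, 10.0), method='bounded').x
b_full = minimize_scalar(neg2_log_like, bounds=(0.01, 10.0), method='bounded').x
# chi^2 alone runs b to the upper bound; the ln det term keeps it finite.
print(b_chi2, b_full)
```

In this extreme toy case the χ2 minimizer escapes to the boundary of the allowed range; in the JLA analysis the effect is milder but works in the same direction, shifting (α, β) to larger values.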
4 JLA COSMOLOGICAL PARAMETER CONSTRAINTS
We perform a cosmological parameter inference using the JLA data set and compare results based on the computation of the posterior distribution equation (26) versus the customary χ2 analysis specified by equation (27). We will refer to the former as the Bayesian approach and the latter as ‘χ2’. As the target model we consider a flat dark energy wCDM model with parameters |$\boldsymbol {\Theta } = [ \Omega _{{\rm M}}, w, h, M, \alpha , \beta , \Delta _M ]$|, where w is a constant dark energy equation-of-state parameter. We assume a Gaussian prior on h with mean 0.688 and standard deviation 0.033 consistent with the recent analysis on the distances to nearby SN Ia by the Nearby Supernova Factory project (Rigault et al. 2015). Following the discussion in Planck Collaboration XIII (2015, section 5.4), we use the Nearby Supernova Factory value obtained from an independent megamaser distance calibration to NGC 4258 (Humphreys et al. 2013). This result is consistent (within 0.5σ) with the value of the Hubble constant obtained from the Hubble Space Telescope (HST) Cepheid and nearby SN Ia data (Riess et al. 2011) as re-analysed by Efstathiou (2014), also calibrated on NGC 4258 alone. This prior differs from that used in the main analysis of B14, which assumed a hard prior h = 0.7 (while letting |$M_B^1$| freely vary).1 For the other parameters, we assume uniform priors in the following intervals: ΩM ∈ [0, 1], w ∈ [ − 2.5, 1], M ∈ [ − 5, 5], α ∈ [ − 1, 1], β ∈ [ − 10, 10] and ΔM ∈ [ − 0.5, 0.5].
We evaluate the posterior distribution using the Markov chain Monte Carlo (MCMC) method as implemented by the pymc2 library (Patil, Huard & Fonnesbeck 2010). The χ2 analysis based on equation (27) is performed by inserting a potential function proportional to |$\sqrt{\det \boldsymbol{\sf C}_{{\rm d}}}$| in the joint probability distribution, which compensates for the term |$(\ln \det \boldsymbol{\sf C}_{{\rm d}}) / 2$| in equation (26) so that the ‘standard’ χ2 analysis is emulated. We run four chains with 5 × 105 samples each, and check their convergence using the Gelman–Rubin test (Gelman & Rubin 1992; Brooks & Gelman 1998). The estimated Monte Carlo standard error on the parameter mean is of the order of 10−2 of the statistical standard deviation, negligibly affecting the results.
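The Gelman–Rubin diagnostic used above compares within-chain and between-chain variance; values of the scale reduction factor near 1 indicate that the chains have mixed. A minimal sketch of the scalar version of the statistic (the synthetic chains are, of course, stand-ins for actual MCMC output):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: array of shape (m, n) -- m chains of n samples each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(4, 5000))   # four well-mixed chains
print(round(gelman_rubin(mixed), 3))            # close to 1.0: converged
```

Chains that have not mixed (e.g. stuck around different means) give a statistic noticeably above 1, which is the failure signature the test screens for.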
In Fig. 5, we plot the marginalized PDF contours in the ΩM–w plane obtained from the Bayesian and χ2 analyses, respectively. We can see that the effect of neglecting the covariance term results in a systematic offset of the probability contours. The marginalized mean and standard deviation from the MCMC samples give w = −0.82 ± 0.22 for the posterior PDF analysis and w = −0.88 ± 0.24 for the χ2 approach, while we find ΩM = 0.22 ± 0.11 in both cases. These results are consistent to within 1σ with the findings of Shariff et al. (2016).
Marginal 0.683 and 0.95 two-dimensional credibility regions in the ΩM–w plane for a flat wCDM model derived from the analysis of the full posterior distribution (black solid lines) and the χ2 analysis (blue dashed lines).
Although the bias effect on w appears to be small (about 0.25σ), this can be deceptive. As is well known, the parameters (w, ΩM) display significant degeneracy when using SN Ia data alone. The cosmological constraints noticeably tighten when combined with other cosmological probes as shown for instance in B14. Hence, the bias effect shown here may be enhanced by the addition of complementary constraints as also found by Shariff et al. (2016). The comparison of the marginal distributions of (w, ΩM) is not all there is to the full inference. Unsurprisingly, we find a more significant bias effect on (α, β) and ΔM on which |$\ln \det \boldsymbol{\sf C}_{{\rm d}}$| directly depends.
In Fig. 6, we plot the posterior contours for different combinations of standardization parameter pairs, while in Table 1 we quote the marginal mean and standard deviation of α, β and ΔM, respectively. We may notice that the values derived in the χ2 case are consistent with those quoted in B14. We can see that the χ2 analysis significantly shifts the standardization parameters away from the ideal standard-candle case (i.e. α = β = ΔM = 0) compared to the Bayesian approach. In particular, we have systematic offsets of 2σ for the stretch parameter, 6σ for the colour correction parameter and about 1σ for the host galaxy correction. This indicates that the data require less adjustment of the light-curve shape, SN colour and host stellar mass, which is a direct consequence of the fact that neglecting the covariance term in the χ2 analysis is equivalent to a distortion of the parameter priors. In fact, it amounts to the replacement |$\ln P(\boldsymbol {\Theta }) \rightarrow \ln P(\boldsymbol {\Theta }) + \frac{1}{2} \ln \det \boldsymbol{\sf C}_{\rm d}(\boldsymbol {\Theta })$| up to a normalization, thus leading to a distortion of the uniform prior on α and β as shown in Fig. 7. We can now see why the χ2 analysis gives larger values of the standardization parameters. It effectively uses a prior that artificially underestimates the region where (α, β) is close to zero, while it overestimates the range where they are large.
Marginal 0.683 and 0.95 credibility levels for pairs of standardization parameter derived from the Bayesian (black solid lines) and χ2 (blue dashed lines) analysis.
Level contours of |$[ \ln \det \boldsymbol{\sf C}_{{\rm d}}(\alpha , \beta ) ] / 2$| for the JLA data set. This can be interpreted as the log-distortion of the priors on α and β.
Marginalized mean and standard deviation of SN Ia standardization parameters inferred from the full posterior analysis and from the χ2 approach for the wCDM model, along with the number of σ bias.
| | This work | χ2 analysis | Bias amplitude |
|---|---|---|---|
| α | 0.127 ± 0.006 | 0.141 ± 0.007 | 2σ |
| β | −2.62 ± 0.07 | −3.10 ± 0.08 | 6σ |
| ΔM | −0.053 ± 0.022 | −0.071 ± 0.023 | 0.8σ |
In the light of these results, we are tempted to conclude that the Bayesian approach adds more weight to our belief that SN Ia are standard candles, at least more than what we are led to believe if we use the customary χ2 analysis.
To test whether the observed level of bias is cosmological model dependent, we have performed similar analyses for ΛCDM models with or without non-zero curvature. The marginal mean and variance of the parameters are quoted in Table 2. We can see that the bias remains of the same amplitude for the different cosmological model assumptions.
Marginal mean and standard deviation of model parameters for ΛCDM models as inferred from the Bayesian method of this work and from χ2 analysis.

| | ΛCDM | ΛCDM (χ2) | Flat ΛCDM | Flat ΛCDM (χ2) |
|---|---|---|---|---|
| α | 0.126 ± 0.006 | 0.141 ± 0.006 | 0.126 ± 0.006 | 0.141 ± 0.007 |
| β | −2.62 ± 0.07 | −3.10 ± 0.08 | −2.62 ± 0.07 | −3.10 ± 0.08 |
| ΔM | −0.053 ± 0.022 | −0.071 ± 0.023 | −0.053 ± 0.022 | −0.070 ± 0.023 |
| ΩM | 0.22 ± 0.10 | 0.20 ± 0.10 | 0.33 ± 0.03 | 0.30 ± 0.03 |
| ΩDE | 0.50 ± 0.15 | 0.55 ± 0.15 | N/A | N/A |
Independently of the underlying cosmological model, we find no information gain on h, whose posterior remains indistinguishable from the assumed Gaussian prior. On the other hand, we find M = −0.03 ± 0.11 for both the wCDM and the non-flat ΛCDM cosmologies, while M = −0.04 ± 0.11 for the flat ΛCDM case. As expected, the joint posterior in the M–h plane shows a strong degeneracy along the direction M΄ = M − 5log10h, as shown in Fig. 8. From the marginalized posterior, we find σ(M) ≈ 0.1. This reflects the posterior dispersion of |$M_B^1$| that would have appeared if we had chosen to let it vary freely (see Sections 2.1 and 2.2). If we had neglected M altogether in our model specifications, by degeneracy this could have led to a spurious constraint on h.
Marginal contours in the M–h plane from the full posterior analysis for the flat wCDM model. The solid horizontal line indicates the value h = 0.7 used in B14, while the dashed curve shows the degeneracy direction M − 5log10h = const.
5 DATA COMPRESSION
5.1 The issue of scalability
We now turn to the problem of compressing the JLA data set. Compressed data respond to specific needs of cosmological analysis. For instance, tests of the distance-duality relation require luminosity distance estimates at redshift locations where angular-diameter distance measurements are also available. However, the main application of data compression is to address the problem of scalability. With the increasing size of SN data sets, the evaluation of expressions such as equation (26) will become computationally more challenging. In particular, evaluating forms such as |$\boldsymbol {y} = \boldsymbol{\sf C}_{{\rm d}}^{-1} \boldsymbol {x}$| by solving the linear system of equations |$\boldsymbol{\sf C}_{{\rm d}} \boldsymbol {y} = \boldsymbol {x}$| is inefficient when |$\boldsymbol{\sf C}_{{\rm d}}$| is parameter dependent. This is because the computational complexity scales as |$\mathcal {O}(n^3)$|, i.e. cubically in the data size (see e.g. Golub & van Loan 2013, chapter 4.2.5). During an MCMC run, for instance, this cubic operation has to be repeated every time the parameters change value.
The scalability issue of computing a parameter-dependent covariance matrix has been gaining more attention recently, especially in the context of statistical data analyses dedicated to measurements of the clustering of matter in the universe. As an example, White & Padmanabhan (2015) have developed an interpolation method for efficiently evaluating the likelihood of the two-point correlation function of the matter density field. A different approach to handle large data sets with covariances relies instead on approximating the object of interest with a reduced set of functional bases. This method has seen widespread use in the context of cosmic microwave background data analysis (see Tegmark 1997; Tegmark, Taylor & Heavens 1997) and has been applied in B14 to the JLA data set.
Here, we aim to perform a thorough analysis of the distance modulus data compression procedure in the context of the Bayesian framework we have discussed in the previous sections. This will enable us to assess the impact of model assumptions and more importantly the effects of the parameter-dependent covariance on the data compression itself.
5.2 Formalism of linear compression
The goal of the compression is to provide the user with a reduced data set of distance modulus estimates |$\boldsymbol {\mu }_{{\rm dc}}$| together with their covariance matrix |$\boldsymbol{\sf C}_{\mathrm{dc}}$| and the post-compression standardization parameters |$\boldsymbol {\varphi }_{{\rm dc}}$| (correlated with |$\boldsymbol {\mu }_{{\rm dc}}$|).
In B14, the linear compression of the JLA data set is performed by first taking the logarithm of the redshift z. This is because the log-transformation of the redshift makes the cosmology-dependent part of the signal more nearly linear (as can be seen in Fig. 1). Then, the distance modulus data are fitted against a parametric model that is represented by ‘broken line segments’ with control points at fixed log-redshift locations {x1 < x2 < ⋅⋅⋅ < xm} (in this section and the next, the symbol x will be used for log-redshift). The values of the model parameters at the control points define the fitting parameters of the compression procedure. Their (posterior) mean and covariance give the final compressed data set.
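The ‘broken line segments’ model is linear in the control-point values: each control point carries a hat-shaped (sawtooth) basis function, and the model evaluated at the data log-redshifts is B ξ with B_ij = b_j(x_i). A sketch of how such a basis matrix can be built (the numerical values are illustrative, not from the JLA fit):

```python
import numpy as np

def sawtooth_basis(x, knots):
    """Hat-function ('sawtooth') basis matrix B with B[i, j] = b_j(x[i]).

    b_j equals 1 at control point knots[j] and falls linearly to 0 at the
    neighbouring control points, so B @ xi is the broken-line interpolant
    through the control-point values xi.
    """
    m = len(knots)
    B = np.empty((len(x), m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = 1.0
        B[:, j] = np.interp(x, knots, e)   # hat function via linear interp
    return B

knots = np.array([-2.0, -1.0, 0.0])   # control points in log-redshift
xi = np.array([35.0, 40.0, 43.0])     # illustrative values at the control points
x = np.array([-2.0, -1.5, -0.5])      # evaluation (data) locations
B = sawtooth_basis(x, knots)
# At a control point the corresponding basis function is 1; midway between
# two control points the interpolant averages the neighbouring values.
print(B @ xi)   # -> [35.  37.5 41.5]
```

Because the hat functions sum to 1 everywhere inside the support interval, the same matrix construction also serves as the ‘data-generation matrix’ of Section 5.3 when evaluated at the output log-redshifts.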
It is worth noticing that the specific choice of |$\boldsymbol{\sf B}$| is more or less arbitrary. The form of the basis functions may be dictated by the needs of the problem at hand. For instance, if the data to be compressed have structures in the scale space, a set of wavelet bases would be a well-motivated choice (see e.g. Pando et al. 1998). On the other hand, if the goal is to extract low-variance, discriminating information from noisy data at the cost of bias, then the suitable bases may be found through principal component analysis methods (see Huterer & Starkman 2003; Huterer & Cooray 2005).
We adopt the sawtooth bases used in B14, which are especially suitable when the signal to be extracted is expected to be fairly continuous over the support interval S (as in the case of the distance modulus). The sawtooth bandwidth is set by the user. A constant value of the bandwidth corresponds to evenly spaced control points. On the other hand, it is possible to even out statistical noise by adjusting the sawtooth windows so that each covers the same number of data points, a choice that prevents sawtooth windows from covering too few data points, which may result in overfitting.
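The equal-count placement of control points can be realized with empirical quantiles of the log-redshift sample, as in this sketch (the uniform mock redshifts and the choice of eight control points are illustrative assumptions, not the JLA configuration):

```python
import numpy as np

# Sketch: place control points so that each sawtooth window covers roughly
# the same number of SNe, using quantiles of the log-redshift sample.
rng = np.random.default_rng(3)
logz = np.log10(rng.uniform(0.01, 1.0, size=740))   # mock stand-in for JLA

m = 8                                # number of control points
qs = np.linspace(0.0, 1.0, m)
knots = np.quantile(logz, qs)        # equal-count control-point locations

# Each of the m - 1 windows now holds ~n / (m - 1) points, so no sawtooth
# basis function is starved of data.
counts = np.histogram(logz, bins=knots)[0]
print(counts)
```

Evenly spaced control points would instead leave sparsely populated windows at the extremes of the redshift range, which is where the overfitting risk noted above arises.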
5.3 Approximate solution and optimal compression
To perform the maximization, we use the trust-region Newton-conjugate-gradient (trust-ncg) algorithm implementation (Nocedal & Wright 2006, chapter 7.1) from the python library scipy.optimize.3 Using analytical expressions for |$\boldsymbol {D}$| and |$\boldsymbol{\sf H}$|, it finds the optimal point |$\boldsymbol {\varphi }^{\star }$| in a few seconds on a typical desktop computer and evaluates the approximated posterior by computing equation (32) at |$\boldsymbol {\varphi }^{\star }$|. This is a Gaussian PDF with mean |$\boldsymbol {\varphi }^{\star }$| and covariance |$\boldsymbol{\sf C}_{\boldsymbol {\varphi }} = 2 \boldsymbol{\sf H}^{-1}(\boldsymbol {\varphi }^{\star })$|, from which the marginal distributions of both the compression coefficients |$\boldsymbol {\xi }$| and the post-compression standardization parameters |$\boldsymbol {\varphi }_{{\rm dc}}$| are obtained. Then, the code uses the optimal compression coefficients and the covariance to generate a series of distance modulus data |$\boldsymbol {\mu }_{{\rm dc}}$| at any given output log-redshift locations |$\tilde{x}_i = \log _{10}\tilde{z}_i$| (specified by the user) by computing the Gaussian random vector |$\boldsymbol {\mu }_{{\rm dc}} = \tilde{\boldsymbol{\sf B}} \boldsymbol {\xi }$| with mean |$\langle {\boldsymbol {\mu }_{{\rm dc}}}\rangle = \tilde{\boldsymbol{\sf B}} \boldsymbol {\xi }^{\star }$| and covariance |$\boldsymbol{\sf C}_{{\rm dc}} = \tilde{\boldsymbol{\sf B}} \boldsymbol{\sf C}_{\boldsymbol {\xi }} \tilde{\boldsymbol{\sf B}}^{\intercal }$|, where the elements of the ‘data-generation matrix’ |$\tilde{\boldsymbol{\sf B}}$| are |$\tilde{B}_{ij} = b_j(\tilde{x}_i)$|.4
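The negative log-posterior of equation (30) is not reproduced in this excerpt; the sketch below only illustrates the scipy.optimize interface described above (trust-ncg with analytic gradient D and Hessian H) on a hypothetical quadratic stand-in objective, together with the Gaussian-approximation covariance |$\boldsymbol{\sf C}_{\boldsymbol {\varphi }} = 2 \boldsymbol{\sf H}^{-1}(\boldsymbol {\varphi }^{\star })$| quoted in the text:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the objective F(phi) = -2 ln P(phi | data); any
# smooth F with analytic gradient D and Hessian H plugs into the same interface.
A = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative curvature matrix
b = np.array([1.0, -0.5])

def F(phi):                               # objective to minimize
    return 0.5 * phi @ A @ phi - b @ phi

def D(phi):                               # analytic gradient
    return A @ phi - b

def H(phi):                               # analytic Hessian
    return A

res = minimize(F, x0=np.zeros(2), method="trust-ncg", jac=D, hess=H)
phi_star = res.x                          # optimal point phi*

# Gaussian approximation of the posterior: mean phi*, covariance 2 H^{-1}(phi*)
# (the factor 2 follows the paper's convention for F = -2 ln P).
C_phi = 2.0 * np.linalg.inv(H(phi_star))
```

For a well-conditioned objective with exact gradient and Hessian, trust-ncg typically converges in a handful of iterations, which is consistent with the few-second runtime reported above.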
There is considerable freedom in the choice of the output redshift locations |$\tilde{z}_i$| or their logarithms |$\tilde{x}_i$|. Nevertheless, one should avoid placing more than two output |$\tilde{x}_i$|'s between each pair of adjacent control points specified at the beginning of the compression procedure (see Section 5.2): such compressed output data would not be affinely independent and would provide little additional information. Similarly, given a chosen set of m basis functions, there is no purpose in generating more than m compressed data points, because the additional points are inevitably affinely dependent on the others (a consequence of the pigeon-hole principle). A special choice of the redshift locations is to set them to the control points; in that case, the data-generation matrix |$\tilde{\boldsymbol{\sf B}}$| (or |$\tilde{\boldsymbol{\sf B}}^{\prime }$| when the post-compression standardization parameters are included) is the identity matrix |$\boldsymbol{\sf I}$|, and no actual computation is needed for data generation.
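The generation step itself reduces to two matrix products. A sketch with a piecewise-linear (broken-line) basis and illustrative placeholder values for the compression output (the coefficient values and covariance below are not from the paper):

```python
import numpy as np

# Hypothetical compression output: mean coefficients xi_star and their
# covariance C_xi at m control points (values are illustrative only).
knots = np.linspace(np.log10(0.01), np.log10(1.30), 5)
xi_star = np.array([33.0, 36.0, 39.0, 42.0, 44.8])
C_xi = 0.01 * np.eye(len(knots))

# Output log-redshift locations chosen by the user.
x_out = np.log10(np.array([0.02, 0.1, 0.6]))

# For a piecewise-linear basis, column j of the data-generation matrix B~ is
# the basis function b_j evaluated at x_out, i.e. the linear interpolation of
# the j-th unit coefficient vector.
m = len(knots)
B_tilde = np.column_stack(
    [np.interp(x_out, knots, np.eye(m)[j]) for j in range(m)]
)

mu_dc = B_tilde @ xi_star            # mean of the compressed distance moduli
C_dc = B_tilde @ C_xi @ B_tilde.T    # their covariance

# When x_out coincides with the control points, B~ is the identity and no
# computation is needed, as noted in the text.
```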
The JLA data compression code we have developed for this analysis is publicly available.5 In Appendix A, we present the result of this compression at the same redshift locations as those of B14.
6 ASSESSMENT OF COMPRESSED DATA
6.1 Comparison with B14 compression
Here, we present the results of the JLA data compression analysis. Our goal is to evaluate the impact of the parameter-dependent covariance on the resulting data set and compare it with the compressed sample from B14. Following the prescription adopted in B14, we set 31 log-equidistant control points in the redshift range z ∈ [0.01, 1.30] for the sawtooth basis defined in equation (28). Using the compression procedure described in the previous section, we infer the compression coefficients and the post-compression standardization parameters using the Gaussian approximation of the posterior distribution of equation (30). We test the accuracy of this approximation by MC sampling the full posterior distribution. We will refer to the former as ‘Approx.’ and the latter as ‘MC’. To compare to the results of B14, we perform an analogous estimation in which we neglect the |$\ln \det \boldsymbol{\sf C}_{{\rm d}}$| term in equation (30), cases which we will refer to as ‘Approx.-χ2’ and ‘MC-χ2’, respectively.
In Fig. 10, we plot the deviations of the mean of the generated distance moduli with respect to the compressed data from table F.1 of B14 obtained from the Gaussian approximation of the posterior, the MC computation of the posterior, the quadratic approximation of the χ2 and the MC sampling of the χ2. The error bars are the marginalized standard deviations at each control point. They are displayed only for a qualitative visual comparison, since the figure does not reflect the full covariance of the compressed data which we will discuss later.
Difference of mean compressed distance moduli with respect to the compressed data set of B14, obtained from the Gaussian approximation of the posterior (black circles), MC computation of the posterior (black triangles), quadratic approximation of the χ2 (blue squares) and MC sampling of the χ2 (blue crosses). For visual purposes, we only display one every two data points staggered around the redshift locations by 0.015 dex to reduce crowding in the figure. The error bars show the size of the marginalized deviations corresponding to each control point from the respective data set.
We can see that the results from the use of the Gaussian approximation are indistinguishable from those obtained using the MC sampling even in the case where the |$\ln \det \boldsymbol{\sf C}_{{\rm d}}$| term is neglected. This also guarantees that our optimization algorithm for the determination of the parameter vector |$\boldsymbol {\varphi }^{\star }$| has converged to a global maximum (minimum for the χ2 analysis) instead of a local one.
Let us now compare the differences of the generated distance moduli to those from B14. The latter are consistent with the compression obtained from the χ2 analysis, with differences below 0.1 mag and well within the 1σ errors, especially at z ≥ 0.1 where the differences vanish. However, the B14 compressed data set shows deviations as large as 1σ with respect to the result of the Bayesian analysis.
As for the standardization parameters, in Table 3 we quote their marginal mean and standard deviation, post-compression, obtained using different methods. Again, we can see that the estimates from the Gaussian approximation are consistent with the MC results and in agreement with the values inferred from the full data set shown in Tables 1 and 2.
Marginal mean and standard deviation of standardization parameters after compression for the various cases.
| | Approx. | Approx.-χ2 | MC | MC-χ2 |
|---|---|---|---|---|
| α | 0.125 ± 0.006 | 0.140 ± 0.007 | 0.126 ± 0.007 | 0.141 ± 0.006 |
| β | −2.58 ± 0.07 | −3.08 ± 0.08 | −2.60 ± 0.08 | −3.11 ± 0.07 |
| ΔM | −0.052 ± 0.022 | −0.070 ± 0.023 | −0.053 ± 0.023 | −0.071 ± 0.022 |
To visualize the differences between pairs of covariances, we introduce an algebraic method described in Appendix B. This is based on the idea that two covariance matrices |$\boldsymbol{\sf C}_1$| and |$\boldsymbol{\sf C}_2$| can be linked by the matrix |$\boldsymbol{\sf W}_{12}$|, displayed as a bitmap image. If |$\boldsymbol{\sf C}_1 \approx \boldsymbol{\sf C}_2$|, then the matrix |$\boldsymbol{\sf W}_{12}$| is close to the identity. If they differ by a simple scaling, then |$\boldsymbol{\sf W}_{12}$| is diagonal with diagonal elements differing from unity. On the other hand, if differences occur on off-diagonal elements, these will stand out as off-diagonal features on the image of |$\boldsymbol{\sf W}_{12}$|. In Fig. 11, we display |$\boldsymbol{\sf W}_{12}$| between pairs of matrices for the different cases. In each panel, we also quote the ratio r and the centred KL divergence value DKL. Comparison between some of the pairs is not shown since it would only provide redundant information. We use a colour palette suitable for the bimodal distribution of all pixel values.
Pairwise comparisons of covariance matrices for compressed distance moduli obtained from various computations. In each panel, the comparison matrix |$\boldsymbol{\sf W}_{12}$| is displayed as a bitmap image. The scaled ratio of the determinants and the centred KL divergence values are quoted in the lower-left and upper-right side of each panel, respectively.
First, comparing |$\boldsymbol{\sf C}_1$| obtained from the Gaussian approximation of the posterior (‘Approx.’) to |$\boldsymbol{\sf C}_2$| from MC sampling of the full posterior distribution (‘MC’), we can see they are nearly identical, with differences in the off-diagonal elements simply due to MC noise, an artefact of numerical computation. This is confirmed quantitatively by the nearly vanishing DKL ≈ 0.03 and the negligible 0.3 per cent deviation of r from unity.
Similarly, we find the covariance |$\boldsymbol{\sf C}_1$| of the compressed data from B14 to be identical to |$\boldsymbol{\sf C}_2$| from the Approx.-χ2 computation. This is not surprising, since the data compression performed in B14 neglects the |$\ln \det \boldsymbol{\sf C}_{{\rm d}}$| term. Indeed, neglecting this term leads to significant, systematic differences between the covariance inferred from the χ2 analysis and that obtained from the posterior computation. The r value indicates that the χ2 method overstates the overall uncertainty by 6 per cent. The non-vanishing value of DKL ≈ 0.15, five times the noise-induced value, cannot be dismissed as a small random error. This is visually corroborated by the presence of block structures in the |$\boldsymbol{\sf W}_{12}$| comparison matrix, a feature distinguished from mere numerical artefacts.
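The precise definitions of r and DKL are given outside this excerpt. One plausible reading, assumed here, is the KL divergence between zero-mean Gaussians with the two covariances, together with a per-dimension (geometric-mean) determinant ratio; both function names below are illustrative:

```python
import numpy as np

def centred_kl(C1, C2):
    """KL divergence D_KL(N(0, C1) || N(0, C2)) between zero-mean Gaussians.

    One plausible reading of the 'centred' KL divergence used to compare
    covariance matrices: mean vectors are dropped so only covariances count.
    """
    k = C1.shape[0]
    _, ld1 = np.linalg.slogdet(C1)
    _, ld2 = np.linalg.slogdet(C2)
    tr = np.trace(np.linalg.solve(C2, C1))     # tr(C2^{-1} C1)
    return 0.5 * (tr - k + ld2 - ld1)

def det_ratio(C1, C2):
    """Hypothetical per-dimension scale ratio (det C2 / det C1)^(1/(2k)).

    Interpretable as the geometric mean of the ratio of one-dimensional
    standard deviations; identical matrices give exactly 1.
    """
    k = C1.shape[0]
    _, ld1 = np.linalg.slogdet(C1)
    _, ld2 = np.linalg.slogdet(C2)
    return np.exp((ld2 - ld1) / (2.0 * k))

# Identical covariances give D_KL = 0 and ratio 1; a pure rescaling
# C2 = s^2 C1 gives ratio s while D_KL grows with |s - 1|.
C = np.array([[2.0, 0.5], [0.5, 1.0]])
```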
6.2 Standard-candle properties and cosmological constraints
Here, we use the set of compressed data to perform a consistency analysis of the SN Ia light-curve parameters across different redshift intervals. Recently, Li et al. (2016) have performed an analysis of the redshift evolution of the standardization parameters by dividing the JLA data set in redshift subsamples and found that the higher redshift data favour a lower value of colour correction parameter β than the subsample at lower redshift. We show how the use of compressed data generated from χ2 analysis may lead to unexpected results when performing such tests.
We divide the JLA data set into two overlapping redshift regions: S1 containing 166 data points in the redshift range 0.01 ≤ z < 0.114 and S2 containing 599 data points in the range 0.082 ≤ z < 1.3. S1 is dominated by low-z sources from various observational programmes, while S2 is dominated by SDSS and SNLS sources. The overlapping region covers the redshift range 0.082 ≤ z < 0.114 and contains 25 points. We apply the compression to both subsamples with control points at the same locations as described in the previous sections, consistent with the data binning of B14. In the overlapping region, we find the distance moduli to be consistent within 1σ. This is an important consistency check that validates our compression procedure.
In Fig. 12, we plot the contours of the marginalized Gaussian-approximate PDF for the post-compression standardization parameters inferred from the Bayesian analysis with Gaussian approximation of the posterior and the χ2 approach, in S1, S2, and for the full data set, respectively. We may notice that the constraints obtained using the full compressed data set are dominated by data in the region S2. This is not surprising, since this redshift interval contains the greatest number of data points. The ellipses from the S1 and S2 intervals are consistent within 1σ. In contrast, we can see that the results inferred from the χ2 computation favour values of the parameter β that are systematically larger (in absolute value) than those inferred from the posterior analysis. Again, such a systematic bias is the result of neglecting the |$\ln \det \boldsymbol{\sf C}_{{\rm d}}$| term.
Marginal 0.683 and 0.95 contours of the post-compression standardization parameters inferred from the Gaussian approximation of the posterior (black) and the χ2 analysis (blue), in S1 (dashed lines), S2 (solid lines) and the full data set (filled contours), respectively.
As a final test of the data compression analysis, we perform a cosmological parameter inference using the compressed JLA data in the case of the wCDM model discussed in Section 4. In Fig. 13, we plot the contours in the ΩM–w plane obtained using the uncompressed full JLA data set, the compressed data from the Gaussian approximation of the posterior (including the post-compression standardization parameters) and the compressed data with standardization parameters pre-marginalized before entering the cosmological fitting. The displayed contours are nearly indistinguishable from one another. Similarly, we find identical marginal mean and standard deviation of the model parameters: w = −0.82 ± 0.22 and ΩM = 0.22 ± 0.11. These results are in excellent agreement with those discussed in Section 4. Furthermore, for (w, ΩM), we estimate the KL divergence of their two-dimensional distributions, from the one obtained using compressed data to the one obtained with the full JLA data, using the k-nearest neighbour estimator of Pérez-Cruz (2008). The resulting DKL = 0.004 indicates that the cosmological information is preserved by the data compression model described in Section 5. To put this minuscule value into context, the systematic shift from the χ2 result to the Bayesian posterior that we see in Section 4 and Fig. 5 corresponds to DKL = 0.51.
Marginal contours in the ΩM–w plane from the analysis of the full uncompressed JLA data set (black solid lines), the compressed set obtained from the Gaussian approximation of the posterior (blue dashed lines) and with standardization parameters pre-marginalized after the compression procedure (brown dotted lines).
We make publicly available example programs that implement the graphical models and the MCMC analyses of the full JLA data set and the compressed one at https://gitlab.com/congma/sn-bayesian-model-example/.
7 CONCLUSION
We have performed a detailed Bayesian statistical analysis of the JLA data set using Bayesian graphical models to derive the full posterior distribution of fitting model parameters. We have done so with the specific intent of evaluating the impact of correctly propagating SN standard-candle parameter errors through the data covariance matrix in contrast to the χ2 analysis.
Comparing results from the full posterior distribution with those inferred from the χ2 approach, we find a statistically significant shift of the SN standard-candle corrections towards lower absolute values. This is because the χ2 fit does not fully propagate the parameter dependence of the covariance, which contributes a |$\ln \det \boldsymbol{\sf C}_{\mathrm{d}}$| term to the parameter posterior. We have shown that neglecting this term is equivalent to assuming non-uniform priors on the parameters α and β, which parametrize the effect of the SN light-curve stretch and colour in the standard-candle relation. Due to this improper prior, the χ2 analysis gives more statistical weight to the region of parameter space away from α = β = 0. In particular, we find a 2σ shift in the best-fitting value of α, a 6σ change in the best-fitting value of β and a lower host galaxy correction ΔM by roughly 1σ. Sullivan et al. (2010) found a non-vanishing ΔM at 3.7σ. B14 measured a non-zero value at 5σ (excluding the systematics of the host mass correction itself) or 3σ (including all systematics), while our estimate is at 2.4σ. Recently, Campbell, Fraser & Gilmore (2016) also reported a 2.5σ difference based on the same host mass classification in the SDSS-II SN Ia sample. We find the amplitude of the systematic offset between the full Bayesian analysis and the χ2 results to be independent of the underlying cosmological model assumption.
The impact of the χ2 analysis bias is less significant on the cosmological parameter inference. To this purpose, we have derived marginal bounds on the parameters of a flat wCDM model. The constraints on (ΩM, w) from the two statistical approaches differ to within ∼1σ. However, the effect can be more significant if the bounds are combined with other constraints that break the cosmological degeneracies of the distance modulus.
This statistical bias problem also affects the generation of compressed distance modulus data. We have used the linear compression model presented in B14 and determined the compression coefficients by performing a full posterior analysis of the compression parameters and the post-compression standardization parameters, as opposed to the χ2 approach. Indeed, the comparison between the compressed data sets obtained using the full posterior analysis and the χ2 approach shows differences in the marginal mean values of the post-compression standardization parameters, the mean of the compressed distance moduli and their covariance.
In related works dedicated to SN Ia cosmology with Bayesian methods (March et al. 2011, 2014; Rubin et al. 2015; Shariff et al. 2016), cosmological models are analysed globally with the SN Ia observables. Although we acknowledge that these analyses are better suited to representing the full dependence relations of all the random variables involved, we also note the considerable cost and complexity of such analyses. In this work, we instead take the already reduced SALT2 filter data output of JLA as statistical evidence (and we expect that future data may be utilized in a similar manner). This allows us to present a simple, modular approach of using the reduced data for a wide family of cosmological models. The simplicity is further improved by the data compression procedure. This step is present in B14 but lacks formal details. In this work, we formalize the compression as a discrete linear model and subject it to the same Bayesian analysis, showing that inconsistencies exist in the B14 compression results.
Our main contribution to the Bayesian data compression problem of the JLA data set is the development of an efficient method that uses a Gaussian approximation of the posterior, which we have checked against MC sampling. We have implemented this method as publicly available code that allows the user to quickly generate a compressed distance modulus data set (including the correlated standardization parameters) at given input redshift locations.
The cosmological parameter inference from the compressed data set gives results that are nearly identical to those obtained using the entire uncompressed JLA data set, which shows that the cosmological information is left unaltered by the statistical compression method. However, further investigation is needed to understand the extent of its limitations, and to optimize and generalize it for future data sets.
Overall, the analysis presented here stresses the necessity of self-consistent Bayesian statistical approaches both to perform unbiased model parameter inference from SN Ia data and to generate unbiased compressed data sets.
Acknowledgments
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007–2013)/ERC Grant Agreement no. 279954. CM acknowledges the support from the joint research training programme of the Chinese Academy of Sciences and the CNRS. Data visualizations are prepared with the matplotlib6 library (Hunter 2007), and DAG diagrams are drawn using the dot2tex7 tool created by K. M. Fauske and contributors. The authors are grateful for the critical, anonymous reviews that improved this paper.
" As discussed at the end of Section 2.2, fixing h while letting |$M_B^1$| to vary is not the same as treating both parameters as random variables with different priors. However, the cosmological parameter inference is insensitive to the choice of a specific value of h whether in the form of a hard prior or as a mean of a Gaussian prior when the SN Ia data are used alone.
" Information on the post-compression standardization parameter vector |$\boldsymbol {\varphi }_{{\rm dc}}$| can be included in a concise form by extending the matrix |$\tilde{\boldsymbol{\sf B}}$| into a block-diagonal form |$\tilde{\boldsymbol{\sf B}}^{\prime} = {\boldsymbol{\sf I} \ \boldsymbol{\sf 0} \choose \boldsymbol{\sf 0} \ \tilde{\boldsymbol{\sf B}} }$|, where |$\boldsymbol{\sf I}$| is the 3 × 3 identity matrix associated with |$\boldsymbol {\varphi }$|. Thus, after compression, the joint distribution |$P(\boldsymbol {\varphi }_{{\rm dc}}, \boldsymbol {\mu }_{{\rm dc}})$| is given by a Gaussian with mean |$\tilde{\boldsymbol{\sf B}}^{\prime } \boldsymbol {\Phi }^{\star }$| and covariance |$\tilde{\boldsymbol{\sf B}}^{\prime } \boldsymbol{\sf C}_{\boldsymbol {\Phi }} \tilde{\boldsymbol{\sf B}}^{\prime \top }$|.
REFERENCES
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Table A1. Compressed JLA SN Ia distance modulus data vector |$\boldsymbol {\mu }_{{\rm dc}}$|. The mean values of post-compression standardization parameters are already listed as the first column of Table 3.
Table A2. Joint covariance matrix of compressed JLA distance moduli and standardization parameters. For the purpose of presentation only, values in this table have been multiplied by 106, and only the upper triangle of the symmetric matrix is shown.
Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
APPENDIX A: COMPRESSED SN IA DATA TABLES
We present the compressed JLA |$\boldsymbol {\mu }_{{\rm dc}}$| data set in Table A1 and the covariance matrix in Table A2 obtained using our method. They are available at the same 31 redshift locations as those of B14, table F.1. However, we can also incorporate the post-compression standardization parameters (α, β, ΔM) in our data set. The mean values of those standardization parameters are already shown as the first column of Table 3, which can be concatenated with the |$\boldsymbol {\mu }_{{\rm dc}}$| vector listed in Table A1 to form the full compressed data set. The standardization parameters are correlated with |$\boldsymbol {\mu }_{{\rm dc}}$|, a fact reflected in Table A2. It is possible to pre-marginalize over the standardization parameters before using the compressed data, simply by dropping the corresponding rows and columns from the tables.
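Because the compressed data are Gaussian, this pre-marginalization is exact and amounts to index slicing. A minimal sketch with placeholder arrays shaped like Tables A1 and A2 (3 standardization parameters followed by 31 distance moduli; the zero/identity values are placeholders, not the published numbers):

```python
import numpy as np

# Placeholder joint mean and covariance for (alpha, beta, Delta_M, mu_dc);
# in practice these would be read from Tables A1 and A2.
mean_full = np.zeros(34)
cov_full = np.eye(34)

# Marginalizing a Gaussian over the first three entries is exactly dropping
# those entries of the mean and the corresponding rows/columns of the covariance.
keep = np.arange(3, 34)                 # indices of the 31 mu_dc entries
mu_dc = mean_full[keep]
C_dc = cov_full[np.ix_(keep, keep)]
```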
Compressed JLA SN Ia distance modulus data vector |$\boldsymbol {\mu }_{{\rm dc}}$|. The mean values of post-compression standardization parameters are already listed as the first column of Table 3. The full table is available online.
| z | μdc |
|---|---|
| 0.010 | 33.006 |
| 0.012 | 33.833 |
| 0.014 | 33.862 |
| 0.016 | 34.119 |
| 0.019 | 34.587 |
| ... | ... |
| 1.300 | 44.826 |
Joint covariance matrix of compressed JLA distance moduli and standardization parameters. For the purpose of presentation only, values in this table have been multiplied by 106, and only the upper triangle of the symmetric matrix is shown. The full table (without scaling by 106) is available online.
| | α | β | ΔM | μdc, 1 | μdc, 2 | ... | μdc, 31 |
|---|---|---|---|---|---|---|---|
| α | 35 | 19 | −30 | 11 | −28 | ... | 25 |
| β | | 4533 | 15 | 541 | −577 | ... | 132 |
| ΔM | | | 479 | −160 | −117 | ... | −189 |
| μdc, 1 | | | | 20 375 | −10 398 | ... | 183 |
| μdc, 2 | | | | | 27 129 | ... | 214 |
| ... | | | | | | ... | ... |
| μdc, 31 | | | | | | | 16 300 |
APPENDIX B: COMPARING COVARIANCE MATRICES
Here, we present a method to visually compare covariance matrices of the same size. An element-wise comparison can of course be done directly; however, it is possible to design a positive-definite operator that allows for an intuitive visual comparison.
However, the matrix |$\boldsymbol{\sf W}_{12}$| is not unique. For example, let the covariance matrices have the factorized forms |$\boldsymbol{\sf S}_1 \boldsymbol{\sf S}_1^{\intercal } = \boldsymbol{\sf C}_1$| and |$\boldsymbol{\sf S}_2 \boldsymbol{\sf S}_2^{\intercal } = \boldsymbol{\sf C}_2$|, where |$\boldsymbol{\sf S}_{1, 2}$| are of the same dimensions as |$\boldsymbol{\sf C}_{1, 2}$|. Such factors exist, for example, by the Cholesky factorization or the diagonalizability of positive-definite symmetric matrices. They are invertible, for otherwise we would have |$\det \boldsymbol{\sf C}_{1, 2} = (\det \boldsymbol{\sf S}_{1, 2})^2 = 0$|, contradicting positive-definiteness. It follows that any matrix of the form |$\boldsymbol{\sf S}_2 \boldsymbol{\sf P} \boldsymbol{\sf S}_1^{-1}$|, where |$\boldsymbol{\sf P}$| is orthogonal (|$\boldsymbol{\sf P} \boldsymbol{\sf P}^{\intercal } = \boldsymbol{\sf I}$|), is a valid choice for |$\boldsymbol{\sf W}_{12}$|. In the following discussion, we aim to find the one that is positive-definite and symmetric.
By the aforementioned theorem of Wigner (1963), |$\boldsymbol{\sf W}_{12}$| itself is positive-definite. Notice that in the expression equation (B12) the matrix exponent 1/2 cannot simply be distributed to the individual factors of the matrix product |$\boldsymbol{\sf C}_1^{\frac{1}{2}} \boldsymbol{\sf C}_2 \boldsymbol{\sf C}_1^{\frac{1}{2}}$|. Instead, it must be understood by solving the generalized eigenvalue problem of equation (B6).
Just as the square root defined in equation (B2) tends to preserve the window-function bandwidth (Tegmark 1997), the extension |$\boldsymbol{\sf W}_{12}$| as defined in equation (B12) also creates narrow windows. In other words, it is unlikely to generate a combination of wide windows to account for a small difference. This is a desirable feature for the matrices we want to compare, because we expect small differences, some of which are simply numerical artefacts of the computation.
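Equations (B2), (B6) and (B12) are not reproduced in this excerpt. The sketch below assumes |$\boldsymbol{\sf W}_{12}$| is the symmetric positive-definite solution of |$\boldsymbol{\sf W}_{12} \boldsymbol{\sf C}_1 \boldsymbol{\sf W}_{12} = \boldsymbol{\sf C}_2$|, built from the undistributable square root |$(\boldsymbol{\sf C}_1^{\frac{1}{2}} \boldsymbol{\sf C}_2 \boldsymbol{\sf C}_1^{\frac{1}{2}})^{\frac{1}{2}}$| discussed above; it reproduces the stated properties (W ≈ I when C1 ≈ C2, diagonal W under a pure rescaling):

```python
import numpy as np
from scipy.linalg import sqrtm

def comparison_matrix(C1, C2):
    """Symmetric positive-definite W solving W C1 W = C2 (a sketch).

    W = C1^{-1/2} (C1^{1/2} C2 C1^{1/2})^{1/2} C1^{-1/2}; note the outer
    exponent 1/2 cannot be distributed over the individual factors.
    """
    S1 = np.real(sqrtm(C1))                 # C1^{1/2}
    S1_inv = np.linalg.inv(S1)
    middle = np.real(sqrtm(S1 @ C2 @ S1))   # (C1^{1/2} C2 C1^{1/2})^{1/2}
    W = S1_inv @ middle @ S1_inv
    return 0.5 * (W + W.T)                  # symmetrize rounding noise

# A pure rescaling C2 = 4 C1 gives W = 2 I, i.e. a diagonal comparison matrix.
C1 = np.array([[2.0, 0.3], [0.3, 1.0]])
W = comparison_matrix(C1, 4.0 * C1)
```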







![Level contours of $[ \ln \det \boldsymbol{\sf C}_{{\rm d}}(\alpha , \beta ) ] / 2$ for the JLA data set. This can be interpreted as the log-distortion of the priors on α and β.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/463/2/10.1093_mnras_stw2069/1/m_stw2069fig7.jpeg?Expires=1591769727&Signature=pbTKWbl3F1b88zVP9V621eLWPe8SP7fJhpRpq9ELJ55mxbhknzHhs9D7vpwacct-F1iWyNly1ZxOEIx-L2WlvUqsvKGjMu3bswuyQ~RLLNZ5ZYMea29fgZqMag~B6oHg96VhY6q7FzCQlBd0YtQ2IFW~BgxgIA1UkIzf3iAhEENQs6MwYjMx6QfSHFCzc-yzfA0HRuC9J1UZJu-B0p2A6VXwT9aXO~FbbiM0-D2wMcOdc31h7i6YKAGZ6ylr8QMuF5aurnTli8UceHXxWbTFlGpqxNbI2hu5Shsc3DQjE02aG~dIuEIz2y0~fxrQSn0JK5mR92Nlc0cezWP44jSQHQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)





