Christopher Gordon, Roberto Trotta, Bayesian calibrated significance levels applied to the spectral tilt and hemispherical asymmetry, Monthly Notices of the Royal Astronomical Society, Volume 382, Issue 4, 21 December 2007, Pages 1859–1863, https://doi.org/10.1111/j.1365-2966.2007.12707.x
Abstract
Bayesian model selection provides a formal method of determining the level of support for new parameters in a model. However, if there is not a specific enough underlying physical motivation for the new parameters it can be hard to assign them meaningful priors, an essential ingredient of Bayesian model selection. Here we look at methods of choosing the prior so as to maximize the support the data could give for the new parameters. If even this maximum support is not high enough, then one can confidently conclude that the new parameters are unnecessary, without needing to worry that some other prior may make them significant. We discuss a computationally efficient means of doing this which involves mapping p-values on to upper bounds of the Bayes factor (or odds) for the new parameters. A p-value of 0.05 (1.96σ) corresponds to odds less than or equal to 5:2, which is below the ‘weak’ support at best threshold. A p-value of 0.0003 (3.6σ) corresponds to odds of less than or equal to 150:1, which is the ‘strong’ support at best threshold. Applying this method we find that the odds on the scalar spectral index being different from one are 49:1 at best. We also find that the odds that there is primordial hemispherical asymmetry in the cosmic microwave background are 9:1 at best.
1 INTRODUCTION
When there are several competing theoretical models, Bayesian model selection provides a formal way of evaluating their relative probabilities in light of the data and any prior information available. A common scenario is where a model is being extended by adding new parameters. Then the relative probability of the model with the extra parameters can be compared with that for the original model. This provides a way of evaluating whether the new parameters are supported by the data. Often the original model is ‘nested’ in the new model, in that the new model reduces to the original model for specific values of the new parameters. The Bayesian framework automatically implements an Occam's razor effect as a penalization factor for less predictive models – the best model is then the one that strikes the best balance between goodness of fit and economy of parameters (Trotta 2007b).
For nested models, the Occam's razor effect is controlled by the volume of parameter space enclosed by the prior probability distributions for the new parameters. The relative probability of the new model can be made arbitrarily small by increasing the broadness of the prior. Often this is not problematic, as prior ranges for the new parameters can (and should) be motivated from the underlying theory. For example, in estimating whether the scalar spectral index (n) of the primordial perturbations is equal to one (see Section 4), the prior range of the index can be constrained to be 0.8 ≲ n ≲ 1.2 by assuming the perturbations were generated by slow roll inflation. The sensitivity of the model selection result can also be easily investigated for other plausible, physically motivated choices of prior ranges (e.g. Trotta 2007c,a).
However, there are cases, like the asymmetry seen in the Wilkinson Microwave Anisotropy Probe (WMAP) cosmic microwave background (CMB) temperature data (see Section 5), where there is not a specific enough model available to place meaningful limits on the prior ranges of the new parameters. This hurdle arises frequently when the new parameters are a phenomenological description of a new effect, only loosely tied to the underlying physics, such as expansion coefficients of some series. In these cases, an alternative is to choose the prior on the new parameters in such a way as to maximize the probability of the new model, given the data. If, even under this best case scenario, the new model is not significantly more probable than the old model, then one can confidently say that the data do not support the addition of the new parameters, regardless of the prior choice.
2 UPPER BOUNDS ON THE BAYES FACTOR
Consider an original model M0, with parameters θ, nested within an extended model M1 which adds new parameters ψ and reduces to M0 for ψ=ψ0. Given data d, the relative support for the two models is summarized by the Bayes factor, the ratio of the models' evidences,

B = p(d|M1)/p(d|M0),

where each evidence is the likelihood averaged over the model's prior. Throughout we adopt the convention of taking ln B = 1.0, 2.5 and 5.0 (odds of about 3:1, 12:1 and 150:1) as the thresholds for ‘weak’, ‘moderate’ and ‘strong’ support for the new parameters, respectively.

When the prior for the new parameters cannot be meaningfully specified, one can instead ask how large B could possibly become if the prior were tuned to the data. Sellke, Bayarri & Berger (2001) showed that, for a broad class of plausible priors on the new parameters, the Bayes factor in favour of the new parameters is bounded above in terms of the p-value ℘ by

B̄ = −1/(e℘ ln ℘), (8)

valid for ℘ < e−1.
Table 1 gives this translation for some common thresholds of ℘ and ln B. Note how the p-value of 0.05 (a 95 per cent confidence level result) only corresponds to an odds ratio upper bound of 2.5:1, and so does not quite reach the ‘weak’ support threshold even for an optimized prior. Also note that in order for the ‘strong’ support threshold to be reachable, σ ≥ 3.6 is required.

Table 1. Translation table (using equation 8) between p-values and the upper bounds B̄ on the odds between the two models. The ‘Sigma’ column is the corresponding number of standard deviations away from the mean for a normal distribution. The ‘Category’ column gives the highest category of support reachable for the corresponding p-value.
| p-value | B̄ | ln B̄ | Sigma | Category |
| 0.05 | 2.5 | 0.9 | 2.0 | |
| 0.04 | 2.9 | 1.0 | 2.1 | ‘weak’ at best |
| 0.01 | 8.0 | 2.1 | 2.6 | |
| 0.006 | 12 | 2.5 | 2.7 | ‘moderate’ at best |
| 0.003 | 21 | 3.0 | 3.0 | |
| 0.001 | 53 | 4.0 | 3.3 | |
| 0.0003 | 150 | 5.0 | 3.6 | ‘strong’ at best |
| 6 × 10−7 | 43000 | 11 | 5.0 | |
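Equation (8) and the sigma conversion are straightforward to check numerically. The following minimal sketch (our own illustration; it assumes Python with NumPy and SciPy, which are not part of the paper) reproduces the rows of Table 1:

```python
import math
from scipy.stats import norm

def bayes_factor_bound(p):
    """Upper bound on the Bayes factor, equation (8): -1/(e p ln p), valid for p < 1/e."""
    return -1.0 / (math.e * p * math.log(p))

for p in (0.05, 0.04, 0.01, 0.006, 0.003, 0.001, 3e-4, 6e-7):
    b_max = bayes_factor_bound(p)
    sigma = norm.isf(p / 2.0)  # two-sided Gaussian equivalent significance
    print(f"p = {p:<8g}  B_max = {b_max:<8.3g}  "
          f"ln B_max = {math.log(b_max):4.1f}  sigma = {sigma:3.1f}")
```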
For nested models, the p-value can be computed from the maximum likelihood under the null hypothesis (L̂0), and when the new parameters are allowed to vary (L̂1). Then the quantity

Δχ²eff ≡ 2 ln(L̂1/L̂0)

is asymptotically distributed as χ² with the number of degrees of freedom ν equal to the number of new parameters, so that the p-value is the tail probability

℘ = P(χ²ν ≥ Δχ²eff). (10)
A very different approach to estimating the Bayes factor without having to specify a prior is the Bayesian Information Criterion (BIC) (Schwarz 1978; Magueijo & Sorkin 2007; Liddle 2007). The BIC assumes a prior for the new parameters which is equivalent to a single data point (Raftery 1995). Therefore, it will in general give lower values for B. The BIC is complementary to the upper bound on B presented here, in that it adopts a default weak prior, whereas the bound corresponds to a default strong prior.
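To make the complementarity concrete, note that the BIC difference between the two models approximates −2 ln B, so ln B ≈ (Δχ²eff − k ln N)/2 for k new parameters and N data points. A sketch with purely hypothetical numbers (not taken from any analysis in this paper), comparing the BIC default-weak-prior estimate with the default-strong bound of equation (8):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical illustration: one new parameter improving the fit by
# Delta chi^2_eff = 8 given N = 3000 data points.
delta_chi2_eff, k_new, N = 8.0, 1, 3000

# BIC difference approximates -2 ln B under a default weak
# (unit-information) prior for the new parameter.
ln_B_bic = 0.5 * (delta_chi2_eff - k_new * np.log(N))
print(f"BIC estimate:         ln B ~  {ln_B_bic:+.2f}")

# Prior-maximized upper bound: equation (10) for the p-value,
# then equation (8) for the bound.
p = chi2.sf(delta_chi2_eff, df=k_new)
ln_B_max = np.log(-1.0 / (np.e * p * np.log(p)))
print(f"Upper bound, eq. (8): ln B <= {ln_B_max:+.2f}")
```

For these numbers the BIC gives essentially no support (ln B ≈ 0) while the optimized-prior bound allows up to ln B ≈ 2.7, bracketing the range spanned by reasonable prior choices.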
3 AN ILLUSTRATIVE EXAMPLE
As an illustrative example, consider N data samples drawn from a normal distribution of mean μ and known variance σ². Under the null hypothesis the mean is fixed, μ=μ0, while under the alternative μ is a free parameter. If the prior on μ is taken to be symmetric about μ=μ0 and unimodal, then the Bayes factor is maximized by a prior uniform on an interval [μ0 − K, μ0 + K] (Berger & Sellke 1987), and K is found by solving for the width that maximizes B.
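In standardized units t = √N|x̄ −μ0|/σ and K′ = √N K/σ, the Bayes factor for the uniform prior is B(K′) = [Φ(t + K′) −Φ(t − K′)]/(2K′φ(t)), which follows directly from integrating the Gaussian likelihood of the sample mean over the prior. A minimal sketch of the maximization (our own illustration, using a grid search over K′ rather than solving the implicit equation):

```python
import numpy as np
from scipy.stats import norm

t = 1.96  # observed significance in sigma units (two-sided p-value of 0.05)

def bayes_factor_uniform(K, t):
    """Bayes factor for a prior uniform on [mu0-K, mu0+K], standardized units."""
    return (norm.cdf(t + K) - norm.cdf(t - K)) / (2.0 * K * norm.pdf(t))

K_grid = np.linspace(0.01, 20.0, 20000)   # grid search over prior half-width
B = bayes_factor_uniform(K_grid, t)
print(f"max_K B(K) = {B.max():.2f} at K = {K_grid[B.argmax()]:.2f}")

p = 2.0 * norm.sf(t)
print(f"-1/(e p ln p) = {-1.0 / (np.e * p * np.log(p)):.2f}")  # equation (8)
```

The maximized value (about 2.4 at K′ ≈ 2.8) lies just below the bound of equation (8) evaluated at ℘= 0.05, illustrating how closely the bound tracks the optimized-prior Bayes factor in the Gaussian case.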
To illustrate how misleading the p-value can be, we perform the following Monte Carlo simulation. Suppose that the proportion of nulls and alternatives is equal. We then compute the p-value, ℘ = 2[1 −Φ(√N|x̄ −μ0|/σ)] (equation 13), and we select all the tests that give ℘∈[α−ε, α+ε], for a certain value of α and ε≪α. Among such results, which rejected the null hypothesis at the 1 −α level, we then determine the proportion that actually came from the null, i.e. the percentage of wrongly rejected nulls. We assume that either M1 or M0 is true. This allows us to write the posterior probability of the null as P(M0|d) = (1 + B)−1 for equal model priors (equation 14), which, combined with the bound of equation (8), yields a lower bound on the fraction of true nulls among results reporting a given p-value. The results are shown in Table 2.
Table 2. Proportion of wrongly rejected nulls among all results reporting a certain p-value (simulation results). This illustrates that the p-value is not equal to the fraction of wrongly rejected true nulls, which can be considerably worse. This effect depends neither on the assumption of Gaussianity nor on the sample size. The rightmost column gives a lower bound on the fraction of true nulls derived using equations (8) and (14).
| p-value | Sigma | Fraction of true nulls | Lower bound |
| 0.05 | 1.96 | 0.51 | 0.29 |
| 0.01 | 2.58 | 0.20 | 0.11 |
| 0.001 | 3.29 | 0.024 | 0.018 |
The root of this striking disagreement with a common misinterpretation of the p-value (namely, that the p-value gives the fraction of wrongly rejected nulls in the long run) is twofold. First, while the p-value gives the probability of obtaining data as extreme or more extreme than what has actually been observed, assuming the null hypothesis is true, it cannot be interpreted as the probability that the null hypothesis is true, which is the quantity one is actually interested in assessing. The latter step requires using Bayes' theorem and is therefore not defined for a frequentist. Secondly, quantifying how rare the observed data are under the null is not meaningful unless this rareness is compared with that under an alternative hypothesis. Both points are discussed in greater detail in Berger & Sellke (1987), Sellke et al. (2001) and Berger (2003).
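The simulation behind Table 2 is easy to re-create in outline. The sketch below is our own minimal version: the distribution of means under the alternative (here an assumed Gaussian of width 2σ about μ0) is an arbitrary choice, so the simulated fractions will differ somewhat from Table 2, but they must always lie above the lower bound from equations (8) and (14):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_tests = 4_000_000
mu0, sigma, N = 0.0, 1.0, 1   # null mean, known sigma, sample size

# Half nulls, half alternatives; under the alternative, mu is drawn
# from an assumed broad distribution around mu0 (an arbitrary choice).
is_null = rng.random(n_tests) < 0.5
mu = np.where(is_null, mu0, rng.normal(mu0, 2.0, n_tests))

xbar = rng.normal(mu, sigma / np.sqrt(N))                    # observed sample mean
p = 2.0 * norm.sf(np.sqrt(N) * np.abs(xbar - mu0) / sigma)   # two-sided p-value

for alpha in (0.05, 0.01, 0.001):
    sel = np.abs(p - alpha) < 0.1 * alpha    # tests reporting p ~ alpha
    frac_null = is_null[sel].mean()          # fraction of wrongly rejected true nulls
    B_max = -1.0 / (np.e * alpha * np.log(alpha))
    bound = 1.0 / (1.0 + B_max)              # equations (8) and (14)
    print(f"p ~ {alpha}: true-null fraction = {frac_null:.2f} (lower bound {bound:.2f})")
```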
4 SCALAR SPECTRAL INDEX
The power spectrum of the primordial curvature perturbations is commonly parametrized as a power law,

P(k) ∝ k^(n−1),

where n is the scalar spectral index; n = 1 corresponds to the scale-invariant Harrison–Zeldovich spectrum, while slow roll inflation generically predicts n close to, but different from, unity. As discussed in Section 1, this motivates a flat prior on the spectral index of

0.8 ≤ n ≤ 1.2. (17)
As there is such a broad range of plausible priors on n, it is useful to evaluate the upper bound on the odds for a non-Harrison–Zeldovich spectrum, n ≠ 1. In Table 3 we list a number of different studies of the variation of the spectral index for a range of data. Where the Bayes factor has been evaluated directly, our estimate of the upper bound always exceeds it, as it must. For the case with the greatest amount of data there is quite a large discrepancy between the upper bound and the evaluated odds. This makes sense: the same prior for n was used (equation 17), but the data are more constraining, and so the maximizing prior is narrower than the actual prior. Using the most constraining data combination (WMAPext + HST + SDSS), the upper limit on the odds against n = 1 is 49:1. However, the odds against Harrison–Zeldovich could be weakened by various systematic effects in data analysis choices, e.g. inclusion of gravitational lensing, beam modelling, not including Sunyaev–Zeldovich (SZ) marginalization, and point-source subtraction (Peiris & Easther 2006; Lewis 2006; Parkinson, Mukherjee & Liddle 2006; Eriksen et al. 2007; Huffenberger, Eriksen & Hansen 2006; Spergel et al. 2007).
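For a single new parameter (n), the chain from Δχ²eff to the upper bound runs through equations (10) and (8). A minimal sketch of this conversion, using the Δχ²eff values quoted in Table 3 (the quoted B̄ entries were derived from the original analyses' unrounded p-values, so the final column here agrees with the table only approximately):

```python
import numpy as np
from scipy.stats import chi2

# Delta chi^2_eff values as quoted in Table 3.
datasets = {
    "WMAP": 6.0,
    "WMAPext+SDSS+2df+No SZ": 8.0,
    "WMAPext+HST": 8.0,
    "WMAPext+HST+SDSS": 11.0,
}

for name, dchi2 in datasets.items():
    p = chi2.sf(dchi2, df=1)                  # equation (10), one new parameter
    B_max = -1.0 / (np.e * p * np.log(p))     # equation (8)
    print(f"{name:24s} p = {p:.4f}  B_max = {B_max:.0f}:1")
```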
Table 3. The odds against a Harrison–Zeldovich spectrum. The p-values were estimated from Δχ²eff using equation (10). The upper bounds B̄ on the Bayes factor were estimated using equation (8). Where ln B is available, it was calculated with the prior of equation (17).
| Data | Δχ²eff | p-value | ln B | ln B̄ | B̄ |
| WMAP (Spergel et al. 2007) | 6 | 0.014 | – | 1.8 | 6 |
| WMAPext+SDSS+2df+No SZ (Parkinson et al. 2006) | 8 | 0.005 | 2.0 | 2.7 | 15 |
| WMAPext+HST (Kunz, Trotta & Parkinson 2006) | 8 | 0.004 | 2.7 | 2.8 | 16 |
| WMAPext+HST+SDSS (Trotta 2007b) | 11 | 0.001 | 2.9 | 3.9 | 49 |
5 ASYMMETRY IN THE CMB
A simple phenomenological model for the hemispherical asymmetry is a dipolar modulation of the temperature fluctuations,

ΔT(n̂) = ΔTiso(n̂)(1 + A p̂·n̂),

where ΔTiso are the underlying isotropically distributed temperature fluctuations, A is the amplitude of the isotropy breaking and p̂ is the direction of isotropy breaking. The isotropy of the fluctuations can then be tested by evaluating whether A = 0. The problem with using the Bayes factor in this case is that there is no good underlying model which produces this type of isotropy breaking. An attempt was made by Donoghue, Dutta & Ross (2007), who allowed an initial gradient in the inflaton field, but they found that the modulation dropped sharply with scale, whereas the required modulation should probably extend all the way to scales associated with the harmonic ℓ = 40 (Hansen, Banday & Gorski 2004). Also, Inoue & Silk (2006) postulated that Poisson-distributed voids may be responsible for the asymmetry; however, a generating mechanism for the voids and a detailed likelihood analysis are presently lacking.

Therefore, at present there is not a concrete enough theory to place meaningful prior limits on A. However, we can still work out the upper limit on the Bayes factor. The p-values can be evaluated from equation (10). Although A = 0 lies on the boundary of the parameter space, the problem can be reparametrized in Cartesian coordinates with A² = w²x + w²y + w²z, where wi is a linear modulation weight for spatial dimension i. Then the point wi = 0, for all i, is not on the edge of the parameter space, and so equation (10) can be used.
The results are shown in Table 4. Simulations had been done for the last row's p-value (Eriksen et al. 2007) and were in excellent agreement with the result from equation (10). Eriksen et al. (2007) did compute the Bayes factor, taking as the prior 0 ≤ A ≤ 0.3, but did not give a justification for that prior except that it contained all the non-negligible likelihood. This is unproblematic for parameter estimation, but ambiguous for working out the Bayes factor: for example, if the prior range for A were extended to 0 ≤ A ≤ 0.6, the Bayes factor would decrease by a factor of 2, while the parameter estimates would be unaffected.
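Since the modulation adds three effective parameters (A and the direction, or equivalently the three Cartesian weights wi), the p-values in Table 4 follow from a χ² distribution with three degrees of freedom. A short sketch of this conversion, using the Δχ²eff values quoted in Table 4 (as for Table 3, rounding of intermediate p-values means the quoted B̄ entries are matched only approximately):

```python
import numpy as np
from scipy.stats import chi2

# Delta chi^2_eff values as quoted in Table 4.
for label, dchi2 in [("WMAP (7 deg)", 3.0),
                     ("WMAP (7 deg) + Cmarg", 9.0),
                     ("WMAP + Cmarg", 11.0)]:
    p = chi2.sf(dchi2, df=3)                   # equation (10) with 3 dof
    if p < 1.0 / np.e:
        B_max = -1.0 / (np.e * p * np.log(p))  # equation (8)
        msg = f"B_max = {B_max:.1f}:1"
    else:
        msg = "bound not applicable (p > 1/e)"
    print(f"{label:22s} p = {p:.3f}  {msg}")
```

Note that the first row falls outside the domain of equation (8), which is why no bound is quoted for it in Table 4.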
Table 4. The odds for dipolar modulation, A ≠ 0. The resolution of the data used is also indicated. Cmarg refers to marginalization over a non-modulated monopole and dipole. The upper bound B̄ was evaluated using equation (8).
| Data | Δχ²eff | p-value | ln B | ln B̄ | B̄ |
| WMAP (7°) (Spergel et al. 2007) | 3 | 0.4 | – | – | – |
| WMAP (7°)+Cmarg (Gordon 2007) | 9 | 0.03 | – | 1.3 | 4 |
| WMAP+Cmarg (Eriksen et al. 2007) | 11 | 0.01 | 1.8 | 2.16 | 9 |
6 CONCLUSIONS
Bayesian model selection provides a powerful way of evaluating whether new parameters are needed in a model. There are, however, cases where the prior for the new parameters is uncertain or physically difficult to motivate. Here we have looked at priors which maximize the Bayes factor for the new parameters. This puts the reduced model under the greatest possible strain, and so delivers the best-case scenario for the new parameters. We have also pointed out a common misinterpretation of the meaning of p-values, which often results in an overestimation of the true significance of rejection tests for null hypotheses.
Using Bayesian-calibrated p-values we have evaluated upper bounds on the Bayes factor for the spectral index. We have found that the best the current data can do is provide moderate support (odds ≤49:1) for n≠ 1. We also looked at the maximum Bayes factor for a modulation in the WMAP CMB temperature data. We found that the current data can at best provide weak support (odds ≤ 9:1) for a departure from isotropy.
The comparison between p-values and Bayes factors suggests that a threshold of ℘ = 3 × 10−4, or σ = 3.6, is needed if odds of 150:1 (‘strong’ support at best) are to be obtained. Since it is difficult to detect systematics which are smaller than the statistical noise, systematic effects in the data analysis typically lead to a shift of the order of one sigma. It follows that the ‘particle physics discovery threshold’ of 5σ may be required in order to robustly obtain odds of 150:1.
ACKNOWLEDGMENTS

We are grateful to Kate Land for useful conversations and to Uros Seljak for interesting comments. CG is supported by the Beecroft Institute for Particle Astrophysics and Cosmology. RT is supported by the Royal Astronomical Society through the Sir Norman Lockyer Fellowship, and by St Anne's College, Oxford.