## Abstract

It has been argued recently by Copi et al. 2009 that the lack of large angular correlations of the CMB temperature field provides strong evidence against the standard, statistically isotropic, inflationary Lambda cold dark matter (ΛCDM) cosmology. We compare various estimators of the temperature correlation function showing how they depend on assumptions of statistical isotropy and how they perform on the *Wilkinson Microwave Anisotropy Probe* (*WMAP*) 5-yr Internal Linear Combination (ILC) maps with and without a sky cut. We show that the low multipole harmonics that determine the large-scale features of the temperature correlation function can be reconstructed accurately from the data that lie outside the sky cuts. The reconstructions are only weakly dependent on the assumed statistical properties of the temperature field. The temperature correlation functions computed from these reconstructions are in good agreement with those computed from the ILC map over the whole sky. We conclude that the large-scale angular correlation function for our realization of the sky is well determined. A Bayesian analysis of the large-scale correlations is presented, which shows that the data cannot exclude the standard ΛCDM model. We discuss the differences between our results and those of Copi et al. Either there exists a violation of statistical isotropy as claimed by Copi et al., or these authors have overestimated the significance of the discrepancy because of a posteriori choices of estimator, statistic and sky cut.

## 1 INTRODUCTION

Following the discovery by the *COBE* team of temperature anisotropies in the cosmic microwave background (CMB) radiation (Smoot et al. 1992; Wright et al. 1992), Hinshaw et al. (1996) noted that the temperature angular correlation function (ACF), *C*(θ), measured from the *COBE* maps was close to zero on large angular scales. This result attracted little attention until the publication of the first-year results from the *Wilkinson Microwave Anisotropy Probe* (*WMAP*) (Bennett et al. 2003; Spergel et al. 2003, hereafter S03). The results from *WMAP* confirmed the lack of large-scale angular correlations in the temperature maps and led S03 to introduce the statistic

$$S_{1/2} = \int_{-1}^{1/2} [C(\theta)]^{2}\, d\cos\theta \tag{1}$$

to quantify the lack of correlations at angular separations θ > 60°. S03 then computed a ‘*p*-value’, i.e. the fraction of models in their Monte Carlo Markov chains which had a value of *S*^{model}_{1/2} < *S*^{data}_{1/2}, using the same estimator and sky cut that they applied to the data. For their standard six-parameter inflationary Lambda cold dark matter (ΛCDM) cosmology, they found a *p*-value of 0.15 per cent, suggesting a significant discrepancy between the model and the data.

This problem was revisited by Efstathiou (2004a, hereafter E04). The main focus of the E04 paper was to improve on the pseudo-harmonic power-spectrum analysis used by the *WMAP* team (Hinshaw et al. 2003) by using quadratic maximum likelihood (QML) estimates of the power spectrum, with particular emphasis on the statistical significance of the low amplitude of the quadrupole anisotropy. As an aside, E04 computed ACFs from the QML power-spectrum estimates and showed that they were insensitive to the presence of a sky cut. Similarly, the *S*_{1/2} statistic computed from these correlation functions was found to be insensitive to the size of the sky cut giving *p*-values of a few per cent. E04 concluded that the correlation function and *S*_{1/2} statistic offered no compelling evidence against the concordance inflationary ΛCDM model. E04 did not explore in any detail the low *p*-value for the *S*_{1/2} statistic reported by S03, but commented that it was probably simply an ‘unfortunate’ consequence of the particular choice of statistic, estimator and sky cut chosen by these authors (in other words, a result of various a posteriori choices).

The CMB temperature correlation function and *S* statistic have been reanalysed in two recent papers (Copi et al. 2007, 2009, hereafter CHSS09). The arguments in the two papers are quite similar, and so for the most part we will refer to the later paper (since, as in this paper, it analyses the 5-yr *WMAP* temperature data; Hinshaw et al. 2009). The Copi et al. papers are largely motivated by evidence for a violation of statistical isotropy in the *WMAP* temperature maps, in particular evidence of alignments amongst the low-order CMB multipoles (e.g. Tegmark, de Oliveira-Costa & Hamilton 2003; Schwarz et al. 2004; Land & Magueijo 2005a,b), although the statistical significance of these alignments has been questioned (de Oliveira-Costa et al. 2004; Francis & Peacock 2009; Bennett et al. 2010). Copi et al. make the valid point that statistical isotropy is often implicitly assumed in defining what is meant by the term ‘correlation function’ and in defining estimators. They argue further that different estimators contain different information. They then focus on pixel-based estimates of the correlation function applied to the *WMAP* data, including a sky cut, and find *p*-values for the *S*_{1/2} statistic of ∼0.025–0.04 per cent, depending on the choice of CMB map and sky cut. If no sky cut is applied, they find *p*-values of ∼5 per cent (similar to the *p*-values reported in E04). CHSS09 comment that the full-sky results are apparently inconsistent with the cut-sky analysis suggesting a violation of statistical isotropy.

Any analysis which claims to strongly rule out the simple inflationary ΛCDM model deserves careful scrutiny since a confirmed discordance would have profound consequences for our understanding of the early Universe. The purpose of this paper is to investigate carefully the analysis presented in CHSS09. In Section 2 we discuss estimators of the correlation function and relate the pixel-based estimator used by CHSS09 to the pseudo-power spectrum (PCL) computed on a cut sky. In Section 3, we explicitly reconstruct the individual low-order multipole coefficients *a*_{ℓm} from cut-sky maps using a technique first applied by de Oliveira-Costa & Tegmark (2006). This allows us to test the sensitivity of the large-angle correlation function to the presence of a sky cut, largely independent of assumptions concerning statistical isotropy or Gaussianity. The results of this analysis are compared with the QML estimates of the correlation function used in E04. Section 4 describes a Bayesian analysis of the *S*_{1/2} statistic and contrasts it with the frequentist analysis applied by S03 and CHSS09. Our conclusions are summarized in Section 5. A recent paper by Pontzen & Peiris (2010) extends the analysis presented here to general anisotropic Gaussian theories with largely similar conclusions.

## 2 ESTIMATORS OF THE CORRELATION FUNCTION

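If we assume statistical isotropy, the ensemble average of the temperature ACF measured over the whole sky 〈*C*(θ)〉 is related to the ensemble average of the angular power spectrum 〈*C*_{ℓ}〉 by the well-known relation

$$\langle C(\theta)\rangle = \frac{1}{4\pi}\sum_{\ell}(2\ell+1)\,\langle C_\ell\rangle\,P_\ell(\cos\theta). \tag{2}$$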
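The Legendre sum that links a power spectrum to an ACF can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the function names and the toy spectrum *C*_{ℓ} ∝ 1/[ℓ(ℓ+1)] are assumptions made here, and no beam smoothing is applied.

```python
import math

def legendre_all(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] using the three-term recursion."""
    p = [1.0, x]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]

def acf_from_cl(cl, theta_deg):
    """C(theta) = (1/4pi) sum_l (2l+1) C_l P_l(cos theta), with cl[l] = C_l."""
    x = math.cos(math.radians(theta_deg))
    p = legendre_all(len(cl) - 1, x)
    return sum((2 * l + 1) * c * p[l] for l, c in enumerate(cl)) / (4 * math.pi)

# Toy spectrum (an assumption, not the WMAP best fit): monopole and dipole
# removed, C_l proportional to 1/[l(l+1)] for 2 <= l <= 20.
toy_cl = [0.0, 0.0] + [1000.0 / (l * (l + 1)) for l in range(2, 21)]
for theta in (0, 60, 90, 180):
    print(theta, acf_from_cl(toy_cl, theta))
```

Because only a handful of low multipoles carry weight at large θ, truncating `toy_cl` at a smaller maximum multipole changes the large-angle values very little.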

CHSS09 use a direct pixel-based correlation function^{1} on the cut sky,

$$C(\theta) = \langle x_i x_j \rangle, \tag{3}$$

where *x*_{i} denotes the temperature value in pixel *i* and the angular brackets denote an average over all pixel pairs outside the sky cut with an angular separation that lies within a small interval of θ.
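A pixel-pair estimator of this kind can be sketched as follows. The sketch assumes equal-area pixels (so no area weights are needed), uses a Fibonacci lattice as a stand-in for a real pixelization, and applies a hypothetical equatorial band as the sky cut; all function names are illustrative.

```python
import math

def pixel_acf(vectors, temps, mask, nbins=18):
    """Binned <x_i x_j> over pairs of unmasked pixels.

    vectors: unit 3-vectors of pixel centres; temps: x_i;
    mask: w_i (1 = keep, 0 = inside the sky cut).
    Returns (bin centres in degrees, C(theta) per bin or None if empty).
    """
    width = 180.0 / nbins
    sums = [0.0] * nbins
    counts = [0] * nbins
    kept = [k for k, w in enumerate(mask) if w]
    for a, i in enumerate(kept):
        for j in kept[a + 1:]:
            dot = sum(u * v for u, v in zip(vectors[i], vectors[j]))
            theta = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
            b = min(int(theta / width), nbins - 1)
            sums[b] += temps[i] * temps[j]
            counts[b] += 1
    centres = [(b + 0.5) * width for b in range(nbins)]
    acf = [s / n if n else None for s, n in zip(sums, counts)]
    return centres, acf

def fibonacci_sphere(n):
    """Roughly equal-area points on the unit sphere (stand-in pixel centres)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

pix = fibonacci_sphere(400)
field = [v[2] for v in pix]                        # toy field: a pure dipole
cut = [0 if abs(v[2]) < 0.3 else 1 for v in pix]   # toy equatorial sky cut
for theta, c in zip(*pixel_acf(pix, field, cut)):
    print(theta, c)
```

Pairs are only counted when both pixels lie outside the cut, which is the sense in which the estimator corrects for the missing area without using any information about how structures continue across the cut.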

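If the underlying temperature field is statistically isotropic, equation (3) provides an unbiased estimate of the correlation function, i.e. the average over a large number of independent realizations is unbiased, irrespective of the sky cut. However, if the fluctuations are statistically isotropic and Gaussian, equation (3) is not an optimal estimator of 〈*C*(θ)〉. To see this, expand the temperature field in spherical harmonics

$$x_i = \sum_{\ell m} a_{\ell m}\,Y_{\ell m}(\boldsymbol{\theta}_i). \tag{4}$$

The estimator (3) can then be written as

$$C(\theta) = \frac{\sum_{\ell}(2\ell+1)\,\tilde{C}_\ell\,P_\ell(\cos\theta)}{\sum_{\ell}(2\ell+1)\,\tilde{W}_\ell\,P_\ell(\cos\theta)}, \tag{5}$$

where

$$\tilde{C}_\ell = \frac{1}{2\ell+1}\sum_{m}|\tilde{a}_{\ell m}|^{2}, \qquad \tilde{a}_{\ell m} = \sum_i x_i\,w_i\,\Omega_i\,Y^{*}_{\ell m}(\boldsymbol{\theta}_i), \tag{6}$$

and *w*_{i} is a window function that is zero or unity depending on whether a pixel (of area Ω_{i}) lies inside or outside the sky cut. The function $\tilde{W}_\ell$ in (5) is the PCL of the window function *w*_{i}:

$$\tilde{W}_\ell = \frac{1}{2\ell+1}\sum_{m}|\tilde{w}_{\ell m}|^{2}, \qquad \tilde{w}_{\ell m} = \sum_i w_i\,\Omega_i\,Y^{*}_{\ell m}(\boldsymbol{\theta}_i). \tag{7}$$

Relation (5) is an identity and does not depend on the assumption of statistical isotropy.

In fact, the pixel estimator (5) is mathematically identical^{2} to the PCL estimator used by S03 and E04:

$$C(\theta) = \frac{1}{4\pi}\sum_{\ell}(2\ell+1)\,\hat{C}^{p}_\ell\,P_\ell(\cos\theta), \qquad \hat{C}^{p}_\ell = \sum_{\ell'} M^{-1}_{\ell\ell'}\,\tilde{C}_{\ell'}, \tag{8}$$

where the matrix **M** is

$$M_{\ell_1\ell_2} = \frac{2\ell_2+1}{4\pi}\sum_{\ell_3}(2\ell_3+1)\,\tilde{W}_{\ell_3}\begin{pmatrix}\ell_1 & \ell_2 & \ell_3\\ 0 & 0 & 0\end{pmatrix}^{2}. \tag{9}$$

The equivalence between the estimators (5) and (8) demonstrates that for isotropic Gaussian random fields, the pixel estimator (3) is suboptimal since PCL power-spectrum estimates are suboptimal when applied to the cut sky [for extensive discussions see Efstathiou (2004a,b) and references therein]. The coefficients $\tilde{a}_{\ell m}$ are related to the coefficients *a*_{ℓm} on the uncut sky by the coupling matrix **K**,

$$\tilde{a}_{\ell m} = \sum_{\ell' m'} K_{\ell m\,\ell' m'}\,a_{\ell' m'}, \tag{10}$$

$$K_{\ell m\,\ell' m'} = \sum_i w_i\,\Omega_i\,Y^{*}_{\ell m}(\boldsymbol{\theta}_i)\,Y_{\ell' m'}(\boldsymbol{\theta}_i). \tag{11}$$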

Fig. 1(a) shows the direct pixel-based estimator (3) applied to the 5-yr *WMAP* ILC map after smoothing with a Gaussian filter of 10° full width at half-maximum (FWHM) and repixelizing at a healpix (Górski et al. 2005) resolution NSIDE = 16. The results of Fig. 1 are consistent with those of CHSS09. With the *WMAP* KQ85 and KQ75 masks^{3} applied (retaining about 82 and 71 per cent of the sky, respectively; Gold et al. 2009), there is little power over the angular range 60°–160°. However, there is some non-zero correlation if the pixel estimator is evaluated over the full sky. Fig. 1(b) shows the correlation functions determined from the pseudo-spectra (equation 5). This simply confirms the equivalence of the two estimators (3) and (5) (apart from minor differences arising from the finite angular bin widths). The covariance matrix for these estimators for the full sky is shown in Fig. 2, using the *C*_{ℓ} for the six-parameter ΛCDM model that provides the best fit to the *WMAP* data (Komatsu et al. 2009). The large-angle ACF for a nearly scale-invariant temperature spectrum is dominated by a small number of modes, leading to large correlations between different angular scales. The main effect of a KQ75-type sky cut on the covariance matrix is to increase its overall amplitude. The angular structure of the covariance matrix is insensitive to the precise size and shape of the sky cut.
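For orientation, the map resolution and smoothing quoted above translate into a pixel count and a harmonic-space window as sketched below. The pixel count 12 NSIDE² and the Gaussian window exp[−ℓ(ℓ+1)σ²/2] with σ = FWHM/√(8 ln 2) are standard results; the function names are illustrative only.

```python
import math

def healpix_npix(nside):
    """Number of pixels in a HEALPix map: 12 * NSIDE^2."""
    return 12 * nside * nside

def gaussian_beam(ell, fwhm_deg):
    """Window b_l = exp[-l(l+1) sigma^2 / 2] of a Gaussian of given FWHM."""
    sigma = math.radians(fwhm_deg) / math.sqrt(8.0 * math.log(2.0))
    return math.exp(-0.5 * ell * (ell + 1) * sigma * sigma)

print(healpix_npix(16))  # 3072
for ell in (2, 10, 20, 40):
    # b_l^2 multiplies C_l, so it is the factor relevant to the ACF
    print(ell, gaussian_beam(ell, 10.0) ** 2)
```

The window leaves the lowest multipoles almost untouched while strongly suppressing ℓ of a few tens and above, which is why maps of this resolution are adequate for the large-angle ACF.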

The main differences between the various ACF estimates plotted in Fig. 1 come from application of the sky cuts. With the sky cuts applied, the ACFs are close to zero on angular scales ≳60°. This lack of power leads to particularly low values for the *S*_{1/2} statistic of about 1000–2000 (μK)^{4}, as listed in Table 1. If no sky cut is applied, the value of the *S*_{1/2} statistic is substantially higher at around 8000 (μK)^{4}. It is worth noting that the values listed in Table 1 are very similar to the values obtained from the *WMAP* first-year data (cf. table 5 in E04). The low multipole anisotropies that contribute to the ACF at large angular scales have remained stable as the data have improved. The low multipoles are signal dominated and stable to improved gain corrections, foreground separation and small perturbations to the Galactic mask.

S_{1/2} statistic in (μK)^{4}

| Sky cut | Pixel ACF (equation 3) | Pixel ACF (equation 5) | Harmonic reconstruction, ℓ_{max}= 5 | Harmonic reconstruction, ℓ_{max}= 10 | Harmonic reconstruction, ℓ_{max}= 15 | Harmonic reconstruction, ℓ_{max}= 20 | QML ACF (equation 14) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full sky | 7373 | 8532 | 8170 | 7777 | 7649 | 7606 | 8532 |
| KQ85 | 1401 | 1781 | 8250 | 6953 | 7612 | 6383 | 7234 |
| KQ75 | 647 | 963 | 7913 | 6914 | 8233 | 5139 | 5764 |

^{a} For maps degraded to healpix resolution of NSIDE = 16 and smoothed with a Gaussian of FWHM 10°.

As discussed in Section 1, CHSS09 argue that the pixel estimates of the ACF computed from the masked regions of the sky lead to *p*-values with respect to the standard ΛCDM cosmology of ∼0.1 per cent or less, suggesting a significant discrepancy between the model and the data. However, the *p*-values computed for the unmasked sky are much less significant (∼5 per cent). Various interpretations of this result have been proposed.

(i) The interpretation put forward by CHSS09 is that either correlations have been introduced in reconstructing the full-sky maps from the observations, or that there are highly significant departures from statistical isotropy that are correlated with the Galactic sky cut leading to an ACF that is very close to zero for regions outside the sky cut.

(ii) The interpretation put forward by E04 is that the ACF computed over the whole sky is accurate and unaffected by Galactic contamination. The low *p*-values are a consequence of using a suboptimal estimator of the ACF on a cut sky, combined with a posteriori choices of the form of the *S*_{1/2} statistic.

In support of point (ii), E04 used a QML estimator of the power spectrum, $\hat{C}^{Q}_\ell$, and computed the ACF:

$$C^{Q}(\theta) = \frac{1}{4\pi}\sum_{\ell}(2\ell+1)\,\hat{C}^{Q}_\ell\,P_\ell(\cos\theta). \tag{14}$$

For Gaussian, statistically isotropic temperature maps, the QML estimates have a significantly smaller variance than the PCL estimates if a sky cut is applied to the data.^{4} Hence the estimator *C*^{Q}(θ) will generally be closer to the truth, i.e. closer to the ensemble mean 〈*C*(θ)〉, than the estimator (8). [Pontzen & Peiris (2010) argue more generally that the QML estimator will be close to optimal for any theory with a power spectrum close to that of the concordance ΛCDM cosmology.] Applied to the first-year *WMAP* ILC map, E04 found that the ACF estimates derived from (14) are insensitive to a sky cut and lead to *p*-values for the *S*-statistic of ∼5 per cent, i.e. no strong evidence against the concordance ΛCDM model.

The reason that the QML estimator has significantly smaller ‘estimator-induced’ variance than the PCL estimator is easy to understand (see Efstathiou 2004b). For noise-free band-limited data, it is possible to reconstruct the low multipole coefficients *a*_{ℓm} exactly from data over an incomplete sky. This is, in effect, what the QML estimator does, though it implicitly assumes statistical isotropy in weighting the *a*_{ℓm} coefficients to form the power spectrum (see equation 25). For low multipoles, the assumption of statistical isotropy is unimportant, and for the noise-free data and sky cuts relevant to *WMAP*, the low-order multipole coefficients and the power spectrum *C*_{ℓ} can be reconstructed almost exactly from the data on the incomplete sky. In this paper, we will extend the analysis of E04 by explicitly reconstructing the low-order coefficients *a*_{ℓm} over the entire sky. This analysis will confirm that the ACF at large angular scales is insensitive to a sky cut and leads to *p*-values of marginal significance.

The ‘estimator-induced’ variance of the pixel estimator of the ACF (3) is also easy to understand intuitively. [This problem has been discussed extensively in the literature in the context of angular clustering analysis of galaxy surveys: Groth & Peebles (1986), Landy & Szalay (1993), Hamilton (1993) and Maddox, Efstathiou & Sutherland (1996).] The ACF is a pair-weighted statistic. Consider the analysis of data on an incomplete sky. An overdensity, or underdensity, close to the boundary of the sky cut will almost certainly continue as an overdensity, or underdensity, across the cut. If the pair count is merely corrected by the missing area that lies within the cut region of sky (as in estimator 3) overdense and underdense regions close to the boundary will be underweighted. This causes no bias to the estimator, but increases the sample variance. The analysis presented in Section 3 shows that it is possible to reduce this sampling variance by reconstructing the low multipoles across a sky cut in a way that is numerically stable and free of assumptions concerning statistical isotropy.

## 3 RECONSTRUCTING LOW-ORDER MULTIPOLES ON A CUT SKY

The aim of this section is to reconstruct the large-scale features of the temperature anisotropies over the whole sky using only the incomplete data that lie outside a chosen sky cut. This can be done in a number of ways, for example, by Wiener or ‘power-equalization’ filtering (Bielewicz, Górski & Banday 2004), Gibbs sampling (Eriksen et al. 2004; Wandelt, Larson & Lakshminarayanan 2004) or by ‘harmonic inpainting’ (Inoue, Cabella & Komatsu 2008). Here we apply a direct inversion method, which is insensitive to assumptions concerning the statistical properties of the temperature field.

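Let the vector ***x*** denote the temperature field on the sky and let the vector ***a*** denote the spherical harmonic coefficients *a*_{ℓm}. The vectors ***x*** and ***a*** are related by the spherical transform **Y**,

$$\mathbf{x} = \mathbf{Y}\mathbf{a} + \mathbf{n}, \tag{15}$$

where ***n*** represents ‘noise’ in the data. Now consider the reconstruction ***a***^{e}:

$$\mathbf{a}^{e} = (\mathbf{Y}^{\dagger}\mathbf{A}\mathbf{Y})^{-1}\,\mathbf{Y}^{\dagger}\mathbf{A}\,\mathbf{x}, \tag{16}$$

for a chosen matrix **A**. The reconstruction is related to the true coefficients ***a*** by

$$\mathbf{a}^{e} = \mathbf{a} + (\mathbf{Y}^{\dagger}\mathbf{A}\mathbf{Y})^{-1}\,\mathbf{Y}^{\dagger}\mathbf{A}\,\mathbf{n}. \tag{17}$$

If the data are noise-free, equation (16) recovers the true vector ***a*** exactly. If, further, we choose **A** to be the identity matrix, then

$$\mathbf{a}^{e} = \mathbf{K}^{-1}\,\tilde{\mathbf{a}}, \tag{18}$$

where $\tilde{\mathbf{a}}$ is the vector of harmonic coefficients computed on the cut sky and **K** is the coupling matrix (11).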
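The exactness of the inversion for noise-free, band-limited data can be checked with a small self-contained sketch. Several simplifications here are assumptions of the sketch rather than features of the analysis: real spherical harmonics up to ℓ = 2 only, a Fibonacci lattice in place of a real pixelization, **A** equal to the identity on unmasked pixels, and a toy equatorial cut |*z*| < 0.3 that removes about 30 per cent of the sky.

```python
import math
import random

def real_harmonics(v):
    """Real spherical harmonics for l <= 2 (nine functions) at unit vector v."""
    x, y, z = v
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        0.5 / math.sqrt(math.pi),
        c1 * y, c1 * z, c1 * x,
        c2 * x * y, c2 * y * z,
        0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),
        c2 * x * z, 0.5 * c2 * (x * x - y * y),
    ]

def solve(m, b):
    """Solve m a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a

def reconstruct(vectors, temps, mask):
    """a^e = (Y^T Y)^{-1} Y^T x, using only pixels with mask == 1."""
    rows = [real_harmonics(v) for v, w in zip(vectors, mask) if w]
    data = [t for t, w in zip(temps, mask) if w]
    n = len(rows[0])
    yty = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    ytx = [sum(r[i] * t for r, t in zip(rows, data)) for i in range(n)]
    return solve(yty, ytx)

def fibonacci_sphere(n):
    """Roughly equal-area points on the unit sphere (stand-in pixel centres)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

rng = random.Random(1)
pix = fibonacci_sphere(600)
a_true = [rng.gauss(0.0, 1.0) for _ in range(9)]
sky = [sum(a * y for a, y in zip(a_true, real_harmonics(v))) for v in pix]
cut = [0 if abs(v[2]) < 0.3 else 1 for v in pix]
a_rec = reconstruct(pix, sky, cut)
# Largest coefficient error: at round-off level for band-limited, noise-free data
print(max(abs(t - r) for t, r in zip(a_true, a_rec)))
```

Adding power above the reconstructed band, or noise, to `sky` breaks the exact recovery, which is the situation the choice of **A** is meant to control.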

The reconstruction of (18) is closely related to the problem of defining an orthonormal basis set of functions on the cut sky, which has been studied extensively in the literature (see e.g. Górski 1994; Górski et al. 1994; Mortlock, Challinor & Hobson 2002). If the sky cut is relatively small and the data are noise-free and band limited, the coupling matrix **K** will be non-singular and can be inverted to yield the full-sky harmonics ***a*** exactly. If the data are noise-free but not band limited, the matrix **K** will become numerically singular on the incomplete sky as ℓ_{max}→∞. (As a rule of thumb the matrix will become singular if ℓ_{max} exceeds the inverse of the width of the sky cut in radians.) This simply tells us that there are ‘ambiguous’ harmonic coefficients that are unconstrained by the data outside the sky cut. For noise-free data that are not strictly band limited, the solution (16) truncated to a finite value of ℓ_{max} will amplify some of the high-frequency signal which will appear as ‘noise’ within the sky cut in the reconstruction ***x***^{e} = **Y*****a***^{e}. The amplitude of this ‘noise’ can be reduced by an appropriate choice of the matrix **A**. If we assume that the signal and noise are Gaussian, the optimal solution of (15) is the familiar ‘map-making’ solution

$$\mathbf{a}^{e} = (\mathbf{Y}^{\dagger}\mathbf{C}^{-1}\mathbf{Y})^{-1}\,\mathbf{Y}^{\dagger}\mathbf{C}^{-1}\,\mathbf{x}, \tag{19a}$$

where **C** is the assumed covariance matrix. For noise-free, band-limited data this solution, like (16), recovers ***a*** exactly, *independent* of any assumptions concerning statistical isotropy.

Fig. 3 illustrates the application of this machinery. The upper row shows the smoothed *WMAP* 5-yr ILC map (to the left-hand side) and the degraded resolution KQ75 mask (to the right-hand side). The remaining figures show the reconstructed maps from the harmonic coefficients (19a) for the KQ85 mask (figures to the left-hand side) and for the KQ75 mask (figures to the right-hand side). The figures show the reconstructions with ℓ_{max} truncated at 5, 10, 15 and 20. The maps for the two sky cuts at ℓ_{max}= 5 and 10 are virtually identical, and by ℓ= 10 the reconstructions look visually similar to the ILC map over the entire sky. For ℓ_{max}= 15 and 20, the reconstructions for the KQ85 mask are stable and, again, look very similar to the *WMAP* ILC map over the whole sky. For the KQ75 mask, one can see ‘noise’ (i.e. reconstruction errors) beginning to appear inside the sky cut when ℓ_{max} is increased to ℓ_{max}= 15 and 20.

However, the high harmonics that contribute to the ‘noise’ in Fig. 3 make very little contribution to the correlation function at large angular scales. This is illustrated in Fig. 4, which shows the dependence of the correlation functions

$$C^{e}(\theta) = \frac{1}{4\pi}\sum_{\ell\le\ell_{\max}}(2\ell+1)\,C^{e}_\ell\,P_\ell(\cos\theta), \qquad C^{e}_\ell = \frac{1}{2\ell+1}\sum_{m}|a^{e}_{\ell m}|^{2},$$

on ℓ_{max} for each of the sky cuts. In the case of zero sky cut, the correlation function stabilizes to its final shape by ℓ_{max}= 10; higher multipoles make a negligible contribution to the correlation function at large angular scales. The reconstructed correlation functions for the KQ85 and KQ75 masks are almost identical to the all-sky correlation function for ℓ_{max}= 5, 10 and 15. For the KQ75 mask, one can begin to see the effects of reconstruction noise in *C*^{e}(θ) for ℓ_{max}= 20, but the correlation function for the KQ85 mask remains stable.

In Table 2, we list the mean square temperature at each multipole, (Δ*T*)^{2}_{ℓ}, for the first few multipoles (ℓ≤ 10) computed from the ILC map and the residuals, δ(Δ*T*)^{2}_{ℓ}, for our reconstructed maps. For the KQ85 mask, all multipoles with ℓ≤ 10 are reconstructed to high accuracy, with residuals δ(Δ*T*)^{2}_{ℓ} ∼ 2 μK^{2} or less. For the KQ75 mask, the reconstruction begins to break down for multipoles ℓ≳ 6 if ℓ_{max}≥ 15 because of the coupling with ‘ambiguous’ modes that cannot accurately be reconstructed within the sky cut. Nevertheless, since the shape of the correlation function is dominated by the low multipoles, the shape remains reasonably stable even for ℓ_{max}= 20 (Fig. 4d).

| ℓ | (ΔT^{2})_{ℓ} ILC (μK^{2}) | δ(ΔT^{2})_{ℓ} KQ85, ℓ_{max}= 5 | δ(ΔT^{2})_{ℓ} KQ85, ℓ_{max}= 10 | δ(ΔT^{2})_{ℓ} KQ85, ℓ_{max}= 15 | δ(ΔT^{2})_{ℓ} KQ85, ℓ_{max}= 20 | δ(ΔT^{2})_{ℓ} KQ75, ℓ_{max}= 5 | δ(ΔT^{2})_{ℓ} KQ75, ℓ_{max}= 10 | δ(ΔT^{2})_{ℓ} KQ75, ℓ_{max}= 15 | δ(ΔT^{2})_{ℓ} KQ75, ℓ_{max}= 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 98.8 | 0.28 | 0.04 | 0.29 | 0.22 | 6.5 | 2.0 | 5.0 | 5.4 |
| 3 | 292.2 | 0.14 | 0.26 | 0.31 | 0.58 | 2.4 | 4.0 | 15.0 | 14.1 |
| 4 | 150.3 | 0.57 | 0.51 | 0.99 | 1.09 | 13.8 | 3.9 | 36.2 | 38.1 |
| 5 | 254.3 | 0.36 | 1.18 | 1.69 | 1.41 | 3.4 | 13.2 | 23.6 | 35.8 |
| 6 | 79.6 | – | 0.88 | 1.89 | 1.85 | – | 4.8 | 63.1 | 39.2 |
| 7 | 132.4 | – | 1.96 | 2.80 | 2.28 | – | 13.9 | 37.7 | 42.8 |
| 8 | 55.3 | – | 1.72 | 3.65 | 2.25 | – | 9.3 | 67.8 | 65.6 |
| 9 | 44.0 | – | 1.77 | 2.84 | 2.17 | – | 10.6 | 30.0 | 44.7 |
| 10 | 45.6 | – | 1.23 | 2.89 | 1.69 | – | 4.3 | 43.5 | 33.3 |

The residual columns are for the harmonic reconstructions with the KQ85 and KQ75 masks, in μK^{2}.

This analysis shows that it is possible to reconstruct the low-order harmonic coefficients that contribute to the large-angle correlation functions accurately from data on the cut sky. The sky cut is basically irrelevant and so the all-sky form of the correlation function can be reconstructed from the cut sky irrespective of any assumptions concerning Gaussianity or statistical isotropy and with only a very weak dependence on the assumed shape of the covariance matrix **C**. Values for the *S*_{1/2} statistic for each of the cases shown in Fig. 4 are listed in Table 1.

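Note that if we define weighted harmonic coefficients,

$$\boldsymbol{\beta} = \mathbf{Y}^{\dagger}\mathbf{C}^{-1}\mathbf{x},$$

then the power spectrum computed from these weighted coefficients is

$$y_\ell = \frac{1}{2}\sum_{m}|\beta_{\ell m}|^{2} = \frac{1}{2}\,\mathbf{x}^{\dagger}\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial C_\ell}\mathbf{C}^{-1}\mathbf{x},$$

where

$$\left(\frac{\partial\mathbf{C}}{\partial C_\ell}\right)_{ij} = \sum_{m}Y_{\ell m}(\boldsymbol{\theta}_i)\,Y^{*}_{\ell m}(\boldsymbol{\theta}_j) = \frac{2\ell+1}{4\pi}\,P_\ell(\cos\theta_{ij}).$$

In other words, the power spectrum of the weighted coefficients is identically equivalent to the QML power-spectrum estimator (Tegmark 1997; de Oliveira-Costa & Tegmark 2006). If statistical isotropy holds, and the data are noise-free, the quantity

$$\hat{C}^{Q}_\ell = \sum_{\ell'}F^{-1}_{\ell\ell'}\,y_{\ell'}$$

provides an unbiased estimate of the power spectrum, where **F** is the Fisher matrix,

$$F_{\ell\ell'} = \frac{1}{2}\,\mathrm{Tr}\left[\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial C_\ell}\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial C_{\ell'}}\right].$$

Note that for the complete sky, and for noise-free data,

$$\langle\mathbf{a}^{e}\mathbf{a}^{e\dagger}\rangle_{\ell m\,\ell' m'} = (\mathbf{Y}^{\dagger}\mathbf{C}^{-1}\mathbf{Y})^{-1}_{\ell m\,\ell' m'} \to C_\ell\,\delta_{\ell\ell'}\,\delta_{mm'}$$

in the limit ℓ_{max}→∞, i.e. the variance on the *a*_{ℓm} is just the cosmic variance. The Fisher matrix is then diagonal,

$$F_{\ell\ell'} = \frac{2\ell+1}{2C_\ell^{2}}\,\delta_{\ell\ell'},$$

and so the QML estimates are identical to the PCL estimates. For relatively small sky cuts such as the KQ85 and KQ75 masks, the Fisher matrix at low multipoles will be *almost* diagonal (see Efstathiou 2004b), and the recovered power spectrum from the cut sky will be almost identical to the true power spectrum computed from the whole sky. The QML estimator effectively performs the reconstruction ***a***^{e} of equation (19a), but uses the assumption of statistical isotropy to downweight ‘ambiguous’ modes that are poorly constrained by the sky cut.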
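The full-sky limit can be checked numerically. The sketch below assumes a complete, noise-free sky and a real harmonic basis with 2ℓ+1 independent Gaussian coefficients of variance *C*_{ℓ} per multipole, and compares the scatter of the simple full-sky power estimate with the cosmic variance 2*C*_{ℓ}^{2}/(2ℓ+1), the inverse of the diagonal Fisher element. The function names are illustrative.

```python
import random

def cl_estimate(ell, cl_true, rng):
    """Full-sky estimate (1/(2l+1)) sum_m a_lm^2 for Gaussian a_lm of variance C_l."""
    sigma = cl_true ** 0.5
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(2 * ell + 1)) / (2 * ell + 1)

def fisher_full_sky(ell, cl_true):
    """Diagonal Fisher element (2l+1) / (2 C_l^2) for the complete, noise-free sky."""
    return (2 * ell + 1) / (2.0 * cl_true ** 2)

rng = random.Random(7)
ell, cl = 2, 1.0
draws = [cl_estimate(ell, cl, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
# Monte Carlo variance next to the cosmic-variance prediction 2 C_l^2 / (2l+1)
print(var, 1.0 / fisher_full_sky(ell, cl))
```

For the quadrupole the predicted fractional scatter is √(2/5), about 63 per cent, which is why the lowest multipoles of a single sky constrain the theory power spectrum so weakly.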

For small sky cuts, we would therefore expect the QML correlation function estimate (14) to be almost identical at large angular scales to the correlation functions computed from the reconstructed coefficients ***a***^{e}. (They are, of course, mathematically identical for zero sky cut.) *C*^{Q}(θ) is expected to behave more stably than *C*^{e}(θ) as the sky cut is increased since the QML correlation function downweights ambiguous modes. This is exactly what we see when we apply the QML estimate to the *WMAP* 5-yr ILC maps (see Fig. 5). The ACF is almost independent of the sky cut, confirming the results of E04. Values of the *S*_{1/2} statistic for the QML ACF estimates are listed in the final column of Table 1.

The QML power-spectrum estimates are plotted in Fig. 6. The power-spectrum coefficients are extremely stable to the sky cut, varying by only a few tens of (μK)^{2} for ℓ≤ 10. The figure compares these estimates to the power-spectrum estimates for the reconstructed all-sky maps using equation (19a). We plot the results for ℓ_{max}= 10 since this value is large enough to determine the shape of the ACF at large angular scales, but small enough to limit the noise in the reconstructed maps at high multipoles. The power spectra of the reconstructed maps are very close to the QML estimates at ℓ≤ 8, though one can begin to see the effects of reconstruction noise in the KQ75 case at ℓ > 8. [However, as Fig. 4(b) shows, this reconstruction noise has very little effect on the shape of the ACF at large angular scales.]

The results of this section show that the low-order multipole coefficients that determine the behaviour of the correlation function at large angular scales can be reconstructed to high accuracy from data on the incomplete sky, independent of any assumptions concerning statistical isotropy. The usual motivation for applying a sky cut is to remove regions of the sky that may be contaminated by residual Galactic emission. However, for the KQ85 and KQ75 sky cuts, the missing area of sky leads to little loss of information at low multipoles. The low multipoles can therefore be reconstructed from the data on the incomplete sky. The imposition of the sky cuts does not remove foreground contamination at these low multipoles: any residual Galactic contribution to the low multipoles in the ILC map is, like the CMB signal, faithfully reproduced by the reconstructions shown in Fig. 3. What a Galactic cut can do is to mask out localized Galactic emission (‘ambiguous’ modes) that could, in principle, couple to the low multipoles in a way that depends on the estimator (e.g. via the coupling matrix in the simple inversion of equation 18). The similarities between the reconstructions of Fig. 3 and the full-sky ILC map (and the correlation functions and power spectra plotted in Figs 5 and 6) show that the ILC map has removed Galactic emission successfully at low Galactic latitudes since there is no evidence of high-amplitude ‘ambiguous’ modes in the ILC map within the region of the sky cut.

If suitable estimators are applied to noise-free data, a sky cut of the size of the KQ85 or KQ75 masks has little impact on the reconstruction of the low-order multipoles or the all-sky ACF. The imposition of a sky cut does, however, lead to a loss of information if a poor estimator is used to estimate the ACF. This is what happens when the pixel estimator (3) is used to estimate the ACF on the cut sky (Copi et al. 2007; Hajian 2007; CHSS09). The analysis presented in this section provides compelling evidence that the true value of the *S*_{1/2} statistic for our realization of the sky is in the region of 6000–8000 (μK)^{4}, independent of the sky cut. The remaining question is whether the anomalously low *p*-values implied by the cut-sky pixel ACF are the result of a statistical fluke, or indicative of new physics.

## 4 ANALYSIS OF THE *S*_{1/2} STATISTIC

In this section we analyse the *S*_{1/2} statistic, first from a Bayesian point of view, and then from a frequentist point of view. We then discuss the interpretation of the low frequentist *p*-values found by CHSS09.

### 4.1 Approximate Bayesian analysis

We begin by performing an approximate Bayesian analysis to compute the posterior distribution of *S*_{1/2} given the data on the assumption that the fluctuations are Gaussian and statistically isotropic. If the data were noise-free and covered the entire sky, then, under the assumptions of statistical isotropy and Gaussianity, the data power spectrum *C*^{d}_{ℓ} provides a loss-free description of the data. Assuming uniform priors on each of the *C*^{T}_{ℓ}, the posterior distribution of the theory power-spectrum coefficients *C*^{T}_{ℓ} is given by the inverse Gamma distribution

$$dP(C^{T}_\ell\,|\,C^{d}_\ell) \propto \frac{1}{(C^{T}_\ell)^{(2\ell+1)/2}}\exp\left[-\frac{(2\ell+1)\,C^{d}_\ell}{2\,C^{T}_\ell}\right]dC^{T}_\ell.$$

Each *C*^{T}_{ℓ} is statistically independent and the mean value is

$$\langle C^{T}_\ell\rangle = \frac{2\ell+1}{2\ell-3}\,C^{d}_\ell.$$

This distribution will therefore favour theory values that are larger than the observed values *C*^{d}_{ℓ}.
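Drawing theory values from this posterior is straightforward, since the reciprocal of an inverse Gamma variate is Gamma distributed. The sketch below assumes the posterior form ∝ (*C*^{T}_{ℓ})^{−(2ℓ+1)/2} exp[−(2ℓ+1)*C*^{d}_{ℓ}/(2*C*^{T}_{ℓ})] that follows from uniform priors, i.e. shape (2ℓ−1)/2 and scale (2ℓ+1)*C*^{d}_{ℓ}/2; the function names and the unit data value are illustrative.

```python
import random

def sample_theory_cl(ell, cl_data, rng):
    """Draw C^T_l from the inverse Gamma posterior for a uniform prior.

    Shape is (2l-1)/2 and scale is (2l+1) C^d_l / 2, so 1/C^T_l is Gamma
    distributed with that shape and with rate (2l+1) C^d_l / 2.
    """
    shape = 0.5 * (2 * ell - 1)
    rate = 0.5 * (2 * ell + 1) * cl_data
    # random.gammavariate(alpha, beta) takes beta as a scale, hence 1/rate
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def posterior_mean(ell, cl_data):
    """Analytic posterior mean (2l+1)/(2l-3) C^d_l, finite for l >= 2."""
    return (2 * ell + 1) / (2 * ell - 3) * cl_data

rng = random.Random(42)
quad = sorted(sample_theory_cl(2, 1.0, rng) for _ in range(10001))
# The quadrupole posterior is so heavy tailed that its variance is infinite,
# so the median is a more stable summary than the sample mean.
print(quad[len(quad) // 2], posterior_mean(2, 1.0))
```

For ℓ = 2 the posterior mean is five times the observed value, a direct view of the long tail to high values that dominates the posterior distribution of *S*_{1/2}.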

The results of the previous section (and Fig. 6 in particular) show that the low multipoles are well determined and insensitive to the application of a sky cut. We can therefore use the measurements *C*^{d}_{ℓ} computed over the whole sky to represent the data.^{5} The multipole expansion is truncated at ℓ_{max}= 20 (although as discussed in the previous section, multipoles greater than ℓ≈ 10 make very little contribution to the ACF at large angular scales) and statistically independent *C*^{T}_{ℓ} values are generated from the inverse Gamma distribution (31). These values are then used to generate Gaussian *a*^{T}_{ℓm} from which we synthesize real-space maps *x*_{i} at a healpix resolution of NSIDE = 16 smoothed with a Gaussian of FWHM 10°. We then compute *S*_{1/2} from the pixel correlation function (3). This methodology provides a test of statistically isotropic Gaussian models, with no additional constraints imposed on the theory *C*^{T}_{ℓ} apart from uniform priors.

The posterior distributions of *S*^{T}_{1/2} are shown in Fig. 7(a) for the analysis of all-sky maps (red/solid histogram) and for maps with the KQ75 mask applied (blue/dotted histogram). The distribution for the trials with the sky cut applied is slightly broader than the distribution for the all-sky trials, as expected since the pixel ACF estimator is suboptimal on a cut sky. The peaks of the distributions occur at *S*^{T}_{1/2}≈ 6000 (μK)^{4} and so low values of *S*^{T}_{1/2} are clearly preferred by the data. However, the posterior distributions have a very long tail to high values [as expected from the inverse Gamma distribution (equation 32)]. The best-fitting six-parameter ΛCDM model as determined from the 5-yr *WMAP* analysis (Komatsu et al. 2009) has a value of *S*^{T}_{1/2}∼ 49 000 (μK)^{4}. At this value (indicated by the vertical dashed line in Fig. 7a), the posterior distribution has fallen to a value of about 0.4. Such high values of *S*^{T}_{1/2} are evidently not favoured by the data, but they are not strongly disfavoured. Very low values of *S*^{T}_{1/2} of ∼ 1000 (μK)^{4} are also not strongly disfavoured.

From the Bayesian point of view, the quantity *S*_{1/2} is a poor discriminator of theoretical models and so is relatively uninformative. The posterior distributions of Fig. 7(a) are extremely broad with a long tail to high values. The data, irrespective of estimator or sky cut, clearly prefer low values of *S*_{1/2} but cannot exclude the value of *S*^{T}_{1/2}∼ 49 000 (μK)^{4} expected for the concordance inflationary ΛCDM model.

### 4.2 Frequentist analysis

We now generate statistically isotropic Gaussian realizations with the *C*^{T}_{ℓ} constrained to those of the best-fitting ΛCDM model. The frequency distributions of *S*_{1/2} computed from the pixel estimator are plotted in Fig. 7(b). The distributions of Figs 7(a) and (b) look fairly similar, but the frequentist interpretation is very different. For the all-sky analysis, the *p*-value of finding *S*_{1/2} < 7373 (μK)^{4} is 8 per cent and hence is not statistically significant. However, if we apply the KQ75 mask, the *p*-value for *S*_{1/2} < 647 (μK)^{4} is only 0.065 per cent. This result appears strongly significant and, at face value, inconsistent with the *p*-value for the all-sky analysis.

The low *p*-values found here and by S03 and CHSS09 come exclusively from analysing cut-sky maps with ‘suboptimal’ (in the sense of not reproducing the ACF for the whole sky) estimators. The sky cuts, ostensibly imposed to reduce any effects of Galactic emission at low Galactic latitudes, lead to a loss of information and to poorer estimates of the ACF for *our realization* of the sky. But as we have demonstrated, the information on the ACF at large angles for our realization of the sky is contained in the data outside the sky cuts. The imposition of a sky cut therefore has little to do with reducing the effects of Galactic emission on the ACF at large angular scales. If there is any cosmological significance to the low *p*-values, then one must accept that the Galactic cut aligns with the signal, *purely by coincidence*, in just such a way as to remove the large-scale angular correlations for particular choices of estimator of the ACF. This alignment may indicate a violation of statistical isotropy, as argued by CHSS09, but if this is true the alignment with the Galactic plane must be purely coincidental.

It seems to us that a more plausible interpretation of the low *p*-values is that they are a consequence of the a posteriori selection of the *S*_{1/2} statistic by S03 for a particular choice of estimator and sky cut. It is difficult to quantify the effects of a posteriori choices. However, numerical tests with a more general statistic (which reduces to *S*_{1/2} for the choices μ = 1/2 and *p* = 2) suggest that it is possible to alter *p*-values by an order of magnitude or more by selecting the parameters in response to the data. It would be possible to raise the *p*-values even more by varying the size and orientation of a sky cut.
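The effect of tuning parameters to the data can be illustrated with a toy Monte Carlo. The functional form used below, the integral of |*C*(θ)|^{*p*} over −1 ≤ cos θ ≤ μ, is an assumption chosen only because it reduces to *S*_{1/2} for μ = 1/2 and *p* = 2; the input spectrum, the grid of (μ, *p*) choices and all function names are likewise hypothetical, and the sketch uses all-sky realizations only.

```python
import numpy as np
from numpy.polynomial import legendre


def simulate_acf(cl_theory, n_sims, mu, rng):
    """All-sky ACFs C(theta) on the grid mu = cos(theta) for Gaussian isotropic skies."""
    cl_theory = np.asarray(cl_theory, dtype=float)
    dof = 2 * np.arange(len(cl_theory)) + 1
    cl_hat = cl_theory * rng.chisquare(dof, size=(n_sims, len(dof))) / dof
    P = legendre.legvander(mu, len(dof) - 1)    # P[k, l] = P_l(mu_k)
    return (dof / (4 * np.pi) * cl_hat) @ P.T


def general_stat(acf, mu, mu_max, p):
    """Assumed form: int_{-1}^{mu_max} |C(theta)|^p dcos(theta), trapezoid rule."""
    keep = mu <= mu_max + 1e-9
    y = np.abs(acf[:, keep]) ** p
    dx = np.diff(mu[keep])
    return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * dx, axis=1)


def empirical_p(stat):
    """For each simulation, the fraction of simulations with a value <= its own."""
    ranks = np.argsort(np.argsort(stat)) + 1
    return ranks / len(stat)


def min_p_over_choices(acf, mu, choices):
    """Smallest p-value each simulated sky attains when (mu_max, p) is tuned to it."""
    pvals = np.array([empirical_p(general_stat(acf, mu, m, p)) for m, p in choices])
    return pvals.min(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lmax = 20
    ells = np.arange(lmax + 1)
    cl = np.zeros(lmax + 1)
    cl[2:] = 2 * np.pi * 1000.0 / (ells[2:] * (ells[2:] + 1))   # toy plateau
    mu = np.linspace(-1.0, 1.0, 401)
    acf = simulate_acf(cl, 5000, mu, rng)
    choices = [(m, p) for m in (0.0, 0.25, 0.5, 0.75) for p in (1, 2, 4)]
    fixed = empirical_p(general_stat(acf, mu, 0.5, 2))
    tuned = min_p_over_choices(acf, mu, choices)
    alpha = 0.01
    print("fraction below alpha, fixed statistic:", np.mean(fixed <= alpha))
    print("fraction below alpha, tuned statistic:", np.mean(tuned <= alpha))
```

With a statistic fixed in advance, a fraction α of simulated skies fall below a *p*-value of α by construction; once the parameters are chosen per sky, that fraction can only grow, which is the sense in which a quoted *p*-value overstates significance after a posteriori selection.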

Is there any way of testing this hypothesis further? In the ΛCDM model, the integrated Sachs–Wolfe (ISW) effect (Sachs & Wolfe 1967) makes a significant contribution to the total temperature anisotropy signal at low multipoles. The ISW contribution between the time of last scattering (*t*_{LS}) and the present day (*t*_{0}) is given by
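For orientation, the ISW temperature perturbation along a line of sight is commonly written in linear theory in the form below. The symbols Φ (the gravitational potential, with the overdot denoting a time derivative along the photon path) and *c* (the speed of light) are assumed notation here rather than symbols defined in this text.

$$
\frac{\Delta T^{\rm ISW}}{T} = \frac{2}{c^{2}} \int_{t_{\rm LS}}^{t_{0}} \dot{\Phi}\, {\rm d}t .
$$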

*z* < 0.3. If a posteriori choices are responsible for the low *p*-values, we should find large changes to the pixel ACF estimates for the masked sky when the *WMAP* ILC maps are corrected for the local ISW contribution. As shown in Fig. 8, this is indeed what we find when we subtract the local ISW contribution computed by Francis & Peacock from the 5-yr *WMAP* ILC map. The values of the *S*_{1/2} statistic computed from the ACFs shown in Fig. 8 are 10 360 (μK)^{4} (all-sky), 6463 (μK)^{4} (KQ85 mask) and 5257 (μK)^{4} (KQ75 mask), all consistent with the concordance ΛCDM model at the few per cent level. This is consistent with our hypothesis that the CHSS09 low *p*-values are a fluke, unless one is prepared to argue that there is a physical alignment of local structure with the potential fluctuations at the last scattering surface that conspires to remove large-angle temperature correlations in the regions outside the Galactic mask (which seems implausible to us). Francis & Peacock (2009) discuss how the local ISW correction affects a number of other low multipole statistics, in particular reducing the statistical significance of the alignment between the quadrupole and octopole.

## 5 DISCUSSION AND CONCLUSIONS

The low amplitude of the temperature autocorrelation function at large angular scales has led to some controversy since the publication of the first-year results from *WMAP*. This paper has sought to clarify the following points.

(1) We have compared different estimators of the ACF showing (i) how they depend on assumptions of statistical isotropy; (ii) how they are interrelated; (iii) how they perform on the *WMAP* 5-yr ILC maps with and without a sky cut.

(2) The imposition of the KQ85 and KQ75 sky masks leads to little loss of information on the low multipoles that contribute to the large-scale ACF. As demonstrated in Section 3, the low multipole harmonics can be reconstructed accurately from the data that lie outside the sky cuts, independent of any assumptions concerning statistical isotropy and with only a weak dependence on the shape of the pixel covariance matrix. The ACFs computed from these reconstructions are in good agreement with the ACF computed from the whole sky and in good agreement with the maximum likelihood estimator (14). There can be little doubt that the large-scale ACF for our realization of the sky is very close to the all-sky results shown in Figs 1 and 5.

(3) The lack of large-scale structure seen in the cut-sky pixel ACF arises from a particular alignment of the low-order multipoles (see also Pontzen & Peiris 2010) which, as we have demonstrated, can be reconstructed accurately from the data outside the sky cut. The ACF at large angular scales is insensitive to high-order modes localized within the sky cut.

(4) The Bayesian analysis presented in Section 4 shows that the posterior distribution of *S*^{T}_{1/2} is broad and cannot exclude the value *S*^{T}_{1/2} ∼ 49 000 (μK)^{4} appropriate for the Komatsu et al. (2009) best-fitting inflationary ΛCDM model. The breadth of the posterior distribution of *S*^{T}_{1/2} shows that it is a fairly uninformative statistic and so is not a particularly good discriminator of theoretical models.

(5) Unusually low values of the *S*_{1/2} statistic are found only if ‘suboptimal’ ACF estimators are applied to maps that include a Galactic mask. We have argued that the low *p*-values associated with these low values of *S*_{1/2} are most plausibly a result of a posteriori choices of statistic. This seems plausible to us because (i) *S*-type statistics are relatively uninformative and hence sensitive to a posteriori choices; (ii) the all-sky ACF (which is compatible with the concordance ΛCDM model) can be recovered from the data outside the mask and so any physical model for the low *p*-values requires a fortuitous alignment of the temperature field with the Galaxy; (iii) the analysis of the local ISW-corrected maps presented in Section 4 suggests that any physical model of the low *p*-values requires a precise alignment of local structure with the large-scale potential fluctuations at the last scattering surface.

(6) If one does not accept that the low *p*-values are associated with a posteriori choices, then one must accept that they may be indicative of new physics as suggested by CHSS09.

In summary, the results of this paper suggest to us that, irrespective of the imposition of Galactic sky cuts or assumptions of statistical isotropy, the large-scale correlations of the CMB temperature field provide unconvincing evidence against the concordance inflationary ΛCDM cosmology.

*C*^{d}_{ℓ} are ignored.

The authors acknowledge use of the healpix package and the Legacy Archive for Microwave Background Data Analysis (LAMBDA). Support for LAMBDA is provided by the NASA Office of Space Science. We are particularly grateful to John Peacock and Caroline Francis for allowing us to use their ISW maps. We are also thankful to the referee for providing useful comments on this paper. DH is grateful for the support of a Gates scholarship.

## REFERENCES

*et al.*,