We investigate two possible explanations for the large-angle anomalies in the cosmic microwave background (CMB): an intrinsically anisotropic model and an inhomogeneous model. We take as an example of the former a Bianchi model (which leaves a spiral pattern in the sky) and of the latter a background model that already contains a non-linear long-wavelength plane wave (leaving a stripy pattern in the sky). We make use of an adaptation of the ‘template’ formalism, previously designed to detect galactic foregrounds, to recognize these patterns and produce confidence levels for their detection. The ‘corrected’ maps, from which these patterns have been removed, are free of anomalies, in particular their quadrupole and octopole are not planar and their intensities are not low. We stress that although the ‘template’ detections are not found to be statistically significant they do correct statistically significant anomalies.
Since the release of the first year data from NASA's Wilkinson Microwave Anisotropy Probe (WMAP) (Bennett et al. 2003a), there have been extensive studies of the cosmic microwave background (CMB) maps. Generally, the results impressively support the favoured Lambda cold dark matter (ΛCDM) cosmological model, though a lack of power on the largest scales has produced much debate (see e.g. Bridle et al. 2003; Cline, Crotty & Lesgourgues 2003; Contaldi et al. 2003; Bielewicz, Górski & Banday 2004; de Oliveira-Costa et al. 2004; Efstathiou 2004; Slosar, Seljak & Makarov 2004; Weeks et al. 2004; Zhao & Zhang 2005). Of further interest are studies that have detected evidence for significant departures from the fundamental cosmological assumption that the Universe is isotropic and homogeneous (de Oliveira-Costa et al. 2004; Eriksen et al. 2004, 2005; Hansen, Banday & Górski 2004a; Hansen et al. 2004b; Ralston & Jain 2004; Roukema et al. 2004; Schwarz et al. 2004; Vielva et al. 2004; Copi et al. 2005; Jaffe et al. 2005; Land & Magueijo 2005a,b). Thus, it appears that the large-angle CMB multipoles are anomalous in two seemingly distinct ways. First, their power Cℓ is abnormally low. Secondly, they have an improbable directionality revealed by the fact that for a certain orientation of the z-axis, one m mode absorbs most of the power, for ℓ= 2, …, 5 (Land & Magueijo 2005b). Naively, one may expect the two features to be related: the absence of power in all but one m mode may be why the observed average Cℓ is so low. However, this has to be made rigorous. Such is the purpose of this letter: could it be that accounting for the large-angle directional multipoles accounts for the observed low Cℓ?
ΛCDM theory matches the power spectrum very well on most scales (Bennett et al. 2003a), but the strong non-Gaussianity and/or anisotropy of the low multipoles suggests a competition between two processes: Gaussian isotropic random fluctuations and a deterministic process. One possibility for the latter is intrinsic cosmic anisotropy (as encoded in the Bianchi models) which imparts a spiral pattern in the CMB as studied in Barrow, Juszkiewicz & Sonoda (1985), Bunn, Ferreira & Silk (1996), Kogut, Hinshaw & Banday (1997) and Jaffe et al. (2005). The other is a background model that already contains a non-linear long-wavelength plane wave. This would leave a striking stripy pattern in the sky (a ‘Jupiter’, if the polar axis is defined suitably). The matter may then be addressed by regarding these patterns as a ‘contamination’ template, and attempting to ‘correct’ the data for its coupling to the template, thereby purifying the underlying Gaussian process. The formalism has been developed to deal with galactic foregrounds but may be adapted to this situation.
We therefore seek a template capable of explaining the preferred axis in the CMB, and ask the question: could the corrected map reveal higher Cℓs, thus linking the two issues? Note that this is highly non-trivial. Usually when one corrects a map for a ‘foreground’ the power in the corrected map is smaller. However, it could also be that the deterministic process acted destructively. For this to be possible, the template would have to have a pattern complementary to the observed data, i.e. contain power in (at least some of) the m modes which show no power in the data; in addition, this power would have to have acted destructively upon the underlying Gaussian process, so that only one m mode survived in the data.
The presence of a template is more likely to add power than to remove it, so to explain the low-ℓ low power this way one relies in part on a chance alignment of phases between the Gaussian fluctuations and the template. Still, in this paper we explore this approach with two different models. In both cases, once we correct the data, we are left with a map with higher low-multipole power Cℓ and no preferred direction as seen in Land & Magueijo (2005b).
In Section 2, we outline our method for selecting favoured templates. Then, in Section 3 we introduce the templates of our two specific models, and in Section 4 we explain how we selected the data and record our results. We discuss the implications of these in Sections 5 and 6.
2 THE FORMALISM
Our work is modelled on that of Jaffe et al. (2005) but with an important extension. We start by reviewing Jaffe et al. (2005). Suppose the sky may be modelled as the sum of a Gaussian process $s$ and a non-Gaussian ‘template’ $t$. Then, the maximum likelihood estimation (MLE) problem may be set up by introducing a coupling parameter α so that the data are $d = s + \alpha t$. The usual MLE problem now concerns the likelihood of the Gaussian residual $d - \alpha t$, that is

$-2\ln P = (d-\alpha t)^{\dagger} C^{-1} (d-\alpha t) + \ln \det C, \qquad \chi^2 \equiv (d-\alpha t)^{\dagger} C^{-1} (d-\alpha t)$
(we have ignored an additive constant in the first expression). Here, $C$ is the covariance matrix of the Gaussian process $s$ (which may include noise) and the data vectors may be either temperatures in pixel space, or a set of spherical harmonic coefficients aℓm. In the latter case, the data are complex, satisfying ‘reality’ constraints; the concomitant modifications to the formalism are trivial. For full sky maps, the aℓm are the preferred formulation.
In general, we have a joint estimation problem for α and the unknowns parametrizing $C$ (which can be binned Cℓ or the half dozen parameters encoding ‘cosmic concordance’). However, it is often the case that the range of ℓs which decides the estimation of $C$ and that relevant for the template are disjoint. For example, the geometry of the Universe Ω, or the amount of baryonic matter Ωb, is mainly decided by the Doppler peaks, whereas we are interested in templates producing large-angle non-Gaussian features (say ℓ= 2–5; see Land & Magueijo 2005b). One may then decouple the two problems: find $C$ ignoring the template and then solve for α using this solution for $C$.
2.1 Parameter estimation
The MLE problem for α then becomes the problem of minimizing χ2. The solution is

$\hat{\alpha} = \dfrac{t^{\dagger} C^{-1} d}{t^{\dagger} C^{-1} t}.$
By construction, α is a Gaussian variable. Its variance is

$\sigma_{\alpha}^{2} = \dfrac{1}{t^{\dagger} C^{-1} t}.$
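In matrix form, this estimator and its variance can be sketched in a few lines of numpy. This is an illustrative implementation of the standard template-fitting formulas above, not the authors' code; real-valued pixel-space vectors are assumed and the function name is hypothetical:

```python
import numpy as np

def fit_template(d, t, C):
    """MLE coupling alpha for data d = s + alpha*t, where s is a Gaussian
    process with covariance C.  Returns alpha-hat and its variance:
    alpha = (t^T C^-1 d)/(t^T C^-1 t),  var = 1/(t^T C^-1 t)."""
    Cinv_t = np.linalg.solve(C, t)       # C^-1 t, without forming C^-1 explicitly
    denom = t @ Cinv_t                   # t^T C^-1 t
    alpha = (d @ Cinv_t) / denom         # t^T C^-1 d / t^T C^-1 t
    return alpha, 1.0 / denom
```

Correcting the map then amounts to forming the residual d − α̂t.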
Two comments are in order. First, note that this formalism wants the template to be like the data. If a mysterious theory was found predicting $t = d$, then α= 1 and the MLE problem tells us that the Gaussian process is in fact absent. This is not in line with common sense: we do know that there is a Gaussian process. Secondly, this procedure always produces a corrected map $d - \hat{\alpha} t$ with a lower χ2 than the original map $d$. This is fine if one suspects contamination because of an abnormally high χ2, but what if the observed χ2 is too low? In that case, the corrected map is even more abnormal.
As is usually the case, these quibbles boil down to the choice of parameters and priors used. We have failed to impose a suitable condition enforcing the presence of a Gaussian process, so that the corrected maps have a reduced χ2 close to 1. If we are in aℓm space, for example, we are assuming a uniform prior in aℓm. The peak of the distribution is always at aℓm= 0, so if $t$ is such that this condition can be satisfied, that is precisely what the formalism does (a zero χ2 is the highest probability configuration if one assumes uniform priors in aℓm). Seen in another way, if we have a deterministic template competing with a Gaussian process on equal footing, the formalism will always try to suppress the random Gaussian process in favour of the certainties of the template.
To address this issue, we use an alternative function of the data in calculating the likelihood. We base the likelihood on the probability distribution of ln(χ2) (this rather than χ2 produces the correct power spectrum estimator). By changing variables, the likelihood is given by

$P(\ln \chi^2) \propto (\chi^2)^{D/2}\, e^{-\chi^2/2},$
where D is the number of degrees of freedom (i.e. the number of pixels, or the number of aℓm being considered). Its maximization with respect to α now leads to

$\dfrac{\mathrm{d}\chi^2}{\mathrm{d}\alpha}\left(\dfrac{D}{\chi^2} - 1\right) = 0,$
i.e. either the first factor is zero (equivalent to the problem of minimizing the χ2 solved above), or χ2=D, which means the reduced χ2 is 1. This is simple to solve, and involves finding the roots of the quadratic equation χ2(α) =D. The solution is

$\alpha_{\pm} = \dfrac{t^{\dagger} C^{-1} d \pm \sqrt{(t^{\dagger} C^{-1} d)^{2} - (t^{\dagger} C^{-1} t)\,(\chi^{2}_{d} - D)}}{t^{\dagger} C^{-1} t},$

where χ2d is the χ2 of the uncorrected data.
This formalism for correcting the map generalizes that of Jaffe et al. (2005), while reducing to it in the appropriate circumstances. If the quadratic does not have real solutions for α [because the minimum of χ2(α) is above D], then the problem reduces to Jaffe et al. (2005). This is the case where the χ2 of the data is too high, so that correcting it entails reducing it. In this case, the two formalisms agree. However, if the data's χ2 is too low then our formalism takes over. The quadratic then has real solutions, and the MLE problem is solved by the two roots α± given above.
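The case analysis just described can be sketched directly from the quadratic: solve χ2(α) = D when real roots exist, otherwise fall back to the χ2-minimizing α of Jaffe et al. (2005). This is an illustrative sketch with hypothetical names, assuming real-valued data vectors:

```python
import numpy as np

def correct_map(d, t, C, D=None):
    """Solve chi^2(alpha) = D (reduced chi^2 of one) for the template
    coupling alpha.  Falls back to the chi^2-minimizing alpha when the
    quadratic has no real roots (the high-chi^2 case)."""
    if D is None:
        D = len(d)                     # degrees of freedom
    Cinv_d = np.linalg.solve(C, d)
    Cinv_t = np.linalg.solve(C, t)
    a = t @ Cinv_t                     # t^T C^-1 t
    b = t @ Cinv_d                     # t^T C^-1 d
    chi2_d = d @ Cinv_d                # chi^2 of the uncorrected map
    disc = b * b - a * (chi2_d - D)    # discriminant of chi^2(alpha) = D
    if disc < 0.0:                     # minimum of chi^2(alpha) lies above D
        return [b / a]                 # standard MLE solution
    root = np.sqrt(disc)
    return [(b - root) / a, (b + root) / a]
```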
We note that this approach could conceptually be improved by factorizing the likelihood ℓ by ℓ and imposing uniform priors on an appropriate function of each χ2ℓ. Maximizing the likelihood with respect to α would now lead to the problem of solving

$\sum_{\ell} \dfrac{\mathrm{d}\chi^{2}_{\ell}}{\mathrm{d}\alpha}\left(\dfrac{D_{\ell}}{\chi^{2}_{\ell}} - 1\right) = 0,$

where Dℓ counts the degrees of freedom in each multipole.
However, this is in practice impossible to solve, especially for large ℓmax. Therefore, we solve the simpler problem for a single χ2 summed over the whole ℓ range.
2.2 Model comparison
We have found the necessary correction to make for a given template. Perhaps more important is to find the preferred template, i.e. the preferred model. The Bayesian approach to model comparison is to marginalize over the parameters of a model, to find the probability of the data given the model (whatever the parameters), as this relates to the probability of the model given the data. We have been working with P(α, t), and above we discuss the specific α solutions that maximize this for a given template. We now wish to compare the likelihood of different templates, that is we require just P(t) – a marginalization over α. The standard approach to this (see e.g. Jaffe et al. 2003; Slosar & Seljak 2004) is to absorb the uncertainties of α into the covariance matrix. That is

$\tilde{C} = C + \lambda\, t\, t^{\dagger} \quad (\lambda \to \infty \;\text{for a uniform prior on}\; \alpha),$
where $C$ is the usual covariance matrix associated with the aℓms as used above. Using the Sherman–Morrison–Woodbury (Jaffe et al. 2003) formula, we find

$\tilde{C}^{-1} = C^{-1} - \dfrac{C^{-1} t\, t^{\dagger} C^{-1}}{\lambda^{-1} + t^{\dagger} C^{-1} t} \;\to\; C^{-1} - \dfrac{C^{-1} t\, t^{\dagger} C^{-1}}{t^{\dagger} C^{-1} t}.$
Assuming uniform priors on our templates, our model comparison now involves maximizing the ln χ2 likelihood above using the new expression for χ2 and the matrix $\tilde{C}$, namely χ2= χ2d−Γ with $\Gamma = |t^{\dagger} C^{-1} d|^{2} / (t^{\dagger} C^{-1} t)$. In Slosar & Seljak (2004), there is a discussion of how this method of marginalizing over α inside the covariance matrix is equivalent to ignoring the data that correlate with the template when working out the χ2.
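Because the Sherman–Morrison–Woodbury correction reduces to the scalar Γ, the marginalized χ2 can be evaluated without ever building the modified covariance matrix. A minimal sketch, assuming real-valued data and hypothetical names:

```python
import numpy as np

def marginalised_chi2(d, t, C):
    """Effective chi^2 after marginalising over the template amplitude with
    a uniform prior: chi2_eff = chi2_d - Gamma, where
    Gamma = (t^T C^-1 d)^2 / (t^T C^-1 t) >= 0."""
    Cinv_d = np.linalg.solve(C, d)
    Cinv_t = np.linalg.solve(C, t)
    gamma = (t @ Cinv_d) ** 2 / (t @ Cinv_t)
    return d @ Cinv_d - gamma, gamma
```

Since Γ is non-negative, the effective χ2 can only decrease, which is the heart of the template-selection rule discussed below.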
The likelihood function above has its maximum at χ2=D. The effective χ2 from a template is χ2t= χ2d−Γ, and we see it is always less than χ2d. Therefore, if the initial χ2 of the data is low, χ2t < χ2d < D, the preferred template is that which maximizes the χ2 (minimizes Γ) so as to bring it as close to χ2d as possible, the optimal solution being Γ= 0. Conversely, if the data have a high χ2, our formalism reduces to that of Jaffe et al. (2005), where we maximize Γ so as to reduce the χ2 to as close to D as possible.
3 THE TEMPLATES
We consider two prototypes of violation of isotropy and homogeneity: Bianchi models and plane-wave inhomogeneous models.
3.1 Bianchi models
Bianchi models are well-known homogeneous, anisotropic generalizations of Friedmann–Robertson–Walker models. They may be used as a competitor to standard cosmology, constraining violations of isotropy while still assuming homogeneity. In these models, the CMB is never isotropic, even before fluctuations are added on. Of particular interest are Bianchi VIIh models, for which an asymmetric spiral pattern is imprinted upon the sky. These models were extensively examined in Jaffe et al. (2005) with reference to asymmetries in the power spectrum and anomalous hotspots. We have reproduced these results with our programmes. One of the deficiencies of the fits found in Jaffe et al. (2005) (recognized by the authors) is that they require Ω≠ 1. This would almost certainly lead to a bad fit at high ℓ even if the large-angle corrected map is improved. For this reason, here we will restrict ourselves to Ω= 1 models.
We follow the parametrization of the Bianchi VIIh model from Barrow et al. (1985). These parameters are $x = \sqrt{h/(1-\Omega)}$, handedness and the Euler angles (φ, θ, ψ). We refer the reader to Barrow et al. (1985) for a fuller explanation. We confine ourselves to the limiting Bianchi VII0 model with Ω→ 1 (h→ 0), and explore the range x∈[0.1, 10], with both left and right handedness, over the total range of Euler angles. (Note that for this Bianchi VII0 model x remains finite, and is no longer equal to the above expression.)
3.2 Plane-wave cosmology
Inhomogeneous cosmologies are either very simple or very complicated (Krasinski 1997). We consider a model in which a plane wave in the gravitational potential Φ is part and parcel of the background model (i.e. the unperturbed, zeroth-order cosmological model, before the standard Gaussian isotropic scale-invariant adiabatic fluctuations are added to it).
An outstanding wave might be the result of several processes. It could be for example the hallmark of non-trivial topology (Starobinsky 1993; Roukema et al. 2004; Weeks et al. 2004; Hipolito-Ricaldi & Gomero 2005). Non-trivial topologies may be ruled out on the grounds that they would imprint repeat patterns in the CMB sky, e.g. matching circles in antipodal locations for the most basic ‘slab topology’. This constraint is only applicable if the fundamental domain is smaller than the last scattering surface.
Larger-than-the-horizon domains have two effects upon the density plane waves riding the homogeneous model. First, modes must fit into the domain, leading to a discretization. However, they also introduce a scale in the problem, the size of the domain, L. Hence, the usual naive arguments for scale invariance no longer apply and k3Φ2 may actually be a function of kL without introducing an arbitrary scale in the problem. We speculate that even if L is sufficiently large that one may neglect mode discretization inside the horizon, the fundamental mode k = 2π/L will be very intense, and dominate over whatever underlying (scale-invariant) Gaussian process. Interestingly, it was shown in Hipolito-Ricaldi & Gomero (2005) that multipole alignments like those observed (de Oliveira-Costa et al. 2004; Schwarz et al. 2004; Land & Magueijo 2005b) may be produced in certain non-trivial topologies, although an ‘axis of evil’ is not generally seen in a slab topology (Cresswell et al. 2005). Further, for a T1 topology we can expect a symmetry plane to exist in the CMB (Starobinsky 1993), which may be similar to the observations reported in Land & Magueijo (2005e). This remains to be seen.
Another context in which strong long-wavelength modes have been studied is the work of Kolb, Matarrese & Riotto (2005). Here (controversial) claims were made that super-horizon-scale fluctuations may be causing the acceleration of the Universe. It would be interesting to investigate further the exact length/intensity ratio required for such a model to work (should it work at all). The impact upon the CMB would then fall under the present study.
Regardless of these two possible motivations, it is interesting to examine the evidence for one such wave in the CMB, and it is this more phenomenological approach that will be followed in our paper. We will search for evidence of a dominating k mode with wavelength λ∈[0.5, 10] in units of the diameter of the last scattering surface. Our parameters for this model are the wavelength λ, the phase ρ and the Euler angles (φ, θ, ψ) (we actually do not need to concern ourselves with the first rotation φ about the z-axis as the wave is cylindrically symmetric).
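As a rough illustration of the resulting ‘stripy’ pattern (ignoring the transfer to CMB temperature and any normalization, which the real template requires), the phase of such a wave evaluated on the last scattering sphere could be sketched as follows; the function name, coordinate conventions and normalization are illustrative assumptions:

```python
import numpy as np

def wave_template(nhat, lam, rho, khat):
    """Stripy plane-wave pattern on the last-scattering sphere.
    nhat : (N, 3) unit line-of-sight vectors;
    lam  : wavelength in units of the LSS *diameter*;
    rho  : phase; khat : unit wave vector.
    With the diameter set to 1, a point on the sphere sits at radius 1/2,
    so the accumulated phase along khat is 2*pi*(nhat.khat)/2/lam."""
    return np.cos(np.pi * (nhat @ khat) / lam + rho)
```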
We use full sky maps, specifically we look at the cleaned map of Tegmark, de Oliveira-Costa & Hamilton (2003), as this map has been shown to have interesting features (Land & Magueijo 2005b). In our covariance matrix, we include only the theoretical power spectrum terms from WMAP (Bennett et al. 2003a), and ignore noise (we are considering only the low ℓs). We include a 24° beam, and consider the ℓ range as 2–5 (32 degrees of freedom). For this ℓ range, this map finds a χ2d= 27.06 < 32, and so the formalism outlined in Section 2 requires that we select the template that minimizes Γ.
We use this ℓ range for two reasons. First, two significant and different anomalies have been seen in these scales, a preferred direction and low power. Secondly, this is the relevant ℓ range for the templates we are examining. Considering an ℓ range not covered by a template is pointless when the purpose is to assess the effect a template will have on the map.
We then minimize Γ over the range of parameters for both template types. It turns out that in neither case do we find one clear optimal solution: there are numerous templates that find Γ≈ 0 within the limitations of computational accuracy and the rotation grid. This is due to the small ℓ range, but also due to the fact that the observed data are such a bad fit, with a very low χ2. Note that we are limited to this narrow ℓ range by our choice of models as this is where their power lies.
Given the degeneracy of our solution, we have to impose an additional condition to select the preferred templates. Considering that the full (unsolvable) problem above requires matching each χ2ℓ to its expected value, we may break the degeneracy by evaluating the χ2ℓ and defining Δ as the overall deviation of the χ2ℓ from their expected values.
By minimizing Δ, we find the solution of the abridged problem that is closest to the solution of the full problem. Then, the corrected map will have the power spectrum closest to the expected power spectrum line. This also breaks the degeneracy between the two α solutions α± associated with each template.
We find that our preferred Bianchi template has x= 1.4 with left-hand vorticity, Euler angles (φ, θ, ψ) = (−82, 72, −62) and therefore an axis in the direction (l, b) = (−62, 18) in galactic coordinates. We find the preferred large wave template has λ= 0.96 and ρ= 44°, in the direction (l, b) = (−38, 30) [equivalently a wave in the opposite direction with (l, b) = (142, −30) and phase π–ρ]. That is, the preferred wave is just slightly smaller than the size of the CMB.
We examined the ‘axis of evil’ behaviour of the corrected maps (Land & Magueijo 2005b), and the results are in Table 1 and Fig. 1. We define a direction for each multipole by finding the m and the direction n that maximize

$r_{\ell}(m, \mathbf{n}) = \dfrac{C_{\ell m}(\mathbf{n})}{\sum_{m'} C_{\ell m'}},$
where Cℓ0= |aℓ0|2, Cℓm= 2|aℓm|2 for m > 0 (note that two modes contribute for m≠ 0), the aℓm being evaluated in the frame whose z-axis points along n, so that Σm Cℓm= (2ℓ+ 1)Cℓ. This finds the axis for which a multipole is most dominated by just one m mode. As can be seen in Table 1, the uncorrected map finds similar directions for the ℓ= 2, 3, 4, 5 multipoles (as reported in Land & Magueijo 2005b) – a significant departure from isotropy. We see that the corrected maps find no such anisotropic behaviour. The average inter-ℓ angles found are perfectly consistent with isotropy as compared to simulations.
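The single-frame ingredient of this statistic is easy to sketch; the full estimator also maximizes over rotations of the z-axis, which requires Wigner rotations of the aℓm and is omitted here. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def m_concentration(alm):
    """Fraction of a multipole's power in its dominant m mode, in the
    given frame.  alm: complex array a_{l,m} for m = 0..l (the m >= 0 half;
    reality of the map is assumed).  Uses C_{l0} = |a_{l0}|^2 and
    C_{lm} = 2|a_{lm}|^2 for m > 0 (two modes contribute for m != 0)."""
    Clm = 2.0 * np.abs(alm) ** 2
    Clm[0] = np.abs(alm[0]) ** 2
    return Clm.max() / Clm.sum()
```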
|ℓ||Before n||m||Bianchi corrected||Wave corrected|
In Figs 2 and 3, we plot the map before and after the correction, only showing the multipoles ℓ= 2–5. In Fig. 4, we plot the much-examined quadrupole and octopole before and after the corrections. The alignment (Bielewicz et al. 2004; de Oliveira-Costa et al. 2004; Schwarz et al. 2004) can clearly be seen in the before maps and not in the after maps; the anomalously low power is also increased (see Fig. 5).
We have therefore proved that all known large-angle anomalies may be coupled. There are models providing templates which can correct, in one go, all known anomalies – the low power and the ‘axis of evil’ effect. Whether or not the evidence for these models is high is another matter, which we now proceed to examine.
There are many ways to quantify the significance of the final correction. As already noted in Section 1, the expected power spectrum of a map will be higher than that of a purely Gaussian map if we assume the presence of a template. Thus, when considering the low-ℓ low power, if we take into account the presence of a template, then we actually make the situation worse. We consider simulations of Gaussian maps with the added template, for both the Bianchi and the wave models. We calculate the power spectrum estimator for 5000 simulations and indeed we find that, without adding a template, only 2.6 per cent of simulations find a C2 lower than that observed by WMAP. With the Bianchi template added, this reduces to 1.2 per cent, thus making the fit worse.
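The Gaussian part of such a Monte Carlo test can be sketched as follows. This is an illustrative version only, assuming a single theoretical C2, no beam, no mask and no template; it is not the paper's exact pipeline:

```python
import numpy as np

def frac_low_quadrupole(C2_theory, C2_obs, nsim=5000, seed=0):
    """Fraction of Gaussian simulations whose quadrupole estimator
    C2_hat = sum_m |a_2m|^2 / 5 falls below an observed value."""
    rng = np.random.default_rng(seed)
    sig = np.sqrt(C2_theory)
    # 5 real degrees of freedom for l = 2: a20 real, a21 and a22 complex
    a20 = rng.normal(0.0, sig, nsim)
    a2m = rng.normal(0.0, sig / np.sqrt(2), (nsim, 2, 2))  # re/im of m = 1, 2
    C2_hat = (a20**2 + 2.0 * (a2m**2).sum(axis=(1, 2))) / 5.0
    return np.mean(C2_hat < C2_obs)
```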
However, in apparent contradiction to this we have a map that, when corrected for the preferred template, finds excellent consistency with the theoretical power spectrum. Therefore, we ask a different question here. We allow 1000 simulations to find their best template, for both model types. We record the χ2d before the correction, and the α, χ2t, Δ of the correction, and we only look at those simulations that find a low initial χ2d (525 of our 1000 simulations). We can now assess the significance of our results, given the prior assumption of a destructive alignment between the Gaussian process and the template.
We find that the simulations return similar results to the WMAP map. That is, we looked at the range of α values returned, and at various functions of (α, χ2t, Δ, χ2d), and found our WMAP result not to be particularly anomalous compared to the simulation results. This was the case for both the Bianchi and the wave templates.
We therefore must conclude that the detections are not significant. However, we find the method still of interest, and note that it can be used to investigate new models of contamination in the future. We give one example of a possible improvement.
Qualitatively, the crux of the issue lies in the fact that we must rely on a chance anti-alignment of phases between template and Gaussian process. We do not see power in some m modes because it was there both in the template and in the Gaussian process, and the two added destructively. This will always sound suspicious, and we suggest that models might be set up where the underlying non-Gaussian process influences the production of Gaussian fluctuations so that the two processes are anticorrelated. Specifically, we could have set up a likelihood where the χ2 functions for template and Gaussian process add (rather than their aℓm). However, such work is beyond the scope of this paper.
We found templates from two different models that can simultaneously explain the observed anisotropic alignments of the low-ℓ multipoles and their low power. The corrected maps show power spectra vastly more consistent with the standard ΛCDM theoretical power spectrum (Bennett et al. 2003a) than the uncorrected map. The scope of our work was twofold.
First, we proposed a formalism for dealing with templates in the presence of data which have a low χ2. In this approach, we promote the χ2 to the status of the relevant function of the data for comparing theory and data. This is in contrast to the usual aℓms, and reduces the number of degrees of freedom. Furthermore, the likelihood problem is then turned on its head in that it enforces the strong prior that a Gaussian process with a power known a priori must be present in the corrected maps. Thus, this approach generalizes the previous one in that it reduces to it for cases where the data χ2 is too high, but leads to significantly different results when the observed χ2 is too low.
Note that standard methods for template correction will lead to a corrected map with a lower χ2. This is because it is more probable that the presence of a template will increase the power than remove it. We are therefore always on the back foot when trying to use a template to explain a low power spectrum (see Slosar & Seljak 2004; Slosar et al. 2004, for a similar discussion). We note that as the current data are such a bad fit, most of our templates (∼90 per cent) improved the C2 fit before we even imposed the second condition of minimizing Δ.
The second purpose of our paper was to use this formalism to investigate whether the low power (i.e. low χ2) observed in the low multipoles and the axis of evil effect might be related to each other, and whether both could be due to an underlying anisotropic or inhomogeneous model of the Universe. The axis of evil effect consists of the fact that for a given z-axis orientation the power in a given multipole is not distributed ‘at random’ among the various m modes but concentrates on planar modes (m =±ℓ) for ℓ= 2, 3; furthermore, the axis for which this happens is roughly the same for ℓ= 2, …, 5. A priori the two effects should be coupled because the Cℓ is an average over m of the variance |aℓm|2. It looks as if some m modes are missing their power, and that if this were reinstated the low Cℓ and the ‘axis of evil’ anomalies would disappear. This might happen after correcting the nefarious influence of a deterministic template, and it suggests that the template should have a strong non-planar component along the preferred direction.
We considered Bianchi models and also inhomogeneous models, where a very strong long-wavelength wave is a part of the background model. We find the preferred parameters for these models and correct the maps. The corrected maps are free of anomalies; in particular, their quadrupole and octopole are not planar and their intensities are not low. Therefore, we stress that although the ‘template’ detections are not found to be statistically significant, they do correct statistically significant anomalies.
While this paper was being finished two preprints appeared proposing explanations for the axis of evil effect (Gordon et al. 2005; Vale 2005). One of these is rather similar in spirit to ours (Gordon et al. 2005): the idea that a long-wavelength mode is imprinted in the sky. Even though our statistical treatment is rather different, both papers highlight the same difficulty with such explanations: that in general waves would add, rather than subtract power. Our suggested explanation (that an anticorrelation might exist between the Gaussian and the deterministic process) finds its counterpart in the model considered in Gordon et al. (2005) in the concept of multiplicative non-linear response. We find this idea very interesting, but as with our idea of Gaussian/template anticorrelation, we stress that it may not be necessary. The fit between the data and the pure Gaussian model is so bad that a fluke anti-alignment of phases is already a good enough explanation (even though clearly the Bayesian evidence for more complex models will always be superior).
We are also examining the possible correlations between the CMB anomalies and the large-scale structure and defer to a future publication (Lahav et al., in preparation) comments on Vale (2005).
We thank Andrew Jaffe, Carlo Contaldi, Anthony Banday, João Medeiros, Tesse Jaffe, Kris Górski and Max Tegmark for helpful comments. Our calculations made use of the healpix package (Górski, Hivon & Wandelt 1998) and were performed on COSMOS, the UK cosmology supercomputer facility. KL is funded by PPARC.