Blind foreground subtraction for intensity mapping experiments

We make use of a large set of fast simulations of an intensity mapping experiment with characteristics similar to those expected of the Square Kilometre Array (SKA) in order to study the viability and limits of blind foreground subtraction techniques. In particular, we consider different approaches: polynomial fitting, principal component analysis (PCA) and independent component analysis (ICA). We review the motivations and algorithms for the three methods, and show that they can all be described, using the same mathematical framework, as different approaches to the blind source separation problem. We study the efficiency of foreground subtraction both in the angular and radial (frequency) directions, as well as the dependence of this efficiency on different instrumental and modelling parameters. For well-behaved foregrounds and instrumental effects we find that foreground subtraction can be successful to a reasonable level on most scales of interest. We also quantify the effect that the cleaning has on the recovered signal and power spectra. Interestingly, we find that the three methods yield quantitatively similar results, with PCA and ICA being almost equivalent.


INTRODUCTION
The last few decades have seen a radical improvement in the quantity and quality of observational data that has transformed cosmology into a fully-fleshed data-driven scienceso much so that it has now become common to refer to the current status of the field as the "era of precision cosmology". The vast majority of these data come from two very distinct regimes, both observationally and physically: on the one hand, observations of the cosmic microwave background (CMB), emitted in the early Universe (z ∼ 1000), have allowed us to measure the values of most cosmological parameters with astonishing precision (Hinshaw et al. 2013;Planck Collaboration et al. 2013). On the other, astronomical observations in the optical range of wavelengths have supplied a wealth of data with which we have been able to characterise the late-time (z 1) evolution of the Universe and to make one of the most puzzling discoveries in cosmology: its accelerated expansion (Riess et al. 1998; Perlmutter * E-mail:david.alonso@astro.ox.ac.uk et al. 1999). In this regime, galaxy redshift surveys (Anderson et al. 2014;Contreras et al. 2013) have allowed us to draw a plausible picture describing the evolution of the primordial density perturbations observed in the CMB to form the non-linear large-scale structure (LSS) that we observe today.
There exist, however, multiple open questions in cosmology, the study of which would greatly benefit from observational data in the remaining intermediate range of redshifts. For instance, one of the most important stages in the evolution of the Universe, the epoch of reionisation (EoR) and the formation of the first stars and galaxies, must occur at z ∼ 10, but its precise nature remains unknown due to the lack of observations . Furthermore, being able to observe the distribution of matter out to redshift z 3 − 4 would make a very large volume available for LSS analyses, which could significantly improve our understanding of the process of structure formation, as well as opening the field to the study of ultralarge scales, which contain substantial information about the underlying gravitational theory and the statistics of the arXiv:1409.8667v1 [astro-ph.CO] 30 Sep 2014 primordial density field (Hall et al. 2013;Camera et al. 2013). Radio-astronomical observations offer an excellent tool in this regard through the measurement of the 21cm signal from neutral hydrogen (HI). Measuring the intensity of this line emission, due to the hyperfine spin-flip transition (ν21 1420 MHz), as a function of frequency constitutes an ideal way to trace the HI density field as a function of redshift z = ν21/ν − 1.
While the potential scientific value of these techniques is enormous, the detection of the HI signal faces several observational challenges. After reionisation, most of the neutral hydrogen in the Universe is thought to reside in high density regions inside galaxies, in what are known as damped Lyman-α systems, where it is shielded from the ionising UV photons. Unfortunately, the HI emission from individual galaxies is usually too weak to be measured efficiently, and thus a technique known as intensity mapping has been advocated (Battye et al. 2004;McQuinn et al. 2006;Chang et al. 2008;Mao et al. 2008;Pritchard & Loeb 2008;Peterson et al. 2009;Bagla et al. 2010;Seo et al. 2010;Lidz et al. 2011;Ansari et al. 2012;Battye et al. 2013; Barkana & Loeb 2005;Masui et al. 2013;Switzer et al. 2013;Bull et al. 2014), in which the combined emission from all sources in wide angular pixels is observed instead. While small angular scales are lost through this method, it is possible to retain the larger scales of most cosmological relevance, such as the baryon acoustic oscillation (BAO) scale. Thus intensity mapping constitutes a promising method to efficiently observe the distribution of matter in the Universe on cosmological scales.
The success of intensity mapping relies on being able to separate the cosmological HI signal from that of foreground radio sources emitting in the same frequency range, however. The most important of these sources, synchrotron emission from our own galaxy, is ∼ 5 orders of magnitude larger than the expected 21cm signal, even at high galactic latitudes, and so the problem of foreground subtraction is of prime importance. This topic has been widely covered in the literature (Di Matteo et al. 2002;Oh & Mack 2003;Santos et al. 2005;Morales et al. 2006;Wang et al. 2006;Gleser et al. 2008;Jelić et al. 2008;Bernardi et al. 2009Bernardi et al. , 2010Jelić et al. 2010;Moore et al. 2013), mostly for the EoR regime, and several foreground removal techniques have been proposed (Liu et al. 2009;Liu & Tegmark 2011;Masui et al. 2013;Wolz et al. 2014;Shaw et al. 2013Shaw et al. , 2014. Fortunately, as opposed to the cosmological signal, most relevant foregrounds have a very smooth frequency dependence, a fact that can be exploited to subtract them efficiently. Nevertheless, their large amplitude makes a thorough study of the consequences of foreground cleaning on the recovered cosmological signal indispensable.
Foreground removal techniques can be classified as being either blind, if they only make use of generic foreground properties like spectral smoothness, or non-blind, if a more detailed model of the foregrounds is required. In this paper we have used a large set of simulations of both the cosmological signal and several different foregrounds to study and compare the performance of different blind foreground subtraction methods. The paper is structured as follows: in Section 2 we briefly describe the physics of the HI cosmological signal and the most relevant foregrounds for intensity map-ping. Section 3 covers the mathematics of blind foreground cleaning within a unified formalism. Section 4 contains the details of the simulations used in this analysis, as well as the criteria used to compare the different methods. Finally, Section 5 presents the results of this study regarding the performance and limits of blind foreground cleaning, and Section 6 summarises our findings.

INTENSITY MAPPING AND ITS FOREGROUNDS
The emissivity (luminosity per unit frequency and solid angle) of a cloud of neutral hydrogen due to the 21cm line is given, in its rest frame, by where hp is Planck's constant, A21 = 2.876 × 10 −15 Hz is the Einstein coefficient corresponding to the 21cm line, N2 is the number of atoms in the excited state, NH is the total number of HI atoms and ϕ(ν) is the normalised line profile. From this result it is easy to calculate the intensity measured by an observer in the lightcone and hence the brightness temperature along a particular directionn in the Rayleigh-Jeans approximation (Abdalla & Rawlings 2005): T (z,n) = 3 hpc 3 A21 32πkBν 2

21
(1 + z) 2 H(z) nHI(z,n) where nHI ∝ (1 + δHI) is the comoving number density of HI atoms, kB is Boltzmann's constant and xHI is the fraction of the total baryonic mass in HI. Note that to reach this result we have neglected scattering and self-absorption of the emitted photons, as well as any relativistic effects other than the Hubble expansion. The observed HI signal is also affected by redshift-space distortions (Kaiser 1987) in the same way as galaxy number counts are (Hall et al. 2013). The most relevant foregrounds for intensity mapping can be classified as being extragalactic (caused by astrophysical sources beyond the Milky Way) or galactic. In terms of amplitude, the largest foreground by far is galactic synchrotron emission (GSE), caused by high-energy cosmicray electrons accelerated by the Galactic magnetic field. Although at the lowest frequencies the GSE spectrum is damped by self-absorption and free-free absorption, for the relevant radio frequencies it should be possible to describe it by a single power law TGSE ∝ ν β , with a spectral index β(n) ≈ −2.8 (Dickinson et al. 2009;Delabrouille et al. 2013). GSE is expected to be partially linearly polarised, and so its polarised part will undergo Faraday rotation as it travels through the Galactic magnetic field and the interstellar medium (Rybicki & Lightman 1986;Waelkens et al. 2009). This effect gives polarised GSE a non-trivial and nonsmooth frequency dependence, which would make it a very troublesome foreground if leaked into the unpolarised signal (Jelić et al. 2010;Moore et al. 2013;De & Tashiro 2014).
Another source of foreground emission within our Galaxy is the so-called free-free emission, caused by free electrons accelerated by ions, which traces the warm ionised medium. As in the case of GSE, free-free emission is predicted to be spectrally smooth in the relevant range of frequencies (Dickinson et al. 2003). All other known potential galactic foregrounds, such as anomalous microwave emission or dust emission, should be negligible below ∼ 1 GHz.
Extragalactic radio sources can be classified into two different categories: bright radio galaxies, such as active galactic nuclei, and "normal" star-forming galaxies. While the distribution of the former should be Poisson-like, the clustering of the latter should be significant, which has a direct impact on the degree of smoothness of their joint emission (Santos et al. 2005;Zaldarriaga et al. 2004). Note that radio sources have been observed up to very high redshifts (z ∼ 5), and therefore many of them will be physically behind the cosmological signal in the case of intensity mapping. However we will refer to all sources of radio emission that need to be separated from the cosmological signal as "foregrounds", trusting that the use of this term will not cause any confusion.
Other possible foreground sources are atmospheric noise, artificial radio frequency interference (which we discuss in section 5.3) and line foregrounds, caused by line emission from astrophysical sources in other frequencies. Due to the spectral isolation of the 21cm line, together with the expected low intensity of the most potentially harmful lines (such as OH at νOH ∼ 1600 MHz), the HI signal should be very robust against line confusion.

BLIND FOREGROUND SUBTRACTION
As has been noted above, the most relevant foregrounds for intensity mapping should be correlated (i.e. smooth) in frequency. Blind foreground subtraction methods make use of this property, modelling the sky brightness temperature along a given direction in the skyn and at a frequency ν as Where N fg is the number of foreground degrees of freedom to subtract, f k (ν) are a set of smooth functions of the frequency, S k (n) are the foreground sky maps and Tcosmo and Tnoise are the cosmological signal and instrumental noise components.
For a particular line of sightn measured in a discrete set of Nν frequencies, this model can be written as a linear system: where xi = T (νi,n), A ik = f k (νi), s k = S k (n) and we have grouped the cosmological signal and thermal noise in a single component ri = Tcosmo(νi,n) + Tnoise(νi,n). The problem of foreground subtraction then boils down to devising a method to determineÂ and s so that r = x −Âs is recovered as accurately as possible. The three methods described below correspond to three different approaches to this problem.

Polynomial fitting
Probably the most intuitive (and naïve) way of approaching blind foreground subtraction is to choose an ad-hoc basis of smooth functions f k that we think can describe the foregrounds and then find the foreground maps s k by leastsquares fitting the model in Eq. 4 to each line of sight. This is done by minimising the χ 2 , whereN is the covariance matrix of r. The solution is In this analysis we have neglected the correlation between different frequency bands and therefore we have assumed a diagonal covariance Nij = σ 2 i δij 1 . It is important to note that the deviations from the smooth foregrounds are due not only to thermal noise, but also to the cosmological clustering signal. For this reason, in order to optimally fit for the foregrounds we have used the joint variance of both components: Evidently it is not possible to compute σcosmo from the cosmological signal before subtracting the foregrounds. In our analysis we have done this by using the foreground-free maps, which wouldn't be available in a real experiment; we can propose three alternative methods, however: • Assuming a particular cosmological model it would be possible to quantify the clustering variance, at least at the linear level, as: where P (k) is the power spectrum for the assumed model and Wpix,i is the Fourier-space window function describing the beam and pixel size and the radial width for the i − th frequency bin.
• Again assuming a particular cosmological model, another possibility would be to simulate the cosmological signal and include instrumental effects to compute σ 2 i from the simulations.
• Through a multi-step analysis: first the foregrounds are subtracted using only the variance due to the instrumental noise (or even no variance weighting at all). The joint variance σi is then estimated from the cleaned maps and the foreground subtraction is repeated using this better estimate. The process can then be repeated until it converges. This method would be computationally more demanding, but it has the advantage of being model-independent.
Previous studies of foreground subtraction in intensity mapping (particularly for the EoR regime) have made use of this technique in log-log space, i.e. using the model: Following what has been done by other groups (Wang et al. 2006;Ansari et al. 2012), we used polynomials as basis functions in this study, i.e. f k (log(ν)) = [log(ν)] k−1 . Note that although very similar in spirit, this procedure does not adhere exactly to the model in Eq. 4, since the fitting is done in log-log space. This also implies that the fit weights σ −2 i must be translated from linear space:

Principal component analysis
Principal component analysis (PCA) applied to foreground subtraction seeks to make use of the main properties of the foregrounds (namely their large amplitude and smooth frequency coherence) to find both the foreground components s k and an optimal set of basis functions A ik at the same time. Here we present the motivation for PCA and its connection to the main equation for blind foreground subtraction (Eq. 4).
It is easy to prove that the frequency-frequency covariance matrix of any component that is very correlated in frequency will have a very particular eigensystem: most of the information will be concentrated in a small set of very large eigenvalues, the other ones being negligibly small. As an extreme case, consider a completely correlated component, whose covariance matrix is Cij = 1 (∀ i, j ∈ [1, Nν ]), and whose eigenvalues are λ1 = Nν , λi = 0 i > 1. Thus we can attempt to subtract the foregrounds by eliminating from the measured temperature maps the components corresponding to the eigenvectors of the frequency covariance matrix with the N fg largest associated eigenvalues. The explicit method is as follows: (i) Compute the frequency covariance matrix from the data by averaging over the available N θ lines of sight: T (νi,nn)T (νj,nn).
(ii) Diagonalise the covariance matrix: Where λi > λi+1 ∀i are the eigenvalues ofĈ, andÛ is an orthogonal matrix whose columns are the corresponding eigenvectors.
(iii) At this stage we identify N fg eigenvalues corresponding to the foregrounds as those that are much larger than the rest. Depending on the frequency structure of the foregrounds and the different instrumental effects this number will be more or less evident (see the discussion in section 5.1). We then build the matrixÛ fg from the columns ofÛ corresponding to these eigenvalues and model the brightness temperature for each line of sight as which is analogous to Equation 4. The foreground maps s are then found by projecting x on the basis of eigenvectors ofĈ, yielding: Note that, sinceÛ is orthogonal,Û T fgÛfg = 1, and, except for the presence of the covarianceN, Equations 13 and 14 are equivalent to Equations 4 and 6.
In the presence ofN, the frequency covariance must be computed using an inverse-variance scheme: which is equivalent to carrying out the steps above using the weighted maps xi = T (νi)/σi and multiplying by σi in the end to obtain the de-weighted temperature maps. It is easy to see that doing this we introduce the missingN −1 factors that make the equivalence between Equations 14 and 6 complete.
We thus see that the PCA method is in fact equivalent to the polynomial fitting method, with the set of basis functions A ik given by the data themselves in the form of those that that contain most of the total variance (i.e. the principal eigenvectors of the covariance matrix).
Finally, let us note that in a real experiment the situation will be more complicated, for example due to the presence of correlated instrumental noise. This problem was addressed in the pioneering analysis done by the Green Bank Telescope team (Masui et al. 2013;Switzer et al. 2013) by computing the frequency covariance matrix through a cross correlation of temperature maps corresponding to different seasons. However this yields a non-symmetric covariance matrix, and therefore its singular value decomposition (SVD) must be used, instead of the PCA. The simulations used for this analysis, however, do not contain correlated noise, and hence we will not worry about these complications. We leave the analysis of such instrumental issues for future work.

Independent component analysis
Independent component analysis (ICA) tries to solve the blind equation (Eq. 4) under the assumption that the sources s are statistically independent from each other (P (s) = i Pi(si)). Explicitly this is enforced by using the central limit theorem, according to which, if x is made up of a linear combination of linearly independent sources, its distribution should be "more Gaussian" than those of the independent sources. Therefore we can attempt to impose statistical independence by maximising any statistical quantity that describes non-Gaussianity.
FastICA (Hyvärinen 1999) is a relatively popular and computationally efficient algorithm to apply ICA to a general system. Here we will outline the operations carried out by FastICA as well as its similarities with PCA. As before, we label the brightness temperature measured in different frequencies along a given line of sightn by a vector x, however the reader must bear in mind that FastICA is provided with a number of samples of x (i.e. different pixels), which it uses to compute the expectation values in the equations below.
FastICA begins by "whitening" the data x. This implies first of all performing a full PCA analysis of the data, decorrelating them and sorting them by the magnitude of the covariance eigenvalues. The uncorrelated variables are then divided by their standard deviations so as to impose a unit variance on all of them (this is done in order to simplify the subsequent steps in the algorithm). We are then left with the corresponding equation for the whitened data where as beforeÛ andΛ are the eigenvectors and eigenvalues of the data covariance matrix. Note that although the data have been whitened, FastICA still preserves their PCA order. The problem is further simplified by requiring that the sources be also "white" (i.e., uncorrelated and unit variance), since that implies thatÂ must be orthonormal.
FastICA then tries to find the independent components by inverting Equation 16: and finding the rows ofŴ by maximising the non-Gaussianity of the individual components. FastICA parametrises the level of non-Gaussianity in terms of the negentropy J(y) = H(yG) − H(y), where H(y) is the entropy of the variable y and yG is a unit-variance Gaussian random variable. Since for all variables with the same variance the entropy is maximal for a Gaussian variable, the negentropy J is always positive for non-Gaussian variables. However, computing the negentropy requires an intimate knowledge of the probability distribution, which in general we lack. For this reason FastICA uses an approximation to J given by where (ki) are positive constants, · θ denotes averaging over all available samples (i.e. pixels) and Gi is a set of nonquadratic functions. FastICA makes use of two functions in particular 2 : The rows ofŴ are found as follows: (i) First an initial unit random vector w is chosen as one of the rows ofŴ and its projection on the data y ≡ w T x is computed.
(ii) The correct direction of w is found by maximising the negentropy of y. This is done by an iterative Newton-Raphson algorithm (renormalising the modulus of w after each step).
(iii) The process ends when a given Newton-Raphson iterations yields a vector that is close enough to the one from the previous step.
Further details regarding the algorithm and its relation to other existing ICA methods can be found in Hyvärinen (1999); Hyvärinen & Oja (2000) It is important to note that the second step in FastICA's algorithm (namely, the maximisation of the negentropy) will be unable to separate Gaussian foreground components beyond the initial PCA, since their negentropy is identically 0. This is simply a consequence of the fact that ICA is equivalent to PCA for the case in which all the sources are Gaussian 3 .
Unfortunately, as described in section 4.1, the foreground maps corresponding to point sources and free-free emission were generated as Gaussian random fields in our simulations, and therefore we will not be able to appreciate the full power of ICA in removing those. However, the bulk of the galactic synchrotron emission (which is the most relevant foreground for intensity mapping) was extrapolated from the Haslam map (Haslam et al. 1982), and is therefore non-Gaussian. Thus we should still be able to evaluate the impact of ICA methods in foreground removal.

Summary
As we have shown in the previous three sections, the three blind methods under study in this paper can be understood as three different ways of approaching the least-squares problem of Equations 4 and 6, each following different criteria in order to find the basis functionsÂ.
• Line-of-sight (polynomial) fitting imposes an adhoc set of basis functions that, we think, should be able to describe the foregrounds accurately enough.
• PCA findsÂ by requiring that the sources should be uncorrelated and account for the majority of the total variance in the data.
• ICA, like PCA, maximises the total variance but instead of uncorrelatedness it imposes statistical independence. Since these two properties are equivalent for Gaussian variables, PCA and ICA are mathematically equivalent when all the sources are Gaussian.

METHOD
In order to compare the three blind methods discussed in the previous section we have tested them on a suite of 100 fast simulations of the cosmological signal and most relevant foregrounds. In this section we will describe the characteristics of these simulations as well as the criteria used to compare the different methods.

The simulations
The simulations used here were generated using the publicly available code presented in Alonso et al. (2014) 4 . The code generates random realisations of the cosmological signal and most relevant foregrounds, and includes a basic set of instrumental effects.
The cosmological signal. We generate the cosmological signal by producing a three-dimensional Gaussian realisation of the matter density and velocity fields in a cubic grid. The Gaussian density field is transformed into a more physically motivated one using the log-normal transformation (Coles & Jones 1991). During this step we also apply a linear redshift-dependent bias and lightcone evolution based on linear perturbation theory to the overdensity field. This field is interpolated onto a set of temperature maps at different frequencies (i.e. redshifts) using Equation 2 At this point, redshift-space distortions are included using the radial velocity field. For these simulations we used the parametrisation xHI = 0.008 (1 + z) for the neutral hydrogen fraction, based on the observations of Zwaan et al. (2005) and Wolfe et al. (2005), and for simplicity we assumed that HI is an unbiased tracer of the density field (δHI = δ). Further details regarding this method can be found in Alonso et al. (2014).
A box of size L box = 8240 Mpc/h was used with a Cartesian grid of 2048 cells per side. This way we are able to generate full-sky maps to redshift zmax = 2.55 with a spatial resolution of ∆x 4 Mpc/h. The cosmological signal was generated using Planckcompatible parameters: (ΩM , Ω b , Ω k , h, w0, wa, σ8, ns) = (0.3, 0.049, 0, 0.67, −1, 0, 0.8, 0.96), and the HI temperature maps were produced in the frequency range 400 − 800 MHz (0.78 z 2.55) at intervals of δν ∼ 1 MHz, corresponding to radial separations of r ∼ 5 − 6 Mpc/h. These maps were created using the HEALPix pixelisation scheme (Górski et al. 2005) with a resolution parameter nside=512 (δθ ∼ 0.115 • ), corresponding to transverse scales 3.8 Mpc/h < r ⊥ < 8.2Mpc/h. The foregrounds. The foregrounds included in these simulations can be classified into two categories: isotropic and anisotropic. We expect that extra-galactic foregrounds, due to astrophysical sources beyond our galaxy should be isotropically distributed across the sky, while galactic foregrounds should be significantly more prominent closer to the galactic plane.
The most relevant foreground in this frequency range in terms of amplitude is unpolarised galactic synchrotron emission. In broad terms, synchrotron emission was simulated by extrapolating the 408 MHz Haslam map (Haslam et al. 1982) to other frequencies using a direction-dependent spectral index provided by the Planck Sky model (Delabrouille et al. 2013). Structure on angular scales smaller than the ones resolved in the original Haslam map and a mild frequency decorrelation was included by adding a constrained Gaussian realisation with parameters given by (Santos et al. 2005). Further details can be found in Alonso et al. (2014).
Other relevant foregrounds are point sources and freefree emission. We have modelled these as Gaussian realisations of the generic power spectra:  (2014)). As mentioned in section 3.3, ICA and PCA will only yield different results for non-Gaussian foregrounds, and therefore we will not be able to observe the advantage of using ICA over PCA for point sources and free-free emission. Any difference found between both methods will then be due only to the non-Gaussian galactic synchrotron foreground.
As has was mentioned in section 2, even though we expect the 21 cm signal to be largely unpolarised, the presence of polarised foregrounds should be a concern for intensity mapping experiments due to polarisation leakage. Since our aim in this paper is to compare the usefulness of different foreground cleaning methods, and since the amplitude of polarised foregrounds depends both on instrumental design and survey strategy, we have decided to postpone the analysis of polarised foreground removal for future work. Nevertheless, in section 5.3 we have studied the minimum degree of frequency correlation that foregrounds must have in order to be efficiently removed by blind methods.
Instrumental effects. The final observed temperature maps were generated by including the most relevant instrumental effects corresponding to a single-dish experiment (i.e. an instrument made up of a number of single-dish antennae used as a set of auto-correlators) with the parameters listed in Table 1. These are similar to those planned for the SKA-MID configuration, for which a single-dish strategy is optimal for cosmological purposes (Bull et al. 2014). Two instrumental effects were implemented: a frequency-dependent beam and uncorrelated instrumental noise.
Ideally we can parametrise the beam of such a system as being Gaussian with a simple frequency-dependent width: where D dish is the antenna diameter. For the frequency range under consideration this yields an angular resolution that ranges between 0.7 • and 1.4 • . Furthermore, we have described the instrumental noise as being Gaussian and uncorrelated, with a frequencydependent, per-pointing rms where Tinst is the instrument temperature, f sky is the observed sky fraction, δν is the frequency resolution, t obs is the total observation time, N dish is number of dishes and Ω beam = 1.133 θ 2 FWHM is the beam solid angle. For the values listed in Table 1 σN varies between 0.025 and 0.05 mK.
The process to build the full observed temperature maps is as follows: (i) We directly add the temperature maps corresponding to the cosmological signal and the different foregrounds. For computational ease and to avoid redundant calculations we then degrade the angular resolution of the total map from  Table 2. Frequency bins used for the analysis of the radial power spectrum. ∆χ is the radial comoving distance between the bin edges, and k ,max is the maximum radial wavenumber reached in that bin given the frequency resolution. For the three bins the minimum radial wavenumber k ,min ≡ 2π/∆χ is approximately 0.01 h Mpc −1 . Note that these bins were only used for the analysis of the radial power spectrum, while the foreground cleaning was done on the full frequency band (400 MHz < ν < 800 MHz).
(ii) We smooth the total temperature using a Gaussian beam with full-width-half-maximum θFWHM.
(iii) We add a random realisation of uncorrelated Gaussian noise with variance σN . Note that the per-pixel variance is similar to the one given in Equation 22, with the total number of pixels N θ substituting the factor 4π f sky /Ω beam .
At this point the observed maps should be cut to the desired survey mask. However, in order to simplify the analysis and to avoid the complication of accurately estimating the angular power spectrum of cut-sky maps, we decided to use full-sky maps for our fiducial simulations. The most important effect of an incomplete mask for foreground subtraction is a reduction in the number of pixels that can be used to determine the statistical properties of the foregrounds (e.g. estimating the covariance matrix for PCA or the negentropy for ICA). We have studied the relevance of this effect in section 5.3.4.
Finally, we note that the parameters listed in Table 1 correspond to our fiducial set of simulations. However, in order to study the effect of different instrumental configurations we have varied these parameters from their fiducial values. These variations are clearly stated in the corresponding sections below.

Clustering analysis
The most important cosmological constraints from HI intensity mapping will probably be obtained from the two-point statistics of the overdensity field, since this encapsulates the vast majority of the information regarding the underlying cosmological model. Hence the merit of a given foreground cleaning method should be judged partly in terms of its ability to recover the true power spectrum on different radial and angular scales. We have therefore defined three quantities to characterise the efficiency of the different cleaning methods: where P clean , Ptrue, Pres are the power-spectra of the cleaned maps, the true signal (cosmological signal and noise) and the residuals ("cleaned − true") respectively, and σP is the statistical error in the power-spectrum, caused both by clustering variance and instrumental noise. Thus and η quantify the bias induced by foreground subtraction on the power spectrum as a fraction of the true power spectrum and of the expected statistical uncertainties respectively, while ρ is a measure of the corresponding loss in the signal itself. A perfect foreground cleaning would yield 0 for these three quantities.
We must clarify that we have used generic term "power spectrum" here referring to both the angular power spectrum C l and the radial power spectrum (defined below), which depend on (ν, l) and (z eff , k ) respectively. This distinction is important: since the foregrounds are assumed to be extremely smooth in frequency (i.e. the radial direction) we can expect the effects of foreground subtraction to be reduced to the largest radial scales (an effect commonly known as the "foreground wedge" (Liu et al. 2014)).
It is also important to note that we have computed the ensemble averages in Equation 23 by averaging over the 100 different realisations of the cosmological signal, and the foregrounds. We believe that, given the current uncertainties in the multi-frequency description of the different relevant foregrounds, this approach is reasonably conservative in that, by averaging over foreground realisations, we effectively marginalise over these uncertainties. Consequently, the mean and variance of the quantities in Equation 23 will depend not only on the statistics of the cosmological signal, but also on the model used to describe the foregrounds.

Angular power spectrum.
Assuming a full-sky survey, for a fixed frequency shell, the angular power spectrum of the brightness temperature fluctuations ∆T is estimated by first computing their harmonic coefficients where Y lm (n) are the spherical harmonics. The angular power spectrum C l ≡ |a lm | 2 is then estimated by averaging over the symmetric index m Unfortunately this estimator is biased and non-optimal for cut-sky maps, and in fact the estimation of angular power spectra for incomplete maps is a non-trivial problem that has been widely treated in the literature (Tegmark 1997;Wandelt et al. 2001;Hivon et al. 2002). As we mentioned above, our fiducial set of simulations were generated as fullsky maps in order to avoid these complications. However we have studied the effect of an angular mask in section 5.3.4, and in this case the angular power spectra were computed using the PolSpice software package (Chon et al. 2004).

Radial power spectrum.
Because astronomical observations are performed on the lightcone, cosmological observables such as the temperature perturbation ∆T are not homogeneous in the radial direction, since they evolve in time. For this reason, the only rigorous method to analyse the clustering pattern of the overdensity field in the radial direction is indirectly through  Figure 1. Principal eigenvalues of the frequency covariance matrix for a simulation including a frequency-dependent beam (blue circles) and a constant beam (red squares), as well as for simulations containing foregrounds with different correlation lengths ξ ∈ (1, 0.05). In the first two cases there is a clear division between foreground eigenvalues and cosmological ones, although the presence of a ν-dependent beam makes the transition between the two smoother, and a larger number of foreground degrees of freedom must be eliminated. For smaller frequency correlation lengths the contribution of foregrounds spreads through a larger number of eigenvalues, making foreground contamination more troublesome.
the angular cross-correlation of maps at different redshifts (Montanari & Durrer 2012;Asorey et al. 2012). However, in practice it is possible to divide the survey into redshift shells that are wide enough to contain the most relevant radial scales and narrow enough so that we can assume a static universe at some effective redshift z eff , neglecting evolution effects. This has in fact been common practice among most galaxy redshift surveys (Anderson et al. 2014;Contreras et al. 2013), and enables much more direct, easier access to important cosmological observables. In order to obtain a direct intuition about the effectiveness of foreground subtraction in the radial direction we have followed this approach here.
We have divided our full frequency range into a number of thick redshift shells of equal bandwidth ∆ν ∼ 100 MHz, each containing the same number of frequency bins (see Table 2 for details). The location, width and number of these shells were chosen in order to avoid the edge effects described in section 5.3.2 and to include the most relevant cosmological scales. For each of these shells, every individual pixel corresponds to a different realisation of the overdensity field along the line of sight. We can therefore compute the radial power spectrum by averaging over the modulus of the Fourier transform of each line of sight where ∆χ = χ(zmax) − χ(zmin) is the comoving width of the redshift shell under consideration. We estimate the Fourier coefficients ∆T by computing the Fast Fourier Transform (FFT) of each line of sight. Since the different values of ∆T are separated by a constant frequency interval δν = 1 MHz, its FFT is calculated at constant interval in the conjugate variable corresponding to ν, δkν = 2π/∆ν. Assuming a constant effective redshift z eff for each bin, kν can be related to the radial wavenumber k through A more rigorous definition for P (k ) can be found in Alonso et al. (2014).

RESULTS
We have used 100 independent simulations as described in the previous section in order to compare the three blind subtraction methods and evaluate the ranges of scales in which the recovered maps can be reliably used for cosmology. We have studied the number of foreground degrees of freedom N fg that need to be removed (i.e. the dimension of s in Equation 4) for each method, the values of the parameters defined in Equation 23 as a function of scale and method, and the limits of applicability of these techniques. 5

Foreground degrees of freedom
By subtracting the foregrounds we are effectively removing information from the temperature maps. We should therefore be concerned about removing only the information related to the foregrounds, and minimising the loss of cosmological information. This implies minimising the number of degrees of freedom that we subtract, which ideally would correspond to the number of independent foregrounds. In our case we have included four different foregrounds (galactic synchrotron, point sources and galactic and extragalactic free-free). Additionally, the mean HI temperature is also a smooth function of the frequency, and will be inevitably removed by the foreground cleaning method. On top of this, a frequency-dependent instrumental beam will also act as an extra effective foreground, yielding a total of ∼ 6 potential foreground sources, the cleaning of which will depend critically on them being clearly distinct from the cosmological signal we hope to measure.
PCA offers an intuitive way to address this question, and to decide on the number of foreground degrees of freedom to subtract. We know that most of the information related to the foregrounds should be contained in a few of the largest eigenvalues of the frequency covariance matrix. Thus, by looking at these eigenvalues it is possible to see whether a number of them is clearly different (much larger) from the rest, and whether there exists a clear divide between foreground and cosmological eigenvalues. Figure 1 shows the first 25 principal eigenvalues for several of our simulations. One of them (blue circles) includes a frequency-dependent beam size given by Eq. (21), while another one (red squares) was generated for a constant beam size corresponding to the median frequency of the simulation. It is possible to see that in both cases there exists a clear distinction between 10 -2 10 -1 10 0 10 1 10 2 Pol. fitting Pol. fitting  Figure 2. Ratio of the power spectrum of the foreground cleaning residuals to the power spectrum of the cosmological signal as a function of the number of foreground degrees of freedom removed for the three different cleaning methods. The left and right panels show the results for the angular and radial power spectrum respectively. Each plot contains 3 sub-panels, showing the results for polynomial fitting, PCA, and ICA (in descending order). The angular power spectra shown correspond to a frequency bin 599 MHz < ν < 600 MHz, while the radial ones correspond to the second bin in Table 2. As can be seen, in most cases the efficiency of the foreground cleaning converges for N fg ∼ 6 − 7, although the distinction between both values is clearer for polynomial fitting, which attains a clear minimum for N fg = 7. This result could have been anticipated visually from Figure 1. The black lines in both plots correspond to the optimal case for the constant-beam simulation (N fg = 7 for polynomial fitting and N fg = 5 for PCA and ICA).
foreground and cosmological eigenvalues, although the presence of a frequency-dependent beam makes the transition between the two smoother. We can predict an optimal value of N fg = 5 for the constant-beam simulation, and either 6 or 7 for the ν-dependent beam using PCA.
We have studied the total number of eigenvalues to subtract by computing the ratio of the power spectrum of the residual maps to the cosmological power spectrum in one of our simulations for the three methods and for different numbers of foregrounds. Figure 2 shows this ratio for the angular (left panel) and radial (right panel) power spectra and for the three different methods (in descending order: polynomial fitting, PCA and ICA). For this plot we chose to show the results corresponding to an intermediate frequency bin 599 MHz < ν < 600 MHz for the angular case, and the second bin in Table 2 for the radial case, but similar results hold in general. As anticipated above, both PCA and ICA converge for N fg = 6 or 7, and we do not gain anything by subtracting additional degrees of freedom. For polynomial fitting, however, the optimal value is clearly N fg = 7. In view of this result we have performed the analysis below on the total ensemble of simulations using the fiducial value N fg = 7 for the three methods.

The effects of foreground removal
In order to evaluate the extra statistical and systematic uncertainties introduced by the process of foreground subtraction, and to ascertain the range of scales where the cleaned maps can be reliably used to obtain cosmological constraints we have computed the parameters η, and ρ introduced in Eq. (23) from our 100 independent simulations. Figure 3 shows the values of these parameters computed for the angular power spectrum in the frequency bin 599 MHz < ν < 600 MHz (left column) and for the radial power spectrum for the second bin in Table 2 (right column). Each plot shows the results for the three cleaning methods, and the error bars show the statistical deviation of these quantities. In all cases the three methods yield very similar (almost equivalent) results, all of them being able to clean the foregrounds reasonably well for the same number of foreground degrees of freedom. There is a very wide ρ(k ) Figure 3. The parameters η, and ρ (rows 1 to 3 respectively), introduced in Equation 23, for the angular power spectrum in the bin ν ∈ (599, 600) MHz (left panel) and the radial power spectrum for the second bin in Table 2 (right panel). Each plot shows the result for polynomial fitting (red circles), PCA (green squares) and ICA (blue triangles). The error bars in the plots show the variance of each of these quantities.
range of scales where foreground contamination is small in comparison with the expected statistical errors, and which can be reliably used for cosmology. In particular, PCA and ICA yield remarkably similar results, in spite of the latter exploiting the central limit theorem to enforce the statistical independence of the different foreground components. Thus, at least for the foregrounds simulated in this work, requiring the foregrounds to be uncorrelated (which is common to PCA and ICA) is the most relevant constraint. Furthermore, the results for polynomial fitting are very similar to the other two methods (even slightly better for certain radial scales), in spite of this method being the most naïve approach to foreground cleaning. It is also worth noting that both η and ρ become more consistent with 0 on large angular scales. This is due to the fact that these two quantities are weighed by the statistical errors, wich are inversely proportional to the square root of the number of modes available (σ(C l ) ∝ (2 l + 1) −1/2 ). On large radial scales (k 0.1 h Mpc −1 ) we can clearly see the effect of the aforementioned "foreground wedge". This is a known and reasonable result caused by the spectral smoothness of the foregrounds, which implies that any foreground leakage will contribute mainly to the largest radial scales, which are catastrophically contaminated. However, in terms of the parameters η, and ρ we do not observe any clear degradation in the small-l regime (large angular scales).
It is important to study the validity of these results across different frequencies. Figure 4 shows the values of η and ρ (upper and lower rows respectively) computed for the angular power spectrum in the 400 different frequency channels (left column) and the radial power spectrum in the 3 bins described in Table 2 (right column) using a PCA approach. Equivalent results were found for polynomial fitting and ICA. As is evident from this figure, even though foreground removal is reasonably successful for a large range of frequencies, it break down close to the edges of the frequency band, where the recovered maps are highly biased. The cosmological analysis must therefore be performed on the frequency interval not affected by these edge effects. We will discuss this effect in more detail in section 5.3.2.
In order to quantify the performance of each foreground . The parameters η and ρ from Eq. 23 (top and bottom rows respectively) for the angular (left column) and radial (right column) power spectra for all the different frequency (or redshift) bins considered in this analysis. The results shown correspond to the PCA method, but equivalent results were found for polynomial fitting and ICA. In most cases the bias due to foreground cleaning is much smaller than the statistical errors for a wide range of scales, although this breaks down on large radial scales and close to the edges of the frequency band.
cleaning method with a single number, we have computed an effective bias η eff for the angular power spectrum by averaging over all the values of l in the range of frequencies 450 MHz -750 MHz (i.e. omitting the frequencies close to the edges where the cleaned maps are less reliable). In all cases we obtain an effective bias η eff −0.2, i.e. the amount of signal loss in the power spectrum is on average 20% of the statistical errors. The bias in the radial power spectrum drops below this 20% level for scales smaller than k 0.15 h Mpc −1 . Throughout the same range of scales, the angular power spectrum of the cleaning residuals is also about one fifth of the statistical uncertainties.
The negative value of η eff implies that we are in fact overfitting the foregrounds, and that there is a net leakage of the cosmological signal into the removed foregrounds which induces a non-zero (albeit small) bias in the power spectrum. We have studied the possibility of alleviating this signal loss by decreasing the number of subtracted foreground degrees of freedom, but for any smaller N fg the leakage of foregrounds into the signal became catastrophically large on certain scales. This bias is therefore an undesirable but seemingly unavoidable effect of foreground cleaning which could, if accurately characterised, be potentially corrected for. However, the stochastic nature of this bias implies that, even if corrected for, it will induce an additional uncertainty in the measurement of the power spectrum. This additional uncertainty is given by the standard deviation of η, shown as error bars in the first row of Figure 3. While the additional uncertainty on P (k ) is relatively small (5 − 10% for k > 0.1 h Mpc −1 ), the statistical errors in the angular power spectrum must be enlarged by a significantly larger fraction (20 − 30%). We can understand this result as due to the much higher degree of stochasticity of the foregrounds in the angular direction with respect to the radial one. These results are consistent for the three different cleaning meth- ods and across the whole frequency range, and show that foreground removal will inevitably degrade the cosmological constrains that can be obtained from the measurement of the temperature power spectrum. Besides this increase in the amplitude of the statistical uncertainties in the power spectrum, foreground removal could potentially also affect their correlation structure. Thus it is important to compare the full covariance matrices of the power spectrum for the cleaned and the true temperature maps. We have estimated the covariance matrix for a fixed frequency bin and for a fixed scale l from our 100 simulations. The top row in Figure 5 shows the correlation matrix (rij ≡ Cij/ CiiCjj) for a fixed frequency ν = 600 MHz for the true and foreground-cleaned maps (left and right panels respectively), while the bottom row shows the analogous matrices for a fixed angular scale l = 50. According to these results, the variation in the correlation structure of the statistical uncertainties of the angular power spectrum caused by foreground subtraction is quantitatively negligible. This disagrees with the results of (Wolz et al. 2014), who find a noticeable effect on the correlation matrix induced by foreground removal using FastICA. This disagreement could be due to a number of reasons, from differences in the simulations to the different analysis pipelines. We found analogous results for the correlation matrix of the radial power spectrum (see Fig. 6), in spite of the large bias induced by foreground subtraction on large radial scales.
Although the bias in the recovered power spectra is a good way to parametrise the effect of foreground removal in intensity mapping, certain observables, such as the baryon acoustic oscillation (BAO) scale are known to be extremely robust against systematic alterations in the over-  Table 2 for the true cosmological signal (left panel) and the cleaned maps (right panel).  Table 2 in the same two cases.
all shape of the power spectrum. Therefore it is also relevant to visualise directly the effect of foreground cleaning on the power spectrum. The left panel of Figure 7 shows the average angular power spectrum at ν = 600 MHz computed for the foreground-free map (black solid line) and for the foreground-cleaned map (circles with error bars showing the standard deviation). Although the angular resolution of single-dish observations with the SKA-MID configuration is not good enough to measure the angular BAO, the overall shape of the power spectrum is not dramatically changed by foreground removal, and therefore it should be possible to constrain large-scale cosmological observables with intensity mapping. Analogously, the right panel of Figure 7 shows the radial power spectrum in the second bin of Table  2 divided by a smooth (no-BAO) fit to the overall shape of P (k ), thus highlighting the BAO wiggles. The positions of the BAO wiggles are not significantly altered by foreground cleaning, and hence it should be possible to measure the radial BAO scale accurately with this configuration.

Other effects
As has been shown in the previous section, blind foreground cleaning methods are reasonably effective, and should allow us to recover the cosmological signal with a relatively small bias compared to the statistical uncertainties. However it is interesting to explore how much this result depends on the assumptions and approximations used in this work. In this section we have thus studied several effects that could potentially affect the performance of foreground subtraction in cases that depart from the fiducial simulations used in the previous section. Since, as we have also seen, the performance of the three blind cleaning methods under study is almost equivalent, in this section we will show only results corresponding to a PCA analysis, although similar results hold for ICA and polynomial fitting.

Foreground smoothness
The success of foreground cleaning for intensity mapping relies heavily on the foreground sources being very correlated (smooth) in frequency. Even though we tried to be conservative in this regard when simulating the foregrounds, it is of key importance to quantify the minimum degree of smoothness required for a successful subtraction. By doing this we can, at least qualitatively, assess the effects of a potentially non-smooth foreground, such as polarisation leakage. The SCK model (Eq. 20) provides an ideal way to quantify this degree of smoothness in terms of the frequency correlation length ξ, which describes the number of e-folds in frequency over which the foregrounds do not deviate significantly from a perfect correlation. We have generated foreground maps using Gaussian realisations of this model for different values of this parameter, from ξ = 1 (corresponding to the model used for point sources) to ξ = 0.05, and keeping all other parameters fixed to the values used to simulate extragalactic point sources. For each value we generated 100 independent realisations, which were combined with the maps of the cosmological signal through the process defined in section 4.1. Figure 1 shows the first 25 principal eigenvalues for simulations with different values of ξ. As ξ decreases, the foregrounds become less correlated in frequency and their contribution spreads over a larger number of eigenvalues. It is then easy to understand how foreground cleaning would fail: at some point the number of foreground degrees of freedom that must be subtracted becomes too large, and too much information about the cosmological signal is lost.
We performed a full PCA analysis on the 100 simulations for each value of ξ and computed the bias figure of merit η in each case. The result described above is explicitly shown in Figure 8: as the foregrounds become "noisier" more signal is lost in cleaning them, and the bias in the estimated power spectra eventually becomes larger than the statistical errors. In view of this results we estimate that an effective correlation length ξ 0.25 is necessary for a reliable foreground cleaning using a blind approach. This value was estimated as the correlation length for which the average effective bias |η eff | in the angular power spectrum becomes larger than 30%.

Edge effects
In order to verify that the larger cleaning bias found at high and low frequencies is not related to the specific frequency values or to a computational error in the simulation of the foregrounds, but to the fact that these regions are the boundaries of the frequency band under study, we have performed the following test: we re-analysed the simulations in a restricted frequency range, cutting out two bands of width 20 MHz at either end of it. By doing so we confirmed that the regions of unreliable foreground cleaning shift to the new band edges (see Figure 9).
In an intensity mapping experiment, it is therefore important to allow for a buffer of frequencies at either end of the band in which the foreground cleaning is to be carried out, where the results will not be reliable. More interestingly, this justifies extending the frequency coverage of the survey beyond the values of interest for cosmology (i.e. above ν = 1420 MHz) in order to improve the foreground subtraction.

Radio Frequency Interference
Until now we have not taken into account the possible presence of man-made radio frequency interference (RFI), which can completely prevent the usage of certain radio channels for astronomical purposes. The usual way to deal with RFI is to place the experiment in a remote location, out of reach of artificial electronic signals, however there will inevitably be certain bands in the radio spectrum that will be rendered useless by RFI. The presence of RFI could affect foreground cleaning by limiting our ability to characterise the frequency dependence of the foregrounds.
We have attempted to quantify the effect on foreground subtraction by crudely simulating the presence of RFI in our simulations. We did so by flagging certain frequency channels as RFI and removing them entirely from the analysis. Two RFI models were simulated, which we labelled random and clustered: for a fixed fraction of flagged channels, random RFI is simulated by selecting those at random inside the simulated frequency range. For clustered RFI, on the other hand, the same number of flagged channels are collected in a small number of wide frequency bands or "clusters". While the effect of random RFI on foreground subtraction should be minimal as long as the fraction of RFI is low enough, clustered RFI could have a more significant influence: if the clusters are wide enough in frequency they will act as effective "boundaries" and, as shown in section 5.3.2, the cleaned maps close to these edges will be less reliable.
A number of simulated observations were generated, varying the fraction of RFI-dominated channels and the number and width of RFI clusters. For random RFI, no significant degradation of the foreground cleaning process was observed even for RFI fractions of up to 20%, where the average effective bias rises only from −0.2 to η eff −0.22 (left panel in Figure 10). However, for clustered RFI the boundary effect quoted above becomes relevant for clusters wider than ∆νRFI ∼ 20 MHz (right panel in Figure 10). In this limiting case the average effective bias becomes η eff ∼ −0.28.
Another further complication caused by the presence of RFI is that a more involved treatment is necessary in order to study the clustering statistics of the cosmological signal in the radial direction, in the same way that angular masking affects the computation of the angular power spectrum C l . The estimation of the radial power spectrum in the presence of RFI lies beyond the scope of the present work, however, and so we have not studied the corresponding effect on P (k ).  The first and last 20 MHz of the fiducial frequency band were cut out (grey, hash-marked bands) and the foreground subtraction was done in the restricted band. The region of large bias near the boundaries of the frequency range observed in Figure 4 is now shifted to the new boundaries, showing that it is truly an edge effect that will affect blind foreground subtraction in general.

Angular masking
There are a number of reasons why we cannot expect fullsky coverage for any realistic intensity mapping experiment. For example, ground-based experiments can only observe slightly more than one celestial hemisphere, depending on their location, and in most cases this maximal coverage is not reached due to technical or time limitations. We have studied the potential effects of incomplete sky coverage on foreground cleaning by performing a full PCA analysis on our fiducial set of simulations using different masking criteria. We can anticipate that angular masking should affect the efficiency of foreground removal in two different ways: (i) On the one hand, a reduced sky fraction implies a smaller number of independent samples (pixels) that can be used to characterise the foregrounds (e.g. to calculate the frequency covariance matrix in the case of PCA), thus potentially reducing the quality of the cleaned maps. In order to quantify this effect we created sky masks where only a region in the southern celestial hemisphere below a given declination δ thr < 0 is visible.
(ii) On the other hand, masking regions of the sky where we expect that foreground subtraction will be complicated (e.g., regions close to the galactic plane) can have the opposite effect, improving the efficiency of the method. In order to study this possibility we masked all pixels where the synchrotron temperature at 408 MHz, given by the Haslam map, is above a given threshold T thr . This threshold must be found as a compromise between covering regions of high synchrotron emission and minimising the loss of sky coverage. We decided to use T thr = 40 K, which still leaves a sizeable fraction of the sky observable (f sky ∼ 0.8).
As mentioned before, and incomplete sky coverage also complicates the estimation of the angular power spectrum. For this reason the software package PolSpice (Chon et al. 2004) was used to estimate the C l s, and we only studied scales on which we found this estimation to be unbiased for all the different masks (l 10). Also note that the statistical uncertainties on the angular power spectrum increase for a masked sky by a factor f −1/2 sky . In order to eliminate this effect from the comparison between different masks, in this case we used the ratio of the power spectrum of the cleaning residuals to that of the cosmological signal as a figure of merit. Figure 11 summarises our findings. The figure shows the ratio C l,res / C l,cosmo for the bin 600 MHz < ν < 601 MHz and for different masks. As could be expected from the discussion above, as we decrease the observable fraction of the sky the cleaning becomes less efficient and the residuals grow, especially on large scales. However, when the appropriate parts of the sky are masked (i.e., regions with large foregrounds), the cleaned maps become more reliable, although the magnitude of this improvement is not very large. Fiducial, f sky =1 dec <0 • , f sky =0.5 dec <−60 • , f sky =0.07 T synch <40 K, f sky =0.81 Figure 11. Ratio of the power spectrum of the residuals to that of the cosmological signal for different masks: full-sky (red), halfsky (δ thr = 0, green), δ thr = 60 • (blue) and T thr = 40 K (black).

Angular resolution
Due to the fiducial SKA dish size, a single-dish intensity mapping survey is only able to resolve very large scales (θ θFWHM 1.4 • ). However it is relevant to study the performance of blind foreground cleaning for an experiment with a better angular resolution. Two effects will be most relevant: on the one hand a better angular resolution provides a larger number of independent samples (pixels) that can be used to model the frequency structure of the foregrounds, thus potentially improving the cleaning. On the other, the statistical errors on smaller angular scales are also smaller (σ(C l ) ∝ (2 l + 1) −1/2 ), and hence the requirement on the foreground cleaning bias becomes more stringent.
We have generated observed sky maps for our 100 simulations assuming a constant angular beam size of θFWHM = 0.3 • , and keeping all other instrumental parameters equal to their fiducial values (except for the angular resolution parameter, which was increased to nside = 256). After applying the three blind methods, similar results to the ones quoted above for the fiducial simulations were found on all scales and frequencies (see Figure 12), which shows that blind foreground cleaning should also be successful for more futuristic experiments. It is also worth noting that due to the fact that in this case we used a constant beam width, only 5 foreground degrees of freedom had to be subtracted.

Instrumental noise
A large level of instrumental noise could also in principle prevent a correct characterisation of the frequency structure of the foregrounds, and thus affect the effectiveness of foreground subtraction. In order to address this possible issue we generated an ensemble of 100 simulations with a value of the system temperature Tinst twice as large as the fiducial one (quoted in Table 1) foreground cleaning bias η. This is shown in Figure 13. In all cases we are able to separate the smooth foregrounds from the cosmological signal and noise to the same level as in the fiducial case (section 5.2).

DISCUSSION
In this work we have studied the efficiency of blind foreground removal methods for HI intensity mapping. By "blind" we refer here to methods that only assume generic characteristics about the foregrounds (in particular spectral smoothness). Due to the lack of multi-frequency information about the foregrounds relevant for intensity mapping, blind cleaning methods will inevitably be necessary for the first experiments.
In particular we have tested and compared three different methods: polynomial line-of-sight fitting, principal component analysis and independent component analysis. We have shown that all of the methods can be described as different approaches to the same mathematical problem of blind source separation -all of them attempt to filter out the foregrounds by fitting their frequency dependence to a combination of smooth functions, and they only differ in their approach to finding these functions.
In order to carry out this study we have generated a suite of 100 independent simulations of both the expected intensity mapping signal and the most relevant foregrounds using the publicly available code by Alonso et al. (2014). The simulated sky maps were then modified to simulate the angular resolution and instrumental noise level expected for the SKA-MID configuration (see Bull et al. (2014) for details).
In order to compare the different methods we have characterised the efficiency of foreground cleaning through the quantities η, and ρ given in Equation 23, which describe the bias induced on the power spectrum (both angular and radial) of the cleaned maps as well as the contamination in the signal itself. We find that the cosmological signal can be optimally recovered by removing a total of N fg = 7 foreground degrees of freedom for the three methods. The analysis of the foreground-cleaned maps has yielded several important results: • Not only can the three methods be described using the same mathematical formalism, but they also yield quantitatively similar (almost equivalent) results.
• The cosmological signal can be successfully recovered across a wide range of scales and redshifts. However, the foreground removal induces a bias in the power spectrum that must be taken into account.
• In the angular power spectrum, this bias is of the order of 0 − 20% of the statistical uncertainties (η ∼ −0.2) for all angular scales and most frequencies.
• The radial power spectrum in turn is strongly affected by foreground removal on large scales (k 0.1h Mpc −1 ), but the bias η is rapidly suppressed on smaller ones.
• Foreground removal also increases the amplitude of the statistical uncertainties in the power spectrum by as much as 20 − 30% in the case of the angular power spectrum. No relevant variation in the correlation structure of these uncertainties was observed, however.
• In spite of this bias we have verified that some of the most important cosmological features, such as the BAO wiggles, are not strongly affected by foreground removal, and therefore it should be possible to measure them with good accuracy.
• Blind foreground cleaning relies heavily on the spectral smoothness of the foregrounds. In order to quantify the minimum degree of smoothness for a successful removal, we generated simulations for foregrounds with a varying frequency correlation length. We have determined that, in order to limit the average effective bias to |η eff | < 0.3, a frequency correlation length of ξ > 0.25 is necessary.
• Close to the boundaries of the frequency band, the foreground-cleaned maps cease to be reliable. This suggests that a buffer of frequencies at either end of the band should be allowed for, and that an intensity mapping experiment could benefit from extending its observations beyond the frequency values of cosmological interest.
• We have studied the effect of RFI (i.e. an incomplete frequency coverage) on the cleaned maps. Although foreground cleaning only deteriorates noticeably when up to 20% of the frequency channels are lost due to RFI at random, the effect is much stronger when the RFI-dominated channels are clustered into wider bands.
• We have also quantified two possible effects that an angular mask could have on foreground subtraction. On the one hand we observe a deterioration in the cleaned maps for smaller fractions of the sky, due to the smaller number of independent samples (pixels) that can be used to characterise the foregrounds. On the other, the quality of the foreground-cleaned signal can benefit from masking regions of the sky where the foregrounds are larger.
We have left a number of analyses for future work, such as studying the effect of polarisation leakage and correlated noise or characterising the effects of foreground cleaning on the cosmological parameters inferred from the power spectrum. However, the results presented here should be relevant in order to produce more realistic forecasts for intensity mapping, optimise the design of future intensity mapping experiments and analyse the data extracted from them.
The computer tools developed and used in this analysis have been made publicly available at http://intensitymapping. physics.ox.ac.uk/codes.html.