Extended Delta-map: a map-based foreground removal method for CMB polarization observations

In order to extract information about inflationary gravitational waves using $B$-mode patterns of cosmic microwave polarization anisotropy, we need to remove the foreground radiation from the Milky Way. In our previous delta-map method for foreground removal, the number of observation bands was limited to the number of parameters of the assumed foreground model, and therefore it was difficult to improve the sensitivity by increasing the number of observation bands. Here, we extend the previous method so that it can be adapted to an arbitrary number of observation bands. Using parametric likelihood and realistic foreground and CMB simulations, we show that our method can increase the sensitivity to the tensor-to-scalar ratio $r$ without inducing any significant bias.

However, celestial sources also emit microwave photons with linear polarization that mimic the cosmological B-mode, the so-called foreground emission, which disturbs the precise measurement of the CMB B-mode [15]. The two representative foreground emissions in CMB polarization measurements are synchrotron and thermal dust emissions, which have different spectral energy distribution from the black body and dominate at lower and higher frequencies, respectively. Therefore, we can remove the foreground emissions using multiple-band observations owing to their frequency dependences.
Many foreground removal and component separation methods exist to extract the cosmological B-mode signal, e.g., Commander [16], SEVEM [17], SMICA [18], NILC [19], and GNILC [20]. Among them, we proposed a method called "delta-map" based on a linear combination of the observed maps to remove the foreground components [21]. This method allows the directional dependence of the frequency spectrum of foreground emissions up to the first order to be taken into account by considering the differences in the observation maps 1 . Because the method uses one additional frequency map to eliminate the directional dependence of one foreground parameter, the usable number of maps is limited by the number of foreground parameters in the assumed model. Specifically, suppose that we assume one foreground model with N parameters and CMB signal in all the multi-frequency maps. In this case, the delta-map method requires one map for the CMB, one map for the zeroth (or spatially uniform) foreground parameter, and N maps for the first order expansion of the N parameters; we need exactly (2 + N ) maps. This prevents us from obtaining improvement of the sensitivity in observation by increasing the number of observing bands. More concretely, if we consider a power-law synchrotron foreground model with one parameter β s and a one-component dust model with two parameters β d and T d , the number of observation bands should be exactly six. This is a waste of resources since some future CMB missions plan to have plenty of bands, e.g., LiteBIRD will have 15 bands [26,27]. In this paper, we improve the method by constructing a parametric likelihood in a Bayesian way so that more observation bands can be used. 1 A similar perturbative approach in multipole space has also been investigated in Refs. [22][23][24][25].
The rest of this paper is organized as follows. In Sect. 2, we first review the previous delta-map method that is based on the linear combination, and then derive a new delta-map method based on a parametric likelihood. In Sect. 3, we explain the foreground models and simulations used in this work. We show the results of the measurements of CMB parameters with some sky simulation setups in Sect. 4. In Sect. 5, we discuss the results and conclude this work.

Methodology
We first introduce the delta-map method described in Ref. [21]. We decompose a linear polarization signal of a microwave component at a frequency, ν, from a line of sight,n, to two orthogonal Stokes parameters, Q(n) and U (n), which we observe in thermodynamic temperature units. Let us vectorize all Q(n) and U (n) from alln in use and concatenate them into one column vector as where N pix is the number of pixels in use and the subscript T represents transpose.
The CMB signal is common in any frequency band except for beam effects of instruments, and can be written as s CMB (ν) = s CMB . We assume that each frequency has each independent Gaussian noise, s N . A foreground signal varies in frequency space according to N parameters that vary over the sky, p I (I = 1, 2, · · · , N ), as where s b is a signal vector at a pivot frequency in brightness temperature units, D ν ( p I ) is a diagonal matrix with the same dimension of s f , and g ν is the conversion factor from the brightness temperature to the CMB thermodynamic temperature given by In this paper, we use power-law synchrotron and one-component modified black body (1MBB) models to fit synchrotron and dust foreground emissions, respectively. The functional form of one pixel of D ν ( p I ) for power-law synchrotron is where ν * is the reference frequency, which we set as ν s * = 23 GHz, and β s (n) is the synchrotron spectral index. The one for 1MBB is is the dust spectral index, and we set ν d * = 353 GHz.
The basic idea of the delta-map method is to consider a spatial variation of foreground signal parameters up to the first-order expansion 2 as wherep I represents the mean value of the parameter p I over the sky, I is an identity matrix and "•" means Hadamard product. Hereafter, we consider two representative foreground components: synchrotron radiation and thermal dust emission. With these two foreground components, the foreground signal is written as where the superscripts and subscripts "s" and "d" denote synchrotron and dust emission foreground components, respectively, and N s and N d denote the numbers of parameters for synchrotron and dust foreground emissions, respectively. HereD ν represents frequency dependence of the signals up to the first-order s f and given bỹ 2 The standard deviation of directional variation of foreground parameters considered in this paper is less than 10% of the spatial uniform parameter. We can assume second-order variation to be less than 1%; hence we neglect higher-order terms. and, at the last line, we define the foreground signal vector as In summary, we assume the observed data can be decomposed as We need to observe the sky at multiple frequencies to remove foreground components using its frequency dependence. When we observe the sky at N ν frequencies, the total data can be expressed as where This is the baseline expression of the "delta-map" method.
In Ref. [21], one frequency, ν 1 , was chosen as the CMB channel and other channels with weights α T = (α ν 2 , . . . , α ν Nν ) were added to remove foreground contributions and to have a cleaned CMB map as with The above equation can be solved only when the number of frequencies, N ν , is equal to the number of degrees of freedom, (N d + 1) + (N s + 1) + 1, where N s and N d is the number of parameters of synchrotron radiation and thermal dust emission models, respectively. This is the caveat and weak point of the previous delta-map method because we cannot increase sensitivity by increasing N ν . We shall mitigate this point in the following sections.
Since the cleaned CMB map should contain only the CMB and the combined noise, we can construct our likelihood as where the covariance matrix C is given by [21,28] By minimizing this likelihood, we can determine a CMB parameter, r, and foreground parameters, p I d d and p Is s , where I d ∈ (1, . . . , N d ) and I s ∈ (1, . . . , N s ). It is known that the determined foreground parameters are biased when we use the full likelihood function [29].
We can avoid the bias by using the χ 2 term, namely, the first term of Eq. (16). Technically, the following iteration scheme (Algorithm 1) was adapted in Ref. [21] to determine both the CMB and foreground parameters.

Extended delta-map
In this section we introduce our method using a parametric likelihood. We describe the details of the derivation in Appendix A.
We start from Eq. (11). By subtracting the CMB and foreground terms from the observed maps m, we can construct the likelihood of data as where N = diag(N ν 1 , · · · , N ν Nν ) is a noise covariance matrix.

Algorithm 1 Iteration algorithm of the minimization
Set initial values of p d I d and p s Set initial values of foreground parameters Fix r out and minimize χ 2 term against p d I d and p s I s Fix foreground and minimize all the −2L term against r out We use Bayes' theorem to relate the posterior distribution to the likelihood as where S CMB 0 and S f are covariance matrices of the CMB, s CMB , and foreground signal, s f , respectively,p I represents foreground parameters of both synchrotron and thermal dust emissions, and P ( m) is a normalization factor. Here we use flat prior onp I and have We marginalize it over the CMB signal assuming a Gaussian distribution, to have where S CMB = D CMB S CMB 0 D CMB T . This is Eq. (58) in Ref. [21].
From here, we deal with P s f , S f term. Our strategy is to marginalize foreground probability function assuming its mean is zero, s f = 0 and its covariance is "vague", S f −1 → O. We follow the methodology described in Sect. 2 of Ref. [30] to deal with the "vague" foreground covariance matrix. Following the method, we obtain where and put the noise terms, m T N −1 m + ln |N|, into the const. term. For details of the derivation of Eq. (22), see Appendix A. To determine the CMB parameter (r) and foreground parameters (p I ), we follow the same procedure, namely Algorithm 1, as in Ref. [21]. We set We obtain the "extended delta-map" likelihood as Eq. (22), which allows us to determine CMB and foreground parameters without the band number constraint. This likelihood has another benefit. Because we use "matrix inversion lemma" (Appendix B) in the derivation, we can reduce the computational cost in using the Cholesky solver for S CMB + N in Eq.

Models and simulations
We use simulations to validate our methodology. We use the "PySM" package [31] to produce polarized synchrotron and thermal dust emission maps with direction-dependent spectral parameters. For the synchrotron map, we use the power-law synchrotron model, "s1", which is based on the Q and U maps from WMAP-9 [32] and the spectral index map from Ref. [33]. For the thermal dust emission map, we use the one-component modified black-body (MBB) model, "d1" [34], and the two-component MBB model, "d4" [35], both of which are based on the Planck HFI results. Note that the results used the intensity map in addition to Q and U polarization maps, and the model adopted common foreground parameters β d and T d for Q and U Stokes parameters at each sky pixel. Cosmological CMB maps are generated using synfast function of HEALPix package [36] from the power spectra calculated using CAMB [37] with the Planck 2018 cosmological parameters for "TT,TE,EE+lowE+lensing" [2]: A s = 2.100 × 10 −9 , and n s = 0.9649. We generate CMB maps with some values of the tensor-to-scalar ratio, r.
We use experimental parameters such as frequency bands and angular resolutions similar to the LiteBIRD mission [26] (Table 1). For the input noise, we assume white noise with standard deviation of σ N = (π/10800)(w −1/2 p /µK ) µK str −1/2 [28], where we use the "Polarization sensitivity" column of Table 1 for the values of w −1/2 p . We incorporate the beam smearing effect from the finite angular resolution, whose fullwidth-half-maximum (FWHM) values are given by the "Beam size in FWHM" column of Table 1, by multiplying the spherical harmonics coefficients of the CMB and foreground maps by the corresponding Gaussian beam transfer function for individual frequencies. In this paper, we use the map resolution parameter of N side = 4 and set max = 2N side . To ensure that the maps are limited to low resolution, we follow the method described in Ref. [38]; specifically, we de-convolve each frequency map and re-convolve all the maps with Gaussian 2200 arcmin beam, which is 2.5 times the pixel size of N side = 4. Even though our method can be applied to higher resolution maps, we choose N side = 4 (or max = 8) which covers the so-called reionization bump. Since we do not apply any delensing scheme, we cannot effectively increase the sensitivity to r because of the cosmic variance of lensing B-mode. If we can increase max further to cover the so-called recombination bump, we could increase the sensitivity. However, such high-resolution analysis is limited by our computational resources.
In Sect. 4.4, we will show the case in which the frequencies of the two highest frequency bands are increased. For that we show the replaced parameters in brackets. The replaced polarization sensitivity and beam size in FWHM are extrapolated using power-law function. Since our method approximates the directional dependence of foreground parameters up to the first order, we need to mask the very bright Galactic plane. We follow Ref. [21] and use the "P06 mask" by WMAP polarization analysis, whose f sky = 0.56. Since our method uses pixel space maps, we do not apply any apodization to the mask.

Results
We apply our new method to simulated CMB maps with foregrounds. In the following sections, we show the results of estimating tensor-to-scalar ratio r with CMB + synchrotron maps. Recently, it has been shown that shifting the observation bands toward higher frequencies can improve the determination of foreground parameters [27]. Thus, we show the results of estimating tensor-to-scalar ratio r, with the two highest observation frequencies given in  Summary of our analysis. Estimated r out value is from 50th percentile. For r in = 1.0 × 10 −2 , we use (16th,84th) percentiles for its uncertainties. For null r in , we use 95% C.L. as the upper bounds. "(High)" indicates the result with "modified high-frequency model" where we increase the frequencies of the highest two frequency bands.  We summarize all the estimated r out values for all the setups in Table 2.

Estimation with synchrotron radiation foreground only
In this section we consider the case where only synchrotron radiation is the foreground source. Because we use a power-law synchrotron radiation model, we need at least three observation bands to fit our parametric model. We first estimate tensor-to-scalar ratio, r, and synchrotron spectrum index, β s , with three exact bands ν ∈ (40, 60, 140) GHz against input r in = 0.01. We show the histograms of the estimated r and β s as blue boxes of the left and right panels of Fig. 1 from 1000   from the constraint of the number of bands that existed in the previous delta-map method. We find that r and β s are determined better by increasing the number of the observation bands.
We show the results of the estimation of r out for the case in which r in is null in Fig. 2.
We find that r is constrained to r out < 0.12 × 10 −2 and r out < 0.07 × 10 −2 using three and nine bands, respectively.

Estimation with thermal dust emission foreground only
Next, we consider the case where only the thermal dust emission is the foreground source.
We choose one-component modified black body (1MBB) model as the foreground dust model, for which we need at least four bands to fit our parametric model, and we set r in to be 0.01.
We first estimate the tensor-to-scalar ratio parameter r and foreground parameters, T d and β d , with the four exact bands, ν ∈ (140, 235, 280, 402) GHz. We show the histograms of the estimated r out from 1000 realizations in Fig. 3. We find that uncertainty on r out is large and the lowest bin is dominant, which reflects the fact that we impose an r ≥ 0 prior on r so that S CMB 0 is a positive definite matrix.
We show 2D histograms of the estimates of T d and β d in the left panel of Fig. 4. We find that foreground parameters T d and β d are not precisely determined with the four bands. The undetermined foreground parameters cause the large error in estimating r. This can be mitigated by increasing the number of frequency bands. Next we show the estimated r out with nine bands, ν ∈ (100, 119, 140, 166, 195, 235, 280, 337, 402) GHz, in the orange histogram of Fig. 3, and the estimated T d and β d with nine bands using the 2D histogram in the right panel of Fig. 4. We find that we can determine foreground parameters well using the nine bands, and thereby we can determine r out more precisely.

Estimation with thermal dust and synchrotron foreground emissions
Finally, we consider one-component modified black body dust and power-law synchrotron as our foreground models, and estimate the CMB parameter, r, and the foreground parameters, T d , β d , and β s . We use all the 15 bands in Table 1. Because we found that the number  of bands is not sufficient to determine T d , we impose some priors, T d = 21.8 ± 1σ K and ±1.0 × 10 −5 σ K, on T d , where σ is the standard deviation of the dust temperature measured by Planck [34].
The estimated r out are shown in the left panel of Fig. 5. We find that stronger constraints on T d lead to a more precise estimate of r out . However, the 50th percentile is slightly biased to positive, as shown in Table 2. We can see this bias in the estimation of r for null r in , as shown in the right panel of Fig. 5, which shows r out is slightly biased to positive and the upper bound with 95% C.L becomes larger (Table 2).

Modified high-frequency case
We next see the results with the modified high-frequency model, which increase frequencies of the high-frequency bands of telescopes.
To see the determination of foreground parameters, we show 2D histograms of β d and T d for 15 normal bands and 15 modified bands in the left and right panels of Fig. 6, respectively. We find that we can determine foreground parameters precisely in the case of modified high-frequency model.
We show histograms of r out against r in = 0.01 with flat prior and a 1σ prior in Fig. 7 for the modified high-frequency model. Compared to the result with the fiducial frequency band setting shown in the left panel of Fig. 5, the uncertainties on r become much smaller while the 50th percentile values of r are positively biased as shown in Table 2 if the modified high-frequency model is considered. This tendency was also found in Ref. [21]; the smaller uncertainty comes with larger systematic bias if one sets the foreground frequency bands far away from the CMB bands.

Mismodeling
To demonstrate the case in which we assume a wrong foreground model, we use the two-component modified black body (2MBB) model as the input but the 1MBB model to estimate r and the foreground parameters. We show the histograms of the estimated r from 1000 realizations with flat and 1σ priors on T d in Fig. 8. We find that the estimates of r are biased by ∼ 0.01.
This bias was not found in our previous paper [21]. The reason could be because the determinative power of the foreground parameters depends on the number of frequency  bands. In Ref. [21], r and 1MBB foreground parameters were estimated using only six bands.
Because of the small number of bands used in the analysis, the foreground parameters were not well determined and therefore the bias on r was small. To confirm this argument, we estimated r with seven and 10 bands using the extended delta-map method. The results showed that the bias was small for the case with seven bands, and large with 10 bands as well as with 15 bands.
We show 2D histograms of foreground parameters, T d and β d , in the left and right panels of Fig. 9 with flat and 1σ priors on T d , respectively. Without a prior, estimated T d reaches the bound.

Summary and discussion
In this paper, we have improved the "delta-map method" [21] by constructing a parametric likelihood in a Bayesian way so that more observation bands can be used. By incorporating the covariance of the foreground emission as "vague", we have extended the method. Sample codes are available at extended-deltamap GitHub repository https: We have tested the "extended delta-map method" with realistic simulations assuming LiteBIRD-like telescopes. In the case with one foreground component, we find that the extended delta-map method can estimate both the CMB parameter (tensor-to-scalar ratio r) and the foreground parameters, even if we use more frequency bands than the minimum required bands (Sect. 4.1 and Sect. 4.2). Moreover, we also found that the parameters of the foreground model are better determined when more observation frequency bands are used, and the tensor-to-scalar ratio r is better estimated accordingly. This is an improvement and benefit compared to the previous delta-map method.
Next, we apply our method to the model with two foreground components, synchrotron and dust. It turns out that determining dust foreground parameters β d and T d becomes difficult in this case. The reason for this probably lies in the interplay between the synchrotron and dust foreground emissions. To determine β d and T d simultaneously, one needs to observe the dust spectrum over a wide frequency range. However, in the two-component foreground model, synchrotron radiation dominates the low-frequency side and masks the dust component, effectively reducing the number of observed frequency bands that can be used to estimate dust foreground parameters. To aid the determination of the foreground parameter, therefore, we have imposed a prior on dust temperature based on the Planck results.
In this case, the error in the estimate of r becomes small, while we have a small positive bias in the estimate. The small bias may indicate that the mean value of T d estimated in the delta-map method is not necessarily the same as T d derived by the maps with higher angular resolutions, e.g., Planck.
A positive bias is also found in the case using a modified band configuration with higher frequencies. If we increase the frequencies of the two highest observation bands, we can determine the foreground parameters better, as we discussed above, while the estimated CMB parameter is biased. This tendency was already found in in Ref. [21]. The reason for this is probably the breakdown of the perturbative treatment of the foreground parameters in our method. Although it is easier to remove foreground radiation when the frequency bands are closer to each other, internal template methods, including ours, have the disadvantage of removing the CMB at the same time, resulting in relatively larger noise. Thus, we are faced with the familiar dilemma of systematic and statistical errors that are inherent in statistical analysis.
Lastly, we test the case with mismodeling by fitting the two-component modified black body model with the one-component modified black body model. We find that the estimated CMB parameter is biased.
In this paper, we have not considered detailed characteristics of telescopes or detectors, e.g., bandpass average discussed in Ref. [21]. We can include the detailed characteristics in the transfer matrices. The basic idea of the delta-map method is to consider the spatial variation of foreground signal parameters perturbatively. Though we have only considered the firstorder expansions of spatial variations of foreground parameters, we can consider higher-order expansions, which may improve the estimates of the foreground and CMB parameters. We leave this study for future works.

A Derivation of likelihood
We use the methodology the methodology described in Ref. [30] to deal with "vague" foreground covariance matrix.
When we marginalize P ( s f , S f ) in (21) , we have When we assume that the mean of foreground signal is zero, s f = 0, we have where we used matrix inversion lemma (B1), (B2).
When we are ignorant about the covariance of the foreground signals, we take the limit where S f −1 → O and have where we discard the terms ln |S f | following Ref. [30].
This is similar to the likelihood function (Eq. (61) of Ref. [21]) except for the additional term, ln D (S CMB + N) −1D . We will revisit this difference in Appendix C and show that this term is necessary to reproduce the results of Ref. [21].
Next, we apply matrix inversion lemma (B1), (B2) to (S CMB + N) −1 and ln |(S CMB + N)|, to reduce the calculation cost of the large covariance matrix. Using (S CMB + N) . When we substitute Eqs. (A4) and (A5) into Eq. (A3), we have where we define Here we try to keep symmetry to write down each term because we find that the estimated parameters are biased when we calculate the log-likelihood value using asymmetric terms.
Finally, we summarize newly defined matrices and the vector in Table A1.

B Matrix inversion lemma
In the derivation of the likelihood function, we use Woodbury, Sherman, and Morrison formula, so-called "matrix inversion lemma", where Z and W are the invertible matrices and U and V are matrices with corresponding dimensions.
For log-determinants, similar equation exists:

C Comparison of likelihood
In this section, we show that the estimated parameters with Eq. (22) are equivalent to those estimated from Eq. (C4) in the case in which we assume one foreground component and one parameter, β, where the number of frequencies is N ν = (N d (= 1) + 1) + 1 = 3.
The cleaned CMB map (14) can also be expressed as D −1 m CMB , where the subscript "CMB" means that we take the elements related to the CMB, as described above Eq. (50) in Ref. [21].
First, let us calculate this cleaned CMB map. The elements of D matrix are as The inverse matrix of D is as where ξ ij = g ν i g ν j (D ν i ,β D ν j − D ν i D ν j ,β ) and Ξ = ξ 23 I ξ 31 I ξ 12 I . Then the cleaned CMB map can be expressed as The corresponding covariance matrix (17) is calculated as where K = ΞNΞ T .
Therefore, we can write the likelihood function used in Ref. [21] as We will show that Eq. (22) is equal to this equation.
First, we calculateD T N −1D matrix and its determinant as Using them, we have This agrees to Eq. (C4) except for the term, ln |ΞD CMB |. This does not change the estimate parameters, since, in Ref. [21], foreground parameters are determined only with chi-squared term and CMB parameters are determined with the total likelihood function by fixing foreground parameter.