Delta-map method to remove CMB foregrounds with spatially varying spectra

We extend the internal template foreground removal method by accounting for spatially varying spectral parameters such as the spectral indices of synchrotron and dust emission and the dust temperature. As the previous algorithm had to assume that the spectral parameters are uniform over the full sky (or some significant fraction of the sky), it resulted in a bias in the tensor-to-scalar ratio parameter $r$ estimated from foreground-cleaned polarization maps of the cosmic microwave background (CMB). The new algorithm,"Delta-map method", accounts for spatially varying spectra to first order in perturbation. The only free parameters are the cosmological parameters such as $r$ and the sky-averaged foreground parameters. We show that a cleaned CMB map is the maximum likelihood solution to first order in perturbation, and derive the posterior distribution of $r$ and the sky-averaged foreground parameters using Bayesian statistics. Applying this to realistic simulated sky maps including polarized CMB, synchrotron and thermal dust emission, we find that the new algorithm removes the bias in $r$ down to undetected level in our noiseless simulation ($r\lesssim 4\times 10^{-4}$). We show that the frequency decorrelation of polarized foregrounds due to averaging of spatially varying spectra can be accounted for to first order in perturbation by using slightly different spectral parameters for the Stokes $Q$ and $U$ maps. Finally, we show that the effect of polarized anomalous microwave emission on foreground removal can be absorbed into the curvature parameter of the synchrotron spectrum.

We extend the internal template foreground removal method by accounting for spatially varying spectral parameters such as the spectral indices of synchrotron and dust emission and the dust temperature. As the previous algorithm had to assume that the spectral parameters are uniform over the full sky (or some significant fraction of the sky), it resulted in a bias in the tensor-to-scalar ratio parameter r estimated from foreground-cleaned polarization maps of the cosmic microwave background (CMB). The new algorithm, "Delta-map method", accounts for spatially varying spectra to first order in perturbation. The only free parameters are the cosmological parameters such as r and the sky-averaged foreground parameters. We show that a cleaned CMB map is the maximum likelihood solution to first order in perturbation, and derive the posterior distribution of r and the sky-averaged foreground parameters using Bayesian statistics. Applying this to realistic simulated sky maps including polarized CMB, synchrotron and thermal dust emission, we find that the new algorithm removes the bias in r down to undetected level in our noiseless simulation (r 4 × 10 −4 ). We show that the frequency decorrelation of polarized foregrounds due to averaging of spatially varying spectra can be accounted for to first order in perturbation by using slightly different spectral parameters for the Stokes Q and U maps. Finally, we show that the effect of polarized anomalous microwave emission on foreground removal can be absorbed into the curvature parameter of the synchrotron spectrum. simplicity we have not written explicitly experiment-specific effects such as a gain calibration factor, beam smearing, pixel window function, or any other smoothing/filtering that may be applied to data. Such operations will be common to all sky signals at the same frequency channel.
The coefficient g ν is given by which converts the units from brightness temperature to the CMB thermodynamic temperature (T CMB = 2.725 K). While this expression does not take into account averaging over the bandpass of detectors, we shall include it in Sect. 2.5. The function D ν denotes the frequency dependence of the foreground emission in brightness temperature units, and p I (n) a set of parameters (I = 1, 2, · · · , N ) that characterize the emission law and can vary across the sky. For example, if we consider a power-law foreground emission component, then N = 1 and p 1 (n) is a power-law spectral index β(n). Thus, D ν = (ν/ν * ) β(n) . Decomposing the spectral parameters to the mean value and perturbation as p I (n) =p I + δp I (n), we obtain where we wrote D ν,J ≡ ∂D ν /∂p J p J =p J . The central assumption we make throughout this paper is that we truncate the expansion at first order in δp I [12,13]. Therefore, when a foreground component is parameterized by N parameters that vary across the sky only weakly, we can interpret the emission as a superposition of (N + 1) template maps; namely [Q f , U f ] ν * (n) and δp I (n)[Q f , U f ] ν * (n) (I = 1, 2, · · · , N ). If the O(δp 2 ) term is significant compared with the first two terms, it may cause a bias in r as foreground residuals. The bias becomes greater when the foreground channels are taken further away from the CMB channel. We find, however, that the bias is negligible for the band specification and realistic simulated sky maps used in this work, as we shall show later.
To subtract the foreground component in Eq.
(1) with help of Eq.
(3), we form a linear combination of (N + 2) maps at different frequencies as 3/31 where α j are fitting coefficients, ν CMB denotes a frequency employed as a CMB channel, and ν j the foreground channels. The fitting coefficients are found by setting the foreground terms in the above equation (i.e., the second and third lines) to zero: g νCMB D νCMB,J + N +1 j=1 g νj D νj,J α j = 0 , for J = (1, 2, · · · , N ).
The coefficients α j can be obtained numerically by −A −1 d νCMB 1 . Once the coefficients are determined, we obtain a foreground-cleaned CMB map x as Since this is a linear combination of maps, we can construct a cleaned CMB in pixel, spherical harmonics, or wavelet space. Multi-component foregrounds can be incorporated straightforwardly by expanding the entries of A. For example, if we use D ν for dust and S ν for synchrotron, we may define a new vector for synchrotron as s ν ≡ g ν S ν (p J ), S ν,1 , · · · , S ν,M T , where M is the number of parameters characterizing synchrotron, and write the matrix A as There are N + M + 2 coefficients to be determined, and are given by α = −A −1 ( d νCMB , s νCMB ) T . Thus, obtaining one cleaned CMB map requires at least N + M + 3 frequencies. In Sect. 4, we derive the formula for a cleaned CMB map when we have more frequency channels than this minimum number. 1 The explicit form of A is 4/31

Power law
To obtain a better feel for the new algorithm, let us work out three simpler, concrete examples. First, we consider a single power-law spectrum with a spatially varying spectral index:

Bandpass average
The above formalism applies when we have detectors with a delta-function response to a single frequency. In reality the sky signal must be averaged within a given bandpass of detectors.
To incorporate the bandpass average, we first average an observed intensity as where the last equality holds when the bandpass function W (ν) is a top hat function with a width of ∆ν. We then relate I(ν) to the thermodynamic CMB temperature fluctuation δT as where B ν (T ) is a blackbody intensity spectrum. The same relation holds for the Stokes parameters Q and U .

6/31
We use the brightness temperature as an intermediate step between the intensity and thermodynamic temperature. The brightness-thermodynamic temperature conversion factor g ν (Eq. 2) with bandpass averaging is then given by There is no averaging for ν 2 in the numerator because it is the intensity that is averaged, and conversion from the intensity to brightness temperature is merely an intermediate step.
Using these quantities, our sky model (Eq. (1)) is now expressed as As [Q f , U f ] ν * is an auxiliary variable, there is no need to bandpass-average it. The average is done only for variables depending on ν. Therefore, to incorporate bandpass averaging, all we need is to replace g ν D ν with g ν ν 2 D ν /ν 2 . Incorporating bandpass averaging in the foreground removal algorithm is important because it changes the effective frequency at which intensity of foregrounds should be evaluated. Using an incorrect bandpass leads to a bias in r. How precisely we should know the bandpass needs to be determined by the requirement for science, e.g., a target value of r, coupled with a careful analysis including foreground removal and systematic errors in our knowledge of the bandpass. This is work in progress. For the rest of this paper we shall ignore bandpass averaging in the simulations and leave detailed study of its effect for future work.

Cleaning one map
Our likelihood for a cleaned CMB map x (Eq. (11)) is where the covariance matrix C is given by [8] C(r, s,p I ) = C tens (r) + C scal (s) and C tens and C scal are the tensor and scalar CMB signal covariance matrices (including beam smearing, pixel window function, and any other extra smoothing/filtering applied to data), respectively, calculated from theoretical power spectra following Appendix A of [8].
Noise covariance matrices N ν and E diag will be explained after Eq. (32).

7/31
Theoretical power spectra are given by where c tens,EE ℓ and c tens,BB ℓ are the E-and B-mode polarization power spectra from the tensor mode with r = 1, respectively, and c scal,EE ℓ and c lens,BB ℓ are the E mode from the scalar mode and the B mode from the gravitational-lensing effect, respectively.
The parameter s rescales the amplitude of the scalar perturbation, and the fiducial value is s = 1. In [8], we found that marginalization over the scalar amplitude s was able to remove a chance correlation between the template coefficients and the tensor-to-scalar ratio r.
In this paper, we work with low-resolution maps at a Healpix [15] resolution of N side = 4. At this resolution and with very low instrumental noise we assume in this paper (anticipating future space missions such as LiteBIRD [16]), the signal covariance matrix becomes illconditioned due to a strong scalar E-mode signal which is not band-limited. To solve this numerical issue, we follow the procedure described in Appendix A of [17]; specifically, we apply a Gaussian smoothing to a signal-plus-noise map with 2200 arcmin (FWHM), which is 2.5 times the pixel size at N side = 4, and resample the smoothed map to N side = 4. As the smoothed map at N side = 4 is dominated by the scalar E-mode signal at all angular scales supported by the map resolution, the covariance matrix of this map is singular. In order to regularize the covariance matrix, we add an artificial, homogeneous white noise of 0.2 µK arcmin to the smoothed map, such that the map becomes noise dominated at the Nyquist frequency of the map: The noise covariance matrices N νCMB and N νj in Eq. (29) become non-diagonal because of smoothing, and E diag is a diagonal matrix for the artificial noise used to regularize the covariance matrix. We now wish to estimatep I , r, and s from the likelihood given in Eq. (28). We follow the following procedure: 1. Initialization: Set initial values for s, r, andp I ; 2. Iteration step I: Minimize the χ 2 part of the likelihood (the first term in Eq. (28)) 2 with s and r fixed. During minimization, the covariance matrix C and its inverse C −1 are updated at each evaluation ofp I . 3. Iteration step II: Maximize the likelihood given in Eq. (28) for s and r with the foreground parametersp I fixed to the values found during the iteration step I. During maximization, C, C −1 , and the determinant |C| are updated at each evaluation of s and r. 4. Return to step 2 until a convergence criterion is satisfied.
We find that ten iterations are typically sufficient to meet the convergence criterion we set as |r new − r old | < 10 −3 r old . To maximize (minimize) the functions we have used the nlopt library 3 .
Finally, we include the mask in the covariance matrix following [6] (see also Appendix A). We use the "P06 mask" defined by the WMAP polarization analysis, which retains 73% of sky for the analysis. We degrade the original P06 mask to N side = 4.
For N side = 4 simulations, analyzing 1000 maps roughly takes 1000 seconds on a laptop (4 cores, 16GB memory). The most time-consuming part is matrix inversion, whose computational cost scales as the number of matrix elements cubed; thus, the computational cost scales as ∝ N 6 side .

Cleaning multiple maps
We may use the Delta-map method to clean two (or more) maps at two (or more) CMB frequencies.
Here we consider a case in which we clean two maps at different CMB frequencies (e.g., 100 GHz and 140 GHz) using lower-and higher-frequency maps as the common template maps for cleaning. Having two frequency channels cleaned separately (instead of combining them into one cleaned map) is useful for null tests, as we describe in Sect. 5.3. In this case, the map vector becomes where x νCMB1,2 are cleaned CMB maps at frequencies ν CMB1 and ν CMB2 , given by The noise covariance matrix for this map vector is given by where the components of the noise covariance matrix are, for example, Here n QQ ν , n QU ν , and n U U ν are the noise covariance matrices for the Stokes Q and U maps at frequency ν. The signal covariance is given more simply by where s QQ , s QU , s U Q and s U U are the signal covariance matrices for the Stokes Q and U of the CMB. Here, the same signal matrix elements appear for different frequency channels; however, it should be understood that they are different in practice due to different beam smearing and filtering depending on channels. We have not written this effect explicitly for simplicity throughout this paper.

Test
Before moving on to simulations with realistic foreground models, we test the algorithm using simplified ones. Specifically, we generate foreground maps matching exactly the form assumed in Eq. (15), i.e., power-law emission expanded and truncated at first order in δβ.
For Q and U maps and power-law indices of synchrotron and dust, we use the same maps as in [8], in which the mean synchrotron and dust indices areβ s = −3.00 andβ d = 1.647, respectively. We add scalar and tensor modes of CMB with a tensor-to-scalar ratio of r = 0.001. We do not add instrumental noise but add artificial noise to regularize the covariance matrix. We then apply the Delta-map method to estimate r,β d , andβ s .
For this test we use CMB+foreground maps generated at 40, 60, 140, 230, and 340 GHz, and use the lowest two and highest two frequencies to clean a map at 140 GHz. We do not apply bandpass averaging. Fig. 1 shows the values of r,β s , andβ d obtained from 1000 random realizations of the CMB. We successfully recover the input values.
Next, we apply the Delta-map method for a MBB for dust (Sect. 2.4), while the simulated maps always have a power-law dust (only in this section). For this case we generate another map generated at 400 GHz so that we have the highest three frequencies for dust. We find that we recover the input values of r andβ s . This is because, in the limited frequency range considered here, a MBB can mimic a power law with a spectral index of 1.647 by appropriately tuningβ d andT d . (MBB becomes a power law in the Rayleigh-Jeans limit, i.e., k B T d /hν → ∞.) The clear correlation seen in the right panel of Fig. 1 supports this interpretation.

Bayesian derivation of the Delta-map method 4.1. Likelihood
So far we have formulated the Delta-map method in a "bottom-up approach"; namely, we started with a plausible form of a foreground-cleaned CMB map given in Eq. (11) and constructed a likelihood given in Eq. (28). In this section we formulate the Delta-map method in a "top-down approach" by starting with the likelihood of data m given general sky signal s, mixing matrix D, and noise covariance matrix N :  Fig. 1 Values of r and the mean foreground parameters obtained from 1000 random realizations of the CMB. No instrumental noise is included. The simulation also includes power-law synchrotron and power-law dust with spatially-varying indices, and the mean indices areβ s = −3.0 andβ d = 1.647. While we always use the Delta-map method for a power-law spectrum (Sect. 2.2) for synchrotron, we compare two Delta-map methods for dust: power-law and MBB (Sect. 2.4). The input map for dust is always a power-law (only in this test). The green (denoted as "PL") and purple (denoted as "MBB") points show the results from the Delta-map method for power-law dust and MBB dust, respectively. We use We have concatenated the Stokes parameter data at all pixels and frequencies as We similarly concatenate the noise covariance matrix N , which now becomes a 2N pix N freqby-2N pix N freq matrix. (The factor of 2 is for Q and U .) Here, N pix is the number of pixels and N freq is the number of frequency channels, andn i is a sky direction unit vector toward ith pixel. We have also defined the sky signal vector as whose number of elements is 2N pix (N temp + 1), where N temp is the number of foreground ("template") components. When we have synchrotron and dust, N temp = 2.
The number of rows of the mixing matrix D is 2N pix N freq , and the number of columns is 2N pix (N temp + 1). We may write D using 2N pix N freq -by-2N pix sub-matrices as and similarly for D synch . Here, β d,i ≡ β d (n i ) etc., and each block in D dust should include the entries for both Q and U , i.e., each block is a 2N pix -by-2N pix matrix.
The effect we have not written explicitly here is the effect of a gain calibration, beam smearing, pixel window function, or any extra smoothing/filtering applied to data. These effects will modify all elements of the mixing matrix; e.g., "1" in the CMB matrix becomes a product of the calibration factor, beam transfer and pixel window function operators, etc. The next step is to expand the mixing matrix to first order in spatial variations of the spectral parameters. We redefine the signal vectors as as well as the mixing matrix as where dust and synchrotron have N (I = 1, · · · , N ) and M (J = 1, · · · , M ) spectral parameters, respectively. The subscript ", I" denotes derivatives with respect to the Ith foreground parameter (e.g., β d and T d for dust and β s and C s for synchrotron). All elements in this new mixing matrix are evaluated at the mean spectral parameters. The form of the likelihood given in Eq. (40) remains unchanged. 12/31 Specifically, and similarly for synchrotron etc. Here, I is a 2N pix -by-2N pix identity matrix. When we have only CMB, dust and synchrotron in the sky, the number of elements of s is 2N pix (N + M + 3) and D becomes a 2N pix N freq -by-2N pix (N + M + 3) matrix. When N freq = N + M + 3, D becomes a square matrix and invertible. This is a special case in which the number of frequency channels is just enough to solve for a cleaned CMB map. Now that the mixing matrix D does not depend on pixels, Eq. (40) can be written trivially in pixel, spherical harmonics, or wavelet space. This is an important advantage of linearization used by the Delta-map method.

Delta-map as the maximum likelihood solution
Maximizing the likelihood with respect to the CMB signal, we obtain [18] where the subscript "CMB" indicates that we take the elements relevant to the CMB, and the superscript "ML" stands for the maximum likelihood value. We can evaluate this in pixel, spherical harmonics, or wavelet space. When the number of frequency channels is just enough to solve for a cleaned CMB map, D is invertible and we obtain CMB ML = D −1 m CMB . For example, when we have a spatially uniform power-law component as foreground, the solution for a cleaned CMB map from two maps at two frequencies is where D ν = (ν/ν * )β. Note that ν * cancels. If we identify ν 2 with the reference frequency ν * , the expression further simplifies to CMB , which coincides with the usual internal template cleaning solution with α = (g ν1 /g ν * )D ν1 (see, e.g., Eq. (6) of [8]). We can obtain the same result from Eq. (11) with α given by −A −1 d νCMB and ν 1 = ν CMB . When we have a spatially-varying power-law component as foreground, the solution for a cleaned CMB map from three maps at three frequencies is where The numerator evaluates to 13/31 where we have canceled ν * with that in the denominator. We find that this result reproduces the numerator of Eq. (17), [Q, U ] νCMB + 2 j=1 [Q, U ] νj , up to a multiplicative factor if we identify ν 3 with ν CMB .
We thus conclude that Eq. (11) gives the maximum likelihood solution for a cleaned CMB map to first order in perturbation, provided that the number of frequency channels is just enough to solve for a cleaned CMB map. We need to use Eq. (49) when we have more channels, which would give a less noisy CMB map.

Should we cancel CMB in a template map?
SEVEM (see Appendix A.6 of [3]) is an internal template method used by the Planck collaboration. In this method one constructs a template foreground map by first taking difference between two frequency channels (so that CMB cancels) and fits this difference map to a CMB channel and subtracts the foreground. Specifically, when we have one foreground component and three frequency channels, SEVEM constructs where α is a fitting coefficient. (For this case, ν CMB = ν 1 .) When the foreground has a spatially uniform spectrum, the coefficient is ), which gives an unbiased estimate of CMB. We find that this is not the maximum likelihood solution given in Eq. (49). Assuming that the noise covariance matrix is diagonal and homogeneous over pixels, where "2 perm." indicates two permutation terms. While this may look complicated, there is a clear interpretation. To see this, let us define which is the Maximum likelihood solution when we have only two frequency channels ν i and ν j (Eq. (50)). We find Therefore, the maximum likelihood solution is equal to three cleaned CMB maps combined optimally, but is not equal to the SEVEM construction.

Posterior of the parameters
The next task is to derive a posterior distribution of the mean spectral parametersp I and the cosmological parameters such as the tensor-to-scalar ratio r. We start by using Bayes's theorem to relate the posterior distribution to the likelihood given in Eq. (40) as where S is a covariance matrix of the sky signal s = s CMB + s f (CMB s CMB and foregrounds s f ), P (D, s, S) is the prior distribution, and P ( m) is Bayes's (normalization) factor. Note 14/31 thatp I is a vector of the parameters that determine the mixing matrix D. We shall assume a flat prior on D, i.e., P (D, s, S) ∝ P ( s, S). We first marginalize Eq. (57) over the CMB signal. To this end we assume a Gaussian probability density distribution for the CMB signal part of the prior: is a 2N pix -by-2N pix CMB signal covariance matrix with no beam smearing, pixel window function, extra filtering, or calibration factor.
Integrating both sides of Eq. (57) over s CMB , we obtain where S CMB = D CMB S CMB 0 (D CMB ) T , s f is the foreground sky signal vector with P ( s f , S f ) being its prior, andD is a new mixing matrix eliminating CMB, i.e., Note thatD is not a square matrix and thus not invertible. For example, when we have only dust and synchrotron as foregrounds, and dust and synchrotron have N and M spectral parameters, respectively,D becomes a 2N pix N freq -by-2N pix (N + M + 2) matrix; however, to be able to solve for a cleaned CMB map we need N freq ≥ N + M + 3. We maximize the marginalized likelihood part of Eq. (58) (the first line on the right hand side) with respect to the foreground sky signal to obtain the maximum likelihood value of s f as Note that s ML f is not the maximum likelihood value of Eq. (40) but is the maximum likelihood value of the CMB-marginalized likelihood, Eq. (58), ignoring the prior term.
Inserting s ML f back into Eq. (58), we obtain the final expression for the posterior distribution: This formula is the foundation of the Delta-map method. Once again, asD no longer depends on sky directionsn, we can evaluate the posterior in pixel, spherical harmonics, or wavelet space by using m, S CMB , and N in the appropriate basis.
To understand this rather complicated expression, let us use an example of a spatially uniform power-law component as foreground and work with two maps at two frequencies.
Then the first two lines yield where CMB ML is given by Eq. (50). We find that this expression agrees with the likelihood we constructed in Eq. (28) with appropriate coefficients α j for this example, up to the diagonal 15/31 noise (which we have not added here yet). Therefore, we conclude that Eq. (28) gives the CMB-marginalized posterior distribution ofp I and the cosmological parameters, with the maximum likelihood estimate of foregrounds for the CMB-marginalized likelihood. As the determinant term in Eq. (61) does not depend on the mixing matrix (hencep I ), maximizing the posterior with respect top I with a flat prior, P ( s f , S f ) = constant, amounts to minimizing χ 2 as in Eq. (33). This justifies our algorithm presented in Sect. 3.
We could have marginalized Eq. (58) over s f . Then it would also modify the determinant which, in turn, would depend onp I . This would be the situation in which we maximize Eq. (28) including the determinant forp I , r, and s simultaneously. However, we find that this procedure produces a biased result forp I . The same behavior was observed in [18] for a flat prior on s f . An intuitive reason for this is that, when foregrounds are allowed to vary to unreasonably large values by a flat prior, the marginalized distribution hasp I that is far away from the maximum likelihood values. Therefore, a better approach would be either to use s ML f instead of marginalizing over it (which we have adopted here), or to marginalize over s f with a reasonable choice of S f (see [19,20] for this approach), or to marginalize over S f .
Note that the posterior distribution given in Eq. (61) does not make distinction between "CMB channels", "synchrotron channels", and "dust channels". This is a nice feature; while a heuristic construction of cleaned CMB maps based on identifying low-and high-frequency maps as foreground channels (discussed in Sect. 2) as well as its likelihood given in Eq. (28) is intuitive and easy to understand/interpret, it might give readers an impression that there is some arbitrariness in choosing which channels are CMB etc. Consider the case where we have far more channels than the number of foreground components. Then, how should we decide which ones are CMB and foreground channels? The posterior distribution we have derived in this section tells us that this arbitrariness does not exist: we can use Eq. (61) to combine all the data optimally to estimate r and the mean foreground parameters.
For the rest of this paper, we shall explore the simplest cases in which the number of frequency channels is just enough to solve for one CMB map and the necessary number of the mean foreground parameters. Then Eq. (28) is the appropriate likelihood. That is to say, N freq = N + M + 3. For more general cases in which N freq is greater, Eq. (61) should be used.

Sky model
Our fiducial sky model consists of three components: polarized CMB, synchrotron, and thermal dust emission. Later in Sect. 6.2, we also explore the effect of AME on cleaning synchrotron.
For synchrotron, we use "SynchrtronPol-commander 0256 R2.00" at 30 GHz as a template, which was derived from Planck maps using the Commander algorithm [21]. We extrapolate this map to other frequencies assuming power-law emission with spatially-varying spectral indices β s . The spatial variation is taken from [22], which was derived from WMAP 23 GHz and Haslam 408 MHz maps.
For dust component, we explore two possibilities: one-component and two-component MBB dust. The latter is to test robustness of the Delta-map method for more complex sky model; 16/31 namely, we build the Delta-map method assuming one component, and apply it to a sky map containing two-component dust. Our starting point is the dust polarization template "DustPol-commander 1024 R2.00" at 353 GHz, which was derived from Planck maps using the Commander algorithm [21]. Extrapolation to other frequencies is done by either one-or two-component MBB.
The one-component MBB is given by Eq. (21), with maps of β d (n) and T d (n) taken from the Commander analysis given in [21].
A two-component MBB model is motivated by an observation that the one-component model does not explain simultaneously the Rayleigh-Jeans (microwave) and Wien (infrared) regions of the dust spectrum observed by the FIRAS and DIRBE instruments on board COBE, respectively [23]. A two-component MBB dust model has been proposed to explain the observations. Physically, two components might represent different dust grains (e.g., silicate-and carbon-dominated grains) with different temperatures. Our two-component model is based on [24]. The model assumes the following frequency dependence in brightness temperature: with ν 0 = c/λ 0 and λ 0 = 100 µm. This function then gives D ν = F ν /F ν * . Here, a 1 , a 2 , β 1 , and β 2 are constant fitting parameters, whereas T 2 (n) (hot dust temperature) is a spatiallyvarying parameter. The remaining parameter, T 1 (n) (cold dust temperature), is derived from T 2 assuming thermal equilibrium with the same interstellar-radiation field [23], i.e., Here we have set the constant parameters as β 2 = 2.82, β 1 = 1.63, a 1 = f 1 q r , a 2 = (1 − f 1 ) with f 1 = 0.0485, and q r = 8.219 [24]. The spatially varying parameters T 2 (n) and [Q f , U f ] ν * (n) with ν * = 353 GHz are set according to the Planck results [24].

Results
We apply the Delta-map method to estimate r from 1000 random realizations of the CMB at a Healpix resolution of N side = 4. We do not include instrumental noise here but investigate the effect of noise in Sect. 7. We prepare two sets of simulations: one with no tensor mode (r input = 0) and another with r input = 10 −3 . We smooth foreground maps and resample them at low-resolution pixels of N side = 4 following the procedure described in Sect. 3. We use the P06 mask defined by the WMAP polarization analysis [6] and implement it in the likelihood evaluation following Appendix A.
We   to find the best-fitting parameters for each realization. We find that all the best-fitting parameters are well within these parameter ranges listed above for all the 1000 realizations.
In Fig. 2, we show the results from applying the Delta-map method for powerlaw synchrotron (Sect. 2.2) and one-component MBB (Sect. 2.4) to CMB+power-law synchrotron+one-component MBB simulations, i.e., the simulations and assumptions match up to first order in spatial variation of the foreground parameters. For r input = 0 (left panel), we find no significant bias, with a slightly larger uncertainty in r for Case B. The uncertainties are dominated by cosmic variance of the gravitational lensing effect [8] and are found to be r < 0.34 × 10 −3 for Case A and r < 0.48 × 10 −3 for Case B (95% CL). These results are much better than our previous study that found a significant bias for the P06 mask, 18  δr = 2 × 10 −3 [8], because our previous algorithm had to assume that the spectral parameters are uniform across the sky. The Delta-map method takes care of the spatial variation of the spectral parameters, as promised. Note that, for r input = 0, a large number of samples fall into the first bin because foregrounds are successfully removed and we have put a hard prior of r ≥ 0.
Next, we apply the Delta-map method for power-law synchrotron and one-component MBB to CMB+power-law synchrotron+two-component MBB simulations, i.e., the simulations and assumptions no longer match for dust. Can we still recover the input r without a significant bias? In Fig. 3 we show the recovered r. For r input = 0 (left panel) we find r < 0.26 × 10 −3 for Case A and r < 0.31 × 10 −3 for Case B (95% CL). For r input = 10 −3 (right panel) we find r = (1.02 ± 0.42) × 10 −3 for Case A and (1.05 ± 0.43) × 10 −3 for Case B (68% CL).

19/31
Interestingly we obtain slightly a better result for this case, but the difference with respect to the previous case is well within the error bar.
We thus conclude that the Delta-map method recovers the input value of r without a significant bias in the presence of realistic spatially-varying foreground spectral parameters. The slight difference in the uncertainties for Case A and B arises from foreground residuals. Generally, in the absence of instrumental noise we expect smaller uncertainties if the foreground channels are closer to the CMB channel, because a mismatch of foreground emission between the CMB and foreground channels decreases as the CMB and foreground channels become closer to one another.

Null test
To test our algorithm more precisely, we perform a null test; namely, we clean two maps (100 and 140 GHz) following the procedure in Sect. 3.2, estimate r, and take the difference to see if there is any residual bias in the difference. This is a powerful null test (see [25] for application to the optical depth parameter τ ), as it is free from cosmic variance, allowing for precise test with a smaller error bar. We must perform this kind of test on the real data to make sure that a future detection of r is not due to a residual foreground. As we do not include instrumental noise yet, the estimates of r at 100 GHz (r 100 ) and 140 GHz (r 140 ) should be identical if the foreground emission is removed adequately.
In the left panel of Fig. 4, we show that r 100 and r 140 are indeed highly correlated. In the right panel, we show the distribution of the difference, r 100 − r 140 , as well as those of r 100 and r 140 . We find that the distribution is consistent with the null hypothesis and that the small spread of the distribution of r 100 − r 140 is due to foreground residuals. We find r 100 − r 140 = (0.72 ± 2.98) × 10 −5 (68% CL).

Decorrelation
Internal template methods assume that a map of [Q, U ] at one frequency can be used to predict [Q, U ] at other frequencies as a function of foreground parameters. However, this assumption breaks down and leads to a phenomenon called "decorrelation", when we have a superposition of multiple components with different foreground parameters; e.g., superposition of multiple dust components with different values of β d and T d along the line of sight [26,27]. The same effect arises when we pixelize (or coarse-grain) a map with big pixels containing many smaller pixels with different values of β d and T d . Currently there is no evidence for the former, intrinsic decorrelation effect [28], whereas the latter effect can be modeled given the distribution of foreground parameters on smaller pixels. As we shall show in this section, one way to deal with this decorrelation effect is to use the foreground parameters that are slightly different for Stokes Q and U . T d,i and a polarization direction γ i . We then observe where I d is an unpolarized dust amplitude in brightness temperature units and Π i is a polarization fraction. We have omitted the sky direction vectorn to simplify notation. Here we assume that it is only the dust temperature T d,i that varies and we fix the dust emissivity β d , but it is straightforward to include it in the final result. We consider Q first. Writing dust temperatures as T d,i =T d (1 + δ i ), we obtain If we define a new dust temperature fluctuation variable and truncate expansion at first order in δ i , we find Now, we can always write A and B as A = Π cos(2γ) and B = Π sin(2γ) by defining Π and γ by Π ≡ √ A 2 + B 2 and tan(2γ) ≡ B/A. We obtain the final result: Therefore, to first order in temperature perturbation, Q and U behave as if they had slightly different dust temperatures. This makes an observed polarization direction, γ obs , different from γ because Importantly, the observed polarization direction changes with frequency, i.e., γ obs = γ obs (ν). This is the origin of frequency decorrelation. We can account for variations in any other foreground parameters (e.g., β d , β s , C s ) by using slightly different values for Q and U .
6.1.2. Impact on r. We include decorrelation in our two-component MBB dust simulation as follows. We perturb temperature of hot dust, T 2 , in Q and U as where ǫ Q and ǫ U are Gaussian random numbers with unit variance and σ T is a parameter that controls the effect of decorrelation. Temperatures of cold dust, T Q 1 and T U 1 , are related to T Q 2 and T U 2 via Eq. (64). Using these temperatures, we extrapolate the Planck dust polarization maps in brightness temperature units at 353 GHz to other frequencies using where I ν (T 2 (n)) is the frequency spectrum in the two component model given by Eq. (63). We can implement T Q 2 and T U 2 directly into the Delta-map method by increasing the number of mean temperatures from oneT d to twoT Q 2 andT U 2 . However, in this section we 22  instead apply the Delta-map method for one temperatureT d and see how much impact the decorrelation has on our estimation of r. To this end, we generate temperature fluctuations with r.m.s. of 1 K, i.e., σ T = 1 K and include them in the hot dust temperature of twocomponent MBB dust. To avoid negative temperature by fluctuations, we do not perturb it if T 2 < 5 K.
In Fig. 5 we find that decorrelation with σ T = 1 K shifts the best-fitting value ofβ d (right panel); a superposition of varying foreground parameters changes the average foreground spectrum as shown by [29]. Fortunately, this has little impact on r (left panel). This is a good news for estimation of r, though the appropriate magnitude of σ T in real sky and r.m.s. of other foreground parameters are yet to be determined. At least our method provides a framework within which frequency decorrelation due to any foreground parameters can be treated consistently to first order in perturbation. In table 1, we summarize the mean and variance of the sky-averaged foreground parameters obtained from 1000 random realizations of the CMB with the assumed foreground maps and the frequency channels used in the analysis. 23/31

AME
While it is widely believed that Galactic synchrotron and thermal dust emissions are the most dominant sources of polarized foreground emission on large angular scales [11], there might be a third polarized component, AME, whose most plausible candidate includes tiny PAH particles spinning with dipole moments [30,31]. While the AME has been shown to contribute to µK-level fluctuations in unpolarized intensity at 40 GHz and even smaller fluctuations at higher frequencies [32], little is known about its polarization properties.
The Planck collaboration has put an upper bound on the polarization fraction of AME of Π 1.6% in the Perseus region [33]. The QUIJOTE collaboration has placed a limit of Π 0.39% in the molecular complexes W43 [34]. The polarization degree in more diffuse areas that are suitable for cosmological analyses is unknown owing to the strong contamination from Galactic synchrotron emission. Even though the polarization contribution of AME to the microwave data near the foreground minimum, ν ≈ 80 − 100 GHz, is expected to be negligible, it affects component separation by changing interpretation of low frequency data from pure synchrotron to synchrotron plus AME. In this sense, AME affects estimation of r indirectly. In this section we show that this indirect effect can be absorbed into the synchrotron curvature parameter C s .
To estimate the effect of AME, we model it following [35]. Specifically, we take the intensity map of thermal dust emission at 353 GHz and rescale it by a factor of 0.9 that is found in the Planck 2015 results [33] to predict the AME intensity at 22.8 GHz. We then assign a 1% polarization fraction with the same polarization angles as those of the thermal dust component, with a frequency spectrum given by the cold-medium model described in [36].
We first apply the Delta-map method to the simulation with CMB+power-law synchrotron+one-component MBB dust+AME, using the Delta-map method for power-law synchrotron and one-component MBB dust. We use (40, 60, 140, 230, 280, 340) GHz. We find a significant bias in the estimated r: r est = 0.068 ± 0.011 (68% CL) for r input = 0. At first this result was surprising because the amplitude of AME at 140 GHz is too small to be relevant. We then find that AME affects r via changing the best-fitting synchrotron. We therefore added curvature to the synchrotron spectrum and implement it in the Delta-map method following the procedure in Sect. 2.3. Including C s requires one more frequency channel for synchrotron, so we add one more frequency, which is to be determined. When adding C s , we need to specify the pivot frequency ν s ; we write a curved power-law synchrotron spectrum as ν βs+Cs ln(ν/νs) . We try ν s = 40 and 50 GHz. Choice of ν s does not affect the CMB result because it cancels in the cleaned CMB map, but it would change the meaning of synchrotron parameters.
In the top panels of Fig. 6, we show scatter plots ofC s andβ s from 1000 simulations of CMB+AME+power-law synchrotron+one-component MBB dust emission. We study four configurations: to our Case A configuration (40, 60, 140, 230, 280, 340) GHz, we add 78 GHz (left panel) or 89 GHz (right panel) forC s , and for the pivot scale we choose ν s = 40 GHz or 50 GHz. We find that the distribution is split into two clusters. One clusters aroundC s ≃ 0 andβ s ≃ −3.07 and −3.20 in the left and right panels, respectively, while the other around C s = 0.1 and 0.2 in the left and right panels, respectively. In the former cluster, because |β s | ≫ |C s ln(ν/ν s )|, most of the low-frequency foreground emission can be removed as a power-law emission, and AME is absorbed by the Delta-map terms proportional to δβ s (n) 24 and δC s (n) (see Eq. (19)). This interpretation is supported by the fact that the distribution in this cluster is not very sensitive to the choice of ν s . In some realizations, however, non-zerō C s is needed by chance; in this case the results fall into the latter cluster. How about estimates of r? In the bottom panels of Fig. 6, we find that the mean curvature parameterC s is not correlated with r. In the left panel of Fig. 7, we show the distribution of r for two frequency configurations with r input = 0. We find r < 5.2 × 10 −4 and r < 6.4 × 10 −4 (95% CL) for additional 78 and 89 GHz channel, respectively; thus, we find no evidence for bias unlike when we did not include C s (n) to absorb the effect of AME. In fact, the different number of realizations in the first bin in the left panel of Fig. 7 is due to foreground residuals. Because the polarization intensity of the AME rapidly decreases above 40 GHz, an additional channel at 78 GHz does the better job than 89 GHz in order to capture the spectrum of the AME component, and relatively larger foreground residuals remain for an additional channel at 89 GHz. 25  On the other hand, the distribution ofβ s shifts significantly between the cases of adding 78 or 89 GHz. The peak positions in the distribution ofβ s depend upon the frequency channels used to absorb AME and the frequency spectrum of AME. Because of the fact that the AME spectrum is generally characterized by a negative spectral index and a negative running index, the Delta-map method returns smallerβ s if we choose a higher frequency channel to absorb AME (farther away from the other two low-frequency channels at 40 and 60 GHz). Even though the results forβ s are different (indicating that the physical meaning ofβ s is no longer a simple synchrotron index), we find no bias in the estimated r in either case. (We find a slightly better result for an additional channel at 78 GHz as described above.) This is a good news for estimation of r.

Effect of instrumental noise
All the results we have presented so far were derived without instrumental noise. Here, we shall include instrumental noise in the simulation. As with other template-cleaning methods, the Delta-map method also suffers from a boosted noise; because the method is based on a linear combination of maps at different frequencies having the CMB signal in common, we cannot avoid removing some portion of the CMB when removing foreground components from the CMB channel. A linear combination of Gaussian random noises, on the other hand, will simply result in a larger Gaussian random noise; thus, the net effect is to boost the noise relative to the CMB signal in the map after the Delta-map method is applied.
The effective noise level in a cleaned CMB map can be readily estimated once the noise level at each observing frequency channel is specified. When we use two channels for synchrotron and three channels for dust, the noise covariance matrix for each cleaned CMB map is given by the noise terms in Eq. (29) as n νCMB,cleaned = N νCMB + 5 j=1 α 2 j N νj / 1 + 5 j=1 α j  where α i are the coefficients given by solving Eqs. (5) and (6), and N ν is the noise covariance matrix at a frequency channel ν.
In Table 2, we show some examples of noise specifications and derived constraints on r using them. The effective noise levels we find are 3.0, 4.0, 3.9, 3.7, 3.4, 3.6, 3.8, 3.9, 4.7, 4.5, and 7.0 µK arcmin from the top to bottom specifications in the table. 5 To derive these values, we used the Delta-map method with a power-law synchrotron (Sect. 2.2) with β s = −3.0 and one-component MBB dust (Sect. 2.4) withβ d = 1.65 andT d = 19.15 K. The total effective noise slightly depends upon the foreground parameters. In fact, recent results from S-PASS have shown that the spectral index of Galactic synchrotron emission is slightly steeper than this value (β S−PASS s = −3.22) [37]; in that case, the total effective noise for the specification described in the top row in Table 2 would change from 3.0 to 2.65 µK arcmin.
If we combine several cleaned CMB maps (e.g., 100 and 140 GHz), the effective noise will be reduced to some extent and its inverse noise covariance matrix will be given by the inverse weighted sum of the noise matrices of the cleaned CMB maps as Note that the noise cannot be reduced by as much as the inverse-square of the number of the maps (n −1/2 map ), because the noises in the cleaned maps are correlated with one another. This might give readers an impression that the Delta-map method as presented in this paper is unbiased, but is not necessarily minimum variance. However, this is not the case; as we have discussed at the end of Sect. 4.4, the posterior distribution given in Eq. (61) can be used to combine all maps optimally to estimate r and the mean foreground parameters.

Conclusions
In this paper we have extended the internal template method by accounting for spatiallyvarying foreground parameters such as β s , C s , β d , and T d . This was achieved by expanding frequency spectra of foregrounds to first order in perturbations (spatial variations) [12,13]. We show that a foreground-cleaned CMB map obtained by our method ("Delta-map method") is the maximum likelihood solution (Eq. (49)), and its posterior distribution for the CMB parameters (e.g., tensor-to-scalar ratio parameter r) and the sky-averaged, mean foreground parameters (e.g.,β s ,C s ,β d , andT d ) is obtained by marginalizing over CMB and using the maximum likelihood solution to the foreground maps (Eq. (61)). We have found that the Delta-map method removes a bias in r, which was found in the previous study assuming uniform spectral indices [8], successfully. We have checked this against realistic sky simulations including polarized CMB, power-law synchrotron, and one-or two-component MBB dust emission.
While we have formulated and validated the Delta-map method in pixel space in this paper, it is valid in any other space such as spherical harmonics and wavelet space.
Armed with this new tool, we have presented physically motivated ways to address the frequency decorrelation due to averaging of different foreground parameters along the line of sight (or over pixels) and the effect of AME on foreground cleaning. We have shown that, to first order in perturbation, the decorrelation effect can be accounted for by using slightly different foreground parameters for Stokes Q and U . As an application we simulated a dust map with the r.m.s. temperature fluctuation along the line of sight of 1 K. We show that the best-fitting dust indexβ d is affected by this but the estimation of r is hardly affected.
While AME has negligible contribution to the foreground minimum, 80-100 GHz, it affects estimation of r via its effect on synchrotron cleaning. If not accounted for, it can result in a bias in r as large as δr ≃ 0.06 (for 1% polarization in AME). However, we have found that the effect of AME on synchrotron cleaning can be absorbed by the curvature parameter in synchrotron spectrum, C s , within the Delta-map method. We find that the Delta-map method can reduce the bias in r to undetected level in our simulation.
The remaining question is how the performance of the Delta-map method, especially the derived uncertainty in r, compares with those of other foreground cleaning methods in the literature. We leave this for future work.

A. Mask
One advantage of pixel-based internal-template methods is that one can easily and exactly account for the effect of a mask through the noise covariance matrix [6]. To implement it, we rewrite the likelihood function as where S = C tens (r) + C scal (s) and N are CMB signal and noise covariance matrices, respectively.
We rearrange the data vector and the covariance matrix such that unmasked pixels appear first and masked pixels afterward. Suppose that the structure of N is given by where T is the noise covariance matrix for unmasked pixels, W is for masked pixels, and U is for their correlation. Then the matrix inverse is given by where Q ≡ W − U T T −1 U and T −1 = B 11 − B 12 B −1 22 B 21 (Schur complement). We then assign infinite noise to the masked pixels such that N → N + λ(I − M ), where I is the unit matrix and M is the diagonal matrix whose elements are zero for masked pixels and unity otherwise. In the limit of λ → ∞, Q → ∞, and the inverse of N is given by In this work, we have utilized the "P06 mask" defined by the WMAP polarization analysis, degrading it to N side = 4. The sky coverage of our default P06 mask at N side = 4 is f sky = 0.56. Whilst we find that this mask is sufficient to suppress the residual O(δp 2 ) term in Eq. (3), it is interesting to investigate how the results change if one uses different masks. We apply the Delta-map method to the simulations with two different masks, namely more aggressive and conservative ones, whose sky coverages are f sky = 0.22 and 0.69, respectively, as shown in Fig. A1. The results are summarized in Table A1. We find that the aggressive mask leads to a larger bias and a weaker constraint because of foreground residuals, while the conservative mask leads to a smaller bias and a weaker constraint because of the smaller sky coverage. We caution, however, that one should not accept the results for the aggressive (and full-sky) case at face value because our foreground model may not be accurate enough close to the Galactic plane.