Improving Weak Lensing Mass Map Reconstructions using Gaussian and Sparsity Priors: Application to DES SV

Mapping the underlying density field, including non-visible dark matter, using weak gravitational lensing measurements is now a standard tool in cosmology. Due to its importance to the science results of current and upcoming surveys, the quality of the convergence reconstruction methods should be well understood. We compare three methods: Kaiser-Squires (KS), Wiener filter, and GLIMPSE. KS is a direct inversion, not accounting for survey masks or noise. The Wiener filter is well-motivated for Gaussian density fields in a Bayesian framework. GLIMPSE uses sparsity, aiming to reconstruct non-linearities in the density field. We compare these methods with several tests using public Dark Energy Survey (DES) Science Verification (SV) data and realistic DES simulations. The Wiener filter and GLIMPSE offer substantial improvements over smoothed KS with a range of metrics. Both the Wiener filter and GLIMPSE convergence reconstructions show a 12 per cent improvement in Pearson correlation with the underlying truth from simulations. To compare the mapping methods' abilities to find mass peaks, we measure the difference between peak counts from simulated {\Lambda}CDM shear catalogues and catalogues with no mass fluctuations (a standard data vector when inferring cosmology from peak statistics); the maximum signal-to-noise of these peak statistics is increased by a factor of 3.5 for the Wiener filter and 9 for GLIMPSE. With simulations we measure the reconstruction of the harmonic phases; the phase residuals' concentration is improved 17 per cent by GLIMPSE and 18 per cent by the Wiener filter. The correlation between reconstructions from data and foreground redMaPPer clusters is increased 18 per cent by the Wiener filter and 32 per cent by GLIMPSE.


INTRODUCTION
Mass map reconstruction from weak gravitational lensing recovers the underlying matter distribution in the universe from measurements of galaxy shapes. Images of distant galaxies are deformed by the inhomogeneous matter distribution along the line of sight. Any matter can contribute to the lensing effect, making it a direct probe of non-visible dark matter.
Weak lensing, which takes advantage of the statistical power from many small distortions (that is, gravitational lensing induced "shears"), is now a well established tool in constraining cosmology. The Dark Energy Survey (DES) has used the 2-point correlation function of shear to contribute to excellent constraints on cosmological parameters and models, including the nature of dark energy (DES Collaboration et al. 2017). Shear 2-point correlation functions have been used to constrain cosmology from many other survey datasets (van Uitert et al. 2017, Kilbinger et al. 2013. These methods use the shear measurements directly, as the shear can be related to the underlying matter distribution without needing to explicitly reconstruct mass maps.
A zero-mean Gaussian random field can be characterised entirely by its 2-point correlations. The matter density field in the early universe is expected to be highly Gaussian, a property which persists into the late universe for the large scales that were less affected by gravitational collapse. For the smaller scales at late times, non-linear gravitational collapse has led to a highly non-Gaussian density field. Much valuable information can be extracted from this non-Gaussianity, although this requires additional methods beyond 2-point statistics.
Popular proposed methods to extract this information include N-point statistics and higher order moments (Cooray & Hu 2001), peak statistics (Dietrich & Hartlap 2010, Kacprzak et al. 2016, Peel et al. 2017, Shan et al. 2017, Martinet et al. 2017, and Minkowksi functionals (Kerscher et al. 1996, Petri et al. 2013. It is often either essential or convenient to apply these methods to the density field directly (rather than in the space of the shear measurements), thereby necessitating a reliable mass map reconstruction.
Peak statistics are particularly promising, as peaks in the density field probe the non-Gaussian structure directly. Peaks can be identified from aperture mass maps, which are derived by convolving the shear data with a kernel, or from the reconstructed density field. The first approach has the advantage of having local noise, while the second is "closer" to the underlying density field and often has faster algorithms. Both methods often require simulations to provide a link between the theory and data, with the exception of proposed semi-analytic models (Lin & Kilbinger 2015, Shan et al. 2017. In addition to using mass maps for higher order statistics to constrain cosmological parameters and models, the mass maps can themselves be intrinsically useful. Clerkin et al. (2017), using the original DES Science Verification (SV) mass map, show evidence that the 1-point distribution of the density field is more consistent with Log-Normal than Gaussian. Combining mass maps with the spatial distributions of stellar mass, galaxies, or galaxy clusters allows the relationship between the visible baryonic matter and invisible dark matter to be studied. Using mass maps to constrain galaxy bias (Chang et al. 2016), the relation between the distribution of galaxies and matter, can in turn aid cosmological probes other than weak lensing. Maps also enable simple tests for systematic error in the galaxy shape catalogues.
Since the first application of mass mapping methods to wide-field surveys with the Canada-France Hawaii Telescope Lensing Survey (CFHTLenS) data (Van Waerbeke et al. 2013), mass maps have been a standard product of large weak lensing surveys. In addition to DES, current surveys reconstructing the density field from weak lensing data include the Kilo-Degree Survey (Giblin et al. in prep.) and the Hyper Supreme-Cam Subaru Strategic Program (HSC-SSP) (Oguri et al. 2017). Mapping dark matter is key to the science goals of the future Euclid Mission (Amendola et al. 2016) and the Large Synoptic Survey Telescope (LSST Science Collaboration et al. 2009).
DES is a ground based photometric galaxy survey, observing in the southern sky from the 4m Blanco telescope at the Cerro Tololo Inter-American Observatory (CTIO) in Chile with five photometric filters covering the optical and near-infrared spectrum using the Dark Energy Camera (Flaugher et al. 2015, Dark Energy Survey Collaboration et al. 2016. The SV data come from an initial run over a fraction of the final sky coverage, but to almost the full exposure time of the final survey. The sky coverage is still large, 139 deg 2 , and the nearly full exposure (Chang et al. 2015) gives a galaxy density almost equal to what is expected after the complete 5 years of DES observations. This paper uses the public DES SV data to compare the quality of mass mapping reconstruction methods. The maps are of the two-dimensional convergence, κ, a weighted projection of the density field in the foreground of the observed background galaxies. Recovering the convergence from the shear data is an ill-posed inverse problem, troubled by survey masks and galaxy "shape noise". This work follows on from that of Chang et al. (2015) and Vikram et al. (2015) in which the original DES SV mass map was created using the Kaiser & Squires (1993) method. In this paper we compare three quite different methods: Kaiser-Squires (KS); Wiener filtering (Wiener 1949); and Glimpse (Leonard et al. 2014, Lanusse et al. 2016), a sparsity-based reconstruction method. The Kaiser-Squires method is a direct inversion from shear to convergence, taking no account of missing data or the effect of noise. The Wiener filter and Glimpse assume different prior knowledge about the underlying convergence to account for the effects of noise and missing data.
In Sec. 2 we describe the theoretical foundation for weak lensing mass mapping and the three different methods used for this work. In Sec. 3 we describe the DES SV shear data, the accompanying simulations, and the redMaPPer galaxy cluster catalogue. Foreground galaxy clusters are expected to trace the true density field, and therefore should be correlated with the convergence reconstruction. The different methods are also applied to realistic data simulations where the true convergence is known. In Sec. 4 we present our results on data and simulation, using various quality metrics for the reconstruction. On simulations these metrics are the Pearson correlation coefficient, the pixel root-mean-square error (RMSE), the variance of the 1-point distribution of pixel values, the phase residuals, and peak statistics. On data we compare the convergence reconstructions to the foreground galaxy clusters. We conclude in Sec. 5.

Weak gravitational lensing
We can use measurements of the distortion of background galaxy shapes by weak gravitational lensing to learn about the mass distribution in the foreground without making many physical assumptions or relying on phenomenological models. For convenience, here we summarise some of the existing literature relevant for mass mapping from weak lensing (Bartelmann & Schneider 2001, Kilbinger 2015. The weak lensing formalism follows photon paths along geodesics in a perturbed Friedmann-Robertson-Walker (FRW) metric. The perturbations are sourced by the density field of large scale structure. Throughout we assume that the perturbations are small, and that the measurements are made over a small enough patch of the sky that the sky geometry is Euclidian. Consistent with the Planck CMB results (Planck Collaboration et al. 2016) and motivated by inflationary theory, we assume that the global geometry of the universe is flat.
The density contrast, δ = (ρ−ρ)/ρ, of a pressureless fluid is related to the scalar gravitational potential perturbation, Φ, through the Poisson equation, where H 0 is the present value of the Hubble parameter, a is the cosmological scale factor, and ρ andρ are the local and mean density respectively. For a flat universe, the lensing potential is given by where ω is the comoving distance. The Born approximation assumes that the observed angle to a point, #» θ , deviates only a small amount from the true angle #» β , so the change in distance of the photon's path is negligible. We can characterise the effect of lensing on the galaxies using the Jacobian of the transformation, A i j = ∂ β i /∂θ j , which is decomposed into the functions κ( #» θ ) and γ( #» θ ) = γ 1 + iγ 2 , and which is given by Using the definition of the lensing potential and the Poisson equation, the convergence can be expressed as an integral over the density along the line of sight, For a distribution n(ω) of lensed galaxies, the lensing efficiency kernel is defined to be this weights the contribution of the foreground density fluctuations to give the convergence weighted over the redshift distribution of source galaxies, The shear, γ( #» θ ), which is assumed to be an observable in the weak lensing limit, is given by For surveys where the integral is over large angles on the sky, this formulation breaks down, and requires a full treatment in spherical bases. Wallis et al. (2017) show that errors can be introduced at an O(1%) level for correlations between points at DES SV angular separation depending on the projection. All of the methods used here use the small angle approximation, and should suffer equally. The real and imaginary parts of the shear γ represent a chosen two dimensional coordinate system. In weak lensing, the observed ellipticity 1 of a galaxy obs is related to the reduced shear g plus the intrinsic ellipticity of the source galaxy s through The reduced shear is approximately the true shear, g ≈ γ, in the weak lensing limit. This allows a standard definition of observed shear, γ obs = obs , where the measurements are degraded by "shape noise", caused by the s values of the observed galaxies: The shape noise for a given galaxy is modelled as a randomly-drawn Gaussian variate, s ∼ G(0, σ ), where σ is estimated from data. The distribution of the ellipticity from the SV data in figure 2 is not an exact Gaussian, as the true distribution is the result of galaxy astrophysics, though a Gaussian still has properties that make it a good approximation. The Gaussian would be the maximum entropic, least informative, distribution for known mean and variance, and, by the central limit theorem, would be the correct distribution in the limit of large numbers of galaxies averaged in pixels.
It is possible to extend the simple Kaiser-Squires method (Sec. 2.2) to use the reduced shear, g, for the mildly non-linear lensing regime when it is no longer appropriate to assume g ≈ γ , Seitz & Schneider 2001. This is also done by Glimpse (Sec. 2.4).
In matrix notation, the problem as given by equations 7 and 9 can be expressed as a linear model, with a data vector of observed shear measurements where A is a discretised version of equation 7 and n is a noise vector due to shape noise (equation 9). The elements of the data vector can either correspond to the individual shear measurements or to measurements binned into angular pixels (in which case the noise vector would be the average noise in the pixel). The convergence need not be reconstructed with the same pixelisation as the shear measurements, giving κ and γ vectors of different length. Missing data due to survey masks would correspond to a shorter γ vector; here one may wish to fill in the convergence in the masked region -this is known as inpainting. Different sized κ and γ vectors result in a nonsquare A matrix, potentially causing inversion problems.

Theory
The convergence-to-shear relationship, equation 7, is a convolution in the two dimensional angular plane. The twodimensional Fourier transforms of the shear and convergence, defined for κ as are related through an elementwise product via the convolution theorem where the Fourier transform of the kernel is given bỹ here k 1 and k 2 are the components of #» k . UsingDD * = π 2 , equation 12 can be rewritten: The inverse Fourier transform then returns the convergence reconstruction in configuration space (Kaiser & Squires 1993). The real and imaginary parts of the reconstruction are the E-and B-modes respectively, where κ recon = κ E + iκ B . In standard cosmology (equation 7), the convergence sourced by a real density field should be a pure E-mode. Errors, noise or other systematic effects can lead to B-mode contributions to the reconstruction.

Implementation
In the matrix formulation of equation 10, this deconvolution corresponds to multiplying the Fourier space shear field with the inverse of A in Fourier space. For a case with no shape noise, that is the Kaiser-Squires method is identical to using the inverse matrix where the Kronecker delta function, δ i j , relates the elementwise multiplication in Fourier space to a diagonal matrix operator, and † is the conjugate transpose. For the Kaiser-Squires inversion in configuration space, the A and A † matrices are not diagonal, and therefore are slower to compute. The discretisation of the underlying smooth shear field into finite configuration space makes the property AA † = I inexact. As a result of these factors, we choose to implement the Kaiser-Squires reconstruction in Fourier space.
The shear due to lensing is much smaller than the shape noise, and not all places on the sky contain usable galaxies. Both the shape noise and the random sampling of background galaxies propagate error through this noisy reconstruction. Binning the shear measurements into larger pixels can reduce the shape noise per pixel and ensure that there are no empty pixels, but this comes at a loss of the small scale information and cannot deal with masks or the edges of the survey.
A smoothing filter is applied to the Kaiser-Squires reconstruction to reduce the noise. This will similarly lose any small scale structure, and especially suppress peaks in the convergence. In this work, matching Chang et al. (2015), we smooth the Kaiser-Squires maps with a Gaussian kernel. The standard deviation scale, σ smooth , of this Gaussian kernel is free to be chosen, where σ smooth = 0 corresponds to standard, unsmoothed Kaiser-Squires.

Theory
The Wiener filter is the linear minimum-variance solution to linear problems of the type in equation 10, where the noise is uncorrelated. The Wiener filter reconstruction (Lahav et al. 1994, Zaroubi et al. 1995 is given by Here S κ and N are the signal and noise covariance matrices respectively, which are κκ † and nn † for this problem. This filter is the linear minimum-variance solution, as W is a linear operator that minimises the variance If the chosen prior on κ does not constrain the reconstruction, so that S −1 κ → 0 (Simon et al. 2009), or if the data are noise-free, N = 0, then the linear minimum variance filter becomes the Kaiser-Squires reconstruction. Setting S −1 κ → 0 is equivalent to removing the signal prior in the following Bayesian framework.
From a different starting point, for the Wiener posterior we begin by assuming a Gaussian likelihood (Jasche & Lavaux 2015) Pr where it is assumed that N is known and the noise is both uncorrelated and Gaussian, as assumed in equation 10. Intrinsic alignments of clustered galaxies will violate this uncorrelation condition. The prior on the convergence is that of a Gaussian random field, which is applicable for the density field on large scales at late times, Using Bayes' theorem and the fact that Pr(γ|S κ , κ) = Pr(γ|κ), the full posterior is given by where W is the Wiener filter, so the maximum a posteriori (MAP) solution is that of the Wiener reconstruction. If the aim of the reconstruction is to infer cosmology from the non-Gaussian component of the density field, the Wiener filter may not be the ideal method for mass map recovery. The small scale modes with less power are often suppressed, losing the peak structure. Qualitatively it can be thought of as either the Gaussian prior being inappropriate or as the linear filter being insufficient.

Implementation
Using the exact Fourier space propertyÃ −1 =Ã † we rewrite equation 17 as where we have usedÃS κÃ † = Ãκκ †Ã † = γγ † =S γ . This shows that applying the Wiener filter to the shear to recover γ W and then applying the Kaiser-Squires inversion in Fourier space is equivalent to directly calculating the Wiener filter of the convergence.
In configuration space, the noise covariance matrix is given by where p i is the galaxy count per pixel. Empty pixels in the masked region have infinite variance, absorbing the mask into a special case of the Wiener filter denoising. The signal properties for a Gaussian random field are constrained entirely by the mean and the signal covariance matrix, which in harmonic space is identical to the power spectrum. The cosmological principle implies that the angular distribution of a field on the sky is statistically isotropic, so the angular power spectrum, C , can contain all the 2point statistical information. The angular power spectrum of the physical shear E-mode shear signal is defined as where a m are the spherical harmonic coefficients and the brackets average over realisations of the signal. The second equality assumes the flat sky approximation for high 2 .
It is commonly asked whether it is reasonable to assume cosmological parameters in the map reconstruction, if the maps are then used to infer cosmological parameters. Though we assume a specific set of cosmological parameters, it would still be possible to use the maps for cosmological parameter estimation, from peak statistics for example, if the same prior is used on the simulations and the data identically. If simulations are not used, the power spectrum can be jointly inferred from the data (Jasche & Lavaux 2015) using Gibbs sampling.
In order to generate the power spectrum in flat Fourier space, rather than on the curved sky, we again use a flat sky approximation adapted from Loverde & Afshordi (2008), where N is the total number of pixels in the map, k θ is the magnitude of the projected Fourier mode, and where we have defined our projected angular power spectrum as The largest scale mode is = 20.51, which corresponds to an angular separation of 17.55 deg.
Though the signal covariance matrix is diagonal in harmonic space (equation 26), and the independent noise has covariance which is diagonal in configuration space (equation 23), there is no natural basis in which both are sparse. Inversion of dense matrices to evaluate the Wiener filter is bypassed using the algorithm presented in Elsner & Wandelt (2013), where an additional messenger field is used to pass information between harmonic and configuration space, iteratively converging to the Wiener filter solution.
These messenger field methods were extended by Jasche & Lavaux (2015) to draw Markov chain Monte Carlo (MCMC) samples from the whole Wiener posterior (equation 21). The first application of messenger field methods to weak lensing data was by Alsing et al. (2016Alsing et al. ( , 2017, who drew samples from the Wiener posterior and generated Wiener filtered shear (not convergence) maps from CFHTLenS data. By comparison, in this work we do not sample from the Wiener posterior; instead, we use the original messenger field algorithm of Elsner & Wandelt (2013) to calculate the Wiener filter reconstruction of the convergence map from DES SV shear data and simulations.

Theory
Consider the coefficients α of the decomposition of a signal x in a representation space (or "dictionary") Φ, so that x = Φα. Example dictionaries include the Fourier transform or wavelet transforms. Assuming a sparse prior on the signal x in the dictionary Φ means that its representation α is expected to be sparse, that is, with most of the coefficients equal to 0 (Starck et al. 2015). A simple example is a cosine function signal and a Fourier transformation; in this sparse basis only two coefficients have a non-zero value (corresponding to the frequency of the cosine function).
Formally most signals cannot strictly be made sparse, and are merely compressible with a choice of an appropriate transformation, such as a wavelet transform (Starck et al. 2015, Leonard et al. 2014. For a compressible signal the magnitude-ordered sparse coefficients, α i , are expected to have exponential decay and therefore to have a Laplace distribution (Tibshirani 1994).
Consider a generic linear inverse problem of the form y = Ax+n. A robust estimate of the signal x can be recovered by solving the ("LASSO") optimisation problem where λ is a Lagrangian multiplier (Tibshirani 1994). Here the first term corresponds to a χ 2 minimisation, ensuring fidelity of the signal reconstruction, while the second is the sparsity-promoting regularisation term. We can include non-constant noise variance by weighting the first χ 2 according to the variance. If the noise variance is included in the χ 2 term, the λ value can be interpreted as a signal-to-noise level in the transformed (e.g. wavelet) space.
The second term does not use the Euclidan l 2 norm, but instead uses the sparsity-promoting l 1 norm, defined as These methods are non-linear, so it can be difficult to derive properties analytically. With realistic simulations of the data and true signal, the value for λ can be chosen to maximise some success metric. This is analogous to selecting a theoretical power spectrum for the Wiener filter, or a smoothing scale for Kaiser-Squires. Sparse recovery methods are non-linear and are not necessarily formulated in the Bayesian framework of the Wiener filter. The Wiener filter reconstruction is that which maximises the Wiener posterior, which is known analytically provided the noise and signal are Gaussian with known covariance. However, one may make a frequentist estimate of the error of the sparse reconstruction by propagating the noise properties of the data using bootstrapping or Monte Carlo techniques.

Implementation/GLIMPSE
The choice of dictionary depends on the structures contained in the signal. Theory of structure formation in the universe predicts the formation of quasi-spherical halos of bound matter. It is standard practice to represent the spatial distribution of matter in halos with spherically symmetric Navarro-Frenk-White (Navarro et al. 1996) or Singular Isothermal Sphere profiles. Coefficients of Isotropic Undecimated Wavelets (Starck et al. 2015) in two dimensions are well suited to the observed convergence of a dark matter halo. The wavelet transform used in the Glimpse algorithm is the Starlet (Starck et al. 2007), which can represent positive, isotropic objects.
The sparsity prior in the starlet basis enforces a physical model that the matter field is a superposition of spherically symmetric dark matter halos. This is not wholly correct, but is an approximation which is true for the non-linear regime in the standard model of structure formation, similarly to how the assumption of Gaussianity holds in the linear regime. On large scales, where the density field is expected to be Gaussian, the Glimpse sparsity prior is less appropriate.
The Glimpse algorithm aims to solve the optimisation problem where F is the Fourier transform matrix, T is the Nonequispaced Discrete Fourier Transform (NDFT) matrix,Â is defined in equation 16, ω is a diagonal matrix of weights, and Φ † is the inverse wavelet transform. The indicator function i Im(·)=0 (defined in Appendix B) in the final term imposes realness on the reconstruction (no B-modes). The use of NDFT allows the first term to perform a forward fitted Kaiser-Squires-like step without binning the shear data, allowing the smaller-scales to be retained in the reconstruction. The full algorithm, including the calculation of the weights, is described in Sec. 3.2 in Lanusse et al. (2016).
Though the problem presented in equation 29 is an optimisation using the shear data γ, in fact it is the reduced shear (equation 8) that Glimpse uses to recover κ (Lanusse et al. 2016). As an extension, the Glimpse algorithm can also perform the joint reconstruction with reduced shear and flexion, a third-order weak gravitational lensing effect (Bacon et al. 2006) (although no flexion data are available for our galaxy shear catalogue).
As the prior knowledge in this reconstruction relates to the quasi-spherical clustering of bound matter, enforced through a sparsity prior in Starlet space, this method should better reconstruct the smaller scale non-Gaussian structure than the Wiener filter.

Dark Energy Survey Science Verification Data
The shear data are from the 139 deg 2 SPT-E field of the public DES SV data. This initial test data set was taken during an observing run before the official start of the full science survey. The galaxy catalogue comes from the SVA1 (Science Verification) Data Release 3 . Due to changes to the catalogues before final release (more galaxy shear measurements are now available to us), the catalogue used in this work is not identical to that used by Chang et al. (2015), even when the same data selections are made. All maps are therefore new, and slightly different to the previously published SV map.
The photometric redshifts from five optical filters (grizY ) were estimated using the Bayesian Photometric Redshifts (BPZ) code (Benítez 2000, Coe et al. 2006, & Bonnett et al. 2016. The final median depth estimates are g ∼24.0, r ∼23.0, i ∼23.0 and z ∼22.4 (10-σ galaxy limiting magnitude). The "background galaxies", the ones from which the shear is measured, are taken in the range 0.6 < z mean < 1.2. The z mean value for each galaxy is the mean of the posterior probability distribution function (PDF) estimated using the BPZ code. The PDF for each galaxy is very broad, giving a total stacked PDF of background galaxies that extends beyond the [0.6, 1.2] redshift range, as can be seen in figure 1.
Using the ngmix shape catalogue, we apply a selection of sva1 flag = 0 & ngmix flag = 0 to obtain galaxies with a well-measured shear. The ngmix catalogue contains corrections to measurement bias, in the form of "sensitivities", which can be applied to a weighted ensemble of hundreds or thousands of galaxies, but which cannot be applied per galaxy (which is not ideal for mass mapping). The structure of equation 7 implies that a multiplicative shear bias would lead to a convergence amplitude bias. Under the assumption that multiplicative shear bias will not vary across the survey area, we correct all measured ellipticities by the same debiasing factor obs,i = measur ed,i ×s −1 , where i is a galaxy index ands (≈ 0.82) is the mean sensitivity correction from all galaxies in our ngmix-selected cat- stacked PDF lensing efficiency z mean point estimate Figure 1. The redshift distribution from BPZ of the selected background galaxies with 0.6 < z me a n < 1.2. The blue solid histogram is of the galaxies' point estimate mean redshifts in bins of ∆z = 0.02. The red line is the stacked redshift probability density function (PDF) of all selected galaxies. The green dashed line is the lensing efficiency (equation 5) of the background galaxies.
alogue. The total number of galaxies after the redshift and shape measurement selection is 1, 628, 663.
For the Kaiser-Squires reconstruction, the shear measurements are binned into angular pixels in a 256 × 256 map, with average pixel size of 4.11 arcmin, using a sinusoidal projection with a centre at RA=71.0 deg. This is similar to the 5 arcmin pixel scale of the original Chang et al. (2015) map. The choice of central RA for Kaiser-Squires is to minimise the mask in the square projection, which is a large source of systematic error. For the Wiener filter, where the mask is taken into account, the shear measurements are also binned into angular pixels in a 256 × 256 map, but sinusoidally projected with a central RA=81.3 deg, to make the square maximally isotropic. The Glimpse algorithm does not bin the input shear measurements, but requires a pixel scale for the reconstruction, which we set as 3 arcmin using its gnomonic projection centred on RA=76.95 deg and DEC=−52.23 deg.

redMaPPer Clusters
Groups and clusters of galaxies are expected to trace the highest density regions in the foreground. They are luminous objects that correspond to regions of highly non-linear growth, where the density field has deviated from Gaussianity.
The public redMaPPer cluster catalogue (Rykoff et al. 2016) used the redMaPPer algorithm to optically identify clusters and to estimate each cluster's richness, λ RM . The richness is defined as the sum of the membership probabilities over all galaxies within a scale radius (chosen to minimise the scatter in the mass-richness relation); it gives an estimate for the number of galaxies in a cluster. Cluster mass is expected to scale approximately linearly with richness. The redshift uncertainty is excellent, around σ z /(1 + z) ∼ 0.01, due to the clusters containing large numbers of well modelled, red galaxies. The public redMaPPer catalogue used in this work contains only clusters with λ RM 20, so that the clusters with less certainty of detection and characterisation are not used. A Gaussian distribution with the same mean and standard deviation shows that the ellipticity distribution is not a true Gaussian, though the noise per pixel will be more closely Gaussian due to the central limit theorem.

Simulations
To compare the reconstructions between different methods, we use a simulated catalogue with a known true convergence. We use a set of N-body simulations developed for the DES collaboration and designed to be representative of the DES data (Busha et al. 2013). We apply a mask to match the SV data. Source galaxies have randomly-assigned positions in the simulations, as correlation between the background galaxy positions and the weak lensing shear signal is expected to be negligible. The simulated catalogues contain the lensing matrix components, A i j , for each galaxy, calculated with the ray-tracing code CALCLENS (Becker 2013). This provides the true κ and γ per galaxy, from which we derive the reduced shear. The shape noise due to the intrinsic ellipticities of the source galaxies, s , is simulated by adding an ellipticity component to the reduced shear. Each noise realisation is generated from the data by randomly exchanging the ellipticity values between galaxies in the catalogue to remove the weak lensing signal and leave the shape noise.
We attempt to match the redshift distribution of the simulated galaxies to the observed redshift distribution, n(z). We use the stacked posterior probability density functions of individual galaxy redshifts from the selected data catalogue (figure 1), giving an estimate of the true underlying distribution. This assumes that where p i (z) are the individual probability distributions for the galaxies from BPZ. This is not necessarily exact, due to errors in p i (z) per galaxy (Leistedt et al. 2016), but is a reasonable choice for a simulated catalogue. Using rejec-tion sampling in bins of ∆z = 0.02 we select galaxies with a probability equal to the ratio between the desired n(z) from the data and the distribution in the simulation. One typical simulated catalogue contained 1, 629, 024 galaxies, slightly different to the data catalogue due to the sampling scheme, but with the desired n(z).

RESULTS
To ensure that the mass map tests are consistent with different output formats, all maps were converted onto a spherical pixelisation using HEALPix (Górski et al. 2005). A HEALPix map comprises twelve subdivisions on the sphere, which are then each partitioned into NSIDE × NSIDE grids. Each pixel of a HEALPix supersampled NSIDE = 4096 map was filled according to the value at the corresponding RA and DEC in the reconstructed maps. The supersampled high NSIDE maps were then degraded to NSIDE = 1024. The true convergence maps from the simulations were directly binned from the convergence values at galaxy positions to NSIDE = 1024. For all maps the same mask is applied, where pixels with no galaxies are masked. Figure 3 shows the mass map reconstructions from the SV shear data using the three different methods. An example simulation with truth and the three reconstructed maps is shown in figure 4. The "tuning parameters", σ smooth = 10.0 arcmin for Kaiser-Squires and λ = 3.0 for Glimpse, are tuned to maximise the Pearson correlation coefficient r with the underlying truth when tested on simulations.
Using a suite of 10 simulations, in Sec. 4.1 we calculate the Pearson correlation coefficient between the truth and the reconstruction with different methods as a test of the reconstruction's quality. In Sec. 4.2, we calculate the rootmean-square error of the residuals between the truth and the reconstruction. In Sec. 4.3 we calculate the variance of the 1-point distribution of the pixel values in the reconstruction and compare with the truth. In Sec. 4.4 and Sec. 4.5 we quantify the quality of the reconstruction of the phase and peak statistics respectively, by comparing to the simulated truth. The final result presented in Sec. 4.6 compares the reconstruction from the DES SV shear data with foreground galaxy clusters from the redMaPPer catalogue (which are expected to trace non-linearities in the underlying density field).
In this work we do not use correlation functions as a test of the map reconstruction. None of the mass mapping methods here are expected to reproduce the correct correlation functions or power spectra. It is simple to show this analytically with the Wiener filter, where despite the filter giving the MAP pixel values, the pixel variance, and therefore the power spectrum, is suppressed.

Pixel Cross Correlation
We quantify the correlation between the true convergence from simulation and the reconstructed convergence of the simulated catalogue using the Pearson correlation coefficient. As with other metrics of success for mass map reconstruction, this can be used to tune the sparsity λ parameter and the smoothing scale for Kaiser-Squires. The Pearson correlation coefficient, r, between the pixels' true convergence, κ truth , and the reconstruction, κ recon , is given by where the summations are over all pixels i in the map and κ is the mean convergence in the map.
In the left panels of figure 5, the Pearson r value from 10 simulations is plotted for varying tuning parameters. Almost all of the simulations and also their mean have a maximal Pearson r value at σ smooth = 10.0 arcmin for Kaiser-Squires and at λ = 3.0 for Glimpse. Table 1 presents the mean value from the 10 simulations, where the tuning parameter is chosen to maximise r when relevant. All methods show good correlation with the underlying true convergence. Both the Wiener filter and Glimpse have the same highest value of r = 0.37, 12% higher than Kaiser-Squires. Note that the Pearson correlation coefficient as pre-

Pixel Residuals
The difference between the true convergence from simulation and the reconstruction in pixel i is defined as We define the root-mean-square error (RMSE) as where n is the number of pixels. A smaller value of RMSE for a given method implies a better reconstruction according to this metric. It is this RMSE that the Wiener filter attempts to minimise using a linear filter, as defined in equation 18, by using an assumed signal covariance κκ † (see Sec. 2.3.1).
The centre panel of figure 5 shows that increasing the smoothing scale, σ smooth , for Kaiser-Squires or the regularisation parameter, λ, for Glimpse initially reduces the pixel RMSE, but increased filtering contributes little beyond σ smooth = 10.0 arcmin for Kaiser-Squires or λ = 3.0 for Glimpse.
The smallest mean pixel RMSE is 10.0×10 −3 for Kaiser-Squires and 9.9×10 −3 for Glimpse. The Wiener filter, whose smoothing is constrained by the prior on C and which therefore cannot be tuned, has a pixel RMSE of 9.4 × 10 −3 .

Pixel 1-Point Variance
The 1-point distribution can be thought as a histogram of the pixel values. Figure 6 shows an example of such a histogram (derived from the simulated truth map and reconstructions of figure 4).
The mean of this distribution is unconstrained by weak lensing, due to an integration constant in equation 7. The variance of the 1-point distribution is increased compared to the underlying truth due to shape noise in the un-  smoothed Kaiser-Squires reconstruction. A reconstruction method would aim to reduce the variance of the 1-point pixel distribution to match that of the underlying truth. Table 1. The centre column gives the average Pearson correlation coefficient r (equation 32) between κ t r u t h and κ recon from 10 simulations. The choices of σ s moot h = 10 arcmin and λ = 3.0 maximise the Pearson r value. The right column gives the ratio of the pixel variance between κ recon and κ t r u t h (equation 36).

Method
Pearson r Variance Ratio KS (σ s moot h = 10 arcmin) 0.33 3.7 ×10 −1 Wiener filter 0.37 6.3 ×10 −2 Glimpse (λ = 3.0 ) 0.37 5.0 ×10 −1 We define the estimate of the variance of the 1-point distributions of the truth or reconstructed κ as where the notation matches equation 34. The ratio of these variances is given by The closer this value is to 1, the better the variance of the pixel distribution matches the truth. Using 10 simulations we can calculate this quantity for different reconstruction methods (and at different smoothing scales or λ regularisation values where relevant). In figure 5 the right panel shows the result of this test for Glimpse and Kaiser-Squires. Both methods show a pixel distribution that has too high variance for insufficient filtering, and too low variance for over-filtering. For Kaiser-Squires, the ratio is closest to 1 at a smoothing scale of σ smooth = 5 arcmin. For Glimpse, the ratio is closest to 1 at a sparsity regularisation value of λ = 2.
Both of these reconstruction methods have a matching variance at a smoothing parameter value less than that which maximises the Pearson correlation coefficient r. If one chose this parameter to maximise the Pearson r value, such that λ = 3 and σ smooth = 10 arcmin, a good reconstruction should also have the ratio of the variances as close to 1 as possible.
The right column of table 1 gives the mean variance ratio from 10 simulations with the different methods. The choice of λ = 3.0 and σ smooth = 10 arcmin are the tuning parameters that maximise the Pearson r value for Glimpse and Kaiser-Squires respectively. Though Glimpse and the Wiener filter reconstructions both have the same Pearson r value, the variance of the pixel values of the Wiener filter is much lower with respect to the underlying truth than is the case for Glimpse. This can also be seen in the reconstructions of figure 4, where the Wiener filter pixel values are closer to zero than the simulated true convergence.
The histogram of figure 6 shows, for one single example, the distributions matching what the results of the second column of table 1 describe. Glimpse outperforms the other methods at matching the variance of the underlying truth, however it still falls short. Also, all methods, including Glimpse, have distributions which are symmetric, unlike the asymmetric, heavy-tailed distribution of the true κ values.
Though Glimpse reconstructs maps with the 1-point distribution variance closest to the truth, it is also the only method to have convergence values dropping below the truth. These unphysical "negative peaks" can also be seen in the map reconstructions from data (figure 3) and from simulated catalogues (figure 4), and are likely to come from enforcement of sparsity for positive and negative wavelets equally. The physical motivation for Glimpse comes from a density field of superimposed halos. Though there should be no negative halos, negative wavelets are included to map the underdense regions, clearly at the expense of producing these very negative regions.

Phase reconstruction
The summation over all m modes at each multipole in the angular power spectrum (equation 24) loses all phase information; only the magnitudes are retained. This phase information corresponds to the spatial distribution of anisotropies. As the phases are dependent on the physical underlying structure, they contain information beyond what can be gained by 2-point statistics. Their retention is a wellmotivated, desired property of a mass mapping reconstruction.
Inspired by Chapman et al. (2013), who use phases to test the reconstruction after foreground removal from simulated Epoch of Reonization 21-cm maps, we use the phase residual as a metric of success between our three methods.
The phase difference between the true map and the reconstruction is defined as where arg(z) = arctan Im(z) Re(z) .
A small phase difference ∆θ m between the truth and the reconstruction implies that the phase has been well reconstructed. For random variables drawn from a Gaussian distribution, this would correspond to a small standard deviation. Here, however, a Gaussian distribution would be an inappropriate choice as it assumes the data are defined on an unbounded Euclidean space. The two dimensional data space of phase pairs, {θ truth m , θ recon m }, is a torus, T 2 , and the projected data space of the phase difference, ∆θ m , is a circle, S 1 . On a circle, the maximum entropy, least informative, distribution for specified mean and variance is the von Mises (Jammalamadaka & Sengupta 2001), which in one dimension is given by where I 0 is the modified Bessel function of order 0, and C is a concentration parameter. For µ = 0, a large concentration parameter (analogous to 1/σ 2 ) would correspond to a small dispersion in the phase reconstruction error. The aim is therefore to compare the inferred value of the concentration, C, between different mass mapping methods, with a larger value of C implying a better phase reconstruction. By assuming that the error on the phase reconstruction is independent between phases, we can say that the phase differences, ∆ #» θ , are independent and identically distributed random variables, with a likelihood distribution given by As only the relative values of C are needed to compare different mass mapping methods, the full posterior distribution is not required. Additionally, any reasonable prior distribution, Pr(C), will be either flat or monotonically decreasing above zero, so the ranking of maps by the largest maximum likelihood value or maximum posterior value of C will be identical. For the purposes of this comparison the simpler maximum likelihood estimate,Ĉ MLE , will therefore do.
We calculate the maximum likelihood values of µ and C by taking the spherical harmonic transform of our HEALPix map to recover the a m coefficients up to max = 1024, calcu- lating the phase residual as defined by equation 37 between the truth and the reconstruction for each coefficient, and then maximising the likelihood (equation 39). The maximisation is performed using the scipy package BFGS algorithm (Byrd et al. 1995, Zhu et al. 1997, Morales & Nocedal 2011, using 3 random initialisation values to test for robustness. Figure 7 show the results for the phase reconstruction from 10 simulations using Kaiser-Squires and Glimpse with varying tuning parameters. For Kaiser-Squires the mean phase reconstruction value,Ĉ MLE , is maximised at σ smooth = 5.0 arcmin. For larger smoothing scales the phase reconstruction quality drops, as phase information is lost. For Glimpse the mean phase reconstruction value,Ĉ MLE , is maximised at λ = 3.0. The maximum value ofĈ MLE is not particularly pronounced, and theĈ MLE values are quite stable over a range of λ. Table 2 presents the mean values ofĈ MLE with the best tuning parameters for the three map reconstruction methods. Both Glimpse and the Wiener filter do much better than Kaiser-Squires for reconstructing the phases. Though the variance from these 10 different simulations is large, the Wiener filter does slightly better than Glimpse, as can be seen in figure 7.

Peak Statistics
Peak statistics are a promising method for inferring cosmological parameters from data, as they access information beyond what can be inferred from 2-point correlation functions. Unlike higher order correlation functions, such as the bispectrum, peak statistics are inherently high signal-tonoise. They also probe the highly non-linear regions, where non-Gaussianity is greatest. The effect of masking is trivially taken into account by applying the identical mask to the suite of simulations used to construct a likelihood.
We cannot truly test which mass mapping method best constrains cosmology with the statistics of density peaks without fully deriving the posterior probability distributions of cosmological parameters. It is possible to test which method returns peaks which are distinguishable from noise and at which convergence values. Distinguishing a large number of peaks from noise at high values of κ would mean the map is reconstructing the non-linear regions well.
For a given convergence map, we can define a function, n(κ), that gives the number of peaks as a function of convergence. For a given mass reconstruction method we can compare the peaks in reconstructions from simulated data with the peaks in reconstructions from catalogues of "randoms", with shape noise but no weak lensing shear signal (equiva- lent to γ = 0 in equation 9). If a given map from data or from a simulated catalogue has the same n(κ) as the random catalogues, then the mass mapping method used has been useless for peak statistics. On the other hand, if the map from data or simulation has a very different n(κ) function to that from the reconstruction from the random catalogue, then the map reconstruction method has recovered "true", physical κ peaks.
In the DES SV cosmology constraints from peak statistics, Kacprzak et al. (2016) use this difference as the data vector used to constrain cosmology, This function is far from zero at a given κ if there is a large difference between the number of peaks counted in maps reconstructed (a) from data and (b) from random catalogues. It is reasonable to believe that the number of peaks, n(κ i ), in the i th bin, κ i , is drawn from a Poisson distribution. The difference between two Poissonian random variables follows the Skellam distribution. Using this distribution, we expect the difference in the number peaks in maps from data and from random catalogues to have a mean given by and a variance given by We can therefore define a peak signal-to-noise estimate .
Here we define a peak as a local maxima in the HEALPIX map. Across different methods and smoothing parameters, the predicted variance from equation 42 matches well with the estimated sample variance, verifying that the peak distribution is indeed Poissonian for a given κ. Figure 9 shows the peak signal-to-noise (SNR) estimates from 10 simulations and from 10 random catalogues as a function of κ and smoothing scale, for Kaiser-Squires, or λ, for Glimpse. As the peaks in the maps from data have higher convergence values than those from random catalogues, the SNR(κ) function is negative for low values of κ.
In the figures, the Glimpse reconstruction gives better signal-to-noise estimates on the peaks than does the Kaiser-Squires reconstruction. For Kaiser-Squires, the largest positive and negative signal-to-noise values are 1.52 and -1.28. For Glimpse, the largest positive and negative are signal-tonoise values of 2.32 and -13.72. For the Wiener filter these values are 4.20 and -5.41.
The Wiener filter therefore has the highest signal-tonoise of the peak function n ∆ (κ), though the κ values of these peaks are very low. As can be seen in the reconstruction from the SV data in figure 3, the pixel values of the Wiener filter are much closer to zero. This is reflected in the peak statistic signal-to-noise values. In the left panel of figure 9, the Wiener filter detects negligibly few peaks with κ > 0.0125, whereas Glimpse detects peaks with positive signal-to-noise up to higher values of κ. It is at these high values where the non-Gaussian information due to non-linear structure formation can be probed.

Foreground Clusters
Comparisons with foreground clusters of galaxies is an independent test of the mass map reconstructions, as it uses data (unlike our tests on simulations).
In figure 10 the redMaPPer clusters described in Sec. 3.2 are overlaid on the DES SV κ map reconstructions shown in figure 3. The maps show good spatial correlation between the locations of the clusters and the κ peaks in the map.
The size of a cluster marker is the effective lensed cluster richness λ e f f RM , rather than the redMaPPer cluster richness. This concept is adapted from the definition of κ g presented in Chang et al. (2015). For a given cluster, this is defined as where z is the redshift of the cluster, p(z) is the lensing efficiency at the location of the cluster (see figure 1), and ω(z) is the comoving distance to the cluster (so that the first term matches the integrand of equation 6). The final term normalises the mean, where λ RM is the average richness over all galaxy clusters. The effective lensed cluster richness gives the richness as "seen" by the lensing effect, where clusters at the peak of the lensing efficiency should contribute more to the κ map. We therefore calculate the correlation between λ e f f RM for each cluster and the reconstructed κ value at the cluster centre.
This method does not take into account multiple clusters overlapping in a given line of sight. In figure 10, many small clusters overlap on large peaks in the reconstructed κ  map. The naive one-to-one correspondence between cluster and κ would mistake this for an excess of κ in the reconstruction. However, all methods will suffer equally from this assumption. A more thorough treatment of this overlapping effect is left for future work. Table 3 presents the Pearson correlation coefficient r between the λ e f f RM value of each cluster and the κ recon value at the corresponding pixel. The tuning parameters for Kaiser-Squires and Glimpse are chosen to maximise the Pearson correlation coefficient r between the reconstruction and the truth from simulations (see Sec. 4.1).
Though both Glimpse and the Wiener filter take into account the noise and the mask in the data, and therefore do better than Kaiser-Squires, the Glimpse reconstructions show higher correlation with the effective richness of the foreground clusters than do the Wiener filter reconstructions. This is no surprise, as Glimpse is expected to do better at reconstructing non-Gaussian κ, which would correspond to the non-linear matter structures in which clusters of galaxies form.

CONCLUSIONS
In this work we have presented convergence map reconstructions using the public DES SV shear data with three different methods: Kaiser-Squires, Wiener filter, and Glimpse.
Kaiser-Squires is a simple inversion from shear to convergence, whereas the Wiener filter and Glimpse use prior knowledge about the true convergence to help regularise the reconstruction and to reduce the effects of noise and missing data. The Wiener filter is a Bayesian MAP estimate if the signal and noise are Gaussian and the respective covariance matrices are known. The Glimpse method enforces a sparsity-promoting l 1 norm in a wavelet space where the wavelets represent positive, isotropic, quasi-spherical objects well. Glimpse is therefore expected to do well at reconstructing non-linear structures. The Wiener filter and Glimpse therefore aim to reconstruct different regimes: the linear and non-linear density field.
The three methods were applied to realistic simulations of the DES SV shear data, for which an underlying true convergence is known. Using these simulations we are also able to tune the Kaiser-Squires smoothing scale, σ smooth , and the Glimpse sparsity regularisation parameter, λ.
With these simulations we measure the Pearson correlation coefficient, r, between the truth and the reconstruction with different methods. Compared to the Kaiser-Squires reconstructions we find a 12% improvement in Pearson correlation with both the Wiener filter and Glimpse. The tuning parameters of σ smooth = 10 arcmin for Kaiser-Squires and λ = 3 for Glimpse maximise the Pearson correlation. We also measure the variance of the 1-point distribution of the reconstructed convergence. The Wiener filter suppresses the variance to 6.3 % of the truth, Kaiser-Squires to 37 % and Glimpse to 50 % of the truth. The tunable parameters here were those which maximised the Pearson correlation with the truth.
A large motivation for creating these maps is to reconstruct the convergence while still retaining the non-Gaussian information (which cannot be accessed with 2-point statistics such as the power spectrum). As such, we test the reconstruction of the harmonic phases, which is averaged out in the power spectrum, and the signal-to-noise of a peak statistic data vector, which is a popular probe of non-Gaussian information. The phase residuals between the truth and the reconstruction have the highest von Mises concentration  With realistic data vectors for peak statistics generated from simulations, the maximum signal-to-noise value was increased by a factor of 3.5 for the Wiener filter and by a factor of 9 for Glimpse, compared to Kaiser-Squires. The signal-to-noise of the peak statistic data vector (n ∆ (κ)) is shown in figure 9, where Glimpse has significant signal-tonoise with high convergence peaks, where non-linearities in the underlying density field are highest. We predict these high value peaks are most useful for constraining cosmology beyond Gaussianity. In order to constrain cosmology with these different reconstruction methods, realistic simulations with different cosmological parameters or models must be used and the same reconstruction method should be applied to the simulations and data. As seen from our results, different reconstruction methods can produce convergence maps with different properties.
Finally, we switched from using simulations to instead using real observations (DES SV data). Here we measured the correlation between the reconstructed maps and the effective richness of the foreground redMaPPer clusters (this is the cluster richness as "seen" by the lensing effect). Table 3 shows the results. Compared with Kaiser-Squires, the Wiener filter shows a 18% increase and Glimpse shows a 32% increase in correlation. This demonstrates with independent, cosmological data the ability of the methods to reconstruct non-linear structures.
The metrics we have used for comparing the three reconstruction methods are generic, and they have been inspired by recent applications of weak lensing mass maps to cosmological studies (e.g. Chang et al. 2016, Kacprzak et al. 2016. These metrics may not be optimal for evaluating every application of mass maps. Future studies can compare the efficiency of the three and other methods in end-to-end analyses; for example, with the estimation of cosmological parameters or identification of galaxy clusters. Applying the Wiener filter and Glimpse methods to the DES Year 1 (Y1) shear catalogue would require extensions of the methods to account for the curved sky at large angular scales. The Y1 data covers ≈ 1500 deg 2 and contains ≈ 34, 800, 000 galaxies, so is a large increase in data volume from DES SV. This modification has already been done with an extension of Kaiser-Squires to the sphere by Chang et al. (2017) for the Y1 DES data. These extensions would also be useful for the upcoming ≈ 5000 deg 2 DES Y3 shear catalogue.
Of future interest would be to use the Wiener filter or Glimpse convergence maps for scientific results, as we have shown that they reconstruct the convergence better than Kaiser-Squires according to many different metrics.
We have made our map reconstructions (as shown in figure 3) available at https://github.com/NiallJeffrey/ DES_SV_mass_maps. manuscript, or allow others to do so, for United States Government purposes.
We use the visualization software package SkyMapper 4 for the map figures.