- Split View
-
Views
-
CiteCitation
P. J. Marshall, M. P. Hobson, S. F. Gull, S. L. Bridle; Maximum-entropy weak lens reconstruction: improved methods and application to data, Monthly Notices of the Royal Astronomical Society, Volume 335, Issue 4, 1 October 2002, Pages 1037–1048, https://doi.org/10.1046/j.1365-8711.2002.05685.x
Download citation file:
© 2018 Oxford University Press
Close -
Share
Abstract
We develop the maximum-entropy weak shear mass reconstruction method presented in earlier papers by taking each background galaxy image shape as an independent estimator of the reduced shear field and incorporating an intrinsic smoothness into the reconstruction. The characteristic length-scale of this smoothing is determined by Bayesian methods. Within this algorithm the uncertainties owing to both the intrinsic distribution of galaxy shapes and galaxy shape estimation are carried through to the final mass reconstruction, and the mass within arbitrarily shaped apertures can be calculated with corresponding uncertainties. We apply this method to two clusters taken from n-body simulations using mock observations corresponding to Keck LRIS and mosaicked Hubble Space Telescope (HST) WFPC2 fields. We demonstrate that the Bayesian choice of smoothing length is sensible and that masses within apertures (including one on a filamentary structure) are reliable, provided the field of view is not too small. We apply the method to data taken on the cluster MS1054-03 using the Keck LRIS and HST, finding results in agreement with previous work; we also present reconstructions with optimal smoothing lengths, and mass estimates that do not rely on any assumptions of circular symmetry. The code used in this work (LENSENT2) is available from the web.
1 Introduction
Weak lensing studies of clusters of galaxies are an important complement to X-ray, Sunyaev–Zel'dovich effect and optical observations, allowing the projected distribution of mass to be investigated without any dynamical assumptions. The reconstruction of cluster mass distributions from weak gravitational lensing data is now well established; it has been shown that the projected density distribution can be recovered from magnification data, in the form of background galaxy number densities (Broadhurst, Taylor & Peacock 1995; Dye & Taylor 1998), or from shear data, the net statistical distortion of the images of background galaxies (Tyson, Valdes & Wenk 1990; Kaiser & Squires 1993; Schneider & Seitz 1995; Squires & Kaiser 1996).
We focus here on shear data, primarily because of its greater abundance; the likelihood function for shear data is also better understood (Section 2). Schneider, King & Erben (2000) discuss the use of the two types of weak gravitational lensing data.
Reconstruction methods using shear data fall into two classes: direct and iterative inverse methods. The direct methods are based on the pioneering work of Kaiser & Squires (1993, hereafter KS93); many improvements have since been made to the original algorithm (Bartelmann 1995; Kaiser 1995; Schneider & Seitz 1995; Squires & Kaiser 1996). In all of these methods the galaxy shape data have to be smoothed before they are input to the algorithm; the smoothing length is a parameter that is left undetermined. The class of iterative methods aims to find the mass or projected gravitational potential map that best fits the data (Squires & Kaiser 1996; Bartelmann et al. 1996; Seitz, Schneider & Bartelmann 1998). These methods are well suited to irregularly shaped observations, since they do not suffer from edge effects in the same way as the direct methods; however, they need to be regularized in some way to prevent overfitting the data, and it remains unclear how best to determine the resolution of either the data bins or the reconstruction grid.
In two other papers (Bridle et al. 1998, hereafter Paper I; Bridle et al. 2002, hereafter Paper II) we presented a maximum-entropy inverse method for reconstructing the mass distribution in clusters using shear and/or magnification data. In this paper we extend our method to give a fuller Bayesian analysis. As noted by other authors (Seitz et al. 1998), it would be desirable to work with each background galaxy shape individually, rather than binning or smoothing the data. This issue, together with the problem of the angular resolution of the reconstruction, is addressed by our extended algorithm. We apply our improved method to both realistic synthetic data, and previously published data for the high-redshift cluster MS1054-03. As with any Bayesian analysis, the aim is to derive and interpret the full posterior probability distribution of the quantity being inferred (in this case the mass distribution and any associated parameters). This approach will provide us not just with a mapping procedure, but also with a valuable insight into the quality of the data itself.
The method is reviewed and further developed in Section 2, and is applied to simulated data in Section 3. Section 4 contains the results of our method applied to the well-documented cluster MS1054-03, and gives a brief comparison with the previously published work. Our conclusions are presented in Section 5.
2 Method
The basis of the weak lensing reconstruction method described here is essentially that of Paper I; this section presents several developments in the algorithm and its implementation.
is used to generate a predicted reduced shear field g(θ) through the convolution (Kaiser & Squires 1993, Paper I) where the convergence
and
is a factor dependent on the lens and source redshifts.By design the lensing convolution kernel D is a complex quantity that picks out the two types of lensing distortion g1= Re(g) and g2= Im(g). Unbiased estimates of these components of reduced shear are given by the ensemble average of the background galaxy image ellipticity parameters ε1 and ε2 (Schramm & Kayser 1995).
As in Papers I and II, we aim to reconstruct the projected mass density of the lens defined on a grid of square pixels, where the observing region occupies a smaller area within this grid. This allows for the fact that the mass outside the observed field affects the shear data inside. It has been noted (Seitz et al. 1998) that reconstructing the projected lensing potential allows a purely local estimate of the mass distribution to be derived, by numerical differentiation of the potential. This last step involves throwing away a small amount of information, which describes the mass distribution outside the observing field. Although, as Seitz et al. point out, this information is limited, we feel it is as well to try and include it for completeness. In most cases, the cluster being studied will lie completely within the observing field and the two reconstruction approaches should produce indistinguishable results; it is then a matter of taste as to which quantity is inferred. Since here we are interested in the masses of clusters, we choose to reconstruct the surface mass density directly, leading to simply estimated projected masses with well-understood derived uncertainties.
2.1 Using individual galaxy shapes
In Paper I, the predicted reduced shear was compared with measured galaxy ellipticities averaged over coarse grid cells. Following Seitz et al. (1998), we prefer to use each galaxy shape individually, as independent estimators of the reduced shear. This procedure removes the potential problem of the bin boundaries affecting the inferred mass distribution, allows for optimal angular resolution in the reconstruction, and leaves the data in as pure a form as possible. The reconstruction grid pixel size is chosen to have approximately one galaxy per pixel, leading to comparable numbers of data points and fitted parameters. However, each data point has a very low signal-to-noise ratio, indicating that the number of parameters should be reduced in some way – this issue is addressed in the next section. The convolution of equation (1) is performed using fast Fourier transforms, and the resulting reduced shear field is interpolated on to the background galaxy positions.
2.2 The ICF and Bayesian evidence
This approach contains an implicit assumption that the values of Σ are uncorrelated. However, we expect clusters of galaxies to have smooth, extended projected mass distributions, and wish to include this knowledge in our analysis. The intrinsic correlation function (ICF) formalism (Gull 1989; Robinson 1992) allows us to do exactly that; the physical distribution Σ is expressed as the convolution of a ‘hidden’ distribution with a broad kernel (the ICF). In this way the smoothing, which is always necessary at some stage when using such noisy data, is transferred from the data to the reconstruction process itself. The large number of free parameters in the model (the hidden pixel values) is effectively reduced by this smoothing to a number appropriate to the quality of the data. In this way the properties of the noise can be carried through in a calculable, if non-linear, fashion. Seitz et al. (1998) incorporate smoothing in their reconstruction scheme, but in an iterative way, making error estimation non-trivial.
Interpolation of a fine grid of predicted reduced shear values on to the galaxy positions retains the potential to obtain high angular resolution reconstructions; inclusion of an ICF effectively reduces the number of independent pixels to one more appropriate to the quality of the data at hand. An increase in the number of pixels in the working grids and the inclusion of an extra convolution calls for a faster numerical algorithm than that used in Papers I and II; we have utilized the commercially available software memsys4, developed by Maximum Entropy Data Consultants Ltd.1 This code is widely used in the image processing community and has been proven to be highly stable; details of the numerical algorithms can be found in Gull & Skilling (1990).
2.3 Quantitative mass mapping
3 Application to simulated data
To demonstrate the method outlined in the previous section we now apply it to two simulated clusters, taken from the sample generated by Eke, Navarro & Frenk (1998). The naming of the clusters is retained from that paper, in which a flat Universe dominated by a cosmological constant was assumed. The same cosmological parameters (Ωm=0.3,ΩΛ=0.7) were used to calculate the critical density and angular diameter distances needed in the lensing analysis below. The Hubble constant is taken as 100 h km s−1 Mpc−1.
3.1 The mass distributions
The two projected mass distributions are shown in Fig. 1. The first, CL10, is at a redshift of 0.2, and has an X-ray emission-weighted temperature of 4.0 keV; the second, CL08, is at redshift 0.78 with temperature of 2.1 keV. Neither cluster is extremely massive, having approximate virial masses of 6 and 1 × 1014h−1 M⊙, respectively. Overlaid on these plots are the observing regions corresponding approximately to the field of view of the Keck Low Resolution Imaging Spectrograph (for CL10, see e.g. Clowe et al. 2000) and an Hubble Space Telescope (HST) mosaic comprising two WFPC2 pointings (for CL08). Also plotted are the apertures used to estimate projected masses for the cluster components. For CL10, these are circles of radius 0.2 h−1 Mpc ( ∼ 87 arcsec) and 0.1 h−1 Mpc (∼ 43 arcsec) centred on the large and small subclumps, respectively, and are referred to as apertures 1 and 2. Aperture 3 is the quadrilateral region between the subclumps. For CL08 a single aperture is defined, being a circle of radius 0.25 h−1 Mpc (∼ 48 arcsec). Neither observing region is particularly large, and the smaller clump in CL10 is marginally outside the observing field. These clusters were deliberately chosen to contain ‘interesting’ substructure, in order to illustrate the angular resolution of the reconstruction method.
Projected mass distribution of two simulated clusters. Left: CL10, a massive cluster at redshift 0.2. Right: CL08, a smaller cluster at redshift 0.78. The grey-scale is Σ/M⊙ pc−2, the contours are spaced in steps of 500 h M⊙ pc−2 for CL10 and 300 h M⊙ pc−2 for CL08. Also marked are the apertures used for mass estimation from the reconstructed maps (dotted) and the mock observing region (dashed).
Projected mass distribution of two simulated clusters. Left: CL10, a massive cluster at redshift 0.2. Right: CL08, a smaller cluster at redshift 0.78. The grey-scale is Σ/M⊙ pc−2, the contours are spaced in steps of 500 h M⊙ pc−2 for CL10 and 300 h M⊙ pc−2 for CL08. Also marked are the apertures used for mass estimation from the reconstructed maps (dotted) and the mock observing region (dashed).
3.2 Simulated lensing data
3.3 Results
Reconstructions were performed for a range of Gaussian ICFs with varying FWHM w; the mass distributions were defined on 64 × 64 pixel grids. Each reconstruction, including aperture integrated mass estimation, required approximately 2 min of CPU time on a single R10000 processor of a Silicon Graphics Origin 200.
3.3.1 CL10
Reconstructions with ICF widths of 20 and 70 arcsec are shown in the top two panels of Fig. 2. The grey-scale is plotted with the same limits as used for the true mass distribution of Fig. 1. For comparison, we also show results for which the mock data was first smoothed, using the same Gaussian ICFs as the smoothing kernel, and then inverted directly using the algorithm of Kaiser & Squires (1993) to obtain a convergence map; results are plotted in the lower panel of the same figure. Here the contours are simply of the surface density, obtained by scaling the convergence by the relevant value of Σcrit; the grey-scale is again the same as in Fig. 1. The grid on which the Fourier transforms were performed was padded with zeros outside the observing region to allow a more direct comparison with our method. At each smoothing scale the maps generated by the two methods contain recognizably similar structures; Fig. 2 illustrates the way in which the smoothing has been moved from the data to the reconstruction. Features that differ from the true mass distribution are similar in both reconstructions, and are caused by the noise realization.
Top: reconstructed mass distributions for the cluster CL10. Left to right, the ICF width parameter w increases from 20 to 70 arcsec. Bottom: KS93 direct inversions for comparison; the shear data were smoothed with a Gaussian of FWHM equal to w in each case. In all plots the contours show surface density in steps of 500 h M⊙ pc−2 . The maximum on the density scale corresponds to a convergence of 0.61.
Top: reconstructed mass distributions for the cluster CL10. Left to right, the ICF width parameter w increases from 20 to 70 arcsec. Bottom: KS93 direct inversions for comparison; the shear data were smoothed with a Gaussian of FWHM equal to w in each case. In all plots the contours show surface density in steps of 500 h M⊙ pc−2 . The maximum on the density scale corresponds to a convergence of 0.61.
Compared with the KS93 results, the maximum-entropy solutions are preferable as they are maps of inferred physical mass (so are necessarily positive), in which the noise on each data point has been translated to inferred uncertainties in the maps. At low values of the ICF width the high noise in the shear data acts to break up the lensing signal, leading to false apparent cluster substructure at small scales. The presence of many low-level spurious features is also a result of this overfitting of the data. Note that the lensing signal is not diluted by moving to higher values of w, in contrast with the effect of data smoothing in the KS93 process.
The two maximum-entropy reconstructions in Fig. 2 are taken from the posterior probability distribution Pr(w| data) as described in Section 2 this distribution is proportional to the numerically evaluated evidence and is shown in Fig. 3. The shape of this graph is typical. The reduction in probability at large ICF widths is because the data are poorly fitted by the overly smooth mass distribution. At the other extreme, small ICF widths are strongly disfavoured as they effectively increase the number of free parameters in the fit; this is the ‘Occam's razor' factor, which arises naturally from Bayesian model selection analysis (e.g. MacKay 1992) and also corresponds to the intuition that one should not over- or undersmooth data. The map corresponding to the maximum of Pr(w|data) represents a ‘map of believable features’, and occurs at w=70 arcsec, shown in the right-hand panel of Fig. 2. The contours of this reconstruction trace the two main mass condensations, and suggest the presence of a bridge of mass between them; no other significant features are visible. In the absence of further information concerning the cluster this map represents the most probable mass distribution, given the data and our choice of the functional form of the ICF. The eye is very sensitive to the high-resolution detail of the main peak in Fig. 2; however, the evidence is sensitive to the entire mass distribution, which is smooth on larger angular scales.
Pr( w |data) for the CL10 analysis. The logarithmic scale corresponds to the joined points, while the solid bars are on a linear scale.
Pr( w |data) for the CL10 analysis. The logarithmic scale corresponds to the joined points, while the solid bars are on a linear scale.
The projected mass within each of the three apertures shown in Fig. 1 was calculated for the preferred 70-arcsec ICF width. This process was carried out on reconstructions from 100 galaxy catalogues with the same observational parameters as in Table 1 but with different galaxy positions and ellipticities. The results are shown in the histograms of Fig. 4, with statistics from these measurements given in Table 2. The distributions are satisfyingly symmetrical about the true values; their widths are in very good agreement with the mean of the individual 1σ errors calculated from equation (10), shown by the solid error bars. This demonstrates that the noise present in the galaxy shape data has been carried through successfully to the inferred quantities.
Mass estimates for CL10. In each panel a histogram of mass estimates from the reconstruction maps from 100 realizations of the background galaxy population are plotted. The dotted line marks the true value. The point shows the mean mass estimate; the error bar is the mean inferred error, not the standard deviation of the histogram. The apertures used are shown in Fig. 1 . Top: circle of radius 0.2 h−1 Mpc centred on the main cluster; middle: circle of radius 0.1 h−1 Mpc centred on the subclump; bottom: quadrilateral region between the two main mass clumps.
Mass estimates for CL10. In each panel a histogram of mass estimates from the reconstruction maps from 100 realizations of the background galaxy population are plotted. The dotted line marks the true value. The point shows the mean mass estimate; the error bar is the mean inferred error, not the standard deviation of the histogram. The apertures used are shown in Fig. 1 . Top: circle of radius 0.2 h−1 Mpc centred on the main cluster; middle: circle of radius 0.1 h−1 Mpc centred on the subclump; bottom: quadrilateral region between the two main mass clumps.
We note that if there is no error caused by the estimation of the galaxy shapes (i.e. σobs=0) then the error bar on the mean inferred mass is reduced by approximately 10 per cent, which corresponds roughly to the change in the combined ellipticity error σ.
We can take this analysis one step further and calculate a mass profile around the larger subclump; this is shown in Fig. 5. The mass estimates can be seen to be accurate over a reasonable range of angular scales, with a slight overestimation as the apertures extend further outside the observing region.
Mass profile for the largest subclump of cluster CL10. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 170 arcsec.
Mass profile for the largest subclump of cluster CL10. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 170 arcsec.
Our analysis has at no point attempted to account for the ‘mass sheet degeneracy’ (Falco, Gorenstein & Shapiro 1985; Schneider & Seitz 1995). If the observing field is sufficiently large then the entropic prior acts to pin down the reconstruction at the edge of this region (Paper II) constraining any mass sheet transformation to be small. This control of the mass sheet degeneracy is the outcome of our choice of a low default model value to be used in the cross-entropy function (see Paper II). The value used in all reconstructions in this work was 100 h M⊙ pc−2; a lower value was found to leave the reconstruction maps and mass estimates unaffected. However, significantly increased model values gave masses overestimated by some tens of per cent, with mass sheets visibly present in the reconstructions. With the default model set suitably low, the residual effect of the mass sheet degeneracy on the mass estimates in any given reconstruction is small compared with the uncertainties owing to the noise realization; this can be seen by comparing the widths of the histograms to the ensemble-averaged error bars in Fig. 4.
Interestingly, the higher-resolution maps give mass estimates that are systematically below the true values: such maps provide closer fits to the noise in the data, which breaks up the coherent lensing signal, leading to an underestimate of the total mass present. This also illustrates the way in which the value of w preferred by the evidence is determined by the fit across the whole image, not just at the peaks of the mass distribution.
Although our mass estimation procedure is simplistic, it does produce sensible and accurate results, even for mass condensations located on the edges of the observing region. The filament-like structure lying between the subclumps of CL10, which was hinted at in the reconstruction maps of Fig. 2, was successfully detected in that region; its mass was measured with an uncertainty of ∼20 per cent. The same aperture was translated to the northeast by approximately 200 arcsec, to a region containing just 0.12 × 1014h−1 M⊙. When the mass estimation analysis was repeated, this value was also recovered to within the mean inferred error of ∼40 per cent.
3.3.2 CL08
With a smaller data set and lower peak convergence, this simulation presents a more difficult problem. The most probable Gaussian ICF width is found to be 50 arcsec and the corresponding reconstruction is shown in Fig. 6. The substructure in the cluster has been smoothed over, and the peak density is underestimated by a factor of 2. Although a higher-resolution reconstruction, which is also shown in Fig. 6, does suggest substructure, this particular noise realization does not enable the fine detail of the true cluster mass distribution to be faithfully recovered. As an illustration, 20-arcsec ICF reconstructions of four different noise realizations are shown in Fig. 7. These reconstructions were deliberately selected (from a larger sample of 20) to highlight the extremes of good and bad fortune. The presence of density peaks in the reconstructions is clearly sensitive to the particular noise realization. In contrast, the 50-arcsec reconstructions of the noise realizations of Fig. 7 are all very similar. In practice, there is only ever one available data set; for the set analysed in Fig. 6 there is very little information concerning the two density peaks, but there is a lensing signal from the broader underlying mass distribution. The 50-arcsec reconstruction is the most probable given the data: all of the data are used to infer the global noise properties, which are then reflected in the smooth reconstruction. Given such data we can say only that cluster CL08 is extended in the north–south direction, and that there is a slight suggestion of substructure.
Top: reconstructed mass distributions for the cluster CL08. Left to right, the ICF width parameter w increases from 20 to 50 arcsec. Contours show the surface density in steps of 300 h M⊙ pc−2 . The maximum on the density scale corresponds to a convergence of 0.38.
Top: reconstructed mass distributions for the cluster CL08. Left to right, the ICF width parameter w increases from 20 to 50 arcsec. Contours show the surface density in steps of 300 h M⊙ pc−2 . The maximum on the density scale corresponds to a convergence of 0.38.
High-resolution ( w =20 arcsec ) reconstructions of CL08, from four different realizations of the background galaxy population.
High-resolution ( w =20 arcsec ) reconstructions of CL08, from four different realizations of the background galaxy population.
Despite the apparent poor quality reconstruction, the shear data are still sensitive to the total mass within an aperture: the mass estimation histogram of Fig. 8 shows the total projected mass within 0.25 h−1 Mpc of the cluster centre to be well constrained by the data, with the large error bars (on average 0.2 × 1014h−1 M⊙) comparing reasonably well with the width of the histogram (0.3 × 1014h−1 M⊙). This discrepancy reflects the larger impact of the residual mass sheet degeneracy when smaller observing fields are used. There is a tendency towards overestimation of the total mass as the additive noise in the reconstruction becomes important for this less massive cluster, but this effect is within the estimated error for the aperture considered (mean = 0.92, truth = 0.74 × 1014h−1 M⊙). The effect can be seen more clearly in Fig. 9 (which corresponds to Fig. 5). Note that the edge of the observing region is at 0.3 h−1 Mpc; in such low-signal, high-noise, small observations, extrapolation beyond the immediate vicinity of the cluster is clearly to be done with care. One issue is that of the prior on the ICF width – as the ICF width approaches the size of the observation the effect of the mass sheet degeneracy will increase.
Mass profile for CL08. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 80 arcsec.
Mass profile for CL08. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 80 arcsec.
3.4 More advanced analyses
The above analysis used a circularly symmetric Gaussian intrinsic correlation function, but there is no a priori reason why this smoothing kernel should give the best results. We also experimented with circular top-hat, exponential and softened isothermal (‘beta’) profiles, and all were found to give significantly lower values of the evidence for a given data set than the Gaussian ICF. The softened isothermal profile, aiming to optimize the fit to the cluster profile, was found to be worse at suppressing the noise in the outer regions of the cluster, while the presence of such broad wings introduced a large systematic overestimation of the mass integrals owing to the mass sheet degeneracy. It is quite possible that the optimal ICF for the weak lensing reconstruction problem is not Gaussian, but our experience with these three alternative functions leads us to expect that any gain in evidence would be marginal, and the reconstruction for a given ICF width would be changed very little.
The argument given above against using the isothermal profile for an ICF suggests the use of more than one ICF at a time, allowing multiple resolution scales in the reconstruction. The reconstruction then consists of a weighted sum of convolutions of hidden images with varying width ICFs. This ‘multiscale maximum-entropy’ method has been applied to a number of problems (Weir 1992; Bontekoe, Koper & Kester 1994; McLachlan, Hobson & Gull, in preparation); it allows high spatial resolution where the data warrant it. However, when multiscale ICFs were applied to the weak lensing problems shown here, very little increase in evidence was found over the single ICF reconstructions, and the inferred mass distributions from the two approaches were indistinguishable. The introduction of another hidden image increases the size of the hypothesis space, introducing extra complexity into the reconstruction, which is not justified by the quality of the data.
4 Application to real data
We now apply our maximum-entropy method to real data. MS1054-03 is a high-redshift (z=0.83) galaxy cluster; X-ray and dynamical measurements suggest that it has a high mass (TX≈ 10 keV, Jeltema et al. 2001, σ≈ 1150 km s−1, van Dokkum 1999). Two sets of weak lensing data have been analysed. Clowe et al. (2000) produced a catalogue of 2723 background galaxies from a single Keck LRIS pointing, with a number density of approximately 50–60 galaxies arcmin−2. They performed a KS93 inversion using a smoothing kernel with a FWHM of approximately 40 arcsec, and found the cluster to be extended in the east–west direction; with a smaller smoothing kernel they find three mass peaks to lower significance. Hoekstra et al. (2000) measured the ellipticities of 2446 galaxy images from a deep HST mosaic consisting of six interlaced WFPC2 fields. They achieved a source density of around 80 arcmin2. They used the maximum probability extension to the KS93 algorithm (Squires & Kaiser 1996) to produce a higher-resolution map (smoothed with a 20-arcsec kernel) showing three distinct mass peaks.
The maximum entropy reconstructions from these data sets are shown in Fig. 10 for two ICF widths, a low value of w equal to that used in the previously published analysis, and the width that maximizes Pr(w|data. All reconstructions were performed using the Gaussian ICF on a 128 × 128 pixel grid. Hoekstra et al. calculate observational uncertainties on their estimated galaxy shape parameters and add them in quadrature to the intrinsic dispersion; we did the same. No corresponding weighting of galaxies is present in the Keck data set.
Top: reconstructed mass distributions for the cluster MS1054-03. Top: 40-arcsec (left) and 120-arcsec (right) ICF width reconstructions from the Keck data of Clowe et al. Bottom: 20-arcsec (left) and 80-arcsec (right) ICF width reconstructions from the HST data of Hoekstra et al. The left-hand panels contain maps with angular resolution corresponding to that of the maps already published; the right-hand panels show the maximum evidence reconstructions. Contours show surface density in steps of 300 h M⊙ pc−2 . The maps are centred on the brightest cluster galaxy (BCG) position, marked with a cross.
Top: reconstructed mass distributions for the cluster MS1054-03. Top: 40-arcsec (left) and 120-arcsec (right) ICF width reconstructions from the Keck data of Clowe et al. Bottom: 20-arcsec (left) and 80-arcsec (right) ICF width reconstructions from the HST data of Hoekstra et al. The left-hand panels contain maps with angular resolution corresponding to that of the maps already published; the right-hand panels show the maximum evidence reconstructions. Contours show surface density in steps of 300 h M⊙ pc−2 . The maps are centred on the brightest cluster galaxy (BCG) position, marked with a cross.
Both of the high-resolution maps are qualitatively very similar to those given in the referenced papers (where the data were smoothed with Gaussian kernels of the same width as the chosen low-w ICFs), but are now positivity-constrained. In particular, the three peaks found by Hoekstra et al. (2000) are reproduced along with several other features present towards the edges of their map. The reconstruction from Keck is also similar to that found by Clowe et al. (2000).
We now consider the probability distribution of the ICF width parameter w. This is shown for the two data sets in Fig. 11. In both cases the evidence peaks at a significantly larger value of w than that used in the high-resolution maps. These ‘maps of believable features’ clearly show the absence of significant structure away from the central cluster region, and suggest that the quality of the data is such that the substructure observed in the high-resolution maps of Fig. 10 should be interpreted with caution. A measure of the goodness of fit of the two different resolution maps to the data was obtained by calculating the chi-squared statistics of equation (3), with the summation now running over the galaxies contained within 40-arcsec radii apertures placed over the three candidate subclumps inferred from the HST data. Reduced chi-squared values were calculated by dividing by the number of galaxies in the aperture (≈280) minus the number of pixels in the aperture (≈100); these are given in Table 3.
Pr( w |data) for the MS1054-03 analyses. Left: Keck data; right: HST data.
Pr( w |data) for the MS1054-03 analyses. Left: Keck data; right: HST data.
Reduced chi-squared values for each of the three candidate subclumps in MS1054-03.
Reduced chi-squared values for each of the three candidate subclumps in MS1054-03.
It can be seen from this table that there is only a marginal improvement in the fit to the data by decreasing the ICF width; this improvement is heavily outweighed by the ‘Occam's razor' factor present in the Bayesian evidence, which suggests that the extra complexity in the inferred mass distribution introduced by using an ICF width of less than 80 arcsec is not justified by these data alone.
The two data sets are by no means independent noise realizations, since they are both observations of the same background galaxy population. However, the different observing conditions have clearly introduced different galaxy shape measurement errors. The resulting difference in the details of the high-resolution maps of Fig. 10 apparently accords with the conclusions drawn from the probability distribution of the resolution parameter w. However, the differences will also partly be a result of the different weighting of the images in the two data sets.
As stated before, in the absence of any other information concerning the cluster, the maximum evidence map represents the most probable mass distribution given the data; the additional information present in the cluster galaxy light and number density distributions (Hoekstra et al. 2000) and the X-ray surface brightness maps (Donahue et al. 1998; Jeltema et al. 2001) are clearly very important in the detailed interpretation of the weak lensing data, and ideally should be included in a joint analysis.
The projected mass within 0.5 h−1 Mpc (94 arcsec in the cosmology of Section 3) of the brightest cluster galaxy was calculated using the method outlined in Section 2. This was done for both maps given in the lower panels of Fig. 10, and the results compared with that from Hoekstra et al. (2000) in Table 4. Both the high- and the low-resolution mass estimates are consistent with the mass derived, from the tangential shear in circular apertures about the brightest cluster galaxy, by Hoekstra et al. It is reassuring to note that the statistical errors on these mass estimates agree very well with those calculated by aperture mass densitometry elsewhere. The issue of which resolution reconstruction, and so which mass estimate, to prefer may depend on prejudices concerning the likely level of substructure in this cluster. Given no other information concerning MS1054-03 we would conclude that the quality of the data suggest the 70-arcsec resolution map as being the more probable, but that there is strong evidence for substructure in the core of this cluster; in any case the weak lensing data give a projected mass of 1015h−1 M⊙ with a statistical error of about 10 per cent.
5 Conclusions
We have developed a Bayesian analysis, based on the maximum-entropy method of Bridle et al. (1998, 2002), for inferring the distribution of mass in clusters of galaxies from weak lensing shear data. We treat each background galaxy image as a noisy estimator of the reduced shear field of the cluster, retaining all the information concerning both signal and noise and so allowing for the high angular resolution. Use of an ‘intrinsic correlation function’ in the maximum-entropy formalism provides a way of incorporating our prior expectation of clusters as smooth, extended objects, and effectively replaces the data smoothing required by direct reconstruction methods. In contrast to these methods, the lensing signal is not diluted by this process. Moreover, analysis of the posterior probability distribution of the ICF width w, obtained by numerically evaluating the Bayesian evidence, provides an objective way of discriminating between smoothing scales. The map at the peak of this probability distribution was found not to contain any significant spurious peaks, and can be interpreted as the safest conclusion to draw from the data. The higher-resolution maps, although representing an overfit to the data, do contain limited useful information, particularly with respect to substructure in the cluster. The fact that their angular resolution is less favoured by the data quality gives a useful indication of the believability of these features.
Simple mass estimates extracted directly from the mass maps preferred by the evidence were found to be unbiased and accurate to within the estimated errors over a fairly wide range of angular scales; these uncertainties agreed very well with the standard deviation in the mass estimates from 100 different realizations of the background galaxy population. Noisier observations over a smaller field of view were found to give mass estimates with a slight bias towards overestimation – an effect understood in terms of the additivity of the reconstructed distribution and the mass sheet degeneracy. Inspection of the variety of structure in reconstructions from these different noise realizations justified the cautious interpretation of the high-resolution maps, indicating that this analysis provides a useful way of understanding the noise properties of the data.
We have applied our method to two galaxy shape data sets for the high-redshift cluster MS1054-03, one derived from a ground-based observation and the other from an HST mosaic. In both cases the features found in previously published maps, obtained by both direct and inverse methods, are reproduced, but with the added desirable features of being positivity-constrained and quantitatively useful. Simple mass estimates extracted directly from the reconstruction agree well with values found by aperture mass densitometry, as do the errors estimated by both methods.
The principal function of parameter-free mass maps as produced by this method is to provide the equivalent of a mass telescope, allowing images to be generated to aid further, more quantitative analysis. Parametrized fitting to shear data with physical motivation has been performed elsewhere (Schneider et al. 2000; King & Schneider 2001). We believe that the information provided by our variable angular resolution analysis is a helpful guide to this process, whilst also providing reasonably accurate ‘mass photometry’. The use of the Bayesian evidence in such fitting will be addressed in future papers.
Acknowledgments
We thank Henk Hoekstra and Douglas Clowe and their co-workers for supplying us with their data, and Vincent Eke for providing the simulated projected mass distributions. Thanks are also due to Andy Taylor, Richard Saunders and Charles McLachlan for helpful discussions, and to Anton Garret for proofreading the manuscript. We also thank the anonymous referee for comments leading to improvements in the paper. PJM acknowledges the PPARC for support in the form of a Research Studentship. The memsys4code was developed by Maximum Entropy Data Consultants Ltd, and is available for use by the astronomical community under the name ‘LENSENT2'. This can be obtained from



























