Abstract

We develop the maximum-entropy weak shear mass reconstruction method presented in earlier papers by taking each background galaxy image shape as an independent estimator of the reduced shear field and incorporating an intrinsic smoothness into the reconstruction. The characteristic length-scale of this smoothing is determined by Bayesian methods. Within this algorithm the uncertainties owing to both the intrinsic distribution of galaxy shapes and galaxy shape estimation are carried through to the final mass reconstruction, and the mass within arbitrarily shaped apertures can be calculated with corresponding uncertainties. We apply this method to two clusters taken from n-body simulations using mock observations corresponding to Keck LRIS and mosaicked Hubble Space Telescope (HST) WFPC2 fields. We demonstrate that the Bayesian choice of smoothing length is sensible and that masses within apertures (including one on a filamentary structure) are reliable, provided the field of view is not too small. We apply the method to data taken on the cluster MS1054-03 using the Keck LRIS and HST, finding results in agreement with previous work; we also present reconstructions with optimal smoothing lengths, and mass estimates that do not rely on any assumptions of circular symmetry. The code used in this work (LENSENT2) is available from the web.

1 Introduction

Weak lensing studies of clusters of galaxies are an important complement to X-ray, Sunyaev–Zel'dovich effect and optical observations, allowing the projected distribution of mass to be investigated without any dynamical assumptions. The reconstruction of cluster mass distributions from weak gravitational lensing data is now well established; it has been shown that the projected density distribution can be recovered from magnification data, in the form of background galaxy number densities (Broadhurst, Taylor & Peacock 1995; Dye & Taylor 1998), or from shear data, the net statistical distortion of the images of background galaxies (Tyson, Valdes & Wenk 1990; Kaiser & Squires 1993; Schneider & Seitz 1995; Squires & Kaiser 1996).

We focus here on shear data, primarily because of its greater abundance; the likelihood function for shear data is also better understood (Section 2). Schneider, King & Erben (2000) discuss the use of the two types of weak gravitational lensing data.

Reconstruction methods using shear data fall into two classes: direct and iterative inverse methods. The direct methods are based on the pioneering work of Kaiser & Squires (1993, hereafter KS93); many improvements have since been made to the original algorithm (Bartelmann 1995; Kaiser 1995; Schneider & Seitz 1995; Squires & Kaiser 1996). In all of these methods the galaxy shape data have to be smoothed before they are input to the algorithm; the smoothing length is a parameter that is left undetermined. The class of iterative methods aims to find the mass or projected gravitational potential map that best fits the data (Squires & Kaiser 1996; Bartelmann et al. 1996; Seitz, Schneider & Bartelmann 1998). These methods are well suited to irregularly shaped observations, since they do not suffer from edge effects in the same way as the direct methods; however, they need to be regularized in some way to prevent overfitting the data, and it remains unclear how best to determine the resolution of either the data bins or the reconstruction grid.

In two other papers (Bridle et al. 1998, hereafter Paper I; Bridle et al. 2002, hereafter Paper II) we presented a maximum-entropy inverse method for reconstructing the mass distribution in clusters using shear and/or magnification data. In this paper we extend our method to give a fuller Bayesian analysis. As noted by other authors (Seitz et al. 1998), it would be desirable to work with each background galaxy shape individually, rather than binning or smoothing the data. This issue, together with the problem of the angular resolution of the reconstruction, is addressed by our extended algorithm. We apply our improved method to both realistic synthetic data, and previously published data for the high-redshift cluster MS1054-03. As with any Bayesian analysis, the aim is to derive and interpret the full posterior probability distribution of the quantity being inferred (in this case the mass distribution and any associated parameters). This approach will provide us not just with a mapping procedure, but also with a valuable insight into the quality of the data itself.

The method is reviewed and further developed in Section 2, and is applied to simulated data in Section 3. Section 4 contains the results of our method applied to the well-documented cluster MS1054-03, and gives a brief comparison with the previously published work. Our conclusions are presented in Section 5.

2 Method

The basis of the weak lensing reconstruction method described here is essentially that of Paper I; this section presents several developments in the algorithm and its implementation.

A trial mass distribution graphic is used to generate a predicted reduced shear field g(θ) through the convolution (Kaiser & Squires 1993, Paper I)  
formula
1
where the convergence graphic and graphic is a factor dependent on the lens and source redshifts.

By design the lensing convolution kernel D is a complex quantity that picks out the two types of lensing distortion g1= Re(g) and g2= Im(g). Unbiased estimates of these components of reduced shear are given by the ensemble average of the background galaxy image ellipticity parameters ε1 and ε2 (Schramm & Kayser 1995).

As in Papers I and II, we aim to reconstruct the projected mass density of the lens defined on a grid of square pixels, where the observing region occupies a smaller area within this grid. This allows for the fact that the mass outside the observed field affects the shear data inside. It has been noted (Seitz et al. 1998) that reconstructing the projected lensing potential allows a purely local estimate of the mass distribution to be derived, by numerical differentiation of the potential. This last step involves throwing away a small amount of information, which describes the mass distribution outside the observing field. Although, as Seitz et al. point out, this information is limited, we feel it is as well to try and include it for completeness. In most cases, the cluster being studied will lie completely within the observing field and the two reconstruction approaches should produce indistinguishable results; it is then a matter of taste as to which quantity is inferred. Since here we are interested in the masses of clusters, we choose to reconstruct the surface mass density directly, leading to simply estimated projected masses with well-understood derived uncertainties.

2.1 Using individual galaxy shapes

In Paper I, the predicted reduced shear was compared with measured galaxy ellipticities averaged over coarse grid cells. Following Seitz et al. (1998), we prefer to use each galaxy shape individually, as independent estimators of the reduced shear. This procedure removes the potential problem of the bin boundaries affecting the inferred mass distribution, allows for optimal angular resolution in the reconstruction, and leaves the data in as pure a form as possible. The reconstruction grid pixel size is chosen to have approximately one galaxy per pixel, leading to comparable numbers of data points and fitted parameters. However, each data point has a very low signal-to-noise ratio, indicating that the number of parameters should be reduced in some way – this issue is addressed in the next section. The convolution of equation (1) is performed using fast Fourier transforms, and the resulting reduced shear field is interpolated on to the background galaxy positions.

Each of the 2N lensed ellipticity components εj of the N measured background galaxy images are taken as having been drawn independently from a Gaussian distribution with mean gj and variance σintrinsic2; here gj is the true value of the jth component of reduced shear at the position of the galaxy. We can then write the likelihood function as  
formula
2
where χ2 is the usual misfit statistic  
formula
3
and the normalization factor is  
formula
4
The effect of errors introduced by the galaxy shape estimation procedure have been included by adding them in quadrature to the intrinsic ellipticity dispersion (Hoekstra, Franx & Kuijken 2000),  
formula
5
This approximation rests on the assumption that both the shape estimation error and the unlensed ellipticity distributions are fitted well by Gaussians, and that the applied reduced shear is not too large. We follow Schneider et al. (2000) and correct the width of the ellipticity distributions by a factor of (1−|g|2) to account for the non-linearity in the lensing transformation (equation 12 below). We are concerned here with subcritical clusters for which this correction factor is small; in principle, the likelihood may be refined to include other effects as well. In practice, we find that this particular correction makes little difference to the reconstructions.

2.2 The ICF and Bayesian evidence

Our inferences of the distribution of Σ in the cluster are based on the posterior probability distribution given by Bayes' theorem:  
formula
6
In Paper I, an entropic prior Pr(Σ) was introduced for this positive additive distribution; maximization of Pr (Σ| data then reduces to minimization of the function F2/2 −αS, where S is the entropy function for the distribution. At this point the method is essentially an entropy-regularized maximum-likelihood technique, similar to that published elsewhere by Seitz et al. (1998).

This approach contains an implicit assumption that the values of Σ are uncorrelated. However, we expect clusters of galaxies to have smooth, extended projected mass distributions, and wish to include this knowledge in our analysis. The intrinsic correlation function (ICF) formalism (Gull 1989; Robinson 1992) allows us to do exactly that; the physical distribution Σ is expressed as the convolution of a ‘hidden’ distribution with a broad kernel (the ICF). In this way the smoothing, which is always necessary at some stage when using such noisy data, is transferred from the data to the reconstruction process itself. The large number of free parameters in the model (the hidden pixel values) is effectively reduced by this smoothing to a number appropriate to the quality of the data. In this way the properties of the noise can be carried through in a calculable, if non-linear, fashion. Seitz et al. (1998) incorporate smoothing in their reconstruction scheme, but in an iterative way, making error estimation non-trivial.

We now consider the form of the ICF; its parametrization introduces new degrees of freedom into the problem. A Bayesian analysis should allow the data to dictate the most suitable ICF, as follows. We expect the most important parameter of the ICF to be its width. For a given functional form (e.g. a circularly symmetric Gaussian), depending on a single width parameter w, equation (6) reads  
formula
7
The width parameter could be chosen to maximize its own posterior probability Pr (w| data) this distribution would certainly be a useful tool in assessing the relative merits of different ICF widths. Bayes' theorem again gives  
formula
8
Usually, the typical angular scale of the cluster is known to within at least an order of magnitude so that a uniform prior for w is appropriate; since Pr(data) is a constant, Pr(data |w) may be used directly to infer w. This value can be obtained during the reconstruction process by numerically evaluating the normalizing factor in equation (7),  
formula
9
This integral is known as the ‘evidence’; Sivia (1996) and MacKay (1992) give an explanation of its application in data analysis. The evidence provides an objective discriminator between ICF widths w, and, indeed, any other parameters we might choose to include in the reconstruction process. Comparison of the evidence calculated for different functional forms of the ICF allows the merits of different smoothing kernels to be evaluated (see Section 3.4 below). Indeed, the regularization parameter α (Paper I) is also determined by maximizing the evidence with respect to α (Gull 1989). Parameters such as α and w may be viewed as ‘nuisance’ parameters, and marginalized over. When the evidence is sharply peaked at some value, this marginalization is approximately equivalent to using the peak value (MacKay 1992).

Interpolation of a fine grid of predicted reduced shear values on to the galaxy positions retains the potential to obtain high angular resolution reconstructions; inclusion of an ICF effectively reduces the number of independent pixels to one more appropriate to the quality of the data at hand. An increase in the number of pixels in the working grids and the inclusion of an extra convolution calls for a faster numerical algorithm than that used in Papers I and II; we have utilized the commercially available software memsys4, developed by Maximum Entropy Data Consultants Ltd.1 This code is widely used in the image processing community and has been proven to be highly stable; details of the numerical algorithms can be found in Gull & Skilling (1990).

2.3 Quantitative mass mapping

A side-effect of smoothing data prior to an inversion, or of incorporating an intrinsic correlation function as described above, is the introduction of pronounced correlations between the errors on each reconstruction pixel value. However, calculation of the (Gaussian approximation to the) full covariance matrix of the errors on the Σ distribution (Paper I) successfully accounts for these correlations when calculating integrals over the reconstruction. This is of particular interest for the direct estimation of the total projected mass within an aperture from the reconstruction map. Information on the shape of the aperture is contained within a vector of weights ci. The constant ci is equal to zero if the ith pixel lies completely outside the aperture; otherwise ci=Ai, where Ai is the area of the ith pixel (in square parsecs at the cluster) lying within the aperture. We then approximate the integral by a weighted sum of pixel values to produce the mass estimate (M±σM), where  
formula
10
and  
formula
11
Here Vij is the covariance matrix of the reconstruction errors in each pixel (Paper I). Whilst a parametrized fit (Schneider et al. 2000; King & Schneider 2002) may be more physically motivated, this mass estimation procedure provides a quantitative result that can be used to guide further analysis. The aperture can be any shape, and so can be tailored to match the investigation at hand. Also, calculation of a realistic error on the mass estimates allows, within the Gaussian approximation, estimation of the significance of features in the maps, without recourse to the peaks analysis or resampling methods advocated elsewhere (van Waerbeke 2000; Erben et al. 2000; Hoekstra et al. 2002).

3 Application to simulated data

To demonstrate the method outlined in the previous section we now apply it to two simulated clusters, taken from the sample generated by Eke, Navarro & Frenk (1998). The naming of the clusters is retained from that paper, in which a flat Universe dominated by a cosmological constant was assumed. The same cosmological parameters (Ωm=0.3,ΩΛ=0.7) were used to calculate the critical density and angular diameter distances needed in the lensing analysis below. The Hubble constant is taken as 100 h km s−1 Mpc−1.

3.1 The mass distributions

The two projected mass distributions are shown in Fig. 1. The first, CL10, is at a redshift of 0.2, and has an X-ray emission-weighted temperature of 4.0 keV; the second, CL08, is at redshift 0.78 with temperature of 2.1 keV. Neither cluster is extremely massive, having approximate virial masses of 6 and 1 × 1014h−1 M, respectively. Overlaid on these plots are the observing regions corresponding approximately to the field of view of the Keck Low Resolution Imaging Spectrograph (for CL10, see e.g. Clowe et al. 2000) and an Hubble Space Telescope (HST) mosaic comprising two WFPC2 pointings (for CL08). Also plotted are the apertures used to estimate projected masses for the cluster components. For CL10, these are circles of radius 0.2 h−1 Mpc ( ∼ 87 arcsec) and 0.1 h−1 Mpc (∼ 43 arcsec) centred on the large and small subclumps, respectively, and are referred to as apertures 1 and 2. Aperture 3 is the quadrilateral region between the subclumps. For CL08 a single aperture is defined, being a circle of radius 0.25 h−1 Mpc (∼ 48 arcsec). Neither observing region is particularly large, and the smaller clump in CL10 is marginally outside the observing field. These clusters were deliberately chosen to contain ‘interesting’ substructure, in order to illustrate the angular resolution of the reconstruction method.

1

Projected mass distribution of two simulated clusters. Left: CL10, a massive cluster at redshift 0.2. Right: CL08, a smaller cluster at redshift 0.78. The grey-scale is Σ/M pc−2, the contours are spaced in steps of 500 h M pc−2 for CL10 and 300 h M pc−2 for CL08. Also marked are the apertures used for mass estimation from the reconstructed maps (dotted) and the mock observing region (dashed).

1

Projected mass distribution of two simulated clusters. Left: CL10, a massive cluster at redshift 0.2. Right: CL08, a smaller cluster at redshift 0.78. The grey-scale is Σ/M pc−2, the contours are spaced in steps of 500 h M pc−2 for CL10 and 300 h M pc−2 for CL08. Also marked are the apertures used for mass estimation from the reconstructed maps (dotted) and the mock observing region (dashed).

3.2 Simulated lensing data

Mock galaxy ellipticity catalogues were generated using the CL10 and CL08 mass distributions as follows. The parameters of the observations described in Clowe et al. (2000) and Hoekstra et al. (2000) were used to estimate the background galaxy number densities obtainable in 2-h observation with either the Keck LRIS or HST. The median galaxy redshift was estimated from the Hubble Deep Field photometric redshift catalogue (Fernandez-Soto, Lanzetta & Yahil 1999), and then used to calculate an approximate value of Σcrit. Systematic effects owing to the unknown redshift distribution of background galaxies (Fischer & Tyson 1997) are not considered here. The projected mass distributions of Fig. 1 were then converted to convergence and equation (1) was applied to produce maps of the components of reduced shear g on the same grid. These maps were then interpolated on to galaxy positions drawn at random from within the observing field. The components of the intrinsic ellipticity of the sources, εsi, were then drawn from a Gaussian distribution of width σintrinisic=0.25 and transformed to their lensed counterparts using the relation (Seitz & Schneider 1997)  
formula
12
where the asterisk denotes complex conjugation. Uncertainties introduced in the estimation of galaxy shapes were included by adding Gaussian noise with σobs=0.15. This is a reasonably pessimistic approach, with only the fainter galaxies detected having errors of this size associated with their ellipticities (Hoekstra et al. 2000; Bacon et al. 2001; Bridle et al., 2001). The limiting magnitudes quoted should therefore be taken as those down to which image shape measurements could be made to this accuracy. The parameters of the mock observations are given in Table 1.
1

Properties of the simulated 2-h observations of Section 3 .

1

Properties of the simulated 2-h observations of Section 3 .

3.3 Results

Reconstructions were performed for a range of Gaussian ICFs with varying FWHM w; the mass distributions were defined on 64 × 64 pixel grids. Each reconstruction, including aperture integrated mass estimation, required approximately 2 min of CPU time on a single R10000 processor of a Silicon Graphics Origin 200.

3.3.1 CL10

Reconstructions with ICF widths of 20 and 70 arcsec are shown in the top two panels of Fig. 2. The grey-scale is plotted with the same limits as used for the true mass distribution of Fig. 1. For comparison, we also show results for which the mock data was first smoothed, using the same Gaussian ICFs as the smoothing kernel, and then inverted directly using the algorithm of Kaiser & Squires (1993) to obtain a convergence map; results are plotted in the lower panel of the same figure. Here the contours are simply of the surface density, obtained by scaling the convergence by the relevant value of Σcrit; the grey-scale is again the same as in Fig. 1. The grid on which the Fourier transforms were performed was padded with zeros outside the observing region to allow a more direct comparison with our method. At each smoothing scale the maps generated by the two methods contain recognizably similar structures; Fig. 2 illustrates the way in which the smoothing has been moved from the data to the reconstruction. Features that differ from the true mass distribution are similar in both reconstructions, and are caused by the noise realization.

2

Top: reconstructed mass distributions for the cluster CL10. Left to right, the ICF width parameter w increases from 20 to 70 arcsec. Bottom: KS93 direct inversions for comparison; the shear data were smoothed with a Gaussian of FWHM equal to w in each case. In all plots the contours show surface density in steps of 500 h M pc−2 . The maximum on the density scale corresponds to a convergence of 0.61.

2

Top: reconstructed mass distributions for the cluster CL10. Left to right, the ICF width parameter w increases from 20 to 70 arcsec. Bottom: KS93 direct inversions for comparison; the shear data were smoothed with a Gaussian of FWHM equal to w in each case. In all plots the contours show surface density in steps of 500 h M pc−2 . The maximum on the density scale corresponds to a convergence of 0.61.

Compared with the KS93 results, the maximum-entropy solutions are preferable as they are maps of inferred physical mass (so are necessarily positive), in which the noise on each data point has been translated to inferred uncertainties in the maps. At low values of the ICF width the high noise in the shear data acts to break up the lensing signal, leading to false apparent cluster substructure at small scales. The presence of many low-level spurious features is also a result of this overfitting of the data. Note that the lensing signal is not diluted by moving to higher values of w, in contrast with the effect of data smoothing in the KS93 process.

The two maximum-entropy reconstructions in Fig. 2 are taken from the posterior probability distribution Pr(w| data) as described in Section 2 this distribution is proportional to the numerically evaluated evidence and is shown in Fig. 3. The shape of this graph is typical. The reduction in probability at large ICF widths is because the data are poorly fitted by the overly smooth mass distribution. At the other extreme, small ICF widths are strongly disfavoured as they effectively increase the number of free parameters in the fit; this is the ‘Occam's razor' factor, which arises naturally from Bayesian model selection analysis (e.g. MacKay 1992) and also corresponds to the intuition that one should not over- or undersmooth data. The map corresponding to the maximum of Pr(w|data) represents a ‘map of believable features’, and occurs at w=70 arcsec, shown in the right-hand panel of Fig. 2. The contours of this reconstruction trace the two main mass condensations, and suggest the presence of a bridge of mass between them; no other significant features are visible. In the absence of further information concerning the cluster this map represents the most probable mass distribution, given the data and our choice of the functional form of the ICF. The eye is very sensitive to the high-resolution detail of the main peak in Fig. 2; however, the evidence is sensitive to the entire mass distribution, which is smooth on larger angular scales.

3

Pr( w |data) for the CL10 analysis. The logarithmic scale corresponds to the joined points, while the solid bars are on a linear scale.

3

Pr( w |data) for the CL10 analysis. The logarithmic scale corresponds to the joined points, while the solid bars are on a linear scale.

The projected mass within each of the three apertures shown in Fig. 1 was calculated for the preferred 70-arcsec ICF width. This process was carried out on reconstructions from 100 galaxy catalogues with the same observational parameters as in Table 1 but with different galaxy positions and ellipticities. The results are shown in the histograms of Fig. 4, with statistics from these measurements given in Table 2. The distributions are satisfyingly symmetrical about the true values; their widths are in very good agreement with the mean of the individual 1σ errors calculated from equation (10), shown by the solid error bars. This demonstrates that the noise present in the galaxy shape data has been carried through successfully to the inferred quantities.

4

Mass estimates for CL10. In each panel a histogram of mass estimates from the reconstruction maps from 100 realizations of the background galaxy population are plotted. The dotted line marks the true value. The point shows the mean mass estimate; the error bar is the mean inferred error, not the standard deviation of the histogram. The apertures used are shown in Fig. 1 . Top: circle of radius 0.2 h−1 Mpc centred on the main cluster; middle: circle of radius 0.1 h−1 Mpc centred on the subclump; bottom: quadrilateral region between the two main mass clumps.

4

Mass estimates for CL10. In each panel a histogram of mass estimates from the reconstruction maps from 100 realizations of the background galaxy population are plotted. The dotted line marks the true value. The point shows the mean mass estimate; the error bar is the mean inferred error, not the standard deviation of the histogram. The apertures used are shown in Fig. 1 . Top: circle of radius 0.2 h−1 Mpc centred on the main cluster; middle: circle of radius 0.1 h−1 Mpc centred on the subclump; bottom: quadrilateral region between the two main mass clumps.

2

Mass estimates for CL10.

2

Mass estimates for CL10.

We note that if there is no error caused by the estimation of the galaxy shapes (i.e. σobs=0) then the error bar on the mean inferred mass is reduced by approximately 10 per cent, which corresponds roughly to the change in the combined ellipticity error σ.

We can take this analysis one step further and calculate a mass profile around the larger subclump; this is shown in Fig. 5. The mass estimates can be seen to be accurate over a reasonable range of angular scales, with a slight overestimation as the apertures extend further outside the observing region.

5

Mass profile for the largest subclump of cluster CL10. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 170 arcsec.

5

Mass profile for the largest subclump of cluster CL10. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 170 arcsec.

Our analysis has at no point attempted to account for the ‘mass sheet degeneracy’ (Falco, Gorenstein & Shapiro 1985; Schneider & Seitz 1995). If the observing field is sufficiently large then the entropic prior acts to pin down the reconstruction at the edge of this region (Paper II) constraining any mass sheet transformation to be small. This control of the mass sheet degeneracy is the outcome of our choice of a low default model value to be used in the cross-entropy function (see Paper II). The value used in all reconstructions in this work was 100 h M pc−2; a lower value was found to leave the reconstruction maps and mass estimates unaffected. However, significantly increased model values gave masses overestimated by some tens of per cent, with mass sheets visibly present in the reconstructions. With the default model set suitably low, the residual effect of the mass sheet degeneracy on the mass estimates in any given reconstruction is small compared with the uncertainties owing to the noise realization; this can be seen by comparing the widths of the histograms to the ensemble-averaged error bars in Fig. 4.

Interestingly, the higher-resolution maps give mass estimates that are systematically below the true values: such maps provide closer fits to the noise in the data, which breaks up the coherent lensing signal, leading to an underestimate of the total mass present. This also illustrates the way in which the value of w preferred by the evidence is determined by the fit across the whole image, not just at the peaks of the mass distribution.

Although our mass estimation procedure is simplistic, it does produce sensible and accurate results, even for mass condensations located on the edges of the observing region. The filament-like structure lying between the subclumps of CL10, which was hinted at in the reconstruction maps of Fig. 2, was successfully detected in that region; its mass was measured with an uncertainty of ∼20 per cent. The same aperture was translated to the northeast by approximately 200 arcsec, to a region containing just 0.12 × 1014h−1 M. When the mass estimation analysis was repeated, this value was also recovered to within the mean inferred error of ∼40 per cent.

3.3.2 CL08

With a smaller data set and lower peak convergence, this simulation presents a more difficult problem. The most probable Gaussian ICF width is found to be 50 arcsec and the corresponding reconstruction is shown in Fig. 6. The substructure in the cluster has been smoothed over, and the peak density is underestimated by a factor of 2. Although a higher-resolution reconstruction, which is also shown in Fig. 6, does suggest substructure, this particular noise realization does not enable the fine detail of the true cluster mass distribution to be faithfully recovered. As an illustration, 20-arcsec ICF reconstructions of four different noise realizations are shown in Fig. 7. These reconstructions were deliberately selected (from a larger sample of 20) to highlight the extremes of good and bad fortune. The presence of density peaks in the reconstructions is clearly sensitive to the particular noise realization. In contrast, the 50-arcsec reconstructions of the noise realizations of Fig. 7 are all very similar. In practice, there is only ever one available data set; for the set analysed in Fig. 6 there is very little information concerning the two density peaks, but there is a lensing signal from the broader underlying mass distribution. The 50-arcsec reconstruction is the most probable given the data: all of the data are used to infer the global noise properties, which are then reflected in the smooth reconstruction. Given such data we can say only that cluster CL08 is extended in the north–south direction, and that there is a slight suggestion of substructure.

6

Top: reconstructed mass distributions for the cluster CL08. Left to right, the ICF width parameter w increases from 20 to 50 arcsec. Contours show the surface density in steps of 300 h M pc−2 . The maximum on the density scale corresponds to a convergence of 0.38.

6

Top: reconstructed mass distributions for the cluster CL08. Left to right, the ICF width parameter w increases from 20 to 50 arcsec. Contours show the surface density in steps of 300 h M pc−2 . The maximum on the density scale corresponds to a convergence of 0.38.

7

High-resolution ( w =20 arcsec ) reconstructions of CL08, from four different realizations of the background galaxy population.

7

High-resolution ( w =20 arcsec ) reconstructions of CL08, from four different realizations of the background galaxy population.

Despite the apparent poor quality reconstruction, the shear data are still sensitive to the total mass within an aperture: the mass estimation histogram of Fig. 8 shows the total projected mass within 0.25 h−1 Mpc of the cluster centre to be well constrained by the data, with the large error bars (on average 0.2 × 1014h−1 M) comparing reasonably well with the width of the histogram (0.3 × 1014h−1 M). This discrepancy reflects the larger impact of the residual mass sheet degeneracy when smaller observing fields are used. There is a tendency towards overestimation of the total mass as the additive noise in the reconstruction becomes important for this less massive cluster, but this effect is within the estimated error for the aperture considered (mean = 0.92, truth = 0.74 × 1014h−1 M). The effect can be seen more clearly in Fig. 9 (which corresponds to Fig. 5). Note that the edge of the observing region is at 0.3 h−1 Mpc; in such low-signal, high-noise, small observations, extrapolation beyond the immediate vicinity of the cluster is clearly to be done with care. One issue is that of the prior on the ICF width – as the ICF width approaches the size of the observation the effect of the mass sheet degeneracy will increase.

8

Mass estimates from 50-arcsec reconstructions of CL08 (see Fig. 4 ). The aperture, shown in Fig. 1 is a circle of radius 0.25 h−1 Mpc.

8

Mass estimates from 50-arcsec reconstructions of CL08 (see Fig. 4 ). The aperture, shown in Fig. 1 is a circle of radius 0.25 h−1 Mpc.

9

Mass profile for CL08. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 80 arcsec.

9

Mass profile for CL08. The points shows the mean estimated mass within that radius while the error bar is the inferred 1σ error (averaged over 100 noise realizations). The shaded area shows the 1σ dispersion in the mass estimates over the 100 realizations. The dotted line marks the true profile. 0.4 h−1 Mpc corresponds to 80 arcsec.

3.4 More advanced analyses

The above analysis used a circularly symmetric Gaussian intrinsic correlation function, but there is no a priori reason why this smoothing kernel should give the best results. We also experimented with circular top-hat, exponential and softened isothermal (‘beta’) profiles, and all were found to give significantly lower values of the evidence for a given data set than the Gaussian ICF. The softened isothermal profile, aiming to optimize the fit to the cluster profile, was found to be worse at suppressing the noise in the outer regions of the cluster, while the presence of such broad wings introduced a large systematic overestimation of the mass integrals owing to the mass sheet degeneracy. It is quite possible that the optimal ICF for the weak lensing reconstruction problem is not Gaussian, but our experience with these three alternative functions leads us to expect that any gain in evidence would be marginal, and the reconstruction for a given ICF width would be changed very little.

The argument given above against using the isothermal profile for an ICF suggests the use of more than one ICF at a time, allowing multiple resolution scales in the reconstruction. The reconstruction then consists of a weighted sum of convolutions of hidden images with varying width ICFs. This ‘multiscale maximum-entropy’ method has been applied to a number of problems (Weir 1992; Bontekoe, Koper & Kester 1994; McLachlan, Hobson & Gull, in preparation); it allows high spatial resolution where the data warrant it. However, when multiscale ICFs were applied to the weak lensing problems shown here, very little increase in evidence was found over the single ICF reconstructions, and the inferred mass distributions from the two approaches were indistinguishable. The introduction of another hidden image increases the size of the hypothesis space, introducing extra complexity into the reconstruction, which is not justified by the quality of the data.

4 Application to real data

We now apply our maximum-entropy method to real data. MS1054-03 is a high-redshift (z=0.83) galaxy cluster; X-ray and dynamical measurements suggest that it has a high mass (TX≈ 10 keV, Jeltema et al. 2001, σ≈ 1150 km s−1, van Dokkum 1999). Two sets of weak lensing data have been analysed. Clowe et al. (2000) produced a catalogue of 2723 background galaxies from a single Keck LRIS pointing, with a number density of approximately 50–60 galaxies arcmin−2. They performed a KS93 inversion using a smoothing kernel with a FWHM of approximately 40 arcsec, and found the cluster to be extended in the east–west direction; with a smaller smoothing kernel they find three mass peaks to lower significance. Hoekstra et al. (2000) measured the ellipticities of 2446 galaxy images from a deep HST mosaic consisting of six interlaced WFPC2 fields. They achieved a source density of around 80 arcmin2. They used the maximum probability extension to the KS93 algorithm (Squires & Kaiser 1996) to produce a higher-resolution map (smoothed with a 20-arcsec kernel) showing three distinct mass peaks.

The maximum entropy reconstructions from these data sets are shown in Fig. 10 for two ICF widths, a low value of w equal to that used in the previously published analysis, and the width that maximizes Pr(w|data. All reconstructions were performed using the Gaussian ICF on a 128 × 128 pixel grid. Hoekstra et al. calculate observational uncertainties on their estimated galaxy shape parameters and add them in quadrature to the intrinsic dispersion; we did the same. No corresponding weighting of galaxies is present in the Keck data set.

10

Top: reconstructed mass distributions for the cluster MS1054-03. Top: 40-arcsec (left) and 120-arcsec (right) ICF width reconstructions from the Keck data of Clowe et al. Bottom: 20-arcsec (left) and 80-arcsec (right) ICF width reconstructions from the HST data of Hoekstra et al. The left-hand panels contain maps with angular resolution corresponding to that of the maps already published; the right-hand panels show the maximum evidence reconstructions. Contours show surface density in steps of 300 h M pc−2 . The maps are centred on the brightest cluster galaxy (BCG) position, marked with a cross.

10

Top: reconstructed mass distributions for the cluster MS1054-03. Top: 40-arcsec (left) and 120-arcsec (right) ICF width reconstructions from the Keck data of Clowe et al. Bottom: 20-arcsec (left) and 80-arcsec (right) ICF width reconstructions from the HST data of Hoekstra et al. The left-hand panels contain maps with angular resolution corresponding to that of the maps already published; the right-hand panels show the maximum evidence reconstructions. Contours show surface density in steps of 300 h M pc−2 . The maps are centred on the brightest cluster galaxy (BCG) position, marked with a cross.

Both of the high-resolution maps are qualitatively very similar to those given in the referenced papers (where the data were smoothed with Gaussian kernels of the same width as the chosen low-w ICFs), but are now positivity-constrained. In particular, the three peaks found by Hoekstra et al. (2000) are reproduced along with several other features present towards the edges of their map. The reconstruction from Keck is also similar to that found by Clowe et al. (2000).

We now consider the probability distribution of the ICF width parameter w. This is shown for the two data sets in Fig. 11. In both cases the evidence peaks at a significantly larger value of w than that used in the high-resolution maps. These ‘maps of believable features’ clearly show the absence of significant structure away from the central cluster region, and suggest that the quality of the data is such that the substructure observed in the high-resolution maps of Fig. 10 should be interpreted with caution. A measure of the goodness of fit of the two different resolution maps to the data was obtained by calculating the chi-squared statistics of equation (3), with the summation now running over the galaxies contained within 40-arcsec radii apertures placed over the three candidate subclumps inferred from the HST data. Reduced chi-squared values were calculated by dividing by the number of galaxies in the aperture (≈280) minus the number of pixels in the aperture (≈100); these are given in Table 3.

11

Pr( w |data) for the MS1054-03 analyses. Left: Keck data; right: HST data.

11

Pr( w |data) for the MS1054-03 analyses. Left: Keck data; right: HST data.

3

Reduced chi-squared values for each of the three candidate subclumps in MS1054-03.

3

Reduced chi-squared values for each of the three candidate subclumps in MS1054-03.

It can be seen from this table that there is only a marginal improvement in the fit to the data by decreasing the ICF width; this improvement is heavily outweighed by the ‘Occam's razor' factor present in the Bayesian evidence, which suggests that the extra complexity in the inferred mass distribution introduced by using an ICF width of less than 80 arcsec is not justified by these data alone.

The two data sets are by no means independent noise realizations, since they are both observations of the same background galaxy population. However, the different observing conditions have clearly introduced different galaxy shape measurement errors. The resulting difference in the details of the high-resolution maps of Fig. 10 apparently accords with the conclusions drawn from the probability distribution of the resolution parameter w. However, the differences will also partly be a result of the different weighting of the images in the two data sets.

As stated before, in the absence of any other information concerning the cluster, the maximum evidence map represents the most probable mass distribution given the data; the additional information present in the cluster galaxy light and number density distributions (Hoekstra et al. 2000) and the X-ray surface brightness maps (Donahue et al. 1998; Jeltema et al. 2001) are clearly very important in the detailed interpretation of the weak lensing data, and ideally should be included in a joint analysis.

The projected mass within 0.5 h−1 Mpc (94 arcsec in the cosmology of Section 3) of the brightest cluster galaxy was calculated using the method outlined in Section 2. This was done for both maps given in the lower panels of Fig. 10, and the results compared with that from Hoekstra et al. (2000) in Table 4. Both the high- and the low-resolution mass estimates are consistent with the mass derived, from the tangential shear in circular apertures about the brightest cluster galaxy, by Hoekstra et al. It is reassuring to note that the statistical errors on these mass estimates agree very well with those calculated by aperture mass densitometry elsewhere. The issue of which resolution reconstruction, and so which mass estimate, to prefer may depend on prejudices concerning the likely level of substructure in this cluster. Given no other information concerning MS1054-03 we would conclude that the quality of the data suggest the 70-arcsec resolution map as being the more probable, but that there is strong evidence for substructure in the core of this cluster; in any case the weak lensing data give a projected mass of 1015h−1 M with a statistical error of about 10 per cent.

4

Mass estimates for MS1054-03.

4

Mass estimates for MS1054-03.

5 Conclusions

We have developed a Bayesian analysis, based on the maximum-entropy method of Bridle et al. (1998, 2002), for inferring the distribution of mass in clusters of galaxies from weak lensing shear data. We treat each background galaxy image as a noisy estimator of the reduced shear field of the cluster, retaining all the information concerning both signal and noise and so allowing for the high angular resolution. Use of an ‘intrinsic correlation function’ in the maximum-entropy formalism provides a way of incorporating our prior expectation of clusters as smooth, extended objects, and effectively replaces the data smoothing required by direct reconstruction methods. In contrast to these methods, the lensing signal is not diluted by this process. Moreover, analysis of the posterior probability distribution of the ICF width w, obtained by numerically evaluating the Bayesian evidence, provides an objective way of discriminating between smoothing scales. The map at the peak of this probability distribution was found not to contain any significant spurious peaks, and can be interpreted as the safest conclusion to draw from the data. The higher-resolution maps, although representing an overfit to the data, do contain limited useful information, particularly with respect to substructure in the cluster. The fact that their angular resolution is less favoured by the data quality gives a useful indication of the believability of these features.

Simple mass estimates extracted directly from the mass maps preferred by the evidence were found to be unbiased and accurate to within the estimated errors over a fairly wide range of angular scales; these uncertainties agreed very well with the standard deviation in the mass estimates from 100 different realizations of the background galaxy population. Noisier observations over a smaller field of view were found to give mass estimates with a slight bias towards overestimation – an effect understood in terms of the additivity of the reconstructed distribution and the mass sheet degeneracy. Inspection of the variety of structure in reconstructions from these different noise realizations justified the cautious interpretation of the high-resolution maps, indicating that this analysis provides a useful way of understanding the noise properties of the data.

We have applied our method to two galaxy shape data sets for the high-redshift cluster MS1054-03, one derived from a ground-based observation and the other from an HST mosaic. In both cases the features found in previously published maps, obtained by both direct and inverse methods, are reproduced, but with the added desirable features of being positivity-constrained and quantitatively useful. Simple mass estimates extracted directly from the reconstruction agree well with values found by aperture mass densitometry, as do the errors estimated by both methods.

The principal function of parameter-free mass maps as produced by this method is to provide the equivalent of a mass telescope, allowing images to be generated to aid further, more quantitative analysis. Parametrized fitting to shear data with physical motivation has been performed elsewhere (Schneider et al. 2000; King & Schneider 2001). We believe that the information provided by our variable angular resolution analysis is a helpful guide to this process, whilst also providing reasonably accurate ‘mass photometry’. The use of the Bayesian evidence in such fitting will be addressed in future papers.

Acknowledgments

We thank Henk Hoekstra and Douglas Clowe and their co-workers for supplying us with their data, and Vincent Eke for providing the simulated projected mass distributions. Thanks are also due to Andy Taylor, Richard Saunders and Charles McLachlan for helpful discussions, and to Anton Garret for proofreading the manuscript. We also thank the anonymous referee for comments leading to improvements in the paper. PJM acknowledges the PPARC for support in the form of a Research Studentship. The memsys4code was developed by Maximum Entropy Data Consultants Ltd, and is available for use by the astronomical community under the name ‘LENSENT2'. This can be obtained from

References

Bacon
D.J.
Refregier
A.
Clowe
D.
Ellis
R.S.
,
2001
,
MNRAS
 ,
325
,
1065
Bartelmann
M.
,
1995
,
A&A
 ,
303
,
643
Bartelmann
M.
Narayan
R.
Seitz
S.
Schneider
P.
,
1996
,
ApJ
 ,
464
,
L115
Bontekoe
Tj. R.
Koper
E.
Kester
D.J.M.
,
1994
,
A&A
 ,
284
,
1037
Bridle
S.L.
Hobson
M.P.
Lasenby
A.N.
Saunders
R.D.E.
,
1998
,
MNRAS
 ,
299
,
895
(Paper I)
Bridle
S.L.
Gull
S.F.
Bardeau
S.
Kneib
J.-P.
,
2001
, in
Natarajan
P.
, ed.,
Proc. Yale Cosmology Workshop
,
The Shapes of Galaxies and Their Dark Halos
 . World Scientific, Singapore
Bridle
S.L.
Hobson
M.P.
Marshall
P.J.
Saunders
R.D.E.
Lasenby
A.N.
,
2002
,
MNRAS
 , submitted (astro-ph/0010387)(Paper II)
Broadhurst
T.
Taylor
A.
Peacock
J.
,
1995
,
ApJ
 ,
438
,
49
Clowe
D.
Luppino
G.A.
Kaiser
N.
Gioia
I.
,
2000
,
ApJ
 ,
539
,
540
Donahue
M.
Voit
G.M.
Gioia
I.
Luppino
G.A.
Hughes
J.P.
Stocke
J.T.
,
1998
,
ApJ
 ,
502
,
550
Dye
S.
Taylor
A.
,
1998
,
MNRAS
 ,
300
,
L23
Eke
V.R.
Navarro
J.F.
Frenk
C.S.
,
1998
,
ApJ
 ,
503
,
569
Erben
Th.
Van Waerbeke
L.
Mellier
Y.
Schneider
P.
Cuillandre
J.-C.
Castander
F.J.
Dantel-Fort
M.
,
2000
,
A&A
 ,
355
,
23
Falco
E.E.
Gorenstein
M.V.
Shapiro
I.I.
,
1985
,
ApJ
 ,
289
,
L1
Fernandez-Soto
A.
Lanzetta
K.M.
Yahil
A.
,
1999
,
ApJ
 ,
513
,
34
Fischer
P.
Tyson
J.A.
,
1997
,
AJ
 ,
114
,
1
Gull
S.F.
,
1989
, in
Skilling
J.
, ed.,
Maximum Entropy and Bayesian Methods
 .
Kluwer
, Dordrecht, p.
53
Gull
S.F.
Skilling
J.
,
1990
,
The memsys5 Users' Manual
 . Maximum Entropy Data Consultants Ltd, document available from
Hoekstra
H.
Franx
M.
Kuijken
K.
,
2000
,
ApJ
 ,
532
,
88
Hoekstra
H.
Franx
M.
Kuijken
K.
Van Dokkum
P.G.
,
2002
,
MNRAS
 ,
333
,
911
Jeltema
T.
Canizares
C.R.
Bautz
M.W.
Malm
M.R.
Donahue
M.
Garmire
G.P.
,
2001
,
ApJ
 ,
562
,
144
Kaiser
N.
,
1995
,
ApJ
 ,
439
,
L1
Kaiser
N.
Squires
G.
,
1993
,
ApJ
 ,
404
,
441
(KS93)
King
L.J.
Schneider
P.
,
2001
,
A&A
 ,
369
,
1
MacKay
D.J.C.
,
1992
,
Neural Comput.
 ,
4
,
415
Robinson
D.
,
1992
,
PhD thesis
 , Univ. Cambridge
Schneider
P.
Seitz
C.
,
1995
,
A&A
 ,
294
,
411
Schneider
P.
King
L.
Erben
Th.
,
2000
,
A&A
 ,
353
,
41
Schramm
T.
Kayser
R.
,
1995
,
A&A
 ,
289
,
L5
Seitz
C.
Schneider
P.
,
1997
,
A&A
 ,
318
,
687
Seitz
S.
Schneider
P.
Bartelmann
M.
,
1998
,
A&A
 ,
337
,
325
Sivia
D.S.
,
1996
,
Data Analysis – a Bayesian Tutorial
 .
Oxford Univ. Press
, Oxford
Squires
G.
Kaiser
N.
,
1996
,
ApJ
 ,
473
,
65
Tyson
J.A.
Valdes
F.
Wenk
R.A.
,
1990
,
ApJ
 ,
349
,
L1
Van Dokkum
P.
,
1999
,
PhD thesis,
 Univ. Groningen
Van Waerbeke
L.
,
2000
,
MNRAS
 ,
313
,
524
Weir
N.
,
1992
, in
Worrall
D.M.
Biemesderfer
C.
Barnes
J.
, eds,
ASP Conf. Ser. Vol. 25
 ,
Astronomical Data Analysis Software and Systems I Astron
 .
Soc. Pac.
, San Francisco, p.
186

Footnotes

Maximum Entropy Data Consultants, Ltd, South Hill, 42 Southgate Street, Bury St Edmonds, Suffolk IP33 2AZ.