## Abstract

We present the results of the Gravitational LEnsing Accuracy Testing 2008 (GREAT08) Challenge, a blind analysis challenge to infer weak gravitational lensing shear distortions from images. The primary goal was to stimulate new ideas by presenting the problem to researchers outside the shear measurement community. Six GREAT08 Team methods were presented at the launch of the Challenge and five additional groups submitted results during the 6-month competition. Participants analyzed 30 million simulated galaxies with a range in signal-to-noise ratio, point spread function ellipticity, galaxy size and galaxy type. The large quantity of simulations allowed shear measurement methods to be assessed at a level of accuracy suitable for currently planned future cosmic shear observations for the first time. Different methods perform well in different parts of simulation parameter space and come close to the target level of accuracy in several of these. A number of fresh ideas have emerged as a result of the Challenge including a re-examination of the process of combining information from different galaxies, which reduces the dependence on realistic galaxy modelling. The image simulations will become increasingly sophisticated in future GREAT Challenges, meanwhile the GREAT08 simulations remain as a benchmark for additional developments in shear measurement algorithms.

## 1 INTRODUCTION

A clump of matter induces a curvature in space–time which causes the trajectory of a light ray to appear bent. This effect, known as gravitational lensing, is analogous to light passing through a sheet of glass of varying thickness such as a bathroom window. In both cases, the light-emitting objects appear distorted. Making assumptions about the intrinsic (original) shapes of the emitting objects allows us to infer information about the intervening material. In cosmology, we learn about the distribution of matter by studying the shapes of distant galaxies. In the vast majority of cases, the distortion varies very little as a function of position on the galaxy image, and it can be approximated by a matrix distortion. This regime is known as weak gravitational lensing or cosmic shear when applied to large numbers of randomly selected distant galaxies.

Gravitational attraction of ordinary matter and dark matter is expected to slow the expansion of the Universe, causing the expansion to decelerate. However, multiple lines of evidence now show that the present-day expansion of the Universe seems instead to be accelerating. The main explanations explored in the literature are that (i) Einstein's cosmological constant is non-zero, (ii) the vacuum energy is small but non-negligible, (iii) the Universe is filled with some new fluid, dubbed dark energy or (iv) the laws of general relativity are wrong at large distances. Possibilities (i) and (ii) can be subsumed within item (iii) because they look like a dark energy fluid with equation of state *p*=*w*ρ*c*^{2}, where *w*=−1. To find out more about the nature of dark energy or modifications to the law of gravity, we need high-precision measurements of the recent (*z* < 1) Universe.

By studying cosmic shear using galaxies at a range of different epochs, we can learn how the dark matter clumps as a function of time, which itself depends on the nature of dark energy and the laws of gravity. Cosmic shear appears to hold the most potential of all methods for investigating the dark energy or modifications to gravity (Albrecht et al. 2006; Peacock et al. 2006; Albrecht & Bernstein 2007; Albrecht et al. 2009). There are many current, planned and proposed surveys to use cosmic shear to measure dark energy including the Canada–France–Hawaii Telescope Legacy Survey,1 the KIlo-Degree Survey, Panoramic Survey Telescope and Rapid Response System,2 the Dark Energy Survey,3 the Large Synoptic Survey Telescope,4 and space missions Euclid5 and the Joint Dark Energy Mission.6

Cosmic shear was first detected just one decade ago (Bacon, Refregier & Ellis 2000; Kaiser, Wilson & Luppino 2000; van Waerbeke et al. 2000; Wittman et al. 2000), and many studies have now used it to measure cosmological parameters. Much work has also been carried out on anticipating any problems that may limit the potential of cosmic shear over the coming decade. These are thought to be (i) accuracy of approximate methods for obtaining distances to galaxies, (ii) intrinsic alignments of galaxies, (iii) accuracy of numerical predictions of dark matter clustering on small scales and in the presence of baryons and (iv) unbiased measurement of shear from galaxy images. There is now much discussion about obtaining high-quality galaxy distances using spectroscopic redshifts to calibrate approximate methods to solve (i) (Ma, Hu & Huterer 2005; Huterer et al. 2006a; Bernstein & Ma 2008; Kitching, Taylor & Heavens 2008c; Bernstein & Huterer 2010). The intrinsic alignment signal (ii) can be removed if (i) can be solved perfectly (Takada & White 2004; Joachimi & Schneider 2008) and otherwise the two are closely linked (Heymans & Heavens 2003; King & Schneider 2003; King 2005; Bridle & King 2007; Zhang 2008; Bernstein 2009; Joachimi & Schneider 2009). Supercomputers are being deployed to produce higher accuracy predictions, and methods for suppressing information from the uncertain small-scale regime have been developed. In this paper we focus on the final problem, shear measurement from noisy images. It can be phrased entirely as a statistic problem of extracting information from images.

In 2004, the Shear TEsting Programme (STEP) was launched to assess the current status of shear measurement methods. It began with a blind challenge set by and for the weak lensing community (Heymans et al. 2006, hereafter STEP1). A large volume of images containing a mixture of stars and simple galaxies were produced. The participants had the task of extracting the (constant) input shear from the images, and these estimates were compared to the true input value. These end-to-end simulations showed that the shear measurement problem is far from trivial but that the methods in frequent use at that time were sufficiently accurate for the existing published cosmic shear measurements. Massey et al. (2007, hereafter STEP2) extended this work with more sophisticated galaxy models, and built statistical devices into larger simulations to improve the measurement precision. This showed that, even considering realistic and more complex galaxy morphologies, existing methods were still sufficient for the current data.

The cosmic shear community then began to look ahead to the coming decade of surveys and ask whether the existing methods are sufficiently accurate even when the statistical uncertainties are reduced by the massive increase in data quantity. Addressing this question requires much larger blind challenges, containing at least tens of millions of galaxies. At the same time, it was recognized that the shear estimation problem can be phrased as a statistical problem and that experts in image analysis from other disciplines may be in a position to contribute significantly to developing new approaches. Furthermore, it was decided that the strengths and weaknesses of different methods could be best assessed with slightly simpler simulations, in which various effects could be isolated.

The previous two published blind shear analysis challenges (STEP1, STEP2) were slightly simplified relative to real data in that the shear and the point spread function (PSF) did not vary across an image. However, they did ask participants to grapple with a number of difficult issues.

The images had relatively realistic PSFs with classical optical aberrations such as coma and trefoil.

Although the PSF did not vary across an image, participants were asked not to use this fact.

STEP1 required participants to determine which objects were stars and therefore could be used for a PSF determination.

Both challenges required participants to run object detection software to determine where the star and galaxies were. Spuriously detected objects could and did affect the shear.

Galaxies were drawn from a range of magnitudes, so that weighting schemes as a function of the signal-to-noise ratio (SNR) were important.

Galaxies were randomly placed, so that sometimes they overlapped. Participants were responsible for either deblending or rejecting these galaxies.

The Gravitational LEnsing Accuracy Testing 2008 (GREAT08) Challenge removes *all* of these issues to focus on the core problem of inferring shear given a PSF and standardized set of non-overlapping galaxies at (approximately) known positions. The motivation is that once this problem is solved, the other issues will be introduced in further challenges of increasing complexity.

The GREAT08 Challenge Handbook (Bridle et al. 2009, hereafter The GREAT08 Handbook) describes the shear measurement problem for non-cosmologists and sets out the challenge. GREAT08 was launched in October 2008 and ran as a blind competition for 6 months until the end of 2009 April. This paper describes the results of GREAT08. Section 2 describes the GREAT08 simulations. We review the shear measurement problem and shear accuracy requirements in Section 3. Section 4 summarizes current shear measurement methods and Section 5 presents the Challenge results. We conclude and overview the potential for future GREAT Challenges in Section 6. We provide extra details of the simulations, methods and results in appendices.

## 2 THE GREAT08 SIMULATIONS

The GREAT08 images are provided in sets of 10 000 objects in a single fits file. Each object is generated on its own grid of 39 × 39 pixels, and these postage stamps are patched together for convenience in a 100 × 100 layout, with a 1 pixel border, thus each set is a patchwork image of 4000 × 4000 pixels. Each galaxy postage stamp is generated using the following sequence: (i) simulate a galaxy model; (ii) convolve it with a kernel, referred to as the PSF; (iii) bin up the light in pixels and (iv) apply the noise model. The PSFs used are given in Appendix A1. Each postage stamp is produced using a list of parameters specifying the individual object and simulation properties. We describe the catalogues of these properties in Appendix A2. The method used to produce images from the catalogues is overviewed below and described in more detail in Appendix A3. Example images are shown in Fig. 1.

Four different groups of galaxy images were provided in GREAT08: (i) low-noise galaxy images for which the true shears were provided during the Challenge, labelled LowNoise_Known; (ii) low-noise galaxy images for which there was a blind challenge to extract the true shears, labelled LowNoise_Blind; (iii) realistic noise galaxy images for which the true shears were provided, labelled RealNoise_Known, and (iv) realistic noise galaxy images with blind shear values, RealNoise_Blind. This RealNoise_Blind group formed the main GREAT08 Challenge. These are described in more detail in the GREAT08 Handbook, together with the rules governing which information could be used to inform the blind challenges.

The parameters for each set in LowNoise_Known were determined using the upper panel of Fig. 2 and Table 1. There are 15 sets (fits images) each containing 10 000 galaxies. There are five sets with each of three different galaxy size values. The method for setting the galaxy sizes and SNR values is described in Appendices A2 and A3.

Fiducial | Lower value | Upper value | |

SNR | 200 | N/A | N/A |

R_{gp}/R_{p} | 1.4 | 1.22 | 1.6 |

PSF type | Fid | N/A | N/A |

Galaxy type | b or d | N/A | N/A |

Fiducial | Lower value | Upper value | |

SNR | 200 | N/A | N/A |

R_{gp}/R_{p} | 1.4 | 1.22 | 1.6 |

PSF type | Fid | N/A | N/A |

Galaxy type | b or d | N/A | N/A |

*Note.**R*_{gp}/*R*_{p} is the ratio of PSF convolved galaxy full width at half-maximum (FWHM) to the PSF FWHM. ‘b or d’ describes the fact that 50 per cent of the galaxies in each set have de Vaucouleurs profiles (bulge only) and 50 per cent have exponential profiles (disc only). The parameters for LowNoise_Blind are the same except the galaxies are a mix of the two components as described in the text. The parameters for RealNoise_Known are the same as for LowNoise_Known except the SNR is 20.

The parameters for each set in RealNoise_Blind were determined using the lower panel of Fig. 2 and Table 2. There is a range in SNR, galaxy size, PSF ellipticity and galaxy type. One branch of the RealNoise_Blind holds all parameters at their fiducial values. Each of the four variable parameters has a ‘lower’ and an ‘upper’ value relative to the fiducial. When each of these values is used, all other parameters are fixed at the fiducial values. This makes nine different branches in total. In each branch, there are six realizations of each of 50 different shear values, making 2700 sets with 10 000 galaxies in each.

Fiducial | Lower value | Upper value | |

SNR | 20 | 10 | 40 |

R_{gp}/R_{p} | 1.4 | 1.22 | 1.6 |

PSF type | Fid | Fid rotated | Fid e× 2 |

Galaxy type | b+d | b or d | b+d offcentre |

Fiducial | Lower value | Upper value | |

SNR | 20 | 10 | 40 |

R_{gp}/R_{p} | 1.4 | 1.22 | 1.6 |

PSF type | Fid | Fid rotated | Fid e× 2 |

Galaxy type | b+d | b or d | b+d offcentre |

Images are generated by sampling from the galaxy light distribution, sampling from the PSF, adding the sample positions to simulate convolution, binning the samples on to a pixel grid and then applying the noise model. The exact numerical techniques used are detailed in Appendix A3. In brief, samples are first generated from the circular galaxy profile. Next, they are stretched to have the required ellipticity and then sheared. Samples are then drawn from the circular PSF distribution and made elliptical using the shear distortion equations given in Appendix A3. The *x* and *y* coordinate values of each galaxy sample are added to those of the PSF sample, to simulate convolution, and finally the samples are binned into pixels.

## 3 FIGURE OF MERIT

The shear measurement problem was summarized for non-cosmologists in the GREAT08 Handbook. In short, light from a source galaxy is sheared and (slightly) magnified by passing through a gravitational potential on its way to the observer; the observable anisotropic stretching is called the *reduced shear g*, which is a pseudo-vector with two components. (Because the distinction between shear and reduced shear is not important in the context of this paper, which is aimed at both the astronomical and statistical communities, we refer to *g* as simply ‘shear’ for convenience.)

Shear measurements are confounded by several unavoidable observational effects. First, for ground-based telescopes, when the light passes through the atmosphere it is convolved with a kernel that must be inferred from the data. Secondly, telescope optics (whether in space or on the ground) also cause the image to be convolved with a kernel; this kernel may be more predictable than the atmospheric kernel because the optics may be well modelled. In any case, the effective kernel imposed by atmosphere *and* optics is referred to as the PSF. Thirdly, emission from the sky causes a roughly constant ‘background’ level to be added to the whole image. Fourthly, the detectors sum the light falling in each pixel, effectively convolving the image with a square top-hat window function and sampling the resulting image at the centre of each pixel. This extra convolution effect is treated by some authors as part of the PSF. Fifthly, the finite number of photons collected in a given pixel is subject to Poisson noise (in addition, the final detector readout adds Gaussian noise of zero mean, but this is ignored in GREAT08).

Thus, a successful method must both filter the noise effectively and remove the significant PSF convolution kernel in the observed galaxy image. To represent a method's ability to perform both tasks in a single number for the GREAT08 Challenge, we define a quality metric

where*g*is the

^{m}_{ij}*i*th component of the measured shear for simulation

*j*,

*g*is the corresponding true shear component, the inner angle brackets denote an average over sets with similar shear value and observing conditions

^{t}_{ij}*j*∈

*k*and the outer angle brackets denote an average over simulations with different true shears

*k*, observing conditions

*l*and shear components

*i*.

In our detailed discussion of the results below, we also define a *Q* value for each simulation branch. In this case, the average over different observing conditions *l* is omitted

*Q*to differentiate between methods.

To set this metric in context, if a single constant value of zero shear were submitted (*g*^{m}_{1j}=*g*^{m}_{2j}= 0 for all *j*) then, since the rms true shear would have a value ∼0.1. To date, methods tested in STEP1 and STEP2 and used on real data have *Q*∼ 10–100 (Kitching et al. 2008b), which is sufficient for the surveys on which they were employed but not sufficient for mid-term to far future surveys.

Amara & Réfrégier (2008) show that a deep full-sky (e.g. Euclid-like) survey requires that the additive error *c* < 0.0003 and the multiplicative error *m* < 0.001. For a pure additive error, this translates to a requirement that *Q* > 1000 and we set this as our target for GREAT08 because additive errors are much more difficult to self-calibrate using pairs of tomographic redshift bins (Huterer et al. 2006b, see also Van Waerbeke et al. 2006). A detailed analysis of the two separate terms is given in Appendix C.

As defined, *Q* penalizes deviations from truth regardless of whether they are random or systematic. This is useful for selecting a winner, but much can be learned by separating errors into random and systematic parts. For the systematic part, we follow STEP1 and STEP2 by defining a multiplicative error *m* and an additive error *c* as the best-fitting parameters to

*m*=〈

*m*〉

_{i}_{i},

*c*=〈

*c*〉

_{i}_{i}. For a given method, changes in

*m*and

*c*across simulation branches may indicate the strengths and weaknesses of the method.

Participants may optionally submit uncertainty estimates on their shears. These are compared to the residuals of the submitted shears over sets of simulations with nearly identical true shear values. If the uncertainty estimates are wrong by more than a factor of two, the submission is flagged as such but is not penalized. The main purpose of GREAT08 is to produce a high *Q* value rather than to yield correct uncertainty estimates.

A method is not useful if it obtains very small shear biases at the expense of throwing away most of the information and thus very noisy shear estimates. The quality factor *Q* will be worse if a method has very noisy shear estimates because the rms difference between the truth and submission will be non-negligible even if the biases are zero. We therefore calculate the scatter of the submitted shear values about the best linear fit to the true shears. Specifically, we plot submitted *g*_{1} values as a function of true *g*_{1}, with one point for each fits file and fit the straight line described above. We find the rms residual to obtain the scatter σ_{1} in the first component *g*_{1}. We repeat for *g*_{2} and write σ≡〈σ_{i}〉_{i} averaging over the two shear components *i*. See Kitching et al. (2008b) for additional discussion.

## 4 METHODS

In this section, we briefly summarize the algorithm used by each submitting group. Table 3 lists the participants, their methods and the corresponding identifiers used in subsequent tables and in the figure legends. Methods with an asterisk indicate GREAT08 Team entries; these participants had access to the internal details of the GREAT08 Challenge simulations, but they did not consciously use this information in their analyses. Entries from PG and MV had some overlap with the GREAT08 Team. Not all submitting groups submitted results for both types of blind simulation. An additional table (Table B1) in Appendix B gives further information including URLs where more information can be found.

Participant(s) | Key | Action 1 | Action 2 | Action 3 |

Hosseini, Bethge | HB | Estimate power spectrum | Average power spectra | Fit sheared circular model * PSF |

Lewis | AL | Estimate centroids | Average images | Fit sheared circular model * PSF |

Kitching | TK^{†} | Fit sheared circular model * PSF | Combine shear PDFs | Calculate shear |

Heymans | CH^{†} | Measure weighted quadrupole moments | Correct for weight and PSF | Average shear estimates |

Paulin, Gentile | PG | Fit sheared circular model * PSF | Average shear estimates | |

Velander | MV | Fit flexed sheared circular model * PSF | Average shear estimates | |

Kuijken | KK^{†} | Fitsheared circular model * PSF | Average shear estimates | |

Harmeling, Hirsch, Schölkopf | HHS3 | Estimate centroids | Average good images | Fitsheared circular model * PSF |

Bridle | SB^{†} | Fitsheared circular model * PSF | Average shear estimates | |

Harmeling, Hirsch, Schölkopf | HHS2 | Estimate centroids | Average images | Fitsheared circula model * PSF |

Harmeling, Hirsch, Schölkopf | HHS1 | Fitsheared circular Gaussian | Correct for model and PSF | Average shear estimates |

Jarvis | MJ^{†} | Fit ‘elliptical’ model * PSF | Average shear estimates | |

Bridle, Schrabback | USQM^{†} | Measure quadrupole moments – PSF | Average quadrupole moments | Calculate shear |

Participant(s) | Key | Action 1 | Action 2 | Action 3 |

Hosseini, Bethge | HB | Estimate power spectrum | Average power spectra | Fit sheared circular model * PSF |

Lewis | AL | Estimate centroids | Average images | Fit sheared circular model * PSF |

Kitching | TK^{†} | Fit sheared circular model * PSF | Combine shear PDFs | Calculate shear |

Heymans | CH^{†} | Measure weighted quadrupole moments | Correct for weight and PSF | Average shear estimates |

Paulin, Gentile | PG | Fit sheared circular model * PSF | Average shear estimates | |

Velander | MV | Fit flexed sheared circular model * PSF | Average shear estimates | |

Kuijken | KK^{†} | Fitsheared circular model * PSF | Average shear estimates | |

Harmeling, Hirsch, Schölkopf | HHS3 | Estimate centroids | Average good images | Fitsheared circular model * PSF |

Bridle | SB^{†} | Fitsheared circular model * PSF | Average shear estimates | |

Harmeling, Hirsch, Schölkopf | HHS2 | Estimate centroids | Average images | Fitsheared circula model * PSF |

Harmeling, Hirsch, Schölkopf | HHS1 | Fitsheared circular Gaussian | Correct for model and PSF | Average shear estimates |

Jarvis | MJ^{†} | Fit ‘elliptical’ model * PSF | Average shear estimates | |

Bridle, Schrabback | USQM^{†} | Measure quadrupole moments – PSF | Average quadrupole moments | Calculate shear |

*Note.*‘* PSF’ indicates that a PSF convolved model was fitted. ‘PDF’ stands for probability density function. Daggers after the Key indicate GREAT08 Team entries. More information is provided in the main text and in Appendix B.

Key | Method name | Language | Runtime (s/galaxy) | URLs |

HB | CVN Fourier | Matlab | 1 | http://great08challenge.pbworks.com/f/Great-Challenge.zip |

AL | CLT KK99 | f90 | 0.4 | http://cosmologist.info/utils/StackedShear.zip |

TK | lensfit | C | 0.08 | http://www.physics.ox.ac.uk/lensfit/ |

http://www-astro.physics.ox.ac.uk/~tdk/files/lensfit.tar.gz | ||||

CH | KSBf90 | f90 | 0.005 | http://www.roe.ac.uk/~heymans/KSBf90 |

PG | gfit | Python | 0.2 | |

MV | KKshapelets with flexion | f77, f95 | 0.03 | |

KK | KKshapelets | f77 | 0.03 | http://www.strw.leidenuniv.nl/~kuijken/shear-shapelets.html |

HHS | Gauss and variants | Python | 0.05 | |

SB | im2shape | C | 0.02 | http://www.sarahbridle.net/im2shape/great08_im2shape_v1.0.tar.gz |

MJ | BJ02 deconvolved shapelets | C++ | 0.08 | http://www.hep.upenn.edu/~mjarvis/great08/v1.tar.gz |

USQM | Unweighted stacked quadrupole moments | Matlab | 0.001 | http://www.sarahbridle.net/usqm_v1.0.tar.gz |

Key | Method name | Language | Runtime (s/galaxy) | URLs |

HB | CVN Fourier | Matlab | 1 | http://great08challenge.pbworks.com/f/Great-Challenge.zip |

AL | CLT KK99 | f90 | 0.4 | http://cosmologist.info/utils/StackedShear.zip |

TK | lensfit | C | 0.08 | http://www.physics.ox.ac.uk/lensfit/ |

http://www-astro.physics.ox.ac.uk/~tdk/files/lensfit.tar.gz | ||||

CH | KSBf90 | f90 | 0.005 | http://www.roe.ac.uk/~heymans/KSBf90 |

PG | gfit | Python | 0.2 | |

MV | KKshapelets with flexion | f77, f95 | 0.03 | |

KK | KKshapelets | f77 | 0.03 | http://www.strw.leidenuniv.nl/~kuijken/shear-shapelets.html |

HHS | Gauss and variants | Python | 0.05 | |

SB | im2shape | C | 0.02 | http://www.sarahbridle.net/im2shape/great08_im2shape_v1.0.tar.gz |

MJ | BJ02 deconvolved shapelets | C++ | 0.08 | http://www.hep.upenn.edu/~mjarvis/great08/v1.tar.gz |

USQM | Unweighted stacked quadrupole moments | Matlab | 0.001 | http://www.sarahbridle.net/usqm_v1.0.tar.gz |

For a quick overview, we attempt to summarize each method with just three action steps in Table 3. We see that a key differentiating factor is the stage at which an average is performed over galaxies in the image. HB, Antony Lewis (AL) and USQM as ‘stacking’ methods hereafter. The two different routes are illustrated in Fig. 3.

STEP2 classified methods according to their methods for PSF correction and construction of a shear estimator. PSF ‘deconvolution’ methods convolve a model with the PSF before fitting as indicated by ‘* PSF’ in the table; PSF ‘subtraction’ methods subtract a contribution due to the size and ellipticity of the PSF. ‘Active’ shear measurement methods sheared a ‘circular’ galaxy model until it best matched the data, generally indicated by the word ‘fit’ in the action list; ‘passive’ methods constructed a shear estimator from a combination of shape statistics and an estimate of how these would further change under a shear. This classification system proved insufficient to capture the more varied behaviour of methods containing new ideas in GREAT08. We next summarize each method in turn, in the order of decreasing *Q* value on RealNoise_Blind.

*HB.* The magnitude of the Fourier transform of the galaxy image raised to an arbitrary power is a characteristic feature of the individual galaxies. This feature is independent of the spatial location of the galaxy centre to a high precision, provided that the smoothed galaxy intensity decays sufficiently fast towards the edge of the image. No other assumptions are necessary. Because the galaxy images are contaminated by Poisson noise, an unbiased estimator of the power spectrum is given by the power spectrum of the noisy image minus a constant. The resulting image obtained by averaging over the unbiased estimators of the individual galaxy power spectra is an elliptically contoured function multiplied by the power spectrum of the convolution kernel plus Gaussian noise. After suitable normalization, the square root of the covariance matrix of the elliptically contoured function is equal to the shear coordinate transformation matrix. For parameter fitting, HB used a weighted non-linear least-squares method for which the weights are equal to the inverse of the standard deviation of the noise. For more information, see Hosseini & Bethge (2009).

*AL.* This method was inspired by Kuijken (1999) and is described in Lewis (2009). The centroid of each galaxy is estimated by performing a simple parametrized fit and all galaxies in an fits image are stacked on top of one another using these centroids. This is done at higher resolution than the image pixels themselves by dividing each pixel into subpixels and aligning on to a stacking grid using the pixel nearest the centroid. A PSF-convolved elliptical profile is fitted to this stacked image, and the ellipticity corresponds to the shear. As pointed out in Lewis (2009), the advantage of this approach is that the individual non-elliptical shapes of individual galaxies are averaged out. This fact was taken advantage of in HB, HHS2 and HHS3.

*TK.* The lensfit code fits a sum of co-elliptical exponential and de Vaucouleurs models to each individual galaxy and the best-fitting ellipticity is found. The bulge (de Vaucouleurs component) to disc (exponential component) fraction is a free parameter in the fit. The shear is calculated using a Bayesian estimator. For more details, see Appendix F of the GREAT08 Handbook and also Miller et al. (2007) and Kitching et al. (2008b). The version used here differs from the previously published implementations by including subpixel estimation of galaxy positions and adaptive ellipticity grid refinement.

*CH.* An implementation of the longstanding KSB (Kaiser, Squires & Broadhurst 1995) method, which is the most widely used code on observational data. For more information, see Appendix C of the GREAT08 Handbook.

*PG.* For each galaxy, a six-parameter Sersic model is convolved with the PSF and pixellated. This is fitted to the image through χ^{2} minimization using the gradient-expansion algorithm by Levenberg-Marquardt. The six fitted parameters are: the centroid (two parameters); the magnitude; the size and the ellipticity (two parameters). The estimated shear of an individual galaxy is derived from its fitted parameters and the averaged shear over a number of galaxies is the average of individual shears.

*MV.* This method is an extension of the KK method described below. It is being developed with the aim of measuring higher order galaxy image distortions, known as flexion, as well as shear. These higher order distortions add important detail to the measurement of galaxy halo density profiles and to dark matter mapping. For more information on this method see Velander & Kuijken (in preparation), and for further detail on flexion see Bacon et al. (2006).

*KK.* Each individual galaxy is modelled as a sheared, circular source described by means of the first-order shear operators in shapelet space. The PSF is also modelled as a high-order shapelet expansion, and all convolutions are carried out in shapelet space using the prescriptions in Refregier (2003). For further information, see Kuijken (2006) and appendix D of the GREAT08 Handbook.

*HHS1/HHS2/HHS3.* In *HHS1*, an elliptical Gaussian is fitted to each galaxy image by minimizing the mean-squared error via gradient descent in the six model parameters. As in *SB*, the average ellipticity is taken as an estimate for the shear. Due to the simplified galaxy model and the PSF blur a systematic bias is introduced, which is corrected for by offsetting the ellipticity values and via calibration using the training data. The methods *HHS2* and *HHS3* aim to be more robust by adopting the idea of *AL* to stack all galaxy images within one fits file on a subpixel scale in order to increase the SNR. In addition, in *HHS3* corrupted images were removed before stacking.

*SB.* The im2shape code models each individual galaxy as a sum of co-elliptical Gaussians. The parameters are marginalized using Markov Chain Monte Carlo sampling and the mean ellipticity of the samples is taken to correspond to the shear. For computational speed, only 16 × 16 pixels in the centre of each postage stamp were used in the fit. See appendix E of the GREAT08 Handbook and Bridle et al. (2002).

*MJ*. This algorithm seeks a coordinate system in which a model of the galaxy is found to be round. The model is convolved by the PSF and then compared to the observed pixel intensities. A shapelet decomposition is used for the underlying model, and ‘roundness’ is defined as the (sheared) second-order shapelet coefficients being 0. Then, the shear that brings this coordinate system back to the actual observation is assigned as the shape of the galaxy. For more information, see Bernstein & Jarvis (2002), Nakajima & Bernstein (2007) and appendix D of the GREAT08 Handbook.

*USQM.* This is a very simple method, not actually used in practice, but provided as a baseline comparison. The unweighted quadrupole moments of each galaxy are calculated within a square aperture of 20 pixels by 20 pixels. These are averaged (stacked) over all galaxies in each fits image and the PSF is removed by subtracting the PSF quadrupole moments. See Appendix B of the GREAT08 Handbook for more information.

In terms of the nomenclature introduced in STEP2, most of the methods forward fit an elliptical PSF convolved model (‘active’, ‘deconvolution’). This is in contrast to the situation in STEP1 and STEP2 where the majority of the methods were ‘passive’ PSF subtraction methods. There were no stacking methods in STEP1 or STEP2.

The use of correlation functions to measure shear was suggested by van Waerbeke et al. (1997). Another method in the literature makes use of a relation between shear and derivatives of the surface brightness field (Zhang 2008). This can be calibrated to remove noise and interpolation can be used to overcome pixelization (Zhang 2009).

## 5 RESULTS

There were two blind challenges: LowNoise_Blind contains high-SNR images and RealNoise_Blind contains images with a realistic noise level. The GREAT08 Challenge prize for highest *Q* value is based on the RealNoise_Blind results. The LowNoise_Blind competition contained significantly less data and should have been an easier challenge. Furthermore, the galaxy properties in LowNoise_Blind were similar to those in RealNoise_Blind and are mostly co-centred bulge plus disc models. It could therefore have been useful to optimize some properties of methods on the LowNoise_Blind images in preparation for RealNoise_Blind. First, we examine the LowNoise_Blind results.

### 5.1 LowNoise_Blind results

Table 4 shows the LowNoise_Blind leaderboard at the close of the challenge. The winner in LowNoise_Blind is the Gauss method of S. Harmeling, M. Hirsch and B. Schölkopf. The top three methods in LowNoise_Blind are not GREAT08 Team methods. Note that HB did not submit a result for LowNoise_Blind.

Rank | ID | Method | Q |

1 | HHS1 | Gauss | 488 |

2 | AL | CLT KK99 | 375 |

3 | PG | gfit | 136 |

4 | TK | Lensfit | 33.7 |

5 | CH | KSBf90 | 32.4 |

6 | MV | KKshapelets with flexion | 21.2 |

7 | MJ | BJ02 deconvolved shapelets | 20.2 |

8 | KK | KKshapelets | 19.7 |

9 | SB | im2shape | 15.3 |

10 | USQM | USQM | 1.84 |

Rank | ID | Method | Q |

1 | HHS1 | Gauss | 488 |

2 | AL | CLT KK99 | 375 |

3 | PG | gfit | 136 |

4 | TK | Lensfit | 33.7 |

5 | CH | KSBf90 | 32.4 |

6 | MV | KKshapelets with flexion | 21.2 |

7 | MJ | BJ02 deconvolved shapelets | 20.2 |

8 | KK | KKshapelets | 19.7 |

9 | SB | im2shape | 15.3 |

10 | USQM | USQM | 1.84 |

Fig. 4 shows our shear measurement figure of merit *Q* as a function of the ratio between the convolved galaxy size and the PSF size, *R*_{gp}/*R*_{p}. Since the number of galaxies decreases steeply as a function of galaxy size in real data, it is desirable to have a shear measurement method that allows the use of small galaxies. It is often assumed that shear measurement biases are larger for small galaxies. There are some examples where this is true in STEP2 fig. 7, and Nakajima & Bernstein (2007) fig. 5. However, the shear biases are caused by a combination of two effects: a poorly measured PSF and inherent biases that exist even if the PSF is perfectly known. It is expected that an incorrect PSF model will affect small galaxies the most, since for the largest galaxies the PSF has little effect (e.g. equation 13 of Paulin-Henriksson et al. 2008). In GREAT08, the exact PSF equation is known and if this information is properly used then the results will tell us about the inherent biases, for which there are less clear expectations.

HHS1 (dashed magenta line in Fig. 4) is the clear winner overall in LowNoise_Blind and wins at both the fiducial and small galaxy sizes. The implementation of KSB by CH (solid green line in Fig. 4) provided the best performance for highly resolved galaxies. As discussed above, this general trend of increasing *Q* with increasing galaxy size was expected and is followed for many methods. The winning method HHS1 performed worse as the galaxy size increased for LowNoise_Blind. We suggest that the method for calibrating the ellipticities for the PSF blurring was less reliable at large galaxy sizes due to the fact that the large elliptical galaxies sometimes extend beyond the 39 × 39 pixel postage stamp.

Further analysis of the LowNoise_Blind results in terms of multiplicative and additive shear calibration biases can be found in Appendix C4.

### 5.2 RealNoise_Blind results

The main challenge consisted of 27 million galaxies with roughly a factor of 10 more noise per pixel, corresponding to the type of image that we will ultimately want to use for cosmic shear. The RealNoise_Blind leaderboard at the close of the challenge is shown in Table 5. The winner of the GREAT08 Challenge is clearly the ‘CVN Fourier’ method by R. Hosseini and M. Bethge, HB. This method was inspired by the second-place AL method, but improves on a key limitation which was highlighted by Lewis (2009) in that it did not depend on the galaxy centroid.

Rank | Author | Method | Q |

1 | HB | CVN Fourier | 211 |

2 | AL | KK99 | 131 |

3 | TK | Lensfit | 119 |

4 | CH | KSBf90 | 52.3 |

5 | PG | gfit | 32.0 |

6 | MV | KKshapelets with flexion | 28.6 |

7 | KK | KKshapelets | 23.0 |

8 | HHS3 | GaussStackForwardGaussCleaned | 22.4 |

9 | SB | im2shape | 20.1 |

10 | HHS2 | GaussStackForwardGauss | 19.9 |

11 | HHS1 | Gauss | 12.8 |

12 | MJ | BJ02 deconvolved shapelets | 9.80 |

13 | USQM | USQM | 1.22 |

Rank | Author | Method | Q |

1 | HB | CVN Fourier | 211 |

2 | AL | KK99 | 131 |

3 | TK | Lensfit | 119 |

4 | CH | KSBf90 | 52.3 |

5 | PG | gfit | 32.0 |

6 | MV | KKshapelets with flexion | 28.6 |

7 | KK | KKshapelets | 23.0 |

8 | HHS3 | GaussStackForwardGaussCleaned | 22.4 |

9 | SB | im2shape | 20.1 |

10 | HHS2 | GaussStackForwardGauss | 19.9 |

11 | HHS1 | Gauss | 12.8 |

12 | MJ | BJ02 deconvolved shapelets | 9.80 |

13 | USQM | USQM | 1.22 |

Fig. 5 shows *Q* as a function of galaxy type, PSF type, SNR and galaxy size for RealNoise_Blind. The central, fiducial value is the same on each of the four panels. Each point on the panels corresponds to a single set of conditions; for example, for the SNR = 10 points all other parameters are set at the fiducial value.

HB performs consistently well through all branches of the simulation, with significantly improved performance on the ‘b+d offcentre’ galaxies. AL actually outperformed HB on six of the nine simulation branches, and obtains a *Q* value a factor of almost 4 larger than any other method for the fiducial simulation set, which is close to our target value of 1000. AL was second overall mostly as a result of a poor performance on the low-SNR branch, and to a lesser extent on the ‘Fid *e*× 2’ PSF. It would be interesting to see if the results could be improved in either of these regimes, for example with better centroiding at low SNR or better modelling of the ‘Fid *e*× 2’ PSF.

TK uses a model with coaligned exponential and de Vaucouleurs components which explain why the results on ‘b or d’ are so good. It also does well on ‘b+d offcentre’. If the galaxy model could be extended, then this may improve the other results, which all use the fiducial galaxy type. KK also performs well on the ‘b or d’ branch and, to a lesser extent, so does SB. Both these methods also assume that galaxies have elliptical isophotes, which matches exactly the model in the simulation.

The best method at the high-SNR end of RealNoise_Blind is MV (KK shapelets with flexion), which also performs well for the larger galaxies. HHS1 on the larger galaxy branch is the only method on any branch to achieve greater than the *Q*∼ 1000 level required for future precision surveys. This trend is surprising given that it reverses the trend with *R*_{gp}/*R*_{p} seen in LowNoise_Blind. It also obtains a good *Q* value at the high-SNR end (SNR = 40) of RealNoise_Blind, which is not surprising given the strong performance in LowNoise_Blind (SNR = 200).

Note that the absolute value of *Q* will depend on the noise on the shear measurements and on the number of realizations over which the average is performed. Therefore, it is not terribly meaningful to compare values between LowNoise and RealNoise; however, the *m* and *c* values can be usefully compared. These values are discussed for RealNoise_Blind in Appendix C5.

## 6 DISCUSSION

The GREAT08 Challenge has moved shear measurement research significantly beyond STEP1 and STEP2. We recognized that the shear measurement problem is intrinsically a statistical, not astronomical, problem and wrote a description addressed at non-astronomers (the GREAT08 Handbook). At the launch of the challenge, we had achieved the following:

We moved from end-to-end simulations to simpler simulations which isolate a key difficult part of the shear measurement problem without confusion from other effects.

The simulations focus in on key areas of simulation parameter space and allow a detailed assessment of the success of different methods in the various regimes explored.

We used a larger suite of simulations to assess methods at a much higher level of precision than was possible in STEP1 and STEP2; this level of precision is appropriate for the most ambitious planned cosmic shear surveys.

The GREAT08 Team was formulated from the original STEP Team and new groups, for example LensFit, were incorporated and assessed as part of the blind competition.

We formulated a new figure of merit with which to assess the results of the challenge and provided active leaderboards during the challenge.

The GREAT08 Team codes were all made publically available at the launch of the challenge.

In addition to the six GREAT08 Team entries on the leaderboards at the start of the challenge, there were five new entries which included computer scientists and non-lensers. The GREAT08 Challenge has therefore achieved its main goal of reaching out beyond the existing shear measurement community.

The GREAT08 Challenge prize for the highest *Q* value in RealNoise_Blind went to Reshad Hosseini and Matthias Bethge (HB). The GREAT08 Team also awarded a prize for a significant contribution to advancing shear measurement methods to AL, specifically for superb results over a significant range of simulation branches, and a timely summary of the problem that highlighted important issues (Lewis 2009). Neither of these prize-winning groups are associated with existing lensing groups.

The shear measurement problem has been invigorated by the Challenge and by the new ideas brought in. The most important new ideas are

a consideration of the impact of the assumed galaxy model on the accuracy of shear measurements and

a reconsideration of the stage in the measurement process at which to average observational quantities.

The assumed galaxy model has recently been shown to be important in causing biases in shear measurement (Lewis 2009; Melchior et al. 2009; Voigt & Bridle 2009). The existence of this bias was first pointed out by Lewis (2009), and this was the motivation for using a ‘stacking’ method by both AL, HB and HHS2/3. In both the methods, the individual galaxy properties are averaged away before a model is fitted, by averaging together simple statistics of the galaxy images. AL pointed out that averaging together the images themselves is not fully independent of the galaxy model, the PSF or the shear because a centroid must be estimated before stacking. HB solved this by instead stacking two-point statistics of the image (specifically the power spectrum), which is insensitive to the centroid. This raises the general question of what quantity should be averaged (or otherwise combined), and at what stage, when presented with many galaxy images all with the same shear value.

The success of the stacking methods on images with constant galaxy properties leads to questions about how well stacking could work on more realistic data. Because shear varies with position in real data, the stacking process will average the shear signal as well as nullify the observation effects it was designed to remove. However, we speculate that the average shear in a patch of sky is still a useful cosmological quantity, as has sometimes been considered (e.g. most recently the top-hat shear variance statistic shown in fig. 5 of Fu et al. 2008) (see also cosmic shear ring statistics described in Schneider & Kilbinger 2007; Eifler, Schneider & Krause 2009). For lensing analyses of clusters or galaxies, the assumption of axisymmetry is often made which lends itself naturally to stacking in annuli about the centre of the cluster. It would also be necessary to determine how to properly stack galaxies with a range of SNR or PSF in a given patch of sky, and especially how to tackle galaxies with a range of redshifts, and thus a range of shears. For example, three-dimensional lensing (Heavens 2003; Kitching et al. 2008a) is specifically designed to take into account the probability distributions in redshift and shear for each galaxy separately.

The results of GREAT08 show that different methods are successful in different corners of parameter space and many results are close to the target *Q* value of 1000. The results from different simulation branches give clues as to where methods could be improved, and we expect to see further work on developing the methods. The winning method HB only finished its first run 2 days before the challenge deadline and therefore it could be optimized further. In addition, it shows remarkably stable performance as a function of SNR, implying that the good *Q* results might continue down to even lower SNR values. On the fiducial simulations, AL achieved a *Q* value nearly four times higher than previous work, marking a significant improvement. The performance at low SNR is the clear next area for investigation for this method. TK obtains good results, in particular when the underlying model was similar to the model in the simulation.

The true shear values and input catalogues were released after the Challenge deadline in 2009 September. In early 2010, a new set of simulations were released called the GREAT08 Reloaded. These simulations are the same as GREAT08 RealNoise_Blind in all respects except that different random number seeds were used. There is no Live Leaderboard for this simulation set but a *Q* calculator is available on the web site for individual use. Submissions to this web site are being logged, and a summary of activity will be published close to the GREAT10 launch date, at which point the GREAT08 Reloaded will close.

GREAT08 marks the first in a series of GREAT challenges, which are intended to be a roadmap of simulations leading up to the real grand observational challenges that the community will face with the next generation of cosmic shear surveys. The next challenge in the series will be GREAT10. This will represent the next step towards creating fully realistic simulations. Many aspects of the GREAT10 simulation will be familiar from GREAT08, though they will differ in some key aspects. The most significant change will be spatial variation: both the shear and PSF will vary across each image. GREAT10 will also invite people to solve an extra cosmic shear challenge, estimating the convolution kernel from images to sufficient accuracy. For more information on GREAT10, visit http://www.great10challenge.info.

We thank the P*ASCA*L Network for support. We thank the GREAT08 Team and participants at the GREAT08 Mid-Challenge Workshop and GREAT08 Final Workshop including Hakon Dahle, Domenico Marinucci and Uros Seljak. We thank the organizers of Cosmostats09 for hosting the GREAT08 Challenge Final Workshop within Cosmostats09 in Ascona. We thank the Aspen Center for Physics where part of this work was carried out. We are grateful to Jeremy Yates for help with setting up the GREAT08 server. SLB thanks the Royal Society for support in the form of a University Research Fellowship. TDK is supported by STFC Rolling grant RA0888. JR is supported in part by the Jet Propulsion Laboratory, which is run by Caltech under a contract from NASA. MS was supported in part by the programme 11288 provided by NASA through a grant from the STScI, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555.

## REFERENCES

*et al.*,

*et al.*,

*et al.*,

## Appendices

### APPENDIX A: DETAILS OF THE IMAGE SIMULATIONS

#### A1 PSF models

In an attempt to isolate problems in the shear estimation pipelines and make the challenge more accessible, we provided maximal information about the PSFs used during the competition.

The PSFs had a truncated Moffat profile

where we set β= 3.5. This profile is motivated by the combination of diffraction limited optics with random Gaussian blurring by the atmosphere and is therefore reasonably representative of PSFs for ground-based telescopes. The scale radius*r*was determined by setting the full width at half-maximum (FWHM) to 2.85 pixels.

_{d}*r*was set to twice the FWHM. Three different PSFs were used in the GREAT08 Challenge, each with a different ellipticity, as shown in Table A1.

_{c}PSF type | File name | e_{1} | e_{2} |

Fid | set0001 | −0.019 | −0.007 |

Fid rotated | set0002 | 0.007 | −0.019 |

Fid e× 2 | set0003 | −0.038 | −0.014 |

PSF type | File name | e_{1} | e_{2} |

Fid | set0001 | −0.019 | −0.007 |

Fid rotated | set0002 | 0.007 | −0.019 |

Fid e× 2 | set0003 | −0.038 | −0.014 |

Star catalogues consisted simply of the position of the point source. The *x* positions were drawn from a Gaussian of standard deviation 1.2 pixels centred on the middle of the postage stamp, similarly for the *y* positions. The star catalogues were provided at the time of the challenge. The convolution kernel and image generation method are described below.

#### A2 Galaxy catalogue generation

The information provided in this appendix was not available during the Challenge.

In general, the galaxies in GREAT08 are the sum of two components, each with a Sersic (Sersic 1968) intensity profile

where*I*(

*r*) is the amount of light per unit area at a radius

*r*and κ≃ 2

*n*− 0.331 (see e.g. Peng et al. 2002). The scale radius

*r*and the total intensity (which determines

_{e}*I*) are free parameters specified in the catalogues. The first component, with

_{o}*n*= 4, is an approximation to the central bulge component of galaxies, corresponding to a de Vaucouleurs profile. The second component, with

*n*= 1, is an approximation to the exponential disc component of galaxies. Circular galaxy images are made according to the profile

*I*(

*r*) described above and then distorted according to the galaxy ellipticity and shear as described below.

Therefore, the objects are circularly symmetric before the intrinsic ellipticity and gravitational shear are applied. More realistic galaxies will be used in future GREAT Challenges. This was investigated in the precursor STEP, for which simple galaxies like those in GREAT08 were used for STEP1 (Heymans et al. 2006) and sophisticated shapelet galaxies based on the *Hubble Deep Field* were used in most of the STEP2 simulations (Massey et al. 2007). However, note that after the intrinsic ellipticity is applied the galaxies no longer have elliptical isophotes. It was shown that this makes the shear measurement problem non-trivial (Lewis 2009) at the kind of accuracy level required in GREAT08 (Voigt & Bridle 2009).

The *x* and *y* positions of the bulge component were each drawn from a Gaussian of standard deviation 1.2 pixels centred on the middle of the postage stamp. By default the positions of the disc component were set equal to those of the bulge, except in one branch of the RealNoise_Blind simulations, as described below (see Table 2).

For each object, the total flux [integral of *I*(*r*) over the postage stamp] in the disc component, as a fraction of the total flux in both components, is in general a random number drawn from a uniform distribution between 0 and 1. However, for LowNoise_Known, RealNoise_Known and one branch of RealNoise_Blind, this fraction was set to *either* 0 *or* 1. So, in these simulations, the galaxies had either a pure de Vaucouleurs or pure exponential profile.

The scale radii *r _{e}* of each component were set by considering high-resolution circular galaxy images after convolution with the appropriate PSF. For single-component models (i.e. when the bulge to total flux is zero or unity),

*r*is set such that the convolved image has an FWHM of 1.4 times that of the PSF,

_{e}*F*

_{gp}= 1.4

*F*

_{p}, in the fiducial branch. Values 1.22 or 1.6 were used for some other branches to explore the effect of galaxy size, as detailed below (see Tables 1 and 2). The resulting

*r*values for single-component models are provided in Table A2. For two-component models, the disc scale radius is a set multiple of the bulge scale radius,

_{e}*r*

_{e,d}= 2

*r*

_{e,b}*

*r*

_{e,d0}/

*r*

_{e,b0}using values from Table A2. The bulge scale radius was set by simulating a high-resolution two-component circular model with the required bulge to total flux ratio and finding the value such that the FWHM had the required value (by default 1.4 times the PSF FWHM).

R_{gp}/R_{p} | Disc r_{e,d0} | Bulge r_{e,b0} |

1.22 | 0.82 | 1.59 |

1.4 | 1.3 | 3.8 |

1.6 | 2.4 | 18.0 |

R_{gp}/R_{p} | Disc r_{e,d0} | Bulge r_{e,b0} |

1.22 | 0.82 | 1.59 |

1.4 | 1.3 | 3.8 |

1.6 | 2.4 | 18.0 |

*Note.* The left-hand column gives the ratio of PSF convolved galaxy FWHM to the PSF FWHM. The middle column gives the scale radius for a single-component disc model. The right-hand column gives the scale radius for a single-component bulge model. These values are interpolated to produce scale radius values for two-component models, as described in the text.

The ellipticities of the bulge and disc were drawn from

with*B*= 0.05,

*C*= 0.58 for the bulge and

*B*= 0.19,

*C*= 0.58 for the disc; ε≡ (

*a*

^{2}−

*b*

^{2})/(

*a*

^{2}+

*b*

^{2}), where

*a*and

*b*are the major and minor axes, respectively. Since ellipticities close to unity become unphysical, we truncate the distribution at ε= 0.9 and set all objects with ε > 0.9 to have ε= 0.9. This distribution was loosely motivated by results from the APM survey (Crittenden et al. 2001). The bulge and disc ellipticities are drawn independently from the above distributions and are thus uncorrelated. The angle between the bulge major axis and the positive

*x*-axis is drawn from a uniform distribution between 0° and 180°. The disc angle is equal to the bulge angle but perturbed by a Gaussian of standard deviation 20°.

5000 galaxy parameters were simulated per image set by drawing from the above distributions. To minimize noise, the parameters were all rotated by 90° to produce the remaining 5000 galaxy parameters (i.e. all angles are increased by 90°, *x* positions become *y* positions and *y* positions become negative *x* positions.) The list was randomized to hide the pairings. This paired rotation was introduced in STEP2 to reduce shape noise. In the absence of a PSF or shear the shear estimates from each galaxy in a pair are expected to cancel, thus removing noise arising from the intrinsic ellipticities of galaxies.

SNRs are assigned in the catalogues and are used during image simulation to set the flux in the galaxy image. For LowNoise images the value is 200, and for RealNoise images the default value is 20, with variations to 10 and 40 within RealNoise_Blind. The definition of this number in terms of the noise model is described in the following section.

For LowNoise_Known and RealNoise_Known, the galaxies all have just a single component and within each set, each galaxy is assigned a de Vaucouleurs or an exponential profile at random. The galaxies in LowNoise_Blind all have a bulge plus disc two-component model as described in the text above. The majority of the galaxies in RealNoise_Blind have the same two-component model as in LowNoise_Blind. One of the nine RealNoise_Blind branches has single-component galaxies as in the Known simulations. The two-component models all share the same centroid for the bulge and disc, except for one of the nine RealNoise_Blind branches in which the bulge is off-centred from the disc by a Gaussian of standard deviation 0.3 pixels.

The true shears for LowNoise_Known and RealNoise_Known were provided throughout the challenge. They are Gaussian distributed with a standard deviation of 0.03 in each of *g*_{1} and *g*_{2}, and zero mean. The true shears for LowNoise_Blind and RealNoise_Blind have now been released and are illustrated in Fig. A1. These shears are perturbations around the root values and , and thus do not have zero mean. This distribution is chosen instead of a Gaussian to improve the uncertainties on linear fits to the output versus true shear. For LowNoise_Blind, one position in shear space is drawn from around each root and there is one set with this shear. For RealNoise_Blind, 50 positions in shear space are drawn from around each root and there are six sets with each shear, as illustrated in Fig. 2.

#### A3 Image simulations

The galaxy images are created according to the *forward process* using a Monte Carlo simulation technique. The general idea is that the intensity of a pixel in the image of a galaxy is directly proportional to the number of photons falling into that pixel. The photon count at each point depends on the intensity distribution (the light profile) of the galaxy. Therefore, if we draw random samples (photons) from the theoretical light profile function and then count the number of photons falling in each pixel, we obtain the image of galaxy with the required light profile. The circular light profile thus obtained is then reshaped by applying the necessary transformations to the coordinates of the photons. Since the PSF can be considered as a probability distribution, a similar method can be used to simulate it. The light profile of the galaxy is convolved with the PSF and finally pixelized into an fits image.

In general, any Monte Carlo technique can be used for the simulation of the light profile. We use inverse transform sampling for this purpose. It is conceptually simple and generally applicable for sampling from a one-dimensional probability distribution. The basic principle is that given a continuous random variable *U* distributed uniformly in [0, 1] and a random variable *X* with cumulative distribution *F*, then *X*=*F*^{−1} (*U*) has distribution *F*. In other words, to sample from *X*, we generate a random sample *U* and find the value of *X* at which the cumulative distribution is equal to *U*.

In order to simulate the photons distributed by a Sersic Law, we need to find the cumulative distribution of the density given by equation (A2). Taking *r _{e}*= 1 and substituting

*R*=

*kr*

^{1/n}, we obtain the cumulative distribution as

*n*is the Sersic index and Γ(

*a*,

*x*) is the incomplete gamma function. The inverse of the distribution can be approximately calculated by using linear interpolation, given that we have an ordered set of values of {

*R*,

*F*(

*R*)} for the range of

*R*(e.g. from 0 to 20).

The circular light profile of the galaxy obtained by the method above is made larger, elliptical and rotated according to the values of scale radius *r _{e}*, axis ratio

*q*and angle φ, respectively. These operations can be represented in the form of matrices as

*c*≡ cos φ and

*s*≡ sin φ.

Having obtained the light profile of the galaxy, we move on to create a Moffat PSF and convolve it with the galaxy. Using a similar procedure to that described above for the Sersic profile, we can simulate Moffat PSF given by equation (A1). Each sample from the PSF corresponds to the displacement of the photon when convolved with the galaxy. The circular galaxy can be scaled to the required FWHM and made elliptical by applying the transformation

Assuming that the number of samples in the light profile and the PSF are the same, the convolution of the image is accomplished by adding the positions of the galaxy and PSF photons. The image is pixelized by counting the number of photons falling into each pixel of the postage stamp and then it is normalized.The galaxy images in GREAT08 contain two different light profiles. The final image is created by adding together two images with different light profiles. If *I*_{1} and *I*_{2} represent two galaxy images with different light profiles, the final image *I*_{final} is created by the equation

*m*≤ 1 is a multiplication factor. Poisson noise is then added to each pixel according to the SNR.

CCD detectors on ground-based telescopes collect a finite number of photons from both astrophysical objects and atmospheric emission. We therefore mimic this effect by adding the background level *B*= 1 × 10^{6} to each pixel, and drawing a number from a Poisson distribution with a mean equal to the total number (background plus galaxy) in each pixel. For numerical convenience, we then subtract *B* from each pixel. For the RealNoise simulations, this background is much larger than the contribution from the galaxy, so this process is closely approximated by adding a Gaussian random number of standard deviation with zero mean.

Before the noise model is applied, the total flux in the galaxy is set using the SNR given in the catalogue, and the background level discussed above. In summary, we define SNR as the flux divided by the uncertainty in the flux obtained if the true shape (but not normalization) of the object is known.

For the purpose of the SNR calculations, we approximate the Poisson noise as a Gaussian of standard deviation for both LowNoise and RealNoise simulations. We follow the definition

where the flux*F*is the sum of the galaxy counts in each pixel

*I*and σ

_{i}_{F}is the uncertainty in the flux. In general, the uncertainty in the flux depends on the assumptions used to measure it. We make the assumption that the true galaxy shape (profile of counts in all the pixels) is known precisely up to an overall unknown scaling which is proportional to the flux. By considering a χ

^{2}fit it can then be shown that and therefore the flux can be set such that

We note that the images produced using the above ellipticities and *r _{e}* values give some very elliptical images that extend beyond the 39 × 39 postage stamp. We do not take into account the fact that there are some objects which do not fit the postage stamp. This was a practical limitation of GREAT08 due to the increased data volume required for larger postage stamps; however, real data also suffer from edge effects when galaxies are large, for example, due to detector defects and objects which are difficult to model. We therefore felt that it was reasonable to include such limitations in the GREAT08 Challenge. Future GREAT Challenges will not isolate the galaxies on to separate postage stamps and therefore will not have this limitation.

### APPENDIX B: ADDITIONAL INFORMATION ON METHODS

At the launch of the challenge, the GREAT08 Team had put six results on the leaderboard, accompanied by a code wikihttp://great08challenge.pbworks.com summarizing the codes used and linking to downloadable versions of the code that were used on the GREAT08 simulations. Over the course of the challenge, this wiki was updated by external GREAT08 participants, several of whom also provided their codes. The key elements of this code wiki are captured in Table B1.

### APPENDIX C: DETAILED ANALYSIS OF RESULTS

#### C1 LowNoise_Blind

The overall performance, as measured by *Q*, has contributions from various competing effects. We break these up into a multiplicative bias *m*, an additive bias *c* and an rms dispersion σ, as defined in Section 3. For each of the three simulation branches in LowNoise_Blind, we fit a straight line to a plot of submitted *g*_{1} versus true *g*_{1} values and identify the slope as (*m*_{1}+ 1) and *c*_{1} as the offset. We repeat for *g*_{2} and average the multiplicative biases together to obtain an overall value *m*, similarly for the additive bias *c*. The scatter σ is given by the standard deviation of the residuals. Note that although the 90° rotations in GREAT08 substantially reduce the effect of shape noise, this would be a large additional contribution to the statistical uncertainty from realistic data, as it roughly adds in quadrature with the statistical scatter (at the level of about 0.2 per galaxy).

The finite number of simulations means that these values cannot be determined exactly. Therefore, we also estimate uncertainties on the fitted multiplicative and additive biases from the submitted shear values. The uncertainty on *m* depends on the shear measurement method used and on the simulation properties. We calculate the uncertainty on the estimated *m _{i}* by calculating the likelihood as a function of

*m*and

_{i}*c*and marginalizing over

_{i}*c*. We then calculate an average uncertainty on

_{i}*m*over shear components

*i*. Uncertainty decreased with increasing galaxy size for most methods, and the winning method HHS1 had one of the smaller uncertainties on

*m*, decreasing from 5 × 10

^{−3}at

*R*

_{gp}/

*R*

_{p}= 1.22 to 2.3 × 10

^{−3}at

*R*

_{gp}/

*R*

_{p}= 1.6. This may be compared to the multiplicative bias values

*m*obtained by different groups, and we see that the uncertainty is small compared to at least one of the values obtained by each group and therefore is not the limiting factor in interpreting these results.

The uncertainties on the additive biases *c*_{1} and *c*_{2} also decrease with increasing galaxy size, as expected. At a given galaxy size, they range over almost an order of magnitude for the different methods. A typically low uncertainty was obtained by HHS1 across the range of galaxy sizes, and it varies from 10^{−4} at *R*_{gp}/*R*_{p}= 1.22 to 3 × 10^{−5} at *R*_{gp}/*R*_{p}= 1.6. Again, this is much smaller than the additive shear biases seen by all groups for at least one galaxy size and is therefore not the limiting factor in obtaining small biases.

Fig. C1 shows the multiplicative bias *m* and additive bias *c* as a function of *R*_{gp}/*R*_{p} for LowNoise_Blind. We now see that HHS1 performs less well at large galaxy sizes due to an increased multiplicative bias, indicating that the shears are overestimated for these galaxy sizes. In the *Q* plot (Fig. 4), the second highest method, AL (blue solid line), does best at the fiducial size and worse at larger and smaller sizes. On the more detailed figures of multiplicative and additive biases we see that the picture seems yet more curious, with good *m* and *c* values (close to zero) at small galaxy sizes, and becoming worse at large sizes. A more detailed analysis shows that the slight improvement at the fiducial galaxy size can be attributed to a partial cancellation between the effects of a negative *m* and a positive *c*. The third best result, PG, is relatively insensitive to the galaxy size; this effect is mirrored in the additive bias, which dominates the overall *Q* result since the multiplicative bias is relatively small.

We see that the CH method acquired a very large positive *m* at small *R*_{gp}/*R*_{p}, indicating a consistent ∼12 per cent overestimation of the true shear when the galaxies are poorly resolved. TK has best performance all round on the fiducial model, and this may be expected because LensFit was optimized to work well on typical galaxies used for cosmic shear, which therefore tends to coincide with the fiducial model used for GREAT08. The fact that MJ, SB and USQM consistently underestimate the shear is the dominant contribution to their poor performance. MV and KK both underestimate the shear at small *R*_{gp}/*R*_{p}, but overestimate the shear at moderate and large *R*_{gp}/*R*_{p}. Note that the MV method is an extension of the KK method, and the two performed very similarly in all the LowNoise_Blind plots. Several methods (KK, MV, USQM, CH) had the largest additive biases *c* for poorly resolved galaxies, which may suggest that the information about the true PSF model was not fully incorporated into their analyses.

As discussed in Section 3, a successful method needs to produce reasonably low-noise shear measurements, which we quantify by the scatter σ, shown in Fig. C1. The scatter decreases as the galaxy size is increased, which is expected as information on the galaxy can be obtained from more image pixels. There is about an order of magnitude difference between the methods, with HHS1 having a consistently low scatter around 10^{−4}. Since there are 10 000 galaxies in each fits file, this corresponds to an uncertainty on the shear of each individual galaxy of 0.01, which is typical for an SNR of 200. For LowNoise_Known, there is only a single fits file for each simulation branch, which means that there is no sum over files *j* in equation (1) (i.e. *j*= 1). So, in the absence of other biases (*m*=*c*= 0) we would have *Q _{l}*∼ 10

^{−4}/σ

^{2}

_{k}, where σ

_{k}is the scatter for a single simulation branch. Therefore, σ

_{k}< 3 × 10

^{−4}is required to reach the target of

*Q*∼ 1000 for a given simulation branch. Some methods have σ

_{l}_{k}∼ 10

^{−3}at the smallest galaxy sizes, which will limit their overall

*Q*to around 100.

#### C2 RealNoise_Blind

Fig. C2 shows the output shear residuals versus the input shear for the top two methods, for the fiducial simulation branch. This figure illustrates how the multiplicative and additive errors are calculated. The total *Q* for a given simulation branch is roughly a combination of the slopes and offsets of each best-fitting line, and the scatter about the lines. The equivalent point for LowNoise_Blind has only five points on it, and the circles are identical to the crosses.

We show the multiplicative and additive biases in Figs C3 and C4. The decreased SNR in RealNoise_Blind is compensated for by averaging over many shear values to reduce the noise and ensure that the quality measures *Q*, *m* and *c* can be dominated by systematic biases.

We first consider overall trends in multiplicative and additive biases (Figs C3 and C4). The ‘psftype’ panels indicate that changes in the PSF had virtually no effect on *m* but quite a large effect on *c*. Incorrect estimation of the PSF size tends to cause a multiplicative bias, so given that the PSFs all had roughly the same size and varied only in ellipticity, this result is not surprising. There is a general tendency for *c* to be best for the fiducial PSF, positive for the ‘PSF rot’ and negative for ‘PSF *e*× 2’. This tendency indicates that the participants made the most efforts to model the fiducial PSF, which is used for almost all of the simulations. It would be interesting to compare the observed trend with the result of wrongly assuming the fiducial PSF for the two other PSF branches in case this explains the result.

The scatter of the submitted shears about the best-fitting line can be seen qualitatively by the range of the circles in Fig. C2, and quantitatively for each simulation branch in Fig. C3. Typical values around 10^{−3} are averaged down in the *Q* calculation in the average over *j*= 1, …, 300 simulations in a given simulation branch which have similar shear values. Therefore, *Q _{l}*∼ 300 × 10

^{−4}/σ

^{2}

_{k}and σ

_{k}should be less than about 5 × 10

^{−3}for all simulation branches to prevent a method with

*m*=

*c*= 0 from reaching

*Q*∼ 1000. This condition is met by most methods even at the lowest SNR value.

The uncertainties on the multiplicative bias are close to constant with respect to galaxy and PSF type and decrease with increasing SNR and galaxy size. With the exception of USQM, there is little scatter between the groups for a given simulation branch (tens of per cent difference), and the smallest uncertainties are obtained by AL and TK. For these methods, since the uncertainty on *m* is always less than 10^{−2}, we infer that the finite number of simulations is not the dominant reason that every submission departs from zero multiplicative bias for at least one simulation branch. The uncertainties on the additive bias are always less than 2 × 10^{−4} for the best methods and therefore also do not dominate the biggest departures from perfection.

For the method HB, the multiplicative calibration bias (upper panels, Fig. C3) is very close to constant with simulation branch. The shears are consistently overestimated by about 2 per cent. This bias is above our detailed simplistic requirements for far future experiments, but note that if a method really did have a multiplicative bias that was completely constant with the properties of the simulation or universe, then it would be trivially removed by dividing all shears by the relevant number. The additive calibration bias for this method is always below our detailed requirement of 0.0003 for far future experiments, except for the ‘Fid rotated’ PSF branch and the low-SNR branch. It would be intriguing to know if this could be fixed further by more detailed modelling of the PSF.

The poor performance of AL on the low-SNR branch appears to come mostly from a multiplicative bias of nearly 10 per cent (Fig. 5). The results on the most elliptical PSF (‘Fid *e*× 2’) are also relatively disappointing and come from the large additive calibration bias (Fig. 5). This result is consistent with a problem with modelling this particular PSF, in which residual PSF ellipticity remains to add to the true shear.

The good results of MV at high SNR and large galaxy size are largely due to the reduction in multiplicative bias in these regimes. This result could possibly hint at inaccurate modelling of the PSF size.

The poorer performance for smaller galaxy sizes for HHS now seems to come from both an increased multiplicative and additive error. The multiplicative bias increases slightly as a function of galaxy size in LowNoise_Blind but decreases as a function of galaxy size in RealNoise_Blind. Perhaps there is some kind of cancellation between the increasingly negative multiplicative bias as a function of SNR and the large positive multiplicative bias seen in RealNoise_Blind at smaller galaxy sizes. HHS2 and HHS3 used stacking to decrease dependence on the assumed galaxy model. The additive calibration bias is still significant and reduces the overall *Q* value. The sharp changes in additive calibration bias with PSF type suggest that the PSF is not being sufficiently well modelled.

In STEP2, there was found to be a systematic difference between *m*_{1} and *m*_{2} that was attributed to the different effective pixel scales in the two directions. We have made separate figures for *m*_{1} and *m*_{2} but find them to be visually similar for most methods except CH. Galaxy type variations, in general, had little effect on *m* and *c* overall, a surprising result also found in STEP2.