Abstract

A new fast Bayesian approach is introduced for the detection of discrete objects immersed in a diffuse background. This new method, called PowellSnakes, speeds up traditional Bayesian techniques by (i) replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate; (ii) using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior and (iii) deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question. This new approach provides a speed-up in performance by a factor of the order of 100 as compared to existing Bayesian source extraction methods that use Monte Carlo Markov chain sampling to explore the parameter space, such as that presented by Hobson & McLachlan. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed-up that takes advantage of the fact that the correlation matrix of the background is circulant. We illustrate the capabilities of the method by applying it to some simplified toy models. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors, which cannot be said of the frequentist methods. We present here the first implementation of this technique (version I). Further improvements to this implementation are currently under investigation and will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.

1 INTRODUCTION

The detection and characterization of discrete objects is a generic problem in many areas of astrophysics and cosmology. Indeed, one of the major challenges in such analyses is to separate a localized signal from a diffuse background. It is most common to face this problem in the analysis of two-dimensional images, where one wishes to detect discrete objects in a diffuse background.

When performing this task it is often assumed that the background is smoothly varying and has a characteristic length-scale much larger than the scale of the discrete objects being sought. For example, SExtractor (Bertin & Arnouts 1996) approximates the background emission by a low-order polynomial, which is subtracted from the image. Object detection is then performed by finding sets of connected pixels above some given threshold. Such methods run into problems, however, when the diffuse background varies on length-scales and with amplitudes similar to those of the discrete objects of interest. Moreover, difficulties also arise when the rms level of instrumental noise is comparable to, or somewhat larger than, the amplitude of the localized signal one is seeking.

A specific example illustrating the above difficulties is provided by high-resolution observations of the cosmic microwave background (CMB). In addition to the CMB emission, which varies on a characteristic scale of order ∼10 arcmin, one is often interested in detecting emission from discrete objects such as extragalactic ‘point’ (i.e. beam-shaped) sources or the Sunyaev–Zel'dovich (SZ) effect in galaxy clusters, which have characteristic scales similar to that of the primordial CMB emission. Moreover, the rms of the instrumental noise in CMB observations can be greater than the amplitude of the discrete sources. In such cases, it is not surprising that straightforward methods, such as that outlined above, often fail to detect the localized objects.

The standard approach for dealing with such difficulties is to apply a linear filter ψ(x) to the original image d(x) and instead analyse the resulting filtered field  

1
d_{\rm f}(\mathbf{x}) = \int \psi(\mathbf{x} - \mathbf{y})\, d(\mathbf{y})\, \mathrm{d}^2 y .
Suppose one is interested in detecting objects with some given spatial template t(x) (normalized for convenience to unit peak amplitude). If the original image contains Nobj objects at positions Xi with amplitudes Ai, together with contributions from other astrophysical components and instrumental noise, then  
2
d(\mathbf{x}) = s(\mathbf{x}) + n(\mathbf{x}) = \sum_{i=1}^{N_{\rm obj}} A_i\, t(\mathbf{x} - \mathbf{X}_i) + n(\mathbf{x}) ,
where s(x) is the signal of interest and n(x) is the generalized background ‘noise’, defined as all contributions to the image aside from the discrete objects. It is straightforward to design an optimal filter function ψ(x) such that the filtered field (1) has the following properties: (i) df(Xk) is an unbiased estimator of Ak; (ii) the variance of the filtered noise field nf(x) is minimized. The corresponding function ψ(x) is the standard matched filter (see, for example, Haehnelt & Tegmark 1996). One may consider the filtering process as ‘optimally boosting’ (in a linear sense) the signal from discrete objects, with a given spatial template, and simultaneously suppressing emission from the background.

An additional subtlety in most practical applications is that the set of objects are not all identical, but one can repeat the filtering process with filter functions optimized for different spatial templates to obtain several filtered fields, each of which will optimally boost objects with that template. In the case of the SZ effect, for example, one might assume that the functional form of the template is the same for all clusters, but that the ‘core radius’ differs from one cluster to another. The functional form of the filter function is then the same in each case, and one can repeat the filtering for a number of different scales (Herranz et al. 2002a). Moreover, one can trivially extend the linear filtering approach to the simultaneous analysis of a set of images. Once again, the SZ effect provides a good example. Owing to the distinctive frequency dependence of the thermal SZ effect, it is better to use the maps at all the observed frequencies simultaneously when attempting to detect and characterize thermal SZ clusters hidden in the emission from other astrophysical components (Herranz et al. 2002b).

The approaches outlined above have been shown to produce good results, but the filtering process is only optimal among the rather limited class of linear filters and is logically separated from the subsequent object detection step performed on the filtered map(s). As a result, Hobson & McLachlan (2003, hereafter HM03) introduced a Bayesian approach to the detection and characterization of discrete objects in a diffuse background. As in the filtering techniques, the method assumed a parametrized form for the objects of interest, but the optimal values of these parameters, and their associated errors, were obtained in a single step by evaluating their full posterior distribution. If available, one could also place physical priors on the parameters defining an object and on the number of objects present. Although this approach represents the theoretically optimal method for performing parametrized object detection, its implementation was performed using Monte Carlo Markov chain (MCMC) sampling from the posterior which was extremely computationally intensive. Although considerable progress has recently been made in increasing the efficiency of sampling-based Bayesian object detection methods (Feroz & Hobson 2008), such approaches are still substantially slower than simple linear filtering methods. Therefore, in this paper, we explore a new, fast method for performing Bayesian object detection in which sampling is replaced by multiple local maximization of the posterior, and the evaluation of errors and Bayesian evidence values is performed by making a Gaussian approximation to the posterior at each peak. This approach yields a speed-up over sampling-based methods of many 100s, making the computational complexity of the approach comparable to that of linear filtering methods.

The outline of the paper is as follows. In Section 2 we provide a brief outline of the essential Bayesian logical framework which supports the employed methodology. The entities defined there are then characterized in terms of the source detection problem in Section 3. A summary of the strategy to fulfil the desired goals is then suggested in Section 4. In Section 5 we develop the first step of the proposed algorithm, the maximization of the posterior distribution, and we give an account of the error bars on the estimated parameters of the putative source under study. A two-step approach split between Fourier and real space is indicated. The second phase of the detection algorithm, the Bayesian validation procedure applied to each previously found likelihood maximum, is explained in Section 6. A prior is suggested and an upper bound on the quality of the detection is advanced. We end that section by considering a method for correcting possible systematic deviations in the evidence evaluation. In Section 7 we give a detailed study of the effects that arise when correlation is added to the diffuse background that produces the substructure in which the sources are embedded. The results from a series of simulations covering several typical scenarios are presented in Section 8, and we close by indicating our conclusions and possible directions for further work in Section 9.

2 BAYESIAN INFERENCE

Bayesian inference methods provide a consistent approach to the estimation of a set of parameters Θ in a model (or hypothesis) H for the data d. Bayes' theorem states that

3
\Pr(\mathbf{\Theta}\,|\,\mathbf{d}, H) = \frac{\Pr(\mathbf{d}\,|\,\mathbf{\Theta}, H)\,\Pr(\mathbf{\Theta}\,|\,H)}{\Pr(\mathbf{d}\,|\,H)} ,
where Pr(Θ|d, H) is the posterior probability distribution of the parameters, Pr(d|Θ, H) is the likelihood, Pr(Θ|H) is the prior and Pr(d|H) is the Bayesian evidence.

In parameter estimation, the normalizing evidence factor is usually ignored, since it is independent of the parameters Θ. This (unnormalized) posterior constitutes the complete Bayesian inference of the parameter values. Inferences are usually obtained either by taking samples from the (unnormalized) posterior using MCMC methods, or by locating its maximum (or maxima) and approximating the shape around the peak(s) by a multivariate Gaussian.

In contrast to parameter estimation problems, in model selection the evidence takes the central role and is simply the factor required to normalize the posterior over Θ:  

4
Z \equiv \Pr(\mathbf{d}\,|\,H) = \int \Pr(\mathbf{d}\,|\,\mathbf{\Theta}, H)\,\Pr(\mathbf{\Theta}\,|\,H)\, \mathrm{d}^D\mathbf{\Theta} ,
where D is the dimensionality of the parameter space. As the average of the likelihood over the prior, the evidence is larger for a model if more of its parameter space is likely and smaller for a model with large areas in its parameter space having low likelihood values, even if the likelihood function is very highly peaked. Thus, the evidence automatically implements Occam's razor: a simpler theory with compact parameter space will have a larger evidence than a more complicated one, unless the latter is significantly better at explaining the data. The question of model selection between two models H0 and H1 can then be decided by comparing their respective posterior probabilities given the observed data set d, as follows:  
5
\frac{\Pr(H_1\,|\,\mathbf{d})}{\Pr(H_0\,|\,\mathbf{d})} = \frac{\Pr(\mathbf{d}\,|\,H_1)}{\Pr(\mathbf{d}\,|\,H_0)}\;\frac{\Pr(H_1)}{\Pr(H_0)} ,
where Pr(H1)/Pr(H0) is the a priori probability ratio for the two models, which can often be set to unity but occasionally requires further consideration. Unfortunately, evaluation of the multidimensional integral (4) is a challenging numerical task. None the less, a fast approximate method for evidence evaluation is to model the posterior as a multivariate Gaussian centred at its peak(s) and apply the Laplace formula (see e.g. Hobson, Bridle & Lahav 2002).
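
To make the Laplace formula concrete, the following sketch evaluates the approximate log-evidence from a posterior peak; the function name is ours and the snippet is only an illustration of the approximation, not part of the PowellSnakes code.

```python
import numpy as np

def laplace_log_evidence(log_post_peak, neg_hessian):
    """Laplace (Gaussian) approximation to the log-evidence:
    ln Z ~= ln[L*pi]_peak + (D/2) ln(2*pi) - (1/2) ln det(H),
    where H is minus the Hessian of ln[L*pi] at the peak."""
    H = np.atleast_2d(neg_hessian)
    D = H.shape[0]
    sign, logdet = np.linalg.slogdet(H)
    if sign <= 0:
        raise ValueError("H must be positive definite at a posterior maximum")
    return log_post_peak + 0.5 * D * np.log(2.0 * np.pi) - 0.5 * logdet

# toy check: 1-D Gaussian likelihood of width s under a flat prior of width 10;
# the exact evidence is (1/10)*sqrt(2*pi)*s, which the approximation reproduces.
s = 2.0
lnZ = laplace_log_evidence(np.log(0.1), np.array([[1.0 / s**2]]))
```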

3 BAYESIAN OBJECT DETECTION

A Bayesian approach to detecting and characterizing discrete objects hidden in some background noise was first presented in an astronomical context by HM03, and our general framework follows this closely. For brevity, we will consider our data vector d to denote the pixel values in a single patch in which we wish to search for discrete objects, although d could equally well represent the Fourier coefficients of the image, or coefficients in some other basis.

3.1 Discrete objects in a background

Let us suppose that we are interested in detecting and characterizing some set of (two-dimensional) discrete objects, each of which is described by a template τ(x; a). This template is defined in terms of a set of parameters a that might typically denote (collectively) the position (X, Y) of the object, its amplitude A and some measure R of its spatial extent. A particular example is the circularly symmetric Gaussian-shaped object defined by  

6
\tau(\mathbf{x}; \mathbf{a}) = A \exp\!\left[-\frac{(x - X)^2 + (y - Y)^2}{2R^2}\right] ,
so that a={X, Y, A, R}. In what follows, we will consider only circularly symmetric objects, but our general approach accommodates templates that are, for example, elongated in one direction, at the expense of increasing the dimensionality of the parameter space.1 In the analysis of single-frequency maps, most authors take the template τ(x; a) to be the (pixelized) intrinsic shape of the object convolved with the beam profile. For multifrequency data, however, it is more natural to treat the intrinsic object shape and the beam profiles separately.2
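
As a concrete illustration, here is a minimal numpy sketch of the circularly symmetric Gaussian template of equation (6), evaluated on a pixel grid (the function name and grid conventions are our own):

```python
import numpy as np

def gaussian_template(nx, ny, X, Y, A, R):
    """Circularly symmetric Gaussian object of eq. (6):
    tau(x; a) = A * exp(-[(x - X)^2 + (y - Y)^2] / (2 R^2)),
    sampled on an (ny, nx) pixel grid."""
    xx, yy = np.meshgrid(np.arange(nx), np.arange(ny))
    r2 = (xx - X) ** 2 + (yy - Y) ** 2
    return A * np.exp(-0.5 * r2 / R ** 2)

# example: a unit-amplitude object of size R = 4 pixels at the patch centre
tau = gaussian_template(128, 128, X=64.0, Y=64.0, A=1.0, R=4.0)
```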

If Nobj objects are present in the map and the contribution of each object to the data is additive, we may write  

7
\mathbf{d} = \sum_{k=1}^{N_{\rm obj}} \mathbf{s}(\mathbf{a}_k) + \mathbf{n} ,
where s(ak) denotes the contribution to the data from the kth discrete object and n denotes the generalized ‘noise’ contribution to the data from other ‘background’ emission and instrumental noise. Clearly, we wish to use the data d to place constraints on the values of the unknown parameters Nobj and ak(k= 1, …, Nobj).

3.2 Defining the posterior distribution

As discussed in HM03, in analysing the data the Bayesian purist would attempt to infer simultaneously the full set of parameters Θ = {Nobj, a1, a2, …, aNobj}. The crucial complication inherent to this approach is that the length of the parameter vector Θ is variable, since it depends on the unknown value Nobj. Some sampling-based approaches are able to move between spaces of different dimensionality, and such techniques were investigated in HM03.

An alternative approach, also discussed by HM03, is simply to set Nobj= 1. In other words, the model for the data consists of just a single object and so the full parameter space under consideration is a={X, Y, A, R}, which is fixed and only four-dimensional. Although we have fixed Nobj= 1, it is important to understand that this does not restrict us to detecting just a single object in the map. Indeed, by modelling the data in this way, we would expect the posterior distribution to possess numerous local maxima in the four-dimensional parameter space, each corresponding to the location in this space of one of the objects present in the image. HM03 show this vastly simplified approach is indeed reliable when the objects of interest are spatially well separated, and we adopt this method here.

3.3 Likelihood

In this case, if the background ‘noise’ n is a statistically homogeneous Gaussian random field with covariance matrix N=〈n nT〉, then the likelihood function takes the form

8
L(\mathbf{a}) \equiv \Pr(\mathbf{d}\,|\,\mathbf{a}) = \frac{\exp\!\left\{-\tfrac{1}{2}\left[\mathbf{d}-\mathbf{s}(\mathbf{a})\right]^{\rm T} \mathbf{N}^{-1} \left[\mathbf{d}-\mathbf{s}(\mathbf{a})\right]\right\}}{(2\pi)^{N_{\rm pix}/2}\,|\mathbf{N}|^{1/2}} .
Moreover, if the background is just independent pixel noise, then N=σ2I, where σ is the noise rms.

In the general case, the log-likelihood thus takes the form  

9
\ln L(\mathbf{a}) = c - \tfrac{1}{2}\left[\mathbf{d}-\mathbf{s}(\mathbf{a})\right]^{\rm T} \mathbf{N}^{-1} \left[\mathbf{d}-\mathbf{s}(\mathbf{a})\right] ,
where c is an unimportant constant. The first innovation in our new method is to recast the log-likelihood in such a way that the computational cost of evaluating it is considerably reduced. This is achieved by instead writing the log-likelihood as  
10
\ln L(\mathbf{a}) = c' + \mathbf{d}^{\rm T}\mathbf{N}^{-1}\mathbf{s}(\mathbf{a}) - \tfrac{1}{2}\,\mathbf{s}(\mathbf{a})^{\rm T}\mathbf{N}^{-1}\mathbf{s}(\mathbf{a}) ,
where c′=c− (1/2)dTN−1d is again independent of the parameters a. The advantage of this formulation is that the part of the log-likelihood dependent on the parameters a consists only of products involving the data and the signal. Since the signal from a discrete source with parameters a is only (significantly) non-zero in a limited region centred on its putative position, one need only calculate the quadratic forms in (10) over a very limited number of pixels, whereas the quadratic form in the standard version (9) must be calculated over all the pixels in the map.
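
The following sketch shows how the parameter-dependent part of (10) can be accumulated over only the pixels where the template is non-negligible; the helper name and calling convention are our own, assuming N−1d and the relevant block of N−1 have been precomputed for the patch.

```python
import numpy as np

def delta_lnL(Ninv_d, Ninv, s_cut, idx):
    """Parameter-dependent part of the log-likelihood (10),
    d^T N^{-1} s(a) - 0.5 * s(a)^T N^{-1} s(a),
    evaluated only over the pixel indices `idx` where the source template is
    (significantly) non-zero.  `Ninv_d = N^{-1} d` is computed once per patch
    and `s_cut` holds the template values at the pixels `idx`."""
    block = Ninv[np.ix_(idx, idx)]          # only the needed block of N^{-1}
    return float(Ninv_d[idx] @ s_cut - 0.5 * s_cut @ (block @ s_cut))

# For independent pixel noise (N = sigma**2 * I) this reduces to
# (d[idx] @ s_cut - 0.5 * s_cut @ s_cut) / sigma**2.
```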

In the case of independent pixel noise, the covariance matrix N in (10) is diagonal, and so the quadratic forms can be calculated very rapidly. For correlated noise, however, N is no longer diagonal and so evaluating the necessary elements of its inverse can be costly for a large map. None the less, if the background is a statistically homogeneous random field N−1 can be computed in Fourier space, where it is diagonal. As a result, we perform our analysis on patches of sky that are sufficiently small for the assumption of statistical homogeneity to be reasonable. An alternative procedure would be to transform to a set of basis functions, such as wavelets, in which the data are readily compressed, hence reducing the effective dimensionality of the problem; we will not, however, pursue this route.
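
For a statistically homogeneous background on a (periodic) patch, the quadratic forms can be evaluated in Fourier space, where N is diagonalized by the DFT; a sketch under these assumptions, with the Fourier-space eigenvalues of N supplied by the caller:

```python
import numpy as np

def quad_form_fourier(a_map, b_map, noise_eigvals):
    """Evaluate a^T N^{-1} b for a statistically homogeneous (circulant) noise
    covariance N, using the fact that the 2-D DFT diagonalizes N.

    noise_eigvals : Fourier-space eigenvalues of N, i.e. the 2-D FFT of the
                    first row of N (real and positive), equivalently
                    <|n_k|^2> / Npix for the stationary noise field n."""
    npix = a_map.size
    ak = np.fft.fft2(a_map)
    bk = np.fft.fft2(b_map)
    return np.real(np.sum(np.conj(ak) * bk / noise_eigvals)) / npix

# sanity check: for white noise, noise_eigvals = sigma**2 everywhere and, by
# Parseval's theorem, the result reduces to sum(a_map * b_map) / sigma**2.
```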

3.4 Prior

Having determined the likelihood function, it remains only to assign a prior on the parameters a= (X, Y, A, R). Although not a formal requirement of our method, the simplest choice is to assume the prior is separable, so that  

11
\pi(\mathbf{a}) = \pi(X)\,\pi(Y)\,\pi(A)\,\pi(R) .
Moreover, we will assume that the priors on X and Y are uniform within the range of the image, and that the priors on A and R are uniform distributions within some ranges [Amin, Amax] and [Rmin, Rmax], respectively. This is true for our first implementation of PowellSnakes (version I). Our general approach can, in fact, accommodate more general priors. Indeed, the assumption of uniform priors on A and R is not a good one for most astrophysical problems. Typically one expects the number of discrete sources to decrease with amplitude, and their angular sizes are unlikely to be uniformly distributed. The best prior to adopt for A is one derived from existing source counts. These are typically power laws over a wide range (although care must be taken at the ends of the range to obtain a properly normalized prior). In our approach presented below, however, we will assume the log-posterior is at most quadratic in the source amplitude A, which restricts π(A) to be of uniform, exponential or Gaussian form. (The second implementation (version II) of PowellSnakes currently under development adopts a power law.) Turning to π(R), existing knowledge of the angular sizes of the objects sought can be used to construct an appropriate prior, and no restriction is placed by our method on this. Of course, in the case of point sources, the overall template is simply the beam profile, the angular size of which is (usually) known in advance.3
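
A minimal sketch of the separable uniform prior of equation (11) as used in version I; the function name and argument layout are our own.

```python
import numpy as np

def log_prior(X, Y, A, R, x_range, y_range, A_range, R_range):
    """Separable prior of eq. (11) with uniform factors: returns
    ln pi(X) + ln pi(Y) + ln pi(A) + ln pi(R), or -inf outside the ranges."""
    total = 0.0
    for value, (lo, hi) in zip((X, Y, A, R),
                               (x_range, y_range, A_range, R_range)):
        if not (lo <= value <= hi):
            return -np.inf
        total -= np.log(hi - lo)          # normalized uniform density 1/(hi - lo)
    return total
```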

3.5 Optimal patch size. The patch border

The patches are divided into two different areas: a central area where the detection takes place and a surrounding border. The border is used to prevent the sources from being truncated by the edges of the patch, since truncation usually impacts severely on the process of their detection and characterization. In order to avoid truncation, the border size must be at least half the largest source radial size. By enforcing this rule we ensure that a source profile which has been cut by the patch edge is left undisturbed in at least one of its neighbouring patches. When defining an optimal patch size several somewhat opposing criteria must be taken into account.

  • The assumption of statistical homogeneity which always favours small patches.

  • Execution speed which tends to exhibit a bias towards somewhat larger patches.

    • The greater the number of patches the greater the number of border overlaps which occur. This increases the number of pixels processed more than once.

    • There is a non-negligible amount of time preparing a new patch for detection.

    • The pre-filtering stage which involves a fast Fourier transform (FFT) requires a power of 2 array size in order to be efficient.

Our work shows that the optimal patch size is always the one which minimizes the effects of the inhomogeneities of the background. However, (128 × 128) pixel patches on NSide = 1024 Healpix maps (∼7.33°× 7.33°) and (256 × 256) pixel patches on NSide = 2048 Healpix maps (∼7.33°× 7.33°) are good choices, striking a balance between statistical homogeneity and execution speed.

4 OBJECT DETECTION STRATEGY

Once the likelihood and prior have been defined, the problem of object identification and characterization reduces simply to exploring the resulting posterior distribution:  

12
\ln \Pr(\mathbf{a}\,|\,\mathbf{d}) = \ln L(\mathbf{a}) + \ln \pi(\mathbf{a}) ,
where we have omitted an unimportant additive constant term on the right-hand side. Rather than using MCMC methods, as advocated by HM03 and Feroz & Hobson (2008), here instead we locate the local maxima of the posterior in the four-dimensional parameter space a={X, Y, A, R} and perform a Gaussian approximation about each peak. The former provides the estimates of the object parameters associated with that posterior peak and, as outlined below, the latter allows one to assign uncertainties to the derived parameter values. The Gaussian approximation also allows one to estimate the Bayesian evidence associated with each posterior peak. The Bayesian evidence is used to decide whether the detected peak corresponds to a real object or is just a ‘conspiracy’ of the background noise field (see Section 6).

Such an approach was also advocated by HM03, in which a simulated annealing downhill simplex minimizer was used in an iterative object detection scheme. At each iteration, the minimizer located the global maximum of the posterior and an object with the optimal parameter values was subtracted from the map before commencing the next iteration. The process was repeated until the first rejected detection.

Here we adopt a different strategy that obviates the need to attempt to locate the global maximum at each iteration, since this is very computationally expensive. Instead, our approach consists of launching a series of simple downhill minimizations. The choice of starting point for each minimization, the minimizer employed and the number of minimizations performed are discussed in detail in Section 5. The end-point of each minimization launched will be a local maximum in the posterior, giving the optimal parameter values for some putative detected object. A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based on an evidence criterion (see Section 6). If the detection is accepted, then an object with the optimal parameter values is subtracted from the map before the next downhill minimization is launched. Although our method does leave open the possibility of repeatedly detecting the same spurious sources, the speed of our algorithm assures that this is not a problem (see Section 8).
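
Schematically, the accept-and-subtract loop just described can be summarized as follows; this is only a sketch of the control flow, with the maximization, evidence test and template construction of Sections 5 and 6 passed in as placeholder callables.

```python
def detect_objects(dmap, start_points, maximise_posterior, accept_by_evidence,
                   make_template):
    """Sketch of the detection loop: one downhill minimization per start point,
    an evidence-based accept/reject decision at each located peak, and
    subtraction of accepted objects before the next minimization."""
    detections = []
    work = dmap.copy()
    for p0 in start_points:
        params, neg_hessian = maximise_posterior(work, p0)   # local posterior peak
        if accept_by_evidence(params, neg_hessian):          # Section 6 criterion
            detections.append(params)
            work = work - make_template(work.shape, params)  # remove accepted object
    return detections
```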

It should be noted, however, that the only reason for subtracting accepted objects from the map is to avoid multiple real detections. Moreover, when attempting to detect sources at very low signal-to-noise ratios, one might occasionally accept a spurious detection as a real source, and its subtraction from the map may damage subsequent downhill minimizations. Ideally, therefore, one should avoid subtracting sources altogether. One such approach is simply to accept that real objects may be detected repeatedly, and check for duplications in the optimal sets of parameters obtained in each accepted detection (e.g. parameter sets that are similar to well within the derived uncertainties). We have not implemented such a method for this paper, but this is under investigation.

As discussed in Section 5 determining the number of the downhill minimizations to perform is a key part of our method. However, once we perform these minimizations, we ensure that no real object has been missed by performing a further set of downhill minimizations as follows. Since the spatial extent of any object in the data map must be greater than that of the beam, we divide the map into patches of area equal to the beam size. For each such patch, we then launch a downhill minimizer with initial values of X and Y equal to the coordinates of the centre of the patch, and with A and R values equal to the mid-points of their respective prior ranges. This is the case for our first implementation of PowellSnakes (version I, for white noise only). This approach is unlikely to miss any remaining peaks in the posterior associated with real objects, provided the number density of sources is sufficiently low that they are spatially well-separated. Clearly, such an approach will perform less well if two or more sources (partially) overlap, in which case a single source of large spatial extent may be fitted to them.

In the earlier stages of the PowellSnakes implementation we performed the analysis completely in real space, rather than pre-filtering the map. However, this procedure slowed down the execution of the algorithm. The evaluation of the objective function for each set of parameters is a very expensive operation which unfortunately lies deep in the inner loop of the code, ending up being called several hundred times for each ‘snake’. Even a small optimization of its execution has a gigantic impact on the overall performance of the code. We initialize the Powell minimizer (i.e. the tail of each of the ‘snakes’) with the central points of the grid defined in Section 5.2 and the mid-points of the [A, R] parameters. The first step of the minimization process is always to search the position subspace keeping R and A constant. As we have to repeat this step each time we start a new ‘snake’, finding a fast way of evaluating the likelihood over the whole available positional subspace is very advantageous. This is where the filtering process steps in. Filtering with the matched filter automatically provides us with a fast evaluation of the likelihood restricted to the positional subspace. Furthermore, it gives a very good estimate of the starting value for the A parameter, using the classical formula for the amplitude of a field filtered by a matched filter (16). Next, we need to find the optimal value of the R parameter, which is no longer considered constant.

As a final note, one must take care when a source is located near the edge of the map, since it will be partially truncated. This is not, however, a severe limitation, as one can allow an overlap of at least the largest expected source size when dividing the full map into small patches; a truncated source in one patch will appear intact in a neighbouring one. This strategy does result in a small extra processing overhead, since several pixels will be processed more than once, but this is a minor consideration.

5 MAXIMIZING THE POSTERIOR DISTRIBUTION

In this section, we outline the method employed to locate the multiple local maxima of the posterior distribution.

5.1 Pre-filtering step

Let us begin by writing the template (6) as  

13
\tau(\mathbf{x}; \mathbf{a}) = A\, t(\mathbf{x} - \mathbf{X}; R) ,
where A, X and R are, respectively, the amplitude, vector position and size of the object, and t(x; R) is the spatial shape of a unit height reference object of size R centred on the origin. Then the signal vector in the log-likelihood (10) is simply
14
\mathbf{s}(\mathbf{a}) = A\,\mathbf{t}(\mathbf{X}, R) ,
where the vector t has components ti = t(xi − X; R), in which xi is the position of the ith pixel in the map. Substituting the above expression into the log-likelihood, assuming a constant prior and differentiating with respect to A gives
15
\frac{\partial \ln L}{\partial A} = \mathbf{d}^{\rm T}\mathbf{N}^{-1}\mathbf{t} - A\,\mathbf{t}^{\rm T}\mathbf{N}^{-1}\mathbf{t} ,
which on setting equal to zero yields an analytic estimate for the amplitude:  
16
\hat{A}(\mathbf{X}, R) = \frac{\mathbf{d}^{\rm T}\mathbf{N}^{-1}\mathbf{t}}{\mathbf{t}^{\rm T}\mathbf{N}^{-1}\mathbf{t}} .
Note that, under the assumption of statistical homogeneity, the denominator in the above expression does not depend on the source position X, and thus need only be calculated once (say for a source at the origin) for any given value of R. Generalizing the notation of Savage & Oliver (2007), we denote this denominator by α(R); the numerator in (16) is simply the data field correlated with the noise-weighted template. It is worth noting that the quantity [α(R)]1/2 is merely a generalized signal-to-noise ratio for a unit amplitude object integrated over its spatial template.
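
For independent pixel noise the estimator (16) reduces to Â(X, R) = Σi di ti / Σi ti2, and the numerator can be evaluated for every position at once by an FFT-based correlation; a minimal sketch under these white-noise and periodic-patch assumptions (function name ours):

```python
import numpy as np

def amplitude_map_white(dmap, R):
    """Analytic amplitude estimate of eq. (16) at every pixel position for
    independent pixel noise (N = sigma**2 * I), for which
    A_hat(X, R) = sum_i d_i t_i(X) / sum_i t_i(X)**2,
    the numerator being a correlation of the map with the unit-height template."""
    ny, nx = dmap.shape
    yy, xx = np.indices((ny, nx))
    # unit-height template centred on the patch, then wrapped so its peak
    # sits at the origin for the FFT-based correlation
    t = np.exp(-0.5 * ((xx - nx // 2) ** 2 + (yy - ny // 2) ** 2) / R ** 2)
    t0 = np.fft.ifftshift(t)
    numerator = np.real(np.fft.ifft2(np.fft.fft2(dmap) * np.conj(np.fft.fft2(t0))))
    alpha = np.sum(t ** 2)            # denominator: independent of position
    return numerator / alpha
```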

More importantly the estimator (16) is precisely the filtered field produced by a classical matched filter, in which one assumes a value for R. Moreover, it is straightforward to show that the corresponding log-likelihood (10) can be written  

17
\ln L(\mathbf{X}, R)\big|_{A=\hat{A}} = c' + \tfrac{1}{2}\,\hat{A}^2(\mathbf{X}, R)\,\alpha(R) .
Thus, we show here that, for a given value of R, peaks in the filtered field correspond precisely to peaks in the log-likelihood considered as a function of X. If the objects sought all have the same size R and this is known in advance, then the local maxima of the now three-dimensional posterior in the space (X, Y, A) are all identified by this method. This scenario corresponds to point sources observed by an experiment with a known beam profile. However, the assumption that the sizes of the objects are known in advance might lead us astray. To show that this is indeed the case, consider the following.

  • Suppose two sources are so close that they cannot be resolved by the optical device: the intensity profile usually still matches fairly well that of the beam profile, but with a much enlarged radial scale. Taking the radius of the beam profile as an unknown parameter, PowellSnakes will still identify them wrongly as a single source, but the total estimated flux will more closely match the true value of the total flux coming from both sources.

  • We are assuming a statistically homogeneous background: this is never strictly true, but if the patches are kept small enough it is usually a good assumption. Nevertheless, small changes in the beam profile across the patch, which may safely be ignored in terms of the noise components (CMB, Galactic foregrounds etc.), might lead to severely biased values when estimating the source fluxes.

  • We are assuming point sources: this limits the range of application of this method. One of the driving motivations behind PowellSnakes is the detection of SZ clusters. This implies the capability of multifrequency operation and a source profile with a well-defined shape. This source shape template is only useful in real situations if we are to allow different spatial scales. In order to achieve this extra degree of freedom we introduced the Radius [R] parameter.

When the source size R is not known in advance, or the sources have different sizes, most matched filter analyses (see e.g. Herranz et al. 2005) filter the field at several predetermined values of R and combine the results in some way to obtain estimates of the parameters (X, Y, A, R) for each identified source. This approach is rather ad hoc, however, and so we will pursue a different strategy based on locating the local maxima of the posterior using multiple downhill minimizations. At first sight it might seem tempting to use the result (16) to reduce the parameter space to the three dimensions (X, Y, R), since with knowledge of these parameters one can immediately calculate Â. Such an approach can, however, lead to numerical instabilities when the denominator α(R) in (16) is very small (which occurs in the low signal-to-noise ratio regime), as our numerical tests confirmed. Thus we do not pursue this approach.

We do, however, make use of the result (16) by first pre-filtering the map assuming a value for R equal to the mean of its prior range. The advantage of this pre-filtering stage is that, even in the presence of objects of different sizes, the filtered field often has a pronounced local maximum near many real objects, provided the R parameter range is not too wide and the value of R employed in the filtering process stays close to the true source sizes (Fig. 1).

Figure 1

Log-likelihood manifold, equation (10) using template (6) (A and R dimensions suppressed); white noise background, R parameter = 4.0 pixel; a constant has been subtracted.


5.2 Two-dimensional Powell minimization step

In the next step of our analysis, we launch N= 100 (see below) downhill minimizations in the two dimensions (X, Y) using the Powell algorithm (Teukolsky & Vetterling 1992), with starting points chosen as described below. This method is a simple direction-set algorithm that requires only function values. The method makes repeated use of a one-dimensional minimization algorithm, which in our case is Brent's method; this is an interpolation scheme that alternates between parabolic steps and golden sections. In addition, we ‘enhance’ the basic Brent line minimizer algorithm by introducing a ‘jump’ procedure designed to avoid small local minima (such as those resulting in the posterior from the noise background). A description of this ‘jump’ procedure is given below. The reason for this two-dimensional minimization step is simply to obtain a good estimate of the (X, Y) positions of the sources. As a by-product, using formula (16), we end up obtaining an approximate estimate of the putative source amplitude A as well.

Assuming that the sensitivity and resolution of the experiment are such that we should only expect a very small number of partially resolved sources, one may assume that in each patch of beam-sized area we should not find more than one real peak. The likelihood peaks in the positional subspace (keeping A and R constant) are not expected to be narrower than the original beam, at least when we are dealing with the white noise only case. This is so because the likelihood positional surface is nothing but the correlation of the beam shape with, on average, another beam shape, possibly with a different R. Thus the total area where the correlation has values significantly different from 0 is expected to be larger than the larger of the areas of the two original beams. These assumptions naturally define the coarseness one should use when exploring the patch, namely, the beam size. As we are assuming that beam shapes with different radii may actually occur, the largest possible resolution happens when the smallest beam is considered. As there is so far no prior information to distinguish one part of the patch from the others, we assume that a uniform rectangular grid with as many cells as the total number of ‘beams in the patch’ is perfectly adequate to define the starting points of the ‘snakes’. Thus

18
formula

We add an extra two-dimensional random offset to the central points of each grid cell in order to avoid some unwanted correlation effects due to the perfect spacing of the grid. Some may argue that, in addition to the real likelihood peaks (those resulting from the sources in the map), we should expect a large number of fake ones resulting from the interaction of the filter with the noise fluctuations (Fig. 1). We considered such a possibility by devising a ‘jump’ procedure (see below) to prevent the ‘snakes’ from being ‘caught’ on those fake peaks. Nevertheless, as in the case of a CMB background, when these spurious peaks largely outnumber the real ones, one should increase the density of ‘snakes’ (Fig. 2). For the white noise detection examples presented in Section 8, we have taken all of the above into consideration. The number of snakes chosen, NSnakes= 100, meets these requirements for both cases [i.e. low integrated signal-to-noise ratio (LISNR) and Hobson–McLachlan (HMcL), see Section 8]. As we assumed that no sources are placed in the patch borders, we consequently did not include those pixels.
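
A minimal sketch of how such a jittered, beam-spaced grid of ‘snake’ starting points might be laid out; the grid spacing and jitter amplitude below are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np

def snake_start_points(patch_size, border, beam_size_pix, rng=None):
    """One starting (X, Y) per beam-sized cell of the useful (border-free)
    area, plus a small random offset to break the perfect grid regularity."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = border, patch_size - border
    step = beam_size_pix                      # one cell per 'beam in the patch'
    centres = np.arange(lo + 0.5 * step, hi, step)
    starts = []
    for X in centres:
        for Y in centres:
            dx, dy = rng.uniform(-0.25 * step, 0.25 * step, size=2)
            starts.append((X + dx, Y + dy))
    return starts

# e.g. a 128x128 patch with an 8-pixel border and a ~4-pixel beam
points = snake_start_points(128, 8, 4.0)
```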

Figure 2

The likelihood ratio manifold of the models for the presence of a source over its absence (A and R dimensions suppressed): radius R= 4.0 pixel; background parameters s0= 11.57 pixel, wI= 0.01, wB= 0.99 (see Section 7).


5.2.1 The jump procedure

The Brent line minimizer keeps at any time a set of three pairs of values, argument and function value, where the middle pair is said to be ‘bracketed’. This is because the argument part of the second pair lies in the interval defined by the other two and its function value is lower than the function values of the interval-limiting pairs. We call them

19
formula

Like the simulated annealing example previously referred to, the rationale behind the jump procedure was borrowed from the natural world, but this time from the realm of quantum mechanics. We imagine our current best guess, the bracketed pair [b, f(b)], as a particle inside a potential well defined by the limiting pairs (left and right bound). If we adopt the ‘classical’ point of view there is no way the particle could possibly escape from the bracketing barriers. This is exactly what happens with the traditional Brent procedure. Instead, if we use a quantum approach, it is still possible for the particle to tunnel out through the barrier.

The transmission coefficient T of a particle with energy E which faces a potential barrier with a height equal to V0(V0 > E), and a width l, is given by  

20
formula
where ρ2 = [2m(V0−E)]1/2/ħ, m is the particle mass and ħ is the Dirac constant, and where we assume ρ2l≪ 1 (Cohen-Tannoudji et al. 1973). From the quantum mechanical example we have only retained the negative exponential dependence on the barrier length (the jump distance in our jump model) and the characteristic constant depending on the difference between the particle energy and the barrier maximum potential. Following this example we define a random variable Λ exponentially distributed:
21
p(\Lambda = \lambda) = \frac{1}{\theta}\,\exp\!\left(-\frac{\lambda}{\theta}\right), \qquad \lambda \ge 0 ,
with average θ. The next step is to define the dependence of θ on the parameters, bearing in mind the behaviour of the tunnelling effect. We chose
22
\theta = L\,\exp\!\left(-\frac{\Delta}{f_{\rm s}}\right) ,
where L is the ‘maximum average jump length’, fs is the ‘function values scale’ and Δ is either the ‘right barrier’≡Δr when we jump forward or the ‘left barrier’≡Δl when we are trying a jump along the opposite direction (Fig. 3).

Figure 3

The jump procedure.


The procedure has two tuning parameters: one (whose value is given as input from the parameter file) used to divide the maximum allowed range for the current jump in order to produce the L value, and another whose purpose is to define a scale for the function values. This second value, fs, is the maximum range of function values evaluated at the ‘snakes’ tails’ locations (initial values drawn from the grid as explained above). When we start the minimization process by defining the bracketing interval, the well walls (defined by the differences of the function values, Δr and Δl) are usually large and consequently only very small tunnels will be allowed. If Δ > fs no jumps will be attempted. As the minimizer converges and approaches the minimum, the barriers become smaller and consequently bigger tunnels become more and more probable. This jump procedure has the advantage of exploring the function domain progressively as the algorithm approaches the bottom of the well. If by accident we have first hit the global minimum while the solution is still far from the bottom, only small jumps are allowed and the probability of tunnelling to a local peak is small. As soon as we get to the bottom, Δ≈ 0, large tunnels become probable (θ≈L), but as the function is now close to its lowest value, a successful jump is improbable. However, if instead we have started within a local peak, then rather than becoming ‘stuck’, as the minimizer approaches its last steps large jumps become common and the probability of ‘plunging’ into a deeper well is very high. One of the main reasons for choosing the exponential distribution, apart from its close relationship with quantum tunnelling, is the simplicity and efficiency of a random deviate generator whose output follows it (provided we already have a routine which produces random deviates with a uniform probability distribution inside a certain interval). We proceed as follows.

For each Brent iteration:

  • draw a sample u from a uniform distribution on [0, 1). If u≥ 0.5 choose a positive jump, otherwise move backwards;

  • draw from the tunnelling distribution. If the jump ends up inside the ‘inner zone’ we reject it because we are on the same well we have started from. If the jump is outside the ‘inner zone’ we evaluate the function at that point [z, f(z)]. If the new function value is lower than the one that initiated the jump, f(z) < f(b), we accept it and restart the Brent minimizer with this new value.
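
A minimal sketch of one such jump attempt, assuming the bracketing triple is (a, b, c) with a < b < c and f(b) the lowest value, and assuming the mean jump length θ = L e−Δ/fs of equation (22); the function name and argument layout are ours.

```python
import numpy as np

def try_jump(f, a, f_a, b, f_b, c, f_c, L, fs, rng=None):
    """One attempted 'tunnelling' jump out of the Brent bracket (a, b, c).
    Returns an accepted (z, f(z)) pair, or None if no jump is made."""
    rng = np.random.default_rng() if rng is None else rng
    forward = rng.uniform() >= 0.5                     # pick the jump direction
    barrier = (f_c - f_b) if forward else (f_a - f_b)  # right or left wall height
    if barrier > fs:                                   # wall too high: no jump
        return None
    theta = L * np.exp(-barrier / fs)                  # mean jump length, eq. (22)
    lam = rng.exponential(theta)                       # jump length, eq. (21)
    z = b + lam if forward else b - lam
    if a < z < c:                                      # still inside the 'inner zone'
        return None
    f_z = f(z)
    return (z, f_z) if f_z < f_b else None             # accept only if lower

# If a point is returned, the Brent minimizer is restarted around it.
```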

5.2.2 The effectiveness of the jump procedure in recovering ‘lost snakes’

Our extensive set of simulations, covering a typical assemblage of different scenarios, shows that the jump procedure is fairly effective. We have achieved equal levels of quality in the obtained results using ∼20–25 per cent fewer snakes. However, within our current set-up its importance is far from critical, since speed is not a concern to us. The extra level of complexity which results from the introduction of the procedure could have been avoided by launching extra ‘snakes’. However, in a scenario where the cost of starting additional ‘snakes’ is much higher, or where the problem cannot simply be circumvented by setting up more starting points for the minimization procedure, we believe that the jump procedure might have an important role to play.

5.3 Four-dimensional Powell minimization step

Instead of just repeating the filtering procedure using different values for the R parameter, we proceed by following a rationale which tries to mimic that behind the immense speed-up of the FFT. The FFT is fast because it reduces, using factorization, the number of operations needed to compute the ‘complete set’ of Fourier coefficients. If we only need one or two (more precisely, if the number of coefficients we want to compute is less than ln(n), n being the size of the vector), then it is more advantageous to use the traditional formula directly. Hence, we now search the full parameter space (X, Y, A and R) with the help of the Powell minimizer, but this time using the previously estimated parameters (from the two-dimensional step) as starting values. We do not expect the ‘Powell snake’ to unfold significantly outside a small neighbourhood of the initial value. Thus, as we no longer need to evaluate the likelihood outside this small positional range (as in the FFT analogy where we just want to compute one or two coefficients), working in real space proves to be more advantageous.

In the next step, we perform multiple Powell minimizations in the full four-dimensional space, starting each minimization from the (X, Y, A, R) values found in the previous step. In this step the ‘jump’ procedure described above is not used. This is one of the main reasons why we first performed the two-dimensional step (and the pre-filtering). From our experience we could verify that, after performing the two-dimensional step, the neighbourhood of the likelihood manifold around the vector solution {X, Y, A, R} obtained so far was rather smooth and ‘well behaved’. Thus, a procedure to avoid local maxima is no longer required. The resulting end-point of each minimization is a local maximum of the posterior, giving the optimal parameter values for some putative detected object. A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based on an evidence criterion (see Section 6). If the detection is accepted, then an object with the optimal parameter values is subtracted from the map before the next downhill minimization is launched.
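
For illustration, the four-dimensional refinement could be carried out with SciPy's implementation of Powell's direction-set method; the sketch below is ours and simply minimizes minus the log-posterior of equation (12) from the starting point supplied by the two-dimensional step.

```python
import numpy as np
from scipy.optimize import minimize

def refine_detection(neg_log_post, start, x_range, y_range, A_range, R_range):
    """Four-dimensional Powell refinement of a single putative detection.
    `neg_log_post(a)` should return -ln P(a|d) for a = (X, Y, A, R);
    `start` is the (X, Y, A, R) estimate from the two-dimensional step."""
    result = minimize(neg_log_post, np.asarray(start, dtype=float),
                      method="Powell",
                      bounds=[x_range, y_range, A_range, R_range])
    return result.x, -result.fun     # optimal (X, Y, A, R) and its log-posterior
```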

5.4 Error estimation – Fisher analysis

For each accepted object, estimates of the uncertainties on the derived parameters are obtained from the Gaussian approximation to the posterior around that peak. In detail the covariance matrix of the parameter uncertainties has elements Cij≡〈δai δaj〉, which can be obtained by inverting (minus) the Hessian matrix at the peak. In the case of uniform priors, one has simply  

23
(C^{-1})_{ij} = -\left.\frac{\partial^2 \ln L(\mathbf{a})}{\partial a_i\,\partial a_j}\right|_{\mathbf{a}=\hat{\mathbf{a}}} .

Here we present an estimation of the putative source parameters' error bars using the Fisher analysis over the model H1. We present two different cases:  

formula

We start by defining Fij as the negative expectation of the log-likelihood Hessian matrix, in other words the ‘Fisher information matrix’. We are assuming, as we have already stated, a Gaussian approximation of the likelihood around its maximum in parameter space.

Let L1(a) be the likelihood of model H1; then

24
F_{ij} \equiv -\left\langle \frac{\partial^2 \ln L_1}{\partial a_i\,\partial a_j} \right\rangle .
Assuming a Gaussian model for the diffuse background we have  
25
\ln L_1(\mathbf{a}) = c - \frac{1}{2}\sum_{i,j}\left[p_i - \tau_i(\mathbf{a})\right]\, N^{-1}_{ij}\, \left[p_j - \tau_j(\mathbf{a})\right] ,
where N−1ij is the (ij) element of the inverse correlation matrix of the diffuse background, pi is the ith map pixel, τi is the ith pixel of the object template τ(x; a) and c is an unimportant constant. The parameters of the object template are the following: A, the amplitude of the object; r0, the positional parameters; and the ensemble of the remaining (shape) parameters. In this analysis we are assuming a circularly symmetric source, so this last set contains only a single parameter, R, which provides a scale for the radial dimension of the object, i.e. R≡‘radius of the object’. We will assume, from now on, the following functional dependence of the source template on its parameters:
26
formula

5.4.1 White noise

For a background diffuse component generated from a Gaussian stationary process with coherence length much smaller than the pixel size, i.e. Gaussian white noise, the following condition applies:  

27
N_{ij} \equiv \langle n_i n_j \rangle = \sigma_i^2\,\delta_{ij} ,
This is the situation when dealing with a system where the instrumental pixel noise is the dominant source of the background component. Adding the condition for statistical homogeneity:  
28
\sigma_i = \sigma \quad \text{for all pixels } i ,
and substituting both conditions into the log-likelihood expression this becomes  
29
\ln L_1(\mathbf{a}) = c - \frac{1}{2\sigma^2}\sum_i \left[p_i - \tau_i(\mathbf{a})\right]^2 .
Evaluating expression (24) for all the possible parameters' combinations and using the following sort criterion {A, R, X, Y}, one obtains  
30
formula
where {α, β, δ, γ} are pure numerical constants:  
31
formula
where the hatted quantities are the values of the template parameters which maximize the likelihood. We have used pixel metrics together with the assumption that the sums can be evaluated in the continuous limit (this introduces only minor errors). When considering a source template with exactly the same functional dependence as equation (6), the numerical constants become
32
formula
Now inverting the F matrix  
33
formula
where forumla and using the Cramèr–Rao inequality, which in this case reduces to an equality  
34
\langle \delta a_i\, \delta a_j \rangle = (F^{-1})_{ij} ,
since we are dealing with a max-likelihood estimator which is efficient. Writing explicitly the above expression (34) for each parameter, one finally obtains the desired parameters' error bars:  
35
formula
Defining the ‘peak signal-to-noise ratio’ as A/σ, and substituting into (35):
36
formula
we get another way of expressing the error bars, where we have emphasized their dependence on the ‘quality’ of the signal.
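
Rather than the closed forms (35)-(36), the same error bars can be obtained numerically by differencing the log-likelihood around its peak and inverting the resulting matrix, as in equation (23); a generic finite-difference sketch (ours):

```python
import numpy as np

def parameter_errors(log_like, a_hat, steps):
    """Approximate 1-sigma error bars from the curvature of ln L at its peak:
    build minus the Hessian by central finite differences, invert it to get
    the covariance matrix C of eq. (23), and return sqrt(diag(C)) and C."""
    a_hat = np.asarray(a_hat, dtype=float)
    n = a_hat.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = steps[i]
            ej = np.zeros(n); ej[j] = steps[j]
            H[i, j] = (log_like(a_hat + ei + ej) - log_like(a_hat + ei - ej)
                       - log_like(a_hat - ei + ej) + log_like(a_hat - ei - ej)
                       ) / (4.0 * steps[i] * steps[j])
    C = np.linalg.inv(-H)            # covariance = inverse of minus the Hessian
    return np.sqrt(np.diag(C)), C
```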

6 BAYESIAN OBJECT VALIDATION

It is clear that a key component of our approach is the step in which each posterior peak is accepted or rejected as a real object. Indeed, a reliable means of performing this step is crucial for the method to function properly. The decision whether to accept or reject a detection is based on the evaluation of the Bayesian evidence for the two competing models for the data, as given by equation (5). PowellSnakes evaluates the models' evidence using a Gaussian approximation to the logarithm of the likelihood ratio of the competing models:

37
formula
It does so by expanding equation (37) around its maxima in the parameter space and discarding all terms of third order and above. As the null model (H0≡‘there is no source centred in region S’) has no parameters, and thus behaves like a constant in parameter space, maximizing the logarithm of the likelihood ratio yields the same result as maximizing the likelihood of model H1 alone. Therefore, finding the peaks of the logarithm of the likelihood ratio corresponds to computing the maximum-likelihood estimate of the putative source parameters.

The precise definitions of the models H0 and H1 are given below. In general, however, we shall adopt a formulation of the model selection problem in which each model Hi(i= 0, 1) is parametrized by the same parameters a= (X, Y, A, R), but with different priors. In this case the evidence for each model is

38
Z_i = \int L(\mathbf{a})\,\pi_i(\mathbf{a})\, \mathrm{d}\mathbf{a} ,
where the likelihood function L(a) is the same for both models and is given by (10). Although not required by the method, we will assume that for each model the prior is separable, as in equation (11), so that for i= 0, 1,  
39
\pi_i(\mathbf{a}) = \pi_i(\mathbf{X})\,\pi_i(A)\,\pi_i(R) ,
where X ≡ (X, Y) is the vector position of the source.

Moreover, we shall assume uniform priors within specified ranges on each parameter, although the method is easily generalized to non-uniform priors. The general form of our models for the data is  

H_0: \mathbf{d} = \mathbf{n} ; \qquad H_1: \mathbf{d} = \mathbf{s}(\mathbf{a}) + \mathbf{n} .
If the region S corresponds to the coordinate ranges [Xmin, Xmax] and [Ymin, Ymax], then  
40
\pi_0(\mathbf{X}) = \pi_1(\mathbf{X}) = \begin{cases} 1/|S| & \text{if } \mathbf{X} \in S ,\\ 0 & \text{otherwise,} \end{cases}
where |S| is the area of the region S. We assume the prior on R is also the same for both models, so π0(R) =π1(R) ≡π(R), but the priors on A are substantially different. Guided by the forms for H0 and H1 given above, we take π0(A) =δ(A) (which forces A= 0) and  
41
\pi_1(A) = \begin{cases} 1/(A_{\max} - A_{\min}) & \text{if } A_{\min} \le A \le A_{\max} ,\\ 0 & \text{otherwise.} \end{cases}
One could, of course, consider alternative definitions of these hypotheses, such as setting H0: AAlim and H1: A > Alim, where Alim is some (non-zero) cut-off value below which one is not interested in the identified object. We shall not, however, pursue this further here.

Given the above forms of the priors, the evidence for model H0 is simply  

42
Z_0 = \int L(\mathbf{a})\,\delta(A)\,\pi(R)\,\pi_0(\mathbf{X})\, \mathrm{d}\mathbf{a}
 
43
= L_0 ,
since L0, the value of the likelihood when no source is present (A= 0), is a constant and the priors are normalized. For model H1, the evidence reads
44
formula
 
45
formula
 
46
formula
where we wrote explicitly the dependence of the evidence on the chosen spatial region S; we have also defined the (unnormalized) two-dimensional marginal posterior obtained by integrating L(a)π1(A)π(R) over A and R, and |S| is the area of the region S.

So far we have not addressed the prior ratio Pr(H1)/Pr(H0) in (5). Although this factor is often set to unity in model selection problems, one must be more careful in the current setting. For the sake of illustration and simplicity, let us assume that the objects we seek are randomly distributed in spatial position, i.e. they are not clustered. In that case, if μS is the (in general non-integer) expected number of objects per region of size |S|, then the probability of there being N objects in such a region follows a Poissonian distribution:

47
\Pr(N) = \frac{\mathrm{e}^{-\mu_S}\,\mu_S^N}{N!} .
Thus, bearing in mind the above definitions of H0 and H1, we have  
48
\frac{\Pr(H_1)}{\Pr(H_0)} = \frac{\Pr(N=1)}{\Pr(N=0)} = \mu_S .
Hence, the key equation (5) for model selection becomes  
49
\rho \equiv \frac{\Pr(H_1\,|\,\mathbf{d})}{\Pr(H_0\,|\,\mathbf{d})} = \frac{Z_1(S)}{Z_0}\,\mu\,|S| ,
where μ=μS/|S| is the expected number of sources (centres) per unit area.

There is a certain degree of freedom in choosing the level that ρ must reach in order to consider that model H1 is present (a detection) or not. This ambiguity may only be overcome by invoking decision theory (see Jaynes 2003). From now on, we will always assume a criterion of

50
formula
When introducing the criterion of symmetric loss the condition for detection immediately becomes uniquely defined. One accepts the detection if  
51
\rho > 1 ,
and rejects it otherwise. One can change the decision criterion: for instance, suppose we want to put the emphasis on reliability (which means we are willing to accept an increase in the number of undetected sources in exchange for a significant decrease in the fake ones); then one must change the threshold for acceptance/rejection, i.e. the value that ρ must reach.
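
Putting equations (5), (49) and the symmetric-loss criterion (51) together, the acceptance test amounts to checking the sign of ln ρ; a minimal sketch (ours), in which the prior ratio is approximated by μ|S|, the expected number of source centres in the region S:

```python
import numpy as np

def accept_detection(lnZ1, lnZ0, mu, area_S):
    """Symmetric-loss acceptance test: accept the putative source when
    ln rho = ln Z1 - ln Z0 + ln(mu * |S|) >= 0, with the prior ratio
    Pr(H1)/Pr(H0) approximated by mu * |S|."""
    ln_rho = lnZ1 - lnZ0 + np.log(mu * area_S)
    return ln_rho >= 0.0, ln_rho

# mu would typically be the expected number of sources in the patch divided
# by the patch area; |S| is chosen to enclose the posterior peak of interest.
```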

The only remaining issue is the choice of the region S. Our method for locating peaks in the posterior is direct (local) maximization, which yields the parameter values â at the peak.

In this peak-based approach, S is taken to be a region enclosing the entire posterior peak in the (X, Y) subspace (to a good approximation). This, in fact, requires a little care. If S were taken to be the full mapped region, then the evidence Z1(S) would have contributions from many local peaks in the posterior. For each putative detection, however, we are only interested in the contribution to Z1(S) resulting from the posterior peak of interest. In practice, our Gaussian approximation to the posterior around this peak means that other peaks will not contribute to our estimate of Z1(S), thus making a virtue out of a necessity. Moreover, as we show below, provided the region S does enclose the posterior peak of interest in the (X, Y) subspace, the resulting model selection ratio ρ will be independent of the region size |S|.

In the present case, using (46) and assuming uniform priors on A and R, one has  

52
formula
Making a four-dimensional Gaussian approximation to the posterior about its peak â, one obtains
53
formula
where the elements of (minus) the Hessian matrix C−1 are given by (23) evaluated at the peak â. The above expression again ignores possible edge effects due to abrupt truncation of the posterior by the priors. In Section 6.4 we give an account of how we solved this problem. Using the expressions (10), (16) and (17) for the likelihood, one finds (in logarithms)
54
formula
Most importantly, since  
55
formula
we see that ln ρ is independent of the size |S| of the region considered in the (X, Y) subspace, provided S encloses entirely (to a good approximation) the peak of interest.

6.1 Acceptance/rejection threshold

Consider the variable ν,  

56
formula
where 〈ninj〉=σ2Mij=Nij and forumla is the generalized signal-to-noise ratio for a unit amplitude object (according to Savage & Oliver 2007 notation). This is the same as  
57
formula
which is the more common form in which one may find ν≡‘the normalized field amplitude’ (López-Caniego et al. 2005). ‘The normalized field amplitude’ is the amplitude of a random stationary Gaussian field whose power spectrum [forumla] satisfies the condition forumla, i.e. σ= 1. Using Parseval's theorem, it is straightforward to show that the power spectrum of some background that was filtered by an un-normalized matched filter, forumla, satisfies
58
formula

Assuming that μ is neither a function of the position nor a function of the considered area, this implies that μ must be a constant. These conditions have been obtained using the principle of indifference along with the restriction (55). Integrating, one obtains  

59
\mu = \frac{N}{\Delta s} ,
where N is the expected total number of sources in the patch and Δs is the total area of the patch.

Let us now recover the condition for symmetric loss, ρ > 1, together with the key result for model selection (49). Substituting (59) into it and using expression (10) we obtain  

60
formula
where forumla (the prior term) is  
61
formula
where VT is the total parameter volume, i.e. VT= (Amax−Amin)(Rmax−Rmin)Δs, and N is the expected number of sources in the map.

In Fig. 4 we display the right-hand side of inequality (60) as a function of the integrated signal-to-noise ratio (ISNR) forumla. We plot several curves for typical values forumla. Something which is immediately evident is that each curve has a lower bound which depends on forumla, which in turn depends on the priors. Regardless of how small the peak is, there is always a lower bound for it to be considered a true detection and not merely a noise fluctuation. This is to be expected from an assessment condition which enforces a policy of robustness against false detections (spurious). The threshold curve rises in both directions: when the signal (peak) is too weak, because there is a strong possibility of it being a noise fluctuation; and when the signal is high, because there is then plenty of evidence supporting the presence of an object, so raising the threshold provides an extra level of security against spurious detections without compromising the degree of detection.

Figure 4

Right-hand side of inequality (60) as function of ISNR forumla. The arrow shows the direction of increasing forumla.


Expanding forumla using the results from Section 5.4 and substituting them into (61) we obtain  

62
formula

The curves in Fig. 5 show the dependence of the threshold for detection on the estimated amplitude Â of the putative source under study. We used data that closely mimic the example from HM03: VT= 81 000 (5 × 0.5 × 32 400), N= 8, σ= 1, R∈[5, 10] and A∈[0.25, 0.75]. We plot three curves for each prior, with R taking the values {5 (blue), 7 (green), 10 (red)}.

Figure 5

Right-hand side of inequality (60) as a function of the ‘estimated source amplitude’. Three values of R were used (pixel units): 5 (blue), 7 (green), 10 (red).


6.2 Detection and characterization robustness

The Bayesian framework provides the necessary tools for both of the primary actions of PowellSnakes: detection and characterization. However, the impact of a mismatched prior parameter value has considerably different consequences depending on which of these we are performing. According to Jaynes (2003), the Bayesian inference rules prescribe that, when we have a sufficiently informative likelihood, the posterior is only slightly affected by the choice of priors. Hence, one may expect that a prior mismatch will have only a minimal impact when we are dealing with bright sources. Our simulations are consistent with these predictions. However, when we are tackling faint sources the situation changes: the likelihood is no longer very informative and the priors play a part. When doing parameter estimation, since we are using flat priors only, the priors do not affect the parameter estimates. Something quite different happens, however, in the detection step. When performing model assessment the priors must be fully normalized, and choosing a wrong prior parameter range can have an important impact on the quality of the detection, either by significantly increasing the number of false hits (spurious detections) or, on the contrary, by making the algorithm ‘blind’ to sources (misses).4

6.3 Quality of the detection: an upper bound

So far we have only discussed the condition on the detection related to the tuning of the acceptance/rejection threshold to minimize the symmetric loss. We have said nothing about the level of success achieved. In other words, we know our procedure is tuned to minimize the number of undetected plus spurious sources, but we have not yet discussed how well we can do. Following Van Trees (2001), the best a ‘detector’ can achieve occurs when we have absolute prior certainty about the characteristics of the detection process, which is equivalent to knowing the true values of all the parameters. This condition is known as a ‘simple hypothesis test’ and translates into the following prior:  

63
formula
where δ(x) is the Dirac delta function. Recovering expression (49), setting the model prior ratio to unity (no previous information favours one model over the other, i.e. ‘the data must say it all’) and using the condition for symmetric loss (51), we have that  
64
formula
Taking logarithms on both sides of the inequality and using expression (10) we obtain  
65
formula
where a0 is the true parameter vector. The left-hand side of the above inequality is an estimator of ln [L1(a0)/L0]. It is straightforward to show that the likelihood ratio estimator conditioned on ‘there is no source’ is Gaussian distributed with N[− (1/2)ISNR0², ISNR0²], where ISNR0²=s0TN−1s0. Hence  
66
formula
Substituting (66) into inequality (65), we obtain  
67
formula
Each time this inequality is satisfied we presume we are in the presence of a source, i.e. a true detection. However, we have assumed the opposite (that no source is present). Hence, this is the condition for a spurious detection, and the probability of the occurrence of a spurious detection is  
68
formula
where the conditioning means ‘once we have chosen Hi’ and erf(x) is the Gauss error function. Since the loss is symmetric, the probability of the other type of error, that an undetected source occurs, should match that of a spurious detection. Thus,  
69
formula
We are now in a position to establish the lower bound for a detection. When performing the detection, two mutually exclusive types of error can occur:  
70
formula
The total probability for an error to occur is  
71
formula
where the conditioning on a0 explicitly expresses the assumption of perfect prior knowledge of the parameter values (‘simple hypothesis’). Taking advantage of the symmetry between the two error probabilities and the normalization of the model priors, we finally get  
72
formula

Fig. 6 shows the dependence of the theoretical lower bound on the loss on ISNR0. One can see that ISNR0 plays a pivotal role in defining an upper limit on the quality of the detection process.
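As an illustration, the sketch below evaluates this behaviour numerically using only the Gaussian result quoted above (the log-likelihood ratio under H0 is N[−ISNR0²/2, ISNR0²]); the threshold at zero and the equal model priors are assumptions made here for the illustration, and the exact expressions (68)–(72) of the text are not reproduced.

import numpy as np
from scipy.special import erfc

def error_probabilities(isnr0):
    """Spurious and missed-detection probabilities for a simple hypothesis test."""
    p_spurious = 0.5 * erfc(isnr0 / (2.0 * np.sqrt(2.0)))  # P(accept | no source), threshold at 0
    p_missed = p_spurious                                   # symmetric loss
    return p_spurious, p_missed

def loss_lower_bound(isnr0, p_h1=0.5):
    """Total error probability, weighting each error type by the model priors."""
    p_sp, p_miss = error_probabilities(isnr0)
    return (1.0 - p_h1) * p_sp + p_h1 * p_miss

for isnr0 in (2.0, 4.0, 6.0, 8.0):
    print(f"ISNR0 = {isnr0:4.1f}  ->  loss bound ~ {loss_lower_bound(isnr0):.4f}")

The qualitative behaviour of Fig. 6 (a rapidly falling error bound as ISNR0 grows) follows directly from the Gaussian tail integral.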

Figure 6

Curve for the theoretical lower bound on the loss as a function of ISNR0.


In many situations the radius of the source intensity profile, as recorded in the pixel map, may be considered known and constant throughout the patch. One such case is that of point-like (shapeless) sources. In this case one can further assume, to a high degree of accuracy, that the pixelized intensity closely follows the point spread function (PSF) of the antenna. Thus, considering a statistically homogeneous background, ISNR0 becomes a function of the source intensity A0 only, as α0=t(R0)TN−1t(R0) may now be considered constant. This separation allows us to make predictions about the minimum flux that can be reliably detected, taking into consideration our goals and the experimental set-up. Let us define the ‘normalized integrated signal-to-noise ratio’, NISNR, as the ISNR of a unit-amplitude source template:  

73
formula
This quantity is of considerable importance when establishing a boundary on the source fluxes that can be reliably detected, as it provides a scale of measurement. One may think of A (the source amplitude) as the ISNR measured in units of the NISNR. For a diagonal background correlation matrix (white noise, equation 27) and the Gaussian source template (6), an analytical evaluation of NISNR0 is possible:  
74
formula
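A minimal numerical check of this quantity is sketched below for a unit-amplitude Gaussian template (6) in white noise with diagonal correlation matrix N=σ²I, for which NISNR² = Σi ti²/σ². The analytic expression (74) itself is not reproduced here; the sketch simply shows the linear scaling with R that is noted again in Section 7.1. The grid size is an arbitrary choice for the illustration.

import numpy as np

def gaussian_template(R, npix=201):
    """Unit-amplitude circular Gaussian profile exp(-r^2 / (2 R^2)) on a pixel grid."""
    x = np.arange(npix) - npix // 2
    xx, yy = np.meshgrid(x, x)
    return np.exp(-(xx**2 + yy**2) / (2.0 * R**2))

def nisnr_white(R, sigma=1.0):
    """NISNR for white noise: sqrt(sum of squared template pixels) / sigma."""
    t = gaussian_template(R)
    return np.sqrt(np.sum(t**2)) / sigma

for R in (3.0, 5.0, 7.0, 10.0):
    print(f"R = {R:4.1f} pixel  ->  NISNR = {nisnr_white(R):6.2f}  (NISNR/R = {nisnr_white(R)/R:.3f})")

The ratio NISNR/R is essentially constant, confirming the proportionality to the source radius in the white-noise case.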

In all interesting scenarios one rarely has an allowed parameter domain which reduces to a single point in parameter space (a ‘simple hypothesis’). The question is then how to define a single number which describes a representative figure for the detection-quality upper bound of the ensemble of ‘simple hypotheses’ that constitute the volume of the parameter domain. Within a Bayesian framework we tackle this issue by treating the simple-hypothesis parameters as nuisance parameters and integrating them out. Hence, our proposed solution is to average the ‘simple hypothesis’ loss limit, Loss0, over the properly normalized probability priors for the parameters:  

75
formula
Substituting (72) into equation (75) above, and dropping the zero subscript since it no longer carries any special meaning, we obtain  
76
formula
where b stands for the entire parameter set except the position coordinates. We are implicitly assuming a factorizable parameter prior (see formula 11). The above formula plays a key role when defining the expected quality of a given detection and measurement set-up: if the predicted detection scenario has a low average ISNR, we should never expect a good detection performance. In our experience, when tackling real situations the expected loss is usually much larger than the predicted theoretical bound.
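A hedged numerical sketch of this prior averaging is given below: a flat amplitude prior, a constant radius so that ISNR = A × NISNR, and the illustrative Gaussian-tail loss from the earlier sketch (not the paper's exact formula 72). The amplitude range is the one used later in the HM03-like example; the NISNR value is an arbitrary assumption for the illustration.

import numpy as np
from scipy.special import erfc
from scipy.integrate import quad

def loss0(isnr):
    """Illustrative simple-hypothesis loss bound (Gaussian tail)."""
    return 0.5 * erfc(isnr / (2.0 * np.sqrt(2.0)))

def average_loss(a_min, a_max, nisnr):
    """<Loss> averaged over a flat amplitude prior pi(A) = 1/(a_max - a_min)."""
    val, _ = quad(lambda a: loss0(a * nisnr), a_min, a_max)
    return val / (a_max - a_min)

print(average_loss(a_min=0.25, a_max=0.75, nisnr=8.0))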

When the prior on the source amplitude, π(A), can be assumed flat over the detection region and the source radius constant across the patch, an analytical evaluation of the above expression is possible to a good approximation:  

77
formula
where Im≡‘minimum ISNR’ and IM≡‘maximum ISNR’. From Fig. 7 one realizes the importance of previewing the ISNR limits of our measurement set-up in order to define the expected upper bound on the detection quality.

Figure 7

Plot of equation (77). Each curve represents a different value of Im shown close to it.


6.4 Discussion on truncation of the posterior by the prior

So far the procedure described (formula 54) does not take into account the fact that the prior might abruptly truncate the posterior before it has fallen (largely) to zero, and hence it may overestimate the integral. This discussion is essentially directed towards the truncation of the posterior which happens in the {A, R} subspace. In the position subspace, the prior is independent of the area considered |S|, as long as this area fully encompasses the likelihood peak, which is always the case when dealing with real scenarios.

When we were dealing with the source shape and the beam merged into a single template of radius R, embedded in a background of white noise only, the effect of truncation was already evident, especially when tackling cases of low and very low ISNR, but it was not extremely severe. This provided the rationale behind the ‘tuning parameter’, see below. The effect was not severe because the area under the likelihood fitted fairly well inside the prior box, since the error bars on the A and R parameters were comparatively small.

6.4.1 The tuning parameter φ

When assuming flat priors for all the parameters, the priors become constants and can therefore be taken out of the evidence integrals. Thus, one may now use the approximate formula (54) as a fast way of evaluating the evidence integrals, as long as the truncation effects are kept under control. However, especially when dealing with faint sources, we should expect these effects to become significant, and hence they must be taken into consideration. Therefore, we added a tuning parameter, φ, to correct for possible systematic deviations from the true value of the integral, i.e.  

78
formula
The new expression for the assessment level curve (right-hand side of inequality 60) which results from the introduction of the ‘tuning parameter’ is just the old form in which only the prior term has been modified:  
79
formula

6.4.2 How to evaluate the parameter φ

In the simple case of white noise it is possible to compute an average value of the tuning parameter:  

80
formula
where Va≡‘parameter volume of a single peak’. Substituting expression (10) into (80) we get  
81
formula
where the ‘average value of the data under hypothesis H1’ is the ‘signal vector true value ≡s0’. The integral in the numerator is difficult to compute, but restricting our analysis to a white-noise-only background, M−1=I, obviates a numerical evaluation of this integral:  
82
formula
From expression (82), one sees that the value of φ depends on s0, the true shape of the object. In Table 1 we display several values of φ which result from applying the above formula to typical scenarios. The values we obtained for the ‘tuning parameter’ are consistent with what we initially expected. In scenarios with faint signals, i.e. low 〈ISNR0〉, we anticipate the parameter values lying close to the lower bounds of the priors and, at the same time, large error bars; hence the likelihood truncation effects should make a significant contribution to the value of the evidence integral. Thus, the low value of φ should not be a surprise. When the opposite happens, i.e. high 〈ISNR0〉, the Gaussian approximation to the likelihood peak is no longer accurate and the large φ is just a consequence of the wider tails of the true likelihood function.

Table 1

Tuning parameter φ as a function of 〈ISNR0〉.

〈ISNR0〉 is small (≤5)  φ∈ (0.66, 1] 
5 < 〈ISNR0〉 < 7  φ≈ 1 
〈ISNR0〉 is large (≥7)  φ∈ (1, 1.33] 

6.4.3 Impact of φ on the performance of the algorithm

Using a series of simulations (see Section 8) for different detection scenarios, we estimated the impact of the tuning parameter, φ, on the performance of the algorithm. We concluded that

  • the algorithm is not very sensitive to changes of φ;

  • the differences in performance never exceeded 1.5 per cent when compared with the fiducial value φ= 1;

  • the largest differences occurred when dealing with very small (〈ISNR0〉 < 4) or very large (> 8) values and

  • most important of all: ‘the exact values of φ always produced the best results’.

Therefore, the golden rule displayed in Table 2 should be applied.

Table 2

Tuning parameter φ; the golden rule.

〈ISNR0〉≤5 Use φ= 0.75 
5 < 〈ISNR0〉 <7 Use φ= 1.00 
〈ISNR0〉≥7 Use φ= 1.15 
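
A minimal encoding of the golden rule of Table 2 is sketched below; it is assumed to be applied per detection scenario, using the expected average ISNR of that scenario.

def tuning_parameter(mean_isnr0: float) -> float:
    """Return the tuning parameter phi recommended by Table 2."""
    if mean_isnr0 <= 5.0:
        return 0.75
    if mean_isnr0 < 7.0:
        return 1.00
    return 1.15

print(tuning_parameter(3.84))  # e.g. the LISNR-16-fix scenario of Table 3 -> 0.75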

7 INTRODUCING ‘COLOUR’

When dealing with real astronomical detection problems, the diffuse background component of the maps is not usually dominated by the instrumental noise. Nowadays, the most common scenario is one where the largest component of the background has an astronomical origin: the ubiquitous CMB and the galactic foregrounds (free–free, dust, synchrotron). Currently we focus on the CMB, especially because it can be considered statistically homogeneous and isotropic across the entire sphere to great accuracy (assumptions of our simplified model). The CMB radiation field already has an intrinsic coherence length (∼10 arcmin) which, given the resolution of current surveys (pixels usually smaller than 1 arcmin), means it can hardly be considered white. In addition, since it is collected by an antenna with a finite aperture (usually several pixels wide), whose net effect in Fourier space is that of a low-pass filter, an extra degree of correlation is injected into the background. Hence, only an algorithm which appropriately deals with a background whose power spectrum is not flat can cope with such a set-up.

The power spectrum of the simplified background model we propose (see 86) is not yet fully realistic. However, despite its simplicity, this model already captures the main characteristics of such a background, and it allows a direct comparison between the model predictions and the simulation results. When the background is assumed to be generated by a stationary Gaussian process, but now with an arbitrary coherence length, the condition (27) on the inverse correlation matrix no longer holds. Nevertheless, although the correlation matrix is no longer diagonal, if the condition of statistical homogeneity still holds one may always assume the background correlation matrix to be circulant. Taking advantage of the circulant property of the correlation matrix and transforming to Fourier space, the problem simplifies significantly, since the power spectrum of a circulant correlation matrix is always diagonal. This is not a severe limitation because, for patches of small size, the condition of statistical homogeneity is a good approximation. We assume not only statistical homogeneity but isotropy as well. Most of the formulas presented here would hold even if we dropped the condition of statistical isotropy; nevertheless, we use the statistical isotropy condition throughout the remaining text, because it allows significant simplifications without restricting the scope for real applications, as this condition holds rather well for the great majority of the problems of interest. Expressing equation (25) (multiplied by −1) in Fourier space one obtains  

83
formula
where, as usual, k= 2πη. In a similar way, for the likelihood of the background:  
84
formula
where the tilded symbols are the Fourier transforms of τ and d, respectively, and the remaining quantity is the normalized power spectrum of the background. In what follows, a tilde over a symbol denotes the Fourier transform of the original (untilded) quantity. We assume that a pre-normalization of the map has already been performed by dividing each pixel value by the map rms (≡σ), which implies σ= 1 for the resulting map.

Let us now introduce, for the first time, correlation effects in the diffuse background component. This correlation is to be expected not only because it already exists in the original astronomical background (CMB) and in the diffuse foregrounds (galactic components), but also because of the effect of the antenna PSF. We now consider a simplified model for the autocorrelation function of the background after it has passed through the antenna. This model is of the form  

85
formula
where s0 is a measure of the ‘coherence’ length of the Gaussian, isotropic, homogeneous, stationary background field (see Goodman 2000).5 The autocorrelation function is already properly normalized as required by our previous assumptions, i.e., σb²=〈xixi〉b=nb(0) = 1. Transforming to Fourier space and using the Wiener–Khinchin theorem (Goodman 2000) we get  
86
formula
where the left-hand side is the ‘power spectrum’ and η0= (2πs0)−1.

Together with the astronomical components one should always expect the presence of ‘pixel noise’. In our signal model, ‘pixel noise’ stands for all the noise components resulting from the instrumental set-up. We assume the ‘pixel noise’ to be independent from pixel to pixel, hence constituting a white-noise process. We also assume that the ‘pixel noise’ is uncorrelated with all the astronomical components, thus allowing us to write the total background power spectrum as the sum of these two components:  

87
formula
where the first component is a flat (‘white noise’) spectrum and we added the constants {wI, wB} in order to allow the signal components to be present in different amounts, hence encompassing a broader range of possible scenarios. These constants must satisfy the normalization condition wI+wB= 1, as the total power spectrum must integrate to unity. A simulation sketch of this two-component model is given below.
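The sketch assumes a Gaussian-shaped astronomical power spectrum, PB(η) ∝ exp(−η²/(2η0²)) with η0 = 1/(2πs0), plus white pixel noise, mixed with weights wB + wI = 1 and normalized to unit total variance; the FFT normalization conventions are chosen for convenience and may differ from the paper's. The weights used here are those of Fig. 8.

import numpy as np

def coloured_background(npix=200, s0=11.57, w_i=0.2, w_b=0.8, seed=0):
    rng = np.random.default_rng(seed)
    eta = np.fft.fftfreq(npix)                    # spatial frequency, cycles per pixel
    ex, ey = np.meshgrid(eta, eta)
    eta0 = 1.0 / (2.0 * np.pi * s0)
    p_cmb = np.exp(-(ex**2 + ey**2) / (2.0 * eta0**2))

    # Colour a white Gaussian field with sqrt(P_B), then rescale its variance to w_b.
    white = rng.standard_normal((npix, npix))
    cmb = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(p_cmb)).real
    cmb *= np.sqrt(w_b) / cmb.std()

    noise = np.sqrt(w_i) * rng.standard_normal((npix, npix))  # white pixel noise
    return cmb + noise                             # total variance ~ 1

patch = coloured_background()
print(patch.std())   # ~1, consistent with the pre-normalization assumption (sigma = 1)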

7.1 ISNR – the symmetric loss lower bound

Let us begin by expressing the ISNR in Fourier space, using the appropriate form of Parseval's theorem:  

88
formula
First we will handle the simplest possible case where the radii of the sources are constant. We start by evaluating the NISNR:  
89
formula
as this will provide us with a metric to predict how deep we can go in flux without compromising the detection quality. Substituting the expression (87) for the power spectrum and the unit-amplitude Gaussian template (6) into (89), evaluating the integral (which can only be done numerically) and using typical values for the source and background parameters, we obtain the function plotted in Fig. 8.

Figure 8

NISNR (σ units) versus ‘source radius’R (pixel); s0= 11.57 pixel, wI= 1/5, wB= 4/5.


Figure 9

‘Whitened source’, R ≪ s0: radius R = 3.4, NISNR ≃ 23.31; background parameters s0 = 11.57, wI = 0.01, wB = 0.99; all distances in pixels.


Figure 10

‘Whitened source’, R ≳ s0: radius R = 3.4, NISNR ≃ 1.47; background parameters s0 = 2.0, wI = 1/5, wB = 4/5; all distances in pixels.


When dealing with a white-noise-only background, the NISNR was proportional to the radius of the source (see formula 74). However, when we introduce correlation a completely different curve results: the dependence of the NISNR on the object radius is no longer linear and a more complex shape emerges.
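This behaviour can be checked numerically from equations (88)–(89) with the two-component power spectrum (87), as sketched below with the Fig. 8 parameters (s0 = 11.57 pixel, wI = 1/5, wB = 4/5). The discrete normalization constants are chosen here so that the white-noise limit reproduces the earlier real-space expression; they are assumptions of this sketch, not the paper's exact conventions.

import numpy as np

def nisnr_coloured(R, npix=512, s0=11.57, w_i=0.2, w_b=0.8):
    eta = np.fft.fftfreq(npix)
    ex, ey = np.meshgrid(eta, eta)
    eta2 = ex**2 + ey**2

    eta0 = 1.0 / (2.0 * np.pi * s0)
    p_b = np.exp(-eta2 / (2.0 * eta0**2))
    p_b *= w_b / p_b.mean()                     # astronomical part, variance w_b
    power = p_b + w_i                           # plus white pixel noise, variance w_i

    # Unit-amplitude Gaussian template (6), Fourier-transformed on the same grid.
    x = np.arange(npix) - npix // 2
    xx, yy = np.meshgrid(x, x)
    t = np.exp(-(xx**2 + yy**2) / (2.0 * R**2))
    t_f2 = np.abs(np.fft.fft2(t))**2 / t.size

    return np.sqrt(np.sum(t_f2 / power))        # NISNR^2 = sum |t~|^2 / P

for R in (1.0, 2.0, 3.4, 5.0, 8.0):
    print(f"R = {R:4.1f}  ->  NISNR ~ {nisnr_coloured(R):.2f}")

Setting w_b = 0 recovers the linear scaling with R of the white-noise case, while the coloured spectrum produces the non-linear behaviour of Fig. 8.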

7.2 The ‘whitening’ equalizer

A quantity which plays a central role in the detection process is the likelihood ratio of the competing models (see formulas 83 and 84). After taking logarithms one obtains  

90
formula

Now let us assume that our source template may be modelled by the following expression which is a generalization of equation (6):  

91
formula
where the tilded quantity is the Fourier transform of the source template centred at the origin. Substituting into (90) we get  
92
formula
where the operator is the inverse two-dimensional Fourier transform. One should already be familiar with this expression: the second term is 1/2 of the ISNR expressed in Fourier space, which does not depend on the position coordinates, and the first term, in the argument of the inverse Fourier transform, is the ‘linear matched filter’ applied to the map, where  
93
formula
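The position-dependent term of the log-likelihood ratio can therefore be evaluated for the whole patch with two FFTs, as sketched below: the inverse FFT of conj(template) × map / power spectrum. The grid, normalization and power-spectrum model follow the earlier sketches and are assumptions of this illustration, not the paper's exact conventions.

import numpy as np

def matched_filter_field(patch, template, power):
    """Position term of the ln-likelihood ratio: IFFT[ conj(t~) d~ / P ]."""
    t_f = np.fft.fft2(np.fft.ifftshift(template))   # template centred at the origin
    d_f = np.fft.fft2(patch)
    return np.fft.ifft2(np.conj(t_f) * d_f / power).real

# usage (with coloured_background(), the Gaussian template and the power grid
# from the previous sketches): candidate source positions are the local maxima
# of matched_filter_field(patch, template, power).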
We can also read this expression in a somewhat different manner. Let us write the power spectrum as the product of its square roots (the expected amplitude spectrum). Substituting into the argument of the Fourier transform (first term of 92) we have  
94
formula
Evaluating the ensemble average of the square of the first term of (94), i.e. its power spectrum, we obtain  
95
formula
Hence, after applying this linear filter we have removed the ‘colour’ from the background, transforming it into a ‘white’ Gaussian random field. This is why one calls this linear filter a ‘whitening equalizer’. Let us name the map resulting from the ‘whitening’:  
96
formula
Of course, the sources buried in the map become ‘whitened’ as well. In fact, the second term of (94) is nothing other than a whitened source:  
97
formula
Rewriting formula (92) using these new definitions:  
98
formula
It can easily be verified that this formula, written in terms of the ‘whitened’ counterparts of our quantities, reduces to the familiar white-noise likelihood ratio (formulas 64 and 65), expressed for convenience in Fourier space. Hence, everything we said for the white-noise case applies here, even when we need to account for correlation in the background, as long as we use the ‘whitened’ quantities instead.
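A minimal sketch of this ‘whitening equalizer’ reading of (94)–(98): dividing both the map and the template by the square root of the power spectrum gives the whitened quantities, and matched filtering the whitened map with the whitened template reproduces the same field as the coloured-background matched filter above (same assumed conventions as the previous sketches).

import numpy as np

def whiten(field_fft, power):
    """Apply the whitening filter 1/sqrt(P) in Fourier space."""
    return field_fft / np.sqrt(power)

def whitened_matched_filter(patch, template, power):
    d_w = whiten(np.fft.fft2(patch), power)
    t_w = whiten(np.fft.fft2(np.fft.ifftshift(template)), power)
    return np.fft.ifft2(np.conj(t_w) * d_w).real   # identical to conj(t~) d~ / P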

7.3 Fisher analysis in Fourier space

Proceeding with the same line of reasoning as that initiated in Section 5.4, but this time expressing the likelihood in Fourier space (83), we start by splitting the evaluation of the Fisher matrix into three different cases.

  • The first case refers to those coefficients whose differentiation variables bi are not positional. We shall collectively denote them by C:  

    99
    formula
    These elements of the F matrix will, in general, be ≠ 0.

  • The second case deals with those coefficients where one of the differentiation variables belongs to the set {bi} and the other to {xi}, the position set:  

    100
    formula
    One may rewrite the expression (100) taking advantage of the symmetries of the expressions involved:  
    101
    formula
    where η > 0 means that the integral extends over all Fourier modes in half of the Fourier plane. A necessary condition for this class of coefficients to be equal to 0 is that the transformed template be real; h(bβ, r) having reflection symmetry about both axes is a sufficient condition for that to happen, which is the case for our current model (formula 26).

  • The third case is when we are dealing with the position parameters only:  

    102
    formula
    As in the case above, if h(bβ, r) has reflection symmetry about both axes [ignoring the trivial case h(bβ, r) = 0], this is a sufficient condition for 
    103
    formula
    to hold. This third set of Fisher coefficients will be denoted by P.

Ordering the parameters by collecting the position parameters xk in the upper left-hand corner, the Fisher matrix will then exhibit a structure of the form displayed (⊠≡ in general a non-zero value).

 

104
One may take advantage of the depicted symmetry of the Fisher matrix when trying to compute its inverse:  
105
formula
As an illustrative example, and again with the help of formula (34), one may write explicitly the error bars on the position parameters:  
106
formula
where we have employed the circular symmetry of the proposed template (26).
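A hedged numerical sketch of the position block of the Fisher matrix follows: for a template centred at (X, Y), the derivative of its Fourier transform with respect to X is −2πi ηx times the transform, so FXX reduces to a sum of (2πηx)²|τ̃|²/P over the modes (up to normalization conventions, which are assumptions of this sketch), and σX = 1/√FXX, with σY equal by circular symmetry.

import numpy as np

def position_error(template, power):
    """1-sigma error bar on X (and Y) from the Fourier-space Fisher sum."""
    npix = template.shape[0]
    eta = np.fft.fftfreq(npix)
    ex, _ = np.meshgrid(eta, eta)
    t_f2 = np.abs(np.fft.fft2(np.fft.ifftshift(template)))**2 / template.size
    f_xx = np.sum((2.0 * np.pi * ex)**2 * t_f2 / power)
    return 1.0 / np.sqrt(f_xx)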

7.4 Extending to real scenarios

So far we have only considered an extremely simplified model. Our purpose was twofold.

  • Make the problem practical in terms of required resources.

  • Avoid obscuring and cluttering the fundamentals with complex ‘decoration’ details.

However, we should always keep in mind that the experimental set-up we wish to address in the future, the Planck Surveyor satellite, and the object of its study, the full sky, add other significant challenges as well. Three of them merit further discussion.

  • The inhomogeneity of the statistical properties of the background. When introducing the galactic emission, which is mainly limited to a narrow band around the galactic plane, a severe breakdown of the condition of statistical homogeneity occurs. Moreover, the instrumental pixel noise is never completely homogeneous. As a result of the scan strategy there are zones of the sphere which will be sampled a greater number of times than others, increasing the integration time and leading to lower noise levels than in the areas where the beam spends less time. Using small patches (∼128/256 × 128/256 pixel) and recomputing the background for each one of them will reduce this problem to a level at which it can be safely ignored. It must be noted, though, that this is not a limitation of the Bayesian framework which, on theoretical grounds, is equally well prepared to deal with inhomogeneous backgrounds. We only take advantage of the circulant properties of the covariance matrix, transforming to Fourier space, as an implementation technique which allows a much simpler and faster solution to our problem, rendering it practical and efficient. However, the same set of assumptions is required in the derivation of the optimal linear filters, together with an implicit assumption of Gaussianity (HM03).

  • The non-Gaussianity of the galactic diffuse foregrounds. The galactic diffuse fields, for instance dust, can show considerable levels of non-Gaussianity; however, as a result of the central limit theorem, they introduce only extremely small distortions of the Gaussian distribution of the Fourier modes. The non-Gaussian part of the field brings in no more than small correlations between the different Fourier modes, which can be safely ignored when compared, for instance, with the potentially much more serious non-homogeneity of the background (HM03; Rocha et al. 2005).

  • The ‘confusion’ noise resulting from the contribution to the background of faint unresolved point sources. This is the dominant component of the background of the Planck high-frequency (545 and 857 GHz) maps at high galactic latitudes. This component may be modelled as a Poissonian sampling process with a constant source rate. Any realistic background model should take this into account.

8 RESULTS

We tested the ideas presented here by performing an extensive set of simulations using two different toy models and different detection scenarios (the latter solely for the white-noise-only model, see Section 8.2). In both cases we used a Gaussian source template (6). The parameters of each source were drawn from uniform distributions with bounds given in Table 4. The patches are square grids of 200 × 200 pixels with a border of twice the largest source radius, in pixels. The sources were spread uniformly throughout the patch. Care was taken to avoid source clustering and the presence of sources in the patch borders. We imposed a safe distance between sources by defining a ‘bounding’ square centred on each source with side γ× 2 ×R pixels, where R≡‘radius of object’ (see Table 4 for the values of γ), and we forbade these ‘bounding’ squares from overlapping (a sketch of this placement rule is given below).
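The sketch below is one way the placement rule described above could be implemented: positions are drawn uniformly, a border of twice the largest radius is kept, and any candidate whose bounding square (side γ × 2 × R) overlaps a previously accepted one is rejected. The radius bounds are those of the LISNR model in Table 4.

import numpy as np

def place_sources(n_src, r_max, gamma, npix=200, r_min=3.21, seed=0):
    rng = np.random.default_rng(seed)
    border = 2 * r_max
    accepted = []                                   # list of (x, y, half_side)
    while len(accepted) < n_src:
        x, y = rng.uniform(border, npix - border, size=2)
        r = rng.uniform(r_min, r_max)               # radius drawn from its flat prior
        half = gamma * r                            # half-side of the bounding square
        # two axis-aligned squares do not overlap if they are separated along x or y
        if all(abs(x - xa) >= half + ha or abs(y - ya) >= half + ha
               for xa, ya, ha in accepted):
            accepted.append((x, y, half))
    return accepted

print(len(place_sources(n_src=16, r_max=5.4, gamma=1.5)))   # 16 non-overlapping sources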

Table 4

Detection model parameters. Rmax and Rmin are the maximum and minimum radii, and Amax and Amin the maximum and minimum amplitudes; radii are in pixels and source amplitudes are in the same units as σ.

Detection models 
 LISNR-16-var LISNR-16-fix Hobson–McLachlana 
Rmax (pixel) 5.4 5.4 10 
Rmin (pixel) 3.21 3.21 
Amax 0.76 0.8 
Amin 0.2 0.22 0.5 
σ 
Nobjs 16 (variable) 16 (fixed) 8 (variable) 
γ 1.5 1.5 2.5 

a: astro-ph/0204457 (HMcL).

8.1 Performance of the algorithms

We evaluated the performance of the different algorithms by computing the ‘total error’ defined as  

formula
where

  • an ‘undetected source’ happens when a simulated object is not recognized by the algorithm as a source;

  • a ‘spurious detection’ happens when

    • an object is detected,

    • there is no simulated object such that the entire set of its parameters belongs to the volume defined by the estimated parameters of the detected object ±3σp;

  • the ‘percentage’ is computed by dividing the quantities by the total number of sources in the patch.

The algorithm that minimizes the ‘total error’ is the one that performs better.
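One plausible reading of the ‘total error’ defined above (the exact formula is not reproduced in the text) is sketched below: undetected plus spurious counts, expressed as a percentage of the total number of simulated sources.

def total_error_percent(n_undetected: int, n_spurious: int, n_sources: int) -> float:
    """'Total error' as a percentage of the simulated sources (assumed reading)."""
    return 100.0 * (n_undetected + n_spurious) / n_sources

# e.g. the HMcL column of Table 3: 82.95 per cent detections and 8.19 per cent
# spurious give roughly (100 - 82.95) + 8.19 ~ 25 per cent total error.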

8.2 White noise background

In Table 3 we show the performance of PowellSnakes, where φ is the tuning parameter. The sources are embedded in a background which is a stationary Gaussian random field with a coherence length much smaller than 1 pixel, usually known as a Gaussian white-noise process.

Table 3

Summary of the results from the simulations with a white noise background. Nsimul is the number of patches which have been simulated, φ is the tuning parameter.

Type of simulation LISNR-16-fix LISNR-16-var HMcL 
〈ISNR〉 3.84 3.62 4.68 
φ 0.75 0.66 0.66 
Nsimul 5000 1000 10 000 
Per cent detections 67.41 56.41 82.95 
Per cent spurious 9.60 8.62 8.19 
Total error (per cent) 42.19 52.20 25.15 

Two scenarios were tested.

  • HMcL (Hobson–McLachlan). With this example we tried to mimic, as closely as possible, the HM03 ‘Toy problem 4.3’.

  • LISNR. This example was inspired by another example from HM03, in which an SZ case was presented, but we retained only the range of ISNR involved. We did not try to simulate the SZ case itself, as we have always worked with the Gaussian source template.

    Two situations were investigated.

    • With a fixed number of sources per patch: ‘fix’.

    • With a variable number of sources per patch: ‘variable’. In this case the number of sources for each patch was drawn from a uniform distribution whose minimum limit was zero and the maximum was the value displayed in Table 4 (Nobjs).

Table 4 summarizes the parameter values, for background and foregrounds (sources), for each of the simulated examples. Fig. 1 represents a typical likelihood manifold for this case (A and R dimensions suppressed).

Each run (complete map cleanup) takes less than a second (∼2 per second) on a regular PC (Pentium IV, 2.39 GHz). A considerable effort was made to optimize the code; intelligent management of previously computed values was essential in achieving such high performance. Provisions for parallelizing the code were made but have not yet been employed.

8.2.1 Discussion

One of our major goals when we started this project was to make Bayesian methods run fast enough to allow us to collect significant statistics by performing massive simulations in a reasonable amount of time. We have fulfilled this goal: as may be verified in Table 3, several thousand simulations were performed for each proposed scenario, making our figures statistically sound. Our results are consistent with the theoretical framework presented here. Examples with lower ISNR show higher levels of ‘total error’, as expected, and significantly higher than the lower bound (see Section 6.3). Cases with a fixed number of sources per patch always show a lower error. This should not be a surprise: in our priors we have assumed a constant expected number of sources per patch, and the lower the dispersion in the number of sources, the better the prior fits the data, hence the higher performance. Worth a mention is the ‘tuning parameter’ (Section 6.4). In Table 3 we show only the results achieved using the ‘tuning parameter’ values which performed best. All the presented examples are low-ISNR cases, thus we have only employed values not greater than 1 (see Table 3). These ‘best choice’ values are consistent with our predictions.

8.3 ‘Coloured’ background

Our arrangement for the ‘coloured’ background simulations followed the same guidelines as for the white-noise-only case, but greater care was taken concerning the realism of the set-up. We used a pixel size of 1 × 1 arcmin2 and small patches of 200 × 200 pixels (∼3.33°× 3.33°). The background consisted of two uncorrelated components.

  • A stationary Gaussian random field with a Gaussian power spectrum. The power spectrum, already smoothed by the antenna, was modelled by formula (86) with a coherence length-scale parameter of s0= 11.57 pixel (see Fig. 11).

  • Pixel noise with a ‘white’ profile with constant level across the map.

  • The CMB rms level used was 10 μK and the pixel noise 20 μK which together implied wI= 4/5 and wB= 1/5 (see expression 87, Fig. 12).

Figure 11

‘Coloured’ background – astronomical component; power spectrum was modelled using formula (86); s0= 11.57 pixel.


Figure 12

‘Coloured’ background – complete; astronomical component 10 μK, pixel noise 20 μK ⇒wI= 4/5, wB= 1/5.


The simulated sources had amplitudes ranging from 0.9 to 1.0, in normalized units (normalized by the total rms noise level). The radii of the sources were inside the interval [3.2, 5.0] pixel. The whole set-up results in an average 〈ISNR〉 of ≃ 3.84. In each simulated map we put 16 objects with their parameters drawn from uniform distributions (whose limits are given above). Following the same prescription as for the white noise only case, we prevented the sources from forming clusters and being placed on the patch borders (see Fig. 13). Table 3 contains the results obtained. We have performed a total of 5000 simulations.

Figure 13

The simulated sources. The shape of the sources was assumed Gaussian (6); all the parameters {A, R, X, Y} were drawn from uniform distributions (see the text for details).


8.3.1 Discussion

When we introduce correlation in the background, the likelihood manifold changes considerably from the white-noise-only example. Keeping {A, R} constant and looking at the position subspace, one may see that the likelihood maxima are now extremely narrow and high and have become surrounded by subsidiary peaks of considerable height (see Figs 2 and 14). This scenario is even more acute when the proportion of instrumental noise becomes lower relative to the ‘coloured’ component. In Fig. 15 we depict an exaggerated case (for clarity) where the instrumental noise is only 10 per cent of the astronomical component. Nevertheless, we should stress that current detection instruments are already getting very close to these values: the Planck mission Low Frequency Instrument (LFI) (ESA 2005) has a predicted average ratio of ∼20 per cent between the astronomical components and the detector instrumental noise, although a more balanced proportion should be expected in the majority of existing instrumental set-ups. When dealing with such scenarios the search for the likelihood maxima becomes harder. The likelihood peaks, being thin and completely surrounded by a very large number of fake peaks, are extremely difficult to find. One now needs to unfold a much higher number of ‘snakes’, typically four times the number used in the white-noise-only case. The use of the ‘jump procedure’ is now mandatory in order to pass through the large accumulation of fake peaks which encircles the real ones. Since the likelihood peaks are now extremely thin, the error bars on the source localization are extremely small, usually much less than a pixel. This is equivalent to assuming that the estimated values of the position parameters perfectly match the true ones. When we then employ the Bayesian procedure for validation of the detection, we should replace the flat position prior (see 52) by a Dirac delta centred on the previously estimated values (see 107) for the position of the source:  

107
formula

Figure 14

Likelihood ratio (A and R dimensions suppressed), top view; the sources are the same as in Fig. 13; the background parameters are those used in the simulations: s0= 11.57 pixel.


Figure 15

‘Coloured background’– the likelihood-ratio manifold in the position subspace. Example where the instrumental noise is very low; background parameters: s0= 11.57 pixel, wI= 0.01, wB= 0.99, NISNR ∼ 23.31.


As a final note on the speed of the code, we must mention the substantial decrease in performance when dealing with the ‘coloured’ background. The average time for completing a full patch is now about 8 s, roughly an order of magnitude more than in the previous case. Despite this significant reduction, our initial purpose (performing a massive number of simulations) has not been compromised, as one can still undertake thousands of simulations in a reasonable amount of time.

A summary of the sequence of steps performed by PowellSnakes, version I (the version described here), is given in Appendix A.

9 CONCLUSIONS

The detection and characterization of discrete objects is a common problem in many areas of astrophysics and cosmology. When performing this task, most methods assume a smoothly varying background with a characteristic scale, or coherence length, much larger than the scale of the discrete objects under scrutiny. However, high-resolution observations of the CMB pose a challenge to such assumptions. The CMB radiation has a coherence length of the order of ∼10 arcmin, close to that of the objects of interest, such as extragalactic ‘point’ sources or the SZ effect in galaxy clusters. In addition, the instrumental noise levels can be greater than the amplitude of the discrete objects. The common approach for dealing with such difficulties is to apply a matched filter to the initial map and analyse the filtered map instead. These approaches have reported good performance; however, the filtering process is optimal only among the limited class of linear filters and is dissociated from the subsequent step of object detection performed on the filtered maps.

HM03 were the first to introduce a Bayesian approach to the detection and characterization of discrete objects in a diffuse background. Their implementation used MCMC sampling from the posterior, which was computationally very expensive. This prompted us to explore a new, fast method for performing Bayesian object detection in which sampling is replaced by multiple local maximization of the posterior, and the evaluation of errors and Bayesian evidence values is performed by making a Gaussian approximation to the posterior at each peak. In this paper we presented such a novel approach: PowellSnakes. We start by pre-filtering the map and show explicitly that, for a given value of R (the radius of the source), peaks in the filtered field correspond to peaks in the log-likelihood considered as a function of the positions of the sources. We replace the standard form of the likelihood for the object parameters by an alternative exact form that is much quicker to evaluate. Next we launch N downhill minimizations in the two-dimensional (X, Y) subspace using the Powell algorithm. In order to avoid local maxima in the likelihood surface we devised the so-called ‘jump procedure’, a concept borrowed from the realm of quantum mechanics. Next we perform multiple Powell minimizations in the full four-dimensional space (X, Y, A, R), starting each minimization from the positions found in the previous step.

To evaluate the performance of an algorithm we introduced here, for the first time within this context, the concept of ‘symmetric loss’. The Peak-based prior, P, was proposed and an upper bound on the quality of a detection presented. We showed that the quantity NISNR plays an important role when establishing a boundary on the source fluxes that can be reliably detected. We included a discussion on the truncation of the posterior by the prior and proposed a solution by introducing the ‘tuning parameter’, φ. We studied in detail the consequences of adding correlation to the diffuse background in which the sources are embedded. As the linear filter works as a whitening equalizer, the coloured-noise case reduces to the white-noise case as long as we use the ‘whitened’ map and sources. Using simulations of several typical scenarios we showed that our approach performs very well on both white and coloured backgrounds, and we found that our results are consistent with our theoretical framework. 
For the white-noise background, cases with a lower ISNR exhibit higher levels of ‘total error’, as expected. Cases with a fixed number of sources per patch show a lower error: the number of spurious detections increases with this prior, but the number of detected sources increases by a larger amount, resulting in a lower ‘total error’. When we introduce colour into the background the likelihood changes considerably: the maxima in the position subspace become extremely narrow and high, surrounded by subsidiary peaks of considerable height. The search for the likelihood maxima becomes harder and one has to resort to a larger number of ‘snakes’; the use of the jump procedure becomes mandatory. In this case the errors in the source localization are very small, usually much less than a pixel. This means that we can replace the flat position prior by a Dirac delta function centred on the previously estimated values.

Furthermore, this approach yields a speed-up of several hundred times over sampling-based methods, making the computational complexity of the approach comparable to that of linear filtering methods. Here we presented PowellSnakes in its first incarnation: ‘PowellSnakes I’. An account of an updated implementation, ‘PowellSnakes II’, will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.

1
Such an elongation can be caused by asymmetry of the beam, source or both. If our model assumes a circularly symmetric source but an asymmetric beam, the elongation will be constant across the map, but not if the sources are intrinsically asymmetric. In the latter case, one thus has to introduce the source position angle as a parameter, in addition to the elongation.
2
In its early stages PowellSnakes treated a point source convolved with the beam as a single parametrized template. In the current version these are separate entities.
3
Ongoing work on an improved implementation of PowellSnakes (version II) uses the following priors: uniform priors on the position, X and Y, and scale, R, while the brightness, source amplitude A, is modelled by a power law: N(s) ∝s−β with β in [2, 3), where s is the source flux. The range for parameter R is taken from a parameter file.
4
In PowellSnakes (version II) the flat prior on the source amplitude A has been replaced by a much more realistic power law, N(s) ∝s−β, and the extreme sensitivity of the detection step to the values of the prior parameters disappeared.
5
There is not complete consensus about the exact definition of ‘coherence length’. We shall follow Goodman's definition of ‘coherence length’ which, applied to our particular form of the autocorrelation function, gives a length proportional to s0.

PC thanks the Cavendish Astrophysics Group of the University of Cambridge for support and hospitality during the progression of this work. GR acknowledges support from the US Planck Project, which is funded by the NASA Science Mission Directorate. GR would like to acknowledge useful discussions with Krzysztof Górski and Charles Lawrence.

REFERENCES

Bertin E., Arnouts S., 1996, A&AS, 117, 393
Cohen-Tannoudji C., Diu B., Laloe F., 1973, Mecanique Quantique. Hermann, Paris
ESA, 2005, Planck Bluebook. ESA, Noordwijk (http://www.rssd.esa.int/)
Feroz F., Hobson M. P., 2008, MNRAS, 384, 449
Goodman J. W., 2000, Statistical Optics. Wiley, New York
Haehnelt M. G., Tegmark M., 1996, MNRAS, 279, 545
Herranz D., Sanz J. L., Barreiro R. B., Martínez-González E., 2002a, ApJ, 580, 610
Herranz D., Sanz J. L., Hobson M. P., Barreiro R. B., Diego J. M., Martínez-González E., Lasenby A. N., 2002b, MNRAS, 336, 1057
Herranz D., Sanz J. L., Barreiro R. B., López-Caniego M., 2005, MNRAS, 356, 944
Hobson M. P., McLachlan C., 2003, MNRAS, 338, 765 (HM03)
Hobson M. P., Bridle S. L., Lahav O., 2002, MNRAS, 335, 377
Jaynes E. T., 2003, Probability Theory: The Logic of Science. Cambridge Univ. Press, Cambridge
López-Caniego M., Herranz D., Sanz J. L., Barreiro R. B., 2005, EURASIP J. Appl. Signal Process., 15, 2426
Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 1992, Numerical Recipes in C, 2nd edn. Cambridge Univ. Press, Cambridge
Rocha G., Hobson M. P., Smith S., Ferreira P., Challinor A., 2005, MNRAS, 357, 1
Sanz J. L., Herranz D., Martínez-González E., 2001, ApJ, 552, 484
Savage R. S., Oliver S., 2007, ApJ, 661, 1339
Van Trees H. L., 2001, Detection, Estimation, and Modulation Theory, Part I. Wiley-Interscience, New York

Appendix

APPENDIX A:

PowellSnakes is a new fast Bayesian approach for the detection of discrete objects immersed in a diffuse background. This new method speeds up traditional Bayesian techniques by

  • replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate;

  • using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior and

  • deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question.

This new approach provides a speed up in performance by a factor of ‘100’ as compared to existing Bayesian source extraction methods that use MCMC to explore the parameter space, such as that presented by HM03. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed up that takes advantage of the fact that the correlation matrix of the background is circulant. Its performance is found to be comparable to, if not better than, that of frequentist techniques such as applying optimal and wavelet filters. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors; the same cannot be said of frequentist methods.

Let's start by recapitulating that Powell's method is the prototype of ‘direction-set’ methods for the maximization or minimization of functions. These are the methods to apply when you cannot easily compute derivatives. However, this method requires a one-dimensional minimization subalgorithm, e.g. Brent's method. Brent's method is an interpolation scheme that alternates between parabolic steps and golden sections, and it is used in one-dimensional minimization problems where the derivative is not calculated (for more details see Press et al. 1992, pp. 389, 395, 406). A brief illustration of the direction-set idea is given below.
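The sketch below uses an off-the-shelf Powell implementation (SciPy's, which performs derivative-free line minimizations in the spirit of Brent's method) to launch several ‘snakes’ from different starting points on a toy objective, each converging to a nearby local minimum. This is only an illustration of the idea; PowellSnakes itself is a dedicated C++ implementation.

import numpy as np
from scipy.optimize import minimize

def objective(p):
    """Toy two-dimensional surface with a couple of local minima."""
    x, y = p
    return -(np.exp(-((x - 3)**2 + (y - 3)**2)) +
             0.8 * np.exp(-((x + 2)**2 + (y - 1)**2)))

rng = np.random.default_rng(1)
starts = rng.uniform(-5, 5, size=(6, 2))           # six 'snakes'
for s in starts:
    res = minimize(objective, s, method='Powell')  # derivative-free direction-set search
    print(f"start {s.round(2)} -> minimum at {res.x.round(2)}")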

The steps followed in PowellSnakes, version I, are the following.

  • Compute the classical linear matched filter for the whole patch in the Fourier domain – the denominator is constant and needs to be evaluated only once, which can be done in Fourier space as well.

  • Compute all the constants.

  • Fourier transform the map – a 4-Mpixel map takes at most 3 s (FFTW).

  • Filter the map (≃2 s).

  • Transform back to the map space using the inverse Fourier transform (≃3 s).

  • Evaluate the ln [P(H1 |d)/P(H0 |d)] map (≃2 s).

  • Find the peaks of the ln [P(H1 |d)/P(H0 |d)] field and check whether they are higher than a certain acceptance level: find the peaks using a Powell minimizer with jumps. The evaluation of the objective function is fast – use a kind of evaluation cache (pick a value from a pre-calculated array).

    However, we still need to find the optimal radius, positions and amplitude in real space. When using the matched filter an average value for the radius of the source has been used and the values of the peaks are only approximate. Now we need to find the optimal parameters.

  • Error bars are calculated from approximate versions of the analytic formulae (while MCMC provides these errors automatically).

  • Stopping criterion – there is no explicit stopping criterion; however, there is a way of ensuring that no peak is left behind. We proceed as follows: first, we can estimate the area (not the volume) under the likelihood peaks. Consider an imaginary pixel of the size of this area, split the map into these imaginary pixels and start a PowellSnakes search at the centre of each one. It is then unlikely that any of the significant likelihood maxima will be missed (especially if we use the jump technique). One might think this process would be time-consuming, but in fact it is not, since the objective function is pre-calculated.

A1 Characteristics

Let's now characterize PowellSnakes in terms of its speed and sensitivity.

A1.1 Speed

The PowellSnakes algorithm is a two-step process. In the first part of the algorithm we proceed as follows.

  • First the map is filtered with a linear matched filter given by  

    (A1)
    formula
    The evaluation is done in Fourier space using the average radius of the possible range of radii. The map is then Fourier transformed back to real space. If the radii of the objects were all equal, and if we knew their true value, the resulting filtered field would be proportional to the likelihood field, with the absolute maxima on top of the objects. When the available range of radii is small, filtering with the ‘average’ matched filter will not produce absolute maxima, but in the great majority of situations a local peak is created over the objects.

  • Next, we search this map with the Powell minimizer in two dimensions (the position) with the jumps switched on. This is an extremely fast operation because the objective function is already pre-calculated: we just need to pick a value from an array or, better, compute a bilinear interpolation.

In the second part of the algorithm we proceed as follows.

  • Now we already have a good idea of where to look further: around the maxima of the likelihood restricted to the subspace of the positions of the objects. In step (1) we produced a ‘whitened’ version of the map and, at the same time, we ‘whitened’ the test objects. At this point we switch off the jumps and launch ‘PowellSnakes’ in all four dimensions (position, radius and amplitude), taking as starting points the previously found maxima restricted to the position subspace. We use the complete expression for the likelihood (no analytical shortcuts).

  • Once the Powell minimizer finishes, pick up the optimal values of the parameters and test them for acceptance/rejection. If, and only if, the maximum is accepted as ‘good’, subtract the object from the map. (This is different from our old solution: sometimes, when we subtracted a rejected peak in the neighbourhood of a true but still undetected object, we ended up damaging the good peak. Unfortunately, most of the time that is enough to prevent the real object from being detected. Right now we only subtract accepted peaks, to avoid multiple detections of the same object. We think we should try to avoid the subtraction altogether, since there is always the risk that a spurious detection and consequent subtraction will damage a good peak.)

A1.2 Sensitivity

Now, let's consider the sensitivity.

  • Analytical solutions are in general desirable; however, when noise is taken into consideration the functions become hard to handle and most of the relevant theorems no longer apply. Therefore, one should try to avoid relying on analytical solutions.

  • Sometimes the estimated values of the parameters fall outside the allowed range used to define the Bayesian criterion for acceptance. In such cases one should not, even if the error bars would allow it, simply send them for testing (accept/reject) or reject them immediately; rather, one should attach a very low probability to them and let the computation continue.

  • PowellSnakes depends on several parameters in order to optimize its performance/sensitivity, i.e. the number of launched ‘snakes’, the average number of sources per patch, etc. Since the detection conditions (background, noise, signal-to-noise ratio, etc.) can change drastically across the sphere, these parameters should occasionally be re-adjusted. In its current implementation PowellSnakes allows the definition of sphere zones; within each zone a different set of these parameters may be loaded from the parameter file in order to tune its sensitivity/performance.

  • Do not expect miracles: if the NISNR is low you cannot detect reliably.

A1.3 Other technical details

  • The structure of the program was designed to be ready to take full advantage of a multiprocessor system. The performance will scale linearly with the number of processors.

  • Implemented in ISO C++.

  • Completely built on templates. This feature allows you to choose the numerical precision you want.

  • Designed as a ‘framework’, i.e. you do not need to understand how PowellSnakes works to use it. Inherit from its classes, customize it, and PowellSnakes will call your code when needed.