Abstract
A new fast Bayesian approach is introduced for the detection of discrete objects immersed in a diffuse background. This new method, called PowellSnakes, speeds up traditional Bayesian techniques by (i) replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate; (ii) using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior; and (iii) deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question. This new approach provides a speed-up in performance by a factor of around 100 as compared to existing Bayesian source extraction methods that use Markov chain Monte Carlo (MCMC) to explore the parameter space, such as that presented by Hobson & McLachlan. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed-up that takes advantage of the fact that the correlation matrix of the background is circulant. We illustrate the capabilities of the method by applying it to some simplified toy models. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors, which cannot be said of frequentist methods. We present here the first implementation of this technique (version I). Further improvements to this implementation are currently under investigation and will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.
1 INTRODUCTION
The detection and characterization of discrete objects is a generic problem in many areas of astrophysics and cosmology. One of the major challenges in such analyses is to separate a localized signal from a diffuse background, a problem most commonly faced in the analysis of two-dimensional images.
When performing this task it is often assumed that the background is smoothly varying and has a characteristic length-scale much larger than the scale of the discrete objects being sought. For example, SExtractor (Bertin & Arnouts 1996) approximates the background emission by a low-order polynomial, which is subtracted from the image. Object detection is then performed by finding sets of connected pixels above some given threshold. Such methods run into problems, however, when the diffuse background varies on length-scales and with amplitudes similar to those of the discrete objects of interest. Moreover, difficulties also arise when the rms level of instrumental noise is comparable to, or somewhat larger than, the amplitude of the localized signal one is seeking.
A specific example illustrating the above difficulties is provided by high-resolution observations of the cosmic microwave background (CMB). In addition to the CMB emission, which varies on a characteristic scale of order ∼10 arcmin, one is often interested in detecting emission from discrete objects such as extragalactic 'point' (i.e. beam-shaped) sources or the Sunyaev–Zel'dovich (SZ) effect in galaxy clusters, which have characteristic scales similar to that of the primordial CMB emission. Moreover, the rms of the instrumental noise in CMB observations can be greater than the amplitude of the discrete sources. In such cases, it is not surprising that straightforward methods, such as that outlined above, often fail to detect the localized objects.
The standard approach for dealing with such difficulties is to apply a linear filter ψ(x) to the original image d(x) and instead analyse the resulting filtered field

d_{f}(x) = ∫ ψ(x − y) d(y) d^{2}y.    (1)
Suppose one is interested in detecting objects with some given spatial template t(x) (normalized for convenience to unit peak amplitude). If the original image contains N_{obj} objects at positions X_{i} with amplitudes A_{i}, together with contributions from other astrophysical components and instrumental noise, then

d(x) = s(x) + n(x),    (2)

where s(x) = Σ_{i=1}^{N_{obj}} A_{i} t(x − X_{i}) is the signal of interest and n(x) is the generalized background 'noise', defined as all contributions to the image aside from the discrete objects. It is straightforward to design an optimal filter function ψ(x) such that the filtered field (1) has the following properties: (i) d_{f}(X_{k}) is an unbiased estimator of A_{k}; (ii) the variance of the filtered noise field n_{f}(x) is minimized; the corresponding function ψ(x) is the standard matched filter (see, for example, Haehnelt & Tegmark 1996). One may consider the filtering process as 'optimally boosting' (in a linear sense) the signal from discrete objects with a given spatial template, while simultaneously suppressing emission from the background.

An additional subtlety in most practical applications is that the objects are not all identical, but one can repeat the filtering process with filter functions optimized for different spatial templates to obtain several filtered fields, each of which will optimally boost objects with that template. In the case of the SZ effect, for example, one might assume that the functional form of the template is the same for all clusters, but that the 'core radius' differs from one cluster to another. The functional form of the filter function is then the same in each case, and one can repeat the filtering for a number of different scales (Herranz et al. 2002a). Moreover, one can trivially extend the linear filtering approach to the simultaneous analysis of a set of images. Once again, the SZ effect provides a good example.
Owing to the distinctive frequency dependence of the thermal SZ effect, it is better to use the maps at all the observed frequencies simultaneously when attempting to detect and characterize thermal SZ clusters hidden in the emission from other astrophysical components (Herranz et al. 2002b).
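To make the matched-filter idea concrete, here is a minimal numpy sketch (not taken from the paper; all sizes and values are illustrative) of the simplest special case: a single circularly symmetric Gaussian template in white noise, for which the matched filter reduces to a normalized cross-correlation with the template, and the filtered field evaluated at a source position is an unbiased estimator of that source's amplitude.

```python
import numpy as np

def gaussian_template(n, R):
    """Unit-peak circular Gaussian template of size R, centred in an n x n grid."""
    y, x = np.indices((n, n)) - n // 2
    return np.exp(-(x**2 + y**2) / (2.0 * R**2))

def matched_filter_white(d, t):
    """Matched filter for white noise: cross-correlate the map with the template,
    normalised by t^T t so the filtered field estimates the amplitude directly
    (the constant noise covariance N = sigma^2 I cancels in the ratio)."""
    T = np.fft.fft2(np.fft.ifftshift(t))            # template centred at the origin
    D = np.fft.fft2(d)
    norm = np.sum(t * t)                            # t^T t
    return np.real(np.fft.ifft2(D * np.conj(T))) / norm

n, R, A = 64, 3.0, 10.0
t = gaussian_template(n, R)
d = A * np.roll(np.roll(t, 5, axis=0), -7, axis=1)  # one source, offset from centre
df = matched_filter_white(d, t)
iy, ix = np.unravel_index(np.argmax(df), df.shape)
# noiseless case: the filtered field peaks at the source position with value A
```

In a real application the map would of course contain noise; this sketch only demonstrates the unbiasedness property (i) of the text.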
The approaches outlined above have been shown to produce good results, but the filtering process is only optimal among the rather limited class of linear filters and is logically separated from the subsequent object detection step performed on the filtered map(s). As a result, Hobson & McLachlan (2003, hereafter HM03) introduced a Bayesian approach to the detection and characterization of discrete objects in a diffuse background. As in the filtering techniques, the method assumed a parametrized form for the objects of interest, but the optimal values of these parameters, and their associated errors, were obtained in a single step by evaluating their full posterior distribution. If available, one could also place physical priors on the parameters defining an object and on the number of objects present. Although this approach represents the theoretically optimal method for performing parametrized object detection, its implementation was performed using Markov chain Monte Carlo (MCMC) sampling from the posterior, which was extremely computationally intensive. Although considerable progress has recently been made in increasing the efficiency of sampling-based Bayesian object detection methods (Feroz & Hobson 2008), such approaches are still substantially slower than simple linear filtering methods. Therefore, in this paper, we explore a new, fast method for performing Bayesian object detection in which sampling is replaced by multiple local maximization of the posterior, and the evaluation of errors and Bayesian evidence values is performed by making a Gaussian approximation to the posterior at each peak. This approach yields a speed-up over sampling-based methods of several hundred times, making the computational complexity of the approach comparable to that of linear filtering methods.
The outline of the paper is as follows. In Section 2 we provide a brief outline of the essential Bayesian logical framework which supports the employed methodology. The entities defined there are then characterized in terms of the source detection problem in Section 3. A summary of the strategy to fulfil the desired goals is then given in Section 4. In Section 5 we develop the first step of the proposed algorithm, the maximization of the posterior distribution, and we give an account of the error bars on the estimated parameters of the putative source under study. A two-step approach split between Fourier and real space is indicated. The second phase of the detection algorithm, the Bayesian validation procedure applied to the previously found likelihood maximum, is explained in Section 6. A prior is suggested and an upper bound on the quality of the detection is advanced. We end this section by considering a method for correcting possible systematic deviations in the evidence evaluation. In Section 7 we give a detailed study of the effects that arise when correlation is introduced into the diffuse background in which the sources are embedded. The results from a series of simulations covering several typical scenarios are presented in Section 8, and we close by indicating our conclusions and possible directions for further work in Section 9.
2 BAYESIAN INFERENCE
Bayesian inference methods provide a consistent approach to the estimation of a set of parameters Θ in a model (or hypothesis) H for the data d. Bayes' theorem states that

Pr(Θ|d, H) = Pr(d|Θ, H) Pr(Θ|H) / Pr(d|H),    (3)
where Pr(Θ|d, H) is the posterior probability distribution of the parameters, Pr(d|Θ, H) ≡ L(Θ) is the likelihood, Pr(Θ|H) ≡ π(Θ) is the prior and Pr(d|H) ≡ Z is the Bayesian evidence.

In parameter estimation, the normalizing evidence factor is usually ignored, since it is independent of the parameters Θ. This (unnormalized) posterior constitutes the complete Bayesian inference of the parameter values. Inferences are usually obtained either by taking samples from the (unnormalized) posterior using MCMC methods, or by locating its maximum (or maxima) and approximating the shape around the peak(s) by a multivariate Gaussian.
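The second route (peak location plus a Gaussian approximation) is easy to illustrate on a toy problem. In the sketch below (illustrative only; all names and values are ours, not the paper's), Newton's method locates the posterior maximum for the mean of noisy data under a flat prior, and the curvature of the negative log-posterior at the peak supplies the error bar:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_n = 2.0
theta_true = 5.0
d = theta_true + sigma_n * rng.standard_normal(100)   # 100 noisy measurements

def neg_log_post(theta):
    # flat prior, Gaussian noise: posterior is proportional to the likelihood
    return 0.5 * np.sum((d - theta) ** 2) / sigma_n**2

# locate the peak by Newton's method with finite-difference derivatives
theta, h = 0.0, 1e-4
for _ in range(20):
    g = (neg_log_post(theta + h) - neg_log_post(theta - h)) / (2 * h)
    H = (neg_log_post(theta + h) - 2 * neg_log_post(theta)
         + neg_log_post(theta - h)) / h**2
    theta -= g / H

# Gaussian approximation: the inverse curvature at the peak gives the variance
sigma_theta = 1.0 / np.sqrt(H)
# here theta -> mean(d) and sigma_theta -> sigma_n / sqrt(100)
```

For this quadratic log-posterior the Gaussian approximation is exact; for the multi-peaked posteriors of Section 3 the same construction is applied locally around each peak.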
In contrast to parameter estimation problems, in model selection the evidence takes the central role and is simply the factor required to normalize the posterior over Θ:

Z = ∫ L(Θ) π(Θ) d^{D}Θ,    (4)
where D is the dimensionality of the parameter space. As the average of the likelihood over the prior, the evidence is larger for a model if more of its parameter space is likely and smaller for a model with large areas in its parameter space having low likelihood values, even if the likelihood function is very highly peaked. Thus, the evidence automatically implements Occam's razor: a simpler theory with compact parameter space will have a larger evidence than a more complicated one, unless the latter is significantly better at explaining the data. The question of model selection between two models H_{0} and H_{1} can then be decided by comparing their respective posterior probabilities given the observed data set d, as follows:

Pr(H_{1}|d) / Pr(H_{0}|d) = (Z_{1}/Z_{0}) [Pr(H_{1})/Pr(H_{0})],    (5)

where Pr(H_{1})/Pr(H_{0}) is the a priori probability ratio for the two models, which can often be set to unity but occasionally requires further consideration. Unfortunately, evaluation of the multidimensional integral (4) is a challenging numerical task. None the less, a fast approximate method for evidence evaluation is to model the posterior as a multivariate Gaussian centred at its peak(s) and apply the Laplace formula (see e.g. Hobson, Bridle & Lahav 2002).

3 BAYESIAN OBJECT DETECTION
A Bayesian approach to detecting and characterizing discrete objects hidden in some background noise was first presented in an astronomical context by HM03, and our general framework follows this closely. For brevity, we will consider our data vector d to denote the pixel values in a single patch in which we wish to search for discrete objects, although d could equally well represent the Fourier coefficients of the image, or coefficients in some other basis.
3.1 Discrete objects in a background
Let us suppose that we are interested in detecting and characterizing some set of (two-dimensional) discrete objects, each of which is described by a template τ(x; a). This template is defined in terms of a set of parameters a that might typically denote (collectively) the position (X, Y) of the object, its amplitude A and some measure R of its spatial extent. A particular example is the circularly symmetric Gaussian-shaped object defined by

τ(x; a) = A exp[−|x − X|^{2}/(2R^{2})],    (6)

where X = (X, Y),
so that a={X, Y, A, R}. In what follows, we will consider only circularly symmetric objects, but our general approach accommodates templates that are, for example, elongated in one direction, at the expense of increasing the dimensionality of the parameter space. In the analysis of single-frequency maps, most authors take the template τ(x; a) to be the (pixelized) intrinsic shape of the object convolved with the beam profile. For multifrequency data, however, it is more natural to treat the intrinsic object shape and the beam profiles separately.

If N_{obj} objects are present in the map and the contribution of each object to the data is additive, we may write

d = Σ_{k=1}^{N_{obj}} s(a_{k}) + n,    (7)

where s(a_{k}) denotes the contribution to the data from the kth discrete object and n denotes the generalized 'noise' contribution to the data from other 'background' emission and instrumental noise. Clearly, we wish to use the data d to place constraints on the values of the unknown parameters N_{obj} and a_{k} (k = 1, …, N_{obj}).

3.2 Defining the posterior distribution
As discussed in HM03, in analysing the data the Bayesian purist would attempt to infer simultaneously the full set of parameters Θ = {N_{obj}, a_{1}, a_{2}, …, a_{N_{obj}}}. The crucial complication inherent to this approach is that the length of the parameter vector Θ is variable, since it depends on the unknown value of N_{obj}. Some sampling-based approaches are able to move between spaces of different dimensionality, and such techniques were investigated in HM03.
An alternative approach, also discussed by HM03, is simply to set N_{obj}= 1. In other words, the model for the data consists of just a single object and so the full parameter space under consideration is a={X, Y, A, R}, which is fixed and only four-dimensional. Although we have fixed N_{obj}= 1, it is important to understand that this does not restrict us to detecting just a single object in the map. Indeed, by modelling the data in this way, we would expect the posterior distribution to possess numerous local maxima in the four-dimensional parameter space, each corresponding to the location in this space of one of the objects present in the image. HM03 show that this vastly simplified approach is indeed reliable when the objects of interest are spatially well separated, and we adopt this method here.
3.3 Likelihood
In this case, if the background 'noise' n is a statistically homogeneous Gaussian random field with covariance matrix N=〈n n^{T}〉, then the likelihood function takes the form

Pr(d|a) = exp[−(1/2)(d − s(a))^{T} N^{−1} (d − s(a))] / [(2π)^{N_{pix}/2} |N|^{1/2}],    (8)

where N_{pix} is the number of pixels in the map.
Moreover, if the background is just independent pixel noise, then N=σ^{2}I, where σ is the noise rms.

In the general case, the log-likelihood thus takes the form

ln L(a) = c − (1/2)[d − s(a)]^{T} N^{−1} [d − s(a)],    (9)

where c is an unimportant constant. The first innovation in our new method is to recast the log-likelihood in such a way that the computational cost of evaluating it is considerably reduced. This is achieved by instead writing the log-likelihood as

ln L(a) = c′ + d^{T} N^{−1} s(a) − (1/2) s(a)^{T} N^{−1} s(a),    (10)

where c′=c− (1/2)d^{T}N^{−1}d is again independent of the parameters a. The advantage of this formulation is that the part of the log-likelihood dependent on the parameters a consists only of products involving the data and the signal. Since the signal from a discrete source with parameters a is only (significantly) non-zero in a limited region centred on its putative position, one need only calculate the quadratic forms in (10) over a very limited number of pixels, whereas the quadratic form in the standard version (9) must be calculated over all the pixels in the map.

In the case of independent pixel noise, the covariance matrix N in (10) is diagonal, and so the quadratic forms can be calculated very rapidly. For correlated noise, however, N is no longer diagonal and so evaluating the necessary elements of its inverse can be costly for a large map. None the less, if the background is a statistically homogeneous random field, N^{−1} can be computed in Fourier space, where it is diagonal. As a result, we perform our analysis on patches of sky sufficiently small that the assumption of statistical homogeneity is reasonable. An alternative procedure would be to transform to a set of basis functions, such as wavelets, in which the data are readily compressed, hence reducing the effective dimensionality of the problem; we will not, however, pursue this route.
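The saving offered by the recast form is easy to demonstrate in the white-noise case. In the following illustrative sketch (our own, with arbitrary sizes), the parameter-dependent part of the log-likelihood is evaluated only over the source's support, yet agrees exactly with the full quadratic form computed over every pixel:

```python
import numpy as np

rng = np.random.default_rng(1)
npix, sigma = 10000, 1.0
d = rng.standard_normal(npix)                      # data map (flattened)

# a compact source: non-zero in only 20 of the 10000 pixels
s = np.zeros(npix)
core = slice(4000, 4020)
s[core] = np.exp(-0.5 * ((np.arange(20) - 10) / 3.0) ** 2)

# standard form (9): quadratic over ALL pixels (constant c omitted)
full = -0.5 * np.sum((d - s) ** 2) / sigma**2

# recast form (10): c' = -0.5 d^T N^-1 d absorbs the data-only term;
# the parameter-dependent part touches only the source's support
cprime = -0.5 * np.sum(d**2) / sigma**2
fast = cprime + (d[core] @ s[core] - 0.5 * s[core] @ s[core]) / sigma**2

# both forms agree, but `fast` recomputes over 20 pixels instead of 10000
```

Since c′ does not depend on a, it is computed once per patch; only the 20-pixel products are re-evaluated as the minimizer varies the source parameters.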
3.4 Prior
Having determined the likelihood function, it remains only to assign a prior on the parameters a= (X, Y, A, R). Although not a formal requirement of our method, the simplest choice is to assume the prior is separable, so that

π(a) = π(X) π(Y) π(A) π(R).    (11)
Moreover, we will assume that the priors on X and Y are uniform within the range of the image, and that the priors on A and R are uniform distributions within some ranges [A_{min}, A_{max}] and [R_{min}, R_{max}], respectively. This is true for our first implementation of PowellSnakes (version I). Our general approach can, in fact, accommodate more general priors. Indeed, the assumption of uniform priors on A and R is not a good one for most astrophysical problems. Typically one expects the number of discrete sources to decrease with amplitude, and their angular sizes are unlikely to be uniformly distributed. The best prior to adopt for A is one derived from existing source counts. These are typically power laws over a wide range (although care must be taken at the ends of the range to obtain a properly normalized prior). In the approach presented below, however, we will assume the log-posterior is at most quadratic in the source amplitude A, which restricts π(A) to be of uniform, exponential or Gaussian form. (The second implementation (version II) of PowellSnakes, currently under development, adopts a power law.) Turning to π(R), existing knowledge of the angular sizes of the objects sought can be used to construct an appropriate prior, and no restriction is placed by our method on this. Of course, in the case of point sources, the overall template is simply the beam profile, the angular size of which is (usually) known in advance.

3.5 Optimal patch size. The patch border
The patches are divided into two different areas: a central area where the detection takes place and a surrounding border. The border is used to prevent the sources from being truncated by the edges of the patch, since this usually impacts severely on the process of their detection and characterization. In order to avoid truncation, the border size must be at least half the largest source radial size. Enforcing this rule ensures that a source profile which has been cut by the patch edge is left undisturbed in at least one of its neighbouring patches. When defining an optimal patch size, several somewhat opposing criteria must be taken into account.
 – The assumption of statistical homogeneity, which always favours small patches.
 – Execution speed, which tends to exhibit a bias towards somewhat larger patches:
   – the greater the number of patches, the greater the number of border overlaps, which increases the number of pixels processed more than once;
   – there is a non-negligible amount of time spent preparing each new patch for detection;
   – the pre-filtering stage, which involves a fast Fourier transform (FFT), requires a power-of-2 array size in order to be efficient.
Our work shows that the optimal patch size is the one which minimizes the effects of the inhomogeneities of the background. In practice, (128 × 128)-pixel patches on NSide = 1024 Healpix maps (∼7.33°× 7.33°) and (256 × 256)-pixel patches on NSide = 2048 Healpix maps (also ∼7.33°× 7.33°) are good choices, providing a good balance between statistical homogeneity and execution speed.
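The quoted angular sizes follow directly from the mean Healpix pixel area, 4π/(12 NSide²) steradians; a quick check (our own sketch, using only that standard formula):

```python
import numpy as np

def patch_side_deg(nside, npix_side):
    """Angular side (degrees) of a square patch of npix_side x npix_side pixels,
    taking the pixel side as the square root of the mean Healpix pixel area."""
    pix_area_sr = 4.0 * np.pi / (12.0 * nside**2)   # mean pixel area in steradians
    pix_side_deg = np.degrees(np.sqrt(pix_area_sr))
    return npix_side * pix_side_deg

# the two patch choices quoted in the text give the same angular size
a = patch_side_deg(1024, 128)   # ~7.33 deg
b = patch_side_deg(2048, 256)   # ~7.33 deg
```

Doubling NSide halves the pixel side, so doubling the pixel count per side keeps the patch's angular extent fixed.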
4 OBJECT DETECTION STRATEGY
Once the likelihood and prior have been defined, the problem of object identification and characterization reduces simply to exploring the resulting posterior distribution:

ln Pr(a|d) = ln L(a) + ln π(a),    (12)
where we have omitted an unimportant additive constant term on the right-hand side. Rather than using MCMC methods, as advocated by HM03 and Feroz & Hobson (2008), here we instead locate the local maxima of the posterior in the four-dimensional parameter space a={X, Y, A, R} and perform a Gaussian approximation about each peak. The former provides the estimates of the object parameters associated with that posterior peak and, as outlined below, the latter allows one to assign uncertainties to the derived parameter values. The Gaussian approximation also allows one to estimate the Bayesian evidence associated with each posterior peak. The Bayesian evidence is used to decide whether the detected peak corresponds to a real object or is just a 'conspiracy' of the background noise field (see Section 6).

Such an approach was also advocated by HM03, in which a simulated annealing downhill simplex minimizer was used in an iterative object detection scheme. At each iteration, the minimizer located the global maximum of the posterior and an object with the optimal parameter values was subtracted from the map before commencing the next iteration. The process was repeated until the first rejected detection.
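The evidence estimate obtained from the Gaussian approximation about a peak can be sketched as follows. This is an illustrative implementation of the Laplace formula, not the paper's code; for an exactly Gaussian (unnormalized) posterior the formula is exact, which makes it easy to verify:

```python
import numpy as np

def laplace_log_evidence(log_post_peak, hessian):
    """Laplace approximation to the log-evidence:
    ln Z ~ ln P(peak) + (D/2) ln(2*pi) - 0.5 ln det H,
    where H is the Hessian of -ln(posterior) at the peak."""
    D = hessian.shape[0]
    sign, logdet = np.linalg.slogdet(hessian)       # numerically stable log-det
    return log_post_peak + 0.5 * D * np.log(2.0 * np.pi) - 0.5 * logdet

# toy check: a 2-D unnormalised Gaussian posterior, where Laplace is exact
H = np.array([[2.0, 0.3],
              [0.3, 1.0]])                          # curvature at the peak
log_post_peak = -3.7                                # ln of the unnormalised peak value
lnZ = laplace_log_evidence(log_post_peak, H)
# exact answer: ln ∫ exp(-3.7 - 0.5 x^T H x) d^2x = -3.7 + ln(2π) - 0.5 ln det H
```

In the detection context, the same Hessian supplies the covariance matrix of the parameter uncertainties (its inverse), so the accept/reject evidence and the error bars come from one computation.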
Here we adopt a different strategy that obviates the need to attempt to locate the global maximum at each iteration, since this is very computationally expensive. Instead, our approach consists of launching a series of simple downhill minimizations. The choice of starting point for each minimization, the minimizer employed and the number of minimizations performed are discussed in detail in Section 5. The endpoint of each minimization launched will be a local maximum in the posterior, giving the optimal parameter values for some putative detected object. A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based on an evidence criterion (see Section 6). If the detection is accepted, then an object with the optimal parameter values is subtracted from the map before the next downhill minimization is launched. Although our method does leave open the possibility of repeatedly detecting the same spurious sources, the speed of our algorithm assures that this is not a problem (see Section 8).
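A one-dimensional toy version of this detect-and-subtract loop is sketched below (our own illustration: a simple amplitude threshold stands in for the evidence criterion of Section 6, and all sizes and values are arbitrary). Each pass filters the residual map, accepts the best peak if it is significant, subtracts it, and stops at the first rejection:

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 512, 4.0
x = np.arange(n)

def shape(x0):
    """Unit-height Gaussian source profile centred at pixel x0."""
    return np.exp(-0.5 * ((x - x0) / R) ** 2)

d = 8.0 * shape(100) + 6.0 * shape(350) + rng.standard_normal(n)

t = shape(0) + shape(n)                  # template at pixel 0, with circular wrap
T = np.fft.fft(t)
alpha = np.sum(t * t)                    # alpha = t^T t (white noise, sigma = 1)

found = []
for _ in range(10):
    beta = np.real(np.fft.ifft(np.fft.fft(d) * np.conj(T)))  # beta(X) = d . t_X
    X = int(np.argmax(beta))
    A = beta[X] / alpha                  # amplitude estimate at the best position
    if A < 4.0 / np.sqrt(alpha):         # crude 4-sigma stand-in for the evidence test
        break                            # first rejection ends the loop
    found.append((X, A))
    d = d - A * np.roll(t, X)            # subtract the accepted source
# the two injected sources are recovered first: positions near 100 and 350
```

The brighter source is found first because the filtered field is re-maximized over the residual map at each pass.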
It should be noted, however, that the only reason for subtracting accepted objects from the map is to avoid multiple real detections. Moreover, when attempting to detect sources at very low signal-to-noise ratios, one might occasionally accept a spurious detection as a real source, and its subtraction from the map may damage subsequent downhill minimizations. Ideally, therefore, one should avoid subtracting sources altogether. One such approach is simply to accept that real objects may be detected repeatedly, and check for duplications in the optimal sets of parameters obtained in each accepted detection (e.g. parameter sets that are similar to well within the derived uncertainties). We have not implemented such a method for this paper, but it is under investigation.
As discussed in Section 5, determining the number of downhill minimizations to perform is a key part of our method. However, once we have performed these minimizations, we ensure that no real object has been missed by performing a further set of downhill minimizations as follows. Since the spatial extent of any object in the data map must be greater than that of the beam, we divide the map into patches of area equal to the beam size. For each such patch, we then launch a downhill minimizer with initial values of X and Y equal to the coordinates of the centre of the patch, and with A and R values equal to the midpoints of their respective prior ranges. This is the case for our first implementation of PowellSnakes (version I, for white noise only). This approach is unlikely to miss any remaining peaks in the posterior associated with real objects, provided the number density of sources is sufficiently low that they are spatially well separated. Clearly, such an approach will perform less well if two or more sources (partially) overlap, in which case a single source of large spatial extent may be fitted to them.
In the earlier stages of the PowellSnakes implementation we performed the analysis completely in real space, rather than pre-filtering the map. However, this procedure slowed down the execution of the algorithm. The evaluation of the objective function for each set of parameters is a very expensive operation which unfortunately lies deep in the inner loop of the code, ending up being called several hundred times for each 'snake'. Even a small optimization of its execution thus has a large impact on the overall performance of the code. We initialize the Powell minimizer (i.e. the tail of each of the 'snakes') with the central points of the grid defined in Section 5.2 and the midpoints of the [A, R] parameter ranges. The first step of the minimization process is always to search the position subspace keeping R and A constant. As we have to repeat this step each time we start a new 'snake', finding a fast way of evaluating the likelihood over the entire positional subspace is very advantageous. This is where the filtering process steps in. Filtering with the matched filter automatically provides us with a fast evaluation of the likelihood restricted to the positional subspace. Furthermore, it gives a very good estimate of the starting value for the A parameter, using the classical formula (16) for the amplitude of a field filtered by a matched filter. Next, we need to find the optimal value of the R parameter, which is no longer considered constant.
As a final note, one must take care when a source is located near the edge of the map, since it will be partially truncated. This is not, however, a severe limitation, as one can allow an overlap of at least the largest expected source size when dividing the full map into small patches; a truncated source in one patch will appear intact in a neighbouring one. This strategy does result in a small extra processing overhead, since several pixels will be processed more than once, but this is a minor consideration.
5 MAXIMIZING THE POSTERIOR DISTRIBUTION
In this section, we outline the method employed to locate the multiple local maxima of the posterior distribution.
5.1 Prefiltering step
Let us begin by writing the template (6) as

τ(x; a) = A t(x − X; R),    (13)

where A, X and R are, respectively, the amplitude, vector position and size of the object, and t(x; R) is the spatial shape of a unit-height reference object of size R centred on the origin. Then the signal vector in the log-likelihood (10) is simply

s(a) = A t,    (14)

where the vector t has components t_{i}=t(x_{i} − X; R), in which x_{i} is the position of the ith pixel in the map. Substituting the above expression into the log-likelihood, assuming a constant prior and differentiating with respect to A gives

∂ ln L/∂A = d^{T} N^{−1} t − A t^{T} N^{−1} t,    (15)

which on setting equal to zero yields an analytic estimate for the amplitude:

Â = (d^{T} N^{−1} t)/(t^{T} N^{−1} t).    (16)

Note that, under the assumption of statistical homogeneity, the denominator in the above expression does not depend on the source position X, and thus need only be calculated once (say for a source at the origin) for any given value of R. Generalizing the notation of Savage & Oliver (2007), we denote this denominator by α(R) and the numerator in (16) by β(X, R). It is worth noting that the quantity √α(R) is merely a generalized signal-to-noise ratio for a unit-amplitude object integrated over its spatial template.

More importantly, the estimator (16) is precisely the filtered field produced by a classical matched filter, in which one assumes a value for R. Moreover, it is straightforward to show that the corresponding log-likelihood (10), maximized over A, can be written

ln L = c′ + β^{2}(X, R)/[2α(R)].    (17)
Thus, for a given value of R, peaks in the filtered field correspond precisely to peaks in the log-likelihood considered as a function of X. If the objects sought all have the same size R and this is known in advance, then the local maxima of the now three-dimensional posterior in the space (X, Y, A) are all identified by this method. This scenario corresponds to point sources observed by an experiment with a known beam profile. However, the assumption that the sizes of the objects are known in advance might lead us astray. To see this, consider the following example.

Suppose two sources are so close that they cannot be resolved by the optical device: the intensity profile usually still matches that of the beam profile fairly well, but with a much enlarged radial scale. Taking the radius of the beam profile as an unknown parameter, PowellSnakes will still identify them, wrongly, as a single source, but the total estimated flux will much more closely match the true value of the total flux coming from both sources.
We are assuming a statistically homogeneous background: this is never strictly true but, provided the patches are kept small enough, it is usually a good assumption. Nevertheless, small changes in the beam profile across the patch, which may safely be ignored as far as the noise components (CMB, Galactic foregrounds, etc.) are concerned, might lead to severely biased values when estimating the source fluxes.
We are assuming point sources: this limits the range of application of the method. One of the driving motivations behind PowellSnakes is the detection of SZ clusters. This implies the capability of multifrequency operation and a source profile with a well-defined shape. Such a source template is only useful in real situations if we allow for different spatial scales; in order to achieve this extra degree of freedom we introduced the radius parameter R.
When the source size R is not known in advance, or the sources have different sizes, most matched filter analyses (see e.g. Herranz et al. 2005) filter the field at several predetermined values of R and combine the results in some way to obtain estimates of the parameters (X, Y, A, R) for each identified source. This approach is rather ad hoc, however, and so we pursue a different strategy based on locating the local maxima of the posterior using multiple downhill minimizations. At first sight it might seem tempting to use the result (16) to reduce the parameter space to the three dimensions (X, Y, R), since with knowledge of these parameters one can immediately calculate Â. Such an approach can, however, lead to numerical instabilities when the denominator α(R) in (16) is very small (which occurs in the low signal-to-noise ratio regime), as our numerical tests confirmed. Thus we do not pursue this approach.
We do, however, make use of the result (16) by first pre-filtering the map assuming a value for R equal to the mean of its prior range. The advantage of this pre-filtering stage is that, even in the presence of objects of different sizes, the filtered field often has a pronounced local maximum near many real objects, provided the prior range on R is not too wide and the value of R employed in the filtering process is not too far from the true object sizes (Fig. 1).
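The behaviour of the likelihood as a function of the template size can be explored directly once a position is fixed. In the following noiseless one-dimensional sketch (our own illustration; all values are arbitrary), the log-likelihood maximized over the amplitude at a known position is computed on a grid of R values; its maximum recovers the true source size, and the amplitude then follows from the ratio in (16):

```python
import numpy as np

x = np.arange(-100, 101).astype(float)
R_true, A_true = 5.0, 10.0
d = A_true * np.exp(-0.5 * (x / R_true) ** 2)       # noiseless source at the origin

def peak_log_like(R):
    """Log-likelihood maximized over A at X = 0, up to the constant c',
    for white noise: (d.t)^2 / (2 t.t)."""
    t = np.exp(-0.5 * (x / R) ** 2)                  # unit-height template of size R
    return (d @ t) ** 2 / (2.0 * (t @ t))

Rs = np.linspace(2.0, 9.0, 141)
R_best = Rs[np.argmax([peak_log_like(R) for R in Rs])]
t_best = np.exp(-0.5 * (x / R_best) ** 2)
A_best = (d @ t_best) / (t_best @ t_best)            # amplitude from (16)
# in this noiseless toy, R_best lands at R_true and A_best at A_true
```

In PowellSnakes this one-dimensional search over R is not done on a grid but is part of the Powell minimization described next.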
5.2 Two-dimensional Powell minimization step
In the next step of our analysis, we launch N= 100 (see below) downhill minimizations in the two dimensions (X, Y) using Powell's algorithm (Press et al. 1992), with starting points chosen as described below. This method is a simple direction-set algorithm that requires only function values. The method makes repeated use of a one-dimensional minimization algorithm, which in our case is Brent's method: an interpolation scheme that alternates between parabolic steps and golden sections. In addition, we 'enhance' the basic Brent line minimizer by introducing a 'jump' procedure designed to avoid small local minima (such as those resulting in the posterior from the noise background). A description of this 'jump' procedure is given below. The reason for this two-dimensional minimization step is simply to obtain a good estimate of the (X, Y) positions of the sources. As a by-product, using formula (16), we also obtain an approximate estimate of the putative source amplitude A.
Assuming that the sensitivity and resolution of the experiment are such that we expect only a very small number of partially resolved sources, one may assume that each patch of beam-sized area contains at most one real peak. The likelihood peaks in the positional subspace (keeping A and R constant) are not expected to be narrower than the original beam, at least in the white-noise-only case. This is because the positional likelihood surface is simply the correlation of the beam shape with, on average, another beam shape, possibly with a different R. Thus the total area over which the correlation has values significantly different from 0 is expected to be larger than the larger of the two original beam areas. These assumptions naturally define the coarseness with which one should explore the patch, namely the beam size. Since we assume that beam shapes with different radii may occur, the finest required resolution corresponds to the smallest beam. As there is no prior information to distinguish one part of the patch from another, we assume that a uniform rectangular grid, with as many cells as the total number of 'beams in the patch', is perfectly adequate to define the starting points of the 'snakes'.
We add an extra two-dimensional random offset to the central point of each grid cell in order to avoid unwanted correlation effects due to the perfect spacing of the grid. One may argue that, in addition to the real likelihood peaks (those resulting from the sources in the map), we should expect a large number of fake ones resulting from the interaction of the filter with the noise fluctuations (Fig. 1). We allowed for this possibility by devising a 'jump' procedure (see below) to prevent the 'snakes' from being 'caught' on those fake peaks. Nevertheless, as in the case of a CMB background, when the number of these spurious peaks greatly exceeds that of the real ones, one should increase the density of 'snakes' (Fig. 2). For the white noise detection examples presented in Section 8, we have taken all of the above into consideration. The number of snakes chosen, N_{Snakes}= 100, meets these requirements for both cases [i.e. low integrated signal-to-noise ratio (LISNR) and Hobson–McLachlan (HMcL); see Section 8]. As we assumed that no sources lie on the patch borders, we did not include those pixels.
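As an illustration of the starting-point layout described above, the following sketch builds one 'snake' per beam-sized cell of a uniform grid, adds a random offset inside each cell, and keeps the points away from the patch borders. The function name and its arguments are ours, for illustration only, and are not part of PowellSnakes:

```python
import numpy as np

def snake_start_points(patch_size, beam_size, rng=None, border=1.0):
    """Starting points for the 'snakes': one per beam-sized grid cell,
    randomly offset within the cell (an illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n_cells = max(1, int(patch_size // beam_size))   # 'beams in the patch' per axis
    cell = patch_size / n_cells
    centres = (np.arange(n_cells) + 0.5) * cell
    xs, ys = np.meshgrid(centres, centres)
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    # random offset within each cell, breaking the perfect grid spacing
    pts += rng.uniform(-cell / 2.0, cell / 2.0, size=pts.shape)
    # no sources are assumed on the patch borders, so keep clear of them
    return np.clip(pts, border, patch_size - border)
```

With a 200-pixel patch and a 20-pixel beam this yields a 10 × 10 grid, i.e. the N_{Snakes}= 100 used in the text.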
5.2.1 The jump procedure
The Brent line minimizer keeps at any time a set of three pairs of values, each an argument and its function value, where the middle pair is said to be 'bracketed'. This is because the argument of the middle pair lies in the interval defined by the other two and its function value is lower than the function values of the interval-limiting pairs. We call them
Like the simulated annealing example referred to previously, the rationale behind the jump procedure was borrowed from the natural world, but this time from the realm of quantum mechanics. We imagine our current best guess, the bracketed pair [b, f(b)], as a particle inside a potential well defined by the limiting pairs (left and right bounds). If we adopt the 'classical' point of view, there is no way the particle could possibly escape from the bracketing barriers. This is exactly what happens with the traditional Brent procedure. If instead we take a quantum approach, it is still possible for the particle to tunnel out through the barrier.
The transmission coefficient T of a particle with energy E which faces a potential barrier of height V_{0} (V_{0} > E) and width l is given by
where m is the particle mass and ħ is the Dirac constant, and we assume ρ_{2}l≫ 1 (Cohen-Tannoudji et al. 1973). From this quantum mechanical example we retain only the negative exponential dependence on the barrier length (the jump distance in our jump model) and a characteristic constant depending on the difference between the particle energy and the barrier maximum potential. Following this example we define an exponentially distributed random variable Λ with mean θ. The next step is to define the dependence of θ on the parameters, bearing in mind the behaviour of the tunnelling effect. We chose a form in which L is the 'maximum average jump length', f_{s} is the 'function values scale' and Δ is either the 'right barrier'≡Δ_{r}, when we jump forward, or the 'left barrier'≡Δ_{l}, when we attempt a jump in the opposite direction (Fig. 3). The procedure has two tuning parameters: one (whose value is given as input from the parameter file) that divides the maximum allowed range for the current jump in order to produce the value of L, and another whose purpose is to define a scale for the function values. This second value, f_{s}, is the maximum range of function values evaluated at the locations of the 'snakes' tails' (initial values drawn from the grid as explained above). When we start the minimization process by defining the bracketing interval, the well walls (defined by the differences of the function values, Δ_{r} and Δ_{l}) are usually high, and consequently only very short jumps are allowed. If Δ > f_{s} no jump is attempted. As the minimizer converges and approaches the minimum, the barriers become lower and longer jumps become increasingly probable. This jump procedure has the advantage of exploring the function domain progressively as the algorithm approaches the bottom of the well.
If we happen to hit the global minimum first, then while the solution is still far from the bottom only small jumps are allowed and the probability of tunnelling to a local peak is small. As soon as we reach the bottom, Δ≈ 0, long jumps become probable (θ≈L), but since the function is now close to its lowest value a successful jump is improbable. If instead we start within a local peak then, rather than becoming 'stuck', as the minimizer approaches its last steps long jumps become common and the probability of 'plunging' into a deeper well is very high. One of the main reasons for choosing the exponential distribution, apart from its close relationship with quantum tunnelling, is the simplicity and efficiency of generating random deviates that follow it (provided we already have a routine producing uniform random deviates within a given interval). We proceed as follows.
For each Brent iteration:
draw a sample u from a uniform distribution on [0, 1). If u≥ 0.5 choose a positive (forward) jump, otherwise move backwards;
draw a jump length from the tunnelling distribution. If the jump ends up inside the 'inner zone' we reject it, because we are still in the well from which we started. If the jump lands outside the 'inner zone' we evaluate the function at that point, [z, f(z)]. If the new function value is lower than the one that initiated the jump, f(z) < f(b), we accept it and restart the Brent minimizer from this new value.
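The two steps above can be sketched as a single jump trial. We assume here, consistent with the tunnelling analogy and the cutoff at Δ > f_s, that the mean jump length behaves as θ = L exp(−Δ/f_s); the exact form of θ is given by the paper's (elided) formula, and all names below are illustrative:

```python
import math
import random

def try_jump(f, b, f_b, delta, f_scale, max_jump, inner, rng=random):
    """One tunnelling trial from the bracketed point [b, f(b)] (a sketch).

    delta    : barrier height in the chosen direction (Delta_r or Delta_l)
    f_scale  : function-values scale f_s, estimated from the snakes' tails
    max_jump : maximum average jump length L
    inner    : half-width of the 'inner zone' around b, inside which a
               jump is rejected (we would still be in the same well)
    """
    if delta > f_scale:
        return b, f_b                      # wall too high: no jump attempted
    # u >= 0.5 -> forward jump, otherwise backwards
    direction = 1.0 if rng.random() >= 0.5 else -1.0
    theta = max_jump * math.exp(-delta / f_scale)   # assumed mean jump length
    z = b + direction * rng.expovariate(1.0 / theta)
    if abs(z - b) <= inner:
        return b, f_b                      # landed in the well we started from
    f_z = f(z)
    # accept only a strictly lower function value; Brent restarts from there
    return (z, f_z) if f_z < f_b else (b, f_b)
```

In this sketch the exponential deviate comes directly from a uniform generator, mirroring the efficiency argument made in the text.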
5.2.2 The effectiveness of the jump procedure in recovering ‘lost snakes’
Our extensive set of simulations, covering a typical assemblage of different scenarios, shows that the jump procedure is fairly effective. We achieved the same quality of results using ∼20–25 per cent fewer snakes. Within our current setup, however, its importance is far from critical, since speed is not a concern for us: the extra level of complexity introduced by the procedure could have been avoided simply by launching extra 'snakes'. Nevertheless, in scenarios where the cost of starting additional 'snakes' is much higher, or where the problem cannot be circumvented by setting up more starting points for the minimization, we believe the jump procedure has an important role to play.
5.3 Four-dimensional Powell minimization step
Instead of simply repeating the filtering procedure for different values of the R parameter, we follow a rationale that mimics the one behind the immense speed-up of the FFT. The FFT is fast because it reduces, by factorization, the number of operations needed to compute the 'complete set' of Fourier coefficients. If we need only one or two coefficients (more precisely, fewer than ln(n), where n is the size of the vector), then it is more advantageous to use the traditional formula directly. Hence, we now search the full parameter space (X, Y, A and R) with the help of the Powell minimizer, but this time using the previously estimated parameters (from the two-dimensional step) as starting values. We do not expect the 'Powell snake' to unfold significantly outside a small neighbourhood of the initial value. Thus, since we no longer need to evaluate the likelihood outside this small positional range (just as in the FFT analogy, where we want only one or two coefficients), working in real space proves more advantageous.
In the next step, we perform multiple Powell minimizations in the full four-dimensional space, starting each minimization from the (X, Y, A, R) positions found in the previous step. In this step the 'jump' procedure described above is not used; this is one of the main reasons why we first performed the two-dimensional step (and the prefiltering). From our experience, after the two-dimensional step the likelihood manifold in the neighbourhood of the solution vector {X, Y, A, R} obtained so far is rather smooth and 'well behaved', so a procedure to avoid local maxima is no longer required. The resulting endpoint of each minimization is a local maximum of the posterior, giving the optimal parameter values for some putative detected object. A Gaussian approximation to the posterior is constructed about the peak and the detection is either accepted or rejected based on an evidence criterion (see Section 6). If the detection is accepted, an object with the optimal parameter values is subtracted from the map before the next downhill minimization is launched.
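A minimal sketch of this refinement stage, using SciPy's implementation of Powell's direction-set method in place of the authors' own minimizer (the function name and its arguments are ours):

```python
import numpy as np
from scipy.optimize import minimize

def refine_snake(neg_log_post, start, bounds=None):
    """Refine one 'snake' in the full (X, Y, A, R) space with Powell's
    method, starting from the two-dimensional-step estimate (a sketch)."""
    res = minimize(neg_log_post, np.asarray(start, dtype=float),
                   method="Powell", bounds=bounds,
                   options={"xtol": 1e-6, "ftol": 1e-8})
    # optimal parameters and the peak log-posterior value
    return res.x, -res.fun
```

The endpoint would then be passed to the Gaussian (evidence) approximation of Section 6 for acceptance or rejection.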
5.4 Error estimation – Fisher analysis
For each accepted object, estimates of the uncertainties on the derived parameters are obtained from the Gaussian approximation to the posterior around that peak. In detail the covariance matrix of the parameter uncertainties has elements C_{ij}≡〈δa_{i} δa_{j}〉, which can be obtained by inverting (minus) the Hessian matrix at the peak. In the case of uniform priors, one has simply
Here we present an estimation of the putative source parameters' error bars using the Fisher analysis over the model H_{1}. We present two different cases:
We start by defining F_{ij} as the negative expectation of the log-likelihood Hessian matrix, in other words the 'Fisher information matrix'. We assume, as already stated, a Gaussian approximation of the likelihood around its maximum in parameter space.
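For a Gaussian white-noise likelihood the Fisher matrix reduces to F_ij = σ^{-2} Σ_k (∂τ_k/∂a_i)(∂τ_k/∂a_j). The sketch below evaluates this numerically and inverts it for the Cramér–Rao error bars; it is a numerical stand-in for the analytic results derived in this section, and the helper name is ours:

```python
import numpy as np

def fisher_errors(template, params, sigma, eps=1e-5):
    """Cramér-Rao error bars under white noise via a numerical Fisher
    matrix (a sketch; template(params) returns the model map)."""
    p0 = np.asarray(params, dtype=float)
    n = p0.size
    # central-difference derivatives of the template w.r.t. each parameter
    grads = []
    for i in range(n):
        dp = np.zeros(n)
        dp[i] = eps
        grads.append((template(p0 + dp) - template(p0 - dp)) / (2.0 * eps))
    # F_ij = (1/sigma^2) sum_k dt/da_i * dt/da_j for Gaussian white noise
    F = np.array([[np.sum(gi * gj) for gj in grads] for gi in grads]) / sigma**2
    C = np.linalg.inv(F)           # covariance of the parameter estimates
    return np.sqrt(np.diag(C))     # 1-sigma error bars
```

Applied to a circularly symmetric Gaussian template with parameters (A, X, Y, R), this reproduces the qualitative scalings derived analytically below.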
Let be the likelihood of model H_{1}, then
Assuming a Gaussian model for the diffuse background we have where N^{−1}_{ij} is the (ij) element of the inverse correlation matrix of the diffuse background, p_{i} is the i-th map pixel, τ_{i} is the i-th pixel of the object template and c is an unimportant constant. The parameters of the object template are the following: A, the amplitude of the object; r_{0}, the positional parameters; and which represents the ensemble of the remaining parameters. In this analysis we assume a circularly symmetric source. Thus, contains only a single parameter, R, which provides a scale for the radial dimension of the object, i.e. R≡'radius of the object'. We will assume, from now on, the following functional dependence of the source template on its parameters:
5.4.1 White noise
For a background diffuse component generated from a Gaussian stationary process with coherence length much smaller than the pixel size, i.e. Gaussian white noise, the following condition applies:
This is the situation when dealing with a system where the instrumental pixel noise is the dominant source of the background component. Adding the condition of statistical homogeneity, and substituting both conditions into the log-likelihood expression, this becomes Evaluating expression (24) for all possible parameter combinations, and using the sort criterion {A, R, X, Y}, one obtains where {α, β, δ, γ} are pure numerical constants and are the values of the template parameters that maximize the likelihood. We have used pixel metrics, together with the assumption that the sums can be evaluated in the continuous limit (this introduces only minor errors). When considering a source template with exactly the same functional dependence as equation (6), the numerical constants become Now inverting the F matrix, where and using the Cramér–Rao inequality, which in this case reduces to an equality since we are dealing with a maximum-likelihood estimator that is efficient, and writing the above expression (34) explicitly for each parameter, one finally obtains the desired parameter error bars: Defining the 'peak signal-to-noise ratio' and substituting into (35), we obtain another way of expressing the error bars, in which we have emphasized their dependence on the 'quality' of the signal.
6 BAYESIAN OBJECT VALIDATION
It is clear that a key component of our approach is the step of accepting or rejecting each posterior peak as a real object. Indeed, a reliable means of performing this step is crucial for the method to function properly. The decision whether to accept or reject a detection is based on the evaluation of the Bayesian evidence for the two competing models for the data given by equation (5). PowellSnakes evaluates the model evidences using a Gaussian approximation to the logarithm of the likelihood ratio of the competing models:
It does so by expanding equation (37) around its maxima in parameter space and discarding all terms of third order and above. As the null model (H_{0}≡'there is no source centred in region S') has no parameters, and thus behaves like a constant over the parameter space, maximizing the logarithm of the likelihood ratio yields the same result as the maximum-likelihood estimator of the source parameters of the H_{1} model alone. Therefore, finding the peaks of the logarithm of the likelihood ratio corresponds to computing the maximum-likelihood estimator of the putative source parameters. The precise definitions of the models H_{0} and H_{1} are given below. In general, however, we shall adopt a formulation of the model selection problem in which each model H_{i}(i= 0, 1) is parametrized by the same parameters a= (X, Y, A, R), but with different priors. In this case the evidence for each model is
where the likelihood function L(a) is the same for both models and is given by (10). Although not required by the method, we will assume that for each model the prior is separable, as in equation (11), so that for i= 0, 1, where is the vector position of the source. Moreover, we shall assume uniform priors within specified ranges on each parameter, although the method is easily generalized to non-uniform priors. The general form of our models for the data is
If the region S corresponds to the coordinate ranges [X_{min}, X_{max}] and [Y_{min}, Y_{max}], then where S is the area of the region S. We assume the prior on R is also the same for both models, so π_{0}(R) =π_{1}(R) ≡π(R), but the priors on A are substantially different. Guided by the forms for H_{0} and H_{1} given above, we take π_{0}(A) =δ(A) (which forces A= 0) and One could, of course, consider alternative definitions of these hypotheses, such as setting H_{0}: A≤A_{lim} and H_{1}: A > A_{lim}, where A_{lim} is some (non-zero) cut-off value below which one is not interested in the identified object. We shall not, however, pursue this further here. Given the above forms of the priors, the evidence for model H_{0} is simply
since is a constant and the priors are normalized. For model H_{1}, the evidence reads where we have written explicitly the dependence of the evidence on the chosen spatial region S; we have also defined the (unnormalized) two-dimensional marginal posterior, and S is the area of the region S. So far we have not addressed the prior ratio in (5). Although this factor is often set to unity in model selection problems, one must be more careful in the current setting. For the sake of illustration and simplicity, let us assume that the objects we seek are randomly distributed in spatial position, i.e. they are not clustered. In that case, if μ_{S} is the (in general non-integer) expected number of objects per region of size S, then the probability of there being N objects in such a region follows a Poissonian distribution:
Thus, bearing in mind the above definitions of H_{0} and H_{1}, we have Hence, the key equation (5) for model selection becomes where μ=μ_{S}/S is the expected number of sources (centres) per unit area. There is a certain degree of freedom in choosing the level that ρ must reach in order to consider that model H_{1} is present (a detection) or not. This ambiguity may only be overcome by appealing to decision theory (see Jaynes 2003). From now on, we will always assume a criterion of symmetric loss:
When introducing the criterion of symmetric loss, the condition for detection immediately becomes uniquely defined: one accepts the detection if and rejects it otherwise. One can change this decision criterion; for instance, suppose we want to put emphasis on reliability (meaning we are willing to accept an increase in the number of undetected sources in exchange for a significant decrease in the number of fake ones), then one must change the threshold for acceptance/rejection, i.e. the value that ρ must reach. The only remaining issue is the choice of the region S. Our method for locating peaks in the posterior is direct (local) maximization, which yields the parameter values at the peak.
In this peakbased approach, S is taken to be a region enclosing the entire posterior peak in the subspace (to a good approximation). This, in fact, requires a little care. If S were taken to be the full mapped region, then the evidence Z_{1}(S) would have contributions from many local peaks in the posterior. For each putative detection, however, we are only interested in the contribution to Z_{1}(S) resulting from the posterior peak of interest. In practice, our Gaussian approximation to the posterior around this peak means that other peaks will not contribute to our estimate of Z_{1}(S), thus making a virtue out of a necessity. Moreover, as we show below, provided the region S does enclose the posterior peak of interest in the subspace, the resulting model selection ratio ρ will be independent of the region size S.
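The peak-based evidence used here is a Laplace (local Gaussian) approximation: ln Z ≈ ln L_max + ln π(a_max) + (M/2) ln 2π − (1/2) ln det(−H), where H is the Hessian of the log-likelihood at the peak. A generic sketch, with our own function name:

```python
import numpy as np

def log_evidence_laplace(log_like_peak, hessian, log_prior_peak):
    """Gaussian (Laplace) approximation to a single-peak evidence.

    log_like_peak  : ln L at the posterior peak
    hessian        : Hessian of ln L at the peak (negative definite there)
    log_prior_peak : ln of the (normalized) prior at the peak
    """
    H = np.asarray(hessian, dtype=float)
    M = H.shape[0]                      # dimension of the parameter space
    sign, logdet = np.linalg.slogdet(-H)
    assert sign > 0, "Hessian must be negative definite at a maximum"
    return log_like_peak + log_prior_peak + 0.5 * M * np.log(2.0 * np.pi) - 0.5 * logdet
```

Because only the local curvature enters, other posterior peaks do not contribute, which is exactly the behaviour described above; the same Hessian also supplies the covariance matrix of the parameter uncertainties.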
In the present case, using (46) and assuming uniform priors on A and R, one has
Making a four-dimensional Gaussian approximation to the posterior about its peak, one obtains where the elements of (minus) the Hessian matrix C^{−1} are given by (23) evaluated at the peak. The above expression again ignores possible edge effects due to abrupt truncation of the posterior by the priors. In Section 6.4 we give an account of how we solved this problem. Using the expressions (10), (16) and (17) for the likelihood, one finds (in logarithms) Most importantly, since we see that ln ρ is independent of the size S of the region considered in the subspace, provided S encloses entirely (to a good approximation) the peak of interest.
6.1 Acceptance/rejection threshold
where 〈n_{i}n_{j}〉=σ^{2}M_{ij}=N_{ij} and is the generalized signal-to-noise ratio for a unit-amplitude object (in the notation of Savage & Oliver 2007). This is the same as which is the more common form of ν≡'the normalized field amplitude' (López-Caniego et al. 2005). The 'normalized field amplitude' is the amplitude of a random stationary Gaussian field whose power spectrum satisfies the condition , i.e. σ= 1. Using Parseval's theorem, it is straightforward to show that the power spectrum of a background filtered by an unnormalized matched filter satisfies Assuming that μ is neither a function of the position nor of the considered area, μ must be a constant. These conditions have been obtained using the principle of indifference along with the restriction (55). Integrating, one obtains
where N is the expected total number of sources in the patch and Δ_{s} is the total area of the patch. Let us now recover the condition for symmetric loss, ρ > 1, together with the key result for model selection (49). Substituting (59) into it and using expression (10), we obtain
where the prior term is where V_{T} is the total parameter volume, i.e. V_{T}= (A_{max}−A_{min})(R_{max}−R_{min})Δ_{s}, and N is the expected number of sources in the map. In Fig. 4 we display the right-hand side of inequality (60) as a function of the integrated signal-to-noise ratio (ISNR). We plot several curves for typical values. Immediately evident is that each curve has a lower bound which depends on the prior term, which in turn depends on the priors. Regardless of how small the peak is, there is always a lower bound for it to be considered a true detection and not merely a noise fluctuation. This is to be expected from an assessment condition which enforces a policy of robustness against false (spurious) detections. The threshold curve rises in both directions: when the signal (peak) is too weak, because there is a strong possibility of it being a noise fluctuation; and when the signal is high, because there is then a great deal of evidence supporting the presence of an object, so raising the threshold provides an extra level of security against spurious detections without compromising the detection rate.
Expanding using the results from Section 5.4 and substituting them into (61) we obtain
The curves in Fig. 5 show the dependence of the detection threshold on the estimated amplitude of the putative source under study. We used data that closely mimic the example from HM03: V_{T}= 81 000 (5 × 0.5 × 32 400), N= 8, σ= 1, R∈[5, 10] and A∈[0.25, 0.75]. We plot three curves for each prior, with R taking the values {5 (blue), 7 (green), 10 (red)}.
6.2 Detection and characterization robustness
The Bayesian framework provides the necessary tools for both of the primary actions of PowellSnakes: detection and characterization. However, a mismatched prior parameter value has considerably different consequences depending on which of the two we are performing. According to Jaynes (2003), the Bayesian inference rules prescribe that when we have a sufficiently informative likelihood the posterior is only slightly affected by the choice of priors. Hence, one may expect a prior mismatch to have only a minimal impact when dealing with bright sources. Our simulations are consistent with these predictions. When tackling faint sources, however, the situation changes: the likelihood is no longer very informative and the priors play a part. In parameter estimation, since we use only flat priors, they do not affect the parameter estimates. Something quite different happens in the detection step. When performing model assessment the priors must be fully normalized, and choosing a wrong prior parameter range can have an important impact on the quality of the detection, either by significantly increasing the number of false hits (spurious detections) or, on the contrary, by making the algorithm 'blind' to sources (misses).
6.3 Quality of the detection: an upper bound
So far we have only discussed the tuning of the acceptance/rejection threshold to minimize the symmetric loss. We have said nothing about the level of success achieved. In other words, we know our procedure is tuned to minimize the number of undetected plus spurious sources, but we have not yet discussed how well it can do. Following Van Trees (2001), the best a 'detector' can achieve occurs when we have absolute prior certainty about the characteristics of the detection process, which is the same as knowing for sure the true values of all parameters. This condition is also known as a 'simple hypothesis test' and translates into the following prior:
where δ(x) is the Dirac delta function. Recovering expression (49), setting the prior ratio to unity, which stands for 'no previous information favours one model over the other'≡'the data must say it all', and using the condition for symmetric loss (51), we have that Taking logarithms on both sides of the inequality and using expression (10), we obtain where a_{0} is the true parameter vector. The left-hand side of the above inequality is an estimator of ln [L_{1}(a_{0})/L_{0}]. It is easy to show that the likelihood ratio estimator conditioned on 'there is no source' is Gaussian distributed as N[− (1/2)ISNR^{2}_{0}, ISNR^{2}_{0}], where . Hence Substituting into inequality (65), we obtain Each time this inequality is satisfied we presume we are in the presence of a source, a true detection. However, we have assumed the opposite; hence, this is the condition for a spurious detection. Therefore, the probability of the occurrence of a spurious detection is where means 'once we have chosen H_{i}' and erf (x) is the Gauss error function. Since the loss is symmetric, the probability of the other type of error, that a source goes undetected, matches that of the spurious one. Thus, We are now in a position to establish the lower bound for a detection. When performing the detection, two independent and mutually exclusive types of error can occur. The total probability of an error is where the conditioning on a_{0} explicitly expresses the assumption of perfect prior knowledge of the parameter values ('simple hypothesis'). Taking advantage of the symmetry and the normalization condition , we finally get Fig. 6 shows the dependence of the theoretical lower bound of the loss on ISNR_{0}. One can see that ISNR_{0} plays a pivotal role in defining an upper level for the quality of the detection process.
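The lower bound just derived follows directly from the Gaussian distribution of the log-likelihood ratio under H_0: with a threshold at zero, each error probability is (1/2)[1 − erf(ISNR_0/(2√2))]. A small sketch (our own helper, for illustration):

```python
import math

def loss_lower_bound(isnr):
    """Theoretical lower bound on the symmetric loss for a simple
    hypothesis test: under H0 the log-likelihood ratio is distributed
    as N(-ISNR^2/2, ISNR^2), so the error probability on either side
    of the zero threshold is (1/2) * [1 - erf(ISNR / (2*sqrt(2)))]."""
    return 0.5 * (1.0 - math.erf(isnr / (2.0 * math.sqrt(2.0))))
```

At ISNR_0 = 0 the bound is 0.5 (pure guessing) and it decreases monotonically with ISNR_0, reproducing the behaviour shown in Fig. 6.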
In many situations the radius of the source intensity profile, as recorded in the pixel map, may be considered known and constant throughout the patch. One such case is that of point (shapeless) sources, for which one can assume, with a high degree of accuracy, that the pixelized intensity closely follows the point spread function (PSF) of the antenna. Thus, considering a statistically homogeneous background, the ISNR_{0} becomes a function of the source intensity A_{0} only, as α_{0}=t(R_{0})^{T}N^{−1}t(R_{0}) may now be considered constant. This separation allows us to make predictions about the minimum flux that one can reliably detect, taking into consideration our goals and the experimental setup. Let us define the 'normalized integrated signal-to-noise ratio'≡ NISNR, which is the ISNR of a unit-amplitude source template:
This quantity is of considerable importance when establishing a boundary on the source fluxes that can be reliably detected, as it provides a scale of measure: one may think of A (the source amplitude) as being the ISNR measured in units of NISNR. For a diagonal background correlation matrix (white noise, equation 27) and the Gaussian source template (6), an analytical evaluation of the NISNR_{0} is possible. In all interesting scenarios one rarely has an allowed parameter domain that reduces to a single point in parameter space ('simple hypothesis'). The question is then how to define a single number that describes a representative figure for the detection quality upper bound of the ensemble of 'simple hypotheses' constituting the volume of the parameter domain. Within a Bayesian framework we tackle this issue by treating the simple hypothesis parameters as nuisance parameters and integrating them out. Hence, our proposed solution is to average the 'simple hypothesis' loss limit, Loss_{0}, over the properly normalized probability priors for the parameters:
Substituting (72) into equation (75) above, and dropping the zero subscript once it no longer carries any special meaning, we obtain where b stands for the entire parameter set except the position coordinates. We are implicitly assuming a factorizable parameter prior (see formula 11). The above formula plays a key role in defining the expected quality for a given detection and measurement setup: if the predicted detection scenario has a low average ISNR, we should never expect a good detection performance. From experience we have verified that, in real situations, the expected loss is usually much larger than the predicted theoretical bound. When the prior on the source amplitude, π(A), can be assumed flat in the region of detection, and the source radius constant across the patch, an analytical evaluation of the above expression is possible to a good approximation:
where I_{m}≡'minimum ISNR' and I_{M}≡'maximum ISNR'. From Fig. 7 one realizes the importance of previewing the ISNR limits of the measurement setup in order to define the expected upper bound on the detection quality.
6.4 Discussion on truncation of the posterior by the prior
So far, the procedure described (formula 54) does not take into account the fact that the prior might abruptly truncate the posterior before it falls (largely) to zero, and hence it may overestimate the integral. This discussion is essentially directed towards the truncation of the posterior that happens in the {A, R} subspace. When considering the position subspace, the result is independent of the area considered, S, as long as it fully encompasses the likelihood peak, which is always the case when dealing with real scenarios.
When dealing with the source shape and the beam merged into a single template of radius R embedded in a background of white noise only, the effect of truncation was already evident, especially when tackling cases of low and very low ISNR, but it was not extremely severe. This provided the rationale behind the 'tuning parameter' (see below). The effect was not severe because the area under the likelihood fitted fairly well inside the prior box, the error bars of the A and R parameters being comparatively small.
6.4.1 The tuning parameter φ
When assuming flat priors for all the parameters, the priors become constants and can therefore be taken out of the evidence integrals. Thus, one may now use the approximate formula (54) as a fast way of evaluating the evidence integrals, as long as the truncation effects are kept under control. However, especially when dealing with faint sources, we should expect these effects to become significant, and hence they must be taken into consideration. Therefore, we added a tuning parameter, φ, to correct for possible systematic deviations from the true value of the integral, i.e.
The new expression for the assessment level curve (right-hand side of inequality 60) which results from the introduction of the 'tuning parameter' is just the old form with the following change of expression:
6.4.2 How to evaluate the parameter φ
In the simple case of white noise it is possible to compute an average value of the tuning parameter:
where V_{a}≡‘parameter volume of a single peak’. Substituting expression (10) into (80), the ‘average value of the data’ under the source hypothesis becomes the signal vector true value ≡s_{0}. The integral in the numerator is difficult to compute, but restricting our analysis to a white-noise-only background, M^{−1}=I, obviates a numerical evaluation of this integral. From expression (82), one sees that the value of φ depends on s_{0}, the true shape of the object. In Table 1 we display several values of φ which result from applying the above formula to typical scenarios. The values we obtained for the ‘tuning parameter’ are consistent with those we first expected. In scenarios with faint signals (low 〈ISNR_{0}〉) we anticipate the parameter values being close to the priors' lower bounds, and at the same time large error bars are expected; hence the likelihood truncation effects should make a significant contribution to the value of the evidence integral, and the low value of φ should not be a surprise. When the opposite happens (high 〈ISNR_{0}〉), the Gaussian approximation to the likelihood peak is no longer accurate and the large φ is just a consequence of the wider tails of the true likelihood function.

〈ISNR_{0}〉 is small (≤5)  φ∈ (0.66, 1]
5 < 〈ISNR_{0}〉 < 7  φ≈ 1
〈ISNR_{0}〉 is large (≥7)  φ∈ (1, 1.33]
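The role of φ as a multiplicative correction to a Gaussian (Laplace-type) approximation of an evidence integral can be illustrated with a toy one-dimensional likelihood peak; the model and all numbers below are illustrative, not the paper's formula (54):

```python
import numpy as np
from scipy.integrate import quad

# Toy sketch: Laplace (Gaussian) approximation to an evidence-type
# integral, with a multiplicative tuning factor phi correcting for
# truncation of the likelihood by flat prior bounds.
def loglike(a, a0=0.5, sigma=0.1):
    return -0.5 * ((a - a0) / sigma) ** 2

def laplace_evidence(a0, sigma, phi=1.0):
    # curvature (Fisher) at the peak is 1/sigma^2 for this toy model
    h = 1.0 / sigma ** 2
    return phi * np.exp(loglike(a0, a0, sigma)) * np.sqrt(2 * np.pi / h)

# 'True' evidence: likelihood integrated over the flat prior range [0, 1]
true_ev, _ = quad(lambda a: np.exp(loglike(a)), 0.0, 1.0)
```

For a peak lying well inside the prior bounds, φ= 1 recovers the integral; truncation of the peak by the prior bounds is what drives φ below 1 for faint sources.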
6.4.3 Impact of φ on the performance of the algorithm
Using a series of simulations (see Section 8) for different detection scenarios, we estimated the impact of the tuning parameter, φ, on the performance of the algorithm. We concluded that
the algorithm is not very sensitive to changes of φ;
the differences in performance never exceeded 1.5 per cent when compared with the fiducial value φ= 1;
the largest difference occurred when we dealt with very small values of 〈ISNR_{0}〉(<4), or very large (>8) and
most important of all: ‘the exact values of φ always produced the best results’.
Therefore, the golden rule displayed in Table 2 should be applied.
〈ISNR_{0}〉≤5  Use φ= 0.75 
5 < 〈ISNR_{0}〉 <7  Use φ= 1.00 
〈ISNR_{0}〉≥7  Use φ= 1.15 
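The golden rule of Table 2 amounts to a simple lookup; a minimal sketch (the function name is ours):

```python
def tuning_parameter(isnr):
    """Golden-rule choice of the tuning parameter phi (Table 2)."""
    if isnr <= 5:
        return 0.75
    elif isnr < 7:
        return 1.00
    return 1.15
```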
7 INTRODUCING ‘COLOUR’
When dealing with real astronomical detection problems, the diffuse background component of the maps is not usually dominated by the instrumental noise. Nowadays, the most common scenario is one where the largest signal component of the background has an astronomical origin: the ubiquitous CMB and the Galactic foregrounds (free–free, dust, synchrotron). Currently we focus on the CMB, especially because it can be considered statistically homogeneous and isotropic across the entire sphere to great accuracy (assumptions of our simplified model). The CMB radiation field already has an intrinsic coherence length (∼10 arcmin) which, taking into consideration the resolution of current surveys (usually finer than 1 arcmin pixel^{−1}), means it can hardly be considered white. In addition, as it is collected by an antenna with a finite aperture (usually several pixels), whose net effect in Fourier space is that of a low-pass filter, an extra degree of correlation is injected into the background. Hence, only an algorithm which appropriately deals with a background whose power spectrum is not flat can cope with such a set-up. The power spectrum of the simplified background model we propose (see 86) is not yet fully realistic. However, despite its simplicity, this model already captures the main characteristics of this background, allowing a direct comparison between the model predictions and the simulation results. When dealing with a background assumed to be generated by a stationary Gaussian process, but now with an arbitrary coherence length, the condition (27) on the inverse correlation matrix no longer holds. Nevertheless, although the correlation matrix is no longer diagonal, if the condition of statistical homogeneity still holds one may always assume the background correlation matrix to be circulant.
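The practical payoff of the circulant assumption is that such a matrix is diagonalized by the discrete Fourier transform, so its eigenvalues are just the discrete power spectrum. A small numerical check of this standard fact, in one dimension with illustrative numbers:

```python
import numpy as np

# A circulant correlation matrix is diagonalized by the DFT, so its
# eigenvalues equal the FFT of its first row.  Toy 1-D example with a
# symmetric (hence real-spectrum) correlation row.
n = 8
row = np.array([1.0, 0.5, 0.2, 0.05, 0.0, 0.05, 0.2, 0.5])
C = np.array([[row[(j - i) % n] for j in range(n)] for i in range(n)])

eig = np.sort(np.linalg.eigvalsh(C))        # eigenvalues of the matrix
spec = np.sort(np.fft.fft(row).real)        # FFT of the first row
```

This is why, once statistical homogeneity is assumed, working in Fourier space replaces an expensive matrix inversion by a division by the power spectrum.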
Taking advantage of the circulant property of the correlation matrix and transforming to Fourier space, the problem simplifies significantly, since a circulant correlation matrix is diagonalized by the Fourier transform. This is not a severe limitation because, considering only patches of small size, the condition of statistical homogeneity is a good approximation. We assume not only statistical homogeneity but isotropy as well. Most of the formulas presented here hold even if we drop the condition of statistical isotropy. Nevertheless, we will use the statistical isotropy condition throughout the remaining text, because it allows significant simplifications without restricting real applications, as this condition holds rather well in the great majority of the problems of interest. Expressing equation (25) (multiplied by −1) in Fourier space one obtains
where, as usual, k= 2πη. In a similar way, for the likelihood of the background: where the symbols with a tilde are the Fourier transforms of τ and d, respectively, and the remaining symbol is the normalized power spectrum of the background. In what follows, symbols with a tilde on top denote the Fourier transform of the original symbol (without the tilde). We assume that a pre-normalization of the map's pixels has already been performed by dividing the value of each pixel by the map rms value (≡σ), which implies σ= 1 for the resulting map.

Let us now introduce, for the first time, correlation effects in the diffuse background component. This correlation is to be expected not only because it already exists in the original astronomical background (CMB) and in the diffuse foregrounds (Galactic components), but also because of the effect of the antenna PSF. We now consider a simplified model for the autocorrelation function of the background after being passed through the antenna. This model is of the form
where s_{0} is a measure of the ‘coherence length’ of the Gaussian isotropic homogeneous stationary background field (see Goodman 2000). The autocorrelation function is already properly normalized as required by our previous assumptions, i.e., σ^{2}_{b}=〈x_{i}x_{i}〉_{b}=n_{b}(0) = 1. Transforming to Fourier space and using the Wiener–Khinchin theorem (Goodman 2000), we obtain the ‘power spectrum’, with η_{0}= (2πs_{0})^{−1}.

Together with the astronomical components one should always expect the presence of ‘pixel noise’. In our signal model, ‘pixel noise’ stands for all the noise components resulting from the instrumental set-up. We assume the ‘pixel noise’ to be independent from pixel to pixel, hence forming a white noise process. We also assume that the ‘pixel noise’ is uncorrelated with all the astronomical components, thus allowing us to write the total background power spectrum as the sum of these two components:
where (white noise process) we added the constants {w_{I}, w_{B}} in order to allow the signal components to be present in different amounts, hence encompassing a broader range of possible scenarios. These constants must satisfy the normalization condition w_{I}+w_{B}= 1, as the total power spectrum must integrate to unity.

7.1 ISNR – the symmetric loss lower bound
Let us begin by expressing, in Fourier space, the equivalent representation of the ISNR, using the appropriate form of Parseval's theorem:
First we handle the simplest possible case, where the radii of the sources are constant. We start by evaluating the NISNR, as this provides us with a metric to predict how deep we can go in the flux scale without compromising the detection quality. Substituting the expression (87) for the power spectrum and the unit-amplitude Gaussian template (6) into (89), evaluating the integral (which can only be computed numerically) and using typical values for the source and background parameters, we obtain the function plotted in Fig. 8.

When dealing with a white-noise-only background, the NISNR was proportional to the radii of the sources (see formula 74). However, when we introduce correlation, a completely different curve results: the dependence of the NISNR on the object radius is no longer linear and a complex shape emerges.
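The numerical evaluation described above can be sketched as follows, for a unit-amplitude Gaussian template over a two-component (coloured plus white) spectrum. All parameter values and conventions here are illustrative:

```python
import numpy as np
from scipy.integrate import quad

# Sketch: ISNR evaluated as a 2-D isotropic integral in Fourier space,
# ISNR^2 = ∫ |tau~|^2 / P  d^2 eta, for a unit-amplitude Gaussian
# template of radius R over a coloured-plus-white spectrum.
def isnr(R, w_B=0.2, w_I=0.8, eta0=0.0138):
    # |tau~|^2 for tau(r) = exp(-r^2 / 2R^2):  (2 pi R^2)^2 e^{-(2 pi R eta)^2}
    tau2 = lambda eta: (2 * np.pi * R ** 2) ** 2 * np.exp(-(2 * np.pi * R * eta) ** 2)
    power = lambda eta: w_B * np.exp(-eta ** 2 / (2 * eta0 ** 2)) + w_I
    val, _ = quad(lambda eta: 2 * np.pi * eta * tau2(eta) / power(eta), 0, np.inf)
    return np.sqrt(val)
```

With the conventions used in this sketch, the white-noise limit (w_B = 0, w_I = 1) is analytic and gives ISNR = √π R, recovering the linear dependence on the radius; the coloured component breaks that linearity.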
7.2 The ‘whitening’ equalizer
A quantity which plays a central role in the detection process is the likelihood ratio of the competing models (see formulas 83 and 84). After taking logarithms one obtains
Now let us assume that our source template may be modelled by the following expression which is a generalization of equation (6):
where the second factor is the Fourier transform of the source template centred at the origin. Substituting into (90), and applying the inverse bidimensional Fourier transform, one should already be familiar with the resulting expression: the second term is 1/2 of the ISNR expressed in Fourier space, which does not depend on the position coordinates, and the first, in the argument of the inverse Fourier transform, is the ‘linear matched filter’ applied to the map. Furthermore, we can read this expression in a somewhat different manner. Let us write the power spectrum as the product of its square root (the expected amplitude spectrum) with itself. Substituting into the argument of the Fourier transform (first term of 92), and evaluating the ensemble average of the square of the first term of (94), i.e. its power spectrum, we see that after applying the linear filter we have removed the ‘colour’ from the background, transforming it into a ‘white’ Gaussian random field. This is why one calls the linear matched filter a ‘whitening equalizer’. Let us give a name to the resulting map after being ‘whitened’. Of course, the sources buried in the map become ‘whitened’ as well: in fact, the second term of (94) is nothing but a whitened source. Rewriting formula (92) using these new definitions, it can easily be verified that this formula, in which we use the ‘whitened’ counterparts of our entities, reduces to the familiar white-noise likelihood ratio (formulas 64 and 65), expressed for convenience in Fourier space. Hence, everything we said for the white-noise case applies here, even when we must consider the effects of correlation in the background, as long as we use the ‘whitened’ entities instead.

7.3 Fisher analysis in Fourier space
Proceeding with the same line of reasoning as that initiated in Section 5.4, but this time working in Fourier space (83), we start by splitting the evaluation of the Fisher matrix into three different cases.

The first case refers to those coefficients whose differentiation variables b_{i} are not positional. We shall collectively name them as ≡C:
These elements of the F matrix will generally be ≠ 0.
The second case deals with those coefficients where one of the differentiation variables belongs to set b_{i} and the other to x_{i}, the position set:
One may rewrite expression (100) taking advantage of the symmetries of the expressions involved: where η > 0 means that the integral extends over all Fourier modes in half of the Fourier plane. A necessary condition for this class of coefficients to vanish is that the integrand be real. On the other hand, h(b_{β}, r) having reflection symmetry about both axes is a sufficient condition for that to happen, which is the case for our current model (formula 26).
The third case is when we are dealing with the position parameters only:
As in the example given above, h(b_{β}, r) having reflection symmetry about both axes is a sufficient condition for this to happen [ignoring the trivial case h(b_{β}, r) = 0]. This third set of Fisher coefficients will be named ≡P.
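The vanishing of the position–shape cross-terms for a reflection-symmetric template can be checked numerically. Below is a white-noise sketch, with the Fisher elements written as pixel sums of template derivatives; the grid and parameter values are illustrative:

```python
import numpy as np

# White-noise sketch: F_ij = sum_pixels (dt/dtheta_i)(dt/dtheta_j).
# For a reflection-symmetric template the position-shape and x-y
# cross-terms vanish, leaving the block structure described in the text.
x = np.linspace(-10, 10, 401)
X, Y = np.meshgrid(x, x)

def template(X0, Y0, R):
    return np.exp(-((X - X0) ** 2 + (Y - Y0) ** 2) / (2 * R ** 2))

def deriv(i, p, h=1e-5):
    q1, q2 = list(p), list(p)
    q1[i] += h
    q2[i] -= h
    return (template(*q1) - template(*q2)) / (2 * h)

p = (0.0, 0.0, 2.0)                      # (X0, Y0, R)
d = [deriv(i, p) for i in range(3)]
F = np.array([[np.sum(a * b) for b in d] for a in d])
# F[0,1] (x-y) and F[0,2], F[1,2] (position-shape) are ~0;
# the diagonal entries are strictly positive.
```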
Ordering the parameters by collecting the position parameters x_{k} in the upper left-hand corner, the Fisher matrix then exhibits a structure of the form displayed (⊠≡ a non-zero value in general).
One may take advantage of the depicted symmetry of the Fisher matrix when computing its inverse. As an illustrative example, and again with the help of formula (34), one may explicitly write the error bars on the position parameters, where we have employed the circular symmetry of the proposed template (26).

7.4 Extending to real scenarios
So far we have only considered an extremely simplified model. Our purpose was twofold.
Make the problem practical in terms of required resources.
Avoid obscuring and cluttering the fundamentals with complex ‘decoration’ details.
However, we should always keep in mind that the experimental set-up we wish to address in the future, the Planck Surveyor satellite, and the object of its study, the full sky, add other significant challenges as well. Three of them are worthy of further discussion.
The inhomogeneity of the statistical properties of the background. When introducing the Galactic emission, which is mainly confined to a narrow band around the Galactic plane, a severe breakdown of the condition of statistical homogeneity occurs. Moreover, the instrumental pixel noise is never completely homogeneous. As a result of the scan strategy, some zones of the sphere are sampled a greater number of times than others, increasing the integration time and leading to lower noise levels than in the areas where the beam spends less time. Using small patches (∼128/256 × 128/256 pixel) and recomputing the background for each of them reduces this problem to a level which can be safely ignored. It must be noted, though, that this is not a limitation of the Bayesian framework, which, on theoretical grounds, can deal equally well with inhomogeneous backgrounds. We only take advantage of the circulant properties of the covariance matrix, transforming to Fourier space, as an implementation technique which allows us a much simpler and faster solution to our problem, rendering it practical and efficient. However, the same set of assumptions is required in the derivation of the optimal linear filters, together with an implicit assumption of Gaussianity (HM03).
The non-Gaussianity of the Galactic diffuse foregrounds. The Galactic diffuse fields, for instance dust, can show considerable levels of non-Gaussianity but, as a result of the central limit theorem, introduce only extremely small distortions in the Gaussian distribution of the Fourier modes. The non-Gaussian part of the field brings in no more than small correlations between the different Fourier modes, which can be safely ignored when compared, for instance, with the potentially much more serious inhomogeneity of the background (HM03; Rocha et al. 2005).
The ‘confusion’ noise resulting from the contribution to the background of the faint unresolved point sources. This is the dominant background component of the Planck high-frequency (545 and 857 GHz) maps at high Galactic latitudes. This component may be modelled as a Poissonian sampling process with a constant rate. Any realistic background model should take this into account.
8 RESULTS
We tested the presented ideas by performing an extensive set of simulations using two different toy models and different detection scenarios (the latter solely for the white-noise-only model, see Section 8.2). In both cases we used a Gaussian source template (6). The parameters for each source were drawn from uniform distributions with the bounds given in Table 4. The patches are square grids of 200 × 200 pixel with a border of 2× the largest source radius, in pixels. The sources were spread uniformly throughout the patch. Care was taken to avoid source clustering and the presence of sources in the patch borders. We imposed a safe distance between sources by defining a ‘bounding’ square centred on each source, with side =γ× 2 ×R pixels, where R≡‘radius of object’ (see Table 4 for the values of γ). We forbade the overlapping of these ‘bounding’ squares.
Detection models  
LISNR16var  LISNR16fix  Hobson–McLachlan^{a}  
R_{max} (pixel)  5.4  5.4  10 
R_{min} (pixel)  3.21  3.21  5 
A_{max}  0.76  0.8  1 
A_{min}  0.2  0.22  0.5 
σ  1  1  2 
N_{objs}  16 (variable)  16 (fixed)  8 (variable) 
γ  1.5  1.5  2.5 
^{a}astro-ph/0204457 (HMcL).
8.1 Performance of the algorithms
We evaluated the performance of the different algorithms by computing the ‘total error’ defined as
where
an ‘undetected source’ occurs when a simulated object is not recognized by the algorithm as a source;
a ‘spurious detection’ happens when
 –
an object is detected,
 –
there is no simulated object such that the entire set of its parameters belongs to the volume defined by the estimated parameters of the detected object ±3σ_{p};
the ‘percentage’ is computed by dividing these quantities by the total number of sources in the patch.
The algorithm that minimizes the ‘total error’ is the one that performs best.
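As a sketch, the ‘total error’ defined above reduces to (the function name is ours):

```python
def total_error(n_undetected, n_spurious, n_sources):
    """'Total error' in per cent: undetected sources plus spurious
    detections, each expressed as a percentage of the total number of
    simulated sources in the patch."""
    return 100.0 * (n_undetected + n_spurious) / n_sources
```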
8.2 White noise background
In Table 3 we show the performance of PowellSnakes, where φ is the tuning parameter. The sources are embedded in a background which is a stationary Gaussian random field with a coherence length much smaller than 1 pixel, usually known as a Gaussian white noise process.
Type of simulation  LISNR16fix  LISNR16var  HMcL 
〈ISNR〉  3.84  3.62  4.68 
φ  0.75  0.66  0.66 
N_{simul}  5000  1000  10 000 
Per cent detections  67.41  56.41  82.95 
Per cent spurious  9.60  8.62  8.19 
Total error (per cent)  42.19  52.20  25.15 
Two scenarios were tested.
HMcL (Hobson–McLachlan). With this example we tried to mimic, as closely as possible, the HM03 ‘Toy problem 4.3’.
LISNR. This example was inspired by another from HM03, in which an SZ example was given, but we retained only the range of ISNR involved. We did not try to simulate the SZ case at all, as we have always worked with the Gaussian source template.
Two situations were investigated.
 –
With a fixed number of sources per patch: ‘fix’.
 –
With a variable number of sources per patch: ‘variable’. In this case the number of sources for each patch was drawn from a uniform distribution whose minimum limit was zero and the maximum was the value displayed in Table 4 (N_{objs}).
Table 4 summarizes the parameter values of the background and foregrounds (sources) for each of the simulated examples. Fig. 1 shows a typical likelihood manifold for this case (A and R dimensions suppressed).
Each run (complete clean-up of a map) takes less than a second (∼2 maps per second) on a regular PC (Pentium IV, 2.39 GHz). A considerable effort was made to optimize the code. Intelligent management of previously computed values was essential in achieving such high performance. Provisions for parallelizing the code were made but not yet employed.
8.2.1 Discussion
One of our major goals when we started this project was to make Bayesian methods run fast enough to allow us to collect significant statistics by performing massive simulations in a reasonable amount of time. We have fulfilled this goal: as Table 3 shows, several thousands of simulations were performed for each proposed scenario, making our figures statistically sound. Our results are consistent with the theoretical framework presented here. Examples with lower ISNR show higher levels of ‘total error’, as expected, and significantly higher than the lower bound (see Section 6.3). Cases with a fixed number of sources per patch always show a lower error. This should not be a surprise: in our priors we assumed a constant expected number of sources per patch, and the lower the dispersion in the number of sources, the better the prior fits the data, hence the higher performance. The ‘tuning parameter’ (Section 6.4) is also worth a mention. In Table 3 we show only the results achieved using the ‘tuning parameter’ values which performed best. All the presented examples are low-ISNR cases; thus we employed only values lower than 1 (see Table 3). These ‘best choice’ values are consistent with our predictions.
8.3 ‘Coloured’ background
Our arrangement for the ‘coloured’ background simulations followed the same guidelines as those for the white-noise-only case, but greater care was taken concerning the realism of the set-up. We used a pixel size of 1 × 1 arcmin^{2} and small patches of 200 × 200 pixel (∼3.33°× 3.33°). The background consisted of two uncorrelated components.
A stationary Gaussian random field with a Gaussian power spectrum. The power spectrum, already smoothed by the antenna, was modelled by formula (86) with a coherence lengthscale parameter of s_{0}= 11.57 pixel (see Fig. 11).
Pixel noise with a ‘white’ profile with constant level across the map.
The CMB rms level used was 10 μK and the pixel noise 20 μK which together implied w_{I}= 4/5 and w_{B}= 1/5 (see expression 87, Fig. 12).
The simulated sources had amplitudes ranging from 0.9 to 1.0, in normalized units (normalized by the total rms noise level). The radii of the sources were inside the interval [3.2, 5.0] pixel. The whole setup results in an average 〈ISNR〉 of ≃ 3.84. In each simulated map we put 16 objects with their parameters drawn from uniform distributions (whose limits are given above). Following the same prescription as for the white noise only case, we prevented the sources from forming clusters and being placed on the patch borders (see Fig. 13). Table 3 contains the results obtained. We have performed a total of 5000 simulations.
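A minimal sketch of this simulation set-up, omitting the overlap and border checks described above and with our own implementation choices for generating the correlated field:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the 'coloured' set-up: a 200x200 patch with a Gaussian-
# spectrum CMB component (rms 10 uK, s0 = 11.57 pixel), white pixel
# noise (rms 20 uK) and 16 Gaussian sources with uniformly drawn
# parameters.  Field-generation details are illustrative.
n, s0 = 200, 11.57
ky, kx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
P = np.exp(-(kx ** 2 + ky ** 2) * (2 * np.pi * s0) ** 2 / 2)

cmb = np.fft.ifft2(np.sqrt(P) * np.fft.fft2(rng.standard_normal((n, n)))).real
cmb *= 10.0 / cmb.std()                            # CMB rms: 10 uK
patch = cmb + 20.0 * rng.standard_normal((n, n))   # pixel noise rms: 20 uK

rms = np.hypot(10.0, 20.0)                         # total background rms
yy, xx = np.mgrid[0:n, 0:n]
for _ in range(16):                                # 16 sources per map
    A = rng.uniform(0.9, 1.0) * rms                # normalized amplitudes
    R = rng.uniform(3.2, 5.0)                      # radii in [3.2, 5.0] pixel
    x0, y0 = rng.uniform(15, n - 15, size=2)       # keep away from borders
    patch += A * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * R ** 2))
```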
8.3.1 Discussion
When we introduce correlation in the background, the likelihood manifold changes considerably from the white-noise-only example. Keeping {A, R} constant and looking at the position subspace, one may see that the likelihood maxima are now extremely narrow and high, and have become surrounded by subsidiary peaks of considerable height (see Figs 2 and 14). This scenario is even more acute when the proportion of instrumental noise becomes lower in relation to the ‘coloured’ component. In Fig. 15 we depict an exaggerated case (for clarity) where the instrumental noise is only 10 per cent of the astronomical component. Nevertheless, we should stress that current detection instruments are already getting very close to these values. The Planck mission Low Frequency Instrument (LFI) (ESA 2005) has a predicted average ratio of ∼20 per cent between the astronomical components and the detectors' instrumental noise, though a more balanced proportion should be expected in the majority of existing instrumental set-ups. In such scenarios the search for the likelihood maxima becomes harder. The likelihood peaks, being thin and completely surrounded by a very high number of other fake peaks, are extremely difficult to find. One now needs to unfold a much higher number of ‘snakes’, typically four times the number of the white-noise-only case. The use of the ‘jump procedure’ is now mandatory in order to pass through the large accumulation of fake peaks which encircles the real ones. Since the likelihood peaks are now extremely thin, the error bars on the source localization are extremely small, usually much less than a pixel. This is equivalent to assuming that the estimated values of the position parameters perfectly match the true ones.
When we then employ the Bayesian procedure for validation of the detection, we should replace the flat position prior (see 52) by a Dirac delta centred on the previously estimated values (see 107) for the position of the source:
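Schematically, this replacement collapses the position integrals in the evidence (a sketch in our notation; $\hat{X}, \hat{Y}$ denote the estimated position and $\pi$ the remaining priors):

```latex
\int \mathcal{L}(X,Y,A,R)\,\delta(X-\hat{X})\,\delta(Y-\hat{Y})\,
\pi(A,R)\,\mathrm{d}X\,\mathrm{d}Y\,\mathrm{d}A\,\mathrm{d}R
\;=\;
\int \mathcal{L}(\hat{X},\hat{Y},A,R)\,\pi(A,R)\,\mathrm{d}A\,\mathrm{d}R ,
```

so only the two-dimensional $(A, R)$ integral remains to be approximated.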
As a final note on the speed of the code execution, we must note the substantial decrease in performance when dealing with the ‘coloured’ background. The average time for completing a full patch is now about 8 s, roughly an order of magnitude more than in the previous case. Despite this significant reduction in speed, our initial purpose (performing a massive amount of simulations) has not been compromised, as one can still undertake thousands of simulations in a reasonable amount of time.
A summary of the sequence of steps performed by PowellSnakes, version I (the version here exposed), is given in Appendix A.
9 CONCLUSIONS
The detection and characterization of discrete objects is a common problem in many areas of astrophysics and cosmology. When performing this task, most methods assume a smoothly varying background with a characteristic scale or coherence length much larger than the scale of the discrete objects under scrutiny. However, high-resolution observations of the CMB pose a challenge to such assumptions. CMB radiation has a coherence length of the order of ∼10 arcmin, close to that of the objects of interest, such as extragalactic ‘point’ sources or the SZ effect in galaxy clusters. In addition, the instrumental noise levels can be greater than the amplitude of the discrete objects. The common approach to dealing with such difficulties is to apply a matched filter to the initial map and analyse the filtered map instead. These approaches have reported good performance. However, the filtering process is optimal only among the limited class of linear filters, and is dissociated from the subsequent object detection step performed on the filtered maps. HM03 were the first to introduce a Bayesian approach to the detection and characterization of discrete objects in a diffuse background. Their implementation used MCMC sampling from the posterior, which was very computationally expensive. This prompted us to explore a new, fast method for performing Bayesian object detection in which sampling is replaced by multiple local maximization of the posterior, and the evaluation of errors and Bayesian evidence values is performed by making a Gaussian approximation to the posterior at each peak. In this paper we presented such a novel approach: PowellSnakes. We start by prefiltering the map and explicitly show that, for a given value of R (the radius of the source), peaks in the filtered field correspond to peaks in the log-likelihood considered as a function of the positions of the sources.
We replace the standard form of the likelihood for the object parameters by an alternative exact form that is much quicker to evaluate. Next we launch N downhill minimizations in the two-dimensional (X, Y) subspace using the Powell algorithm. In order to avoid local maxima in the likelihood surface we devised the so-called ‘jump procedure’, a new concept borrowed from the realm of quantum mechanics. Next we perform multiple Powell minimizations in the full four-dimensional space (X, Y, A, R), starting each minimization from the positions found in the previous step. To evaluate the performance of an algorithm, we introduced here for the first time, in the context of the proposed subjects, the concept of ‘symmetric loss’. The peak-based prior, P, was proposed and an upper bound on the quality of a detection presented. We showed that the quantity NISNR plays an important role in establishing a boundary on the source fluxes that can be reliably detected. We included a discussion of the truncation of the posterior by the prior and proposed a novel solution by introducing the ‘tuning parameter’, φ. We studied in detail the consequences of adding correlation to the diffuse background in which the sources are embedded. As the linear filter works as a whitening equalizer, the coloured noise case reduces to the white noise case as long as we use the ‘whitened’ map and sources. Using simulations of several typical scenarios, we showed that our approach performs very well on both white and coloured backgrounds. We found that our results are consistent with our theoretical framework. For the white noise background, cases with a lower ISNR exhibit higher levels of ‘total error’, as expected. Cases with a fixed number of sources per patch show a lower error. The number of spurious detections increases with this prior; however, the number of detected sources increases by a larger amount, resulting in a lower ‘total error’.
When we introduce colour in the background the likelihood changes considerably: the maxima in the position subspace are now extremely narrow and high, surrounded by subsidiary peaks of considerable height. The search for the likelihood maxima becomes harder: one has to resort to a larger number of ‘snakes’, and the usage of the jump procedure becomes mandatory. In this case the errors in the source localization are very small, usually much less than a pixel. This means that we can replace the flat position prior by a Dirac delta function centred on the previously estimated values.
Furthermore, this approach yields a speed-up over sampling-based methods of many hundreds, making the computational complexity of the approach comparable to that of linear filtering methods. Here we presented PowellSnakes in its first incarnation: ‘PowellSnakes I’. An account of an updated implementation, ‘PowellSnakes II’, will be published shortly. The application of the method to realistic simulated Planck observations will be presented in a forthcoming publication.
PC thanks the Cavendish Astrophysics Group of the University of Cambridge for support and hospitality during the progression of this work. GR acknowledges support from the US Planck Project, which is funded by the NASA Science Mission Directorate. GR would like to acknowledge useful discussions with Krzysztof Górski and Charles Lawrence.
REFERENCES
Appendix
APPENDIX A:
PowellSnakes is a new fast Bayesian approach for the detection of discrete objects immersed in a diffuse background. This new method speeds up traditional Bayesian techniques by
replacing the standard form of the likelihood for the parameters characterizing the discrete objects by an alternative exact form that is much quicker to evaluate;
using a simultaneous multiple minimization code based on Powell's direction set algorithm to locate rapidly the local maxima in the posterior and
deciding whether each located posterior peak corresponds to a real object by performing a Bayesian model selection using an approximate evidence value based on a local Gaussian approximation to the peak. The construction of this Gaussian approximation also provides the covariance matrix of the uncertainties in the derived parameter values for the object in question.
This new approach provides a speed up in performance by a factor of ‘100’ as compared to existing Bayesian source extraction methods that use MCMC to explore the parameter space, such as that presented by HM03. The method can be implemented in either real or Fourier space. In the case of objects embedded in a homogeneous random field, working in Fourier space provides a further speed up that takes advantage of the fact that the correlation matrix of the background is circulant. Its performance is found to be comparable to, if not better than, that of frequentist techniques such as applying optimal and wavelet filters. Furthermore, PowellSnakes has the advantage of consistently defining the threshold for acceptance/rejection based on priors; the same cannot be said of the frequentist methods.
Let us start by recalling that Powell's method is the prototype of ‘direction-set methods’ for the maximization or minimization of functions. These are the methods to apply when derivatives cannot easily be computed. The method requires a one-dimensional minimization sub-algorithm, e.g. Brent's method. Brent's method is an interpolation scheme that alternates between parabolic steps and golden sections; it is used in one-dimensional minimization problems without calculation of the derivative (for more details see Teukolsky & Vetterling 1992, pp. 389, 395, 406).
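As an illustration of the minimizer itself, SciPy exposes Powell's direction-set method (with Brent-type one-dimensional line minimizations) directly; the toy objective below is ours, not the PowellSnakes likelihood:

```python
import numpy as np
from scipy.optimize import minimize

# Powell's method: derivative-free minimization by successive 1-D line
# minimizations along an evolving set of directions.  Toy 2-D objective
# with a single peak (minimum of the negative) at (3, 1).
def neg_peak(p):
    x, y = p
    return -np.exp(-((x - 3.0) ** 2 + (y - 1.0) ** 2) / 2.0)

res = minimize(neg_peak, x0=[0.0, 0.0], method="Powell")
# res.x converges to the peak position without any gradient evaluations
```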
The steps followed in PowellSnakes, version I, are the following.
Compute the classical linear matched filter for the whole patch in the Fourier domain – the denominator is constant and needs to be evaluated only once which can be done in Fourier space as well.
Compute all the constants.
Fourier transform the map – a 4Mpixel map takes at most 3 s (FFTW).
Filter the map (≃2 s).
Transform back to the map space using the inverse Fourier transform (≃3 s).
Evaluate the ln [P(H_{1}|d)/P(H_{0}|d)] map (≃2 s).
Find the peaks of the ln [P(H_{1}|d)/P(H_{0}|d)] field and check whether they are higher than a certain acceptance level: find the peaks using a Powell minimizer with jumps. The evaluation of the objective function is fast – we use a kind of evaluation cache (picking values from a precalculated array).
However, we still need to find the optimal radius, position and amplitude in real space. When using the matched filter, an average value for the radius of the source was used, so the values of the peaks are only approximate. Now we need to find the optimal parameters.
Error bars are calculated from approximate versions of the analytic formulae (while MCMC provides these errors automatically).
The stop criteria – no stop criteria – however there is a way of ensuring that no peak is left behind – we proceed as follows: first of all we can have an idea of the area (not the volume) under the likelihood peaks. Let's consider now an imaginary pixel of the size of this area. Split the map into these imaginary pixels and start a PowellSnakes search in the centre of each one of them. It is unlikely to miss any of the significant likelihood maxima (specially if we use the jump technique). One might think this process to be time consuming but in fact this is not the case since the objective function is precalculated.
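The ‘no peak left behind’ seeding of step (x) can be sketched as follows. This is an illustrative reconstruction, not the PowellSnakes code: the helper names (`bilinear`, `seeded_peaks`, `cell`) are invented, and the Powell search is replaced here by a crude derivative-free ascent, purely to show how a precalculated objective array makes each evaluation a cheap bilinear interpolation.

```python
import numpy as np

# Sketch (names invented): seed one local search at the centre of each
# "imaginary pixel" whose size matches the typical likelihood-peak area.
# The objective is precalculated on a grid, so every evaluation is just
# a bilinear interpolation into that array.

def bilinear(field, x, y):
    """Bilinearly interpolate a precalculated 2-D objective array."""
    x0 = int(np.clip(np.floor(x), 0, field.shape[1] - 2))
    y0 = int(np.clip(np.floor(y), 0, field.shape[0] - 2))
    tx, ty = x - x0, y - y0
    return (field[y0, x0] * (1 - tx) * (1 - ty)
            + field[y0, x0 + 1] * tx * (1 - ty)
            + field[y0 + 1, x0] * (1 - tx) * ty
            + field[y0 + 1, x0 + 1] * tx * ty)

def seeded_peaks(field, cell):
    """Start one hill-climb per imaginary pixel; return distinct maxima."""
    ny, nx = field.shape
    peaks = set()
    for cy in range(cell // 2, ny, cell):
        for cx in range(cell // 2, nx, cell):
            x, y, step = float(cx), float(cy), cell / 2.0
            while step > 0.05:            # crude derivative-free ascent
                moved = False
                for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                    cx_ = float(np.clip(x + dx, 0, nx - 1))
                    cy_ = float(np.clip(y + dy, 0, ny - 1))
                    if bilinear(field, cx_, cy_) > bilinear(field, x, y):
                        x, y, moved = cx_, cy_, True
                if not moved:
                    step /= 2.0
            peaks.add((int(round(x)), int(round(y))))
    return peaks

# Usage: a single likelihood peak at (20, 12) is found from every seed.
y, x = np.indices((32, 32))
field = np.exp(-((x - 20) ** 2 + (y - 12) ** 2) / 18.0)
found = seeded_peaks(field, cell=8)
print((20, 12) in found)  # True
```

Because the field is precalculated, the 16 seeded searches above cost only array lookups, which is why the exhaustive seeding is affordable in practice.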
A1 Characteristics
Let's now characterize PowellSnakes in terms of its speed and sensitivity.
A1.1 Speed
The PowellSnakes algorithm is a two-step process. In the first part of the algorithm we proceed as follows.
First the map is filtered with a linear matched filter, ψ(k) = τ(k)/[aP(k)] with a = ∫ d²k |τ(k)|²/P(k), where τ(k) is the Fourier transform of the source template and P(k) is the power spectrum of the background.
The evaluation is done in Fourier space using the average radius of the possible range of radii, and the map is then Fourier transformed back to real space. If the radii of the objects were all equal, and if we knew their true value, the resulting filtered field would be proportional to the likelihood field, with the absolute maxima on top of the objects. When the available range of radii is small, filtering with the ‘average’ matched filter will not produce absolute maxima, but in the great majority of situations a local peak is created over each object. Next, we search this map with the Powell minimizer in two dimensions (the position) with the jumps switched on. This is an extremely fast operation because the objective function is already precalculated: we just need to pick a value from an array or, better, compute a bilinear interpolation.
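This first stage can be sketched as follows. It is a toy reconstruction, not the actual implementation: we assume a circularly symmetric Gaussian template with the ‘average’ radius and a white background (constant power spectrum), and all names are our own.

```python
import numpy as np

# Toy sketch of the first stage (illustrative, not the PowellSnakes
# code): filter the map in Fourier space with the matched filter
# psi(k) ∝ conj(tau(k)) / P(k), where tau(k) is the Fourier transform
# of a Gaussian template with the 'average' radius and P(k) is the
# background power spectrum (constant here: white noise).

def matched_filter(patch, radius, power_spectrum=1.0):
    ny, nx = patch.shape
    y, x = np.indices((ny, nx))
    # Gaussian template centred on pixel (0, 0), wrapped periodically
    # to match the FFT conventions.
    r2 = np.minimum(x, nx - x) ** 2 + np.minimum(y, ny - y) ** 2
    template = np.exp(-r2 / (2.0 * radius ** 2))
    psi = np.conj(np.fft.fft2(template)) / power_spectrum
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * psi))

# Usage: a source at (row 24, column 40) buried in white noise.
rng = np.random.default_rng(0)
y, x = np.indices((64, 64))
patch = np.exp(-((x - 40) ** 2 + (y - 24) ** 2) / (2 * 3.0 ** 2))
patch += 0.05 * rng.standard_normal((64, 64))
filtered = matched_filter(patch, 3.0)
peak = np.unravel_index(np.argmax(filtered), filtered.shape)
print(peak)  # the filtered field peaks on the source, near (24, 40)
```

The filtered array plays the role of the precalculated objective that the two-dimensional Powell search then explores.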
In the second part of the algorithm we proceed as follows.
Now we already have a good idea of where to look further: around the maxima of the likelihood restricted to the subspace of the positions of the objects. In step (1) we produced a ‘whitened’ version of the map and, at the same time, ‘whitened’ the test objects. At this point we switch off the jumps and launch ‘PowellSnakes’ in all four dimensions (position, radius and amplitude), taking as starting points the previously found maxima restricted to the position subspace. Here we use the complete expression for the likelihood, with no analytical shortcuts.
Once the Powell minimizer finishes, we pick up the optimal values of the parameters and test them for acceptance/rejection. If, and only if, the maximum is accepted as ‘good’ do we subtract the object from the map. (This differs from our old solution: sometimes, when we subtracted a rejected peak in the neighbourhood of a true but still undetected object, we ended up damaging the good peak, and most of the time that was enough to prevent the real object from being detected. We now subtract only accepted peaks, to avoid multiple detections of the same object. Ideally one would avoid subtraction altogether, since there is always the risk that a spurious detection, and the consequent subtraction, will damage a good peak.)
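The accept-then-subtract policy just described can be sketched as follows. This is illustrative only: `gaussian_template`, `process_peaks`, the threshold value and the fitted tuples are invented, and the ln-odds values are assumed to come out of the four-dimensional fit.

```python
import numpy as np

# Illustrative sketch (not the actual PowellSnakes code) of the
# accept-then-subtract policy: a fitted object is removed from the map
# only when its log posterior ratio clears the acceptance threshold, so
# rejected (possibly spurious) peaks never damage nearby real sources.

def gaussian_template(shape, x0, y0, amp, radius):
    y, x = np.indices(shape)
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * radius ** 2))

def process_peaks(patch, fits, ln_odds_threshold):
    """fits: list of (x0, y0, amp, radius, ln_odds) from the 4-D fit."""
    accepted = []
    for x0, y0, amp, radius, ln_odds in fits:
        if ln_odds >= ln_odds_threshold:        # Bayesian acceptance test
            accepted.append((x0, y0, amp, radius))
            # subtract only accepted objects, to avoid re-detecting them
            patch -= gaussian_template(patch.shape, x0, y0, amp, radius)
        # rejected peaks are left in place: subtracting them could
        # damage a true but still undetected neighbouring object
    return accepted

# Usage: one strong source is accepted and subtracted; one weak fit is
# rejected and therefore left untouched in the map.
patch = (gaussian_template((32, 32), 20, 20, 1.0, 3.0)
         + gaussian_template((32, 32), 5, 5, 0.4, 2.0))
fits = [(20, 20, 1.0, 3.0, 12.0),   # ln-odds above threshold: accepted
        (5, 5, 0.4, 2.0, 1.0)]      # ln-odds below threshold: rejected
accepted = process_peaks(patch, fits, ln_odds_threshold=5.0)
print(accepted)  # [(20, 20, 1.0, 3.0)]
```

After the call, the residual at the accepted position is essentially zero, while the rejected peak is still present in the map for later scrutiny.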
A1.2 Sensitivity
Now, let's consider the sensitivity.
Analytical solutions are in general attractive; however, when noise is taken into consideration the relevant functions become hard to handle, and most of the theorems no longer apply. Therefore, one should avoid relying on purely analytical solutions.
Sometimes the estimated values of the parameters fall outside the allowed range used to define the Bayesian criterion for acceptance. Even when the error bars would allow it, such estimates should not simply be sent for accept/reject testing, or rejected immediately; rather, one should attach a very low probability to them and let the computation continue.
PowellSnakes depends on several parameters that may be tuned to optimize its performance/sensitivity, e.g. the number of launched ‘snakes’, the average number of sources per patch, etc. Since the detection conditions (background, noise, signal-to-noise ratio, etc.) can change drastically across the sphere, these parameters should occasionally be readjusted. In its current implementation PowellSnakes allows the definition of sphere zones; within each zone a different set of these parameters may be loaded from the parameter file in order to tune its sensitivity/performance.
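A minimal sketch of such per-zone tuning follows. The zone names, parameter names and values here are entirely hypothetical; the actual PowellSnakes parameter file is organized differently.

```python
# Hypothetical sketch of per-zone tuning: each sky zone carries its own
# detection parameters (none of these names or values come from the
# actual PowellSnakes parameter file).

ZONE_PARAMS = {
    "galactic_plane": {"n_snakes": 400, "sources_per_patch": 12.0},
    "high_latitude":  {"n_snakes": 150, "sources_per_patch": 4.0},
}

def params_for_latitude(b_deg, plane_cut=10.0):
    """Pick the parameter set from the absolute galactic latitude."""
    zone = "galactic_plane" if abs(b_deg) < plane_cut else "high_latitude"
    return ZONE_PARAMS[zone]

print(params_for_latitude(3.0)["n_snakes"])   # 400
print(params_for_latitude(45.0)["n_snakes"])  # 150
```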
Do not expect miracles: if the NISNR is low, reliable detection is not possible.
A1.3 Other technical details
The structure of the program was designed to be ready to take full advantage of a multiprocessor system. The performance will scale linearly with the number of processors.
Implemented in ISO C++.
Completely built on templates; this allows one to choose the numerical precision required.
Designed as a ‘framework’: you do not need to understand how PowellSnakes works internally in order to use it. Inherit from its classes, customize them, and PowellSnakes will call your code when needed.
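The actual code is ISO C++; the following Python sketch, with invented class and method names, only illustrates the framework pattern being described: the user inherits from a base class, overrides a hook, and the framework calls it when needed.

```python
# The real PowellSnakes is ISO C++; this Python sketch merely
# illustrates the "framework" (inversion-of-control) idea with
# invented names: inherit, override a hook, and the framework
# drives the loop and calls your code.

class DetectorBase:
    def run(self, peaks):
        # the framework owns the control flow and invokes user hooks
        return [p for p in peaks if self.accept(p)]

    def accept(self, peak):          # hook: override in a subclass
        raise NotImplementedError

class ThresholdDetector(DetectorBase):
    def __init__(self, threshold):
        self.threshold = threshold

    def accept(self, peak):          # user customization
        return peak >= self.threshold

print(ThresholdDetector(5.0).run([2.0, 7.5, 9.1]))  # [7.5, 9.1]
```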