-
PDF
- Split View
-
Views
-
Cite
Cite
René Andrae, Knud Jahnke, Peter Melchior, Parametrizing arbitrary galaxy morphologies: potentials and pitfalls, Monthly Notices of the Royal Astronomical Society, Volume 411, Issue 1, February 2011, Pages 385–401, https://doi.org/10.1111/j.1365-2966.2010.17690.x
- Share Icon Share
Abstract
Given the enormous galaxy data bases of modern sky surveys, parametrizing galaxy morphologies is a very challenging task due to the huge number and variety of objects. We assess the different problems faced by existing parametrization schemes (CAS, Gini, M20, Sérsic profile, shapelets) in an attempt to understand why parametrization is so difficult and in order to suggest improvements for future parametrization schemes.
We demonstrate that morphological observables (e.g. steepness of the radial light profile, ellipticity, asymmetry) are intertwined and cannot be measured independently of each other. We present strong arguments in favour of model-based parametrization schemes, namely reliability assessment, disentanglement of morphological observables and point spread function modelling. Furthermore, we demonstrate that estimates of the concentration and Sérsic index obtained from the Zurich Structure & Morphology catalogue are in excellent agreement with theoretical predictions. We also demonstrate that the incautious use of the concentration index for classification purposes can cause a severe loss of the discriminative information contained in a given data sample. Moreover, we show that, for poorly resolved galaxies, concentration index and M20 suffer from strong discontinuities, i.e. similar morphologies are not necessarily mapped to neighbouring points in the parameter space. This limits the reliability of these parameters for classification purposes. Two-dimensional Sérsic profiles accounting for centroid and ellipticity are identified as the currently most reliable parametrization scheme in the regime of intermediate signal-to-noise ratios and resolutions, where asymmetries and substructures do not play an important role. We argue that basis functions provide good parametrization schemes in the regimes of high signal-to-noise ratios and resolutions. Concerning Sérsic profiles, we show that scale radii cannot be compared directly for profiles of different Sérsic indices. Furthermore, we show that parameter spaces are typically highly non-linear. This implies that significant caution is required when distance-based classification methods are used.
1 INTRODUCTION
In the last 10 yr, the field of galaxy evolution has experienced a boost. With the advent of large ground-based spectroscopic and imaging surveys such as the Sloan Digital Sky Survey (SDSS) (Abazajian et al. 2009) or space-based surveys like COSMOS (Scoville et al. 2007), the data base of galaxies has increased enormously. From both very deep and very wide area surveys substantial amounts of data are available, enabling us to study the dependence of galaxy formation and evolution on, e.g. environment, star formation history or stellar/bulge/black hole mass. It is now possible to test multivariate dependencies and, in conjunction with numerical simulations, to describe possible evolutionary tracks of galaxies, and to single out not yet fully understood phenomena like the colour-bimodality of galaxies (e.g. Strateva et al. 2001) or the linear relation between black hole and stellar bulge mass (e.g. Häring & Rix 2004; Woo et al. 2006).
Studies of galaxy morphologies are very important in this context, because different morphologies are caused by different physical processes that are likely to also affect other properties, e.g. star-forming rate, and may also correlate with environment. Despite these efforts, it is still a very challenging task to meaningfully describe (parametrize) the morphologies of galaxies in very large data samples. Although we are well able to parametrize the morphologies of individual galaxies of certain types (e.g. Simmat, Tuffs & Popescu 2010), finding a parametrization scheme that is able to account for the huge variety of galaxy morphologies is a completely different task.
1.1 Strategy
In this paper, we discuss the concept of parametrization and summarize commonly used parametrization schemes, namely CAS (Abraham et al. 1994, 1996; Bershady, Jangren & Conselice 2000), M20 (Lotz, Primack & Madau 2004), Gini (Lotz et al. 2004, 2008), Sérsic profile (Sérsic 1968; Graham & Driver 2005), shapelets (Réfrégier 2003) and sérsiclets (Ngan et al. 2009). We categorize these schemes and identify important differences. However, the main intention of this article is to determine if there are any fundamental problems involved in the parametrization of galaxy morphologies, which may turn out to be subtle or non-obvious. Our investigations are designed to test the current paradigm favouring model-independent schemes. It has already been shown that the diagnostic power of shapelets is limited for elliptical galaxies (Melchior et al. 2010), whereas the method of sérsiclets has not yet been successfully established. Therefore, we focus our attention on the caveats involved in the usage of the other parametrization schemes. In the course of this investigation, we demonstrate that morphological observables are intertwined. This new insight implies that all schemes that try to estimate observables separately without addressing their inherent degeneracies are problematic in principle.
In the remaining part of this introduction, we define the terms ‘galaxy morphology’ and ‘parametrization’ and discuss what parametrization is meant to achieve. In Section 2, we introduce two conceptually different approaches to parametrization, namely model-independent (CAS, M20, Gini) and model-based schemes (Sérsic profile, shapelets, sérsiclets). As a first fundamental problem and one of our main results, we illustrate in Section 3 that morphological observables are intertwined and cannot be measured independently. Secondly, we investigate the impact of the point spread function (PSF) on the concentration index in Section 4. Thirdly, we consider general problems affecting the classification of galaxy morphologies in Section 5. Finally, in Section 6 we summarize our results and give recommendations for improvements of existing or the design of new parametrization schemes.
1.2 Galaxy morphology
The morphology of a galaxy is defined by the characteristics of its two-dimensional light distribution, i.e. by the projected shape of the galaxy. Some morphological observables are as follows:
steepness of radial light profile
ellipticity (i.e. orientation and axial ratio)
asymmetry (e.g. lopsidedness)
substructures (e.g. spiral arm patterns, bars, etc.)
size
centroid.
The centroid position is an important morphological observable as well, since it is often required to derive other morphological estimators (cf. Table 1). For decades galaxy morphologies have been studied in the visual regime, where all these observables are reasonably well defined. However, with increasing observational coverage of the electromagnetic spectrum, it became evident that morphology is a strongly varying function of wavelength. For instance, in the UV we observe mostly star-forming regions but no dust emission, such that galaxies can look patchy and highly irregular. On the other hand, in the far infrared, there is almost no stellar but only dust emission. As we discuss in Section 2.4, many parametrization schemes for galaxy morphologies make rather restrictive assumptions that are too specialized on the visual regime and cannot be generalized to the whole electromagnetic spectrum. As our discussion is set in the context of large surveys where galaxies exhibit a huge variety of different morphologies, we have to look for parametrization schemes that are flexible enough to describe arbitrary morphologies.
Characteristic | C | A | S | M20 | G | Sérsic profile | Shapelets | Sérsiclets |
Model-based | N | N | N | N | N | Y | Y | Y |
Centroid estimate necessary | Y | Y | N | Y | N | Y | Y | Y |
Account for steepness of light profile | N | N | N | N | N | Y | N | Y |
Account for ellipticity | Ya | Yb | Yc | N | N | Y | Y/Nd | Y |
Account for substructures | N | Y | Y | N | N | N | Y | Y |
Characteristic | C | A | S | M20 | G | Sérsic profile | Shapelets | Sérsiclets |
Model-based | N | N | N | N | N | Y | Y | Y |
Centroid estimate necessary | Y | Y | N | Y | N | Y | Y | Y |
Account for steepness of light profile | N | N | N | N | N | Y | N | Y |
Account for ellipticity | Ya | Yb | Yc | N | N | Y | Y/Nd | Y |
Account for substructures | N | Y | Y | N | N | N | Y | Y |
a We can employ elliptical isophotes to compute C.
bA is invariant under all operations that are symmetric under rotations by 180°. Ellipticity is such an operation.
c It is possible to use an elliptical Gaussian for convolution.
d There are spherical and elliptical shapelet formalisms.
Characteristic | C | A | S | M20 | G | Sérsic profile | Shapelets | Sérsiclets |
Model-based | N | N | N | N | N | Y | Y | Y |
Centroid estimate necessary | Y | Y | N | Y | N | Y | Y | Y |
Account for steepness of light profile | N | N | N | N | N | Y | N | Y |
Account for ellipticity | Ya | Yb | Yc | N | N | Y | Y/Nd | Y |
Account for substructures | N | Y | Y | N | N | N | Y | Y |
Characteristic | C | A | S | M20 | G | Sérsic profile | Shapelets | Sérsiclets |
Model-based | N | N | N | N | N | Y | Y | Y |
Centroid estimate necessary | Y | Y | N | Y | N | Y | Y | Y |
Account for steepness of light profile | N | N | N | N | N | Y | N | Y |
Account for ellipticity | Ya | Yb | Yc | N | N | Y | Y/Nd | Y |
Account for substructures | N | Y | Y | N | N | N | Y | Y |
a We can employ elliptical isophotes to compute C.
bA is invariant under all operations that are symmetric under rotations by 180°. Ellipticity is such an operation.
c It is possible to use an elliptical Gaussian for convolution.
d There are spherical and elliptical shapelet formalisms.
1.3 Observation, parametrization, inference
In this section we want to clarify the role of parametrization, i.e. what purpose it serves and what its benefits are. Parametrization is one step in the sequence of observation, parametrization and inference, which is visualized in Fig. 1.

Interplay of observation , parametrization
and inference
. This paper is concerned with the existence of the mapping
, which is necessary for
to exist.
The process of observation provides a non-linear mapping of the true intrinsic galaxy morphology to an observed morphology. This mapping
comprises the projection on to the two-dimensional sky, the binning to pixels, the addition of pixel noise and the convolution with the pixel-response function (gain of the detector). It also involves the convolution with the PSF, taking into account seeing effects, optics and instrument sensitivity.
However, analysing galaxy morphologies directly in pixel space is infeasible, since the number of pixels is typically very large. Therefore, it is necessary to parametrize the observed morphology ( in Fig. 1), a step that has the two following aims: first, we want to reduce the degrees of freedom, since there is a lot of redundant or uninteresting information in pixel space. Secondly, we want to move from pixel space to some other description that better suits a given physical question. Effectively, this means that parametrization can act as a method to reduce the dimensionality of the problem, to suppress noise and to extract information. Note that this definition of parametrization encompasses more than just data modelling.
Based on such a parametrization we can then try to infer the true intrinsic morphology. For instance, inference can be based on the search of multivariate dependencies of morphological descriptors on physical parameters or on classification. The inference step corresponds to the mapping in Fig. 1, where obviously
, i.e. both mappings
and
need to be invertible – at least in a practical sense. Often inference does not aim at the true intrinsic morphology, but at some abstract type or class that represents a reasonable generalization. Still, if either
or
destroys too much information, this type of inference is impossible as well. For
being (approximately) invertible, the observation has to have a high signal-to-noise ratio and a high resolution relative to the features of interest (critical sampling). If this requirement is not met by the data, the observation will not resemble the true morphology and inference will be impossible. Bamford et al. (2009) observe this problem in the Galaxy Zoo project and term it ‘classification bias’. They noticed that type fractions resulting from visual classifications of 557 681 SDSS galaxies with redshifts z < 0.25 evolve significantly with z. As Bamford et al. (2009) do not expect a pronounced morphological evolution in this redshift regime, they assign this effect to the degradation of image quality with increasing redshift.
If is invertible – i.e. whether or not
and thus
exists – depends on the parametrization scheme. This is the topic of this paper. Consequently, a reliable parametrization is as important for inference as sufficient data quality.
2 PARAMETRIZATION SCHEMES
In order to assess the advantages and deficits of different parametrization schemes we now briefly summarize the most common approaches. We divide them into model-independent and model-based approaches. The most important difference is that the model-based approaches try to model the two-dimensional light distribution of an image and are thus mostly descriptive. Model-independent approaches more directly try to extract physical information, hence mixing description and inference steps. We conclude this section by summarizing the assumptions involved in the parametrization schemes.
2.1 Model-independent schemes
The foremost reason to use a model-independent – or ‘non-parametric’– approach is that it appears to be very simple at the first glance. Most of these parametrization schemes seem easy to implement, since they do not require to fit a model. Furthermore, parameters in all these schemes have at least a rough physical interpretation.
2.1.1 CAS system



2.1.2 M20 and Gini




2.2 Model-based schemes I: Sérsic profile
2.2.1 Definition




2.2.2 Redefiningbn andβ
It is important to note that bn and β in equation (7) are completely degenerate. We are free to make any choice of bn that is different from equation (8), thereby redefining the model and changing the meaning of β. There are two reasons why equation (8) potentially is not a good choice for bn:
- from a theoretical point of view, half-light radii β are not comparable for different values of nS, i.e. the size of one galaxies relative to a second galaxy can be inferred from their scale radii if and only if both Sérsic models use identical nS. However, in practice this is rarely a problem, since studies usually compare only sizes of galaxies of similar Hubble types, e.g. in studies of the size-evolution of disc galaxies. None the less, choosing bn according to equation (8), we must not demand that β is smaller than the image size, since β cannot be interpreted this way. We actually need to require that the profile drops within the image boundaries. Fig. 2 shows that the radii where the profile reduces to
and
of its value at r = 0 vary over several orders of magnitude for different nS. The scale radius β is more intuitively defined such that
for some X > 0 independent of nS. This can be achieved by setting bn=b = log X for all nS. Panel (b) of Fig. 2 shows that in this case the radii for different nS change by less than two orders of magnitude and hence can be compared much better.10 It is well known that there is a strong correlation of nS and β (e.g. Trujillo, Graham & Caon 2001), which is problematic for many fit algorithms. This correlation of nS and β is almost completely induced by equation (8), i.e. it is artificial. We can remove this correlation by setting bn = log X for all nS, thereby simplifying the fit problem. We demonstrate this in Fig. 3 showing χ2 manifolds for fitting an artificial light profile once using equation (8) (panel a) and once using bn = log X for all nS (panel b). The noise level in this simulation is low (the signal-to-noise ratio of the central peak is 100). Higher noise levels will not change the curvatures of the χ2‘valleys’ in Fig. 3 but will only broaden them and reduce their depth.

Radii rX where Sérsic profile takes values I(rX)/I(0) = 1/X for X = 2, 4, 10 and bn given by equation (8) (panel a) and bn = log 4 (panel b).

χ2/dof manifolds demonstrating how equation (8) induces the artificial correlation of nS and β. (a) χ2/dof manifold for bn defined by equation (8). The white diamond indicates the optimum. The dashed white line is given by and follows the valley, thereby illustrating that the correlation of nS and β is artificial. (b) Same as in (a) but for bn = log 4 for all nS. The valley is approximately parallel to the nS-axis, i.e. the correlation is gone. Both panels use the same artificial light profile with low noise level to evaluate the χ2/dof manifold. It is much easier to find the optimum in panel (b) than in panel (a). The optimal values of nS are identical in (a) and (b), whereas the optimal values of β are different due to the different choice of bn. χ2/dof is not a simple quadratic form, because the Sérsic profile is a non-linear model.
These issues are not fundamental and there is no theoretical preference for choosing between these approaches apart from the fact that equation (10) is likely to provide more robust parameter estimates. Furthermore, it is possible to convert to and fro the definitions of equations (8) and (10) via .
2.3 Model-based schemes II: expansion into basis functions
An alternative model-based parametrization approach is the expansion into basis functions. The most important advantage of this concept is that the parametrization is more flexible, whereas all previous schemes are highly specialized for certain morphologies. A good set of basis functions should be able to fit almost anything, provided the signal-to-noise ratio of the given data is sufficiently high. Hence, this approach should in principle be favoured when the task at hand is to parametrize arbitrary morphologies.
Basis-function expansions are very common in physics and also in cosmology (e.g. decomposing the CMB into spherical harmonics). Usually, the basis functions are chosen based on symmetry arguments or best as eigenfunctions of the differential equations describing the underlying physics. However, we do not know the physics governing galaxy morphologies yet, hence there is no theoretically motivated choice for the set of basis functions. Therefore, basis functions are chosen such that they possess advantageous analytic properties or overcome special problems.
In the following, we introduce the concept of basis functions. We briefly comment on the issues of orthonormality and completeness and then discuss example sets of basis functions.
2.3.1 General concept

After fitting the model defined by equation (11) to the image, we obtain estimates for the coefficients cn and the parameters θn for all basis functions. Usually, the θn are used to incorporate several effects. For instance, there is typically a size parameter that scales the spatial extent of the basis functions such that the coefficients cn do not depend on the size of the object. If this is the case, then the basis functions are called ‘scale invariant’. The centroid position can also be part of θn. The linear coefficients cn are supposed to capture the morphological information.
2.3.2 Orthonormality and completeness
As aforementioned, sets of basis functions are often orthonormal and complete. The orthogonality would ensure that all coefficients were completely independent of each other. The completeness would allow us to decompose an arbitrary image. In practice, however, the completeness is lost due to pixel noise and pixellation, which sets an upper limit to the number of basis functions that can be used to decompose a given image. This can lead to characteristic modelling failures. We discuss this in slightly more detail in the next section. The strict orthogonality is also lost, due to pixellation (Melchior, Meneghetti & Bartelmann 2007). This means that the resulting coefficients may exhibit minor correlations, but if the galaxy image and all basis functions are critically sampled, these correlations will be negligible.
2.3.3 Shapelets

The Gaussian weight function of shapelets leads to very nice analytical properties. For instance, shapelets are nearly invariant under Fourier transformation, which makes any convolution or deconvolution a closed and analytic operation in shapelet space, as described in Melchior et al. (2009). However, the limitation of basis functions due to pixel noise has a severe consequence: shapelets employ a Gaussian weight function (cf. equation 12), but real galaxies have typically much steeper profiles. This gives rise to characteristic modelling failures that typically manifest themselves in ring-like artefacts in the shapelet reconstructions of galaxies with exponential or steeper light profiles. This severely limits the diagnostic power of shapelets (cf. Melchior et al. 2010) and we therefore exclude them from our subsequent simulations.
Despite these fundamental problems, shapelets demonstrate a very important aspect of basis-function expansions: for highly resolved galaxies of high signal-to-noise ratios Sérsic profiles are incapable of providing excellent models as they are not flexible enough to account for substructures such as spiral arm patterns, i.e. their residuals do not always reach noise level. In case of shapelets – as an example of basis functions – this is fundamentally different. They are highly flexible and reach noise level even for galaxies that are very large, highly resolved and bright (e.g. Andrae, Melchior & Bartelmann 2010).
2.3.4 Sérsiclets
Given the problematic impact of the Gaussian profile on shapelets, a set of basis functions based on the Sérsic profile is an obvious means to overcome the limitations of shapelets. The resulting basis functions are called sérsiclets. Ngan et al. (2009) were the first to realize the potential of this approach, which is capable of accounting for all morphological observables listed in Section 1.2. However, for technical reasons their implementation of sérsiclets was flawed, as we illustrate in an upcoming paper (Andrae et al. in preparation). We therefore also exclude sérsiclets from our simulations.
2.3.5 Outlook: template libraries
We already argued that no basis set – apart from the pixel grid itself – is actually complete due to the limitations induced by pixel noise. Now, we want to briefly touch – without going into details – on a set of basis functions that is finite and thus incomplete from the beginning. The motivation is very simple: for both shapelets and sérsiclets the basis functions lack a physical interpretation. Why not use basis functions that directly correspond to spiral arms, galactic bars or rings? We can use a set of such templates– a template library – to form linear models and decompose the image, resulting in a set of coefficients that form a vector space. The individual templates do not even need to be orthogonal, but just as linearly independent as possible in order to avoid heavy degeneracies during the fitting procedure. Unfortunately, the direct physical motivation is also the major drawback of this approach, since we are strongly prejudiced and lack flexibility in this case. For instance, template libraries are likely to have severe problems in decomposing irregular galaxies, i.e. they are inappropriate for parametrizing arbitrary morphologies. Moreover, the set of morphological features is very large, hence such a library has to contain numerous templates.
2.4 Assumptions
It is crucial to be aware of all assumptions made by a certain method when using it, since if a method fails, it usually fails because one or more of its assumptions break down. In case of model-based approaches, the assumptions are usually rather obvious and therefore can be easily challenged. In contrast to this, the assumptions of model-independent approaches are implicit and often hidden. This may lead to the misapprehension that model-independent schemes were superior since they required fewer or even no assumptions.
In Table 1, we summarize our categorization of parametrization schemes. Based on this table and the definitions given in the previous sections, we now work out the assumptions of all schemes from a theoretical point of view. In practice, it is virtually impossible to satisfy all assumptions. Whether the violation of some assumption leads to a breakdown of a certain method depends on the specific question under consideration, the desired precision, the details of the method’s implementation and the quality of the data. In detail, the assumptions are as follows.
Concentration index: there are no azimuthal structures such as spiral-arm patterns or galactic bars.3 The pixel noise is negligible and the object is not grossly asymmetric such that a centroid is well defined (cf. Section 3.3). The scheme can be enhanced using elliptical apertures.
Asymmetry index: a centre of rotation is well defined. The pixel noise is negligible. Both issues have been addressed by Conselice, Bershady & Jangren (2000). The asymmetry of interest is visible under rotations of 180°.
Clumpiness index: the functional type of the kernel matches the galaxy profile. The width of the kernel is chosen such that the information of interest is extracted. The ellipticity of the kernel matches the ellipticity of the object.
M20: the pixel noise has negligible impact on the estimates of centroid and second moments. The centre of light and the object’s centre coincide, i.e. there is no substantial asymmetry. The structures dominating M20 are of circular shape with the centroid at their centres.4
Gini coefficient: the pixel noise is negligible (see Lisker 2008).
Sérsic profile: the Sérsic profile is a good match of the object’s light profile. In particular, this means that the object’s light profile is symmetric, monotonically decreasing and the steepness is correctly described by the model, and there are no azimuthal structures such as spiral arm patterns, galactic bars or rings.
(Spherical) shapelets: employing the Gaussian weight function fits galaxy profiles. Using spherical basis functions that have no intrinsic ellipticity does not lead to problems.
We now clearly see that model-independent schemes implicitly make assumptions, too. This list suggests that non-parametric approaches tend to invoke fewer assumptions than model-based schemes5 at the loss of reliability, as we are going to demonstrate in the following sections. We also want to emphasize that shapelets – as an example of basis functions – can describe asymmetries.
3 INTERTWINEMENT OF MORPHOLOGICAL OBSERVABLES
The basic idea of model-independent schemes is to estimate the different morphological observables listed in Section 1.2 independently of each other, thereby simplifying the problem. However, in this section we present as one of our main results the fact that these morphological observables are intertwined, which means that it is impossible to measure them independently of each other. Even if we try to measure only a single observable using a method unaware of the other observables, the mere presence of these observable features will influence the results. The notion of intertwinement should not be confused with redundancy, e.g. Sérsic index and concentration index are perfectly redundant (Section 3.1) but asymmetry and concentration index are not (Section 3.3). Of course, for some observables the intertwinement is stronger than for others. This intertwinement is not of physical origin but stems from the fact that usually all morphological observables are present simultaneously, such that the assumptions listed in Section 2.4 are never truly satisfied.
We carry out noise-free simulations of the different parametrization schemes and by doing so we reveal several systematic misestimations – in particular of the concentration index. All simulations invoke Sérsic profiles and we want to explicitly emphasize that it is not necessary for real galaxies to actually follow Sérsic profiles.6 However, as we demonstrate in Section 3.1, Sérsic profiles provide parametrizations that are in excellent agreement with estimates of light concentration. This would not be the case if Sérsic profiles were a bad description. Pixel noise in real data may hide these biases to some extent, but they will still be present.
3.1 Example I: Sérsic profile versus concentration index



Comparing concentration and Sérsic indices of 31 288 COSMOS galaxies from the Zurich Structure and Morphology catalogue (Sargent et al. 2007) (blue points) with the numerical solution (red solid curve) and power-law fit of equation (14) (orange dashed curve). Shown are COSMOS galaxies with I < 22.5, valid axis ratios (0 < q≤ 1), and flags ‘stellarity’, ‘junkflag’ and ‘flagpetro’ of 0. Concentration indices were predicted from analytic Sérsic profiles using numerical integration out to one Petrosian radius. There was no pixellation.
3.2 Example II: steepness of light profile versus ellipticity
Our second example is the intertwinement of the steepness of the radial light profile and the ellipticity. These two are certainly the most important morphological observables listed in Section 1.2, having the largest impact on parametrization results.
It is obvious that estimates of the steepness of the radial light profile must take into account ellipticity. Therefore, it is necessary to use elliptical isophotes in case of the concentration index or to fit a two-dimensional Sérsic profile that is enhanced by an ellipticity parameter. Unfortunately, in case of the SDSS, the aperture radii containing 50 and 90 per cent of the total image flux given in the SDSS data base are chosen as circular apertures (Strauss et al. 2002). This implies that estimates of the concentration index drawn from these values may be biased. In fact, this bias was already discussed by Bershady et al. (2000). They investigated how the concentration index changes with axial ratio for samples of real galaxies of similar morphological types. Bershady et al. (2000) claim that using circular apertures causes an overestimation of concentration indices of at most 3 per cent and is therefore negligible. We investigate this effect in Fig. 5 for a realistic range of axis ratios, as is evident from panel (a). Panel (b) shows how the concentration index is influenced by the axial ratio for Sérsic profiles with fixed Sérsic indices, corresponding to galaxy samples of similar morphologies as in Bershady et al. (2000).8 Evidently, for q≳ 0.5– which is the majority of galaxies in the given set – the bias is negligible. There are galaxies with q < 0.5, which are typically disc-like galaxies with shallow light profiles. For those objects concentration estimates based on circular isophotes are substantially overestimated (≈30 per cent for nS = 1). This bias is not negligible. Bershady et al. (2000) based their investigation on estimated concentration indices of real galaxies. Hence, the most likely origin of this discrepancy in our results is that the intrinsic scatter in the real data used by Bershady et al. (2000) hid this bias. Considering ellipticity and concentration index together – instead of using an elliptical concentration index – is not likely to solve this problem. The reason is that incorporating an ellipticity estimate may add information about the cause of the bias of the concentration index, but it does not provide information about the effect of this bias. Finally, we want to emphasize that Fig. 5 must not be used to calibrate the biased concentration estimates resulting from circular apertures. The reason is that this would now require Sérsic profiles to be a realistic description of galaxy morphologies. Moreover, also the study of Bershady et al. (2000) cannot be used for such a purpose, because the bias clearly depends on the intrinsic concentration. This means that such a correction would require prior knowledge about the object’s true concentration.

Impact of ellipticity on concentration estimates. Panel (a) shows the distribution of axis ratios q=b/a for 2272 SDSS galaxies from the data sample of Fukugita et al. (2007). Panel (b) shows concentration estimates using circular isophotes for elliptical Sérsic profiles with nS = 0.5 (solid orange line), nS = 1 (dashed red line), nS = 2 (dot–dashed blue line) and nS = 4 (dotted black line).
Vice versa, Melchior et al. (2010) showed in the context of weak gravitational lensing that ellipticity measurements using shapelets are strongly biased in case of steep profiles. In other words, shapelets fail to provide reliable ellipticity estimates, because they do not properly account for the steepness of the radial light profile. This impressively demonstrates that these two observables may be closely intertwined.
3.3 Example III: Impact of lopsidedness on centroid estimation
As a third example for the intertwinement of morphological observables, we consider the impact of asymmetry on centroid estimates and the resulting parameter estimation using two-dimensional Sérsic profiles. We simulate a certain type of asymmetry, namely lopsidedness. In order to introduce lopsidedness analytically, we apply the flexion transformation from gravitational weak lensing (Goldberg & Bacon 2005) to the Sérsic profiles as explained in Appendix A. The strength of the flexion transformation is parametrized by F1, F2, G1 and G2. There is no pixel noise in this simulation. Fig. 6 shows Gaussian profiles resulting from this transformation.9 The resulting distortions are not unrealistically strong.

Gaussian profiles of different lopsidedness. The applied flexions are F1 = 0.0 (top left), F1 = 0.0325 (top right) and F1 = 0.065 (bottom). The resulting profiles exhibit realistic lopsidedness. All profiles are evaluated on a 1000×1000 pixel grid using a scale radius of β = 50. White diamonds indicate the maximum position.

![Impact of lopsidedness on centroid (a), asymmetry with respect to maximum (b), absolute difference of asymmetries with respect to centre of light and maximum (c), concentration with respect to maximum (d) and relative difference of concentrations with respect to centre of light and maximum (e). Lopsidedness leads to a difference in maximum position and centre of light. Furthermore, lopsidedness creates asymmetry. Asymmetries evaluated with respect to the maximum or centre of light can differ substantially given that A∈[0, 2]. The concentration evaluated at the maximum position is almost insensitive to lopsidedness. However, the concentration with respect to centre of light is strongly underestimated. All Sérsic profiles are evaluated on a 1000×1000 pixel grid using β = 50. See footnote for explanation of the steps in panels (c) and (e).](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/mnras/411/1/10.1111/j.1365-2966.2010.17690.x/2/m_mnras0411-0385-f7.jpeg?Expires=1748402101&Signature=giDe9iU0SG5FWvNnR6VeJSftLAoNOuRqftwc07Gb5P6dvq5AlX4ZjStGCBax4fAivAdd6ZdPq2H2UYCQtpuAHv4F9PsMSbjWWtZDRhOW8WcJ9JQArP5AT0braYY6uswmvPsgacQN83s7uFoq7qc7sZLX77bXGifCzgW2G2QAaBn1kO~OVGI0LB6~eeNH3WyQtaPIAEhrgRk9wAFpOQJjezWZJO7wzJjLIbJQ-HlObDnpWABnZ9thz72NrBHs2kwXnVIbO~dJqLMxPQSlMgzJulTwO03NdpHARxs3ssBVeP5wJM0PtTSlo9nu47ZeBOxVU3clsnYb9D12tz-24ckmZg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Impact of lopsidedness on centroid (a), asymmetry with respect to maximum (b), absolute difference of asymmetries with respect to centre of light and maximum (c), concentration with respect to maximum (d) and relative difference of concentrations with respect to centre of light and maximum (e). Lopsidedness leads to a difference in maximum position and centre of light. Furthermore, lopsidedness creates asymmetry. Asymmetries evaluated with respect to the maximum or centre of light can differ substantially given that A∈[0, 2]. The concentration evaluated at the maximum position is almost insensitive to lopsidedness. However, the concentration with respect to centre of light is strongly underestimated. All Sérsic profiles are evaluated on a 1000×1000 pixel grid using β = 50. See footnote for explanation of the steps in panels (c) and (e).
We have demonstrated that the parametrization results differ significantly depending on whether we use the centre of light or the maximum position as centroid. How do we resolve this ambiguity? And how do we get the maximum position in practice, when we suffer from pixel noise? If the parametrization scheme was model-based, the model would define the centroid during the fit procedure – even in the presence of pixel noise. For instance, the Sérsic profile should use the maximum position as centroid, whereas shapelets can use both maximum position and centre of light. However, since C, A and M20 are not model-based, we have to resort to convention or ad hoc solutions. In case of the asymmetry index, Conselice et al. (2000) solved this problem by searching for the position that minimizes the value of the asymmetry index, also considering resampling the image on a refined pixel grid. They were able to show that there are usually no local minima of asymmetry indices and hence that their method is stable. In case of the concentration index, using the maximum position appears to be more plausible than the centre of light, since Cmax appears to be robust against lopsidedness. Unfortunately, the concentration does not provide us with model and residuals, hence we cannot estimate the most likely maximum position in the presence of noise. However, we can apply the same ad hoc solution that Conselice et al. (2000) introduced for the asymmetry index by searching the position that maximizes the concentration estimate. Nevertheless, this method increases the computational effort tremendously such that the required computation time is approximately of the same order as, e.g. fitting a shapelet model. We conclude that concentration and asymmetry estimates are neither easy to implement nor computationally faster than model-based approaches. In case of M20, there is a theoretical preference to use the centre of light, since it minimizes the total second moments.
3.4 Example IV: impact of lopsidedness on ellipticity estimators




Fig. 8 shows results of this simulation. For perfectly symmetric profiles (F1 = 0) the estimator indeed does not detect any ellipticity. However, if F1 increases, the ellipticity estimator will be biased. The bias is stronger for steeper profiles. The maximum bias is (b/a≈ 0.877), which is substantial.

Impact of lopsidedness on real (a) and imaginary (b) part of . Considering
, the real part is strongly biased by the lopsidedness. The imaginary part is unbiased due to the geometry of F1 (cf. Fig. 6). All Sérsic profiles are evaluated on a 1000 × 1000 pixel grid using β = 50.
We conclude from this simulation that asymmetries have a potentially strong impact on ellipticity estimates, i.e. asymmetry and ellipticity are intertwined. For instance, this is relevant in case of using elliptical isophotes for estimating the concentration index.
3.5 Reliability assessment
In the previous sections we have demonstrated that some important morphological observables cannot be measured independently of one another. Given this, it cannot be guaranteed that estimates of an individual observable will result in a parametrization which is unbiased by the other observables. As all the parametrization schemes mentioned in Section 2 are derived on rather restrictive assumptions (cf. Section 2.4), their flexibility in describing arbitrary galaxy morphologies is therefore limited. Consequently, it cannot be expected that these schemes provide accurate descriptions of all individual objects in a given data sample.
Can we assess the quality or reliability of the parametrization results for individual objects, i.e. can we detect objects where the parametrization failed in order to sort them out?11 If we are using a model-based parametrization scheme (e.g. shapelets or Sérsic profiles), the residuals of the resulting best fit will provide us with an estimate of the goodness of fit. For instance, a very large value of χ2 compared to the number of degrees of freedom indicates a poor fit, i.e. we should not rely on the parametrization of this individual object. However, if the parametrization scheme is not model-based – as in case of CAS, M20 and Gini – we have no residuals and hence we have no way of assessing the reliability for individual objects.12
3.6 How to disentangle observables
As we showed above, morphological observables are intertwined and cannot be measured independently. Is there a way to get independent estimates?





The marginalization integrals of equations (21) and (22) are usually very hard to evaluate, unless we use Markov chain Monte Carlo (MCMC; e.g. MacKay 2008) methods. In case of MCMC methods, we get those marginalizations for free, without any further effort.
4 IMPACT OF PSF ON THE CONCENTRATION INDEX
In Section 3, we introduced the notion of intertwinement that may systematically influence morphological parameters. Another important origin of systematic effects is the PSF, as we illustrate in this section. The fact that parameters such as the concentration index may be influenced by the PSF is not new but has been long known. For instance, Scarlata et al. (2007) find that the PSF has a significant effect for objects with half-light radii smaller than two full width at half-maximum (FWHM) of the Hubble Space Telescope ACS PSF and with high Sérsic index, while the effect is negligible for larger objects. In an attempt to overcome this bias, Ferreras et al. (2009) applied a correction to the measured concentration parameter, based on the half-light radius. The aim of this section is to reassess the impact of the PSF on estimates of the concentration index.
4.1 Forward versus backward PSF modelling
In case of model-based parametrization schemes it is standard practice to account for the PSF by forward modelling, i.e. to fit a convolved model to the convolved data. In case of parametrization schemes that are not model-based this is impossible and we have to resort to backward PSF modelling, i.e. we deconvolve the data before the actual parametrization is done. However, deconvolution in the presence of pixel noise is numerically unstable, so forward PSF modelling is to be favoured if possible. This is another practical disadvantage of parametrization schemes that are not model-based, because they need to perform either an unstable backward modelling or they need to invoke another ad hoc correction calibrated in simulations. Such simulation-based calibrations introduce a further assumption into the parametrization process. Model-based schemes are much more rigorous in this respect, since they allow for a mathematically well-defined PSF treatment that does not introduce any further assumption.
4.2 Impact on concentration
In case of the ZEST, Sargent et al. (2007) accounted for the PSF by forward modelling when estimating the Sérsic index, while Scarlata et al. (2007) neglected the PSF when estimating the concentration index. The fact that the results shown in Fig. 4 are in agreement with theoretical predictions suggests that in the case of the COSMOS data the PSF can indeed be neglected for the concentration index. Therefore, the theoretical prediction supports the claim by Scarlata et al. (2007). Nevertheless, this single example should not mislead us to generalize this conclusion. It is not guaranteed that the PSF will have no impact on the concentration index for data sets other than COSMOS that exhibit different signal-to-noise ratio, PSF and resolution.
In order to test the impact of the PSF on the concentration index, we generate two-dimensional Sérsic profiles with nS = 0.5, 1, 2, 4 and convolve these profiles with a Gaussian kernel of increasing FWHM.13 We expect that the concentration indices of very steep Sérsic profiles are severely underestimated, since the PSF washes out the sharp peak. For lower Sérsic indices this effect becomes smaller. For nS = 0.5 the concentration should not be affected at all, since convolution of a Gaussian with a Gaussian yields a Gaussian, i.e. the steepness of the profile does not change. Fig. 9 confirms our expectation. If we ignore the PSF, we can significantly underestimate the concentration index.

Impact of PSF on misestimation of concentration index for different PSF sizes and Sérsic profiles. All Sérsic profiles are evaluated on a 1000 × 1000 pixel grid using β = 50 and bn = 2nS− 1/3. With increasing PSF size with respect to the object size the concentration index estimated from the convolved image is more and more underestimated.
We conclude from this test that although the PSF is indeed negligible in case of the ZEST, this cannot be generalized to other data sets. Consequently, a PSF treatment is always necessary at least when using the concentration index. In particular concerning ground-based telescopes, the PSF is usually not small compared to the peak exhibited by highly concentrated objects.
5 PARAMETRIZATION AND CLASSIFICATION
We now discuss the parametrization of galaxy morphologies in the context of classification. First, we show that if we do not account for all morphological observables simultaneously, the effects discussed in the previous sections can dilute discriminative information. Secondly, we show that all parametrization schemes discussed here form non-linear or even discontinuous parameter spaces. Thirdly, we comment on the problem of high-dimensional parameter spaces.
5.1 Loss of discriminative information
The conclusion from our investigation of the intertwinement was: if a parametrization scheme does not account for all morphological observables simultaneously, the results will be systematically altered, i.e. biased. How does this influence classification results? For a large sample of objects, the origins of these systematic effects have random strength. Consequently, we have to expect an increase in the scatter of the resulting parameters. The sample distributions of the parameters will be broadened due to the additional scatter, i.e. peaks in the distributions are reduced and troughs between different peaks are washed out. In other words, we are losing discriminative information.
We now demonstrate this broadening of parameter distributions: we generate samples of two-dimensional Sérsic profiles with fixed Sérsic indices of nS = 1, 2, 3, 4. We then add a random ellipticity and a random lopsidedness via the flexion transformation of equation (A10). The flexion parameter F1 is drawn from a uniform distribution on the interval [−0.065, 0.065]. The ellipticity is drawn from the joint distribution of Sérsic indices and axis ratios of 2000 COSMOS galaxies randomly drawn from the Zurich Structure & Morphology catalogue. We then sample the Sérsic profiles on a 1000 × 1000 pixel grid using a scale radius of β = 50. We convolve the resulting image with a Gaussian PSF of FWHM=37.5 chosen such that the effects of Fig. 9 are present but moderate. There is no pixel noise in this simulation. From the pixellized image we then estimate the concentration with respect to the maximum position and the centre of light, since Sérsic index and concentration are two different estimators for the same morphological feature. Concentration estimates also take into account elliptical isophotes, where the ellipticity is estimated via equation (16) with respect to the maximum position and the centre of light, respectively.
Fig. 10 shows the results of this simulation. The distributions of concentration indices have a finite width, in contrast to the distribution of the Sérsic indices, which are infinitely thin δ-peaks. Consequently, we are indeed losing discriminative information. In reality this loss may be even more severe, since the distribution of Sérsic indices has itself a finite width. Moreover, Fig. 10 reveals that the loss of discriminative information is stronger for the concentration index evaluated at the centre of light. Especially for large Sérsic indices the peaks are lowered and broadened. This is a strong argument to evaluate the concentration at the maximum position (if it were accessible), since we conserve more discriminative information. In the presence of an unconsidered PSF, the parameter space is substantially biased. This has the advantage of reducing the width of the distributions, but it also shifts the different modes closer together. If the distribution of Sérsic indices had a finite width, this would wash out the troughs separating the peaks.

Normalized sample distributions of concentration indices estimated with respect to (a) the maximum position of unconvolved image, (b) the centre of light of unconvolved image and (c) the centre of light of convolved images. The modes in the distributions correspond to samples of 10 000 profiles each with fixed Sérsic indices of exactly nS = 1, 2, 3, 4 (from left to right). The finite widths of all modes in all distributions indicate the loss of discriminative information. This is particularly evident in panel (b), where the modes of very compact objects are substantially broadened. All Sérsic profiles were evaluated on a 1000 × 1000 pixel grid using a scale radius of β = 50. The Gaussian convolution kernel for panel (c) was evaluate on the same pixel grid with FWHM = 37.5.
This simulation demonstrates that an incautious use of the concentration index (ignoring asymmetries and the PSF) can lead to a substantial loss of discriminative information. In practice, this loss causes sample distributions of the concentration index to be of low modality, despite the diversity of the galaxy population – a problem already mentioned by Faber et al. (2007). Consequently, the concentration index can only provide a lower bound on the number of classes in a given data sample. If the sample distribution of the concentration is unimodel, this does not imply that all objects are of the same type. The loss of discriminative information implies that the mapping from Section 1.3 does not always exist for the concentration index, i.e. drawing inference is a very difficult task.
5.2 Non-linear and discontinuous parameter spaces
This section highlights an additional problem, which is independent of the previous considerations. It is based on the fact that all parametrization schemes discussed here are non-linear in the data. As a direct consequence of this, the resulting parameter spaces form non-linear spaces too. If the parameter space is non-linear, the distance metric will be non-linear too. Although this fact may be known, it is typically ignored in practice. Usually, the Euclidean metric is employed whenever a distance-based algorithm is used, e.g. a principal components analysis (Scarlata et al. 2007) or classification algorithms (e.g. Gauci et al. 2010). The crucial question is: does ignoring the non-linearity and employing the Euclidean distance lead us to misestimate the true distances between galaxy morphologies in the parameter space? If so, galaxies will seem more similar or less similar than they actually are and hence distance-based classification algorithms may face serious problems. There are only few classification algorithms that do not rely on distances (e.g. Fraix-Burnet, Davoust & Charbonnel 2009).
5.2.1 Non-linearity

We begin by considering CAS (equations 12–3). Apart from the obvious non-linearities in C due to the logarithm and the ratio of radii, the computation of the radii containing 20 and 80 per cent of the total flux itself is highly non-linear. The non-linearities in A and S are caused by the fractions and absolute values in the numerators. Gini (equation 6) and M20 (equation 5) are both non-linear in the data, too. For both of them the major non-linearity is hidden in the sorting of the pixel values. The Sérsic model given by equation (7) contains the Sérsic index and the scale radius as non-linear parameters.
The non-linearity of (spherically symmetric) shapelets is due to the scale radius β and the centroid x0. Both enter the basis functions non-linearly, as is evident from equation (12). The non-linearity of shapelets has been investigated in detail by Melchior et al. (2007), so we do not need to elaborate on this here. In case of sérsiclets, the Sérsic index is another non-linear model parameter in addition to the scale radius.
5.2.2 Demonstration of non-linearity of C, A and Gini


Two-dimensional profiles with different asymmetries used for demonstration of non-linearity. All objects are evaluated on a 1000 × 1000 pixel grid with scale radius β = 50. No intrinsic ellipticity was applied. All maximum positions are identical. Profile I1 (top left) has flexion G1 = 0.1 and nS = 0.5. Profile I2 (top right) has flexion F1 = 0.05 and nS = 1. Profile I3 (bottom) has flexion G1=−0.1 and nS = 4.
Fig. 12 shows the trajectories in the subspaces of C, A and Gini. Example objects I1 and I2 have very similar Sérsic indices and flexion parameters, hence their transition produces trajectories that are only moderately non-linear. However, example object I3 is very different from I1 and I2 and thus its transitions produce trajectories that exhibit substantial non-linearities. Note that the non-linearities in Fig. 12 are primarily induced by the lopsidedness via the asymmetry parameter, as is evident from the centre panel where A is not shown and virtually all non-linearity is gone.

Trajectories in CA-Gini subspaces revealing substantial non-linearities. Left-hand panel: trajectories in CA space. Centre panel: trajectories in C-Gini space. Right-hand panel: trajectories in Gini-A space. In this simulation the non-linearity is induced by the different lopsidedness of all objects (cf. Fig. 11). The asymmetry is evaluated with respect to the maximum position, whereas the concentration is evaluated with respect to the centre of light.
We conclude from this simulation that for galaxy morphologies exhibiting realistic asymmetries the Euclidean distance is a very poor approximation to distances in parameter space. Consequently, any algorithm based on Euclidean distances would severely underestimate the true distances, i.e. objects would appear more similar than they actually are. This may be an explanation why the drop in the spectrum of eigenvalues of the principal components analysis of Scarlata et al. (2007)– which justifies the reduction of dimensionality – is not very decisive. It may also partially account for the difficulty of recovering visual classifications using automated algorithms (see e.g. Gauci et al. 2010). This is no particular drawback of C, A and Gini, but applies to all other parametrization schemes discussed here. It is highly questionable whether a ‘calibration’ of the Euclidean distance in order to account for the non-linearity is possible. The reason for this is that, due to non-linearity, the distance is an unknown function of the positions of both objects in parameter space, i.e. the distance depends on the morphology. One possible solution is to try to estimate the true distance via a linear transformation as given by equation (24), although that is computationally very expensive. Another option is to employ a method called ‘diffusion distance’ (Richards et al. 2009) in order to estimate the true non-linear distances.
5.2.3 Discontinuity of spaces formed by C and M20
In Fig. 13, we investigate the behaviour of concentration and M20 under a linear transformation between two Sérsic profiles. C and M20 exhibit substantial discontinuities due to pixellation effects. These effects increase for decreasing resolution (i.e. decreasing β in Fig. 13).

Discontinuity of concentration (a) and M20 (b). For poor sampling (small β), concentration index and M20 exhibit substantial discontinuities. For better sampling (larger β) the discontinuities decrease. The transition was between two Sérsic profiles with nS = 0.5 and nS = 2.0 and no intrinsic ellipticity or lopsidedness. The scale radii were β = 5 (blue lines) and β = 15 (red lines), respectively. The profiles were evaluated on a 300 × 300 pixel grid.
In the case of C, the discontinuities occur because the radii containing 20 and 80 per cent of the total image flux can only change in discrete steps. With increasing resolution, the pixel size decreases and the discontinuities of R20 and R80 become smaller (cf. panels (a) and (b) in Fig. 13). Hence, this is not a problem for well-resolved galaxies as in Fig. 12. However, it is a problem for poorly sampled galaxies. In this case, we can overcome this problem by interpolating the pixellized image and integrating numerically. Unfortunately, this would drastically increase the computational effort. In fact, the discontinuity of the concentration index has already been observed by Lotz et al. (2006).
In the case of M20, the origin of the discontinuity is the sum over the second-order moments in the numerator of equation (5), which stops as soon as 20 per cent of the total flux is reached. This threshold is the problem, as it causes the set of pixels fulfilling this criterion to change abruptly during the linear transformation. Again, the discontinuities of M20 decrease with increasing resolution. However, for poorly sampled galaxies we cannot overcome these discontinuities by interpolation, since the definition of M20 only makes sense for pixellized images.
A parametrization scheme forming discontinuous parameter spaces is problematic, because it is not guaranteed that objects with similar morphologies end up in neighbouring regions of the parameter space. This implies that distances in the space formed e.g. by M20 do not necessarily correlate with the similarity of galaxy morphologies. We need similar morphologies to have smaller distances than dissimilar morphologies, but this is not guaranteed for C and M20 if the resolution is poor. Fig. 13 suggests that such discontinuities become important when galaxies are smaller than 10 pixels in radius, maybe even earlier depending on the precise morphology. In this case, we even cannot rely on hard-cut classifications and it is questionable whether meaningful classification based on distances is possible at all.
5.3 High-dimensional parameter spaces
Concerning classification, the current paradigm appears to favour low-dimensional parameter spaces (e.g. Scarlata et al. 2007) that simplify the analysis or even allow a visual representation. However, we have to keep in mind that a high-dimensional parameter space may be necessary in order to differentiate between different groups of galaxy morphologies. There is no physical reason to expect that a two- or even three-dimensional parameter space should be able to host such groups without washing out their differences. This solely depends on the complexity of the physics governing galaxy morphologies.
In particular, basis-function expansions typically form parameter spaces of high dimensionality. For instance, the morphological parameter space used by Kelly & McKay (2005) had 455 dimensions. Apart from problems with visualization, we suffer from what is commonly called the curse of dimensionality (Bellman 1961): the hypervolume of a (parameter) space grows exponentially with its number of dimensions.14 Consequently, the density of data points in this parameter space is suppressed exponentially. Therefore, it is impossible to reliably model a data distribution in a parameter space of several hundred dimensions, no matter how much data are available. Nevertheless, it is preferable to employ a parametrization scheme that produces a high-dimensional parameter space. Loosely speaking, it is better to start with too much information than with too little. We can overcome the curse of dimensionality, if we compress the parameter space, i.e. if we reduce its number of dimensions by identifying and discarding unimportant or redundant information. For instance, Kelly & McKay (2004, 2005) applied a principal component analysis (PCA) in order to reduce the dimensionality of their parameter space.15 An alternative approach to overcome the curse of dimensionality is to employ a kernel approach by describing the data using a similarity measure. We demonstrated in Andrae et al. (2010) that this yields excellent results, e.g. allowing us to classify 84 galaxies populating a 153-dimensional parameter space into three classes.
6 SUMMARY AND CONCLUSIONS
In this paper, we have described and compared two different approaches to the parametrization of galaxy morphologies: first, model-independent schemes – CAS, Gini and M20. Secondly, model-based schemes – Sérsic profiles and basis functions.
Our most important result is that morphological features (steepness of light profile, ellipticity, asymmetry, substructures, etc.) are intertwined and (at least some) cannot be estimated independently without introducing potentially serious biases. This intertwinement stems from the violation of one or more assumptions invoked by the parametrization schemes. We emphasize that combining separate estimates of individual observables does not overcome the intertwinement. For instances, combining an ellipticity estimate and the fit of a circular Sérsic profile does not give the same result as fitting an elliptical Sérsic profile. No parametrization scheme discussed in this article accounts for all these observables simultaneously, i.e. their usage will inevitably cause problems when trying to parametrize large samples of galaxies that exhibit a huge variety of morphologies.
In the context of classification of galaxy morphologies, which is an important application, we have the following results.
The intertwinement can wash out discriminative information in the context of classification.
All parametrization schemes form non-linear parameter spaces with a potentially highly non-linear and unknown metric. Distance-based classification algorithms that employ the Euclidean distance measure therefore suffer from a loss of discriminative information.
For poorly resolved galaxies (object radius smaller than ≈10 pixels), concentration and M20 form discontinuous parameter spaces that do not conserve neighbourhood relations of morphologies and may therefore fool classification algorithms.
Due to the complexity of a non-linear metric, it appears unlikely that calibrating results obtained from Euclidean distance is possible. As we cannot expect to find a parametrization scheme that is linear in the data, a more promising approach is to estimate the non-linear metric, e.g. via diffusion distances (Richards et al. 2009), or to use a classification algorithm that is not distance-based. An example for such an algorithm can be found in Fraix-Burnet et al. (2009).
6.1 Arguments in favour of model-based approaches
In this paper, we also collected arguments in favour of model-based approaches:
a (compact) model defines the term ‘centroid’, i.e. whether we have to use the centre of light or the maximum position.
a model allows us to disentangle observables by marginalizing the joint posterior distribution of all observables.
a model allows us to assess reliability by providing residuals.
a model allows forward PSF modelling, which is more stable than backward modelling in the presence of pixel noise.
Each of these arguments by itself disfavours model-independent approaches. Therefore, we conclude that schemes such as CAS, Gini and M20 are problematic for three reasons.
They try to measure morphological features independently ignoring their intertwinement (e.g. concentration does not account for asymmetry and vice versa).
They do not provide residuals, i.e. we can neither assess reliability (to sort out failures for individual objects) nor marginalize.
They do not allow forward PSF modelling, i.e. we may suffer from the instability of backward modelling, or, we need to introduce further assumptions via calibrations.
Moreover, we have seen that robust implementations of CAS and M20 are neither easy nor computationally fast, since we have to consider centroid misestimations and – in the case of the concentration index – interpolation.
We conclude that model-based parametrization schemes are clearly superior. They provide reliable parametrization schemes in all regimes of signal-to-noise ratios and resolutions. For low signal-to-noise ratios and low resolution, the Sérsic profile allows excellent parametrizations (e.g. Sargent et al. 2007). In the limit of high signal-to-noise ratios and high resolutions the method of shapelets is flexible enough to provide excellent model reconstructions (e.g. Andrae et al. 2010). With the advent of sérsiclets there will be another set of basis functions that is designed to provide even better parametrizations than shapelets (Andrae et al. in preparation).
6.2 Trade-offs
Throughout this work we were facing two important trade-offs when comparing different parametrization schemes for arbitrary galaxy morphologies, namely
simplicity versus reliability and
interpretation versus flexibility.
The first trade-off – simplicity versus reliability – is obvious. When dealing with large data samples, we have to find a parametrization scheme that is not too expensive from a computational point of view. Apart from computational aspects, we also favour simple solutions in general (Occam’s razor). However, we have to beware of oversimplification which inevitably leads to unreliable results. The borderline between reasonable simplification and oversimplification should be defined by the data only and not by the researcher.
The second trade-off – interpretation versus flexibility – is at the heart of this article. We have seen that parametrization schemes that easily offer interpretation often lack flexibility (e.g. CAS), whereas other schemes (e.g. shapelets) excel in flexibility but lack interpretation. This is still an open issue and more work is needed on the interpretation of basis-function expansions.
We should also add that there is actually no trade-off concerning computational feasibility. The parametrization of samples of galaxies is trivial to parallelize, i.e. it can be done on numerous computers simultaneously.
6.3 Recommendations and outlook
We do not conclude that CAS, Gini and M20 should not be used anymore. According to their assumptions as given in Section 2.4, these parametrization schemes are highly specialized on certain morphologies and their usage should be safe, if it is ensured that the sample of interest only contains galaxies of this special type. However, this obvious lack of flexibility renders these approaches inappropriate for general samples. Our most important recommendations for using CAS, Gini and M20 are as follows.
A PSF treatment is necessary at least in case of the concentration index.
Beware of undersampling effects in case of concentration index and M20. Discontinuities can appear for objects of up to 10 pixels in radius.
Beware of the centroid ambiguity: even for galaxies with realistic asymmetries the centre of light and maximum position does not coincide. In case of the concentration index, we recommend to fit for the centroid by maximizing C, similar to the method of Conselice et al. (2000).
Concerning the concentration index, we also recommend to use it only in the regime of intermediate signal-to-noise ratios and resolutions. The reason is that its assumptions (Section 2.4) are almost identical to the assumptions of a Sérsic profile. As a rule of thumb we can say that the concentration index is reliable whenever the Sérsic profile is a good description, and vice versa.
Currently, the most reliable parametrization scheme is the two-dimensional Sérsic profile enhanced by ellipticity, since it accounts for the steepness of the light profile and for ellipticity. These are definitely the two most important morphological observables. In the presence of asymmetries we recommend defining the centroid by fitting for the maximum position of the profile rather than fixing it to the centre of light. However, the Sérsic profile does not account for asymmetry or substructures and is thus of limited usefulness for samples containing highly irregular galaxies and in the regime of high signal-to-noise ratios and high resolutions. Moreover, we have shown that the scale radius of the Sérsic profile is difficult to interpret. In particular we have argued that the scale radii of profiles of different Sérsic indices cannot be compared directly. We also demonstrated that a redefinition of the Sérsic model may simplify the fitting procedure and provide more robust parameter estimates.
Our main conclusion is: none of the existing parametrization schemes is applicable to the task of parametrizing arbitrary galaxy morphologies that occur in large samples, since they all have their drawbacks. Therefore, we need a new parametrization scheme. Our recommendations for its design are as follows.
It should be model-based.
It should estimate all relevant morphological features simultaneously.
It should provide excellent model reconstructions of galaxies in the regime of high signal-to-noise ratios and high resolutions.
It should form a metric parameter space such that it is possible to estimate distances for classification purposes.
One possible solution is to modify e.g. the Sérsic profile in order to account for asymmetries and substructures (Galfit 3; Peng et al. 2010). In our opinion basis functions are also promising candidates to describe arbitrary morphologies, since they are highly flexible. However, current sets of basis functions still lack direct physical interpretation. Currently, we reinvestigate the method of sérsiclets which appears to be the most promising approach given the considerations of this paper.
There are several variations of the concentration index: sometimes it is based on the ratio of r90 and r50. Some authors (e.g. Bershady et al. 2000) consider the whole image for estimating C, others (e.g. Scarlata et al. 2007) estimate C within a region given by one Petrosian radius.
The scale radius β is expressed in units of pixels, i.e. β−1 is the pixel size relative to the object size.
This is a mathematical and deeply implicit assumption that is generally not realized when working with actual galaxy data: the ‘radii’ used to compute the concentration index are estimated from a curve of growth. This curve of growth is actually a two-dimensional integral over the galaxy’s light profile (though it is usually reduced to a summation due to pixellation to allow a comment on an actually irrelevant practical detail). Nevertheless, it is inevitable to parametrize this integration in some way in order to be capable of evaluating it (analytically or numerically). In simple words, one has to define what ‘radius’ means (e.g. spherical or elliptical radius) and this definition is the assumption. For instance, assuming spherical integration contours, the curve-of-growth integral of an image f(r, φ) reads , where the integral has been parametrized in polar coordinates. In fact, Fig. 5 can be understood as investigating what happens if the curve of growth indeed takes this spherical form but the image data are not spherically symmetric but elliptical. More physically, though already beyond the point: in case of an image that has perfectly circular or elliptical symmetry, the azimuthal integration in p(R) is well defined and so are the radii and the concentration index. However, if there is more complicated azimuthal structure than ellipticity, there is no simple way to generally define the curve of growth. Either, the integration is along true isophotes. In this case, the shape of the integration regions will vary from object to object and potentially also with radius. Then the resulting concentration indices would not be comparable. The other option is to integrate along given circular or elliptical isophotes to enforce comparability. This approach explicitly assumes that there is no azimuthal structure or else the radius in p(R) has no strict relation to the galaxy, and the estimated curve of growth will be biased. The justification to use this in practice is to assume that in reality objects of similar type will catch similar biases, such that concentration indices still have discriminative power in a differential sense, though their absolute values may be biased. Furthermore, the mere presence of such a bias does not automatically imply that the resulting estimates of the curve of growth and the concentration index, respectively, are not meaningful anymore.
This assumption stems from the term (xn−x0)2 in equation (4).
However, it is not true in general that model-independent schemes invoke fewer assumptions than model-based approaches. As an exception to this ‘rule’, compare concentration index and shapelets.
To be more precise, it is perfectly valid to use such idealized simulations to discover these biases, but in order to correct for them more realistic simulations are necessary.
Obviously, Sérsic profiles are rather idealized and by far not as realistic as the sample used by Bershady et al. (2000). However, this does not hamper the validity of this test, but rather serves the purpose of isolating this bias. Apart from that, there is no difference in both studies.
The flexion transformation of equation (A10) will produce a second solution of x′ = 0, which corresponds to multiple images in weak lensing. We only consider cutouts with just one image, where the other image resulting from the second solution to x′ = 0 is far away.
The steps in panel (c) are due to the computation of Acol, since xcol is changing as F1 increases. Whenever xcol enters a new pixel, the set of pixels used for computing Acol changes. There are also steps in Ccol, but they are very small.
Note that this task is completely different from testing the reliability using simulations. Such simulations allow us to assess and calibrate a parametrization scheme in general, but they do not help in detecting parametrization failures for individual objects.
Note that reliability assessment and error estimation are two different things. Error estimation is possible for model-independent approaches, e.g. via bootstrapping.
We are aware that the COSMOS PSF is not a Gaussian. This test is meant to demonstrate the principle of this effect.
Consider a hypercube of edge length L in d dimensions. Its hypervolume Ld grows exponentially with d.
Unfortunately, a PCA is a risky and often inappropriate tool in the context of classification. The reason is that PCA diagonalizes the sample covariance matrix, i.e. it assumes that the whole data sample comes from a single Gaussian distribution. This assumption obviously jars with the goal of assigning objects to different classes.
RA thanks Eric Bell for discussions that initialized this work. RA also thanks Matthias Bartelmann, Thorsten Lisker, Aday Robaina Rapisarda, Mark Sargent, Paraskevi ‘Vivi’ Tsalmantza, Glenn van de Ven and Katherine Inskip for helpful comments on the content of this paper. Furthermore, we thank Claudia Scarlata for pointing out a mistake in an earlier version of this manuscript. RA is funded by a Klaus-Tschira scholarship. KJ is supported by the Emmy-Noether-programme of the DFG. PM is supported by the DFG Priority Programme 1177.
REFERENCES
Appendix
APPENDIX A: SHEAR AND FLEXION TRANSFORMATION
We now briefly resume the shear and flexion transformation we are using to simulate ellipticity and lopsidedness – the latter being a special kind of asymmetry.











