A method for classification of red, blue and green galaxies using fuzzy set theory

Red and blue galaxies are traditionally classified using specific cuts in colour or other galaxy properties. These cuts remain largely arbitrary, supported only by empirical arguments. Galaxy colours show a gradual transition, and the vagueness associated with such cuts is likely to introduce significant contamination in these samples. Fuzzy set theory provides a natural framework to capture the classification uncertainty in the absence of any precise boundary. We propose a method for classification of galaxies according to their colours using fuzzy set theory.

Introduction: Colour is considered to be one of the fundamental properties of a galaxy. It is defined as the ratio of fluxes in two different bands and characterizes the stellar population of a galaxy. Understanding the distribution of galaxy colours and their evolution can provide important clues to galaxy formation and evolution.
Modern redshift surveys like SDSS [1] have now measured the photometric and spectroscopic information of a large number of galaxies. Using SDSS, Strateva et al. [2] first revealed a striking bimodal feature in the distribution of galaxy colours. Subsequent studies with SDSS and other surveys [3] confirmed the colour bimodality and quantified it in greater detail. The distribution of galaxy colour may depend on various other parameters related to the galaxies and their environment. Balogh et al. [4] and Baldry et al. [5] explored the dependence of colour bimodality on luminosity, stellar mass and environment. The observed colour bimodality indicates the existence of two different populations, namely the 'red sequence' and the 'blue sequence', with a significant overlap between them. The overlapping region is often termed the 'green valley'. Driver et al. [6] argued that galaxy colours are the outcome of a mixture of colours from stars in the disc and bulge of a galaxy, and that the colour bimodality can be related to the bimodal distribution of their bulge-to-disc mass ratios.
The origin of the observed colour bimodality must be explained by successful models of galaxy formation. Considerable effort has been directed towards reproducing the observed distribution of galaxy colours using different semi-analytic models of galaxy formation [6,7]. A significant number of studies have also been devoted to understanding the spatial distribution of red and blue galaxies using various clustering measures such as the two-point correlation function [8], three-point correlation function [9], genus [10], filamentarity [11] and local dimension [12]. Studies of the mass function of red and blue galaxies [13,14] play a crucial role in guiding theories of galaxy formation and evolution. Any such analysis requires one to classify the red and blue galaxies using some operational definition. But colour is a subjective judgment, and it is difficult to classify galaxies according to their colours in an objective manner. Strateva et al. [2] prescribed that red and blue galaxies can be separated using an optimal colour separator (u − r) = 2.22. However, the substantial overlap between the two populations prevents us from objectively defining galaxies as 'red' or 'blue' based on their colours. The boundary separating the two populations remains arbitrary, and currently there is no meaningful theoretical argument favouring any specific cut over others. Applying a hard-cut to define samples of red and blue galaxies is expected to introduce significant contamination in these samples.

* E-mail: biswap@visva-bharati.ac.in
Fuzzy set theory was first introduced by Lotfi A. Zadeh [15] and has since been applied in various fields such as industrial automation [16], control systems [17], image processing [18], pattern recognition [19], robotics [20], optimization [21] and economics [22]. It provides a natural language to express the vagueness or uncertainty associated with imprecise boundaries.
In this paper, we propose a framework to describe galaxy colours using a fuzzy set theoretic approach. We also study the prospects of using fuzzy relations to understand the relationship between different galaxy properties by treating them as fuzzy variables.
Fuzzy set: A fuzzy set A in a universal set X is defined as a set of ordered pairs [15],

$$A = \{(x, \mu_A(x)) \mid x \in X\}, \tag{1}$$

where $\mu_A(x) \in [0, 1]$ is the membership function, which gives the degree of membership of the element x in A.

A fuzzy relation S describes the relations between the elements of two fuzzy sets $A_1$ and $A_2$. The strength of the relation between the ordered pairs of the two sets is expressed by the membership function of the relation,

$$\mu_{A_1 \times A_2}(x, y) = \mu_S(x, y) = \min\{\mu_{A_1}(x), \mu_{A_2}(y)\}. \tag{2}$$

The fuzzy relation S is itself a fuzzy set, whose membership function is represented by a matrix known as the relation matrix.

Why fuzzy set? Application of a hard-cut yields two crisp sets corresponding to red and blue galaxies. This denies any gradual transition between the two populations, which is mathematically correct but unrealistic. In reality, there is an uncertainty involved in the decision to label a galaxy as either 'red' or 'blue' whenever its colour falls in the neighbourhood of the precisely defined border. This uncertainty is completely ignored when using any hard-cut separator. Fuzzy set theory provides a framework to capture such uncertainty or vagueness in natural language. In fuzzy set theory, an element may partly belong to multiple fuzzy sets with different degrees of membership. This flexibility delivers greater expressive power, which can be used effectively whenever the boundaries are imprecise.
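As a small worked illustration of the min-based fuzzy relation described in this section, consider two fuzzy sets with two elements each. The membership values below are made up for the example and are not taken from the data.

```latex
\begin{align*}
\mu_{A_1} &= (0.2,\ 0.9), \qquad \mu_{A_2} = (0.5,\ 1.0), \\
\mu_S(x_i, y_j) &= \min\{\mu_{A_1}(x_i), \mu_{A_2}(y_j)\}
\;\Rightarrow\;
\mathbf{S} =
\begin{pmatrix}
\min(0.2, 0.5) & \min(0.2, 1.0) \\
\min(0.9, 0.5) & \min(0.9, 1.0)
\end{pmatrix}
=
\begin{pmatrix}
0.2 & 0.2 \\
0.5 & 0.9
\end{pmatrix}.
\end{align*}
```

Each entry of the relation matrix is limited by the weaker of the two memberships, so a pair is strongly related only when both elements belong strongly to their respective sets.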
SDSS Data: The Sloan Digital Sky Survey (SDSS) is one of the most successful sky surveys of modern times. The SDSS has so far measured the photometric and spectroscopic information of more than one million galaxies and quasars in five different bands. We use data from SDSS DR16 [23] for the present analysis. We retrieve the data from the SDSS SkyServer¹ using Structured Query Language. We download the spectroscopic information of all the galaxies with r-band Petrosian magnitude $13.5 \leq r_p < 17.77$ and located within 135 … Here α, δ and z are the right ascension, declination and redshift, respectively. We obtain a total of 376495 galaxies which satisfy these cuts. We then apply a cut to the K-corrected and extinction-corrected r-band absolute magnitude, $-21 \geq M_r \geq -23$, which corresponds to a redshift cut of $0.041 \leq z \leq 0.120$. This produces a volume limited sample with 103984 galaxies. We use a ΛCDM cosmological model with $\Omega_{m0} = 0.315$, $\Omega_{\Lambda 0} = 0.685$ and $h = 0.674$ [24].
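The correspondence between the magnitude limits and the redshift limits can be checked with a rough sketch. The code below computes the luminosity distance in a flat ΛCDM model with the quoted parameters and converts the apparent magnitude limits to absolute magnitudes at the two redshift limits. It neglects the K-correction and extinction correction used in the text, so it is expected to reproduce the quoted limits only approximately. The function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458            # speed of light in km/s
H0 = 67.4                      # Hubble constant in km/s/Mpc (h = 0.674)
OMEGA_M, OMEGA_L = 0.315, 0.685

def luminosity_distance_mpc(z, n_steps=2001):
    """Luminosity distance in Mpc for a flat LambdaCDM model."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + zs) ** 3 + OMEGA_L)
    # Trapezoidal integration of dz / E(z) gives the comoving distance
    # in units of the Hubble distance c / H0.
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    return (1.0 + z) * (C_KM_S / H0) * integral

def absolute_magnitude(r_app, z):
    """Absolute magnitude, ignoring K-correction and extinction (assumption)."""
    d_pc = luminosity_distance_mpc(z) * 1.0e6
    return r_app - 5.0 * np.log10(d_pc / 10.0)

m_faint = absolute_magnitude(17.77, 0.120)   # faint limit at the far edge
m_bright = absolute_magnitude(13.5, 0.041)   # bright limit at the near edge
print(f"M_r at (r=17.77, z=0.120): {m_faint:.2f}")
print(f"M_r at (r=13.50, z=0.041): {m_bright:.2f}")
```

Under these simplifications the two values come out close to the absolute magnitude limits −21 and −23 quoted above. The remaining offset is of the size one would expect from the neglected corrections.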
¹ https://skyserver.sdss.org/casjobs/

Definition of fuzzy sets for different colours and luminosity: We do not label galaxies as purely red, blue or green. Rather, we attach a 'redness', 'blueness' and 'greenness' to every galaxy. We use SDSS data to define fuzzy sets corresponding to the three colours red, blue and green. We define a fuzzy set R corresponding to the 'redness' of galaxies using the (u − r) colour of the 103984 galaxies in the volume limited sample as,

$$R = \{((u-r), \mu_R(u-r)) \mid (u-r) \in X\}. \tag{3}$$

Here X is the universal set of (u − r) colours of all galaxies. We choose a sigmoidal membership function,

$$\mu_R(u-r; a, c) = \frac{1}{1 + e^{-a[(u-r) - c]}}, \tag{4}$$

where a and c are constants. We choose a = 5.2 and c = 2.2 for our analysis. The parameter c represents the crossover point of the fuzzy set R, where $\mu_R(u-r) = 0.5$, and the parameter a controls the slope at the crossover point. The crossover point of a fuzzy set is the location where the fuzzy set has maximum uncertainty or vagueness. SDSS observations of the bimodal distribution of (u − r) colour show that the two peaks corresponding to the 'red' and 'blue' populations merge together at (u − r) ∼ 2.2. It is most difficult to classify a galaxy as red or blue when its (u − r) colour is around this value.

A larger 'redness' is associated with a smaller 'blueness' and vice versa. So we can define a fuzzy set B corresponding to the 'blueness' of galaxies by simply taking the fuzzy complement of the normal fuzzy set R, i.e. $B = R^c$. The membership function $\mu_B(u-r)$ of the fuzzy set B is obtained as,

$$\mu_B(u-r) = 1 - \mu_R(u-r), \quad \forall\, (u-r) \in X. \tag{5}$$

The green galaxies are the galaxies which simultaneously belong to both the fuzzy sets R and B. The fuzzy set G corresponding to the 'greenness' of galaxies can be defined from the two fuzzy sets R and B by taking the fuzzy intersection between them, i.e. $G = R \cap B$. The fuzzy set G is defined pointwise by the membership function $\mu_G(u-r)$ as,

$$\mu_G(u-r) = \min\{\mu_R(u-r), \mu_B(u-r)\}, \tag{6}$$

where min is the minimum operator. Clearly, G is a subnormal fuzzy set with a height of 0.5.
Both the fuzzy sets R and B have a membership value of 0.5 at the crossover point (u − r) = 2.2, where the uncertainty is maximum. However, a galaxy with a (u − r) colour of 2.2 should be maximally green, with a membership value $\mu_G(u-r) = 1$. To normalize the fuzzy set G, the membership value corresponding to each (u − r) value in it is therefore scaled by a factor of 2.
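The three colour membership functions can be sketched in a few lines. The example below assumes the standard logistic form for the sigmoid, with the quoted parameters a = 5.2 and c = 2.2, takes the fuzzy complement for 'blueness', and takes twice the pointwise minimum for the normalized 'greenness'. The function names and the sample colours are illustrative and are not taken from the analysis.

```python
import numpy as np

A_SLOPE = 5.2   # slope parameter a quoted in the text
C_CROSS = 2.2   # crossover point c quoted in the text

def mu_red(colour, a=A_SLOPE, c=C_CROSS):
    """Sigmoidal 'redness' membership (logistic form assumed)."""
    colour = np.asarray(colour, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (colour - c)))

def mu_blue(colour):
    """'Blueness' as the fuzzy complement of the red set."""
    return 1.0 - mu_red(colour)

def mu_green(colour):
    """'Greenness' as the fuzzy intersection of red and blue, scaled by 2."""
    return 2.0 * np.minimum(mu_red(colour), mu_blue(colour))

colours = np.array([1.2, 2.0, 2.2, 2.4, 3.0])   # illustrative (u - r) values
for ur, r, b, g in zip(colours, mu_red(colours), mu_blue(colours), mu_green(colours)):
    print(f"(u-r)={ur:.1f}  red={r:.3f}  blue={b:.3f}  green={g:.3f}")
```

At the crossover colour the sketch gives a redness and blueness of 0.5 and a greenness of 1, and the greenness falls off symmetrically on either side.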
It is worth mentioning that hard-cut samples can be obtained from these fuzzy sets in a more reasonable manner by constructing appropriate α-level sets, which are crisp sets. The set of elements that belong to a fuzzy set A with at least a degree of membership α is called the α-level set:

$$A_\alpha = \{x \in X \mid \mu_A(x) \geq \alpha\}. \tag{7}$$

One can construct such samples for the red, blue and green galaxies by applying an appropriate α-level cut to the fuzzy sets R, B and G respectively.
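An α-level cut amounts to a simple threshold on the membership values. The sketch below uses a hypothetical helper `alpha_cut` and made-up galaxy identifiers and membership values to show how a crisp sample is extracted from a fuzzy set.

```python
import numpy as np

def alpha_cut(elements, memberships, alpha):
    """Return the crisp alpha-level set: elements with membership >= alpha."""
    elements = np.asarray(elements)
    memberships = np.asarray(memberships, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return elements[memberships >= alpha]

# Toy fuzzy set: galaxy identifiers with made-up 'redness' values.
galaxy_ids = np.array([101, 102, 103, 104, 105])
redness = np.array([0.05, 0.40, 0.50, 0.80, 0.99])

red_sample = alpha_cut(galaxy_ids, redness, alpha=0.65)
print(red_sample)
```

Raising α gives a smaller, purer sample, while lowering it trades purity for completeness. The choice therefore remains explicit rather than hidden in a colour cut.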
Similarly, we also define a fuzzy set L for the luminosity of SDSS galaxies as,

$$L = \{(M_r, \mu_L(M_r)) \mid M_r \in Y\}, \tag{8}$$

where Y is the universal set of r-band absolute magnitudes ($M_r$) of all galaxies. The membership function of this fuzzy set is,

$$\mu_L(M_r) = (2.512)^{\,M_r^{\min} - M_r}, \tag{9}$$

where $M_r^{\min}$ is the absolute magnitude of the brightest galaxy in the sample. The choice of this membership function is based on the fact that a difference of 1 in absolute magnitude corresponds to a factor of 2.512 in luminosity. Thus the brightest galaxy in the sample has a membership value of 1, and the membership values $\mu_L(M_r)$ of the other galaxies in L are assigned following equation 9. All the fuzzy sets R, B, G and L defined here are convex fuzzy sets.
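The luminosity membership can be sketched as the luminosity of each galaxy relative to the brightest one. The exact functional form is an assumption here: it is inferred from the statements that one magnitude corresponds to a factor of 2.512 in luminosity and that the brightest galaxy has a membership of 1. The function name and the sample magnitudes are illustrative.

```python
import numpy as np

def mu_luminosity(abs_mag):
    """Luminosity relative to the brightest galaxy in the sample (assumed form)."""
    abs_mag = np.asarray(abs_mag, dtype=float)
    brightest = abs_mag.min()              # most negative magnitude = most luminous
    return 2.512 ** (brightest - abs_mag)

mags = np.array([-21.0, -21.5, -22.0, -23.0])   # illustrative M_r values
memberships = mu_luminosity(mags)
print(np.round(memberships, 3))
```

With this form, a galaxy two magnitudes fainter than the brightest one has a membership of 2.512⁻², roughly 0.16, so the membership falls geometrically rather than linearly with magnitude.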

Results and conclusions:
In the top left panel of figure 1, we show the pdf of the (u − r) colour of SDSS galaxies in the volume limited sample analyzed here. The bimodal nature of the (u − r) colour can be clearly seen in this figure. It has been shown that the colour distribution can be well described by a double Gaussian [4,5,14]. We observe that the pdf of the (u − r) colour exhibits two distinct peaks which merge together at (u − r) ∼ 2.2. The (u − r) colour of SDSS galaxies shows a gradual transition on either side of this merging point. Traditionally, galaxies are classified as 'red' or 'blue' by employing a hard-cut around the merging point of the two peaks. However, any (u − r) colour close to this border should be regarded as equal evidence for both classes. The uncertainty reaches its maximum at the border, which is completely overlooked by such a hard-cut separator. We treat this merging point as the point of maximum confusion and use it as the crossover point ($\mu_R(u-r) = 0.5$) for the membership function (equation 4) of the fuzzy set of red galaxies (R) in SDSS. We obtain the fuzzy set of blue galaxies (B) in the SDSS by taking the fuzzy complement of the fuzzy set R. The green galaxies belong to an intermediate class which is the most difficult to disentangle. We carry out a fuzzy intersection between the two fuzzy sets R and B to construct the fuzzy set of green galaxies (G) in the SDSS. We compute the membership values of all the SDSS galaxies in these fuzzy sets following equations 4, 5 and 6. The membership functions $\mu_R(u-r)$, $\mu_B(u-r)$ and $\mu_G(u-r)$ corresponding to red, blue and green galaxies respectively are shown as a function of the (u − r) colour of SDSS galaxies in the top right panel of figure 1. We can see that galaxies are maximally green at the merging point of the peaks, where classifying a galaxy as red or blue is most uncertain.
Thus, in this framework all galaxies are considered to be red, blue and green irrespective of their (u−r) colours.
The membership functions of the fuzzy sets for the red, blue and green galaxies quantify the degrees of 'redness', 'blueness' and 'greenness' of the member galaxies in the respective fuzzy sets.
We note that the hard-cut samples for red, blue and green galaxies can be easily obtained from the fuzzy sets R, B and G by applying an appropriate α-cut to the membership functions of these fuzzy sets. For instance, the top right panel of figure 1 suggests that one can obtain reasonably clean samples of red, blue and green galaxies by applying a threshold of 0.65 to the membership functions of the fuzzy sets R, B and G respectively.
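To see what such a threshold means in terms of colour, one can invert the membership function. The derivation below assumes the standard logistic form $\mu_R = 1/(1 + e^{-a[(u-r)-c]})$ for the sigmoid, which is an assumption here, and uses the quoted parameter values a = 5.2 and c = 2.2.

```latex
\begin{align*}
\mu_R(u-r) \ge \alpha
  &\iff (u-r) \ge c + \frac{1}{a}\ln\frac{\alpha}{1-\alpha}, \\
\mu_B(u-r) = 1 - \mu_R(u-r) \ge \alpha
  &\iff (u-r) \le c - \frac{1}{a}\ln\frac{\alpha}{1-\alpha}, \\
\alpha = 0.65:\quad \frac{1}{5.2}\ln\frac{0.65}{0.35} \approx 0.119
  &\;\Rightarrow\; (u-r) \gtrsim 2.32 \ \text{(red)}, \qquad (u-r) \lesssim 2.08 \ \text{(blue)}.
\end{align*}
```

Under this assumption, the α = 0.65 cut on R and B is equivalent to a pair of colour cuts placed symmetrically about the crossover point.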
We define the fuzzy set L for the luminosity of SDSS galaxies using their r-band absolute magnitudes. The membership function $\mu_L(M_r)$ of the fuzzy set L is defined according to the variation of luminosity with absolute magnitude (equation 9). We show $\mu_L(M_r)$ for SDSS galaxies as a function of r-band absolute magnitude ($M_r$) in the bottom left panel of figure 1. We also plot the membership function $\mu_L(M_r)$ of set L against the membership function $\mu_R(u-r)$ of set R for the SDSS galaxies in the bottom right panel of figure 1. This panel shows two distinct peaks, located near the two extremities of $\mu_R(u-r)$, around which the members of the two fuzzy sets are clustered. Clearly, near the lowest value $\mu_R(u-r) = 0$ in R, the members tend to have relatively lower values of $\mu_L(M_r)$. On the other hand, the members of the fuzzy set R tend to have preferentially larger values of $\mu_L(M_r)$ near its highest membership value $\mu_R(u-r) = 1$. A combination of two α-cuts between 0 and 1 on the two membership functions $\mu_R(u-r)$ and $\mu_L(M_r)$ can be used to define a galaxy sample with desired properties. This can be extended to any number of fuzzy sets and hence allows greater flexibility in classifying galaxies with different properties.
We compute the fuzzy relation S between the fuzzy sets L and R using equation 2. The resulting relation matrix is shown in figure 2. This shows the strengths of association for different possible combinations of r-band absolute magnitude ($M_r$) and colour (u − r). We find that for (u − r) > 2.2, the membership function of the fuzzy relation gradually increases with increasing luminosity, and the association is strongest near the largest luminosity. In contrast, no such gradient is seen for (u − r) < 2.2. This implies that, among galaxies with (u − r) colour larger than 2.2, brighter galaxies tend to be redder. But this is not necessarily the case in the region (u − r) < 2.2, which represents less red or bluer galaxies. However, one should remember that such a boundary is not precise, and a mixed behaviour can be observed in the proximity of (u − r) = 2.2. We should also remember that the elements of a fuzzy set only partially belong to it and carry different degrees of membership. In probability theory, we may simply be interested in calculating the covariance of the two random variables $M_r$ and (u − r), which can be represented by a 2 × 2 matrix. The elements corresponding to each variable are in that case treated as crisp, as there is no uncertainty about their memberships. Also, the covariance matrix does not provide the association between all possible pairs of elements of the two sets.
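The behaviour of the relation matrix described above can be reproduced qualitatively with a short sketch. It evaluates the min-based relation on a coarse grid of absolute magnitude and colour. Both membership forms are assumptions: a logistic 'redness' with a = 5.2 and c = 2.2, and a luminosity membership equal to the luminosity relative to a galaxy with $M_r = -23$. The grids are illustrative.

```python
import numpy as np

def mu_red(colour, a=5.2, c=2.2):
    """Sigmoidal 'redness' membership (logistic form assumed)."""
    colour = np.asarray(colour, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (colour - c)))

def mu_luminosity(abs_mag, brightest=-23.0):
    """Luminosity relative to the brightest magnitude (assumed form)."""
    return 2.512 ** (brightest - np.asarray(abs_mag, dtype=float))

abs_mags = np.linspace(-21.0, -23.0, 5)    # faint to bright, illustrative grid
colours = np.linspace(1.0, 3.4, 7)         # blue to red, illustrative grid

# Relation matrix: S[i, j] = min(mu_L(M_r[i]), mu_R((u - r)[j])).
relation = np.minimum.outer(mu_luminosity(abs_mags), mu_red(colours))
print(np.round(relation, 3))
```

In the reddest columns the minimum is set by the luminosity membership, so the relation strengthens towards brighter magnitudes. In the bluest columns it is set by the small redness membership, so it stays flat with luminosity.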
One important caveat in our analysis lies in the choice of the membership function of a fuzzy set. In the present analysis, the shape and parameters of the membership function $\mu_R(u-r)$ of the fuzzy set R are decided based on the observed distribution of the (u − r) colour of SDSS galaxies. However, one may choose a different shape (e.g. linear) for the membership function, and there is no clear recipe for choosing it. This ambiguity can be avoided by training an artificial neural network with sample data to learn the membership function, which can make its construction more efficient. Further, we are using Type-1 fuzzy sets, whose membership functions are crisp sets. So there are no uncertainties in the membership functions of the fuzzy sets defined here. One can include the uncertainties in the membership functions by using Type-2 fuzzy sets. A Type-2 fuzzy set is a fuzzy set whose membership function is a Type-1 fuzzy set. We plan to address these issues in a future work.
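As an illustration of the alternative shape mentioned above, a piecewise-linear 'redness' could be sketched as follows. The function is hypothetical and not part of the analysis. It keeps the crossover at c = 2.2, and its default slope of 1.3 is chosen to equal a/4 for a = 5.2, which is the slope of a logistic sigmoid at its crossover point.

```python
import numpy as np

def mu_red_linear(colour, c=2.2, slope=1.3):
    """Piecewise-linear 'redness': 0.5 at the crossover c, clipped to [0, 1]."""
    colour = np.asarray(colour, dtype=float)
    return np.clip(0.5 + slope * (colour - c), 0.0, 1.0)

colours = np.array([1.5, 2.0, 2.2, 2.4, 3.0])   # illustrative (u - r) values
linear_redness = mu_red_linear(colours)
print(linear_redness)
```

Near the crossover the linear and sigmoidal shapes agree closely. They differ mainly in the tails, where the linear form reaches exactly 0 or 1 at a finite colour.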
Finally, we note that fuzzy set theory has great potential to become a useful tool in highly data-driven fields such as astrophysics and cosmology. Imprecise boundaries between different classes of objects and their properties are quite common in astronomical datasets. In cosmology, reproducing different observed galaxy properties using various semi-analytic models helps us to fine-tune our understanding of galaxy formation and evolution. There is a complex interplay between various factors (e.g. density, environment) and galaxy properties, which leads to a large scatter in their values and in their relationships with each other. Fuzzy relations and their compositions may, in such situations, help us understand these relationships from a different perspective and can serve as a complementary measure to other existing methods.