Tracing the green valley with entropic thresholding

The green valley represents the population of galaxies that are transitioning from the actively star-forming blue cloud to the passively evolving red sequence. Studying the properties of the green valley galaxies is crucial for our understanding of the exact mechanisms and processes that drive this transition. The green valley does not have a universally accepted definition. The boundaries of the green valley are often determined by empirical lines that are subjective and vary across studies. We present an unambiguous definition of the green valley in the colour-stellar mass plane using the entropic thresholding. We first divide the galaxy population into the blue cloud and the red sequence based on a colour threshold that minimizes the intra-class variance and maximizes the inter-class variance. Our method splits the region between the mean colours of the blue cloud and the red sequence into three parts by maximizing the total entropy of that region. We repeat our analysis in a number of independent stellar mass bins to define the boundaries of the green valley in the colour-mass diagram. Our method provides a robust and natural definition of the green valley.


Introduction
The luminous component of the matter distribution in the present universe is primarily represented by galaxies. Galaxies show wide variations in their physical properties. Such broad variations indicate a range of possible formative and evolutionary pathways. Classifying the galaxies in the nearby universe based on their physical properties is a step towards understanding their formation and evolution.
The distributions of several galaxy properties exhibit a distinct bimodality. Colour is one among them. It is a fundamental property of a galaxy that characterizes its stellar population. The distribution of galaxy colour is known to be strongly bimodal (Strateva et al. 2001; Blanton et al. 2003; Bell et al. 2003; Balogh et al. 2004; Baldry et al. 2004b). The populations corresponding to the two peaks in the distribution of galaxy colour are usually referred to as the 'blue cloud' (BC) and the 'red sequence' (RS). The two populations have noticeably different physical properties, and these differences have been studied in numerous works (Strateva et al. 2001; Blanton et al. 2003; Kauffmann et al. 2003; Baldry et al. 2004b). The galaxies in the blue cloud host younger stellar populations and are actively star forming. Their morphology is disk-like and they have lower stellar masses. In contrast, the galaxies in the red sequence are mostly passive and host older stellar populations. They mostly have a bulge-dominated morphology and higher stellar masses. However, these correlations are not absolute. Observations show a significant number of ellipticals in the blue cloud and spirals in the red sequence (Schawinski et al. 2009; Masters et al. 2010), suggesting a complex evolutionary history of the galaxies.
⋆ E-mail: biswap@visva-bharati.ac.in
The bimodality in the colour distribution of galaxies is seen not only in the nearby universe but also at higher redshifts (Bell et al. 2004; Brammer et al. 2009). Madau et al. (1996) show that the star formation rate declines sharply after z = 1, which indicates a significant evolution in galaxy properties. The luminosity function of the galaxies in the red sequence has doubled since z ∼ 1 (Bell et al. 2004; Faber et al. 2007), indicating an ongoing transition of galaxies from the blue cloud to the red sequence. Such transitions may occur due to a number of different secular and environmental processes. Some possible physical mechanisms that may be responsible for such transformations are strangulation (Gunn & Gott 1972; Balogh, Navarro, & Morris 2000), galaxy harassment (Moore et al. 1996; Moore, Lake, & Katz 1998), starvation (Larson, Tinsley, & Caldwell 1980; Somerville & Primack 1999; Kawata & Mulchaey 2008), ram pressure stripping (Gunn & Gott 1972) and gas expulsion through starbursts or AGN (Cox et al. 2004; Murray, Quataert, & Thompson 2005; Springel, Di Matteo, & Hernquist 2005). The cessation of star formation may also be driven by other physical processes such as mass quenching (Birnboim & Dekel 2003; Dekel & Birnboim 2006), bar quenching (Masters et al. 2010) and morphological quenching (Martig et al. 2009). A proper understanding of these mechanisms and their roles in producing the observed bimodality is crucial for modelling galaxy formation and evolution. Any successful model of galaxy formation must explain the observed bimodality. Various semi-analytic models of galaxy formation have been used to explain the observed bimodality (Menci et al. 2005; Driver et al. 2006; Cattaneo et al. 2006, 2007; Cameron et al. 2009; Trayford et al. 2016; Nelson et al. 2018; Correa, Schaye & Trayford 2019).
A number of different definitions for the blue cloud and the red sequence have been proposed in the literature. Strateva et al. (2001) propose a colour cut of (u − r) = 2.22 to separate the galaxies in the blue cloud and the red sequence. Baldry et al. (2004a) separate the red and blue galaxies by fitting a double-Gaussian function to the observed (u − r) colour distribution. The observed colour bimodality is sensitive to luminosity, stellar mass and environment (Balogh et al. 2004; Baldry et al. 2006; Pandey & Sarkar 2020). This motivates the use of other galaxy properties along with colour to separate the blue cloud and the red sequence. There is an extensive literature on this topic. Numerous works propose to separate the blue cloud and the red sequence in the colour-magnitude plane (Baldry et al. 2004b; Faber et al. 2007; Fritz et al. 2014), the colour-stellar mass plane (Taylor et al. 2015) or the colour-colour plane (Williams et al. 2009; Arnouts et al. 2013; Fritz et al. 2014) using different empirical lines.
The narrow region between the blue cloud and the red sequence is generally termed the 'green valley' (GV) (Wyder et al. 2007). The green valley represents a critical phase in the evolution of galaxies over cosmic time. The galaxies in the green valley are in a transitional phase, moving from being actively star-forming to becoming quiescent. They contain a mix of stellar populations, with young and old stars coexisting. Studying them helps us to understand stellar population synthesis, stellar evolution, and the overall demographics of galaxies. There is no unique evolutionary route that leads galaxies from the blue cloud to the red sequence through the green valley. The multiplicity of possible evolutionary pathways is a major challenge in understanding the transitioning galaxies in the green valley.
Identifying the green valley population is crucial for unraveling the complex processes that quench star formation in galaxies. The existing methods for separating the blue cloud and the red sequence are mostly empirical. Galaxies in the green valley are intermediate between the blue and the red galaxies and are generally regarded as contamination in either sample. It is difficult to define the green valley precisely from any of these definitions. The exact criteria for classifying galaxies in the green valley are quite subjective and vary across different studies. Schawinski et al. (2014) provide a definition of the green valley using two empirical lines in the colour-stellar mass plane. Bremer et al. (2018) divide the red, blue and green galaxies using three broad colour bins based on the surface density of points in the colour-mass plane. Coenda, Martínez, & Muriel (2018) define the green valley in the (NUV − r) colour-stellar mass diagram using empirical lines and study the properties of the transitional galaxies in different environments. Eales et al. (2018) analyze the galaxies selected at submillimetre wavelengths and argue that the green valley may not represent a true third population, but merely a smooth transition from the blue cloud towards the red sequence. Angthopo, Ferreras, & Silk (2019) propose a definition of the green valley using the 4000 Å break strength. This definition has been used for a detailed study of the stellar populations in green valley galaxies (Angthopo, Ferreras, & Silk 2020). Pandey (2020) uses a fuzzy set theory based method to classify the red, blue and green galaxies in the SDSS. Das, Pandey, & Sarkar (2021) and Sarkar, Pandey, & Das (2022) use this classification to study the properties of the green valley galaxies and red spirals in different environments. Quilley & de Lapparent (2022) relate the morphology of galaxies to their evolution and redefine the green valley using the mean colour of Hubble types. Noirot et al. (2022) use the NUVrK colour-colour diagram to identify the blue cloud, the green valley and the red sequence. More recently, Estrada-Carpenter et al. (2023) define the green valley using the shape of the log(sSFR) distribution and study the morphological evolution of the transitional galaxies in the CLEAR survey. Brambila et al. (2023) use empirical lines to define the green valley in the SFR-stellar mass plane and explore the roles of different environments in the quenching of transitional galaxies.
Most definitions of the green valley are either empirical or based on certain user-defined parameters, and lack a solid mathematical justification. Recently, Pandey (2023) proposed a parameter-free method to separate the galaxies in the blue cloud and the red sequence using Otsu's method for image segmentation. However, there is no provision for identifying the green valley in that method.
The goal of this work is to present a method for identifying the green valley based on entropic thresholding (Pun 1981; Kapur et al. 1985). The method provides an unambiguous definition of the green valley based on the distribution of galaxies in the colour-stellar mass plane. Its advantage is that the green valley can be identified from the distribution alone, without relying on any empirical lines or user-defined parameters.
The plan of the paper is as follows. We describe the data in Section 2, explain the method in Section 3, discuss our results in Section 4 and present our conclusions in Section 5.

SDSS data
We use data from the Sloan Digital Sky Survey (SDSS) (York et al. 2000). The SDSS is the largest redshift survey of the nearby Universe. It has gathered the images and spectra of millions of galaxies with unprecedented accuracy. The availability of a large number of galaxies in the SDSS makes it ideally suited for a statistical analysis of the galaxy bimodality. We obtain the data from the SDSS DR16 (Ahumada et al. 2020) using an SQL query on the SDSS SkyServer †. We select a contiguous region of the sky that spans 135° ≤ α ≤ 225° and 0° ≤ δ ≤ 60° in equatorial co-ordinates. We extract the information of all the galaxies in this region that lie at redshift z < 0.3 and have an r-band Petrosian magnitude within $13.5 \le r_p < 17.77$. We construct a volume limited sample of galaxies from this dataset by applying a cut $-23 \le M_r \le -21$ to the K-corrected and extinction-corrected r-band absolute magnitude. The resulting volume limited sample spans the redshift range 0.041 ≤ z ≤ 0.120 and contains a total of 103984 galaxies. The stellar masses and the specific star formation rates (sSFR) of the galaxies in our volume limited sample are obtained from a catalogue based on the Flexible Stellar Population Synthesis model (Conroy, Gunn & White 2009). We use the concentration index $r_{90}/r_{50}$ (Shimasaku et al. 2001) to assess the morphology of galaxies. Here, $r_{90}$ and $r_{50}$ are the radii enclosing 90% and 50% of the Petrosian flux, respectively. The values of $r_{90}$ and $r_{50}$ for each galaxy are extracted from the photoObj table. Additionally, Brinchmann et al. (2004) provide an emission line classification of galaxies using the BPT diagram developed by Baldwin, Phillips, & Terlevich (1981). This classification is recorded in the 'bptclass' variable within the galSpecExtra table. We retrieve this classification for all galaxies and identify AGNs based on their 'bptclass'. We use a ΛCDM cosmological model with $\Omega_{m0} = 0.315$, $\Omega_{\Lambda 0} = 0.685$ and h = 0.674 (Planck Collaboration et al. 2020) for our analysis.

Entropic thresholding
The information entropy (Shannon 1948) quantifies the uncertainty in the measurement of a random variable. In other words, it is a measure of the amount of information necessary to describe a random variable. One can define the information entropy H(X) associated with a discrete random variable X as
\[ H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) , \]
where X has a total of n possible outcomes and $p(x_i)$ is the probability of the i-th outcome.
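As a small illustration of the definition above, the following sketch evaluates the information entropy of a discrete distribution. The function name `shannon_entropy`, the use of the natural logarithm and the convention that empty outcomes contribute nothing are assumptions made here for illustration; the text does not fix the base of the logarithm.

```python
import numpy as np

def shannon_entropy(p):
    """Return H = -sum_i p_i log p_i (natural log) for a discrete distribution.

    The input is renormalised to unit sum; outcomes with zero probability
    are skipped, following the usual convention 0 log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nonzero = p > 0
    return float(-np.sum(p[nonzero] * np.log(p[nonzero])))

# A uniform distribution over four outcomes carries the maximum entropy log(4),
# whereas a certain outcome carries none.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]), shannon_entropy([1.0, 0.0, 0.0]))
```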
In the context of image processing, entropy is a measure of the amount of information or randomness in an image. It is often used to quantify the level of uncertainty or disorder in the pixel intensity values within the image. High entropy images are more complex, with a wide range of pixel values, whereas low entropy images are more uniform and simple. Image segmentation is the process of partitioning an image into specific regions in order to extract objects or features of interest. Image segmentation techniques assume that the object and the background in the image have different gray-level distributions. One can divide the pixels into foreground and background by applying different intensity thresholds and then calculate the sum of their entropies. The image is optimally thresholded when the sum of the two class entropies reaches its maximum. Any incorrect separation of the foreground and background pixels would reduce the total entropy of the image from its maximum value.
† https://skyserver.sdss.org/casjobs/
We first describe the idea of entropic thresholding in the context of image segmentation. Kapur et al. (1985) propose an algorithm for distinguishing objects from the background in a gray-level image. In any thresholding technique, the pixels having intensities greater than a threshold are identified as a part of the object whereas the remaining pixels are labelled as the background. One can choose the threshold intensity based on certain mathematical definitions. The method proposed by Kapur et al. (1985) is based on the idea of information entropy. If N is the total number of pixels in the image and $f_1, f_2, \ldots, f_n$ are the gray-level frequencies in n different bins, then the probability of the i-th gray-level is given by $p_i = f_i/N$. By definition, these probabilities satisfy $\sum_{i=1}^{n} p_i = 1$. If the threshold intensity corresponds to the s-th bin, then one can define the probability of the object as
\[ P_O = \sum_{i=s+1}^{n} p_i . \]
The probability of the background is then $P_B = \sum_{i=1}^{s} p_i = 1 - P_O$. The information entropy associated with the intensity distribution of the object is given by
\[ H_O = -\sum_{i=s+1}^{n} \frac{p_i}{P_O} \log \frac{p_i}{P_O} , \]
where $p_i/P_O$ is the probability of the i-th gray-level within the object. Similarly, the entropy associated with the intensity distribution of the background can be expressed as
\[ H_B = -\sum_{i=1}^{s} \frac{p_i}{P_B} \log \frac{p_i}{P_B} , \]
where $p_i/P_B$ is the probability of the i-th gray-level within the background. The idea is to choose the threshold that maximizes the total entropy $H_O + H_B$. The same idea can be extended to any number of objects superimposed on the same background. This thresholding technique is based on the entropy maximization principle and is a natural choice in many situations. We use this thresholding technique for the identification of the green valley that lies between the blue cloud and the red sequence. We first apply Otsu's method to determine an optimal threshold that divides the galaxies into two populations by minimizing the intra-class variance and maximizing the inter-class variance. The intervening region between the mean colours of the two populations is then split into three parts using the entropic thresholding. The green valley is represented by the region of the PDF bounded by the two red lines in the middle (Figure 1). We apply this technique to a number of independent stellar mass bins to determine the boundary of the green valley in the colour-stellar mass plane and show it in Figure 2.
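The single-threshold entropy maximization described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in this work: the names `class_entropy` and `kapur_threshold` are hypothetical, the natural logarithm is used, and the threshold s is taken to be the number of bins assigned to the lower (background) class, with the remaining bins forming the object.

```python
import numpy as np

def class_entropy(p):
    """Entropy of a set of bins after renormalising them to unit sum."""
    total = p.sum()
    if total <= 0:
        return 0.0  # an empty class carries no information
    q = p[p > 0] / total
    return float(-np.sum(q * np.log(q)))

def kapur_threshold(freq):
    """Return (s, H) where the first s bins form the background and the
    remaining bins the object, chosen to maximise H = H_B + H_O."""
    p = np.asarray(freq, dtype=float)
    p = p / p.sum()
    best_s, best_h = None, -np.inf
    for s in range(1, len(p)):  # keep at least one bin in each class
        h = class_entropy(p[:s]) + class_entropy(p[s:])
        if h > best_h:
            best_s, best_h = s, h
    return best_s, best_h

# For a flat histogram the total entropy log(s) + log(n - s) peaks at s = n/2.
print(kapur_threshold(np.ones(10)))
```

The exhaustive scan over s is adequate for the small number of bins used here; no optimisation beyond that is attempted.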
Figure 2. This shows the distribution of the galaxies in the colour-stellar mass plane, where each yellow dot represents a galaxy. The contours represent regions with different densities of points. The highest density region is bounded by the innermost contour. The boundary of the green valley in this work is shown using two solid red lines. We obtain these lines by applying the entropic thresholding to a number of independent stellar mass bins. The entropic thresholding is applied in the region between the mean colours of the blue cloud and the red sequence (the region between the two solid green lines) associated with each mass bin. The green valley between the two empirical lines (blue dashed lines) defined in Schawinski et al. (2014) is shown together for comparison.

Defining the blue cloud and the red sequence using Otsu's method
Otsu (1979) proposes a thresholding technique for separating the foreground and background pixels in a gray-level image, which is ideally suited for a bimodal distribution of pixel intensities. The method provides an optimal threshold that minimizes the 'intra-class variance' and maximizes the 'inter-class variance'. This method has recently been used by Pandey (2023) to separate the galaxies in the blue cloud and the red sequence. Otsu's method is a natural choice for separating the blue cloud and the red sequence from the bimodal (u − r) colour distribution. We briefly outline the method proposed in Pandey (2023).
We first calculate the probability distribution of the (u − r) colour of the SDSS galaxies in our volume limited sample using n bins. The probability associated with the i-th colour bin is $p_i = f_i/N$, where N is the total number of galaxies in our sample and $f_i$ is the number of galaxies in the i-th colour bin. If the (u − r) colour threshold corresponds to the k-th bin, then all the galaxies in the bins [1, ..., k] belong to the blue cloud whereas the galaxies in the remaining bins [k + 1, ..., n] represent the red sequence. The probabilities of the class occurrences for the two populations can be written as
\[ P_{BC} = \sum_{i=1}^{k} p_i \]
and
\[ P_{RS} = \sum_{i=k+1}^{n} p_i . \]
We iterate through all the possible (u − r) colour thresholds and estimate the class means for the blue cloud and the red sequence for each threshold. These are given by
\[ \mu_{BC} = \frac{1}{P_{BC}} \sum_{i=1}^{k} (u-r)_i \, p_i \]
and
\[ \mu_{RS} = \frac{1}{P_{RS}} \sum_{i=k+1}^{n} (u-r)_i \, p_i , \]
where $(u-r)_i$ is the (u − r) colour corresponding to the i-th bin.
Clearly, we have $P_{BC} + P_{RS} = 1$ for each and every threshold.
One can also determine the variances in the (u − r) colour of the blue cloud and the red sequence defined at each threshold as
\[ \sigma^2_{BC} = \frac{1}{P_{BC}} \sum_{i=1}^{k} \left[ (u-r)_i - \mu_{BC} \right]^2 p_i \]
and
\[ \sigma^2_{RS} = \frac{1}{P_{RS}} \sum_{i=k+1}^{n} \left[ (u-r)_i - \mu_{RS} \right]^2 p_i . \]
The threshold for the desired separation can be obtained by minimizing the intra-class variance $\sigma^2_{intra}$ and maximizing the inter-class variance $\sigma^2_{inter}$. These can be expressed as
\[ \sigma^2_{intra} = P_{BC} \, \sigma^2_{BC} + P_{RS} \, \sigma^2_{RS} \]
and
\[ \sigma^2_{inter} = P_{BC} \, P_{RS} \, (\mu_{BC} - \mu_{RS})^2 . \]
The intra-class and inter-class variances depend on the chosen threshold. However, their sum $\sigma^2_{total} = \sigma^2_{intra} + \sigma^2_{inter}$ is independent of the threshold.
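The statement that the sum of the two variances does not depend on the threshold can be verified in a few lines. The derivation below is a sketch that assumes the standard weighted definitions of Otsu (1979), namely $\sigma^2_{intra} = P_{BC}\sigma^2_{BC} + P_{RS}\sigma^2_{RS}$ and $\sigma^2_{inter} = P_{BC}P_{RS}(\mu_{BC}-\mu_{RS})^2$, with $\sigma^2_{BC}$ and $\sigma^2_{RS}$ the probability-weighted colour variances within each class. The shorthand $c_i$ for $(u-r)_i$ and the overall mean $\mu_T$ are introduced here only for compactness.

```latex
% Shorthand (assumed for this sketch): c_i = (u-r)_i,
% \mu_T = \sum_{i=1}^{n} p_i c_i = P_{BC}\mu_{BC} + P_{RS}\mu_{RS}.
\begin{align}
\sigma^2_{total} &= \sum_{i=1}^{n} p_i \,(c_i - \mu_T)^2 \\
 &= \sum_{i=1}^{k} p_i \left[(c_i - \mu_{BC}) + (\mu_{BC} - \mu_T)\right]^2
  + \sum_{i=k+1}^{n} p_i \left[(c_i - \mu_{RS}) + (\mu_{RS} - \mu_T)\right]^2 \\
 &= \underbrace{P_{BC}\,\sigma^2_{BC} + P_{RS}\,\sigma^2_{RS}}_{\sigma^2_{intra}}
  + P_{BC}(\mu_{BC}-\mu_T)^2 + P_{RS}(\mu_{RS}-\mu_T)^2 .
\end{align}
% The cross terms vanish because \sum_{i \le k} p_i (c_i - \mu_{BC}) = 0,
% and likewise for the red sequence.
\begin{align}
\mu_{BC}-\mu_T &= P_{RS}\,(\mu_{BC}-\mu_{RS}), \qquad
\mu_{RS}-\mu_T = -P_{BC}\,(\mu_{BC}-\mu_{RS}), \\
P_{BC}(\mu_{BC}-\mu_T)^2 + P_{RS}(\mu_{RS}-\mu_T)^2
 &= P_{BC}P_{RS}\,(P_{RS}+P_{BC})\,(\mu_{BC}-\mu_{RS})^2
  = P_{BC}P_{RS}\,(\mu_{BC}-\mu_{RS})^2 = \sigma^2_{inter} .
\end{align}
```

Since the first line involves only the full colour distribution, $\sigma^2_{total}$ is fixed, and any threshold that lowers $\sigma^2_{intra}$ raises $\sigma^2_{inter}$ by the same amount.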
The optimal threshold for the separation of the blue cloud and the red sequence is the one which simultaneously minimizes the intra-class variance and maximizes the inter-class variance. Since the sum of the two variances is fixed, the two conditions select the same threshold. Pandey (2023) shows that this optimal threshold is insensitive to the choice of the number of bins.
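The search for the optimal colour threshold outlined above can be sketched as follows. This is a minimal illustration rather than the implementation of Pandey (2023): the function name `otsu_threshold`, the use of bin centres as the bin colours, the default of 50 bins and the mock colours in the usage lines are all assumptions made here.

```python
import numpy as np

def otsu_threshold(colour, n_bins=50):
    """Return (threshold, mu_bc, mu_rs) maximising the inter-class variance.

    Bins [0, k) are assigned to the blue cloud and bins [k, n_bins) to the
    red sequence; the returned threshold is the colour at the edge between them.
    """
    freq, edges = np.histogram(colour, bins=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    p = freq / freq.sum()
    best = None
    for k in range(1, n_bins):
        p_bc, p_rs = p[:k].sum(), p[k:].sum()
        if p_bc == 0 or p_rs == 0:
            continue  # skip thresholds that leave one class empty
        mu_bc = np.sum(centres[:k] * p[:k]) / p_bc
        mu_rs = np.sum(centres[k:] * p[k:]) / p_rs
        var_inter = p_bc * p_rs * (mu_bc - mu_rs) ** 2
        if best is None or var_inter > best[0]:
            best = (var_inter, edges[k], mu_bc, mu_rs)
    _, threshold, mu_bc, mu_rs = best
    return threshold, mu_bc, mu_rs

# Mock bimodal colours (not SDSS data), used only to exercise the function.
rng = np.random.default_rng(0)
mock = np.concatenate([rng.normal(1.5, 0.2, 5000), rng.normal(2.6, 0.2, 5000)])
print(otsu_threshold(mock))
```

Only the inter-class variance is evaluated, since maximizing it is equivalent to minimizing the intra-class variance when their sum is fixed.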

Defining the green valley between the blue cloud and the red sequence with entropic thresholding
The optimal threshold from Otsu's method divides the galaxies into the blue cloud and the red sequence but does not help to define the green valley. The galaxies that are transitioning from the blue cloud to the red sequence reside near the boundary, and are generally treated as contamination in either sample. Our primary interest is the identification of the transitional green valley, which must lie somewhere in the intervening region between the blue cloud and the red sequence. Consequently, we need to focus only on the region between the mean colours of the two populations. The class means $\mu_{BC}$ and $\mu_{RS}$ corresponding to the optimal threshold from Otsu's method determine the target region for the entropic thresholding. The galaxies with $(u-r) < \mu_{BC}$ are undoubtedly a part of the blue cloud and those with $(u-r) > \mu_{RS}$ are definitely a part of the red sequence.
Traditional entropic thresholding determines a single threshold to segment an image into two regions (typically foreground and background). Multilevel entropic thresholding extends this approach to partition the image into multiple regions based on different threshold values. This can be particularly useful for segmenting images with more than two distinct regions. The process is similar to single-level entropic thresholding but involves iteratively determining multiple threshold values. Entropic thresholding thus also allows one to separate multiple objects superimposed on a background (subsection 3.1). The primary advantage of this method is that it can be applied to situations where the distributions are not bimodal or multimodal. This makes it suitable for the identification of the green valley in the present context. The green valley is in a transitional phase and is intermediate between the blue cloud and the red sequence. In this work, we use this technique to trace the green valley embedded between the blue cloud and the red sequence. Three different galaxy populations exist between $(u-r) = \mu_{BC}$ and $(u-r) = \mu_{RS}$. Our goal is to optimally separate the three populations in this region. We employ the entropic thresholding described in subsection 3.1 to split the intervening region into three parts and distinguish the green valley from the blue cloud and the red sequence.
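The entropies corresponding to the colour distributions of the blue cloud, the green valley and the red sequence can be respectively written as
\[ H_{BC} = -\sum_{i=1}^{s_1} \frac{p_i}{P_1} \log \frac{p_i}{P_1} , \]
\[ H_{GV} = -\sum_{i=s_1+1}^{s_2} \frac{p_i}{P_2} \log \frac{p_i}{P_2} \]
and
\[ H_{RS} = -\sum_{i=s_2+1}^{n} \frac{p_i}{P_3} \log \frac{p_i}{P_3} , \]
where $p_i$ is the probability of the i-th of the n colour bins in the intervening region, and $P_1$, $P_2$ and $P_3$ are the sums of $p_i$ over the bins assigned to the blue cloud, the green valley and the red sequence, respectively. We iterate through all the possible thresholds between $(u-r) = \mu_{BC}$ and $(u-r) = \mu_{RS}$ and choose the two thresholds $s_1$ and $s_2$ (where $s_1 < s_2$) in the interval [0, n] that maximize the total entropy $H_{total} = H_{BC} + H_{GV} + H_{RS}$. The two thresholds $s_1$ and $s_2$ optimally separate the blue cloud, the green valley and the red sequence. Maximizing the total entropy ensures that the different classes are optimally separated. The primary advantage of this method is that it is based solely on the entropy maximization principle and does not rely on any user-defined relations or parameters.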
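The two-threshold search described above can be sketched as a brute-force scan over all pairs of bin edges. The code below is an illustrative sketch under stated assumptions, not the implementation used in this work: the names `class_entropy` and `green_valley_thresholds` are hypothetical, the natural logarithm is used, every class is required to hold at least one bin, and the flat mock colours in the usage lines are not real data.

```python
import numpy as np

def class_entropy(p):
    """Entropy of a set of bins after renormalising them to unit sum."""
    total = p.sum()
    if total <= 0:
        return 0.0
    q = p[p > 0] / total
    return float(-np.sum(q * np.log(q)))

def green_valley_thresholds(colour, mu_bc, mu_rs, n_bins=20):
    """Return the two colours that split [mu_bc, mu_rs] into three parts
    with maximum total entropy H_BC + H_GV + H_RS."""
    # Galaxies outside [mu_bc, mu_rs] are ignored by the histogram.
    freq, edges = np.histogram(colour, bins=n_bins, range=(mu_bc, mu_rs))
    p = freq / freq.sum()
    best_h, best_pair = -np.inf, None
    for s1 in range(1, n_bins - 1):
        for s2 in range(s1 + 1, n_bins):
            h = (class_entropy(p[:s1]) + class_entropy(p[s1:s2])
                 + class_entropy(p[s2:]))
            if h > best_h:
                best_h, best_pair = h, (s1, s2)
    s1, s2 = best_pair
    return edges[s1], edges[s2]

# A flat mock distribution is split into three nearly equal parts.
mock = np.linspace(1.5, 2.5, 3000)
print(green_valley_thresholds(mock, 1.5, 2.5, n_bins=30))
```

The cost of the scan grows as the square of the number of bins, which is negligible for the tens of bins considered in this work.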
It may be noted that there are clear relations between galaxy colour and stellar mass or luminosity. Thresholds $s_1$ and $s_2$ obtained from the entire dataset therefore cannot define the green valley effectively.
We apply our method to a number of independent stellar mass bins. We first calculate $\mu_{BC}$ and $\mu_{RS}$ for each stellar mass bin and then apply the entropic thresholding between these (u − r) colours to obtain the values of $s_1$ and $s_2$ associated with that bin. This provides two lines separating the green valley from the rest of the galaxies in the colour-stellar mass plane. The same method can also be adopted to define the green valley in the colour-magnitude plane or the stellar mass-SFR plane.
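The bin-by-bin application can be sketched as a simple loop over stellar mass bins. In the sketch below, `boundaries_per_mass_bin`, the callable `threshold_fn` and the `min_count` guard are hypothetical names introduced for illustration; `threshold_fn` stands in for the combination of Otsu's method and entropic thresholding and is expected to return the lower and upper colour boundaries for the galaxies passed to it. The usage lines substitute a percentile-based stand-in and mock data purely to exercise the loop.

```python
import numpy as np

def boundaries_per_mass_bin(log_mass, colour, mass_edges, threshold_fn,
                            min_count=100):
    """Return rows (bin centre, lower boundary, upper boundary), one row per
    stellar mass bin that holds at least min_count galaxies."""
    log_mass = np.asarray(log_mass, dtype=float)
    colour = np.asarray(colour, dtype=float)
    rows = []
    for lo, hi in zip(mass_edges[:-1], mass_edges[1:]):
        in_bin = (log_mass >= lo) & (log_mass < hi)
        if in_bin.sum() < min_count:
            continue  # too few galaxies for a stable histogram (assumed guard)
        lower, upper = threshold_fn(colour[in_bin])
        rows.append((0.5 * (lo + hi), lower, upper))
    return np.array(rows)

# Mock data and a percentile-based stand-in for the thresholding step.
rng = np.random.default_rng(1)
mock_mass = rng.uniform(10.0, 11.5, 5000)
mock_colour = 1.0 + 0.1 * mock_mass + rng.normal(0.0, 0.2, 5000)
stand_in = lambda c: tuple(np.percentile(c, [40, 60]))
print(boundaries_per_mass_bin(mock_mass, mock_colour,
                              np.linspace(10.0, 11.5, 7), stand_in))
```

Connecting the lower and upper boundaries across the bin centres gives the two lines that bound the green valley in the colour-stellar mass plane.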

Results and discussions
We first demonstrate our method by applying it to the entire dataset to find the two colour thresholds that define the green valley in the (u − r) colour distribution of the galaxies. We calculate the PDF of the (u − r) colour using 50 bins and then apply Otsu's method to divide the entire distribution into two populations (blue and red). Otsu's method provides an optimal colour threshold that minimizes the intra-class variance and maximizes the inter-class variance of the two populations. This threshold is insensitive to the choice of the number of bins (Pandey 2023). The green valley must lie in the region between the mean colours of the two populations. The galaxies with (u − r) colour smaller than the mean colour of the blue cloud are part of the blue cloud itself. Similarly, the galaxies with (u − r) colour greater than the mean colour of the red sequence can only belong to the red sequence. The class uncertainty is a characteristic only of the intervening region between the mean colours of the two populations. We split this region into three parts using the entropic thresholding. The three consecutive parts belong to the blue cloud, the green valley and the red sequence. The green valley, sandwiched between the blue cloud and the red sequence, is marked by two vertical red lines in Figure 1. It may be noted that the mean colours of the two populations in Figure 1 appear to differ from what one would expect from fitting a double-Gaussian to the colour distribution. These mean colours are obtained from Otsu's method, which separates the two populations by minimizing the within-class variance and maximizing the between-class variance.
Observations suggest that colour is strongly correlated with stellar mass. Consequently, a single pair of (u − r) colour thresholds cannot describe the green valley in the entire dataset. We repeat our analysis in a number of independent stellar mass bins and obtain a pair of colour thresholds for each mass bin. This provides two boundary lines describing the green valley in the colour-stellar mass plane. The green valley is shown with two solid red lines in Figure 2. The two solid green lines in this figure represent the variations of the mean (u − r) colours of the blue cloud and the red sequence with stellar mass. The number of bins n used for entropic thresholding is a free parameter in our method. We repeat our analysis for three different choices of n (n = 10, n = 20 and n = 30) and find that the boundary of the green valley does not depend on the choice of n. This implies that our method provides a robust definition of the green valley. Schawinski et al. (2014) use two empirical lines to define the green valley population in the colour-mass diagram. We show these empirical lines (blue dashed lines) in the colour-mass diagram of our data in Figure 2 for comparison. Clearly, the green valley defined by the empirical lines of Schawinski et al. (2014) is much broader than the green valley identified by our method. It may also be noted that the slopes of the dividing lines in our method change with stellar mass, whereas they remain fixed for the empirical lines defined in Schawinski et al. (2014). These differences may have important consequences for any analysis of green valley galaxies. It would be intriguing to compare the star formation rate, stellar mass, morphology, and AGN activity of the quiescent galaxies identified by our method with those identified by the empirical method (Schawinski et al. 2014).
The application of our method and the empirical method to the same SDSS dataset yields 14110 and 29184 green valley galaxies, respectively. We first compare the specific star formation rates (sSFR) of the green valley galaxies identified by the two methods. The results are shown in Figure 3. We find that the sSFR distributions of the green valley galaxies identified by both methods peak between $10^{-11}$ and $10^{-10}$ yr$^{-1}$. Several studies at low redshift show that the sSFR distribution of the actively star forming galaxies is skew-lognormal with a peak at $\sim 10^{-10}$ yr$^{-1}$ and a tail extending towards lower sSFR (Wetzel, Tinker, & Conroy 2012; Eales et al. 2018). The low sSFR tail primarily represents the quiescent galaxies. The actively star forming galaxies can be separated from the quiescent galaxies by applying a fixed sSFR boundary at $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) = -10.5$ (Leja et al. 2022; Black & Evrard 2022). This cut shows a mild stellar mass dependence, which indicates that a lower sSFR cut is necessary to separate the quiescent galaxies from the star-forming galaxies at higher stellar masses (Choi et al. 2014). Nonetheless, the galaxies with $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) < -10.5$ are mostly quiescent. Figure 3 shows that the primary difference between the two green valley populations identified by our method and Schawinski et al. (2014) lies in the abundance of the green valley galaxies near the peak and the tail of the sSFR distribution. We observe that the empirical method shows a higher abundance of green valley galaxies compared to our method at $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) > -10.5$. On the other hand, there are more green valley galaxies at $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) < -10.5$ in our method compared to the empirical method. A higher abundance of green valley galaxies at $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) < -10.5$ indicates that our method is somewhat better at identifying the quiescent population. A greater abundance of green valley galaxies at $\log(\mathrm{sSFR}/\mathrm{yr}^{-1}) > -10.5$ in the empirical method perhaps indicates more contamination of the green valley by actively star-forming galaxies from the blue cloud. Our method excludes a significant number of galaxies from the green valley as identified by the empirical method. Additionally, we analyze the sSFR distribution of the excluded population, as illustrated in Figure 3. It is evident that the excluded population contains a higher proportion of actively star-forming galaxies and a lower proportion of quiescent galaxies compared to the green valley population identified by our method. The disparities between the empirical method and our approach are primarily driven by the excluded population. The 1σ Poisson error bars displayed at each data point emphasize the statistical significance of the sSFR differences between the green valley populations identified by the two methods.
Figure 3. This shows the distribution of the specific star formation rates (sSFR) among the green valley galaxies identified through our methodology. We present the sSFR distribution for the green valley galaxies that fall between the two empirical lines delineated in Schawinski et al. (2014), alongside the distribution for galaxies excluded by our method, for the sake of comparison. The 1σ Poisson error bars are shown at each data point.
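A comparison of this kind can be sketched with a normalised histogram carrying 1σ Poisson errors and a fixed cut at log(sSFR/yr⁻¹) = −10.5. The normalisation by the total count, the function names `normalised_hist_with_poisson` and `quiescent_fraction`, and the toy values in the usage lines are assumptions made for illustration; the exact quantity plotted in Figure 3 is not specified beyond the error bars being Poissonian.

```python
import numpy as np

def normalised_hist_with_poisson(values, bins):
    """Return (edges, fraction per bin, 1-sigma Poisson error per bin).

    The error on each fraction is sqrt(count) / total, i.e. the Poisson
    error on the bin count propagated through the normalisation.
    """
    counts, edges = np.histogram(values, bins=bins)
    total = counts.sum()
    return edges, counts / total, np.sqrt(counts) / total

def quiescent_fraction(log_ssfr, cut=-10.5):
    """Fraction of galaxies below the fixed log(sSFR/yr^-1) cut."""
    return float(np.mean(np.asarray(log_ssfr) < cut))

# Toy values (not survey data): 4 galaxies below the cut and 12 above it.
toy = np.array([-11.2] * 4 + [-10.1] * 12)
print(normalised_hist_with_poisson(toy, bins=[-12.0, -10.5, -9.0]))
print(quiescent_fraction(toy))
```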
The green valleys defined by our method and the empirical method have little to no overlap at lower ($\log(M/M_\odot) < 10.2$) and higher ($\log(M/M_\odot) > 11.4$) masses (Figure 2). This may introduce some differences in the stellar mass distributions of the green valley populations in the two methods. The stellar mass distributions of the green valley galaxies in the two methods are compared in Figure 4. The stellar mass distribution of the green valley galaxies peaks at $\log(M/M_\odot) \sim 10.85$ for both methods. However, the empirical method exhibits a broader peak than our method. The tails of the stellar mass distribution extend to similar masses in both methods. However, compared to our method, the empirical method yields a greater fraction of green galaxies near the peak and a lower fraction of green galaxies near the tails of the stellar mass distribution. This suggests that the empirical definition is less sensitive in identifying the quiescent galaxies at lower and higher masses. We also examine the stellar mass distribution of the green valley population that is excluded by our method (Figure 4). The differences between the stellar mass distributions of the excluded population and the population identified by our method are somewhat more pronounced, although they exhibit similar trends. The 1σ Poisson error bars shown at each data point highlight statistically significant differences in the stellar mass distributions of these populations within the intermediate mass range.
We compare the morphology of the green valley galaxies identified by the empirical method, those identified by our method, and the population excluded by our method in Figure 5. We observe that the distributions of the concentration index $r_{90}/r_{50}$ are similar across all three populations. However, we note statistically significant differences in the distribution amplitudes within specific ranges: the green valley identified by our method has a notably higher amplitude for $3 < r_{90}/r_{50} < 3.6$ and a lower amplitude for $2.2 < r_{90}/r_{50} < 2.6$ compared to the other two populations. It is established that $r_{90}/r_{50} = 2.3$ signifies a pure exponential profile (Strateva et al. 2001), while $r_{90}/r_{50} = 3.33$ describes a pure de Vaucouleurs profile (Blanton et al. 2001). Therefore, a higher concentration index is typically associated with ellipticals and bulge-dominated systems, whereas a lower value (< 2.6) is associated with disk-dominated spiral galaxies (Strateva et al. 2001). This suggests that the green valley population identified by our method consists of a higher proportion of bulge-dominated systems and a lower proportion of disk-dominated systems compared to both the empirically defined population and the excluded population.
Several observational studies suggest that AGNs may play a crucial role in quenching star formation in green valley galaxies (Nandra et al. 2007; Cimatti et al. 2013; Zhang et al. 2021). We compare the AGN fraction as a function of stellar mass for the green valley galaxies identified by our method and the empirical method in Figure 6. The results for the excluded population are also presented in the same figure. Notably, compared to the other two populations, the green valley identified by our method exhibits a statistically significantly higher fraction of AGNs in the mass range 10.5 < log(M/M⊙) < 11.3. The AGN fraction of the green valley galaxies in our method is relatively lower than in the other two populations at lower and higher masses. However, these differences are not statistically significant owing to the relatively larger error bars at low and high masses.
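The AGN fraction in each stellar mass bin and its 1σ binomial error can be sketched as follows. This is an illustrative example only: the function and argument names are hypothetical, and the error is assumed to follow the standard binomial estimate σ = sqrt(f(1 − f)/N) for a fraction f measured from N galaxies.

```python
import numpy as np

def agn_fraction_by_mass(log_mass, is_agn, edges):
    """AGN fraction and 1-sigma binomial error in stellar mass bins.

    Hypothetical helper: log_mass holds log(M/Msun) per galaxy, is_agn
    is a boolean flag per galaxy, edges are the mass bin edges.
    Empty bins are returned as NaN.
    """
    log_mass = np.asarray(log_mass, dtype=float)
    is_agn = np.asarray(is_agn, dtype=bool)
    n_tot, _ = np.histogram(log_mass, bins=edges)
    n_agn, _ = np.histogram(log_mass[is_agn], bins=edges)
    frac = np.full(n_tot.shape, np.nan)
    err = np.full(n_tot.shape, np.nan)
    ok = n_tot > 0
    frac[ok] = n_agn[ok] / n_tot[ok]
    # Binomial standard error of the fraction in each populated bin
    err[ok] = np.sqrt(frac[ok] * (1.0 - frac[ok]) / n_tot[ok])
    return frac, err
```

Because the error scales as 1/sqrt(N), sparsely populated bins at the low and high mass ends naturally carry larger error bars. Note that this simple estimate returns a zero error when the fraction in a bin is exactly 0 or 1.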
These comparisons do not establish the superiority of our method in an absolute sense. Nevertheless, our analysis shows how the physical properties of the green valley galaxies differ depending on how the galaxies are selected.

Conclusions
We propose a method for identifying the green valley galaxies and apply it to the SDSS data. The proposed method is based on two different methods of image segmentation (Kapur et al. 1985; Otsu 1979). The resulting green valley in our method occupies a smaller area in the stellar mass-colour plane compared to the empirical method. The slope of the boundary lines of the green valley in our method changes with the stellar mass, whereas it remains the same at all masses in the empirical approach.
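The two segmentation steps named above (a variance-based threshold followed by maximum-entropy thresholding) can be sketched for a single stellar mass bin as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the numbers of histogram bins and the exhaustive search over threshold pairs are all choices made for the example.

```python
import numpy as np

def otsu_threshold(values, bins=100):
    """Colour cut that maximises the between-class variance (Otsu-style)."""
    counts, edges = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)[:-1]                  # weight of the bluer class
    w1 = 1.0 - w0                           # weight of the redder class
    m0 = np.cumsum(p * centers)[:-1]        # cumulative first moment
    mt = np.sum(p * centers)                # global mean colour
    between = np.zeros_like(w0)
    ok = (w0 > 0) & (w1 > 0)
    between[ok] = (mt * w0[ok] - m0[ok]) ** 2 / (w0[ok] * w1[ok])
    return edges[np.argmax(between) + 1]

def class_entropy(p):
    """Shannon entropy of one class after renormalising its probabilities."""
    total = p.sum()
    if total <= 0:
        return 0.0
    q = p[p > 0] / total
    return float(-np.sum(q * np.log(q)))

def entropic_two_thresholds(values, lo, hi, bins=60):
    """Split [lo, hi] into three parts by maximising the summed entropy."""
    counts, edges = np.histogram(values, bins=bins, range=(lo, hi))
    p = counts / counts.sum()
    best, best_cut = -np.inf, None
    for i in range(1, bins - 1):            # exhaustive search (assumption)
        for j in range(i + 1, bins):
            h = (class_entropy(p[:i]) + class_entropy(p[i:j])
                 + class_entropy(p[j:]))
            if h > best:
                best, best_cut = h, (edges[i], edges[j])
    return best_cut

def green_valley_bounds(colour):
    """Lower and upper colour limits of the green valley for one mass bin."""
    colour = np.asarray(colour, dtype=float)
    t = otsu_threshold(colour)
    mu_blue = colour[colour < t].mean()     # mean colour of the blue cloud
    mu_red = colour[colour >= t].mean()     # mean colour of the red sequence
    return entropic_two_thresholds(colour, mu_blue, mu_red)
```

Calling `green_valley_bounds` on the colours of the galaxies in each stellar mass bin would give one pair of limits per bin, which can then be joined to trace the boundaries in the colour-stellar mass plane.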
The comparison of the sSFR distributions of the green valley populations identified by our method and the empirical method reveals notable differences in the abundance of galaxies near the peak and the tail of the distribution. Our method tends to identify more quiescent galaxies at lower sSFR values, suggesting its effectiveness in distinguishing between actively star-forming and quiescent populations. The differences are even larger for the empirically defined green valley galaxies that are excluded by our method. The analysis of the stellar mass distribution further emphasizes the differences between the green valley populations identified by the different methods. While both methods exhibit similar peak locations, the empirical method yields a broader distribution with a higher fraction of green galaxies near the peak and a lower fraction near the tails compared to our method. The differences are somewhat larger for the population excluded by our method. We also observe disparities in the morphology of the green valley populations between the two methods. Our method suggests a greater prevalence of bulge-dominated systems and a reduced occurrence of disk-dominated systems compared to both the empirically defined population and the excluded population. Our method identifies a higher fraction of AGNs in the intermediate mass range and a lower AGN fraction at lower and higher masses. The differences in AGN fractions between populations are not statistically significant at low and high masses, suggesting potential complexities in the relationship between AGN activity and green valley galaxies.
The observed differences have some implications for the evolutionary history of the quiescent galaxies in the green valley. Galaxies may quench star formation following different evolutionary routes. The physical processes and mechanisms responsible for quenching are different in higher and lower mass galaxies.
Galaxies are fed with gas from the cosmic streams (Dekel & Birnboim 2006) and the circumgalactic medium (Maller & Bullock 2004). A fraction of the gas reservoir is converted into stars over every dynamical time. The star formation decays with age as the gas reservoir is depleted over time. Besides this natural ageing, galaxies can reach a quiescent stage due to physical processes that prevent the accumulation of gas, prevent gas from forming stars, or expel gas from the galaxy. A host of physical processes and mechanisms have been proposed for quenching in the green valley. However, it is difficult to correctly assess the relative roles of these processes in quenching the galaxies. A precise identification of the green valley can provide valuable insights into several aspects of galaxy evolution. The study of the green valley galaxies would reveal the different physical processes and quenching mechanisms that are responsible for the suppression of star formation in the transitional galaxies.
The green valley population identified by our method occupies a narrower region in the colour-stellar mass plane. The extent and size of the green valley in the stellar mass-colour plane can offer insights into the quenching timescales. Galaxies spending more time in the green valley may undergo a gradual quenching process, while those traversing it swiftly may experience a rapid cessation of star formation. A smaller area of the green valley may suggest a shorter quenching duration, reflecting the effectiveness of processes transitioning galaxies from star-forming to quiescent states. This implies a faster transition through this phase, possibly indicating more efficient or intense mechanisms driving the halt of star formation, such as gas depletion or feedback mechanisms. Galaxies undergoing strong interactions or mergers with other galaxies might experience rapid cessation of star formation due to gas influx triggering intense starbursts followed by swift depletion. Likewise, galaxies hosting active galactic nuclei (AGN) might encounter feedback processes that promptly suppress star formation. Several earlier studies (Ferrarese & Merritt 2000; Häring & Rix 2004; Kauffmann & Heckman 2009; Banerjee, Pandey, & Nandi 2023) have identified the presence of a bulge and gas availability as crucial requirements for AGN activity. Our analysis reveals that the green valley population identified by our method tends to host a higher proportion of bulge-dominated systems and a greater fraction of AGN. This suggests that both merger events and AGN activity may play significant roles in suppressing star formation in green valley galaxies, thereby reducing the quenching timescale.
A caveat in our analysis is that we identify the green valley using only optical colour. A number of studies suggest that the (NUV − r) or UV-optical colours are superior for the identification of the green valley (Wyder et al. 2007; Salim 2014). The primary advantage of the UV over the optical is that it can detect even a low level of ongoing star formation. We plan to use different colours to identify the green valley with our method and carry out an in-depth study of the green valley galaxies in a future work.
Finally, we note that the green valley does not have a universally accepted definition. Different studies define the green valley using empirical lines in the colour-mass, colour-magnitude or mass-SFR plane. The exact criteria for identifying the green valley remain subjective. Keeping this subjectivity in mind, we propose a new definition of the green valley using entropic thresholding. Entropic thresholding is based on the principle of maximum entropy, which is more general in nature. The boundary of the green valley in our method is decided exclusively by the data. We conclude that our method provides a natural and robust definition of the green valley.

Figure 1. This figure shows the green valley in the (u − r) colour distribution of the entire volume limited sample. We first apply Otsu's method to determine an optimal threshold that divides the galaxies into two populations by minimizing the intra-class variance and maximizing the inter-class variance. The intervening region between the mean colours of the two populations is then split into three parts using entropic thresholding. The green valley is represented by the region of the PDF bounded by the two red lines in the middle. We apply this technique to a number of independent stellar mass bins to determine the boundary of the green valley in the colour-stellar mass plane and show it in Figure 2.

Figure 4. Same as Figure 3 but for the stellar mass.

Figure 5. Same as Figure 3 but for the concentration index.

Figure 6. This figure compares the AGN fraction as a function of stellar mass among the green galaxies identified by our method, those identified by the empirical method (Schawinski et al. 2014), and those excluded by our method. The 1σ binomial error bars are shown at each data point.