Classification of Radio Galaxies with trainable COSFIRE filters

Radio galaxies exhibit a rich diversity of characteristics and emit radio emissions through a variety of radiation mechanisms, making their classification into distinct types based on morphology a complex challenge. To address this challenge effectively, we introduce an innovative approach for radio galaxy classification using COSFIRE filters. These filters possess the ability to adapt to both the shape and orientation of prototype patterns within images. The COSFIRE approach is explainable, learning-free, rotation-tolerant, efficient, and does not require a huge training set. To assess the efficacy of our method, we conducted experiments on a benchmark radio galaxy data set comprising of 1180 training samples and 404 test samples. Notably, our approach achieved an average accuracy rate of 93.36%. This achievement outperforms contemporary deep learning models, and it is the best result ever achieved on this data set. Additionally, COSFIRE filters offer better computational performance, ∼ 20 × fewer operations than the DenseNet-based competing method (when comparing at the same accuracy). Our findings underscore the effectiveness of the COSFIRE filter-based approach in addressing the complexities associated with radio galaxy classification. This research contributes to advancing the field by offering a robust solution that transcends the orientation challenges intrinsic to radio galaxy observations. Our method is versatile in that it is applicable to various image classification approaches.

modern radio surveys.These data sets can be terabytes to exabytes in size (Booth & Jonas 2012;Swart et al. 2022).
Radio galaxies can be divided into four primary categories based on their intrinsic morphological differences.The most important two are Fanaroff & Riley I (FRI) and Fanaroff & Riley II (FRII) (Fanaroff & Riley 1974).FRI galaxies have bright cores relative to their lobes (have decreasing luminosity from the core).FRII galaxies, on the other hand, have edge-brightened lobes with a core at the centre.The third and most common category of radio galaxies is the Compact class, which refers to point-like radio sources (Baldi et al. 2015(Baldi et al. , 2018)).The fourth category is Bent, which is composed of radio sources with jets that are bent at an angle, either in a narrow-angled tail (NAT) or a wide-angled tail (WAT) configuration (Rudnick & Owen 1976).While the Compact and FRI classes are quite distinctive, the FRII and Bent classes are much more challenging.In Fig. 1 we illustrate a few examples of each of the four classes as well as superimposed variants created by utilizing all images from the data set described in Table 1.In addition to the difficulties related to their shape characteristics, the classification of FRI, FRII, and Bent galaxies is further exacerbated by the fact that the galaxies in the data set have differing orientations.Upon visual inspection, it becomes apparent that each category exhibits a discernibly distinct brightness distribution.Nonetheless, it is worth noting that the FRII and Bent classes share substantial similarities, making it difficult to discern between the two classes hence a challenging task to differentiate them.
In this paper, we propose a lightweight paradigm that involves trainable COSFIRE (Combination of Shifted Filter Responses) filters (Azzopardi & Petkov 2012a;Azzopardi et al. 2016), which have been found effective in various computer vision applications.This approach is efficient, learning-free, rotation-tolerant, explainable, and does not require a huge training set.
The implementation of a COSFIRE-based classification pipeline is relatively easy and straightforward from a conceptual standpoint as described in Section 4. It involves the configuration of COSFIRE filters (Azzopardi & Petkov 2012a) whose selectivity of each filter is automatically determined from the shape properties of a single training example.The objective is to set up multiple filters, whose combined responses generate a feature signature for the type of galaxy present in an image.This approach is analogous to how visual cells in the mammalian brain encode visual information, a concept known as population coding (Pasupathy & Connor 1999, 2002).COSFIRE filters have been applied in various computer vision tasks: gender recognition (Azzopardi et al. 2016), contour detection (Azzopardi & Petkov 2012b) and handwritten digit classification (Azzopardi & Petkov 2013).In this work, COSFIRE filters are configured to extract the hyperlocal geometric arrangements that uniquely describe the patterns of radio sources (in terms of blobs) in a given image.
The rest of the paper is structured as follows.Section 2 presents the current state-of-the-art approaches.Section 3 describes the data set used in this study.Section 4 describes the proposed COSFIRE-based paradigm.Section 5 presents the evaluation criteria used to assess the performance of our approach.Section 6 describes the experiments and the results obtained.Section 7 provides a discussion of the results obtained in relation to the relevant work.Finally, we draw conclusions in Section 8.

RELATED WORKS
Different automated computer intelligence paradigms have been exploited to automate the rather daunting manual classification approaches that radio astronomers use to identify different types of radio sources (Ndung'u et al. 2023).Convolutional neural networks (CNNs) have dominated the field in recent years.In particular, Aniyan & Thorat (2017a), used the AlexNet architecture (Krizhevsky et al. 2017), calling the trained model Toothless2 , to achieve accuracies of 91%, 75% and 95% for the FRI, FRII, and Bent-tailed morphologies, respectively.Subsequently, notable incremental breakthroughs have been made in the applications of deep learning to the field of radio astronomy, ranging from shallow CNN architectures (Lukic et al. 2019) to deep and complex architectures such as DenseNet (Huang et al. 2017;Samudre et al. 2022).Furthermore, other noteworthy advancements within the field are: model-centric strategies such as group-equivariant CNNs (G-CNNs) (Scaife & Porter 2021) to support equivariance translations on various isometries of radio galaxies and multidomain multibranch CNNs (Tang et al. 2022) to allow models to learn jointly from various survey inputs; data-centric approaches such as data augmentation (Maslej-Krešňáková et al. 2021;Kummer et al. 2022;Ma et al. 2019;Slĳepcevic et al. 2022); transfer learning (Tang et al. 2019;Lukic et al. 2019) and -shot learning3 (Samudre et al. 2022) to overcome the limited availability of annotated data sets in radio astronomy.These models have been shown to perform competitively, providing promising alternatives to prior models such as the one by Aniyan & Thorat (2017a), which showed signs of overfitting.
Conventional machine learning has also been explored for the morphological classification of Fanaroff-Riley (FR) radio galaxies (Sadeghi et al. 2021;Becker et al. 2021;Ntwaetsile & Geach 2021;Darya et al. 2023).This paradigm involves extracting handcrafted features such as intensity, shape, lobe size, number of lobes, and texture from input radio image data, which are then utilized in machine learning algorithms (Brand et al. 2023).Previous studies have leveraged feature descriptors to capture the texture of radio images, such as Haralick features, which consist of a group of 13 non-parametric measures derived from the Grey Level Co-occurrence Matrix (Ntwaetsile & Geach 2021).Gradient boosting methods (Friedman 2002) such as XGBoost (Chen & Guestrin 2016), LightGBM (Ke et al. 2017) and CatBoost (Dorogush et al. 2018) have been explored (Darya et al. 2023).Research has shown that feature engineering can provide machine learning algorithms with highly significant features, leading to promising results (Sadeghi et al. 2021). Moreover, Darya et al. (2023) demonstrated that conventional machine learning (LightGBM, XG-Boost, and CatBoost) perform competitively to CNN-based models with deep learning on relatively small data sets (about 10,000 images and below).
CNNs are regarded as the state-of-the-art in various image classification applications (Aniyan & Thorat 2017a;Lukic et al. 2019;Scaife & Porter 2021;Tang et al. 2022).However, they require large amounts of training data and are susceptible to overfitting when trained with small data sets, which is the case in radio astronomy.Moreover, the high computational demands of deep architectures for training and applying CNNs often require GPUs, which can be costly and limit their applicability in resource-limited settings.Moreover, CNN-based models lack insufficient intrinsic robustness to rotations.To address rotational variations in radio sources, multiple approaches have been taken.One approach is to utilize group-equivariant CNNs, where the network is designed to capture the diverse orientation information of a given input galaxy in encoded form (Scaife & Porter 2021).Another method involves augmenting the training data by applying rotations to the training samples, enabling the CNNs to learn different orientations of the classes.Furthermore, a pre-processing step can be employed to standardize the rotation of all radio sources.This may be achieved by using principal component analysis (PCA) to align the galaxies' principal components with the coordinate system's axes, effectively normalizing their orientations (Brand et al. 2023).
As evident from this literature review, numerous challenges remain, including the need for efficient (computationally inexpensive) and rotationally invariant methods.In this work we address these limitations with the proposed COSFIRE filter approach.

DATA
The data set of radio galaxies used in this paper was compiled and processed by Samudre et al. (2022).It was constructed by selecting well-resolved radio galaxies from multiple catalogs: Proctor catalog (Proctor 2011) (derived from the FIRST4 survey, 2003) for the Bent radio galaxies; FR0CAT (Baldi et al. 2018) and CoNFIG5 catalogs (Gendre & Wall 2008;Gendre et al. 2010) for Compact radio galaxies; FRICAT (Capetti et al. 2017a) and CoNFIG catalogs for FRI radio galaxies and finally FRIICAT catalog (Capetti et al. 2017b) and CoNFIG catalogs for FRII radio galaxies.This data set is used in this paper with the objective of conducting comparative analyses of similar work done by Samudre et al. (2022).
The initial data set is composed of the following classes: Compact (406 samples), Bent (508 samples), FRI (389 samples), and FRII (679 samples).These samples are further divided into training, validation, and testing data sets as shown in Table 1.According to Samudre et al. (2022), the original data set's underrepresented classes were balanced by adding randomly duplicated samples to the training and validation data sets.Table 2 depicts the distribution of the balanced data set.
The images were pre-processed by utilizing sigma-clipping with a threshold of 3 (Aniyan & Thorat 2017b).This technique involves eliminating or discarding pixels that have background noise levels above or below 3 standard deviations from the mean (Aniyan & Thorat 2017b).
Although there exist more recent catalogs as described by Ndung'u et al. (2023), the latest catalogs, such as LoTSS (DR1 & DR2)6 (Shimwell et al. 2019;Shimwell et al. 2022), are not annotated into Compact, FRI, FRII and Bent classes.Instead, they are labelled by automated tools such as the Python Blob Detector and Source-Finder (PyBDSF) (Mohan & Rafferty 2015) that categorize astronomical sources into three types: 'S' for single isolated sources modeled with one Gaussian, 'C' for sources that are within a group but can be individually modeled with a single Gaussian, and 'M' for extended sources that need multiple Gaussians for accurate modeling.This system aids in the efficient identification and analysis of space emissions.

METHODS
This section gives a comprehensive overview of the methodology that we propose for radio galaxy image classification with the COSFIRE filter approach.We explain the process of radio source blob detection, the configuration of COSFIRE filters with rotation invariance, the formation of feature descriptors, and the utilization of these descriptors for classification.An overview of this pipeline is illustrated in Fig. 2.

Blob detection
Blob detection is a technique used for identifying points or regions in an image that exhibit a sudden change in intensity (areas that are either brighter or darker than the surrounding areas), known as a "blob".This approach enables the identification of regions that may correspond to objects or structures of interest within the image in our case radio source(s).In computer vision, one of the commonly used blob detectors is based on the Laplacian of Gaussian (LoG) (Wang et al. 2017(Wang et al. , 2020)).The LoG is the second-order derivative of the Gaussian function, denoted by  (, ): where  is the standard derivation.The Laplacian of Gaussian is typically estimated by a Difference-of-Gaussian (DoG) function, which is separable, and thus convolving it with a 2D image is much more computationally efficient.Given two Gaussian functions,  1 and  2 with their respective standard deviations  1 and  2 , the Gaussians are defined as: The DoG function is derived by subtracting two Gaussian functions with different standard deviations:

𝐷𝑜𝐺 (𝑥, 𝑦
For a point (, ) and an image  with intensity distribution  ( ′ ,  ′ ), we calculate the response   (, ) of a DoG filter with a kernel function DoG  ( −  ′ ,  −  ′ ) by convolution: where the rectification linear unit (ReLU) is an activation function that sets to zero all values below  1 and ★ represents convolution.
When  1 <  2 we refer to the resulting  function as a centeron  or  + for brevity, with the central region exhibiting a positive response and the surround exhibiting a negative response.Conversely, when  2 <  1 the configuration results in a center-off  function, which we denote by  − .In this work, we adopt the approach used by Azzopardi et al. (2015) and always set the smaller standard deviation to be half of the larger standard deviation.
The centre-on DoG filter highlights bright blobs on a dark background and is more sensitive to intensity increases at the centre of the blob.This filter responds well to objects and is less sensitive to edges or sharp changes in intensity.On the other hand, the centre-off DoG filter highlights dark blobs on a bright background and is more sensitive to intensity decreases at the centre of the blob.This filter responds well to corners and edges and is less sensitive to objects or regions of uniform intensity.Therefore, both centre-on and centreoff DoG filters are used to detect blobs or regions of interest in an image, see Fig. 3 for an illustration on Compact, FRI, FRII and Bent radio sources.The DoG filter is a type of band-pass filter that can eliminate high-frequency components that represent noise as well as some low-frequency components that represent homogenous areas.

COSFIRE filter configuration
A COSFIRE filter is automatically configured by examining the shape properties of a given prototype pattern of interest in an image, which ultimately determines its selectivity.The process of configuration can be summarized in three main steps: convolve-ReLU-keypoint detection.The first two steps involve the convolution of center-on and center-off DoG filters followed by ReLU as described above.Finally, keypoint detection requires the determination of local maximum thresholded DoG responses along a set of concentric circles around a point of interest.For this application, we use the center of the image as the point of interest, but in principle, any location can be used for this purpose.The point of interest is the location that best characterizes a pattern of interest.The number and radii of the concentric circles along with the threshold  1 used by the ReLU function are hyperparameters of the COSFIRE approach.
A keypoint denoted by , which is identified as a local maximum of a DoG filter along a concentric circle, is characterized by a fourelement tuple: (  ,   ,   ,   ).Here,   and   indicate the standard deviation of the outer Gaussian function of the DoG function and its polarity that achieved the highest response in the polar coordinates with radius   at an angle of   radians with respect to the given point of interest.We denote by   a COSFIRE filter, which is represented as a list of such tuples: where  refers to the number of DoG responses considered in   , which plays a crucial role in the selectivity and generalization of the COSFIRE filter.Typically, selectivity increases and generalization decreases with increasing value of  (i.e.number of keypoints).Fig. 4 illustrates an example of the automatic configuration of a

COSFIRE filter response
The response of a COSFIRE filter   for a given (, ) location is computed by combining the responses of the DoG functions whose scale   , polarity   and position (  ,   ) with respect to (, ) are indicated in the set   .The process of computing the response can be summarized in five main steps: convolve-ReLU-blur-shiftcombine.The convolve step refers to the convolution of center-on and center-off DoG filters followed by the ReLU operation that sets to zero all values below the given threshold  1 .These are the same two steps that were required for the configuration described above.
Then, in order to allow for some tolerance in the preferred positions of the involved DoG responses, we blur each DoG response with a Gaussian function   ′ ( ′ ,  ′ ), whose standard deviation  ′ is a linear function of the distance   :  ′ = ⌊ ′ 0 +   ⌋, with  ′ 0 and  being hyperparameters.Moreover, we shift the blurred responses in the direction opposite of the polar coordinates such that all DoG responses of interest meet at the same (, ) location.Finally, the combine step is implemented as suggested by Azzopardi & Petkov (2012a), where a COSFIRE filter response, which we denote by    , is obtained by the geometric mean function that combines all blurred and shifted thresholded DoG responses: where is the combined blurred and shifted DoG filter response map of tuple .The shifting operation displaces each blurred DoG response of interest to the support centre of the COSFIRE filter.The shift vector is denoted by (Δ  , Δ  ), where Δ  = −  cos   and

Tolerance to rotations
Tolerance to rotations can be attained by configuring multiple COSFIRE filters using rotated versions of a single prototype pattern.An effective approach to achieve this involves defining new filters by modifying the parameters of an existing COSFIRE filter.
For instance, consider a COSFIRE filter denoted as    , designed to be selective for the same underlying pattern that was employed to configure the original COSFIRE filter   , but rotated by  radians.This new filter is defined as follows: The rotation-tolerant response of a COSFIRE filter r  is then achieved by taking the maximum response across all COSFIRE filters selective for the same pattern at 12 equally-spaced orientations: r  (, ) = max     ( ,) ∀  ∈ {0, /6, . . ., 11/6} (10)

COSFIRE descriptor
For a given image, a COSFIRE descriptor, which we denote by  and define below, is generated by applying all COSFIRE filters in rotation-tolerant mode and extracting the maximum value from each filter, regardless of its location.Consequently, for a set of  filters, the resulting description of a given image is represented by a dimensional vector.
This concept is inspired by neurophysiology research, which suggests that the shape representation of a stimulus is based on the combined activity of a group of shape-selective neurons in area V47 (Wielaard et al. 2001;Azzopardi & Petkov 2012b;Weiner & Ghose 2015).

COSFIRE descriptor pre-processing
The only pre-processing step done to the COSFIRE descriptors is L2 normalization8 .Data normalization is a common preprocessing step in tabular machine learning tasks that transforms the input data into a more uniform scale across the features.In particular, the Support Vector Machines (SVM) objective function includes the distance between data points, making it sensitive to the scale of the input features.Therefore, non-normalized features with larger scales can dominate distance calculations and may hence affect the performance (Hsu et al. 2003).

Classification model
The SVM model, introduced by Cortes & Vapnik (1995), has been selected for the classification of COSFIRE-based image descriptors due to its capacity to handle high-dimensional data, manage outliers, and achieve robust generalization.The training of the SVM model is conducted using the COSFIRE descriptors extracted from a training set that encompasses four distinct classes of radio galaxies.Given that the training set is characterized by an imbalance in class distribution (as indicated in Table 1), a bagging (bootstrap aggregating) approach is employed to train an ensemble of classification models.This technique improves model generalization and suppresses any biases towards the majority classes (Breiman 1996).Figure 2 illustrates the entire process of training and inference.In the training phase, we employ a resampling technique with replacement to randomly select a balanced subset of 290 images from each class.The rationale behind this choice is as follows: we aim to create a balanced training data set for learning an SVM model.To achieve this balance, we calculate two-thirds of the size of the majority class (in this case, 434), which results in 290 images.By selecting 290 images from each class, we also ensure that the resulting balanced subset is roughly the same size as the original imbalanced training set.With the subsets, we then train a set of ten different SVM models.In the inference phase, we use the ten SVM models to perform classification by aggregating their predictions on the test data set.In practice, the subset size and the number of classifiers are two hyperparameters, which can be fine-tuned by cross validation.However, we choose to work with the mentioned default values as the fine-tuning of such parameters is beyond the scope of the proposed COSFIRE paradigm.

Accuracy
As in prior research on morphological classification (Lukic et al. 2019;Samudre et al. 2022), we evaluate the performance of the proposed COSFIRE approach following the widely adopted accuracy metric.To assess the performance of our model in the experiments, we leverage the validation set for determining the hyperparameter values.

Floating Point Operations (FLOPs)
FLOPs is a measure of computer performance that indicates the number of floating-point operations that a processor executes to complete a task.This measurement is commonly used in scientific programs/applications that heavily depend on floating-point (FP) calculations such as CNNs.We will evaluate the computational complexity of our approach by measuring the number of floating point operations (FLOPs) required, and then compare it to the computational complexity of DenseNet161 (Samudre et al. 2022).
FLOPs estimation for both algorithms is performed during the inference phase rather than the training phase of the classification process.This phase entails only the forward pass of input data through the model to produce predictions, focusing on the computational cost of executing the already trained model.

EXPERIMENTS AND RESULTS
In this section, we present and analyze the experimental results obtained in our research on radio galaxy classification.We also compare the performance and computational complexities of our method with other existing approaches in radio galaxy classification.

Performance
We conducted a series of experiments to evaluate the performance of the proposed trainable COSFIRE filter approach.As detailed in Section 4, our pipeline involves the automatic configuration of COSFIRE filters and their application to training, validation and test data sets for each distinct class (Bent, Compact, FRI, and FRII).Our primary objective was to attain the highest classification accuracy for distinguishing among the four radio source classes.Through this rigorous evaluation process, we conducted an in-depth exploration of the trainable COSFIRE filters' performance and their potential in the realm of radio source classification.
We used the validation data set to determine the hyperparameters of the COSFIRE filter configuration (,  9 ,  1 10 ), and application ( ′ 0 , and ).In our experimentation, we explored three values per hyperparameter:  ∈ {5, 6, 7},  ∈ {{0, 5, 10, 15, 20}, {0, 5, 10, 15, 20, 25}, {0, 5, 10, 15, 20, 25, 30}},  1 ∈ {0.05, 0.1, 0.15},  ′ 0 ∈ {0.5, 0.75, 1}, and  ∈ {0.1, 0.15, 0.2}.This resulted in a total of 243 unique parameter sets.For every unique set, we configured up to 100 COSFIRE filters for each class, leading to a total of 400 COSFIRE filters.The filters were configured by selecting random images from the training set.To compensate for the randomness of the image selection, we executed three experiments with the same set of hyperparameters and then took the average results across the three experiments.These experiments resulted in a 243 × 400 matrix of accuracy rates.We then identified the maximum accuracy rate for each row (i.e. for each set of hyperparameters).Importantly, multiple sets of hyperparameters yielded very close accuracies.Therefore, to account for all the hyperparameters generating similar results, we performed a right-tailed student -test statistic between the row with the global maximum accuracy rate and all the other rows.It turned out that 26 sets of hyperparameters yielded results that are not statistically different from the global maximum.The average maximum accuracy of the experiments with these 26 sets of hyperparameters was achieved with 93 COSFIRE filters per class on the validation set.
Subsequently, we employ the COSFIRE filters in conjunction with the corresponding classifiers, which are configured using 26 distinct sets of hyperparameters determined previously on the validation set, to the test data.The outcomes of this process are graphically represented in Fig. 5.This figure includes two principal plots illustrating the variation in accuracy rates as a function of the number of COSFIRE filters used.One plot delineates the performance when the filters operate in a rotation-invariant manner, while the other depicts the scenario without rotation invariance.Each plot is the average across the results obtained by the experiments using the 26 sets of hyperparameters, with the grey shading indicating the standard deviations.With 93 filters per class (as determined from the validation data) and rotation invariance, the average accuracy on the test set is 93.36% ± 0.57 (with the minimum being 92.24% and the maximum being 94.31%), whereas, without rotation invariance, the same filters yielded an average accuracy of 82.25% ± 2.06.This performance trend is visually depicted in Fig. 5, highlighting that as the number of filters increases, the disparity in performance between including and excluding rotation becomes more pronounced.Additionally, the figure shows that the model performance is more stable with rotation invariance than without it.The latter has high variability even with more COSFIRE filters. 9These are the radii used to configure COSFIRE filters. 10This is the threshold used by the ReLU function.1) as a function of the number of COSFIRE filters used.The plots demonstrate how the average accuracy rate over the experiments with the determined 26 sets of hyperparameters varies based on the number of COSFIRE filters used, considering both cases of rotation invariance and without rotation invariance.Specifically, employing 48 filters for each class approximates the 92% accuracy level of DenseNet161, as reported by Samudre et al. (2022).Using 93 filters per class, which led to the maximum average accuracy on the validation set, we achieve an accuracy of 93.36% on the test set.

Comparison with previous works
The most direct comparison to our study is the research that used DenseNet-161 (Huang et al. 2017) and Siamese networks (Koch et al. 2015) methodologies conducted by Samudre et al. (2022) on the same data.In their work, they conducted their experiments on two versions of the data: the original and the balanced data set as presented in Table 1 and Table 2, respectively.According to their reported results, Siamese networks and DenseNet-161 achieved the highest accuracies of 71.1% ± 0.40 and 91.2% ± 0.60, respectively, on the original data set.On the other hand, Siamese networks achieved 73.9% ± 0.50 and DenseNet-161 achieved 92.1% ± 0.40 accuracy on the balanced data set.Notably, our COSFIRE approach achieves an accuracy of 93.36% ± 0.57 without data augmentation or any further image preprocessing.This is more than 16% reduction in the classification error rate compared to the DenseNet-161 model (Samudre et al. 2022).The significant reduction in the error rate is mainly attributed to the rotation invariance properties of the COSFIRE approach.We did not perform runs on the balanced data set (Table 2) since this would not generate new insights, as the data set was augmented through image duplication of the minority classes.

Convolutional Neural Networks
Calculating the FLOPs in a convolutional neural network involves determining the number of arithmetic operations performed during the forward pass.This includes the multiplications and additions between the input data elements and the trainable convolution kernels.The FLOPs computation takes into account factors such as the input size, kernel size, number of filters, and the presence of padding, dilation, and stride.A CNN convolutional operation with padding set to 0, dilation set to 1, and stride set to 1, for simplicity (Freire et al. 2022), can be summarised as follows: In equation 12, the output of a specific convolutional layer is denoted by   , , which represents a feature map.Here, the superscript  indicates the filter index, while the subscripts , , and  represent the row, column, and channel indices of the output feature map, respectively.The trainable convolution kernel is denoted by   , , , and it performs element-wise multiplications with the corresponding input elements of image  of dimensions   ×   ×  11 .The bias term   is added to the sum, and the resulting value undergoes the activation function (•) to generate the output feature map element   , .This equation embodies the convolution operation in a CNN, where each output feature map element is computed by convolving the corresponding region of the input feature maps with trainable filters and introducing non-linearity through the activation function.
The output size of the CNN at each layer is important when calculating the FLOPs of the CNN architecture used.It is given by, where  ℎ is the image input size,  is padding,  is the kernel size and  is the stride.The number of FLOPs can then be estimated as: where   is the number of filters.In each sliding window, there are     multiplications and the sliding window process needs to be repeated as many times as the OutputSize .This entire procedure is then repeated for all   filters.The estimated number of FLOPs of the DenseNet-161 model is 15.6 GFLOPs based on the workflow12 13 of (Sovrasov 2023).Importantly, the 15.6 GFLOPs calculations include the operations involving the dense and activation layers in the DenseNet architecture.

COSFIRE filter approach
The FLOPs associated with the single convolution required by the COSFIRE filter approach are computationally inexpensive due to the exploitation of the separability property of Gaussian filters.For the separability of the Gaussian filter, equation 15, the 2D Gaussian can be written as the multiplication of two functions: one that depends on  and the other that depends on .In this case, the two functions are the same and they are both 1D Gaussian14 .Because the DoG function combines two Gaussian functions linearly, the separability property is also preserved in the DoG function.

𝐺 (𝑥, 𝑦
Table 3 provides a breakdown of the FLOPs that are required by the COSFIRE methodology at the inference stage.In that Table, we use the following set of hyperparameters as an example:  = 5,  ∈ {0, 5, 10, 15, 20},  1 = 0.1,  0 = 0.5, and  = 0.1.The FLOPs computations are mainly on the processes of convolve-ReLU-blur-shift-combine.For a given image of 150 × 150 pixels and 400 COSFIRE filters, the total estimated number of FLOPs is ∼1.5 GFLOPs.Also, in order to optimize the computational efficiency in a system using 4800 (i.e. 100 filters for each of the four classes applied in 12 orientations) COSFIRE filters that collectively involve 64,284 tuples, we employ the following strategy: • Elimination of Redundancy: Recognizing that numerous tuples are repeated across these filters, we first isolate every distinct tuple from the entire set.In fact, the number of tuples increases sublinearly with the increase in the number of COSFIRE filters used, Fig. 7.
• Computation and Storage: We calculate the response map for each unique tuple only once.This response map is essentially a modified version of the center-on or center-off DoG response map, subjected to blurring and shifting.We then store each computed response map in a hash table for quick retrieval, avoiding redundant computations.
• Configuration Sharing: Upon further analysis, we find that the configurations of many COSFIRE filters share identical pairs of tuples.
• Pairwise Response Maps: We compile a list of these common tuple pairs and pre-calculate the combined response map for each pair by performing a multiplication of the individual response maps.
• HashTable for Pairs: The resulting combined response maps of tuple pairs are also stored in a hash table.This ensures that the response for any pair that is used by more than one filter can be quickly fetched without recalculating it.
By implementing this approach, we make the process more efficient, as we avoid unnecessary recalculations for both individual tuples and their pairs, which significantly reduce the computational load and speeds up the overall filtering process.Using this efficient technique, the total number of FLOPs required is reduced to ∼1.1 GFLOPs; i.e. 26% reduction in the FLOPs (comparison of FLOPs basic and FLOPs column values in Table 3).
Similarly, conducting three trials with 26 sets of optimal hyperparameters from the validation data set, we noted a significant decrease in FLOPs, as depicted in Figure 6, corresponding to an increasing number of COSFIRE filters.In fact, with reference to Fig. 6 and Fig. 5 we demonstrate that we can achieve very high performance with fewer filters using the COSFIRE approach.Using just 48 filters for each class yields an accuracy rate of 92.46% ± 0.76, with a computational cost of 0.8 GFLOPs-approximately 20 times less than the DenseNet161 model that achieves a similar result.

DISCUSSION
The proposed trainable COSFIRE filter approach, which is designed to capture radio galaxies at different orientations, achieved better results than the CNN-based approach for radio galaxy classification on Table 3. COSFIRE filter FLOPs estimation at inference stage.The FLOPs calculation in the columns FLOPs basic and FLOPs (computation with pre-computed response maps of pairs of tuples) are based on the number of tuples obtained when using a single image for inference.As an example here is a breakdown of FLOPs at each stage for the specific set of hyperparameters:  = 5,  ∈ {0, 5, 10, 15, 20},  1 = 0.1,  ′ 0 = 0.5, and  = 0.1.In the FLOPs column, the symbol -represents the equivalent value from the FLOPs basic column.
Step Formula FLOPs basic FLOPs Center-On DoG sigma: Convolve the  × (150 × 150) image with a DoG kernel whose size is  = 6 + 1 (here  = 5).The expression  combines the product and summation operations, and the result is multiplied by 2 to account for the vertical and horizontal separable filters of the DoG functions.
3,330,000 - the same data set.This suggests that the trainable COSFIRE filter approach can capture more relevant salient features of the radio galaxy images than the CNN-based approach.Rotation invariance plays a vital role in robust galaxy classification15 , facilitating accurate and consistent outcomes.Also, the limited availability of annotated training data is attributed to the high cost of the labelling and the limited number of experienced astronomers dedicated to this task (considering the size of the astronomical data generated by modern telescopes).Therefore, for effective model generalization, it is essential that a classification model remains invariant to rotations, ensuring consistent predictions irrespective of the input's orientation.
Our results are consistent with previous studies exploring different approaches to incorporating rotation invariants of radio galaxies in the training set as a means of improving classification model accuracy and generalization.In other similar studies, in radio astronomy, Brand et al. (2023) showed that classification performance is im-proved when the orientations of training galaxies are normalized as opposed to when no such attempt was made to address rotational variations.Scaife & Porter (2021) used group-equivariant convolutional neural networks that encode multiple orientation information of a given input galaxy.Moreover, the widely adopted data augmentation pre-processing step in machine model development is another approach that also addresses this challenge of radio galaxy equivariance (Becker et al. 2021).While data augmentation is a useful tool, especially in small data sets, it comes with its own problems, namely the potential introduction of unrealistic or irrelevant variations into the data, which can lead to model overfitting and reduced interpretability.In addition, it requires domain knowledge to develop a good augmentation strategy, which considers all possible equivariant transformations given a galaxy image.The rotation invariance in the COSFIRE approach is intrinsic to the method and does not require data augmentation.This makes the approach more robust, versatile, and completely data-driven, with no domain expertise required, and hence highly adaptable to various computer vision applications.
CNN-based networks are computationally expensive.For instance, the forward pass of DenseNet161 consumes ∼15.6 GFLOPs.On the other hand, the COSFIRE filter approach demonstrates efficiency by utilizing a substantially lower number of FLOPs (Fig. 6), resulting in significantly lower demand for computational resources.The workflow of the COSFIRE approach is also more flexible than conventional CNN architectures.The number of filters used is a hyperparameter of this paradigm, which is in contrast to the fixed number of filters used in its counterpart.Our approach, based on the COSFIRE filter, is a novel algorithm for the classification of radio galaxies and comes with a low computational cost.

CONCLUSION
In this work, we introduce a novel descriptor based on trainable COSFIRE filters for radio galaxy classification.We combine this descriptor with an SVM classifier using a linear kernel and achieve superior results compared to a DenseNet161 transfer learning-based pre-trained network and a few-shot learning-based Siamese network on the same training data set.Our technique is computationally inexpensive (∼20× lower cost with the same accuracy as the DenseNet161 model), rotation invariant, free from data augmentation and does not rely on domain expert knowledge.These features make our technique not only effective (in terms of accuracy) but also efficient (in terms of FLOPs) for radio galaxy image classification tasks.The accuracy rate of 93.36% that we achieved is the best result ever reported for the benchmark data set prepared by (Samudre et al. 2022) 16 .
This work contributes to the field of radio astronomy by providing an alternative technique for identifying and analyzing radio sources.As the next-generation telescopes (such as LOFAR, MeerKAT, and SKA) produce high-resolution images of the sky, our future work will explore the effectiveness of our methodology on these new data and assess the possibility of cross-survey predictions.
in part by the National Research Foundation of South Africa (grant numbers 119488 and CSRP2204224243).

Figure 1 .
Figure 1.(a-c) Sample illustration of the Compact, FRI, FRII and Bent radio sources morphological appearance.(d) Maximum image superposition of all Compact, FRI, FRII and Bent radio galaxies from the data set used in this paper.

Figure 2 .
Figure 2. A graphic representation of the proposed COSFIRE filter pipeline for radio galaxy classification for both training and inference phases.The training phase involves configuring COSFIRE filters, extracting descriptors and training an ensemble of classifiers using a bagging approach.These models are applied in the inference stage and the label of the given image is determined by decision fusion.

Figure 3 .
Figure 3. Centre-on and centre-off DoG filter response maps obtained with convolution on Compact, FRI, FRII and Bent radio sources utilizing a standard deviation  of 3 for the outer Gaussian function.The images are of size 150 × 150 pixels.

Figure 4 .
Figure 4. Example of keypoint detection for the configuration of a COSFIRE filter.(a) Synthetic image simulating the superposition of a center-on and a center-off DoG response maps.(b) The five black dots indicate the five keypoints that are identified in locations that exhibit local maximum DoG values along the given two radii   1 and   2 at an angle    with respect to the center ( , ).

Figure 5 .
Figure5.Accuracy rate achieved on the original test set (see Table1) as a function of the number of COSFIRE filters used.The plots demonstrate how the average accuracy rate over the experiments with the determined 26 sets of hyperparameters varies based on the number of COSFIRE filters used, considering both cases of rotation invariance and without rotation invariance.Specifically, employing 48 filters for each class approximates the 92% accuracy level of DenseNet161, as reported bySamudre et al. (2022).Using 93 filters per class, which led to the maximum average accuracy on the validation set, we achieve an accuracy of 93.36% on the test set.
DoG (same sigma): Center-off is simply the negative of the centeron response map. =  22,500 -ReLU of center-on DoG map: Set all values below  1 to zero across the centeron responses using ReLU activation function. =  22,500 -ReLU of center-off DoG map: Same as above but for the center-off response map.Similar to , convolve the ×  (150×150) image with a DoG kernel of size:  ′  = ⌊  0 +   ⌋.The expression  is multiplied by two to account for center-on and center-off DoG response maps.The extent of blurring depends on the radii: 0, 5, 10, 15, and 20.The kernel size of the blurring function for the -th radius is calculated as   = 6 σ + 1:  0 = 6(1) + 1 = 7,  5 = 7,  10 = 13,  15 = 13,  20 = 19.ê

Taking the Root for Geometric Mean:
5,10,15,20} ê(  =) Shift the two DoG response maps as many times as the number of all unique tuples   across all COSFIRE filters.In our experiments, the   was 3187 (with rotation invariance).Compute the responses of all COSFIRE filters by multiplying the corresponding shifted response maps.The number of rotations is denoted by   = 12.The variable   = 4800 represents the number of COSFIRE filters, which here is set to 400 (100 per class) and multiplied by the total number of rotations (  = 12).  represents the total number of tuples across all the COSFIRE filters (  = 64, 284).The set of total tuples configured considering rotation invariance have repetitions that do not need to be re-computed.In this step, we pair the tuples that appear more than once among the 64,284 tuples.Then, we only compute and store in the memory a single pair of the shared tuples (duplicates) to save on computations.Therefore, the FLOPS are calculated from the pre-computed response maps of pairs of tuples and those of unique tuples.Before applying the multiplication operation, the shifted response map must first be retrieved from a hashtable residing in memory.The keys of this hashtable are determined by multiplying the first four prime numbers raised to given , ,  and : ℎ  = 2  3  5  7  .The retrieval of each tuple response map from memory is therefore 7 FLOPS.ℎ =   ℎ    .Geometric mean calculation operations for each response map of all   COSFIRE filters.Determination of a    = 400-element descriptor by taking the maximum value of each of the    response maps.SVM calculations depend on the number of decision function hyperplanes ( ℎ  ) given the number of classes under classification.This experiment uses the one-vs-one ('ovo') decision function approach, which means that  ℎ  =  ( − 1)/2.In this case,  = 4 and hence  ℎ  = 6.Each hyperplane separates one class from the rest of the classes.The number of FLOPs is therefore based on the dot products needed for one SVM between the given feature vector of size    and the  ℎ  hyperplanes.Since the dot product involves multiplications and additions then one SVM takes 2    ℎ  FLOPs.Having 10 SVM, this is repeated 10 times and finally, we choose the maximum which takes another 9 FLOPs.The number of FLOPs with and without the pre-computation of response maps of pairs of tuples against the number of COSFIRE filters per class.Number of tuples as a function of the number of COSFIRE filters used per class.The total number of tuples increases linearly with the increase in the number of COSFIRE filters per class, but the number of unique tuples increases sublinearly, since most tuples are shared among all COSFIRE filters and stored in the memory, hence requiring less computations.The total number of tuples required by 48 filters is very close to the number required by 93 filters.