Classifications of Fermi-LAT unassociated sources in multiple machine learning methods

The classification of Fermi-LAT unassociated sources is studied using multiple machine learning (ML) methods. The updated data from 4FGL-DR3 are divided into high Galactic latitude (HGL, Galactic latitude $|b|>10^\circ$) and low Galactic latitude (LGL, $|b|\le10^\circ$) regions. In the HGL region, a voting ensemble of four binary ML classifiers achieves a 91$\%$ balanced accuracy. In the LGL region, an additional Bayesian-Gaussian (BG) model with three parameters is introduced to eliminate abnormal soft-spectrum AGNs from the training set and from the ML-identified AGN candidates, and a voting ensemble of four ternary ML algorithms reaches an 81$\%$ balanced accuracy. A catalog of Fermi-LAT all-sky unassociated sources is then constructed. Our classification results show that (i) there are 1037 AGN candidates and 88 pulsar candidates with a balanced accuracy of $0.918 \pm 0.029$ in the HGL region, which are consistent with those given in previous all-sky ML approaches; and (ii) there are 290 AGN-like candidates, 135 pulsar-like candidates, and 742 other-like candidates with a balanced accuracy of $0.815 \pm 0.027$ in the LGL region, which are different from those in previous all-sky ML approaches. Additionally, different training sets and class weights were tested for their impact on classifier accuracy and predicted results. The findings suggest that while different training approaches can yield similar model accuracy, the predicted numbers in the different categories can vary significantly. Thus, reliable evaluation of the predicted results is crucial in the ML approach for Fermi-LAT unassociated sources.


INTRODUCTION
Early gamma-ray source catalogs, such as the Celestial Observation Satellite (COS-B) source catalogs (e.g. Hermsen 1981; Pollock et al. 1987) and the Compton Gamma Ray Observatory (CGRO) source catalogs (e.g. Fichtel et al. 1994; Thompson et al. 1995; Hartman et al. 1999), included only a small number of sources, most of which were not identified or associated with counterparts in other wavelength bands and were therefore labeled as unassociated sources. The detection capability for GeV gamma-ray sources was greatly enhanced with the launch of the Fermi Gamma-ray Space Telescope in 2008 (Atwood et al. 2009). The Fermi-LAT collaboration regularly releases catalogs of Fermi-LAT GeV gamma-ray sources (FGL). With increasing exposure time and improved gamma-ray background modeling, a large number of gamma-ray sources have been detected. Recent releases of the Fermi-LAT source catalogs, such as 3FGL (Acero et al. 2015), 4FGL-DR1 (Abdollahi et al. 2020), 4FGL-DR2 (Ballet et al. 2020), and 4FGL-DR3 (Abdollahi et al. 2022), contain thousands of gamma-ray point sources. Alongside the detection of a large number of gamma-ray sources, a significant number of dark, non-variable unassociated sources have been discovered.
Recently, the Fermi-LAT collaboration released a new version of the Fermi-LAT source catalog (FGL), known as 4FGL-DR3 (Abdollahi et al. 2022). This updated catalog comprises 6658 point sources, making it the largest gamma-ray source catalog to date. However, approximately one-third of these sources remain unassociated with known counterparts. Based on a Galactic latitude cut at $|b| = 10^\circ$, Abdollahi et al. (2022) divide the sky into high Galactic latitude (HGL) and low Galactic latitude (LGL) regions. The parameter distributions of unassociated sources were analysed in both regions, indicating that HGL unassociated sources are likely dominated by blazar-like objects, while the composition of sources in the LGL region may be more complex. Assuming a uniform distribution of blazars over the entire sky and accounting for the background contamination at low latitudes, they estimated the number of detectable blazars in the low-latitude region to be 340 ± 20. They also pointed out that the distribution of spectral indices for low-latitude blazars shows an anomaly, with an excess of 75 ± 4 soft-spectrum sources, which may be attributed to contamination from the Galactic component.
The classification of unassociated sources and the search for their multi-wavelength counterparts are important scientific goals. They have significant implications for understanding high-energy radiation mechanisms, the origin of cosmic rays, and other astrophysical phenomena. Unfortunately, most of the unassociated sources are faint and exhibit weak variability, with their significance often close to the detection threshold. Moreover, unassociated sources are predominantly concentrated in LGL regions, where the strong diffuse gamma-ray background and high source density near the Galactic plane make their detection and identification more challenging.
Statistical methods (e.g. Ackermann et al. 2012) and multi-band characterization (e.g. Frail et al. 2018; Kaur et al. 2021) have been used for the classification of Fermi-LAT unassociated sources. In recent years, machine learning (ML) has achieved success in the field of big data mining and analysis, and it has been widely applied to astronomical data (Baron 2019). ML can be divided into supervised learning and unsupervised learning. Classification mainly refers to the application of supervised ML (referred to as ML below). ML methods have been widely applied to the classification of Fermi-LAT unassociated sources (Mirabal et al. 2012; Doert & Errando 2014; Saz Parkinson et al. 2016; Mirabal et al. 2016; Lefaucheur & Pita 2017; Luo et al. 2020; Zhu et al. 2021; Finke et al. 2021; Germani et al. 2021; Chiaro et al. 2021; Bhat & Malyshev 2022; Coronado-Blázquez 2022; Malyshev & Bhat 2023), achieving high accuracy on the training and test sets (e.g., >95%, see Bhat & Malyshev 2022; Coronado-Blázquez 2022).
However, some issues still need to be addressed. Sample representativeness is one of the fundamental assumptions in ML applications (Bishop 2006). It requires that the training dataset and the samples to be predicted are drawn independently and identically from the same overall data distribution, so that the training data represent the features and patterns of the entire dataset. Only when this assumption is fulfilled can ML models be expected to have good accuracy and generalization ability when making predictions.
In the ML classification of Fermi-LAT unassociated sources, the sources used for training the models are primarily those with high significance and strong variability. However, it is questionable whether such models can be effectively applied to predict the nature of dark sources (Zhu et al. 2021). Moreover, the associated sources are dominated by HGL active galactic nuclei (AGNs), while most of the unassociated sources are located in LGL regions, which may be dominated by the Galactic population. Due to the strong diffuse gamma-ray background in the neighbourhood of the Galactic plane and the different distributions of Galactic and extragalactic sources in the HGL and LGL regions, it remains uncertain whether all-sky models trained mainly on HGL sources are suitable for classifying LGL sources.
In addition, previous attempts have primarily focused on optimizing the performance of models on training and test sets, but lacked the necessary assessment of the credibility of the prediction results. Examples of unassessed issues include the high density of AGN-like candidates predicted among LGL unassociated sources and the differing distributions of predicted candidates and associated samples in the same parameter space.
In this work, we established separate classification models for the HGL and LGL regions. Different feature sets and classification strategies were chosen for the different datasets. The models were trained and optimized, and the validity of the prediction results was evaluated once the classification results were obtained. In particular, in the LGL region, we developed a Bayesian-Gaussian (BG) model that incorporates the spectral index to eliminate the excess of soft-spectrum AGNs in the training samples and to reassess the AGN-like candidates obtained from the ML classification. Combining the classification results from both the high- and low-latitude regions, we established an all-sky catalog of unassociated sources. At the all-sky scale, we analysed the plausibility of the classification results and compared the differences between the models in the HGL and LGL regions.
The structure of the paper is as follows. Section 2 describes the process of data collection, selection, and preprocessing. Section 3 provides a brief overview of the ML classifiers and introduces the training, optimization, and ensemble methods. Section 4 provides a detailed description of the training, testing process, and classification results of the HGL supervised learning classifier. Section 5 presents the establishment of the low-latitude classification model. Specifically, Section 5.1 describes the development and utilization of the BG model, while Section 5.2 discusses the training, optimization, and classification results of the ML model. Section 6 combines the results obtained in the previous sections to construct a catalog of all-sky unassociated sources and provides the distribution of candidates at the all-sky scale. The conclusion and discussion are presented in Section 7.
To ensure the reproducibility of our work, we fixed the random seed to "123" in the code involving random processes.

DATASET PREPARATION AND PREPROCESSING
The Fermi-LAT Collaboration published the Fourth Fermi-LAT Gamma-ray Source Catalog (4FGL) in 2020, covering eight years of Fermi-LAT sky survey observations from 2008 to 2016 (Abdollahi et al. 2020). Subsequently, the Fermi-LAT Collaboration has updated the 4FGL catalog every two years, incorporating an additional two years of observational data. The latest version is the third release (4FGL-DR3), published in 2022, encompassing GeV observational data from 2008 to 2020 (Abdollahi et al. 2022). It is the largest catalog of gamma-ray sources in the GeV energy range to date.
4FGL-DR3 contains 6658 point sources. Among them, 4367 sources have been identified or associated with counterparts in other wavebands and are classified into 22 subclasses. 134 sources at LGL are weakly associated with X-ray counterparts, but their nature is unknown. Additionally, 2157 sources have no counterparts in other wavebands and are referred to as unassociated sources.
Note to Table 1: Columns (1)-(4) are the feature number, name, symbol, and unit used for classification; Column (5) describes the physical meaning of the feature; Column (6) indicates whether the feature has been logarithmically transformed.
Based on the distribution of GeV gamma-ray sources, the 6658 point sources can be categorized into 4 major classes:
1. AGN-like class, which includes different types of blazars (FSRQ, fsrq, BLL, bll, BCU, bcu) and non-blazar AGNs (RDG, rdg, AGN, agn, css, NLSY1, nlsy1, sey), characterized by significant flux variability. There are a total of 3813 sources in the AGN class, with 3406 at high Galactic latitudes ($|b| \geq 10^\circ$) and 407 at low Galactic latitudes ($|b| < 10^\circ$). They are labeled as "agn" in this paper.
2. Pulsar-like class, which includes millisecond pulsars (MSP, msp) and young pulsars (PSR, psr), characterized by curved spectra and spectral cutoffs in the GeV range. There are a total of 290 sources in the Pulsar class, with 124 at high Galactic latitudes and 166 at low Galactic latitudes. They are labeled as "psr".
3 unassociated sources, 1166 are in low Galactic latitude regions and 1125 in high Galactic latitude regions.
Each source in the 4FGL-DR3 catalog contains 160 columns of data. After excluding descriptive columns, errors, missing values, historical data, duplicate data (i.e. PLEC Index2), and inter-dependent parameters (e.g. GLAT, GLON vs r.a., decl., and νFν vs Fν), we obtained a total of 26 feature parameters directly from the 4FGL-DR3 FITS table (see Table 1, features 1-26). The parameters fall into five categories: 1) Positional features: These describe the celestial coordinates of the sources, including Galactic longitude and Galactic latitude. 2) Spectral features: These include spectral parameters derived from fitting the GeV data with the PowerLaw, LogParabola, and PLSuperExpCutoff4 models. They also encompass the significance differences when fitting with different spectral models. 3) Flux features: These involve the differential average flux and integrated flux in the eight Fermi-LAT energy bands, as well as the flux. 4) Significance features: These consist of the average significance and predicted photon event count of the sources. 5) Variability features: These include the variability index and fractional variability index.
To mitigate the systematic differences in flux and significance caused by the distances of sources, which could potentially mislead ML classification, we introduced some derived parameters as internal features. Following Ackermann et al. (2012), we use the hardness ratios between the eight Fermi-LAT bands to describe the softening and hardening of the spectrum:
$$hr_{ij} = \frac{\nu F_\nu(j) - \nu F_\nu(i)}{\nu F_\nu(j) + \nu F_\nu(i)}, \tag{1}$$
where $\nu F_\nu(i)$ and $\nu F_\nu(j)$ are the SEDs in different Fermi-LAT bands, with $j = i + 1$. The value of $hr_{ij}$ always lies between $-1$ and $1$. Furthermore, to characterize the variation of the spectral index and the concavity of the spectrum, we define the concavity coefficient $H(i, j, k)$ as
$$H_{ijk} = hr_{ij} - hr_{jk}, \tag{2}$$
with $k = j + 1$. The value of $H_{ijk}$ always lies between $-2$ and $2$. There are thus seven hardness ratios and six concavity coefficients for the eight Fermi-LAT bands (see Table 1, features 27-39).
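The derived features can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the hardness ratio has the form $hr_{ij} = [\nu F_\nu(j) - \nu F_\nu(i)]/[\nu F_\nu(j) + \nu F_\nu(i)]$ and the concavity coefficient is the difference of adjacent hardness ratios, $H_{ijk} = hr_{ij} - hr_{jk}$ (the sign convention is an assumption chosen to match the stated bounds). The function names and the SED values are hypothetical.

```python
import numpy as np

def hardness_ratios(nufnu):
    """Hardness ratios between adjacent bands (j = i + 1); values lie in [-1, 1]."""
    nufnu = np.asarray(nufnu, dtype=float)
    low, high = nufnu[..., :-1], nufnu[..., 1:]
    return (high - low) / (high + low)

def concavity(hr):
    """Difference of adjacent hardness ratios (assumed sign); values lie in [-2, 2]."""
    hr = np.asarray(hr, dtype=float)
    return hr[..., :-1] - hr[..., 1:]

# Hypothetical SED values (arbitrary units) in the eight energy bands of one source.
sed = np.array([1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.5, 0.1])
hr = hardness_ratios(sed)   # seven hardness ratios
H = concavity(hr)           # six concavity coefficients
print(hr.shape, H.shape)
```

Because both quantities are ratios or differences of ratios, they do not depend on the overall flux normalization, which is the property that motivates their use as distance-independent features.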
In order to rationalize the dataset and optimize the machine learning workflow, the following details were handled:
i) To simplify the parameter space and reduce the computational load of ML, we took the logarithm of 18 features that spanned more than three orders of magnitude. The log10 flags of the features can be found in Table 1.
ii) Due to its uniqueness, the source 4FGL J1745.6-2859, labeled as the Galactic center, was removed from the dataset.
iii) The inverse Compton component of the Crab Nebula, denoted as 4FGL J0534.5+2201i, was removed from the dataset.
iv) Less than 1% of the sources have parameter values of LP SigCurv, PLEC SigCurv, and νFν(8) equal to 0, which diverge to negative infinity in logarithmic space. We replaced these values with the smallest non-zero value of the respective parameter.
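The zero-filling and logarithmic transformation steps can be illustrated as follows. This is a sketch under the stated rule only (zeros replaced by the smallest non-zero value of the same parameter, then log10); the helper name and the sample values are hypothetical.

```python
import numpy as np

def fill_zeros_then_log10(values):
    """Replace exact zeros by the smallest positive value, then take log10."""
    values = np.asarray(values, dtype=float)
    smallest_positive = values[values > 0].min()
    filled = np.where(values == 0, smallest_positive, values)
    return np.log10(filled)

# Hypothetical curvature significances, two of which are exactly zero.
sig_curv = np.array([0.0, 0.5, 12.0, 0.02, 0.0])
log_sig_curv = fill_zeros_then_log10(sig_curv)
print(log_sig_curv)
```

Filling before taking the logarithm keeps every transformed value finite, so the affected sources can stay in the training and prediction sets.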
Naturally, with a Galactic latitude threshold of $|b| = 10^\circ$, the ML classification task is split into two frameworks. In the HGL region, the training set consists of 3407 AGNs, 124 pulsars, and 55 samples from the "other" class. The objective of this classification is to identify a small number of non-AGN sources among 1125 unassociated sources, with AGNs as the background. In the LGL region, the training set comprises 407 AGNs, 166 pulsars, and 208 samples from the "other" class. With a significantly larger number of unassociated sources (1166), a three-class classification model is developed from a limited number of training samples.

ML Classifiers and Optimization Methods
In the field of ML, there are many classification algorithms available. Examples include decision trees (DT), random forests (RF), logistic regression (LR), support vector machines (SVM), and multilayer perceptrons (MLP), among others, which are widely used for classifying Fermi-LAT unassociated sources or evaluating the types of Fermi-LAT BCUs (blazars of unknown type). In this study, the chosen methods are LR, SVM, RF, and MLP.
LR is a commonly used statistical learning method (Cox 1958). It models the relationship between input features and class labels through a logistic function (also known as the sigmoid function). It maps the feature space to the probability space, enabling classification. LR is relatively simple and has strong interpretability.
SVM is a classical ML algorithm that aims to construct an optimal hyperplane, i.e. one that maximizes the margin, in a multidimensional parameter space to achieve effective data classification (Cortes & Vapnik 1995). It has the advantage of being able to handle high-dimensional data and non-linear problems while exhibiting good generalization capabilities.
DT is one of the earliest ML algorithms. It uses feature parameters to create nodes and makes branching decisions based on certain criteria (Breiman et al. 1984). A large number of nodes form a tree-like structure. However, decision trees are prone to over-fitting when the depth increases. To address this issue, an early ensemble learning algorithm called random forest was developed. In a random forest, multiple decision trees are combined using the bagging method (Breiman 2001). The trees are trained on different subsets of the data, and the final result is determined by voting. This approach helps mitigate over-fitting and improves the model's generalization ability.
MLP is a simple artificial neural network model that consists of an input layer, hidden layers, and an output layer (e.g. Pedregosa et al. 2011). Each layer contains multiple neurons. The input layer receives raw data, the hidden layers are responsible for feature extraction and nonlinear transformations of the data, and the output layer produces the final prediction results. Each neuron has an activation function; commonly used activation functions include Sigmoid, ReLU, and Tanh. However, training and optimizing an MLP can be complex, and it requires a substantial amount of data and computational resources for hyperparameter tuning and optimization.
Our dataset consists of a 39-dimensional parameter space. Having too many features can result in a large computational burden and potentially lead to a decrease in accuracy (e.g. Kang et al. 2019). To address this, we employ a model-dependent feature selection method called Recursive Feature Elimination (RFE), which allows us to select the optimal parameter space for different classification algorithms and scenarios. RFE works by training the model using all the features and iteratively removing the least important features based on the model's feedback. This process continues until an accuracy-feature count curve is obtained, from which the optimal feature subset can be determined.
All machine learning algorithms have model parameters that affect the training and performance of the model. Some of these parameters cannot be learned during training and are referred to as hyper-parameters. Examples include the number of trees and the maximum depth in random forests, and the hidden layer structure and activation function in MLP. Hyper-parameters play a crucial role in model training and performance, but they need to be set manually and cannot be learned directly from the data. A common approach to determining the optimal hyper-parameter values is grid search.
To train and optimize a model, it is necessary to partition the dataset into separate sets for training and for testing the performance. However, the randomness of data partitioning, especially for imbalanced samples, can lead to unstable classifier performance. In order to ensure stable and reproducible classifier performance, we employed 5-fold stratified cross-validation in the training and optimization of all classifiers.
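Grid search and stratified cross-validation combine naturally in scikit-learn. The following sketch tunes two random forest hyper-parameters with 5-fold stratified cross-validation on a synthetic imbalanced dataset; the grid values, the dataset, and the choice of balanced accuracy as the score are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced two-class data standing in for a training set.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=123)

# Stratification keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

param_grid = {"n_estimators": [20, 50], "max_depth": [3, 6]}
search = GridSearchCV(
    RandomForestClassifier(random_state=123),
    param_grid,
    scoring="balanced_accuracy",
    cv=cv,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_   # mean cross-validated score of the best grid point
print(best_params, round(best_score, 3))
```

Fixing `random_state` in both the splitter and the estimator makes the search repeatable, in the same spirit as the fixed random seed used in this work.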
Because different classification algorithms rely on different kernels and classification principles, they cannot be expected to produce identical results for every sample. To obtain a unified result, different methods can be employed: i) Taking the intersection of the predicted results: In this approach, inconsistent classification results are disregarded, and only the agreed-upon classifications are considered (e.g. Zhu et al. 2021; Bhat & Malyshev 2022). ii) Using a voting ensemble classifier: This method combines the predictions of multiple classifiers and determines the final classification based on a voting scheme. Through the votes of different classifiers, a consistent classification result is obtained. In this study, an ensemble voting classifier is employed to achieve a unified classification result. This approach takes advantage of the collective decision-making of multiple classifiers, which enhances the robustness and reliability of the final classification outcome.
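The difference between the two strategies can be made concrete with a small numerical sketch. Here "soft" voting is taken to mean a weighted average of the class probabilities returned by each classifier, and the agreement-only strategy keeps a label only when all classifiers agree. The probability values and function names are hypothetical.

```python
import numpy as np

def soft_vote(probas, weights):
    """Weighted average of class probabilities; returns (averages, labels)."""
    probas = np.asarray(probas, dtype=float)    # (n_classifiers, n_samples, n_classes)
    weights = np.asarray(weights, dtype=float)
    avg = np.tensordot(weights, probas, axes=1) / weights.sum()
    return avg, avg.argmax(axis=1)

def agreement_only(probas):
    """Label where all classifiers agree, -1 where they disagree."""
    labels = np.asarray(probas).argmax(axis=2)  # (n_classifiers, n_samples)
    agreed = (labels == labels[0]).all(axis=0)
    return np.where(agreed, labels[0], -1)

# Three hypothetical classifiers, two samples, two classes.
probas = np.array([
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.7, 0.3], [0.2, 0.8]],
])
avg, voted = soft_vote(probas, weights=[1, 1, 1])
agreed = agreement_only(probas)
print(voted, agreed)
```

In this toy case the second sample is left unclassified by the agreement-only strategy, whereas soft voting still assigns it a class together with an averaged probability.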
The process of creating, training, optimizing, and testing all the classifiers mentioned above can be implemented using the Python library scikit-learn (Pedregosa et al. 2011).

Bayesian Gaussian model
The Bayesian principle is a fundamental inference tool in the fields of statistics and ML (e.g. Gelman et al. 2013). It is built upon the relationship between prior and posterior probabilities. The Bayesian principle plays a key role in various aspects such as parameter estimation, hypothesis testing, and model selection, providing a solid theoretical foundation for data analysis and inference.
In a multi-class classification problem, let there be classes $m_1, m_2, \dots, m_n$ and multidimensional parameters $(x_1, x_2, \dots, x_m)$, which are assumed to be mutually independent. Within each class, each parameter $x_i$ follows an independent Gaussian distribution $N(\mu_i, \sigma_i^2)$, where $\mu_i$ is the mean and $\sigma_i^2$ is the variance. The probability density function of the Gaussian distribution is given by
$$f(x_i \mid m_n) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right]. \tag{3}$$
The prior probability $P(m_n)$ for each class can be estimated from the sample counts:
$$P(m_n) = \frac{N_{m_n}}{N_{\rm all}}. \tag{4}$$
Here, $N_{m_n}$ represents the number of samples in class $m_n$, and $N_{\rm all}$ represents the total number of samples. By fitting Gaussian distributions to all the parameter distributions and applying Bayes' theorem, the posterior probability of a sample belonging to class $m_n$ can be calculated as
$$P(m_n \mid x_1, \dots, x_m) = \frac{P(m_n)\prod_{i=1}^{m} f(x_i \mid m_n)}{P(x_1, \dots, x_m)}. \tag{5}$$
The normalization factor $P(x_1, x_2, \dots, x_m)$ is defined as
$$P(x_1, \dots, x_m) = \sum_{k=1}^{n} P(m_k)\prod_{i=1}^{m} f(x_i \mid m_k). \tag{6}$$
For each sample, substituting its parameter values into Equations 3-6 gives the probabilities of it belonging to the different classes, $(P_1, P_2, \dots, P_n)$, with $P_1 + P_2 + \dots + P_n = 1$. By comparing the relative values of these probabilities, the estimated class of the sample can be determined.
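The calculation described above amounts to a Gaussian naive Bayes classifier, and can be sketched directly in NumPy. In this illustration the Gaussian parameters are estimated from the sample mean and standard deviation of each class (an assumption; a fit to the binned distributions could be used instead), the priors come from the class counts, and the synthetic three-parameter data are hypothetical.

```python
import numpy as np

def fit_bg(X, y):
    """Per-class Gaussian parameters and count-based priors."""
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])
    sigma = np.array([X[y == c].std(axis=0) for c in classes])
    prior = np.array([(y == c).mean() for c in classes])
    return classes, mu, sigma, prior

def posterior(X, mu, sigma, prior):
    """Posterior class probabilities for independent Gaussian parameters."""
    z = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    log_pdf = -0.5 * z**2 - np.log(np.sqrt(2.0 * np.pi) * sigma[None, :, :])
    log_joint = np.log(prior)[None, :] + log_pdf.sum(axis=2)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)     # normalization step

# Two synthetic classes described by three parameters each.
rng = np.random.default_rng(123)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(3.0, 1.0, (100, 3))])
y = np.array([0] * 200 + [1] * 100)

classes, mu, sigma, prior = fit_bg(X, y)
P = posterior(X, mu, sigma, prior)
print(P[:3].round(3))
```

Working with log-probabilities and subtracting the row maximum before exponentiating avoids underflow when a sample lies far from every class, without changing the normalized result.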
The following sections describe the process of constructing the classification models for the HGL and LGL regions.

HIGH GALACTIC LATITUDE REGION
In the HGL dataset, there are 3407 AGN-like samples, 124 pulsar-like samples, 55 other-like samples, and 1125 unassociated sources. Abdollahi et al. (2022) discussed the distribution of the gamma-ray spectral index of unassociated sources at high Galactic latitudes and found that it resembles that of BCU sources, as well as that of all AGN-like objects (see Figure 1, left panel), suggesting that most unassociated sources in the HGL region are likely AGNs. Here, we present the distributions of the variability index and the significance of log-parabolic fits for both associated and unassociated sources in the HGL region. As shown in the middle and right panels of Figure 1, the unassociated sources do not exhibit strong variability or significant spectral curvature compared to the associated sources. Their distribution is similar to that of AGNs, indicating that they are primarily AGN-like. However, due to the overlap in some parameter distributions between AGNs and pulsars, non-AGN contamination cannot be ruled out.
Additionally, the number of other-class samples is limited (55), and introducing them would blur the distinguishing features of the samples and significantly decrease the classification accuracy. Therefore, a binary classification approach using the AGN-like and pulsar-like categories is adopted for the HGL region.

Results of Individual ML model
Because the classes in this binary classification are imbalanced, we use balanced accuracy instead of simple accuracy to evaluate the model. Balanced accuracy is an evaluation metric used for imbalanced classification problems. For $n$-class classification, its definition is as follows (Urbanowicz & Moore 2015):
$$\mathrm{BA} = \frac{1}{n}\sum_{i=1}^{n}\frac{\mathrm{Sensitivity}_i + \mathrm{Specificity}_i}{2}.$$
That is, each class in turn is defined as the positive class, the average of its sensitivity and specificity is calculated, and these values are then averaged over all classes. When the samples of each class are perfectly balanced, the balanced accuracy is equal to the accuracy. The balanced accuracy for multi-class classification takes into account the imbalance among different classes, allowing for a more comprehensive evaluation of the model's performance in multi-class classification problems.
In situations where there is sample distribution imbalance or varying importance among different classes, the balanced accuracy provides a more reasonable assessment of the model's performance.
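The definition above can be written out directly. The sketch below implements it as stated (one-vs-rest sensitivity and specificity, averaged over classes) and applies it to a hypothetical case in which a classifier labels every source as "agn"; the labels and counts are invented for illustration.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean over classes of (sensitivity + specificity) / 2, one-vs-rest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        positive = y_true == c
        sensitivity = np.mean(y_pred[positive] == c)    # true positive rate
        specificity = np.mean(y_pred[~positive] != c)   # true negative rate
        scores.append(0.5 * (sensitivity + specificity))
    return float(np.mean(scores))

# A trivial classifier that calls everything "agn" on an 8:2 imbalanced sample.
y_true = ["agn"] * 8 + ["psr"] * 2
y_pred = ["agn"] * 10
ba = balanced_accuracy(y_true, y_pred)
acc = float(np.mean(np.array(y_true) == np.array(y_pred)))
print(ba, acc)
```

The plain accuracy rewards the trivial classifier while the balanced accuracy does not. Note that scikit-learn's `balanced_accuracy_score` is defined as the mean per-class recall; it coincides with the definition above for two classes but can differ for three or more.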
Using the classifier optimization methods described in Section 3.2, we trained and tested four different classification algorithms. The RFE curves for these four classifiers are shown in the top panel of Figure 2, and the corresponding optimal feature combinations are presented in Table 2.
The grid search results for hyper-parameter tuning of LR, SVM, and RF are shown in the bottom panel of Figure 2. From the figure, we can observe how classifier performance varies with different hyper-parameter values. The optimal hyper-parameter combinations for different algorithms are listed in Table 2.
The classification results are shown in Table 2. The four classification models exhibit consistent results, with a balanced accuracy of approximately 90%. Among them, LR has a slightly lower balanced accuracy, while MLP has a slightly higher balanced accuracy. The classification yields approximately 1032-1059 samples classified as AGN-like and 66-93 samples classified as pulsar-like.

Results of Ensemble ML model
Using an ensemble voting classifier, we combined the results of the four individual classifiers. We performed a grid search on the weights of the four sub-classifiers in a "soft" ensemble voting classifier. Through five-fold stratified cross-validation, we identified the optimal weights for the ensemble voting classifier. Among multiple optimal weight combinations, we selected the combination with the smallest weight sum (i.e., the simplest model).
With the optimal weights [1, 2, 1, 4] as hyperparameters, the balanced accuracy reached 0.918 ± 0.029 in cross-validation (See Table 2).We used an ensemble voting classifier to evaluate the categories of 1125 unassociated sources, resulting in 1036 AGN-like candidates and 89 pulsar-like candidates.Probability assessments were also conducted for the unassociated sources, leading to the creation of a probability catalog.
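A weight search of this kind can be sketched as follows. To keep the example cheap, the out-of-fold class probabilities of each sub-classifier are computed once with `cross_val_predict`, and the weight grid is then scanned over these cached probabilities; ties are broken in favour of the smallest weight sum. The synthetic dataset, the sub-classifier settings, the weight range 1-4, and the pooled (rather than per-fold) balanced accuracy are all assumptions of this illustration, not the procedure used in this work.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=123)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

models = [
    LogisticRegression(max_iter=2000),
    SVC(probability=True, random_state=123),
    RandomForestClassifier(n_estimators=50, random_state=123),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=123),
]

# Out-of-fold class probabilities, computed once per sub-classifier.
probas = np.array([
    cross_val_predict(m, X, y, cv=cv, method="predict_proba") for m in models
])

best = None
for w in product([1, 2, 3, 4], repeat=4):
    avg = np.tensordot(np.array(w, dtype=float), probas, axes=1) / sum(w)
    score = balanced_accuracy_score(y, avg.argmax(axis=1))
    key = (-score, sum(w))   # highest score first, then smallest weight sum
    if best is None or key < best[0]:
        best = (key, w, score)

_, best_weights, best_score = best
print(best_weights, round(best_score, 3))
```

The selected weights could then be passed to scikit-learn's `VotingClassifier(voting="soft", weights=...)` to build the final ensemble on the full training set.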
To assess the validity of the classification results, we examined the parameter distributions of the candidate objects, as shown in the bottom panel of Figure 1. The distributions of the gamma-ray spectral index, variability index, and significance of log-parabolic fits for both types of candidates are not significantly different from those of the associated sources. Specifically, the pulsar-like candidates exhibit weaker variability than the AGN-like candidates, while they show a higher degree of spectral curvature. This consistency with the parameter distributions of known AGNs and pulsars supports the reliability of the classification results.

LOW GALACTIC LATITUDE REGION
In the LGL dataset, there are 407 AGN-like samples, 166 pulsar-like samples, 208 samples from the "other" class, and 1166 unassociated sources. Due to the strong diffuse gamma-ray background radiation near the Galactic plane, the features of the sources are not clear. Moreover, the number of unassociated sources is larger than that of associated sources, making it challenging to build a high-performance classifier.
We provide the distributions of the gamma-ray photon index, variability index, and significance of log-parabolic fits for both associated and unassociated sources in the LGL dataset. From Figure 3, it can be seen that the unassociated sources in the LGL dataset are predominantly characterized by soft spectra, weak variability, and moderate to weak significance of spectral curvature. Figure 3 shows that the parameter distribution of unassociated sources in the LGL dataset partially overlaps with those of the AGN-like and pulsar-like classes, and is most similar to that of the other-like class. The unassociated sources exhibit less significant spectral curvature, unlike pulsars, and their variability is extremely weak, unlike AGNs. These results suggest that unassociated sources in the LGL dataset are likely dominated by the other-like class rather than the pulsar and AGN classes.
Comparing the gamma-ray spectral index distribution of AGNs near the Galactic plane with that at HGL, it is evident that there is an excess of soft-spectrum samples (with Γ > 2.4) in the LGL region, as shown in Figure 3 (left panel). According to Abdollahi et al. (2022), the estimated number of these excess blazars is 75 ± 4, which could be attributed to contamination from Galactic components.
Based on the detected counts of blazars at high latitudes and accounting for the detected flux difference due to the brighter diffuse emission background near the Galactic plane, the number of blazars with $|b| < 10^\circ$ is estimated to be 340 ± 20 (Abdollahi et al. 2022). Considering the 1037 AGN-like candidates provided by the HGL ML analysis, we can estimate the number of AGNs in the low-latitude region using the following equation:
$$N^{e}_{\rm agn} = N^{e}_{\rm bla}\,\frac{N_{\rm bla} + N_{\rm nonbla} + N^{\rm can}_{\rm agn}}{N_{\rm bla}}.$$
Here, $N_{\rm bla} = 3342$ represents the current number of blazars in the HGL region, $N_{\rm bla} + N_{\rm nonbla} = 3407$ is the total number of AGNs in the HGL region, $N^{\rm can}_{\rm agn} = 1036$ denotes the number of AGN candidates in the high-latitude region, and $N^{e}_{\rm bla} = 340 \pm 20$ represents the estimated number of blazars in the low-latitude region based on existing observations (Abdollahi et al. 2022). According to our estimation, there are approximately 452 ± 27 observable AGNs in the LGL region. The sum of the obtained AGN-like candidates and the existing AGNs should roughly satisfy this constraint.
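The quoted estimate can be checked with a short calculation. The scaling used here, the estimated LGL blazar count multiplied by the ratio of all HGL AGNs plus AGN candidates to HGL blazars, is an inferred form that reproduces the quoted 452 ± 27; the variable names are hypothetical, and the uncertainty is propagated from the blazar estimate alone.

```python
# Counts quoted in the text.
n_bla = 3342                       # associated blazars at HGL
n_agn_hgl = 3407                   # all associated AGNs at HGL (blazars + non-blazars)
n_can_agn = 1036                   # AGN-like candidates from the HGL classifier
n_e_bla, n_e_bla_err = 340.0, 20.0  # estimated LGL blazars and uncertainty

# Scale the blazar estimate up to all AGNs, including ML candidates.
scale = (n_agn_hgl + n_can_agn) / n_bla
n_e_agn = n_e_bla * scale
n_e_agn_err = n_e_bla_err * scale   # only the blazar uncertainty is propagated

print(round(n_e_agn), round(n_e_agn_err))  # 452 27
```

Using 1037 instead of 1036 candidates changes the scale factor only in the fourth decimal place, so the rounded result is the same.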
Due to the near saturation of associated AGN counts at low Galactic latitudes and the presence of an excess of soft-spectrum sources leading to sample impurity, we first employ a Bayesian-Gaussian model to screen the training samples.

Bayesian Gaussian estimation
In Abdollahi et al. (2022), the variability index and the significance of the log-parabolic fit were employed to differentiate between AGNs and pulsars. The present study additionally considers the gamma-ray spectral index, focusing on the excess of soft-spectrum sources among LGL AGNs. Using these three parameters, a BG classifier is established for probabilistic inference in AGN versus non-AGN classification.
First, the known samples from the LGL training dataset are divided into two categories: AGN and non-AGN. Then, Gaussian functions are fitted separately to the distributions of the three parameters (gamma-ray spectral index, variability index, and significance of the log-parabolic fit) for both the AGN and non-AGN samples. These distributions are represented as $N(\mu_i, \sigma_i^2)$. For a given sample to be classified, its parameter values and the fitted distributions $N(\mu_i, \sigma_i^2)$ are substituted into Equations 3-6 to calculate the probabilities of it belonging to the AGN or non-AGN category. These probabilities are denoted as $L_a$ and $L_{na}$, respectively, with $L_a + L_{na} = 1$. By comparing the relative magnitudes of $L_a$ and $L_{na}$, the closeness of the sample to AGN or non-AGN in the parameter space can be assessed. This method is first used to estimate the reliability of the associated AGN samples in the LGL region.
When the BG model was applied to the associated LGL AGNs, 81 sources characterized by excessively soft spectra and weak variability (the red region in the middle panel of Figure 4) were found to be dissimilar to AGNs under the classification threshold $L_{na} > 0.5 > L_a$. This result is consistent with the excess of 75 ± 4 sources reported in 4FGL-DR3 (Abdollahi et al. 2022). These sources are likely to be misassociations in the Fermi-LAT catalog. The information for these 81 sources is provided in a machine-readable format for further analysis.
After excluding these low-confidence samples, a total of 326 AGNs remain in the low-latitude region. The complete distribution of their spectral indices can be seen in the bottom panel of Figure 4, where the excess of soft-spectrum AGNs has been suppressed. These samples, combined with 166 pulsars and 208 other-like sources, were used as the training set to construct a ternary ML classifier for the low-latitude region.

Results of Individual ML model
Using the classifier optimization methods described in Section 3, we trained and tested four different classification algorithms.
The RFE curves for these four classifiers are shown in the top panel of Figure 5, and the corresponding optimal feature combinations are presented in Table 3. The grid search results for hyper-parameter tuning of LR, SVM, and RF are shown in the bottom panel of Figure 5, which illustrates how classifier performance varies with the hyper-parameter values. The optimal hyper-parameter combinations and the classification results are also listed in Table 3. The four classification models reach consistent balanced accuracies of approximately 80%. Among the four classifiers, LR and RF achieved slightly higher accuracy, while MLP had lower accuracy. The classification results indicate that in the low-latitude region approximately 260-318 samples are classified as AGN-like, 106-199 as pulsar-like, and 706-760 as other-like. The predicted class of individual sources nevertheless varies considerably among the algorithms, which shows that the output of a single classifier is not reliable.
To obtain a unified result, we used a voting ensemble classifier to combine the results of multiple classifiers.

Results of Ensemble ML model
Using an ensemble voting classifier, we combined the results of the four individual classifiers. We performed a grid search to optimize the weights of the four sub-classifiers in the "soft" ensemble voting classifier. Through five-fold stratified cross-validation, we determined the optimal weights for the ensemble voting classifier.
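To illustrate what "stratified" means in the cross-validation mentioned above, the following sketch deals the members of each class round-robin across five folds, so that every fold keeps roughly the class proportions of the full training set. It is a simplified stand-in written for illustration (no shuffling, hypothetical function name), not the routine used in this work; scikit-learn provides an equivalent utility, `StratifiedKFold`. The class counts are those of the LGL training set quoted earlier in this section.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=5):
    """Assign sample indices to folds so each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds

# LGL training set sizes quoted in the text: 326 AGNs, 166 pulsars, 208 others.
labels = ["AGN"] * 326 + ["pulsar"] * 166 + ["other"] * 208
folds = stratified_folds(labels)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

Each candidate set of ensemble weights would then be scored by training on four folds and computing the balanced accuracy on the held-out fold, averaged over the five splits.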
Using the best weights [4,1,1,1] as hyper-parameters, the balanced accuracy reached 0.815 ± 0.027 in cross-validation (see Table 3). We applied the ensemble voting classifier to the 1166 unassociated sources, obtaining 290 AGN-like candidates, 135 pulsar-like candidates, and 741 other-like candidates. The weights of the ensemble classifier show that the LR classifier dominates the voting process.
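The effect of the weights [4,1,1,1] in a "soft" vote can be illustrated with a short sketch. Soft voting averages the class probabilities returned by the sub-classifiers, weighted by the ensemble weights, and assigns the class with the largest averaged probability. The probabilities below are invented for a single hypothetical source, and the helper `soft_vote` is written here for illustration only; in scikit-learn the same behaviour is provided by `VotingClassifier(voting="soft", weights=...)`.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of per-classifier class-probability vectors."""
    total_weight = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists)) / total_weight
        for k in range(n_classes)
    ]

CLASSES = ["AGN-like", "pulsar-like", "other-like"]

# Hypothetical probabilities for one source; the first row plays the role of LR.
probs = [
    [0.20, 0.10, 0.70],  # LR
    [0.50, 0.30, 0.20],  # SVM
    [0.45, 0.25, 0.30],  # RF
    [0.40, 0.35, 0.25],  # MLP
]

weighted = soft_vote(probs, [4, 1, 1, 1])
unweighted = soft_vote(probs, [1, 1, 1, 1])
label_weighted = CLASSES[weighted.index(max(weighted))]
label_unweighted = CLASSES[unweighted.index(max(unweighted))]
```

In this toy case the equal-weight vote would follow the three other classifiers, while the [4,1,1,1] vote follows the first one, which is what is meant by one sub-classifier dominating the ensemble.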
We investigated the spectral index distributions of the candidate sources, with the different candidate categories shown in blue in Figure 6. The figure shows that the LGL unassociated sources are predominantly other-like, and that the spectral index distribution of the other-like candidates is similar to that of the unassociated sources as a whole (gray line). However, the AGN-like candidates still exhibit an excessive soft component, which is not reasonable. Additionally, we obtained 290 AGN candidate sources; combined with the existing 326 high-confidence associated AGN samples, the total number of AGNs reaches 616, significantly exceeding the estimated value of 452 ± 27. A higher density of AGNs in the low-latitude region than in the high-latitude region is clearly unreasonable.
We re-evaluated the AGN-like candidates using the BG model, as shown in Figure 3. Among them, 83 candidates were identified as non-AGN-like and were labeled as low-confidence AGN-like candidates (LACs), while the remaining 207 samples were considered high-confidence AGN-like candidates (HACs). By combining the HACs with the 326 high-confidence associated AGN samples, shown as the blue area in the bottom panel of Figure 3, the excess of soft-spectrum sources was suppressed. The resulting total of 533 sources is slightly higher than the estimated number, which can be considered a reasonable outcome.

RESULTS
In the previous sections, we constructed models for high-latitude and low-latitude sources and classified the unassociated sources in 4FGL-DR3. Combining the classification results from these two frameworks, we constructed an all-sky catalog of unassociated sources. Among the 2291 unassociated sources, 1327 were identified as AGN-like candidates (1244 HACs, 83 LACs), 223 as pulsar-like candidates, and 741 as other-like candidates. Figure 7 presents the scatter plot of the associated sources and candidates in the Galactic coordinate system, as well as the density distribution curves for Galactic longitude and Galactic latitude.
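As a consistency check, the all-sky totals quoted above follow directly from the HGL and LGL counts given elsewhere in this paper (1037 AGN and 88 pulsar candidates in the HGL region; 290 AGN-like, 135 pulsar-like, and 741 other-like candidates in the LGL region, of which 83 AGN-like candidates are LACs). The subscripts below are introduced here only for bookkeeping:

```latex
\begin{align*}
N_{\mathrm{AGN}}    &= 1037 + 290 = 1327, \\
N_{\mathrm{HAC}}    &= 1037 + (290 - 83) = 1244, \qquad N_{\mathrm{LAC}} = 83, \\
N_{\mathrm{pulsar}} &= 88 + 135 = 223, \\
N_{\mathrm{other}}  &= 741, \\
N_{\mathrm{total}}  &= 1327 + 223 + 741 = 2291 .
\end{align*}
```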
In the upper panel of Figure 7, we present the plot for AGNs. The gray dots represent known AGN samples, the blue dots represent AGN candidates identified through LGL classification, and the red dots represent LACs. Associated AGNs are widely distributed over the celestial sphere, but there is a gap near the Galactic plane. A large number of sources cluster around the Galactic center, especially among the unassociated sources, resulting in a concentration of ML-identified candidates in that region. The strong diffuse gamma-ray background near the Galactic center further intensifies this concentration. When the Bayesian-Gaussian model is applied as a correction, the excluded samples are mainly concentrated near the Galactic center. However, even after the correction, the remaining AGN-like candidates (HACs) still show an excess over the expected distribution at Galactic longitudes near the Galactic center. This indicates that our constraints are still insufficient and that some non-AGN contamination remains among the AGN-like candidates. This is also consistent with the total number of associated AGNs and AGN candidates exceeding the expected limit.
The middle panel of the figure corresponds to the pulsar plot. The gray dots represent known pulsar samples, and the red dots represent pulsar-like candidates identified through ML. The number of high Galactic latitude pulsar-like candidates is slightly lower than that of LGL candidates. Comparing the distribution of associated pulsars and pulsar-like candidates, we can observe an accumulation of pulsar-like candidates near the Galactic center. However, due to the strong gamma-ray background near the Galactic center, the validity of these samples needs to be carefully considered.
The lower panel of Figure 7 shows the other-like plot. The gray dots represent known samples, and the green dots represent other-like candidates identified through machine learning. Since the high Galactic latitude region was classified using an AGN-pulsar binary classifier, other-like candidates are present only in the low Galactic latitude region. The Other category is more complex, as it includes both Galactic components (PWN, SNR, etc.) and extragalactic components (galaxies, etc.). Both the candidates and the associated sources exhibit a symmetric distribution in the Galactic coordinate system. It should be noted that the AGN-like candidates excluded using the Bayesian-Gaussian model are considered other-like samples, but they are not depicted in the plot.
We present a machine learning classification catalog of unassociated sources, which consists of the following three parts:
1. High Galactic Latitude Unassociated Source Catalog (una high.fits): This catalog includes source information for HGL unassociated sources along with their ML classification results.
2. Low Galactic Latitude Unassociated Source Catalog (una low.fits): This catalog contains source information for LGL unassociated sources, their ML classification results, and the re-evaluated results using the BG model.
3. Misassociated Low Galactic Latitude AGN Candidate Catalog (agn low.fits): This catalog provides source information for 80 LGL samples that the BG model considers unlikely to be AGNs. It also includes the likelihood probabilities of being AGN/non-AGN based on the Bayesian-Gaussian model.
All tables are available in FITS format and can be accessed online through the supplementary material provided by MNRAS (see Data Availability). Detailed descriptions of each column in these three FITS tables can be found in Table 4.

CONCLUSION AND DISCUSSION
In this paper, we divided the task of classifying unassociated Fermi-LAT gamma-ray sources into two frameworks.
In the high Galactic latitude region, we employed a binary classifier to separate AGN-like and pulsar-like sources and trained it using imbalanced samples. By utilizing four supervised machine learning algorithms and optimizing the models, we achieved a balanced accuracy of 90% in a 5-fold stratified cross-validation experiment. The classification results obtained from the four algorithms demonstrated a high level of consistency. By employing an ensemble voting classifier, we identified 1037 AGN candidates and 88 pulsar candidates with a balanced accuracy of 0.918 ± 0.029.
In the low Galactic latitude region, the number of unassociated sources exceeded the number of associated sources, and the features of the sources were not clear, which made classification challenging. We introduced a BG model by fitting Gaussian functions to the distributions of the gamma-ray spectral index, variability index, and log-parabolic fit significance of the associated sources. During the evaluation of associated AGNs, we identified 81 low-confidence sources as misassociation candidates, which is consistent with the findings of Abdollahi et al. (2022). After removing these samples from the training set, we constructed a three-class classifier for AGN-like, pulsar-like, and other-like sources using the same four supervised ML algorithms. The balanced accuracies of the individual classifiers in the three-class classification were all close to 80%. By employing an ensemble voting classifier, we obtained 290 AGN-like candidates, 135 pulsar-like candidates, and 742 other-like candidates, achieving a balanced accuracy of 0.815 ± 0.027. After re-evaluating the AGN-like candidates using the BG model, we found 83 low-confidence candidates, labeled as LACs. Our ML results directly indicate that sources that are neither AGNs nor pulsars dominate the low Galactic latitude region.
By combining the classification results obtained from the high and low Galactic latitude regions, we constructed a comprehensive all-sky catalog of unassociated sources.
In the HGL region, a simple method achieved a high accuracy on the training set, and no significant anomalies were found when examining the parameter space of the classification results. However, several challenges were encountered in the LGL region. Firstly, the number of unassociated sources exceeds the number of associated sources, which complicates model construction. Secondly, the purity of the training samples is not high: there is an excessive number of soft-spectrum samples in the LGL region, raising the suspicion of contamination from other source types. Thirdly, there is a significant mismatch in the proportions of the different categories between the training samples and the samples to be predicted. In the training set, the ratio of AGN-like, pulsar-like, and other-like sources is 407:166:209, while our classification results give a ratio of 290:135:742, a severe deviation. All of these factors challenge the performance of machine learning classifiers, and the generalization ability from the training set to the test set is questionable. Additionally, the accuracy of the classifiers in the low Galactic latitude region is lower than in the high Galactic latitude region, and the results differ among classification algorithms, especially for the pulsar category, where the predicted number of candidates among the unassociated sources ranges from 109 to 199.
We compared the classification results of our work with the results from earlier attempts, such as Zhu et al. (2021), Bhat & Malyshev (2022), and Coronado-Blázquez (2022); the specific details are listed in Appendix A. The results show that current ML methods have achieved high accuracy on the all-sky training set. The classification of unassociated sources in the HGL region has reached a consensus: these sources are dominated by AGNs, with a small fraction being pulsar-like and other-like sources. However, the all-sky ML models face challenges in classifying unassociated sources in the LGL region, and there is considerable inconsistency among the results of different classifiers. In single-classifier classification, the predicted results indicate that unassociated sources near the Galactic plane are dominated by AGNs. In multi-classifier classification, although the results are unified through "All-Agree", more than half of the low-Galactic-latitude sources are classified as "MIXED" without specific classes.
Our results directly indicate that unassociated sources in the low-latitude region are dominated by neither AGNs nor pulsars, which is consistent with the distribution of these sources in parameter space, for example in gamma-ray spectral index. However, the "other-like" category is more complex, including both Galactic and extragalactic components, and the sample size for each individual category is small (the maximum is 114). This makes it difficult to construct machine learning models for further classification, since such models are better suited to large samples. Additionally, according to Abdollahi et al. (2022), unassociated sources near the Galactic plane (GUs, $-3^\circ < b < 3^\circ$) have made up an increasing proportion in recent years and exhibit an unusually dense distribution. Furthermore, a subset of LGL unassociated sources characterized by extreme softness (SGUs, $\Gamma > 2.4$) shows a special distribution, with no known Galactic gamma-ray emitting sources similar to them. Currently, it is speculated that residual background or undiscovered new gamma-ray sources contribute to this phenomenon. Our study has largely ruled out the association of these sources with AGNs and pulsars, but a detailed classification remains far from complete.
In this study, we used the BG model to filter out the excess soft-spectrum samples from the LGL AGNs and removed them from the training and testing sets of the LGL classifier. To validate the effectiveness of removing abnormal samples using the BG model, we compared models trained on datasets in which the soft-spectrum abnormal AGNs were retained, removed, or reassigned to the Other-like category. The specific results can be found in Appendix B. The results showed that after removing the 81 soft-spectrum abnormal AGNs from the training set, the test balanced accuracy of the various classification models improved significantly, yielding stable results. However, the obtained AGN-like candidate samples still included some soft-spectrum abnormal samples. Furthermore, the results indicated that minor changes in the training set do not significantly affect the accuracy of the classifier but can completely alter the classification predictions.
Previous research has explored various methods for handling imbalanced sample classification, such as adjusting training sample weights and oversampling techniques like the Synthetic Minority Over-sampling Technique (SMOTE) (e.g., Zhu et al. 2021; Bhat & Malyshev 2022). However, these studies primarily focus on evaluating classifier performance, such as accuracy on the training and test sets, without discussing the impact on the prediction results. In this study, we investigate how adjusting model hyper-parameters and oversampling with the SMOTE algorithm affect the accuracy of a binary classifier for unassociated sources at high Galactic latitudes, and we provide the evaluation results of the different models for these sources. Detailed information can be found in Appendix C. The results show that artificially changing the weights of training samples does not decrease the accuracy of the classifier, and may even improve it (see Table C1). However, it significantly alters the prediction results and leads to larger disparities in classification outcomes among different classifiers. Thus, further scrutiny is needed to establish the reliability of this approach.
Because of the unique properties of astronomical data, sources with higher significance are usually detected first, while sources with lower significance are more difficult to identify. In high Galactic latitude regions, the detection of samples is relatively easier, whereas in the Galactic plane, the high source density and strong background radiation pose challenges for detection. This evident imbalance challenges the assumption of sample representativeness in machine learning. This problem is not only present in the classification of unassociated sources in Fermi-LAT but also in various other astronomical applications of ML.
Based on the above analysis, we provide the following recommendations for ML classification of Fermi-LAT unassociated sources:
1. The high Galactic latitude classification has been successful, so it is advisable to focus more on the low Galactic latitude region for further improvement.
2. Considering the systematic differences between unassociated sources and associated sources, it is important to prioritize the rationality of the classification results when constructing the model, rather than solely focusing on the classifier's performance on known samples. In such an imperfect dataset, the accuracy of the classifier alone cannot fully represent its performance. A reasonable assessment of the predictions for the unassociated samples is also necessary.
This study divided the sky into high and low Galactic latitude regions using a threshold of $|b| = 10^\circ$. However, it is worth noting that there is a significant enrichment of sources at mid-to-low Galactic latitudes, $10^\circ < b < 20^\circ$, where previous classification results also differ considerably. Additionally, although we partially considered the differences between high and low Galactic latitude samples, we did not account for variations in significance levels and variability between the high and low classes. This aspect will be considered in future work.
A single algorithm overlooks this situation, while multiple classifiers employing the "All-agree" strategy assign inconsistent results to the "MIXED" category. The bottom-right panel of Figure A1 shows the distribution of the MIXED class in Galactic latitude, while the left panel of Figure A2 presents the distribution of gamma-ray spectral indices for the MIXED class. As shown in the right panel of Figure A2, different classifiers trained on all-sky samples yield completely different AGN-like candidate sets. With the "All-agree" strategy, only the small fraction of sources common to all classifiers is retained as AGN-like candidates, while the majority is assigned to the "MIXED" category. The concentration of the "MIXED" category in the LGL region eliminates a significant number of AGN candidate sources near the Galactic plane. It also addresses the issue of an excess of soft-spectrum sources among LGL AGN-like candidates. However, it is worth noting that the "MIXED" category is primarily concentrated near the Galactic disk. In previous attempts, the "MIXED" category accounted for over 50% of the sources in the LGL region, meaning that over half of the LGL sources did not receive a successful classification. The nature of the sources in the "MIXED" category still needs to be explored.
In our LGL classification framework, all training samples are LGL sources. Additionally, we have developed a BG model to handle soft-spectrum AGNs. In our ML classification, the results of the individual sub-classifiers tend to be consistent, providing reasonable classification results for all LGL samples.
When training the LGL classification model using the complete Dataset 1 of 407 AGNs, 166 pulsars, and 207 Others, the balanced accuracy obtained through 5-fold stratified cross-validation was only around 75%. Evaluation of the 1166 unassociated LGL sources revealed approximately 600 AGN candidates, 90-168 pulsar candidates, and 400-450 Other candidates. The results indicated that AGNs dominate the unassociated LGL sources.
After removing the 81 samples identified as "non-AGN" by the BG model from the training set, the number of predicted other-like candidates rapidly increased to approximately 700, while the number of AGN candidates decreased to around 300. The balanced accuracy on the training set improved to approximately 80%, and the excess of soft-spectrum sources among the AGN candidates was alleviated. The standard deviation of the cross-validation balanced accuracy decreased, indicating increased model stability. However, when the AGN candidates obtained with Dataset 2 were re-evaluated using the BG model, approximately 83 sources were still flagged as misclassified soft-spectrum sources, suggesting that the predicted results were still contaminated. Abdollahi et al. (2022) suggested the presence of 75 ± 4 non-AGN soft-spectrum sources in the LGL AGN dataset, possibly originating from the Galactic component. If we instead include the 81 samples identified as "non-AGN" by the BG model in the other-like class for model training, the number of predicted other-like candidates further increases to approximately 850, while the number of AGN candidates decreases to around 200 (see Table B1). This number of AGN-like candidates is similar to the number of high-confidence candidates obtained by re-evaluating the Dataset 2 ML classification with the BG model. This suggests that these 81 soft-spectrum sources may indeed belong to a category other than AGNs and pulsars, significantly impacting the ML classification of LGL sources. However, further evidence is still needed to support this viewpoint.
There are a total of 781 associated LGL sources, and the only difference between these three training sets is the treatment of 81 AGN samples. The influence of the different training sets on the accuracy of the ML models on the training and testing sets is limited (approximately 5%), but it leads to significant differences in the predicted numbers for unassociated sources. Different training sets yield distinct answers to whether low-latitude unassociated sources are dominated by AGNs or by other-like sources. This highlights how strongly even minor changes in the dataset during ML model training can affect the predictions.
Although the prediction results for the AGN and Other categories remain relatively stable after removing low-confidence soft-spectrum AGNs, there are still significant differences in the number of pulsar candidates among different classifiers. This raises concerns about the reliability of the pulsar candidate results and may require further consideration of the purity of the pulsar training samples.

APPENDIX C: EFFECT OF HYPER-PARAMETER TUNING AND OVERSAMPLING METHODS ON THE HGL CLASSIFICATION
The HGL AGN-like and pulsar-like binary classification is a typical imbalanced classification problem. In the scikit-learn library, many classification algorithms have a hyper-parameter called "class_weight" (Pedregosa et al. 2011). This hyper-parameter adjusts the weights of samples from different classes in imbalanced classification problems, to avoid bias towards the majority class. The default value for "class_weight" is "None". In this study, four supervised ML classification algorithms were used, of which the MLP algorithm has no "class_weight" hyper-parameter. For the other three algorithms, LR, SVM, and RF, "class_weight" was set to "balanced", and the resulting classifiers were compared with those that did not use the "class_weight" parameter. After training and optimizing the models using the methods described in Section 3.2, the balanced accuracy of these seven classifiers was evaluated with 5-fold stratified cross-validation, and predictions were provided for the 1125 unassociated sources (see Table C1).
SMOTE is a synthetic oversampling method used to address class imbalance in classification (Chawla et al. 2002). In imbalanced classification problems, the minority class has fewer samples, which leads to poor performance of the classifier in learning and predicting that class. SMOTE balances the dataset by synthesizing new samples for the minority class, improving the performance of the classifier. In this study, the SMOTE algorithm from the imbalanced-learn library was used to oversample the pulsar samples to match the number of AGN samples (Lemaitre et al. 2017). The constructed dataset was then used to train LR, SVM, RF, and MLP classifiers for HGL classification. After training and optimizing these classifiers, their performance on the test set and the prediction results were evaluated and presented (see Table C1).
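The core idea of SMOTE, generating a synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a hand-written simplification for illustration only, not the imbalanced-learn implementation used in this work; the function name `smote_like`, the choice of `k`, and the two-dimensional toy feature values are all hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # A random point on the segment joining base and neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class (made-up feature values) and a majority class of size 10.
pulsars = [[2.0, 1.0], [2.2, 1.1], [1.9, 0.9], [2.1, 1.3]]
n_agn = 10

new_points = smote_like(pulsars, n_agn - len(pulsars))
balanced_pulsars = pulsars + new_points  # now as many pulsars as AGNs
```

Because every synthetic point lies between two existing minority samples, oversampling adds no new information about the minority class; it only changes how strongly that class is weighted during training.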
In this context, balanced accuracy was used instead of accuracy to evaluate the models. As shown in Table C1, increasing the weight of the pulsar samples or increasing their number through SMOTE led to an improvement in the accuracy of the pulsar class, resulting in an overall increase in balanced accuracy. By adjusting the weight parameters through hyper-parameter tuning, the balanced accuracy on the test set increased from 90% to over 96%. However, there were significant differences in the predictions between different classifiers, and the predicted number of pulsar candidates also showed large fluctuations. After applying SMOTE, the balanced accuracy on the test set rapidly increased from 90% to over 99%. Different classification algorithms responded differently to SMOTE: for example, MLP showed little change in the prediction results for the 1125 unassociated sources, while the other three algorithms exhibited significant changes. Again, there were significant differences in the predictions between different classifiers, and the predicted number of pulsar candidates showed large fluctuations. Although these methods significantly improve the accuracy on the training and testing sets, they also widen the differences in prediction results among different classifiers. Therefore, it cannot be concluded that the classifiers have been optimized and yield more reliable predictions.
Note (Table C1). Column (1) gives the estimator, i.e. the classification algorithm used. Column (2) gives the test balanced accuracy, i.e. the balanced accuracy of the model on the test set. Columns (3)-(4) give the number of sources classified as AGN-like and pulsar-like.
These results indicate that artificially changing the weights of samples or using oversampling methods like SMOTE does not necessarily lower the performance of classifiers on the training and test samples; in fact, they may even enhance the performance.However, they significantly alter the prediction results.This reminds us that the inherent differences in sample quantities are important parameters in machine learning training.It also highlights that making subtle changes to the dataset during the model training process can have an impact on performance metrics such as accuracy and may lead to significant differences in classification results.
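The reason balanced accuracy is preferred over plain accuracy in this appendix can be seen from a small toy example. Balanced accuracy is the mean of the per-class recalls, so a classifier that ignores the minority class is penalized even when its plain accuracy looks high. The labels and counts below are invented for illustration, and the helper functions are written here rather than taken from any library (scikit-learn offers `balanced_accuracy_score` for the same quantity).

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy sample: 95 AGNs and 5 pulsars.
y_true = ["agn"] * 95 + ["psr"] * 5
# A trivial classifier that labels every source as an AGN.
y_pred = ["agn"] * 100

acc = accuracy(y_true, y_pred)            # 0.95
bacc = balanced_accuracy(y_true, y_pred)  # (1.0 + 0.0) / 2 = 0.5
```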

Figure 1. Distribution plots of the gamma-ray spectral index, variability index, and log-parabolic fit significance for associated and unassociated sources in HGL. The top panel represents AGN-like and pulsar-like associated samples, the middle panel represents unassociated sources, and the bottom panel represents the results of the ML classification.

Figure 2. The RFE curves and hyper-parameter grid search curves of the four classifiers in the HGL region.

Figure 3. Distribution plots of the gamma-ray spectral index, variability index, and log-parabolic fit significance for associated and unassociated sources in LGL.

Figure 4. The spectral index distribution of the Bayesian-Gaussian parameter evaluation results for low-latitude AGN-like candidates and associated AGNs. The gray line represents all sources, the red region represents sources with low confidence according to the Bayesian-Gaussian parameter evaluation, and the blue region represents the remaining samples.

Figure 6. The gamma-ray spectral index distribution of low-latitude unassociated sources, associated sources, and the candidates identified by ML. The gray line represents all the unassociated sources, the red color represents associated sources of different categories, and the blue color represents candidates of different categories identified by ML.

Figure 7. All-sky scatter plot and density distribution of the associated sources and candidates of the AGN-like, pulsar-like, and other-like categories. The upper panel shows the AGN category, with candidates represented in green, associated sources in gray, and low-confidence samples removed by the BG model in red. The middle panel displays the pulsar category, with candidates shown in red and associated sources in gray. The lower panel presents the other-like category, with candidates depicted in green and associated sources in gray. The density distribution curves show the spatial distribution of the sources across Galactic latitude and longitude.

Figure A2. The density distribution of the "MIXED" class and AGN-like candidates in gamma-ray spectral index.

APPENDIX B: EFFECT OF REMOVING SOFT SPECTRAL AGNS ON THE LGL CLASSIFICATION: RESULTS FROM BG MODEL
In this study, we used a Bayesian-Gaussian parameter model to exclude 81 soft spectral outliers in LGL AGNs and removed them from the training and testing sets of the LGL classifier. To validate the effectiveness of the BG model in removing soft spectral AGNs, we conducted comparative experiments. These experiments involved training classification models on different datasets: Dataset 1, the complete training set without removing the 81 soft spectral AGNs; Dataset 2, the training set with the 81 soft spectral outliers removed (the current dataset); and Dataset 3, in which the 81 spectral outliers were added to the Other-like class. The results are presented in Table B1.

With "class_weight" set to "None", no artificial weights are added during the training process. For a binary classification problem in which class a has $N_a$ samples and class b has $N_b$ samples, setting "class_weight" to "balanced" gives class a a weight proportional to $N_b/(N_a + N_b)$ and class b a weight proportional to $N_a/(N_a + N_b)$. In this way, the majority class is suppressed while the minority class is emphasized, achieving balanced classification.
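The weighting scheme described above can be checked numerically. The sketch below compares the normalized weights quoted in the text with the "balanced" heuristic as documented by scikit-learn, n_samples / (n_classes * class_count); the two differ only by a common factor, so the ratio between the class weights is the same. The class counts are hypothetical and the function names are introduced here for illustration.

```python
def normalized_weights(n_a, n_b):
    """Weights as written in the text: N_b/(N_a+N_b) for class a, N_a/(N_a+N_b) for class b."""
    total = n_a + n_b
    return n_b / total, n_a / total

def balanced_heuristic(n_a, n_b):
    """scikit-learn's documented 'balanced' rule: n_samples / (n_classes * class_count)."""
    total = n_a + n_b
    return total / (2 * n_a), total / (2 * n_b)

# Hypothetical counts: class a is the majority (AGNs), class b the minority (pulsars).
n_agn, n_psr = 2000, 100

w_text = normalized_weights(n_agn, n_psr)
w_sklearn = balanced_heuristic(n_agn, n_psr)

# Minority-to-majority weight ratio; equal to N_a / N_b in both conventions.
ratio_text = w_text[1] / w_text[0]
ratio_sklearn = w_sklearn[1] / w_sklearn[0]
```

In both conventions each pulsar counts as much as N_a/N_b AGNs in the loss, so that the two classes contribute equally in total.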

Table 1. The feature parameters used for classification
1 gll_psc_v30.fit, see https://fermi.gsfc.nasa.gov/ssc/data/access/lat/12yr_catalog/gll_psc_v30.fit. Please note that this work is based on the v30 version. The latest version v31 has made modifications to some keywords and incorrect TeV associ-

Table 3. The information of the low-latitude classification models

Table B1. Classification results of low Galactic latitude sources with different training sets
Note. Column (1) represents the estimator, which refers to the classification algorithm used. Column (2) shows the test balanced accuracy, which represents the balanced accuracy of the model on the test set. Columns (3)-(5) indicate the number of sources classified as AGN-like, pulsar-like, and other-like. Dataset 1, Dataset 2, and Dataset 3 refer to the different datasets used for model training and testing.

Table C1. The classification results of the high Galactic latitude classifier using different class weights