Minute-Cadence Observations of the LAMOST Fields with the TMTS: V. Machine Learning Classification of TMTS Catalogues of Periodic Variable Stars

Periodic variables are always of great scientific interest in astrophysics. Thanks to the rapid advancement of modern large-scale time-domain surveys, the number of reported variable stars has grown substantially over several decades, which has significantly deepened our comprehension of stellar structure and binary evolution. The Tsinghua University–Ma Huateng Telescopes for Survey (TMTS) has monitored the LAMOST sky areas since 2020, with a cadence of 1 minute. From 2020 to 2022, this survey produced densely sampled light curves for ~30,000 variables whose maximum powers in the Lomb-Scargle periodogram are above the 5σ threshold. In this paper, we classified 11,638 variable stars into 6 main types using XGBoost and Random Forest classifiers with accuracies of 98.83% and 98.73%, respectively. Among them, 5301 (45.55%) variables are newly discovered, primarily consisting of δ Scuti stars, demonstrating the capability of TMTS in searching for short-period variables. We cross-matched the catalogue with Gaia's second Data Release (DR2) and LAMOST's seventh Data Release (DR7) to obtain important physical parameters of the variables. We identified 5504 δ Scuti stars (including 4876 typical δ Scuti stars and 628 high-amplitude δ Scuti stars), 5899 eclipsing binaries (including EA-, EB- and EW-type) and 226 candidates of RS Canum Venaticorum. Leveraging the metal abundance data provided by LAMOST and the Galactic latitude, we discovered 8 candidates of SX Phe stars within the class of "δ Scuti stars". Moreover, with the help of the Gaia color-magnitude diagram, we identified 9 ZZ Ceti stars.


INTRODUCTION
Since ancient times, people have been fascinated by changes occurring in the night sky. Large-scale surveys play a crucial role in the study of variables in the modern era. Over the past decade, missions like the All-Sky Automated Survey for Supernovae (ASAS-SN, Jayasinghe et al. 2019), the Catalina Sky Survey (CSS, Drake et al. 2014) and the Asteroid Terrestrial-impact Last Alert System (ATLAS, Heinze et al. 2018), among others, have offered unique opportunities to explore variable stars. The Gaia DR2 provides a detailed catalogue of more than 1 million variable stars, including ~228 thousand RR Lyrae stars, ~151 thousand long-period variable stars (LPVs), ~147 thousand rotational variables, etc. (Brown et al. 2018; Holl et al. 2018). The G, G_BP and G_RP integrated photometric band measurements, as well as parallaxes of unprecedented accuracy, substantially help to locate stars in the Gaia color-magnitude diagram (CMD), which facilitates classification and further studies of stellar evolution history. The Optical Gravitational Lensing Experiment (OGLE) sky survey has been monitoring the sky for more than 30 years. Initially designed as a hunter for gravitational microlensing events and dark matter, OGLE has obtained photometric datasets for millions of variables in the Galactic bulge, the Galactic disk and the Magellanic Clouds (Udalski et al. 2015). In the foreseeable future, the Large Synoptic Survey Telescope (LSST, Ivezić et al. 2019) may discover billions of variable stars, potentially ushering in a transformative era in the field of astronomy. However, due to their long sampling cadences (e.g. ≥ 1 day), previous large-scale surveys are constrained in their ability to identify short-period variables. The advent of high-cadence surveys, such as the ZTF high-cadence Galactic Plane Survey with a cadence of 40 sec (Kupfer et al. 2021), has brought groundbreaking opportunities for the identification of variables with periods below several hours, such as ultra-compact binaries (UCBs) and compact pulsators.
Figure caption fragment (sky map): The color displays the maximum number of uninterrupted exposures within a single night, as in TMTS-II. We used the HEALPIX package (http://healpix.sourceforge.net) with NSIDE=128 to plot this sky map (Gorski et al. 2005).
The study of variable stars is an essential part of time-domain astronomy, which focuses on transients and violent outbursts in the universe. Pulsating stars are a vital category among variable stars. Situated within the instability strip, pulsating stars mainly consist of Cepheids, δ Scuti stars, RR Lyrae stars, ZZ Ceti stars and LPVs. Mainly triggered by the κ mechanism, these pulsators follow period-luminosity (P-L) relations, among which the well-defined P-L relations of Cepheids (Leavitt & Pickering 1912) render them invaluable for measuring cosmic distances and aiding in the determination of the Hubble constant (Sandage et al. 2006; Riess et al. 2018). Within the large family of pulsating variables, periods differ significantly from one class to another. δ Scuti stars are a subclass of pulsating stars that undergo multi-mode pulsations (Breger 2000) with periods below 0.3 d. This bound is set somewhat arbitrarily to distinguish them from classical pulsators such as RR Lyrae stars and Cepheids. Pre-main-sequence A-type stars are the fastest known δ Scuti pulsators (Holdsworth et al. 2014), among which HD 34282 has the shortest period (P = 18.12 min) ever observed (Rodríguez 2004). The study of δ Scuti stars would answer some long-standing questions in stellar physics, such as the pulsator fraction within the δ Scuti instability strip and the location of the edges of the instability strip (Murphy et al. 2019). On the other hand, the periods of Cepheids are usually a few days. For LPVs like Mira variables, which are characterized by very red colors, periods can extend to hundreds or even thousands of days (Iwanek et al. 2022). This vast disparity in periods reflects the abundance and diverse nature of pulsating stars.
Eclipsing binaries are another important class of variable sources discovered by wide-field survey projects. Binaries play a crucial role in understanding stellar evolution history (e.g. common envelope evolution, mass exchange and stellar winds, Taam & Sandquist 2000; Plavec 1968; Theuns & Jorissen 1993). In addition, they could be progenitors of Type Ia supernovae (Rebassa-Mansergas et al. 2019), which are of significant importance in cosmology. Periods of binaries range from several hours to a few days, while short-period binaries have been receiving increasing attention recently, since mergers of UCBs, whose periods are typically less than an hour (Chen et al. 2020b), are the most prominent gravitational-wave sources (Postnov & Yungelson 2014). Observed by the LIGO-Virgo detector network, the gravitational-wave signal GW170817 is explained as a binary neutron star merger, while GW190814 was generated by a compact binary coalescence involving a black hole and a compact object (Abbott et al. 2017, 2020). Several short-period eclipsing binaries, such as the AM CVn star SDSS J0926+3624 (P = 28 min, Copperwheat et al. 2011), the double-white-dwarf binary SDSS J0651+2844 (P = 12.75 min, Brown et al. 2011) and ZTF J1539+5027 (P = 6.91 min, Burdge et al. 2019), have periods considerably shorter than the 0.22-day short-period limit of contact eclipsing binaries (Rucinski 1992), posing a new challenge to this limit.
In the past decade, machine learning techniques have become increasingly prevalent in astronomy as the volume of astronomical data has undergone exponential expansion (e.g., Richards et al. 2011; Bloom et al. 2012; Kim & Bailer-Jones 2016; Naul et al. 2018; Hosenie et al. 2020). Holding great potential to enhance our interpretation of vast and complex data, machine learning algorithms effectively solve regression, classification, clustering and dimension reduction problems. In practice, machine learning techniques show particular promise in tackling problems such as galaxy identification (Krakowski et al. 2016), variable star classification (Jayasinghe et al. 2019; Chen et al. 2020a) and exoplanet exploration (Nixon & Madhusudhan 2020).
The Tsinghua University-Ma Huateng Telescopes for Survey (TMTS), consisting of four optical telescopes, is located at the Xinglong Station of NAOC. TMTS has a field of view (FoV) of up to 18 deg^2 (Zhang et al. 2020) and covers a wavelength range from 400 to 900 nm. For an exposure time of 60 s, TMTS can reach ~19.4 mag in white light (or Luminous filter) at the 3σ detection limit. To ensure the quality of our analysis, we only picked the light curves (LCs) with at least 100 uninterrupted measurements (Lin et al. 2022, hereafter TMTS-I). TMTS has already yielded progress in several aspects, including the identification of over 1000 short-period variables (i.e., P < 2 hr, Lin et al. 2023c, hereafter TMTS-II), the recording of 125 flare stars (Liu et al. 2023), and the discoveries of an 18.9-min blue large-amplitude pulsator (BLAP, Lin et al. 2023b) and a 20.5-min sdB binary (Lin et al. 2023a). In this paper, we search for periodic variables (P < 7.5 hr) using two advanced algorithms (XGBoost and Random Forest) in the TMTS Catalog of Periodic Variable Stars. Observations and data are introduced in Section 2, and the method and result of the classification are discussed in Section 3. Section 4 explains the characteristics of the catalogues. We discuss the overall catalogue and each type of variable in detail in Section 5. A summary of our work is given in Section 6.

DATA
From 2020 to the end of 2022, TMTS monitored 449 LAMOST/TMTS plates, covering a total of 6977 deg^2 of the sky, as shown in Figure 1. This effort produced 19,099,266 uninterrupted LCs with more than 100 epochs. We adopted the Lomb-Scargle periodogram (LSP; Lomb 1976; Scargle 1982) to analyze these LCs. Using the distribution of the modified false-alarm probability (FAP) adopted in TMTS-I and -II, we calculated the maximum power in the LSP of each source and determined the corresponding 5σ and 10σ thresholds. Among the millions of sources detected, more than 30,000 have maximum powers above the 5σ threshold and more than 8000 have maximum powers above the 10σ threshold. The TMTS LCs are cross-matched with Gaia DR2 (the TMTS pipeline performs the cross-match with DR2) and LAMOST DR7 to obtain some crucial photometric and spectroscopic parameters, such as the dereddened color (B_p − R_p)_0, the effective temperature T_eff and the spectral type. We derived the G-band absolute magnitude of variables with reliable parallaxes (i.e., ϖ/σ_ϖ ≥ 5.0) as in TMTS-II.
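As a rough illustration of the period search described above, the following Python sketch evaluates the classical Lomb-Scargle power (Scargle 1982 normalisation) on a frequency grid for a synthetic, unevenly sampled light curve. It is not the TMTS pipeline: the function names, the frequency grid and the synthetic 1.2-h signal are assumptions made for illustration, and the modified false-alarm-probability thresholds of TMTS-I and -II are not reproduced.

```python
import math
import random


def lomb_scargle_power(t, y, freq):
    """Classical Lomb-Scargle power at a single frequency (cycles per unit of t)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    w = 2.0 * math.pi * freq
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2.0 * w * ti) for ti in t),
                     sum(math.cos(2.0 * w * ti) for ti in t)) / (2.0 * w)
    yc = ys = cc = ss = 0.0
    for ti, v in zip(t, y):
        c = math.cos(w * (ti - tau))
        s = math.sin(w * (ti - tau))
        yc += (v - mean) * c
        ys += (v - mean) * s
        cc += c * c
        ss += s * s
    return (yc * yc / cc + ys * ys / ss) / (2.0 * var)


def best_frequency(t, y, freqs):
    """Return (frequency, power) at the periodogram maximum on the given grid."""
    powers = [lomb_scargle_power(t, y, f) for f in freqs]
    i = max(range(len(freqs)), key=powers.__getitem__)
    return freqs[i], powers[i]


# Synthetic example (assumed values): 1-min cadence with random gaps,
# a 1.2-h sinusoid (20 cycles per day) and Gaussian noise.
rng = random.Random(0)
minutes = sorted(rng.sample(range(300), 150))
t_days = [m / 1440.0 for m in minutes]
true_freq = 20.0
mags = [15.0 + 0.1 * math.sin(2.0 * math.pi * true_freq * ti) + rng.gauss(0.0, 0.02)
        for ti in t_days]
freq_grid = [4.0 + 0.2 * k for k in range(481)]  # 4 to 100 cycles per day
peak_freq, peak_power = best_frequency(t_days, mags, freq_grid)
```

In this toy setting the periodogram maximum falls close to the injected frequency; for real data the peak power would then be compared with the per-source significance thresholds.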
To ensure the accuracy of our classifications of the TMTS periodic variables, we first cross-matched the TMTS LCs with the variable sources of the International Variable Star Index (VSX, Watson et al. 2006), a database providing comprehensive information on over 2,200,000 known and suspected variable stars. This cross-match indicates that 6801 sources have been recorded and identified by the VSX. To augment the dataset, a manual examination was conducted, resulting in another 1015 sources with assigned classifications. Among the (6801 + 1015 =) 7816 identified sources, 6146 have maximum LSP powers above the 10σ threshold.
We visually inspected all the LCs of the 6146 sources and excluded some low-quality LCs (low signal-to-noise ratio, i.e. SNR < 20). It is worth noting that classifying a variable star based solely on the information extracted from the LCs can be challenging. Incorporating supplementary information, especially color and absolute magnitude, can significantly enhance the reliability of classification. The Gaia color-magnitude diagram (CMD), a close counterpart to the Hertzsprung-Russell diagram, is a useful tool for the classification, as it enables the segregation of variables into distinct regions based on their characteristics (Eyer et al. 2019). For instance, δ Scuti stars are located in the instability strip, while eclipsing binaries can appear anywhere in the CMD.
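The absolute magnitude used to place a source in the CMD can be sketched as follows. This is a minimal example assuming a simple parallax inversion with an optional extinction term; the function name and the example values are hypothetical, and the actual procedure of TMTS-II may differ in detail. Only the parallax reliability cut of 5.0 is taken from the text.

```python
import math


def absolute_magnitude(g_mag, parallax_mas, parallax_err_mas, a_g=0.0, min_snr=5.0):
    """Return the G-band absolute magnitude, or None if the parallax is unreliable.

    g_mag            apparent G magnitude
    parallax_mas     parallax in milliarcseconds
    parallax_err_mas parallax uncertainty in milliarcseconds
    a_g              assumed G-band extinction in magnitudes (0 if unknown)
    """
    if parallax_mas <= 0.0 or parallax_err_mas <= 0.0:
        return None
    if parallax_mas / parallax_err_mas < min_snr:
        return None
    # Distance modulus with d[pc] = 1000 / parallax[mas].
    return g_mag + 5.0 * math.log10(parallax_mas) - 10.0 - a_g


# A star at 100 pc (parallax 10 mas) with G = 12 has an absolute magnitude of 7.
example_abs_mag = absolute_magnitude(12.0, 10.0, 1.0)
```

Sources that fail the parallax cut return None and would simply lack a CMD position.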
Considering the importance of (B_p − R_p)_0 and the absolute magnitude, we deemed reliable only the classifications of sources for which this information is available. A total of 4506 sources were finally selected as the labelled dataset (hereafter Dataset-I), which serves as the training set and test set to train the classifiers and evaluate their performance. Dataset-I contains 1120 typical δ Scuti stars (DSCT), 256 high-amplitude δ Scuti stars (HADS), 49 RS Canum Venaticorum (RS) variables, and 3081 eclipsing binaries. The eclipsing binaries include 2985 W UMa type (EW), 42 Algol type (EA) and 54 β Lyr type (EB). "DSCT" refers to typical δ Scuti stars, which include the subtypes DSCT, DSCTc (obsolete VSX type designation, low-amplitude δ Scuti stars with V-band light amplitude < 0.1 mag) and DSCTr (VSX type designation, a subtype of δ Scuti stars in ASAS-3). SX Phe variables are metal-poor pulsating sub-dwarfs which resemble δ Scuti stars phenomenologically. As SX Phe variables can hardly be identified through LCs, they are included in the class "DSCT" as well (Soszyński et al. 2021). We will discuss candidates of SX Phe variables using the metal abundance data provided by LAMOST and the Galactic latitude in Section 4. We use "HADS" to represent HADS and HADS(B) (VSX type designation, first/second overtone double-mode HADS). The criterion that distinguishes HADS from DSCT is that the light-variation amplitude of the former exceeds 0.3 mag in the V band. While this threshold may appear somewhat discretionary, HADS and DSCT exhibit divergent characteristics in terms of LC shapes, rotation speed, evolutionary stages and so on (Chang et al. 2013; Chen et al. 2020a; McNamara 2011). We therefore regard them as two distinct groups. To avoid ambiguity, we use "δ Scuti stars" to encompass the entire category (including typical δ Scuti stars and high-amplitude δ Scuti stars), while "DSCT" denotes typical δ Scuti stars and "HADS" specifically designates high-amplitude δ Scuti stars. The sample sizes for other types are insufficient (i.e., number < 30) to establish datasets with adequate statistical robustness without risking overfitting, even with carefully designed over-sampling algorithms. Such overfitting could arise when the models misinterpret the specific characteristics of individual light curves as general traits of the entire class. In the future, we intend to employ novelty detection algorithms for the identification of such variables.
We randomly checked 300 unlabelled LCs, finding a relatively low false-positive rate among the LCs above the 10σ threshold. However, over 70% of the LCs that only exceed the 5σ threshold were recognized to be non-astrophysical variable sources, a typical concern in high-cadence surveys (Kupfer et al. 2021). We thus retained all the LCs above the 10σ threshold, while requiring a minimum LSP maximum power of 30 for the LCs that only exceed the 5σ threshold. This selection process yields ~8000 unlabelled LCs, referred to as Dataset-II.
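The selection rule for Dataset-II can be written as a small predicate. The sketch below is illustrative only: the function and argument names are hypothetical, and the per-source threshold values in the example are invented; only the minimum power of 30 comes from the text.

```python
def keep_light_curve(max_power, threshold_5sigma, threshold_10sigma, min_power=30.0):
    """Decide whether an unlabelled light curve enters Dataset-II.

    Light curves above the 10-sigma threshold are always kept; those that only
    exceed the 5-sigma threshold must also reach the minimum maximum power.
    """
    if max_power > threshold_10sigma:
        return True
    if max_power > threshold_5sigma:
        return max_power >= min_power
    return False


# Hypothetical per-source thresholds: 5-sigma at 20, 10-sigma at 40.
examples = {power: keep_light_curve(power, 20.0, 40.0) for power in (50.0, 35.0, 25.0, 10.0)}
```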
Dataset-I and -II only include limited kinds of variables due to the observation strategy of TMTS (see TMTS-I). The upper period bound of TMTS is ~7.5 h, preventing us from discovering variables with longer periods, such as Cepheids and LPVs. In fact, the longest period detected in Dataset-I and -II is 6.21 h. Periods of Cepheids are typically above 1 d, far exceeding the upper period limit of TMTS. RR Lyrae stars are another frequently observed class, with periods of 0.2-1.0 d. A considerable portion of RRc variables have periods below 0.3 d (7.2 h) and thus fall within the detection range of TMTS. However, they are old and commonly present in globular clusters located around the Galactic core (Soszynski et al. 2014), which is not covered by TMTS, resulting in their scarcity in our catalogue.

Method
In pursuit of higher prediction accuracy, we trained two classifiers, the XGBoost (XGB) classifier and the Random Forest (RF) classifier. Both classifiers were trained and tested on Dataset-I and then used to predict the class of each LC in Dataset-II, together with a classification probability (Chen et al. 2015; Breiman 2001). XGBoost and random forest are two popular machine learning algorithms based on ensemble learning techniques, in which multiple decision trees are combined to generate predictions, constructing a powerful knowledge discovery and data mining model (Dong et al. 2020). Both algorithms can be parallelized to speed up the training process and are robust to over-fitting. However, there are certain differences between them. Random forest builds multiple weak decision trees independently and aggregates the predictions of the individual trees to reach the final result. XGBoost, using gradient boosting, optimizes the loss function by iteratively adding new trees. Compared with neural networks (NNs), the two algorithms are more interpretable and require less preprocessing of the data. For instance, NNs are sensitive to the initial phases of LCs, which are unrelated to the intrinsic variability. Consequently, preprocessing steps or structural adjustments of the neural network are necessary (Zhang & Bloom 2021). Moreover, XGBoost and random forest algorithms can provide insight into the importance of each input feature, allowing us to identify the key characteristics of the LCs that help distinguish different types. Although NNs may work better on large and complex datasets (Bishop 1994), the capabilities of our classifiers are sufficient for this study.
Dataset-I is extremely imbalanced, as illustrated in Figure 2, where the number of EWs is more than 70 times that of the least abundant class, the EAs. Considering eclipsing binaries alone, VSX recorded 812 EAs, 121 EBs and 62,356 EWs with periods below 7.5 h, roughly consistent with the proportions of these variables in Dataset-I. Due to the small numbers of some classes and the modest size of Dataset-I, it is hard to build a balanced training set. Although XGBoost and Random Forest algorithms can handle imbalanced datasets, the results without any preprocessing were not satisfactory. We therefore used the Synthetic Minority Oversampling Technique (SMOTE, Chawla et al. 2002), an over-sampling technique, to prepare a balanced training set. Based on the K-Nearest Neighbors algorithm, SMOTE synthesizes new instances for the minority classes by interpolating between existing samples from these classes, effectively increasing their representation.
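The interpolation step at the heart of SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration of the idea in Chawla et al. (2002), not the implementation used in this work; the function name, the default k and the toy feature vectors are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class neighbours.

    minority is a list of equal-length feature tuples with at least two members.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base sample within the minority class.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two features (values are illustrative only).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote_sketch(toy_minority, n_new=10, k=2)
```

Because each synthetic point lies on a segment joining two real minority samples, the new samples stay inside the region already occupied by that class.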
We extracted some basic features (i.e., features not related to the period and the light curve fitting) of the LCs with the help of the UPSILoN package (Kim & Bailer-Jones 2016). We simply used the "extract_features" module instead of its classification algorithm. Due to the importance of period determination, we carefully calculated the period through the LSP (see also TMTS-II) and fit the LCs with a fourth-order Fourier function (Chen et al. 2020a):

$$ f(t) = a_0 + \sum_{i=1}^{4} a_i \cos\left(\frac{2\pi i t}{P} + \Phi_i\right) \qquad (1) $$

where a_i (i = 0,1,2,3,4) and Φ_i (i = 1,2,3,4) represent the Fourier amplitudes and phases of each order. Notice that we used a fourth-order fitting, while UPSILoN chooses a third-order function. We studied the residuals of the third-order fitting, finding that nearly half of the residuals exceed the error of the magnitude. In contrast, the fourth-order fitting leaves only 7.4% of the residuals above the magnitude error. We then calculated some critical parameters, such as R_21 and Φ_21, based on the fitting. Considering the challenge of classifying variable stars based on LCs alone, we included (B_p − R_p)_0 and the absolute magnitude in the feature list. Since only a small fraction of the sources in Dataset-I and -II have LAMOST log g, T_eff and spectral type, we did not select these features. The complete set of features and their explanations in both classifiers are summarized in Table 1.
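To make the Fourier parameters concrete, the sketch below recovers the amplitudes and phases of a fourth-order series and derives R_21 and Φ_21 from them. It assumes a cosine convention for the series and a light curve that has already been folded and resampled at uniformly spaced phases, so that the coefficients follow from simple projections; a real, unevenly sampled LC would need a least-squares fit instead. All names and the synthetic values are hypothetical.

```python
import math


def fourier_params(phase_mags, order=4):
    """Fourier amplitudes and phases of a light curve sampled at phases n/N, n = 0..N-1.

    Assumed model: m(phase) = a0 + sum_i a_i * cos(2*pi*i*phase + Phi_i).
    Returns (a0, amplitudes, phases, R21, Phi21).
    """
    n = len(phase_mags)
    a0 = sum(phase_mags) / n
    amps, phis = [], []
    for i in range(1, order + 1):
        c = 2.0 / n * sum(m * math.cos(2.0 * math.pi * i * k / n)
                          for k, m in enumerate(phase_mags))
        s = 2.0 / n * sum(m * math.sin(2.0 * math.pi * i * k / n)
                          for k, m in enumerate(phase_mags))
        amps.append(math.hypot(c, s))     # a_i
        phis.append(math.atan2(-s, c))    # Phi_i
    r21 = amps[1] / amps[0]
    phi21 = (phis[1] - 2.0 * phis[0]) % (2.0 * math.pi)
    return a0, amps, phis, r21, phi21


# Synthetic folded light curve with known parameters (illustrative values).
N = 64
synthetic = [15.0
             + 0.20 * math.cos(2.0 * math.pi * k / N + 0.5)
             + 0.05 * math.cos(4.0 * math.pi * k / N + 1.7)
             for k in range(N)]
a0, amps, phis, r21, phi21 = fourier_params(synthetic)
```

For the synthetic input above, the recovered ratio is a_2/a_1 = 0.25 and the phase difference is 1.7 − 2 × 0.5 = 0.7 rad, up to floating-point error.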
Features that exhibit low correlations are generally preferred in machine learning classification, as they are more likely to provide independent information, thereby contributing to the model's ability to generalize and improving the classification accuracy. However, correlated features also have positive impacts. They enhance the stability of the algorithms by offering complementary information, and reduce the risk of overfitting by preventing the algorithms from relying excessively on an individual feature. Commonly, a correlation coefficient r above 0.8 or below −0.8 would be considered a strong correlation. According to this criterion, as illustrated in Figure 3, most of the features in our feature set exhibit weak or moderate correlations. However, strong correlations exist between certain features. For example, the absolute magnitude and (B_p − R_p)_0 show a positive correlation with r = 0.9, because of the distribution pattern in the CMD. Similarly, quartile31, weighted_std and amplitude show a strong correlation, because they represent different measurements of LC variability. Shapiro_w and hl_amp_ratio exhibit a noticeable negative correlation with r = −0.82. This relationship can be readily understood, as shapiro_w measures the extent to which a distribution deviates from a normal distribution. Larger deviations (such as those found in the LCs of EAs) tend to result in an increase in hl_amp_ratio (a ratio between higher and lower amplitude relative to the average).

Table 1 (rows recovered from the extracted text). Features used in both classifiers.
Absolute magnitude: G-band (330 nm to 1050 nm) absolute magnitude provided by Gaia DR2.
Amplitude: Peak-to-peak amplitude obtained from the fourth-order Fourier fitting of the LCs.
Cusum: The range of the cumulative sum of the LCs. Cusums of LCs with longer-term variability are usually larger (Kim et al. 2014).
Eta: Measures the degree of trends in a long-term baseline, which is useful in separating variables with different periods (Kim et al. 2014).
Hl_amp_ratio: Ratio between higher and lower amplitude than average. Hl_amp_ratios for EAs are higher by definition.
Kurtosis: The fourth standardized moment of the distribution, measuring the tailedness of the distribution.
Quartile31: Difference between the 75% percentile and the 25% percentile of the LCs.
Skewness: The third standardized moment of the distribution, measuring its asymmetry.
Stetson_k: Stetson K index, describing the shape of the LCs.
R_21: a_2/a_1, 2nd to 1st amplitude ratio obtained from the fourth-order Fourier fitting.
R_31: a_3/a_1, 3rd to 1st amplitude ratio obtained from the fourth-order Fourier fitting.
R_41: a_4/a_1, 4th to 1st amplitude ratio obtained from the fourth-order Fourier fitting.
Φ_21: Φ_2 − 2Φ_1, the phase difference between the 2nd and 1st phases obtained from the fourth-order Fourier fitting.
Φ_31: Φ_3 − 3Φ_1, the phase difference between the 3rd and 1st phases obtained from the fourth-order Fourier fitting.
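The |r| > 0.8 criterion can be checked with a short Pearson-correlation sketch. The feature values below are toy numbers invented for illustration, and the function names are hypothetical; only the threshold of 0.8 comes from the text.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(features, threshold=0.8):
    """List feature pairs whose correlation coefficient satisfies |r| > threshold."""
    pairs = []
    for name_a, name_b in combinations(features, 2):
        r = pearson_r(features[name_a], features[name_b])
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Toy feature table (illustrative values only).
toy_features = {
    "amplitude": [0.1, 0.2, 0.3, 0.4],
    "weighted_std": [0.03, 0.05, 0.08, 0.11],
    "period": [3.0, 1.0, 4.0, 2.0],
}
toy_strong = strong_pairs(toy_features)
```

In this toy table only amplitude and weighted_std are flagged, mirroring the kind of redundancy between variability measures noted above.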
Figure 4 shows the importance of all features used in our classification, where the importance is represented by a parameter named the F1 score. The more a feature contributes to the improvement of classification accuracy, the greater its gain and therefore its F1 score. Among the various features, the most important one is the period, highlighting the critical role of precise period determination. Features that characterize the shape of the LCs from different aspects, such as kurtosis, amplitude and R_21, are of significant importance. Additionally, in accordance with our expectations, the absolute magnitude and (B_p − R_p)_0 also have great effects on the classifications.

Accuracy
Three scores are used to evaluate the performance of the classifiers. Precision measures the fraction of true positives relative to the predicted positive instances:

$$ \mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \qquad (2) $$

Recall refers to the proportion of true positives among actual positive instances:

$$ \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (3) $$

where the true positives (TP) are the samples correctly classified as belonging to a specific class, while the false positives (FP) are samples that do not belong to this class but are incorrectly classified as part of it. The false negatives (FN) are the samples that actually belong to that class but are incorrectly classified as not belonging to it. A clear trade-off exists between these two measurements. To improve precision, the algorithm tends to predict fewer positive samples in order to limit false positives. Conversely, to optimize recall, the model is inclined to select more positive samples to avoid missing any possible positives. The F1 score, the harmonic mean of precision and recall, therefore provides a balanced measure of the algorithm's performance:

$$ F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (4) $$

The normalized confusion matrices of both classifiers are shown in Figure 5. We used two measurements to evaluate the overall performance of the classifiers. The macro-average treats each class equally, while the weighted-average gives each type a weight proportional to the number of samples in that class. As our dataset is heavily imbalanced, the weighted-average is more appropriate for measuring the performance of the classifiers. Some analogous patterns can be discerned in both confusion matrices. Both classifiers exhibit excellent performance in distinguishing EAs and RS CVn candidates, potentially owing to the distinctive shapes of their light curves. The overall performance of the XGB classifier is slightly better than that of the RF classifier. The macro-average and weighted-average F1 scores of the XGB classifier are 98.81% and 98.84%, respectively. The precision, recall and F1 scores for each type are summarized in Table 2. The F1 scores for all six types are higher than 96%, which indicates that the variable stars are well distinguished by the XGB classifier. With F1 scores ≥ 99%, the DSCT, EA, EB and HADS classes are very well classified, while the scores of the EWs are somewhat lower.
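The per-class scores and the two averages can be computed directly from a confusion matrix of counts, as in the following sketch. The convention that rows are true classes and columns are predicted classes, the function names and the toy matrix are assumptions for illustration; the numbers are not taken from Figure 5 or Tables 2 and 3.

```python
def per_class_scores(cm):
    """Return a list of (precision, recall, F1) for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    scores = []
    for j in range(n):
        tp = cm[j][j]
        fp = sum(cm[i][j] for i in range(n)) - tp
        fn = sum(cm[j]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2.0 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores


def averaged_f1(cm):
    """Return (macro-average F1, weighted-average F1) for a confusion matrix."""
    scores = per_class_scores(cm)
    support = [sum(row) for row in cm]  # number of true samples per class
    macro = sum(f1 for _, _, f1 in scores) / len(scores)
    weighted = sum(s * f1 for s, (_, _, f1) in zip(support, scores)) / sum(support)
    return macro, weighted


# Imbalanced two-class toy example: 100 samples of class 0, 10 of class 1.
toy_cm = [[90, 10],
          [1, 9]]
toy_scores = per_class_scores(toy_cm)
toy_macro, toy_weighted = averaged_f1(toy_cm)
```

In the toy example the weighted-average F1 is pulled toward the score of the dominant class, whereas the macro-average penalizes the poorly classified minority class more strongly.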
The performance of the RF classifier is summarized in Table 3. All the F1 scores are higher than 96%, with macro-average and weighted-average F1 scores of 98.71% and 98.74%, respectively. As with the XGB classifier, the EW class has a lower F1 score than the other classes. In both classifiers, the relatively lower F1 scores of the EWs may be attributed to contamination by EBs and RS CVn candidates, as illustrated in Figure 5. EBs are semi-detached eclipsing binaries occupying an intermediate evolutionary state between EAs and EWs, and in some cases their LCs closely resemble those of EWs, making them difficult to differentiate. In fact, the classification of eclipsing binaries has sometimes been simplified to primarily two categories (Chen et al. 2020a): EA-type (detached) and EW-type (contact). Furthermore, some RS CVn variables are themselves eclipsing binaries, which explains the occasional similarity between their LCs and those of EWs.
In both classifiers, the F1 scores for DSCT, HADS, EA and EB exceed 99%, while EW slightly lags behind at 96%. These results underscore the effectiveness of our feature selection and algorithm training processes. However, we should point out that the confusion matrices and the scores are derived from Dataset-I (the labelled dataset), while Dataset-II (the unlabelled dataset) might contain fainter objects with noisier light curves, making them more susceptible to misclassification. As a consequence, the scores might be degraded in Dataset-II. Additionally, we acknowledge that our training and test datasets are relatively limited in size, and the same level of accuracy may not be reached if they are expanded.
When applying the classifiers to Dataset-II, we find that some objects do not belong to the 6 main classes explored in this paper. To avoid misclassifications, we conducted a comprehensive manual inspection of all sources, which should minimize the probability of such errors.

CATALOGUES
We identified 11,638 variable stars based on the first three years of the survey, which constitute the TMTS Catalogues of Periodic Variable Stars. They include 4876 DSCT (including 910 released in the short-period variable star catalogue of TMTS-II), 628 HADS (including 166 released in the short-period variable star catalogue of TMTS-II), 117 EA-type eclipsing binaries, 84 EB-type eclipsing binaries, 5698 EW-type eclipsing binaries, 226 candidates of RS CVn variables and 9 ZZ Ceti stars. Note that the (910 + 166 =) 1076 δ Scuti stars identified in TMTS-II were all classified there as δ Scuti stars, as TMTS-II did not distinguish between DSCT and HADS. All the LCs in our catalogue have been visually checked. Table 4 lists the numbers of all variable stars and of the newly discovered ones. Among them, 4297 DSCT are newly discovered, accounting for 88.13% of all DSCT in our catalogue. Compared with the 9162 DSCT recorded by VSX (including known and suspected DSCT, DSCTr and DSCTc), the newly discovered DSCT from our catalogue greatly increase the number of known samples, demonstrating the ability of TMTS to discover short-period variables (see also Lin et al. 2023c). In comparison, other large-scale surveys are less efficient in identifying δ Scuti stars because of the short periods intrinsic to these variables. In our catalogue, the shortest period observed for δ Scuti stars is 0.21 h, with 0.35% of them having periods below 0.5 h. Furthermore, our survey exhibits higher fractions of δ Scuti stars with periods below 1 h and 1.5 h compared to other surveys, underscoring the pronounced proficiency of TMTS in detecting short-period variables. Table 5 shows an example of the DSCT and HADS in our catalogue, while Table 6 shows an example of EA-, EB- and EW-type eclipsing binaries. In Table 7, we summarize an example of RS CVn candidates, and we show the catalogue of ZZ Ceti stars in Table 8. The machine-readable version of these catalogues can be found online. As the photometric period can be obtained directly from the LC and is used as a feature of the classifiers, the "Period" in our catalogue refers to the photometric period instead of the orbital period. For binaries, the orbital period can be obtained by doubling the photometric period. The catalogues contain the source ID, position (R.A. and decl.), period (photometric period for binaries), LC parameters (amplitude, R_21, Φ_21), VSX type, as well as information from Gaia DR2 (color and magnitude) and LAMOST DR7 (spectral type, effective temperature and log g). Fewer than 20% of the variables in our catalogue possess LAMOST information, which limits our ability to accurately classify certain subtypes.

Figure 5 residue (one normalized confusion matrix; axis labels lost in extraction):
0.9956 0.0000 0.0000 0.0022 0.0022 0.0000
0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.9989 0.0011 0.0000 0.0000
0.0080 0.0057 0.0102 0.9420 0.0068 0.0273
0.0033 0.0000 0.0000 0.0044 0.9923 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 1.0000

In Figure 6 we show the distribution of the variable stars in our catalogues across the CMD. Consistent with previous studies (Eyer et al. 2019), δ Scuti stars are located in the lower part of the instability strip. We marked the red and blue edges of the instability strips of δ Scuti stars (Murphy et al. 2019) and ZZ Ceti stars (Caiazzo et al. 2021). Murphy et al. (2019) defined the instability strip as specific boundaries in the CMD within which about 20% of the stars are pulsators. While the fraction of pulsators tends to be lower outside the strip, this tendency does not preclude the existence of pulsators beyond the strip. Therefore, it is not unexpected that some δ Scuti stars are found outside the instability strip. Moreover, the effective temperatures of our δ Scuti sample are inferred from (B_p − R_p)_0 as in Jordi et al. (2010), which may also introduce additional uncertainties.
With the help of the Gaia CMD, we identified 9 ZZ Ceti stars, 5 of which have been reported in TMTS-II; Guo et al. (2023, 2024) presented detailed analyses of TMTS J23450729+5813146 and TMTS J17184064+2524314, respectively. ZZ Ceti stars are the most populous group of pulsating white dwarfs, with periods between 100 and 1200 s. They occupy a distinct and narrow area in the instability strip of the CMD, rendering them more readily detectable (Fontaine et al. 2003). [...] (Handler 2009). Furthermore, it is noteworthy that SX Phe stars exhibit periods ranging from 0.03 to 0.08 d (0.7-1.9 h, Kim et al. 2002).
Figure 7 shows the distribution of the 11,638 variables from TMTS in Galactic coordinates. Several variables in the classes of "DSCT" and "HADS" are found at high Galactic latitudes (i.e., |b| > 30°). Combining this with the criterion of −1.5 < [Fe/H] < −1.0 (McNamara 2011), we identified 8 candidates of SX Phe stars, which are summarized in Table 9. To illustrate their distribution in Galactic coordinates, we provide their Galactic longitude (l) and Galactic latitude (b), as well as their [Fe/H] obtained by LAMOST. As the [Fe/H] parameter is available for less than 20% of the sample presented in this paper, this places constraints on our capacity to identify additional candidates of SX Phe stars. A forthcoming work (Chen L., et al., in prep.) will discuss these SX Phe candidates in detail.
Figure 6 caption fragment: [...] (Caiazzo et al. 2021), respectively. The edges of the instability strip are calculated using the relationship between B_p − R_p and T_eff (see also Jordi et al. 2010).
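The two selection criteria for SX Phe candidates quoted above (|b| > 30° and −1.5 < [Fe/H] < −1.0 within the "DSCT" and "HADS" classes) amount to a simple filter. The sketch below is illustrative: the record layout, the field names and the toy entries are hypothetical and do not correspond to real catalogue rows.

```python
def sx_phe_candidates(stars, min_abs_b=30.0, feh_range=(-1.5, -1.0)):
    """Select SX Phe candidates among delta Scuti stars.

    stars is a list of dicts with keys "id", "cls", "b_deg" (Galactic latitude
    in degrees) and "feh" ([Fe/H], or None when no LAMOST value is available).
    """
    lo, hi = feh_range
    return [s for s in stars
            if s["cls"] in ("DSCT", "HADS")
            and s["feh"] is not None
            and abs(s["b_deg"]) > min_abs_b
            and lo < s["feh"] < hi]


# Toy records (hypothetical values).
toy_stars = [
    {"id": "A", "cls": "DSCT", "b_deg": 45.0, "feh": -1.2},   # passes both cuts
    {"id": "B", "cls": "DSCT", "b_deg": 10.0, "feh": -1.2},   # low latitude
    {"id": "C", "cls": "HADS", "b_deg": -50.0, "feh": -0.3},  # too metal-rich
    {"id": "D", "cls": "EW", "b_deg": 60.0, "feh": -1.3},     # not a delta Scuti star
    {"id": "E", "cls": "HADS", "b_deg": -35.0, "feh": None},  # no LAMOST [Fe/H]
]
toy_candidates = [s["id"] for s in sx_phe_candidates(toy_stars)]
```

Sources without a LAMOST metallicity cannot pass the filter, which is why the limited LAMOST coverage restricts the number of candidates.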

DISCUSSIONS
In this section, we elaborate on the features used in the machine learning algorithms and the underlying physics, and discuss each type of variable star in detail, including their LC features and period-luminosity relations.

Comprehensive discussion of variable stars
Figure 8 presents the relationship between period and amplitude of the variable stars while quantifying their distribution within each bin. This distribution allows for a rough differentiation between DSCT, HADS, eclipsing binaries and RS CVn candidates. DSCT and HADS typically have shorter periods, with HADS inherently possessing greater amplitudes. Conversely, eclipsing binaries and RS CVn candidates tend to have longer periods, while LCs of the former generally have larger light-variation amplitudes. As shown in Figure 4, period is the most important feature in the classification process. The number of δ Scuti stars decreases with pulsation period (Rodríguez & Breger 2001; Qian et al. 2018), and most δ Scuti stars have periods below 0.1 d. In contrast, contact binaries have a short-period cutoff of 0.22 d (some studies have revised this cutoff to 0.15 d; Rucinski 1992; Qian et al. 2020), probably due to the fully convective limit (Rucinski 1992), the timescale of angular momentum loss (Stepien 2011), the low-mass limit of the primary component (Jiang et al. 2012), etc. Although the cutoff period refers to the orbital period (twice the photometric period), most of the binary stars have photometric periods longer than the cutoff period. These trends suggest the presence of a natural period dichotomy between δ Scuti stars and binaries. It is noteworthy that the majority of δ Scuti stars in our catalog have periods below 4 h, indicating the efficiency of TMTS in identifying short-period variables.
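The distinction between photometric and orbital period made above can be illustrated with a few lines of arithmetic. The sketch assumes a light curve with two similar minima per orbit, so that the orbital period is twice the photometric one; the function names are hypothetical.

```python
CONTACT_BINARY_CUTOFF_D = 0.22  # short-period cutoff quoted in the text, in days


def orbital_period_days(photometric_period_days):
    """Orbital period, assuming two similar minima per orbit."""
    return 2.0 * photometric_period_days


def below_cutoff(photometric_period_days, cutoff_days=CONTACT_BINARY_CUTOFF_D):
    """True if the implied orbital period is shorter than the cutoff."""
    return orbital_period_days(photometric_period_days) < cutoff_days


# A photometric period of 0.15 d implies an orbital period of 0.30 d,
# which lies above the 0.22 d cutoff.
print(orbital_period_days(0.15), below_cutoff(0.15))
# A photometric period of 0.10 d implies 0.20 d, which lies below it.
print(orbital_period_days(0.10), below_cutoff(0.10))
```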
In Figure 9, the Fourier parameter R21 helps to separate EAs, EBs and EWs, especially EAs. R21 measures the extent of deviation [...] 10. All p-values, including that of R21, are noticeably below 0.0001, indicating a statistically significant difference in the distributions of skewness, cusum and R21 between EBs and EWs at a significance level of 99.99%.
An eclipse occurs because the two stars periodically shade each other while orbiting their common center of mass. As this process is symmetric, the LCs of eclipsing binaries tend to display a higher degree of symmetry. In contrast, driven by the κ mechanism, the light curves of δ Scuti stars typically show rapid ascending phases followed by slower descents (saw-tooth-shaped), thereby introducing a higher level of asymmetry. As δ Scuti stars contract and expand periodically, their opacity and luminosity change accordingly. Given that the contraction phase (corresponding to the decrease in the LC) lasts longer, the LCs of δ Scuti stars show a saw-tooth shape (Eddington 1917; Zhevakin 1963).
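As a rough illustration of how a harmonic amplitude ratio responds to light-curve shape, the sketch below fits a two-harmonic Fourier series to a phase-folded curve by least squares and returns A2/A1. It assumes the common convention R21 = A2/A1; the exact definition and fitting procedure used for the catalogue are not reproduced in this section, and the two synthetic curves are made up.

```python
import numpy as np


def fourier_amplitude_ratio(phase, mag, n_harmonics=2):
    """Least-squares fit of a truncated Fourier series; returns A2/A1.

    Assumes R21 = A2/A1, where A_k is the amplitude of the k-th harmonic.
    """
    phase = np.asarray(phase, dtype=float)
    mag = np.asarray(mag, dtype=float)
    columns = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(2 * np.pi * k * phase))
        columns.append(np.sin(2 * np.pi * k * phase))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, mag, rcond=None)
    # coeffs = [mean, c1, s1, c2, s2, ...]; A_k = sqrt(c_k^2 + s_k^2)
    amplitudes = [np.hypot(coeffs[2 * k - 1], coeffs[2 * k])
                  for k in range(1, n_harmonics + 1)]
    return amplitudes[1] / amplitudes[0]


phase = np.linspace(0.0, 1.0, 200, endpoint=False)
# Pure sinusoid: no second harmonic, so the ratio should be close to zero.
sinusoid = np.sin(2 * np.pi * phase)
# Saw-tooth-like curve: fast rise over 20% of the cycle, slow decline over 80%.
sawtooth = np.where(phase < 0.2, phase / 0.2, (1.0 - phase) / 0.8)

r21_sin = fourier_amplitude_ratio(phase, sinusoid)
r21_saw = fourier_amplitude_ratio(phase, sawtooth)
print(r21_sin, r21_saw)
```

The ratio vanishes for a pure sinusoid and grows as power moves into the second harmonic, which happens for saw-tooth shapes but also for narrow eclipses, so in practice it is a shape indicator to be combined with other features.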
Given that various types of variable stars tend to occupy distinct regions within the Gaia CMD, it becomes a powerful tool for classification. For instance, δ Scuti stars are typically found in the instability strip on or above the main sequence in the CMD. Although eclipsing binaries can appear anywhere, the observations conducted by TMTS (with an upper period bound of 7.5 h) are primarily geared towards detecting late-type stars (the CMD's lower right quadrant corresponds to stars with smaller radii, so that the two stars can be closer to each other, which leads to shorter periods). This limitation naturally segregates eclipsing binaries from δ Scuti stars, which are typically A- to F-type stars.

𝛿 Scuti stars
δ Scuti stars are A0- to F5-type stars pulsating in radial and nonradial acoustic modes. Their absolute magnitudes are typically below 3.5 mag, with temperatures between 6900 K and 8900 K. δ Scuti stars are located at the intersection of the main sequence and the classical instability strip in the Hertzsprung-Russell diagram (as well as the Gaia CMD). From a stellar evolutionary standpoint, most of them are main-sequence stars in the hydrogen-burning stage, and a few may be zero-age main sequence (ZAMS) stars, red giants or blue stragglers moving off the main sequence. The κ mechanism, in which opacity increases with temperature in the H and He ionisation zones, accounts for most oscillations seen in δ Scuti stars. Their pulsation modes are complex, including low- and intermediate-order pressure modes (p-modes) and gravity modes (g-modes). Some stars are known to have multiple modes, which makes them promising objects for asteroseismological study. Among δ Scuti stars, HADS typically exhibit radial-mode pulsation. With increased photometric precision, nonradial modes have also been detected in some cases. Compared with DSCT, HADS generally have lower rotational velocities (i.e., v sin i ≤ 30 km s⁻¹), while v sin i can reach 200 km s⁻¹ for DSCT (McNamara 1997; Pigulski et al. 2006).
Numerous models have provided theoretical calculations of the instability strip of δ Scuti stars (Dupret et al. 2004; Xiong et al. 2016). However, the theoretical edges do not match the observed pulsator fraction (Murphy et al. 2019). The new identification of over 4000 δ Scuti stars in our catalogue not only helps to better constrain the edges of the instability strip, but also offers valuable insights into the pulsator fraction inside the strip.
P-L relations have been established for various pulsating stars (Leavitt & Pickering 1912). Among them, the P-L relations of Cepheids have been extensively studied. Like other pulsating stars, δ Scuti stars also follow P-L relations (Breger & Bregman 1975; McNamara 1997), which offer an independent avenue for calibrating cosmic distances, complementary to the Cepheids. However, due to their intricate pulsation modes, lower luminosities and smaller amplitudes, the P-L relations of δ Scuti stars have remained comparatively less well defined (Ziaali et al. 2019). Figure 11 shows the P-L relations of DSCT, HADS and EW-type eclipsing binaries in [...]. The increased dispersion observed in the P-L relation of DSCT, as compared to HADS, can be ascribed to the fact that DSCT undergo multi-mode pulsations, encompassing both overtone and fundamental modes, whereas HADS predominantly exhibit fundamental-mode pulsations. As expected, the overtone-mode pulsation corresponds to a brighter P-L relation compared to the fundamental mode (see also Lin et al. 2023c). Additionally, the overtone pulsators comprise a blend of various overtone modes, which may explain the larger dispersion and is consistent with Jayasinghe et al. (2020). At the short-period end of the period-luminosity diagram, the distribution of DSCT appears relatively sparse, forming what seem to be two distinct lines: one aligns with the dash-dotted line (P-L relation of DSCT), while the other corresponds to the solid line (P-L relation of HADS). This may be because, within the DSCT class, stars pulsating in the fundamental mode and in overtone modes follow different P-L relations. Fitting the P-L relations separately for DSCT with distinct pulsation modes would therefore improve the accuracy of the P-L relation. Further discussion and comparison of the P-L relations will be undertaken in our future work (Chen L., et al., in prep.).
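For readers who want to see what fitting such a relation involves, the following is a minimal sketch under stated assumptions: a linear P-L relation of the form M = a log10(P) + b, fitted by ordinary least squares to synthetic data. The slope, intercept and scatter used to generate the data are made up for illustration and are not the coefficients derived from the TMTS sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "true" relation used only to generate synthetic data.
true_slope, true_intercept = -3.0, -1.5
log_period = rng.uniform(-1.5, -0.7, size=300)   # log10(P / day)
abs_mag = (true_slope * log_period + true_intercept
           + rng.normal(0.0, 0.1, size=300))     # add 0.1 mag of scatter

# Ordinary least-squares fit of M = a * log10(P) + b.
slope, intercept = np.polyfit(log_period, abs_mag, deg=1)

# Scatter of the residuals around the fitted line.
residual_scatter = np.std(abs_mag - (slope * log_period + intercept))
print(slope, intercept, residual_scatter)
```

Fitting fundamental-mode and overtone pulsators separately, as suggested above, would amount to calling the same fit on each subsample; a mixed sample would show up here as a larger residual scatter.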
Figure 12 shows some light curves of DSCT (a) and HADS (b). HADS commonly exhibit LCs characterized by a rapid rise and slow decline, while DSCT show a broader range of LC shapes, including more sinusoidal patterns.

Eclipsing Binaries
An eclipsing binary exhibits eclipses because its orbital plane is oriented nearly edge-on with respect to the Earth, causing periodic changes in its brightness as one star passes in front of its companion. Unlike pulsating stars, eclipsing binaries can appear anywhere in the CMD (Eyer et al. 2019). They offer unique opportunities for investigating material exchange and common-envelope evolution owing to the intricate interactions between the two components (Stassun et al. 2014; Pols et al. 1997; Nelson et al. 2023).
Eclipsing binaries can be categorized into three types based on their LC shapes, among which EWs account for the highest proportion, as illustrated in Figure 2 and Table 4. Figure 13 displays typical light curves of EWs (a), EBs (b) and EAs (c). EAs exhibit clearly defined moments of the beginning and end of the eclipse. Typically, there is minimal material exchange between the two spherical or slightly ellipsoidal stars, and they may be detached or semi-detached. In contrast, stars in EB-type systems have active mass transfer between the envelopes, leading to continuous LC variations that make it impossible to determine the exact start and end moments of the eclipse. Additionally, EBs are characterized by different primary and secondary minimum depths, which distinguishes them from EW-type binaries. Finally, EWs are marked by nearly equal depths in their primary and secondary minima. Both components fill their Roche lobes in EW-type systems. In addition, the densely sampled TMTS LCs would greatly contribute to the determination of the epochs of minima of eclipsing binaries in our catalogues.

EW-type eclipsing binaries also follow a P-L relation, especially in infrared bands, owing to the strong geometric constraints imposed by the common envelope (Rucinski 2004; Ren et al. 2021). In simple terms, the orbital period is correlated with the radius of the system, and therefore with its luminosity. This P-L relation makes EWs reliable distance indicators. EWs trace older stellar populations than Cepheids and are more common in open clusters and the solar neighborhood than RR Lyrae stars. This highlights their unique value as distance tracers (Chen et al. 2016). However, owing to the limited sample size, the P-L relations for EWs have been determined with less precision, which impedes their application. Chen et al. (2018) studied 27,318 EWs with accurate distance measurements and reached a distance accuracy of 8%.
Figure 11 shows the P-L relation of EWs in our catalogue; the best fit is [...]. The orbital period is used in this fit. The P-L relation of EWs may be influenced by factors such as color and metallicity (Rucinski 2004), and we only provide the simplest analysis in this paper.

RS CVn
RS CVn variables are an active subclass of binary systems characterized by strong Ca II H & K emission lines arising from at least one of the system's components (Hall 1976). Consequently, the periodic photometric variations in these systems arise not only from pulsations or eclipses, but also from their rotation. The primary star of an RS CVn system is an F- to K-type subgiant or giant, with a G- to M-type dwarf or subgiant companion (Martínez et al. 2022; Fekel et al. 1986). The orbital periods of RS CVn variables vary over time, and studying these changes would shed light on the mass-loss process, magnetic field interactions and the dynamic evolution of binary systems (Hall & Kreiner 1980). A comprehensive understanding of RS CVn variables requires detailed spectral properties. Therefore, in this paper, we only present a catalogue of RS CVn candidates.
The LCs of RS CVn variables exhibit a discernible semiperiodic pattern, which coexists alongside the eclipse features. This phenomenon is attributed to the presence of "starspots". Arising from strong magnetic fields and stifled convection, these starspots are cooler regions on the photosphere in the outer layer of the star (Roettenbacher et al. 2015). Figure 14 shows the LCs of some RS CVn candidates in our catalogue.

SUMMARY
With a cadence of ~1 minute, TMTS has shown great potential in the detection of short-period variables. In this paper, we systematically classified 11,638 variable stars into 6 main categories using the XGBoost and Random Forest algorithms, achieving accuracies of 98.83% and 98.73%, respectively. We constructed a well-chosen training and test dataset of 4506 light curves of the variables. Period is the most important feature in the classification, followed by features that characterize the shape of the LCs as well as the absolute magnitude and the color (G_BP − G_RP)_0, which is in accordance with our expectations. Once the effectiveness of the classifiers was confirmed, we used them to assign labels to the remaining unlabeled light curves. Notably, in both classifiers, the macro-average and weighted-average F1 scores are higher than 98%, with the F1 scores for DSCT, EA, EB and HADS exceeding 99%, demonstrating the effectiveness of our feature selection and model training methodologies.
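To make the quoted metrics concrete, the sketch below computes per-class precision, recall and F1, together with the macro-averaged and weighted-averaged F1 and the overall accuracy, from a confusion matrix. The 3×3 matrix is a toy example with made-up counts, not the confusion matrix of either classifier, and the function name is hypothetical.

```python
import numpy as np


def per_class_scores(confusion):
    """Precision, recall, F1 and support per class.

    Rows are true classes, columns are predicted classes. Assumes every
    class is both present and predicted at least once (no zero division).
    """
    confusion = np.asarray(confusion, dtype=float)
    true_pos = np.diag(confusion)
    predicted = confusion.sum(axis=0)   # column sums: times each class was predicted
    actual = confusion.sum(axis=1)      # row sums: true members of each class
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual


# Toy counts for three classes (illustrative only).
confusion = [[90, 10, 0],
             [5, 45, 0],
             [0, 0, 10]]

precision, recall, f1, support = per_class_scores(confusion)
macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by their size
accuracy = np.trace(np.asarray(confusion)) / np.sum(confusion)
print(f1, macro_f1, weighted_f1, accuracy)
```

For an imbalanced sample such as Dataset-I, the macro average is the more demanding of the two, because a poorly recovered rare class lowers it as much as a poorly recovered common one.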
To ensure the quality of the catalogue, we conducted a manual inspection of all included light curves.

ACKNOWLEDGEMENTS
We acknowledge the support of the staff of the Xinglong Observatory of NAOC during the installation, commissioning, and operation of the TMTS system.
Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.
This research has used the services of www.Astroserver.org. This work has made use of data from the European Space Agency

Figure 1. Sky map (in equatorial coordinates) of TMTS uninterrupted observations during 2020-2022. The color displays the maximum number of uninterrupted exposures within a single night, as in TMTS-II. We used the HEALPIX package (http://healpix.sourceforge.net) with NSIDE=128 to plot this sky map (Gorski et al. 2005).

Figure 2. Pie chart of the distribution of variable stars within Dataset-I, which is heavily imbalanced. EWs account for 66.25%, while EAs comprise a mere 0.93%, a disparity of over 70-fold between these two groups.

Figure 3. Correlation matrix of the features used in classification. The color of each square represents the value of the Pearson correlation coefficient between the corresponding features. A predominantly white color indicates a minimal correlation between a pair of features. Conversely, a deeper red (blue) color indicates a stronger positive (negative) correlation between the features.

Figure 4. Feature importance of the feature set of the XGB classifier. The x-axis shows the F1 score (representing the gain) of each feature, and the y-axis displays the names of the features.

Figure 5. The normalized confusion matrix derived from the XGB classifier (left) and the RF classifier (right) based on the labelled dataset (Dataset-I). The x-axis represents the prediction class obtained from the classifier, and the y-axis displays the true group of a variable star.
SX Phe stars
SX Phoenicis (SX Phe) stars are a subtype of δ Scuti stars. The primary criterion for distinguishing SX Phe stars from classical δ Scuti stars is their metal abundance. Most δ Scuti stars exhibit metal abundances similar to that of the Sun, placing them in the category of young, metal-rich Population I stars typically found in the Galactic disk. In contrast, SX Phe stars belong to Population II. They possess relatively lower metal abundances, generally falling within the range of −1.5 < [Fe/H] < −1.0 (McNamara 2011), and are typically located in the Galactic halo or in globular clusters. Given this distribution, there is an enhanced likelihood of detecting SX Phe stars in regions of high Galactic latitude. Although similarities exist in their oscillation modes, there are also discernible differences between δ Scuti stars and SX Phe stars. Compared with δ Scuti stars, SX Phe stars have on average lower luminosities and higher space velocities. Note that SX Phe stars are usually discovered in globular clusters, where blue stragglers are also prevalent (Cohen & Sarajedini 2012). In contrast, δ Scuti stars are frequently observed on the main sequence, which indicates potential evolutionary distinctions, as SX Phe stars possibly originate from stellar mergers. The periods of SX Phe stars are also statistically shorter than those of δ Scuti stars. Like other pulsating stars, SX Phe stars follow P-L relations. Given their presence in globular clusters, SX Phe stars can assist in determining the distances to globular clusters or dwarf galaxies (McNamara 2011).

Figure 6. Distribution of variable stars in the TMTS Catalog of Periodic Variable Stars across the color-magnitude diagram. Color-coded symbols represent different classes of variable stars, including typical δ Scuti stars (DSCT, royal blue), high-amplitude δ Scuti stars (HADS, red), EW-type eclipsing binaries (EW, orange), EA-type eclipsing binaries (EA, pink), EB-type eclipsing binaries (EB, green), RS CVn candidates (RS, brown) and ZZ Ceti variables (ZZ, azure). The dashed lines and dash-dotted lines indicate the instability strip edges for δ Scuti stars (Murphy et al. 2019) and ZZ Ceti stars (Caiazzo et al. 2021), respectively. The edges of the instability strip are calculated by using the relationship between G_BP − G_RP and T_eff (see also Jordi et al. 2010).

Figure 7. Distribution of variable stars in the TMTS Catalog of Periodic Variable Stars in Galactic coordinates.

Figure 8. Relation between period and amplitude in logarithmic scale. DSCT, HADS and eclipsing binaries are well separated. A period histogram is at the top, and an amplitude histogram is on the right.

Figure 9. Histogram of R21 of EWs, EBs and EAs. A larger R21 is correlated with a more asymmetric light curve. EAs statistically have the largest R21, followed by EBs and EWs.

Figure 10. The box plot, density plot and violin plot depicting the distribution of skewness (upper) and cusum (lower) of the light curves of EBs and EWs. The box plot displays the minimum, first quartile, median, third quartile and maximum of a distribution, while the density plot represents its probability density function. The violin plot combines the information provided by the box plot and the density plot, using a kernel density estimation on each side.
The period-amplitude diagram and the period-R21 diagram support the validity of the classification. We identified 5504 δ Scuti stars (including 4876 typical δ Scuti stars and 628 high-amplitude δ Scuti stars), 5899 eclipsing binaries (including 117 EAs, 84 EBs and 5698 EWs) and 226 candidates of RS Canum Venaticorum systems. Additionally, with the help of the color-magnitude diagram established from the Gaia database, we discovered 9 ZZ Ceti stars. Combining Galactic latitudes with metal abundances, we discovered 8 metal-poor δ Scuti stars (SX Phe candidates).
Kim et al. (2014)
Weighted_mean: Weighted mean of the LCs. The weight of each data point in the LCs is inversely proportional to its measurement error, so that data points with smaller errors receive higher weights.
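A minimal sketch of the Weighted_mean feature as described above, with weights inversely proportional to the measurement error. The function name and the sample magnitudes are hypothetical. Note that inverse-variance weights (1/σ²) are another common choice; the version below follows the wording of the feature description.

```python
import numpy as np


def weighted_mean(mag, mag_err):
    """Mean magnitude with weights proportional to 1 / measurement error."""
    mag = np.asarray(mag, dtype=float)
    weights = 1.0 / np.asarray(mag_err, dtype=float)
    return np.sum(weights * mag) / np.sum(weights)


# Made-up magnitudes and errors; the most precise point gets the largest weight.
mags = [15.0, 15.2, 14.8]
errs = [0.01, 0.02, 0.02]
wm = weighted_mean(mags, errs)
print(wm)
```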

Table 2. The precision, recall and F1 scores of the XGB classifier.

Table 3. The precision, recall and F1 scores of the RF classifier.

Table 4. All and newly discovered variables in TMTS Catalogues of Periodic Variable Stars.

Table 5. Example catalogue for typical δ Scuti stars (DSCT) and high-amplitude δ Scuti stars (HADS) in TMTS Catalogues of Periodic Variable Stars.

Table 6. Example catalogue for EA-, EB- and EW-type binaries in TMTS Catalogues of Periodic Variable Stars.

Table 7. Example catalogue for candidates of RS CVn in TMTS Catalogues of Periodic Variable Stars.

Table 8. Catalogue for ZZ Ceti stars in TMTS Catalogues of Periodic Variable Stars.