Asteroids co-orbital motion classification based on Machine Learning

In this work, we explore how to classify asteroids in co-orbital motion with a given planet using Machine Learning. We consider four different kinds of motion in mean motion resonance with the planet, nominally Tadpole, Horseshoe and Quasi-satellite, building 3 datasets defined as Real (taking the ephemerides of real asteroids from the JPL Horizons system), Ideal and Perturbed (both simulated, obtained by propagating initial conditions considering two different dynamical systems) for training and testing the Machine Learning algorithms in different conditions. The time series of the variable theta (angle related to the resonance) are studied with a data analysis pipeline defined ad hoc for the problem and composed by: data creation and annotation, time series features extraction thanks to the tsfresh package (potentially followed by selection and standardization) and the application of Machine Learning algorithms for Dimensionality Reduction and Classification. Such approach, based on features extracted from the time series, allows to work with a smaller number of data with respect to Deep Learning algorithms, also allowing to define a ranking of the importance of the features. Physical Interpretability of the features is another key point of this approach. In addition, we introduce the SHapley Additive exPlanations for Explainability technique. Different training and test sets are used, in order to understand the power and the limits of our approach. The results show how the algorithms are able to identify and classify correctly the time series, with a high degree of performance.


Introduction
In the last decades, the use of Artificial Intelligence (AI) for data analysis has significantly increased in scientific applications, in particular thanks to its sub-field known as Machine Learning (ML), where an algorithm is said to improve its performance on a specific task by experience (e.g., Hastie et al. 2009b;Jordan & Mitchell 2015).More recently, many authors started to use such methods in astronomy and solar system science (e.g., Ball & Brunner 2010;Ivezić et al. 2014).Although well-known and broadly applied in several contexts, we recall here the general concepts of AI and ML, for the sake of completeness.With AI we mean methods by which a computer makes decisions or discoveries that would usually require human intelligence, while with ML we mean automated processes that learn by examples in order to classify, predict, discover or generate new data.Part of ML is the class of algorithms known as Deep Learning (DL) which is based on artificial neural networks (e.g., LeCun et al. 2015;Goodfellow et al. 2016).ML and DL are the key of the success of AI nowadays.There are three classes of ML algorithms (see, for example, Hastie et al. (2009a) for more ★ E-mail: giuliaciacci8@gmail.com † E-mail: a.barucci@ifac.cnr.it‡ E-mail: sara.diruzza@unipa.it§ E-mail: elisamaria.alessi@cnr.itdetails): supervised learning, where a labeled dataset is used to help to train and tune the algorithm, with the goal to create a map that links inputs to outputs; unsupervised learning, where no labels are provided and the goal is to discover hidden patterns allowing the data to speak for itself; reinforcement learning, where an agent learns by interacting with an environment and modifying its behavior to maximize its reward.It is important to keep in mind that this line between classes can occasionally become hazy and fluid because numerous applications frequently combine them in inventive and unique ways (e.g.self-supervised learning, see Liu et al. (2021)).These approaches are firmly established in astronomy and an important survey of the state of art can be found in Fluke & Jacobs (2020), who analyse the published articles in the last years.They highglight applications in many sub-fields of astronomy where ML could be used for several activities, as classification, regression, clustering, forecasting, generation of data, discovering, development of new scientific insights.Fluke & Jacobs (2020) also classify the different fields of astronomy where ML is used as "emerging", "progressing" and "established", depending on the progress of its use.
The first approach in astronomy to Principal Component Analysis (PCA), an algorithm devoted to Dimensionality Reduction, which is nowadays a standard technique, was introduced in the 1980s for morphological classification of spiral galaxies (e.g., Whitmore 1984), in the 1990s for quasar detection (e.g., Francis et al. 1992) and spectral classification (e.g., Singh et al. 1998), while more recent applications with ML have been done for discovering extrasolar planets (e.g., Pearson et al. 2018;Shallue & Vanderburg 2017), for studying gravitationally lensed systems (e.g., Jacobs et al. 2019;Lanusse et al. 2017;Pourrahmani et al. 2018) and for discovering and classifying transient objects (e.g., Connor & van Leeuwen 2018;Farah et al. 2018).For a complete and detailed bibliography about all the ML applications in the astronomical fields we suggest a careful reading of Fluke & Jacobs (2020).
The analysis of motion of the solar system bodies is considered one "progressing" field of application of ML.Several authors in the last years studied problems related to solar system objects as, for example, applications to TransNeptunian objects (e.g., Chen et al. 2018), or detection and classification of asteroids through taxonomies of spectrophotometry, as studied in Erasmus et al. (2017Erasmus et al. ( , 2018)).
One "emerging" field concerns asteroid dynamics (e.g., Carruba et al. 2022).Indeed, the numerical propagation of asteroids' orbits, based on continuous improved information, implies a large volume of data, that requires fast and novel methods to be analyzed.For example, in Smirnov & Markov (2017), the authors use ML methods to identify three-body mean motion resonance asteroids in the main belt without requiring numerical integration.They use proper elements which are quasi-integral of motion that are stable for a long time (e.g., Knezevic & Milani 1994;Knezevic et al. 2002), and use four different supervised ML methods as reported in Hastie et al. (2009a).The authors compare their results with the ones of the previous paper by Smirnov & Shevchenko (2013) remarking that, with the new approach, the identification of the objects trapped in mean motion resonance is very good and the procedure requires few seconds, while the numerical integration requires days and weeks.Very recently, Smirnov (2023) provides a new open-source package for identifying objects trapped in mean motion resonances (MMR).The main objective they have is to distinguish resonant and non-resonant orbits, but they do not aim at distinguishing different classes of 1:1 MMR, like we will do here.
Other new works comparing results from ML algorithms with previous known asteroid classifications are, for example, Smullen & Volk (2020), where the authors classify objects of the Kuiper belt into four classes based on their dynamics; Carruba et al. (2019), where hierarchical clustering algorithms for supervised learning are applied to identify 6 new families and 13 new clustering of asteroids; Carruba et al. (2020), where ML classification algorithms are used to identify new families of asteroids based on the orbital distribution in the parameters (, , sin()) (where , ,  are, respectively, the semi-major axis, the eccentricity and the inclination of the asteroid orbit) of previous known family objects.Some other very interesting and recent works explore the use of ML to classify regular or chaotic motions.For example, Kamath (2022) studies and classifies orbits in Poincaré maps: the major challenge of this problem is solved by creating high-quality training sets with few mislabeled orbits and converting the coordinates of the points into features that are discriminating, despite the apparent similarities between orbits of different classes.Celletti et al. (2022) use DL methods, such as convolutional neural networks (CNNs), to show how it is possible to classify different types of motion, starting from time series, without any prior knowledge of the dynamics.Indeed, the identification of a motion usually requires a knowledge and the solution of the differential equations governing the dynamical system.Instead using CNNs trained on one dynamical model, the type of motion could be predicted, for example, from observational data.
All these examples show how ML algorithms are increasingly used in astronomy, as well as in dynamical systems and in particular in celestial mechanics.
In this paper, leveraging on the recent work Di Ruzza et al. (2023), we focus on asteroids that are in co-orbital motion (1:1 Mean Motion Resonance) with a planet of the solar system.We apply ML methods to classify the various types of co-orbital motion that can arise in the planar case, through features derived from time series corresponding to the evolution of a specific variable -the angle , that we will define in the following.
The current paper is organized as follows.In Section 2, we recall the averaged problem of circular restricted three-body problem for the co-orbital motion in the planar case and how the approximation can be applied to classify co-orbital objects in the solar system.In Section 3, it is explained how the training and testing data are generated.In Section 4, the whole algorithmic pipeline is detailed, while in Section 5 the results are given together with a critical analysis on the procedure.In Sections 6 and 7 a possible future direction is proposed and the conclusions are drawn.

Co-planar co-orbital asteroids in the solar system
The main idea considered by Di Ruzza et al. (2023) was to show how an integrable approximation of the restricted three-body problem can be applied to describe the dynamics of real natural objects and the goal was to provide a general catalogue of co-orbital objects in the solar system in the co-planar case and a tool to visualize them.
We recall here the general setting and main features that will be important for the present work.More details can be found in Pousse & Alessi (2022) and Di Ruzza et al. (2023).The theoretical model is the Planar Circular Restricted Three-Body Problem (PCR3BP) where a massless body is interacting by gravitational attraction with two massive bodies.The Hamiltonian describing the motion of the massless body can be written as where r, r ∈ R 2 are, respectively, the heliocentric position and velocity vectors of the massless body (the asteroid); ,   are the mass parameters of the massive primary body (the Sun) and of the massive secondary body (the planet), respectively; :=    +   is a dimensionless parameter characterizing the mass ratio of the Sun-planet system; the heliocentric vector r    denotes the position of the planet, for a given value of the mean longitude   , which follows the solution of the two-body problem for the Sun-planet system.Usually, the Hamiltonian (1) is analyzed in the synodic reference frame rotating with the planet.It is well-known that the problem admits 5 equilibrium points, called Lagrangian points and denoted by   for  = 1, . . ., 5. If  is small enough, we could rewrite the Hamiltonian (1) as H r, r,   = H K (r, r) + ( +   )  H P r,   , where H K is the unperturbed Kepler motion of the massless body (around the Sun) and H P is the perturbation depending on the gravitational influence of the planet and, then, we consider the averaged problem with respect to the fast angle   obtaining the new Hamiltonian where H P is the average over the period of revolution of the planet with respect to the fast angle   .
We assume that the particle and the secondary are in a 1 : 1 Mean Motion Resonance (MMR), that is, their orbits have the same value of semi-major axis.Within this approximation, the problem can be studied by means of the action-angle variables (, ), defined as follows: is the resonant angle (being  the mean longitude of the asteroid) and is its conjugated action whose modulus measures the distance to the exact Mean Motion Resonance, with  and   being the semi-major axis of the asteroid and of the planet orbit, respectively; the exact 1 : 1 MMR is obtained for ( , ) = (0, 0).
In this system, the quantity is a first integral of the problem, being  the eccentricity of the asteroid orbit.For different values of Γ ∈ [0 : √   ], the phase portrait in resonant variables (, ) allows to understand the whole co-orbital motion structure.In the planar circular case we can have three types of co-orbital motion, depicted in Fig. 1 in the synodic reference system.The tadpole (TP) motion (on the left) stemming from   with  = 4, 5 is such that  experiences a periodic oscillation around a given   (Γ) satisfying 23.9 • < (−1)    (Γ) < 180 • ; the horseshoe (HS) motion (in the middle), stemming from  3 is such that  oscillates around 180 • with a large amplitude that decreases as long as Γ increases; the quasi-satellite (QS) regime (on the right) is such that  librates around zero for Γ > 0.
In the given phase space, the co-orbital trajectories are solutions located in the neighborhood of  = 0 and such that  oscillates around the given value.The crossing with the section  = 0, that corresponds to  =   , provides a way to understand the global evolution of the dynamics at varying Γ, or equivalently, the eccentricity  of the asteroid's orbit.In this way it is possible to derive a (, )-map, represented in Fig. 2, that allows to classify the different domains of co-orbital motion.We remark that, in first approximation, this map is invariant with respect to the mass parameter , so it has the same features for all the planets.
In the upper panels of Fig. 3, the graphs of the evolution of the time series (, ) of the three real examples of asteroids in the different regimes TP, HS, QS are plotted.In these cases, the evolution appears The (  , )-map of the co-orbital motion defined by the section  = 0.The black and red thick curves stand, respectively, for the singularity of collision and the crossing of the separatrices that originate from  3 (thick red curve).They divide the map in three regions.The QS domain is between the dark curves; the HS region, split in two parts, is between the separatrix (red curve) and the the dark curve; the TP regions are inside the separatrices (respectively, TPL4 for positive values of the angle  and TPL5 for negative values of the angle ).
very regular, while in bottom panels, three less regular cases are reported for comparison.
It is important to underline that the analysis done in the current work, and described in the next Sections, takes specifically into account the time evolution of the resonant angle .Subsequently, we will exploit the time series (, ) in order to recognize the different kinds of co-orbital regime as shown in Fig. 3.
In Di Ruzza et al. (2023), co-orbital asteroids of Venus, Earth and Jupiter have been analyzed to show a practical application of the (, )-map just explained.After a suitable filtering on the asteroid orbital elements in order to fulfill the resonance condition and the quasi-coplanar configuration at a given epoch, the ephemerides of asteroids have been computed by means of JPL HORIZONS API service (NASA 2022) for an interval of time of about 900 years.The real data have been compared with the theoretical model and a very good correspondence has been found.Asteroids in quasi-coplanar co-orbital motion with Venus, Earth and Jupiter have been cataloged according to their co-orbital dynamics and their representation can be seen in Fig. 4. A very refined analysis has been done checking  by hands if the time series (, ) of each asteroid (as represented in Fig. 3) was in agreement with its position in the (, )-map (Fig. 4).The results presented in Di Ruzza et al. (2023) are very promising for TP, HS and QS motion: under given assumptions, data of real observations fit very well with theory.The analyzed series comprised also transitions (TR) between different co-orbital regimes as well as the compound (CP) motion (a particular combination between QS and HS dynamics) ¶.In this case, the map was not able to accurately catch the behaviour, as expected, since TR and CP are proper of the three-dimensional model, not of the planar one.
At this point, an automatic tool capable of distinguishing the different co-orbital regimes becomes essential in order to improve our study.Indeed, in the future we aim to extend the analysis for a ¶ We refer to Namouni (1999); Namouni et al. (1999) for more details about the appearance of these kinds of motion.longer time span (order of thousands of years or more), to consider the spatial problem including asteroids with very high inclination and to understand better and classify TR and CP motions.All these information would be desirable to create a complete catalogue of asteroids in co-orbital motion with all the planets in the solar system.
For these reasons, a ML approach in this problem is highly recommended in order to deal with a huge number of very long time series that can exhibit very rich dynamical behaviors.The aim of the present and coming works is to become able to manage any kind of real data, for short, medium and long timescales also when transitions between different co-orbital motions occur or when new kinds of motion appear, as, for example, the compound motions.In what follows, we will consider only TP, HS and QS orbits since the foundations of the work are the results obtained in Di Ruzza et al. (2023).In particular, we will classify co-orbitals motions belonging to the four classes QS, HS, TPL4 (a tadpole around the equilibrium position  4 ) and TPL5 (a tadpole around the equilibrium position  5 ).

Data
Let us underline that our final goal is to be able to recognize, through the use of ML, co-orbital dynamics of real asteroids for short, medium and long timescales also when transitions between different co-orbital motions occur or when new kinds of motion appear, as, for example, the compound motions.
The data described in this section are the basis to outline the work done by the ML algorithms.As mentioned before, the information used in this work is the time evolution of the angle , computed considering three different sources of data, as summarized in Tab. 1.
In general, training a ML algorithm requires large amounts of data in order to provide accurate predictions.In our case, obtaining numerous time series of real asteroids with regular trends and clearly attributable to a single class (QS, HS, TPL4, TPL5) is not straightforward as real cases may present some complex behaviors, sometimes making labeling difficult and unclear.In particular, a high number of asteroids among those considered can escape from the given resonance or experience a co-orbital transitions.
We start our work by using the time series of asteroids reported in Table 3, 4, 5 of the paper Di Ruzza et al. (2023).Looking at those tables, it is evident that most of the asteroids exhibit motions with different co-orbital dynamics and, as previously stated, these cases must be excluded so that, as shown in Tab. 1, the real cases dataset used in the current work turns out to be composed by only 50 series, that is an absolutely insufficient number for a training set.
To overcome this issue, a dataset containing simulated data of ideal cases is introduced.This kind of data can be produced by using suitable model and initial conditions (as depicted in the following) in order to get the four desired classes.It is possible to obtain as many cases as we need and we produced a total number of 1999 time series of ideal cases.This dataset allows us to train the ML models with a consistent number of cases with well-known labels (i.e., motion clearly attributable to a single class), leaving the real cases dataset for testing purposes.
On the other hand, to have more data to evaluate the performance of our pipeline, we decided to increase the number of cases that can be used.To this aim, we generated time series deviating from the ideal ones by perturbing the model used to generate ideal cases.This process only partially enlarges the number of cases to be used; in fact, by adding perturbations, the time series become more similar to real cases and most of them must be eliminated because escapes from the resonance or transitions between different co-orbital regimes appear.For this reason, the number of perturbed cases can not be as large as the ideal ones.As reported in the last row of Tab. 1, the total number of produced perturbed series is 347.
A detailed description of how the data are obtained is provided below.
1. Real ephemerides are obtained from the JPL HORIZONS system (NASA 2022), following the approach adopted in Di Ruzza et al. (2023).In this case, from the database analyzed in Di Ruzza et al. (2023), we have selected 50 asteroids that exhibit a regular tadpole, horseshoe, quasi-satellite behavior, that is, we excluded the compound motions and transitions.In this case, the simulated data cover an interval of time equal at most to 900 years.We refer to these data as real data.
2. Ideal cases of TP, HS, QS motions are generated by propagating the equations of motion of the Circular Restricted Three-Body Problem (CR3BP) with initial conditions obtained from the (, )-map in the corresponding orbital domain (see Fig. 2).In this case, the initial condition in the synodic reference system is computed starting from the heliocentric orbital elements (, , , , Ω, ) in the inertial system, by assuming the initial semi-major axis  equal to 1, the eccentricity  given by the map, the initial inclination , the longitude of the ascending node Ω and the mean anomaly  equal to 0 and the argument of pericenter  equal to .In this case, the simulated data cover an interval of time equal to 3000 years.We refer to these data as ideal simulated data and we produced a total number of 1999 time series of such cases.
3. Perturbed cases from the ideal cases are computed by propagation of initial conditions obtained from the (, )-map, considering a dynamical model that accounts for Sun, Moon and the planets from Mercury to Mars.The propagation is performed by means of REBOUND Rein & Liu (2012), taking the initial states for the massive bodies from NASA (2022) assuming as initial epoch  0 =   2305537.5.The initial orbital elements for the asteroids are taken as above, except that now the argument of pericenter is set as  =  +   ℎ , where the mean longitude of the Earth   ℎ is given by   ℎ =   ℎ + Ω  ℎ +   ℎ with   ℎ , Ω  ℎ ,   ℎ being, respectively, the argument of pericenter, the longitude of the ascending node and the mean anomaly of the Earth at  0 .Also in this case, the simulated data cover an interval of time equal to 3000 years.We refer to these data as perturbed simulated data and we produced a total number of 347 time series for this dataset.They present variations to the ideal cases that resemble the behavior of real objects, although no further perturbations have been added otherwise the motion more frequently escapes from the resonance.However, we consider this dataset to test algorithms trained on ideal simulated data.
We note that data produced as described in point 2. and 3. above could be also interpreted as a good test of the results obtained in the previous paper Di Ruzza et al. (2023).Indeed, we have chosen initial conditions (, ) in the (, )-map and propagated them in order to obtain the desired kind of co-orbital motion.

Data analysis workflow
As shown in Fig. 5, our data analysis workflow can be conceptually divided in three macro blocks.The first step consists in preparing and labelling the data described in Sec. 3, i.e., the output of the propagation of orbital elements of the asteroids.The data are collected in .outformat files: each file is associated with a single asteroid and it contains 7 columns corresponding, respectively, to time (in Julian date), elapsed time in years (starting from  0 ), semi-major axis , eccentricity , inclination , resonant angle  and associated action .The filenames contain acronyms useful to recognize the name of the asteroid, the kind of co-orbital motion, the planet that the asteroid is in resonance with and the kind of propagation used to get the data (points 1., 2., 3. in Sec. 3).In this way, files can be easily shared if required.It is important to stress that in this work we focus only on the time evolution of the variable angle , but the other information can turn out to be useful for future analysis.
These tabular data are passed to the next block, where the tsfresh python package (e.g., Christ et al. 2018) provides a systematic time series feature extraction thanks to the combination of established algorithms from statistics, time series analysis, signal processing and non-linear dynamics.
Before giving the extracted features to the Machine Learning classification algorithms, two additional steps can be applied: selection and standardization.Selection can be performed thanks to tsfresh, which represents a robust feature selection algorithm (e.g., Li et al. 2017), while standardization can be obtained by any kind of library such as Scikit Learn pre-processing functions (e.g., Pedregosa et al. 2011).
The final classification step (last two blocks in Fig. 5) is performed in two parallel branches, with two classes of ML algorithms involved, namely, Dimensionality Reduction and classification algorithms.
Before moving into a deeper explanation of all the details regarding the steps involved in the data analysis workflow, it is worth noting how our approach based on features extraction and standard Machine Learning algorithms is very well suited for our case where we have two constraints: data numerosity and physical interpretability.Both these constraints encourage an approach based on Machine Learning algorithms where the requirement on the number of data to train the algorithm is less tight with respect to Deep Learning.At the same time, thanks to the features extraction, a time series of any length can be converted into a finite number of features, all of them holding a physical meaning.This physical meaning is deeply important, because not only at the end of the whole data analysis workflow it is possible to identify the most important features responsible for a good time series classification (Feature Importance), but in addition we can look at the discriminating features between the different classes of signals, recovering a physical understanding of such processes.

Features extraction and selection: the tsfresh open-source package
In order to train a ML model, features need to be extracted from the data.In our case a total of 789 features are extracted from each time series representing the time evolution of the angle  () by the Python package tsfresh (e.g., Christ et al. 2018).For a detailed description of the meaning of each feature please refer to Christ et al. (2023).
After feature extraction, usually, it is worth to introduce a step of Feature Selection.This step can be performed in different ways or not performed at all.However, in general, it has been demonstrated (e.g., Guyon & Elisseeff 2003) that Feature Selection can improve ML performances.Therefore, we decided to implement such step in our workflow using a built-in function of tsfresh, which provides a feature selection method based on Mann-Whitney Test.In our case, this step reduces the number of features to 239.

Features standardization
Again, pre-processing data is an essential step to achieve good classification performance, with the importance of data standardization (or normalization) for improving the performance of ML algorithms described in many studies as stated in Singh & Singh (2020).In our study, features are standardized using the Scikit Learn function StandardScaler (e.g., Pedregosa et al. 2011).

Dimensionality Reduction
The process of transforming data from a high-dimensional space into a low-dimensional space with the goal of keeping the low-dimensional representation as close as possible to the inherent dimension of the original data is known as Dimensionality Reduction.There exist many different ML algorithms able to perform such transformation on data.In this work, we focus on two of them, namely, Principal Components Analysis (PCA) (e.g., Cozzolino et al. 2019) and t-distributed Stochastic Neighbor Embedding (t-SNE) (e.g., Van der Maaten & Hinton 2008;Arora et al. 2018;Kobak & Berens 2019).PCA and t-SNE operate in two different ways: PCA is a linear method that seeks to preserve as much variance as possible and the global structure of the data, while t-SNE is a non-linear optimized technique that concentrates on preserving local similarities between data points.Additionally, PCA uses a well-known transformation making it a deterministic technique.On the other hand, t-SNE is a stochastic optimized method, which tend to preserve points which are close to each other.However, the method doesn't construct an explicit function that maps high dimensional points to a low dimensional space, but it just optimizes low dimensional positions of the data points directly.Since it does not define a data transformation function, the method cannot be applied to newer data, but a newer optimization must run.
Both algorithms are Dimensionality Reduction techniques particularly well suited for the visualization of high-dimensional datasets as in this case, where, after the feature selection step, the number of features is still above 200.The utility of such kind of algorithms is twofold: on the one hand they can be used as unsupervised learning methods which allow to visualize the data distribution in two dimension, providing a deep insight on whether and, in case, how the data can be divided in the higher dimensional space.Moreover, they usually can give an idea of how the classifiers will perform.Indeed, well clustered data visualized by Dimensionality Reduction methods are usually well classified by ML algorithms, whereas the contrary is not necessarily true, meaning there could be data with a low degree of clustering where the classification algorithms still perform very well.

ML classification
We use three ML algorithms: Support Vector Machine (SVM) (e.g., Cervantes et al. 2020), Random Forest (RF) (e.g., Biau & Scornet 2016) and XGBoost (XGB) (e.g., Chen & Guestrin 2016).We evaluate the performances of these algorithms with different combinations of training and test sets, as reported here: (i) trained on real data and tested on real data; (ii) trained on ideal simulated data and tested on real data; (iii) trained on ideal simulated data and tested on perturbed simulated data; (iv) trained on ideal simulated data and tested on real and perturbed simulated data.

Cross-Validation
When evaluating the performances of a ML model, it is highly important to validate its stability.This step is called validation and it consists in making sure that the model has learned the right patterns of the data and it is not picking up too much noise.In other words, it evaluates the model's ability to generalize on unseen data.
In Machine Learning, the most used validation technique is Cross-Validation (CV).It consists in splitting the dataset into multiple subsets, usually called "folds", then training the model on some of the folds and evaluating it on the remaining fold.This process is repeated multiple times, each time changing the remaining fold.The result is the mean score of all the performed tests.This allows to train and test the model on different data partitions, providing a robust and unbiased estimate of a model's performance.
There are many types of Cross-Validation; for this work we use a technique named k-folds Cross-Validation (e.g., Fushiki 2011), where the dataset is divided in  folds and  − 1 folds are used as training set and the remaining one as test set.

Hyperparameters Tuning
When dealing with a ML model, one of the main aspects of designing the structure is a step called Hyperparameters Tuning, which consists in finding the best combinations of hyperparameters' models in order to achieve the best performance.Unfortunately, there are no rules or formulas to calculate these parameters, and an approach based on an extensive exploration of the hyperparameters' space along with some experience is the only way to find them, making hyperparameters tuning a computationally long and tedious process.In Python, many techniques have been developed to automate the tuning of hyperparameters and in this work we apply two of them: GridSearchCV and RandomizedSearchCV.Both these techniques make use of k-fold Cross-Validation.

SHAP: features interpretability
Machine Learning models are frequently considered "black boxes", which make their interpretation challenging.In order to understand the main features that affect the output of the model, we can leverage on Explainable Machine Learning techniques that can unravel some of these aspects (e.g., Roscher et al. 2020).One very promising technique is the SHapley Additive exPlanations, more commonly known as SHAP (e.g., Lundberg & Lee 2017;Lundberg et al. 2018Lundberg et al. , 2020;;Van den Broeck et al. 2022;Mitchell et al. 2022).It is based on Shapley values, which use game theory to assign credit for a model's prediction to each feature or feature value, increasing the transparency and the interpretability of Machine Learning models (e.g., Molnar 2022).In particular SHAP is known for its "Consistency" property.SHAP values do not change when the model changes unless the contribution of a feature changes.This means that even when the model architecture or parameters change, SHAP values still offer a coherent interpretation of the behaviour of the model.
In our case, SHAP is applied to the ML models used for time series classification.

Results
The results are presented in the following, according to the considered techniques.

Unsupervised ML: PCA and t-SNE
As stated in Sec.4.3, Dimensionality Reduction techniques can be used to discover whether a high dimensional dataset presents separate clusters when projected in lower dimensional space (e.g., bi-dimensional).Therefore, the first step of our analysis has been to perform PCA and t-SNE on the features extracted from the real time series (real data) to see if they would cluster into four separated groups corresponding to four classes: QS, HS, TPL4, TPL5 (described in Sec. 2).PCA and t-SNE visualizations show four well separated clusters, as can be appreciated in Fig. 6 (a), (b), respectively, where real data are considered.
Next, we performed PCA and t-SNE on the ideal simulated data to determine whether the trend of clustering in the four groups was also present in this dataset.As it can be appreciated in Fig. 6 (c), (d), clusters are still well visible.
Finally, given the positive results of the previous tests, we have applied the Dimensionality Reduction techniques on a dataset containing both the real and ideal simulated data expecting an overlap between the real and simulated clusters for each class.The encouraging results of this analysis are reported in Fig. 6 (e), (f).It is worth to observe that in these plots, PCA and t-SNE show the overlapping between real and simulated data clusters.In particular, the orange points representing the real TPL4 cases overlap the yellow points representing the simulated TPL4 cases; the red points representing the real TPL5 cases overlap the light-red points representing the simulated TPL5 cases; the purple points representing the real HS cases overlap the violet points representing the simulated HS cases; finally, the blue points representing the real QS cases overlap the light-blue points representing the simulated QS cases.This overlapping between clusters of real and simulated data in the reduced space confirms that the features extracted from these two datasets are similar and meaningful.In particular, these results confirm our expectations that both datasets are extracted from the same data distribution, making them suitable for the deeper machine learning analysis shown hereafter.

Supervised ML
While Dimensionality Reduction techniques allow to visualize high-dimensional data and eventual clusters within them, supervised ML algorithms provide an actual classification of the data.In our case, six classification metrics are considered to evaluate the supervised ML algorithms performances: Accuracy, Balanced Accuracy, ROC AUC, Recall, Precision, f1.A full description of the metrics can be found in Scikit-Learn (2023a) It is worth to note how some ML algorithms do not require features normalization, such as Random Forest, while for some others, such as Support Vector Machine, the normalization step strongly improves the classification performances (e.g., Singh & Singh 2020;Ozsahin et al. 2022).This peculiarity can be ascribed to the intrinsic differences in the working principles at the basis of each algorithm.
As was already noted, another crucial step that is typically (but not always) necessary to enhance classification performances is features selection.Our data shows that this is not the case; the outcomes are unaffected by the pre-processing stage.It should be highlighted, nevertheless, that this step generally needs to be preserved in the data analysis workflow.This is not the case for our data, results not being affected by this pre-processing step.However, it should be noted that in general such step must be kept in the data analysis workflow, evaluating its importance case by case.Concerning our work, the results reported in this section are then relative to datasets containing all the extracted features.

Test results
The classification performances of the three used supervised ML algorithms (SVM, RF and XGB, see Sec. 4.4) are reported in Tab. 2 for four different combinations of training and test sets.Although the motivations behind the chosen approach have already been partially described above, we remark the following observations.First of all, the real cases dataset is limited, therefore it is impossible to give a clear answer regarding the generalization capability of our models to unseen data when trained and tested on real data.For this particular reason we introduced the ideal and perturbed simulated datasets, where the ideal one is intended for training purposes leaving the perturbed one to testing ones.
The hypothesis regarding the use of the ideal simulated as training set is confirmed by the fact that the classifiers trained in this way classify correctly the real series with an accuracy that reach 98%.Lastly, classifiers trained on ideal simulated data and tested on perturbed simulated data obtain an accuracy of 100% for all algorithms, while a slightly lesser accuracy is achieved testing on real and perturbed data.
All classification results are reported in Fig. 7, where confusion matrices for each performed test are presented.A Confusion Matrix is a type of visualization particularly well suited for evaluating the performance of a ML algorithm.The rows of the matrix represent the actual labels of the test set while the columns represent the labels predicted by the algorithm.Accordingly, the corrected predictions can be found along the diagonal of the matrix and the wrong ones outside of it.
In Tab. 3 they are reported all the selected hyperparameters for each performed test divided by algorithm.

Cross-Validated results
As introduced in Sec.4.4.1 Cross-Validation is a crucial step to evaluate the model's ability to generalize on unseen data and it provides a more accurate evaluation of the model's performance.
Results obtained with a 5-fold Cross-Validation are reported in Tab. 4, where we test on different combinations of the three datasets described in Sec. 3.
The mean accuracy relative to the real cases dataset is quite high, but as already mentioned in the previous paragraph this may be due to the very limited dimensions of the dataset.In fact, this case is the one with the highest CV error score (4%) appearing on the table.Adding the ideal simulated dataset, not only increases the mean accuracy (up to 99.9% for XGB) but it also decreases the CV error score by an order of magnitude (0.09% for XGB).
The third row of Tab. 4 is relative to the combination of the two simulated datasets, where we reach extremely high accuracy and quite low CV error score for all algorithms.
Finally, the algorithms' performances is cross-validated using all the available data.Although this is the case with the highest number of series and highest variability we still achieve remarkably good results with a mean accuracy that reaches 99.9 % (for RF and XGB) and overall low CV error score.
It is important to note how in the current section we report extremely good results, sometimes reaching up to 100% accuracy, but these high numbers should not mislead the reader.The main purpose of this work is to demonstrate that our approach based on features extraction and Machine Learning algorithms works.For this reason, we have considered about 2400 series with quite regular trends and belonging to only 4 possible classes.Increasing the number of series, the number of classes or the irregularity of the series trends may lead to a worsening of the performances.
In other words, in this work we establish that our approach perfectly works in the most basic settings and, considering the extremely satisfactory results obtained, we plan to extend our goal to a more complete analysis increasing the complexity of the data in future works.

Features Importance
Features Importance is one of the key points when using a Machine Learning algorithm for an application, where the interpretation and/or explanation of the results are as much important as finding good classification/regression results.The term Features Importance relates to methods for scoring each input feature given to the model based on how useful they are when predicting a target variable; the scores indicate what we call "importance" of each feature.A higher score indicates that the particular feature will have a greater impact on the model.There are many ways to assign scores to the features; in our case we have used two different approaches: one based  It is important to keep in mind that each algorithm has a tendency to weight features in a different way, even though some of them may be the same across all algorithms.In our case, it appears that there are no features common to all three algorithms, although we can find some common ones when comparing the algorithms two at a time.These common features are reported in Fig. 8.
Let us recall that, in this work, we have used three different classification algorithms: Random Forest, Support Vector Machines and XGBoost.Our results, reported in Fig. 9 (a), (b), (c), (d) show that, for RF and SVM, most features are quite difficult to interpret, while the features ranking provided by XGBoost (Fig. 9 (e), (f)) propose a more straightforward and interpretable explanation of the model.For XGBoost in particular, the two approaches for Features Importance point out two similar pools of features, where 7 out of 10 are the same.In addition, as shown in Fig. 9 (e) and (f), both approaches rank in the top positions features whose physical meaning is quite easy to deduct from their name, such as theta sum values, theta standard deviation, theta mean and theta variance.Additionally, for XGBoost in Fig. 10, two other SHAP plots are shown: a summary plot where each feature's bar has a division into colors based on importance for each class and a beeswarm plot.A beeswarm plot is a data visualization tool used to display a summary of how the top features impact the model's output.Each point in the scatterplot represents a data point from the dataset, the vertical line represents the baseline value, which may be the model's average prediction or the expected value of the output.The position of the point in relation to the vertical line reveals whether a feature makes a positive (increasing the prediction) or negative (decreasing the prediction) contribution to the prediction and this position is determined by the Shapley value of the data point.What is important to understand is that the farther a point is from the vertical line, the higher its impact  will be on the output of the model, regardless of whether it is on the left or on the right side of the plot.For a more detailed explanation of the plot please refer to SHAP (2023).
6 Future perspective: time series with transition between trends, an approach based on sliding windows We are aware that the general case of time series observed could comprise different kinds of motion (such as the ones described and used in this work) due to transitions.In order to move towards this more complex real scenario, we have begun to work to identify regions in the time series where the kind of motion is of the same type.This capability would allow our data analysis pipeline to deal with any kind of scenario.As first approach, we have decided to leverage on standard packages for time series data analysis in the case of segmentation of non-stationary signals (e.g., Truong et al. 2020) and anomaly detection (e.g., Gensler & Sick 2018).We have performed some preliminary tests and some results are reported in this section and in the figure below.Our aim here is to give a possible direction for the next works.
The results show that it is possible to arrange a semi-automatic division of the time series in the different trends, looking for example at the average over a fixed window length (in this case made of 8500 points) sliding over the | ()| signal.The signal's mean of a window is compared to the mean of the following window; if the difference between those two values exceeds a certain threshold (empirically determined), a transition is detected.
However, despite the results can be useful and sometimes impressive (see Fig. 11), we have to investigate further how to generalize the definition of the time windows.This will be left to a future work.

Conclusions
This work deals with the problem of classification of asteroids in co-orbital motion with a given planet using a Machine Learning approach.The main parameter analysed to determine the type of co-orbital motion is a suitable angle , that is defined following the assumption of the Planar Circular Restricted Three-Body Problem (PCR3BP) and its averaged approximation.The time evolution of  allows to identify if the asteroid is in Tadpole motion, distinguishing between TPL4 (around the equilibrium point  4 ) and TPL5 (around the equilibrium point  5 ), Horseshoe (HS) motion or Quasi-satellite (QS) motion.We produce three different kinds of datasets called real, ideal simulated and perturbed simulated in order to apply Machine Learning algorithms.The datasets are formed by time series of the angle , that consist in its evolution in time for short and medium timescale (about 900 years for real asteroid cases and 3000 years for simulated cases).
The Python package tsfresh is applied to such time series, extracting meaningfully features, which are selected and, if needed, standardized.Then, a Machine Learning pipeline based on  algorithms for Dimensionality Reduction and Classification, is built, with the features extracted as input.The results show the power of such approach, with very well evident clusters in Dimensionality Reduction visualization plot and classification accuracy above 99%.This paper aims to define a methodological approach to such kind of data, serving as a backbone model for further studies, where more and more complex cases are faced.

Figure 1 .
Figure 1.In red, a sketch of the tadpole motion (left), horseshoe motion (center), quasi-satellite motion (right), in the synodic reference system.The yellow circle represents the Sun and the green one the planet.

Figure 2 .
Figure2.The (  , )-map of the co-orbital motion defined by the section  = 0.The black and red thick curves stand, respectively, for the singularity of collision and the crossing of the separatrices that originate from  3 (thick red curve).They divide the map in three regions.The QS domain is between the dark curves; the HS region, split in two parts, is between the separatrix (red curve) and the the dark curve; the TP regions are inside the separatrices (respectively, TPL4 for positive values of the angle  and TPL5 for negative values of the angle ).

Figure 3 .
Figure 3. Upper: evolution of the angle  versus time of three real asteroids in a regular co-orbital motion; from left to right, respectively, TP with Jupiter, HS with Earth, QS with Jupiter.Bottom: evolution of the angle  versus time of three real asteroids in co-orbital motion with non-regular oscillations; from left to right, respectively, TP with Earth, HS with Jupiter, QS with Venus.

Figure 4 .
Figure 4.The (  , )-maps for the three planets; from left to right, respectively, Venus, Earth and Jupiter.The points in magenta represent the distribution of co-orbital asteroids in the (  , )-map at a reference date, while the two horizontal lines stand for the eccentricities of an object in co-orbital motion with the considered planet  when it crosses the orbit of the inner and the outer planet (respectively in green and purple) with respect to .The figures are already used in DiRuzza et al. (2023).

Figure 5 .
Figure 5. Data Analysis Workflow.The first step is the time series preparation, followed by the tsfresh python package block where features are extracted and possibly selected and standardized.The final step regards the Machine Learning analysis performed using Dimensionality reduction algorithms (PCA and t-SNE) and classification algorithms (SVM, Random Forest and XGBoost).

Figure 6 .
Figure 6.PCA and t-SNE of selected and standardized features extracted from: real data (a) and (b); ideal simulated data (c) and (d); overlapping between ideal simulated and real data clusters (e) and (f).In this last case it is worth to note as the orange points representing the real TPL4 cases overlap the yellow points representing the simulated TPL4 cases; the red points representing the real TPL5 cases overlap the light-red points representing the simulated TPL5 cases; the purple points representing the real HS cases overlap the violet points representing the simulated HS cases; the blue points representing the real QS cases overlap the light-blue points representing the simulated QS cases.

Figure 7 .
Figure 7. Confusion matrix for SMV (a), RF (b) and XGB (c) algorithms when trained and tested on real data.Confusion matrix for SMV (d), RF (e) and XGB (f) algorithms when trained on ideal simulated data and tested on real data.Confusion matrix for SMV (g), RF (h) and XGB (i) algorithms when trained on ideal simulated data and tested on perturbed simulated data.Confusion matrix for SMV (j), RF (k) and XGB (l) algorithms when trained on ideal simulated data and tested on real and perturbed simulated data.

Figure 8 .
Figure 8. Common important features of the three supervised ML algorithms ranked by SHAP and Feature Importance tools.

( a )Figure 9 .
Figure 9. Feature Importances for the three different Machine Learning Algorithms, evaluated with Scikit Learn packages and SHAP.

Figure 11 .
Figure 11.Three cases of real time series data: evolution of the resonant angle  versus time; in the three cases several transitions between QS and HS regimes occur.

Table 1 .
Summary of the data available.

Table 2 .
Machine Learning multi-class classifiers results obtained with different combinations of training and test sets divided by algorithm.Because this is a multi-class classification, AUC, Recall, Precision and f1 are averaged.In the Average AUC the acronym "ovo" stands for One-vs-one and it computes the average AUC of all possible pairwise combinations of classes.

Table 3 .
Machine Learning selected hyperparameters.A full description of their meaning can be found, for instance, in Scikit-Learn (2023b,c); xgboost (2023a).Training set -Test set Algorithm Hyperparameters Real-Real Ideal-Real Ideal-Pert.Ideal-Real+Pert.

Table 4 .
Machine Learning multi-class classifiers results obtained in 5-fold Cross Validation.Training sets and test sets contain, respectively, 80% and 20% of the dataset.Standard deviation reported in parentheses.In the Average AUC the acronym "ovo" stands for One-vs-one and it computes the average AUC of all possible pairwise combinations of classes