Assembling a high-precision abundance catalogue of solar twins in GALAH for phylogenetic studies

Stellar chemical abundances have proved themselves a key source of information for understanding the evolution of the Milky Way, and the scale of major stellar surveys such as GALAH has massively increased the amount of chemical data available. However, progress is hampered by the level of precision in chemical abundance data as well as the visualization methods for comparing the multidimensional outputs of chemical evolution models to stellar abundance data. Machine learning methods have greatly improved the former, while the application of tree-building or phylogenetic methods borrowed from biology is beginning to show promise with the latter. Here we analyse a sample of GALAH solar twins to address these issues. We apply The Cannon algorithm to generate a catalogue of about 40,000 solar twins with 14 high precision abundances, which we use to perform a phylogenetic analysis on a selection of stars that have two different ranges of eccentricities. From our analyses we are able to find a group with mostly stars on circular orbits and some old stars with eccentric orbits whose age-[Y/Mg] relation agrees remarkably well with the chemical clocks published by previous high precision abundance studies. Our results show the power of combining survey data with machine learning and phylogenetics to reconstruct the history of the Milky Way.


INTRODUCTION
Chemical enrichment plays an important role in the formation and evolution of our Galaxy. Significant advances in our understanding of galaxy evolution have come from interpreting data from major stellar spectroscopic surveys, such as the Galactic Archaeology with HERMES (GALAH) Survey (De Silva et al. 2015; Buder et al. 2021), the Gaia-ESO Survey (Gilmore et al. 2022; Randich et al. 2022) and the Sloan Digital Sky Survey (Abdurro'uf et al. 2022). These surveys have observed hundreds of thousands of stars for which chemical abundances of up to 30 elements have been measured (see extensive discussion in Jofré et al. 2019). However, while this increase of data is essential, it is also important to develop tools for analysing larger datasets. In this paper we are concerned with the presentation and assessment of such methods in two ways: first, the extraction of chemical information from stars via stellar spectroscopy, and secondly, the analysis of chemical abundances to retrieve an evolutionary history of Galactic chemical enrichment.
★ E-mail: kurt.walsen.b@gmail.com
Galactic history shares some features with biological evolution. Indeed, Jofré et al. (2017) and Jackson et al. (2021) showed that phylogenetic techniques can be employed to reconstruct the evolution of star formation within the solar neighbourhood using a small but very precise sample of stellar abundances of solar twins. Interpreting their results was difficult because of selection effects in their stellar samples, which were a rather small sample of spectra obtained from public archives (Nissen 2016; Bedell et al. 2018). While major spectroscopic surveys retrieve data on millions of stars observed with a well-defined selection function, there is a trade-off between the scale of data and its resolution. This study is thus motivated by carefully creating a sample of high-precision chemical abundances from survey data. We then use a phylogenetic tree algorithm to assess the formation sequences of the Milky Way. First, though, we discuss the potential and dangers of using machine learning (ML) tools to extract higher precision chemical abundances from stellar spectra for this purpose, and then we introduce phylogenetic methods.

Chemical abundances from stellar spectra
In this era of large spectroscopic surveys, ML has become a revolutionary way to derive spectral properties both precisely and quickly (see e.g. Ambrosch et al. 2023; Wheeler et al. 2020; Leung & Bovy 2019, to name a few). The revolution became notable perhaps with the introduction of The Cannon (Ness et al. 2015) into the field. The method finds a polynomial function that directly relates the spectra to labels. To do so, the function is fitted to stellar spectra for which these labels are known. The Cannon is very fast, and provides more precise results than standard methods when the spectra are noisy or not of very high resolution. Since it is easy to implement, it has quickly been applied to a large variety of stellar spectra (Casey et al. 2017; Buder et al. 2018; Wheeler et al. 2020; Nandakumar et al. 2022). Many other spectral analyses using ML have followed (Ting et al. 2019; Leung & Bovy 2019; Guiglion et al. 2020). Nowadays there is a large variety of chemical data products derived from neural networks or mathematical functions trained on synthetic or observed spectral grids. Since ML allows labels to be transferred from one survey to another (Wheeler et al. 2020; Nandakumar et al. 2022), it is possible to put several surveys on the same scale provided there are stars in common between surveys (Ness 2018). Indeed, ML is so powerful that today it might seem absurd to aim at performing a spectroscopic analysis on millions of spectra with the "standard" methods (see discussion of such methods in Jofré et al. 2019).
Although methods like The Cannon provide many advantages over the standard methods, they rely on training sets that are calibrated with the standard analyses. Choosing the training set is not straightforward: it should sample the parameter space of the test set fully and evenly, and should have accurate and precise values for the labels to be determined. A grid of synthetic spectra is powerful because it ensures even sampling (Ting et al. 2019), but synthetic spectra and real spectra differ from each other, leading to a level of uncertainty in the results (see discussion in O'Briain et al. 2021). ML methods are suitable when the methods used on the observed test stars are the same as for the training set of stars, especially if there is information additional to the spectra (interferometry, accurate parallaxes, asteroseismology, etc.). This extra information provides higher accuracy for the labels (Miglio et al. 2013; Jofré et al. 2014; Heiter et al. 2015). The problem here, though, is the sampling. ML works better with large datasets, and so using the largest possible number of stars is preferable. However, the full parameter space is not often available in these very large datasets, and so there is a trade-off between sample size and data completeness. It is thus worth investigating systematically how to train an ML algorithm with the more limited data available today.

Phylogenetic trees as a promising tool to trace Galactic chemical evolution
Phylogenetic trees are graphs that illustrate the shared evolutionary history among a dataset, allowing us to understand the hierarchical pattern of ancestry and descent which connects all of the observations (Baum et al. 2005). Phylogenetic methods can reconstruct ancestral relationships as long as there is a shared history and a heritable process linking the data objects. These objects are normally individuals, species and higher taxa in biology, where methods to analyse them have been developed (Felsenstein 1988), but they are applicable more broadly. By making the hypothesis that the stars in the Milky Way disk come from the same but evolving interstellar medium (ISM), and that the evolutionary marker (i.e. the heritable component) of the ISM is the chemical composition, we can use the chemical abundances of low-mass stars as fossil records for building phylogenetic trees (see also Freeman & Bland-Hawthorn 2002).
The hypothesis that stars in the Milky Way form from the same evolving ISM is a simplification of reality. Indeed, the Milky Way has accreted other dwarf galaxies, depositing in the ISM some gas that has been enriched by a different chemical evolutionary history. An example is the interaction of the Milky Way with Sagittarius, which has affected the star formation history of the disk significantly (Ruiz-Lara et al. 2020). This is not a limitation for phylogenetic trees, however: "hybridization" processes are common in biology, and ways of characterizing their impact are active topics of research. Another simplification of this hypothesis is the fact that chemical abundances in low-mass stars are not as constant over their lifetime as we would like them to be. Heavy elements sink in the atmospheres due to gravitational settling (Lind et al. 2008; Souto et al. 2018), causing an effect in the measured abundances that depends on the age and the mass. This issue has an impact on chemical tagging studies overall.
Phylogenetic trees have already been constructed in Jofré et al. (2017) and Jackson et al. (2021). These papers focused on solar twins for the practical reason that estimates of chemical abundances in solar twins are very precise, particularly if they are derived differentially with respect to the Sun (e.g. Nissen & Gustafsson 2018). Jofré et al. (2017) used high precision data published by Nissen (2015, 2016) and Jackson et al. (2021) the data published by Bedell et al. (2018). The trees were built using a nearest neighbourhood distance method, which essentially considers the pairwise distance in chemical abundances between stars to find the hierarchical differences, displaying them in a tree. Jofré et al. (2017) found a tree with different branches where the relationship between branch length and age was different, suggesting that with trees it might be possible to identify stellar families (i.e. groups of stars that cluster together and so may have a shared history) and, more importantly, to study their different evolutionary processes such as chemical enrichment rate. Jackson et al. (2021) followed that study by enlarging the number of stars and by choosing elemental abundance ratios which evolve with time, i.e. the so-called chemical clocks (e.g. Nissen 2016; Casali et al. 2020; Jofré et al. 2020). They also found different branches that had different ages and dynamical distributions, and attributed them to different stellar groups co-existing in the solar neighbourhood. How far these stellar groups were representative of the broader stellar population remains uncertain due to the selection function.

Aim and structure of this study
In this work we make the first study of a phylogenetic tree from survey data, for which we chose a set of solar twins from GALAH DR3 (Buder et al. 2021). We compare the tree built from the published GALAH abundances with the one built from a set of abundances obtained using ML, for which the precision is higher. To do so, we first apply the spectral fitting machinery of The Cannon to GALAH data (Buder et al. 2021) to improve the precision of the chemical abundance measurements. We systematically assess the steps in training The Cannon on solar twins.
GALAH has observed a very large sample of solar twins (to date about 40,000). By carefully applying The Cannon we provide high precision abundances for a much larger sample of solar twins than those previously published from high resolution data. This increases the sample size from around 500 stars (Casali et al. 2020) by two orders of magnitude. We then select a subsample of this catalogue to test whether the phylogenetic signal improves from GALAH to The Cannon abundances for the same stars.
The paper is structured as follows: in the next section we describe the data used, followed by a description of how The Cannon training is set up. Our new catalogue is presented in Sec. 3 and we apply phylogenetic techniques to analyse this catalogue in Sec. 4. Some general conclusions are presented in Sec. 5.

DATA AND METHODS
In this work we use stellar spectra, parameters, and abundances published as part of the third data release of GALAH (Buder et al. 2021, hereafter GALAH DR3). Stellar spectra were observed with the HERMES spectrograph (Sheinis et al. 2015) at the Anglo-Australian Telescope with a resolution of ∼28 000 across four wavelength regions in the optical (see Tab. 1) that include absorption features of more than 30 elements. For this study, we use the radial velocity corrected and normalised spectra downloaded from the datacentral web interface and interpolate them onto a common wavelength scale given in Tab. 1. Flux uncertainties are derived by multiplying the provided relative error spectra by the flux.
We use the stellar parameters (T_eff, log g, and [Fe/H]) as well as logarithmic elemental abundances relative to the Sun and iron, that is, [X/Fe]. These were extracted from the GALAH spectra via χ² optimisation of on-the-fly computed synthetic spectra with Spectroscopy Made Easy (sme; Valenti & Piskunov 1996; Piskunov & Valenti 2017). Synthetic spectra were computed based on 1D marcs model atmospheres (Gustafsson et al. 2008), with non-local thermodynamic equilibrium for eleven elements (Amarsi et al. 2020) and local thermodynamic equilibrium for all other elements. The optimisation included the constraint of surface gravities log g from bolometric luminosity estimates based on photometric information from the 2MASS survey (Skrutskie et al. 2006) and distances inferred from the Gaia satellite's second data release (Gaia Collaboration et al. 2018; Bailer-Jones et al. 2018).
The GALAH DR3 catalogue further provides age estimates, which are needed for evolutionary studies. These ages and their uncertainties are estimated via isochrone interpolation with stellar parameters through the Bayesian fitting machinery BSTEP (Sharma et al. 2018). Value-added catalogues in GALAH also include information about orbital properties, such as total velocities, eccentricities and actions. For more details we refer the reader to Buder et al. (2021).
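The resampling and error-scaling step described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the catalogue: the function name `resample_spectrum` is hypothetical, and linear interpolation is an assumption, since the text does not state which interpolation scheme was applied.

```python
import numpy as np

def resample_spectrum(wave, flux, rel_err, common_wave):
    """Put one normalised spectrum onto a common wavelength grid.

    wave, flux, rel_err : 1D arrays for the observed spectrum
                          (rel_err is the relative error spectrum).
    common_wave         : 1D array, the shared wavelength grid.
    Linear interpolation is an assumption made for this sketch.
    """
    flux_new = np.interp(common_wave, wave, flux)
    rel_err_new = np.interp(common_wave, wave, rel_err)
    # Absolute flux uncertainty = relative error times flux.
    sigma_new = rel_err_new * flux_new
    return flux_new, sigma_new
```

Applying the same function to every spectrum with the same `common_wave` gives flux and uncertainty arrays that can be stacked pixel by pixel.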

Abundance uncertainties
The total error budget in GALAH considers a combination of precision and accuracy uncertainties. Precision uncertainties are calculated from the internal covariance uncertainties of sme, that is, the uncertainties are computed from the diagonal of the covariance matrix given by sme fitting procedures, which were adjusted to be consistent with the scatter of repeat observations as a function of the signal-to-noise ratio (SNR). For this work, we consider the precision to be the maximum of the two uncertainties described above.
Accuracy uncertainties for the stellar parameters are derived from comparisons with reference stars, such as the Gaia FGK benchmark stars (Jofré et al. 2014; Heiter et al. 2015; Jofré et al. 2018), whereas for the abundances we only have precision uncertainties reported.
We thus consider for this work the GALAH DR3 precision uncertainties, which only report the final abundance uncertainty based on the maximum of the internal covariance error for the abundance fit and the SNR response to repeat observations (Buder et al. 2021).

Solar twin selection
We are interested in the solar twins of GALAH DR3. We select them from the file called GALAH_DR3_main_allspec_v2.fits. This file contains measurements of astrophysical parameters and chemical abundances of 678 423 spectra from 588 571 stars derived as explained above. We consider only spectra that have a signal-to-noise ratio (SNR) above 10 in all 4 CCDs. In order to select the solar twins from that sample, we used the reported astrophysical parameters and performed cuts in effective temperature, surface gravity and metallicity around the solar values, where we assume that the solar temperature is 5 777 K, the solar surface gravity is 4.44 and the logarithmic number density of iron to hydrogen compared to the Sun, [Fe/H], is 0.0 (Prša et al. 2016).
All these cuts gave us 44 317 entries, which correspond to 39 554 spectra.
Among these spectra, some stars have repeated observations. We use that sample to assess the uncertainties in our results. Information about repeat observations can be found in Appendix A.
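The selection logic of this subsection can be sketched as a boolean mask. The solar reference values and the SNR threshold of 10 in all four CCDs come from the text; the half-widths of the cuts (`d_teff`, `d_logg`, `d_feh`) are not recoverable from this section, so they are left as required arguments, and the function name is hypothetical.

```python
import numpy as np

# Solar reference values quoted in the text (Prsa et al. 2016).
TEFF_SUN, LOGG_SUN, FEH_SUN = 5777.0, 4.44, 0.0

def select_solar_twins(teff, logg, feh, snr_ccds,
                       d_teff, d_logg, d_feh, snr_min=10.0):
    """Return a boolean mask of spectra passing the solar twin cuts.

    snr_ccds has shape (n_spectra, 4), one SNR per CCD.
    d_teff, d_logg, d_feh are the half-widths of the cuts; their
    values are NOT given here and must be supplied by the user.
    """
    teff, logg, feh = (np.asarray(x, float) for x in (teff, logg, feh))
    snr_ccds = np.asarray(snr_ccds, float)
    # SNR must exceed the threshold in every CCD.
    good_snr = np.all(snr_ccds > snr_min, axis=1)
    near_sun = ((np.abs(teff - TEFF_SUN) <= d_teff)
                & (np.abs(logg - LOGG_SUN) <= d_logg)
                & (np.abs(feh - FEH_SUN) <= d_feh))
    return good_snr & near_sun
```

The same mask construction, with a higher `snr_min`, would also express an SNR-based choice of high-quality spectra.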

Determining abundances with The Cannon
We use The Cannon to determine the new abundances. This is a data-driven approach that allows us to derive stellar labels (in our case, stellar parameters and chemical abundances) from stellar spectra. In short, the code connects the flux of a spectrum at different wavelengths with a set of labels, by constructing a polynomial model with linear, quadratic, and cross-term coefficients for the labels. It was introduced by Ness et al. (2015) and has since then been widely applied to stellar spectra (Buder et al. 2018; Casey et al. 2017; Nandakumar et al. 2022). One of its most practical strengths is that it does not use a physical but an empirical model of the spectra. This allows the method to obtain labels at a precision comparable to that of a standard derivation of abundances from e.g. synthetic spectral fitting, and to do so quickly and at low computational cost.
The Cannon requires the existence of a subset of reference objects (in other words, a training set) which has well-determined stellar labels and must cover the parameter space sufficiently well and evenly. In a dataset like GALAH, it was found that there are too few metal-poor stars with well-determined labels to characterise well enough the metal-poor population observed by this survey (Buder et al. 2018). In our case, by focusing on the solar twins only, we have higher confidence to build a valuable training set for The Cannon.
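To make the polynomial model concrete, the following sketch fits, for each pixel, a quadratic function of the labels (constant, linear, quadratic and cross terms) by weighted least squares. It is a simplified illustration and not the implementation of The Cannon: the per-pixel intrinsic scatter term and the label-inference (test) step are omitted, and all function names are hypothetical.

```python
import numpy as np

def design_vector(labels):
    """Rows of [1, linear terms, quadratic and cross terms]."""
    labels = np.atleast_2d(np.asarray(labels, float))
    n, k = labels.shape
    iu = np.triu_indices(k)  # index pairs (i <= j) of label products
    quad = (labels[:, :, None] * labels[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((n, 1)), labels, quad])

def train(labels, flux, sigma):
    """Fit the coefficients of every pixel independently.

    labels: (n_stars, n_labels); flux, sigma: (n_stars, n_pixels).
    Returns theta with shape (n_pixels, n_coefficients).
    """
    A = design_vector(labels)
    n_pix = flux.shape[1]
    theta = np.empty((n_pix, A.shape[1]))
    for p in range(n_pix):
        w = 1.0 / sigma[:, p]  # inverse-uncertainty weights
        theta[p], *_ = np.linalg.lstsq(A * w[:, None], flux[:, p] * w,
                                       rcond=None)
    return theta

def predict_flux(theta, labels):
    """Model spectrum (n_stars, n_pixels) for the given labels."""
    return design_vector(labels) @ theta.T
```

In practice the labels would usually be centred and scaled before building the design matrix, so that the least-squares problem stays well conditioned.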

Training set
We are interested in training our model with a set of high quality spectra with accurate label measurements. That model has to be used to generate new labels and uncertainties for the 39 554 solar twin stars. To select the best quality data we base our criterion on the SNR of the spectra, because GALAH DR3 provides its most accurate and precise parameters and abundances for high SNR spectra (Buder et al. 2021). We thus consider a set of spectra with SNR above 50 across all CCDs because these are the highest quality spectra obtained in GALAH DR3 and thus provide the most precise results. Since we want to make new predictions on the labels of the whole solar twin dataset, we consider the full dataset of 39 554 spectra as a test set. There are 5 144 high SNR spectra (hereafter called high SNR sample) for training and 39 554 for test/prediction. These are different spectra of different stars. Among these 5 144 high SNR spectra there are still some stars which have problems of normalisation or data reduction. We explored by eye all these stars and removed the bad spectra, reducing our training set to 5 040 stars.
Fig. 1 shows the median uncertainties in all labels for different selections of our sample. In gray we show the median internal uncertainties of the entire solar twin catalogue. In blue we plot the median uncertainties for all stars in the high SNR sample, and in orange we show the median uncertainties for a selection of a training set of 150 stars (Sect. B4). We can see how our training set has labels that are more precise than the rest of the catalogue for all labels except for surface gravity, which was predominantly estimated from non-spectroscopic features. Therefore, its precision is not dominated by SNR.
In Fig. 2 we show the individual abundances as a function of the metallicity, defined as [Fe/H], for the 39 554 solar twins as well as the 5 040 high SNR solar twins subset and the 150 high SNR solar twins selected as training set (see Sect. B4), following the same colour scheme as Fig. 1. We see the high SNR solar twin subset does not fully cover the parameter space of the entire catalogue. The low SNR spectra might induce a spread in the abundances that is driven by the uncertainties. However, we observe that the selected training set of 150 stars has a good coverage of the metallicity and abundances when compared with the high SNR sample.
Buder et al. (2018) discussed that The Cannon had a better performance while using masks in the spectra for each label than when using the entire spectrum without filtering specific wavelength regions. This means our The Cannon models are not fitted to every pixel for every label but only to pixels that were selected to contain information about the label. That information is known from synthetic spectra (Buder et al. 2018). Later on, these masks have been considered by sme to perform the fitting of observations with synthetic spectra to provide the abundances of all GALAH stars in GALAH DR3. We used the same masks to make models for the different label predictions, although we comment that the masks are constantly being revisited in GALAH, therefore they might not be identical to those used by us here.

THE CANNON CATALOGUE
In this section we present our results about the performance of our new catalogue of high precision abundances of solar twins in GALAH. Our catalogue of abundances can be downloaded from Vizier (link provided upon acceptance of paper).

Comparison with GALAH DR3
We use The Cannon Model 3, namely the model built with 150 solar twins of minimum SNR of 117 across all CCDs (see Appendix B). In order to validate our results we look at the agreement of the predicted values over the 39 554 solar twin catalogue at high and low SNR, as well as the uncertainties reported by our The Cannon model, and compare them to the current GALAH DR3 reported uncertainties.
In Fig. 3 we show the comparisons of our labels and the GALAH DR3 results for stars in the high SNR sample. For the stellar parameters T_eff and [Fe/H] we have a good agreement within 50 K and 0.03 dex, respectively, with some exceptions of overestimates in the 5σ boundary for T_eff. However, for log g we are not able to consistently recover the labels, obtaining a scatter in the one-to-one relation of 0.1 dex, which is very large considering the small range of surface gravities in our sample. We recall that GALAH DR3 does not derive surface gravities from the spectra because these spectra are not sensitive to surface gravity. It is thus not surprising that the agreement is poor. For the chemical abundances we have a good agreement overall. However, we can notice some outliers for Al and Cr in the sense that we overestimate the abundances for a few metal-poor stars and underestimate the abundances for some other more metal-rich stars.
In Fig. 4 we display the same results but for stars with low SNR (below 50). As expected, the number of outliers increases, with stars predicted to have effective temperatures outside the solar twin range. It is expected that the comparison will be worse in this case, as we know the measurements obtained by the pipeline of GALAH DR3 are more uncertain for low SNR spectra. We still obtain a good agreement for Na, Mg, Cu, Zn, Y and Ba, although with a higher dispersion than the high SNR stars. However, for Ca and Cr we observe two trends in our prediction. The model makes near-flat predictions for metal-poor and metal-rich stars, resulting in two groups in the comparisons, with the model underestimating the abundances for these groups. For Al we observe a higher slope in the one-to-one relation, i.e. we overestimate this abundance. For Ti we obtain a flat prediction, namely all the stars in the low SNR regime have solar-like Ti abundance. In our final catalogue we remove all the outliers outside the 5σ boundaries found in both high and low SNR. This translates into further removing 89 high SNR stars and 749 low SNR stars, obtaining a final catalogue of 38 716 solar twins.
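The outlier removal can be illustrated with a simple sigma-clipping mask on the differences between the two sets of labels. This sketch assumes that the boundaries are 5σ limits and that σ is the standard deviation of the differences between The Cannon and GALAH DR3 values for a given label; how the boundaries were actually defined is not stated here, and the function name is hypothetical.

```python
import numpy as np

def within_n_sigma(cannon, galah, n_sigma=5.0):
    """Mask of stars whose label difference lies within n_sigma.

    Assumption of this sketch: sigma is the standard deviation of
    the differences, measured around their mean.
    """
    resid = np.asarray(cannon, float) - np.asarray(galah, float)
    sigma = np.std(resid)
    return np.abs(resid - np.mean(resid)) <= n_sigma * sigma
```

A star would be kept in the final catalogue only if the mask is true for every label considered, which amounts to combining the per-label masks with a logical AND.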

Comparison of uncertainties
We use the internal uncertainties obtained by our model and compare them to the ones already given by GALAH DR3. We also use the repeat observations sample (see Appendix A), for which we made predictions with the model, and compare them with the uncertainties of repeat observations given by GALAH DR3. Fig. 5 summarises our findings: it shows both internal uncertainties (solid lines) and repeat observations uncertainties (dashed lines) for GALAH DR3/sme in black and for our The Cannon model in red, as a function of SNR. We removed the sme uncertainties for log g since the estimation for this label does not rely on spectral fitting like our The Cannon model does.
In general the internal uncertainties reported by our model are below the ones reported by GALAH DR3. For GALAH DR3 they tend to increase as SNR decreases. Our model also predicts more uncertain labels at lower SNR, but the difference in uncertainties between high and low SNR is smaller than for GALAH DR3. For the uncertainties obtained from repeat observations, however, our results are comparable to GALAH DR3 for all SNR ranges. The uncertainties of repeat observations for the stellar parameters T_eff and [Fe/H] are similar for both pipelines as a function of the SNR, and are generally higher than the internal uncertainties. For the chemical abundances we find some cases where our model has higher uncertainties than sme, especially at high SNR. A notable example is Ca, where our The Cannon model reports higher repeat observations uncertainties at high SNR. We note that in GALAH DR3 a more detailed approach in masking telluric lines was made (Buder et al. 2021), but here we used a less refined mask which likely contains more telluric features.
For Cu and Zn differences are negligible, which could be due to the fact that in both procedures the masking of the spectra is the same and therefore the methods consider the same information from the spectra. The opposite difference is found for Si, Sc, Ti, Cr, Ni and Y, where in general our uncertainties of repeat observations are lower than those of GALAH DR3. In particular, for Si and Mn our masks have more pixels than those used by sme in GALAH DR3, which included an additional filtering due to blending that was not labelled as such in the masks. Our The Cannon model does not seem to be affected in terms of precision, suggesting that perhaps the further filtering in the GALAH DR3 masks was too strict for solar twins, removing pixels. For Ti we also observe that The Cannon obtains more precise abundances. sme determines Ti separately from neutral and ionised Ti lines, whereas The Cannon takes all lines together. It is thus expected to have better precision in our model because more pixels are used.
In general, at lower SNR The Cannon performs better since it is always using information from the whole spectrum, whereas sme applies further filtering in masking the detected lines in each spectrum (Buder et al. 2021). This effect may be reflected in the higher internal uncertainties reported at low SNR. When there is more information in lines available in the spectra, The Cannon will use this information. If we are using the same number of pixels as sme, the repeat observations uncertainties are comparable since they are fitting essentially the same features.

Fig. 6 shows the individual abundances as a function of metallicity for The Cannon values coloured as density plots and for GALAH DR3 as contours. We can see how the precision in terms of the dispersion improves significantly, with exceptions for Al, Cu, Zn, Y and Ba. Al, Cu and Zn have very weak lines, which makes them very difficult elements to measure even for The Cannon. The internal uncertainties are comparable for these elements as discussed in the previous section. Y and Ba are elements that are expected to present a large scatter. Stars in binary systems that had an AGB companion might have been polluted by s-process elements such as Y and Ba that were produced by the AGB star (Escorza et al. 2019). Based on Gaia DR3, however, we do not find a clear signature of binarity for stars with higher [Ba/Fe] and [Y/Fe] abundance ratios in their RUWE parameter or the uncertainties in the radial velocities. If such high Y and Ba stars were in binary systems, their separations would be large and their periods long. Elements such as Mg, Si, Ca and Ti are formed predominantly in SNII progenitors (see Kobayashi et al. 2020, and references therein). In this sample, we do not find stars that might have formed from an α-enhanced gas such as that of the thick disk, from which we deduce that this sample is composed predominantly of thin disk stars. Iron-peak elements such as Sc, Cr, Mn and Ni follow the trends observed in other higher resolution and higher precision studies, such as Adibekyan et al. (2012), Bensby et al. (2014), and Battistini & Bensby (2015).
Fig. 7 illustrates our values as a function of age. Here the age corresponds to the values reported by GALAH DR3. We note these ages are not fully consistent with our new parameters since they are estimated with the GALAH DR3 parameters. A new derivation of ages is beyond the scope of this paper. Here we aim to provide an illustration of the type of studies that could be performed with the full use of our catalogue.
As above, the coloured density plots represent our values determined with The Cannon. The red line corresponds to the linear regression fits of the abundance-age trend determined by Spina et al. (2016) and Bedell et al. (2018), who performed a high precision spectroscopic analysis of solar twins using high resolution. We qualitatively obtain consistent trends for all elements. There are minor offsets, such as for Cr, but that was already discussed in Buder et al. (2021). Since we train with GALAH DR3 values, it is expected that the offset remains here.
With respect to Fig. 7, we also want to stress that the underlying training set of GALAH DR3 is subject to significant selection effects and systematic parameter inaccuracies.This includes the overestimation (or systematic clumping) of stellar ages of stars around 2 Gyr, due to the missing separation of young star isochrones in our solar twin parameter space.We also expect to sample more intermediate age thin disk stars from the underlying set of stars due to their relative abundance within the GALAH selection function (neglecting the Galactic plane and sampling within magnitude ranges).

PHYLOGENETIC TREES WITH GALAH SOLAR TWINS
In this section, we use our new catalogue in the construction of phylogenetic trees. For our purpose, we select from the catalogue two groups of solar twins with different orbit eccentricity. More specifically, we select the 100 stars with lowest eccentricities and 101 stars with highest eccentricities in the sample. These values come as value-added information in GALAH and are derived from Gaia DR3 data (see Sect. 2 for details). Our goal is to test what the phylogenetic trees built using our measurements tell us about their relatedness, and how our measurements help in this goal compared to the GALAH DR3 ones. For this experiment, we compare the phylogenies constructed from both datasets. The stars selected for the analysis are listed in Tab. C1. In that table, we include the star IDs as labelled in the tips of our trees, in addition to the Gaia DR3 IDs for further reference, and their ages and eccentricities as downloaded from the value-added catalogues of GALAH DR3.
A natural question arises here about our choice of stars. Indeed, there is no particular reason to choose our sample above any other sample. But we need to stress that the time complexity of the NJ algorithm is O(n³) (Yang 2014), where n is the number of tips, so applying the NJ algorithm to 40,000 stars might be computationally impractical. This implies we need to make choices of smaller sets of data, and focus our scientific aim on a particular question. In this paper, we aim to compare the phylogenetic signal between our The Cannon and the standard GALAH DR3 abundances.
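As a rough illustration of the cubic scaling, and ignoring constant factors and memory limits (both of which are assumptions of this estimate), the cost of building one tree for the full catalogue relative to the 201 selected stars would be

```latex
\frac{T(40\,000)}{T(201)} \approx \left(\frac{40\,000}{201}\right)^{3} \approx 7.9 \times 10^{6} ,
```

that is, several million times the cost of a single tree for the selected sample, before accounting for the many trees needed to sample the abundance uncertainties.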

Building trees with the Neighbor-Joining algorithm
Trees are built following the methods used in Jackson et al. (2021) and Jofré et al. (2017) by using the classical method called Neighbour-Joining (NJ), a computationally fast agglomerative clustering algorithm proposed by Saitou & Nei (1987) that iteratively joins nodes (in our case, stars) closely related by a given pairwise distance matrix. This matrix of distances has size $n \times n$, with $n$ being the number of stars considered in the analysis, and the distance is calculated as a Manhattan distance between the chemical abundances of each star in the pair. The tree reconstruction is made in a greedy way by iteratively joining the pair of nodes from the distance matrix that minimises the Q criterion. For a tree with $n$ nodes, the pair $(i, j)$ is joined by minimising

$$Q(i, j) = (n - 2)\, d_{ij} - \sum_{k=1}^{n} d_{ik} - \sum_{k=1}^{n} d_{jk},$$

where $d_{ij}$ is the distance between the pair (see Sect. 3.3.3 of Yang 2014, for more details of this terminology). Then a new node is created and is called an internal node, which acts as predecessor of the two joined nodes. The procedure continues by recomputing the distances of the remaining nodes to the new internal node, removing the rows and columns in the distance matrix related to the two joined nodes and adding the ones related to the new one. This reduces the dimension of the distance matrix by 1. The process iterates until the distance matrix is of size 2 × 2, joining the two remaining nodes and returning the built tree.
We build our distance matrix using our selected sample of stars from both eccentricity groups (see Tab. C1). To compute the distance, we consider a vector of measured abundances for each star, and use the Manhattan distance, which is the sum of the absolute differences between the components of the two vectors. This means that we are using chemical distances for the stars in our sample in a similar way to Jofré et al. (2017). In our case, we perform a further selection of the abundances, namely those whose age-abundance trend is monotonic in Fig. 7. This is known to increase the additivity of the distance matrix, hence making NJ trees that are closer to the true phylogenetic tree (Retzlaff & Stadler 2018, see also Eldridge et al. in prep. for additivity in stellar abundance data). We thus exclude Ca, Cr and Ni since their age-abundance trends are flat, thus not evolving in time. Since these abundance ratios do not change in our sample, they add noise to our tree reconstruction (Yang 2014; Jackson et al. 2021).
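The NJ procedure described in this section can be illustrated with a short Python sketch. It is not the code used in this work: the function names `manhattan_matrix` and `neighbor_joining`, the edge-list output format and the internal node labels are hypothetical choices made for illustration, and the Q criterion and branch-length formulae are the standard textbook ones for NJ.

```python
import numpy as np


def manhattan_matrix(abund):
    """Pairwise Manhattan (L1) distances between the rows of an
    (n_stars, n_elements) abundance array."""
    abund = np.asarray(abund, dtype=float)
    return np.abs(abund[:, None, :] - abund[None, :, :]).sum(axis=2)


def neighbor_joining(dist, names):
    """Build an unrooted NJ tree from a symmetric distance matrix.

    Returns a list of (child, parent, branch_length) edges; the last edge
    connects the two nodes that remain when the matrix is 2 x 2.
    Internal node labels ("node0", "node1", ...) are illustrative only.
    """
    d = np.array(dist, dtype=float)
    nodes = list(names)
    edges = []
    n_internal = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = d.sum(axis=1)  # row sums of the current distance matrix
        # Standard Q criterion: Q_ij = (n - 2) d_ij - sum_k d_ik - sum_k d_jk
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)  # never join a node with itself
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # Branch lengths from the joined pair to the new internal node
        len_i = 0.5 * d[i, j] + (r[i] - r[j]) / (2.0 * (n - 2))
        len_j = d[i, j] - len_i
        new = f"node{n_internal}"
        n_internal += 1
        edges.append((nodes[i], new, len_i))
        edges.append((nodes[j], new, len_j))
        # Distances from the new internal node to every remaining node
        d_new = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        reduced = np.zeros((n - 1, n - 1))
        reduced[:-1, :-1] = d[np.ix_(keep, keep)]
        reduced[-1, :-1] = reduced[:-1, -1] = d_new[keep]
        d = reduced
        nodes = [nodes[k] for k in keep] + [new]
    edges.append((nodes[0], nodes[1], d[0, 1]))
    return edges
```

Each iteration removes two rows and columns and adds one, so the matrix shrinks by 1 per step; recomputing the full Q matrix at every step is what gives the cubic scaling with the number of tips.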
To account for the uncertainties in the data, we build NJ trees from distance matrices computed by sampling, for each of the abundances, a random value from a normal distribution centred at the reported measurement and with a standard deviation equal to its reported uncertainty. We build 2 000 trees by sampling the abundances according to their uncertainties and study their distribution.
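As an illustration of this sampling step, the following Python sketch draws perturbed abundance vectors and yields one Manhattan distance matrix per draw. The function name `sample_distance_matrices`, its arguments and the fixed random seed are assumptions made for this example, not part of the pipeline used in this work.

```python
import numpy as np


def sample_distance_matrices(abund, sigma, n_draws=2000, seed=0):
    """Yield one Manhattan distance matrix per Monte Carlo draw.

    abund, sigma : arrays of shape (n_stars, n_elements) with the reported
                   abundances and their reported uncertainties.
    """
    rng = np.random.default_rng(seed)
    abund = np.asarray(abund, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_draws):
        # Perturb every abundance with a Gaussian centred on the measurement
        draw = rng.normal(loc=abund, scale=sigma)
        yield np.abs(draw[:, None, :] - draw[None, :, :]).sum(axis=2)
```

Each yielded matrix can then be passed to a tree-building routine, giving one tree per draw.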
From these trees, we select the best tree to be the one which has the highest node support. To do so, we follow the process used in biology that searches for the maximum clade credibility (MCC) tree out of a sample of trees. Clades correspond to a group of nodes that includes all the descendants of a common predecessor node in the tree. We note that with trees being built empirically we cannot immediately assume that the trees make evolutionary sense, hence an internal node cannot be immediately associated with a clade. Simulated data are needed to learn the prospects and limitations of interpreting clades and evolutionary histories from empirical trees (de Brito Silva et al. 2023). In the MCC, each clade (or node with all the descendant nodes in our trees) in each sampled tree is given a score that reflects the fraction of times that the same pattern appears in all the sampled trees. If the clade occurs in all the trees then the support value is 1 (100%). This indicates high consistency in the data for that topological relationship. The product of these scores is defined as the tree score, so that the MCC tree is the one with the highest tree score. Here we employ this method to select our best tree and evaluate its robustness, despite not being able to ensure that our nodes and branching pattern can be directly related to clades (see more discussions below).
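The MCC selection can be sketched as follows. This is a simplified illustration under the assumption that each sampled tree has already been reduced to a collection of clades, each clade being a `frozenset` of tip labels; the function names `clade_support` and `mcc_tree` are hypothetical. The product of supports is evaluated as a sum of logarithms to avoid numerical underflow for trees with many nodes.

```python
import math
from collections import Counter


def clade_support(trees):
    """Fraction of sampled trees in which each clade appears.

    trees : list of iterables of frozensets, one iterable per sampled tree.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {clade: c / len(trees) for clade, c in counts.items()}


def mcc_tree(trees):
    """Return (index of the MCC tree, clade support dictionary).

    The tree score is the product of the supports of its clades,
    computed here in log space.
    """
    support = clade_support(trees)

    def log_score(tree):
        return sum(math.log(support[clade]) for clade in set(tree))

    best = max(range(len(trees)), key=lambda k: log_score(trees[k]))
    return best, support
```

The supports of the clades in the selected tree are the node support values whose distribution is then inspected.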
The distribution of support values for the nodes is shown in Fig. 8. In gray and orange we have the support values for the MCC tree built using GALAH and The Cannon, respectively. By comparing the distributions we observe that The Cannon MCC tree is overall better supported. The highest support value in the GALAH MCC tree is 20%, meaning that every clade seen in the MCC tree only appears in at most 20% of the remaining sampled trees. However, even though The Cannon MCC tree has more support, the overall values do not exceed 50%. This means that even at the high precision in the abundances of our new catalogue, the trees are overall poorly supported.
The root of the tree is the basal split that separates the most distant (in an evolutionary way) object from all the rest. The NJ algorithm produces unrooted trees because in this tree reconstruction method there is no evolutionary model considered and hence no way to predict the ancestral state in the relationships of our stars. Hence, even though we are able to apply the MCC method to find the most supported tree, we are not able to interpret a clade in our trees as a group of nodes that includes all the descendants of a predecessor node in an evolutionary context. In our case, it is more appropriate to refer to possible groups in the trees as clans instead of clades.

GALAH vs Cannon trees
In Fig. 9 we show the MCC trees built from our sample stars. For better visualisation of our trees, we choose the tip corresponding to the star labelled with ID 0 as our reference star. This means that the tree is displayed in a way that all branch lengths are visualised with respect to ID 0. That star is the one with the highest eccentricity in our sample. Its Gaia DR3 ID is 5396076243592498944, and it has an eccentricity of 0.63. This allows us to study the relationship of all stars with respect to that highly eccentric one.
The left panel of Fig. 9 shows the MCC tree obtained using GALAH abundances and the right panel shows the MCC tree obtained using The Cannon abundances. Both trees have the branches coloured by the metallicity as obtained from the corresponding catalogue. The number in parentheses at each tip of the trees corresponds to the eccentricity group, where 0 represents circular orbits and 1 represents more eccentric orbits. Specific information about eccentricities and ages of our stars can be found in Tab. C1.
By comparing the topologies of these trees, we see some similarities. In both cases, the eccentric stars are located close to the star ID 0, and after a few splits we are able to observe two main branches. Within these branches, we select clans for further study, which we label Clan A and B for the GALAH tree, and Clan C and D for The Cannon tree. We will discuss these clans in more detail later on.
We also observe that the length of the tip branches differs between the trees. In fact, the tip branch lengths of the GALAH tree are larger than the ones of The Cannon tree. Some GALAH branches reach 4 dex while The Cannon ones reach 3 dex. In fact, the GALAH tree has tip branch lengths which are larger than the inner branches. This is an indication of a hard tree (Yang 2014), i.e. a tree prone to errors. In the GALAH tree, the difference between two stars, reflected by the sum of horizontal branch lengths connecting the two tips along the tree, is dominated by the tip branch lengths rather than the internal branch lengths, dominating over the hierarchical structure in the tree. This means that more of the chemical differences among the stars are being explained by their tips than by any one of the internal branches. This is also noticed in The Cannon tree but to a lesser extent.
In any case, we observe that both trees chemically cluster together stars from both eccentricity groups, but with The Cannon data the grouping is more resolved. In the GALAH tree, Clan A (highlighted in blue in Fig. 9) contains mostly the stars in circular orbits which are primarily metal-rich, while Clan B (in orange in the figure) contains a mix of stars. When using our catalogue, we see that Clan C (enclosed in red in the right-hand panel of Fig. 9) contains mostly low-eccentricity metal-rich stars. Clan D (in violet in the figure) contains stars with high eccentricities that are rather metal-poor.
It is worth commenting on the selection of our clans. Between Clan C and Clan D there is a branch of stars that is more similar to Clan C than to Clan D, but the internal branches are short compared to the length of the tip branches, and the topology is overall more balanced. That branching pattern is indeed similar to a random tree, lacking phylogenetic signal (de Brito Silva et al. 2023).

Astrophysical interpretation of the selected clans
We now look into Clans A and B from the GALAH DR3 tree and Clans C and D from The Cannon tree. To do so, we explore the trend of age and [Y/Mg], commonly referred to as a chemical clock. Indeed, in the analysis performed on solar twins by Nissen (2015) a tight relationship between age and [Y/Mg] was found. This trend was explained with the argument that yttrium, which is an element produced by AGB stars, increases with increasing Fe, while Mg, which is an element produced by SNII, decreases with increasing Fe. Since Fe increases with time, this difference in dependency on Fe causes a strong dependency of [Y/Mg] on age.
After that study, several works have studied the applicability of this trend considering different kinds of stars, finding that solar-metallicity giants in the solar neighborhood behave similarly to the solar twins (Slumstrup et al. 2017; Casamiquela et al. 2021a) but at lower metallicities, this relationship might weaken (Delgado Mena et al. 2019; Casali et al. 2020, Vitali et al. in prep.). It is also suspected that this relation is subject to systematics in the age determination (Berger et al. 2022). Further studies have found this relation might not hold for stars outside the solar neighborhood (Casamiquela et al. 2021a), which can be explained by the fact that this relation has a strong dependency on the star formation rate, which is different at different birth radii (Ratcliffe et al. 2023).
Considering that [Y/Mg] = [Y/Fe] − [Mg/Fe], the errors in the abundance ratio used in this work are computed as the quadratic sum of the errors reported for both abundances. Age estimates as well as age errors are taken from Buder et al. (2021). In Fig. 10 we show the age−[Y/Mg] trends for the selected clans in Fig. 9 following the same colours. We compute a linear regression fit of the stars in each group, and plot the fit with a line of the same colour as the corresponding clan. The legend indicates the value of the slope and the uncertainty of the fit. The dashed black line corresponds to the slope of the linear fit found by Nissen et al. (2020), for reference.
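The error propagation and slope fit described in this paragraph can be sketched in Python as below. This is only an illustration: the function names `y_mg` and `fit_age_trend` are hypothetical, and the fit is assumed here to be an unweighted least-squares regression, which may differ from the fitting choices made for Fig. 10.

```python
import numpy as np


def y_mg(y_fe, mg_fe, e_y_fe, e_mg_fe):
    """[Y/Mg] = [Y/Fe] - [Mg/Fe], with errors added in quadrature."""
    ratio = np.asarray(y_fe, dtype=float) - np.asarray(mg_fe, dtype=float)
    error = np.hypot(e_y_fe, e_mg_fe)
    return ratio, error


def fit_age_trend(age, ratio):
    """Unweighted linear fit ratio = slope * age + intercept.

    Returns (slope, intercept, slope_uncertainty), where the uncertainty
    is taken from the covariance matrix of the fit.
    """
    coef, cov = np.polyfit(age, ratio, deg=1, cov=True)
    return coef[0], coef[1], float(np.sqrt(cov[0, 0]))
```

Applying `fit_age_trend` separately to the stars of each clan gives the per-clan slopes that are compared with the literature relation.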
We see that the age−[Y/Mg] trends have different slopes for the different clans. However, among all the trends, the agreement between the slope of the fit found for Clan C and the Nissen et al. (2020) fit is remarkable. Clan C is the group composed mostly of low-eccentricity stars, obtained using The Cannon abundances. We note that the stars analysed by Nissen et al. (2020) have eccentricities normally below 0.1 (see also Nissen 2015; Jofré et al. 2017; Jackson et al. 2021). It is encouraging to find that the stars in Clan C follow so well the chemical clock found in other studies of solar twins.
Interpreting this finding in terms of the phylogenetic nature of this group is however tricky. As shown recently by Ratcliffe et al. (2023), the age−[Y/Mg] relationship found by Nissen et al. (2020, and references therein) can be interpreted only by considering that birth radius also plays a fundamental role. A sample of stars with a restricted metallicity range in the solar neighborhood can have a range in ages only if the stars come from different Galactic radii. In this way, each star traces a different star formation rate and reaches the same [Fe/H] at different timescales. The fact that Clan C is composed of stars that are mostly on circular orbits, but have a range in ages, suggests that the oldest stars might have migrated from inner regions. We would expect that old stars with circular orbits that have not migrated should be significantly more metal-poor. These are not here because we have only selected solar-metallicity stars for our study.
Our selection in metallicity is however not too restrictive. In fact, we have a range of 0.6 dex in metallicity in the sample, and Clan C contains stars of all metallicities, indicating that some ISM evolution at the solar radius must be present. Disentangling which stars are a product of the inheritance of the ISM at the solar radius and which have migrated or are visiting due to dynamical heating (higher eccentricities) is tricky since all these processes are mixed in the disk (Feltzing et al. 2020) and evolve as time passes (Aumer et al. 2016; Bird et al. 2021; Lu et al. 2022).
The slope of Clan D is rather flat, but that could be due to the fact that the stars in Clan D are predominantly eccentric. These stars might have originated from different Galactic radii and are less likely to have a shared history. Most of them are also old, making the resulting fit biased. It is currently believed that stars originating from different Galactic radii might have different age−[Y/Mg] relations (Casali et al. 2020; Casamiquela et al. 2021a). Furthermore, the trend and its relationship with birth radius evolve with time. For the oldest stars, this ratio could have been flat across the Galaxy (Ratcliffe et al. 2023), and this is consistent with our findings.
For the GALAH Clans A and B we see a smaller difference when considering the uncertainties. Moreover, Clan A has an age−[Y/Mg] trend which is flatter than that of Clan B, which is the opposite of what we find with The Cannon abundances. It is thus hard to explain that Clan A, which contains predominantly low-eccentricity stars, deviates more from the Nissen et al. (2020) relation than Clan B, which contains a mix of stars. This might be a consequence of higher uncertainties in the abundances of GALAH DR3.
A natural question may arise as to whether or not we are able to recover the same trends by simply making dynamical cuts for the sample stars. To answer this question we consider Clans A and C, because they are the groups that mainly contain low-eccentricity stars. We compare the slopes of their chemical clocks with the ones we would obtain if we considered all the 100 low-eccentricity stars in the sample. Fig. 11 summarises this result. In the left panel we plot again in blue the stars belonging to Clan A, and the blue solid line is the resulting linear fit to these stars. Since Clan A does not only contain low-eccentricity stars, for better illustration of our findings we plot with different symbols the stars with circular orbits (circles) and eccentric orbits (star symbols). The remaining stars in low-eccentricity orbits are plotted with gray circles, and the linear fit to all low-eccentricity stars is shown as the black solid line. Finally, for reference, the chemical clock fit found by Nissen et al. (2020) is shown with the dashed line, as before. All slopes are specified in the legend. For Clan C, which considers the abundances obtained by us with The Cannon, the same information is shown in the right-hand panel, and the stars are coloured in red.
As we see in Fig. 11, by just considering the low-eccentricity stars we are not able to find trends that are consistent with the literature, even considering the uncertainties. This might be because we are missing important older stars which are now on less circular orbits. A cut on dynamical properties only is therefore not sufficient since the selected groups are incomplete (Soubiran & Girard 2005; Hawkins et al. 2015). When using the tools provided by phylogenetic analyses, we can chemically identify different groups of stars and find patterns that could be associated with their shared history. We stress that we obtain this result only for our new high precision abundances, demonstrating also the importance of having very high precision abundances for a better selection of stellar populations.

Discussion
The NJ algorithm is essentially a clustering algorithm, not particularly different from others available in the literature (see e.g. Ratcliffe et al. 2020, for descriptions and discussions of different clustering algorithms used on chemical data). What makes the NJ algorithm attractive here, however, is first that we do not have to specify the number of clusters we aim to find, unlike other fast clustering algorithms such as K-means. This is important when studying the relationships and shared history of a group of stars as a whole, where we are not primarily interested in finding groups but in studying the way in which the entire system is ordered and how this order might tell us something about their evolutionary history. Second, the NJ algorithm is not designed to only cluster the data, but to visualise the amount of divergence between pairs of objects. This translates into branch lengths that have a meaning of difference. NJ trees therefore are not expected to have the objects aligned at the tips of the tree, in contrast to other dendrograms obtained by clustering algorithms such as DBSCAN or HDBSCAN (Casamiquela et al. 2021b) or other agglomerative methods (Ratcliffe et al. 2020). The procedure to define the branching pattern only depends on the distance matrix. Other clustering algorithms require the specification of parameters of closeness and density in the parameter space, which cannot be constrained in an objective way (Casamiquela et al. 2021b). In fact, here we are not primarily interested in finding clusters and quantifying the number of clusters and their properties, but in visualising how the data are structured in their hierarchical order. Because of the heritable information we consider to build the trees, that order can be used to interpret shared histories, which is the essence of phylogenetics.
We comment on the poor support of our trees, where in the best case we still have a large majority of nodes with a support below 50% (see Fig. 8). Considering that the NJ algorithm takes a distance matrix as input and joins the elements that are closest to each other, by construction it will generate a tree that reflects the hierarchical order of the distance matrix. But when the range of differences is small, a perturbation of that distribution given the uncertainties will imply a very different tree. The distance method for tree reconstruction becomes uncertain if the distances are too small for the entire sample (Yang 2014).
Using solar twins for this kind of study might also add further challenges in the interpretation of the results. Having a sample with a restricted range in metallicity makes it impossible to trace back the population that was formed from the pristine gas (i.e. with no metals). The fact that all stars are metal enriched tells us that we are studying a population from a stage in which considerable chemical enrichment had already happened. Stars of different ages and different [Y/Mg] but similar [Fe/H] might well have formed in different Galactic environments and arrived in the solar neighborhood through a dynamical process that can be radial migration or heating.
Given the different timescales in the pollution of Mg, Y and Fe into the ISM, it is not straightforward to interpret a clan which has a tight relationship between [Y/Mg] and age but has a limited range in [Fe/H]. Is it that the NJ algorithm is simply clustering stars of different birth radii which trace different evolutionary histories of the ISM? Perhaps the AGB stars, which live longer than the progenitors of SNII, had enough time to radially migrate and pollute the ISM with Y at a different location than their birth place, in contrast to the pollution of Mg by their massive siblings (Johnson et al. 2021). It is possible that the ISM has a shared evolutionary history over a wider range in radii due to migration. The interplay between migration, heating, blurring and mixing in the ISM of the Milky Way is still poorly understood (Feltzing et al. 2020). Phylogenetic methods might thus offer an interesting opportunity to learn more about these processes.
We also need to remind ourselves of the selection effect in our sample, where the selection of solar twins systematically underrepresents young stars. As explained in Sharma et al. (2018), the age estimation for both the youngest and oldest stars via isochrone fitting is biased towards intermediate ages.
There is another source of uncertainty here, which is the fact that by considering loose stars in the disk, we cannot rule out the possibility that two stars have the same origin (e.g. are siblings from the same star formation episode of the same molecular cloud). The NJ algorithm will place these two stars at two different tips (leaves) of the tree (by construction), but they in fact represent one leaf. Without a previous selection of stars belonging to distinct star formation episodes, the NJ algorithm will fail in ranking the leaves, because any hierarchical order obtained for stars from the same star formation episode will be driven by the errors in the abundances and the intrinsic dispersion of such populations. We also have to keep in mind that there will be noise due to ISM inhomogeneities that does not reflect evolution (Kos et al. 2021; Ness et al. 2022). The hope is that this noise is smaller than the change due to evolution (Manea et al. 2022). In that sense, nodes of poor support could also be used to identify co-natal stars. To test these possibilities, simulated data are more suitable, because in that case we know with certainty the origin of the data (de Brito Silva et al. 2023).
As seen from Figs. 8, 10 and 11, GALAH data still do not have the precision in the abundances required to perform a robust phylogenetic study. However, GALAH data have been central for this analysis, because The Cannon uses the best GALAH results to perform a re-analysis which then allows us to apply phylogenetic techniques to other GALAH stars. Considering that future data releases of GALAH are expected to increase in precision, the prospects of phylogenetic studies in GALAH are indeed very promising.

CONCLUSIONS
In this paper we have performed a systematic application of the machine learning algorithm The Cannon to a set of solar twins observed and analysed in GALAH DR3 (Buder et al. 2021) with the aim of providing a catalogue of high precision abundances of 38 716 solar twins and using a subset of this catalogue for a phylogenetic study of GALAH data. Other scientific applications of high precision abundances of solar twins include setting constraints on planet engulfment processes (Bedell et al. 2018; Maia et al. 2019) or the level of homogeneity in star formation regions such as open clusters or wide binaries (Liu et al. 2019; Hawkins et al. 2020; Espinoza-Rojas et al. 2021). Therefore, our catalogue can be used to explore these subjects, in addition to our primary intention to perform a phylogenetic analysis.
In the systematic application of The Cannon for the generation of this catalogue, we investigated the impact on the labels of considering different training sets. We first varied the size of the training set, and then studied the label recovery when removing outliers. This analysis helped us to conclude that a training set with 150 GALAH stars of SNR > 117 was sufficient for predicting precise labels of stellar parameters and 14 chemical abundances of solar twins observed with GALAH.
Our results agree with GALAH within 50 K in temperature, 0.09 dex in log g, 0.03 dex in [Fe/H], and 0.05 dex in abundances for stars with SNR > 50. For lower SNR the results agree less, within 60 K in temperature, 0.1 dex in gravity, 0.07 dex in metallicity and 0.1 dex in the other abundances. This is expected considering that GALAH data are also more uncertain for lower SNR. The internal uncertainties of our model at lower SNR do not significantly increase compared to high SNR results. The consistency of predicted labels for repeat observations remains comparable to the GALAH ones, provided both GALAH and The Cannon used the same pixels to extract the information.
Our new catalogue allows us to perform phylogenetic studies of the solar neighborhood that require high precision abundances. We analysed 200 stars separated in two eccentricity groups, namely a group with circular orbits and another one with orbits of eccentricity around 0.4, and we compared the trees obtained for these groups using GALAH and The Cannon abundances. In both cases, we were able to find clans which were distinct in eccentricities, with one clan notably grouping the stars with circular orbits. While the node support in The Cannon tree is higher than for the GALAH tree, the overall support in both trees is not outstanding. This is expected for a sample of stars which are so similar to each other that the hierarchical differences between stars are very small and thus uncertain. It is also possible that many of these solar twins are tracers of the same star formation episode, causing a conflict in the tree shape which is forced by the neighbor-joining algorithm. To truly study the support and amount of information carried in chemical abundances, simulated data should be used instead, even if simulated data are prone to systematic uncertainties (de Brito Silva et al. 2023).
The trees still allowed us to study the astrophysical nature of the clans found. To this aim, we compared the age−[Y/Mg] relation (i.e. the chemical clock) of the clans with the literature (Nissen et al. 2020), obtaining a remarkable agreement for one of our clans with our The Cannon abundances, but no agreement with the clans found in the GALAH tree. The agreement between the chemical clock of Nissen et al. (2020) and ours was obtained noting that the clan included mostly stars with circular orbits, but some older stars with eccentric orbits as well. Indeed, the age−[Y/Mg] relation found for only stars with circular orbits does not agree with the chemical clock obtained by Nissen et al. (2020), because of the lack of old stars with circular orbits. A phylogenetic tree can thus help to identify the stellar family that traces the chemical evolution of the solar neighborhood, despite these stars having changed their orbital properties during their lifetimes. The abundances, however, need to be of very high precision.
Our work demonstrates the promising future of galactic phylogenetics, in which we can use large spectroscopic surveys like GALAH with machine learning to improve the chemical abundances which then can be used as input for phylogenetic analyses and so reconstruct the history of our home galaxy, the Milky Way.
For this sample we compute the uncertainties as follows: for each star we consider its observation of maximum SNR in CCD2, then for each repeat observation we compute the difference in the prediction compared to the one of maximum SNR. Finally, after doing this for all the stars in the sample, we divide the SNR range into bins of width 10 and compute, for each bin, the standard deviation of the differences between the predictions for the repeat observations within that SNR bin and their respective maximum SNR predictions.
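The procedure just described can be sketched in Python as follows. The function name `repeat_scatter` and its arguments are hypothetical, and the sketch assumes that each difference is assigned to the SNR bin of the repeat observation (not of the maximum-SNR reference).

```python
import numpy as np


def repeat_scatter(star_id, snr, label, bin_width=10.0):
    """Scatter of repeat-observation predictions per SNR bin.

    star_id : identifier of the star for every observation
    snr     : CCD2 SNR of every observation
    label   : predicted label (e.g. Teff or an abundance) per observation
    Returns {lower edge of SNR bin: standard deviation of differences}.
    """
    star_id = np.asarray(star_id)
    snr = np.asarray(snr, dtype=float)
    label = np.asarray(label, dtype=float)
    diffs, diff_snr = [], []
    for sid in np.unique(star_id):
        idx = np.flatnonzero(star_id == sid)
        if idx.size < 2:
            continue  # no repeat observation for this star
        ref = idx[np.argmax(snr[idx])]  # observation of maximum SNR
        for k in idx:
            if k != ref:
                diffs.append(label[k] - label[ref])
                diff_snr.append(snr[k])
    diffs = np.array(diffs)
    bins = np.floor(np.array(diff_snr) / bin_width).astype(int)
    return {float(b * bin_width): float(diffs[bins == b].std())
            for b in np.unique(bins)}
```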
The bottom panel of Fig. A1 shows a normalised density histogram of the repeat observation counts (in cyan) as a function of the CCD2 SNR. In gray we show the observation counts for the entire solar twin catalogue. The spectra of repeat observations cover well the SNR distribution of the entire catalogue. The figure shows that our sample is a good representation of the SNR of the entire GALAH DR3 dataset.
[...] are essentially zero for all the labels, showing the overfitting effect in The Cannon.
We comment on the bias in the parameter log g and its similarity with [Y/Fe], which are highlighted in orange and brown, respectively. GALAH DR3 surface gravities are not determined directly from the spectra, but from photometry and the parallax (Buder et al. 2021), because GALAH spectra do not contain sufficient dependency on this parameter. It is hence expected that The Cannon model does a poor job in predicting this parameter.
The bias in log g affects [Y/Fe] because our The Cannon model considers the ionised Y lines in GALAH spectra. Ionised lines have a dependency on surface gravity. If the surface gravity is poorly determined, it is expected that a method deriving the abundance from ionised lines of a given strength will respond by balancing the ill-determination of surface gravity with an ill-determination of that abundance. Most of the other abundances are derived from neutral lines, which are less sensitive to gravity.
From the bottom panel of Fig. B1 we observe that for most of the labels the differences show a decreasing trend with training size. The trend reaches a plateau at a training size of around 150, which is marked with a vertical dashed line.
To further assess the potential problems of overfitting in the training process, we compute the Mean Squared Error (MSE). This allows us to assess the quality of the predictions made by the different The Cannon models in the test step, as well as the self-test, i.e. the [...] Cannon model and perform a self-test. The results for Model 1 can be found in the left panels of Fig. B4 for stellar parameters and Fig. B5 for chemical abundances. We set up the $3\sigma$ boundaries for all labels and define as outliers, for each label, those stars that lie outside this boundary. In the figures, these outliers are plotted in red and blue. In general the predictions of the parameters agree with GALAH DR3. However, in Fig. B5 we begin to see outliers for some of the chemical abundances, in particular Ca and Mn. There are four Ca lines, but two of them lie in telluric lines where the correction may not always be perfect (Buder et al. 2018). In GALAH DR3 the spectrum uncertainty is increased for these lines to account for strong blends of the telluric lines. Here with The Cannon we take the spectrum uncertainty directly from the database, obtaining overabundances of Ca because the model finds a high absorption feature (Buder et al. 2018).
We took two different approaches to deal with the outliers. The results of these two approaches can be seen in the middle and right panels of Figs. B4 and B5, for Model 2 and Model 3, respectively. There are no outliers for the stellar parameters in either approach. The same holds for the chemical abundances in Fig. B5, where most labels have either no outliers or very few.
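For illustration, the MSE and a sigma-based outlier flag can be computed as in the sketch below. The function names are hypothetical, and the boundary is assumed here to be centred on the mean difference between the predicted and reference labels, with the standard deviation of that difference as its scale; the exact definition used for Figs. B4 and B5 may differ.

```python
import numpy as np


def mean_squared_error(predicted, reference):
    """Mean squared difference between predicted and reference labels."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(diff ** 2))


def sigma_outliers(predicted, reference, n_sigma=3.0):
    """Boolean mask of stars lying outside the n_sigma boundary.

    Assumption: the boundary is mean(diff) +/- n_sigma * std(diff).
    """
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return np.abs(diff - diff.mean()) > n_sigma * diff.std()
```

The sign of the difference for each flagged star distinguishes overestimates from underestimates.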

B4 Choosing final training set
From Figs. B4 and B5 we can see that Models 2 and 3 are an improvement with respect to Model 1 in terms of agreement with GALAH DR3. Fig. B6 shows the overall percentage improvement of the dispersion $\sigma$ in each label, given by the two latter models with respect to Model 1. The blue bins represent the percentage of improvement for Model 3, and the red lines represent the increment/decay in percentage of the improvement in $\sigma$ by Model 2. For stellar parameters we observe a higher improvement given by Model 2, with a difference of 3.3% and 4.9% for $T_{\rm eff}$ and metallicity, respectively. For log g we have a considerable difference of 11.5% in favor of Model 2. For chemical abundances we observe negligible differences for Mg, Ti, Mn and Ba, where the differences are up to 1.4%. For Al, Si, Ca, Sc and Cu we find larger differences of up to 13.7% in favor of Model 2, and for Na, Cr, Ni, Zn and Y we obtain differences of up to 11.9% in favor of Model 3. The mean improvement over all labels is 36.2% and 34.8% for Models 2 and 3, respectively.
Taking into consideration that the overall difference of 2.6% in mean improvement for all the labels is very small, we choose Model 3 for training. This model has an optimal size of 150 as well as a better coverage in the parameter space of stellar parameters and 14 chemical abundances for solar twins.

APPENDIX C: STARS SELECTED FOR PHYLOGENETIC ANALYSIS
Table C1 shows the ages and eccentricities of 50 stars used in the tree.The information about the rest can be found online.

Figure 1. Median internal uncertainties per label as reported in GALAH DR3. In gray dots we show all 39 554 solar twins from our selected catalogue. In blue triangles we show the 5 040 high SNR sample of solar twins. The orange stars represent the selection of 150 stars used as final training set (see discussions in Sect. B4). Uncertainties are smaller for the high SNR samples which are used for training.

Figure 2. Panels of abundances in the format of [X/Fe] vs [Fe/H]. The corresponding abundance X is displayed in each panel. In gray, all 39 554 solar twins. In blue, all 5 040 high SNR solar twins. In orange, the 150 high SNR spectra of solar twins selected as training set (see Sect. B4). The entire dataset covers a wider range in abundances, but that could be due to higher uncertainties. Both the training set and the high SNR set have a similar range in abundances.

Figure 3. Comparison of labels obtained with The Cannon and GALAH DR3 for stars with min(SNR) > 50. Each panel corresponds to a different label. Outer dashed lines correspond to the $5\sigma$ boundaries. Overestimates are plotted in red, underestimates in blue. The mean $\mu$ and standard deviation $\sigma$ of the difference between results are specified in each panel, as well as the number of outliers found outside each boundary.

Figure 4. Same as Fig. 3 but for stars with spectra of min(SNR) < 50.

Figure 5. Standard deviation of uncertainty labels as a function of SNR in CCD2. In black, SME uncertainties. In red, The Cannon uncertainties. Solid lines represent the internal uncertainties given by the covariance matrix of the fitting by both SME and The Cannon. Dashed lines represent the uncertainties of repeat observations.

Figure 6. Individual abundances as a function of metallicity for the solar twins analysed in this work. Coloured density plots correspond to our labels as determined using The Cannon, while contours delineate the distribution of GALAH DR3.

Figure 7. Individual abundances as a function of stellar age for the solar twins analysed in this work. Coloured density plots correspond to our labels as determined using The Cannon. Red solid lines correspond to the linear regression fits of abundance-age trends found by Spina et al. (2016) and Bedell et al. (2018). Age estimates are taken from GALAH DR3 (Buder et al. 2021).

Figure 8. Node support percentage for the MCC trees of Fig. 9. In gray, support percentages for the MCC tree built using GALAH DR3 data. In orange, support percentages for the MCC tree built using The Cannon data.

Figure 9. Maximum Clade Credibility (MCC) trees from a sample of 2 000 trees for 201 stars of different eccentricity groups, with tips coloured by eccentricity. On the left, the MCC tree obtained with GALAH labels. On the right, the MCC tree obtained with The Cannon labels. Stars in the selected eccentricity groups are numbered at the tips from 0 to 200: 0 is the fixed branch placed at the most eccentric solar twin in the catalogue (eccentricity of 0.63), 1-100 are the next most eccentric stars and 101-200 are the less eccentric stars in the catalogue (see Tab. C1). The selected clans A-B (GALAH) and C-D (The Cannon) are indicated by the coloured areas.

Figure 10. Age-[Y/Mg] relations of the stars in the clans selected from the trees of Fig. 9. Linear regression fits are drawn to quantify the slopes of these relations. The trend found by Nissen et al. (2020) is displayed as a dashed black line. [Y/Mg] errors are taken from the internal uncertainties reported by GALAH DR3 and by our The Cannon catalogue. Age errors are taken from GALAH DR3 (Buder et al. 2021).
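
The slopes quoted for such age-[Y/Mg] relations come from straight-line fits. The sketch below shows one way to obtain them, assuming an unweighted ordinary least-squares fit; the caption does not say whether the fits in the figure are weighted by the abundance or age errors. The function name `fit_chemical_clock` and the numbers in the example are illustrative only and are not taken from the catalogue.

```python
import numpy as np


def fit_chemical_clock(age_gyr, y_mg):
    """Fit [Y/Mg] = intercept + slope * age with unweighted least squares.

    Returns (slope, intercept); slope is in dex per Gyr if age is in Gyr.
    """
    age_gyr = np.asarray(age_gyr, dtype=float)
    y_mg = np.asarray(y_mg, dtype=float)
    # np.polyfit returns coefficients from the highest degree downwards.
    slope, intercept = np.polyfit(age_gyr, y_mg, deg=1)
    return float(slope), float(intercept)


# Synthetic, noise-free example with arbitrary coefficients (not real data).
ages = np.linspace(1.0, 10.0, 10)
y_over_mg = 0.1 - 0.02 * ages
slope, intercept = fit_chemical_clock(ages, y_over_mg)
```

A fit that accounts for the errors in both axes would need a different estimator, so the slopes from this sketch should be read as a first approximation.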

Figure 11. Age-[Y/Mg] relations of the stars in Clans A (GALAH DR3) and C (The Cannon) selected from the trees of Fig. 9. Linear regression fits are drawn to quantify the slopes of these relations. A linear fit using all low-eccentricity stars selected for the analysis is displayed as a solid black line. The trend found by Nissen et al. (2020) is displayed as a dashed black line. [Y/Mg] errors are taken from the internal uncertainties reported by GALAH DR3 and by our The Cannon catalogue. Age errors are taken from GALAH DR3 (Buder et al. 2021).

Figure B1. Top panel: bias in the training set as a function of training size. Bands represent the 16th and 84th percentiles of the values obtained for the 10 different models trained at each training size. Bottom panel: median bias in the test set as a function of training size. For better visualisation of the results, effective temperature is not shown here.
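
The bands in this figure summarise the spread over repeated models at each training size. A minimal sketch of that summary step follows; it assumes the bias values have already been computed (as defined in Appendix B) and are stored per training size. The function name `bias_bands` and the input layout are hypothetical.

```python
import numpy as np


def bias_bands(bias_by_size):
    """Return {training size: (p16, median, p84)} over repeated models.

    bias_by_size maps each training size to the bias values obtained from
    the models trained at that size (10 per size in the figure).
    """
    bands = {}
    for size in sorted(bias_by_size):
        values = np.asarray(bias_by_size[size], dtype=float)
        p16, p50, p84 = np.percentile(values, [16, 50, 84])
        bands[size] = (float(p16), float(p50), float(p84))
    return bands


# Synthetic example: values 0..100, so each percentile equals its rank.
bands = bias_bands({150: np.arange(101)})
```

The 16th and 84th percentiles bracket the central 68 per cent of the values, which matches a ±1σ interval only if the distribution is Gaussian.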

Figure B4. One-to-one comparison of the stellar parameters T_eff, log g and [Fe/H] for the 3 setups/models. Model 1: 150 spectra of minimum SNR of 132 (left column), Model 2: 100 spectra of minimum SNR of 132 (middle column) and Model 3: 150 spectra of minimum SNR of 117 (right column). The x axis shows GALAH DR3 labels and the y axis shows The Cannon model estimates for the labels. Outer dashed lines correspond to 2σ boundaries for Model 1 and 3σ boundaries for Models 2 and 3. Overestimates are plotted in red, underestimates in blue. The upper left and bottom right of each panel show the median and standard deviation σ of the difference, and the number of outliers found outside each boundary, respectively.
The first one was to remove all 50 outliers found in the self-test and train a new The Cannon model with the remaining 100 stars of minimum SNR 132 (hereafter Model 2). The second one was to remove the outliers of the first model and lower the SNR threshold to add more stars, building a new training set of size 150 with which to train a new The Cannon model. In this second case the new model produced new outliers, so we repeated the process, removing these outliers and lowering the minimum SNR threshold to build a new training set of size 150. After 4 iterations we converged to a The Cannon model trained with a set of 150 stars with minimum SNR of 117 (hereafter Model 3).
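
The iterative procedure that leads to Model 3 can be sketched as a simple loop. This is an illustration under stated assumptions rather than the pipeline used here: `select_training_set` is a hypothetical name, and `find_outliers` is a hypothetical callable standing in for "train The Cannon on this set, run the self-test and return the outlying stars". We also assume that candidates are added in order of decreasing SNR, which is what lowering the minimum SNR threshold amounts to.

```python
def select_training_set(snr_by_star, find_outliers, size=150, max_iter=20):
    """Iterate until a training set of `size` stars yields no outliers.

    snr_by_star   : dict mapping star identifier -> SNR
    find_outliers : hypothetical callable; takes a list of star identifiers,
                    trains and self-tests a model, returns the outlier ids
    Returns (training set, minimum SNR of the set, number of iterations).
    """
    # Highest-SNR stars first, so refilling the set lowers the SNR threshold.
    ranked = sorted(snr_by_star, key=snr_by_star.get, reverse=True)
    rejected = set()
    for iteration in range(1, max_iter + 1):
        training = [star for star in ranked if star not in rejected][:size]
        outliers = set(find_outliers(training))
        if not outliers:
            min_snr = min(snr_by_star[star] for star in training)
            return training, min_snr, iteration
        rejected |= outliers  # drop outliers, then refill in the next pass
    raise RuntimeError("no outlier-free training set found")


# Toy example: 300 synthetic stars, three of which always behave as outliers.
toy_snr = {star: 200 - star for star in range(300)}
toy_bad = {3, 10, 151}
training, min_snr, n_iter = select_training_set(
    toy_snr, lambda stars: [s for s in stars if s in toy_bad]
)
```

In the toy example the set size stays fixed at 150 while the minimum SNR drops with each refill, which mirrors the trade-off between training-set size and spectrum quality discussed above.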

Figure B5. One-to-one comparisons of chemical abundances for the 3 setups/models. Model 1: 150 spectra of minimum SNR of 132 (left column), Model 2: 100 spectra of minimum SNR of 132 (middle column) and Model 3: 150 spectra of minimum SNR of 117 (right column). The x axis shows GALAH DR3 labels and the y axis shows The Cannon model estimates for the labels. Outer dashed lines correspond to 2σ boundaries for Model 1 and 3σ boundaries for Models 2 and 3. Overestimates are plotted in red, underestimates in blue. The upper left and bottom right of each panel show the median and standard deviation σ of the difference, and the number of outliers found outside each boundary, respectively.
1 https://datacentral.org.au/services/download/

Figure B6. Improvements with respect to Model 1 for all labels (in percentage). The improvement given by Model 3 is shown as blue bins and the improvement/decay given by Model 2 as red lines.