Carbon- and Oxygen-rich stars in MaStar: identification and classification

Carbon- and Oxygen-rich stars populating the Thermally-Pulsing Asymptotic Giant Branch (TP-AGB) phase of stellar evolution are relevant contributors to the spectra of ~1 Gyr old populations. Atmosphere models for these types are uncertain, due to complex molecules and mass-loss effects. Empirical spectra are therefore crucial, but samples are small due to the short (~3 Myr) TP-AGB lifetime. Here we exploit the vastness of the MaNGA Stellar Library MaStar (~60,000 spectra) to identify C- and O-rich type stars. We define an optical colour selection with cuts of (g-r)>2 and (g-i)<1.55(g-r)-0.07, calibrated with known C- and O-rich spectra. This identifies C- and O-rich stars along clean, separated sequences. An analogous selection is found in the V, R, I bands. Our equation identifies C- and O-rich spectra with F1-scores of 0.72 and 0.74 (out of a maximum of 1), respectively. We finally identify 41 C- and 87 O-rich type AGB stars in MaStar, 5 and 49 of which do not have a SIMBAD counterpart. We also detect a sample of non-AGB, dwarf C-stars. We further design a fitting procedure to classify the spectra into broad spectral types, using empirical C- and O-rich spectra as fitting templates. We find remarkably good fits for the majority of candidates and categorise them into C- and O-rich bins following existing classifications, which correlate with effective temperature. Our selection models can be applied to large photometric surveys (e.g. Euclid, Rubin). The classified spectra will facilitate future evolutionary population synthesis models.


INTRODUCTION
Stellar populations in galaxies include stars across all evolutionary phases, which contribute to the integrated galactic spectrum according to their energetics, timescales of evolution and stellar spectral energy distribution. For calculating stellar population models (e.g. Maraston 2005), which aim at describing the integrated emission of complex stellar systems, a proper account of all stellar species is key. While energetics and timescales are usually known from stellar evolution calculations, stellar spectral energy distributions can be taken either from theoretical model atmosphere calculations (e.g. Kurucz 1979) or from libraries of observed spectra of Milky Way stars (e.g. MaStar, Yan et al. 2019). While both approaches are advantageous in different respects (see e.g. the extensive discussion in Maraston & Strömbäck 2011), the use of empirical spectra is unavoidable for certain stellar types. To this class belong Carbon-rich (hereafter C-rich) and Oxygen-rich (hereafter O-rich) type stars, populating the short yet luminous Thermally-Pulsing Asymptotic Giant Branch (TP-AGB) phase of stellar evolution. The TP-AGB phase, with a maximum fuel consumption for 3 M⊙ stars (Maraston 1998), was predicted and has been shown to be relevant for the spectral modelling of ∼ 1 Gyr populations, such as those found in high-redshift galaxies and local star forming galaxies (Maraston 2005; Maraston et al. 2006; Capozzi et al. 2016; Riffel et al. 2015; Liu & Luo 2023). This is now confirmed spectroscopically by the detection of the strong spectral features typical of C-rich and O-rich TP-AGB stars in the spectra of distant (z ∼ 1 − 2), massive (∼ 10^10 M⊙) quiescent galaxies (Lu et al. 2024).
★ E-mail: lewis.hill@port.ac.uk † E-mail: claudia.maraston@port.ac.uk
The challenges presented by TP-AGB C- and O-rich stars and their spectra are manifold. Firstly, the whole TP-AGB phase only lasts ∼ 3 Myr, meaning that subphases tracing the stellar and envelope evolution along the phase are even shorter. The short timescales imply a paucity of calibrators for theoretical tracks and spectra. Next, C- and O-rich type spectra are characterised by low temperatures, in general emitting little flux below 6000 Å (an exception are Carbon stars of secondary origin, see Sec. 4.3), and by deep molecular absorptions, such as TiO, VO, CO, H2O and CN, which are complicated to model. Finally, the TP-AGB phase is characterised by strong mass-loss and pulsation (on evolutionary and dynamical timescales), which adds further complications as the spectra are not static in most cases (Iben & Renzini 1983; Lançon & Wood 2000; Lançon & Mouhcine 2002; Rosenfield et al. 2016; Höfner & Olofsson 2018). In summary, large stellar samples are needed to deal with C- and O-rich type TP-AGB spectra.
In this work we exploit the latest release of the MaNGA Stellar Library MaStar (Yan et al. 2019, and Yan et al. in prep.), the largest empirical stellar library assembled to date, containing as many as 59,266 good quality spectra for 24,130 unique stars at a median resolution of R ∼ 1800, to hunt for the rare C- and O-rich type TP-AGB spectra. The wavelength range covered by MaStar (3620 − 10350 Å), albeit still confined to the optical, is wide enough to sample the unique features found in this type of spectra and to allow a distinction between Carbon-rich and Oxygen-rich type stars. Below 1 μm, the spectra of late, AGB O-rich stars are mostly characterised by TiO (Titanium Oxide) and VO (Vanadium Oxide) bands (see the discussion and Figure 4 in Lançon & Wood 2000, and references therein). Carbon-rich spectra are quite different from the O-rich ones, being mostly characterised, in the same spectral range below 1 μm, by sharp CN (cyanide radical) bands. Broadly speaking, O-rich spectra display broader, more gentle absorptions, whereas C-rich spectra have harder, sharper features (cf. Figures 2 and 7 in Lançon & Mouhcine 2002). Due to these differences, it may be possible, albeit time-consuming, to visually inspect all MaStar spectra for C- and O-rich type stars and perform a rough classification. On the other hand, we wish to obtain a more accurate, quantitative identification and classification and to define an automated procedure, also in view of large scale surveys that will observe orders of magnitude more spectra than MaStar.
In order to reach this goal, we explore broad-band colours based on the SDSS filters u, g, r, i, z (Fukugita et al. 1996) to identify and separate C- and O-rich type spectra. While the use of only SDSS filters was previously explored in the literature (Margon et al. 2002; Downes et al. 2004; Green 2013), the novelty here is that we adopt previously classified C- and O-rich type spectra (from Lançon & Mouhcine 2002, hereafter LM02) to calibrate the equation of our colour selection cut. This way we obtain a new and calibrated colour selection cut based on g − r and g − i to select C- and O-rich type spectra from any photometric survey. We quantify the efficiency of our colour selection and of literature work (Margon et al. 2002) in terms of purity and completeness, which demonstrates the increased accuracy we obtain.
We then perform spectral fitting of the colour-selected candidate MaStar spectra for C- and O-rich types using our spectral fitting pipeline for MaStar spectra (Hill et al. 2022a), but now adopting empirical C- and O-rich spectra as fitting templates. With this procedure we could also identify a set of non-AGB, dwarf Carbon star spectra of secondary origin.
It should be stressed that our classification metric is based on average spectral types for O-rich and C-rich stars (see LM02, Tables 1 and 2, for details on the component stars in each average), the latter mostly containing standard N-type AGB carbon stars. Hence by their use we can only probe this broad distinction and cannot attempt any finer spectral classification into (rare) Carbon sub-types such as J, R, S (see Wallerstein & Knapp 1998 for a review on Carbon stars). Note also that a proper analysis of these peculiar spectral types requires spectra with a higher spectral resolution than that of MaStar (Section 2) and a wavelength extension into the near-IR, or a combination of both, i.e. spectral indices plus matched near-IR magnitudes as in the Carbon star classification for the LAMOST survey by Ji et al. (2016), or using a combined Gaia-2MASS colour diagram as in Abia et al. (2020, 2022). In our work we make use of just SDSS optical colours and MaStar spectra.
The paper is organised as follows. In Section 2 we summarise the MaStar observations and describe our calculation of synthetic colours. Section 3 presents our new colour equation for the selection of C- and O-rich stars. Section 4 presents a quantitative evaluation of the performance of the new selection and of a previous selection for C-rich stars from the literature. Section 4 also reports the results of the identification. Section 5 presents a spectral classification of the candidates selected via photometry. Summary and conclusions are in Section 6.

Generalities
MaStar (the MaNGA Stellar Library; Yan et al. 2019) is currently the largest empirical stellar library of Milky Way stellar spectra, with extensive optical wavelength coverage and spectral types spanning the H-R diagram. Observations were carried out with the Baryon Oscillation Spectroscopic Survey (BOSS) spectrographs (Smee et al. 2013) mounted on the 2.5 m Sloan Foundation Telescope (Gunn et al. 2006) at the Apache Point Observatory over a 6 year period. The MaStar survey observed in parallel to the APOGEE-2N survey (Majewski et al. 2016), acquiring optical spectra in the same field of view. By using fiber bundles, MaStar can provide spectra with a high median signal-to-noise of 96 per pixel. The resulting catalogue of stellar spectra contains 80,592 per-visit spectra for 27,945 unique stars with the relatively wide wavelength coverage of 3620 − 10350 Å. Spectra are observed at a medium, wavelength dependent resolution of R ∼ 1800, which varies from fiber to fiber and from observation to observation (see Yan et al. 2019; Law et al. 2021, for details on the resolution vector).
The unique observing strategy of MaStar has produced one of the most comprehensive stellar libraries in the literature, with an extensive coverage of the H-R diagram and rare combinations of atmospheric parameters (Chen et al. 2020; Maraston et al. 2020; Hill et al. 2022a,b; Lazarz et al. 2022, Chen et al. 2024, submitted). The full H-R diagram of MaStar can be seen in Figure 12 of Hill et al. (2022a). To achieve this, stars with known stellar parameters from APOGEE (Majewski et al. 2017), LAMOST (Boeche et al. 2018) and SEGUE (Yanny et al. 2009) were targeted to ensure a uniform coverage of atmospheric parameters and element abundance ratios. This was then complemented with photometry from Gaia DR1, Gaia DR2, Pan-STARRS1 (Chambers et al. 2016) and APASS to identify stars in rare and/or extreme parts of the stellar parameter space. Moreover, in early phases MaStar targeting was augmented by manual target selection for rare spectral types. This strategy allowed the sampling of stars in short-lived stellar phases, such as the TP-AGB phase. It should be stressed that MaStar, by its observational strategy, is not meant to be a complete sample of spectra in any stellar phase. Relevant to this paper, MaStar observations did not target C- and O-rich types specifically and hence were not tuned to a star's periodicity. Further details of the MaStar observing strategy and data reduction can be found in Yan et al. (2019) and Yan et al. (in prep).
The MaStar data products are described in Section 6 of Abdurro'uf et al. (2022) and are publicly available to download.
MaStar observations are split into two catalogues named 'goodspec' and 'badspec' (see Yan et al. 2019), containing 59,266 and 21,326 per-visit spectra, respectively. The 'goodspec' catalogue is the standard catalogue of good, ready-to-use spectra, while 'badspec' is a mixed bag of truly bad spectra (due to e.g. missing pixels) and unusual spectra, including emission-line spectra and rare types that were not obviously classified by the visual inspection, which was the main criterion for placing spectra in either catalogue. Due to their peculiar spectra, C- and O-rich type stars may exist in the 'badspec' catalogue, which is why in our analysis we make use of both catalogues. However, in order to avoid using spectra with a large number of missing pixels, we remove spectra with a missing pixel fraction greater than 1% from the analysis. This leaves 20,653 'badspec' spectra.
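The missing-pixel cut described above can be illustrated with a minimal Python sketch. It assumes that missing pixels are marked either as non-finite flux values or through a boolean mask; the actual MaStar pixel flags are not specified here, and the function names are hypothetical.

```python
import numpy as np

def missing_pixel_fraction(flux, bad_mask=None):
    """Fraction of pixels that are missing in one spectrum.

    Assumption: a pixel counts as missing if its flux is NaN/inf or if
    the optional boolean `bad_mask` flags it.
    """
    flux = np.asarray(flux, dtype=float)
    missing = ~np.isfinite(flux)
    if bad_mask is not None:
        missing = missing | np.asarray(bad_mask, dtype=bool)
    return float(missing.mean())

def keep_spectrum(flux, bad_mask=None, max_fraction=0.01):
    """True if the spectrum passes the 1% missing-pixel cut."""
    return missing_pixel_fraction(flux, bad_mask) <= max_fraction
```

Applied to every 'badspec' entry, a filter of this kind would retain only spectra with at most 1% of their pixels missing.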

Synthetic Magnitudes
In order to study a colour classification we need magnitudes for all our spectra, for the calibrating sets of LM02 and XSL17, and for the theoretical Kurucz spectra. While available SDSS magnitudes could be used for the MaStar stars, they need to be calculated from the spectra for the other sets. It is therefore more rigorous to recalculate magnitudes for all spectra, both MaStar and non-MaStar. This way we make sure that all values are homogeneously determined from the spectra (with the same code, response functions and zero points).
MaStar observations are matched to the Gaia EDR3 survey, and 97 per cent of MaStar spectra have corresponding Gaia data (Gaia Collaboration et al. 2021). Using the Bailer-Jones et al. (2021) distances and the 3D dust map values of Green et al. (2019), we correct the spectra for Galactic extinction before calculating the synthetic photometry. This is done using the Fitzpatrick (1999) dust extinction law.
For each spectrum we then calculate the synthetic photometry in the AB system for the SDSS filters u, g, r, i, z (Fukugita et al. 1996). The synthetic photometry is calculated with the code of Maraston (2005), which performs a convolution integral of the spectrum with the response function, normalised to the integral of the response function. We further adjust the filter transmission profiles of u and z to the MaStar wavelength extension (3630 − 10,300 Å), which does not cover u and z entirely. We truncate the standard u-band curve, moving its blue edge from 3040 Å to 3622 Å, and the standard z-band curve, moving its red edge from 11600 Å to 10300 Å. It should be noted that this does not affect our analysis, because the same modified response function is applied to all spectra and because the colour selection does not employ either of the affected bands u or z.
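As an illustration of how a synthetic AB magnitude can be obtained from a spectrum, the following Python sketch uses the common photon-counting definition of the band-averaged flux density. This is an assumption made for the example: it is not the Maraston (2005) code, and its weighting may differ from the response-normalised integral described above. The function names and the optional truncation arguments are hypothetical.

```python
import numpy as np

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom / s

def _trapezoid(y, x):
    """Plain trapezoidal integration of y(x)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def synthetic_ab_magnitude(wave, flux_lambda, filt_wave, filt_response,
                           wave_min=None, wave_max=None):
    """AB magnitude of a spectrum through one filter.

    wave          : wavelengths in Angstrom (increasing)
    flux_lambda   : f_lambda in erg / s / cm^2 / Angstrom
    filt_wave,
    filt_response : filter curve, interpolated onto `wave`
    wave_min/max  : optional truncation of the filter curve, mimicking
                    a filter cut to the spectral coverage
    """
    wave = np.asarray(wave, dtype=float)
    flux_lambda = np.asarray(flux_lambda, dtype=float)
    response = np.interp(wave, filt_wave, filt_response, left=0.0, right=0.0)
    if wave_min is not None:
        response = np.where(wave >= wave_min, response, 0.0)
    if wave_max is not None:
        response = np.where(wave <= wave_max, response, 0.0)
    # Band-averaged f_nu (photon-counting convention, an assumption here).
    numerator = _trapezoid(flux_lambda * response * wave, wave)
    denominator = C_ANGSTROM_PER_S * _trapezoid(response / wave, wave)
    mean_fnu = numerator / denominator
    return -2.5 * np.log10(mean_fnu) - 48.60
```

A colour such as (g − r) then follows as the difference of two such magnitudes computed from the same spectrum.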
In addition to SDSS synthetic magnitudes, we calculate colours in the Johnson-Cousins system. In Section 3, we show how our selection of C- and O-rich spectra forms a unique sequence in a Johnson-Cousins colour-colour plane (Figure 3). Furthermore, we crossmatch all MaStar observations with 2MASS (Skrutskie et al. 2006) to obtain JHKs photometry. This returns 2MASS photometry for 99% of the spectra we use. Details of how the crossmatch is performed can be found in Abdurro'uf et al. (2022) and Yan et al. (in prep).
In Figure 1 we show the distribution of all data in a (g − i) vs. (g − r) diagram. We find stars as red as (g − i) = 6.5 and (g − r) ∼ 4.5 (for reference, in Gaia colours, MaStar contains stars as red as G_BP − G_RP = 4.7). A clear bifurcation is seen at (g − r) > 2, which turns out to coincide with the C- and O-average colours, as we shall explain in Section 3. To gain an overview of all five magnitudes, we plot their pairwise relationships in Figure A1. The diagonal plots in grey show the distribution of each magnitude as a kernel density estimate. As expected, magnitudes that are closer in wavelength have a tight relationship and those at opposite ends of the spectrum, such as u and z, are less correlated.

DEFINITION OF A COLOUR SELECTION EQUATION
Here we study a selection in optical colour space able to identify and separate TP-AGB C and O stars and to distinguish them from other stellar species by virtue of their very red colours and peculiar spectra. As mentioned in the Introduction, a similar approach for the selection of C-stars is found in Margon et al. (2002). In Section 4.1 we shall compare ours to this previous effort.
In order to obtain a colour selection cut able to identify C- and O-rich type TP-AGB spectra, we start by calculating the SDSS colours of known C- and O-rich type TP-AGB spectra. We use the Lançon & Mouhcine (2002, hereafter LM02) C- and O-rich type spectra, which are average spectra for these stellar types, built in order to remove variability. These are binned into 5 bins for C-types and 9 bins for O-rich types, with bin 1 the warmest in both cases and bins 5 and 9 the coldest for C-types and O-rich types, respectively. These average spectra are used to describe the TP-AGB phase in the stellar population models by Maraston (2005). We also use colours from the C-type spectra identified by Gonneau et al. (2017, hereafter XSL17) in the X-Shooter Library.
After exploration of combinations of the five SDSS magnitudes, we conclude in favour of the (g − i) vs. (g − r) colour-colour plane. This combination best allows for the identification of C- and O-rich type spectra and the separation between the two types, and minimises contamination from other spectral types. For the selection of C-type stars, we suggest a colour selection as follows:
(g − r) > 2 and (g − i) < 1.55 (g − r) − 0.07. (1)
For O-rich type stars we use a similar colour relation, where the second inequality is reversed:
(g − r) > 2 and (g − i) > 1.55 (g − r) − 0.07. (2)
The calibration is visualised in Figure 2, where blue and red filled circles are the 9 O-rich type and 5 C-type averaged spectra from LM02, red open circles are the colours of classified C spectra from XSL17 and grey points represent MaStar spectra. Also plotted as black symbols are colours of standard Kurucz spectra from atmosphere models (Kurucz 1979; Lejeune et al. 1998), further divided between giants (log g < 2.5), dwarfs (log g > 4.7) and intermediate types (2.5 < log g < 4.5). We note that some Kurucz giants are offset from the main locus of stars; this does not affect our analysis. The blue line, described by Equations 1 and 2, was determined by firstly fitting a linear model to each sequence of C- and O-rich types from LM02 and XSL17. The average value of the gradient of these two lines was then calculated to return the blue line as plotted. A clear separation is visible between C- and O-rich type spectra, with the two sequences lying roughly parallel in this colour plane. Furthermore, we can see that the reddest MaStar spectrum corresponds to the coolest LM02 C-type bin 5. The C-type colour sequence is unique, while some contamination from dwarfs is found in the O-rich type sequence. In Section 4.1 we shall quantify the contamination in both sequences. Therefore we conclude that MaStar is able to cover the same main broad spectral types of O-rich and C-rich stars as dedicated optical plus near-IR surveys, such as that of Lançon & Wood (2000), as well as most of the XSL17 C-stars. We shall return to this point in the Conclusions (Section 6).
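The colour cuts (g − r) > 2 with (g − i) below or above 1.55 (g − r) − 0.07 can be applied to a catalogue of magnitudes as in the following minimal Python sketch. The function name and the label strings are hypothetical, and objects lying exactly on the dividing line are left unclassified here, which is a choice made for this example.

```python
import numpy as np

SLOPE = 1.55       # gradient of the dividing line
INTERCEPT = -0.07  # intercept of the dividing line
GR_MIN = 2.0       # red cut in (g - r)

def classify_by_colour(g, r, i):
    """Label each object as 'C', 'O' or 'other' from g, r, i magnitudes."""
    g = np.atleast_1d(np.asarray(g, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    i = np.atleast_1d(np.asarray(i, dtype=float))
    gr = g - r
    gi = g - i
    boundary = SLOPE * gr + INTERCEPT
    red = gr > GR_MIN
    labels = np.full(gr.shape, "other", dtype=object)
    labels[red & (gi < boundary)] = "C"  # C-type side of the line
    labels[red & (gi > boundary)] = "O"  # O-rich side of the line
    return labels
```

For example, an object with (g − r) = 3 and (g − i) = 4 falls on the C-type side, since the dividing line at (g − r) = 3 sits at (g − i) = 4.58.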
To conclude, the existence of unique colour sequences tracing C-rich and O-rich spectra is not a prerogative of SDSS colours. Figure 3 shows the corresponding sequences in Johnson-Cousins colours. The R-filter (effective wavelength at ∼ 6400 Å) encompasses a relatively smooth region of the continuum for both types, which is why their colours are similar, but note the extension towards (V − R) colours for the later (i.e. coolest) O-rich types.

Quantitative assessment of different colour cuts.
In this work we present a new colour equation for selecting C- and O-rich type stars. Our equation only uses SDSS photometric bands and is therefore cheap, requiring neither near-IR photometry nor spectra, and can be easily applied to a variety of galaxy surveys. In this section we quantitatively assess the power of our classification method and of the Margon et al. (2002) colour equation for C-types, using a homogeneous test sample.
To measure the effectiveness of the colour cut classifiers, we use the F1 and F-beta score metrics. The F1 score is the harmonic mean of precision and recall and is a useful metric when working with class labels that are not equally represented. In our case, there are only a few C- and O-rich type spectra compared to all other stellar types, making our class labels imbalanced. The F1-score metric is defined as:
F1 = 2 × (precision × recall) / (precision + recall),
where precision is the number of true positives as a fraction of all positive predictions and recall is the number of true positives as a fraction of all data points in the target class, where the target class is labelled C- or O-rich type spectra. For this data, a true positive is the case where a C- or O-rich type spectrum is predicted successfully by the classifier. Additionally, we consider the F-beta score, which allows one to adjust the relative importance of precision and recall in the score calculation. The F-beta score is defined as:
Fβ = (1 + β²) × (precision × recall) / (β² × precision + recall).
Since we are interested in how well the classifiers can select a specific class of stars out of all observations, we set β to 0.5, which makes the output of the metric more sensitive to precision. The best value for the F1 and F-beta scores is 1 and the worst is 0. The confusion matrices for the data represented in Figure 4 are shown in Figure 5.
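The two metrics can be computed directly from the entries of a confusion matrix. The short Python sketch below uses the standard definitions, with precision = TP / (TP + FP) and recall = TP / (TP + FN); the counts in the usage note are purely illustrative and are not results from this work.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta = 1 gives the F1 score, beta < 1 favours precision."""
    precision, recall = precision_recall(tp, fp, fn)
    denominator = beta**2 * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1.0 + beta**2) * precision * recall / denominator
```

With hypothetical counts of 6 true positives, 2 false positives and 4 false negatives, precision is 0.75 and recall is 0.60, so `f_beta(6, 2, 4)` is about 0.67 while `f_beta(6, 2, 4, beta=0.5)` is about 0.71, showing how β = 0.5 rewards the higher precision.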
In Table 1 we show the mean F1 and F-beta scores. Focusing on C-star methods, as this class is covered by both approaches, the Margon et al. (2002) method returns a mean F1-score of 0.27, while our cut of (g − r) > 2, (g − i) < 1.55 (g − r) − 0.07 returns 0.72. This trend is repeated for the mean F-beta scores. Therefore the method presented in this paper is an improvement over the Margon et al. method. However, as mentioned, their method does capture the colour region where non-AGB C stars exist; this may be useful for general C star identification, but not for our analysis. For O-rich type stars, the calibrated colour equation method we put forward returns a mean F1-score of 0.74.
Finally, a confusion matrix for each classifying method is provided in Figure 5. The values correspond to the results shown in Figure 4. In Table C1, a sample of C- and O-rich spectra is listed, where the row number matches the order in which the spectra are plotted, going from left to right, top to bottom. The full table can be downloaded at http://www.icg.port.ac.uk/mastar/. In the same table we provide the MaStar identification, median signal-to-noise ratio per pixel, star coordinates in degrees, calculated colours in both the SDSS and Johnson-Cousins systems, whether the spectrum is from the badspec catalogue (flag=1) or not (flag=0), the C- or O-rich classification and the class given by full-spectral fitting with LM02 or XSL17 templates (see Section 5).

SPECTRAL CLASSIFICATION INTO C AND O-RICH SUBTYPES.
So far we have identified a set of candidate C-rich and O-rich spectra by only using photometry, then confirmed their nature via visual classification and quantitative metrics (plus existing tools like SIMBAD). Here we perform a spectral classification of these photometrically selected candidates. The distinction between C-rich and O-rich type spectra is relatively straightforward and differentiation can be done by eye using the prominent bands of VO, C2, CN and TiO. However, to gain an estimate of atmospheric properties such as temperature, it is necessary to use the spectral energy distribution, through spectral fitting or by photometry matching. For the cold C- and O-rich type spectra, the spectral features that are most sensitive to changes in effective temperature and chemical abundance are located in the near-IR. For example, the C2H2 and C3 features at 3 μm and 5.1 μm are found to be most sensitive to T_eff (Paladini et al. 2011) and the C/O ratio (Jørgensen et al. 2000). For this reason, in our paper we use spectra that have been classified into temperature bins based on their near-IR photometry as templates for spectral fitting in the wavelength range of MaStar spectra.

Empirical Templates
In order to classify the identified spectra, we compare our observations with other classified empirical spectra. We compare with two libraries of lower (R ∼ 1100) and somewhat higher (R ∼ 2000) spectral resolution that bracket the MaStar median resolution of R ∼ 1800. The lower resolution library is provided by LM02, whose colours we have used in Section 3 and who, we recall, provide averaged spectra of C- and O-rich type stars from the Milky Way and Magellanic Clouds. We focus on this library in particular as it has been used in the Maraston models ever since it was first introduced in Maraston (2005). The averaged spectra are derived using individual stellar spectra as described in Lançon & Wood (2000), at a resolution of R ∼ 1100 and over the wavelength range 0.5 − 2.4 μm. Averaged spectra are motivated by the uncertainties in individual stars arising from thermal pulses, stellar wind and mass loss. LM02 sort 63 individual spectra of O-rich type stars into 9 bins of 7 spectra each, with bin number increasing with decreasing effective temperature, from T_eff = 3930 K for the first bin O1 to T_eff = 2430 K for the last bin O9 (Table 3 of LM02). These bins are sorted based on the broad band colour (I − K), which they find to correlate well with effective temperature (see Figure 11 of LM02). For C-type stars, 5 bins are used, where bins 1 − 3 contain 6 spectra each. Bin 1 includes spectra for one S/C-type star. Bin 4 contains the spectrum of R Lep at maximum light (see footnote 11) and bin 5 the average of 3 spectra of R Lep at minimum light. They sort these based on the broad band colour (R − H) and C/O abundance ratios. Unfortunately we do not have such abundance ratio information for MaStar spectra, so a similar binning approach is not possible. The correspondence to temperature for the C-bins is T_eff = 3200 K for the first bin C1 to T_eff = 2000 K for the last bin C5 (Table 3 of LM02).
The second empirical library used in the classification is provided by the X-Shooter Spectral Library (Gonneau et al. 2017, hereafter XSL17). They provide spectra of only C-type stars at a resolution R ∼ 2000 and over the wavelength range 0.4 − 2.4 μm. To estimate atmospheric properties, XSL17 fit their observations with theoretical spectra based on the models of Aringer et al. (2009) that were calculated for their analysis. Their C-type stars are binned into four groups based on broad band colours: the bluest in group 1 with (J − Ks) < 1.2, group 2 with 1.2 < (J − Ks) < 1.6 and the reddest in groups 3 and 4 with (J − Ks) > 1.6. The distinction between the reddest groups is whether the 1.53 μm absorption feature is present. Without this feature, stars are placed in group 3 and all other red stars are placed in group 4. As our wavelength range does not cover this feature, we aggregate these groups and label all templates with (J − Ks) > 1.6 as group 3.
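The grouping of C-type templates by (J − Ks) colour, with groups 3 and 4 aggregated as described above, can be written as a short Python sketch. The function name is hypothetical, and the treatment of colours lying exactly on a boundary (assigned to the redder group here) is an assumption, since the text only quotes strict inequalities.

```python
def xsl17_group(j_minus_ks):
    """Aggregated XSL17 group (1, 2 or 3) for a given (J - Ks) colour.

    Groups 3 and 4 of XSL17 are merged into group 3, because the
    1.53 micron feature that separates them lies outside the optical range.
    """
    if j_minus_ks < 1.2:
        return 1
    if j_minus_ks < 1.6:
        return 2
    return 3
```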

Spectral Fitting Method
In this paper we adapt the spectral template fitting method that we developed for MaStar spectra, and which we recall below, to work with the empirical templates described in Section 5.1. Our method to determine stellar parameters (surface gravity log g, effective temperature T_eff and metallicity [Fe/H]) for the observed MaStar spectra is based on a single stellar template fitting approach over the whole wavelength extent of the MaStar spectra, using theoretical stellar spectra from model atmospheres as fitting templates. This method, first described in Maraston et al. (2020), was later developed to incorporate MCMC sampling in Hill et al. (2022a) and also expanded to add the estimation of the [α/Fe] parameter alongside the fundamental stellar parameters quoted above (Hill et al. 2022b). The full spectral fitting is performed using the penalized pixel-fitting method (pPXF; Cappellari & Emsellem 2004; Cappellari 2017) to fit templates to data. This algorithm performs a χ² minimisation by comparing the observed spectra with the templates. The minimum χ² is found by solving the quadratic programming problem described by equation 20 in Cappellari (2017). Furthermore, the algorithm adds a penalisation term as the line of sight velocity distribution (LOSVD) deviates from a Gaussian shape. This reduces the noise in recovered kinematics when fitting for stellar populations and is the original functionality of the algorithm. However, in the case of single stars, the parameterisation of the LOSVD can be used to correct for velocity offsets in the observations. Furthermore, by using multiplicative polynomials we can normalise the continuum and account for inaccuracies in spectral calibration caused by dust. This is particularly important for AGB stars, which can have dusty envelopes.
For the estimation of stellar parameters in this article we adopt the single template approach due to the scarcity of templates and the discreteness of their parameters. As fitting templates we adopt the two sets described in Section 5.1, namely the LM02 empirical binned C-rich and O-rich spectra and the XSL17 empirical binned C-rich spectra. Therefore it is important to stress that with this fitting we do not determine explicit stellar parameters; rather, we determine the best fitting bin, which in turn correlates with temperature via near-IR colours as discussed in the papers where the binning is proposed, namely Lançon & Mouhcine (2002) and Gonneau et al. (2017).
11 The optical part of the spectrum is an average of bins 3 and 5.
After correcting the observed spectra for Milky Way reddening (as described in Section 2.2), we treat each of them as a single star, assuming that no binaries exist in either templates or observations. For the comparison with LM02 templates, MaStar spectra are first convolved to match the template resolution of R ∼ 1100. When comparing to XSL17, the resolution is close enough to that of MaStar that we do not require this step. We then minimise the χ² between each template and the observation. By selecting the template with the lowest χ² we can determine the spectral type as assigned by either LM02 or XSL17. We stress that the two sets of templates are used for independent χ² fitting. In Section 5.3 we show and discuss the results of our spectral fitting analysis.
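The selection of the best-fitting bin by minimum χ² can be illustrated with a deliberately simplified Python sketch. It is not pPXF: it assumes that the observed spectrum and all templates are already on a common wavelength grid and at a matched resolution, fits only a single multiplicative scale factor per template, and ignores velocity offsets and multiplicative polynomials. The function and template names are hypothetical.

```python
import numpy as np

def best_template(flux, error, templates):
    """Return the name of the minimum-chi2 template and all chi2 values.

    flux, error : observed spectrum and its 1-sigma uncertainties
    templates   : dict mapping a bin name to a template flux array
    """
    flux = np.asarray(flux, dtype=float)
    weights = 1.0 / np.asarray(error, dtype=float) ** 2
    chi2 = {}
    for name, template in templates.items():
        template = np.asarray(template, dtype=float)
        # Analytic best-fit scale factor for a linear amplitude.
        scale = np.sum(weights * flux * template) / np.sum(weights * template**2)
        chi2[name] = float(np.sum(weights * (flux - scale * template) ** 2))
    best = min(chi2, key=chi2.get)
    return best, chi2
```

Running the same function separately on the LM02 and on the XSL17 template sets would mirror the independent fitting with the two libraries described above.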

Results of Spectral Fitting
In Figure 7 we show example spectral fits of C (top row) and O (bottom row) stars using the LM02 averaged spectra as templates. The top left panel shows an example of the warmest LM02 bin fitted to a MaStar C star spectrum, and the top right panel shows one of the coolest C spectra in our library, fitted with LM02 bin 5. Below are example fits using the LM02 O-type spectra as templates; the warmer bin 3 and the cooler bin 7 are shown. Not all O-type spectral fits were successful, due to noise and our limited template bank. We therefore omit these poor fits, identified by their χ², from the final catalogue. In Figure 8 we show example fits of C star spectra to the XSL17 data. In general, the XSL17 spectra work well as templates and the assigned bin corresponds with the SED shape of each spectrum. A sample of C- and O-rich type MaStar IDs and their corresponding template fit from either LM02 or XSL17 is given in Table B1.
Using the synthetic photometry in the Johnson-Cousins system (BVRI) and the 2MASS JHKs photometry, we are able to assign each spectrum to the bins defined in LM02 for C- and O-rich types. Specifically, C-types are categorised based on their (R − H) colour and O-rich types using (I − K). In Figure 9, we compare this colour-based bin number to the result of the spectral fitting approach previously described. This shows some correlation between the two methods. This result is encouraging, as it shows that the spectral fitting method using the binned templates is accurate to within a few bins for most spectra and can be used as an alternative to near-IR colours when these are not available. We note a preference towards later bins, i.e. colder temperatures, when using the colour binning. This may be due to the fact that the spectral fitting is performed on the MaStar wavelength range, which, being limited to 10,300 Å, fails to assess the actual spectral type for the coldest spectra, whose energy emission peaks further from the maximum wavelength covered by MaStar. It will be interesting to test this speculation when near-IR spectra for our sample become available.

Effect of spectral variability.
In our analysis we consider each individual spectrum, because we aim at classifying each spectrum collected by MaStar. This means we consider different spectra for the same star. Analysing average spectra for each star could be pursued in the future.
Here we report our findings regarding spectral variability. Of the 69 C stars and 118 O stars we have identified, 88 stars have multiple observations. We do not find any instance where magnitudes are identical. We checked whether and how the spectral classification changes between multiple observations. With respect to the X-Shooter templates, we find 2 stars which have a variable classification between observations. The uncertainty is one classification interval in the XSL classification scheme. With respect to the Lançon & Wood templates, we find 5 stars that vary between observations. These have larger variations between observations, which could be due to the lower resolution of the Lançon & Wood templates or to intrinsic variations.
In total, we find 7/88 stars whose spectral classification depends on the spectrum visit adopted.This is less than 10%, which supports our suggestion that stellar variability is not much of an issue in the optical.Detailed stellar IDs and classification bins are reported below.
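The bookkeeping behind this check can be sketched in a few lines of Python. The input layout (one `(star_id, bin_number)` pair per visit) and the function names are assumptions made for the illustration, and the example uses toy identifiers rather than stars from this work.

```python
from collections import defaultdict


def variable_classifications(visits):
    """visits: iterable of (star_id, bin_number), one entry per spectrum.

    Returns (multi, variable): the stars observed more than once, and the
    subset of those whose assigned bin differs between visits.
    """
    bins_by_star = defaultdict(set)
    n_visits = defaultdict(int)
    for star_id, bin_number in visits:
        bins_by_star[star_id].add(bin_number)
        n_visits[star_id] += 1
    multi = [s for s, n in n_visits.items() if n > 1]
    variable = [s for s in multi if len(bins_by_star[s]) > 1]
    return multi, variable


def variable_fraction(visits):
    """Fraction of multiply observed stars with a visit-dependent bin."""
    multi, variable = variable_classifications(visits)
    return len(variable) / len(multi)


# Toy example: star "a" is stable, star "b" changes bin, star "c" has one visit.
toy_visits = [("a", 1), ("a", 1), ("b", 2), ("b", 3), ("c", 4)]
```

With the numbers quoted above, 7 out of 88 stars corresponds to a fraction of about 0.08.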

Detection of non-AGB, dwarf Carbon-type spectra
During our spectral analysis we discovered a number of spectra with Carbon features but a higher temperature, as qualitatively deduced from their spectra. These are "non-AGB" or dwarf carbon stars: they sit on the main sequence and are thought to form from mass transfer in a binary system. In the formation scenario for classical AGB Carbon stars (Iben 1974; Iben & Renzini 1983), Carbon is brought into the stellar atmosphere by successive dredge-up events, in particular the third dredge-up, a recurring event that moves carbon from the inner layers to the atmosphere, thereby increasing the Carbon abundance relative to oxygen. A dwarf, non-AGB C star is instead thought to form when carbon-enhanced material from a classical TP-AGB carbon star is accreted by its dwarf companion. The TP-AGB star eventually evolves to a white dwarf via mass loss and remains undetectable at optical wavelengths, leaving a Main Sequence star with Carbon features (Dahn et al. 1977; McClure & Woodsworth 1990; Roulston et al. 2022).
Figure 10 shows that the spectra of dwarf C-types (black) display similar carbon bands to what is observed in AGB C-types (grey), but with a warmer continuum due to the higher effective temperature.
Non-AGB C-type spectra are not pertinent to AGB modelling (and cannot be fitted meaningfully by the AGB C-type templates we use here, due to a vastly different continuum).
Therefore we do not include them in our analysis. On the other hand, these are unusual spectra that are interesting in themselves and for other astrophysical applications. The IDs of the detected spectra are provided in Figure 10.

SUMMARY AND DISCUSSION
We have exploited the vastness of the SDSS-IV/MaNGA Stellar Library, MaStar (∼ 60,000 spectra, covering the wavelength range 3620 − 10350 Å), in order to hunt for rare C- and O-rich type TP-AGB stars and so facilitate the future modelling of intermediate-age (∼ 1 Gyr) stellar populations. These stars feature unique spectral patterns, which help break well-known degeneracies in the interpretation of integrated galaxy spectra, such as the age/metallicity and age/dust degeneracies (e.g. Maraston 2005; Maraston et al. 2006). This aids the interpretation of extra-galactic stellar population spectra. Strikingly, spectral features typical of C-rich and O-rich stars evolving on the TP-AGB have recently been detected in the spectra of distant (z ∼ 1 − 2), massive (10^10 M⊙), quiescent galaxies (Lu et al. 2024). Galaxy ages determined by fitting stellar population models including the contribution from the TP-AGB phase are indeed around 1 Gyr, and the galaxy spectra cannot be fitted satisfactorily without a TP-AGB contribution, e.g. by changing the model metallicity, age, dust or star formation history. This fulfils the promise that the TP-AGB phase helps break degeneracies in the interpretation of the integrated spectra of stellar systems.
While the low temperatures typical of TP-AGB stars (T eff < 4000 K) are best probed with near-IR surveys capturing the peak of their energy emission and most prominent spectral features (e.g. Lançon & Wood 2000), the MaStar spectra, in the range from 6000 to 10300 Å, already capture CN, TiO and VO bands and, as we show here, allow the identification of C- and O-type spectra using optical SDSS colours alone. In addition, the large number of spectra probed by MaStar allows us to beat the expected low-number statistics for TP-AGB stars due to the short timescales involved (∼ 3 Myr). Our results will allow a cheap and quick identification of these exotic stellar spectra and the exploitation of optical surveys without the need to obtain extra (and usually expensive) near-IR data.
Our method seeks to identify the colour space occupied by these stars in SDSS colours in MaStar. We started by calculating the expected colours of named C and O stars from previously published catalogues. In this way we could robustly identify their colour loci, and found that they stand out from the rest of the spectra in a (g−i) vs. (g−r) colour-colour diagram, displaying a clear bifurcation which further allows a neat separation between the two types. Our newly proposed cuts are: (g−r) > 2 and (g−i) < 1.55(g−r) − 0.07 for C-rich types, and (g−r) > 2 and (g−i) > 1.55(g−r) − 0.07 for O-rich types. When compared with previous results for C stars (Margon et al. 2002), we find our cut to be more effective. This is probably the result of our calibration of the colour selection using the colours of known C- and O-type spectra. Furthermore, and new with respect to these past efforts, it also allows the identification of O-rich type spectra, albeit with more contamination.
With our colour-colour cuts, (g−r) > 2 combined with (g−i) < 1.55(g−r) − 0.07 for C-types and with (g−i) > 1.55(g−r) − 0.07 for O-rich types, we have identified 41 AGB C-type stars, which are represented by 69 spectra as some stars are observed multiple times, and 87 O-rich type stars, for which MaStar contains 118 spectra. Of the C and O stars identified here, 5 C stars and 49 O stars do not have a SIMBAD classification and are therefore new identifications from our analysis, which would be interesting to confirm with other observational facilities in the future.
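The colour cuts can be written as a short Python function. This is a minimal sketch of the selection equations only: the function name and the "other" label are hypothetical, magnitudes are assumed to be SDSS g, r, i, and the confirmation steps described in the paper (SIMBAD cross-match, visual inspection, removal of dwarf C-types) are not included.

```python
def classify_colour(g, r, i):
    """Label a spectrum as a C-rich candidate ("C"), an O-rich candidate ("O")
    or "other", using (g-r) > 2 and the line (g-i) = 1.55 (g-r) - 0.07."""
    g_r = g - r
    g_i = g - i
    if g_r <= 2.0:
        return "other"  # bluer than the bifurcation, not selected
    boundary = 1.55 * g_r - 0.07
    if g_i < boundary:
        return "C"
    if g_i > boundary:
        return "O"
    return "other"  # exactly on the dividing line


# Illustrative magnitudes only, not measurements from MaStar.
print(classify_colour(20.0, 17.5, 17.0))  # (g-r)=2.5, (g-i)=3.0 -> "C"
print(classify_colour(20.0, 17.5, 15.0))  # (g-r)=2.5, (g-i)=5.0 -> "O"
```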
Further, we put forward a new approach to spectral classification for C- and O-rich spectra. This is based on the same full spectral fitting approach we developed for determining the atmospheric stellar parameters for MaStar (Hill et al. 2022a), but in this case we use as fitting templates the empirical C and O spectra of Lançon & Mouhcine (2002) and Gonneau et al. (2017). In this way we are able to assign to each C- and O-rich type MaStar candidate a spectral class that correlates with effective temperature. We also determine the spectral type from the colour bins defined in Lançon & Mouhcine (2002) and compare it with the result of the spectral fitting method. We find some correlation between the two methods, and a preference towards colder temperatures when using the colours to determine the spectral type.
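To make the idea of template-based classification concrete, the sketch below picks the template bin that minimises a weighted chi-square after fitting a single flux scale factor. This is a deliberately simplified stand-in, not the full spectral fitting procedure of Hill et al. (2022a); it assumes that the spectrum and all templates are already on a common wavelength grid and resolution, and all names are hypothetical.

```python
def best_template(flux, error, templates):
    """Return (bin_number, chi2) of the best-matching template.

    flux, error: sequences of equal length (observed spectrum and 1-sigma errors).
    templates: dict mapping bin number -> template flux on the same grid.
    For each template the optimal multiplicative scale is found analytically.
    """
    weights = [1.0 / (e * e) for e in error]
    best_bin, best_chi2 = None, float("inf")
    for bin_number, template in templates.items():
        num = sum(w * f * t for w, f, t in zip(weights, flux, template))
        den = sum(w * t * t for w, t in zip(weights, template))
        scale = num / den  # least-squares amplitude of this template
        chi2 = sum(w * (f - scale * t) ** 2
                   for w, f, t in zip(weights, flux, template))
        if chi2 < best_chi2:
            best_bin, best_chi2 = bin_number, chi2
    return best_bin, best_chi2


# Toy data: the "observed" flux is exactly twice template 2.
toy_templates = {1: [1.0, 2.0, 3.0, 4.0], 2: [4.0, 3.0, 2.0, 1.0]}
toy_flux = [8.0, 6.0, 4.0, 2.0]
toy_error = [1.0, 1.0, 1.0, 1.0]
```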
Our work focuses on optical spectra, which is what is available in MaStar. On the other hand, galaxies emit most of their energy at optical/near-IR wavelengths, therefore for studying galaxy evolution it is important to include the contribution of stellar phases at these wavelengths. As is well known, TP-AGB stars emit most of their light in the near-IR, and the coldest and dustiest among them, which correspond to the shortest-lived phases, at even longer wavelengths (e.g. Dell'Agli et al. 2017). In this sense our work complements long-wavelength surveys explicitly dedicated to the study of AGB stars, e.g. the Nearby Evolved Star Survey (NESS), a (sub-)mm, multi-band, volume-limited survey of mass-losing AGB stars. It will be interesting in the future to crossmatch our detections with other catalogues of C- and O-rich spectra, such as those mentioned in the Introduction (Ji et al. 2016; Abia et al. 2020, 2022), and with NESS in order to reconstruct the multi-wavelength spectrum of TP-AGB stars.
The C- and O-rich TP-AGB spectra that we have identified here will enable a more accurate definition of young and intermediate-age MaStar-based stellar population model spectra (Maraston et al., in prep.). In addition, the colour cut we developed can be applied to other stellar and galaxy/cosmological surveys employing the same, or similar, filters, either to detect even more C- and O-type spectra or to study stellar contamination in high-redshift galaxy surveys.

Table C1 (caption, continued): the order matches the spectra in Figures B1 and B2, reading from left to right and top to bottom. The full table can be downloaded at http://www.icg.port.ac.uk/mastar/. Included are the MaNGA ID that represents each star in MaStar (1); the MaStar plate number that holds the fiber bundles (2); the MaStar integral field unit (IFU) number that represents the fiber bundle that collected the spectrum (3); the MaStar modified Julian date (MJD) number that represents the date of observation (4); the median signal-to-noise ratio per pixel (5); the sky position in degrees (6-7); the colours used in our analysis (8-10); the colours in the Johnson-Cousins system (11-13); whether the spectrum is from the badspec catalogue, where '1' is true and '0' is false (14); the spectral type (15); the bin number according to our spectral fitting using Lançon & Mouhcine (2002) and Gonneau et al. (2017), respectively (16-17).

Figure 1. Distribution of all spectra used in this study, shown as a colour-colour plot in the (g−i) vs. (g−r) plane. Note the bifurcation at (g−r) > 2.

Figure 2. Colour selection for Carbon-rich and Oxygen-rich type stars overlaid on the whole colour-colour plane of MaStar spectra (grey points). Blue and red filled circles plot the 9 O-rich type and 5 C-type average spectra from Lançon & Mouhcine (2002); red open circles are the colours of classified C spectra from the X-Shooter library (Gonneau et al. 2017). The black markers are the calculated colours from Kurucz (1979) model atmospheres, where giants (log g < 2.5) are open circles, dwarfs (log g > 4.7) are filled circles and intermediate types (2.5 < log g < 4.5) are crosses. The blue line represents our separation of C- and O-rich types, following the formula (g−i) = 1.55(g−r) − 0.07. The C spectra colour sequence is unique, while some contamination from dwarfs is found in the O-rich type sequence. The lower right inset plot shows the location of the reddest C spectra from the X-Shooter Library.
Margon et al. (2002) developed a similar colour selection for C stars, based on their separation from the main locus of stars, first proposed in Krisciunas et al. (1998). The Margon et al. colour selection is: 15 < r < 19.5 and (r−i) < −0.4 + 0.64(g−r). In this paper, we test its effectiveness for C star selection in MaStar (see Section 4.1).
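For reference, the Margon et al. selection quoted above translates directly into a boolean test. The function name is hypothetical, and the magnitudes in the comments are illustrative only.

```python
def margon_c_star(g, r, i):
    """True if a star passes the Margon et al. (2002) C-star selection as
    quoted in the text: 15 < r < 19.5 and (r-i) < -0.4 + 0.64 (g-r)."""
    return 15.0 < r < 19.5 and (r - i) < -0.4 + 0.64 * (g - r)


# Illustrative magnitudes only.
print(margon_c_star(19.0, 17.0, 16.5))  # (g-r)=2.0, (r-i)=0.5 < 0.88 -> True
print(margon_c_star(19.0, 14.0, 13.5))  # r outside the magnitude range -> False
```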
Figure 2 also allows us to address the question of how complete the MaStar sample is in C-rich and O-rich stars. The figure shows that MaStar data cover (in colour) all averaged types by LM02 and XSL17. Candidates in MaStar extend to even redder colours (grey points in Figure 2). Therefore we conclude that MaStar is able to cover the main broad spectral types of O-rich and C-rich stars, as do dedicated optical plus near-IR surveys such as the one by Lançon & Wood (2000), and also most of the XSL17 C-stars. We shall return to this point in the Conclusions (Section 6). To conclude, the existence of unique colour sequences tracing C-rich and O-rich spectra is not a prerogative of SDSS colours. In Figure 3 we plot the selected MaStar C- and O-rich type spectra in a (V−I) vs. (V−R) colour diagram, which is the Johnson-Cousins analogue of Figure 2. Well-defined colour sequences trace and distinguish the colour loci of C-rich and of O-rich spectra. Both data from goodspec and from badspec lie on these sequences (filled and open symbols, respectively). The better separation of the two types in (V−I) with respect to (V−R) arises because the I-filter (effective wavelength ∼ 8000 Å) is located around deep and broad TiO and VO absorption bands for O-rich stars, while C-stars, at the same wavelengths, feature narrower CN absorptions (cf. Figure 6). The R-filter (effective wavelength ∼ 6400 Å) encompasses a relatively smooth region of the continuum for both types, which is why their colours are similar, but note the extension in (V−R) for the later (i.e. coolest) O-rich types.

Figure 3. The MaStar sample of C- and O-rich type spectra plotted using the synthetic colours (V−I) vs. (V−R) in the Johnson-Cousins Vega system. C- and O-rich types are represented by red and blue colours, respectively, with data from the badspec sample plotted as open symbols.
Figure 4 visualises the results of the testing, showing the ( − ) vs. ( − ) diagram for the two tested frameworks.The left-hand panel shows the results when using the colour selection for C-stars proposed by Margon et al. (2002), while the right-hand panel shows the C-and O-rich selection using the colour equations we propose (section 3, Figure 2).In each panel, the predicted C spectra are shown as blue open circles while red filled circles show the location of the 35 true C star spectra.In the right-hand panel, where we additionally test for O-rich spectra identification, we show the predicted O-rich type spectra as green open circles while red crosses represent the location of true O-rich type stars.We now report and comment on the quantitative results for the iteration of data shown in Figure 4. Starting from the left panel showing the Margon et al. method, the selection encompasses a large region of the main stellar locus that would be populated by non-AGB, dwarf C stars (see Section 5.5).Nevertheless, 33 of the 35 true C stars are selected, resulting in a recall score of 0.94, but low precision of 0.16.Our newly proposed colour cut (right-hand panel) is able to select all but one C star, returning a recall score of 0.97 and precision of 0.61 due to 20 false positives.It can also identify O-rich type stars in the region 2 < ( − ) < 3 and miscellaneous stellar types in between with a recall of 0.95 and precision of 0.61.
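The scores quoted above follow from the standard definitions of recall, precision and F1-score. The Python sketch below reproduces the recall values from the stated counts and the F1-score from the rounded precision and recall; the function names are hypothetical. The F1-scores reported for our cut (0.72 and 0.74) appear to be means over several test iterations (an assumption based on the caption of Table 1), so they need not coincide with the value computed from a single iteration.

```python
def recall(true_positives, false_negatives):
    """Fraction of true members that are selected."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Fraction of selected objects that are true members."""
    return true_positives / (true_positives + false_positives)


def f1_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2.0 * prec * rec / (prec + rec)


# Margon et al. selection: 33 of the 35 true C stars are recovered.
margon_recall = recall(33, 2)            # about 0.94
margon_f1 = f1_score(0.16, 0.94)         # about 0.27, from the rounded scores

# Cut proposed here: all but one of the 35 true C stars are recovered.
new_cut_recall = recall(34, 1)           # about 0.97
```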

Figure 4. Comparison between colour selection methods. Left: the Margon et al. colour equation for C spectra only. Right: the colour equation presented in this paper for both C and O stars. The Margon et al. method returns an F1-score of 0.27, while the (g−r) > 2, (g−i) < 1.55(g−r) − 0.07 cut returns 0.72 for C-types and 0.74 for O-rich types. Red data points show true C stars and blue circles are the predictions for each method. O-rich types are shown by red crosses and their predictions are represented by green circles.

Figure 5. Confusion matrix showing how each method classified C-types, O-rich types and all other spectra.The values correspond to the results shown in Figure 4.
Using our newly defined colour equation we have identified C-rich and O-rich type candidate spectra in MaStar. Identifications have been confirmed with SIMBAD (https://simbad.u-strasbg.fr/simbad/sim-fbasic) and with visual inspection. In detail, we have identified 41 TP-AGB C-stars, represented by 69 spectra, and 87 O-rich type stars, represented by 118 spectra. Of these, 36 C-rich stars and 38 O-rich stars have a SIMBAD classification. In Figure 6 we show three examples of O-rich spectra in the top row and three examples of C-types in the bottom row. The O-rich types in the top row display the strong TiO bands at 7200 Å, 7700 Å etc. and VO bands at 7400 Å, and show a gradual decrease in T eff from left to right. The defining spectral features of C-type spectra are the sharp CN bands and the absence of molecular bands such as H2O and TiO. As above, the spectra show a decrease in T eff from left to right. Furthermore, within each subplot we show a zoom-in of the spectrum in the wavelength range 4000 − 5000 Å. The C- and O-rich spectra for all stars are shown in Figures B1 and B2, respectively. Spectra from the goodspec file are in black and those from the badspec file are in blue. In

Figure 6. Example O-rich (top) and C-type (bottom) spectra identified in this work. Within each subplot we show a closer view of the spectrum in the wavelength range 4000 − 5000 Å. The title of each subplot shows the MaNGA ID for the corresponding spectrum.

Table 1. Score metrics for the colour selections. For both methods we report the star type, the mean F1-score and the variance. We also show the results of the colour cut used to identify O-rich type stars. The column 'Type' signals whether we are considering C- or O-rich type spectra.

Figure 7. Top row: Example full spectral fits of C-type MaStar spectra using the LM02 averaged spectra as templates. MaStar spectra have been convolved to match the spectral resolution of LM02 (R ∼ 1100). Shown are fits of the warmer bin-1 and cooler bin-5 spectra. Bottom row: Same as above but for bin-3 and bin-7 O-rich type spectra.

Figure 8. Example model fits of C-type spectra using the XSL17 spectra as templates, for the four temperature bins defined in Gonneau et al. (2017).

Figure 9. A comparison of the selected bin type when using colours to determine the spectral type of C- and O-rich types compared to our spectral fitting method. The thresholds for the colour bins are taken from LM02, corresponding to cuts in ( − ) for C-types and ( − ) for O-rich types. The colour of each data point shows where spectra overlap and the red diagonal line represents a perfect correlation.

Figure 10. Dwarf C-type spectra (black) in MaStar that are detected through our methods, but are excluded from the statistics since our focus is on AGB stars. The spectrum of an AGB C-type star (MaNGA ID 60-4074329259731904512) is shown in grey as a comparison in each panel. Note that the magnitude of the flux is not equivalent between the AGB and dwarf types and has been adjusted for the comparison. The title of each subplot shows the MaNGA ID for the corresponding dwarf C-type spectrum.

Figure A1. Pairwise comparison of the five magnitudes used for the C-type and O-rich star classification.

Figure B1. All 69 spectra for the 41 classical C-type stars identified in MaStar. Spectra from the 'goodspec' catalogue are plotted in black and those from 'badspec' in blue.

Figure B2. All 118 spectra for the 87 O-rich type stars identified in MaStar. Spectra from the 'goodspec' catalogue are plotted in black and those from 'badspec' in blue.

Table C1: Table showing various data for a sample of C- and O-rich type spectra. The order matches the spectra in Figures B1 and B2. A value of -99 is used when no data is available.