Cool and Data-Driven: An Exploration of Optical Cool Dwarf Chemistry with Both Data-Driven and Physical Models

Detailed chemical studies of F/G/K -- or Solar-type -- stars have long been routine in stellar astrophysics, enabling studies in both Galactic chemodynamics, and exoplanet demographics. However, similar understanding of the chemistry of M and late-K dwarfs -- the most common stars in the Galaxy -- has been greatly hampered both observationally and theoretically by the complex molecular chemistry of their atmospheres. Here we present a new implementation of the data-driven \textit{Cannon} model, modelling $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] trained on low-medium resolution optical spectra ($4\,000-7\,000\,$\SI{}{\angstrom}) from 103 cool dwarf benchmarks. Alongside this, we also investigate the sensitivity of optical wavelengths to various atomic and molecular species using both data-driven and theoretical means via a custom grid of MARCS synthetic spectra, and make recommendations for where MARCS struggles to reproduce cool dwarf fluxes. Under leave-one-out cross-validation, our \textit{Cannon} model is capable of recovering $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] with precisions of 1.4\%, $\pm0.04\,$dex, $\pm0.10\,$dex, and $\pm0.06\,$dex respectively, with the recovery of [Ti/Fe] pointing to the as-yet mostly untapped potential of exploiting the abundant -- but complex -- chemical information within optical spectra of cool stars.


INTRODUCTION
The Solar Neighbourhood -- and indeed the Universe more broadly -- is dominated by cool dwarf stars of spectral types K and M (e.g. Henry et al. 1994; Chabrier 2003; Henry et al. 2006; Winters et al. 2015; Henry et al. 2018). While Milky Way stars in general are expected to host at least one planet on average (Cassan et al. 2012), cool dwarfs are actually more likely to host small planets than more massive stars (Howard et al. 2012; Dressing & Charbonneau 2015), with many yet undiscovered (Morton & Swift 2014). Enabled by the space-based Kepler (Borucki et al. 2010), K2 (Howell et al. 2014), and TESS (Ricker et al. 2015) missions, exoplanetary astrophysics now has a large and ever-growing set of such systems to study both individually in detail and collectively in a demographic sense.
When presented as such, it is easy to come to the conclusion that cool dwarfs and their planets are as well-understood as their prevalence might imply. In reality though, these stars are intrinsically faint -- especially at optical wavelengths -- and possess complex spectra blanketed by innumerable overlapping molecular absorption features. In the infrared (IR) this absorption is dominated by molecules like H$_2$O, CO, FeH, and OH; and in the optical by oxides like TiO, ZrO, and VO, as well as hydrides like MgH, CaH, AlH, and SiH. Such complexity renders the spectral energy distribution (SED) not just a strong function of temperature, as with Solar-type stars, but also of chemistry, making it difficult to ascribe an accurate or unique set of stellar parameters to any given star. This intense molecular absorption makes 'true' continuum normalisation impossible at optical wavelengths, and poses severe challenges for traditional spectroscopic analysis techniques. As a result, our understanding of the chemistry of cool dwarfs and their planets typically lags far behind that of Solar-type stars.
★ E-mail: adam.rains@physics.uu.se (ADR)
This atmospheric complexity, and the large impact a single molecular species can have on an emergent spectrum, means that the generation of model spectra that accurately match observations has been, and continues to be, a challenge. While model spectra at cool temperatures demonstrate reasonable performance in the near infrared (NIR, e.g. Allard et al. 1997; Baraffe et al. 1997, 1998; Allard et al. 2012), there have long been issues in the optical (e.g. Baraffe et al. 1998; Reylé et al. 2011; Mann et al. 2013c; Rains et al. 2021). The core reason is likely incomplete line lists for dominant sources of opacity, where the impact of not accurately knowing transition wavelengths or line depths can be severe (e.g. Plez et al. 1992; Masseron et al. 2014) -- particularly for TiO (e.g. Hoeijmakers et al. 2015; McKemmish et al. 2019), which dominates absorption in the optical. All this means that, to this day, it is far from simple to produce accurate cool dwarf temperatures, radii, and especially metallicities en masse -- let alone individual elemental abundances.
Given these complexities, it is thus critical to have a set of cool dwarfs of known chemistry to use as benchmarks for testing models or building empirical relations. The widely accepted gold standard is cool dwarfs in binary systems with a warmer companion of spectral type F/G/K from which the chemistry can more easily be determined. This relies on the assumption that both stars formed at the same time and thus have the same chemical composition. Thankfully such chemical homogeneity is now well established for F/G/K-F/G/K pairs (e.g. Desidera et al. 2004; Simpson et al. 2019; Hawkins et al. 2020; Yong et al. 2023), and while there remain edge-cases of chemically inhomogeneous pairs (e.g. Spina et al. 2021) -- possibly the result of planet engulfment -- the level of chemical homogeneity is more than sufficient for the precision of the current state of the art in cool dwarf chemical analysis.
The extreme sensitivity of cool dwarf spectra to stellar chemistry remains present in broadband optical photometry, though this is less the case in the IR, where $K$ band photometry at 2.2 $\mu$m is a comparatively [Fe/H]-insensitive probe of stellar mass ($M_\star$) for isolated main-sequence stars with $M_\star \lesssim 0.7\,$M$_\odot$. This is something that was initially predicted by theory (see e.g. Allard et al. 1997, Baraffe et al. 1998, and Chabrier & Baraffe 2000 for summaries), and later confirmed observationally (Delfosse et al. 2000), and allows for the development of photometric metallicity relations using an optical-NIR colour benchmarked on the aforementioned K/M-F/G/K benchmark systems (e.g. Bonfils et al. 2005; Johnson & Apps 2009; Schlaufman & Laughlin 2010; Neves et al. 2012; Hejazi et al. 2015; Dittmann et al. 2016; Rains et al. 2021; Duque-Arribas et al. 2023). While purely photometric metallicity relations suffer from certain limitations, such as their sensitivity to unresolved binarity or young stars still contracting to the main sequence, they are widely applicable given the volume of data available from photometric surveys like 2MASS (Skrutskie et al. 2006), SkyMapper (Keller et al. 2007), SDSS (York et al. 2000), Pan-STARRS (Chambers et al. 2016), and Gaia (Gaia Collaboration et al. 2016).
Greater metallicity precision can be achieved by using low-resolution spectra and building empirical relations from [Fe/H]-sensitive spectral regions or indices, again benchmarked against K/M-F/G/K binary systems. Such low-resolution spectra contain vastly more information than broadband photometry alone and are relatively observationally cheap to obtain, especially at redder wavelengths (e.g. the NIR bands) where these stars are brighter. The last ∼10 years have seen a number of studies develop such relations, which span a range of spectral resolutions and wavelengths (e.g. Rojas-Ayala et al. 2010, 2012; Terrien et al. 2012; Mann et al. 2013b,c; Newton et al. 2014; Mann et al. 2015; Kuznetsov et al. 2019), and which importantly give rise to a large secondary set of fundamentally-calibrated cool dwarf benchmarks. This proves useful as the wide separation F/G/K-M/K binaries passing the quality cuts necessary to serve as benchmark systems are rarer -- and thus also more distant on average -- making the secondary set of benchmarks the brighter and more populous sample.
Other studies have opted to determine [Fe/H] from model fits to high-resolution spectra. Not only can this give access to unblended atomic lines not accessible for observations made at lower spectral resolution -- especially in the (N)IR -- but it also allows for more detailed testing of the models themselves. These studies span a similarly wide range of optical and IR wavelengths (e.g. Woolf & Wallerstein 2005; Bean et al. 2006a,b; Woolf & Wallerstein 2006; Rajpurohit et al. 2014; Passegger et al. 2016; Lindgren & Heiter 2017; Veyette et al. 2017; Souto et al. 2017; Passegger et al. 2018; Marfil et al. 2021; Cristofari et al. 2022a), and have helped in pushing the boundaries of what we know about cool dwarfs and how best to model and analyse them.
However, despite these advances in the determination of cool dwarf metallicities, it is at best an approximation to assume that their spectra can reliably be parameterised by only three atmospheric parameters in $T_{\rm eff}$, $\log g$, and [M/H] (or [Fe/H], its common proxy). In reality, individual elemental abundances are able to dramatically change the shape of the observed 'pseudocontinuum' -- and thus the measured stellar properties -- via their effect on various dominant molecular absorbers. As a specific example, Veyette et al. (2016) demonstrated that independently changing carbon and oxygen abundances by just ±0.2 dex can result in an inferred metallicity ranging over a full order of magnitude (> 1 dex), with typical metallicity indicators -- like those from low-resolution spectra previously discussed -- showing a strong dependence on the C/O ratio. In cool atmospheres the carbon abundance affects how much oxygen gets locked up in CO, a low energy molecule that preferentially forms, with only the leftover oxygen able to go into other dominant opacity sources like H$_2$O and TiO.
Understanding elemental abundances of cool dwarfs beyond just the bulk metallicity is thus a critically important task. This important work is well underway (e.g. Tsuji & Nakajima 2014; Tsuji et al. 2015; Tsuji & Nakajima 2016; Tsuji 2016; Veyette et al. 2016, 2017; Souto et al. 2017, 2018; Ishikawa et al. 2020; Souto et al. 2020; Maldonado et al. 2020; Ishikawa et al. 2022; Souto et al. 2022; Cristofari et al. 2022b), but more research is needed to fundamentally calibrate the results using a larger set of more chemically diverse binary benchmarks, to do this at the scale of large spectroscopic surveys containing thousands of stars, and to use this knowledge to improve upon the current generation of cool dwarf model spectra.
Data-driven models present another approach to tackling this problem. Provided they are trained on spectra from a set of benchmarks with precise fundamental or fundamentally-calibrated properties, such an approach becomes an effective way of teasing apart the complex chemistry of these stars. Absent the limitations that come with physical models (e.g. incomplete molecular line lists), data-driven models have the potential to turn what is traditionally considered a weakness of cool dwarfs -- strong and innumerable overlapping absorption features from multiple different atomic and molecular species -- into a strength given the sheer amount of information present, assuming of course this chemical information can be properly exploited. This is a particularly important problem to solve in preparation for upcoming massive spectroscopic surveys like 4MOST (de Jong et al. 2019) and SDSS-V (Kollmeier et al. 2017).
Data-driven models like the Cannon (Ness et al. 2015) have been successfully applied to F/G/K stars observed by spectroscopic surveys like GALAH, APOGEE, LAMOST, and SPOCS (e.g. Buder et al. 2018; Ho et al. 2016; Casey et al. 2016, 2019; Wheeler et al. 2020; Rice & Brewer 2020; Nandakumar et al. 2022), often with the goal of inter-survey comparison or the computational speed of data-driven stellar property determination versus more traditional modelling. Other studies have extended this work to cool (and brown) dwarfs using a variety of modelling approaches (Behmard et al. 2019; Birky et al. 2020; Galgano et al. 2020; Li et al. 2021; Feeser & Best 2022) for the recovery of properties like spectral type, $T_{\rm eff}$, $\log g$, [Fe/H], [M/H], $M_\star$, stellar radius ($R_\star$), or stellar luminosity ($L_\star$), with Maldonado et al. (2020) even reporting the impressive recovery of 14 different chemical abundances with Δ[X/H] ≲ 0.10 dex for their sample of K/M-F/G/K binaries. Finally, these models can also be used to explore complex parameter spaces and make new physical insights -- for example wavelength regions sensitive to particular elemental abundances -- something more challenging to do with traditional analysis methods.
Here we present a new implementation of the Cannon trained on low-to-medium resolution (R∼3 000-7 000) optical spectra (4 000 < $\lambda$ < 7 000 Å) of cool dwarfs observed with the WiFeS instrument (Dopita et al. 2007) on the ANU 2.3 m Telescope at Siding Spring Observatory (NSW, Australia). Our four label model in $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] draws its accuracy from a relatively small, but hand-selected, set of 103 stellar benchmarks primarily composed of stars with interferometric $T_{\rm eff}$, [Fe/H] and [Ti/Fe] measurements from a wide binary companion of spectral type F/G/K, or [Fe/H] determined from binary-benchmarked empirical relations based on low-resolution NIR spectra. We use our Cannon model in conjunction with a custom grid of MARCS model spectra (Gustafsson et al. 2008) to investigate the sensitivity of optical cool dwarf fluxes to variations in chemical abundances, as well as limitations in reproducing optical fluxes. Our data and stellar benchmark selection are described in Section 2; our Cannon model and its training, validation, and performance in Section 3; an investigation into cool dwarf optical flux sensitivity to elemental abundance variations using MARCS spectra in Section 4; a discussion of results, comparison to previous work, MARCS flux recovery assessment, and future prospects in Section 5; and concluding remarks in Section 6.

Spectroscopic Data
Our data consist of low and medium resolution optical benchmark stellar spectra observed with the dual-camera WiFeS instrument (Dopita et al. 2007) on the ANU 2.3 m Telescope as part of the spectroscopic surveys published as Žerjal et al. (2021) and Rains et al. (2021). All stars were observed with the B3000 and R7000 gratings using the RT480 beam splitter, yielding low-resolution blue spectra (3 500 ≤ $\lambda$ ≤ 5 700 Å, $\lambda/\Delta\lambda$ ∼ 3 000) and moderate resolution red spectra (5 400 ≤ $\lambda$ ≤ 7 000 Å, $\lambda/\Delta\lambda$ ∼ 7 000). This benchmark sample is described in Section 2.2, and is plotted as a colour-magnitude diagram in Fig. 1.
Table 1. ... our stellar benchmark sample, ordered from highest to lowest preference of label adoption. We list the median label uncertainty of each source and for our adopted set of labels as a whole, any inter-sample label systematics, benchmark stars with labels from each sample, benchmark stars without labels from each sample, and benchmark stars whose labels we adopt from each sample. We use Valenti & Fischer (2005) as our reference for computing [Fe/H] and [Ti/Fe] offsets, which accounts for both reference abundance scale differences and other related systematics.
Our spectra were reduced using the standard PyWiFeS pipeline (Childress et al. 2014) using the flux calibration approach of Rains et al. (2021), but remain uncorrected for telluric absorption, which we treat by simply masking the worst-affected wavelength regions. Radial velocities were determined by fitting against a template grid of MARCS synthetic spectra as described in Žerjal et al. (2021), and our spectra were subsequently shifted to the rest frame via linear interpolation using the interp1d function from scipy's interpolate module in Python.
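As an illustration of the rest-frame shift described above, the following is a minimal sketch rather than the reduction code actually used. It assumes a non-relativistic Doppler correction and resamples the spectrum back onto its original wavelength grid; the function name `shift_to_rest_frame` and its arguments are hypothetical.

```python
import numpy as np
from scipy.interpolate import interp1d

C_KM_S = 299_792.458  # speed of light in km/s


def shift_to_rest_frame(wave_obs, flux_obs, rv_km_s):
    """Shift a spectrum to the rest frame and resample it onto the input grid.

    Assumption: a non-relativistic Doppler shift is adequate at this resolution.
    """
    wave_obs = np.asarray(wave_obs, dtype=float)
    flux_obs = np.asarray(flux_obs, dtype=float)

    # Wavelengths the observed pixels correspond to in the stellar rest frame
    wave_rest = wave_obs / (1.0 + rv_km_s / C_KM_S)

    # Linear interpolation; pixels shifted off the grid edge become NaN
    interpolator = interp1d(
        wave_rest, flux_obs, kind="linear", bounds_error=False, fill_value=np.nan
    )
    return interpolator(wave_obs)
```

Pixels that fall outside the shifted grid are returned as NaN, so they can be masked in the same way as telluric-affected regions.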

Benchmark Stellar Parameters
To train a data-driven model we require a set of stellar parameters to train on. Like previous studies, we adopt the three 'core' stellar parameters of $T_{\rm eff}$, $\log g$, and [Fe/H], but distinguish ourselves by studying an additional chemical dimension in [Ti/Fe]. The primary motivation for selecting [Ti/Fe] as our abundance of choice is the strong expected signature of TiO on our optical spectra, something we expect to correlate with [Ti/Fe]$^{5}$. Though we also expect strong optical signatures from C and O per Veyette et al. (2016), these elements have fewer absorption lines in the optical than Ti, and thus literature abundance sources from high-resolution spectroscopy are less prevalent. Finally, while we expect strong signatures from other oxides and hydrides on our spectra, we limit ourselves to two chemical dimensions in [Fe/H] and [Ti/Fe] for the purposes of this initial study with our relatively small sample size.
When selecting our benchmark sample of cool dwarfs, our objective was to use only those stars with fundamental -- or fundamentally calibrated -- stellar parameters$^{6}$. A large factor motivating our decision to pursue a data-driven approach stems from the incomplete and physically inaccurate nature of current generation synthetic optical spectra used in more traditional analyses. Given that our goal is to avoid, or even shed light on, these limitations, and that the accuracy of a data-driven model is only as good as its training sample, we must then be very selective.
Thus, our benchmark sample is composed of 103 cool dwarfs with stellar parameters from at least one of the following categories$^{7}$: ... (Vallenari et al. 2023) and GALAH DR3 (Buder et al. 2021) data. While small, we note that such a limited training sample has precedent in studies like Behmard et al. (2019), Birky et al. (2020), and Maldonado et al. (2020).
In addition to these parameter-specific requirements, we impose two further quality constraints on our sample. Firstly, to ensure we have a clean sample free from obvious unresolved binaries, we require benchmarks have a Gaia DR3 Renormalised Unit Weight Error (RUWE) of < 1.4, above which the single star astrometric fit is deemed poor by the Gaia Consortium (e.g. Lindegren et al. 2021; Belokurov et al. 2020). Secondly, we reject binary benchmarks with inconsistent Gaia DR3 kinematics between the primary and secondary components in order to better guarantee our gold standard chemical benchmarks are physically associated.
$^{5}$ ...spective [Ti/Fe] also traces [$\alpha$/Fe] at early times (Kobayashi et al. 2020).
$^{6}$ Where fundamental in this context refers to parameters derived as independently of models as possible, such as $T_{\rm eff}$ or $R_\star$ measured or benchmarked from interferometry rather than those derived purely from isochrone fitting based on physical stellar atmosphere and evolutionary models.
$^{7}$ While beyond the scope of our work here, a potential fourth category of benchmark or science target are stars in moving groups or open clusters. While nominally chemically homogeneous at the current precision of our Cannon model (meaning that cool dwarf chemistry could be adopted from warmer cluster members), cluster chemical inhomogeneities have been observed when taking advantage of the extreme measurement precision offered by differential abundance analysis (e.g. Liu et al. 2016) -- inhomogeneities which could plausibly be revealed using a data-driven model trained with an appropriate training sample.
The following sections describe our literature sources for each separate stellar parameter, as well as our adopted hierarchy between sources for those stars with more than a single source: Section 2.2.1: $T_{\rm eff}$, Section 2.2.2: $\log g$, Section 2.2.3: [Fe/H], Section 2.2.4: [Ti/Fe] from binaries, and Section 2.2.5: [Ti/Fe] from empirical chemodynamic trends. Table 1 serves as a summary of these subsections, with sources listed in order of preference when choosing which to adopt (including inter-sample systematics where measured).

Stellar 𝑇 eff
Interferometric temperature benchmarks form the cornerstone of our data-driven temperature scale, and thus we observed 17 stars from van Belle & von Braun (2009), Boyajian et al. (2012), von Braun et al. (2012), von Braun et al. (2014), and Rabus et al. (2019) with a median literature $T_{\rm eff}$ uncertainty of ±25 K. This left a decision on what temperatures to adopt for the remainder of our benchmark sample, specifically how we handle systematics between different temperature scales or those stars without a previously reported value (mainly our binary benchmarks). As an example, the bulk of our NIR [Fe/H] benchmarks have $T_{\rm eff}$ from either Rojas-Ayala et al. (2012) or Mann et al. (2015), with Mann et al. (2015) noting temperature systematics between the overlapping samples of the two studies at warmer $T_{\rm eff}$. Given this concern, we deemed the use of a single uniform $T_{\rm eff}$ scale for our benchmark sample critical in order to have the best chance of investigating cool dwarf chemistry. To this end, for all non-interferometric benchmarks we adopt $T_{\rm eff}$ obtained via the fitting methodology of Rains et al. (2021), itself calibrated to our adopted interferometric scale. We describe this method below, and add our statistical uncertainties in quadrature with the median benchmark $T_{\rm eff}$ uncertainty for a final median uncertainty of ±67 K.
Rains et al. (2021) undertook a benchmark-calibrated joint synthetic fit to spectroscopic and photometric data using flux calibrated WiFeS spectra and literature Gaia/2MASS/SkyMapper photometry for their cool dwarf sample. As a necessity for model-based work with cool optical fluxes, they quantified model systematics by comparing synthetic MARCS spectra and photometry to observed spectra and integrated photometry from their benchmark sample of 136 cool dwarf benchmarks with 3 000 ≲ $T_{\rm eff}$ ≲ 4 500 K. These systematics, parameterised as a function of Gaia $(BP-RP)$ colour, were used to correct synthetic photometry during fitting, with only the most reliable regions in the WiFeS R7000 spectral arm being included. $T_{\rm eff}$ was the principal output of their fit, with both $\log g$ and [Fe/H] fixed using empirical relations (the former from Mann et al. 2015 and Mann et al. 2019, the latter developed in Rains et al. 2021) to avoid parameter degeneracies due to the complexity of cool dwarf fluxes. Reported temperatures were calibrated to a fundamental scale by correcting for temperature systematics observed between fits to the aforementioned benchmark sample, itself fundamentally calibrated to the interferometric $T_{\rm eff}$ scale.

Stellar log 𝑔
For stellar $\log g$, we also adopt the uniform values from Rains et al. (2021). Due to model limitations and degeneracies, Rains et al. (2021) fixed $\log g$ when fitting for $T_{\rm eff}$ and used a two step iterative process to determine the final gravity. Initial $M_\star$ and $R_\star$ values were obtained from the photometric $M_{K_S}$ band relations in Mann et al. (2019) and Mann et al. (2015) respectively, and were used to compute and fix $\log g$ for the initial fit. Following this initial fit, $R_\star$ was recalculated from the fitted $T_{\rm eff}$ and $f_{\rm bol}$ values via the Stefan-Boltzmann relation, $\log g$ recalculated, and a final fit performed to give the adopted stellar $R_\star$ and $\log g$. It should be noted, however, that this process is almost entirely based on photometry, meaning that we are not sensitive to unresolved binarity or youth in the same way that spectroscopic techniques are. The median $\log g$ statistical uncertainty of our benchmark sample is ±0.02 dex.
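The second step of this iteration can be illustrated with a short worked example. This is only a sketch under stated assumptions, not the Rains et al. (2021) code: it assumes the bolometric flux is converted to a luminosity using a parallax-based distance, uses rounded CGS constants, and the function names are hypothetical.

```python
import numpy as np

# Rounded physical constants in CGS units (assumed values for illustration)
G_CGS = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
SIGMA_SB = 5.670374e-5  # Stefan-Boltzmann constant, erg cm^-2 s^-1 K^-4
M_SUN_G = 1.989e33      # solar mass, g
R_SUN_CM = 6.957e10     # solar radius, cm
PC_CM = 3.0857e18       # parsec, cm


def radius_from_teff_fbol(teff_k, f_bol_cgs, parallax_mas):
    """Stellar radius (R_sun) from Teff, bolometric flux, and parallax."""
    distance_cm = (1000.0 / parallax_mas) * PC_CM
    luminosity = 4.0 * np.pi * distance_cm**2 * f_bol_cgs
    # Stefan-Boltzmann: L = 4 pi R^2 sigma Teff^4
    radius_cm = np.sqrt(luminosity / (4.0 * np.pi * SIGMA_SB * teff_k**4))
    return radius_cm / R_SUN_CM


def logg_from_mass_radius(mass_msun, radius_rsun):
    """Surface gravity log10(g / cm s^-2) from mass and radius in solar units."""
    g_cgs = G_CGS * mass_msun * M_SUN_G / (radius_rsun * R_SUN_CM) ** 2
    return np.log10(g_cgs)
```

Because $\log g$ depends on $R_\star^{-2}$, a modest change in the recalculated radius between the two steps maps directly onto the adopted gravity.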

Stellar [Fe/H]
The inability to recover [Fe/H] from optical cool dwarf spectra makes the selection of [Fe/H] benchmarks particularly crucial for any data-driven approach. The gold standard for cool dwarf metallicities continues to be those stars with a warmer companion of spectral type F/G/K from which traditional spectral analysis techniques like measuring equivalent widths or spectral synthesis are reliable. We adopt [Fe/H] from these stars where possible, sourced from Valenti & Fischer (2005), Sousa et al. (2008), Brewer et al. (2016), Montes et al. (2018), and Rice & Brewer (2020). These are our most precise [Fe/H] benchmarks, with −0.86 < [Fe/H] < 0.35, and a median literature uncertainty of ±0.03 dex for the 17 such stars in our sample.
We adopt Valenti & Fischer (2005) as our reference when computing and correcting for [Fe/H] and [Ti/H] systematics (see the 'offset' column in Table 1), though prioritise Brewer et al. (2016) and Rice & Brewer (2020) -- both follow-up studies -- for their higher precision. Valenti & Fischer (2005) was chosen as our reference scale due to its large sample size and the fact that Mann et al. (2013a) -- the original source for the [Fe/H] relations used in Mann et al. (2015) -- also used it as their [Fe/H] reference point. Note that different sources in Table 1 adopt different abundance reference points, either published reference abundance levels or their own solar-relative abundances, and this results in straightforward differences in [Fe/H] scales. However, other effective systematics may also be present due to e.g. differences in the temperature scale, and these will not be apparent from a simple comparison between abundance scales, but are accounted for by our approach of identifying and removing systematics between samples. Finally, when computing offsets we do so using a larger crossmatched set of literature F/G/K-K/M binary benchmarks than we have spectra for here, with 256 primaries and 259 secondaries in total (noting that not all of these stars are in all publications).
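The idea of measuring an inter-sample offset from crossmatched stars can be sketched as follows. This is an illustrative example only: the use of the median as the offset statistic is an assumption for this section, and the function name and dictionary inputs are hypothetical.

```python
import numpy as np


def median_offset(reference, other):
    """Median of (reference - other) over the stars common to both samples.

    Both inputs are dicts mapping a star identifier to an abundance in dex.
    Returns the offset to ADD to `other` to place it on the reference scale,
    along with the number of stars in common.
    """
    common = sorted(set(reference) & set(other))
    if not common:
        raise ValueError("No stars in common between the two samples")
    diffs = np.array([reference[star] - other[star] for star in common])
    return float(np.median(diffs)), len(common)
```

Adding the returned offset to every value in the second sample places it on the reference scale, whatever the physical origin of the difference.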
Our largest sample of [Fe/H] comes from empirical relations built from [Fe/H]-sensitive spectral regions in low-resolution NIR spectra based on these binary benchmarks. While there are many such relations in the literature, the bulk of our stars are drawn from just two of them. 69 of our stars have [Fe/H] from the work of Mann et al. (2015), with $\sigma_{\rm [Fe/H]}$ = ±0.08 dex. Another 12 are drawn from the work of Rojas-Ayala et al. (2012), with $\sigma_{\rm [Fe/H]}$ = ±0.18 dex. Only three stars have adopted [Fe/H] from other relations, with one star from Terrien et al. (2015) with $\sigma_{\rm [Fe/H]}$ = ±0.07 dex, and another two from Gaidos et al. (2014) with $\sigma_{\rm [Fe/H]}$ = ±0.1 dex.
Finally, for any star without [Fe/H], we adopt [Fe/H] from the photometric relation of Rains et al. (2021) with $\sigma_{\rm [Fe/H]}$ = ±0.19 dex. This relation is only applicable to isolated single stars on the main sequence with reliable Gaia parallaxes. In our case, only two stars -- GJ 674 and GJ 832, both interferometric benchmarks -- have a value from this relation. However, this ensures that all stars in our sample have an [Fe/H] value.

Measured Stellar [Ti/Fe]
For those benchmark stars in F/G/K-K/M binary systems, we can adopt [Ti/Fe] abundances from the warmer primary where it has previously been measured in the literature. Per Table 1, we adopt [Ti/Fe] for seven stars from Brewer et al. (2016), four stars from Rice & Brewer (2020), one star from Valenti & Fischer (2005), four stars from Montes et al. (2018), and one star from Adibekyan et al. (2012). We note that Brewer et al. (2016) and Rice & Brewer (2020) are follow-up work to Valenti & Fischer (2005), and thus we prefer them due to their higher [Ti/H] precision (while again adopting Valenti & Fischer 2005 as our reference for [Ti/H] systematics), and for Adibekyan et al. (2012) we adopt the abundance derived from Ti I lines. Each of these works reports abundances as [X/H], so we have calculated [Ti/Fe] using the adopted systematic-corrected [Fe/H] values (typically from the same literature source) and propagated the uncertainties accordingly, resulting in a median uncertainty of $\sigma_{\rm [Ti/Fe]}$ = ±0.03 dex.
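The conversion from [X/H] to [Ti/Fe] is a simple difference of logarithmic abundances. The sketch below assumes the [Ti/H] and [Fe/H] uncertainties are independent and so combine in quadrature; this is an assumption made for illustration, since the exact propagation is not spelled out here, and the function name is hypothetical.

```python
import math


def ti_fe_from_xh(ti_h, e_ti_h, fe_h, e_fe_h):
    """Return ([Ti/Fe], uncertainty) from [Ti/H] and [Fe/H], all in dex.

    Assumption: the two input uncertainties are uncorrelated.
    """
    ti_fe = ti_h - fe_h                # [Ti/Fe] = [Ti/H] - [Fe/H]
    e_ti_fe = math.hypot(e_ti_h, e_fe_h)  # quadrature sum of uncertainties
    return ti_fe, e_ti_fe
```

If both abundances come from the same spectrum their errors are likely correlated, in which case the quadrature sum should be read as a conservative estimate.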

Empirical Chemodynamic [Ti/Fe]
To assign [Ti/Fe] values for non-binary benchmarks, we make use of chemodynamic correlations in the Milky Way (MW) discs to "map" [Ti/Fe] values onto the benchmark stars. The chemical distinctness of the thick and thin discs in light elements (e.g. Mg, Ca, O, Si, Ti, or the $\alpha$-elements) across metallicities has become a well-accepted feature of our Galaxy (e.g. as observed early on in high-resolution studies of small samples and in the larger APOGEE survey; Nissen & Schuster 2010; Hayden et al. 2015). To map values of [Ti/Fe], we utilise the GALAH DR3 data release (Buder et al. 2021) to recover chemistry and the value-added catalogue (VAC) of Buder et al. (2022) for the stellar kinematics. Initial estimates of component membership (e.g. thick or thin disc) were made by comparing the energy ($E$) and $z$-component of the angular momentum ($L_z$) of the benchmark stars to a subset of the GALAH DR3 sample. The GALAH subset was selected to only include dwarf stars with similar stellar parameters to the F/G/K primaries of our binary benchmark stars ($T_{\rm eff}$ > 4 500 K and $\log g$ > 3.0) and to exclude stars with potentially unreliable chemistry (i.e. cool stars). The $E$, $L_z$ values for the benchmarks were recovered assuming the McMillan2017 (McMillan 2017) approximation for the MW potential and assuming a solar radius of 8.21 kpc and a circular velocity at the Sun of 233.1 km s$^{-1}$. The local standard of rest (LSR) was selected to be in the same frame of reference as the GALAH VAC. That is, the Sun is set 25 pc above the plane in keeping with Jurić et al. (2008) and has a total velocity of $(U, V, W)$ = (11.1, 248.27, 7.25) km s$^{-1}$ in keeping with Schönrich et al. (2010).
Unsurprisingly, all of the benchmark stars are found to be on nearly circular orbits, coincident with either the MW thick or thin disc in $E$, $L_z$ space. In addition to calculating the $E$, $L_z$ values for the benchmarks, we also calculated the $v_\phi$ values for the stars (the tangential velocity component in cylindrical coordinates) under the same assumption and orientation of the LSR. Following an exploration of various chemodynamic spaces, we found $v_\phi$ vs. ...
To map a value of [Ti/Fe] to the benchmark stars, we performed a 2D interpolation of the clean GALAH disc sample in $v_\phi$, [Fe/H] space using the N-dimensional linear interpolator in scipy (Virtanen et al. 2020). To explore the uncertainty associated with the benchmark chemodynamics, we perform 1 000 realisations sampling normal distributions in [Fe/H] and radial velocity (RV), and the multivariate distribution associated with the uncertainties in the astrometric parameters. We build the astrometric covariance matrix using the correlations and errors for the benchmarks within Gaia DR3 (Lindegren et al. 2021). The $v_\phi$ values are then recalculated for each draw and the corresponding $v_\phi$, [Fe/H] values are used to infer a [Ti/Fe] for the star. The average recovered values of [Ti/Fe] for the benchmarks are shown in the left panel of Fig. 2 as the open circles. They are overplotted on top of the clean GALAH disc sample (shown as the black points). Note that the uncertainties are largest for the lowest metallicity benchmarks. This is likely a result of the metallicity distribution function of the thick disc being less well-sampled in our GALAH subset. This in itself is likely driven by our conservative cut in $v_\phi$ to remove the bulk of the MW halo (which presents a much more complex trend of light elements with metallicity).
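The mapping step can be sketched with scipy's `LinearNDInterpolator`. This is a simplified illustration, not the procedure used for our benchmarks: it draws $v_\phi$ directly from a normal distribution instead of recomputing it from astrometric and RV draws, enables the interpolator's `rescale` option because the two axes have very different units (our choice for the sketch), and uses a hypothetical function name and arguments.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator


def map_ti_fe(ref_vphi, ref_feh, ref_tife, vphi, e_vphi, feh, e_feh,
              n_draws=1000, seed=0):
    """Monte Carlo estimate of [Ti/Fe] at (v_phi, [Fe/H]) from a reference sample.

    Simplification: v_phi is sampled directly instead of being recomputed
    from per-draw astrometry and radial velocity.
    """
    rng = np.random.default_rng(seed)

    # Piecewise-linear interpolation over the reference (v_phi, [Fe/H]) plane
    points = np.column_stack([ref_vphi, ref_feh])
    interpolator = LinearNDInterpolator(points, ref_tife, rescale=True)

    # Sample the target star's position in the plane
    vphi_draws = rng.normal(vphi, e_vphi, n_draws)
    feh_draws = rng.normal(feh, e_feh, n_draws)
    samples = interpolator(np.column_stack([vphi_draws, feh_draws]))

    # Draws falling outside the reference sample's convex hull return NaN
    return float(np.nanmean(samples)), float(np.nanstd(samples))
```

The scatter of the draws gives a per-star uncertainty that grows wherever the reference sample is sparse or the local [Ti/Fe] gradient is steep.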
To both validate our methodology and place GALAH and our benchmark stars on the same scale, we repeated the same exercise to predict [Ti/Fe] on the sample of dwarf stars from Valenti & Fischer (2005). Fig. 3 shows the comparison between our predicted values of [Ti/Fe] and those published in Valenti & Fischer (2005) for stars with [Fe/H] ≥ −1 dex. When considering the 1 019 stars that meet our requirement in [Fe/H], we recover a median offset between the predicted and true values of [Ti/Fe] of 0.03 dex, with the residuals having a standard deviation of 0.08 dex. We correct for this offset in order to anchor our predicted [Ti/Fe] values to the Valenti & Fischer (2005) scale, and take the uncertainty of our mapping to be $\sigma_{\rm [Ti/Fe]}$ = 0.08 dex per the standard deviation.

THE CANNON
Our adopted model is the Cannon, first published in Ness et al. (2015). The Cannon is a data-driven model trained upon a library of well-constrained benchmark stars and is able to learn a mapping between normalised rest frame stellar spectra and the corresponding set of stellar physical parameters. This mapping -- essentially a form of dimensionality reduction between many pixels and few parameters -- works by building a per-pixel model as a function of these parameters (also known as labels), the simplest of which might be a single label model in terms of spectral type (as in Birky et al. 2020), or a more complex -- but more physically realistic -- three parameter model in terms of e.g. $T_{\rm eff}$, $\log g$, and [Fe/H].
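To make the idea of a per-pixel model concrete, the sketch below fits each pixel's flux as a quadratic polynomial in the labels, the form used in Ness et al. (2015). It is a simplified illustration and not our implementation: it uses unweighted least squares, omits the per-pixel intrinsic scatter term that the Cannon also fits, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np


def quadratic_design_matrix(labels):
    """Columns: 1, each label, and every pairwise product (including squares)."""
    labels = np.atleast_2d(labels)
    n_star, n_label = labels.shape
    columns = [np.ones(n_star)]
    columns += [labels[:, i] for i in range(n_label)]
    columns += [labels[:, i] * labels[:, j]
                for i, j in combinations_with_replacement(range(n_label), 2)]
    return np.column_stack(columns)


def train(labels, fluxes):
    """Fit per-pixel coefficients; `fluxes` has shape (n_star, n_pixel)."""
    labels = np.asarray(labels, dtype=float)
    mean, std = labels.mean(axis=0), labels.std(axis=0)
    # Standardise labels so that coefficients are comparably scaled
    design = quadratic_design_matrix((labels - mean) / std)
    theta, *_ = np.linalg.lstsq(design, fluxes, rcond=None)
    return theta, mean, std


def predict(theta, mean, std, labels):
    """Model fluxes for one or more label vectors."""
    design = quadratic_design_matrix((np.atleast_2d(labels) - mean) / std)
    return design @ theta
```

For a three label model this gives 10 coefficients per pixel (1 constant, 3 linear, 6 quadratic), and a four label model gives 15, which is why the number and label coverage of the benchmarks matters so much.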
As with any data-driven or machine learning model, a given implementation of the Cannon is only as accurate as its training sample of benchmark stars. When deployed in large spectroscopic surveys focusing primarily on warm stars (e.g. GALAH DR2, Buder et al. 2018), the training sample can easily consist of thousands of stars, all of which have a mostly complete, uniform, and reliable set of stellar labels. This is not possible for cool dwarfs, whose inherent faintness and spectroscopic complexity make it challenging to assemble a large and uniform set of benchmarks with a complete set of training labels. As an example, while the temperature of interferometric benchmarks and the chemistry of binary benchmarks are incredibly well-constrained, their other labels will be known to lower precision, or might even be outright missing.
Figure 3. [Ti/Fe] from Valenti & Fischer (2005) versus [Ti/Fe] as predicted from Gaia and GALAH chemodynamic trends as in Section 2.2.5, with the median and standard deviation of the residuals annotated. The black circled points correspond to the F/G/K primaries of our binary benchmarks. Note that we correct for the observed [Ti/Fe] systematic (that is, we put our GALAH [Ti/Fe] values on the Valenti & Fischer (2005) scale) and that the observed value of $\sigma_{\rm [Ti/Fe]}$ is comparable with the mapped [Ti/Fe] statistical uncertainties for our sample quoted in Table 1 when taking into account [Fe/H] and kinematic uncertainties.
Given these challenges, for our work here we make use of two separate Cannon models: a three label model in $T_{\rm eff}$, $\log g$, and [Fe/H]; and a four label model in $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe]. We begin our methods section with a description of how we prepare our spectra in Section 3.1, before giving an overview of our Cannon models in Section 3.2, model training in Section 3.3, and finally evaluating their performance in Section 3.4.

Spectra Normalisation
Inherent in the use of the Cannon is the assumption that the flux in each spectral pixel varies smoothly as a function of stellar labels, and that stars with identical labels will necessarily have near-identical fluxes. For this to be true, our spectra must be normalised and any pixels where this is not the case masked out and not modelled (e.g. wavelengths affected by stellar emission, telluric absorption, or detector artefacts). While it is possible to normalise optical spectra of warmer stars to the stellar continuum, this is not viable for cool dwarfs due to the intense molecular absorption present at such wavelengths. Fortunately, however, so long as the normalisation formalism is internally consistent, it is sufficient for input into the Cannon. Our approach follows that initially implemented by Ho et al. (2017), and later used by Behmard et al. (2019) and Galgano et al. (2020), to normalise our spectra via a Gaussian smoothing process:
\begin{equation}
f = \frac{f_o}{\bar{f}}
\end{equation}
where $f$ is the Gaussian normalised flux associated with the rest-frame wavelength vector $\lambda$, $f_o$ is the observed flux calibrated WiFeS spectrum, and $\bar{f}$ is a Gaussian smoothing vector. Each term of this smoothing vector is defined as:
\begin{equation}
\bar{f}(\lambda_n) = \frac{\sum_i f_{o,i}\,\sigma_{o,i}^{-2}\,w_i(\lambda_n)}{\sum_i \sigma_{o,i}^{-2}\,w_i(\lambda_n)}
\end{equation}
where $\bar{f}(\lambda_n)$ is the Gaussian smoothing term for rest-frame wavelength $\lambda_n$, $f_{o,i}$ is the observed flux at spectral pixel $i$, $\sigma_{o,i}$ is the observed flux uncertainty at spectral pixel $i$, and $w_i(\lambda_n)$ is the Gaussian weight for spectral pixel $i$ given $\lambda_n$. The Gaussian weight vector for spectral pixel $\lambda_n$ is computed as:
\begin{equation}
w_i(\lambda_n) = \exp\left(-\frac{(\lambda_n - \lambda_i)^2}{L^2}\right)
\end{equation}
where $\lambda$ is our wavelength scale, and $L$ is the width of the Gaussian broadening in Å. We used $L$ = 50 Å, noting that we find parameter recovery in cross-validation relatively insensitive for 25 < $L$ < 100 Å.
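The Gaussian smoothing normalisation can be sketched in a few lines of numpy. This is a minimal illustration rather than the reduction code used here: it assumes the Ho et al. (2017) form of the weights, $\exp[-(\lambda_n - \lambda_i)^2 / L^2]$, and is demonstrated on a toy spectrum. The function name `gaussian_normalise` and the toy arrays are hypothetical.

```python
import numpy as np


def gaussian_normalise(wave, flux, e_flux, width=50.0):
    """Divide a spectrum by its inverse-variance-weighted Gaussian smooth.

    wave, flux, e_flux: 1D arrays of rest-frame wavelength (Angstrom),
    observed flux, and flux uncertainty. width: Gaussian width L (Angstrom).
    """
    ivar = 1.0 / e_flux**2
    # weights[n, i] = exp(-(lambda_n - lambda_i)^2 / L^2)
    weights = np.exp(-((wave[:, None] - wave[None, :]) ** 2) / width**2)
    numerator = (weights * (flux * ivar)[None, :]).sum(axis=1)
    denominator = (weights * ivar[None, :]).sum(axis=1)
    smooth = numerator / denominator
    return flux / smooth, e_flux / smooth


# Toy spectrum: a sloped pseudo-continuum with a single narrow absorption line.
wave = np.linspace(4000.0, 7000.0, 1500)
continuum = 1.0 + 2.0e-4 * (wave - 4000.0)
line = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 5500.0) / 2.0) ** 2)
flux = continuum * line
e_flux = 0.01 * continuum

flux_norm, e_flux_norm = gaussian_normalise(wave, flux, e_flux)
```

Because the smoothing width is much broader than any single line, narrow features survive the division while the broad spectral shape is flattened to values near unity.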
In the end our data consists of 5 024 spectral pixels with 4 000 ≤ $\lambda$ ≤ 7 000 Å. We omit wavelengths with $\lambda$ < 4 000 Å due to low SNR for our mid-M benchmarks, and mask out the hydrogen Balmer series and regions contaminated by telluric features.

Cannon Model
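The traditional implementation of the Cannon from Ness et al. (2015) seeks to describe the stellar-parameter-dependent flux at a given spectral pixel with a model coefficient vector and an associated noise vector:
\begin{equation}
f_{n,\lambda} = \theta_\lambda^T \cdot \ell_n + \epsilon_{n,\lambda}
\end{equation}
where $f_{n,\lambda}$ is the normalised model flux (from vector $f$ per Equation 1) of star $n$ at spectral pixel $\lambda$; $\theta_\lambda$ is the model coefficient vector of length $N_{\rm coeff}$ describing spectral pixel $\lambda$; $\ell_n$ is the label vector for star $n$ of length $N_{\rm coeff}$; and $\epsilon_{n,\lambda}$ is a noise term for spectral pixel $\lambda$, composed of the model intrinsic scatter $s_\lambda$ and the observed flux uncertainty $\sigma_{n,\lambda}$ added in quadrature as:
\begin{equation}
\epsilon_{n,\lambda} = \sqrt{s_\lambda^2 + \sigma_{n,\lambda}^2}
\end{equation}
The full Cannon model describing $N_\lambda$ spectral pixels thus has two unknown arrays which must be fit for: the coefficient matrix $\theta$ of shape $N_\lambda \times N_{\rm coeff}$ and the model scatter vector $s$ of length $N_\lambda$. To do so, we make use of our normalised observed flux and flux uncertainty arrays $f$ and $\sigma$ respectively, each with shape $N_\lambda \times N_{\rm star}$, as well as the label matrix $\ell$ of shape $N_{\rm star} \times N_{\rm coeff}$ constructed from the known stellar parameters of the training sample of $N_{\rm star}$ benchmark stars. The Cannon formalism is sufficiently generic that its model (that is, the specifics of the coefficient and label vectors) can in principle be of any complexity and used to describe any number of labels, though typically a quadratic model in each label is considered sufficient (e.g. Ness et al. 2015; Ho et al. 2017; Birky et al. 2020). In the case of a three label quadratic model in $T_{\rm eff}$, $\log g$, and [Fe/H], this results in a 10 term $\ell_n$:
\begin{equation}
\ell_n = \left[1,\ T'_{\rm eff},\ \log g',\ {\rm [Fe/H]}',\ T'_{\rm eff} \cdot \log g',\ T'_{\rm eff} \cdot {\rm [Fe/H]}',\ \log g' \cdot {\rm [Fe/H]}',\ (T'_{\rm eff})^2,\ (\log g')^2,\ ({\rm [Fe/H]}')^2\right]
\end{equation}
or, alternatively, for a four label model in $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe], a 15 term $\ell_n$:
\begin{equation}
\begin{aligned}
\ell_n = \big[&1,\ T'_{\rm eff},\ \log g',\ {\rm [Fe/H]}',\ {\rm [Ti/Fe]}',\\
&T'_{\rm eff} \cdot \log g',\ T'_{\rm eff} \cdot {\rm [Fe/H]}',\ T'_{\rm eff} \cdot {\rm [Ti/Fe]}',\ \log g' \cdot {\rm [Fe/H]}',\ \log g' \cdot {\rm [Ti/Fe]}',\ {\rm [Fe/H]}' \cdot {\rm [Ti/Fe]}',\\
&(T'_{\rm eff})^2,\ (\log g')^2,\ ({\rm [Fe/H]}')^2,\ ({\rm [Ti/Fe]}')^2\big]
\end{aligned}
\end{equation}
where $T'_{\rm eff}$, $\log g'$, [Fe/H]$'$, and [Ti/Fe]$'$ are the normalised stellar labels such that:
\begin{equation}
\ell'_{n,k} = \frac{\ell_{n,k} - \bar{\ell}_k}{\sigma_{\ell_k}}
\end{equation}
where $\ell'_{n,k}$ is the $k$th normalised stellar label for star $n$, obtained from the stellar label $\ell_{n,k}$ and the mean ($\bar{\ell}_k$) and standard deviation ($\sigma_{\ell_k}$) of the set of training labels $\ell_k$ such that the normalised labels each have zero mean and unit variance.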
Note that, for a quadratic model, the label vector $\ell_n$ contains three sets of terms: i) linear terms, including an offset term in the initial '1', ii) cross terms, and iii) quadratic terms. More generally, this allows the Cannon to account for covariances between labels, in addition to the isolated contribution from each label. We discuss this in greater detail in Section 5.1.
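The construction of the quadratic label vector can be sketched as follows. This is a minimal illustration on randomly generated labels, not our benchmark sample; the ordering of terms (offset, linear, cross, quadratic) follows the description above, and the function names `normalise_labels` and `quadratic_label_vector` are hypothetical.

```python
from itertools import combinations

import numpy as np


def normalise_labels(labels):
    """Scale each label column to zero mean and unit variance."""
    return (labels - labels.mean(axis=0)) / labels.std(axis=0)


def quadratic_label_vector(labels_norm):
    """Return [1, linear terms, cross terms, quadratic terms] for each star.

    Expects an array of shape (n_star, n_label) with n_label >= 2.
    """
    n_star, n_label = labels_norm.shape
    offset = np.ones((n_star, 1))
    cross = np.column_stack([labels_norm[:, i] * labels_norm[:, j]
                             for i, j in combinations(range(n_label), 2)])
    return np.hstack([offset, labels_norm, cross, labels_norm**2])


# Toy labels for 103 stars: Teff (K), log g, [Fe/H], and [Ti/Fe].
rng = np.random.default_rng(3)
labels_3 = np.column_stack([rng.uniform(3000.0, 4500.0, 103),
                            rng.uniform(4.5, 5.2, 103),
                            rng.uniform(-0.9, 0.5, 103)])
labels_4 = np.column_stack([labels_3, rng.uniform(-0.1, 0.4, 103)])

design_3 = quadratic_label_vector(normalise_labels(labels_3))  # 10 terms
design_4 = quadratic_label_vector(normalise_labels(labels_4))  # 15 terms
```

For $K$ labels the vector has $1 + K + K(K-1)/2 + K$ terms, giving 10 terms for three labels and 15 for four.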

Model Training
We implement our Cannon model using PyStan v2.19.1.1 (Riddell et al. 2021), the Python wrapper for the probabilistic Stan programming language (Carpenter et al. 2017). Training the Cannon consists of optimising the model on a per-pixel basis for our two unknowns, the coefficient vector $\theta_\lambda$ and the scatter per pixel $s_\lambda$, using PyStan's optimizing function. This is done via a log likelihood approach as follows:
\begin{equation}
\theta_\lambda, s_\lambda \leftarrow \underset{\theta_\lambda,\, s_\lambda}{\rm argmax} \sum_{n=1}^{N_{\rm star}} \ln p\left(f_{n,\lambda} \,|\, \theta_\lambda, \ell_n, s_\lambda^2\right)
\end{equation}
where $\ln p(f_{n,\lambda} \,|\, \theta_\lambda, \ell_n, s_\lambda^2)$ is the log likelihood. We implemented a two-step training procedure to mitigate the impact of bad pixels on our model. Our initial model was trained and optimised on our benchmark set using only a global wavelength mask to exclude wavelength regions affected by telluric contamination or stellar emission. The resulting model was then used to predict fluxes for each benchmark star, with sigma clipping applied to exclude (via inflated flux variances during training) any pixel 6$\sigma$ discrepant from the model fluxes. The final adopted model is then trained using a combination of the original global wavelength mask and the per-star bad pixel mask. Put another way, we do not directly adopt a per-star bad pixel mask as output from the WiFeS pipeline, but instead create one with reference to our initial Cannon model.
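The per-pixel optimisation can be illustrated without Stan. The sketch below swaps PyStan's optimiser for `scipy.optimize.minimize` and assumes the Gaussian log likelihood of Ness et al. (2015), $\ln p = -\frac{1}{2}[f_{n,\lambda} - \theta_\lambda^T \cdot \ell_n]^2 / (s_\lambda^2 + \sigma_{n,\lambda}^2) - \frac{1}{2}\ln(s_\lambda^2 + \sigma_{n,\lambda}^2)$. For brevity the toy design matrix contains only an offset and three linear terms, and all names (`train_pixel`, `neg_log_likelihood`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_likelihood(params, design, flux, e_flux):
    """Negative summed log likelihood over all stars for a single pixel."""
    theta, ln_s2 = params[:-1], params[-1]
    var = np.exp(ln_s2) + e_flux**2  # s^2 + sigma^2, kept positive via ln(s^2)
    resid = flux - design @ theta
    return 0.5 * np.sum(resid**2 / var + np.log(var))


def train_pixel(design, flux, e_flux):
    """Fit the coefficient vector and intrinsic scatter of one spectral pixel."""
    # Ordinary least squares provides the starting guess for the coefficients.
    theta0 = np.linalg.lstsq(design, flux, rcond=None)[0]
    x0 = np.append(theta0, np.log(1e-4))
    result = minimize(neg_log_likelihood, x0, args=(design, flux, e_flux),
                      method="L-BFGS-B")
    return result.x[:-1], np.sqrt(np.exp(result.x[-1]))


# Toy training set: 103 stars, one pixel, known coefficients and scatter.
rng = np.random.default_rng(2)
n_star = 103
design = np.column_stack([np.ones(n_star), rng.normal(size=(n_star, 3))])
theta_true = np.array([1.0, 0.10, -0.05, 0.02])
s_true = 0.02
e_flux = rng.uniform(0.005, 0.015, n_star)
flux = design @ theta_true + rng.normal(0.0, np.sqrt(s_true**2 + e_flux**2))

theta_fit, s_fit = train_pixel(design, flux, e_flux)
```

A bad-pixel mask of the kind described above could be emulated in this sketch by assigning flagged pixels a very large `e_flux`, which removes their influence on the fit.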
Our three and four label models take of order 1.5 and 2.5 minutes to train respectively on an M1 MacBook Pro in serial for a single model without cross-validation. The spectral recovery for a representative benchmark sample with the blue and red arms of WiFeS respectively can be seen in Figs. 4 and 5, as generated from our fully-trained three label model (see footnote 10). While spectral recovery struggles a little more for the bluest wavelengths of our coolest stars, we deem this primarily due to the lower SNR at these wavelengths and sparser sampling of the parameter space at these temperatures.
This recovery is particularly impressive when considering the deviations observed in Rains et al. (2021) between MARCS and BT-Settl (Allard et al. 2011) synthetic spectra versus the same flux normalised spectra we train our Cannon model on here. In their Section 4.1, in particular their figs. 4 and 5, they discuss 2-10% flux differences between synthetic and observed spectra for several optical bands, deviations large enough to be quite obvious to the eye. We discuss these differences more quantitatively in Section 5.2.

Model Validation and Label Recovery Performance
The accuracy of a data-driven or machine learning model can only be truly determined by testing the model on unseen data, i.e. data the model was not trained upon. Ideally one would have a large enough sample to initially partition it into separate training and test sets without compromising on model accuracy. With only 103 benchmarks, however, this is not feasible for our work here, so we instead opt for a leave-one-out cross validation approach. Under this paradigm, we train $N$ different Cannon models on $N$ different sets of $N − 1$ benchmark stars, testing each model on the $n$th benchmark left out of the model. Our final reported label recovery accuracy thus consists of the medians and standard deviations of the aggregate 'left out' sample across all models. Labels are fit using the curve_fit function from scipy given the coefficient and scatter arrays $\theta$ and $s_\lambda$ from a fully trained Cannon model as described in Section 3.2.
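The leave-one-out procedure and the label fitting step can be sketched as follows. This is a toy illustration under simplifying assumptions rather than our actual implementation: the training step is replaced by ordinary least squares per pixel with no scatter term, the spectra are synthetic, and all names (`label_vector`, `fit_labels`) are hypothetical. Only the use of `curve_fit` for label inference follows the text.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)


def label_vector(t, g, m):
    """Quadratic label vector (10 terms) for three normalised labels."""
    return np.array([1.0, t, g, m, t * g, t * m, g * m, t**2, g**2, m**2])


def fit_labels(flux, sigma, theta):
    """Infer the three normalised labels of one spectrum with curve_fit."""
    def model(_pixels, t, g, m):
        return theta @ label_vector(t, g, m)

    popt, _ = curve_fit(model, np.arange(len(flux)), flux,
                        p0=np.zeros(3), sigma=sigma, absolute_sigma=True)
    return popt


# Toy benchmark sample: 40 stars, 200 pixels, three normalised labels.
n_star, n_pix, e_flux = 40, 200, 0.01
labels = rng.normal(size=(n_star, 3))
design = np.array([label_vector(*lab) for lab in labels])    # (n_star, 10)
theta_true = np.hstack([np.ones((n_pix, 1)),
                        rng.normal(0.0, 0.10, (n_pix, 3)),
                        rng.normal(0.0, 0.01, (n_pix, 6))])  # (n_pix, 10)
fluxes = design @ theta_true.T + rng.normal(0.0, e_flux, (n_star, n_pix))

# Leave-one-out: train on N - 1 stars, then fit the labels of the star left out.
residuals = np.empty_like(labels)
for n in range(n_star):
    train = np.arange(n_star) != n
    # Simplified training step: ordinary least squares per pixel.
    theta_fit = np.linalg.lstsq(design[train], fluxes[train], rcond=None)[0].T
    sigma = np.full(n_pix, e_flux)
    residuals[n] = fit_labels(fluxes[n], sigma, theta_fit) - labels[n]

bias = np.median(residuals, axis=0)   # systematic offset per label
scatter = np.std(residuals, axis=0)   # recovery precision per label
```

The per-label `bias` and `scatter` correspond to the medians and standard deviations of the aggregate left-out sample described above.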
We plot our label recovery performance in leave-one-out cross validation for $T_{\rm eff}$, $\log g$, and [Fe/H] in Fig. 6 for both our three- and four-label models. Similarly, Fig. 7 [...] (2012), and [Fe/H] from F/G/K binary companions, again for our three and four parameter models respectively. Finally, Fig. 8 shows [Ti/Fe] recovery for our four-label model. Table A1 lists the labels inferred from our fully-trained four label model in $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] for our benchmarks, noting that while these values are from the model trained on all 103 stars, our uncertainties are derived from the leave-one-out cross validation performance standard deviations added in quadrature with the statistical uncertainties. For our four-label model, these statistical uncertainties are rather small, with means of 1.3 K, 0.001 dex, 0.004 dex, and 0.001 dex in $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] respectively, meaning that our reported errors are based primarily on how well we recover our adopted set of literature benchmark labels. We adopt (and correct for) $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] systematics and uncertainties of −1 ± 51 K, 0.00 ± 0.04 dex, 0.00 ± 0.10 dex, and 0.01 ± 0.06 dex respectively. We note that these uncertainties purely refer to how well our recovered parameters reproduce the adopted benchmark $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] scales, i.e. the quality of our label transfer using the Cannon (see footnote 11). It is an altogether different task, and beyond the scope of our work here, to refine these benchmark scales, with studies comparing e.g. physically realistic $T_{\rm eff}$ (Tayar et al. 2022) or abundance uncertainties (via differential analysis of Solar twins, e.g. Ramírez et al. 2014) indicating that these uncertainties, including the adopted literature values for our benchmarks, are likely underestimates.
Footnote 10: We use our three- rather than four-label model for ease of use when working with our existing MARCS grid and to avoid complexities that would arise when interpolating in [Ti/Fe].
Footnote 11: A contributing factor for any systematics in label recovery is that the traditional Cannon model does not internally consider label uncertainties. If our Cannon model did properly model the uncertainties from benchmarks sourced from separate catalogues with varying label precisions, we might expect any bias to be more solely a function of random noise.
Overall our three and four label models have similar label recovery performance (as distinct from spectral recovery), with Fig. 6 showing the most significant difference between the two models being a reduced $T_{\rm eff}$ systematic (+7.83 K to −1.42 K) and scatter (±53.97 K to ±51.04 K) for the four label model. Given that the TiO bandheads (the characteristic 'sawtooth' pattern in cool optical spectra and the traditional indicator of M dwarf spectral types) are among the main indicators of $T_{\rm eff}$ in cool dwarf atmospheres, this improvement is consistent with expectations given the extra constraints provided by modelling [Ti/Fe]. By contrast, however, there are no similar improvements to $\log g$ and [Fe/H] recovery when using the four label model, and we hypothesise that there are three factors at play here. The first is that our three label model already recovers $\log g$ and [Fe/H] at or nearly at the precision of the benchmark sample itself, meaning that further improvements would likely be marginal even in the best case. The second is that these parameters are less acutely sensitive to [Ti/Fe] or TiO absorption than $T_{\rm eff}$ is, or at least have constraints from unrelated spectral features. For example, other molecules like CaH are strongly sensitive to $\log g$ and have long been used as a discriminator between traditional stellar luminosity classes like subdwarf, dwarf, and giant (e.g. Öhman 1936; Jones 1973; Mould & McElroy 1978; Kirkpatrick et al. 1991; Mann et al. 2012). The third is the effect of small number statistics: as our models are trained on only 103 benchmarks, the influence of outliers (discussed more in subsequent paragraphs) is more significant when computing the scatter from the standard deviation of the residuals, an effect which might hide slight improvements in label recovery. When considering Fig. 7, which separates out label recovery for different literature sources, we are especially hesitant to draw firm conclusions about the differences between the two models given the smaller sample sizes (something especially acute for the binary sample) and the lack of a Cannon model which models label uncertainties, as we can only expect label recovery at the level of the median $\sigma_{\rm [Fe/H]}$ of our benchmark sample. Nonetheless, our results are consistent with the expectation that a model constraining [Ti/Fe] would be better able to recover optical cool dwarf parameters given the significant influence of TiO on optical spectra.
NLTT 10349 (Gaia DR3 3266980243936341248), our most metal poor star with [Fe/H] = −0.86 ± 0.04, proves a consistent outlier in cross validation due to its uniqueness in our small benchmark sample, being ∼3.7$\sigma$ from our sample's mean [Fe/H] (versus ∼2.3$\sigma$ from the mean value for the 2nd most metal poor star). Our model's inability to accurately recover its [Fe/H] in cross validation is a reflection of our sample size, rather than for instance a breakdown in the behaviour of cool dwarf spectra at low [Fe/H], leading to our conclusion that the Cannon is unable to accurately extrapolate far beyond the label values of the benchmarks used to initially train it.
Our $T_{\rm eff}$ recovery for both models is entirely consistent with the median $T_{\rm eff}$ precision of our sample (±67 K vs our ±51 K here). There are similar systematics observed between the bulk and interferometric samples, which points to our adopted $T_{\rm eff}$ scale being correctly calibrated to a fundamental scale, despite the challenges noted in Rains et al. (2021) with their interferometric sample having saturated 2MASS photometry.
Our $\log g$ recovery is almost consistent to within literature uncertainties (±0.02 dex vs our ±0.04 dex here), noting that our gravities are mostly from photometric $M_{K_S}$-$M_\odot$ relations, and should be reliable for main sequence stars (but less accurate for unresolved binaries or stars still contracting to the main sequence). Assuming the Cannon has successfully learnt how to identify gravity through gravity sensitive spectroscopic features, observed outliers in $\log g$ could be unresolved binaries, as the Cannon predicting higher gravities is consistent with underpredicted gravities from blended photometry. While cursory inspection of the spectra of these stars did not reveal any spectroscopic binaries, this hypothesis was validated once we made a RUWE cut upon the release of Gaia DR3, which removed most of the targets (i.e. observed benchmarks which no longer pass the quality cuts to appear in our work here) with aberrant $\log g$ values, improving the precision of our $\log g$ recovery from an initial ±0.06 dex to the reported ±0.04 dex (which itself drops to ±0.03 dex were we to exclude the two most aberrant stars in the sample).
An alternative to unresolved binaries is that the Cannon could be giving higher gravities to young stars still contracting to the main sequence, as it has not been trained on such a sample. We note that Behmard et al. (2019) included a $v \sin i$ dimension in their Cannon model, but we are far less sensitive to rapid rotation with our R∼7 000 spectra versus their R∼60 000 spectra. Either of these physical explanations would serve to increase the scatter on our primarily photometric $\log g$ values, but regardless of the physical origin we have flagged those remaining benchmarks with fitted $\log g$ aberrant by > 0.075 dex in Table A1 using †.
The primary motivation of our work here was to study the chemistry of our cool dwarf sample from optical spectra, something Rains et al. (2021) was unable to accomplish previously using a model-based approach with the same spectra as we use here. [...] 2012) sample (our second largest source of [Fe/H], ±0.12 dex vs our ±0.10 dex), suggesting that their uncertainties are overestimated.

COOL DWARF PHYSICAL MODEL SPECTRA & [X/Fe]
To complement our discussion of data-driven flux recovery and cool dwarf chemistry, we now turn to a grid of physical models for comparison. More specifically, in this section we set out to study the impact of [X/Fe] on model cool dwarf continuum normalised spectra for the wavelength range covered by our WiFeS spectra.

Sensitivity of MARCS Pseudocontinuum to [X/Fe]
To better understand the physics of cool dwarf atmospheres, as well as guide our interpretation of the results and performance of our Cannon model, here we conduct a pilot investigation into the influence of atomic abundances on cool dwarf spectra using a bespoke grid of MARCS spectra. As in Nordlander et al. (2019), our 1D LTE MARCS grid (see footnote 12) was computed using the TURBOSPECTRUM code (v15.1; Alvarez & Plez 1998; Plez 2012) and MARCS model atmospheres (Gustafsson et al. 2008). The spectra were computed with a sampling resolution of 1 km s$^{-1}$, corresponding to a resolving power of ∼300 000, with a microturbulent velocity of 1 km s$^{-1}$. We adopt the [...] We use a selection of atomic lines from VALD3 (Ryabchikova et al. 2015) together with roughly 15 million molecular lines representing 18 different molecules, the most important of which for this work are CaH (Plez, priv. comm.), MgH (Kurucz 1995; Skory et al. 2003), and TiO (Plez 1998, with updates via VALD3). Finally, we note that while this suite of models has issues reproducing cool dwarf optical fluxes, likely due to missing opacities or incomplete line lists (see Rains et al. 2021 for more information), the input physics should still be more than sufficient for qualitative analysis.
Footnote 12: While 3D models are in principle possible, Ludwig et al. (2006) demonstrate that for the parameter space considered here 3D structures are very similar to their 1D equivalents, staying close to radiative equilibrium in the optically thin regions due to the convective velocities being too small to drive substantial deviations away from hydrostatic equilibrium. Given this, in addition to the substantial computation requirements of 3D models, we limit ourselves to standard 1D models for our work here.
We say 'bespoke grid' because of the limited parameter space covered: the grid has three $T_{\rm eff}$ values (3 000 K, 3 500 K, 4 000 K), two $\log g$ values (4.5, 5.0), and solar values of [Fe/H] and [$\alpha$/Fe] (0.0). While this might seem limiting, the strength of this grid comes from the added abundance dimensions, with each of C, N, O, Na, Mg, Al, Si, Ca, Ti, and Fe being able to be individually perturbed (see footnote 13) by ±0.1 dex from the solar value. Thus, while the modelled star is broadly solar in composition, one can inspect the influence of small variations in abundance in isolation on optical fluxes where the pseudocontinuum position is strongly dependent on molecular absorption.
Fig. 9 shows the result of perturbing the abundances of C, O, and Ti (the three most influential elemental abundances in cool atmospheres) as well as the bulk metallicity [M/H] from −0.1 dex to +0.1 dex for the wavelengths covered by our WiFeS spectra at three different $T_{\rm eff}$ values. C, O, Ti, and [M/H] all have a significant, and mostly similar, impact on pseudocontinuum placement, with this similarity indicating the degenerate nature of these parameters. Notably C has the opposite sign to O, Ti, and [M/H], which we suggest relates to CO formation. While CO does not absorb at optical wavelengths it is, however, 'energetically favourable' (Veyette et al. 2016): for a lower C abundance there is a greater 'relative' abundance of O available to form other molecules like TiO or H$_2$O. At $T_{\rm eff}$ = 3 000 K, wavelengths longer than ∼4 500 Å see a fractional change in flux of ∼20% for C and [M/H], but closer to ∼40% for Ti and O, likely due to the dominance of TiO. This effect diminishes with increasing temperature, though the pseudocontinuum can still change at the ∼5-10% level, more than enough to complicate the measurement of equivalent widths via traditional spectroscopic analysis techniques. The very bluest wavelengths (below ∼4 300 Å) show a reduced influence from C, O, and TiO, potentially indicating that spectra in these regions would be less subject to degeneracies when fitting for the bulk metallicity. While the region between ∼4 100-4 200 Å (likely due to SiH in our synthetic spectra given our choice of line list; see footnote 14 and Fig. 10) shows reduced influence from Ti as compared to C, O, and [M/H], on the whole it otherwise appears very difficult to disentangle the contributions from each element and the bulk metallicity.
Fig. 10 is the same, but for N, Na, Mg, Al, Si, Ca, and Fe instead. While none of these reach the significance of C, O, and Ti, being generally much more limited in the wavelength regions they affect, everything except N has at least one spectral region with a flux change of ∼5-10%. Of note is that, outside of a number of strong lines, the influence of Fe is mainly below ∼4 000 Å. This means that our Cannon model, as well as other optical [Fe/H] relations, is likely not actually sensitive to Fe spectral features directly when trained upon [Fe/H], but rather to how the shape of the pseudocontinuum correlates with [Fe/H]. We attribute the surprisingly large influence Na has on flux (at the ∼5-20% level) to it being an electron donor, rather than being directly involved in atomic or molecular absorption itself.
Footnote 13: The chemistry change is modelled by TURBOSPECTRUM while the underlying MARCS atmosphere remains the same.
Collectively, even qualitative analyses like this allow insight into the sensitivity of different wavelength regions to varying elemental abundances to guide spectral analysis. Optical wavelengths clearly contain a wealth of chemical information, but limitations with current models in the cool dwarf regime and the parameters varied in model grids make this difficult to exploit using traditional methods.
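The kind of sensitivity test described in this section can be sketched as a simple fractional flux difference between two perturbed spectra. The example below is illustrative only: the spectra come from a toy absorption band rather than TURBOSPECTRUM, and normalising by the −0.1 dex spectrum is an assumed convention rather than necessarily the one used for Figs. 9 and 10. The function `toy_band_spectrum` and its parameters are hypothetical.

```python
import numpy as np


def toy_band_spectrum(wave, abundance_offset, centre=6200.0, width=150.0):
    """Toy molecular band whose optical depth scales with 10**[X/Fe]."""
    tau = 1.5 * np.exp(-0.5 * ((wave - centre) / width) ** 2)
    return np.exp(-tau * 10.0**abundance_offset)


wave = np.linspace(4000.0, 7000.0, 3001)
flux_minus = toy_band_spectrum(wave, -0.1)  # abundance lowered by 0.1 dex
flux_plus = toy_band_spectrum(wave, +0.1)   # abundance raised by 0.1 dex

# Fractional flux change when going from -0.1 dex to +0.1 dex.
frac_change = (flux_plus - flux_minus) / flux_minus
```

In this toy case a higher abundance deepens the band, so the fractional change is negative within the band and vanishes far from it.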

Model Scatter and Wavelength Label Sensitivity
Now with some theoretical insight into how sensitive optical cool dwarf fluxes are to variations in abundance, we can begin to assess the performance of our Cannon model in terms of how well it recovers these fluxes. The Cannon attempts to parameterise the flux of each spectral pixel as a function of the adopted stellar labels, in our case $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe]. Also inherent to the Cannon model is a noise term (see Equation 5) associated with each spectral pixel, which accounts for a) the training spectra not being known to arbitrarily high precision and thus having an associated flux uncertainty, and b) remaining flux variation unable to be parameterised as a function of the adopted labels with our adopted polynomial order (the scatter term). Broadly speaking, we would expect spectral pixels associated with strong atomic absorption features for atoms other than Fe or Ti to be poorly modelled by the Cannon as it does not have the constraints to properly learn these features. Similarly, since the entire optical region is affected by molecular absorption for cool stars, we in general expect a higher baseline model scatter for all pixels, peaking where atomic or molecular features from elements unconstrained by our purely [Fe/H]-[Ti/Fe] chemical model dominate.
One important caveat to keep in mind before we begin attributing flux sensitivities or model scatter to stellar chemistry is that observational factors relating to our spectra or benchmark sample could cause similar effects. The first of these is the SNR of our data, which is not uniform as a function of wavelength or $T_{\rm eff}$. In general, the warmer stars in our sample have higher SNR spectra from the WiFeS blue arm, with the coolest stars by comparison having much lower SNR in the blue, the very reason our Cannon model begins at 4 000 Å rather than covering the full extent allowed by the B3000 grating. The second effect is that of spectral resolution, with B3000 pixels having lower resolving power than their R7000 grating counterparts on the red arm. This could result in adjacent spectral features, which might be resolved at R∼7 000, being blended at R∼3 000 and being accordingly more complex for the Cannon to parameterise as compared to the same information spread across two separate pixels. Finally, given the relatively small training sample, there exists the possibility of edge effects near the edge of the parameter space where the model is more poorly constrained, something which could also increase model scatter.
With that introduction and these caveats out of the way, Figs. 11 and 12 present benchmark spectra, pixel sensitivity, and model scatter for the B3000 and R7000 wavelength regions respectively, and Table 2 offers a counterpart summary for each term in the label vector $\ell_n$.
From top to bottom, each plot shows the normalised benchmark stellar spectra (labelled with prominent atomic features); spectral pixel sensitivity to $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] in the form of the linear Cannon coefficients; quadratic Cannon coefficients; cross-term Cannon coefficients; and the Cannon model scatter for each spectral pixel for our 3 and 4 label models (also labelled with prominent atomic features). For the purposes of comparison, note that the y-axis scale for the Cannon scatter panels is the same for both plots.
The model scatter in the blue is on average higher than in the red, which is consistent with both a higher contribution from unmodelled atomic absorption at bluer wavelengths, and the aforementioned lower spectral resolution of our B3000 spectra. We see a reduction in the median scatter value across all pixels of ∼7.2% when switching from a three label to a four label model, which gives some hint as to the improved explanatory power of adding [Ti/Fe] as a label when attempting to reproduce observed spectra.
Table 2 lists the standard deviation $\sigma_\theta$ for each coefficient in our coefficient vector $\theta_\lambda$ (linear, quadratic, and cross-term in each label) for both WiFeS wavelength ranges, and for each of our two Cannon models. Larger values of $\sigma_\theta$ correspond to the terms that are important for describing the largest-scale changes in the spectrum, for instance the linear and quadratic $T_{\rm eff}$ and $\log g$ terms for describing TiO bandheads. Values for all labels are uniformly larger in the blue versus the red, which again is consistent with the lower spectral resolution and higher atomic contribution in the blue. The cross-terms in particular give insight into the correlations between each label, and it will be interesting to see what would happen were we to add further [X/Fe] dimensions in follow-up work. Finally, of particular note is how important $\log g$ is as a label in linear, quadratic, and cross-terms, despite the fact it is often determined using photometry and was the label we put the least effort into sourcing literature values for. That said, for stars on the main sequence (i.e. the bulk of our sample bar the aforementioned outliers) $\log g$ is going to be well (or entirely) correlated with $T_{\rm eff}$ and [Fe/H], so it likely isn't truly an independent label.
There are numerous strong lines of Ca, Ti, Cr, and Fe in the spectrum (among other elements) which are clearly associated with many of the large 'spikes' in model scatter. At this resolution we do not just expect a contribution from isolated strong lines, but also closely bunched multiplets which, whilst blended, still observably affect flux. As expected, there are many more atomic features towards bluer wavelengths, the exact regions where models are the least reliable and cool dwarfs the faintest. Nonetheless, this points once more to the sheer amount of chemical information present at optical wavelengths for these stars, most of which is also inaccessible under our current Cannon model formalism.
Our present Cannon model assumes that the flux of every spectral pixel depends on $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe], the entire set of modelled stellar labels. While this is physically reasonable in the optical where much of the spectral information for these labels is contained in broad molecular features spanning a large range in wavelength, this assumption ceases to be valid when trying to model the abundances for elements whose primary influence is present only as discrete atomic lines. Thus, to better represent known physics it is more reasonable to implement a Cannon model with regularisation whereby coefficients are actively encouraged to take on values of zero. Casey et al. (2016) demonstrated that such an approach is successful in modelling the full suite of APOGEE stellar parameters ($T_{\rm eff}$, $\log g$, and 15 abundances: C, N, O, Na, Mg, Al, Si, S, K, Ca, Ti, V, Mn, Fe, Ni), and we are hopeful that following their approach in the future would allow us to add additional [X/Fe] dimensions to our Cannon model.
Finally, with reference back to Figs. 9 and 10, it is possible to qualitatively compare model molecular absorption with increased Cannon model scatter $s_\lambda$ (noting that we cannot expect a 1:1 match in light of known model spectra difficulties reproducing optical fluxes). Two regions of increased scatter seem to correspond with Ca-associated features, at 4 100-4 300 Å (likely CaH; the region of largest scatter in the blue) and 6 100-6 150 Å (Ca I), though there are also relatively strong [Si/H] and [Fe/H] features in the former region. The Na doublet region (5 800-6 000 Å) has the most scatter in the red, but it is difficult to ascribe this to Na alone. Curiously, 4 700-5 200 Å is the region where we expect Mg to have the largest impact, but the scatter is small compared to the aforementioned regions. While it is difficult to discuss this quantitatively with our existing model, it would be illustrative to revisit this with a future version of the Cannon with regularisation and a broader set of chemical labels.
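To illustrate what a regularised Cannon might look like, the sketch below fits a single pixel with an L1 penalty on all coefficients except the offset, so that labels with no influence on that pixel are driven to exactly zero. It is a toy under stated assumptions rather than the Casey et al. (2016) implementation: the flux variance is held fixed (no scatter term is fit), the solver is a simple iterative soft-thresholding scheme (ISTA) chosen here for brevity, and the regularisation strength and all names (`train_pixel_l1`, `soft_threshold`) are hypothetical.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)


def soft_threshold(x, thresh):
    """Proximal operator of the L1 norm; shrinks values towards zero."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


def train_pixel_l1(design, flux, e_flux, strength, n_iter=5000):
    """Minimise 0.5 * chi^2 + strength * sum(|theta_j|, j >= 1) with ISTA."""
    ivar = 1.0 / e_flux**2
    hessian = design.T @ (design * ivar[:, None])
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    penalty = np.full(design.shape[1], strength * step)
    penalty[0] = 0.0  # leave the offset term unpenalised
    theta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        grad = -design.T @ (ivar * (flux - design @ theta))
        theta = soft_threshold(theta - step * grad, penalty)
    return theta


# Toy four label quadratic design matrix (15 terms) for 103 stars.
n_star = 103
labels = rng.normal(size=(n_star, 4))
cross = np.column_stack([labels[:, i] * labels[:, j]
                         for i, j in combinations(range(4), 2)])
design = np.hstack([np.ones((n_star, 1)), labels, cross, labels**2])

# This toy pixel responds only to the offset and the first linear label.
theta_true = np.zeros(15)
theta_true[0], theta_true[1] = 1.0, 0.1
e_flux = np.full(n_star, 0.01)
flux = design @ theta_true + rng.normal(0.0, e_flux)

theta_l1 = train_pixel_l1(design, flux, e_flux, strength=5000.0)
theta_ols = np.linalg.lstsq(design, flux, rcond=None)[0]
```

Comparing `theta_l1` with the unregularised `theta_ols` shows the intended behaviour: coefficients for terms that do not affect the pixel are set to exactly zero, at the cost of a small shrinkage of the coefficients that remain.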

Data-Driven vs Physical Model Flux Recovery
Given the good performance of our Cannon model in recovering optical fluxes, it is illustrative to perform a comparison between data-driven and physical model spectra. For this comparison we treat our Cannon-produced spectra essentially as a self-consistent and interpolatable proxy for the observed spectra which, among other things, removes the influence of SNR or artefacts present in any single observed spectrum. Fig. 13 shows a comparison between our 3 label Cannon model in $T_{\rm eff}$, $\log g$, and [Fe/H] and the equivalent MARCS model spectra for the same set of benchmark stars and adopted benchmark parameters presented in Figs. 4 and 5 (again using the three-label model to avoid interpolating the MARCS grid in [Ti/Fe]). Alongside this is Fig. 14, which shows the percentage flux difference between the Cannon and MARCS spectra for all 103 benchmark stars (spanning the ranges 1.7 < $(BP − RP)$ < 3.8 and 4.55 < $\log g$ < 5.20). Our MARCS spectra were generated using the same grid used for Rains et al. (2021), and normalised using the same normalisation formalism applied to our observed spectra as described in Section 3.1.
Takeaways from this comparison are as expected from previous studies: performance is worse at bluer wavelengths and cooler temperatures. A few more detailed observations are as follows:
• The best matching wavelength region is the few 100 Å wide region surrounding H$\alpha$ (nominally ∼6 400 − 6 800 Å, though even this gets worse towards mid-M spectral types).
• The wings of the Na D feature are consistently poorly reproduced by MARCS, even for the warmest stars in our sample.
• Discrepancies at blue wavelengths are obvious for even the warmest stars in our sample, and continue to get (sometimes dramatically) worse with cooler temperatures.
• While the positions of TiO bandheads are well matched, their fluxes are not, likely posing problems for temperature or spectral type determination.
• We suspect the mismatch at 4 000 − 4 500 Å is caused by a spectral depression centred on the ∼4 227 Å neutral Ca resonance line that is poorly reproduced by MARCS. This feature -- long known in the literature (e.g. Lindblad 1935; Vyssotsky 1943) -- was recently investigated in detail by Jones et al. (2023), who conclude the mismatch between models and observations is due to a 'lack of appropriate treatment of line broadening for atomic calcium'. We direct interested readers to this paper and the references within for more information.
This isn't a perfect comparison, since we cannot control for the effect of elemental abundance variations which strongly affect optical fluxes. Where our MARCS models have a uniform scaled solar abundance pattern -- critically uniform C, O, and Ti abundances -- our Cannon model is trained on stars from the Solar Neighbourhood which will instead show a spread in abundances. Further, our normalisation formalism is likely not robust to spectra as dramatically different as the Cannon vs MARCS blue spectra are due to incomplete line lists or opacities in the latter. Nonetheless, even a qualitative comparison illustrates the degree to which current generation model optical fluxes at low and medium resolution are a poor match to observations, and we advise caution when considering a purely model based approach when working with optical spectra of cool dwarfs.
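The per-pixel comparison described above can be sketched in a few lines. The following is a minimal illustration only, assuming both spectra are already normalised and sampled on the same wavelength grid; the function names, the sign convention (MARCS minus Cannon, relative to Cannon), and the toy flux values are assumptions made for this example rather than the implementation used in this work.

```python
import numpy as np

def percent_flux_difference(flux_cannon, flux_marcs, bad_px_mask=None):
    """Per-pixel percentage difference of a physical-model spectrum relative
    to a data-driven one. Masked pixels are returned as NaN.
    The sign convention (MARCS - Cannon) is an assumption of this sketch."""
    flux_cannon = np.asarray(flux_cannon, dtype=float)
    flux_marcs = np.asarray(flux_marcs, dtype=float)
    pc_diff = 100.0 * (flux_marcs - flux_cannon) / flux_cannon
    if bad_px_mask is not None:
        pc_diff = np.where(bad_px_mask, np.nan, pc_diff)
    return pc_diff

def fraction_within(pc_diff, threshold_pc):
    """Fraction of unmasked pixels whose |difference| is below a threshold (in %)."""
    valid = np.isfinite(pc_diff)
    return np.mean(np.abs(pc_diff[valid]) < threshold_pc)

# Toy normalised spectra (hypothetical values, not taken from the paper).
cannon = np.array([1.00, 0.80, 0.50, 0.90, 0.70])
marcs = np.array([1.01, 0.84, 0.60, 0.90, 0.70])
mask = np.array([False, False, False, False, True])  # e.g. a telluric pixel

diff = percent_flux_difference(cannon, marcs, mask)
for threshold in (10.0, 5.0, 2.0):
    print(threshold, fraction_within(diff, threshold))
```

Thresholds of 10%, 5%, and 2% mirror the three panels of Fig. 14; as noted there, such percentages apply only to normalised spectra.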

Comparison with Previous Data-Driven Studies
Now we turn to putting our work in the broader context of other data-driven studies of cool dwarfs, with a broad overview shown in Table 3. While the studies referenced span a range of data-driven algorithms, spectroscopic datasets, and modelling choices, we ultimately find it useful to evaluate them against two metrics that underpin much data-driven work in astronomy: label transfer and domain exploration.
The goal of label transfer is to quickly and precisely propagate a smaller set of potentially computationally expensive labels to some other, often larger, set of data. This might allow the transfer of elemental abundances from high-resolution spectroscopic surveys like APOGEE to their low-resolution counterparts like LAMOST (e.g. Ho et al. 2016; Wheeler et al. 2020), enable putting stellar parameters from distinct surveys on the same scale to enable cross-survey comparison (see e.g. comparing GALAH and APOGEE, Nandakumar et al. 2022), or simply provide computational efficiency due to the relative speed of data-driven methods versus more traditional modelling approaches (see e.g. GALAH DR2, Buder et al. 2018). Of note is that label transfer -- however precise -- also transfers any systematics or quirks from the original sample. While all data-driven studies involve some level of label transfer, not all are -- or need be -- interested in domain exploration, where the goal is to better physically characterise a given parameter space and ideally learn something new or exploitable. A specific example of this is Ness et al. (2016) using the Cannon to recover spectroscopic red giant masses -- historically an extremely challenging task -- based on mass-dependent dredge-up signatures present in the CN absorption features of APOGEE spectra. In the case of cool dwarfs, domain exploration generally means attempting to overcome model limitations, account for $T_{\rm eff}$-chemistry degeneracies, and ideally gaining insight into the complex molecular physics governing their atmospheres.
Collectively, the studies cited in Table 3 -- Behmard et al. (2019), Birky et al. (2020), Galgano et al. (2020), Maldonado et al. (2020) and Li et al. (2021) -- demonstrate data-driven label transfer for cool dwarfs across a range of different wavelengths and spectroscopic resolutions. From these studies it is clear that the inability to continuum normalise cool dwarf spectra in any physically meaningful way is not a huge impediment to precise label transfer with data-driven models, which is fortunate for the potential of the method. That said, the accuracy$^{15}$ of the transferred labels depends strongly upon the training sample used and the source of its labels, with each study also taking a different approach when it comes to domain exploration. In the next three subsections we put our work in context with these studies as we evaluate the state of the field for data-driven studies of cool dwarfs.
15 ...ical abundances -- which are inherently model-derived -- for anything other than the Sun's [X/H] = 0 by definition, so we use it to refer to recovery of the original set of benchmark labels/accepted abundance scales, rather than something more physical like is appropriate in the case of parameters like $T_{\rm eff}$ or $\log g$.

Behmard et al. 2019 and Birky et al. 2020
Behmard et al. (2019) and Birky et al. (2020) both use the Cannon trained on a small, but high-resolution, training sample of benchmark stars primarily drawn from Mann et al. (2015). Their resulting [Fe/H] uncertainties prove consistent with the fundamentally-calibrated Mann et al. (2015) sample, while their $T_{\rm eff}$ values are marginally less precise. This is the opposite $T_{\rm eff}$ trend to what we observe with our results, where instead our values are consistent with literature uncertainties. We believe this discrepancy is a function of both spectral resolution and wavelength coverage, rather than inherent differences in our respective Cannon models. Our broader wavelength coverage -- enabled by our medium-resolution spectra -- encompasses a greater number of highly temperature-sensitive molecular features like optical TiO bandheads, something less present in the shorter optical range covered by the HIRES spectra from Behmard et al. (2019), or the IR APOGEE spectra of Birky et al. (2020). On the other hand, their higher spectral resolution gives them access to many more unblended atomic features to use as [Fe/H] indicators, features which in the IR are also less affected by degeneracies imposed by molecular absorption.

Using the California Planet Search (CPS) HIRES sample, Behmard et al. (2019) train their Cannon model on a much wider temperature range (3 000 < $T_{\rm eff}$ < 5 200 K) than the other studies. This appears to have resulted in larger $T_{\rm eff}$ residuals at cooler temperatures, potentially indicating that a given Cannon implementation struggles to model the molecule-dominated spectra of cool stars at the same time as the more 'regular' spectra of their solar-type counterparts. The implication is that, short of a more complex Cannon model, it is likely best to restrict oneself to modelling the two paradigms separately -- something our early prototyping with the broader range of stars from Rains et al. (2021) supports.
Our results at medium resolution and moderate SNR bear out the prediction from Behmard et al. (2019) using convolved and degraded HIRES spectra that the Cannon continues to function effectively across a range of spectral resolutions and SNRs. In an attempt to prevent overfitting, they also implemented a regularised Cannon model which yielded no gain in label prediction. Their hypothesis, similar to our discussion in Section 5.1, is that regularisation is unnecessary where each stellar label affects each spectral pixel, but will become more important for models extended to include elemental abundances where a given abundance might only affect a smaller set of spectral pixels.
Birky et al. (2020) take advantage of the Cannon being an interpretable machine learning model, and use the first order model coefficients to identify [Fe/H] sensitive atomic or molecular features in their APOGEE spectra -- the most significant of which they make publicly available. While we see similar sensitivity when interpreting our Cannon implementation, we don't report a specific list of features as in our spectra they are both more likely to be blended and outweighed in significance compared to the ever-present molecular absorption. Additionally, while they take the time to analyse their fundamentally-calibrated cool dwarf [Fe/H] propagated to the broader APOGEE sample in terms of Galactic dynamics, we leave such work until we're able to derive more detailed abundances for our (much smaller) sample of stars.
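To make the role of the first order coefficients and the modelled scatter more concrete, the sketch below builds a quadratic label vector (offset, first order, second order, and cross terms) and fits one coefficient vector per spectral pixel. It is a simplified stand-in, assuming unweighted least squares on rescaled labels and taking the scatter as the standard deviation of the residuals; the Cannon itself optimises the coefficients and scatter jointly while accounting for flux uncertainties. All function names and the synthetic data are hypothetical.

```python
import numpy as np
from itertools import combinations

def quadratic_label_vector(labels):
    """Design matrix [1, linear, squared, cross terms] for (n_star, n_label) labels."""
    labels = np.atleast_2d(labels)
    n_star, n_label = labels.shape
    cols = [np.ones(n_star)]                                    # offset
    cols += [labels[:, i] for i in range(n_label)]              # first order
    cols += [labels[:, i] ** 2 for i in range(n_label)]         # second order
    cols += [labels[:, i] * labels[:, j]
             for i, j in combinations(range(n_label), 2)]       # cross terms
    return np.column_stack(cols)

def train(labels, fluxes):
    """Unweighted least-squares fit for every pixel at once (a simplification).
    fluxes has shape (n_star, n_pixel). Returns theta with shape
    (n_term, n_pixel) and the per-pixel residual scatter."""
    design = quadratic_label_vector(labels)
    theta, *_ = np.linalg.lstsq(design, fluxes, rcond=None)
    scatter = np.std(fluxes - design @ theta, axis=0)
    return theta, scatter

# Synthetic example: 103 stars, 4 rescaled labels, 5 pixels (all values made up).
rng = np.random.default_rng(42)
labels = rng.normal(size=(103, 4))
true_theta = rng.normal(size=(15, 5))
fluxes = quadratic_label_vector(labels) @ true_theta

theta, scatter = train(labels, fluxes)
first_order = theta[1:5]  # rows 1-4: sensitivity of each pixel to each label
```

For four labels this gives 1 + 4 + 4 + 6 = 15 terms per pixel (10 for three labels). Pixels with first order coefficients far from zero are those most sensitive to the corresponding label, which is the sense in which the model is interpretable.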
Importantly, by putting their stars on a fundamentally-calibrated scale, Birky et al. (2020) are able to reveal both $T_{\rm eff}$ and [Fe/H] systematics in the stellar parameters reported by APOGEE. While they do not undertake a detailed analysis, they attribute these discrepancies to incomplete line lists or opacities in the stellar models used by ASPCAP (APOGEE Stellar Parameters and Chemical Abundances Pipeline, García Pérez et al. 2016), resulting in systematically biased fits as the pipeline attempts to optimise the continuum level in lieu of the missing features. This result is significant, as it hints at the danger of naively trusting cool dwarf parameters produced by generalist pipelines -- something not appropriately considered by Li et al. (2021) when propagating APOGEE stellar labels to LAMOST spectra (see discussion in Section 5.3.3).

Maldonado et al. 2020
Using a Principal Component Analysis (PCA) and Bayesian based approach in conjunction with F/G/K-K/M binary benchmarks, Maldonado et al. (2020) were able to recover a suite of 14 elemental abundances for their cool dwarf sample -- the largest set of abundances to-date using a data-driven framework. Their method used HARPS spectra of 16 binary benchmarks convolved to a resolution of ∼1 000 − 2 000, with the training sample selected to best match the metallicity distribution of nearby (< 70 pc) and kinematically similar F/G/K stars. While they do not undertake leave-one-out cross-validation as we do here, they validate their approach statistically by comparing abundance trends between K/M dwarfs and F/G/K stars, and methodologically by applying the technique to F/G/K stars to check for overfitting.
While the principal difference between our two approaches is of course the technique -- PCA vs the Cannon -- there is also the difference in size between our two training samples. Although our binary benchmark sample is similar in size, 17 stars vs their 16 stars, we have a much larger sample of $T_{\rm eff}$ and secondary [Fe/H] benchmarks that serve to more robustly anchor our temperature and metallicity scales over a wider parameter space. Maldonado et al. (2020) note that only three stars in their training sample have [Fe/H] < 0.1 dex, and aimed to address this by matching the [Fe/H] distributions of their training sample with the Solar Neighbourhood F/G/K star sample. By comparison, six of our binary benchmarks -- and roughly a third of our sample overall -- have [Fe/H] below this threshold. Additionally, it is unclear to what extent their approach is sensitive to $T_{\rm eff}$-chemistry degeneracies and how this affects their parameter recovery, as this is something they do not discuss. Despite these limitations, they demonstrate clear recovery of their selected set of abundances -- a remarkable achievement which points to the density of chemical information in optical spectra able to be retrieved when relying on a carefully considered set of benchmark stars.
Of their recovered abundances, it is interesting that both Ca and Mg show a comparatively large scatter as both these elements, and their derived molecules in CaH and MgH respectively, have a substantial number of absorption features in the optical. Hopefully follow-up work from both Maldonado et al. (2020) with PCA, and our work with the Cannon, can investigate whether this is astrophysical or sample-related in nature. It is, however, extremely encouraging to see that both C and Ti show much better recoveries given the extent both affect optical fluxes (per our discussion in Section 5.1).
Finally, in terms of the merits between our respective models, it seems clear that PCA is both more effective and computationally less complex when it comes to dimensionality reduction versus our Cannon model without regularisation. However, the Cannon -- being a generative model -- remains the more interpretable approach, and we are better able to investigate label sensitivity as a function of wavelength and compare back to theoretical models. As such, it would be illustrative to apply both approaches to the same dataset to better compare, contrast, and capitalise on the respective advantages of each model in the pursuit of understanding the chemistry of cool dwarfs.

Galgano et al. 2020 and Li et al. 2021
As data-driven works, Galgano et al. (2020) and Li et al. (2021) operate primarily in the label transfer space, with both using large (> 1 000 star) training samples to allow for more statistically robust model training and testing. Galgano et al. (2020) apply the Cannon to LAMOST spectra, and draw their labels ($T_{\rm eff}$, $M_\star$, $R_\star$, $L_\star$) from the TESS Input Catalogue (TIC, Muirhead et al. 2018) which is based primarily on photometric empirical relations such as those from Mann et al. (2015). Li et al. (2021) also make use of LAMOST data, but instead use the Support Vector Regression-based SLAM algorithm (Stellar LAbel Machine, Zhang et al. 2020) with APOGEE (Majewski et al. 2017) $T_{\rm eff}$ and [M/H] values. They chemically validate their methodology by checking against stars in known open clusters, as well as those in M-M binary systems whose stars should be chemically consistent.
Of the data-driven studies discussed, Galgano et al. (2020) is unique in that it does not consider chemistry as a stellar label, primarily due to limitations with their use of the TIC as a source of labels. However, for those stars with stellar parallaxes in the TIC, $M_\star$ and $R_\star$ are computed from $M_{K_S}$, which is relatively insensitive to metallicity (as discussed in Section 1). As such, these labels -- analogous to our adopted $\log g$ values also from photometric relations -- will not be massively affected by parameter degeneracies in the source catalogue, even in the absence of chemical constraints. $T_{\rm eff}$, however, will be, which is part of the reason for the large $T_{\rm eff}$ uncertainties in the TIC to begin with. The TIC, however, is not entirely devoid of spectroscopic metallicity information for its brighter targets and, with the addition of photometric metallicities, an upgraded Cannon model able to take into account label uncertainties would almost certainly prove useful. Nonetheless, there is -- as the authors note -- utility in transferring labels from photometric empirical relations sensitive to reddening to a more distant sample of stars observed at low spectral resolution like LAMOST, especially as a means of providing an empirically calibrated reference point for future model-based analyses.

An interesting comparison comes about when looking at the SNR investigation undertaken by Galgano et al. (2020). Their required SNR threshold for label uncertainties to be minimised is extremely high (SNR > 150), which indicates that at low spectroscopic resolution much higher-SNR values are necessary in order to constrain stellar labels from blended spectroscopic features, as compared to our much lower-SNR -- but higher resolution -- spectra.
Li et al. (2021) train two separate SLAM models on LAMOST spectra: one with APOGEE labels, and another with labels from BT-Settl model spectra (Allard et al. 2011). Since both of these label sources are model-based, their stellar labels are not empirically calibrated on benchmark stars in the same way as the previously discussed studies, but they are able to observe and compare systematics between these two sources, as well as with the results of Birky et al. (2020) which were also based on APOGEE spectra. While further comparison with our work is difficult as they neither use the Cannon, nor benchmark-based labels, we note the power of working from a catalogue as large as LAMOST which allows validation using cluster stars or M-M binary systems, as well as the successful deployment of the SLAM algorithm to cool dwarfs in the spirit of algorithmic diversity.
Low-resolution optical spectra show great promise for data-driven studies of cool dwarfs since they are cheap to obtain observationally and, as discussed in Section 4, should show the chemical imprints of a number of molecular species over a broad range in wavelength. With both Galgano et al. (2020) and Li et al. (2021) having laid the groundwork for future LAMOST studies, it would be good to see a data-driven implementation with a more bespoke training sample based on well-characterised stellar benchmarks. Low-resolution surveys should have a particular advantage when it comes to collating a diverse sample of binary benchmarks as these systems rapidly become quite faint and difficult to observe at high-resolution in the optical. As such, there is much untapped potential for the existing LAMOST dataset, and these spectra should not be overlooked when it comes to understanding the physics and chemistry of cool dwarfs.

CONCLUSIONS
In the work presented above, we have detailed the development of a new four-label data-driven spectroscopic model in stellar $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] trained on 103 cool dwarf benchmarks observed in the optical (4 000 < $\lambda$ < 7 000 Å, R∼3 000 − 7 000) with the WiFeS instrument on the ANU 2.3 m Telescope utilising the widely-used Cannon algorithm. Not only do we put our work in context with other data-driven studies on cool dwarfs, but we conduct an investigation into the sensitivity of optical wavelengths to atomic and molecular features informed by both data-driven and physical models, and provide insight into the reliability of fluxes from physical models at cool temperatures. The main conclusions from our work are as follows:
(i) Our new four-label Cannon model is trained on 103 cool dwarf benchmarks, 17 of which have literature abundance measurements from a binary companion. Under cross-validation our model is capable of recovering $T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe] with precisions of 1.4%, ±0.04 dex, ±0.10 dex, and ±0.06 dex respectively -- a very encouraging result given the extreme $T_{\rm eff}$-chemistry degeneracy of optical spectra.
(ii) Using kinematics from Gaia DR3 and chemistry from GALAH DR3 we demonstrate the ability to predict [Ti/Fe] for Milky Way disc stars by interpolating in the empirical $v_\phi$-[Fe/H] space to a precision of ±0.08 dex in [Ti/Fe]. Given the little that is known chemically about cool dwarfs due to their complex spectra, this approach shows promise for coarsely determining the abundances of $\alpha$-process elements prior to proceeding with more in-depth analyses of difficult-to-interpret optical spectra.
(iii) We find our data-driven approach far superior at recovering optical cool dwarf fluxes compared to theoretical models using modern grids of synthetic spectra (see the discrepancies noted in e.g. Reylé et al. 2011, Mann et al. 2013c, Rains et al. 2021). This demonstrates that data-driven techniques will be essential to fully exploiting optical spectra of cool stars until the next generation of physical models are able to update the currently incomplete line-lists for dominant molecular absorbers (e.g. TiO, McKemmish et al. 2019).
(iv) Using a custom grid of MARCS model cool dwarf spectra we conduct an investigation into the sensitivity of optical fluxes to chemical abundance variations. Our grid has 10 different chemical abundance dimensions (C, N, O, Na, Mg, Al, Si, Ca, Ti, and Fe) able to be individually perturbed by ±0.1 dex from Solar composition. Critically, this allows the inspection of the bulk effects of abundance variations on molecular absorption features. Our results indicate that a change in C, O, or Ti abundances affects the position of the pseudocontinuum to a similar or greater level than changing the bulk metallicity by the same amount (ranging from a 10 − 40% change in flux), in concordance with prior work by Veyette et al. (2016) on C and O abundances.
(v) While not reaching the level of significance of C, O, or Ti, our grid also shows a number of spectral regions with a large (10 − 20%) flux sensitivity to Na, Mg, Al, Si, Ca, and Fe, most likely arising from various molecular hydrides or oxides.
(vi) Using the aforementioned model grid and a list of strong atomic features present in cool dwarf atmospheres, we interpret the modelled scatter of our Cannon model -- the pixel variation unable to be parameterised by our adopted four label quadratic model -- in physical terms. We find the model scatter correlates with numerous strong lines of Ca, Ti, Cr, and Fe (among others), as well as regions we associate with molecular features like Ca or Si.
(vii) We perform a direct comparison between Cannon and MARCS model spectra for wavelengths uncontaminated by strong telluric absorption within the spectral regions 4 000 ≤ $\lambda$ ≤ 5 400 Å ($\lambda/\Delta\lambda$ ∼ 3 000) and 5 400 ≤ $\lambda$ ≤ 7 000 Å ($\lambda/\Delta\lambda$ ∼ 7 000), with MARCS spectra showing large departures from the Cannon fluxes that get worse at bluer wavelengths or cooler temperatures. We find only the few 100 Å wide region surrounding H$\alpha$ (nominally ∼6 400 − 6 800 Å) to be consistently reliable for the parameter space we consider, and warn anyone undertaking a model-based approach on optical cool dwarf spectra using current-generation models to proceed with caution.
This study builds upon previous empirical, benchmark, and data-driven research on cool dwarfs and could not exist without such foundational work dedicated to understanding the most common kinds of stars in the Universe. While we have not yet resolved the $T_{\rm eff}$-chemistry degeneracy that has historically limited our understanding of such stars, we are given cause for cautious optimism. The sheer breadth of optical wavelengths that are sensitive to variations in chemical abundance in cool dwarfs is much greater than for Solar-type stars where most chemical information comes from isolated lines. This hints at cool stars being a powerful and as-yet-untapped method for studying the chemistry of our Galaxy and the demographics of planets -- if only this information could be properly unlocked. This bodes well for cool dwarf focused work in current or upcoming optical surveys like GALAH, 4MOST, or SDSS-V, especially when combined with Gaia DR3 for continuing to refine and broaden our benchmark sample.

Figure 1. 2MASS $M_{K_S}$ versus Gaia DR3 $(BP-RP)$ colour magnitude diagram for our 103 selected cool dwarf benchmarks, coloured according to their adopted [Fe/H]. The subsample of benchmarks with chemistry from an F/G/K binary companion is outlined.
[Fe/H] to isolate the [$\alpha$/Fe] bimodality (associated with the thick and thin discs, Nissen & Schuster 2010) the most cleanly. This is shown for a subset of GALAH stars in the left panel of Fig. 2 where we have removed the bulk of the stellar halo by applying the cut, $v_\phi$ > 100 km s$^{-1}$. The clean GALAH disc sample is binned into 100 bins in $v_\phi$, [Fe/H] and coloured by the average value of [Ti/Fe] in each bin. The benchmark stars are overplotted in the right panel of Fig. 2 to highlight their association with the thick (high [Ti/Fe], low $v_\phi$) and thin (low [Ti/Fe], high $v_\phi$) discs.

Figure 2. Left: The "mapped" values of [Ti/Fe] for the benchmarks (black circles) using their $v_\phi$, [Fe/H] values, and interpolation of the space presented in the right panel. The average values are calculated following 1 000 draws sampling the errors associated with [Fe/H] and the input astrometric parameters from Gaia DR3. The GALAH disc sample is plotted underneath. Right: The distribution of a subset of GALAH DR3 stars with large $v_\phi$ (> 100 km s$^{-1}$) selected to isolate the thick and thin MW discs. The subset is binned and coloured by the average [Ti/Fe] value of the bin (note the recovery of the [$\alpha$/Fe] bimodality). Benchmark stars are overplotted as open circles, where their $v_\phi$ values have been calculated under the LSR discussed in Section 2.2.5.
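The mapping summarised in Fig. 2 can be sketched as a binned-mean lookup combined with Monte Carlo sampling. The example below is illustrative only and makes several assumptions: the velocity is taken to be an azimuthal velocity carrying a single Gaussian uncertainty (whereas the procedure above samples the input astrometric parameters themselves), a nearest-bin lookup stands in for proper interpolation, and the reference sample is a synthetic two-population toy rather than GALAH DR3. All function and variable names are hypothetical.

```python
import numpy as np

def binned_mean_map(x, y, z, bins, x_range, y_range):
    """Mean of z on a regular 2D grid of (x, y) bins; empty bins become NaN."""
    ranges = [x_range, y_range]
    total, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=ranges, weights=z)
    count, _, _ = np.histogram2d(x, y, bins=bins, range=ranges)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
    return mean, x_edges, y_edges

def lookup(mean_map, x_edges, y_edges, x, y):
    """Nearest-bin lookup (a simple stand-in for interpolation)."""
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, mean_map.shape[0] - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, mean_map.shape[1] - 1)
    return mean_map[ix, iy]

def monte_carlo_ti_fe(mean_map, x_edges, y_edges, v, e_v, feh, e_feh,
                      n_draw=1000, seed=0):
    """Mean and standard deviation of the mapped [Ti/Fe] over n_draw samples."""
    rng = np.random.default_rng(seed)
    v_draws = rng.normal(v, e_v, n_draw)
    feh_draws = rng.normal(feh, e_feh, n_draw)
    ti_fe = lookup(mean_map, x_edges, y_edges, v_draws, feh_draws)
    return np.nanmean(ti_fe), np.nanstd(ti_fe)

# Toy reference sample: slow rotators are Ti-enhanced (made-up numbers).
rng = np.random.default_rng(1)
v_ref = rng.uniform(100.0, 300.0, 20000)
feh_ref = rng.uniform(-1.0, 0.5, 20000)
ti_ref = np.where(v_ref < 180.0, 0.3, 0.0)

ti_map, v_edges, feh_edges = binned_mean_map(
    v_ref, feh_ref, ti_ref, bins=10, x_range=(100.0, 300.0), y_range=(-1.0, 0.5))
ti_mean, ti_std = monte_carlo_ti_fe(
    ti_map, v_edges, feh_edges, v=140.0, e_v=5.0, feh=-0.2, e_feh=0.1)
```

With real data the spread across draws would give a statistical uncertainty on each mapped value; here it is essentially zero because the toy star sits well inside one population.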

Figure 4. Spectral recovery for a representative set of benchmark stars with the WiFeS blue arm for 4 000 < $\lambda$ < 5 400 Å at R∼3 000, with observed spectra in black and Cannon model spectra in red. We generate model spectra from our fully-trained three-label Cannon model at the adopted (rather than best-fit) benchmark labels. The vertical red bars correspond to H$\delta$, H$\gamma$, and H$\beta$ from the hydrogen Balmer series which were masked out to avoid emission features. The stars are sorted by their Gaia $(BP-RP)$ colour to show a smooth transition in spectral features across the parameter space considered.
illustrates our stellar parameter recovery as a function of label source: $T_{\rm eff}$ from interferometry, [Fe/H] from Mann et al. (2015), [Fe/H] from Rojas-Ayala et al.

Figure 6. Leave-one-out cross validation performance for recovery of adopted labels (per Table 1) for Top: 3 label ($T_{\rm eff}$, $\log g$, [Fe/H]) and Bottom: 4 label ($T_{\rm eff}$, $\log g$, [Fe/H], [Ti/Fe]) Cannon models respectively. Each panel shows the median and standard deviation of the residuals computed between the adopted benchmark values and Cannon predicted equivalents, where $\sigma_{\rm resid}$ is added in quadrature with the Cannon statistical uncertainties to give our adopted label uncertainties.
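The uncertainty budget described in the caption above can be written out as a short sketch. This is an illustration under stated assumptions rather than the pipeline used here: a linear least-squares model stands in for the Cannon, residuals are defined as predicted minus adopted, and all function names and toy values are hypothetical.

```python
import numpy as np

def leave_one_out_predictions(features, labels, fit, predict):
    """Predict each star's label from a model trained on all the other stars."""
    n_star = len(labels)
    predicted = np.empty(n_star)
    for i in range(n_star):
        keep = np.arange(n_star) != i
        model = fit(features[keep], labels[keep])
        predicted[i] = predict(model, features[i:i + 1])[0]
    return predicted

def adopted_uncertainty(label_adopted, label_predicted, sigma_stat):
    """Median and standard deviation of the residuals, plus the standard
    deviation added in quadrature with the statistical uncertainty."""
    resid = np.asarray(label_predicted) - np.asarray(label_adopted)
    bias = np.median(resid)
    sigma_resid = np.std(resid)
    sigma_total = np.sqrt(np.asarray(sigma_stat) ** 2 + sigma_resid ** 2)
    return bias, sigma_resid, sigma_total

# Stand-in model (NOT the Cannon): ordinary linear least squares.
def fit_linear(x, y):
    return np.linalg.lstsq(x, y, rcond=None)[0]

def predict_linear(coef, x):
    return x @ coef

# Noise-free toy data, so every held-out star is recovered almost exactly.
x = np.linspace(0.0, 1.0, 6)
features = np.column_stack([np.ones_like(x), x])
labels = 2.0 + 3.0 * x

predicted = leave_one_out_predictions(features, labels, fit_linear, predict_linear)
bias, sigma_resid, sigma_total = adopted_uncertainty(labels, predicted, sigma_stat=0.05)
```

In practice `fit` and `predict` would wrap training the data-driven model and fitting labels to a held-out spectrum, and `sigma_stat` would be a per-star array.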
14 Studies using different line lists attribute features in this region to MgH-A-X or the tail of TiO  (Pavlenko 2014); or AlH (Pavlenko et al. 2022).

Figure 9. Sensitivity of cool dwarf MARCS model fluxes to variations in abundances for elements present in dominant molecular absorbers (C, O, and Ti) over the WiFeS wavelength range, as demonstrated with three different sets of solar [M/H] spectra (from top to bottom) with $T_{\rm eff}$ and $\log g$: 3 000 K and 5.0; 3 500 K and 4.75; 4 000 K and 4.65. The upper panel of each plot shows the continuum normalised synthetic MARCS model spectrum (model continuum level at 1.0) at Solar [M/H] and a Solar abundance pattern. The bottom panel shows the fractional change in continuum normalised flux when changing a single abundance from +0.1 to −0.1 dex of the Solar value, as well as the bulk metallicity perturbed by the same amount for comparison. Note that the flux change from C has the opposite sign to O and Ti and as such has been multiplied by −1 for better comparison, and that the y-axis scale is different for each panel.

Figure 11. Four label Cannon model pixel sensitivity to stellar labels and model scatter for WiFeS B3000 spectra with prominent atomic absorption features of Ca I, Ti I, and Fe I labelled (EW > 400 mÅ, calculated for $T_{\rm eff}$ = 3 500 K, $\log g$ = 5.0, [Fe/H] = 0.0). Panel 1: Normalised B3000 spectra of stellar benchmarks used to train the model, where darker coloured spectra correspond to cooler stars. As before, the shaded red regions correspond to stellar emission, telluric absorption, or bad pixels. Panel 2: First order $\theta$ coefficients for each of the stellar labels ($T_{\rm eff}$, $\log g$, [Fe/H], and [Ti/Fe]). The further each coefficient is from 0, the more sensitive the flux in a given spectral pixel is to a specific label. Panel 3: As Panel 2, but for second order $\theta$ coefficients. Panel 4: As Panel 2 and 3, but for cross-term $\theta$ coefficients. Panel 5: Cannon modelled scatter for each spectral pixel for both our 3 and 4 label Cannon models, where higher values indicate that the adopted stellar labels are increasingly insufficient to completely parameterise the stellar flux. The same absorption features as before are overplotted.
Figure 13. Comparison between our fully-trained 3 label Cannon vs MARCS model spectra for a representative set of benchmark stars for 4 000 < $\lambda$ < 7 000 Å at R∼3 000 − 7 000, with Cannon model spectra in red and MARCS model spectra in blue. Both the MARCS and Cannon spectra have been generated at our adopted benchmark labels, rather than our fitted labels. The vertical red bars (from left to right) correspond to H$\delta$, H$\gamma$, H$\beta$, a bad column on the WiFeS detector, atmospheric H$_2$O absorption, H$\alpha$, and O$_2$ telluric features, all of which were masked out during modelling. The stars are sorted by their Gaia $(BP-RP)$ colour to show a smooth transition in spectral features across the parameter space considered.

Figure 14. Percentage flux difference between our fully-trained 3 label Cannon vs MARCS model spectra for all 103 cool dwarf benchmarks, again sorted by Gaia DR3 $(BP-RP)$ colour. From top to bottom the panels show wavelengths where the percentage flux difference is < 10%, < 5%, and < 2% respectively. Each panel has its own colour bar, where white regions indicate a good match between the Cannon and MARCS spectra (ΔFlux ≈ 0), with the colour getting progressively darker green (ΔFlux > 0) or purple (ΔFlux < 0) as the match worsens. Any wavelength region with a percentage flux difference beyond the 10%, 5%, or 2% levels respectively is set to black, and the vertical red bars are again global masked telluric/emission/bad pixel regions. Note that these percentage differences only apply to our normalised spectra, and not unnormalised flux-calibrated spectra, but they remain a useful metric for quantifying the accuracy of physical models like MARCS. These benchmarks span the parameter ranges 1.7 < $(BP-RP)$ < 3.8 and 4.55 < $\log g$ < 5.20.
also makes use of LAMOST data, but instead use the Support Vector Regression-based SLAM algorithm (Stellar LAbel Machine, Zhang et al. 2020) with APOGEE (Majewski et al. 2017) $T_{\rm eff}$ and [M/H] values. They chemically validate their methodology by checking against stars in known open clusters, as well as those in M-M binary systems whose stars should be chemically consistent.

Table 2. Comparison of standard deviations computed for each $\theta$ term -- offset, first order, second order, and cross term -- for both our 3 and 4 label Cannon models. Larger values indicate a given coefficient (and thus the associated stellar parameter/s) is a more important term in modelling stellar fluxes.