Inclination Angles for Be Stars Determined Using Machine Learning

We test the viability of training machine learning algorithms on synthetic Hα line profiles to determine the inclination angles of Be stars (the angle between the central B star's rotation axis and the observer's line of sight) from a single observed medium-resolution, moderate S/N spectrum. The performance of three machine learning algorithms was compared: neural networks tasked with regression, neural networks tasked with classification, and support vector regression. Of these three, neural networks tasked with regression consistently outperformed the other methods, achieving an RMSE of 7.6 degrees on an observational sample of 92 galactic Be stars with inclination angles known from direct Hα profile fitting, from the spectroscopic signature of gravitational darkening, and, in a few cases, from interferometric observations that resolved the disk. The trained neural networks enable a quick and useful determination of the inclination angles of observed Be stars, which can be used to search for correlated spin axes in young open clusters or to extract an equatorial rotation velocity from a measurement of v sin(i).


Machine Learning in Astronomy
Astronomers are increasingly turning to machine learning to provide automated detection, analysis, and classification in response to large-scale surveys that produce unprecedentedly large datasets (Baron 2019). Machine learning differs from traditional model-fitting techniques in that the model is constructed according to the input data rather than being predefined (Ivezić et al. 2020). The flexible nature of machine learning algorithms makes them suited to a wide variety of tasks. In astronomical research, common uses of machine learning include classifying objects of interest from large databases (Domínguez Sánchez et al. 2018; Wang et al. 2022), dimensionality reduction (Portillo et al. 2020; Kovačević et al. 2022), anomaly detection (Baron & Poznanski 2017; Giles & Walkowicz 2020), building models that use more parameters than is possible with classical models (Huertas-Company et al. 2008), and visualizing datasets with a high number of parameters (Giles & Walkowicz 2018; Reis et al. 2021).
Broadly speaking, machine learning can be divided into supervised and unsupervised algorithms. In supervised machine learning, a set of input features is mapped to a target variable based on labels provided by a human expert (Ivezić et al. 2020). In unsupervised machine learning, labels are not included, and the algorithms are frequently used to cluster data into groups, reduce dimensionality, and detect anomalies (Baron 2019).

Machine Learning in Be Star Research
Classical Be stars are rapidly rotating, B-type, main-sequence stars that are surrounded by an equatorial, circumstellar decretion disc (Porter & Rivinius 2003). The defining characteristic of a Be star is the presence of emission in the hydrogen Balmer series, notably Hα, owing to the presence of the disc (Slettebak 1982). The exact mechanism that puts the disc gas into orbit is unknown, but it is thought to be related to near-critical rotation, perhaps driven by the redistribution of angular momentum within the star (Granada et al. 2013; Rivinius et al. 2013).
Machine learning has emerged as a promising technique to identify Be star candidates in databases produced by large photometric and spectroscopic surveys. Bromová et al. (2014) used wavelet transformations to reduce the dimensionality of approximately 2,300 spectra of about 300 Be and B[e] stars in the vicinity of Hα. Each spectrum was given a label corresponding to pure emission, absorption smaller than 1/3 of the emission peak, absorption greater than 1/3 of the emission peak, or no emission. These labels were then used to train a support vector machine (Vapnik 1999) to classify the spectra into emission stars and normal stars. Although Bromová et al. (2014) were not explicitly concerned with the determination of inclination angles, their approach shares significant similarities with the present work. Reis et al. (2018) searched DR14 of the APOGEE near-infrared survey using methods based on anomaly detection. A random forest algorithm (Ho 1995) was trained, using a sample consisting of both synthetic and observed spectra, to create a matrix of similarity scores between each pair of spectra based on the likelihood that a given pair would end up in the same terminal branch of the random forest. The similarity matrix was then used as the input for a t-SNE algorithm (van der Maaten & Hinton 2008) to reduce dimensionality and help with visualization. The spectra with the lowest similarity scores and their nearest neighbours were then manually inspected, yielding (among other finds) 40 previously undiscovered classical Be stars. Wang et al. (2022) found 1,162 Be star candidates in DR7 of the LAMOST survey by searching for Hα emission in the spectra of early-type stars using the ResNet convolutional neural network (He et al. 2015), combined with a series of tests to remove confounding objects such as B[e] and Herbig stars. A follow-up series of tests on the Be star candidates yielded 183 previously undiscovered classical Be stars.
The present work seeks to extend machine learning as applied to the Be stars to include the automatic determination of quantitative information from their spectra. As it is well known that the morphology of the Hα line strongly reflects how the star-disk system is viewed (see Figure 1 and the discussion below), we target the extraction of the viewing inclination of the central star from a single, continuum-normalized spectrum of moderate resolution centred on Hα. The performance of three supervised machine learning algorithms, each trained on synthetic spectra, is compared: neural networks tasked with regression, neural networks tasked with classification, and support vector regression. Each algorithm is then applied to an observed sample of Be star Hα spectra to judge performance in realistic cases.

The inclination angle and its relationship to Hα morphology
The inclination angle, i, is the angle between a star's axis of rotation and an observer's line of sight and ranges from 0° to 90° for pole-on and edge-on observations respectively (see Figure 1). It is usually assumed that stellar rotation axes are randomly oriented in space, which leads to an expected p(i) = sin i distribution for any observed sample of stars (Gray 2021). Corsaro et al. (2017) cast doubt on the assumption of random inclinations by finding significant spin axis alignment for the red giant stars in the old open clusters NGC 6791 and NGC 6819 using asteroseismology. Corsaro et al. (2017) investigated 48 oscillating red giant stars with masses in the range 1.1-1.7 M⊙ and found that about 70 percent of the stars in each cluster showed a strong level of alignment. The probability that these alignments arose by chance from an underlying random distribution was calculated to be below 10⁻⁷ for NGC 6819 and below 10⁻⁹ for NGC 6791 (Corsaro et al. 2017). Conversely, the inclination angle distribution obtained from a sample of 36 field red giants showed no significant spin alignment (Corsaro et al. 2017). Hydrodynamical simulations (Corsaro et al. 2017) and numerical simulations of the effects of shear versus compressive turbulence (Rey-Raposo & Read 2018) suggest that if a significant fraction of a star cluster's initial kinetic energy is rotational, then stars can form in a cluster with significant correlations in the direction of their rotation axes that can persist over Gyr timescales.
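The expected p(i) = sin i distribution for randomly oriented axes is easy to verify numerically. The following sketch (stdlib Python, not from the paper) draws isotropic rotation axes by sampling cos i uniformly on [0, 1], which is equivalent to p(i) = sin i when i is folded into [0°, 90°], and checks that the mean inclination approaches the analytic value of 1 radian (about 57.3°):

```python
import math
import random

random.seed(42)

# Isotropic rotation axes: cos(i) is uniform on [0, 1] when i is
# restricted to [0 deg, 90 deg], which reproduces p(i) = sin(i).
n = 200_000
incl = [math.degrees(math.acos(random.random())) for _ in range(n)]

# The analytic mean of a sin(i) distribution on [0, pi/2] is 1 radian.
mean_i = sum(incl) / n
```

Note that low inclinations are rare under this distribution, which is part of why a pole-on Be star is an uncommon sight.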
The strongly correlated spin alignments found by Corsaro et al. (2017) have been contested by Mosser et al. (2018) and Gehan et al. (2020), who attributed them to a combination of systematic bias that favoured low inclination angles and a failure to account for the impossibility of measuring inclination angles near either 0° or 90°. A re-analysis by Mosser et al. (2018) of the spin alignments of both NGC 6819 and NGC 6791 found the inclination angle distribution of both open clusters to be consistent with a sin i distribution upon taking these effects into account. Gehan et al. (2020)'s analysis supported Corsaro et al. (2017)'s conclusion that the distribution of the field red giant stars was isotropic, but was unable to test the conclusions on spin alignment in open clusters because their method is unsuitable for red clump stars. Gehan et al. (2020) urged caution in accepting strongly aligned stellar spins in open clusters and highlighted the need for a dedicated study using another method.
Be stars offer an alternative avenue to search for correlated spin axes in young open clusters. This is because Be stars are bright, common (≈ 20 percent of main-sequence B stars are Be stars; Zorec & Briot 1997), and their inclination angles can be reliably determined spectroscopically (see below). Also, for bright and nearby Be stars, i can be reliably determined using long-baseline optical interferometry (LBOI) observations of the star-disc system (van Belle 2012). Additionally, there are methods based on gravitational darkening, in which rapid rotation causes the stellar intensity to vary with latitude (Zeipel 1924; Collins 1963), and i is extracted from detailed spectral synthesis (Townsend et al. 2004; Frémat et al. 2005; Zorec et al. 2016). Sigut et al. (2020) showed that spectral synthesis of Hα can accurately determine the orientation of a Be star's disc and, as the disc is in the star's equatorial plane, the inclination of the star itself. The method of Sigut et al. (2020) leverages the fact that the morphology of a Be star's Hα emission-line profile varies strongly with inclination even if the disc size and density structure are held constant. This is shown in Figure 1; here low inclinations give rise to singly-peaked emission in Hα, moderate inclinations result in doubly-peaked emission, and high inclinations result in doubly-peaked lines with deep shell absorption (Porter & Rivinius 2003). By comparing a single observed Hα profile to a library of synthetic spectra computed using the Bedisk and Beray suite of codes (Sigut 2018), Sigut et al. (2020) were able to recover the inclination angles of 11 Be stars to within ±10° as compared to LBOI-determined inclinations. Sigut & Ghafourian (2023) further tested the Hα technique using a sample of Be stars with inclinations available from gravitational-darkening studies (Zorec et al. 2016) and found good agreement between the two methods.

Organization
Section 2 describes the synthetic Be star spectra used to train the machine learning algorithms. Section 3 describes the three machine learning algorithms and the associated performance metrics by which they have been evaluated. Section 4 details the procedure for optimizing user-defined model parameters, called hyper-parameters, that must be tuned for each algorithm in order to ensure optimal performance, and discusses the accuracy achieved on the synthetic test samples. Section 5 describes how the algorithms are trained. The results of testing the trained algorithms on observed Hα profiles for a sample of 92 Be stars, with available inclination angle determinations from gravity darkening (Zorec et al. 2016) and Hα profile fitting (Sigut & Ghafourian 2023), are found in Section 6. Section 7 contains a case study of applying the trained algorithms to 11 nearby Be stars with well-constrained inclination angle determinations from LBOI. A discussion of our results follows in Section 8.

SYNTHETIC TRAINING SPECTRA
In order to train machine learning algorithms to determine the inclination angles of Be stars, large libraries of synthetic spectra were generated centred on the vacuum wavelength of Hα, 6564.6 Å. Each individual model Hα profile is represented by 201 continuum-normalized flux values covering the region ±1000 km s⁻¹ from line centre. One library of Hα line profiles, corresponding to a range of equatorial disc density models, was generated for each of the central B star masses given in Table 1, which correspond to spectral types ranging from approximately B9V to B0.5V.

Creating the libraries of synthetic spectra
The libraries of Be star Hα line profiles were computed by Sigut et al. (2020) using the Bedisk and Beray suite of codes (Sigut 2018). Ekström et al. (2012)'s stellar evolutionary models for a core hydrogen mass fraction of 0.3, which corresponds approximately to the middle-age main sequence, were used to generate the radii, luminosities, and effective temperatures of the central B stars. Table 1 details the stellar properties adopted.
Bedisk outputs the radiative equilibrium temperatures in the Be star's circumstellar disc given the central B star's photoionizing radiation field and density structure as inputs (Sigut & Jones 2007). If the distance from the rotation axis of the central B star is R, the central B star's radius is R*, the distance above the equatorial plane is Z, and the disc scale height is H, the density structure of the disc is parameterized by

ρ(R, Z) = ρ₀ (R*/R)ⁿ exp(−Z²/2H²),

where ρ₀ and n are free parameters that can be adjusted to match observations. The scale height, H, of a disc in vertical hydrostatic equilibrium is given by

H(R) = (c_s / V_K(R)) R,

where the disc temperature is taken as T₀ = 0.6 T_eff, c_s is the speed of sound at T₀, and V_K(R) is the Keplerian orbital speed at distance R (see Sigut et al. 2020).
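As an illustration, the disc density parameterization and scale height can be sketched as follows. This is a hypothetical helper with illustrative parameter values, not an excerpt from Bedisk:

```python
import math

def disc_density(R, Z, rho0, n, R_star, H):
    """Equatorial disc density rho(R, Z) = rho0 * (R*/R)^n * exp(-Z^2 / 2H^2).

    R is the distance from the rotation axis, Z the height above the
    equatorial plane, both in the same units as R_star and H.
    """
    return rho0 * (R_star / R) ** n * math.exp(-Z ** 2 / (2.0 * H ** 2))

def scale_height(R, c_s, v_K):
    """Scale height of a disc in vertical hydrostatic equilibrium,
    H(R) = (c_s / V_K(R)) * R, with c_s evaluated at T0 = 0.6 T_eff."""
    return (c_s / v_K) * R

# Density falls off with height above the midplane (illustrative values):
rho_mid = disc_density(2.0, 0.0, 1e-11, 3.0, 1.0, 0.1)   # midplane
rho_up  = disc_density(2.0, 0.1, 1e-11, 3.0, 1.0, 0.1)   # one scale height up
```

One scale height above the midplane, the density drops by the familiar factor exp(−1/2).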
For each central B star mass given in Table 1, 165 different discs were considered, comprising 15 values of ρ₀ distributed evenly in log-space between 10⁻¹² g cm⁻³ and 10⁻¹⁰ g cm⁻³ and 11 values of n between 1.5 and 4 in increments of 0.25 (Sigut et al. 2020). A Bedisk model was computed for each of the 165 permutations, and the hydrogen level populations computed by Bedisk were then used by Beray to compute individual Hα line profiles. Beray accomplishes this task by solving the radiative transfer equation along a series of rays directed at the observer (Sigut 2010; Sigut 2018). The composite disc-plus-star Hα profile is computed in a unified way by incorporating the relevant boundary condition for each ray: rays that terminate on the stellar surface use a Doppler-shifted, photospheric Hα profile for the upwind boundary, while rays that pass through the disc but miss the star assume no incident radiation. This allows the computed profiles to be directly compared with observed profiles (after convolution to the correct spectral resolution). Calculating the Hα line profiles introduces two new parameters: R_D, the outer radius of the disc, and i, the inclination. Seven disc sizes, from 5 R* to 65 R* in steps of 10 R*, and ten inclinations, from 0° to 90° in steps of 10°, were considered. Each of the 11 central Be star masses detailed in Table 1 has an associated library containing 11,550 line profiles, resulting in 127,050 Hα line profiles overall.
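The size of the model grid follows directly from the parameter counts quoted above; a small sketch (the variable names are ours, not Bedisk's):

```python
# Grid counts taken from the text: 15 log-spaced values of rho0,
# 11 values of n, 7 disc sizes, and 10 inclinations.
rho0_grid = [10 ** (-12 + 2 * k / 14) for k in range(15)]  # 1e-12 .. 1e-10 g/cm^3
n_grid    = [1.5 + 0.25 * k for k in range(11)]            # 1.5 .. 4.0
R_d_grid  = [5 + 10 * k for k in range(7)]                 # 5 .. 65 R*
i_grid    = [10 * k for k in range(10)]                    # 0 .. 90 deg

discs    = len(rho0_grid) * len(n_grid)                    # 165 Bedisk models
profiles = discs * len(R_d_grid) * len(i_grid)             # 11,550 per mass
total    = profiles * 11                                   # 127,050 over all masses
```

These counts reproduce the 165 discs, 11,550 profiles per library, and 127,050 profiles overall stated in the text.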

Samples of synthetic spectra
Several different samples of Hα spectra are used in this work, and the following naming conventions are employed. Previously, in Section 2.1, a library of 11,550 synthetic spectra was created for each stellar mass in Table 1. This current section details the creation of a sample of ∼8,000 synthetic spectra from each of the libraries of synthetic spectra. Section 5.4 describes how these samples of synthetic spectra are further divided into training, validation, and test sets. The training, validation, and test sets are used to optimize the algorithms' hyper-parameters in Section 4 and to train the algorithms in Section 5. Once trained, the algorithms will be used to determine the inclination angles of two samples of observed spectra: the 92-star Zorec sample in Section 6 and the 11-star NPOI sample in Section 7.
To create a sample of synthetic spectra from a profile library, the desired number of spectra, N_spec, is specified. Then, only Hα spectra that have an average, absolute percentage difference from the reference photospheric profile (for the same mass) of 3 percent or more are selected randomly from the line profile library corresponding to a central B star of a given mass. Profiles too similar to the reference photospheric line profile are not included because they lack significant line emission (or shell absorption) and therefore poorly constrain the inclination angle. As line profiles within this 3 percent threshold are excluded, it is not possible to use all 11,550 Hα line profiles contained within a given library. This work uses a sample of ∼8,000 Hα line profiles for each central B star mass.
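A minimal sketch of this selection step, assuming "average absolute percentage difference" means the mean of |F − F_ref|/F_ref over the 201 flux points (the authors' exact definition may differ):

```python
def mean_abs_pct_diff(profile, reference):
    """Average absolute percentage difference between a model Halpha
    profile and the photospheric reference profile of the same mass."""
    return 100.0 * sum(abs(f - r) / r for f, r in zip(profile, reference)) / len(profile)

def select_emission_profiles(library, reference, threshold=3.0):
    """Keep only profiles differing from the reference by >= threshold percent."""
    return [p for p in library if mean_abs_pct_diff(p, reference) >= threshold]

reference = [1.0] * 201                  # flat-continuum stand-in for a photospheric profile
weak      = [1.01] * 201                 # ~1 percent away: rejected as nearly photospheric
strong    = [1.1] * 201                  # ~10 percent away: retained
selected  = select_emission_profiles([weak, strong], reference)
```

Here only `strong` survives the 3 percent cut, mimicking the rejection of profiles that lack significant emission or shell absorption.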
Two additional parameters that need to be specified when creating a sample are the spectral resolution, R, and the signal-to-noise ratio, S/N. If Δλ is the characteristic width of the instrumental profile, then the resolution of the spectra is defined as R ≡ λ/Δλ. The signal-to-noise ratio is the ratio between the measured flux of the signal and that of the noise in the continuum adjacent to the line; i.e., S/N = 100 spectra will have 1σ error bars equal to 1 percent of their corresponding flux measurements. The profiles were generated at R = 10,000 and S/N = 25. The resolution was chosen because it matches that of the Zorec and NPOI samples of observed spectra in Sections 6 and 7.
Although the observed sample spectra have S/N ≳ 100, initial testing found that algorithms trained on S/N = 25 profiles outperformed algorithms trained on S/N = 100 profiles at predicting the inclination angles of observed Be stars, possibly because the algorithms trained at S/N = 100 were overspecialized to synthetic profiles and could not deal effectively with the deviations from those profiles exhibited by observed spectra.
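Degrading a synthetic profile to a target S/N can be sketched as below, assuming per-pixel Gaussian noise with σ = flux/(S/N), as implied by the S/N definition in the text:

```python
import random

def add_noise(fluxes, snr, rng):
    """Degrade a continuum-normalized profile to a target S/N by adding
    Gaussian noise with sigma = flux / snr at each pixel, so a continuum
    pixel of flux 1.0 receives a 1-sigma error of 1/snr."""
    return [f + rng.gauss(0.0, f / snr) for f in fluxes]

rng = random.Random(0)
clean = [1.0] * 201                       # flat continuum for illustration
noisy = add_noise(clean, 25, rng)         # S/N = 25, as used for training

# RMS scatter about the continuum should be close to 1/25 = 0.04.
scatter = (sum((f - 1.0) ** 2 for f in noisy) / len(noisy)) ** 0.5
```

Training on such noisier profiles, as the text notes, appears to regularize the algorithms against the synthetic-versus-observed mismatch.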
Figure 2 shows several synthetic Hα emission line profiles for a 4 M⊙ Be star at R = 10,000. Illustrated is a representative range of synthetic profiles for different choices of S/N, disc density parameters, and viewing inclinations, with the upper-right panel showing a profile rejected for being too close to the underlying photospheric profile (within the 3 percent tolerance).

Preprocessing the input spectra
Each synthetic spectrum is stored in a 201-element vector containing the continuum-normalized, relative fluxes equally spaced in the interval ±1,000 km s⁻¹ about line centre in Hα. These vectors of relative fluxes are used as input for both types of neural networks, regression and classification. However, unlike neural networks, support vector regression uses Euclidean distances (see Section 3.2), and vector elements with relatively large values (such as profiles with large emission peaks) will dominate the distance calculations. For this reason, we have scaled each of the samples of ∼8,000 synthetic spectra such that all elements have a mean of zero and unit standard deviation prior to use as input for support vector regression.
Each observed spectrum was visually centred on the vacuum wavelength of Hα, λ₀. The wavelengths associated with each flux in a spectrum were converted to velocities relative to line centre using the Doppler formula, v/c = Δλ/λ₀, as the Hα line covers only a narrow range of wavelengths. Compared to retaining the full wavelength dependence, this simplification results in errors that are at most a tenth of the assumed spectral resolution (i.e., 3 km s⁻¹ compared to 30 km s⁻¹ for R = 10⁴). The observed spectra were truncated to the range ±1,000 km s⁻¹ and the fluxes were interpolated so that each observed spectrum lies on the same 201-point velocity grid as the synthetic spectra. As with the synthetic spectra, these vectors of relative fluxes are used directly as inputs for both types of neural networks but are standardised to zero mean and unit standard deviation before being used as input to support vector regression.
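The preprocessing of an observed spectrum, conversion to velocities followed by interpolation onto the common 201-point grid, might look like the following sketch (the paper does not specify the interpolation scheme; linear interpolation is assumed here):

```python
C_KMS = 299_792.458      # speed of light in km/s
LAMBDA0 = 6564.6         # vacuum Halpha wavelength, Angstroms

def to_velocity(wavelengths):
    """Doppler formula relative to line centre: v/c = (lambda - lambda0)/lambda0."""
    return [C_KMS * (w - LAMBDA0) / LAMBDA0 for w in wavelengths]

def regrid(vels, fluxes, npts=201, vmax=1000.0):
    """Linearly interpolate fluxes onto the common npts-point grid
    spanning +/- vmax km/s about line centre (vels must be ascending
    and must bracket the grid)."""
    grid = [-vmax + 2 * vmax * k / (npts - 1) for k in range(npts)]
    out, j = [], 0
    for v in grid:
        while j < len(vels) - 2 and vels[j + 1] < v:
            j += 1
        t = (v - vels[j]) / (vels[j + 1] - vels[j])
        out.append(fluxes[j] + t * (fluxes[j + 1] - fluxes[j]))
    return grid, out

# Toy check with a linear "spectrum": interpolation should be exact.
waves = [LAMBDA0 - 25.0 + 0.1 * k for k in range(501)]   # covers > +/-1000 km/s
fluxes = [w - LAMBDA0 for w in waves]
grid, out = regrid(to_velocity(waves), fluxes)
```

Since linear interpolation reproduces a linear function exactly, the regridded toy spectrum matches v λ₀/c at every grid point, which makes the sketch easy to verify.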

ALGORITHMS AND PERFORMANCE METRICS
This work uses three types of supervised machine learning algorithms to learn the relationship between Hα emission line profiles and i: neural networks tasked with regression, neural networks tasked with classification, and support vector regression. The algorithms are trained on grids of relative fluxes from synthetic Be star line profiles in the vicinity of Hα, and the trained algorithms are then used to determine i for observed Be stars.
A performance metric is needed in order to quantify how well the relationship between Hα emission line profiles and i has been learned. The performance metric used in this work is the root mean squared error (RMSE), defined as

RMSE = √[ (1/N) Σᵢ₌₁ᴺ (ŷᵢ − yᵢ)² ],

where N is the number of Be star spectra in the sample, ŷ are the inclination angle determinations of our machine learning algorithms, and y are our target inclinations. Each spectrum in a sample has an associated target inclination known precisely from the Beray calculation. All sample spectra are uniformly distributed from 0° to 90° in steps of 10°. In Sections 6 and 7, we calculate the RMSE performance of the machine learning algorithms on observed spectra. For observed spectra, the target inclinations, y, are the inclination angle determinations of another method (e.g., Hα profile fitting).
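The RMSE metric is straightforward to implement; for example:

```python
import math

def rmse(y_pred, y_true):
    """Root mean squared error between predicted and target inclinations (degrees)."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)

# Predictions offset by a constant 5 degrees give an RMSE of 5 degrees.
preds = [5.0, 15.0, 25.0]
truths = [0.0, 10.0, 20.0]
print(rmse(preds, truths))  # → 5.0
```

Because of the squaring, a single large miss dominates the score, which is why the committee scheme described later guards against outlier networks.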

Neural networks
A neural network (NN) is a supervised machine learning algorithm comprised of computational units called nodes organized in layers.
In a feed-forward configuration, every node computes a linear combination of the nodes in the preceding layer followed by an application of a non-linear activation function h. A single-layer NN receives the 201 relative Hα fluxes as an input vector, x, and returns a scalar output variable, y(x, w), via the equation

y(x, w) = h( Σⱼ₌₀²⁰¹ wⱼ xⱼ ),

by finding w, the vector of weights, that minimizes a loss function quantifying the discrepancy between the target values and the output values determined by the NN during training (Bishop 1995). This formulation of the NN equation implicitly includes the bias, a constant offset term, as the element w₀ by defining x₀ ≡ 1. Information about the loss functions used in this work can be found in Section 5. Although regression is the natural task of a machine learning algorithm that outputs a continuous scalar such as i, this work uses both regression and classification NNs¹. The outputs of classifiers are not normally directly comparable to those of regressors. However, by choosing an activation function whose output has a probabilistic interpretation, a weighted average can be used to transform a classification NN's output into a continuous scalar, which can then be compared with the output of the regression algorithms using the same performance metric. Although a full discussion is beyond the scope of this work, the authors are aware that the validity of this approach, which requires interpreting the output of the classification NNs as measures of model confidence, is contested (Gal & Ghahramani 2015; Xing et al. 2019).
The NNs tasked with regression use the hyperbolic tangent function²,

h(a) = tanh(a) = (eᵃ − e⁻ᵃ) / (eᵃ + e⁻ᵃ),

as the activation function for each of their layers, where a represents an arbitrary input. The NNs tasked with classification use two different activation functions. All of the layers other than the output layer use the hyperbolic tangent function, while the output layer uses the softmax function,

h(aₖ) = exp(aₖ) / Σⱼ exp(aⱼ),

where the sum in the denominator is taken over the classes (the inclination bins 0° to 90° in steps of 10° in this case) such that the denominator is a normalizing factor. This activation function was chosen because it assigns a probability to the likelihood that a given spectrum corresponds to each of the inclination classes. While regression NNs are the natural choice for this work (because i is a continuous scalar), our primary reason for also including classification NNs is exploratory: we are interested in whether the softmax outputs would be tightly clustered around the bins nearest to the target inclination or more flatly distributed.
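The conversion from softmax probabilities to a scalar inclination estimate can be sketched as a probability-weighted average over the bin centres. This is a minimal stand-in for the classifier's output layer, not the authors' implementation:

```python
import math

BINS = [10.0 * k for k in range(10)]   # inclination classes 0, 10, ..., 90 deg

def softmax(a):
    """Normalized exponentials: outputs sum to 1 and are read as class probabilities."""
    m = max(a)                          # subtract the max for numerical stability
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]

def to_inclination(activations):
    """Collapse the classifier's probability vector to a scalar estimate of i
    via a probability-weighted average over the bin centres."""
    p = softmax(activations)
    return sum(pi * b for pi, b in zip(p, BINS))

# A network whose output layer strongly favours the 40-degree bin:
acts = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
est = to_inclination(acts)
```

With these activations the estimate lands near 40°, slightly pulled toward the distribution's centre by the residual probability in the other bins.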

Support Vector Regression
Support vector regression (SVR) is a supervised machine learning algorithm that works by fitting a hyper-plane, with as many dimensions as the dataset contains features, to the data points. The SVR algorithm uses only a subset of the training data; data points sufficiently close to the hyper-plane (within a hyper-cylinder of radius ε) are ignored (Vapnik 1999). SVR was chosen for this work because it is deterministic, faster to train than NNs, and effective in high-dimensional feature spaces³.
3 SVR was implemented using the MATLAB R2021a function fitrsvm.
SVR seeks to minimize

(1/2) ||w||² + C Σᵢ₌₁ᴺ (ξᵢ + ξᵢ*)

with respect to the weights, subject to the constraints

yᵢ − ŷᵢ ≤ ε + ξᵢ,   ŷᵢ − yᵢ ≤ ε + ξᵢ*,   ξᵢ, ξᵢ* ≥ 0,

where ||w|| is the Euclidean norm of the vector of weights. Here C is a regularization parameter, y are the target values of i, ŷ are the values of i predicted by the model, and ξ* and ξ are distances beginning at the border of the ε-insensitive region and extending above and below it respectively (Vapnik 1999).

Committees of neural networks
NNs initialized with random weights and biases can become trapped in poor local minima during training (Bishop 2006). Different NNs trained on the same inputs will, in general, show variance in their outputs even if the NNs are identically constructed (Bishop 1995). Furthermore, the RMSE is somewhat sensitive to outliers. To address these concerns, we train committees of independent NNs and retain only the median-performing member. Two committees of five neural networks were trained for every central stellar mass listed in Table 1: one committee comprised of NNs tasked with regression and the other of NNs tasked with classification. All 10 of the NNs associated with each central stellar mass are trained on the same sample of synthetic spectra (see Section 2.2 for details).
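Selecting the median-performing member of a five-NN committee is a one-line operation; a sketch with hypothetical per-member RMSEs (the numbers below are illustrative, not from the paper):

```python
def median_member(committee_rmses):
    """Return the index of the median-performing committee member.
    With five members this is the third-best RMSE, which discards both
    lucky and unlucky training runs in a single step."""
    ranked = sorted(range(len(committee_rmses)), key=lambda k: committee_rmses[k])
    return ranked[len(ranked) // 2]

rmses = [6.2, 11.5, 7.9, 7.1, 8.4]   # hypothetical per-member RMSEs in degrees
keep = median_member(rmses)          # index of the member retained
```

Unlike an average, the median is unaffected by a single badly trained network, which is exactly the outlier-removal behaviour the committee structure is meant to provide.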
Our approach differs from the commonly employed technique of bootstrap aggregation, whereby each neural network in the committee is trained on a bootstrapped sample of the original training sample and the overall determination of the committee is the average determination of its constituent members (Breiman 1996). The advantage of bootstrap aggregation is that under ideal conditions (the errors of the committee members are uncorrelated and have a mean of zero), the average error of a committee falls like the reciprocal of the number of its constituent members (Bishop 2006). Unfortunately, these idealized conditions are not met in this work; the errors of the NNs are highly correlated and do not have a mean of zero (see Sections 6 and 7), and we have instead chosen a committee structure that prioritizes outlier removal.

HYPER-PARAMETER OPTIMIZATION
The performance of machine learning algorithms on a given task varies depending on user-defined hyper-parameter values. Since the optimal values of these hyper-parameters are difficult to guess a priori and can significantly impact performance, they must be searched for (Bishop 2006). The NN hyper-parameters that were optimized are the number of hidden layers (N_L) and the number of nodes per layer (N_n). The SVR hyper-parameters that were optimized are the size of the insensitive region (ε), the regularization constant (C), and the kernel scale (KS). Additionally, all three algorithms have hyper-parameters that were chosen without being explicitly optimized in order to save computation time. These hyper-parameters were assigned standard choices and can be found in Section 5.
The hyper-parameters of the machine learning algorithms were optimized independently for each of the Be star masses in Table 1. Each Be star mass has an associated sample of 8,000 synthetic Hα profiles of R = 10,000 and S/N = 25. As there are 11 samples of synthetic profiles and three machine learning algorithms, this amounts to 33 sets of hyper-parameters to be optimized in total.
Although the goal of this work is to produce an automated means of determining i for observed Be stars from a single, medium-to-high resolution spectrum, the nature of the training process dictates that both the hyper-parameters and the parameters of the algorithms are optimized based on their ability to determine i for synthetic spectra. The performances reported in this and the following section should be seen in that context.

Hyper-parameter optimization for NNs
The hyper-parameters that were optimized for both types of NNs are the number of nodes per layer (N_n) and the number of hidden layers (N_L). For NNs tasked with regression, the optimization scheme consists of searching over a grid, found from preliminary trials, that contains six values of N_n, N_n ∈ {4, 5, 6, 8, 10, 12}, and two values of N_L, N_L ∈ {1, 2}.
To perform the search, a committee of five NNs was trained on the same sample for each combination of N_n and N_L on the grid. The performance of a given (N_n, N_L) pair is taken to be the RMSE of its median-performing committee member. The optimal hyper-parameters are taken to be the (N_n, N_L) pair with the best performance. Figure 3 shows the hyper-parameter optimization scheme applied to the 4 M⊙ sample; here, the combination of two hidden layers of six nodes each was found to be optimal. This process was then repeated for each of the remaining ten samples of synthetic profiles.
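The grid search over (N_n, N_L) pairs, scored by the median committee member, can be sketched as follows. Here `fake_committee` is a toy stand-in for the actual training run, constructed so its minimum falls at (6, 2) as in the 4 M⊙ case from the text:

```python
import itertools
import statistics

def grid_search(train_committee, nn_grid=(4, 5, 6, 8, 10, 12), nl_grid=(1, 2)):
    """For each (N_n, N_L) pair, train a five-member committee and score the
    pair by its median member's RMSE; return (best_score, best_nn, best_nl).
    `train_committee(nn, nl)` must return the five per-member RMSEs."""
    best = None
    for nn, nl in itertools.product(nn_grid, nl_grid):
        score = statistics.median(train_committee(nn, nl))
        if best is None or score < best[0]:
            best = (score, nn, nl)
    return best

# Toy objective with a minimum at (6, 2); real scores come from training NNs.
def fake_committee(nn, nl):
    base = (nn - 6) ** 2 + (nl - 2) ** 2
    return [base + d for d in (0.0, 0.1, 0.2, 0.3, 0.4)]

score, nn, nl = grid_search(fake_committee)
```

The same loop, with `fake_committee` replaced by a real training routine, covers all 12 grid points for one stellar mass.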
For NNs tasked with classification, the optimization scheme is nearly identical to that of the NNs tasked with regression. The only differences are that preliminary trials found the grid to be searched over contains seven values of N_n, N_n ∈ {25, 30, 35, 40, 45, 50, 55}, and that the output of the classifier is a vector of probabilities that needs to be converted to an estimate of i using a weighted average before a performance can be assigned via the RMSE. The optimal hyper-parameter combinations for both types of NNs are summarized in Table 2. While the performance always increased going from one to two hidden layers, we chose to limit the NN depth to two hidden layers because preliminary testing found that adding a third rendered computation times prohibitive for only minimal gains in performance.

Hyper-parameter optimization for SVR
The hyper-parameters that were optimized for SVR are the size of the insensitive region (ε), the regularization constant of Equation (7) (C), and a scaling factor that the input matrix is divided by, called the kernel scale (KS). For SVR, the optimization scheme consists of searching over combinations of ε, C, and KS drawn randomly in log-space, where the ranges for ε and C each span an order of magnitude about the prescription of Cherkassky & Ma (2004) and the range for KS was determined from empirical trials.
A combination of hyper-parameters is generated by drawing each of the three hyper-parameters independently. Once a combination has been drawn, an SVR is trained on one of the samples of synthetic profiles and its performance is stored. This process is repeated 150 times for each of the 11 samples of synthetic profiles. The motivation for repeating the process n = 150 times is that it will find a hyper-parameter combination in the ninety-eighth percentile with 95 percent confidence, via solving 1 − 0.98ⁿ = 0.95 for n ≈ 148. We determined that significant performance gains were unlikely to be achieved by raising n by comparing the three best hyper-parameter combinations for each of the 11 samples and noting that the performance difference between the best and third-best combination was always below 0.1°. The optimal SVR hyper-parameter combinations resulting from this process are summarized in Table 3.
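The confidence calculation behind the choice of 150 draws, and the shape of one log-uniform draw, can be sketched as follows (the numeric ranges below are illustrative placeholders, not the paper's actual search bounds):

```python
import math
import random

# Number of random draws needed so that, with 95 percent confidence,
# at least one draw lands in the top 2 percent of hyper-parameter space:
# solve 1 - 0.98**n = 0.95 for n.
n_draws = math.log(0.05) / math.log(0.98)   # ~148.3, rounded up to 150 in practice

def draw_combination(rng):
    """One independent random draw of (epsilon, C, KS), log-uniform in each
    range. The bounds here are illustrative placeholders only."""
    eps = 10 ** rng.uniform(-2, -1)
    C   = 10 ** rng.uniform(0, 1)
    ks  = 10 ** rng.uniform(0, 2)
    return eps, C, ks

rng = random.Random(1)
trials = [draw_combination(rng) for _ in range(150)]
```

Each trial would be followed by training an SVR and recording its RMSE; the best-scoring combination of the 150 is kept.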

TRAINING THE ALGORITHMS
This section describes how the machine learning algorithms are trained on samples of synthetic spectra. By modifying their adaptive parameters, namely weights and biases, training allows the machine learning algorithms to leverage patterns between the synthetic Hα profiles and their associated inclination angles. The trained algorithms will then be used to determine the inclination angles of observed Be stars in the following two sections.
The training process introduces additional hyper-parameters beyond those optimized in Section 4. These hyper-parameters, which have been assigned standard choices, are the loss function, training algorithm, and kernel function. While the work is organized such that Section 4 is about hyper-parameter optimization and Section 5 is about training the algorithms, the two sections should be seen as complementary: the optimized hyper-parameters are used during training, and the training process was used to optimize the hyper-parameters.

Loss functions
In order to quantify the discrepancy between the determinations of a machine learning algorithm and their associated target inclinations during training, a loss function is used. For a machine learning algorithm, learning consists of minimizing the loss function by modifying the adaptive parameters of the algorithm, namely the weights and biases.
For both NNs tasked with regression and SVR, this work uses the mean squared error, E = (1/N) Σᵢ (ŷᵢ − yᵢ)², as the loss function, where N is the number of Be star spectra in the sample, ŷ are the inclination angle determinations of our models, and y are the target inclinations. The mean squared error was chosen because it contains the same information as our performance metric, is a standard choice for regression problems, and has the property of heavily penalizing large errors.
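As a concrete toy illustration (a Python sketch, not the MATLAB pipeline used in this work), the mean squared error over a sample of predicted and target inclinations is:

```python
import numpy as np

def mse_loss(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between predicted and target inclinations."""
    return float(np.mean((y_hat - y) ** 2))

# Toy example: predictions vs. target inclinations, in degrees.
y_hat = np.array([18.0, 42.0, 71.0])
y = np.array([20.0, 40.0, 75.0])
print(mse_loss(y_hat, y))  # -> 8.0
```

Note how the single 4° error contributes two thirds of the loss: squaring heavily penalizes the largest errors.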
For NNs tasked with classification, this work uses cross-entropy, E = −(1/N) Σᵢ Σ_c p_c(yᵢ) ln p_c(ŷᵢ), where N, ŷ, and y are defined as in Equation (12) and the second sum is taken over the classes. The probability that a profile's target inclination belongs to a given class, p_c(y), can only take values of zero or one. If we consider a profile with an associated target inclination of i = 20°, then p_c(y) is equal to one when c corresponds to the 20° class and is equal to zero otherwise. The inclination determinations of the classifiers are vectors whose components contain the probability that a profile belongs to each of the inclination classes, p_c(ŷ). While cross-entropy is the standard loss function used in classification NNs, recent work has cast doubt on its supposed superiority over the mean squared error (Muthukumar et al. 2020; Hui & Belkin 2021). Nevertheless, cross-entropy was chosen because using a squared loss function appears to impede the optimization of NNs with a softmax output layer (Hui & Belkin 2021), which is required for converting the probability that each profile belongs to a given inclination class into a scalar estimate of i (see Section 3.1).
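The one-hot cross-entropy and the softmax-to-scalar conversion can be sketched together; the 10°-wide inclination classes below are a hypothetical binning for illustration, not necessarily the binning used by the classifiers in this work:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Convert raw network outputs into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p_hat: np.ndarray, target_class: int) -> float:
    """Cross-entropy for a one-hot target: minus the log of the
    probability assigned to the true inclination class."""
    return float(-np.log(p_hat[target_class]))

# Hypothetical inclination classes at 0, 10, ..., 90 degrees.
classes = np.arange(0, 100, 10)
# Raw outputs peaking at the 20-degree class (index 2).
p_hat = softmax(np.array([0.1, 0.2, 3.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]))
print(cross_entropy(p_hat, target_class=2))  # loss for a 20-degree target
print(float(np.dot(p_hat, classes)))         # probability-weighted scalar estimate of i
```

The last line shows the conversion of a probability vector into a single inclination estimate as an expectation value over the classes.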

Optimization algorithms
In order to minimize the loss functions discussed in Section 5.1, an optimization algorithm is required. The two major classes of algorithms applicable to minimizing continuous, differentiable functions of several variables (and hence applicable to both types of NNs) are variants of either gradient descent or Newton's method. The optimization algorithm we use for the NNs tasked with regression, which is effectively an interpolation between these two classes, is the MATLAB R2021a implementation of the Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963).
In order to reduce computational cost, the Levenberg-Marquardt algorithm uses JᵀJ, where J is the Jacobian matrix, to approximate the Hessian matrix when performing Newton's method-like parameter updates; this approximation only holds for squared loss functions and is, therefore, incompatible with the NNs tasked with classification (which use cross-entropy as their loss function). The optimization algorithm we use for the NNs tasked with classification, which is an accelerated variant of gradient descent, is the MATLAB R2021a implementation of the scaled conjugate gradient descent algorithm (Møller 1993). SVR results in a very large convex, quadratic programming (QP) optimization problem. As the optimization surface is convex, SVR cannot become trapped in poor local minima during training the way NNs can. The optimization algorithm we use for SVR, which breaks this large QP problem into a series of minimally sized QP problems that can be solved analytically, is the MATLAB R2021a implementation of sequential minimal optimization (Platt 1998).
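The interpolating character of Levenberg-Marquardt can be illustrated with a toy numpy sketch (not MATLAB's implementation) on a linear least-squares problem: the damping term λI pushes the update toward a gradient-descent-like step, while λ → 0 recovers the Gauss-Newton (Newton-like) step built on the JᵀJ Hessian approximation.

```python
import numpy as np

def lm_step(J: np.ndarray, r: np.ndarray, w: np.ndarray, lam: float) -> np.ndarray:
    """One Levenberg-Marquardt update for a squared loss 0.5*||r||^2:
    approximate the Hessian by J^T J and damp it with lam * I."""
    A = J.T @ J + lam * np.eye(J.shape[1])
    g = J.T @ r                      # gradient of the squared loss
    return w - np.linalg.solve(A, g)

# Toy problem: residuals r = X @ w - y, so the Jacobian of r is X itself.
X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y = np.array([1.0, 4.0, 3.0])
w = np.zeros(2)
for _ in range(50):
    r = X @ w - y
    w = lm_step(X, r, w, lam=0.1)
print(np.round(w, 2))  # converges to the least-squares solution [1, 2]
```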

Kernel function
For SVR, the kernel function, k, maps inputs to a higher dimensional space, where a suitable hyper-plane can be found, before backprojecting to the original feature space (Bishop 2006). This allows for a curved hyper-plane, which may provide a significantly better fit to the data than a flat one would. The radial basis function, k(x, x′) = exp(−‖x − x′‖²), where x and x′ represent feature vectors of relative fluxes, is the kernel function used in this work. The radial basis function was chosen because it is the standard kernel function used in SVR and has good performance over a wide variety of tasks (Liu et al. 2014).
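A minimal sketch of the radial basis function kernel, with the kernel scale KS applied as the divisor of the inputs described in Section 4 (Python illustration, not the MATLAB implementation):

```python
import numpy as np

def rbf_kernel(x: np.ndarray, x_prime: np.ndarray, kernel_scale: float = 1.0) -> float:
    """Radial basis function kernel on feature vectors of relative
    fluxes, with the inputs divided by the kernel scale (KS)."""
    d = (x - x_prime) / kernel_scale
    return float(np.exp(-np.dot(d, d)))

x = np.array([1.0, 1.2, 0.9])
print(rbf_kernel(x, x, kernel_scale=2.0))         # identical inputs -> 1.0
print(rbf_kernel(x, x + 10.0, kernel_scale=2.0))  # distant inputs -> effectively 0
```

Similar profiles thus receive kernel values near one and dissimilar profiles values near zero, which is what lets the flat hyper-plane in the mapped space correspond to a curved one in flux space.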

Training, validation, and testing results
Following standard practice in machine learning, we divided each of the samples of synthetic spectra (see Section 2.2) into disjoint training, validation, and testing datasets (Hastie et al. 2009): (i) The largest of the three datasets is the training set. For both types of NNs, we randomly assigned 70 percent of each sample of synthetic spectra to the training set; for SVR, this percentage is higher, at 90 percent, owing to the different validation methods used. For a machine learning algorithm, training consists of using the profiles of the training set to learn the model parameters that minimize the loss function. The performances that the algorithms achieve on the training set are prone to being exceedingly optimistic due to over-fitting. Over-fitting occurs when an algorithm becomes over-specialized to the peculiarities of the training data (such as noise) and, as a result, generalizes poorly to new data.
(ii) The validation set is held back during training and is used to prevent over-fitting rather than to modify the adaptive parameters of the algorithms. For both types of NNs, we randomly assigned 15 percent of each sample of synthetic spectra to the validation set. Validation is performed by calculating the loss function on the validation set each time the model parameters are updated during training. If a NN is over-fitting, the loss will decrease on the training set but increase on the validation set due to poor generalization to new data. If the validation loss increases for six consecutive parameter updates, the model is considered over-fitted and training ends via a validation criterion known as early stopping (Bishop 2006). The number of parameter updates performed during training before over-fitting began is stored as the 'best epoch' parameter for later use.
For SVR, we used ten-fold cross-validation, whereby the training set is randomly divided into ten equally sized subsets called folds (Hastie et al. 2009). The SVR model is trained ten different times; each time, a different fold is held back to be used as a validation set and the remaining nine folds are combined into a training set. Validation is performed by calculating the loss, averaged over the ten validation sets, each time the model parameters are updated during training. The number of parameter updates that minimizes this loss is stored as the 'best epoch' parameter for later use. Ten-fold cross-validation has the advantage that every profile in the training set contributes to both training and validation, but comes at the cost of significantly increased training times. This trade-off was ideal for SVR, which is relatively quick to train, but computationally prohibitive for the NNs.
(iii) The test set is held back during both training and validation and is used to test the trained algorithms' performance on previously unseen data. Any profile that did not end up in either the training or validation set was assigned to the test set; this amounted to 15 percent of each sample of synthetic spectra for both types of NNs and 10 percent for SVR. Testing is performed by calculating the RMSE performance of an algorithm that was trained for the number of parameter updates defined by its associated 'best epoch' parameter.
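The early-stopping criterion of item (ii) can be sketched generically; the update and validation-loss callbacks below are hypothetical stand-ins for a real training loop:

```python
def train_with_early_stopping(update_fn, val_loss_fn, patience=6, max_epochs=1000):
    """Perform parameter updates until the validation loss has failed to
    improve for `patience` consecutive updates; return the best epoch."""
    best_loss, best_epoch, bad_streak = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        update_fn()                 # one parameter update on the training set
        loss = val_loss_fn()        # loss evaluated on the validation set
        if loss < best_loss:
            best_loss, best_epoch, bad_streak = loss, epoch, 0
        else:
            bad_streak += 1
            if bad_streak >= patience:
                break               # over-fitting detected: stop training
    return best_epoch

# Toy validation-loss trace that bottoms out at epoch 3, then climbs.
losses = iter([5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7])
print(train_with_early_stopping(lambda: None, lambda: next(losses)))  # -> 3
```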
In this work, the performance of an algorithm on synthetic spectra (in Tables 2 and 3, and in Figure 3) always refers to the test set performance.
Tables 2 and 3 summarize the RMSE performance of the three machine learning algorithms on the test samples of synthetic spectra for each mass bin. The RMSE of all three machine learning algorithms tends to worsen as stellar mass increases until it plateaus at around 9–10 M⊙. The two regression algorithms outperformed the NNs tasked with classification for every stellar mass considered. The NNs tasked with regression outperformed SVR for masses between three and seven M⊙, whereas SVR outperformed the NNs tasked with regression for eight and nine M⊙ spectra; the performance of the two regression algorithms is approximately equal at 10 M⊙ and above.

PERFORMANCE ON OBSERVED PROFILES
The results of the previous section are an encouraging proof-of-concept. However, the method must still be shown to be effective on observational data. This section is concerned with testing the trained algorithms on an observed sample of Hα spectra consisting of 92 of the 233 galactic Be stars considered by Zorec et al. (2016), which we call the Zorec sample following Sigut & Ghafourian (2023). The stars of the Zorec sample were chosen based on the public availability of spectra in the region of Hα. Spectra for 58 of the stars come from the BeSS spectral database (http://basebe.obspm.fr), with the remaining 34 stars coming from the sample of Silaj et al. (2010), taken at the John Hall telescope at Lowell Observatory. Sources for individual stars can be found in Table 1 of Sigut & Ghafourian (2023). The spectra typically have S/N ∼ 100 and R ∼ 10⁴, with the latter matching the training resolution. Every star in the sample has an inclination angle determination based on gravitational darkening (Zorec et al. 2016) and Hα profile fitting (Sigut & Ghafourian 2023). More information on the 92 stars of the Zorec sample can be found by consulting Sigut & Ghafourian (2023) and the references therein.
The Zorec et al. (2016) inclination angle determinations (referred to as i_GD hereafter) are based on gravity darkening, whereby very rapid stellar rotation results in a latitude-dependent T_eff (von Zeipel 1924), causing the spectrum to vary with i. Figure 4 shows the inclination angle distribution of the Zorec sample as determined by both gravity darkening (i_GD, left) and Hα profile fitting (i_Hα, right). Although both distributions peak near 60°, there is a trend for i_Hα to be higher than i_GD at low inclinations and lower at high inclinations. As a comparison between i_Hα and i_GD for the stars of the Zorec sample has already been made by Sigut & Ghafourian (2023), this section will focus on a comparison between the machine learning determinations of i and i_Hα.
All three trained machine learning algorithms discussed in Section 5 were used to determine the inclination angles of the Zorec sample stars. These inclination angles were calibrated and then compared with i_Hα to ascertain how effectively the different machine learning algorithms, trained only on synthetic spectra, can determine inclinations from observed spectra. To calibrate the machine learning inclinations, the mean of the distribution (i_ML − i_Hα) was set to zero by adding a constant offset to i_ML. Here i_ML refers to inclinations determined using each of the three algorithms: NNs tasked with regression, NNs tasked with classification, and SVR. These calibration offsets are given in Table 4 for each of the three machine learning algorithms. We note that two of these offsets, for NNs tasked with regression and SVR, are quite small. NNs tasked with classification have the largest offset of −7.3°; however, even in this case, the offset

Additive Calibration Offsets
Table 4. Calibration offsets applied to the three different machine learning algorithms. These offsets were determined by forcing the mean of the distribution i_ML − i_Hα to zero for the stars of the Zorec sample.
is still less than most of the 1σ errors in i_Hα as determined by Sigut & Ghafourian (2023).
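The calibration is a single constant shift; a minimal sketch with made-up numbers (not values from the Zorec sample):

```python
import numpy as np

def calibrate(i_ml: np.ndarray, i_ref: np.ndarray):
    """Shift machine-learning inclinations by a constant so that the
    mean residual against the reference inclinations is zero."""
    offset = float(np.mean(i_ref - i_ml))
    return i_ml + offset, offset

i_ml = np.array([30.0, 50.0, 70.0])     # hypothetical i_ML values (degrees)
i_ref = np.array([35.0, 52.0, 78.0])    # hypothetical i_Halpha values (degrees)
i_cal, offset = calibrate(i_ml, i_ref)
print(offset)                            # -> 5.0
print(float(np.mean(i_cal - i_ref)))     # -> 0.0, the mean residual after calibration
```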
Figure 5 plots the inclination angle determinations of the three types of algorithms, NNs tasked with regression (i_NN), NNs tasked with classification (i_CNN), and SVR (i_SVR), each versus the corresponding i_Hα for the 92 stars of the Zorec sample. The 1σ uncertainties in i_Hα are as determined by Sigut & Ghafourian (2023). We have adopted the algorithms' RMSE performance on synthetic spectra of an equivalent mass star (see Tables 2 and 3) as their 1σ uncertainties. The Pearson correlation coefficients, r, were calculated for each of the three plots, as were least-squares fits to the data, including uncertainties from bootstrap Monte Carlo resampling done 100 times. Figure 6 shows the distribution of the residuals between the inclinations determined by each algorithm and i_Hα for the stars of the Zorec sample (i.e., i_NN − i_Hα, i_CNN − i_Hα, and i_SVR − i_Hα). These residuals are binned in widths of 5°. The blue curve in each plot shows a Gaussian distribution with the same mean and standard deviation as the distribution of the residuals for comparison.
Figures 5 and 6 show a clear hierarchy of performance: the NNs tasked with regression outperformed the NNs tasked with classification, which in turn outperformed SVR. The NNs tasked with regression performed the best, with a RMSE of 7.6° and a correlation coefficient between i_NN and i_Hα of r = +0.91. Of the 92 stars, 78 (or 85 percent) were found to have (i_NN − i_Hα) consistent with zero within the errors. The NNs tasked with classification had an intermediate performance, with a RMSE of 10.9° and a correlation coefficient of r = +0.78; 71 of the 92 stars (or 77 percent) were found to have (i_CNN − i_Hα) consistent with zero within the errors. Finally, SVR performed notably worse than the NNs, with a RMSE of 13.9°. The correlation coefficient was found to be r = +0.64, and 47 of the 92 stars (or 51 percent) were found to have (i_SVR − i_Hα) consistent with zero within the errors. Thus NNs tasked with regression are the optimal choice, providing an accuracy comparable to the direct Hα profile fitting of Sigut & Ghafourian (2023).
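The quoted comparison statistics (RMSE, Pearson r, and bootstrap uncertainties on the least-squares fit) can be reproduced for any such paired sample along the following lines; the data and the `bootstrap_fit` helper below are illustrative, not the paper's code:

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))

def bootstrap_fit(x, y, n_boot=100, seed=0):
    """Least-squares line fit with slope/intercept uncertainties from
    bootstrap resampling of the (x, y) pairs."""
    rng = np.random.default_rng(seed)
    slopes, intercepts = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))   # resample with replacement
        m, c = np.polyfit(x[idx], y[idx], 1)
        slopes.append(m)
        intercepts.append(c)
    return (np.mean(slopes), np.std(slopes)), (np.mean(intercepts), np.std(intercepts))

# Toy inclinations (degrees) scattered about the one-to-one line.
x = np.array([5.0, 15.0, 25.0, 40.0, 55.0, 65.0, 75.0, 88.0])
y = x + np.array([1.0, -2.0, 2.0, -1.0, 0.0, 2.0, -2.0, 1.0])
r = float(np.corrcoef(x, y)[0, 1])              # Pearson correlation coefficient
(slope, slope_err), _ = bootstrap_fit(x, y)
print(round(rmse(x, y), 2))                      # -> 1.54
print(round(r, 3), round(slope, 1))              # r near 1, slope near the one-to-one line
```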


Performance by mass and inclination
Overall, NNs tasked with regression best automate the method of Hα profile fitting; however, it is possible that one of the other algorithms is better at determining i for Be stars with particular properties. Should this be the case, the best approach would not rely on a single "best" algorithm but would instead be an ensemble of two (or all three) algorithms whose outputs would be weighted by the properties of the star of interest. This subsection looks with more granularity at two such properties, stellar mass and inclination, with the goal of determining whether either the NNs tasked with classification or SVR can outperform the NNs tasked with regression in particular mass and/or inclination ranges.
We have designated the stars of the Zorec sample as either low mass (3–5 M⊙, N=36), medium mass (6–8 M⊙, N=24), or high mass (9–14 M⊙, N=32) and tabulated the algorithms' performances in Table 5. When tested on synthetic spectra (Section 5), the performance of all three algorithms tended to worsen as stellar mass increased until it plateaued around 9–10 M⊙ (see Tables 2 and 3). With observed spectra, NNs tasked with regression performed similarly on low (RMSE = 7.0°) and medium mass (RMSE = 6.7°) stars, with performance worsening for the high mass stars (RMSE = 8.8°). NNs tasked with classification performed worse on the observed spectra of low mass stars (RMSE = 9.1°) compared to medium masses (RMSE = 8.0°), with their performance worsening further for high mass stars (RMSE = 14.2°). SVR performed best on the observed spectra of low mass stars (RMSE = 13.4°) and performed similarly on both medium (RMSE = 14.2°) and high mass stars (RMSE = 14.3°). Ultimately, however, the NNs tasked with regression outperformed both of the other algorithms on all three mass ranges, suggesting that an ensemble of the algorithms is not warranted based on mass.
Turning now to inclination, we have designated the stars of the Zorec sample as either low i (0–30°, N=7), medium i (30–60°, N=41), or high i (60–90°, N=44) and tabulated the three algorithms' performances in Table 6. The small sample size of low inclination stars is unfortunate but not surprising, because p(i) ∼ sin i for randomly oriented spin axes (Gray 2021). The NNs tasked with regression performed best on low i observed spectra (RMSE = 5.8°) and similarly on both medium (RMSE = 8.0°) and high i stars (RMSE = 7.5°). The NNs tasked with classification performed the worst on low i observed spectra (RMSE = 23.8°) and similarly on both medium (RMSE = 9.2°) and high i stars (RMSE = 9.0°). The very poor performance of the NNs tasked with classification on low i observed spectra is the result of a small sample size (N = 7) combined with the worst determination of i of any algorithm on the Zorec sample, for the star HD 58050 (i_CNN − i_Hα = +63.1°); omitting HD 58050 improves the performance considerably (RMSE = 7.8°). SVR performed worse on low i (RMSE = 19.3°) than on medium i (RMSE = 12.7°) observed spectra, with an intermediate performance for high i stars (RMSE = 14.0°). The NNs tasked with regression outperformed both of the other algorithms on all three inclination ranges, confirming that an ensemble of algorithms is not warranted for this task either.

Discussion
While NNs tasked with regression and SVR performed similarly on synthetic spectra (Section 5), NNs tasked with regression performed significantly better than SVR at automating the Hα profile fitting method of Sigut et al. (2020) on observed Be star spectra. It is also interesting to note that the NNs tasked with classification, the worst performer on synthetic spectra, actually outperformed SVR on observed Be star spectra. With an RMSE of 7.6° and a Pearson coefficient of r = +0.91, the NNs tasked with regression are the clear choice to automate the Hα profile fitting method. Their RMSE of 7.6° matches the average uncertainty in i_Hα (Δi_Hα = 7.6°) on the Zorec sample, again suggesting excellent agreement between the two methods. An ensemble of specialists was considered in Section 6.1 but ultimately rejected, because the NNs tasked with regression had the best performance on every subdivision of mass and inclination considered.

THE NPOI SAMPLE
This section is concerned with testing the calibrated algorithms on an observational sample of 11 bright, nearby Be stars taken by the Navy Precision Optical Interferometer (NPOI) (Armstrong et al. 1998). The NPOI observations spatially resolve the circumstellar discs of their associated Be stars, which allows for accurate determinations of their inclination angles. If a is the measured major axis of the disc and b is the minor axis, we can calculate the interferometrically determined inclination angle via i_NPOI = cos⁻¹(b/a) on the simple geometric assumption that the disc is circular yet appears elliptical due to projection. While it is well established that Be star discs are thin (Porter & Rivinius 2003), they do have a small associated scale height. Therefore, interferometric observations of sufficient angular resolution can never yield b = 0, and we should take care to only use i_NPOI for inclinations where it is appropriate. Sigut et al. (2020) examine when the cos⁻¹(b/a) relation is expected to fail and find that this occurs for i_NPOI > 80°. None of the 11 Be stars in the NPOI sample has an inclination value outside this range, so the relation is used for all determinations of i_NPOI in this work. More information about the 11 stars in the NPOI sample can be found in Table 7.
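The geometric relation can be coded directly (a short sketch; the axis values are illustrative):

```python
import math

def i_npoi(major_axis: float, minor_axis: float) -> float:
    """Inclination from the projected disc axis ratio, assuming an
    intrinsically circular thin disc: i = arccos(b / a), in degrees.
    Only trusted for i <= 80 degrees, since discs have finite thickness."""
    return math.degrees(math.acos(minor_axis / major_axis))

print(round(i_npoi(2.0, 1.0), 1))  # -> 60.0 degrees for an axis ratio of 0.5
```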
The main advantage of the NPOI sample is the high accuracy of the interferometrically determined inclinations, which have average uncertainties about two and a half times smaller than those of the inclinations determined by gravity darkening (5.5° vs 14.5°). Furthermore, unlike Hα profile fitting, the method of interferometry is entirely independent of the Hα spectroscopy used to train the algorithms in this work. The main disadvantage of the NPOI sample is its small size; as only the brightest and closest Be stars can be resolved interferometrically, the resulting sample of 11 profiles will necessarily be sensitive to outliers.
The three machine learning algorithms, trained on synthetic Hα profiles, were tested on the NPOI sample of observed profiles, and the results are compared with the inclination angle determinations made using both interferometry and Hα profile fitting. As before, the performance of an algorithm is taken to be the RMSE between its determinations of i and either i_NPOI or i_Hα (considered separately).
Figure 7 shows a comparison of the inclination angle determinations of our three machine learning algorithms, i_NN, i_CNN, and i_SVR, with those of i_NPOI. The NNs tasked with regression performed the best, with an RMSE of 12.3°. Only two of the 11 determinations of i differed by more than 10°:  Cyg (33.7°) and  Cas (13.6°). The NNs tasked with classification fared a little worse, with a RMSE of 14.2°. Five of the 11 determinations of i differed by more than 10°, with the worst cases being those of  Cyg (35.8°) and  Oph (14.5°). SVR had the worst performance, with a RMSE of 19.0°. Seven of the 11 determinations of i differed by more than 10°, with the worst cases being  Cyg (43.6°) and  Aqr (24.8°). Although the inclination determinations of all three algorithms were higher on average than i_NPOI, the effect was small for SVR (+0.4° on average) but larger for both types of NNs (+7.7° and +5.6°, respectively).
Figure 7 also shows a comparison of the inclination angle determinations of i_NN, i_CNN, and i_SVR with those of i_Hα. The NNs tasked with regression performed the best, with a RMSE of 8.5°. Four of the 11 determinations of i differed by more than 10°, with the worst disagreements being  Cyg (14.3°) and 48 Per (14.2°). The NNs tasked with classification had a RMSE of 11.2°. Four of the 11 determinations of i differed by more than 10°, with the worst cases being those of  Aqr (16.4°) and  Cyg (16.3°). SVR performed the worst, with a RMSE of 15.8°. Five of the 11 determinations of i differed by more than 10°, with the most discordant cases being  Aqr (26.8°) and  Cyg (24.2°).
The relative performance of the three algorithms on the NPOI sample was the same as on the larger Zorec sample: NNs tasked with regression performed the best, followed by NNs tasked with classification, and then SVR. All three algorithms performed better when compared to i_Hα than when compared to i_NPOI. This is not surprising, because the synthetic profiles used to train the algorithms come from the same libraries as those used for Hα profile fitting.
It is worth highlighting the influence of  Cyg on the results for the NPOI sample, as this star caused all three algorithms significant problems. The three worst discrepancies between an algorithm's determination of i and i_NPOI all occur for  Cyg. When comparing an algorithm's determination of i with i_Hα,  Cyg is the largest or second largest discrepancy in all three cases. While  Cyg does have the smallest value of i_NPOI in the sample (27.3°), the issue seems to be more complicated than the algorithms simply struggling with low inclinations, because they performed well on both  Tau (33.0°) and  Psc (35.9°). When comparing with i_NPOI, omitting  Cyg from the sample would cause the following performance changes: the NNs tasked with regression would improve by about 40 percent (RMSE falling from 12.3° to 7.2°), the NNs tasked with classification by about 30 percent (RMSE falling from 14.2° to 9.6°), and SVR by about 20 percent (RMSE falling from 19.0° to 14.5°). With  Cyg omitted, the resulting performances are similar to those on the full Zorec sample (see Section 6); this may suggest that the  Cyg determinations are anomalous. To resolve whether the inclination angle determinations for  Cyg really are anomalous, we would ideally like to include more stars in the NPOI sample. Unfortunately, optical interferometry is only possible for the nearest and brightest Be stars, and the question of whether  Cyg is an anomaly remains open. Finally, when comparing against i_Hα, omitting  Cyg from the NPOI sample results in a smaller performance increase of approximately 10 percent for all three algorithms.
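The leave-one-out influence of a single discordant star on a sample RMSE can be sketched with toy numbers (not values from the NPOI sample):

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical determinations vs. reference inclinations with one outlier.
i_ml  = np.array([30.0, 45.0, 60.0, 62.0, 80.0])
i_ref = np.array([28.0, 47.0, 58.0, 95.0, 79.0])  # one discordant star

full = rmse(i_ml, i_ref)
# Best leave-one-out RMSE: dropping the outlier dominates the improvement.
drop = min(rmse(np.delete(i_ml, k), np.delete(i_ref, k))
           for k in range(len(i_ml)))
print(round(full, 1), round(drop, 1))  # -> 14.8 1.8
```

Because the squared residual of a single 30°-class outlier dwarfs the rest, a small sample's RMSE can be almost entirely set by one star, which is exactly the sensitivity noted above.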

CONCLUSIONS
Three supervised machine learning algorithms were trained exclusively on synthetic Be star Hα spectra, computed with the Bedisk-Beray code suite, to extract an estimate of the central B star's inclination angle from a single observed Hα flux profile and the star's spectral type. The algorithms tested were neural networks tasked with regression, neural networks tasked with classification, and support vector regression. When applied to a large (N ∼ 100) observed sample of Be star spectra (Sigut & Ghafourian 2023), neural networks tasked with regression performed best, yielding an inclination accuracy of RMSE = 7.6°, which is comparable to that obtained by direct model profile fitting of the Hα line. During the training and hyper-parameter optimization, it was found that algorithms trained on low S/N = 25 Hα profiles yielded much better results, when applied to the real Hα spectra of Be stars, than those trained on higher S/N profiles. We speculate that the wider variation among the lower S/N synthetic spectra, coupled with the large training samples, allowed the algorithms to better handle natural variations in observed spectra that are not captured by the models. Training on synthetic data has the advantage that cases rare in the general population (in this case, low inclination systems, as p(i) ∼ sin i) can be incorporated into the training, as long as over-specialization of the algorithms to purely synthetic data can be avoided. An interesting avenue for future work is testing how the optimal S/N varies with network depth. Further along these lines, we are testing the viability of training deep, convolutional neural networks on images of Hα line profiles (rather than 1D vectors of relative fluxes) to determine the inclination angles of observed Be stars.
Finally, future work will focus on further extending the quantitative analysis of Be star spectra by training neural networks to extract v sin i estimates from the relevant portions of Be star spectra, for example, the observed profiles of He i 4471 Å and Mg ii 4481 Å. We feel that this problem is also very amenable to training with synthetic line profiles generated with the Bedisk-Beray code suite. Combined with the inclination-finding neural networks of this work, such estimates will allow equatorial stellar rotation velocities to be directly measured from moderate-to-high S/N spectra of sufficient resolution.

Figure 1 .
Figure 1. Hα emission line profile computed by Beray for a 5 M⊙ Be star at a resolution of R = 10,000, viewed at the indicated inclination angles. The inclination angle is the angle between the star's rotation axis and the observer's line of sight, as illustrated by the upper-left inset, in which the blue arrow points to the distant observer. Note the strong change in the line profile shape as the viewing angle goes from i = 10° (a nearly face-on disc) to i = 90° (an edge-on disc). The disc density parameters log ρ₀ = −10.1, n = 3.0, and R_d = 65 R⋆ (see Eq. 1) were used in the Beray calculation.

Figure 2 .
Figure 2. Example synthetic Hα line profiles computed with Beray for a 4 M⊙ Be star. Each panel shows the Hα line profile for a resolution of 10⁴ at S/N = 100 (thin black line) and S/N = 25 (thin grey line). Also shown in each panel is the reference photospheric Hα profile (thick grey line, the same in each panel). In the top left of each panel is the average absolute percentage difference (Δ) between the S/N = 100 profile and the reference photospheric profile, with the average taken over the region ±1000 km s⁻¹; this should be compared to the 3 percent threshold used to keep the profile in the sample. The different Hα profiles are due to the different viewing inclinations (top right in each panel) and different disc parameters (bottom left, listed as log ρ₀, n, R_d).

Figure 3 .
Figure 3. RMSE performance versus number of nodes per hidden layer, used to optimize the hyper-parameters for the 4 M⊙ sample for single (+) and dual (X) hidden-layer neural networks tasked with regression. Each data point represents the median-performing member of a committee of five NNs. Two hidden layers of six nodes each were found to be optimal for this sample by virtue of having the lowest RMSE.

Figure 4 .
Figure 4. Inclination angle histograms of the 92 Be stars of the Zorec sample as determined by gravity darkening (left) and Hα profile fitting (right). The x-axis shows inclinations in nine bins of 10°, the left y-axis shows the absolute number of stars, and the right y-axis shows the fractional number of stars. Note the different shapes of the distributions, showing the trend of higher values of i_Hα as compared to i_GD at low inclinations and vice versa.

Figure 5 .
Figure 5. Inclinations (in degrees) i_NN (left), i_CNN (middle), and i_SVR (right) versus i_Hα for the 92 stars of the Zorec sample. The horizontal error bars show the 1σ uncertainties determined by Sigut & Ghafourian (2023), and the vertical error bars show the test set RMSEs (see Tables 2 and 3). The dashed lines show the least-squares fits to the data and the dotted lines show a slope of one for comparison. The Pearson correlation coefficients, as well as the slopes and intercepts of the least-squares fits, can be found in the upper-left area of each plot.

Figure 6 .
Figure 6. Residual histograms of i_NN − i_Hα (top), i_CNN − i_Hα (middle), and i_SVR − i_Hα (bottom) for the Zorec sample. The x-axis shows residuals in bins of 5° and the y-axis shows the fractional number of stars per bin. The blue curve is a Gaussian with the same mean and standard deviation as the distribution of the residuals (shown upper right). Note that the means were calibrated to be 0.0° (Table 4) and that one star, HD 58050, is omitted from the middle panel to improve readability, as its i_CNN − i_Hα was very large at +63.1°.

Figure 7 .
Figure 7. Panel plot of the 11 stars of the NPOI sample with inclination determinations from i_GD (magenta), i_Hα (green), i_NPOI (cyan), i_SVR (black), i_CNN (red), and i_NN (blue). Values have been staggered for readability. Also shown in each panel are output histograms of the NNs tasked with classification (grey rectangles whose heights sum to unity), giving the probability assigned to each inclination class. The error bars show 1σ uncertainties. Note that the values of i_CNN are expected to be lower than the centres of their associated histograms because they have been calibrated (see Section 6). Finally,  Dra does not have a determination of i_GD.

Table 2 .
The RMSE performance of regression and classification neural networks trained on R = 10,000 and S/N = 25 synthetic line profiles, for each central B star mass considered. The optimal hyper-parameters determined in Section 4.1, the number of nodes per layer and the number of hidden layers, are also shown.

Table 3 .
The RMSE performance of SVR trained on R = 10,000 and S/N = 25 synthetic line profiles, for each central B star mass considered. The optimal hyper-parameters determined in Section 4.2, the epsilon-insensitive region (ε), the regularization constant (C), and the kernel scale (KS), are also shown.

Table 5 .
RMSE performance, in degrees, of the three algorithms on the 92 observed Be stars of the Zorec sample subdivided by mass into low, medium, and high mass stars.

Table 6 .
RMSE performance, in degrees, of the three algorithms on the 92 observed Be stars of the Zorec sample subdivided by inclination angle into low (0–30°), medium (30–60°), and high (60–90°) inclinations. It is worth noting that, for NNs tasked with classification, the combination of a small sample size (N = 7) and a very poor determination for the star HD 58050 (i_CNN − i_Hα = +63.1°) has resulted in a very poor performance on low inclination stars which may be misleading; included in parentheses is the performance with HD 58050 omitted.

Table 7 .
Interferometric, Hα profile fitting, and gravity darkening characteristics for the 11 Be stars in the NPOI sample. Note that  Dra lacks an associated value of i_GD. Additional information on i_NPOI and i_Hα can be found in Sigut et al. (2020); additional information on i_GD can be found in Zorec et al. (2016).
Spectra available from Silaj et al. (2010).