Spectroscopic analysis of hot, massive stars in large spectroscopic surveys with de-idealised models

Upcoming large-scale spectroscopic surveys with, e.g., WEAVE and 4MOST will provide thousands of spectra of massive stars, which need to be analysed in an efficient and homogeneous way. Studies of massive stars are usually limited to samples of a few hundred objects, which pushes current spectroscopic analysis tools to their limits, because visual inspection is necessary to verify the spectroscopic fit. Uncertainties are often only estimated rather than derived, and prior information cannot be incorporated without a Bayesian approach. In addition, the uncertainties of stellar atmosphere and radiative transfer codes that result from simplified, inaccurate or incomplete/missing physics (in short, idealised physical models) are not considered. Here, we address the question of how to compare an idealised model of complex objects to real data with an empirical Bayesian approach and maximum a posteriori approximations. We focus on the application to large-scale optical spectroscopic studies of complex astrophysical objects like stars. More specifically, we test and verify our methodology on samples of OB stars in the 30 Doradus region of the Large Magellanic Cloud using a grid of FASTWIND model atmospheres. Our spectroscopic model de-idealisation analysis pipeline takes advantage of the statistics that large samples provide by determining the model error, which accounts for the idealised stellar atmosphere models and is included in the error budget. The pipeline performs well over a wide parameter space and derives robust stellar parameters with representative uncertainties.


INTRODUCTION
With the advent of large spectroscopic surveys using instruments such as WEAVE (Jin et al. 2022) and 4MOST (de Jong et al. 2019), tens of thousands of spectra of massive stars (≳ 10 M⊙) will be obtained, which will need to be analysed in a homogeneous and efficient way (e.g. Cioni et al. 2019; Bensby et al. 2019; Chiappini et al. 2019). Current pipelines of large spectroscopic surveys are largely designed for FGK stars and are either data driven (e.g. Ness et al. 2015; Guiglion et al. 2020) or model driven (e.g. Allende-Prieto & Apogee Team 2015; Ting et al. 2019). Traditionally, and still widely performed today, massive stars have been analysed by "eye", which limits the sample size to < 100 massive stars. In addition, stellar parameters as well as uncertainties are estimated rather than determined. Larger samples of a couple of hundred stars are usually analysed with a χ²-minimisation algorithm, where the final fit often needs to be visually verified depending on the goodness of fit.
Multi-dimensional probability distribution functions are obtained, depending on the number of free parameters, and uncertainties are then defined via confidence intervals rather than Gaussian standard deviations. Those uncertainties can be highly asymmetric and very large in the case of degenerate parameters. In the massive star community there are two main flavours of χ²-minimisation algorithms: grid based (e.g. Simón-Díaz et al. 2011; Castro et al. 2012; Bestenlehner et al. 2014) and Genetic Algorithms based on natural selection (e.g. Mokiem et al. 2007; Brands et al. 2022). All of those algorithms use a pre-defined selection of spectral line regions for their analysis.

★ E-mail: j.m.bestenlehner@sheffield.ac.uk
Theoretical models of complex physical systems are necessarily idealisations. The implied simplifications allow us to focus on the essential physics, to keep the model computationally feasible, and to investigate systems for which not all components are perfectly known. In contrast to solar-like stars, massive stars have strong stellar winds, which influence the structure of the stellar atmosphere (line blanketing) and produce a pseudo-photosphere at optical depth 2/3, which is located in the stellar wind. The inclusion of line-driven winds in stellar atmosphere models requires the assumption of spherical geometry in the co-moving frame of the star (expanding atmosphere). In addition to effective temperature and surface gravity, the mass-loss rate, velocity law, terminal velocity, wind inhomogeneity and line blanketing need to be included in the stellar atmosphere code. The stellar atmospheres of massive stars depart significantly from local thermal equilibrium (LTE) and therefore must be computed fully in non-LTE, which is computationally expensive (e.g. Santolaya-Rey et al. 1997; Hillier & Miller 1998; Gräfener et al. 2002). This limits state-of-the-art stellar atmosphere codes for hot, massive stars to 1D.
When confronted with real data of the actual system, these models often perform insufficiently when judged on a goodness-of-fit basis. The difference between real and modelled data can easily exceed the error budget of the measurement. The reason is that the idealised models do not capture all aspects of the real systems, which are nevertheless present in the data. To discriminate these missing aspects from measurement errors or noise, we use the term model errors for these imperfections of our theoretical description (Oberpriller & Enßlin 2018).
Often one tries to determine the model parameters from the data via a likelihood based methodology such as χ²-minimisation, maximum likelihood or Bayesian parameter estimation, which uses the measurement uncertainties as a metric in data space. The resulting parameter estimates can be strongly distorted by the desire of the method to minimise all apparent differences between predicted and real data, regardless of whether these are due to measurement or model errors.
Thus the model errors should be included in the error budget of the parameter estimation. This would require a model for the not yet captured aspects of the system, or at least for the model errors these produce. To do this thoroughly we would need to undertake a case-by-case analysis of the missing physics.
However, this would be quite impractical in the case of the spectroscopic analysis of complex astrophysical objects. Instead, we want to construct a plausible, but by no means perfect, effective description of the model errors. In the construction of the de-idealisation model, we will follow a pragmatic route, but try to indicate the assumptions, approximations and simplifications made.
In Section 2 we introduce the methodology, which is used in our spectroscopic analysis pipeline (Section 3). Using grids of stellar atmospheres (Section 3.3), the pipeline is applied to observational data (Section 3.4). The results are discussed in Section 4. We close with a brief conclusion and outlook (Section 5).

Data model
We assume we have a set of objects (e.g. stars, galaxies, ...; labelled by i ∈ {1, 2, . . . , n}) with observable signals s^(i) = (s_λ^(i))_λ over some coordinate λ, e.g. the emitted spectral energy distribution s_λ^(i) as a function of the wavelength λ. These signals are measured with a linearly responding instrument (response matrix R) with additive Gaussian noise n^(i) according to the measurement equation

d^(i) = R^(i) s^(i) + n^(i). (1)

The individual elements of the data vector d^(i) for the i-th object are then given by

d_j^(i) = ∫ dλ R_jλ^(i) s_λ^(i) + n_j^(i), (2)

where in our spectroscopic case R_jλ^(i) is the j-th bandpass of our i-th observation as a function of the wavelength λ. Spectroscopic, colour filter, and bolometric measurements can thereby be treated with the same formalism and even combined into a single data vector and response matrix. In addition, we do not require that all objects are observed in the same way, as the response matrix is kept dependent on the object index i. In this way, the formalism permits combining heterogeneous observations.
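As a minimal numerical sketch of this measurement equation (all dimensions and values here are invented for illustration, not taken from any survey data):

```python
import numpy as np

# Toy sketch of d^(i) = R^(i) s^(i) + n^(i); sizes are illustrative only.
rng = np.random.default_rng(0)
n_lambda, n_band = 5, 3                           # signal pixels, data channels

s = rng.normal(size=n_lambda)                     # true signal s^(i) over wavelength
R = rng.normal(size=(n_band, n_lambda))           # linear response matrix R^(i)
N = 0.01 * np.eye(n_band)                         # known Gaussian noise covariance

n = rng.multivariate_normal(np.zeros(n_band), N)  # additive noise draw n^(i)
d = R @ s + n                                     # data vector d^(i)
```

In this discretised form the integral over λ in Equation (2) becomes the matrix-vector product `R @ s`.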

Model errors
Now we assume that some idealised model for our objects exists that predicts a specific theoretical signal s[θ] given a set of unknown model parameters θ. These parameters should be physical quantities like surface gravity, radius, effective temperature, stellar wind properties or chemical composition of the object, so that well defined values θ^(i) exist for each object¹. Those idealised models can be generated with stellar atmosphere and radiative transfer codes. Knowing these parameters for each object is the primary goal of the inference. In principle, the relation between parameters and signal could be stochastic, but for simplicity we concentrate on deterministic models. The de-idealisation model we develop should serve as an effective description of the remaining stochasticity.
The idealised model hopefully captures the dominant properties of the system, but certainly not all aspects. Therefore the real signal s^(i) of an object will deviate from the model prediction by an unknown stochastic component δ^(i), the model error, so that

s^(i) = s[θ^(i)] + δ^(i). (4)

The aim of model de-idealisation is to find an appropriate stochastic model P(δ|θ) = Π_i P(δ^(i)|θ^(i)) for the model errors. With such, the likelihood becomes

P(d^(i)|θ^(i)) = ∫ Dδ^(i) P(d^(i)|θ^(i), δ^(i)) P(δ^(i)|θ^(i)), (5)

where ∫ Dδ denotes a phase space integral over the model errors. In our case, we want to restrict ourselves to the simplest possible representation of the model uncertainties. This means that we take only the first and second moments of δ and assume the fluctuations of different objects to be independent,

u^(i) = ⟨δ^(i)⟩_(δ|θ) and ⟨δ^(i) δ^(i′)†⟩_(δ|θ) = u^(i) u^(i′)† + δ_ii′ U^(i). (6)

The probability distribution that represents mean and variance without further information on higher order correlations is naturally a Gaussian with this mean and variance. Among all possible probability distributions with a given mean and variance it has maximal entropy (e.g. Jaynes & Bretthorst 2003; Caticha 2008). By adopting a Gaussian for the model errors, the least amount of spurious information is entered into the inference system in case only u and U = ⟨(δ − u)(δ − u)†⟩_(δ|θ) are considered. This does not mean that the model error statistics are Gaussian in reality. It just means that higher order correlations are ignored for the time being. Taking such higher order correlations into account would most certainly improve the method, but is left for future work.
The Gaussianity of the measurement noise and of the model error description permits us to integrate Equation (5) analytically, leading to

P(d^(i)|θ^(i)) = G(d^(i) − R^(i)(s[θ^(i)] + u^(i)), D^(i)), (7)

where G(x, D) denotes a Gaussian in x with covariance D, and with D^(i) = N^(i) + R^(i) U^(i) R^(i)† being the combined error covariance.
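A hedged sketch of evaluating this Gaussian likelihood with the combined covariance D = N + R U R†; the function name and toy dimensions are our own, not the pipeline's API:

```python
import numpy as np

def neg_log_likelihood(d, R, s_model, u, U, N):
    """-2 ln P(d|theta) up to an additive constant: the residual against
    R (s[theta] + u), weighted by the combined covariance D = N + R U R^T
    (cf. Equation 7), plus the log-determinant normalisation term."""
    D = N + R @ U @ R.T
    r = d - R @ (s_model + u)
    chi2 = r @ np.linalg.solve(D, r)       # r^T D^-1 r without forming D^-1
    return chi2 + np.linalg.slogdet(D)[1]  # + ln det D
```

Note that when U = 0 this reduces to the usual measurement-noise-only χ² plus ln det N, so the model error only ever widens the metric in data space.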

Implicit hyperprior
The de-idealisation model requires that the auxiliary parameters u and U are determined as well, or better, marginalised over. This requires that we specify our prior knowledge on these parameters, P(u, U|θ), which is a very problem specific task. Using such a hyperprior we could then derive an auxiliary parameter marginalised estimator for our desired model parameters θ. A good, but numerically very expensive, approach would be to sample over the joint space of model and auxiliary parameters, θ, u, and U, for example using the Gibbs sampling method (e.g. Wandelt et al. 2004; Jasche et al. 2010).
In order to have a generic, affordable and pragmatic method we introduce a number of approximations and simplifications. The first is that we replace the auxiliary parameter marginalisation by a point estimation,

P(θ|d) = ∫ Du ∫ DU P(θ|d, u, U) P(u, U|d) ≈ P(θ|d, u★, U★) ∝ P(d|θ, u★, U★) P(θ),

with u★ and U★ being suitable estimates of the auxiliary parameters and P(θ) the parameter prior. Instead of constructing these point estimators using the so far unspecified and problem specific priors, we propose to pragmatically specify them with an educated ad-hoc construction.
The idea is to assume for a moment that a correct model parameter classification θ^(i) for any object exists, which has later on to be estimated self-consistently with all the other estimates via iteration. The difference between the signal reconstructed from the data, ψ^(i) = ⟨s^(i)⟩_(s^(i)|d^(i), θ^(i)), and the one predicted from the model, s[θ^(i)] plus the current guess for u★, can be analysed to provide information on u and U. The signal reconstruction can be done via a Wiener filter, since this is optimal in the case of the linear measurement and Gaussian noise model of Equation (7).
For the signal difference φ^(i) the Wiener filter also provides D_w^(i), the Wiener variance or uncertainty of the reconstruction (Equations 8 to 10). For the current case we have used some guesses for u★ and U★ that need to be updated according to the information contained in the statistics of the signal differences φ^(i). To do so, we introduce a suitable proximity measure in parameter space, w_ii′ = prox(θ^(i), θ^(i′)), that indicates for an object i how much another object i′ can be used to learn about the model error statistics of i. A naive choice would be w_ii′ = 1 always, assuming that the model error statistics are everywhere the same in the model parameter space. A more sophisticated method would partition the parameter space into characteristic regions (e.g. corresponding to the different known star and galaxy classes in our spectroscopic example) and set w_ii′ = 1 or w_ii′ = 0 depending on whether i and i′ belong to the same or different classes. Even more sophisticated proximity weighting schemes can be imagined, e.g. w_ii′ = 1/(1 + dist(θ^(i), θ^(i′))) using some distance measure in parameter space. However, in our case we group objects together with respect to their main line diagnostics and analyse them in the same batch.
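The three proximity-weighting schemes mentioned above can be sketched in a single hypothetical helper; the function and its arguments are illustrative, not the pipeline's actual interface:

```python
import numpy as np

def prox(theta_i, theta_j, classes=None, scale=None):
    """Proximity weight w_ij between two parameter vectors.

    Three illustrative schemes from the text:
      - classes given: 1 if both objects share a class, else 0
      - scale given:   1 / (1 + dist(theta_i, theta_j)) with a scaled norm
      - otherwise:     1 (model error statistics assumed global)
    """
    if classes is not None:
        return 1.0 if classes[0] == classes[1] else 0.0
    if scale is not None:
        diff = (np.asarray(theta_i) - np.asarray(theta_j)) / scale
        return 1.0 / (1.0 + np.linalg.norm(diff))
    return 1.0
```

Grouping by main line diagnostics, as done in the paper, corresponds to the class-based variant with classes defined by diagnostic line sets.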
Given such a scheme, the update operations for u★ and U★ are given by Equations (10) and (11). The u★ update operation is a simple absorption of any systematic difference into the mean component of the model error u★. For the initial step it is better to set u^(i)★_iteration#0 = 0, as the update would otherwise absorb any difference between data and model, even though the model might not be representative of the objects. However, we find that the best choice for u is to represent non-stellar features that are not part of the model, like nebular lines, interstellar bands or telluric lines.
The U★ update incorporates the variance of the signal difference reconstructions as well as their Wiener variances. The latter express the level of fluctuations missed by the Wiener filter reconstruction. The correction with the Wiener variance is done in analogy to the critical filter methodology developed by Enßlin & Frommert (2011) and Enßlin & Weig (2010), where a similar variance estimation was derived under a non-informative prior on the power spectrum of a statistically homogeneous random process. For this study we intuitively extend this to non-diagonal covariance structures. This means we implicitly adopt a non-informative hyperprior for the model error statistics as summarised by u and U.
The logic behind this implicit prior is as follows. Assume we had managed to specify an appropriate non-informative hyperprior for u and U. From this we would derive some recipe for our point estimates u★ and U★ using some approximations. The resulting recipe should not incorporate hidden spurious assumptions. The u★ and U★ estimates can therefore only be built from elements like φ^(i′), φ^(i′)φ^(i′)†, and D_w^(i′), the latter being a summary of the former elements. Requiring the estimators to be unbiased and to enclose the mentioned critical filter as a limiting case, which is an unbiased power spectrum estimator, then fixes the numerical coefficients in front of φ^(i′), φ^(i′)φ^(i′)†, and D_w^(i′) in Equation (11) to unity.
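One plausible reading of the u★/U★ update built from φ^(i′), φ^(i′)φ^(i′)†, and D_w^(i′) with unit coefficients can be sketched as follows; Equation (11) itself is not reproduced in this text, so all names and the exact combination are our own assumption:

```python
import numpy as np

def update_model_error(phis, Dws, w):
    """Sketch of a weighted u*/U* update for one object: u* as the
    proximity-weighted mean of the signal differences phi, and U* as the
    weighted scatter of phi about u* plus the Wiener variances Dw.
    Unit coefficients are assumed, as argued for in the text."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                           # normalise proximity weights
    u = sum(wi * phi for wi, phi in zip(w, phis))
    U = sum(wi * (np.outer(phi - u, phi - u) + Dw)
            for wi, phi, Dw in zip(w, phis, Dws))
    return u, U
```

With w_ii′ = 1 for all pairs this reduces to the plain sample mean and (Wiener-corrected) sample covariance of the signal differences within a batch.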
We admit that there are some frequentist elements in this derivation, since it postulates an estimator and argues for its appropriateness using bias arguments. We hope that it will be replaced by a more stringent calculation once a suitable non-informative hyperprior has been specified. For the time being, we use it in iteration with the model parameter estimation.

Method summary
The combined de-idealised model parameter estimation method comprises the following steps: (i) Specify the weighting scheme w_ii′ = prox(θ^(i), θ^(i′)) that determines how similar the model parameters of two objects have to be so that their model error statistics can be assumed to be similar.
(ii) Adopt some initial guess or default values for the model parameters θ^(i) and model error parameters u^(i)★ and U^(i)★. A naive choice could be θ^(i) = θ₀ for some plausible central θ₀ within the physically permitted parameter space, u^(i)★ = 0, and U^(i)★ = 0.
(iii) Calculate φ^(i) and D_w^(i) according to Equation (10) for all objects.
(iv) Update the model error parameters u^(i)★ and U^(i)★ of all objects according to Equation (11).
(v) Update the model parameters θ^(i) of all objects using the combined likelihood, Equation (7), which incorporates measurement and model errors. Steps (iii) to (v) are iterated until the estimates converge.
The resulting estimate will be similar to a joint maximum a posteriori estimate of the model and model error parameters, which is known to perform worse than a posterior mean estimator properly marginalised over the model error parameters. However, given the large number of degrees of freedom in the signal space (e.g. a highly resolved emission spectrum of a star or galaxy), such optimal estimators can be extremely expensive computationally. The proposed method might therefore be favourable in many circumstances, despite its approximative and partly ad hoc nature. In order to be explicit about the assumptions and approximations adopted we provide an overview below. This list should help to judge the range of applicability and to find possible improvements of the proposed method. In particular we assume (i) that the measurement noise is independent for the different objects and independent of their signals, and that it has Gaussian statistics with known covariance.
(ii) a linear and known measurement response.
(iii) that the model error knowledge can be approximated by a multivariate Gaussian in signal space.
(iv) that regions in parameter space exist and are known which have similar model error statistics as parametrized by a mean and variance.
(v) no prior knowledge on the values of this model error mean and variances.
(vi) that an iterated point estimate of all involved parameters leads to a reasonable approximative solution of the joint inference problem.
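Under assumptions (i) to (vi), the iterated point-estimate loop can be illustrated end-to-end on a deliberately tiny toy problem: a one-parameter linear model over a four-pixel "spectrum", where all objects share a feature the idealised model lacks. Nothing here is the actual pipeline; all numbers are invented:

```python
import numpy as np

# Toy: s[theta] = theta * t; the real signals carry a shared feature
# (pixel index 2) that the idealised model cannot reproduce.
t = np.array([1.0, 2.0, 3.0, 4.0])
model = lambda theta: theta * t

true_thetas = [1.0, 2.0, 3.0]
bias = np.array([0.0, 0.0, 0.5, 0.0])      # "missing physics", same for all
data = [model(th) + bias for th in true_thetas]

grid = np.linspace(0.0, 4.0, 401)          # chi^2 scan grid for theta
u = np.zeros(4)                            # step (ii): u* = 0 for iteration 0
for _ in range(3):                         # iterate steps (iii)-(v)
    # step (v): refit each object with the mean model error absorbed
    thetas = [grid[np.argmin([np.sum((d - model(g) - u) ** 2) for g in grid])]
              for d in data]
    # steps (iii)/(iv): residuals of the current fits give the updated u*
    u = np.mean([d - model(th) for d, th in zip(data, thetas)], axis=0)
```

The component of the missing feature parallel to the model response is absorbed into θ (here a constant offset of +0.05 on every fit), while the orthogonal remainder ends up in u★; this degeneracy is exactly why the text recommends starting with u★ = 0 rather than updating it before the first fit.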

SPECTROSCOPIC ANALYSIS PIPELINE
The pipeline has been developed in python3 with commonly used libraries such as numpy, scipy, pandas, multiprocessing and ctypes, plus astropy.io to read fits files. Using common and maintained libraries ensures that the pipeline is easy to maintain and should be usable over a long period of time. The following section outlines the required pre-processing steps (§ 3.1), a brief overview of the pipeline implementation (§ 3.2), a description of the grid of synthetic model spectra (§ 3.3) and the observational data used to verify the pipeline (§ 3.4).

Pre-processing
The pipeline requires that all observational data are read in at the beginning, as the spectra need to be analysed all at once to iteratively determine the stellar parameters and model uncertainties (§ 2.4). After a spectrum is loaded it is corrected for its radial velocity shift, transformed to the wavelength sampling of the synthetic spectra grid (§ 3.3) and decomposed into principal components using the decomposition matrix calculated from the synthetic spectra, which reduces memory usage and speeds up the analysis. This is essential when analysing samples of more than a few hundred sources.
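A sketch of this principal-component compression step, assuming the decomposition matrix is built from an SVD of the synthetic grid; sizes are toy values and the spectra are random stand-ins:

```python
import numpy as np

# Build a decomposition matrix from a toy "grid" of synthetic spectra and
# project an observed (already RV-corrected, resampled) spectrum onto it.
rng = np.random.default_rng(0)
n_models, n_pix, n_comp = 50, 200, 10

grid = rng.normal(size=(n_models, n_pix))     # stand-in for synthetic spectra
mean = grid.mean(axis=0)
_, _, Vt = np.linalg.svd(grid - mean, full_matrices=False)
P = Vt[:n_comp]                               # decomposition matrix (n_comp x n_pix)

obs = grid[7] + 0.01 * rng.normal(size=n_pix) # "observed" spectrum
coeffs = P @ (obs - mean)                     # compressed representation
recon = mean + P.T @ coeffs                   # approximate reconstruction
```

Only `coeffs` needs to be kept in memory per spectrum, which is what makes the all-at-once analysis of large samples feasible.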
The decomposed grid of synthetic spectra is loaded into shared memory for parallelisation purposes. The spectra are pre-convolved with combinations of varying broadening parameters: projected rotational velocity (v_eq sin i) and macro-turbulent velocity (v_mac). The synthetic grid preparation is also a pre-processing step, which is laid out in § 3.3. If the sample is small and/or sufficient random access memory is available on the computing system, the grid can instead be convolved with star-specific broadening parameters. Even though the convolution is applied using the fast Fourier transform library of scipy, this can increase the pre-processing time to up to a few hours per star, depending on the size of the synthetic spectra grid (≫ 100 000 models).
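The pre-convolution can be sketched with scipy's FFT-based convolution, using a standard rotational broadening kernel with linear limb darkening and a Gaussian macro-turbulence kernel on a uniform velocity grid; all profile parameters here are invented for illustration:

```python
import numpy as np
from scipy.signal import fftconvolve

dv = 5.0                                          # velocity step per pixel, km/s
v = np.arange(-2000.0, 2000.0 + dv, dv)
line = 1.0 - 0.6 * np.exp(-0.5 * (v / 15.0) ** 2) # toy photospheric profile

def rot_kernel(vsini, eps=0.6):
    """Rotational broadening kernel with linear limb darkening eps."""
    x = np.clip(np.arange(-vsini, vsini + dv, dv) / vsini, -1.0, 1.0)
    k = 2 * (1 - eps) * np.sqrt(1 - x**2) + 0.5 * np.pi * eps * (1 - x**2)
    return k / k.sum()

def gauss_kernel(vmac):
    """Gaussian macro-turbulence kernel."""
    x = np.arange(-5 * vmac, 5 * vmac + dv, dv)
    k = np.exp(-0.5 * (x / vmac) ** 2)
    return k / k.sum()

# Convolve the continuum-subtracted profile to avoid edge artefacts.
broadened = 1.0 + fftconvolve(
    fftconvolve(line - 1.0, rot_kernel(200.0), mode="same"),
    gauss_kernel(20.0), mode="same")
```

Pre-computing this for a fixed list of (v sin i, v_mac) combinations and caching the results is what turns the per-star convolution cost into a one-off pre-processing cost.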
On a standard desktop computer, 1000 spectra can be analysed in less than half a day. For larger samples we would advise sorting the spectra into groups of similar objects, which also leads to a more representative model error in a specific region of parameter space when testing the implemented physics or verifying assumptions in the theoretical model.

Analysis pipeline
After pre-processing the observational data, including the observational error spectra, and loading the synthetic grid of model spectra, the pipeline is set up according to Equation (1) with the model de-idealisation of Equation (4), assuming a Gaussian noise model for the observational error (Equation 3). We are interested in the posterior distribution of the signal given the data, P(s|d), and apply Bayes' theorem with likelihood P(d|s), prior P(s) and evidence P(d): first we use the likelihood P(d|s), and second we apply the Wiener filter (§ 2.3) to reconstruct the signal (P(d|s) → P(s|d)). We modified the likelihood as described in § 2.3 (P(d|s) ⇒ P(d|θ)), the probability of the data d given the stellar parameters θ.
The best model is determined with a χ²-minimisation ansatz with model error variance matrix U (Equation 7) to maximise the modified likelihood P(d|θ). For the model error mean u and variance matrix U, we find that it is better to set u = 0 and to multiply U by a factor α = [10⁻⁵, 0.35, 0.7, 1.0, 1.0], which increases after each iteration, to avoid bad spectroscopic fits having a significant impact on u and U and therefore on the determination of the best model. u could also be set equal to non-stellar features such as telluric bands, diffuse interstellar bands and prominent interstellar lines like Ca H and K. However, the interstellar contribution can vary significantly with the line of sight, while telluric bands change with atmospheric conditions and airmass. In cases where the non-stellar features vary on a star-by-star basis, those contributions can be combined with the observational error to give less weight to those spectral regions.
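A sketch of the best-model search with the damped model error; the α schedule is taken from the text, while the function name and toy covariances are illustrative assumptions:

```python
import numpy as np

ALPHA = [1e-5, 0.35, 0.7, 1.0, 1.0]   # per-iteration damping of U (from the text)

def best_model(d, predictions, N, U, it):
    """Return the index of the grid model minimising the chi^2 metric
    r^T (N + alpha U)^-1 r with u = 0, where alpha follows the iteration
    schedule so early, poorly fitted iterations cannot dominate U."""
    D = N + ALPHA[min(it, len(ALPHA) - 1)] * U
    Dinv = np.linalg.inv(D)
    chi2 = [r @ Dinv @ r for r in (d - predictions)]
    return int(np.argmin(chi2))
```

Because α starts near zero, the first pass is effectively a noise-only fit, and the model error only gains full weight once the fits (and hence U itself) have stabilised.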
The weighting w_ii′ (Equation 11) can contain any prior information P(θ), e.g. the parameter space of similar objects, stellar structure models or population synthesis predictions. In the current implementation we use a flat prior (w_ii′ = 1).
The pipeline returns a multi-dimensional posterior distribution function for each star, while U is the same for all sources analysed in one batch. Parameters and their uncertainties are determined by defining confidence intervals (Fig. 1). To increase the accuracy, the posterior distribution function can be multiplied by an appropriate prior. More details on the implementation can be found in the source code of the pipeline.
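Deriving a confidence interval from the multi-dimensional posterior can be sketched as marginalising a χ² grid to one parameter and collecting the highest-probability grid points; this is a generic recipe, not necessarily the pipeline's exact implementation:

```python
import numpy as np

def marginal_ci(chi2_grid, axis_values, axis, level=0.683):
    """Marginalise a chi^2 grid onto one parameter axis and return a
    confidence interval: convert to probabilities, sum out the other axes,
    then take the smallest set of grid points containing `level` of the mass."""
    p = np.exp(-0.5 * (chi2_grid - chi2_grid.min()))
    p /= p.sum()
    other = tuple(i for i in range(p.ndim) if i != axis)
    marg = p.sum(axis=other)
    order = np.argsort(marg)[::-1]            # highest probability first
    cum = np.cumsum(marg[order])
    k = np.searchsorted(cum, level) + 1       # points needed to reach level
    sel = axis_values[np.sort(order[:k])]
    return sel.min(), sel.max()
```

For strongly degenerate parameters the selected points need not be contiguous, which is consistent with the asymmetric and multi-modal intervals discussed in the text.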

Stellar atmosphere grid
The grid of synthetic model spectra was computed with the non-LTE stellar atmosphere and radiative transfer code FASTWIND. Low temperature and high surface gravity models would be better calculated with a plane-parallel code without stellar winds.
be of great value, if a careful telluric correction has been performed.With data from 4MOST (de Jong et al. 2019) and in particular 4MOST/1001MC (Cioni et al. 2019) we are going to verify the FASTWIND LINES-list beyond the well tested wavelength range utilising the pipeline of this study.
The grid covers effective temperatures from T_eff = 17 800 K (log T_eff = 4.25) to 56 200 K (log T_eff = 4.75), surface gravities log g/(cm s⁻²) = 2.5 to 4.5, transformed mass-loss rates (e.g. Bestenlehner et al. 2014) from log Ṁ_t/(M⊙ yr⁻¹) = −6.5 to −5.0 assuming a constant radius, and helium abundances by number from Y = 0.07 to 0.15. Three combinations of CNO abundances are included, representing LMC baseline abundances plus semi- and fully-processed CNO compositions due to the CNO cycle according to the 60 M⊙ evolutionary track by Brott et al. (2011). Figure 2 shows the parameter space of the grid with respect to log g and T_eff.
The high temperature, low surface gravity regime (upper left area) is unpopulated, as those models are unstable because they exceed the Eddington limit at an Eddington parameter Γ_e ≈ 0.7 considering only the electron opacity σ_e. Low temperature and high surface gravity models (T_eff between 17 800 K (log T_eff = 4.25) and 21 400 K (log T_eff = 4.33) and log g > 4.0) can be computed with FASTWIND, but a significantly larger number of depth points or high mass-loss rates would be required to make them converge. Their computational timescale exceeds 1 day, in contrast to less than an hour otherwise. However, enhanced mass-loss rates are only observed if the star is in proximity to the Eddington limit (Bestenlehner et al. 2014). Therefore, those low temperature and high surface gravity stellar atmosphere models are better calculated with a plane-parallel geometry code without stellar winds (e.g. TLUSTY; Hubeny & Lanz 1995; Lanz & Hubeny 2007).
The clumping factor was set to f_cl = 1, i.e. a homogeneous stellar wind is adopted. We assumed a wind acceleration parameter of β = 1.0 and a fixed micro-turbulence velocity of 10 km/s. The terminal velocity was calculated based on log g and the stellar radius of the model using the terminal to escape velocity relation v_∞/v_esc = 2.6 for models hotter than 21 000 K and v_∞/v_esc = 1.3 for cooler models (Lamers et al. 1995). In total we computed of the order of 150 000 stellar atmospheres. For around 20% of those models the atmospheric structure, ionisation balance and/or radiation field failed to converge properly, leading to negative fluxes or discontinuities, or the models failed when the spectral lines were synthesised.
The grid was then convolved with a macro-turbulent velocity of v_mac = 20 km/s and varying projected rotational velocities v sin i = [0, 20, 50, 100, 150, 200, 250, 300, 400] km/s, assuming v_eq sin i is the dominant broadening mechanism, which is a reasonable assumption given the spectral resolution of the observational data (§ 3.4) and that typical v_mac are of the order of a few tens of km/s. This results in a grid of ≲ 1 100 000 synthetic spectra, which has been used to compute the decomposition matrix to decompose the grid into its principal components, reducing its size by a factor of ∼ 200. The decomposition matrix is also used to decompose the observational data (§ 3.1).

Observational data
To validate the methodology of the pipeline we used the VLT/FLAMES data of the VLT/FLAMES Tarantula Survey (VFTS; Evans et al. 2011). The second data set we used are the VLT/MUSE observations of ∼ 250 OB stars from Castro et al. (2018). The VLT/MUSE data cover the wavelength range between 4600 and 9300 Å at a spectral resolution of 2000 to 4000. The normalisation of the spectra was fully automated, simulating the work-flow of the spectroscopic analysis of large datasets. 35 stars are in common with VFTS, which will allow us to test the reliability of line diagnostics towards the red of Hα (§ 3.3) and the automated normalisation routine.

VLT/FLAMES: Evans et al. (2011)
We analysed all 240 O-type stars, including stars with low S/N and/or strong nebular contamination (e.g. Fig. A1), while Bestenlehner et al. (2014), Sabín-Sanjulián et al. (2014, 2017) and Ramírez-Agudelo et al. (2017) provided reliable results for 173 out of 240 sources. In Fig. 3 we show the spectroscopic fits of a representative early and late O star (VFTS-072 and 076). The shaded error area in Fig. 3 reveals where a general mismatch between model and observations occurs and is the square root of the diagonal elements of the model-error uncertainty-matrix. In particular the line centres of the Balmer and He i-ii lines seem to be poorly fitted as a result of nebular lines, inaccurately determined line broadening, insufficient grid resolution and range of helium abundances, the fixed micro-turbulent velocity or the shape of the line profiles. We are also able to locate spectral lines which are potentially not included in the FASTWIND LINES-list or which require improved atomic data (e.g. Si iii 4552.6, 4567.8 and 4574.7). Overall the fit of the synthetic spectra (red solid line) to the observations (blue solid line) is good.
Figure 1 visualises the probabilities in the log g − T_eff plane. VFTS-072 (left panel) shows a dual solution within 2 sigma (∼49 000 and ∼46 000 K). At ∼ 45 000 K the He i lines disappear, but the N iv and v lines contribute sufficiently to the χ² that the correct solution around 49 000 K also has the highest probability. By looking at the 3-sigma contour we notice a degeneracy between log g and T_eff due to the proximity of VFTS-072 to the Eddington limit. In contrast, the heat map of VFTS-076 (right panel) is well centred on a specific log g − T_eff region. However, a slightly higher surface gravity would be probable within 2 sigma, which is the result of a degeneracy between surface gravity and mass-loss rate: high mass-loss rates fill in the wings of the Balmer lines, mimicking a lower surface gravity.

Model-error uncertainty-matrix
The model-error uncertainty-matrix is symmetric (U^T = U) and shows correlations between wavelength or pixel regions. An example is given in the Appendix (Table A2), where we reduced the dimension of the matrix from 11840×11840 to 37×37 for visualisation purposes. The strongest correlations are between the Balmer lines, which are the most prominent lines in O-type stars. On the other hand, He i is present in mid to late O stars, while He ii lines are only strong in early O stars. Therefore, to visualise the model-error matrix and its correlations we plot in Fig. 4 the model uncertainties for the wavelengths of H, He i 4471 and He ii 4686.
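The reduction for visualisation can be sketched as block-averaging the covariance and normalising it to a correlation matrix; the sizes below are toy values, with the real matrix being 11840×11840:

```python
import numpy as np

def block_reduce_corr(U, n_blocks):
    """Average square blocks of a large model-error covariance down to an
    n_blocks x n_blocks matrix, then normalise it to a correlation matrix.
    A toy stand-in for compressing 11840x11840 to 37x37 for visualisation."""
    n = U.shape[0] // n_blocks
    m = n * n_blocks                                  # drop any remainder pixels
    small = U[:m, :m].reshape(n_blocks, n, n_blocks, n).mean(axis=(1, 3))
    d = np.sqrt(np.abs(np.diag(small)))
    return small / np.outer(d, d)
```

Block averaging is a quadratic form in U, so the reduced matrix stays symmetric and positive semi-definite, and the normalisation makes the inter-line (anti-)correlations directly comparable.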
H and He ii 4686 are anti-correlated with each other, although we amplified the uncertainties for He ii 4686 by a factor of 10. The wavelength regions of the Balmer lines are blends of hydrogen and He ii lines. With increasing temperature the He ii lines become stronger while the Balmer lines become weaker. In addition, the ratio of the hydrogen and helium line strengths also determines their abundances: e.g. overall stronger helium lines with respect to the hydrogen lines mean a lower hydrogen abundance.
He i 4471 (amplified by a factor of 25) is correlated with He ii 4686 for helium lines, but anti-correlated for H and the higher order Balmer lines, following the trend of H. He i 4471 shows stronger correlations with Si iii and C iii, which are only present in late O stars, where the He i lines are strongest as well. Under this supposition we would expect to observe a strong correlation between He ii and the higher ionised N iv and v, which seems not to be the case. A reason might be that the number of early O stars is too small, due to the stellar mass function, to contribute significantly to the model error (< 5%). Grouping similar objects together is advisable when testing model assumptions in stellar atmosphere codes.

Challenges
The examples shown in Fig. 3 show low and modest nebular contamination, and the pipeline derives results in good agreement with VFTS. However, the pipeline has difficulties when spectra show strong nebular lines. In the case of VFTS-142 (Fig. A1) the temperature is still reasonably well reproduced, but the surface gravity is too low by ∼0.3 dex. If the spectrum is dominated by nebular lines, the pipeline will fail, e.g. VFTS-410 (Fig. A2). Often nebular lines are clipped, but in the case of VFTS-410 only a few diagnostic lines would remain. Even though we perform a single star analysis, for double-lined spectroscopic binaries (SB2s) the pipeline is able to fit the primary component fairly well, but struggles with the mass-loss rate and helium abundance due to the contribution of the colliding wind region in VFTS-527 (Fig. A3; Taylor et al. 2011).
The goodness of fit is usually evaluated by calculating the reduced chi-square (RCS), which in our case uses the diagonal of the error covariance matrix. Due to the nebular contamination and diffuse interstellar bands, none of our fits had an RCS close to 1. To visualise how well the pipeline performs we compare our results to the tailored analyses of VFTS targets in Fig. 5. Our results agree well with Bestenlehner et al. (2014), Sabín-Sanjulián et al. (2014, 2017) and Ramírez-Agudelo et al. (2017) for fits with RCS < 100. Effective temperatures show a tighter relation than the surface gravities. The determination of the surface gravity is based on the wings of the Balmer lines, which are influenced by the line broadening and therefore by how well v_mac and v sin i are determined. If the spectroscopic fit is poor, we derive systematically lower temperatures and surface gravities: low temperature and gravity models have Hα in emission, fitting to some degree the non-stellar Hα nebular line, while the higher order Balmer lines remain in absorption. Overall there is good agreement, considering that our analysis took less than 30 min while the VFTS analysis involved 3 PhD theses.
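The goodness-of-fit figure used here can be sketched as follows, using only the diagonal of the error covariance as stated in the text; the function name is our own:

```python
import numpy as np

def reduced_chi_square(obs, model, var_diag, n_params):
    """Reduced chi^2 using only the diagonal of the error covariance.
    Values far above 1 flag fits that warrant inspection, e.g. the
    RCS < 100 selection used when comparing against the VFTS analyses."""
    r = obs - model
    dof = r.size - n_params            # degrees of freedom
    return float(np.sum(r**2 / var_diag) / dof)
```

Because nebular contamination and diffuse interstellar bands inflate the residuals, the RCS serves here as a relative ranking of fit quality rather than an absolute test against 1.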
Looking at the error bars, the pipeline obtains systematically larger uncertainties, in part due to the inclusion of the model error, but mostly as a result of the interpretation of the 4D posterior distribution function (PDF), which includes the degeneracies between T_eff, log g, Ṁ_t and Y. The derived errors may be larger, but they are a more complete representation of the true uncertainties. A representative prior could increase the accuracy, but might introduce additional biases (Bestenlehner et al. 2020). Temperature uncertainties are systematically larger in the region around 45,000 K as a result of the weakness of the He i lines: the ionisation balance is then not based on He i-ii but on the metal ions N iii-iv-v. Nitrogen lines in early O stars are relatively weak compared to He lines and therefore contribute little to the global χ² without specific weighting of spectral lines. This can lead to an overall lower RCS, but an inaccurate temperature determination (red outliers in Fig. 5). A similar behaviour occurs in the transition from late O to early B stars due to the weakness of the He ii lines, where the main temperature diagnostics are the Si iii and iv lines. A careful weighting scheme should therefore be a very promising way of optimising the pipeline, increasing the accuracy while at the same time reducing the degeneracies between parameters.
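A weighting scheme of this kind could be sketched as below. The line windows and weight values are purely illustrative assumptions (they are not the paper's calibration); the idea is simply to up-weight weak metal diagnostics such as the nitrogen lines so that they contribute meaningfully to the global χ².

```python
import numpy as np

# Hypothetical per-line weights boosting weak metal diagnostics.
# Windows are in Angstrom; values are illustrative only.
LINE_WEIGHTS = {
    (4057.0, 4059.0): 5.0,   # N IV 4058
    (4510.0, 4520.0): 5.0,   # N III complex
    (4465.0, 4477.0): 1.0,   # He I 4471 (default weight)
}

def weighted_chi_square(wave, flux_obs, flux_mod, var):
    """Global chi-square with extra weight on selected diagnostic lines."""
    weights = np.ones_like(wave)
    for (lo, hi), w in LINE_WEIGHTS.items():
        weights[(wave >= lo) & (wave <= hi)] = w
    return np.sum(weights * (flux_obs - flux_mod)**2 / var)
```

Any such scheme changes the effective statistic, so weighted and unweighted RCS values are no longer directly comparable; weights would need to be chosen and validated against benchmark stars.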

VLT/MUSE: Castro et al. (2018)
Figure 6 shows the spectroscopic fits of an Of supergiant and a B supergiant with ΔT_eff ≈ 25 000 K and Δlog g ≳ 2.5 dex. This highlights that stars covering a large range of spectral types can be successfully and reliably analysed with a single pipeline set-up at the same time. However, the two stars would not be considered similar, which has implications for the model error: a model error averaged over such a wide parameter space is probably not very helpful when testing specific physics (e.g. stellar-wind physics or atomic data) in the model. As for the VFTS data, the pipeline does not perform well for low signal-to-noise spectra (S/N ≲ 10 to 15), spectra with strong nebular lines, and spectroscopic binaries/multiples. In Fig. 7 we compare our results with those from Castro et al. (2021), which are based on the ionisation balance of selected He i and He ii lines and the wings of Hβ. In contrast, we used all H and He lines plus the CNO and Si metal lines available in the VLT/MUSE wavelength range. The left panel compares effective temperatures, which show a large scatter but mostly agree within their large uncertainties. Above 45 000 K, when He i disappears or is only weakly present in the spectra, the temperature needs to be derived from the ionisation balance of metal lines. In the wavelength range of MUSE we have C iii-iv and N iii-iv. While C iv and N iv are located in relatively clean areas of the spectra, C iii and N iii are often found in the range of telluric bands or near the Paschen jump, where we have issues with the normalisation (Fig. 6). B-type stars by definition show no He ii in their spectra, and for early B stars the temperature is based on the ionisation balance of the Si iii-iv lines. There is a reasonable number of such lines in the MUSE range, but the temperature determination suffers in the presence of nebular lines or for low-S/N spectra.
The right panel of Fig. 7 compares surface gravities, which show an even larger scatter and larger uncertainties. Even though we utilised the Paschen lines as well, there are two potential caveats: first, the normalisation near the Paschen jump is not straightforward, as the lines overlap and there is therefore no clean continuum; second, the degree of overlap depends not only on log g but also on the line broadening, because the higher-order Paschen lines are narrow and the broadening is only approximately determined during the fitting process. This results in a degeneracy between log g and v_eq sin i. Surface gravities cluster in the range log g = 3.5 to 4.5, which is expected for a young stellar population of 1 to 2 Myr largely consisting of dwarfs and giants in the proximity of R136 (Bestenlehner et al. 2020).
To better quantify how reliable the analysis based on the VLT/MUSE data is, we compare our results to VFTS. 35 VLT/MUSE targets are in common with VFTS (Bestenlehner et al. 2014; Sabín-Sanjulián et al. 2014, 2017; Ramírez-Agudelo et al. 2017), and the comparison is shown in Fig. 8. Uncertainties are systematically larger. Effective temperatures agree within their 1σ uncertainties for 26 out of 35 stars (left panel), with differences largely a result of the cleanliness and quality of the spectra. Surface gravities agree for only half of the sample, which is expected given the challenges of the Paschen lines. Overall the agreement is reasonable considering the difficulties in analysing the MUSE data.
To place the stars in the Hertzsprung-Russell diagram (HRD) we derived bolometric luminosities on the basis of optical V-band photometry from Selman et al. (1999) and near-infrared K_s photometry from 2MASS (2 Micron All Sky Survey) and the VISTA Magellanic Clouds survey (VMC; Cioni et al. 2011), using the methodology of Bestenlehner et al. (2020, 2022). The HRD (Fig. 9) shows that most stars lie near, and to the cool side of, the zero-age main sequence (ZAMS). There are a couple of exceptions, but their uncertainties do not exclude a cooler location in agreement with the majority of the sources. This could be improved by including a meaningful prior in the analysis, e.g. based on evolutionary tracks, which could increase the accuracy of the results, as we used only a flat prior (p′ = 1). For example, only hydrogen-deficient stars, e.g. stripped by their own stellar winds or through binary evolution, are expected on the hot side of the ZAMS. A prior would give a star a higher probability of being found either on the hot or the cool side of the ZAMS depending on its helium composition. Stellar parameters can be found in Table A3, while individual spectroscopic fits for visual inspection are in the supplementary online material. Mass-loss rates and He, C, N, and O abundances are not included: optical data can only provide an upper limit on the mass-loss rate for most stars in our sample (flat PDF towards lower mass-loss rates), while the PDF for helium is cut off at the primordial abundance of 25% by mass. CNO abundances are too coarsely sampled and linked to the predicted chemical evolution of a 60 M⊙ star (§ 3.3).
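The luminosity step can be illustrated with a minimal sketch. This is not the methodology of Bestenlehner et al. (2020, 2022); as stand-ins it assumes an LMC distance modulus of 18.48 mag and the hot-star bolometric correction of Vacca et al. (1996), BC = 27.66 − 6.84 log T_eff, and neglects the combined V and K_s treatment.

```python
import math

MBOL_SUN = 4.74      # solar absolute bolometric magnitude (IAU reference)
DM_LMC = 18.48       # assumed LMC distance modulus [mag]

def log_luminosity(m_v, a_v, teff):
    """log(L/Lsun) from apparent V magnitude m_v, V-band extinction a_v
    and effective temperature teff [K], using the Vacca et al. (1996)
    bolometric correction as an illustrative stand-in."""
    bc = 27.66 - 6.84 * math.log10(teff)     # bolometric correction [mag]
    m_bol = m_v - DM_LMC - a_v + bc          # absolute bolometric magnitude
    return (MBOL_SUN - m_bol) / 2.5
```

For a hypothetical star with V = 13.0 mag, A_V = 1.0 mag and T_eff = 40 000 K this yields log L/L⊙ ≈ 6.0, i.e. a very luminous O star, as expected near R136.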

CONCLUSIONS AND OUTLOOK
Large spectroscopic surveys with WEAVE and 4MOST will observe tens of thousands of massive stars, which need to be analysed in a homogeneous and efficient way. The pipeline presented in this work takes advantage of the information that large data sets provide by determining the model uncertainties, which are included in the error budget. This methodology could also be applied to galaxies, or to other domains such as biology and geophysics, for which approximate and incomplete theoretical models exist as well.
The runtime of the pipeline scales exponentially with the number of spectra, because all stars are analysed at the same time and the model-uncertainty (error) matrix is iteratively updated (§ 2.4). However, once a converged model-uncertainty matrix is obtained, we can limit the matrix operations to the χ²-minimisation and switch to a star-by-star analysis. In this case we are able to analyse one star in less than a second. Xiang et al. (2022, The HotPayne) applied the FGK methodology of The Payne (Ting et al. 2019) to OBA stars to derive two stellar labels/parameters (T_eff and log g) plus 15 chemical abundances. They used plane-parallel LTE atmospheric models calculated with ATLAS12 (Kurucz 1970, 1993, 2005) and were able to analyse 330 000 spectra. While T_eff and log g are sensible for A and mid- to late-B dwarfs, the derived chemical abundances suffer from non-negligible systematics due to non-LTE effects. AB supergiants require spherical geometry (the atmospheric scale height is no longer negligible relative to the stellar radius) including stellar winds, as these effects cannot be neglected. In hotter and more luminous stars (early B and O stars) a non-LTE treatment is necessary (?), such that The HotPayne results for T_eff and log g are not reliable, even for stars with weak winds. Even the inclusion of a model error will not improve this much, owing to the fundamental missing physics in the ATLAS12 models.
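The fast star-by-star mode can be sketched as a plain grid χ² scan with the converged model-error variance frozen. This is a minimal illustration under that assumption; the function and variable names are placeholders, not the pipeline's API.

```python
import numpy as np

def best_fit_star(flux_obs, var_obs, grid_fluxes, var_model):
    """Star-by-star analysis with a frozen (converged) model-error
    variance: evaluate chi-square against every grid model and return
    the index of the best fit plus the full chi-square vector.
    grid_fluxes has shape (n_models, n_pix)."""
    var = var_obs + var_model                          # fixed total variance
    chi2 = np.sum((grid_fluxes - flux_obs)**2 / var, axis=1)
    return int(np.argmin(chi2)), chi2
```

Because no matrix update is performed, the cost per star is a single vectorised pass over the grid, which is consistent with sub-second per-star runtimes for grids of modest size.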
To make The HotPayne usable for OBA stars, the underlying stellar models must be replaced with models computed with more sophisticated, fully non-LTE stellar atmosphere codes designed for hot, massive stars with stellar winds, e.g. cmfgen (Hillier & Miller 1998), fastwind (Santolaya-Rey et al. 1997; Puls et al. 2005) or PoWR (Gräfener et al. 2012; Hamann & Gräfener 2003). The approach of The HotPayne needs to be expanded to include the model uncertainties in the error budget (e.g. this work), to account for the assumptions and parametrisations utilised in those complex stellar atmosphere codes. Additional stellar labels/parameters need to be incorporated, such as the mass-loss rate, velocity law, wind inhomogeneity, and terminal and wind-turbulent velocities, plus the helium abundance. The helium abundance increases due to the CNO cycle, which in turn increases the mean molecular weight (μ) and impacts the mass-luminosity relation (L ∝ μ⁴ M³), the electron density, and therefore the structure and ionisation balance of the stellar atmosphere. When analysing optical spectra the wind parameters can be merged into a wind-strength parameter Q (Puls et al. 1996), a transformed radius R_t (Schmutz et al. 1989; Gräfener et al. 2002; Hamann & Gräfener 2004) or a transformed mass-loss rate Ṁ_t (Bestenlehner et al. 2014, § 3.3).
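As an example of such a combined wind parameter, the transformed radius of Schmutz et al. (1989) can be written down directly; the sketch below assumes the commonly used clumping correction Ṁ√D and standard units, and is an illustration rather than any specific code's implementation.

```python
def transformed_radius(r_star, v_inf, mdot, clumping=1.0):
    """Transformed radius R_t (Schmutz et al. 1989), with the common
    clumping correction Mdot*sqrt(D):
        R_t = R* * [ (v_inf / 2500 km/s) / (Mdot sqrt(D) / 1e-4 Msun/yr) ]^(2/3)
    r_star in Rsun, v_inf in km/s, mdot in Msun/yr; returns R_t in Rsun."""
    return r_star * ((v_inf / 2500.0) / (mdot * clumping**0.5 / 1e-4)) ** (2.0 / 3.0)
```

Models sharing R_t (at fixed T_eff and abundances) produce nearly identical emission-line spectra, which is why such invariants compress the wind-parameter space for optical analyses.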
The fully automated spectroscopic analysis tool of this work reduces human interaction to a minimum to cope with the amount of data. It is able to process ∼250 stars in less than half an hour (∼6 CPU hours), delivering results comparable to those obtained by Bestenlehner et al. (2014); Sabín-Sanjulián et al. (2014, 2017); Ramírez-Agudelo et al. (2017) over a decade. Overall the quality of the spectroscopic fits is good, but around 15% of the stars need additional attention as a result of strong nebular contamination, low S/N, multiplicity et cetera. The pipeline performs well over a wide parameter space, which is supported by the spectroscopic analysis of three benchmark stars by several groups within the X-Shooter and ULLYSES collaboration (XShootU; Vink et al. 2023) of optical VLT/XShooter data (Sander et al. in preparation).

ID | T_eff | log g | log … | comments (Castro et al. 2018)

Figure 2. T_eff − log g plane of the computed grid of converged stellar atmospheres. The high-temperature, low-surface-gravity regime (upper left area) is empty, as those models are unstable due to the Eddington limit (Γ_e ≈ 0.7). Low-temperature, high-surface-gravity models would be better calculated with a plane-parallel code without stellar winds.

Figure 3. Left: spectroscopic fit of the fast-rotating early O2 V-III(n)((f*)) star VFTS-072; right: the late O9.2 III star VFTS-076. The blue solid line is the observation, the red solid line the synthetic spectrum, and the grey shaded area is the square root of the diagonal elements of the model-error matrix calculated by the pipeline.

Figure 4. Model uncertainties extracted from the model-error matrix as a function of wavelength for Hγ (solid red), He i 4471 (black dashed) and He ii 4686 (solid blue). For better visualisation the uncertainties for He i 4471 and He ii 4686 are multiplied by factors of 25 and 10, respectively.

Figure 6. Spectroscopic fits of an O2 If star (Mk42, left) and a B2 Ib star (VFTS-417, right). The blue solid line is the observation, the red solid line the synthetic spectrum, and the grey shaded area is the square root of the diagonal elements of the covariance matrix calculated by the pipeline. In the left panel the newly added N iv multiplet at 7103−29 Å is able to reproduce the observed line.

Figure 7. Effective temperatures (left) and surface gravities (right) determined by the pipeline vs. the results from Castro et al. (2021).

Figure A1. VFTS 142: moderate nebular contamination. The blue solid line is the observation, the red solid line the synthetic spectrum, and the grey shaded area is the square root of the diagonal elements of the covariance matrix calculated by the pipeline. The effective temperature is well reproduced, while the surface gravity is 0.2 to 0.3 dex too low.

Figure A2. VFTS 410: strong nebular contamination. The blue solid line is the observation, the red solid line the synthetic spectrum, and the grey shaded area is the square root of the diagonal elements of the covariance matrix calculated by the pipeline. The pipeline is unable to reproduce the stellar parameters.

Figure A3. VFTS 527: double-lined spectroscopic binary (Taylor et al. 2011). The blue solid line is the observation, the red solid line the synthetic spectrum, and the grey shaded area is the square root of the diagonal elements of the covariance matrix calculated by the pipeline. The effective temperature and surface gravity of the primary are reproduced, but the mass-loss rate is too high while the helium abundance is too low due to the contribution of the colliding-wind region, which largely affects Hα and Hβ.

Table A1. Lines synthesised when calculating the formal integral. Wavelength ranges indicate multiplets with diverging central wavelengths.