Convergence properties of fine structure constant measurements using quasar absorption systems

Searches for spacetime variations of fundamental constants have entered an era of unprecedented precision. New, high quality quasar spectra require increasingly refined analytic methods. In this article, a continuation in a series to establish robust and unbiased methodologies, we explore how convergence criteria in non-linear least squares optimisation impact on quasar absorption system measurements of the fine structure constant alpha. Given previous claims for high-precision constraints, we critically examine the veracity of a so-called ``blinding'' approach, in which alpha is fixed at the terrestrial value during the model building process, releasing it as a free parameter only after the ``final'' absorption system kinematic structure has been obtained. We show that this approach results in an extended flat canyon in chi squared-alpha space, such that convergence is unlikely to be reached, even after as many as 1000 iterations. The fix is straightforward: alpha must be treated as a free parameter from the earliest possible stages of absorption system model building. The implication of the results presented here is that all previous measurements that have used initially-fixed alpha should be reworked.


INTRODUCTION
Searches for spacetime variations of fundamental constants are motivated by theoretical expectations, hints in some previous empirical results, and improved technologies, facilities and procedures.Of particular importance is new instrumentation such as the ESPRESSO spectrograph on the VLT (Pepe et al. 2021) and new laser frequency comb calibration facilities (Steinmetz et al. 2008;Milaković 2020;Fortier & Baumann 2019;Milaković et al. 2020;Probst et al. 2020;Zhao et al. 2021), as well as the forthcoming high resolution spectrograph ANDES on the ELT (Marconi et al. 2016;Tamai et al. 2018).
To maximise return from these new and forthcoming facilities, it is important to seek improvements, where possible, amongst all aspects of our current analytic procedures (Webb et al. 2022).In a series of recent papers we have scrutinised existing methods, identified shortcomings, and have developed more advanced numerical and theoretical procedures.That work includes the application of AI methods to spectral modelling (Bainbridge & Webb 2017b,a;Lee et al. 2021a).Recently, strong bias in some existing quasar absorption measurements has been revealed, (Webb et al. 2021a;Lee et al. 2021b;Webb et al. 2022;Lee et al. 2023).
Our recent work has focused on measurements of the fine structure constant α in quasar absorption systems, parame-terised using ∆α/α = (αz − α0)/α0, where the subscripts z, 0 indicate redshift and the terrestrial value and where the fine structure constant α = e 2 /4πϵ0ℏc in SI units.Our measurements have made use of the non-linear least squares code, vpfit (Carswell & Webb 2014;Carswell 2023).The vast majority of existing quasar absorption α measurements in the literature have used this code.The vpfit theoretical background and recent enhancements are described in Webb et al. (2021b); Lee et al. (2022).
One aspect of vpfit that had not previously been studied in any detail concerns convergence, but a number of quite different non-linear least squares applications motivate doing so.Forbes & Bartholomew-Biggs (2009) suggest that Gauss-Newton (GN) can sometimes exhibit slow convergence and minimise the objective function at each iteration of the process, progressing the GN direction and the negative gradient.Forbes and Bartholomew-Biggs apply their modified GN method to five modelling problems and whilst they find evidence for improvement, they also find that it is difficult to identify a generalised method for convergence improvement, noting that specific fine-tuning of their method is probably required for any particular highly nonlinear problem.Transtrum & Sethna (2012) suggest that slow convergence and robustness to initial guesses are complimentary problems and that methods for improving convergence speed can decrease robustness to initial guesses, and vice versa.
Levenberg-Marquardt (LM) algorithms can also be painfully slow to converge when iterations proceed along a narrow canyon (which commonly exists for problems involving a large number of parameters), and suggest several ways in which convergence can be accelerated.One consequence of a flat χ 2parameter space is that algorithms may sometimes push parameters to non-physical (even infinite) values.Transtrum & Sethna (2012) provide an interesting discussion in which they make a distinction between convergence criteria and stopping criteria, and propose a "geodesic acceleration" correction to the LM step, by including second order corrections in the Taylor approximation of the residuals.Fratarcangeli et al. (2020) report similar lengthy convergence issues to those we report in this paper.The context of their investigation is image reconstruction.They derive significant speed gains by using LM to locally linearise the problem, decomposing the linear problem into small blocks, using the local Schur complement, obtaining a more compact linear system without loss of information, generality, or accuracy.The other main gain from their work is that the system becomes parallelised and scalable, making it suitable for use with graphics processing units.A process such as this could be explored for vpfit.
In Lee et al. (2023) (L23 hereon), we showed that if ∆α/α is held fixed at zero whilst developing the absorption system model, then subsequently allowed to vary, the final measurement remains locked in a local χ 2 minimum near the fixed value, so is likely to bear little or no similarity to the statistically correct solution.In the present paper, we develop the L23 analysis further, investigating the impact of non-linear least squares convergence using vpfit.Stopping criteria must be carefully considered for ∆α/α measurements.Previous best-fit quasar absorption system models have adopted a "standard" practice in terminating iterations; when the change in the goodness of fit between two successive iterations, ∆χ 2 , falls below some "reasonable" threshold, iterations are stopped and the "best-fit" model obtained.Previous measurements have assumed that final parameter values have been reached in this way.However, if we force algorithms like vpfit to continue iterating well beyond what would normally be considered as a "reasonable" stopping criterion, could the ∆α/α solution slowly drift from the previous "final" value, ending up with a significantly different solution?Given the strong bias revealed by the L23 calculations, the question naturally arises as to whether convergence failure issues could be also be contributing to systematic errors.

ASTRONOMICAL, ATOMIC, AND LABORATORY DATA
The astronomical data used for this study is an isolated section (i.e.flanked by continuum, line-free regions) within an extensive quasar absorption system at z abs = 1.15 towards the quasar HE 0515−4414 Reimers et al. (1998).Specifically, our study focuses on the lower redshift side the large complex, which we refer to as the "left region".Full details and plots for the left region are described in detail in L23.We use a subsection, rather than the whole absorption system, (a) because the kinematic structure of the left region is such that it provides tight constraints, and (b) to make computing times feasible.Recent studies of this absorption system are given by Milaković et al. (2021); Murphy et al. (2022), L23, and the astronomical (and all other) data used in the present paper are exactly the same ESPRESSO data as used in these last two studies.We opted to use the HE 0515−4414 ESPRESSO data for the present study because the analysis carried out in the present paper is directly motivated by the different approaches in Murphy et al. (2022) (hereafter referred to as M22) and L23 that use the exact same data.In the former, as in some previous studies, ∆α/α = 0 was fixed throughout model development, and only introduced as a free parameter after a completed absorption system model had been derived.In the latter, the same approach was carried out, but with an important addition: as well as fixing ∆α/α = 0 throughout model development, three other fixed values were also tried, ∆α/α = −10, +10, +30 (all in units of 10 −6 ), for the purpose of demonstrating how the fixed-α approach pushes the solution into a false local minimum.In the present analysis, which is an extension of the L23 study, we examine the impact of changing convergence criteria using the same set of four initially-fixed ∆α/α.
All atomic data, transition oscillator strengths f , damping coefficients Γ, laboratory wavelengths including isotope and hyperfine wavelengths, and sensitivity coefficients q, are as tabulated in the atom.datfile provided with version 12.4 of vpfit, Carswell (2023).All calculations reported in this paper have assumed solar relative isotopic abundances.As pointed out in Webb et al. (1999Webb et al. ( , 2014)), deviations from terrestrial relative isotopic abundances impact significantly on ∆α/α.The aim of the present paper is not to examine that issue, which will be addressed in a separate study.

OPTIMISATION METHOD
vpfit v12.4 is used for all calculations in this paper, providing a few improvements over earlier versions.Nevertheless, the results presented here do not depend on the version usedwe have checked that our findings are replicated using earlier versions.
In Webb et al. (2021b) we explored different optimisation approaches, applied within the vpfit code for modelling high resolution absorption lines or systems.Prior to version 12.3, the code used both Gauss-Newton (GN) and Levenberg-Marquardt (LM) algorithms.The two methods differ in the way in which parameter updates are tuned at each iteration during the optimisation process, for maximum descent in χ 2 .In Gauss-Newton optimisation, the vector of optimal parameter updates is given by where γ is a local univariate minimisation parameter whose purpose is to minimise the current value of the merit function χ 2 at each iteration, and where pmin is the vector of parameter updates prior to tuning via univariate minimisation.
The LM method takes a different approach.Instead of optimising the vector of parameter updates, the Hessian matrix is modified at each iteration using a different univariate minimisation parameter, η, again to generate the largest reduction in χ 2 , where I is an identity matrix and η is a positive scalar.Both GN and LM methods are of course very widely used and both work well.Empirically, in the vpfit application, GN sometimes produces the largest χ 2 reduction at some iteration, whilst sometimes LM does.For that reason, vpfit (prior to version 12.3) computes both Equations ( 1) and ( 2) and selects, at each iteration, the best option.We refer to that approach as GN-LM.However, in version 12.3, an enhancement was introduced: instead of employing one or the other of Equations ( 1) and ( 2), both minimisations are done at every iteration, the obvious goal being a more efficient χ 2 descent.Such a method is evidently neither pure GN nor pure LM, but is a merger of the two approaches, termed "hybrid optimisation" (HO) in Webb et al. (2021b).
Another relevant vpfit enhancement concerns the way in which the gradient vector and Hessian matrix are calculated; in Webb et al. (2021b); Lee et al. (2022) we showed how analytic Voigt derivatives avoid problems that occasionally appear using finite difference derivatives and provide better stability in some circumstances.Those changes have been implemented in v12.4.
In all modelling, we imposed an upper limit on the velocity dispersion parameter b ≤ 10 km/s.In all ai-vpfit models used in this paper, the SpIC information criterion was used (Webb et al. 2021a).Also, we have used a Gaussian instrumental profile for convolution with theoretical Voigt profiles (see L23 for details), as did M22.

100 AI-VPFIT compound models
In L23, we computed a total of 400 left region models: 25 independently derived ai-vpfit models for each of 4 initially-fixed ∆α/α values, 2 information criteria (SpIC and AICc), and 2 line broadening mechanisms (turbulent and compound).
In the present study, we drop AICc because previous calculations indicate that SpIC is more effective in this context (Webb et al. 2021a).We also drop turbulent broadening, as compound is more general, more physically reasonable, and it incorporates turbulent broadening as a limiting case.Our initial set of calculations in the present paper therefore only require 100 models: 25 independently derived ai-vpfit models for each of 4 initially-fixed ∆α/α settings, using SpIC and compound line broadening.The 4 initially-fixed ∆α/α settings are ∆α/α = −10, 0, +10, and +30 × 10 −6 (in both L23 and the present paper).These 100 ai-vpfit models provide the starting point for the longer vpfit iterations described in Section 4.2.

1000 VPFIT iterations for 100 AI-VPFIT compound models
Our goal is to examine whether or not the stopping criteria applied in previous measurements result in convergence.The stopping criterion in vpfit is where the subscript n indicates the n th iteration and where Nj is the number of data points for the j th transition, the fit being made simultaneously to a total of M transitions.
The appropriate choice for ∆ in Eq.( 3) clearly depends on the data quality and defines the reliability of the result, as well as the total computing time; if ∆ is too large, iterations may terminate too quickly and convergence may not be reached.
If ∆ is too small, iterations may proceed indefinitely.In all previous α measurements, a balance between these extremes has been attempted, but without sufficiently rigorous checking, hence the present study.The vpfit v12.4 default value of ∆, i.e. the hard-coded value used in the absence of userdefinition, is 10 −3 .The ∆ values used in the three previous studies of this absorption section are 5×10 −4 (Milaković et al. 2021), 5 × 10 −4 (L23); in these both papers, 6 additional iterations were carried out after the stopping criterion had been reached), and 2 × 10 −6 (M22, without additional iterations).
To explore convergence properties, we remove the usual requirement defined by Eq.( 3), and instead force vpfit to iterate 1000 times.Note that whilst the starting models are already "statistically acceptable" fits to the data, (i.e. the reduced χ 2 ≈ 1), their ∆α/α values are consistently biased towards the starting guesses.Fig. 1 shows the ∆α/α convergence tracks for initially-fixed ∆α/α = −10, 0, +10, +30 × 10 −6 .These curves reveal a number of interesting properties: (i) After a small number of vpfit iterations (about 3), each group of 25 models "spreads out" to an approximately constant scatter thereafter.
(ii) Each group, on average, gradually drifts away from the starting ∆α/α value.
(iii) The +30, 0, and −10 samples all drift towards the +10 model, although on average do not quite reach it, even after 1000 iterations.For those 3 samples, convergence is not reached.For the +10 case, drift is very slow i.e. the data appear close to convergence.
(iv) In all 4 cases studied, the drift rate reduces substantially beyond iteration ∼200 − 300.This is more easily seen in the linear plots (Appendix A).
(v) In 3 of the 4 cases studied, the spread is large compared to the statistical error.But in one case (+10), the behaviour is completely different; both the scatter and the drift are very small and the curves appear stable and consistent.
These results show that initially-fixed ∆α/α creates long flat "canyons" in χ 2 -∆α/α space, such that convergence within a reasonable number of iterations is generally unlikely (unless the first guess just happens to coincide with the preferred ∆α/α solution).The curves in Fig. 1, whose properties are enumerated above, indicate that the +10 model is preferred over the other cases.The properties of the +10 models also suggest that if so-called "blinding" is avoided, i.e. ∆α/α is a free parameter from the outset of model construction, the convergence failures seen in the +30, 0, and −10 cases will be avoided (simply because if ∆α/α is a free parameter throughout model building, it's value will already (in this case) be around the +10, iteration 0 value in Fig. 1, and hence will not evolve much subsequently.Whilst these results are derived using one particular absorption system, such that the details would necessarily be different for some other system, there is no reason to expect the more general characteristics found do not apply broadly.

CONVERGENCE TEST USING TURBULENT BROADENING
In the M22 study of the z abs = 1.15 absorption system towards the quasar HE 0515−4414, as already explained, fixed ∆α/α = 0 was used throughout the vpfit model building process.∆α/α was only allowed to vary freely once the kinematic structure of the absorption system had been established.Once ∆α/α is introduced as a free parameter, and all model parameters allowed to vary, the kinematic details evolve a little further, but not much.

25 AI-VPFIT models using turbulent broadening
As shown in L23, extensive ai-vpfit calculations (published after the M22 study) revealed that initially-fixed ∆α/α creates substantial bias.In light of the various results described above, we emulate the M22 analysis, to see if we can reproduce (and hence understand) why that measurement does not reflect the "correct" result for that system1 .We therefore carried out an additional set of calculations.For the purposes of illustrating how one can arrive at the M22 ∆α/α measurement, the relevant starting models from L23 are the 25 independent ai-vpfit models derived using fixed ∆α/α = 0 and turbulent line broadening.

1000 VPFIT iterations for 25 AI-VPFIT turbulent models
We can thus use the 25 models described above again as a starting point, evolving them further using vpfit v12.4,allowing 1000 iterations.None of these 25 models (which involved no human decisions in their construction) would be expected to correspond to the M22 manual process specifically.However, the ai-vpfit process (random placement of trial components, as described in Lee et al. (2021a)), together with the Monte Carlo generation of 25 independent models, provides a range of models that are representative of those produced by a human subjective process.
Fig. 3 shows the same general feature as those seen in Fig. 1; the models spread in ∆α/α as iterations proceed.After 1000 iterations, the spread is large, and in most cases, convergence does not appear to have been achieved.Interestingly, the compound models appear to reach an approximately constant scatter after only ∼3 iterations, but this requires around 25-30 iterations for the turbulent models.
Fig. 3 raises an obvious question: did the M22 model converge?M22 argue that any residual convergence uncertainty in their model is small compared to the statistical uncertainty i.e. that, effectively, convergence had been reached.This can easily be explicitly checked by allowing the final M22 model to iterate further.The results of doing this are illustrated in Fig. 4, which clearly shows that, in fact, the M22 model did not achieve convergence.The starting point in this case is ∆α/α = 2.15 × 10 −6 .It appears to be that the M22 value of 2.15 was reached after 125 vpfit iterations (simply because the ∆χ 2 stopping criterion used in the M22 study was 10 −6 , and it is likely that this criterion was not met, such that vpfit is likely to have reached its default maximum of 125 iterations).The result in this case is interesting: as can be seen in Fig. 4, ∆α/α continuously increases from its starting point, reaching ∼3.2 by iteration 1000 (a change corresponding to 36% of the M22 statistical uncertainty).Fig. 3 also explains why the M22 result was obtained; the point (125, 2.2) is typical of the 25 models illustrated.
Fig. 5 shows the ∆α/α values reached after 1000 iterations for all 25 turbulent models (blue points) as well as the single M22 model (red point).This plot again illustrates that the M22 result (both ∆α/α and χ 2 ν ) can easily be reproduced using initially-fixed ∆α/α and turbulent broadening.See L23 for a detailed discussion on this point, in particular Table 1 in that paper, which shows that ai-vpfit derives models with, on average, close to half the number of absorption components compared to M22 (23.1 instead of 41), with a marginally smaller χ 2 ν .

SUMMARY AND IMPLICATIONS FOR FINE STRUCTURE CONSTANT MEASUREMENTS
We have carried out a series of ai-vpfit and vpfit calculations, to examine convergence properties when measuring ∆α/α in quasar absorption systems.To do so we focused on one particular, very well-studied, system: the left region (as defined in L23) of the z abs = 1.15 towards the quasar HE 0515−4414.No previous studies of convergence issues in this context have been done to our knowledge.The specific goal of our study was to find out how convergence proceeds when "blinding" is used, i.e. imposing an initially-fixed ∆α/α.Of course, in our conclusions here, we do not criticise blinding methods generically; a comprehensive description of the usefulness of blinding methods used in particle physics in particular is provided in Harrison (2002).Nevertheless, we have demonstrated the severe bias that may result if great care is not taken with the way in which a blinding method is applied.
The main result of this work is as follows: when models are developed using fixed ∆α/α, it is unlikely that subsequently releasing ∆α/α as a free parameter will ever result in convergence, and hence the measurement remains strongly biased towards the fixed value.This kind of "blinding" should not    Convergence track for the M22 model, created interactively using turbulent broadening and initially-fixed ∆α/α = 0.The published measurement is the starting point for 1000 iterations (i.e.∼2.2) but this gradually evolves to ∼3.3 after 1000 iterations (and may not have reached convergence).∆α/α is in units of 10 −6 .See Section 5.2.
be used in this context, and the parameter ∆α/α should be allowed to vary freely in the early stages of model building and left as a free parameter throughout.
The structure of the specific absorption system we have studied here is complex.It is that simpler systems, requiring fewer components, suffer less from conver-gence difficulties we have found.Nonetheless, the results derived here provide a strong warning that substantial bias can occur, the clear implication any previous measurement that has been derived using initially-fixed ∆α/α should be reworked.

Figure 1 .
Figure1.Convergence tracks for four sets of 25 initially-fixed ∆α/α ai-vpfit models, created using compound broadening.The continuous lines illustrate means over 25 models.Each colour designates a different initially-fixed value of ∆α/α (equal to the value at iteration one).All individual curves are shown in Figs.A2 and A1.The shaded areas, which illustrate the impact of model non-uniqueness, encompass empirical ±34% ranges.The black (+10) values scatter very little so the shaded area is not much larger than the line thickness.∆α/α is in units of 10 −6 .Terrestrial isotopic abundances have been assumed throughout this work.See Section 4.2 for further details.