Pulse Profile Modelling of Thermonuclear Burst Oscillations II: Handling variability

Pulse profile modelling is a relativistic ray-tracing technique that can be used to infer masses, radii and geometric parameters of neutron stars. In a previous study, we looked at the performance of this technique when applied to thermonuclear burst oscillations from accreting neutron stars. That study showed that ignoring the variability associated with burst oscillation sources resulted in significant biases in the inferred mass and radius, particularly for the high count rates that are nominally required to obtain meaningful constraints. In this follow-on study, we show that the bias can be mitigated by slicing the bursts into shorter segments where variability can be neglected, and jointly fitting the segments. Using this approach, the systematic uncertainties on the mass and radius are brought within the range of the statistical uncertainty. With about 10$^6$ source counts, this yields uncertainties of approximately 10% for both the mass and radius. However, this modelling strategy requires substantial computational resources. We also confirm that the posterior distributions of the mass and radius obtained from multiple bursts of the same source can be merged to produce outcomes comparable to that of a single burst with an equivalent total number of counts.


INTRODUCTION
Astrophysics and nuclear physics have long sought to understand the fundamental interactions of matter at low temperatures and high densities.Neutron stars (NSs) exclusively hold the key to this goal since their cores are expected to be extremely dense and cold (compared to the Fermi temperature).A promising approach to studying the properties of the core is to use measurements of their masses and radii to infer the equation of state (EoS) that governs the matter inside their interiors (see e.g.Lattimer 2012;Oertel et al. 2017;Baym et al. 2018;Tolos & Fabbietti 2020;Yang & Piekarewicz 2020;Hebeler 2021).
The masses, radii, geometric parameters and surface hot spot patterns of neutron stars can be deduced by using the technique of Pulse Profile Modelling (PPM) to analyze the X-ray pulsations originating from the surface of X-ray pulsars (Pechenick et al. 1983;Chen & Shaham 1989;Page 1995;Miller & Lamb 1998; Braje ★ E-mail: y.kini@uva.nlet al. 2000;Weinberg et al. 2001;Beloborodov 2002;Poutanen & Beloborodov 2006;Cadeau et al. 2007;Morsink et al. 2007;Bauböck et al. 2012;Lo et al. 2013;Psaltis et al. 2014;Miller & Lamb 2015;Stevens et al. 2016;Nättilä & Pihajoki 2018;Bogdanov et al. 2019b).This technique, based on relativistic ray-tracing, takes advantage of the fact that the observed X-ray photons encode data about the source's properties and the interactions that the photons experienced en route to the observer.To decrypt the encoded data and infer the relevant stellar properties, PPM uses both mathematical models (e.g. of the space-time, the hot spot properties, and the atmosphere) and statistical inference tools.
Knowledge of several simultaneous masses and radii is required to establish the EoS precisely (see e.g.Özel & Psaltis 2009;Özel et al. 2016) and by extension, the state of matter prevailing in the core.Applying PPM 1 to data from the Neutron Star Interior Composition Explorer (NICER; Gendreau et al. 2016), masses and radii have been so far inferred for two Rotation-Powered Millisecond Pulsars (RMPs) -PSR J0030+0451 (Riley et al. 2019;Miller et al. 2019) and PSR J0740+6620 (Riley et al. 2021;Miller et al. 2021;Salmi et al. 2022) -with results for more RMPs to come (see Bogdanov et al. 2019a, for a non-exhaustive list of targeted sources).For RMPs, the pulsations are stable, which means that temporal variability can be neglected.However there is uncertainty over the size, shape and temperature distribution of the hot spots, which originate as electrons accelerated in the magnetic field bombard the stellar surface, heating the magnetic poles.This is addressed in the modelling using simplified parameterized models that try to capture a wide range of possibilities, motivated by current pulsar theory.
PPM can in principle be extended to other categories of pulsating neutron stars (Watts et al. 2016), in particular Accretion-powered Millisecond Pulsars (AMPs; see e.g.Salmi et al. 2018) and Thermonuclear Burst Oscillation (TBO) sources (see e.g Bhattacharyya et al. 2005).By extending the scope of PPM to encompass other neutron star classes, one obtains a larger sample of inferred masses and radii (reducing statistical uncertainties in inferring EoS parameters), and using a different population allows for independent cross-checks.In this paper we focus on TBO sources.
A thermonuclear burst is a sudden and intense release of Xrays that happens in a neutron star's outer layers.It is caused by a runaway nuclear fusion process in which hydrogen and helium in material accreted from a companion star burn unstably and rapidly, forming heavier elements (for a recent review, see Galloway et al. 2020).During some bursts, uneven heat distribution on the NS surface or in their atmosphere leads to pulsations in the light curve.These pulsations are commonly referred to as TBOs.The cause of the uneven surface temperature distribution that generates the TBOs is still unclear.Various models, mostly complementary, with their respective strengths and weaknesses, have been proposed to explain this phenomenon.These include: dynamics of the flame front (Strohmayer et al. 1997;Spitkovsky et al. 2002;Cavecchi et al. 2013Cavecchi et al. , 2015Cavecchi et al. , 2016)), buoyant r-modes (Heyl 2004;Lee 2004;Piro & Bildsten 2005;Chambers et al. 2019;Chambers & Watts 2020), a cooling wake (Mahmoodifar & Strohmayer 2016) and convective patterns (Garcia et al. 2018).On top of the uncertainties surrounding the origin of the oscillations, the surface conditions during a burst are chaotic and prone to short-term variability (which is both theorized and observed).There may be variations in the temperature of the star and/or hot spot, and the spot may move or change location.
This means that unlike for the RMPs, we have to contend with both increased surface temperature distribution uncertainty (due to the wider range of models) and variability.However TBO sources spin more rapidly than most RMPs, which should help to disentangle the correlation between mass and radius (Lo et al. 2013;Psaltis et al. 2014).Moreover, given that bursts are bright events, obtaining the number of counts required to place meaningful constraints on mass and radius is feasible, particularly if data from different bursts can be combined.Previous studies have shown that we would need ≳ 10 6 counts (from a hot spot) to derive useful constraints on the mass and radius (Lo et al. 2013;Psaltis et al. 2014).This should be easily achievable with proposed large-area X-ray spectral-timing Guillot & Rutledge 2014;Nättilä et al. 2016;Steiner et al. 2018) can also be used to derive the mass and radius.See Degenaar & Suleimanov (2018) for a review of this technique including a discussion of model dependencies and potential uncertainties.
telescopes such as the enhanced X-ray Timing and Polarimetry mission (e-XTP; Zhang et al. 2019;Watts et al. 2019), and the Spectroscopic Time-Resolving Observatory for Broadband Energy X-rays (STROBE-X; Ray et al. 2019).
To better understand the impact of the uncertainties regarding the origin of TBOs and the complexities linked to variability, Kini et al. (2023c) (hereafter K23) developed phenomenological models to mimic the observed TBOs in the bursts from the accreting pulsar XTE J1814−338 (hereafter J1814) (Strohmayer et al. 2003).This particular source has always been regarded as one of the most promising TBO sources for PPM (Bhattacharyya et al. 2005) due to its stable pulsations, combined with high root mean square fractional amplitude (rms FA)2 and harmonic content (see e.g.Braje et al. 2000;Poutanen & Beloborodov 2006;Lo et al. 2013;Psaltis et al. 2014).Using these phenomenological models, K23 produced synthetic bursts that incorporate time-dependent parameters and conducted parameter recovery disregarding these temporal variabilities.It was found that not appropriately accounting for the variability leads to a bias (i.e. the inferred masses and radii deviate from the injected values due to primarily systematic rather than statistical uncertainty) in estimating the mass and radius.This bias becomes noticeable when the count number reaches 10 6 (the threshold for obtaining meaningful constraints, see above).
In this second paper, we investigate methods that can correct the biases introduced by neglecting temporal variability for high counts and accurately determine the parameters of the burster.We also aim to determine the computational cost associated with such methods, to estimate the appropriate allocation of computing resources when analyzing current and future data.To accomplish this, we use a subset of the synthetic data generated in K23, which encompasses diverse phenomenological models incorporating different variabilities.Finally, we explore strategies to combine knowledge acquired from several individual bursts to achieve more refined constraints for a given source.To achieve this, we employ synthetic data generated using a single phenomenological model that incorporates diverse forms of temporal variability for the time-dependent parameters.Since one goal of K23 and the present paper is to provide a foundation to determine the properties of J1814 using the available Rossi X-Ray Timing Explorer (RXTE; Jahoda et al. 1996) Proportional Counter Array (PCA) data, we used RXTE's response matrix throughout these papers.
The rest of this paper is organized as follows.In Section 2, we provide a brief summary of the necessary model ingredients in our PPM framework.This is followed by a description of the simulated data used and an explanation of our inference procedure, including how we combine information obtained from individual bursts.The next section (Section 3) presents our findings, which are then discussed along with their implications in Section 4. The closing remarks are presented in Section 5.

METHOD
As stated in Section 1, determining the properties of a NS using its X-ray pulses requires both mathematical models and statistical tools.The first mathematical model is that for the star's shape and the space-time in which it is embedded.The surface pattern and atmosphere models, which constitute the second component of the mathematical model, determine the source of X-ray photons by taking into account various factors, including the shape of the hot spot(s), the energy and angular distribution (beaming function) of the X-ray photons, and the ways in which the photons interact after they are emitted.The third component is an Interstellar Medium (ISM) model, to account for the absorption of photons by the ISM prior to detection.We also need to decide how to treat the presence of background photons from instrument or other sources in the field of view of the NS.Finally, we need to have a model of the instrument's response (RSP), and the uncertainty on that response.Each of these model components must then be incorporated into a simulation pipeline to make computation possible.
While all of the model components mentioned above are described in greater detail in K23, we provide an overview of the most critical aspects in the first part of this section.The second part, starting from subsection 2.4 details the formalism for integrating information from separate bursts to derive a more robust estimation of stellar properties.

PPM ingredients
We used the X-ray Pulse Simulation and Inference (X-PSI3 ; Riley et al. 2023) code, version v0.7.94 to generate data that replicates the properties of the J1814 light curves, which we then use for parameter inference.In X-PSI, it is presumed that the star has an oblate shape and is embedded in a Schwarzschild space-time.When calculating the observed flux, Doppler terms are included to account for rotational effects.Photons emitted on the oblate surface are conveyed to the observer by means of relativistic ray tracing (for details of the oblate Schwarzschild plus Doppler approximation see Cadeau et al. 2007;Morsink et al. 2007;AlGendy & Morsink 2014;Bogdanov et al. 2019b).
We assumed, as the origin of the pulsations, a single circular X-ray hot spot emitting uniformly.Its main characteristics are: co-latitude (Θ spot ), phase ( spot ), angular radius ( spot ), and temperature ( spot ).We assumed the co-latitude and phase to be constant throughout a burst, but allowed the angular radius and/or temperature of the hot spot to vary depending on the phenomenological model (see Table 1 for details of the varying components of each specific phenomenological model).In some cases, the remaining portion of the star ( star ) was also permitted to emit.We selected models with parameter vectors capable of reproducing the properties of the bursts and TBOs of J1814, as observed with the RXTE PCA: ∼ 105 counts per burst, and ∼ 10% rms FA throughout the burst (Strohmayer et al. 2003).Table 1 provides a summary of all phenomenological models and other model components necessary for generating synthetic data and performing parameter inference.

Synthetic data
In this analysis 5 , we used two data subsets generated using the ingredients described in the previous subsection.
Our first data set, which we used to test methods to mitigate bias, is the 106 count variability data-set generated in K23.The parameter vectors in this data set encompass a wide variety of injected values of e.g.mass, radius.For each phenomenological model and parameter vector (3 per model, see Table A1 of K23), 10 bursts of 10 5 counts were generated, each with different random noise injections, as described in that paper.The 10 bursts were then summed to create a single mega-burst with a total of 10 6 counts (see an example of a burst in Figure 1).
In this approach, each burst in a given set has exactly the same hot spot properties.However this may not be realistic.For J1814, the temperature evolution does vary from burst to burst, as indicated by the light curves 6 , and the ignition location and the position of the oscillation hot spot may be subject to variation during and between bursts (Watts et al. 2008;Cavecchi & Patruno 2022).
To explore this more thoroughly, we also generated a second data subset.We used the first parameter vector (s1) of the most complex phenomenological model, ST-S H, (see table A1 in the appendix of K23).We generated 9 bursts (whose total counts summed to ∼ 10 6 ).The mass, radius, distance, inclination, and hydrogen column density were kept constant, but each burst had its own unique evolution (temporal dependence) of hot spot temperature, hot spot angular radius, and star temperature (see Figure A1 of Appendix A).In addition, the hot spot's co-latitude and phase varied from burst to burst but remained constant during each one.Each of these nine bursts shared the main features of the J1814 bursts, mainly: ∼ 10 5 counts and ∼ 10% rms FA throughout the burst.This set of nine bursts is the second data subset7 .

Inference procedure
To try to overcome the significant bias introduced by neglecting variability during inference as found in K23, we adopted a new modelling approach in this study.This method involves tracking the time-varying parameters (hot spot temperature and/or angular radius and/or star's temperature depending on the phenomenological model) to some degree during the burst.To accomplish this, we divided each burst, in each of the data sets, into eight time segments during which changes in flux are smaller.We elected, from visual inspection,8 to use two 2 s segments in the burst rise, an 8 s segment in the peak and initial decay, and then five segments of equal duration during the decay.This was intended to strike a balance between reducing computational cost (which increases with the number of segments) and minimising variation during each segment.We term this technique the slicing method, first suggested by Lo et al. (2013).An illustration is shown in Figure 1.
During the inference process, we jointly fit the data corresponding to each segment.While the common parameters (mass: , equatorial radius:  eq , distance: , cosine of Earth inclination to rotation axis: cos(), phase of the hot region:  spot , co-latitude of the centre of the hot spot: Θ spot , hydrogen column density:   )9 were kept identical across each segment, the time-varying parameters  Krauss et al. 2005) ISM TBabs: this model only considers neutral gas absorption and adopts photoelectric absorption cross-sections from Wilms et al. (2000).RSP RXTE's response matrix (ObsID: 80418-01-02-00 which corresponds to the observation of J1814's burst 10) Table 1.Summary of the model properties used to produce data and -for all components apart from the background -to conduct inference.No assumption is made about the functional form of the background during the inference.We applied a flat prior distribution for each background variable centered around the true value (for each energy channel), with a hard cut at ±3 to minimise any possible biases coming from the inferred background.See Section 2.4 of K23 for more details of the background modelling and Appendix B.2 (Riley 2019) for the background marginalization procedure.To generate the data, we kept the background count rate at 80 counts/s during the entire burst.

★
The temperature evolution of GS 1826-24 was used as a basis for the temperature evolution for these models.The average temperature evolution of GS 1826-24 denoted  ma6 ( ), was obtained by averaging the temperatures of the simulated bursts in the ma6 model sequence (see Meisel 2018), with the exception of the first and last bursts.† The angular radius was fine-tuned to replicate the properties of J1814.See Figure 2 in K23 for its evolution.‡ The temperature was fine-tuned to replicate the properties of J1814.See Figure 2 in K23 for its evolution.
(hot spot temperature and/or angular radius and/or star's temperature depending on the phenomenological model) were allowed to vary freely between segments.The joint-fit likelihood is the product of the per-segment likelihoods (see Appendix A1 for derivation): where  = 8 is the total number of segments and L  the likelihood of the data of the  th segment given the model.
To compute L total , we used two inference methods: the brute force method (BF) and the with-model method (wM).In the BF method, only basic assumptions were made regarding timedependent parameters.We enforced a temperature hierarchy in which the temperature of each successive segment was required to be higher or lower than the preceding one in a predetermined order.These constraints were necessary as the prior space would be too large to explore otherwise, which would result in a computationally unfeasible scenario.Therefore we had: • For all phenomenological models with  X varying, with  ∈ {spot, star}, we set: • For all phenomenological models with  spot varying, except for ST-S H , we had  spot1 ≤  spot2 ≤  spot3 ≤  spot4 ≤ spot5 ≤ spot6 ≤ spot7 ≤ spot8 .
We let the simulation code X-PSI identify the optimal parameters that jointly fit all the data segments.
In the second method (wM), the evolution of the time-varying parameters was assumed to be well-constrained from theory.In the interests of computational efficiency, we do this by setting the parameter prior bounds (of the hot spot temperature and/or angular radius and/or star's temperature) for a specific segment to be the minimum and the maximum of the values (of the hot spot temperature and/or angular radius and/or star's temperature) used to generate the data of that specific segment (see Table A1 in Appendix A for prior densities).

Combining bursts
Attempting to constrain NS properties using individual observations (bursts) results in significant uncertainties in inferred properties.Combining information from these individual bursts is necessary to overcome these limitations and achieve a more precise determination of properties commonly shared across bursts.The following section details the formalism we employed to achieve improved constraints, primarily on the mass and radius.However, this formalism can also be expanded to include other shared parameters if desired.

Formalism
We denote by D = {  }  =1 a set of data corresponding to  observations (bursts) where   is the data corresponding to the  th burst.According to Bayes' theorem: where (|  ) is the posterior of the parameters  given the data   , (  |) the likelihood of the data   given the parameters , () the prior on , and (  ) the evidence.Note that  = (,  eq , , cos(),  spot , Θ spot ,  spot ,  spot ,  star ,   ) 10 .However, since the focus is on improving the constraints on the mass and radius from individual bursts, we denote by  a subset of ,  = (,  eq ) 11 .(|  ) hence becomes the marginal posterior in the joint mass and radius space, meaning: The main goal is to determine (|D) based on the  bursts.Assuming these  bursts are independent, and using Bayes' theorem, we have: Given that the joint mass-radius prior density is flat in (,  eq ) space i.e () = constant, we obtain: (5) Thus, the joint mass and radius posterior distribution (not normalized) over the  observations can be expressed as the product of the individual posterior distributions computed for each burst. 10When the burst is divided into K segments and the angular radius of the hot spot is kept free for each segment, then  spot = {  spot1 ,  spot2 , ...,  spotK }.The same goes for  spot and  star . 11 can be reduced to any set of parameter of interest as long as these parameters are shared across bursts.

Numerical implementation
To compute Equation ( 5) numerically, we employed the following approach.First, we approximated the joint mass-radius posterior distribution for a specific burst  by using a continuous function   ().This function serves as an approximation for the discrete joint mass-radius posterior distribution.We obtained   by employing a Gaussian kernel density estimator (scipy.stats.gaussian_kdefunction) with the posterior samples from the  th burst obtained from MultiNest (Feroz & Hobson 2008;Feroz et al. 2009Feroz et al. , 2019) ) and its python wrapper PyMultinest (Buchner et al. 2014).Second, we discretized the joint mass and radius prior space into a 400 × 400 mesh grid.For each burst, we computed new weights for the mid-points of each bin within the mesh grid.These weights represent the significance of each pixel in relation to the overall posterior.Next, we multiplied the obtained weights across all bursts and subsequently normalized them.This process resulted in the desired combined posterior weights, which integrate information from all bursts.To ensure consistency with the previous analysis with X-PSI, we adjusted the bandwidth of the kernel density estimator based on the grid resolution.The aim was to minimize the discrepancy between the computed weights for each burst (considering the new samples taken as mid-points of the mesh grid) and the posteriors obtained from MultiNest and PyMultinest output samples.We found that setting the bandwidth to 0.1 yields the desired result.

RESULTS
The following section presents the results of the inference runs, which were conducted to explore parameter recovery and the associated computational costs using the slicing method.The general specifications of the hardware used for these runs are shown in Table A4 Appendix A. A total of 47 inference runs were performed on the first data subset, 23 using the BF method and 24 using the wM method.The results of these runs are presented in sections 3.1, 3.2 and 3.4.Using the second data subset, we carried out 9 additional runs utilizing the wM approach, to examine the impact of merging posteriors from multiple bursts.Section 3.3 shows the results of these runs.
Using the BF method, 19 runs converged, 4 were terminated before convergence due to high computing time relative to the number of accepted samples found by MultiNest, and one we decided not to run based on the anticipated computational cost.This last burst was generated using parameter set 2 of model ST-S H, : this star is exceedingly compact, requiring an increased number of multiple imaging evaluations in the vicinity of the true solution.This already resulted in significantly longer inference runs in K23, so based on the scaling of the other runs once slicing was implemented, we concluded the convergence time would be unreasonable.

Mass and radius recovery
To evaluate the mass and radius recovery, we use both PP-plots and the wrongness parameter.PP-plots provide a cumulative fraction of masses or radii recovered within a credible interval as a function of the credible intervals.We define the credible intervals for the PP-plot to always start from the lower tail of the posterior distribution.This is to align with the convention used by e.g.Bilby12 (Ashton et   .The black solid line represents the result obtained in K23, while the green solid and orange solid lines show the results obtained using the BF method and the wM method, respectively.The color-coded values are the p-values from the KS-test and the CvM-test.The grey regions, in order of decreasing transparency, are the cumulative 1, 2, and 3 confidence intervals computed as in Cameron (2011), with a sample size of 24 bursts.The upper and lower bounds of the 3 cumulative confidence intervals for a sample of 19 bursts, which corresponds to the completed runs using the BF method, are represented by black dashed lines in the upper panel.2019).On the other hand, the wrongness 13 parameter quantifies the deviation of the inferred value from the injected one, measured in units of 68% credible intervals.A wrongness of e.g.+0.5 implies that injected value is recovered at +1 away from the median of the posterior distribution.A Kolmogorov-Smirnov test (KS-test) is then conducted to examine whether the proportion of recovery conforms to a uniform distribution, which is the expected result.The combination of these techniques was used in K23 and provides a more comprehensive overview of parameter recovery than using either technique alone.In addition to the KS-test, we performed the Cramér-von Mises test (CvM-test) since it is more sensitive to deviations from uniformity across the entire distribution.
We show in Figure 2 the PP-plots for both the mass and the 13 For a given population X of mass or radius, wrongness = − 1 2  ( inf ) if X = X. ( inf ), X and X are respectively the _score of the inferred value  inf , the median and the mean of the population X.
radius.The p-values from the KS-test and the CvM-test are indicated and colour-coded by method.Both mass and radius recovery was significantly improved by slicing and jointly fitting the data, compared to ignoring variability (solid black line).The BF method yielded slightly less accurate mass recovery than the wM method, with the mass plot consistently above the diagonal.This implies that, on average, the mass is slightly overestimated.Nevertheless, the recovered mass mostly falls within the 95% credible region and never exceeds the 99% credible region (corresponding to 19 bursts).In contrast, the wM model effectively recovered the mass with no observable bias, as evidenced by its p-values.While the radius in K23 deviates significantly from the diagonal, it is recovered here accurately (i.e. with no measurable bias) using both the BF and wM methods In Figure 3, we compare the wrongness of the parameters (mass and radius) provided by the slicing technique with those acquired without slicing as in K23.The blue cross symbols indicate situations in which inference was either not performed or was incomplete due to insufficient computational resources using the BF method.The grey regions, in order of decreasing transparency, are the 68%, 95%, and 99% credible intervals.A simple heuristic was used to approximate the 95% and 99% credible interval, with these values taken to be 2 and 3 times the 68% credible intervals respectively.We note that the posterior distributions are not perfectly symmetrical.As a result, a direct calculation to determine the 95% and 99% credible intervals from the 68% credible intervals is not feasible without additional computational endeavour.However, they appear to be well-behaved from a visual inspection.Evaluating the wrongness for the mass and radius results generated using the slicing method results in a clustering of wrongness around zero.This is indicative that slicing provides good parameter recovery.In fact, the majority of the mass and radius parameters were found to fall within the 68% credible intervals, and all of them were found to be within the 99% credible intervals.Despite the PP-plot showing that the inferred masses are mostly higher than the injected masses, from the wrongness figure, we note that these elevated inferred masses remain mainly within the 68% credible intervals and never exceed the pseudo 99% credible intervals.In comparison, when variability is neglected during the inference (marked as K23 in the figure), only a limited number of parameters were determined to be within the 99% credible region.This disparity suggests that slicing the bursts and jointly modelling the slices leads to a substantial reduction in systematic uncertainties compared to the K23 method.
Figure 4 shows the uncertainties at 68% credible intervals14 on both the mass and radius.Typically, as the number of free parameters increases, the mass uncertainty decreases, although there are some outliers.The decrease in the uncertainty is most probably a consequence of using a constant number of live points (LP = 2000) for all the runs.Both the BF method and the wM method yield comparable uncertainties, with an average mass uncertainty of 12±7% and 13±5% when using the BF and wM methods, respectively.The mean radius uncertainty on the other hand is 10±7% and 10±3% when using the BF and wM methods, respectively.When not accounting for variability as in K23, the average uncertainty in the mass and radius was 17±7% and 16±15%, respectively.
Wrongness of mass and radius both methods: slicing (BF & wM) and K23.The Wrongness = ( inf −  inj )/Δ inf , quantifies the deviation of the inferred value ( inf ) from the injected one ( inj ) in units of 68% credible interval; X ∈ { ,  eq }.Each model is represented on the xaxis along with its corresponding parameter sets, consisting of three distinct sets: S1, S2, and S3.The grey regions, in order of decreasing transparency, are the 68%, 95%, and 99% credible intervals computed as described in the text (Section 3.1).The blue cross symbols show where the inference of the parameters could not be done because of computing limitations for the BF method.

Recovered temperature and angular radius evolution
In Figure 5, we present an illustrative example of the recovered hot spot temperature and angular radius evolution, using both the BF and wM methods.This specific example corresponds to the run performed on the data generated with model ST-S H, , parameter set 1. The remaining parameter evolution can be found in the Zenodo files (Kini et al. 2023b).
The wM method typically produces a temperature evolution that closely matches the injected evolution.In this particular example, the temperature determined for each segment corresponds to the average temperature of that segment.However, it is worth noting that this is not always the case, as there are instances where the recovered temperature may slightly exceed the segment's average (see Kini et al. 2023b).In contrast, the angular radius of the spot is consistently equal to the average injected angular radius for all the runs with the wM method.Using the BF method, the temperatures and angular radius evolution also closely resemble the injected evolution in terms of shape.However their normalization, as compared to the injected evolutions, are not consistently equal to one.

Combining multiple bursts
We conducted inference runs on each of the nine bursts from the second subset, with each burst containing approximately 10 5 counts.
Uncertainty on the mass and radius (68% credible intervals).The x-axis shows each phenomenological model and its corresponding parameter sets, which are categorized into three distinct sets: S1, S2, and S3.The blue cross symbols indicate parameter sets where inferences were cancelled or could not be performed due to computing limitations for the BF method.We used the wM method for this purpose.Figure 6 presents the posterior distributions for both mass and radius.Notably, the mass and radius posteriors exhibit wide dispersion around the injected values ( = 1.6  ⊙ ,  eq = 8.0 km) and vary from burst to burst.This variation can be primarily attributed to differences in burst properties, since the angular radius of the hot spot, the temperature of the hot spot, the star temperature evolutions, and the realization of noise in each simulated burst are all different.Sampling noise may also contribute to the observed differences (see e.g.Vinciguerra et al. 2023).Despite the variations in the posteriors, the injected mass and radius consistently fall within the 68% credible intervals when the wM method is applied, except for burst 5. On average, the uncertainty in mass is 18±3%15 , while the uncertainty in radius is 17±4%.Combining all nine bursts using the formalism described in section 3.3 leads to tighter constraints, as expected.The posteriors are narrower and the uncertainties on the mass and radius are respectively 7±7%16 and 6±6%.The injected radius falls within the 68% credible interval, whereas the mass is just outside it.The wrongness of the mass is approximately -0.505 which indicates that the injected value is very close to being within the 68% credible interval.

Computational cost
Figure 7 shows the computing time in core hours for each run.The blue cross at the bottom represents model ST-S H, parameter set 2, where the inference run was not conducted.The number of free parameters in each model is indicated by the numbers at the bottom, and sets highlighted by a green circle are those that had their inference interrupted before convergence.The full set of all runs cost approximately 5×10 6 core-hours in total.Typically, a single inference run takes on average 10 5 core-hours, and models with more free parameters take longer to converge.An exception is ST-S H , which has 17 parameters but took as long as the models with 24 parameters.
The ST-S H, runs, along with the ST-S H -set 2 run, were stopped since the number of core-hours spent was high.The ST-S H, simulations in particular consumed in total approximately 1.5 million core-hours without converging.Unquestionably, the large number of free parameters in the model contributed to this substantial use of computation resources.ST-S H -set 2 and ST-S H, -set 2 both assume an extremely compact star.This leads to the computation of higher which slows down the likelihood evaluation as the sampler homes in on the injected solution, leading to substantial consumption of computation resources.

DISCUSSION
We conducted this analysis with three primary objectives in view.The first objective was to address the systematic bias that arises when TBO source properties are estimated using PPM without accounting for temporal variability, identified in K23 as being particularly problematic once burst counts exceed ∼10 6 .Secondly, we aimed to assess the computational resources needed to address these biases, to help plan the allocation of computing resources for analyses of current and future TBO data.Finally, we aimed to explore how to combine the results of individual bursts to yield more robust results.This will help us to exploit existing TBO data captured by RXTE.
Our study used synthetic burst data generated assuming various phenomenological models for the time evolution of burst and hot spot.We first looked at slicing bursts into shorter segments, during which variability can be neglected, and jointly fitting those segments.To carry out the inference runs, we used two different methods.In one (the BF method) only very broad prior knowledge of the time-varying parameters was assumed, leaving the inference code X-PSI to identify the time variation.In the other (wM) we assumed some knowledge of the time-varying parameters was available from theory.Finally, we looked at combining individual bursts whose properties, even from a single source, may differ.

Constraints on the mass and radius
The BF method and the wM method both lead to accurate recovery of the mass and radius, with negligible differences between the radius estimations.Almost all of the recovered radii fall within the 68% credible intervals.While the inferred masses are generally slightly larger than the injected masses, the majority fall within the 68% credible intervals, and all injected masses fall within the pseudo 99% credible intervals, as indicated by the wrongness plots.Both techniques yield mass and radius estimates with approximately 10% uncertainty (68% credible interval) for bursts with 10 6 counts.
This result highlights the significant potential of TBO sources in constraining the EOS, since 5%-10% uncertainty is necessary for distinguishing between different EOS (see e.g.Özel & Psaltis 2009;Psaltis et al. 2014).Slicing bursts containing 10 5 counts and performing joint fitting for each segment resulted in a decrease in uncertainty from 30% (as found in K23) to approximately 20%.Subsequently combining the posterior distributions of nine bursts, totalling around 10 6 counts, yielded uncertainties of about 7±7% and 6±6% for mass and radius measurements, respectively.These findings align closely with the ∼ 10% uncertainty observed when using the slicing method on a single burst of approximately 10 6 counts for both parameters.This implies that merging results from individual bursts can improve the constraints, highlighting the significant potential of TBO sources, especially considering that sources typically exhibit multiple bursts during their outburst phase.With next-generation instruments like eXTP and STROBE-X, which would have larger effective areas and will capture more photons, we anticipate that capturing a few bursts will significantly reduce uncertainties in the mass-radius space.Further reduction could be possible if we had independent prior constraints e.g., on the mass and inclination (see e.g.Wang et al. 2017, although the constraints in that specific paper are rather broad).17

Recovered temperature and angular radius evolutions
Determining, from first principles, the temperature and angular radius that accurately represent the observed counts for each segment is challenging due to numerous factors, such as atmosphere reprocessing.However, by employing the wM method and constraining the temperature and/or angular radius of each segment within the range of the lowest and highest limits used in generating the data, we mostly retrieve the average temperature.This is not always the case with the BF method, which may seem a little surprising since it shows strong agreement in matching the masses and radii obtained through the wM method, which does recover the average temperature.This is because of the multiple degeneracies between parameters, especially between the spot properties and the distance.
For the BF method, it could be argued that imposing some constraints (see section 2.3) on the behaviour of temperature and radius evolutions might have contributed to the accurate recovery of the shape of the temperature and angular radius evolutions.However, it is important to note that this approach allows for a wide range of other possible temperature and radius evolutions.In the case of a real burst, similar constraints could reasonably be applied to the temperature evolution as the temperature curve is expected to exhibit similar behaviour to the light curve during a burst.However, the situation is different for the angular radius evolution, as the geometry of the hot spot during a burst remains poorly understood.

Combining bursts
As shown in section 3.3, posteriors obtained from multiple bursts from the same source can potentially offer more precise and reliable constraints on stellar properties.When combining multiple bursts, the posteriors of mass and radius become more tightly constrained, with uncertainties reducing from approximately 20% to about 6-7% compared to a single burst.This posterior shrinkage aligns with the expected

√
(where  is the number of bursts) improvement compared to that of a single observation.This is encouraging for attempts to obtain PPM constraints from TBOs in the RXTE archive.
However, it is important to consider that while combining multiple bursts yields similar outcomes to a single burst with equivalent counts, modelling a single burst with higher counts can reveal peculiarities in the model that may not be apparent in individual bursts with lower counts.Therefore, continually adding posteriors from low-count bursts to obtain better constraints may result in flawed results, especially if the total counts reach a range where statistical uncertainties become lower than systematic uncertainties.Unless a diagnostic method ensures that the combined posteriors behave as expected, caution should be exercised when combining too many bursts.
Future data from proposed large-area X-ray spectral-timing telescopes like eXTP and STROBE-X will undoubtedly contribute to reducing uncertainties in the employed models (surface patterns, atmospheres, etc) given the data quality expected.With these missions, we expect to be able to obtain interesting constraints even from a single burst.However the strongest constraints will come from combining multiple bursts, providing that model uncertainties can indeed be reduced.If bursts from the same star were captured with RXTE and also by newer instruments like XTP, and STROBE-X-they can, in principle, be combined to yield improved uncertainties.However, the newer, larger instruments would statistically dominate the results, given their better data quality.

Computational cost
Although the BF and wM methods recover mass and radius with comparable accuracy, the computational time required for each method differs significantly.On average, it took about 4 × 10 4 core hours for the runs to converge using the wM method, while the BF method required approximately 10 5 core hours.Runs conducted on the identical dataset without employing data slicing, by contrast, took on average 10 4 core hours18 (K23).As expected, phenomenological models with more free parameters generally took longer to converge.However, the completed runs were still three times faster when using the wM method compared to the BF method.Although more than 3×10 6 core hours were spent on set 2 of ST-S H and all the sets of ST-ST-S H, using the BF method, those runs failed to converge.This was primarily due to the extensive prior space that needed to be explored for some models, often combined with the very compact star assumption used to generate these synthetic data.
The computational demands of the slicing method stem from the need to compute a pulse profile for each segment.This involves performing tasks like ray tracing and interpolating atmospheric properties for each segment.In an attempt to expedite the computations, we significantly reduced the resolution in the energy grid used for pulse profile computation compared to that used in K23 19 .However, the computational cost remains substantial.
When modelling sources with higher counts, it may be necessary to include even more time segments to effectively address mass and radius biases.This would result in further increases in computation time.Computational costs could also escalate if other complexities need to be taken into account.Even in the case of J1814, considered to be one of the most static sources in terms of temporal variability during the burst, the hot spot location is observed to change slightly (Watts et al. 2008;Cavecchi & Patruno 2022).Investigating how such complexity affects mass and radius estimation is still an open question.For many sources (see e.g.Strohmayer et al. 1996;Muno et al. 2002a,b;Chakrabarty et al. 2003;Bhattacharyya & Strohmayer 2005, 2006;Bilous & Watts 2019) the changes in some of the pulsation properties are more pronounced.Ideally, keeping the parameters describing the hot spot properties as well as the stellar temperature as free parameters between segments would be desirable.
How might computational cost be reduced?The wM method highlights that despite the inherent complexity, theoretical work to establish expected behaviour for these variabilities can significantly reduce computational time.In addition, X-PSI or any other pulse profile modelling codes, could be more optimized for analyses with many time segments.We could also explore other sampling options, as current nested samplers are known to require many proposals for large parameter spaces to be explored thoroughly.Recent advances in machine learning techniques may offer a worthwhile alternative (see e.g.Baron 2019;Chua & Vallisneri 2020;Bhardwaj et al. 2023).Using Surrogate models (neural network emulators) could potentially help create a faster simulation model.

Limitations and futures prospects
This study, just like K23, is limited to synthetic data sets that have been tailored to mimic the properties of one particularly promising source, J1814.And due to the computational cost of the slicing method, we have in this paper conducted parameter recovery only using the 10 6 counts data set from K23.We chose not to include the 10 7 counts data in our analysis for two main reasons: the anticipated (even higher) computational cost of a larger data set; and the fact that the total number of counts for the J1814 bursts recorded by RXTE during its 2003 outburst, which we plan to analyse in a future paper, is a few times 10 6 .We also considered only one particular slicing formulation (with eight segments).Within these limitations, however, both slicing and combining bursts improves parameter recovery to the desired level.
Ultimately, we would like to perform PPM on a wider variety of TBO sources (with much more variability than J1814, including hot spot motion), for larger data sets with ∼ 10 7 counts, and using different instrument response matrices.While prospects for the slicing method, and for combining bursts, look promising, verifying that this holds for the more general case -and optimizing the technique to minimize computational cost -will certainly require additional exploratory simulations.

CONCLUSION
In this series of papers (K23 and the current one), we have undertaken an exploration of the challenges and opportunities associated with the use of PPM to deduce the masses and radii of neutron stars.Our specific focus has been on Thermonuclear Burst Oscillation (TBO) sources.
We have found that slicing bursts into shorter segments, where temporal variability can be neglected, allows good parameter recovery for count rates at the level necessary to place meaningful constraints on the EoS.This process of slicing the data does however come with a notable computational cost.Thus, to accurately model data from bursting sources, especially those obtained with nextgeneration instruments, it will be imperative to allocate sufficient computational resources.The development of theoretical models that better constrain the properties of bursts and oscillations would also be of significant assistance.
We also explored strategies for combining information obtained from multiple individual bursts to achieve more precise constraints for a given source.By merging bursts, we can overcome the limitations of individual observations and enhance the statistical significance of our findings.
TBO sources hold great promise for contributing to a deeper comprehension of the core composition of neutron stars and providing valuable insights into the fundamental properties of matter under extreme conditions.Moving forward, further advancements in observational capabilities, coupled with ongoing theoretical developments, will continually enhance our understanding of the physics and astrophysics of neutron stars.

A3 Priors
Table A1 shows a comparison of the prior distributions for each parameter between the BF and WM methods for burst shown in Figure 5.It is apparent that the WM method exhibits a substantial reduction in the prior space, resulting in significant improvements in computing time.

A4 Hardware specifications
Some of the key hardware and operating system specifications are presented in Table A4.The links in the caption of the table lead to a more detailed description of each hardware component.

Figure 1 .
Figure 1.Example of light curve generated with parameter set 2 of model ST-S H .The vertical bands indicate each data segment as described in the text.

Figure 2 .
Figure 2.Cumulative fraction of masses or radii recovered within a credible interval as a function of the credible intervals (from the lower tail of the posterior distribution, see text).The black solid line represents the result obtained in K23, while the green solid and orange solid lines show the results obtained using the BF method and the wM method, respectively.The color-coded values are the p-values from the KS-test and the CvM-test.The grey regions, in order of decreasing transparency, are the cumulative 1, 2, and 3 confidence intervals computed as inCameron (2011), with a sample size of 24 bursts.The upper and lower bounds of the 3 cumulative confidence intervals for a sample of 19 bursts, which corresponds to the completed runs using the BF method, are represented by black dashed lines in the upper panel.

Figure 5 .
Figure 5. Example of the (median) hot spot temperatures and angular radii recovered using the BF and the wM methods for each segment.The error bars are the 68% credible intervals.The injected hot spot temperature and angular radius evolutions used to generate the synthetic data are shown as solid black curves.The vertical bands, coloured in grey and white, represent the boundaries of each segment.

Figure 6 .
Figure 6.Mass and radius posterior distributions for each burst and when they are combined.The dashed lines indicate the injected values ( = 1.6  ⊙ ,  eq = 8.0 km) while the green vertical bands are the 68% credible intervals corresponding to the combined bursts.For each burst, colour-coded numbers show the median values and their corresponding 68% credible intervals.

Figure 7 .
Figure7.Core-hours spent on runs for each model and parameter set.Each model is represented on the x-axis along with its corresponding parameter sets: S1, S2, and S3.The green circles show runs that were terminated due to excessive computing demand.The blue cross indicates the parameter set where inference was not performed.

Table A2 .
Hardware and operating system specifications *