Applying clock comparison methods to pulsar timing observations

Frequency metrology outperforms any other branch of metrology in accuracy (parts in $10^{-16}$) and small fluctuations ($<10^{-17}$). In turn, among celestial bodies, the rotation speed of millisecond pulsars (MSP) is by far the most stable ($<10^{-18}$). Therefore, the precise measurement of the time of arrival (TOA) of pulsar signals is expected to disclose information about cosmological phenomena, and to enlarge our astrophysical knowledge. Related to this topic, Pulsar Timing Array (PTA) projects have been developed and operated for the last decades. The TOAs from a pulsar can be affected by local emission and environmental effects, in the direction of the propagation through the interstellar medium or universally by gravitational waves from super massive black hole binaries. These effects (signals) can manifest as a low-frequency fluctuation over time, phenomenologically similar to a red noise. While the remaining pulsar intrinsic and instrumental background (noise) are white. This article focuses on the frequency metrology of pulsars. From our standpoint, the pulsar is an accurate clock, to be measured simultaneously with several telescopes in order to reject the uncorrelated white noise. We apply the modern statistical methods of time-and-frequency metrology to simulated pulsar data, and we show the detection limit of the correlated red noise signal between telescopes.


INTRODUCTION
Millisecond pulsars (MSP) are considered extremely stable astronomical clocks because of their high energy and momentum in a small size (Verbiest et al. 2009), albeit the observations can be affected by large observational white noise. Such noise can be due to the low signal to noise ratio of the MSP observations, and depends on the radio-telescope (RT). On the other hand, red noise could originate from the MSP, the propagation through the interstellar medium or from gravitational waves (GW) on the line of sight, hence it is common to all the RTs (Hellings & Downs 1983;Jenet & Romano 2015). The characterization of the spectral signature of GWs is a major scientific challenge because it is expected to disclose information about the astrophysical sources. As pulsars have been precisely timed for decades with up to weekly cadence, the frequency range that can be probed by Pulsar Timing Array (PTA) (Foster & Backer 1990) projects is f ∈ [10 −9 , 10 −6 ] Hz, which is a window out of reach for the LIGO/VIRGO interferometers and at the edge of the LISA band. The most interesting and likely to be detected source is a cosmic population of super massive black hole E-mail: siyuan.chen@cnrs-orleans.fr binaries (Taylor et al. (2016) and Perrodin & Sesana (2018) for a recent review), whose interaction with the MSP signals introduces a correlated red noise in the Time of arrival (TOA) series, with a phase power spectral density (PSD) proportional to 1/ f 13/3 (for a circular population, see eg. Phinney (2001); Chen et al. (2017)). The choice of the algorithm is therefore of paramount importance to disentangle the pulsar specific small red noise (signal) from the effects of GWs in the presence of large white noise (background) in the shortest observation time. In this regard, PTAs (Desvignes et al. 2016;Alam et al. 2020;Kerr et al. 2020;Perera et al. 2019) enable the simultaneous measurement of the same pulsar with several RTs, especially the Large European Array for Pulsars (LEAP) project (Bassa et al. 2016).
The extraction of small signals from noise exploiting simultaneous measurements is a well-known method for the measurement of oscillators and atomic clocks. Specific tools have been developed over more than 50 years using the PSD and wavelet variances (Barnes et al. 1971;Rutman 1978;Gray & Allan 1974;Allan & Barnes 1981). Among them, the cross-spectrum method (Rubiola & Vernotte 2010) deserves separate mention in view of our application. Under certain assumptions, this method rejects the background noise (uncorrelated), and converges to the oscillator noise even if it is significantly lower than the background. Wavelet variances are commonly used in the time domain, the most known of which is the 2-sample Allan variance (AVAR), after the pioneering work of Allan and Barnes (Allan 1966). Such variances enable to distinguish different types of noise defined by their exponent in the PSD, e.g. f 0 for white phase, 1/ f for flicker phase modulation, 1/ f 2 for white frequency modulation, 1/ f 3 for flicker frequency modulation, 1/ f 4 for random walk frequency modulation noise, etc. The parabolic variance (PVAR) (Benkler et al. 2015;Vernotte et al. 2016) extends this concept, and exhibits (i) the highest rejection of white noise, and (ii) the efficient detection red noise underneath the background with the shortest data record. Moreover, the wavelet covariance (Fest et al. 1983;Lantz et al. 2019), i.e., Allan covariance (ACOV) or parabolic covariance (PCOV), enables the rejection of the background using two (or more) uncorrelated instruments.
This article is intended to port the time-and-frequency metrology methods to pulsar astronomy. In this respect, the pulsar is the clock under test, observed simultaneously with two or more RTs playing the role of phase meters. We compare the results of different methods applied to simulated time series representing the TOAs of millisecond pulsars. In section 1 we summarize the concept of pulsar timing as well as the spectral and variance methods from the time-and-frequency metrology. Section 2 describes the simulation process of the TOA time series, followed by the statistical methods in section 3. Results and the conclusions are presented in sections 4 and 5 respectively.

Pulsar timing
Pulsars are spinning neutron stars that emit radio waves from the region above their magnetic poles. If these regions are misaligned with the rotation axis and happen to point towards Earth as they sweep space, we receive one radio pulse each rotation. The most stable pulsars have rotation period of milliseconds and are stable on a very long timescale. Over the decades that they have been timed, very small variations have been detected, which translate into low frequency red noise in the Fourier domain. A detailed review about pulsars can be found in Lorimer & Kramer (2012).
In general, the time of arrival (TOA) of the pulses can be described precisely by the following timing model (Hobbs et al. 2006) where t obs is the observed and t model is the predicted TOA considering all known pulsar properties and propagation effects. For this study we also include the effects of dispersion measure variation over time into the perfect model. Ideally, we should be able to precisely predict the TOAs if all model parameters are perfectly known and there are no unknown sources of noise left. However, this is not the case in practice. The difference between the observed and model observation forms the residual series x = t red + t white . Therefore, we are interested in both white and red noise components. The pulse residual series can be transformed into Fourier frequency space. The noise components are then described by a PSD. The PTA noise description can be found in Lentati et al. (2016); Caballero et al. (2016); Arzoumanian et al. (2016). Following the notation of the latter, the white noise is described by two parameters E k and Q k where W is the initial estimate of the white coming from the radiome- ter noise and template matching error, E k and Q k are the so-called EFAC and EQUAD parameters respectively. Note that S w is constant over the whole frequency range.
The red noise is described by a power law with two parameters where A PTA is the amplitude and γ PTA is the spectral index of the power law. The total noise PSD S( f ) becomes (Vernotte & Zalamansky 1999)

Power and Cross spectrum
In time and frequency metrology, the cross spectrum is the most often used method to measure the phase noise of oscillators. The method relies on the simultaneous measurement with two separate instruments, assuming that the background noise is statistically independent. The same hypothesis holds for pulsar observations from different radio telescopes. The telescopes are obviously independent, and sufficiently far from each other for the tropospheric/ionospheric noise to be statistically independent. We summarize the main results from our tutorial on the cross spectrum method (Rubiola & Vernotte 2010). Given a residual series x from an instrument, we can write its Fourier transform as X = ℜ(X) + i ℑ(X). The one-sided PSD can be calculated as where T a is the acquisition time, and the factor 2 is due to energy conservation after surpressing the negative frequencies.
Measuring a clock (pulsar) signal c simultaneously with two instruments (telescopes), we get two residual series x and y, each containing the white background noise of the measurement added to c. The cross spectrum (CS) converges to S c = 2 T a CC * . Notice that, after compensating for the differential path, c goes only into ℜ(S yx ), while the ℑ(S yx ) contains only the background noise of the observation. After Rubiola & Vernotte (2010), the fastest, unbiased estimator is and can give negative values. We average over all the CSs coming from the different RT pairs to produce one simultaneous observation series for the analysis. Figure 1 shows an example of a comparison between the different spectra. The PSD can be computed on the averaged residual series (ARS) or on each residual series individually, and then averaged (PSD). As the ARS is very similar, but slightly worse than the CS, we opt to focus the analysis on the CS method, whose results should be comparable to those from the ARS. The CS is better at rejecting uncorrelated white noise and produces a lower level than the average PSD, which serves as a good reference.
We also investigated the effects of the choice of the window function on the Fourier transform by comparing the sinc function used above with a Blackman window. Although there is spectral leakage of the red noise, the amount is only noticable in the lowest frequencies and is within the uncertainty of the PSD/CS value. We found minimal effects on our analysis. In fact, as the spectral value decreases, the recovered red noise parameters provide a worse match to the injected values. We also found similar results with a prewhitening / postdarkening procedure. Therefore, we show only the results obtained with the above equations.

Variances
Different variances have been proposed to describe the amount of variation in a given residual series x with a total time span T for time steps τ. The two prominent ones are the Allan (Allan 1966) and more recently the parabolic variance (Vernotte et al. 2016), which we summarize here.
The AVAR is a simple computation comparing three observations separated by m time steps where M = N −2m +1 is related to the total number N of data points in the residual series x. Compared to similar variances, AVAR is the most efficient at estimating for the largest τ = T /2. By contrast, the Modified Allan variance is more suitable to the analysis of shortterm fluctuations (Allan & Barnes 1981). The parabolic variance combines the original long-term computability as well as a good response to short timescales Similarly to the CS method, we can also use two indepedent residual series x and y to compute the ACOV and the PCOV A comparison of the two different (co)variances can be found in figure 2. In general, the variance plot has two prominent features: (i) at lower time steps, the variance is dominated by white noise, which leads to a power law decrease with fixed spectral index, (ii) at higher time steps, red noise start to appear, which shows up as a flattening and then rising of the variance, depending on the amplitude and spectral index of the red noise. The variance curve can be parametrized as The relation between the PTA parameters EFAC, EQUAD, γ PTA and A PTA and the variance parameters is not trivial. As the white noise follows a fixed theoretical spectral index for a given variance β VAR = −2/ − 3 for AVAR and PVAR respectively, we can compare the overall amplitude of the variance white noise with S w from the PTA white noise in (2) where τ 0 is the sampling time.
The two red noise spectral indices can be related via The red noise amplitudes follow analytic relations for given integer spectral indices (see table 1 in Vernotte et al. 2016). Under the condition that γ PTA ∈ [1.5, 4.5] it is also possible to find an analytic relation between A VAR and A PTA for non-integer γ PTA , see also Vernotte et al. (2020), where A (γ PTA ) is a function that relates to the coefficients of the variance. Using the same approach as Walter (1994) we find the folowing relationships for AVAR and PVAR

SIMULATED DATASETS
Clock comparison methods require several time series of the same pulsar observed simultaneously with different telescopes over a long time span. The LEAP project uses the 5 major radio telescopes in Europe: Effelsberg in Germany, Lovell in UK, Westerbork in the Netherlands, Nançay in France and Sardinia in Italy as an interferometer with monthly simultaneous observations since 2012 (Bassa et al. 2016;Smits et al. 2017). Thus we simulate 100 realizations of 5 TOA series over a 4000day long observing campaign with a cadence of 30 days, this is of the order of the nominal LEAP observations up to today. They have been created using the libstempo package (Vallisneri 2015). For simplicity, we assume that all 5 RTs have comparable instrumental white noise, and give to all residuals the initial uncertainty W = 500 ns. As the two white noise parameters are degenerate, we will only focus on the dominant term, ie. EFAC, injecting different realizations of white noise with (EFAC = 1.05, EQUAD = 0).
We also simulate the same residual series as it would be observed by LEAP under ideal conditions. In the LEAP statistical limit (LSL) of the coherently added mode, the total S/N of the LEAP observation is the sum of the individual RTs. Since we assume that the 5 RTs are identical, the LEAP observations have a 5 times higher S/N with the TOAs having a white noise level that is 5 times lower as S/N ∝ 1/W . This also translates into an EFAC injection of 1.05/5 = 0.21, while EQUAD remains at 0.
In total, we consider three cases: (i) white noise case: no red noise signal injected (ii) boundary case: (log 10 A PTA = −14, γ PTA = 3) (iii) red noise case: (log 10 A PTA = −13, γ PTA = 3) For each case we simulate 100 sets of 5 residual series with different red noise realization between each set. For a given set, the red noise realization is the same amongst the 5 residual series, but the white noise realizations differ. An example of a set of 5 simulated residual series can be seen in figure 3. As we are focusing only on the noise properties, we fix the timing model at its injected values. This means that the residual series are fully determined by the injected white and red noise. In practice, there will be covariances between the timing model and the noise parameters. Here we only show the theoretically optimal results as a proof-of-concept.

METHODS
The standard PTA analysis is done via the enterprise package (Ellis et al. 2017), details of the analysis can be found in Arzoumanian et al. (2016Arzoumanian et al. ( , 2018. We use all 5 individual RT residual series for the PTA analysis comparison. For the LSL analysis we only have one residual series, but with a lower white noise level. We compute the PSD and CS through equations (5) and (7),  figure 1, and compare them directly to equation (4) to get constraints on the noise parameters. The PSD can be averaged over the 5 residual series, whilst the CS can be computed from the average of the 10 different combinations of 2 out of the 5 residual series. The Allan and parabolic Variances are calculated through equations (8) and (9), see figure 2, and matched to the parametrization in equation (12) with the use of equations (13), (14) and (15). Unlike the PSD, we use the averaged residual series to compute the variances.
We use Bayesian statistics and a nested sampling algorithm, implemented in the cpnest package (Del Pozzo & Veitch 2015), to look for the posterior constraints on the white noise properties as well as the spectral index and amplitude of the red noise parameters.
A likelihood function can be derived from a weighted least squares fitting procedure where σ 2 i are the weights, D are the computed data points and F (φ ) are the corresponding parametrizations with the model parameters φ = (EFAC, log 10 EQUAD, γ PTA , log 10 A PTA ).
For the spectral analysis we work in the logarithmic space, where D corresponds to the computed log 10 PSD/CS values. From a large Monte-Carlo simulation we infer the uncertainties to be σ i = 0.2136 across all frequencies in the Gaussian approximation using only the positive estimates, see appendix A.
For the (co)variance analysis, where D are the computed (co)variances, we compute the weights σ 2 i from the effective degrees of freedom for each time step. A more detailed description can be found in appendix B.

RESULTS
To ensure that our results are statistically robust, each of the three cases is based on 100 different realization sets and doing the analysis using all the clock comparison methods and the standard PTA/LSL comparisons. All following posterior distributions and recoveries are based on the addition of all 100 individual posterior distributions, unless stated otherwise. They should thus be representative of the constraints from the analyses of a single simulated set of 5 residual series.

White noise
We first explore the performance of the clock methods in the case of pure white noise in the residual series, and compare it to the traditional PTA/LSL analysis. Figure 4 shows the PTA noise parameter marginalized 1D and 2D posterior distributions in a corner plot. As there is no red noise present, A PTA and γ PTA return very uninformative posteriors for all methods. γ PTA simply returns a nearly flat uniform prior. The quantity A PTA returns a posterior distribution that is flat for low amplitudes, but has a cutoff at an higher amplitude. This can be interpreted as a upper limit case, as red noise with a higher amplitude should have affected the residual series. Similarly, as there is no EQUAD injected, all analyses also show upper limits.
The only parameter of interest in the case of white noise is EFAC. As expected, both the PSD and PTA methods return distributions around the injected value, with the PTA distribution being more constraining than the PSD. The CS method does not seem to lower the white noise to the same level as the LSL, see top panel of figure  4. However, this is misleading, as the white noise is constrained from the positive CS values only. In fact, the CS method lowers the white noise to almost zero and could perform similarly or better to the LSL residuals by also taking the negative CS values into account. This would, however, come at the cost of a decreased performance in the recovery of the red noise parameters. In the middle panel of figure 4, both AVAR and PVAR show similar, but worse, contraints than the PTA analysis, where AVAR is closer to the injected value and PVAR has a small bias towards lower values. The bottom panel shows that the covariances reject most of the white noise, pushing the EFAC towards to lower end of the allowed values. This effect is intended as the covariances 'try' to find a correlated signal, but as there is only uncorrelated noise a upper limit on EFAC is placed.
However, we track the evolution of the performance of the different methods to recover white noise as the amount of red noise gradually increases as we move through the 3 simulated cases. The main result in figure 5 is that as the red noise increases, all methods seem to progressively perform worse in constraining the white noise. For the spectral and variance analysis, the posterior distributions progressively widen. Particularly in the strong red noise case, the AVAR and PVAR analysis are completely unable to recover the injected white noise (bottom middle panel). For the covariance analysis, the EFAC upper limit gradually increases, most notable in the strong red noise case.
A different visual aid is given in figure 6, which shows the constraints on EFAC from the individual analyses of a subset of 10 sets of realizations.

Boundary
A summary of the constraints on the PTA parameters from the standard PTA/LSL analysis and the clock comparison methods is given in table 1. The posterior distributions for all analysis methods with the recovered spectra and variances can be found in figure 7. Examples from analyses of a single simulation set are shown in figure 8. Despite no injection of a EQUAD value, we still perform the all analyses with all 4 PTA noise parameters. However, the recovered EQUAD posterior distributions are all upper limits as in the white noise case above. Thus, we will omit the EQUAD parameter from our discussion from here on, as the focus will be on the red noise parameters.

Spectral analysis
The comparison of the marginalized 1D and 2D posterior distributions of the PTA noise parameter for the boundary case from the spectral analysis can be found in the top left corner plot of figure 7. The middle and bottom left panels in figure 7 show the injected red and white noise along with the recovered PSD and CS respectively. The red noise signal is not well detectable for the PSD as only the lowest frequency bin in the left middle panel of figure 7 shows signs of deviating away from white noise. There is a small hint of the red noise. Which is sufficient to put loose constraints on the spectral index γ PTA and amplitude A PTA from the PSD method in the top left panel. The constraints on the two PTA red noise parameters are comparable to the standard PTA analysis.
On the other hand, the decrease of the white noise level through the CS improves the detection of the injected red noise and the constraints. The lowest 3 frequency bins contribute to the recovery of the red noise power law, see bottom left panel of figure 7  signal is clearly detectable and the tightest constraints can be placed in the top left panel.

Variance analysis
The middle column of figure 7 shows the comparison red noise parameter corner plot in the top panel and recovered Allan variance (middle panel) and parabolic variance (bottom panel) with the injected red noise. As equation (16) is only valid for 1.5 ≤ γ PTA ≤ 4.5, the boundaries are reduced in the top row middle panel of figure 7. With this caveat, the red noise spectral index γ PTA and amplitude A PTA can still be compared, however the cutoffs at 1.5 and 4.5 limit the comparison and can thus influence the red noise recovery. The recovered variances in the middle and bottom panels of the middle column of figure 7 illustrate the fact that the highest octave of τ with very large uncertainty is unreliable and contributes very little to the overall analysis. Since we are looking at a boundary case, the injections are close or within the uncertainties of the variance computation. Therefore, they do not necessarily coincide with the median of the recovered variance, see also table 1. Additionally, the prior choice allows for a large distribution of small amplitude red noise, producing a median variance close to white noise.
This results in a conservative estimation that there is little red noise in the boundary case with AVAR and PVAR. However, as there a small turn-up in the higher time steps, both provide some evidence of the injected red noise. This can be seen in the top row middle panel corner plot, especially in the A PTA parameter and a small improvement in constraining the spectral index. The PTA analysis is also able to pick up the red noise and performs better than the variances. With the lowest level of white noise, it is no surprise that the LSL analysis performs the best.

Covariance analysis
We can further focus on the red noise detection by sacrificing the constraints on the white noise. This is done in a similar way as with the CS by computing the covariances, such that uncorrelated white noise is rejected to allow for the correlated red noise to be more prominent. The right-hand column of figure 7 shows the posterior distributions, recovered ACOV and PCOV from top to bottom. The same limitation on the spectral index γ PTA for variances applies also to the covariances.
Despite the lower white noise from the covariances, the constraints on the red noise parameters γ PTA and A PTA are actually worse than their variance counterparts, as it can be seen in the top right panel of figure 7 and table 1. As there is no correlated white noise, the lowest time steps become upper limits with large uncertainties and do not provide any information on the red noise. This leaves only one or two time steps with information, which is shown in the right column middle and bottom panels. The overall recovery using all time steps thus favours low amplitude white and red noise. Using only time steps with small uncertainties could produce tighter, but less reliable, red noise constraints.

Red noise
The strong red noise case is of particular interest as it shows how well the different methods can in principle divide different red noise realizations in the large-signal regime. An overview on the con-straints on the PTA parameters for the red noise case is given by table 2 for the standard PTA/LSL and clock metrology analysis. The overall posterior distributions for all methods are shown in figure 9. The results of some example analyses of a single simulation set can be found in figure 10.
In general, all methods can detect the injected red noise very well. As the red noise is very large, there is very little difference in the constraints on the red noise parameters γ PTA and log 10 A PTA between the PTA and the LSL analysis. Both are centered around the injected values with tight constraints, see top panels of figure 9. So do the spectral method recoveries, although with a very small tail, which is longer for the PSD compared to the CS. The uncertainties in the spectral methods allow for small values even at the lowest frequencies. This leads to a tail of lower amplitudes being recovered, see table 2. Both variances and covariances perform very similarly to the PTA/LSL analysis in terms of the tightness of the constraints on the red noise parameters, especially PVAR and PCOV are comparable to the LSL analysis. This is due to the fact that the higher time steps are dominated by red noise and have therefore very similar VAR/COV values. Additionally, the absolute uncertainties σ i are the same for both variances and covariances. Consequently, the covariances at the lowest τ are not as well defined as the variances for the same time steps. These large relative uncertainties and the lower COV values translate into a small bias for the red noise to have steeper spectral indices γ PTA than with the variance methods. On the other hand, the variance methods are still a little bit impacted by the choice of priors, see table 2. This can be seen in the bottom middle panel of figure 9, where the median recovered PVAR is lower than the median PVAR values of the 100 realizations. The same is also true for AVAR recovery.
Table 2 also shows that both AVAR and ACOV have a longer tail towards lower red noise amplitudes log 10 A PTA when compared to PVAR and PCOV. This can be the advantage of the parabolic (co)variance transitioning from a spectral index of β VAR = −3 for white noise to γ VAR = 0 for the injected red noise. In comparison the Allan (co)variance has a transition of one order less, ie. from β VAR = −2 to γ VAR = 0.

CONCLUSIONS
We have shown that clock comparison methods, such as the cross spectrum and variances, can be applied to pulsar timing observations. Initial tests show that these methods produce slightly worse, sometimes comparable constraints on the pulsar noise parameters when compared to the standard PTA and optimal LSL analysis. Both perform very well in recovering the injected white noise, regardless of the level of red noise. The red noise recovery itself is broadly consistent with the injections, with different realizations giving similar constraints. In general, the metrology clock comparison methods perform well in the pure white noise case, slightly worse in the boundary case and comparable in the strong red noise case. The spread of recoveries between different realizations is also larger than the comparison PTA/LSL analysis.
The PSD method produces slightly broader parameter posterior distributions compared to those from the PTA analysis. The CS allows for a long tail of low values, including negative values. This means a reduction of white noise to almost zero at high frequencies, but including these negative values will skew the red noise recovery. Using Gaussian distributions also decreases the possibility for low CS values and thus give slightly more optimistic constraints for A PTA and γ PTA . Thus, the CS method performs slightly better than the PTA analysis, but still a bit worse than the LSL analysis. Since the LSL's white noise level is only a fifth of that of the PTA observations, it can put the tighest constraints on the red noise.
Similar constraints can be put on the red noise parameters for both the variances and covariances. The (co)variance methods perform particularly badly in recovering the white noise in the red noise case. By design, ACOV and PCOV should reject uncorrelated white noise and only leave the correlated red noise signal. Therefore, no useful information can be gained for the white noise, while some extra power may be pushed into the red noise. This lack of information may limit the performance of the covariances in precisely constraining the red noise and manifests as a small bias in γ PTA or a long tail of low red noise amplitudes A PTA .
The clock comparison methods complement the standard PTA analysis by giving independent constraints on the noise parameters. This can be especially valuable for certain realizations, when the different analysis methods do not converge to the same values.
We continue the study and apply the methods described to the observations from the LEAP project, where we deal with the challenges from real pulsar timing data. These include unevenly sampled data points, different levels of white noise between RTs and observations and uncertainties in the computation of the spectral densities and (co)variances.

DATA AVAILABILITY
The simulated residual series and posteriors can be obtained from the authors upon request.

APPENDIX A: ESTIMATES AND WEIGHTS OF THE CROSS SPECTRUM
For a given frequency f 0 , we consider the noise level S w from uncorrelated observational noise of the RTs and the signal level S r from the pulsar red noise, see equation (4). We perform a large Monte-Carlo simulation with 10 7 realizations at the given frequency with varying levels of noise and signal.
The large simulation allows for the evaluation of the estimates  σ emp log 10 1.576 S w + 4.147S r S w + 3.833S r log 10 (2.045) = 0.3106 σ mod log 10 1.251 S w + 3.835S r S w + 3.640S r log 10 (1.635) = 0.2136 Table A1. Dependence of the position and width parameters of the CS estimates on S w and S r .
of the CS. We determine empirically the statistics and conclude that the most reliable representation can be obtained from the logarithmic CS estimates distinguishing between positive and negative estimates. The empirical distributions are described by two parameters: µ emp and σ emp are the mean and variance of the log of the CS estimates respectively. Despite the asymmetry of the probability density functions we choose to model them with Gaussians with mean µ mod and variance σ mod and find the best fit values to both positive and negative CS histograms (see inset of figure A1). By varying the noise and signal levels we obtain numerical expressions for the 4 parameters as a function of S w and S r . They can be found in table A1. The factor (S w + 5S r ) shows that we can retrieve the correct number of degrees of freedom, ie. 5, in a purely empirical way.
The important results are the following: (i) the width parameters σ emp,mod can be considered as constants (ii) the position parameters for negative estimates µ − emp,mod only depend on S w (iii) the position parameters for positive estimates µ + emp,mod depend on S w as well as S r (iv) the shift between µ emp and µ mod is constant and can be evaluated from the 10 7 realizations to be log 10 (1.373) = 0.138. This shift between simulation and model can be explained by the long tail towards lower numbers of the empirical distributions which decrease µ emp compared to µ mod . The (almost) constancy of the width parameters confirms that the log of the CS estimates is a reliable representation. Moreover, the position parameters of the negative estimates do not depend on S r . This implies that the negative estimates do not carry any information about the signal level and should not been taken into account. The variation of the signal and noise levels makes f 0 representative for any frequency. Additionally, since the PSD is similar to the positive CS estimator, we can apply the same Gaussian model.

APPENDIX B: WEIGHTS AND UNBIASING OF THE VARIANCES
The uncertainty σ i on each time step i, whether variances or covariances, can be calculated from the average values of the variances VAR where ν are the effective degrees of freedom, whose expressions depend on the type of noise. The expressions used in the analysis for AVAR can be found in Lesage & Audoin (1973): for white phase modulation (γ PTA = 0) and flicker frequency modulation (γ PTA = 3): ν ≈ 5N 2 4m(N + 3m) .
The equation for PVAR is given in Vernotte et al. (2020) as where A and B are constants which depend on the type of the modulation. They are given as A = 23 for γ PTA = 0 and A ≈ 28 for γ PTA = 3, whilst B ≈ 12 for all spectral indices. As pointed out in Vernotte et al. (2020), the formula is unreliable for the highest τ. However, from the theory we know that ν = 1 at the largest time step for PVAR. We also notice a mixture of white and red noise for our realizations, especially for the boundary case. Consequently, we approximate the degrees of freedom for the boundary case using those from the white noise case as a conservative approach. The degrees of freedom used for our analysis can be found in tables B1 and B2 and are the same for the covariances. Finally, Vernotte & Lantz (2012) have shown that the largest time steps are statistically biased towards lower VAR/COV values. The factors that need to be applied for any time step τ with its degrees of freedom ν can be found in table II in Vernotte & Lantz (2012). We also show the factors used in this paper in tables B1 and B2.  Table B2. Effective degrees of freedom and unbiasing factors for the variances in the white noise and boundary case.