Bayesian noise wave calibration for 21-cm global experiments

Detection of millikelvin-level signals from the 'Cosmic Dawn' requires an unprecedented level of sensitivity and systematic calibration. We report the theory behind a novel calibration algorithm developed from the formalism introduced by the EDGES collaboration for use in 21-cm experiments. Improvements over previous approaches are provided through the incorporation of a Bayesian framework and machine learning techniques such as the use of Bayesian evidence to determine the level of frequency variation of calibration parameters that is supported by the data, the consideration of correlation between calibration parameters when determining their values and the use of a conjugate-prior based approach that results in a fast algorithm for application in the field. In self-consistency tests using empirical data models of varying complexity, our methodology is used to calibrate a 50 $\Omega$ ambient-temperature load. The RMS error between the calibration solution and the measured temperature of the load is 8 mK, well within the 1$\sigma$ noise level. Whilst the methods described here are more applicable to global 21-cm experiments, they can easily be adapted and applied to other applications, including telescopes such as HERA and the SKA.


INTRODUCTION
For nearly a century, scientists have been using radio-frequency instruments to advance the study of astronomy and complement information from the visual regime of the electromagnetic spectrum (Pritchard & Loeb 2012). As we begin to take measurements of the early universe, these instruments must continue to evolve to support observations. Unexplored cosmic information from the Epoch of Reionisation and Cosmic Dawn redshifted into the radio spectrum could provide constraints on fundamental physics such as primordial black holes, galaxy formation, and universal curvature as discussed in Furlanetto et al. (2009). A unique probe of phenomena from the early cosmos is the hydrogen that inundates the intergalactic medium (IGM). Heating and cooling of the IGM associated with hydrogen's absorption and emission of 21-cm photons produce a dynamic brightness temperature relative to the cosmic microwave background temperature, tracing the evolution of surrounding structure during the Cosmic Dawn. The brightness temperature of this 21-cm photon signal can be described by which is heavily dependent on environmental factors of the early universe such as H , the fraction of neutral hydrogen, Ω m and Ω b , the matter and baryon densities with respect to the universal critical density for a flat universe and Hubble's constant. Here, the 0.023 is a constant from atomic-line physics. R is the background radiation ★ E-mail: ilvr2@cam.ac.uk temperature and S is known as the '21-cm spin temperature', which is related to the kinetic temperature of neutral hydrogen gas in the IGM (Zaldarriaga et al. 2004;Field 1958). This cosmic hydrogen signature measurable in the spectral sky has been redshifted to wavelengths under 200 MHz through the expansion of the universe as discussed in Pritchard & Loeb (2012).
There has been a recent surge in the field of 21-cm cosmology following the reported detection of an absorption feature consistent with a Cosmic Dawn signature. This was reported by the Experiment to Detect the Global EoR Signature (EDGES) in early 2018 from measurements of a sky-averaged radio spectrum (Bowman et al. 2018). The signal, centred at 78 MHz with a width corresponding to a period between 180 million and 270 million years after the Big Bang, matches the theoretical position in frequency, but its depth of ∼ 0.5 K is a factor of two greater than the largest predictions from theoretical models (Cohen et al. 2017). This discrepancy would suggest that the temperature difference between the IGM and the cosmic microwave background was much larger than previously thought and would require new physics to explain, such as dark matter-baryon interactions (Barkana 2018) or excess radio backgrounds (Ewall-Wice et al. 2020).
Another possible explanation for this discrepancy is that the measured signal is not cosmological but of systematic origin. This may be the case in EDGES due to some of the methodology used, such as a potentially unphysical foreground removal method and calibration of the receiver in a separate environment from the data acquisition (Hills et al. 2018;Razavi-Ghods 2018). In this paper, we present a novel calibration algorithm that improves on the work of the EDGES team (Rogers & Bowman 2012) through the utilisation of a Bayesian framework to promote efficient use of the data to remove system-atics. Using conjugate priors and machine learning techniques, our pipeline can be applied in the field with the collection of data with additional capabilities for optimising individual noise wave parameters and incorporating correlations between them. This paper is organised as follows. In section 2 we review the methodology behind calibration using noise waves as well as present a Bayesian framework that provides greater flexibility in radiometer calibration. Section 3 describes the process of using mock data sets modelled after empirical measurements of reflection coefficients with the incorporation of a realistic noise model to evaluate our pipeline.

METHODS
In this section, we detail the methodology behind radiometer calibration using noise wave parameters. An overview of global signal measurement are outlined in section 2.1. Section 2.2 summarises the basic procedure with some mathematical improvements while section 2.3 describes our Bayesian framework and its associated advantages.

Measuring the global signal
The noise necessitating calibration emerges during measurementtaking. In an averaged or global experiment, the sky temperature sky (Ω, , ) is a function of the direction Ω, frequency and time . This can be broken down into two primary components: the global 21-cm signal 21 and astrophysical foregrounds f The antenna measures the sky signal convolved with the normalised antenna directivity . The process of measurement introduces the random noise term data .
Our 21-cm signature can thus be represented as Here, the integral is assessed through foreground and beam modelling techniques such as those discussed in Anstey et al. (2020) while modelling of data from the statistical properties of ( , ) is accomplished by a calibration algorithm as articulated in this paper and outlined in Fig. 1. Having a fully Bayesian framework when modelling the beam, the sky and the systematics has major advantages for global 21-cm experiments such as REACH (de Lera Acedo et al. 2021), as it provides the greatest flexibility in being able to model all effects and jointly fit for them.

Calibration methodology
The standard calibration strategy follows the method introduced by Dicke to characterise systematic features in radio frequency instruments (Dicke 1946) and is widely used in experiments such as EDGES (Monsalve et al. 2017) and LOFAR (Bilous, A. V. et al. 2016) to evaluate the spectral index of the sky's diffuse radio background (Rogers & Bowman 2012). This technique involves measurements of two internal reference standards; a load and a noise source, in addition to a series of external calibration sources attached to the receiver input in lieu of the antenna. These include an ambient-temperature 'cold' load, a 'hot' load heated to ∼ 400 K, an open-ended cable and a shorted cable. A block diagram showing this arrangement is presented in Fig. 2. When calibrating the receiver, reflection coefficients are taken of the calibration source connected to the receiver input (Γ cal ) and of the receiver itself (Γ rec ) as well as power spectral densities (PSDs) of the input ( cal ), the internal reference load ( L ) and the internal reference noise source ( NS ) (Monsalve et al. 2017). These measurements are used to calculate a preliminary 'uncalibrated' antenna temperature * cal * where L and NS are assumptions for the noise temperature of the internal reference load and excess noise temperature of the internal noise source above ambient, respectively. This initial calculation is used to calibrate out any time-dependent system gain that emerges from a series of filters, amplifiers and cables, as well as the analogueto-digital converter within the experimental apparatus (Monsalve et al. 2017). Each PSD measurement can be expressed in terms of specific response contributions as detailed in Bowman et al. (2018) Here, sys is the system gain referenced to the receiver input and cal is our calibrated input temperature. unc , cos , and sin are the 'noise wave parameters' introduced by Meys (1978) to calibrate the instrument. unc represents the portion of noise reflected by the antenna that is uncorrelated with the output noise of the low noise amplifier (LNA). cos and sin are the portions of reflected noise correlated with noise from the LNA (Monsalve et al. 2017;Rogers & Bowman 2012). In the EDGES experiment, these calibration quantities are modelled using seven-term polynomials in frequency.
The PSDs for the internal reference load and noise source can similarly be expressed as in equation (6). However, since the reflection coefficients of the internal references are typically less than 0.005, they are taken to be zero in order to simplify the equations As shown in Fig. 2, the internal references may be on a separate reference plane than the receiver input, resulting in a system gain * sys and a noise offset * 0 different from those defined in equation (6). This effect is taken into account by two additional scale and offset parameters, 1 and 2 , introduced by EDGES ( Monsalve et al. 2017).
Since 1 and 2 also correct for first-order assumptions in the noise temperatures of the internal reference load and noise source, we have chosen to absorb these terms into L and NS . This adjustment allows all calibration parameters, unc , cos , sin , and an 'effective' NS and L , to be solved for in units of kelvin, facilitating a joint solution of parameters. Expanding equation (5) using equations (6)  to (8) yields a linear identity providing a relationship between the uncalibrated input temperature and a final calibrated temperature of any device connected to the receiver input where all parameters are frequency-dependent. This is not explicitly shown for simplicity of notation. For estimation of the noise wave parameters, cal , Γ cal and Γ rec are measured along with the PSDs while sys and 0 are calibrated out. The cold and hot loads exhibit the main temperature references needed for L and NS . The cables facilitate the derivation of the noise wave parameters describing spectral ripples from the noise properties of the receiver by acting as antennas looking at an isotropic sky with temperatures equal to the cables' physical temperatures (Rogers & Bowman 2012).

Bayesian calibration framework
One possible source of systematics in the calibration methodology used by EDGES comes from measuring the response of the four external calibrators along with the receiver reflection coefficient in a laboratory away from where the instrument is actually deployed (Bowman et al. 2018). This process, especially with regards to how calibration parameters change, can be non-trivial. Furthermore, the fixed polynomial order used by EDGES for all noise wave parameters may underfit or overfit individual parameters and thus 'fit out' data useful for determining systematics or potentially even the 21-cm signal itself if a joint fit is performed. In response to these issues, we have developed a calibration pipeline that improves on the strategies presented in section 2.2. We introduce a novel Bayesian methodology using conjugate priors for a dynamic application of our algorithm to be run with data collection regardless of system complexity. Also included are model selection methods using machine learning techniques for the optimisation of individual noise wave parameters to combat overfitting and underfitting, the results of which converge with that of a least-squares approach when wide priors are adopted. Our pipeline easily incorporates many more calibrators than the standard four shown in Fig. 2 to increase constraints on noise wave parameters while identifying possible correlations between them. A schematic of the improved calibration method is shown in Fig. 3.
In order to simplify our calibration approach, we first define the following terms Figure 3. Outline of the Bayesian calibration algorithm. Blue blocks represent data to be taken, red blocks represent calculations and green blocks represent calculation outputs.
which represent initial calibration measurements on in the frequency domain for the characterisation of data from equation (3) via our noise wave parameters. It is expected that calibration-related deviations of in the time domain are sufficiently curtailed through practical strategies such as temperature control of the receiver environment. Incorporating these into equation (9), with some rearrangement, then gives the equation unc unc + cos cos + sin sin + NS NS + L L = cal , at each frequency. Here, there are no squared or higher-order terms, allowing us to take advantage of the linear form by grouping the data and noise wave parameters into separate matrices In these equations, all of our data; the reflection coefficient measurements and power spectral densities, are grouped in an X vector which forms a matrix where one of the axes is frequency. The calibration parameters as frequency-dependent polynomials of varying degree are collected into a vector which serves as our model describing data . Applying these definitions condenses the calibration equation into where T cal is a vector over frequency and is a noise vector representing our error. Since EDGES assumes that each power spectral density measurement is frequency independent, we have assumed that is a multivariate normal distribution. This assumption is implicit in the EDGES analysis in which they use a least-squares minimisation approach for solving model parameters.
For calibration of the receiver, we are concerned with the construction of predictive models of the noise wave parameters, , in the context of some dataset, T. We can use to calculate the probability of observing the data given a specific set of noise wave parameters: where, is the number of measurements. This distribution on the data is the likelihood. For the purposes of calibration, T may be T cal measurements or alternatively, T sky for prediction of a sky signal. Our model must also specify a prior distribution, quantifying our initial assumptions on the values and spread of our noise wave parameters which we specify as a multivariate normal inverse gamma distribution: which is proportional up to an integration constant. Here, and , which are greater than zero, along with V and represent our prior knowledge on the noise wave parameters. is the length of our vector .
Equation (18) is determined by a set of values for our model . We can marginalise out the dependence on and our noise term by integrating over the prior distribution by both and 2 at once. Following the steps in Banerjee (2009) where * = + 2 and Γ ( ) represents the Gamma function, not to be confused with the notation for our reflection coefficients. Equation (20) is the evidence, which gives the probability of observing the data T cal given our model. 1 With the prior distribution specified, we use Bayes' equation to invert the conditioning of the likelihood and find the posterior using the likelihood, prior and evidence: Similarly from Banerjee (2009), this can be written as The posterior distribution represents the uncertainty of our parameters after analysis, reflecting the increase in information (Nagel 2017). We highlight the difference between the 'likelihood-only' least-squares approach versus the Bayesian approach with the former being a special case of the latter with very wide priors demonstrable when V → ∞ ⇒ V −1 → 0, and * becomes . The transition from 'non-starred' variables to 'starred' variables represents our 'Bayesian update' of the prior to the posterior noise wave parameters in light of the calibration data T cal .
As we can see, the posterior distribution is in the same probability distribution family as equation (19), making our prior a conjugate prior on the likelihood distribution. The use of conjugate priors gives a closed-form solution for the posterior distribution through updates of the prior hyperparameters via the likelihood function (Banerjee 2009;Orloff & Bloom 2013). The resulting numerical computation is many orders of magnitude faster than MCMC methods relying on full numerical sampling and permits an in-place calculation in the same environment as the data acquisition. This becomes particularly useful for the speed of the algorithm as frequency dependence is introduced in which the computations would not be manageable without conjugate gradients.
To allow for a smooth frequency dependency, we promote each of our noise wave parameters in equation (16) to a vector of polynomial coefficients where is our noise wave parameter label; ∈ {unc, cos, sin, NS, L}, modelled using + 1 polynomial coefficients. Likewise where is a vector of input frequencies which are raised to powers up to . For a vector of 's attributed to our calibration parameters, under this notation multiplication in equation (17) is element-wise and equation (20) is effectively (T cal |n). Assuming a uniform prior on n, inverting Bayes' theorem gives (n|T cal ) for use in model comparison in which the relative probabilities of models can be evaluated in light of the data and priors. Occam's razor advises whether the extra complexity of a model is needed to describe the data (Trotta 2008), permitting optimisation of the polynomial orders for individual noise wave parameters as detailed in section 3.3. By 1 It is in fact better to use the equivalent more numerically stable expression * = + + XV X , where = T cal − X * to avoid cancellation of large terms. taking a random sampling of the resulting posterior, we characterise the noise wave parameters as multivariate distributions depicted in contour plots which exhibit a peak value accompanied by 1 and 2 variance as well as correlation between parameters inferred from a covariance matrix. Following characterisation of the receiver, we next apply the T cal from our calibration to a set of raw antenna dataX for prediction of our sky signal, T sky , from equation (3). The predictions for the data follow from the posterior predictive distribution The first probability in the integral is the likelihood for our antenna measurement T sky and the second is our posterior from equation (23). Following the steps in Banerjee (2009), this can be shown to be a multivariate Student's t-distribution written as: where is the × identity matrix and * , * , * and V * are defined in equation (21). This new distribution on T sky corresponds to a set of points with error bars and represents the calibrated sky temperature as the output of the receiver.

EMPIRICAL MODELLING AND SIMULATIONS
To verify the performance of our pipeline and highlight features of the algorithm, we evaluate the results of self-consistency checks using empirical models of data based on measurements taken in the laboratory. To make this data as realistic as possible, we used actual measurements of the reflection coefficients of many types of calibrators (see table 1) to generated power spectral densities using equations (6) to (8) given a set of realistic model noise wave parameters along with some assumptions about the noise, which are described in section 3.4. The impedance of the calibrators which were measured with a vector network analyser (VNA) and used in our pipeline are shown on a Smith chart in Fig. 4 We start by demonstrating the importance of correlation between noise wave parameters when determining their values to provide a better calibration solution for the reduction of systematic features in the data such as reflections (section 3.1). We then show the increased constraints on these noise wave parameters attributed to the inclusion of more calibrators than the standard number of four (section 3.2). Following this, we illustrate the effectiveness of model selection for the optimisation of individual noise wave parameters to prevent the loss of information resulting from overfitting or underfitting of the data (section 3.3). Finally, these features are incorporated into a calibration solution applied to a 50 Ω load (section 3.4).

Correlation between noise wave parameters
In this section, we show the first major feature of our Bayesian pipeline; the consideration of correlation between noise wave parameters when deriving their values. This is best demonstrated when noise is introduced in an idealised way as to retain a form matching the Gaussian form of our mathematical model. To do this, empirical models of power spectral densities are calculated from equations (6) to (8) using measurements of Γ rec , Γ cal and cal for the cold and hot loads, as well as a set of realistic noise wave parameters. Gaussian noise of one unit variation is then added to the cal measurements after the calculation to conserve its Gaussian form. This data is submitted to our algorithm and the resulting posterior distributions for coefficients of the polynomial noise wave parameters are compared to the initial values.
Such posterior distributions can be seen in Fig. 5 showing the results of models using only the cold load (grey posterior), only the hot load (red posterior) and using both loads in tandem (blue posterior). For these calculations we chose a set of model noise wave parameters as constants across the frequency band; unc = 250 K cos = 190 K sin = 90 K NS = 1200 K L = 298 K Figure 5. Plot showing the joint posteriors of L and NS for models using the cold load, the hot load, and both loads concurrently shown as the grey, red and blue posteriors respectively. The black cross hairs mark the noise wave parameter values used to generate data submitted to the pipeline. A zoom-in of the posterior intersection is provided to illustrate the constraint of noise wave parameter values attributed to the correlation between parameters.
In Fig. 5, a strong correlation between the L and NS is evident as the hot-load posterior is highly skewed as expected from equations (11) and (14). The resulting intersection of posteriors from the individual loads facilitate the derivation of noise wave parameters as the dual-load posterior is found within the region of posterior overlap crossing with the values of the model shown in the inset of Fig. 5. Retrieval of the noise wave parameter values using correlations between them found in the data demonstrate the relevance of this information which is not taken into account in previous calibration techniques.

Constraints with additional calibrators
A nice feature of our pipeline is the ability to include as many calibrators as required to constrain the calibration parameters. For analysis, six more calibrators are introduced in pairs following the order presented in table 1. We include data generated from measurements of multiple resistors terminating a high quality 25 m cable made by GORE ® . Data for these calibrators is once again generated using fixed terms and Gaussian noise of one unit variation added to cal as discussed above. Fig. 6 shows the results of models using four, six, and eight calibrators.
As shown, the inclusion of more calibrators increases the constraint on the resulting noise wave parameters. However, we note that after the inclusion of four calibrators, the relative additional constraint decreases with each additional calibrator and thus the use of more than eight calibrators would be unnecessary. The values of noise wave parameters used to generate the data as indicated by the cross hairs in Fig. 6 all fall within 1 of our pipeline's resulting posterior averages for models using all eight calibrators.

Optimisation of individual noise wave parameters
The final highlight of our Bayesian pipeline is a the use of machine learning techniques to optimise individual noise wave parameters. This is advantageous as a blanket set of order-seven polynomials applied to all noise wave parameters, such as done in the EDGES experiment, may underfit or overfit individual parameters and misidentify systematics or information about the signal being measured.
The optimisation procedure compares the evidences (equation (20)) of different models to determine the vector of noise wave parameter polynomial coefficients n that best describes the data as briefly mentioned at the end of section 2.3. Since the model favoured by the data will have the highest evidence, we use a steepest descent procedure to compare models in 'n-space' and determine the direction of the gradient in 'evidence-space'. After multiple iterations, this brings us to the model with the maximal evidence. Since n consists of five numbers corresponding to the number of polynomial coefficients for each of the five noise wave parameters, models are generated by individually increasing each index of n by 1. We expect the evidence to follow an 'Occam's cliff,' in which the evidence sharply increases preceding the optimal n with a slow fall off following the maximum.
To demonstrate this, data is generated using measurements from all eight calibrators of table 1 and noise wave parameters as second- Figure 7. Evidence of multiple models are plotted which display the Occam's cliff. Data is generated using noise wave parameters as order-2 polynomials. We see that for the model with the highest evidence, that is, the model favoured by the data, the number of polynomial coefficients matches that of the model noise wave parameters. order polynomials unc = 2 − 3 + 250 K cos = 2 2 + 190 K sin = 3 2 + 8 + 90 K NS = 4 2 + 5 + 1200 K L = 5 2 + 10 + 298 K where is our normalised frequency. Gaussian noise of one unit variation is applied to the calibrator input temperatures as before. The evidences of various models are plotted in Fig. 7 in which an Occam's cliff can be seen peaking at polynomial order two. As expected from the plot, the steepest descent algorithm finds that noise wave parameters modelled as second-order polynomials best describe the data.

Application with realistic noise
To demonstrate the robustness of our pipeline, we conducted selfconsistency checks using empirically modelled data with a more complicated noise model. This data was generated using reflection coefficients of eight calibrators and the receiver measured in the laboratory. These reflection coefficients were then smoothed using a cubic smoothing spline (Weinert 2009) in order to maintain their approximate shape over frequency. The same second-order noise wave parameters detailed in section 3.3 are used with the reflection coefficients to generate our model power spectral densities. Following this, we added of order 1% Gaussian noise independently to the smoothed Γ rec and Γ cal as well as cal to more accurately represent the instrument noise from measurement equipment such as vector network analysers. No noise was added to the calibrator input temperatures. This results in a model that does not match the Gaussian form of our mathematical model as in the previous sections and thus does not demonstrate the features of our pipeline as explicitly, but is more representative of data set expected from measurements in the field. Data for the receiver and the cold load generated using this noise model are shown in Fig. 8.
Using data generated for all eight calibrators with our realistic noise model, the calibration algorithm selects optimal polynomial orders matching those of the model noise wave parameters whose values fall within within 1 of the posterior peak values as shown in Fig. 9. For these higher order tests, we use fgivenx plots which condense noise wave parameter posteriors into samples that can be compared to the model parameter values instead of comparing each individual coefficient (Handley 2018). When this calibration model is used to calibrate an ambienttemperature 50 Ω load, the RMS error between the calibrated temperature and the measured temperature is 8 mK, well within the 1 noise level (bottom right panel of Fig. 9). This level of accuracy is comparable to the 26 mK noise floor estimated for the EDGES pipeline in 2016 (Monsalve et al. 2017).
By individually adjusting each component of noise arising in our realistic noise model, we may determine what kind of noise our calibration algorithm is most sensitive to, as well as calculate the maximum amount of noise permissible for a specified level of systematic feature reduction. These topics are intended to be explored in a future work.

CONCLUSIONS
Here we presented the development of a calibration methodology based on the procedure used by EDGES but with key improvements to characterise reflections arising at connections within the receiver. Our pipeline utilises the Dicke switching technique and a Bayesian framework in order to individually optimise calibration parameters while identifying correlations between them using a dynamic algorithm to be applied in the same environment as the data acquisition. In a comprehensive investigation, we have evaluated our algorithm's interpretation of empirical models of data which have been generated from known noise wave parameters and a realistic noise model. The solution, applied to an ambient-temperature 50 Ω load, produces a calibrated temperature with an RMS residual temperature of 8 mK. Future work for the pipeline regards application of real calibrator data, optimisation of noise wave parameter coefficients through marginalisation techniques and incorporation into an endto-end simulation based on an entire experimental apparatus to better understand error tolerances. The flexibility of the algorithm attributed to our novel approach allows its application to any experiment relying on similar forms of calibration such as REACH (de Lera Acedo et al. 2021), were we intend to use this for in-the-field and on-the-fly radiometer calibration.