Mitigating the effect of 1/f noise on the detection of the HI intensity mapping power spectrum from single-dish measurements

We present and compare several methods to mitigate time-correlated (1/f) noise within the HI intensity mapping component of the MeerKAT Large Area Synoptic Survey (MeerKLASS). By simulating scan strategies, the HI signal, foreground emissions, white and correlated noise, we assess the ability of various data processing pipelines to recover the power spectrum of HI brightness temperature fluctuations. We use MeerKAT pilot data to assess the level of 1/f noise expected for the MeerKLASS survey and use these measurements to create realistic levels of time-correlated noise for our simulations. We find the time-correlated noise component within the pilot data to be between 10 and 20 times higher than the white noise level at the scale of k = 0.04 Mpc^-1. Having determined that the MeerKAT 1/f noise is partially correlated across all the frequency channels, we employ Singular Value Decomposition (SVD) as a technique to remove both the 1/f noise and Galactic foregrounds but find that over-cleaning results in the removal of HI power at large (angular and radial) scales; a power loss of 40 per cent is seen for a 3-mode SVD clean at the scale of k = 0.04 Mpc^-1. We compare the impact of map-making using weighting by the full noise covariance (i.e. including a 1/f component), as opposed to just a simple unweighted binning, finding that including the time-correlated noise information reduces the excess power added by 1/f noise by up to 30 per cent.


INTRODUCTION
Hi intensity mapping, the measurement of unresolved, redshifted 21cm emission from neutral hydrogen, can be a powerful tracer of the formation of Large Scale Structure (Chang et al. 2008;Wyithe et al. 2008;Peterson et al. 2009;Chang et al. 2010;Masui et al. 2013;Bull et al. 2015).Both interferometric and single-dish (autocorrelation) experiments have been developed for the purposes of tomographically mapping the 21cm brightness temperature fluctuations, with each method having its pros and cons (Battye et al. 2013;Santos et al. 2015;Abdalla et al. 2021;Yohana et al. 2021;CHIME Collaboration et al. 2022).Many of the challenges associated with measuring the Hi power spectrum are common to both interferometer and single-dish experiments, for example foreground removal or beam characterisation.There are important systematic ★ E-mail: mirfan@myuwc.ac.za effects that only affect either interferometers or auto-correlation experiments however.A particular example is correlated receiver gain fluctuations, which are an important contaminant for single-dish experiments, but can be reduced and controlled by interferometers.Such fluctuations can be seen as an effective multiplicative term, which is time-correlated over time scales longer than characterised by the 'knee frequency' (   ) and known as either 1/  , pink or flicker noise.The receiver gain fluctuations (Δ/) behind 1/  noise are a phenomenon found in a wide range of electronic systems, not only radio receivers, yet there is still a lack of understanding regarding their origin (Harper et al. 2018;Chen et al. 2020).This 1/  noise can be modelled as an independent random Gaussian term with a power spectrum (  ) for a single frequency channel () given by: where  n is the thermal noise level,  is the Fourier pair of the time axis of a stream of time-ordered data and the knee frequency (   ) determines the frequency at which the 1/  slope (of gradient ) meets the white noise level (Bigot-Sazy et al. 2015).This noise term multiplies the sky temperature measured by the receiver.However, with the onset of intensity mapping experiments intent on measuring a multitude of frequency channels with a single receiver, the possible correlation of the time-correlated noise across different frequency channels must be considered and parameterized (Harper et al. 2018).In this work we build on this formalism by considering the impact of frequency-correlated 1/  noise on the Hi 3D power spectrum, with specific applications to single dish observations using the MeerKAT telescope (Jonas & MeerKAT Team 2016).The findings are crucial to guide a future large survey with MeerKAT: MeerKLASS (Santos et al. 2016), aiming to directly measure the anisotropic 21cm power spectrum at ∼ degree scales and above, over a wide redshift range.Moreover, MeerKLASS is a pre-cursor to the 21cm intensity mapping survey which will be conducted with the SKA-mid dishes and as such the findings in this work hold relevance for determining the optimum survey and data reduction strategies for 21cm intensity mapping with SKA-mid.A calibration pipeline was developed and applied to pilot observations from 2019 data (Wang et al. 2021), which have already been used to obtain a cross-correlation detection of the Hi power spectrum (Cunnington et al. 2022).In this paper, we use observational measurements of receiver properties alongside actual scan strategies to produce realistic simulations of thermal and 1/  noise.These simulations help to determine the optimal configuration for the SKA-mid dishes in terms of scan strategies, time ordered data (TOD) reduction and map-making techniques to mitigate 1/  noise.
The choice of scan strategy plays a pivotal role in the reduction of systematic noise.For 1/  noise the scan strategy (pattern and speed) determines the number of correlated data samples to be binned together into a single map pixel, whilst any changes in elevation during the scan effects the stability of the ground emission pick-up detected by the receiver.Tegmark (1997a) compares scan strategies in the context of the cosmic microwave background plus 1/  noise and determines that the optimal observational method is one which maximises the independent cross-linking between data samples.This can be done in practice by using a 'fence' technique where regions close to each other on the sky are measured by the telescope across a time period shorter than the knee frequency, or by combining the data samples close to each other on the sky that have been measured by completely separate, and so independent, observation blocks.In this work we shall investigate two strategies; one with very little and one with moderate cross-linking.Whilst it is also possible to increase the number of data samples measured with uncorrelated noise properties by increasing the scan speed, we leave the simulation scan speed set to 5 arcmin per second.This value was chosen for the MeerKLASS pilot survey as it ensures that the pointing only changes by 10 arcmin within the 2 second integration time, which is sufficiently small compared with the primary beam size (approximately a degree at 1400 MHz) that it does not introduce significant smearing.
In terms of data reduction, several options for the mitigation of 1/  noise are available.As this time-correlated noise manifests as multiplicative gain drifts in the TOD, it can in principle be calibrated out.Some single dish receivers make use of noise diodes to inject a constant and stable signal into the TOD, which can be used as an internal calibrator (Boothroyd et al. 2011;Jones et al. 2018;Hu et al. 2021).It is also possible to use a component separation technique, such as Singular Value Decomposition (SVD), on the TOD.Such methods can be used to decompose a matrix into components which are independent from each other.As the 1/  noise displays very different properties across frequency and time to the Hi signal, Li et al. (2021) have shown it is possible to use an SVD to help separate the two signals from each other.Li et al. (2021) use SVD cleaning on MeerKAT observational data of the South Celestial Pole and find that a 2-mode subtraction is enough to lower the knee frequency from around 0.1 Hz to 0.003 Hz.A final possibility, which we investigate in this work, is the level of 1/  mitigation that can be achieved from map-making using a weighting informed by the noise covariance matrix, as opposed to just a straightforward averaging (Tegmark 1997b;Sutton et al. 2010).
As already mentioned, 1/  noise has previously been investigated in the context of intensity mapping simulations.In particular, Bigot-Sazy et al. (2015) made an analysis considering a component completely correlated across frequency while Harper et al. (2018) considered noise partially correlated across frequency and its impact on the angular power spectrum in the context of an SKA wide survey.This work was further expanded in Chen et al. (2020) to analyse the impact on cosmological parameters through Fisher forecasting, using an analytical model of the 1/  noise angular power spectrum.We build on these previous works and expand them with a detailed analysis of 1/  simulations in order to try to clarify the impact the 1/  noise will have on SKA-mid observations.In particular, we construct a realistic model of frequency-correlated noise using observational data from MeerKAT receivers taken during pilot surveys for intensity mapping.We then simulate the impact that both map-making techniques and proposed observational scan strategies have on the measurement of the 3D Hi power spectrum (both in its cylindrical form and spherical average).Finally, we test how cleaning methods can alleviate the 1/  noise and check for possible signal loss using the cross correlation power spectra.In this context we present a simulation pipeline which will be useful for future works where realistic noise and scanning strategies are required.
The paper layout is as follows: in Sect. 2 we outline the simulation pipeline and in Sect. 3 we assess how SVD cleaning the TOD impacts the magnitude of the 1/  noise within the 1D temporal power spectrum.The temporal power spectrum allows us to view the 1/  noise and white noise power levels as distinct from each other due to their different regions of dominance in  .Sect. 4 details the map-making step and in Sect. 5 we present the effect that SVD cleaning has on maps contaminated with 1/  noise through the use of power spectra.We conclude in Sect.6.Throughout the paper vectors are capitalised, and matrices are bold and in capitals.

SIMULATION DATA
The simulation data have been created to reflect the working set-up of the MeerKAT L-band (856-1711 MHz) receivers.We simulated data from 60 dishes, as this is typically the number of working dishes available from the full array of 64 for an average observation block.As each dish has two receivers, measuring HH (horizontally polarised) and VV (vertically polarised), this gives 120 sets of simulated data.The frequency resolution was set to that of the L-band receivers (0.2 MHz) and we chose to simulate 1000 data channels starting from 971.2 MHz, as Wang et al. (2021) found significant satellite radio frequency interference under 970 MHz.The L-band receivers have a frequency-dependent beam which, when approximated as a Gaussian primary beam, decreases in full width half maximum (FWHM) from 1.6 summarised in Table 1.The MeerKAT dishes can be set to follow numerous scan patterns; in this work we consider the Horizontal Raster Drift (HRD) and Random Pointing strategies.These scans will be divided into observing blocks of the same size in time, covering the same region sky.

Horizontal Raster Drift
The HRD strategy is currently the survey mode being used to obtain MeerKAT data for Hi intensity mapping.It entails holding each dish at a constant elevation and scanning back and forth in azimuth, and is motivated by keeping the elevation-dependent power contributions from ground spill and atmospheric emission close to constant.In this strategy, scan lines from different blocks are made to cross so as to improve the overall relative calibration.A scan speed of 5 arcmin per second is used.Note that we do not consider here the impact of this relative (or redundant) calibration in the 1/  noise.Removing the 1/  noise through improvements in calibration through this crossing or noise diode injection will be left for future work.Moreover, as discussed later, the calibration in between observing blocks effectively removes the 1/  time correlations between blocks so that the lines crossing from different blocks should have little impact on improving map-making.This also means that we can effectively suppress the 1/  contribution by cross-correlating different blocks or dishes.In this instance, improvements in scanning strategies or map-making will not impact the final power spectrum on average.Instead, they will help to reduce the variance of the measurement.
The terrestrial contributions to the total system temperature are the receiver temperature, the atmospheric temperature and the ground emission pick-up.The receiver temperature is expected to change over frequency but to remain constant over long timescales with a magnitude between 6 and 8 K (Wang et al. 2021).We use the receiver temperature models provided by the MeerKAT office, which are a function of dish, polarisation and frequency.The atmospheric contribution is the same for each receiver but changes across frequency and elevation.As the HRD scans at constant elevation though, the atmospheric contribution only changes across frequency and is constant in time.We calculate the simulated atmospheric contribution using typical pressure, humidity and temperature values for the MeerKAT site.The ground emission pick-up is different between the HH and VV polarisations and changes over frequency and elevation.As before, the constant elevation of the HRD scans means that we only need to use the MeerKAT office ground emission models in VV and HH across frequency.The combined elevationdependant temperatures contribute between 4.4 and 5.5 K, depending on the frequency channel, to the total emission temperature.The HRD strategy, therefore, has a combined terrestrial temperature of around 10-14 K which varies across frequency and receiver but is well modelled as constant in time.
As the 1/  noise is a multiplicative factor, the magnitude of its fluctuations ( 1/  ) scale with the total system temperature at each frequency: where and  sys,m is the measured system temperature.The only temperature contribution which changes in time for the HRD comes from the sky, namely the combined Hi , diffuse Galactic and extragalactic point source emission temperature contributions.When it comes to map-making for the HRD strategy, the temperature averages (across time) at each frequency channel are subtracted from the TOD to remove the terrestrial contributions to the measured system temperature.This is a common strategy in data analysis (removing a constant offset in time), which can also be accomplished through the SVD cleaning.

Random Pointing
For the Random Pointing strategy we first divide the area of sky we wish to observe into a hexagonal close packed grid of R.A and declination values with a spacing of 0.35 • .For the first 90 minute observation block we select a random (R.A, declination) pair from the grid and observe it for 30 seconds, then we select another random coordinate pair and slew to the next location.The MeerKAT dishes can slew at a speed of 2 • /s in azimuth/elevation so the slew time is calculated using the difference between one pointing and the next in the azimuth/elevation reference frame.If the difference in azimuth is larger than the difference in elevation then the azimuth difference is used to calculate the slew speed, while if the elevation difference is larger then that is what is used to calculate the slew speed.New pointings continue to be selected at random from the total grid until the 90 minutes observational time is used.
The observed (R.A, declination) pairs are then removed from the grid options and the process is repeated again for the rest of the observational blocks.This random pointing is a closer representation of the strategy currently employed by the MeerKAT telescopes in interferometric mode, in terms of following a schedule of tracking observations.However, with just 30 seconds per pointing, the overheads in slew time will be quite large for MeerKAT, so this strategy is highly inefficient.Longer tracking times, would be more efficient but increase the 1/f noise dramatically.
The simulated receiver temperatures used for the random strategy are identical to those used for the HRD.The elevation-dependant contributions, however, change over time for the random strategy, unlike for the HRD, as the dish elevation changes with each pointing.The pointing for each random observation block is used to provide the elevations necessary to calculate the atmospheric and ground spillover temperatures from the MeerKAT office models.These additional fluctuations,  el (), can be up to 1 K and so  1/  ×  el () will not be negligible and need to be included in the TOD.The MeerKLASS data reduction pipeline contains models for the elevation-dependant temperature contributions which we subtract from our empirical data (Wang et al. 2021).For these simulations we assume these models are perfect and so we completely subtract the time-varying, elevation-dependant temperatures at each frequency channel, after the TOD is generated but before map-making the random survey.The aim of this paper is to investigate the effect of 1/  noise on the Hi power spectrum, therefore this work will not deal with the impact of incorrect ground and atmospheric emission modelling.
A random pointing strategy would make using the interferometric visibilities a straightforward task.As it stands, the HRD scan strategy complicates this matter but not to the extent that it is no longer achievable; future papers will demonstrate the ability of MeerKLASS to measure simultaneously in auto-and crosscorrelation mode.We do not consider more scanning strategies as these two sample what we believe is reasonable for a MeerKAT intensity mapping survey.A more complicated movement of dishes on the sky would improve the crossing between lines within the 1/  noise time scales but would severely affect the ground pick up due to strong elevation variations.On the other hand, an even more "classic" survey where we spend more time tracking each point, would accumulate 1/  noise on the scales of interest beyond any acceptable level.
The simulated HRD/random pointing data consist of 16 observation blocks, each of which has a 90 minute duration.The 90 minute time duration was chosen as the temperatures of the MeerKAT LNAs have been shown to be stable over this time period Wang et al. (2021).16 observation blocks provides a high enough signal-to-noise ratio between the Hi signal and the instrumental Gaussian noise level.
Figure 1 shows the sky coverage for each strategy; the TOD samples are recorded every 2 s for both scanning strategies but whilst the HRD strategy changes position on the sky with each sample, the random strategy observes each position on the sky for 30 s (15 hits) before slewing to a new location.In order to assess the ability of SVD cleaning to reduce the impact of 1/  noise on the Hi cross-correlation power spectrum measurement we also needed to simulate the other frequency-correlated emissions expected to contribute to the total system temperature: Galactic and extragalactic foregrounds.We use the Global Sky Model software package1 to simulate the foreground contributions at each frequency channel (Zheng et al. 2017).

Simulated 21cm signal
For the 21cm signal we generate Gaussian realisations of a density field from an input power spectrum.A Gaussian overdensity field δ(k) is generated in Fourier space, populated with realisations of Gaussian random numbers with standard deviation () = √︁ ()/2 vox , where  vox is the number of voxels in the 3D field and a factor of two is required as the numbers are complex.() is the input power spectrum, which we obtain from CAMB (Lewis & Bridle 2002) evolved to a redshift of  = 0.2.Inverse Fourier transforming the field gives a periodic Gaussian realisation of the matter over-densities (x).To produce a Hi field from this we assume some linear bias  HI for the Hi and the mean brightness temperature  HI .The Hi temperature fluctuations are then given by  HI (x, ) =  HI () HI ()(x, ) . (4) For the bias we assume  HI = 1 and we follow Switzer et al. (2015) for the brightness temperature of the Hi, modelled by (Chang et al.

2008)
HI () = 0.39 mK Ω HI () We assume Ω HI = 6.2 × 10 −4 .In this simplified model of Hi emission, we do not include the effect of redshift space distortions (RSD); leaving this for future work where we can fully investigate the effect that signal loss along the line of sight () has on measuring RSDs.
The simulated Hi and foreground signals were convolved with an estimate for the MeerKAT beam.The convolution assumes a Gaussian beam where the beam FWHM changes across frequency as since the MeerKAT L-Band receivers are measured to have a FWHM of 1.2 • at 1280 MHz (Knowles et al. 2022).Once more, no beam fluctuations were considered here in order to not complicate the understanding of the 1/  noise effects.

Simulated noise
White noise was generated for each receiver using a Gaussian distribution centred at zero.The white noise r.m.s level for each receiver is calculated using the receiver equation: where  and  are the integration time and spectral resolution, respectively.Each dish has two linearly polarised receivers which are averaged together in order to form Stokes I at the map level.
Therefore the white noise level in the maps can be calculated using Equation 6 with an additional factor of √ 2 in the denominator.The 1/  noise was generated using the 2D 1/  simulator described in (Li et al. 2021).The 1/  noise in an individual receiver channel is correlated across the timestamps and simulated to have a temporal power spectrum which deviates from the constant power displayed by simple white noise in the form of a power law.This power law is characterised by the  parameter and the knee frequency (   ), which is where the 1/  slope meets the white noise power level.The 1D Fourier transform of one TOD channel gives the noise power as a function of  which is the reciprocal space of the time intervals.The 1/  noise across frequency channels may also be correlated (Harper et al. 2018) i.e. the time-correlated noise is correlated over frequency as well.In this case the 2D Fourier transform is required to capture the full 1/  noise power information; the 1/  correlations in frequency are a function of  which is proportional to 1/ frequency intervals between the different channels.We have chosen to model the spectral correlations using a power-law as well.The parameter  quantifies how strongly correlated across frequency the 1/  noise is, ranging from not correlated at all ( = 1) to fully correlated across the entire frequency range ( → 0).The 2D power spectra is calculated as: where with  0 = (  ) −1 , and   is the total number of frequency channels.Sect.A details the measurement of the ,  and   parameters used within this simulation.
In order to simulate the 1/f noise, we start by considering the 2D power spectrum which is converted to Fourier coefficient amplitudes: These coefficients just contain amplitude information about the 1/  noise power; they have no phase information.The Fourier transform of real TOD produces a complex vector with both real and imaginary parts, therefore we need to add phase information to our amplitudes before performing the inverse Fourier transform.We add randomly generated phase information by multiplying the amplitudes by a unit amplitude Gaussian vector of complex numbers.Finally an inverse Fourier transform, enforcing that the resulting TOD should be real, can be used on the total (amplitude plus phase information) coefficients to produce the total noise timestream.These noise realizations are done independently for each receiver, dish and scanning block.While receivers and dishes should have statistically independent quantities, the 1/  noise could in principle be correlated across time scales much longer than the duration of each scan.However, in observations, the gains are fully calibrated at the beginning and end of each scan, which should calibrate out any large 1/  noise fluctuations and remove the very long time scale correlations.

QUANTIFYING THE LEVEL OF 1 / f NOISE AFTER SVD CLEANING
In this work we aim to investigate the effect that SVD cleaning has on the 1/  in both the TOD and the final maps.The basis of an SVD is that any real matrix A can be understood as the composite of two real orthogonal matrices U and V: where A has dimensions  × , U has dimensions  × ,  has dimensions  ×  but only has values (the singular values of A) on the diagonal and V has dimensions  × .For the TOD in a single observation block we have a two dimensional array of temperature samples at each frequency.The covariance matrix is calculated by taking the average over time of frequency-frequency pairs (or over pixels in the map).Then the Eigen-decomposition is performed on the frequency-frequency covariance matrix.
To quantify the effect of the differing 1/  noise levels (due to SVD cleaning) on the TOD we compare the Fourier transform of the various simulations.As, in this section, we are only interested in the relative level of the 1/  noise to the white noise level we compare TOD which only contain noise components, e.g.white noise and 1/  fluctuations only, no Hi emission and no foregrounds.In Figure 2 we plot the 1D power spectra of the TOD after subtraction and division by the mean (see Equation A1).If the TOD only contain white noise fluctuations, Li et al. (2021) demonstrated that the power for the normalised TOD will be a flat power spectrum at a level proportional to 1/.Although the simulation frequency resolution is 0.2 MHz, to match the frequency resolution chosen for the MeerKLASS project, we chose to bin the simulated TOD channels into groups of 5 channels, giving a practical frequency resolution of 1 MHz.
In Figure 2 we show the temporal power spectrum for the HRD simulation.Binning our data to give a frequency resolution of 1 MHz reduces the number of channels from 1000 to 200; the temporal power spectrum is calculated for each channel and then the mean power across the 200 channels is plotted, with the standard error on the mean giving the error bars.The blue curve shows the white noise power level at 10 −6 Hz −1 .Light pink and orange denote the 1/  power slope for the TOD that have been SVD cleaned by removing the first 5 principal modes and the first 2 principals modes, respectively.The green curve shows the raw level of 1/  noise expected in the TOD.It is clear that an SVD clean of the TOD decreases the 1/  noise power.We did not attempt to remove more than 5 SVD modes, despite the fact that the 1/  power is still larger than the white noise power level after a 5-mode clean, as we were concerned about possible over-cleaning i.e. removing the cosmological signal as well as correlated noise.The problem of over-cleaning is discussed further in Sect. 5.
The second x-axis shows the angular distance covered by the telescope as the scan progresses.If the scan strategy just drew a straight line on the sky this would also be the angular scale covered and so fluctuations at the Baryon Acoustic Oscillation (BAO) scale, around 1 deg, would see little impact from 1/  noise.However, as the HRD scan reverses direction after 18 degrees in azimuth, it re-visits points close together on the sky, so that this scale of angular distance is not equivalent to an actual angular scale on the sky.Noise measurements less than ∼ 20 s apart can be seen to be independent.The typical   value for the raw level of 1/  noise is 5 mHz suggesting a typical timescale of 200 s for 1/  noise dominating the total noise contribution.However, even if 1/  noise is not dominant its contribution is not less than 1 per cent of the white noise level until ∼ 1/10 th of 200 s, assuming an  value of 1, as the 1/  and white noise combine in quadrature.The choice of scan strategy has no impact on the power levels measured from time-ordered data meaning that both the HRD and random strategy produce almost identical power spectra and so we choose to only plot the HRD results in Figure 2.

MAP-MAKING
We now discuss the impact of map-making algorithms on the final 1/  noise properties of the power spectrum.The simulated TOD frequency channels are again first binned across 5 channels, thus increasing   to 1 MHz and reducing  ch from 1000 to 200.We apply our map-making at each 1 MHz frequency channel independently of the other channels.Even though our simulated 1/  noise is correlated in frequency, this correlated signal can be removed, or at least greatly reduced in magnitude, if a foreground component separation technique such as SVD is applied to the TOD.Therefore the main concern of the map-making is to mitigate biases and reduce errors in the Hi auto-and cross-correlation power spectra caused by the time-correlated noise component in each individual frequency channel.Including frequency correlations in the noise covariance could improve the results but it would make the mapmaking much more computationally heavy (since then we could not do the map-making per frequency) and also it would introduce more inefficiencies due to the uncertainties in the full noise covariance matrix.
Following the approach of Tegmark (1997b); Stompor et al. (2001); Hamilton (2003), we use a noise-informed map-making solution, assuming no prior knowledge of the Hi signal: where the vector of map pixels () is of length   , the vector of TOD ( ) is of length   , the pointing matrix (A) is of size   ×   and the inverse noise covariance matrix, of dimensions   ×   , is used to weight the averaging of the TOD samples into map pixels.
In this paper, map-making is performed for all dishes and 16 observation blocks, so the length   for the TOD samples includes the samples from each dish for all 16 of our observation blocks.

Simple binning
In the case where there is no noise information or the data noise properties are believed to be purely Gaussian, Equation 11 simplifies to just averaging the TOD samples into the appropriate map pixels: In this case all that is required to make a map from the TOD is the pointing matrix.We use the Zenith Equal Area (ZEA) map projection making use of the astropy coordinate library2 selecting a pixel size of 0.458 • (to match the pixel size in a HEALPix3 map of  side 128).The MeerKAT beam size at 971 MHz is almost 1.6 • , giving between 2 and 3 map pixels per beam.

Noise covariance
What we denote as 'map-making' in this work is the use of Equation 11 explicitly including the noise covariance matrix.To estimate the noise vector we use an iterative process which starts from a naive map estimate: and is updated with each map estimate: where the number of iterations used () is 2. In practice we have 16 observation blocks and 60 dishes.Each dish during a single observation block has noise properties independent from all other dishes and observation blocks.Therefore, the noise covariance is calculated per dish, per observation block and then all the temperature samples are simultaneously combined using this weighting and the full (for all observation blocks) pointing matrix.Following Sutton et al. (2009), we use two approximations in order to implement Equation 11 in the least computationally expensive manner.Firstly, in place of calculating a covariance matrix we use the circulant matrix approximation.If the noise covariance matrix is circulant, i.e. each row has identical elements to the row above, just rotated one element to the right, then: where  1n  is the first row of the noise covariance matrix, F denotes the Fourier transform and F ( 1n  ) is the statistical average over all realisations of the noise power spectrum (note that matrix division here corresponds to an element by element division).In practice only one measurement of the noise vector exists per dish and so we fit the 1/  noise model to the single noise power spectrum for each dish (Dupac & Giard 2002).The 1/  noise model in 1D is simply:   (1 + (   /  )  ) and we leave all three parameters free to be fit.This approximation relies on the fact that the noise covariance is a circulant matrix.Although the noise covariance matrix for 1/  noise is not strictly circulant, Tegmark (1997a) detail how it can be decomposed into the sum of two matrices, one circulant and one sparse.Tegmark (1997a) explain how the circulant matrix dominates the noise covariance matrix operations as long as the rank of the noise covariance matrix is far larger than the sample difference over which the correlations fall to zero.For these simulations we are using 16 observation blocks of 90 minutes (where the sample rate is every 2 seconds) and 120 receivers; giving a noise matrix covariance rank of the order of 10 6 .The 1/  noise is correlated for a single receiver and only within the 90 minute observation block, making the noise correlations only hold over a sample difference of order 10 3 .
Therefore, as our noise covariance matrix can be decomposed into two matrices, one of which is sparse and the other of which is dominant and circulant, we can proceed by using the noise power spectrum approximation from Equation 15.Thus making the implemented map calculation differ from Eq. 11 like so: where B = F ( )/F ( Ñ) and D = F (A)/F ( Ñ), where each row of the matrix D is constructed one at a time.The second approximation is used in place of matrix inversion; from Equation 16we can see that the term in the brackets requires inversion.However in place of explicit matrix inversion we use the iterative generalized minimal residual function lmgres in python.
For experiments which aim to measure large regions of sky at high resolution and possibly in more than one Stokes parameter as well, the maximum likelihood method of map-making used in this paper might be prohibitively slow to implement.An alternative method of map-making, which is still capable of obtaining the optimum solution but at a far lower computational expense, called DESCART (Sutton et al. 2009(Sutton et al. , 2010) ) has been proposedspecifically in the context of mitigating the effects of 1/  noise on the measurement of the cosmic microwave background (CMB) angular temperature and polarisation power spectrum.The typical frequency for experiments which target a measurement of the CMB anisotropies is around 100 GHz and at this frequency the CMB is dominant compared to foregrounds, making systematics a more important consideration.DESCART divides the TOD into smaller time periods over which the 1/  noise can be approximated as an additive offset, which can then be calculated and subtracted.Sutton et al. ( 2009) demonstrate a cut of computation time between 5 and 22 times when comparing DESCART to the standard maximumlikelihood map-making method.However, for intensity mapping, we only observe in Stokes I and at low resolution, therefore we choose to implement the full maximum likelihood solution.

RESULTS
In this section both spherically and cylindrically averaged auto-and cross-correlation power spectra are used to understand the impact of the various levels of 1/  noise on the detection of the Hi signal.Spherically averaged power spectra show power as a function of  where  =

√︃
2  +  2  +  2  while cylindrically averaged power spectra show the distribution of the power both perpendicular ( ⊥ = √︃  2  +  2  ) and parallel ( ∥ =   ) to the line of sight.For each scenario, e.g.TOD containing Hi, white noise and the raw 1/  noise, we create 20 realisations in order to plot the mean of the power spectra results.The functions used to calculate the power spectrum from the temperature cubes (RA, declination, frequency) have been taken from FastBox4 .In Figure 3 we present the cylindrically averaged power spectrum for the Hi signal only, which shows large-scale frequency and angular correlations as the power is largest at small  ⊥ and  ∥ values.Both Figure 4 and Figure 5 show the effect 1/  noise has on the cylindrically averaged power spectra.Figure 4 shows the white noise plus 1/  noise level for both the HRD and random survey while in Figure 5, the top panel is the 1/  noise power divided by the white noise power for the HRD scan strategy and the bottom panel image is the same but for the random scan strategy.For the random strategy we remove the perfect time-varying model for the elevation-dependant temperature before using the data.The 1/  power level is shown after division by the white noise level, in order to demonstrate the impact of solely the time-correlated noise component on the overall error budget.On both plots we highlight the region in -space containing the three largest BAO peaks using green contour lines.
The 2D power spectrum for the random scan strategy will always have a fractionally higher white noise power level than the equivalent HRD power spectrum as observation time is lost due to telescope slewing; on average around 15 per cent of the total observation time is lost.For these simulations the HRD white noise power level sits around 15 mK 2 Mpc −3 , while the random survey has a white noise power of ∼ 20 mK 2 Mpc −3 .A lower white noise level is not a motivator for selecting the scan strategy however, as the total observational time can always be increased.For the HRD strategy significant time is dedicated to observing the sky beyond the central rectangular region investigated in this work; in practice this time is not wasted as it can be combined with observations from an adjacent field.
The random pointing strategy means that data points spatially close together could have any possible separation in time between 30 s and 90 minutes and so the 1/  noise is not constrained to lie within large angular scales/small values of  ⊥ .Whereas for the HRD strategy small values of  ⊥ mean longer time scales and so this region of the 2D power spectra is where we see the additional power of the 1/  noise.Both the top and bottom plots show the 1/  noise increasing the power at low values of  ∥ , due to the frequency correlated nature of the 1/  noise.The random scan strategy results in a spreading of the 1/  noise throughout -space, whereas the HRD strategy contains the 1/  noise to within the largest modes.Cylindrically-averaged auto-correlation power spectra of 1/  noise power divided by the white noise power.Top: the HRD scan strategy, bottom: the random scan strategy.We use a logarithmic scale for the colour bars, imposing an upper limit of log 10 (0.5) to highlight regions of 1/  dominance, and denote the area of -space containing the three largest BAO peaks using two solid green contour lines.
At the largest angular scales ( ⊥ ∼ 0.02) the random scan strategy shows a slightly smaller ratio of 1/  noise over the white noise level than the HRD scan, however this is not the case for the majority of the region in -space containing the three largest BAO peaks.
In Figure 6 the spherically averaged 1D auto-correlation power spectra are plotted for several scenarios; in place of the true power level we choose to plot the ratio between the 1/  noise power spectrum for each scenario and the power spectrum of the white noise only.Each point on the power spectra represents the average power for 20 simulations and the error bars are the standard error on that mean.Figure 6 reveals that the raw level of 1/  noise found in MeerKAT receivers would raise the auto-correlation noise power spectrum measurement to around 20 times (depending on the scan strategy used) the white noise level.Map-making can be seen to provide on a average around a 30 per cent decrease in excess power due to 1/  noise, at the worst effected -scales for both strategies; reinforcing the idea that noise-weighted map-making is preferable over a simple averaging/binning of the TOD into maps.The bottom panel of Figure 6 shows the ratio between the 1/  and white noise fraction for the random and HRD strategy.The power = 1 line is highlighted in gray to draw attention to the threshold below which the random survey outperforms the HRD survey.It is clear from this panel that using a random scanning strategy would result in a higher level of 1/  noise systematic noise above the Gaussian noise level for the majority of the region of -space containing the largest three BAO peaks.Although the random strategy outperforms the HRD .Top: Auto-correlation spherically-averaged power spectra for 1/  and white noise ratios, both scan strategies as well as both binned and noiseweighted maps are considered.Bottom: A comparison between the 1/  and white noise ratios for both the HRD and random survey strategy.Scales larger then 0.03 Mpc −1 are not shown due to the increased noise on the power measurements arising from the compact size of the survey regions.Power spectrum ratio between 1/  noise and the white noise maps for the HRD scan data for different frequency bins.The TOD were made into 2D maps at each frequency using the noise covariance matrix method of map-making, as opposed to simple binning.
specifically at  = 0.04 Mpc −1 , this advantage is not only fractional but also not seen for any other scale and so from this point onward we shall only consider the HRD survey results.We also consider how to bin the frequency channels.As discussed in Sect.4, the frequency channels have been averaged over 5 channels, thus reducing the number of channels from 1000 to 200.This frequency averaging reduces the white noise level and so the knee frequency in each channel map and it also reduces the computation time.However, we need to be careful to only bin channels together which have frequency correlated 1/  noise, otherwise we will change the properties of the 1/  noise in the new averaged channel from the properties used as simulation input.In Figure 7 0 we investigate the effect of frequency binning the HRD data on the ability of the noise-weighted map-maker to reduce the ratio between 1/  and white noise.It can be seen that at large  values the choice of frequency binning has no effect on the accuracy of the map-maker as this is the regime where the white noise dominates and white noise can be reduced through simple averaging.At small  values, where the 1/  noise power dominates, increasing the binning number from 2 to 5 (0.4 MHz to 1.0 MHz frequency resolution) shows a worthwhile improvement in the performance of the map-maker.Increasing the binning up to 10 channels (2 MHz frequency resolution), however, largely amplifies the problem of 1/  noise implying that ten frequency channels span too large a frequency range for the 1/  noise properties to be consistent and so the time-correlated noise no longer averages down.Therefore we selected 1.0 MHz as the preferred frequency resolution.
Figure 8 shows the effect of SVD cleaning on the autocorrelation power spectra.At this point we switch to using simulations which contain Galactic and extragalactic foregrounds from the GSM as these contributions are also correlated over frequency and so they will, alongside the 1/  noise, affect the impact of the SVD clean.We show the fractional difference between maps made from data both with and without the presence of 1/  noise.The HRD scan strategy data have been used and converted to maps using the noise-weighted technique.The larger the number of SVD modes removed, the closer the fractional power difference drops towards zero; meaning the smaller the impact of the 1/  noise above the residual foreground and white noise level.However SVD cleaning has also been shown to subtract the actual Hi signal itself.To help assess the level of over-cleaning/ signal-cleaning seen at low , we enlist the help of cross-correlation power spectra.
Figure 9 shows the effect of SVD cleaning on the crosscorrelation power spectra.The cross-correlation approach is probably the most optimal one for a Hi power spectrum detection since it will help to decorrelate systematics.Importantly, it will remove any noise bias, including the 1/  noise since it is independent between blocks.For the power spectra shown in Figure 9, the first eight HRD observation blocks were made into maps and then cross-correlated with the last eight observation blocks made into maps.The power difference ratio between the various power spectra and the pure Hi  Fractional difference between the total and pure Hi crosscorrelation power spectra for the HRD simulation data: binned maps with and without 1/  noise (top panel) and both binned and weighted maps containing 1/  noise (middle panel).The first eight observation blocks were cross-correlated with the last eight.The TOD for the results shown here all contain Hi emission, foregrounds and white noise.For the bottom panel the fractional power difference for the weighted, 3-mode SVD cleaned map containing 1/  noise is replotted against the fractional power difference for the cross-correlation between the first/last eight observation blocks (3-mode SVD cleaned and weighted map-making) and the Hi signal itself.cross-correlation power spectrum is shown: where auto and cross signify the auto-and cross-correlation power spectrum.The power difference ratio value of zero is overlaid onto Figure 9 using a solid gray line to highlight the optimum Hi power spectrum recovery.In this section the SVD cleaning was performed in the map-domain to represent the post-processing typically applied to intensity mapping data cubes.The three dimensional map cube containing the measurements from all dishes and all observation blocks was reduced to a two dimensional array of angular and frequency information in order for the frequency-frequency correlation matrix to be calculated.The Eigen-decomposition was then performed on this correlation matrix.
The top panel of Figure 9 shows the cross-correlation spectra from maps made using simple binning of the TOD.Both power spectra were made from data containing Hi plus white noise plus foregrounds; the difference between the two spectra is the inclusion of 1/  noise.Both power spectra reveal that a 3-mode SVD clean would remove a significant amount of Hi signal as well as the frequency-correlated contaminants, thus biasing the measurement.However, such a bias could be compensated for by a technique such as the transfer function (see Sect. 6 for a discussion on the transfer function).The addition of 1/  noise both increases the error bars on the measurement as well as mixing with the residual foregrounds, which are common to both sets of observation blocks, to raise the power at all scales.This, in-turn, results in a lesser degree of over-cleaning.The middle panel of Figure 9 reveals that at least 3 Eigen-modes were required to be removed in order to reach the Hi power spectrum level.Any fewer than three and there was residual power left over from the foreground emissions at medium scales.In the bottom panel of Figure 9 we show the fractional power difference for the cross-correlation between the first/last eight observation blocks and the pure Hi signal itself.For this case the 3mode SVD cleaned, weighted maps (contaminated with 1/  noise, foregrounds and white noise) were chosen to perform the crosscorrelation.The fractional power difference for the cross-correlation between the cleaned maps and the Hi signal are comparable to the fractional power difference for the cross-correlation between the first and last eight observation maps; which we plot again in the bottom panel to illustrate this point.The cross-correlations with the pure Hi signal, however, can be seen to result in fractional difference power spectra with smaller errors bars as only one of the data sets involved in the cross-correlation for these spectra contain contaminants (residual 1/  and foreground emissions remaining after the SVD clean).
Regardless of the method of map-making used we see around a 40 per cent signal loss at large scales for a 3-mode SVD clean.This is consistent with the simulations analysis performed by Bigot-Sazy et al. (2015), where it was found that for 1/  noise which is perfectly correlated across frequency channels an SVD clean of 2modes or more could remove all the excess power with a signal loss of only around 5 per cent.However if there is spectral structure to the Hi contaminants, for instance 1/  noise which is not perfectly correlated ( > 0), then the SVD clean struggles to remove the contaminants without a large loss in the measured Hi signal.Bigot-Sazy et al. (2015) simulate perfectly correlated 1/  noise but then add in a non-smooth instrumental band-pass and find a Hi signal loss of up to 66 per cent, depending on the magnitude of the bandpass perturbation.

CONCLUSIONS
This work has used simulated data to investigate the impact that scan strategy, SVD cleaning and noise-weighted map-making can have on the measurement of the Hi power spectrum from TOD containing 1/  noise.The simulations used reflected the parame-1/  noise in MeerKLASS 11 ters of the MeerKLASS Hi intensity mapping survey in terms of receiver temperature, spectral resolution and integration time.The magnitude and frequency correlation of the simulated 1/  noise used in the simulations was a realistic representation of the level of 1/  noise expected for the MeerKLASS data, as it was fitted from empirical 2019 pilot survey data.The discussion of whether or not to include noise diode firing events to calibrate out our 1/  noise is left for future work as the MeerKLASS calibration approach is still being honed.
MeerKLASS pilot data has so far been obtained using the HRD scan strategy, which scans the sky at 5 arcmin per second.This strategy was initially selected as it involves keeping the telescope at a constant scanning elevation; a useful tactic for ensuring minimum variation of the level of ground emission pick-up.Another common strategy is simply to integrate over specific telescope pointings for a longer time, this would entail moving the telescope in both azimuth and elevation.Through the use of these simulations we have shown that the HRD strategy is optimal for covering a large area of sky, without losing time samples to telescope slewing time, within a set observation time and restricting the effect of 1/  noise to the lowest  values on the spherically averaged power spectra.This conclusion was also clear from Figure 5 which only considered the additional power that the raw 1/  noise provided.In these 2D plots the random strategy was seen to spread the 1/  noise throughout the whole of -space, while the HRD strategy contained it to large k-modes which are the modes most likely to be contaminated by Galactic foregrounds anyway.Future simulations will need to include contaminants other than the 1/  noise and foregrounds, such as resdiual radio frequency interference for example.In this work we considered the change in system temperature caused by changing observational elevation, which the random scan strategy does; however, we assumed a perfect modelling and removal of this elevation-dependant contribution.
The TOD can be averaged into maps using either a simple binning or through finding a maximum-likelihood solution using the full noise covariance matrix.We implement both these techniques to assess whether the full noise analysis is a worthwhile strategy to help mitigate 1/  noise.While our 1/  noise is partially correlated across frequency, we anticipate that the empirical data processing pipeline will involve some degree of foreground cleaning aimed at removing frequency-correlated signal.Therefore we limit our map-making considerations to a per-frequency basis e.g.we use the full noise covariance matrix at a single frequency to make the 2D map at that frequency and then repeat this process for each frequency channel.Each of the 16 observation blocks, combined to cover the full RA and declination range in the map domain, have independent systematic noise properties to each other.Therefore, noise-weighted map-making for each individual frequency channel is only beneficial when two or more point samples from a single observation block are combined into a single map pixel.Fortunately it is not particularly computationally expensive for us to implement the full maximum likelihood solution, and in doing so we see a reduction in the contamination of the Hi power spectra by 1/  noise.This reduction is most noticeable, at around 30 per cent, for the raw levels of 1/  noise.
As the MeerKLASS 1/  noise is partially correlated across frequency, component separation techniques which exploit frequencyfrequency correlations can be employed to help remove some of the additional power added by this systematic.We included Galactic and extragalactic foregrounds, courtesy of the GSM, to assess the level of SVD-cleaning required.However, it should be noted that, after residual ground pick-up, residual RFI and calibration errors are included we are very likely to require a higher number of SVD modes to be removed from the data (Cunnington et al. 2022), hence the 1/  noise should no longer be a dominant systematic.
The main problem of SVD cleaning frequency-correlated contaminants is that, at the largest scales, part of the Hi signal is also removed.This over-cleaning can be seen in the cross-correlation power spectra presented in this paper.One solution for empirical data which has been foreground cleaned is to use the transfer function technique (Masui et al. 2013;Switzer et al. 2013;Wolz et al. 2022;Cunnington et al. 2023), which aims to quantify the level of over-cleaning through simulations and then correct for it.1/  noise, however, is unique as a frequency-correlated contaminate as it can actually be calibrated out of the TOD.Having focused in this work on finding the optimum scan and map-making strategies for MeerKLASS, future work will target the mitigation of 1/  noise through data calibration.un-calibrated data from the MeerKAT pilot survey; receiver m002.The sky signal has been isolated using the cross-correlation between receivers m000 and m0002 and removed from the m0002 data.Middle: An example waterfall plot of the normalised simulated 1/  plus white noise data for a single receiver.Bottom: An example waterfall plot of the normalised simulated white noise data for a single receiver.

Figure 1 .
Figure 1.The region of the sky as covered by the HRD (top) and random (bottom) scan strategy for a single observation block (left) or for all 16 observation blocks (right).The purple rectangle denotes the region selected for use in this analysis.

Figure 2 .
Figure2.The temporal (1D) power spectrum of the HRD scan strategy TOD with 1/  noise at the raw and SVD cleaned levels for one observation block and one receiver.The white noise power level is also shown.

Figure 3 .
Figure3.Cylindrically-averaged auto-correlation power spectrum for the Hi signal only.We impose a maximum limit of 30 mK 2 Mpc 3 to avoid saturation from the dominant scales.

Figure 4 .
Figure 4. Cylindrically-averaged auto-correlation power spectrum for the white noise plus 1/  signal only.We impose a maximum limit of 80 mK 2 Mpc 3 to avoid saturation from the dominant scales.
Figure5.Cylindrically-averaged auto-correlation power spectra of 1/  noise power divided by the white noise power.Top: the HRD scan strategy, bottom: the random scan strategy.We use a logarithmic scale for the colour bars, imposing an upper limit of log 10 (0.5) to highlight regions of 1/  dominance, and denote the area of -space containing the three largest BAO peaks using two solid green contour lines.

Figure 7 .
Figure7.Power spectrum ratio between 1/  noise and the white noise maps for the HRD scan data for different frequency bins.The TOD were made into 2D maps at each frequency using the noise covariance matrix method of map-making, as opposed to simple binning.

Figure 9 .
Figure 9. Fractional difference between the total and pure Hi crosscorrelation power spectra for the HRD simulation data: binned maps with and without 1/  noise (top panel) and both binned and weighted maps containing 1/  noise (middle panel).The first eight observation blocks were cross-correlated with the last eight.The TOD for the results shown here all contain Hi emission, foregrounds and white noise.For the bottom panel the fractional power difference for the weighted, 3-mode SVD cleaned map containing 1/  noise is replotted against the fractional power difference for the cross-correlation between the first/last eight observation blocks (3-mode SVD cleaned and weighted map-making) and the Hi signal itself.

Figure A1 .
Figure A1.Top: Normalised (subtraction of and division by the mean), un-calibrated data from the MeerKAT pilot survey; receiver m002.The sky signal has been isolated using the cross-correlation between receivers m000 and m0002 and removed from the m0002 data.Middle: An example waterfall plot of the normalised simulated 1/  plus white noise data for a single receiver.Bottom: An example waterfall plot of the normalised simulated white noise data for a single receiver.

Table 1 .
Pertinent simulation parameters for the Horizontal Raster Drift and Random Pointing survey scan strategies.
• to 1.3 • across the 971.2 to 1171.2 MHz frequency range.Pertinent simulation parameters are Auto-correlation spherically-averaged power spectra for the HRD scan strategy, noise-weighted maps.The maps were SVD cleaned and the fractional change in power is plotted for data with 1/  noise (i.e.1/  noise plus white noise plus foregrounds plus the Hi signal) and for data without (i.e.white noise plus foregrounds plus the Hi signal).