Towards $21$-cm intensity mapping at $z=2.28$ with uGMRT using the tapered gridded estimator -- IV. Wideband analysis

We present a Wideband Tapered Gridded Estimator (TGE), which incorporates baseline migration and variation of the primary beam pattern for neutral hydrogen (${\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}$) 21-cm intensity mapping (IM) with large frequency bandwidth radio-interferometric observations. Here we have analysed $394-494 \, {\rm MHz}$ $(z = 1.9 - 2.6)$ uGMRT data to estimate the Multi-frequency Angular Power Spectrum (MAPS) $C_\ell(\Delta\nu)$ from which we have removed the foregrounds using the polynomial fitting (PF) and Gaussian Process Regression (GPR) methods developed in our earlier work. Using the residual $C_\ell(\Delta\nu)$ to estimate the mean squared 21-cm brightness temperature fluctuation $\Delta^2(k)$, we find that this is consistent with $0 \pm 2 \sigma$ in several $k$ bins. The resulting $2\sigma$ upper limit $\Delta^2(k)<(4.68)^2 \, \rm{mK^2}$ at $k=0.219\,\rm{Mpc^{-1}}$ is nearly $15$ times tighter than earlier limits obtained from a smaller bandwidth ($24.4 \, {\rm MHz}$) of the same data. The $2\sigma$ upper limit $[\Omega_{{\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}} b_{{\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}}]<1.01 \times 10^{-2}$ is within an order of magnitude of the value expected from independent estimates of the ${\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}$ mass density $\Omega_{{\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}}$ and the ${\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}$ bias $b_{{\rm H\hspace{0.5mm}}{\scriptsize {\rm I}}}$. The techniques used here can be applied to other telescopes and frequencies, including $\sim 150 \, {\rm MHz}$ Epoch of Reionization observations.


INTRODUCTION
The redshifted 21-cm line from the cosmological distribution of neutral hydrogen (H i) atoms appears as a faint background radiation at frequencies below 1420 MHz.Mapping the spectral and angular fluctuations in the specific intensity of this radiation (Intensity Mapping -IM) holds the potential to quantify the large-scale clustering of the matter distribution (Bharadwaj et al. 2001).The first detection of this signal by Pen et al. (2009) cross-correlated the  ∼ 0.02 IM signal from the H i Parkes All Sky Survey (Barnes et al. 2001) with the optical six-degree field galaxy redshift survey (Jones et al. 2004(Jones et al. , 2005)).Chang et al. (2010) reported a 4 detection at  = 0.8 cross-correlating the Green Bank Telescope (GBT) radio observation and optical DEEP2 Redshift Survey (Davis et al. 2001).Several subsequent works have reported the detection of the IM signal in cross-correlation at  ∼ 1 using both single-dish telescopes (Ma-★ E-mail:asifelahi999@gmail.com † E-mail:somnath@phy.iitkgp.ac.in sui et al. 2013;Switzer et al. 2013;Anderson et al. 2018;Wolz et al. 2022;Cunnington et al. 2023) and interferometers (Amiri et al. 2023).CHIME Collaboration et al. (2023) recently detected the signal in the redshift range 1.8 <  < 2.5 cross-correlating 1000 hours data from the Canadian Hydrogen Intensity Mapping Experiment (CHIME) with the Lyman- forest measurements from the Extended Baryon Oscillation Spectroscopic Survey (eBOSS; du Mas des Bourboux et al. 2020).Ghosh et al. (2011a,b) have attempted to detect the IM signal in auto-correlation with Giant Metrewave Radio Telescope (GMRT; Swarup et al. 1991) data at  = 1.32, and have obtained an upper limit [Ω H i  H i ] < 0.11 where Ω H i and  H i are the H i mass density and the H i bias parameters, respectively.Paul et al. (2023) recently reported the first detection of the 21-cm signal from  = 0.32 and 0.44 in auto-correlation using 96 hours of observation with the MeerKAT interferometer.
The recently upgraded GMRT (uGMRT; Gupta et al. 2017) with the GMRT Wideband Backend (GWB) provides a large instantaneous frequency coverage, which opens up the possibility for IM studies over a large redshift range.A 25-hour observation of the European Large-Area ISO Survey-North 1 (ELAIS-N1) field was performed using the Band 3 (300-500 MHz) of uGMRT in May 2017.Chakraborty et al. (2019) first identified and removed the point sources (having flux densities > 100 Jy) from this data, and used the residual data to characterize the diffused Galactic foregrounds for the entire 200 MHz bandwidth.Chakraborty et al. (2021) (hereafter Ch21) further estimated the 21-cm power spectrum (PS) using the delay space technique.Frequency channels which are flagged to avoid human-made Radio Frequency Interference (RFI) introduce artefacts in delay space, which Ch21 compensated for using onedimensional clean (Parsons & Backer 2009).Ch21 further used foreground avoidance to obtain upper limits [Ω H i  H i ] ≲ 0.1 − 0.24 for several redshifts in the range 1.96 <  < 3.58 at  ∼ 1 Mpc −1 .
The present paper is the fourth in a series of works which have used the uGMRT observation mentioned in the previous paragraph.In these works, we have used the Tapered Gridded Estimator (TGE; Choudhuri et al. 2014Choudhuri et al. , 2016a,b) ,b) which offers several unique features for estimating the 21-cm PS from radio-interferometric measurements.TGE tapers the sky response of the telescope to reduce the contribution from wide-field foregrounds like bright extra-galactic sources located in the side-lobes.On a different note, missing frequency channels pose a significant difficulty in 21-cm PS estimation, and several algorithms have been proposed to deal with this (Parsons & Backer 2009;Trott 2016;Ewall-Wice et al. 2021;Kern & Liu 2021;Kennedy et al. 2023).TGE naturally overcomes this difficulty as it first estimates  ℓ (Δ) the Multi-frequency Angular Power Spectrum (MAPS).Typically, we get the estimated  ℓ (Δ) for all frequency separation Δ, even if several frequency channels are missing.The 21-cm PS, which is estimated by Fourier transforming  ℓ (Δ) with respect Δ, is found to be free of artefacts (Bharadwaj et al. 2019).Pal et al. (2021) have demonstrated the potentials of TGE by applying it on 150 MHz GMRT data where ∼ 50% of the data are flagged.
In Pal et al. (2022), which is Paper I of this series, we have used TGE to analyse a 24.4 MHz sub-band of the uGMRT data mentioned earlier.This sub-band has the central frequency of 432.8 MHz ( = 2.28).The signal in the two polarization states (RR and LL) were combined for the PS estimation and an upper limit of [Ω H i  H i ] < 0.23 at  = 0.347 Mpc −1 was obtained.Continuing with the same data, in Elahi et al. (2023a) (Paper II) we showed that the foreground level is substantially reduced if the PS is estimated by cross-correlating the RR and LL polarizations rather than combining them, and the upper limit was tightened to [Ω H i  H i ] < 6.1 × 10 −2 at  = 0.804 Mpc −1 .Both the above works found a considerable part of the ( ⊥ ,  ∥ ) plane to be foreground-contaminated, and it was necessary to avoid this region to estimate the 21-cm PS.In Elahi et al. (2023b) (Paper III) we have introduced a foreground removal technique whereupon we are able to utilise the whole ( ⊥ ,  ∥ ) plane to estimate the 21-cm PS and obtain an even tighter upper limit [Ω H i  H i ] < 2.2 × 10 −2 at a larger scale of  = 0.247 Mpc −1 as compared to the earlier works.
The earlier works with the uGMRT data discussed above have all been restricted to small sub-bands (< 25 MHz) of the entire Band 3 frequency coverage.While it is desirable to consider the largest possible bandwidths so as to increase the signal-to-noise ratio (SNR) and also access smaller  ∥ modes, it then becomes imperative to account for the fact that the properties of the instrument changes with frequency.Foremost among these is 'baseline migration' i.e. the baseline corresponding to a fixed antenna pair changes with frequency.Further, the primary beam (PB) of the individual antennas also changes with frequency, a fact which affects the normalisation of the estimated 21-cm PS.It is acceptable to ignore these effects over small observational bandwidths, as is the case for many of

Ergodic
Figure 1.This shows the workflow of this paper.
the visibility-based estimators used for the EoR 21-cm PS (e.g., Parsons et al. 2012;Mertens et al. 2018).We note that the earlier implementation of TGE does not incorporate baseline migration, although it does account for the frequency-dependent variation of the PB.In the present paper we introduce the Wideband TGE (hereafter referred to as the TGE) which also accounts for baseline migration, and is thus well suited for analysing large frequency bandwidths.
In this paper, we present the analysis of 100 MHz bandwidth data which span the frequency range 394 − 494 MHz corresponding to the redshift band  = 1.9 − 2.6.Although the TGE can now be used to analyse the entire 200 MHz Band 3 coverage, we have discarded the rest of the data due to their poor quality.The data is described in Section 2. Figure 1 presents the workflow we have undertaken for analysing the data.We have estimated the MAPS  ℓ (  ,   ) by applying the TGE on the visibility data.The TGE is described in Paper III, and in Section 3 of the present paper we have highlighted the changes we have incorporated in TGE for the wideband IM.For the subsequent analysis, we have assumed that the 21-cm signal is ergodic (statistically homogeneous) along the line-of-sight direction.This allows us to consider the MAPS  ℓ (Δ) instead of  ℓ (  ,   ), where  ℓ (Δ) =  ℓ (  ,   ) =  ℓ (|   −  |).Considering the postreionization H i 21-cm signal, such an assumption is justified because various observations (Prochaska et al. 2005;Noterdaeme et al. 2009Noterdaeme et al. , 2012;;Zafar et al. 2013;Bird et al. 2017) and simulations (Sarkar et al. 2016) respectively indicate that Ω H i () and [ H i ()  ()] (where  () is the growing mode of linear density perturbations) are nearly constant for our observational redshift interval.
We remove the foregrounds from  ℓ (Δ) using the methodologies developed in Paper III.The methods are briefly described in Section 4 and Paper III is referred to for the details.After foreground removal, we use the residual [ ℓ (Δ)] res to obtain cylindrical PS ( ⊥ ,  ∥ ), spherical PS () and the cosmological H i abundance [Ω H i  H i ] which are described in Sections 5, 6 and 7, respectively.We summarize our results and takeaways from this analysis in Section 8.
The paper uses the same values of the cosmological parameters used in our earlier works, where the values are taken from Planck Collaboration et al. (2020).

DATA DESCRIPTION
The details of the observation and initial data pre-processing are described in Chakraborty et al. (2019).A brief summary of the major steps can be found in our earlier works (Paper I, Paper II and In the initial data analysis (Chakraborty et al. 2019), 5 MHz from the edges of the bandwidth and 90% of the data in the frequency range 300 − 390 MHz are flagged due to RFIs using the software aoflagger (Offringa et al. 2010(Offringa et al. , 2012) ) and the rflag routine of casa.We have entirely discarded these in this work.Considering the rest, we have split out 100 MHz bandwidth data in the frequency range 394 − 494 MHz for the present analysis.Initially the data had ∼ 64% flagging and a visibility r.m.s. of 0.54 Jy.The blue curves in Figure 2 show the flagged percentage (left) and the r.m.s. of the real part of the visibility data (right).We see that the flagged percentage varies significantly across the frequency band, where it is < 60% in the frequency range 430 − 470 MHz and ≈ 80% at the two ends of the band.As for the r.m.s., we see that it is nearly constant (0.54 Jy) across the band except for the frequency range 470−494 MHz, where the r.m.s. is somewhat higher (0.6 − 0.8 Jy).
The leftmost panel of Figure 3 shows the probability density function (PDF) of the real and imaginary parts of the visibility data.We see that the central part of the PDF matches with a Gaussian distribution of mean  ≈ 0 Jy and standard deviation  ≈ 0.53 Jy.However, the PDF has large positive and negative values which are not predicted by the Gaussian distribution.We have identified and flagged these large outlier values by considering the median absolute deviation (MAD), which for a given data is defined as the median of the absolute deviations from the median of the data1 (see, e.g.Prasad & Chengalur 2012).For each baseline, we have calculated the MAD and flagged the frequency channels where the visibility amplitude had values significantly larger than the MAD value.The middle panel of Figure 3 shows the PDFs when the visibilities having amplitude larger than 5×MAD are flagged.We find that the large outlier values are flagged by the 5 × MAD filter.However, some outliers still remain in the data which are flagged by the 3 × MAD filter (right panel).We have finally used the data after applying the 3 × MAD filter for which the PDF of the visibility data closely matches with a Gaussian distribution with   = 0.33 Jy that gives an estimate of noise in the visibility data.The change in the flagged data percentage and the r.m.s. of the visibility data due to different thresholds of the MAD filter are shown in Figure 2.Although the fagged percentage goes up when we apply a 3 × MAD filter, we have the advantage that the noise level () drop by ∼ 62% .Further, the visibility statistics is now consistent with a Gaussian distribution, which is expected in our data that is noise dominated.
After applying the 3 × MAD filter, we have carried out IM analysis on the 100 MHz bandwidth data, which span the redshift range  = 1.9 − 2.6.The data have   = 4096 frequency channels where the central channel has the frequency   = 444 MHz (  = 2.2).We note that the present analysis is restricted to the dense and nearly uniform baseline range U ≤ 500  (Paper I).

THE WIDEBAND TGE
The Tapered Gridded Estimator (TGE) which we use here estimates the MAPS  ℓ (  ,   ) from the measured visibility data (Bharadwaj et al. 2019;Pal et al. 2021).In the present work, we use the 'Cross' TGE, which cross-correlates the visibilities measured in the RR and LL polarization states.The formalism of the Cross TGE is detailed in Paper II (also Paper III), and here we briefly summarize it to highlight the changes which we have incorporated specifically for the wideband analysis.
We have introduced a rectangular grid in  space, and the first step in TGE is to calculate the convolved-gridded visibility at each grid point U Here V   (  ) denotes a visibility measured at a frequency   , polarization state  and the baseline U  = (  /) × d  where d  refers to the antenna separation in length units.Regarding a fixed antenna pair, the fact that the baseline U  scales with frequency   (baseline migration) is a feature that we have specifically incorporated here for the wideband analysis.We note that this was not incorporated in the TGE versions used for the earlier works.The other terms retain the same meaning as in our earlier work;    (  ) is 0 if the frequency channel is flagged and it is 1 otherwise, and w(U) is the Fourier transform of the tapering window function W () =  −  2 /[   0 ] 2 in which we have used  = 0.6 for the present work (following Paper I).
After accounting for baseline migration, we cross-correlate these convolved-gridded visibilities V   () and V   () in exactly the same way as done in Paper III (equation 3).We normalize the estimator using simulated visibilities corresponding to the Gaussian random field with  ℓ (  ,   ) = 1.The simulated visibilities also incorporate the baseline migration and the frequency-dependent variation of the PB pattern of the telescope.We refer to Section 3.1 of Paper III for the details.
We use the TGE to estimate  ℓ (  ,   ) at each grid point U  = ℓ  /2.Following the notation of Paper III, here we use the vector ℓ to refer to a grid point.The 21-cm signal is isotropic on the sky plane, and for the signal  ℓ (  ,   ) =  ℓ (  ,   ), where ℓ =| ℓ | corresponds to an angular multipole.However, the measured  ℓ (  ,   ) are expected to be dominated by foregrounds, which are different for different ℓ.We found that it is beneficial to remove the foregrounds from each ℓ independently and impose the isotropy of the signal only after foreground removal.Further, the post-EoR 21cm signal is assumed to be ergodic (statistically homogeneous) over a significantly large redshift range (Sarkar et al. 2016;Noterdaeme et al. 2012;Zafar et al. 2013;Rhee et al. 2018).This ergodicity along the frequency axis allows us to assume  ℓ (  ,   ) to be stationary, i.e.,  ℓ (  ,   ) can be entirely described by the  ℓ (Δ), where Δ =|   −   |.We use the estimated  ℓ (Δ) in the subsequent sections.
The system noise present in the visibility data contributes to the statistical fluctuations of the measured  ℓ (Δ).To estimate this, we have simulated multiple (50) realizations of visibilities containing Gaussian random noise with zero mean and variance  2  where   is the r.m.s. of the actual visibility data.We have applied TGE on each noise-only simulation, and used these to estimate the system noise contribution to the r.m.s.statistical fluctuations of  ℓ (Δ).

FOREGROUND REMOVAL
In this section, we first briefly describe the foreground removal techniques introduced in Section 3 of Paper III and show the results from the present analysis.The idea is based on the distinctly different behaviour of the 21-cm signal as compared to the foregrounds.The signal is expected to be predominantly localized within a typical Δ range of 0 − 0.5 MHz, and its amplitude drops substantially and is close to zero at large Δ (Bharadwaj & Ali 2005;Bharadwaj & Sethi 2001).On the other hand, the foregrounds generally vary smoothly and remain correlated over a large Δ range.Following Ghosh et al. (2011a), we adopt a two-step procedure to remove the foregrounds from the estimated  ℓ (Δ).First, we identify a range Δ > [Δ] where the 21-cm signal is predicted to be negligible, and model  ℓ (Δ) in this range as a combination of foregrounds [ ℓ (Δ)] FG and noise ( The idea is to use the  ℓ (Δ) measured in the range Δ > [Δ] to estimate the foreground model [ ℓ (Δ)] FG .For the present analysis, we have chosen a fixed value [Δ] = 0.6 MHz and used the range 0.6 < Δ < 3.66 MHz to model the foregrounds.In the second step, we extrapolate the foreground model to predict [ ℓ (Δ)] FG in the range Δ ≤ [Δ] and subtract this out from the measured  ℓ (Δ).
Restricting the subsequent analysis to Δ ≤ [Δ], we expect the residual to be a combination of the 21-cm signal and noise and we use this to constrain the 21-cm signal [ ℓ (Δ)] T .
Figure 4 shows  ℓ (Δ) for a few values of ℓ.We see that the Δ dependence exhibits a smooth, slowly varying pattern superimposed on which we have a combination of rapid and slow oscillations.Our aim here is to use the range Δ > [Δ] to model the smooth, slowly varying Δ dependence.We have also tried modelling the oscillatory components, however, in many cases, we find that the extrapolation introduces extraneous features in the range Δ ≤ [Δ𝜈].Even in the cases where the extrapolation works, including the oscillatory components leads to a very large signal loss.Appendix A provides an example of the consequences of including an oscillatory component in the foreground model.The foreground removal technique described above has been implemented in two different approaches.In our first approach, polynomial fitting (PF), we model the foregrounds using (5) We use maximum likelihood to estimate the best-fit polynomial coefficients  2 and their error covariance, and use these to obtain [ ℓ (Δ)] FG and the foreground modelling errors in the range Δ ≤ [Δ𝜈].Note that we have restricted the value of  to the range 0 ≤  ≤ 8 to model only the smooth, slowly varying component from the measured  ℓ (Δ).
In our second approach we have used Gaussian Process Regression (GPR; Rasmussen & Williams 2006), which is non-parametric, to model and predict the foreground.This was first implemented in Paper III, where the reader can find more details.Note that we have used the george (Ambikasaran et al. 2015) library of python for the entire GPR analysis presented here.In GPR, we model [ ℓ (Δ)] FG as a Gaussian Process (GP) with a zero mean and covariance function (kernel)  FG (Δ  , Δ  ).
The idea is to use the  ℓ (Δ) measured in the range Δ > [Δ] to estimate the kernel, which models the smooth, slowly varying foreground component [ ℓ (Δ)] FG .We subsequently use this kernel to predict We have tried different possible forms for the kernel, and found that the polynomial kernel is well suited for our analysis.Here, the constants  1 and  are hyperparameters whose optimal value have been determined by maximizing the log marginal likelihood of the GP, and , which is not a hyper-parameter, denotes the order of the polynomial kernel.The polynomial kernel is found to model the overall smooth Δ dependence expected of the foregrounds and causes a minimal signal loss.
We have considered different values of  and found that larger values provide a better fit to the foregrounds in the range Δ > [Δ] but at the cost of a larger signal loss.Considering this, we have used  = 2 for the entire analysis.We have presented the results from a selected choice of kernels in Appendix A.
The GPR predicts the foregrounds [ ℓ (Δ)] FG and also the foreground modelling errors in the range Δ ≤ [Δ].For both PF and GPR, we have added the variance of foreground modelling errors to the estimated system noise variance to estimate the total error variance for [ ℓ (Δ)] res .
We find that foreground removal does not work for all the ℓ.In order to identify the ℓ where the foreground removal fails, we have quantified the amplitude of the residual [ ℓ (Δ)] res using a single parameter A whose definition we present below.We have fitted [ ℓ (Δ)] res using the predicted 21-cm signal [ ℓ (Δ)]  (equation 5 of Paper III; also see Bharadwaj & Ali 2005) where   (k ⊥ ,  ∥ ) is the dark matter power spectrum at the wave vector k whose line-of-sight and perpendicular components are We have adopted these values for the final results presented in all the subsequent analyses.
We have separately applied PF and GPR on all the ℓ.We find 43 and 47 unflagged ℓ for PF and GPR, respectively.However, these are largely mutually exclusive, and only 6 of the ℓ are common to both PF and GPR.Considering four representative cases, the four rows of Figure 4 show the results for foreground removal.The first row (from the top) considers a case where both PF and GPR successfully model and remove the foregrounds.The left column shows the measured  ℓ (Δ), along with [ ℓ (Δ)] FG for both PF and GPR.The vertical dashed line shows [Δ], and the middle and right columns, which show [ ℓ (Δ)] res for PF and GPR respectively, are restricted to the range Δ ≤ [Δ].For both PF and GPR, the residuals are found to be scattered around zero.Note that the 2 error bars shown for [ ℓ (Δ)] res combine the contributions from system noise and the uncertainties in the foreground model predictions.The second row considers a case where PF works but GPR fails because it over predicts [ ℓ (Δ)] FG whereby [ ℓ (Δ)] res is consistently below zero (right panel).The third row shows a case where GPR works but PF fails because it under predicts [ ℓ (Δ)] FG and [ ℓ (Δ)] res is consistently above zero (middle panel).The last row shows a case where both PF and GPR fail because they under predict [ ℓ (Δ)] FG , and the values of [ ℓ (Δ)] FG are consistently above zero.We draw the reader's attention to the oscillatory features which are visible in the range Δ > [Δ] for all the panels in the first column.This is particularly prominent in the third row.It is also quite obvious that our foreground model predictions do not match these oscillatory features.As demonstrated in Appendix A, including an oscillatory component in the foreground model increases the 21-cm signal loss, and in some cases it may also worsen the extrapolation in the range Δ ≤ [Δ].Based on this, we have only modelled the smooth, slowly varying Δ dependence and avoided modelling the oscillations.
For the subsequent analysis, we have discarded the ℓ grids where both PF and GPR fail to remove the foregrounds.We have separately analysed three sets, namely PF, GPR and Combined, which respectively contain 43, 47 and 84 ℓ grids.For the 6 grids where both PF and GPR work, we have used GPR, which was found to yield tighter constraints.The | ℓ | values roughly span the range 900 ≲| ℓ |≲ 3100.
Foreground removal is expected to also remove some of the 21-cm signal.Following Paper III where we have taken a very conservative approach, we have applied the foreground removal technique on [ ℓ (Δ)] T the expected 21-cm signal and used this to estimate the loss in the 21-cm signal.Considering the power spectrum, we find that the signal loss is most prominent at low  where it has value ∼ 40 − 50% at  ∼ 0.2 Mpc −1 and ∼ 20% at  ∼ 0.4 Mpc −1 .The (1) PF and GPR both work, (2) PF works, but GPR fails, (3) GPR works, but PF fails, (4) PF and GPR both fail.Column-wise from left to right, the first column shows measured  ℓ (Δ), polynomial fit (magenta) and GPR fit (black).The error bars are the 2 uncertainties in the GPR foreground model predictions.Note that the errors for both the measured  ℓ (Δ) and the PF foreground model predictions are rather small, and these would not be visible in the panels of the first column.The vertical lines (black) are drawn at [Δ ] = 0.6 MHz.The second and third columns show [ ℓ (Δ) ] res in the range Δ ≤ [Δ ] for PF and GPR, respectively.Here, the 2 error bars represent the total errors, which combine the errors due to system noise and foreground modelling, and the gray horizontal lines show zero.Note that the error predictions are scaled up by the factor  Est to account for the excess variance in the data which we have discussed later in the paper.
signal loss is less than 10% at larger .We have corrected for this estimated signal loss in all the results presented in the subsequent sections.
Here, for the purpose of visualising the results (only Figures 5  and 6), we have binned the  ℓ (Δ) values into 5 equally spaced annular bins in the -plane, and we show the binned cylindrical PS ( ⊥ ,  ∥ ).The left panel of Figure 5 shows | ( ⊥ ,  ∥ ) | before foreground removal.Considering the full ( ⊥ ,  ∥ ) plane, the values of | ( ⊥ ,  ∥ ) | are found to be in the range ∼ 10 3 − 10 7 mK 2 .We find that most of the large values | ( ⊥ ,  ∥ ) |> 10 5 mK 2 are restricted to the region  ∥ ≤ 2 Mpc −1 which extend slightly beyond the theoretically predicted foreground wedge boundary (blue dashed line).This region is entirely foreground-dominated. Considering the region  ∥ ≥ 2 Mpc −1 , we find | ( ⊥ ,  ∥ ) | has values ∼ 10 3 − 10 5 mK 2 .It is reasonable to expect that the power in this higher  ∥ region is largely due to noise.
The second panel of Figure 5 shows | ( ⊥ ,  ∥ ) | corresponding to [ ℓ (Δ)] res obtained from foreground removal using PF.Here | ( ⊥ ,  ∥ ) | has values ∼ 10 3 − 10 5 mK 2 throughout the ( ⊥ ,  ∥ ) plane.We find that the power in the  ∥ ≤ 2 Mpc −1 region has re- Here also the results are similar to those for PF, and the entire ( ⊥ ,  ∥ ) plane appears to be free of foreground contamination.Comparing the PS for PF and GPR, we find that GPR results in slightly higher values of | ( ⊥ ,  ∥ ) |.It may however be noted that GPR also has larger uncertainties in the predicted foreground model [ ℓ (Δ)] FG , and the total 2 error bars also are larger compared to PF (Figure 4).It may be possible to reduce these by varying the value of  (polynomial order of the covariance) which has now been held fixed at  = 2 for GPR.However, this could enhance the 21-cm signal loss, and we have not tried this here.The | ( ⊥ ,  ∥ ) | for the Combined set is also very similar to that of PF and GPR.In general, we may conclude that after foreground removal the estimated ( ⊥ ,  ∥ ) are largely foreground-free, although some low level residual foregrounds comparable to the noise level may possibly still be present in the data.
For a closer inspection, we have also considered the variation of | ( ⊥ ,  ∥ ) | as a function of  ∥ for the individual  ⊥ bins. ) are removed by PF and GPR.Although there is a substantial reduction, we find that the values of | ( ⊥ ,  ∥ ) | are still above the 2 system noise level, particularly at small  ∥ .However, it is necessary to note that in addition to the system noise, the error estimates after foreground subtraction also have a contribution from the uncertainties in the foreground model.
Further, the latter contribution is expected to differ depending on whether we consider PF, GPR or Combined.The dashed horizontal curve in Figure 6 shows the total 2 errors for Combined.We find that all the | ( ⊥ ,  ∥ ) | values are below this curve after foreground subtraction.
We now examine the statistics of the estimated (  ⊥ ,  ∥ ) through the quantity  (Pal et al. 2021) where   (  ⊥ ,  ∥ ) is the total error due to system noise and the foreground modelling. is expected to have a symmetric distribution around a mean  = 0 with a standard deviation  Est = 1 if the estimated values of (  ⊥ ,  ∥ ) are consistent with the predicted uncertainties   (  ⊥ ,  ∥ ).Any residual foregrounds are expected to skew the distribution of  towards a positive value.On the other hand, negative systematics which could potentially bias the estimated (  ⊥ ,  ∥ ) towards a low value would skew the distribution of  towards a negative value.However, if the distribution of  is symmetric with  Est > 1, we can say that the actual uncertainty in the estimated (  ⊥ ,  ∥ ) are larger than the predicted errors   (  ⊥ ,  ∥ ).
Figure 7 shows the probability density function (PDF) of  for PF (left), GPR (middle) and Combined (right).Considering PF, we see that the PDF is largely symmetric around 0. The majority (∼ 80 − 90%) of the values of  lie in the central part of the distribution |  |≤ 3 which is demarcated by the black dashed vertical lines.We find ∼ 2 − 3% of the samples to have slightly larger values (|  |> 5), which are possibly noise or trace amounts of unsubtracted foregrounds or systematics.The PDF of  for GPR looks similar to PF, but the values of  are found to be somewhat smaller.These smaller values are due to the fact that the error predictions from GPR are larger than those of PF.The PDF of  for Combined also shows a similar trend as that of PF.
We find that the mean () of  is ∼ 0 for all three cases, which suggests that the foregrounds have largely been removed.Considering  Est , we find this to have values 2.66, 1.30 and 2.01 for PF, GPR and Combined respectively.The difference in  Est between PF and GPR is due to the different error predictions from these two methods.The value  Est > 1 indicates that we have some excess variance as res | obtained from PF, GPR and Combined, respectively.The shaded regions show the estimated 2 statistical fluctuations from system noise only.The grey dashed horizontal curve shows the total 2 errors (system noise + foreground modelling) for Combined.compared to our error predictions   (  ⊥ ,  ∥ ).We note that an excess variance after foreground removal has previously been noted in other low-frequency observations as well (Mertens et al. 2020;Pal et al. 2021).The excess variance possibly arises because of imperfect calibration, low-level RFIs, inaccurate point source subtraction, etc. (Gan et al. 2022).Note that we have found a larger value  Est = 4.77 in our earlier analysis (Paper II, Paper III).The reduction in the excess variance in the present analysis is possibly due to the MAD filter we have used to flag the outliers in the visibility data (Section 2).It is further possible that the fourfold increase in the bandwidth has also contributed to this lower excess variance.Although the exact reason for the improvement is not known, it is expected that the reduction in the excess variance will tighten the constraints on the 21-cm PS.Following Paper III, for all the subsequent analyses, the error predictions are scaled up by the factor  Est to account for the excess variance in the data.
Similar to the earlier works, here also we find that the PDFs of  closely follow a Lorentzian distribution.The best-fit Lorentzian PDFs, which are shown with the magenta lines (Figure 7), are found to have a peak location  0 ∼ 0 and spread  = 1.28, 0.40 and 0.68 for PF, GPR and Combined respectively.

THE SPHERICAL PS
We have estimated the spherical PS () directly from [ ℓ (Δ)] res using the MLE developed in Paper II (also see Paper III).The values of () are estimated at 7 spherical bins which span the range 0.2 <  < 6.4 Mpc −1 .We used the estimated () values to obtain the mean squared brightness temperature fluctuations Δ 2 () ≡  3 ()/2 2 .Figure 8 shows Δ 2 () with the 2 uncertainties as error bars, which are obtained in different works, including the present one, using this Band 3 observation.We first focus on the present work for which the magenta, green and teal lines correspond to the Δ 2 () obtained from PF, GPR and Combined respectively.Here, the negative values of Δ 2 () are marked with crosses.
Considering PF, for which the results are also tabulated in Table 1, we find the values of | Δ 2 () | are roughly in the range (5) 2 -(250) 2 mK 2 for the -range probed here.Here, the Δ 2 () values are consistent with noise at 0 ± 2 for most of the  bins barring a few which we discuss below.The Δ 2 () values are consistent with noise at 0 ± 3 in the fourth and seventh -bins, and it slightly exceeds 5 in the sixth bin, possibly due to some surviving foreground contamination.Considering the  bins where Δ 2 () is negative, we find that the values are within 0 ± 2 and 0 ± 3 for the first and last  bins respectively.This indicates that the PS does not have negative systematics, and our estimates are robust.We obtain the tightest 2 upper limit Δ 2 UL () from the first  bin where we have Δ 2 UL () = (4.68) 2 mK 2 at  = 0.219 Mpc −1 .We next consider the results for GPR (Table 2) where we find that | Δ 2 () | have values in the range (3) 2 mK 2 to (350) 2 mK 2 .These values are slightly larger than those we have obtained for PF.Here, the Δ 2 () values are consistent with noise at 0 ± 3 for all the  bins except the third and the fourth bins where the SNR is nearly 5. We conclude that some foregrounds remain in the data due to our conservative choice of a low-order polynomial for the covariance function (equation 6).The tightest upper limit from GPR is found to be Δ 2 UL () = (7.56) 2 mK 2 at  = 0.233 Mpc −1 .For Combined (Table 3), the | Δ 2 () | values are nearly in between those found from PF and GPR, however, the 2 error bars are tightened in most of the -bins.Considering PF, GPR and Combined together, we find that the tightest upper limit on Δ 2 () comes from the smallest  bin for PF.We note that when Δ 2 () is negative, we have conservatively set Δ 2 () = 0 and reported 2 as the upper limit.We have also used these upper limits Δ 2 UL () to obtain corresponding upper limits on the cosmological H i abundance [Ω H i  H i ] at various  bins.The assumption here is that the 21-cm PS   (k) traces the redshift-space matter PS    (k) through where T, the 21-cm mean brightness temperature, is found to have a value 400 mK (equation 6 of Paper III).Here, we have obtained    (k) using Eisenstein & Hu (1998) and ignored the effect of redshift space distortions.As for the tightest constraints, we find [Ω H i  H i ] UL = 1.01 × 10 −2 at  = 0.219 Mpc −1 from PF, whereas, we find [Ω H i  H i ] UL = 1.60 × 10 −2 at  = 0.233 Mpc −1 from GPR.The upper limits obtained from these two independent methods are very close, which gives a nice cross-check for the robustness of our results.From Combined, we find a tight upper limit at a larger , at  = 0.383 Mpc −1 , where we find [Ω H i  H i ] UL = 1.06 × 10 −2 .The upper limits on [Ω H i  H i ] are also presented in Tables 1, 2 and 3.
Figure 8 also consolidates the final results from all the previous works with the same observational data.Paper I (light-blue circle) and Paper II (orange asterisk) have used the Total and the Cross TGE, respectively.Chakraborty et al. 2021 (Ch21) estimated the PS in delay space at multiple redshifts from this data.Here, we present their results (black circle) from the redshift  = 2.19.While the above works considered foreground avoidance to constrain the 21-cm PS, Paper III have used PF (red square) and GPR (blue square) to remove the foregrounds.A compilation of the best upper limits derived 25-hour Band 3 uGMRT data is given in Table 4.We note that the frequency ranges used in these works are not the same, and the results are not directly comparable.Paper I, Paper II, Paper III have used 24.4 MHz bandwidth at the central frequency of 432.8 MHz whereas, Ch21 and the present work have respectively used 8 MHz and 100 MHz bandwidth both at the central frequency of 444 MHz.
The best upper limit from all the earlier works was Δ 2 UL () = (18.07) 2 mK 2 at  = 0.247 Mpc −1 which was obtained in Paper III.The present upper limit Δ 2 UL () = (4.68) 2 mK 2 at  = 0.219 Mpc −1 is approximately 15 times tighter than our previous work.The upper limit [Ω H i  H i ] < 1.01 × 10 −2 is also ∼ 5 times tighter than our previous limit 2.2 × 10 −2 (Paper III).The large bandwidth and the MAD filter are the two key factors that lead to these improved upper limits.The four-fold increase in the bandwidth reduces the noise in the power spectrum by a factor of ≈ 4.This also allows us to probe smaller  ∥ modes, whereby we are able to reach  = 0.219 Mpc −1 in the present analysis compared to the previous analysis where the upper limit was quoted at  = 0.247 Mpc −1 .This lowers the Δ 2 () values and the uncertainties  by a factor of ≈ 1.4.The MAD filter reduces the r.m.s., giving an improvement of ≈ 1.7.We also note that the reduction in the excess variance tightens the constraints by a factor of ≈ 1.8.

CONSTRAINING [Ω H i 𝑏 H i ]
In this section, we use all the available [ ℓ (Δ)] res values to put a constraint on the single parameter [Ω H i  H i ] without estimating the spherical PS.This approach was introduced in Paper II where [Ω H i  H i ] was estimated from the measured  ℓ (Δ) using foreground avoidance.However, we used it in a slightly different form in Paper III (Section 6) which we follow here.We obtain [Ω H i  H i ] 2 = 3.18 × 10 −5 ± 4.62 × 10 −5 and 7.37 × 10 −5 ± 9.99 × 10 −5 with PF and GPR, respectively.We observe that the estimated [Ω H i  H i ] 2 values lie within 0 ± 1 noise fluctuations.Subsequently, we use these values to derive an upper limit on the expected 21-cm signal, [Ω H i  H i ] UL = 1.11 × 10 −2 and [Ω H i  H i ] UL = 1.65 × 10 −2 for PF and GPR, respectively.We obtain the tightest limit from Combined where we find [Ω H i  H i ] 2 = 3.10 × 10 −5 ± 3.60 × 10 −5 which yields [Ω H i  H i ] UL = 1.01 × 10 −2 .These results are quoted in Table 5.The upper limits obtained using this independent technique from our earlier works are also noted in that table.
Previously  the H i mass density Ω H i in the redshift range  ∼ 1.9 − 2.6 using Damped Lyman- observations.These studies quote Ω H i roughly in the range ∼ 0.4 × 10 −3 − 1.0 × 10 −3 .Using a weighted fit of all the Ω H i measurements at  < 6, Rhee et al. ( 2018) reported 0.5 × 10 −3 ≲ Ω H i ≲ 0.9 × 10 −3 for  ∼ 1.9 − 2.6 with a 95% confidence.Simulations of the post-EoR 21-cm signal (Sarkar et al. 2016) predict the bias parameter  H i to be close to unity at the range and -scales probed here.These independent studies suggest that our upper limits on [Ω H i  H i ] is around 10 times larger than the value predicted by these measurements.A wider bandwidth and longer observations from uGMRT can significantly lower the present upper limit and open up the possibility for the detection of the IM signal at these redshifts.

SUMMARY AND CONCLUSIONS
H i 21-cm intensity mapping (IM) has the potential to become a leading tool in observational cosmology.Several works have measured the 21-cm IM signal by using cross-correlation of H i measurements and optical galaxy surveys.However, it is difficult to detect the faint IM signal in auto-correlation due to the many orders of magnitude brighter Galactic and extra-galactic foregrounds.
In a series of works, we have explored a deep (25 hours) observation of the ELAIS-N1 field using uGMRT with the aim of 21-cm IM from a redshift around  ∼ 2. In these works, we have used the Tapered Gridded Estimator (TGE), which offers several unique features for estimating the 21-cm PS from radio-interferometric measurements.In Paper I of this series, we have used the TGE to analyse a 24.4 MHz sub-band of the mentioned data at 432.8 MHz ( = 2.28).In that paper, the signals in the two polarization states (RR and LL) were combined to estimate the 21-cm PS.In Paper II, we showed that the foreground level is substantially reduced if the PS is estimated by cross-correlating the RR and LL polarizations rather than combining them.Both works used foreground avoidance, and only a limited region of the ( ⊥ ,  ∥ ) plane was used to constrain the 21-cm signal.In Paper III, we have introduced a foreground removal technique which allowed us to utilise the full ( ⊥ ,  ∥ ) plane, and we obtained a tighter upper limit [Ω H i  H i ] < 2.2 × 10 −2 at  = 0.247 Mpc −1 as compared to the earlier works.
The present paper is the fourth in this series.Here, we have considered 100 MHz bandwidth data spanning the frequency range 394 − 494 MHz ( = 1.9 − 2.6).The bandwidth under consideration is four times larger than the earlier works (details in Section 2).To analyse this wideband data, we have incorporated baseline migration in TGE, which was not done in the earlier works.The (Wideband) TGE is briefly discussed in Section 3 while the detailed formalism of TGE can be found in Paper III (and references therein).Using the TGE, we estimated the Multi-frequency Angular Power Spectrum (MAPS)  ℓ (Δ) on each grid point ℓ.We then used the two foreground removal tools developed in Paper III, namely polynomial fitting (PF) and Gaussian Process Regression (GPR) on the estimated  ℓ (Δ).PF and GPR are briefly discussed in Section 4. For each of these two methods, we have separately identified the ℓ grids where foreground removal was successful and the remaining ℓ grids were flagged.We found 43 ℓ grids where PF worked and 47 ℓ grids where GPR worked, but only 6 ℓ grids where both worked.Figures 4 shows the results for a few representative ℓ grids from each of the different categories discussed above.In addition to PF and GPR, we also considered a Combined data containing 84 ℓ grids where either PF or GPR or both work.
We note that our flagging criteria is entirely based on the residual [ ℓ (Δ)] res in the range Δ ≤ [Δ], and it is possible that we include some ℓ grids where the foreground model provides a poor fit to the measured  ℓ (Δ) in the range Δ > [Δ].The third row of Figure 4 is an example of the situation where this occurs.It is however useful to note that the grids that have a poor fit in the range Δ > [Δ], have large predicted error bars for the residuals in the range Δ ≤ [Δ].The maximum likelihood estimator for the final 21cm PS automatically gives a lower weight to the contribution from such grids.An alternative approach would be to visually identify and flag such ℓ grids.We have repeated the entire analysis after manually flagging the particular ℓ grid mentioned above.We find that removing this ℓ grid systematically reduces the upper limits on 2) by 6 − 9% across all the  bins.However, we have not incorporated any manual flagging for the results reported here, and the decision to include an ℓ grid or not is entirely set by the objective criteria presented in Section 4.
We have estimated the cylindrical PS (k ⊥ ,  ∥ ) from the measured  ℓ (Δ) as well as the residual [ ℓ (Δ)] res after foreground removal.Note that this was carried out individually for each ℓ grid without binning the data.To visualise the results in Figures 5 and 6, we have binned the values of  ℓ (Δ) into equally spaced annular bins in the -plane and shown the binned cylindrical PS ( ⊥ ,  ∥ ).In Figure 5 we noticed that the large values | ( ⊥ ,  ∥ ) |> 10 5 mK 2 are removed by PF and GPR, and the residual | ( ⊥ ,  ∥ ) | are found to have values in the range 10 3 − 10 5 mK 2 .We showed | ( ⊥ ,  ∥ ) | as a function of  ∥ in Figure 6.We found that although foregrounds are largely subtracted, the PS | ( ⊥ ,  ∥ ) | are not entirely consistent with the system noise level.However, the values of | ( ⊥ ,  ∥ ) | are found to be within the statistical uncertainties when we combine the system noise with the foreground modelling errors.
We have studied the statistics of the estimated (  ⊥ ,  ∥ ) through the quantity  (equation 10). is expected to be symmetric around zero with mean ( = 0) and have a unit standard deviation ( Est ∼ 1) if foregrounds are perfectly subtracted .We find  to be symmetric with  ≈ 0, which indicates that foregrounds are largely subtracted.However, we find an excess variance ( Est > 1), indicating that the actual statistical fluctuations exceed our error predictions.The error predictions are scaled up by the factor  Est to account for the excess variance in the data.
We have used [ ℓ (Δ)] res with the scaled error predictions in a maximum likelihood estimator to obtain the spherical PS () and its uncertainties.We have used these to obtain the mean squared 21-cm brightness temperature fluctuations Δ2 ().In majority of the cases the estimated values are found to be within 0 ± 3, although there are a few cases also where the values exceed 0±5, possibly due to some remaining low level foreground contamination.For PF, we find the tightest 2 upper limits Δ 2 UL () = (4.68) 2 mK 2 at  = 0.219 Mpc −1 which yields [Ω H i  H i ] UL = 1.01 × 10 −2 .The upper limits obtained from GPR and Combined are also close to this value, which shows the robustness of our results.The upper limits and other relevant details for PF, GPR and Combined are presented in Tables 1, 2 and 3 respectively.
The PS results from the present work and all the earlier works using this Band 3 observation are shown together in Figure 8.The best upper limits from these works are all compiled in Table 4.The upper limit Δ 2 UL () = (4.68) 2 mK 2 at  = 0.219 Mpc −1 obtained in the present work is approximately 15 times tighter than the previous upper limit Δ 2 UL () = (18.07) 2 mK 2 at  = 0.247 Mpc −1 (Paper III).The tightest upper limit [Ω H i  H i ] UL = 1.01 × 10 −2 obtained in the present work from the Combined data is found to be nearly 5 times improved over the previous limit In the present work, we have obtained a nearly foreground-free PS from uGMRT wideband data.We have found a tight constraint [Ω H i  H i ] < 1.01 × 10 −2 on the 21-cm signal, which is within an order of magnitude to the expected signal (Section 7).Deeper and larger bandwidth observations from uGMRT can significantly lower this limit and open up the possibility for the detection of the IM signal at these redshifts.This wideband IM analysis technique is also promising for the ongoing MeerKAT observations (Mauch et al. 2020) and the forthcoming Hydrogen Intensity and Real-time Analysis eXperiment (HIRAX; Crichton et al. 2022), which can possibly provide cleaner (RFI-free) wideband data than the data considered here.This analysis is also suitable for CHIME (CHIME Collaboration et al. 2022) which covers large angular scales.On a separate note, the present formalism can also be adapted to low frequency (∼ 150 MHz) Epoch of Reionization (EoR) observations with the uGMRT, Hydrogen Epoch of Reionization Array (HERA; Abdurashidova et al. 2022), LOw-Frequency ARray (LOFAR; Mertens et al. 2020) and Murchison Widefield Array (MWA; Trott et al. 2020;Patwa et al. 2021).We plan to address EoR observations in future work.

APPENDIX A: CHOICE OF COVARIANCE FUNCTIONS FOR GPR
In this appendix, we have considered four different covariance functions (kernels) for the GPR analysis (Section 4).The kernels considered here are all of the form which is the sum of a polynomial kernel of order  and a cosine kernel.Here,  1 ,  2 , , and  are hyper-parameters.Of the four kernels considered here, two (K1. = 2 and K2. = 3) incorporate only the polynomial kernel ( 2 = 0), and the other two (K3. = 2 + cos and K4. = 3 + cos) incorporate both the polynomial and cosine components.We have used  ℓ (Δ) measured in the range Δ > [Δ] to estimate the optimal values of the hyper-parameters of the kernel.We subsequently use the optimized kernel to make foreground model predictions [ ℓ (Δ)] FG in the range Δ ≤ [Δ].
The analysis presented in this appendix is entirely restricted to a particular ℓ for which the results have already been shown in the third row of Figure 4 of the main text.The  ℓ (Δ) measured at this ℓ exhibits a prominent oscillatory pattern, which is why we have chosen this for demonstration purpose.The top panel of Figure A1 shows the measured  ℓ (Δ) along with the GPR foreground model predictions [ ℓ (Δ)] FG and the corresponding 2 uncertainties for the four kernels considered here.We first consider kernel K1, which is the polynomial kernel that has been adopted for the entire analysis presented in the main body of the text.Considering the range Δ > [Δ], we find that the foreground model [ ℓ (Δ)] FG primarily captures the smooth, slowly varying Δ dependence of the measured  ℓ (Δ), and is not much influenced by the oscillatory pattern.Considering the range Δ ≤ [Δ], we find that the extrapolated values of [ ℓ (Δ)] FG are in reasonably good agreement with the measured  ℓ (Δ).We next consider kernel K2, which is a higher order polynomial kernel as compared to K1. Considering the range Δ > [Δ], we now find that the foreground model [ ℓ (Δ)] FG starts responding more to the oscillatory pattern in the measured  ℓ (Δ).Here, the extrpolated values of [ ℓ (Δ)] FG do not match the measured  ℓ (Δ) in the range Δ ≤ [Δ].We next consider K3 where we have augmented K1 with an additional cosine kernel.Considering the range Δ > [Δ], we now find that the foreground model [ ℓ (Δ)] FG closely matches the measured  ℓ (Δ), including the oscillatory pattern which is missed out by K1.Considering the range Δ ≤ [Δ], we find that the extrapolated values of [ ℓ (Δ)] FG are in good agreement with the measured  ℓ (Δ), with a match that visually appears to be better than that provided by K1.Considering K4, we find that [ ℓ (Δ)] FG closely matches K3 in the range Δ > [Δ], however it goes completely off in the range Δ ≤ [Δ𝜈] where the match to the measured  ℓ (Δ) is the worst among the four kernels considered here.Finally, we see that the kernels K2 and K4 fail to match the measured  ℓ (Δ) in the range Δ ≤ [Δ], and we cannot use these for foreground subtraction for this particular ℓ grid.In summary, kernels K1 and K3 work well to predict [ ℓ (Δ)] FG in the range Δ ≤ [Δ].
The bottom panel of Figure A1 shows the expected 21-cm signal [ ℓ (Δ)]  .In order to assess the 21-cm signal loss, we have applied foreground subtraction to [ ℓ (Δ)]  .Considering the range Δ > [Δ], we find that  ℓ (Δ) is close to zero for Δ > 1.2 MHz.The behaviours of the foreground model [ ℓ (Δ)] FG is primarily decided by the values of  ℓ (Δ) in the relatively small range [Δ] < Δ ≤ 1.2 MHz.We see that the smooth, slowly varying polynomial kernel K1 is not very sensitive to this, and the foreground model prediction [ ℓ (Δ)] FG is close to zero in the range Δ ≤ [Δ].The foreground model predictions in the range Δ ≤ [Δ] increase successively as we change the kernel from K1 to K2, K3, and K4.We also expect the 21cm signal loss due to foreground subtraction to increase in the same way if we vary the kernel from K1 to K4.In order to quantify the 21cm signal loss, we have applied foreground subtraction to [ ℓ (Δ)]  considering all the available ℓ grids, and used the residuals to estimate the binned 21-cm PS.We have compared the recovered 21-cm PS with the input model to estimate the percentage of 21-cm signal loss.Table A1 presents the results for the four different kernels considered here.We have the least 21-cm signal loss for K1 where it is 50% for the lowest  bin (0.2 Mpc −1 ), and less than 25% for all the other  bins.The higher-order polynomial kernel K2 ranks second, and the 21-cm signal loss increases considerably for kernels K3 and K4 which incorporate an additional cosine kernel.In summary, we have the least 21-cm signal loss for the  = 2 polynomial kernel, and the loss increases if we increase the order of the polynomial kernel or include an additional cosine kernel.
We now focus our attention on the two kernels K1 and K3, for which [ ℓ (Δ)] FG reasonably matched the  ℓ (Δ) measured in the range Δ ≤ [Δ].The residuals after foreground subtraction are presented in Figure A2.For both the kernels, the residuals are an order of magnitude smaller than the measured  ℓ (Δ), and are comparable to the predicted error bars which are dominated by the uncertainties in the GPR predictions.For the present analysis, we have used the quantity A/dA (defined through equation 8) to quantify the amplitude of the residuals.The criteria | A | /dA > 3 has been used to identify the grids where foreground subtraction fails, and these grids are flagged out from the subsequent analysis.For the particular ℓ being analyzed here, we obtain A/dA = 2.86 and −9.16 for kernels K1 and Although foreground subtraction fails for this particular ℓ if we use kernel K3, there are several other ℓ where this kernel successfully subtracts the foregrounds.For a more comprehensive comparison, we have used kernel K3 on the entire grid, and the results are presented in Table A2.Considering Δ 2 (), we see that the values are not consistent with 0 ± 2 for any of the -bins.Further, the first two -bins have negative values of Δ 2 () that are in excess of the 2 fluctuations, hinting at the possibility that using kernel K3 may be overfitting and introducing some negative systematics.Comparing with the results for kernel K1 (shown in Table 2), we find that values of Δ 2 UL () are comparable to those for kernel K1 at the second and third -bins, whereas they are ∼ 1.5 − 3.6 times larger at the other bins.
Based on the relative performance of the various kernels considered here, we have decided to use kernel K1 for the results presented in the main text of this paper.Kernel K1 has the advantage that it is the simplest of the four kernels that we have considered.It only captures the smooth, slowly varying Δ variation and is reasonably successful in modelling the behavior of the measured  ℓ (Δ) in the range Δ ≤ [Δ].It has the added advantage that it leads to the minimum loss of the 21-cm signal.However, it fails to capture the oscillations seen in the measured  ℓ (Δ) at Δ > [Δ].It is not clear at present whether it is desirable to model these oscillations or not, as kernel K3 which models these oscillations, does not perform any better.The fact that kernel K1 fails to model the oscillations is reflected in the foreground modelling error predictions in the range Δ ≤ [Δ].This ensures that the final results, which combine all the grids, receive a relatively smaller contribution from the ℓ that have large unmodelled oscillations as compared to that from the ℓ that have small unmodelled oscillatory patterns.

APPENDIX B: RESULTS WITH DIFFERENT FLAGGING CRITERIA
Table B1 shows the results we obtain using different flagging criteria set by | A | /dA = 2, 3, 4 and 5 respectively.We find that for both PF and GPR, the results are very similar for the cases with | A | /dA > 2 and | A | /dA > 3.For | A | /dA > 4, however, we find a negative value at the first -bin for GPR, which is not consistent with zero.It implies that we have included some ℓ where GPR has over-predicted the foregrounds, leaving large negative residuals.We also find that for the criteria | A | /dA > 5, there are large positive values at different  bins, which are not consistent with noise for both PF and GPR.Based on these considerations, we have adopted the flagging criteria | A | /dA = 4 and | A | /dA = 3 for PF and GPR, respectively.

Figure 2 .Figure 3 .
Figure 2.This shows the flagging percentage (left) and the r.m.s. of the visibilities (right) as a function of frequency.The blue, orange, and green curves show the cases with No MAD filtering, 5 × MAD filtering, and 3 × MAD filtering, respectively.
∥ and k ⊥ = ℓ/ respectively, and the sinc function accounts for the finite channel width Δ  .The comoving distance , its derivative with respective to frequency  ′ and the 21-cm mean brightness temperature T have the values 5598 Mpc, 9.70 Mpc MHz −1 and 400 mK, respectively, at   = 2.2.We define the parameter A ≡ [Ω H i  H i ] 2 which quantifies the overall amplitude of the predicted 21-cm signal [ ℓ (Δ)]  .Here we treat A as a free parameter and use it to effectively quantify the amplitude of [ ℓ (Δ)] res , with dA denoting the predicted uncertainties.We have initially tried the flagging criteria | A | /dA > 3 to identify and remove the ℓ values where foreground subtraction fails.The aim here is to get those ℓ values where the residuals are largely consistent with noise.We use all unflagged ℓ values to improve the SNR.In our analysis, we have also considered several different flagging criteria such as | A | /dA > 2, 4, and 5.We find that a few more ℓ values are flagged if we choose | A | /dA > 2, whereas we pick up a few ℓ values having residual foregrounds if we choose the flagging criteria | A | /dA > 5.The results from all the different flagging criteria considered here are presented in Appendix B. We obtain the best results if we use the flagging criteria | A | /dA > 4 for PF and | A | /dA > 3 for GPR.

Figure 4 .
Figure 4. Row-wise, four cases are shown from top to bottom:(1) PF and GPR both work, (2) PF works, but GPR fails, (3) GPR works, but PF fails, (4) PF and GPR both fail.Column-wise from left to right, the first column shows measured  ℓ (Δ), polynomial fit (magenta) and GPR fit (black).The error bars are the 2 uncertainties in the GPR foreground model predictions.Note that the errors for both the measured  ℓ (Δ) and the PF foreground model predictions are rather small, and these would not be visible in the panels of the first column.The vertical lines (black) are drawn at [Δ ] = 0.6 MHz.The second and third columns show [ ℓ (Δ) ] res in the range Δ ≤ [Δ ] for PF and GPR, respectively.Here, the 2 error bars represent the total errors, which combine the errors due to system noise and foreground modelling, and the gray horizontal lines show zero.Note that the error predictions are scaled up by the factor  Est to account for the excess variance in the data which we have discussed later in the paper.

Figure 5 .
Figure 5.The left panel shows |  ( ⊥ ,  ∥ ) | before foreground removal.The second, third and fourth panels show |  ( ⊥ ,  ∥ ) | after foreground removal for PF, GPR and Combined, respectively.The blue dashed line shows the predicted foreground wedge boundary.

Figure 6 .
Figure 6.This shows the cylindrical PS |  ( ⊥ ,  ∥ ) | for fixed  ⊥ as a function of  ∥ .The blue curves show |  ( ⊥ ,  ∥ ) | before foreground removal.The magenta, green and teal curves show |  ( ⊥ ,  ∥ )res | obtained from PF, GPR and Combined, respectively.The shaded regions show the estimated 2 statistical fluctuations from system noise only.The grey dashed horizontal curve shows the total 2 errors (system noise + foreground modelling) for Combined.

Figure 7 .
Figure 7.The probability density functions (PDF) of  for PF, GPR and Combined.Lorentzian fits of the PDFs are shown with the magenta lines.The black vertical lines show |  | ≤ 3 within which 80 − 90% values are found to reside.

Figure 8 .
Figure 8.This measured Δ 2 ( ) and 2 uncertainties where the magenta, green and teal lines correspond to PF, GPR and Combined, respectively.The results from Paper III (squares: red -PF and blue -GPR), Paper II (orange asterisks), Paper I (light blue circles) and Ch21 (black circles) are shown for reference.

Figure A1 .Figure A2 .
Figure A1.This figure shows the effect of choosing different covariance functions in GPR.A representative ℓ with ℓ = 2323, previously shown in Figure 4, has been chosen for the demonstration.The top panel shows the measured  ℓ (Δ) and the foreground model predictions [ ℓ (Δ) ] FG obtained from different choices of the kernels mentioned in the legend.The error bars show 2 uncertainties in the foreground modelling.The bottom panel shows the GPR model predictions when we apply the same kernels on [ ℓ (Δ) ]  the theoretically predicted 21-cm signal.The bottom panel presents a visual demonstration of how higher-order polynomials, or complicated kernels, give rise to larger signal loss.

Table 2 .
Same as Table 1, but for GPR.

Table 3 .
Same as Table 1, but for Combined.

Table 4 .
A compilation of the best upper limits derived using the 25-hour Band 3 uGMRT data considered in this paper.The overall tightest constraints obtained using this data are marked with boldface.

Table 5 .
Constraints on [Ω H i  H i ] obtained directly from [ ℓ (Δ) ] res without referring to the spherical PS.

Table A1 .
This presents the signal loss (in percentage) from different covariance functions for GPR.