A statistical measurement of the HI spin temperature in DLAs at cosmological distances

Evolution of the cosmic star formation rate (SFR) and molecular gas mass density is expected to be matched by a similarly strong evolution of the fraction of atomic hydrogen (HI) in the cold neutral medium (CNM). We use results from a recent commissioning survey for intervening 21-cm absorbers with the Australian Square Kilometre Array Pathfinder (ASKAP) to construct a Bayesian statistical model of the $N_{\rm HI}$-weighted harmonic mean spin temperature ($T_{\rm s}$) at redshifts between $z = 0.37$ and $1.0$. We find that $T_{\rm s} \leq 274$K with 95 per cent probability, suggesting that at these redshifts the typical HI gas in galaxies at equivalent DLA column densities may be colder than the Milky Way interstellar medium ($T_{\rm s, MW} \sim 300$K). This result is consistent with an evolving CNM fraction that mirrors the molecular gas towards the SFR peak at $z \sim 2$. We expect that future surveys for HI 21-cm absorption with the current SKA pathfinder telescopes will provide constraints on the CNM fraction that are an order of magnitude greater than presented here.


INTRODUCTION
The coldest (T k < 100 K) interstellar gas has a fundamental role in forming stars and fuelling galaxy evolution throughout cosmic history. Understanding why the star formation rate (SFR) density of the Universe has declined rapidly since peaking at z ≈ 2 (e.g. Hopkins & Beacom 2006;Madau & Dickinson 2014;Driver et al. 2018) is intimately tied to determining how the mass fraction of cold gas in galaxies has evolved.
Stars form in the dusty molecular gas of the interstellar medium (ISM) and so it is believed that this phase is most important in fuelling evolution of the SFR density throughout cosmic history. Recent surveys for the tracers of the dense molecular gas, principally CO emission (e.g. Decarli et al. 2019Decarli et al. , 2020Lenkić et al. 2020;Riechers et al. 2020a,b;Fletcher et al. 2021) and the far-infrared and mm-wavelength dust continuum (e.g. Berta et al. 2013;Scoville et al. 2017;Magnelli et al. 2020), provide evidence that the mass density evolves strongly and roughly mirrors that of the SFR density (albeit with a slightly delayed peak; Tacconi et al. 2020). In contrast, observations of the diffuse atomic gas, principally using H 21-cm emission (e.g. Zwaan et al. 2005a;Braun 2012;Rhee et al. 2018;Jones et al. 2018;Chowdhury et al. 2020) and Lyman-α absorption (e.g. Noterdaeme et al. 2012;Zafar et al. 2013;Crighton et al. 2015;Sanchez-Ramirez et al. 2016;Bird et al. 2017), reveal a mass density E-mail: james.allison@physics.ox.ac.uk that evolves comparatively slowly and declines by only a factor ∼ 2 from peak SFR to the present day. Recently, Walter et al. (2020) showed that this global behaviour could be described by a simple phenomenological model of the gas parametrised by the net infall rate of ionised inter/circumgalactic gas, which replenishes the H reservoir, and the conversion of H to H2. Both these processes have declined by an order magnitude since their peak.
Evolution of the physical state of the atomic gas in galaxies is as yet unknown and could well deviate from that of the total mass density measured from 21-cm emission and Lyman-α absorption surveys. Observations of the Milky Way ISM reveal a multi-phased diffuse neutral medium that spans two orders of magnitude in temperature and density (e.g. Heiles & Troland 2003;Murray et al. 2018). Two distinct stable phases, the denser cold neutral medium (CNM; T k ∼ 100 K) and more diffuse warm neutral medium (WNM; T k ∼ 10 000 K), co-exist in a pressure equilibrium that is determined by heating and cooling processes that are dependent on the star formation rate, dust abundance and gas-phase metallicity (Wolfire et al. 2003). Further dynamical processes, such as turbulence and supernova shocks, are thought to generate an unstable third phase at intermediate temperatures (UNM; e.g. Murray et al. 2018). If the presence of CNM is a pre-requisite for molecular cloud and star formation (e.g. Krumholz et al. 2009), then a similarly strong redshift evolution would be expected for the CNM fraction.
The H 21-cm absorption line detected in the spectra of back-ground radio sources is an effective tracer of the cold phase atomic gas to large redshifts. The equivalent width is inversely related to the excitation (spin) temperature, thereby providing a means by which the relative mass fractions of distinct thermal phases can be inferred. In the few cases where the H column density (NHI) can determined independently from either 21-cm emission (e.g. Reeves et al. 2016;Borthakur 2016;Gupta et al. 2018) or Lyman-α absorption (see  and references therein; hereafter K14), the NHI-weighted harmonic mean of the spin temperature can be measured directly. K14 show that the distribution of spin temperatures for Damped Lyman-α absorbers (DLAs; NHI > 2 × 10 20 cm −2 ) at z > 2.4 is statistically different (at 4 σ significance) to those at lower redshifts. However it is observationally challenging and expensive to draw a sufficiently large sample of nearby H galaxies and distant Lyman-α absorbers in order to provide strong constraints on the redshift of evolution of the cold phase gas. Recently, Curran (2017Curran ( , 2019 showed that the 21-cm absorber population, and hence spin temperature, may evolve with the star formation history Universe. However, they use an evolutionary model for the covering fraction of radio sources that would mimic any perceived evolution in the spin temperature. Model-independent methods are therefore required to verify such a claim. An alternative approach is to infer the spin temperature indirectly from much larger 21-cm absorption-line surveys by comparing the outcomes with the expected detection yield (Darling et al. 2011;Allison et al. 2016Allison et al. , 2020Grasha et al. 2020).
In this paper we present a statistical measurement of the NHIweighted harmonic mean spin temperature in DLAs at intermediate cosmological redshifts (0.37 < z < 1.00). This uses a Bayesian technique first proposed by Allison et al. (2016) to infer the H spin temperature by comparing the expected and actual detection yields in 21-cm absorption-line surveys. The results presented here are derived from a recent commissioning project with the Australian Square Kilometre Array Pathfinder (ASKAP; DeBoer et al. 2009;Hotan et al. 2021), which carried out a survey for 21-cm absorption lines in a sample of 53 bright compact radio sources (Sadler et al. 2020;hereafter S20). This is the first demonstration of this method to measure the spin temperature at cosmological distances. In future work we plan to use data from the ASKAP First Large Absorption Survey in HI (FLASH; e.g. Allison et al. 2016Allison et al. , 2020 to measure the harmonic mean spin temperature to greater precision and enable measurements as a function of redshift. Throughout this paper we use a flat ΛCDM cosmology with Ωm = 0.3, ΩΛ = 0.7 and H0 = 70 km s −1 Mpc −1 (e.g. Spergel et al. 2007).

Data
We use data from the 21-cm absorption-line survey carried out by S20 during commissioning of the ASKAP telescope. S20 searched 21-cm redshifts between z = 0.37 and 1.0 towards a sample of 53 radio sources selected from the Australia Telescope 20 GHz (AT20G) Survey catalogue (Murphy et al. 2010). They detected four intervening H 21-cm absorbers towards PKS 0834−20, PKS 1229−02, PKS 1610−77 and PKS 1830−211, as well as 21cm absorption associated with the radio galaxy PKS B1740−517 and intervening OH 18-cm absorption towards PKS 1830−211. Although the intervening absorbers towards PKS 1229−02 and PKS 1830−211 were already known, the sample was selected based on source flux density and declination and so is not biased by known detections. We refer the reader to S20 for further details of the sample selection, observations and individual detections (see also Allison et al. 2017Allison et al. , 2019 for further details of the ASKAP detections towards PKS 1830−211 and PKS 1740−517).
Each radio source spectrum records the fractional absorption of the background continuum flux density as a function of observed frequency, along with a measurement of the noise per 18.5 kHz spectral channel. The velocity resolution ranges between 5.3 and 7.8 km s −1 and the spatial resolution, using natural weighting, is approximately 1.8 arcmin for the 6-antenna ASKAP Boolardy Engineering Test Array (BETA; Hotan et al. 2014;McConnell et al. 2016) and 50 arcsec for the ASKAP-12 array (Hotan et al. 2021). The spectral baseline is flat for the dynamic range requirements of this survey, although as noted by S20 there is an additional non-Gaussian noise component for the ASKAP-12 spectra, which is discussed further in subsubsection 2.3.1. We use these data to determine the sensitivity to the 21-cm optical depth for each spectral interval, thereby allowing us to estimate the expected number of intervening 21-cm absorber detections as a function of the properties of the foreground H and background sources.

Absorption line detection
We automate line detection using a bespoke software tool called FLASH 1 (Allison et al. 2012), which uses the P M -N (Buchner et al. 2014) implementation of M N (Feroz & Hobson 2008;Feroz et al. 2009Feroz et al. , 2013. The multi-modal capability of M N enables more than one line to be detected in a given spectrum. Detection significance is given by the Bayes factor B (e.g. Kass & Raftery 1995), a statistic that is equal to the ratio of Bayesian evidences for a line model to a null model. Since no prior preference is given for either the line or null models, which is reasonable if we are testing for the unknown incidence of 21-cm absorption lines, B is equal to the odds in favour of the line model. To calculate the Bayesian evidence we use the likelihood function for a normal distribution, with standard deviation equal to the measured rms noise in each spectral channel.
For the purpose of 21-cm absorption line detection we use a single Gaussian profile to model the optical depth in velocity, which is then converted into the observed absorption line. We use non-informative priors for the model parameters within ranges set by the data and physically realistic limits; for the line position we use a uniform prior over the full range of the spectrum, for the full width at half maximum (FWHM) we use a loguniform prior between 0.1 and 2000 km s −1 , and for the peak optical depth we use a loguniform prior between 1 per cent of the median rms noise per 18.5 kHz channel and a maximum value of 100.
A Gaussian profile is appropriate given that broadening is expected to be dominated by Doppler shift due to line-of-sight thermal, turbulent and bulk motion of the gas. A typical 21-cm absorption line will have a more complex velocity profile due to the relative kinematic and spatial distributions of the foreground absorber and background radio source. However, the effect of more complex model choices on detecting high signal-to-noise lines is not expected to be significant. Solid histograms denote results for the S20 spectra. For comparison, the empty hatched histograms denote the larger data set of the ASKAP wide-field survey of the GAMA 23 field, undertaken during FLASH early science (A20). Regions of overlapping negative and positive features are coloured magenta. The solid black line denotes a model fit to the distribution of positive features detected in the A20 spectra. There are six negative features that are clearly reliable, of which four are intervening 21-cm absorbers, one is 21-cm absorption associated with the radio galaxy PKS B1740−517 and one is OH 18-cm absorption towards PKS 1830−211 (see S20 for further details).

Detection reliability
Our detection statistic, B, is the Bayesian odds in favour of a feature represented by the absorption-line model, with features at ln B 3 strongly favoured. Artefacts that remain in the data after processing, and that are not accounted for by the null model, will reduce the detection reliability (also known as the purity or fidelity). This issue is particularly problematic for absorption-line surveys, where any bandpass error not corrected by calibration is amplified in the source spectrum. Possible solutions include further data processing techniques to remove or mask artefacts, adapting the null model to account for them, or distinguishing them from true lines in parameter space, a posteriori. Here we use the latter method, focusing specifically on the detection statistic B, but will explore other approaches in future work.

Using positive features to determine reliability
We characterise the distribution of false absorption-like detections by running our detection method on the inverted spectra, and then inspect the distribution of positive (emission-like) features detected as a function of B. This is analogous to the procedure in blind surveys for H 21-cm emission in the nearby Universe (e.g. Serra et al. 2012Serra et al. , 2015 and CO emission at higher redshifts (e.g Walter et al. 2016;Pavesi et al. 2018;Lenkić et al. 2020) that use negative (absorption-like) features to determine reliability. The implicit assumption is that the incidence of true lines amongst the reference distribution is negligible compared with the incidence of features due to noise and/or artefacts. We discuss the validity of this assumption for our data in subsubsection 2.3.2 below.
In Figure 1 we show the distribution of detected negative 1 https://github.com/drjamesallison/flashfinder and positive features as a function of B. For comparison, we include results from the larger ASKAP wide-field blind survey of the GAMA 23 field by Allison et al. (2020) (hereafter A20), which was undertaken during FLASH early science. The reliability of both surveys is limited by a non-Gaussian contribution to the spectral noise, at the level of ∼ 1 per cent of the signal, which is caused by incorrect firmware weights used for correcting the coarse channelisation in the ASKAP-12 array 2 . Because the additional noise is multiplicative, it is preferentially detected towards sources with higher signal-to-noise continuum. This behaviour is evident in the results shown Figure 1; the distribution of positive features detected in S20 is skewed to higher ln(B) values than A20, which is consistent with the proportion of higher signal-to-noise spectra in that sample. We note that in blind surveys for CO emission the reliability of low signal-to-noise lines is typically determined by modelling the distribution of false detections assuming negative/positive symmetry, thereby enabling fainter sources to be included in the CO luminosity function. However, for the data considered here it is not certain that the additional positive and negative features generated by the channelisation error would be distributed symmetrically. Without further characterisation of the additional noise we cannot determine the reliability of excess negative features within the low signal-to-noise region. We therefore take a conservative approach by selecting a ln(B) threshold above which we are confident that any absorption-like features are reliable.
We capture the behaviour of the well-sampled distribution of positive features in the A20 spectra by fitting a skew normal distribution in the logarithm of ln B, with parameter λ = −3 (Azzalini & Valle 1996). We stress that there was no basis for this choice other than the model provides a reasonably accurate analytical representation of the observed distribution. We extrapolate from this model that there is less than 0.003 probability for positive features above ln B = 16, corresponding to an expectation of less than one detection. This is the reliability threshold adopted by A20 to determine the completeness of those data. In the case of the S20 spectra, the distributions are relatively under-sampled and harder to model; we also note the excess negative features in the low signal-to-noise regime that could either be caused by real absorption lines or simply bias in the noise behaviour. Although it is not possible to determine if some of these excess negative features are true intervening 21cm absorbers, none are detected at the 21-cm line position in the spectrum corresponding to the source redshift and so we can rule out associated absorption. Given that the distribution appears to be skewed to higher ln B than that of A20, we choose a threshold of ln B = 20, above which no detections of negative or positive features are apparent in the low signal-to-noise distribution.

Consideration of other spectral lines
As previously mentioned, our reliability analysis assumes that the reference distribution of positive (emission-like) features is dominated by noise and not contaminated with a significant fraction of true spectral lines. We discuss the validity of that assumption here.
First we consider the possibility that the positive features may contain detections of H 21-cm emission. We note that none of the positive features are detected at the 21-cm line position in the spectrum corresponding to the source redshift, and so we can rule out emission associated with the host galaxy. However, it is also possible that we may detect H emission from other galaxies in our line of sight. In the case of the S20 spectra, the lowest redshift is z = 0.37 with a typical spectral rms noise of ∼ 10 mJy beam −1 per 18.5 kHz, which for a line width of ∼ 100 km s −1 gives a 5-sigma detection limit of MHI ∼ 7.8 × 10 11 M (e.g. Meyer et al. 2017). Likewise, for the A20 spectra, the lowest H redshift is z = 0.34 with a median rms noise of 3.2 mJy beam −1 , giving a corresponding mass limit of MHI ∼ 2.1 × 10 11 M . Both limits are well beyond the high-mass end of the local H mass function (e.g. Jones et al. 2018) and so we conclude that the serendipitous detection of 21-cm emission in these spectra is very unlikely.
Of the other emission lines that we could possibly detect at these frequencies, the most likely are luminous masing emission from the main 1665 and 1667 MHz lines of the hydroxyl radical (OH). The S20 spectra cover OH redshifts between z = 0.61 and 1.34, which for a maser line width of ∼ 100 km s −1 corresponds to a 5-sigma detection limit between L1665 = 0.3 and 1.8 × 10 5 L . In the case of the A20 spectra, they cover OH redshifts between z = 0.57 and 1.10 with a corresponding 5-sigma detection limit between 7.3×10 3 and 3.7×10 4 L . First we consider OH emission (and absorption) associated with host galaxies of both the radio sources and the reliably identified 21-cm absorbers. Only the known OH absorption line towards PKS1830−211 is detected. Secondly, we consider the possibility of serendipitously detecting line-of-sight megamaser emission. We can use the low-redshift (z < 0.23) OH megamaser luminosity function measured by Darling & Giovanelli (2002) to estimate the expected number of detections. Assuming a total sky area covered by the S20 spectra ∼ 0.05 deg 2 (based on the angular resolution of each spectrum) the expected number of serendipitously detected OH megamasers is negligible (i.e. NOH 1), even if the luminosity function evolves by an order of magnitude at higher redshifts. Likewise, for the 1253 spectra of A20 the total sky area covered is ∼ 0.76 deg 2 , again giving a negligible number of expected OH megamasers.
Clearly contamination from other emission and absorption lines is not an issue with these data. However, future large 21cm absorption surveys with the SKA and pathfinder telescopes will be sensitive to a greater volume and will need to take into consideration these other emission and absorption lines when determining the reliability of their detections. For these future surveys, in addition to OH, we will need to consider that other molecular species at high redshift (for example the H2CO 6-cm line) could give rise to confusing absorption lines that would affect 21-cm absorption reliability. Furthermore, there are other masing transitions that lead to positive features that would contaminate the reference distribution, such as the radio-frequency recombination lines and the conjugate OH satellite lines at 1612 and 1720 MHz.

Completeness
We define the completeness as the probability that an absorption line with a given peak signal-to-noise ratio (S/N) and width is recovered from the data using our detection method and reliability threshold. This is determined by randomly populating our sample spectra with 1000 Gaussian-profile absorption lines in bins of peak S/N and FWHM and measuring the fraction that are recovered. The results are shown in Figure 2 and define the completeness as a bivariate function of the peak S/N and FWHM.
We now consider how these are used to determine the completeness in terms of the physical properties of absorption lines. For a line with peak optical depth τ , the peak S/N in i'th spectral in the S20 spectra, as a function of the peak S/N in a single 18.5-kHz (5.3 -7.8 km s −1 ) channel. The lines denote smoothing of the simulated data (circles) using a Savitzky-Golay filter (Savitzky & Golay 1964).
channel towards the j'th source is given by where c f is the source covering factor, S c i,j is the unabsorbed continuum flux density, and σi,j is the rms noise. If we consider a Gaussian profile, then the peak optical depth is given in terms of the physical properties of the absorber by where NHI is the H column density, Ts is the spin temperature, and ∆vFWHM the FWHM. We can therefore use Equation 1 and Equation 2 to calculate the expected peak S/N in a given spectral channel as a function of the physical properties of the absorber, and then use the results of Figure 2 to determine the completeness Ci,j.

THE EXPECTED DETECTION YIELD
To infer the spin temperature, we need to determine how many intervening absorbers we would have expected to detect in the data as a function of Ts. In the discrete limit, the expected number of detected intervening absorbers is evaluated by the following sum over J sight-lines, each with Ij spectral channels, where θ is the set of parameters at the absorber that determine detection, {Ts, c f , ∆vFWHM}, δXi,j is the absorption path length spanned by the i, j'th spectral channel, and f (NHI, z) is the column density distribution function, equal to the frequency of absorbers intersecting a given sight-line with a column density between NHI and NHI + dNHI, per unit column density per unit comoving absorption path. The completeness, Ci,j(NHI, θ), is as defined in subsection 2.4. We integrate over column densities between Nmin and Nmax, which are selected based on the range of column densities for which we expect the data to be sensitive to 21-cm absorption. The source redshift weighting function, wj(z + ∆zasc), is the probability that the j'th source is located at a redshift greater than z +∆zasc. The offset in redshift is used to exclude absorption associated with the host galaxy of the source, and is given by where ∆vasc = 3000 km s −1 .
To determine µ abs as a function of spin temperature, we marginalise over the velocity width and covering factor where p(c f , ∆vFWHM|Ts) is the joint conditional probability distribution for c f and ∆vFWHM. We assume that the covering factor and velocity width are independent, so this can be expressed as a product of the individual conditional probability distributions for these two quantities.
In the remainder of this section we discuss in further detail the factors and assumptions that are used in Equation 3 and Equation 5.

Source redshift
For the 50 sources in the S20 sample with reliable spectroscopic redshifts, the weighting wj(z) is simply given by where z src j is the redshift of the j'th source. Typical uncertainties in the optical spectroscopic redshifts of about 50 km s −1 correspond to a fractional uncertainty in each sight-line of less than 0.05 per cent.
For the remaining three sources without spectroscopic redshifts, we use the statistical redshift distribution given by De Zotti et al. (2010) for bright radio sources (S1.4 > 10 mJy) in the Combined EIS-NVSS survey of Radio Sources (CENSORS; Brookes et al. 2008), where Nsrc(z) = 1.29 + 32.37 z − 32.89 z 2 + 11.13 z 3 − 1.25 z 4 . (8) Measurement uncertainty in the CENSORS redshift distribution contributes a fractional uncertainty in each sight-line that increases with redshift from approximately 4 per cent at z = 0.37 to 8 per cent at z = 1.0. However, since these sight-lines comprise only 5 per cent of the sample, the uncertainty in the total path length due to the source redshifts is only about 0.3 per cent.

Source covering factor
An absorption line measured from a spectrum is an average over the spatial extent of the unresolved background continuum source. For the interstellar atomic gas in galaxies, distinct structures are evident on linear sizes as small as 100 pc (Braun 2012), and their column density distribution has been measured in both the nearby Universe, using resolved studies of 21-cm emission (e.g. Zwaan et al. 2005b;Braun 2012), and at cosmological distances using Lyman-α absorption towards compact UV emission from active galactic nuclei (AGNs; e.g. Noterdaeme et al. 2012;Neeleman et al. 2016;Rao et al. 2017;Bird et al. 2017).
In the case of H 21-cm absorption the background radio source can extend well beyond the AGN, so that at the intervening galaxy a large fraction of the flux density is distributed on scales larger than that probed by Lyman-α absorption. The effect of averaging over extended radio sources is to distribute the 21-cm absorbers to lower optical depths than would be expected given the distribution measured at higher spatial resolution (Braun 2012). The impact on our model is to over predict how many 21-cm absorbers we expect to detect. It is therefore useful to define a source covering factor, c f , that corrects for the unknown areal fraction of the source flux density that is subtended by a foreground absorbing structure of putative constant optical depth (see Equation 1).
Ideally we need information about the mas-structure of each source at sub-GHz frequencies, in order to fully understand the expected effect of source size for 21-cm absorption (e.g. Braun 2012; K14). In the absence of this information for our sample, we instead extrapolate from what we understand about the source morphology at other wavelengths. S20 selected sources from the AT20G catalogue (Murphy et al. 2010) that have flux densities greater than 500 mJy at 20 GHz and 1.5 Jy at 1.4 GHz/843 MHz, thereby including a high fraction of radio-loud QSOs with compact radio emission. Their analysis of very long baseline interferometric (VLBI) imaging at 5 -15 GHz in the literature indicated that these sources typically have 20 -60 per cent of their flux density in components smaller than ∼ 10 mas, corresponding to projected sizes between 50 and 80 pc for the redshift range used in this survey. They note that this is consistent with the results of Horiuchi et al. (2004), who found that for a larger complete and flux-density limited sample of 303 radio sources at 5 GHz, about 50 per cent of the flux density is typically contained in a 10 mas component, with 20 per cent from a radio core of average size 0.2 mas.
If we assume that the ratio of the 10-mas flux density to the total flux density is a reasonable proxy for the covering factor, then based on the above results we might expect that a typical value is c f ≈ 0.5. We note that this also assumes that the source structure at 5 GHz is representative of the structure at sub-GHz frequencies. However, Kanekar et al. (2009a) (see also K14) carried out a VLBI imaging study at sub-GHz frequencies of the radio-loud quasars in their DLA sample, using the ratio of core-to-total flux density as a proxy for the covering factor. Allison et al. (2016) used a twotailed Kolmogorov-Smirnov (KS) test to show that the hypothesis that covering factors obtained by Kanekar et al. are drawn from a 0−1 uniform distribution is true at the level of p = 0.05, but not for p = 0.01. Visual inspection of the distribution shows that there may be a paucity of quasars in the Kanekar et al. sample for covering factors less than c f ∼ 0.2 which is consistent with the Horiuchi et al. (2004) result that 20 per cent of the flux density in radio-loud AGN selected at 5 GHz is within the sub-mas radio core.
Based on these results, we marginalise the expected number of absorbers over the covering factor by drawing randomly from a uniform prior between c f = 0 and 1. We further assume that the covering factor is not conditional on the spin temperature. It is possible that the true distribution of covering factors may be skewed to lower values than uniform, in which case we would underestimate the expected detection yield and therefore overestimate the inferred spin temperature.

Velocity width
Sensitivity to a resolved 21-cm absorption line of fixed equivalent width is inversely related to the square-root of its width. We marginalise the expected number of detections over the distribution of velocity widths for a sample of detected intervening 21-cm absorbers in the literature 3 . We assume that this sample distribution is drawn from a sufficiently large range of observations with different sensitivities and spectral resolutions to be representative of the population and not censored by the underlying sensitivity to velocity width. This is shown in Figure 3, along with a log-normal fit to the data, from which we draw random line widths.
By using this single distribution for the velocity width, we have assumed that it is not conditional on the spin temperature. This is true for absorption lines where the dominant broadening mechanism is from bulk rotational or turbulent motion of the gas, but not when thermal broadening is important. Since the spin temperature is either equal to or less than the gas kinetic temperature, depending on the dominant mechanism for excitation of the 21-cm line (e.g. Purcell & Field 1956;Field 1958Field , 1959Bahcall & Ekers 1969;Liszt 2001), this places a lower limit constraint on the velocity width. We therefore apply the following additional constraint on the allowed range of velocity widths ∆vFWHM ≥ 8 ln (2) kB mH Ts, where kB is the Boltzmann constant and mH is the mass of a hydrogen atom. Note that for typical H spin temperatures in the range 100 -1000 K, this corresponds to lower bounds on the FWHM between 2 and 7 km s −1 , which is consistent with the literature distribution.

Column density distribution function
The column density distribution function, f (NHI), gives the frequency of absorbers intercepting a given sight-line with a column density between NHI and NHI + dNHI, per unit column density per unit comoving absorption path. In Equation 3 we integrate over column densities that span equivalent DLA systems (Nmin = 2 × 10 20 cm −2 , Nmax = ∞), which are thought to have sufficient warm neutral gas to temperature-shield and produce CNM (Kanekar et al. 2011). The four intervening absorbers detected in this sample all have integrated optical depths that correspond to column densities in the equivalent DLA range for Ts > 100 K and c f < 1 (S20), suggesting that this is the range of column densities for which our data sensitive. The expected number of detected 21cm absorbers should therefore be considered as those with column densities in the equivalent DLA range.
Precise measurements of f (NHI) have been obtained using 21-cm emission line surveys in the local Universe (e.g. Zwaan et al. 2005b;Braun 2012) and DLAs at cosmological distances (e.g. Noterdaeme et al. 2012;Neeleman et al. 2016;Rao et al. 2017;Bird et al. 2017). Searches for DLAs at UV-wavelengths using archival HST data by Neeleman et al. (2016) and Rao et al. (2017) are most closely matched in redshift to our data, but the sample sizes are insufficient to provide precise measurements of f (NHI) across the full range of equivalent DLA column densities. We therefore linearly interpolate between the results of Zwaan et al. (2005b) and Braun (2012) at z = 0, and Bird et al. (2017) at z = 2, who obtained the most precise DLA measurement yet using a catalogue generated by a Gaussian Process (GP) method from the Sloan Digital Sky Survey III Data Release 12 ).

The low-z f (NHI): spatial resolution and self-absorption
In the low-z local Universe, the column density distribution functions measured by Zwaan et al. (2005b) and Braun (2012) disagree significantly, the former being steeper than expected at high column densities for a random oriented gas disc. The measurement by Zwaan et al. (2005b) was carried out using HI emission images from a sample of 355 galaxies observed in the Westerbork HI Survey of Irregular and Spiral Galaxies (WHISP; van der Hulst et al. 2001) with a maximum linear resolution of about 1.3 kpc. In contrast, Braun (2012) only looked at the H images of three galaxies in the Local Group (M31, M33 and the Large Magellanic Cloud), but in much greater detail at 100 pc resolution and with thousands of independent sight lines.
There are several possible reasons for the differences seen in these two measurements. First, the lower resolution of the study by Zwaan et al. (2005b) will re-distribute small-scale structures to lower column densities, and thus steepen f (NHI). Braun (2012) suggested that this is broadly consistent with their data once smoothed to a similar resolution, but not reproduced in detail. Secondly, by modelling each emission line profile in detail, Braun (2012) showed that 21-cm self-absorption can have a significant effect in reducing high column densities in the range 22 < log 10 (NHI) < 23. However, we note that despite the exquisite spatial detail and self-consistency within the Local Group sample, it is not yet clear how representative it is of the H distribution in the larger galaxy population at z = 0.
Given the discrepancy between these measurements, we cal- culate the expected number of 21-cm absorbers using both distributions and use the resulting difference as the standard error (with a median of 27 per cent).

Dust obscuration and redenning in DLA sight-lines
The possibility that a significant fraction of quasars with dusty intervening absorbers are missing from flux-limited optical samples has long been the subject of vigorous investigation. If true, then these optical surveys would underestimate the number density of high-NHI DLAs at z > 2 and there would be a corresponding steepening of the column density distribution function that we use here.
Until recently there was no strong evidence of a large missing fraction of optical DLAs; in a combined Bayesian analysis of the optical and radio constraints Pontzen & Pettini (2009) found that only 7 per cent of DLAs are expected to be missing from opticallyselected samples due to dust obscuration, with at most 19 per cent missing at 95 per cent confidence. However, Krogager et al. (2019) have recently considered the colour selection used in SDSS-II Data Release 7, in addition to the magnitude limit, and found that the fraction of missing DLAs could be as much as 28 per cent at z ≈ 3 and 42 per cent at z ≈ 2, for the lowest dust-to-metal ratio of log 10 κZ = −21.1.
The column density distribution function we use here was measured by Bird et al. (2017) using spectra from SDSS-III DR12, for which a similar analysis has yet to be carried out. Krogager et al. (2019) state that although the selection criteria used for SDSS-III is more complex, it may still be susceptible to the same biases in SDSS-II. In the absence of further information, we calculate the expected number of 21-cm absorbers in our data assuming that the column density distribution function of Bird et al. (2017) may be systematically low by 20 per cent, using the resulting difference as the standard error (with a median of 9.7 per cent).

Absorption path length
The absorption path length defines the comoving interval of the survey that is sensitive to intervening absorbers. The infinitesimal path length element dX spanned by a redshift element dz is given analytically by where (13) ν is the corrected observed frequency, and νHI is the rest frequency of the H 21-cm line, equal to 1420.40575177 MHz (Hellwig et al. 1970).
In the discrete limit, We can estimate the expected total absorption path length sensitive to a given column density by evaluating the following sum over J sight lines, each with Ij channels Ci,j(NHI, θ) wj(zi,j + ∆z asc i,j ) δXi,j, (14) where the completeness and source redshift weighting are as defined above. Again marginalising over the above distributions for ∆vFWHM and c f , and assuming a fiducial Ts = 100 K, the expected total path length sensitive to DLA column densities (NHI > 2 × 10 20 cm −2 ) is ∆X = 8.7 (∆z = 4.7) and for super-DLAs (NHI > 2 × 10 21 cm −2 ) is ∆X = 35 (∆z = 19). For a higher spin temperature of Ts = 1000 K, these intervals decrease to ∆X = 0.17 (∆z = 0.092) and ∆X = 7.5 (∆z = 4.1), respectively. In Figure 4 we show the differential absorption path length, which gives the sensitivity to 21-cm absorbers as a function of redshift. Since observations with the more sensitive ASKAP-12 telescope were undertaken at higher frequencies, this sensitivity function is skewed to lower redshifts.

A statistical measurement of the spin temperature
We show the expected number of detected intervening 21-cm absorbers as a function of spin temperature in Figure 5. The 68 per cent uncertainty is also given, equal to the quadrature sum of the standard error due to measurement uncertainty (assumed in total to be about 10 per cent), systematic differences between the nearby 21cm emission line surveys, and the dust obscuration in DLA surveys. The median uncertainty in µ abs is about 30 per cent, but increases significantly at high spin temperatures due to uncertainty in the number of high column density systems that dominate at these sensitivities. The expected number of detections at the mass-weighted harmonic mean spin temperature for the Milky Way ISM (≈ 300 K; Murray et al. 2018) is 0.46 ± 0.06, which is notably less than the four detections obtained by S20 and indicates possible tension with the data. We determine the posterior probability density function 4 for where Pr(N abs |Ts) is the likelihood function, p(Ts) is the prior probability density, and Pr(N abs ) is the normalising constant known as the marginal likelihood or model evidence. In section 3 we obtained the probability density for the detection yield (µ abs ) conditional on the spin temperature, p(µ abs |Ts). We model this as a normal distribution with mean and standard deviation as shown in Figure 5. These results are then used to determine the probability of detecting N abs as a function of spin temperature by evaluating the following integral, Pr(N abs |Ts) = Pr(N abs |µ abs ) p(µ abs |Ts) dµ abs , where the likelihood function for µ abs is modelled as a Poisson distribution given by To determine the posterior probability density given in Equation 15 we also need to choose an appropriate prior, p(Ts). For smaller surveys, where the number of detections are fewer and hence the likelihood is less constraining, this choice can have a significant affect on the inferred spin temperature. Since we have no strong prior information on the value of the spin temperature, we choose a non-informative prior. In the context of the Poisson process being considered here, the spin temperature prior is given in terms of the detection yield by p(Ts) = p(Ts|µ abs ) p(µ abs ) dµ abs , where p(Ts|µ abs ) is just the inverse of the above-mentioned normal distribution, and p(µ abs ) is the prior for the detection yield. We choose the non-informative Jeffreys prior, p(µ abs ) ∝ 1/ √ µ abs , which corresponds to invariance of the detection yield under a change of parameterisation (Jeffreys 1946). This is an improper prior, but valid for a finite range of spin temperatures. We choose 0.1 K < Ts < 10 5 K, covering all reasonable values of the spin temperature expected from our understanding of the physical conditions in the interstellar medium of galaxies (e.g. Wolfire et al. 2003). An equally suitable choice of non-informative prior would be p(µ abs ) ∝ 1/µ abs (e.g. Jeffreys 1961;Novick & Hall 1965;Villegas 1977), which we also consider here in our results.
We can now estimate the posterior probability for the spin temperature given that four intervening 21-cm absorbers were detected in the S20 data. In Figure 5 we plot the resulting cumulative probability function, Pr(< Ts|N abs ) , calculated by integrating the posterior probability density given by Equation 15. We find that Ts < 274 K with 95 per cent probability, increasing to 387 K for the above alternative choice of non-informative prior.

Interpreting the measurement
Since the 21-cm line optical depth is inversely proportional to the spin temperature, the value inferred from such a measurement is a column-density-weighted harmonic mean over the line-of-sight H present in each thermally distinct phase, where NHI = i Ni. In the case of a simple two-phase model, where the spin temperatures of the individual phases are known, then the measurement of Ts can be inverted to infer their relative fraction. In practice the coldest phase dominates the measurement and this expression reduces to a simple ratio of the cold fraction and temperature. However, in general any predictive model of the multi-phased ISM can be tested by measuring Ts.
In the Milky Way ISM there are three observed thermally distinct phases: the cold (CNM; TCNM ∼ 100 K), unstable (UNM; TUNM ∼ 500 K) and warm neutral medium (WNM; TWNM ∼ 10 000 K), with total mass fractions of approximately 28, 20 and 52 per cent respectively (Heiles & Troland 2003;Murray et al. 2018). The mass-weighted harmonic mean spin temperature over the Milky Way is therefore approximately 300 K, and is seen to be constant with galactocentric radius outside of the solar circle (Dickey et al. 2009). This suggests that the phases are sufficiently mixed that random DLA-like sight-lines observed through our Galaxy would give on average a measurement of Ts ∼ 300 K.
Given that the relative fraction of neutral phases depends on the gas-phase metallicity, dust abundance and ambient UV field of individual galaxies (Wolfire et al. 2003), we expect Ts to vary within the population. Such variation is seen within the Local Group, where the Small Magellanic Cloud has a much smaller fraction of CNM than other members, although this is somewhat mitigated by a correspondingly lower CNM temperature (Dickey et al. 2000). Likewise we expect to see variation in the Ts inferred from DLAs simply due to variance created by sampled sight-lines; indeed a high intrinsic scatter is seen in the spin temperature of individual DLAs within a given redshift range (see Figure 8 and K14).
Despite this variance in the Ts for individual objects, we can test for systematic changes in the physical conditions of H gas with redshift. For the individual DLAs in the sample of K14, an estimate of the population Ts can be obtained by taking the sample NHIweighted harmonic mean. Likewise, the value of Ts we infer from Equation 15 is the NHI-weighted harmonic mean over the DLA population within the redshift range spanned by the survey, and can  Figure 4, but including data from the wide field survey of the GAMA23 field, undertaken as a FLASH early science projct (A20). therefore be usefully compared with direct measurements at other redshifts.

Including results from the FLASH GAMA23 survey
As previously mentioned in subsubsection 2.3.1, A20 (Allison et al. 2020) recently reported results from a wide-field H 21-cm absorption survey of the GAMA 23 field , undertaken as a FLASH early science project with ASKAP-12. They searched for H absorption towards 1253 radio sources at a median rms noise level of 3.2 mJy beam −1 per 18.5kHz, covering redshifts between z = 0.34 and 0.79 over a sky area of approximately 50 deg 2 . In a purely blind search of the data Allison et al. did not detect any absorption lines, but by cross-matching radio sight lines with known optically-selected galaxies they did find an absorber associated with gas in the outer regions of an early type galaxy. Allison et al. calculated the expected detection rate as a function of the spin temperature to covering factor, but unlike the method described here, assumed a constant line width ∆vFWHM = 30 km s −1 and a fixed completeness threshold corresponding to a S/N of 5.5 in a single channel. They reported that if the typical spin temperature to covering factor ratio at these redshifts is equal to 300 K, then the probability of detecting no absorbers is about 64 per cent.
It is straight forward to update our statistical measurement of the spin temperature to include this additional information, given that the linear combination of independent Poisson processes is itself a Poisson process with mean equal to the sum of individual means. We re-analyse the data using the method described in this paper, and the completeness simulations carried out by A20 to a reliability threshold of ln B > 16. The total absorption path length in the combined S20 and A20 surveys can be estimated using Equation 14, which for Ts = 100 K we find that ∆X = 15 (∆z = 8.2) for DLA column densities, and ∆X = 120 (∆z = 69) for super-DLA column densities. Likewise, for Ts = 1000 K the intervals are ∆X = 0.18 (∆z = 0.097) and ∆X = 13 (∆z = 7.0), sensitive to DLAs and super-DLAs, respectively. In Figure 6 we show the differential absorption path length as a function of redshift, giving the sensitivity function to 21-cm absorbers over the interval spanned by the combined surveys. Given that the A20 survey spanned redshifts between z = 0.34 and 0.79, there is a corresponding increase in the path length over this range.
In Figure 7 we show the combined number of 21-cm absorbers expected to be detected in both surveys as a function of spin temperature, and the updated spin temperature probability conditional on detecting 4 absorbers. For the Jeffreys prior we find that Ts < 420 K with 95 per cent probability, increasing to 567 K for the alternative prior. Given that the blind survey of the GAMA 23 field did not yield any further detections of intervening 21-cm absorbers, our upper limit on the harmonic mean spin temperature does increase, as one would expect. However, we caution that the sources in the GAMA 23 survey were selected purely based on their total flux density, and so the distribution of covering factors for those sources could be significantly skewed to lower values than the uniform prior assumed here. It is therefore likely that the spin temperature is lower than this limit and closer to that determined from just the survey of S20.

Direct measurements of Ts in individual DLAs
There have been several 21-cm absorption line surveys of known individual DLAs with the goal of directly measuring the spin temperature, although it should be noted that these too suffer from uncertainty in the source covering factor. These targeted surveys have been compiled and summarised into a single study by K14, who found that DLAs at z > 2.4 have a statistically (4 σ significance) different distribution of spin temperatures to those at lower redshifts, and are on average higher. They also found evidence (3.5 σ significance) of an anti-correlation between the spin temperature and metallicity in DLAs. These results are expected if the DLAs observed in the earlier Universe are on average more relatively metal poor (e.g. Rafelski et al. 2012;Cooke et al. 2015;De Cia et al. 2018) and therefore lacked sufficient coolants in the gas to form a significant fraction of CNM via fine structure cooling.
In Figure 8 we plot as a function of redshift the 95 per cent upper limit on the NHI-weighted harmonic mean spin temperature from our statistical measurement, and the direct measurements from the DLA sample of K14. Using the Kaplan-Meier estimator of the survival function to include lower limits 5 , we calculate the median and 95 per cent confidence interval for the NHI-weighted harmonic mean of the K14 sample in redshift bins of ∆z = 1. The results show an evolution with redshift towards larger spin temperatures that is consistent with the conclusions of K14, and in the lowest redshift bin is consistent with our upper limit.

GBT survey of compact sources
In a recent Green Bank Telescope (GBT) survey for intervening 21cm absorbers towards 252 radio sources, spanning redshifts between z = 0 and 2.74, Grasha et al. (2020) reported ten detections in a total comoving absorption path length sensitive to DLAs of ∆X ∼ 155. By comparing their measurement of the cosmological H mass density from 21-cm absorbers with prior measurements at the same redshifts, they obtained a mean spin temperature to covering factor ratio of Ts/c f ∼ 175 K. For all values of c f ≤ 1 their estimate of the mean spin temperature is consistent with our 95 per cent upper limit on the harmonic mean. However, since Grasha et al. did not provide a confidence interval for this measurement we cannot yet draw a robust statistical comparison. We note that the total absorption path length covered by the GBT survey is a factor ∼ 4 greater than our data (and is more sensitive to lower column densities), but achieved only a factor ∼ 2 greater detection yield. This apparent inconsistency in outcomes between the surveys, and the inferred spin temperature, can be resolved by applying our method to these data in future work. 5 We make use of the L L survival analysis package (https:// github.com/CamDavidsonPilon/lifelines).

Spin temperature and evolution of the cold ISM in galaxies
It is now well established that the cosmic star formation rate density underwent a rapid acceleration in the early Universe, peaked at z ≈ 1.5 − 2.5, and then declined by a factor of 10 − 15 to the present epoch (Madau & Dickinson 2014). Since stars form in selfgravitating clouds of dense molecular gas imbedded within the ISM (McKee & Ostriker 2007), it is expected that strong evolution should also be observed in the cosmological mass density of the molecular gas in galaxies. Observations of the bulk tracers of molecular gas in galaxies -CO emission (e.g. Decarli et al. 2019Decarli et al. , 2020Lenkić et al. 2020;Riechers et al. 2020a,b;Fletcher et al. 2021), supplemented by far-infrared and mm-wavelength observations of the dust continuum (e.g. Berta et al. 2013;Scoville et al. 2017;Magnelli et al. 2020)support this expected strong evolution.
In Figure 8 we show the best fitting parametric curve to the observational data obtained by Walter et al. (2020), alongside that expected from the depletion-time model of Tacconi et al. (2020); both agree within the uncertainties of the observational data and show a peaked-evolution that mirrors that of the SFR density (albeit slightly delayed). By balancing the flow rates between phases of the baryonic matter, Walter et al. (2020) showed that the observed changes in SFR and multi-phase gas densities can be simply modelled by changes in the net conversion rates of ionised to atomic gas, and atomic to molecular gas. Both appear to have declined by an order of magnitude since z ≈ 2.
However, when examined in detail the atomic gas in galaxies is far more complex; it is multi-phased and spans two orders of magnitude in density and temperature (Wolfire et al. 2003). As mentioned already in this paper, recent observations of the Milk Way ISM show the H to exist in two stable phases in pressure balance, the denser CNM (Ts ∼ 100 K) and the more diffuse WNM (Ts ∼ 10 000 K), as well as a substantial fraction of unstable UNM (Ts ∼ 500 K) that is generated by dynamical processes such as turbulence and supernova shocks (Heiles & Troland 2003;Murray et al. 2018). If the denser and cooler CNM is a pre-requisite for molecular cloud and star formation (e.g. Krumholz et al. 2009), then evolution of the conversion rate of H to H2 should be matched by a similar evolution in the relative fraction of the warm to cold phase atomic gas.
Evolution in the physical state of the atomic gas would be traced by the H spin temperature inferred from 21-cm absorption line surveys. As yet no strong constraints on the detailed evolution of Ts are possible from the existing data shown in Figure 8. However, we note that at intermediate cosmological redshifts (z ∼ 0.5 − 1) both our upper limit on Ts and the DLAs in the sample of K14 have values that are consistent with being lower than that of the Milky Way ISM at z = 0. This could indicate evolution of the ISM towards higher CNM fractions at intermediate cosmological redshifts, as would be expected for a higher molecular gas density.

CONCLUSIONS
We use a Bayesian technique to infer the harmonic mean spin temperature from a recent 21-cm absorption-line survey with the ASKAP telescope towards 53 compact radio sources (S20). Conditional on the outcome of four detections, we obtain a 95 per cent upper limit on the harmonic mean spin temperature of Ts ≤ 274 K in DLAs at 0.37 < z < 1.00. This measurement is consistent with 21-cm absorption-line surveys of known individual DLAs at the same redshift and is possibly indicative of higher fractions of cold phase gas (T ∼ 100 K) at z ∼ 1 than is typical of the local ISM. Such a result would be expected if the cold fraction in galaxies evolves similarly to the molecular and star forming densities, resulting from changes in the physical conditions of the interstellar medium. However, larger 21-cm line surveys are required to verify such a claim.
For the data considered in this work, the total absorption path length sensitive to 21-cm absorption is only ∆X ∼ 10 − 100. However, future large-scale surveys with the pathfinder telescopes to the Square Kilometre Array, including ASKAP FLASH (e.g. A20, Allison et al. in preparation) and the MeerKAT Absorption Line Survey (MALS; Gupta et al. 2016), are expected to span intervals that are three orders of magnitude larger (∆X ∼ 10 4 − 10 5 ) and will provide correspondingly stronger constraints on the evolution of cold gas in galaxies. By applying the method presented in this paper, we expect these larger surveys to constrain the harmonic mean spin temperature to a precision of ∼ 10 per cent (Allison et al. 2016), allowing self-consistent direct measurements of evolution of the cold phase gas in several redshift bins.