Multiphase Neutral Interstellar Medium: Analyzing Simulation with H I 21cm Observational Data Analysis Techniques

Several different methods are regularly used to infer the properties of the neutral interstellar medium (ISM) using atomic hydrogen (H I) 21cm absorption and emission spectra. In this work, we study various techniques used for inferring ISM gas phase properties, namely the correlation between brightness temperature and optical depth $(T_B(v)$, $\tau(v))$ at each channel velocity ($v$), and decomposition into Gaussian components, by creating mock spectra from a 3D magnetohydrodynamic simulation of a two-phase, turbulent ISM. We propose a physically motivated model to explain the $T_B(v)-\tau(v)$ distribution and relate the model parameters to properties like warm gas spin temperature and cold cloud length scales. Two methods based on Gaussian decomposition -- using only absorption spectra and both absorption and emission spectra -- are used to infer the column density distribution as a function of temperature. In observations, such analysis reveals the puzzle of large amounts (significantly higher than in simulations) of gas with temperature in the thermally unstable range of $\sim$200 K to $\sim$2000 K and a lack of the expected bimodal (two-phase) temperature distribution. We show that, in simulation, both methods are able to recover the actual gas distribution in the simulation till temperatures $\lesssim2500$~K (and the two-phase distribution in general) reasonably well. We find our results to be robust to a range of effects such as noise, varying emission beam size, and simulation resolution. This shows that the observational inferences are unlikely to be artifacts, thus highlighting a tension between observations and simulations. We discuss possible reasons for this tension and ways to resolve it.


INTRODUCTION
Understanding various properties, like temperature, density, velocity and magnetic fields, of the interstellar medium (ISM) provides valuable insights into the turbulent, multiphase ISM, formation and evolution of dense molecular clouds where stars form, and, consequently, the evolution of galaxies (Ferrière 2001;Elmegreen & Scalo 2004;Mac Low & Klessen 2004;McKee & Ostriker 2007;Padoan et al. 2014;Federrath 2018;Ferrière 2020;McClure-Griffiths et al. 2023).The atomic hydrogen (H i) is a major constituent of the neutral ISM, and thus the H i 21 cm emission and absorption spectra, arising from the transitions between the two hyperfine ground state levels, are among the most important tracers of the ISM (Dickey & Lockman 1990;Kalberla & Kerp 2009;McClure-Griffiths et al. 2023).Since the first detections of H i spectrum from the ISM (Ewen & Purcell 1951;Muller & Oort 1951;Hagen et al. 1955), it has been used extensively in studying the thermodynamic, morphological, and turbulent properties of the ISM.
The emission and absorption spectra are generally presented as the brightness temperature,   (), and optical depth, (), in each ★ E-mail: sbhatta2@caltech.eduvelocity channel, , respectively.The main physical parameters describing the ISM are the kinetic temperature,   , which quantifies the average thermal motion of the gas, spin temperature,   , which quantifies the hyperfine level population of neutral hydrogen, and column density,  H i , which gives a measure of the amount of gas.Here we note that   may not be equal to   , especially in the low-density warm phases in which collisions are insufficient (Liszt 2001).However, coupling with background Lyman- radiation can bring the two temperatures closer through the Wouthuysen-Field effect (hereafter, WF effect;Wouthuysen 1952;Field 1958).
Thermal equilibrium models predict two stable phases of the neutral gas: cold neutral medium (CNM, 30 K ≲   ≲ 200 K) and warm neutral medium (WNM,   ≳ 5000 K), with the cold clouds being smaller and denser, existing as clumps within the large-scale warm medium (Field et al. 1969).Contrary to this prediction, careful analyses of the H i spectra reveal that a significant fraction of the gas may lie in the "unstable" phase with 200 K <   < 5000 K (Heiles & Troland 2003b;Kanekar et al. 2003;Roy et al. 2008Roy et al. , 2013b;;Murray et al. 2015Murray et al. , 2018)), the so-called unstable neutral medium (UNM).The most probable reason for the presence of unstable gas is turbulence originating from galactic spiral shocks, supernova explosions, and other physical processes (Elmegreen & Scalo 2004;Scalo & Elmegreen 2004;Federrath et al. 2017).Turbulent heating and mixing may result in less cold and more unstable gas in the ISM.In spectral line observations, turbulence induces additional broadening of the lines, which makes temperature estimation of gas from line widths difficult.
There are various techniques for analyzing the observed H i absorption and emission spectra.The standard method is to decompose them into individual Gaussian components (termed Gaussian decomposition) with the underlying assumption that each component corresponds to an isothermal ISM cloud along the line of sight.Such Gaussian decomposition can give us the   and  H i of the clouds corresponding to the Gaussian components.Recently, a method has been proposed that can decouple the thermal and turbulence broadening of these spectral components (assuming pressure equilibrium and the velocity dispersion-size relation by Larson 1981), and gives us information about the kinetic temperature of the gas clouds (Koley & Roy 2019).The absorption and emission spectra can also be directly used as a whole, without performing Gaussian decomposition, to estimate ISM properties like column density and cold gas fraction (Heiles & Troland 2003b;Chengalur et al. 2013;Roy et al. 2013b;Murray et al. 2018).Although such analyses give us important information about the ISM, the reliability of the techniques and the underlying assumptions require further study and examination, which is the aim of this study.
Several numerical hydrodynamic or magnetohydrodynamic (MHD) simulations have been performed to understand the effect of turbulence on the thermal properties of ISM.Such simulations have been able to produce a significant amount of gas in the unstable phase through turbulence (Audit & Hennebelle 2005;Kim et al. 2013Kim et al. , 2014;;Gazol & Villagran 2016).Many other properties of the ISM have been studied through simulations, such as the magnetic field morphology and the structure of the cold clouds and filaments (Seta & Federrath 2022;Beattie et al. 2022;Lei & Clark 2022).Besides such studies, simulations can also be used to test the observational techniques by creating synthetic H i absorption and emission spectra, applying the analysis techniques (e.g., Gaussian decomposition) to them, and testing the inferences against the ground truth from simulations.
Recently, a few studies have used simulation data to verify some of the widely used H i 21cm data analysis techniques.Kim et al. (2014) used the hydrodynamic simulations by Kim et al. (2013) to verify the methods for determining the line of sight column density, channel spin temperature, and cold gas fraction from the spectra.They also showed that the distribution of the spectral data points,   () and (), was qualitatively similar to the observed data by Roy et al. (2013b).Murray et al. (2017) used the same simulation data to study the effectiveness of Gaussian decomposition in recovering the gas structures.They showed that the method could recover the temperature and column density of most structures within a factor of ≈ 2.More such studies, with different simulation data and analysis techniques, are needed to ascertain the validity of these techniques.
In this work, we undertake a systematic study of verifying several existing data analysis methods used frequently to infer the physical properties of the ISM from the observed spectra.We use the twophase magnetohydrodynamic (MHD) simulation by Seta & Federrath (2022) to create synthetic absorption and emission spectra and test the data analysis methodologies on them.Our work is broadly divided into two parts.The first part is dedicated to using the spectral lines as a whole without any Gaussian decomposition.These simple techniques can provide estimates of several ISM properties, avoiding the complexities involved with Gaussian decomposition.Here, we concentrate on the distribution of   () and () and propose a physical and quantitative model to explain the observed distribution.In the second part, we perform a multi-Gaussian decomposition of the synthetic spectral lines and analyse the inferred properties.We verify the data analysis methodologies involving both the joint decomposition of absorption and emission spectra and using only the absorption spectra.We verify that the Gaussian decomposition method is largely capable of inferring the physical properties of the ISM, though we also discuss a few limitations related to the method.Please note that in comparing the properties inferred from the analysis methods to the actual physical properties of the simulation domain, we refer to the latter as the "true" properties.We clarify that the word should not be confused as indicating a true ISM property, which is yet to be established firmly.
The rest of the paper is organised as follows.In §2, we briefly describe the main properties of the simulations and how we generate the synthetic observations of the simulations.In §3, we use the data analysis methods without Gaussian decomposition and discuss a model for the   () − () distribution.In §4, we describe the inferences from the Gaussian decomposition of absorption and emission spectra.We compare our results with previous works and observations in §5, and discuss a few other aspects of the analysis techniques and some open questions.Finally, in §6, we summarise our results and conclude.

Simulation Data
We use the recent two-phase MHD simulation of the ISM by Seta & Federrath (2022).The simulation uses the FLASH code to solve the non-ideal MHD equations in a triply periodic cubic numerical domain of 512 grid points spanning  = 200 pc along each axis, achieving a resolution of around 0.4 pc (we have also tested our analysis methods with simulations of lower resolutions, see Appendix C).It uses heating and cooling functions as prescribed in Koyama & Inutsuka (2000, 2002); Vázquez-Semadeni et al. (2007).For this work, we use the simulation with solenoidally driven turbulence, and we aim to extend our analysis in the future with other forms of driving (compressive and a mix of both, see Federrath et al. 2010, for a discussion on different types of turbulence driving).The turbulence is driven at a length scale of half the simulation domain (100 pc) with velocity dispersion of  rms = 10 km s −1 .Explicit viscosity and resistivity have been added, yielding a hydrodynamic and magnetic fields Reynolds number ( rms /(2) and  rms /(2), where  and  are the viscosity and resistivity, respectively) of 2000.The simulation does not use a galactic potential or any non-isotropic term in the set of MHD equations.Thus, the simulation domain is expected to be statistically isotropic.The simulation spans a period of 1 Gyr, which is approximately 100 times the eddy turnover time (= (/2)/ rms ≈ 10 Myr).For this work, we take the simulation at a time of 800 Myr, by when the system has reached saturation in terms of its statistical properties and energies (for further details, see figure 8 in Seta & Federrath 2022).
Unless otherwise mentioned, throughout the paper, we use the following definition for the gas phases: cold (CNM) -  < 200 K, unstable (UNM) -200 K <   < 5000 K, and warm (WNM) -  > 5000 K.The simulation has 1.5%, 22.9%, 75.6% by volume and 32.3%, 34.5%, 33.2% by mass of cold, unstable, and warm gas, respectively.There is a very small amount of hot gas (  > 8000 K) in the medium since the simulation does not include supernova explosions.Ionization of most warm gas is expected to be < 5%, with the unstable and cold gas ionization being even lower (Wolfire et al. 1995(Wolfire et al. , 2003;;Liszt 2001).Thus, in our analysis, we assume the gas in the simulation domain to be fully neutral.Here, we note that this may change depending mainly on the ionizing radiation field strength considered, which may result in a non-negligible ionization fraction, mostly for the warm phase.Such considerations are beyond the scope of this paper.However, we do not expect the results of this work to depend significantly on this effect other than the warm gas amount being appropriately scaled by the neutral fraction.Additionally, we also do not expect the small size of the simulation box (200 pc compared to several kilo-parsecs probed in observations) to affect the results as we are probing similar column densities of neutral gas with this simulation box as in observations (10 20 − 10 21 cm −2 ), thus similar ranges of optical depth and brightness temperature in spectra.

Spectrum Generation
The simulation results provide the values of the physical parameters, namely the temperature, density, and velocity for all the grid cells.However, in observations, the primary observables are only the absorption and emission spectra, which are a combined effect of all the gas along the line of sight.The former is obtained by recording the attenuation of the radiation from background point radio sources (like distant quasars), whereas the latter is obtained by recording the radiation intensity away from any bright background.Here, the simulation data is used to generate synthetic absorption and emission spectral lines.Due to the isotropic nature of the simulation, without loss of generality, we have taken the  axis as the direction of a line of sight (LOS).We employ a synthetic spectra generation technique based on the solutions of the line-of-sight radiative transfer equation.We closely follow the formalism that has been applied in various previous works (Miville-Deschênes et al. 2003;Kim et al. 2014;Fukui et al. 2018).We describe the method employed below.
In generating the synthetic spectra, we consider each simulation grid cell to be a parcel of isothermal gas with the physical parameters given by the simulation results.For such a parcel, the absorption spectrum is dependent on the column density and spin temperature of the gas and additionally on the spectral profile of the gas, which is directly related to the velocity profile.For the  th grid, assuming a Gaussian velocity profile of each of the grid cells with mean velocity  , (the  component of the velocity in the  th grid) and width   , the optical depth, following the treatment for isothermal clouds in (Draine 2011), can be written as where   is the column density and  , is the spin temperature of the  th grid cell.  =   Δ, with   being the number density of the  th grid cell and Δ = 0.3906 pc, the resolution of the simulation (the side length of a grid cell).Following the prescription in Miville-Deschênes et al. (2003), the velocity profile width,   , can be calculated as where  , is the temperature of the  th grid cell,  is the hydrogen mass and The first term inside the square root in Equation 2 is the thermal width, and the second term approximates the non-thermal (turbulent) broadening within the grid cell.Following the methodology in observations, for a bright enough background source with respect to which the self-radiation from the ISM can be neglected, the radiative transfer equation suggests that the observed optical depth will be a summation of the optical depths of all the gas parcels along the line of sight.Thus, the absorption spectrum is calculated as the optical depth spectrum The absorption spectra are dominated by the gas clouds with lower temperatures due to their lower spin temperatures and higher optical depths (following Equation 1).
The emission spectra are reported in terms of brightness temperatures, which are related to the radiation intensity by Rayleigh's blackbody relation (Draine 2011).To generate the spectrum, we use a ray-tracing approach and iteratively solve the following equation for the grid cells, , along the line of sight as ) . (5) The first term in this expression is the absorption-corrected selfemission from the gas in the  th grid cell; the second term represents the radiation from the grids behind,  , , attenuated by the gas in the  th grid cell.Assuming that the  = 0 plane faces the observer and there is no background source, we set  ,512 () = 0 (512 being the total number of grid cells in each direction of the cubic simulation domain).This makes  ,0 () =   () the brightness temperature spectrum seen by the observer.Figure 1 shows an example of synthetic absorption and emission spectra, along with the temperature and density variation along the line of sight.In the optical depth spectrum, the cooler gas along the line of sight manifests as distinct narrow and high-amplitude components.The broader warm components are subdominant owing to their high spin temperature (see Eq. 1) and low optical depth.The latter, however, are clearly present in the emission spectrum.
We note here that, as also mentioned in §1, the kinetic temperature   and spin temperature   may not be the same, especially at higher temperatures at which collisions are not sufficient to equilibrate the two temperatures.Alongside several other processes, the WF effect plays an important role in determining the relation between the two temperatures.We present our results assuming a constant but inefficient WF effect resulting in   <   .We use the numerical results of Liszt (2001) to relate the two temperatures.However, several recent studies indicate that the WF effect in the ISM maybe efficient enough to render the two temperatures equal for all phases (Murray et al. 2017;Seon & Kim 2020).Thus, we have also performed our analyses using   =   for all phases.We discuss the results for this case separately in §5.1.
We also note that observed ISM spectra are contaminated by noise, which introduces additional complexities in analyzing the data.An additional limitation with observations is the larger beam size associated with the emission spectra compared to the absorption spectra.We separately incorporate both of these effects in our study (see Appendix B and D, respectively).For simplicity, we primarily present our analysis without these effects, but we refer to the results with these effects when required.7).The appearance of a broad warm component in the emission spectrum and its absence in the absorption spectrum displays the inability of the absorption spectrum to detect warm gas.

Estimating total column densities
The emission and absorption spectra can be used directly to extract several physical properties of the ISM, namely the total column density and cold gas fraction.An unbiased1 estimator of column density, the isothermal estimator, is given by (Dickey & Benson 1982;Chengalur et al. 2013) . (6) This is an isothermal estimator of the column density, with being the isothermal estimator of the spin temperature at the velocity channel  (the points in the right panel of Fig. 1 are coloured according to  iso  []).These estimators assume that the radiation in a single velocity channel arises from an isothermal gas with temperature  iso  (). iso  (), additionally, turns out to be an unbiased estimator of the optical-depth-weighted spin temperature  avg  () =   ,   ()/    () of the gas along the line of sight.We verify that  iso H i agrees with the true column density within a factor of ≈ 1.2 and  iso  agrees with  avg  within a factor of ≈ 1.5.The absorption and emission spectra can be used to determine the cold gas content along the line of sight.The cold gas fraction can be estimated using a parametric cold gas spin temperature,   , and the line of sight average spin temperature ∫ () as   =   / los  (Dickey et al. 2009;Kim et al. 2014).We found that   values in the range of 50 − 100 K fit well the cold gas fraction for most of the sightlines.These results are consistent with those obtained in Kim et al. (see figs. 5 and 6 in 2014).

𝑇 𝐵 (𝑣)-𝜏(𝑣) space: A new model
The optical depth and brightness temperature in a velocity channel are impacted by both the spatial and velocity distribution of the various phases of gas along the line of sight.The distribution of the spectral data points,   () and (), coupled with the channel spin temperature  iso  (), can provide valuable information about the properties of the ISM and has been used in several recent works (Roy et al. 2013a;Kim et al. 2014;Saha et al. 2018;Basu et al. 2022).There have been previous attempts to explain the   () −() distribution using two-phase ISM models (Strasser & Taylor 2004) or using simulations (Kim et al. 2014).Although the general properties of this space could be understood, the analysis was restricted to mostly a qualitative understanding.Here we propose a quantitative model based on physical arguments to describe the boundaries of the distribution of data points in the   ()-() space.We apply our model to the synthetic spectral data generated from our simulations and relate the model parameters to the physical properties of the simulation domain.To test the applicability of the model to real data, we take the Giant Meterwave Radio Telescope(GMRT)/Westerbork Synthesis Radio Telescope (WSRT)/Australia Telescope Compact Array (ATCA) observations of Roy et al. (2013a,b, 37 lines of sight) and fit our model.
Figure 2 shows the   ()-() space for the synthetic spectra (left) and observed data (right), with our models (described in §3.2.2 and §3.2.3) for the boundaries in the   - space overplotted.The data points are contained within a band in this space; the right boundary is relatively more prominent than the left.  () is also seen to increase with optical depth up to a certain value, post which absorption dominates and the value decreases.Our models aim to relate these trends to the quantitative physical properties of the neutral ISM.

The "non-warm" gas spin temperature and optical depth
Before going into the models, we set up the ground by deriving a relationship between the channel spin temperature of the non-warm (cold or unstable) ISM gas,   (), to the optical depth, ().We shall use the following formalism to understand the right boundary of the   ()-() distribution.We start with the optical depth equation for an isothermal gas cloud, which is the same as Equation 1, except not for a grid cell, with  being a product of numerical constants.For non-warm gas, we can approximate the spin and kinetic temperatures to be equal, which is reasonably valid till ≈ 2000 K for any strength of the WF effect.We assume an isobaric equation of state with pressure  (simulations show that lower-temperature gas more closely follows an isobaric equation of state, see, for example, fig. 3 in Seta & Federrath 2022).We further assume that  ∼ √   as the extent of turbulent broadening in smaller-scale non-warm gas will not affect our formalism (turbulent broadening is directly related to the gas length scales, see Larson 1979, 1981, andEquations 16 and17 for the scaling relations).Using these simplifications, we can write where ℓ represents the typical line-of-sight length scales of the nonwarm clouds.For a given (),   is the maximum for  = .This gives the maximum possible temperature of the non-warm cloud, which may yield the given ().We further note that non-warm gas in the ISM is not expected to have arbitrarily large length scales.Consequently, the pre-factor attains a maximum at about the largest non-warm length scale value for a given .We define the parameter  = ( ) 0.4 .Owing to the weak exponent, small variations in  or  will not affect the parameter significantly.This leads us to the following expression: This equation represents the maximum temperature of an isothermal non-warm gas cloud which may result in the optical depth given by the spectrum at a particular velocity channel.We recognize that a spectrum is composed of the optical depth profiles of all the different gas clouds along the line of sight.The temperature calculated using Equation 10 is only an isothermal upper limit estimated from the optical depth at a velocity channel.We note here that a similar relationship of temperature to optical depth has also been reported by Saha et al. (2018) and is also similar to the classical   −  relation (Kulkarni & Heiles 1988;Heiles & Troland 2003b).The above re-parameterization has naturally led to such a relation.

The right envelope of the 𝑇 𝐵 (𝑣) − 𝜏(𝑣) distribution
In this section, we discuss the model for the right envelope and its implications.We divide the   () − () relation broadly into two regions: the lower region, which is characterized by very low optical depths ( ≲ 0.005, the lower region), and the rest (upper region).
In the lower region, the spectra are expected to be dominated by the warm gas, either from the wings of their broad spectral components or from the velocity channels with no cold gas contributions.This is also supported by the high  iso  () value in the region (colourbar in Figure 2, left plot).For this region, our model assumes a simple form given by where   is a parameter of our model and represents the typical spin temperature of the warm gas in the ISM.The final approximation is valid in the low optical depth regime.The upper region ( ≳ 0.005), characterized by higher optical depths, is dominated by non-warm gas components.We assume that the non-warm component is embedded within the warm component, where  , is the parametric brightness temperature of the total warm gas column and   is the parametric spin temperature of the non-warm gas.The first term in the equation is the self-emission term from the non-warm gas.The second and third terms represent the emission from the warm gas in front and behind the non-warm cloud, respectively.The true value of  varies with both line-of-sight and velocity channels and is difficult to determine.In this model, however,  appears as an effective parameter.Previous studies have shown that a single set of ,   and,  , values cannot satisfactorily explain the complete envelope (Strasser & Taylor 2004) and thus the parameters must vary over the envelope.As optical depth does not depend on the relative positions of the clouds along the line of sight,  cannot be a function of .Rather, following the formalism in §3.2.1, we argue that   is a function of the optical depth.We note that we can maximize   () by setting  = 1, which physically means that the right boundary is primarily constituted by velocity channels where the non-warm component lies behind the warm component.We further maximize   () by maximizing   using Equation 10.Thus, we arrive at the model for the upper region of the   () − () space: where  and  , are parameters of our model.Figure 2 (left) shows the model fit2 for the synthetic spectra (brown lines) for the constant WF effect case.We see that the model resembles the right envelope quite well with the parameter values   = 3200 K,  = 130 K, and  , = 10 K. We note that with a constant but inefficient WF effect and following the Liszt (2001) relation between   and   ,   is mostly limited to ≈ 3000 K even for the warm gas with much higher kinetic temperatures.Thus,   gives a good estimate of the warm gas spin temperature.The  , value well represents the typical brightness temperature of the warm components.Using the  , and   parameters, we can estimate the typical optical depth from the warm gas components as   ≈  , /  ≈ 0.003 (see Eq. 12), which agrees well with the spectra.The  −0.4 factor represents the temperature of the nonwarm component, which dominates at a given optical depth.The transition from the lower region to the upper region in the   −  plane occurs at around  ≈ 0.005.At this optical depth, the temperature with  = 130 K is ≈ 1100 K (see Eq. 10), which lies in the unstable phase.At  = 1, the temperature is 130 K, which lies in the cold phase.Thus, we see that gas with a wide range of temperatures contributes to the   () − () distribution.Table 1 summarizes the model parameters for all the different cases considered in this section.
The right panel in Figure 2 (right) shows our model fit for the observed data.The model parameters are:   = 5000 K,  = 200 K,  , = 13 K.The parameters   and  are significantly different than those for the synthetic data.This brings out the quantitative difference between the simulation and observed data.The high   for the observational data implies the existence of a significant amount of gas with a high spin temperature (which may indicate an efficient WF effect in the ISM; see §5.1 for further discussion).Considering the parameter , we see that at the transition from the lower to the upper region ( ≈ 0.005), the temperature of the non-warm component is ≈ 1800 K (see Eq. 10), which is much higher than for the simulation and lies in the unstable range.We relate this to the detection of a significant amount of gas in the temperature range of 1000 − 2000 K, the unstable gas, in several studies (Heiles & Troland 2003b;Roy et al. 2013a;Koley & Roy 2019).However, we note here that practical limitations with observations like larger emission beam size or noise may also affect the parameter values and may lead to larger uncertainties (see Table 1 and Appendix B and  D).For consistency checks and to constrain the parameters better, it is important to perform the analysis with emission spectra with narrower emission beam widths (e.g., using HI4PI data).However, for this work, we stick to the data from Roy et al. (2013a) and defer further analyses for the future (see Appendix E).
Assuming  , to be representative of the peak brightness temperature of the warm gas and that the line-width ∼ √︁  , , where  , is a parametric warm gas kinetic temperature, we can obtain an approximate relation between  , and the warm column density: combining Eqs. 8 & 11).The  , values for the synthetic and observed data and typical values of  , yield   H i ≈ few×10 20 cm −2 .This well represents the warm gas column density of the simulation and verifies the warm gas origin of the parameter  , .For the ISM, this is comparable to typical warm gas column densities (and tallies with the minimum warm column density limit for CNM clouds to form; see Kanekar et al. 2011).
From the derivation in §3.2.1, we see that the parameter  is related to the characteristic maximum length scales of the non-warm components.Thus, given a value of the parameter , the relation can be inverted to obtain an estimate of the maximum limit of the non-warm length scale.Substituting the constants, we find ℓ =  2.5 /7.434 pc, where  is the pressure of the non-warm gas.Using the volumeaveraged pressure  ≈ 2717 K cm −3 for the simulation, we estimate a length scale of ≈ 9 pc.Such a length scale is associated with the larger non-warm clouds in the simulation domain (the non-warm length scale in the simulation varies from ∼ 0.8 to ∼ 20 parsecs).A similar length scale estimation for the observed data yields a value of ≈ 20 pc when the pressure is taken to be 3700 K/cm 3 (nominal ISM pressure McKee & Ostriker 1977;McKee 1995;Jenkins & Tripp 2011;Koley & Roy 2019).This is close to the cold cloud scales measured from Zeeman splitting (∼ 10 pc, Heiles & Troland 2005), and happens to be close to the cooling length (∼ 20 pc, Hennebelle & Pérault 1999).

The left envelope of the 𝑇 𝐵 (𝑣) − 𝜏(𝑣) distribution
Unlike the right envelope, the left envelope is not very well defined.This region of the   () − () space mostly comprises cold clouds with small length scales or the outer wings of multiple cold Gaussians (also see §4.1 in Kim et al. 2014, for a related discussion).The contribution from the warm component in this regime is subdominant.This leaves us with the model   () =   1 −  −  () (see Eq. 12).The left region corresponds to the lowest   points for a given , and demands minimization of   .Unlike maximization, Equation 9does not yield a non-trivial minimum limit for   for a given .Rather, in this case, the limit arises from the lowest possible gas temperature in ISM.We see that a constant value of   = 60 K can enclose most of the data points of the observed data (black line in Figure 2, right panel).Such a model was also proposed earlier with a similar cold gas temperature value by Strasser & Taylor (2004).It is interesting to note that this model intersects with the previous model for the right envelope, which puts a limit on the optical depth and brightness temperatures.However, such a model (with  ′ = 40 K, black line in the left panel of the same figure) cannot explain the left envelope of the synthetic data satisfactorily.
The data points forming the left envelope, as noted in Kim et al. (2014), are expected to arise from sub-parsec scale cold clouds or the Gaussian tails of the cold components.Considering the former, in the ISM, the perceived small length scales of cold clouds can arise from the large asymmetry in the structure of cold gas, which may exist as sheets with aspect ratios as high as ∼ 200 (Heiles & Troland 2005).If this is the main reason, the discrepancy should decrease with increasing simulation resolution.Contrary to this expectation, our resolution study shows a different trend (fewer data points in the left region with increasing resolution, see Appendix C).However, there is a possibility that simulations with much higher resolution than this work are needed to resolve these structures (e.g.sheets) and see their prominence in the   () − () distribution.
For the other cause, the tails of the narrow Gaussians of the cold clouds usually get buried under the broad, warm components of the surrounding gas.Thus, they do not surface in the   () − () distribution.However, cold gas with a velocity significantly different than the surrounding gas may result in a cold Gaussian in the spectrum separated from the rest of the gas.The wings of such components may contribute to the left boundary of the distribution.This occurs in the lowest resolution simulation (128 3 ) due to a sudden jump in the velocity and temperature (see Figure C2 in Appendix C).However, a similar effect may occur in the ISM with fast-moving cold components through the surrounding (relatively warmer) medium, similar to intermediate and high-velocity clouds (IVCs and HVCs, Wakker & van Woerden 1997;Elise Albert & Danly 2005), or supernova shock fronts.
A different origin of the leftmost data points may be the larger emission beam width.A larger beam results in an averaging of the emission spectra over a region in the plane of the sky.If this length scale becomes comparable to the length scale of cold clouds, the resultant brightness temperature of these clouds decreases.Absorption spectra, on the other hand, are obtained over a narrow beam (as the background sources are usually point-like).This may result in an apparent lower brightness temperature for a given optical depth (equivalent to being in the left region of the   () − () distribution) for the cold components.Our study of this effect (detailed in Appendix D) indeed shows that with increasing emission beam width, the left boundary shifts towards left (a decreasing value of the parameter  ′ ) with increasing occurrences of such low   () points in the left region (see the extreme case of  = 15 in Figure D2).

MULTI-GAUSSIAN DECOMPOSITION
Gaussian decomposition is a widely used technique to extract information about isothermal gas clouds along the line of sight from the absorption and emission spectra.Here we recognize that emission spectra for the cloud components may not be Gaussians in optically thicker sightlines (since   ∝  only for  ≲ 1), which is the case for several of our synthetic spectra.Nevertheless, the emission spectrum can be well decomposed into Gaussians and can be used to reasonably estimate the spin temperature and column density of the components.Such methods have been used in analyzing data from several surveys like LAB (Haud & Kalberla 2007;Kalberla & Haud 2018;Marchal & Miville-Deschênes 2021) and other studies with synthetic spectra (Murray et al. 2017).A theoretically-motivated method that accounts for two phases, which was used in several later works (SPONGE and MACH surveys, see Murray et al. 2015Murray et al. , 2018;;Nguyen et al. 2019;Murray et al. 2021), is outlined in Heiles & Troland (2003a).However, this method suffers from the uncertainty related to the relative position of the gas clouds along the line of sight.For the purpose of this work, we use the former method.
We consider two analysis methods involving multi-Gaussian decomposition of the spectra.First is the standard method of performing joint Gaussian decomposition (JGD) of the absorption and emission spectra.The joint components are used to find the spin temperature and column density of the corresponding gas cloud.However, the emission components suffer from several drawbacks, mostly related to the much broader beam width associated with them than the absorption spectrum.This leads to stray emissions contaminating the spectrum.To overcome such problems, Koley & Roy (2019) proposed a method of using the Gaussian components from only the absorption spectra to determine the component properties (henceforth, the KR method).We use this method in §4.3.
Recently, several automatic Gaussian decomposition algorithms, which can identify the number of components and fit the spectra, have been developed (Lindner et al. 2015;Marchal et al. 2019).However, such algorithms have their own uncertainties, and they also fail to work on noiseless data.Additionally, they cannot be directly used for joint fitting of absorption and emission spectra.In this work, we use our own algorithm for automatic Gaussian decomposition, which is suitable for all the different cases considered in this work.
Our algorithm uses lmfit (Newville et al. 2023) and is based on repetitive fitting with an increasing number of components (see Appendix A for the details).For the joint absorption-emission decomposition, we have restricted (within a small window) the component line centers and width in emission with those obtained from fitting the absorption spectrum.Our method ensures that for each absorption component, we get a corresponding emission component (the joint absorption-emission components).We also get several extra components from the emission spectra, which do not appear in absorption.These components are mostly from the warm gas, which is subdominant in absorption, but detected in emission.We refer to these as "emission-only" components.We have also set several criteria (see Appendix A3) for accepting the inferred components.Due to the fundamental difference between the two methods, the criteria for the JGD and KR methods are different.For the first, the criteria involve the joint components, whereas, for the latter, the criteria are based on only the absorption components (see Appendix A3 for the details).Thus, the number of components used in the two methods is different.We randomly choose 200 sightlines through the simulation domain for our analysis.In Figure 3, we provide a few examples of the Gaussian decomposition of both the emission and absorption spectra.The synthetic spectra, the multi-Gaussian fit to the data, and the individual inferred components are shown.

Properties of the Gaussian components
With noiseless spectra of 200 random sightlines through the numerical domain, JGD yields a total of 532 joint absorption-emission components and 263 emission-only components.For the KR method, we obtain 604 absorption components.The components are characterized by the peaks    and   for emission and absorption components, respectively, and the total line broadening   .Table 2 summarizes the Gaussian decomposition results for all the different cases (including various observational studies).
Figure 4 (blue points) shows the distribution of the absorption and emission Gaussian components in their respective width and amplitude spaces.For the absorption components, a clear correlation between   and   is evident.The double-humped nature of the distributions results from the two-phase nature of the medium in the simulation.The width of the emission components also shows a double-humped distribution.The   distribution peaks at around 10 K, which mostly arises from the warm components, and there are only a few components with   < 1 K.The emission-only components turn out to be broader than the absorption components, as expected.The median values of the line width (  ) are 2.3 and 5.7 km s −1 for the joint absorption-emission and emission-only components, respectively.

Using both absorption and emission components
For the joint absorption-emission components, we estimate their spin temperature using the following relation: and the corresponding column density as: We study the inferred column density distribution over spin temperature and compare it with the simulation.The top panel of Figure 5 (orange step histogram) shows the JGD inferred column density distribution over  comp . The filled histogram represents the actual distribution for the 200 sightlines considered.We see that at cooler temperatures,   ≤ 2500 K, the inferred and the true distributions agree quite well.Conversely, at warmer temperatures, a consistent underestimation of column density is evident.This is mostly due to Gaussian decomposition failing to recover several low optical depth warm components in absorption (emission-only components may make up for this deficiency; see §5.2).As the synthetic spectra are noiseless, it is difficult to estimate a threshold of  comp  for absorption components to be detected.In real spectra, the noise level can be used to compute the limit of component spin temperatures to be detectable.Nevertheless, JGD could qualitatively capture the twophase distribution of the simulation.The inferences broadly remain the same for the maximum WF effect case, with the addition of noise or increasing the emission area to reasonable values (see §5.1, Appendix B and D).
Since JGD can extract the true cold gas properties of the medium, for each line of sight we estimate the cold gas column density from Gaussian decomposition as the sum of column densities of all the components with  comp  < 200 K.We found that the inferred cold gas column density for the lines of sight agrees with the true cold column  Top: Inferred distribution of the spin temperature using both the JGD method ( §4.2) and KR method on absorption components ( §4.3).Bottom: Inferred distribution of kinetic temperatures of the absorption components using the KR method.The inferred distributions provide a reasonable match to the true distributions for temperatures ≲ 2500 K, but are comparatively less reliable at warmer temperatures (≳ 2500 K).
densities within a factor of two for most cases.We define the inferred fraction of cold gas as the ratio of inferred cold column density and  iso H i .This is well constrained by the relation   =   / los  with the cold temperature between   = 30 K and   = 150 K.This finding agrees well with that obtained with the SPONGE observations (see fig. 5 in Murray et al. 2018).
We test the ability of JGD to infer the gas phase (CNM, UNM, and WNM) fractions of the ISM.Table 2 summarizes the inferred phase fraction with all the methods and cases.JGD cannot estimate the kinetic temperature of the components; thus, the standard definition of phases (based on   ) mentioned in §2 cannot be used.We instead adopt the definition used in Murray et al. (2018) based on the spin temperature:   < 250 K for the CNM, 250 K <   < 1000 K for the UNM and   > 1000 K for the WNM.We define the inferred phase fraction based on the spin temperature as  JGD Overall, with JGD we tend to overestimate (underestimate) the cold and unstable (warm) gas fractions.
The inferred absolute gas fractions of the CNM and UNM may suffer from several uncertainties, resulting mostly from the uncertainties in estimating the WNM gas fraction, which is subdominant in absorption.Including the emission-only components in estimation may change the inferred gas phase fractions drastically (see §5.2, §B and Table 2).In such cases, a robust quantity of interest is the ratio of the inferred amounts of CNM and UNM.We define  JGD =  JGD CNM /  JGD UNM and the true ratio from the simulation data as  true .We find, with the definition of phases used for JGD,  true ≈ 2.8.For the constant WF effect case,  JGD lies in the range of 2.5 − 3.6, which agrees with true value.With the maximum WF effect case,   however,  JGD lies in the range of 3.2 − 4.5, thus, overestimating the true value.

Using only the absorption spectra components
Now we use the KR method, which is based on only the absorption components.It uses the statistical velocity dispersion properties of turbulence to decouple the thermal and non-thermal broadening of the spectral lines.The method estimates the column densities, spin, as well as kinetic temperatures of the absorption Gaussian components using the values of   and   .In their work, Koley & Roy (2019) used an isobaric equation of state, with pressure , and a Larson-like turbulent velocity scaling law (Larson 1979) where ,  and  are parameters, whose values were taken to be 0.64 km s −1 , 0.37 and 3700 K/cm 3 respectively.Using these parameter values, the total column density and component spin temperatures for a selected few lines of sight (with simpler profiles: low optical depths and fewer components) more or less agree with those obtained using JGD (refer Koley & Roy (2019) for details).
The value of  represents the average pressure of the ISM.For our work, we take  = 2717 K/cm 3 , the volume-averaged pressure of the simulation domain.A power-law turbulent velocity scaling was found to be insufficient to describe the simulation over the entire range of length scales.This is a consequence of a low Reynolds number of this simulation (≈ 2000) compared to that estimated for the ISM in a typical galaxy (∼ 10 7 , see tab.3.1 in Brandenburg & Subramanian 2005), which results in a large dissipation scale and an insufficient inertial range.Kriel et al. (2022) found that the dissipation scale ℓ  is related to the turbulence driving scale ℓ turb and Reynolds number Re as ℓ  ≈ 40 ℓ turb Re −3/4 , which in this case (with ℓ turb = 100 pc) is ≈ 12 pc, placing most of the simulation domain in the non-inertial turbulence regime.We find that an exponential correction term can be used at the low Reynolds numbers of our simulations where we have used   = 12 pc.Comparing Equation 17 with the true line of sight velocity dispersion inside the simulation domain (see Figure 6), shows that  = 0.64 km s −1 and  = 0.37 (the same values used for observational data analysis) well describes the turbulence properties of the system.The value of  is also consistent with that observed with subsonic turbulence in simulations (0.39 ± 0.02, see Federrath et al. 2021).As was done in Koley & Roy (2019), we checked for a few lines of sight that with this modification in  turb , the total column densities and component spin temperatures match with those obtained by JGD.We note here that the modeling is fairly insensitive to the exact functional form of the scaling and the parameter values.As long as the non-thermal 1D velocity dispersion of the ISM is captured well, the inferences are unlikely to be significantly affected.
In Figure 5, we show the inferred column density distribution over spin temperature (top, green step histogram) and kinetic temperature (bottom) for the constant WF effect case.The KR method is seen to be capable of estimating the distribution quite well in the lower temperature regime (  and   ≲ 2500 K).We also see that the inferred distribution from the KR method and JGD agree quite well in this temperature regime.At higher temperatures, the estimates are unreliable, though qualitatively, a two-phase distribution is still recovered.Here too, the inferences broadly remain the same for the maximum WF effect case or with the addition of noise (see §5.1, Appendix B and D).
Similar to the JGD, we define the fractions  KR  ,  KR  , and  KR to study the method's ability to recover the gas phase fractions.As the KR method can give values of the kinetic temperatures of the components, we revert to the phase definition based on the kinetic temperatures as defined in §1. KR CNM ,  KR UNM and  KR WNM take values in the range of 0.9 − 1.2, 1.1 − 1.4 and 0.6 − 0.9.Thus, this method also shows the trend of overestimation of the CNM and UNM and the underestimation of WNM fractions.The value of  true , in this case, is ≈ 1.We find that  KR lies in the range of 0.7 − 1.1, which again agrees well with the true value.

Efficient Wouthuysen-Field effect
Recently, several theoretical and observational studies have indicated that the WF effect in the ISM may be efficient enough to render   =   for all phases (Murray et al. 2018;Seon & Kim 2020).
To check how this might affect our inferences, we performed our analyses separately, assuming   =   for all the phases (the maximum WF effect case).The results of the constant and maximum WF effect cases turned out to be broadly similar.The outcome of the data analysis techniques with the non-warm gas components remains unaltered.Both the JGD and KR methods could recover well the column densities and temperatures of gas with   /  ≲ 2500 K.The differences between the cases mostly lie in the warm temperature regime.Below, we discuss the key results for the case of a maximum WF effect (  =   ) and relate them to a few observational signatures indicative of an efficient WF effect in the ISM.
Using   =   changes the distribution of spectral data in the   () − () space at the lower optical depth regime.Fitting the new distribution with our model discussed in §3.2 yields   = 6000 K (see Eq. 11), leaving the values of the other two parameters ( , and ) practically unchanged.This is a direct consequence of the much higher spin temperature of the warm gas in this case.The non-warm gas length scales or warm gas brightness temperature do not depend on the spin temperature, leaving them unchanged.We relate the higher value of   in this case to that obtained by fitting the model to the observed data (  = 5000 K).This is an indication of higher warm gas spin temperatures and efficient WF effect in the ISM.Higher values of parameter   , however, can also occur due to larger beam sizes associated with emission spectra than the corresponding absorption spectra (see Table 1 and Appendix D).
With JGD, several of the joint Gaussian components yielded very high spin temperatures ( comp  > 4000 K).This resulted in a much broader distribution of the component spin temperatures, in contrast to the constant WF effect case, where most of the spin temperatures were limited to ≈ 3000 K. Higher spin temperature results in lower optical depths of the warm components, which renders several of these components undetectable in absorption spectra.This may be a possible reason behind the non-detection of components with spin temperature in the range of 2000 − 3000 K in observations (Murray et al. 2015), indicating an efficient WF effect.However, the high sensitivity absorption survey by Roy et al. (2013a) could detect several low-amplitude and broad components in absorption.Unfortunately, the spin temperatures of these components are not available.This, however, illustrates the importance of high-sensitivity absorption surveys to study the strength of the WF effect in the ISM.
With the maximum WF effect case, the KR method, in its original form, fails to estimate the spin temperature of the warm components, as it implicitly uses the Liszt (2001) relation between   and   .Thus, for a maximum WF effect, we modify and use   =   within the KR method.This improves its recovery of the true distribution for the maximum WF effect case.With this modification, however, the performance of the method worsens for a constant WF effect.This shows that the success of the method for recovering warm components with the KR method depends on our knowledge of the WF effect in the ISM.

Column densities of emission-only components
Estimating the physical properties of the emission-only components is difficult due to the unavailability of the corresponding absorption components and thus, Equations 14 and 15 cannot be used.The brightness temperature of emission-only components also suffers from absorption due to the optically thicker components.Here we use a simple method to estimate the column densities of the extra components in emission spectra, assuming that they are optically thin and do not contribute to the optical depth budget.We approximate the optical depth of the optically thicker components surrounding the optically thin emission-only component to be the value of () at the velocity center of the emission-only component, which we denote as ( comp ,em ).These components are potentially responsible for the attenuation of the emission from the emission-only component.To correct for this attenuation, we assume that a fraction  of the emission-only component lies in front of the optically thicker components and the rest lies behind.Using this two-phase model, we get the true peak brightness temperature estimate of the emission-only component as Statistically, we expect  ≈ 0.5 due to the diffuse nature of the warm components and use this value in our estimation.We use  est,peak  We denote  em H i as the sum of  comp,em H i for all the emission-only components for a line of sight.Figure 7 shows the line of sight column densities inferred from the joint absorption-emission Gaussian components only ( JGD H i , blue points) and including the emissiononly component column densities ( JGD H i +  em H i , orange points) for both the constant and maximum WF effect cases (top and bottom respectively). JGD H i +  em H i agrees with the true column density quite well.This serves as a verification of the completeness of the Gaussian decomposition of the spectra when emission-only components are included.This also shows that the above method gives a reasonable estimate of the emission-only components' column densities.We assume that all emission-only components comprise gas in the warm phase (in this case,   > 1000 K).We find that the new inferred gas phase fractions are in excellent agreement with the true fractions (see Table 2).
For both panels in Figure 7, we also include the total line of sight column density estimation from the KR method (black crosses).Compared to JGD inferred values, the KR method overestimates the column density from absorption components.This is an inherent limitation of the KR method as the parameters are chosen to approximately match the total column density in the first place (see §4.3).This leads to an overestimation of the spin temperatures, and thus column densities, of several warm components (for example, see the high inferred column density at   ≈ 2500 K in Figure 5), and consequently, an overestimation of the total column density.
We also perform the same analyses for the spectra with noise (see Appendix B for the details).No significant difference is obtained in the results, except for an even more prominent underestimation of the total column density with joint absorption-emission components.However, with noise, we can get an estimate of the spin temperatures of the emission-only components.This enables us to check the changes in the inferred column density distribution when these components are included.We see that we recover a significant amount of warm gas column density.For details, see Appendix B.

Properties of the synthetic Gaussian components
We note a very good agreement between the distribution of the inferred Gaussian components in this work and a similar analysis performed by Murray et al. (2017Murray et al. ( , 2018)) Figure 4 shows, along with the distribution of the synthetic components, the absorption components from observations (black points from GMRT/WSRT/ATCA data; Roy et al. (2013a)).The latter does not follow the overall trend of the synthetic components.They also tend to populate the region in the   −   space that is only sparsely populated by the simulation components.This difference in the nature of the Gaussian components is indicative of the possible difference in the nature of the gas clouds in simulations and observations, which further illustrates the quantitative difference between the two.

Temperature distribution of ISM gas
In this section, we compare the amount of gas in different phases as inferred by various observational surveys from 21 cm spectral data and as produced in the current simulations.Table 2 lists the column density fraction (equivalent to the mass fraction) of CNM, UNM, and WNM inferred by four surveys: the GMRT/WSRT/ATCA (Koley & Roy 2019), SPONGE21 (Murray et al. 2018), Arecibo (Heiles & Troland 2003b;Nguyen et al. 2019) and HI4PI (Kalberla & Haud 2018), along with the number of sightlines used and the number of extracted Gaussian components, wherever applicable.It is seen readily that the inferred phase fractions across different surveys show significant differences (for example, the estimated  UNM from GMRT/WSRT/ATCA vs. SPONGE21 in Table 2), which suggests that the results from different surveys are inconsistent.However, it is important to note that the gas phase definitions (temperature boundaries and whether spin or kinetic temperature) used in the various surveys often differ significantly (see footnotes in the table).The gas phase fractions can be quite different with different phase definitions (for example, see the phase fractions in the simulation domain with two different phase definitions, first row in Table 2).Thus, it is difficult to compare the inferred phase fractions across different surveys.This establishes the importance of using a common (or at least similar) definition of the gas phases across different studies (both simulations and observations).A similar need was also noted in McClure-Griffiths et al. (2023).
There are, however, a few factors that may make the observationally inferred phase fractions (or, in general, the temperature distribution) across surveys different to some degree.Firstly, the inferred cold and unstable gas fractions depend heavily on how the warm gas fraction is estimated.If the emission-only components are used to estimate the warm gas missing in absorption (similar to as discussed in §5.2), the inferred phase fractions change significantly (see SPONGE vs SPONGE+Emission-only in Table 2).Other than this, the analysis model (see §5.4.2 for a discussion) or the choice of line of sight (for example, low vs. high galactic latitude or proximity to molecular clouds) may also result in differently inferred phase fractions.
Despite the uncertainties involved with observational estimates, we attempt to bring out the commonalities among the results from different surveys and compare the observationally inferred gas-phase distribution with that in the simulation.For comparison, we choose the GMRT/WSRT/ATCA and SPONGE21 survey results for primarily two reasons: I) their analysis methods are similar to this work, and II) they sample the ISM across all galactic latitudes roughly uniformly, and thus do not suffer from the bias arising from the specific choice of lines of sight.Though we list the inferred phase fractions by Arecibo and HI4PI in the table for completeness, henceforth, we do not consider them owing to their very different analysis methods and/or gas phase definitions.On the other hand, results from a variety of ISM simulations, including large-scale ISM simulations, are available in the literature.Quantitatively, the different simulations often show different results.Here, too, we try to refer to the common features in all the simulations whenever possible.
Comparison of the gas properties in the simulation with those inferred from observations reveals a few quantitative differences, mostly related to the amount of cold and unstable gas.We first consider the  CNM and  UNM inferred from observations.As already discussed briefly, these values are significantly affected by the ways the warm gas fraction is estimated.To avoid this uncertainty, we instead consider the ratio of cold and unstable gas (i.e.,  CNM /  UNM ).Firstly, we note that Koley & Roy (2019) estimates a significantly high fraction of unstable gas (∼ 75%, 200 K <   < 5000 K) and a very low amount of cold gas (∼ 15%,   < 200 K).The ratio of the amount of cold and unstable gas, thus, is ≈ 0.2.This is significantly lower than that for the simulation (≈ 1 with the phase definitions with   , see §4.3).Similarly, with JGD, Murray et al. (2018) estimates unstable (250 K <   < 1000 K) and cold (  < 250 K) gas fractions of 41% (20%) and 56% (28%), respectively, without (with) including the emission-only components.Here, the ratio of the amount of cold and unstable gas, thus, is ≈ 1.4.This is again significantly lower than that of the simulation when the phase definition with   is used (≈ 2.8, see §4.2).
The quantity  CNM /  UNM is still dependent on the definition of gas phases.Thus, we now check the observationally inferred column density distribution itself.Although there is a significant amount of unstable gas in the simulation, the lower temperature peak in the distribution occurs at   or   ≈ 100 K, which lies in the temperature range of CNM.The distribution is also two-phase-like, with distinct peaks in the CNM and WNM regimes.Several large-scale ISM simulations also show similar behavior (Saury et al. 2014;Kim et al. 2013;Kim & Ostriker 2017;Rathjen et al. 2021;Kim et al. 2023).We contrast this with the distributions obtained from observations in Murray et al. (2018, Figure 8) and Koley & Roy (2019, Figure 6).Though the two distributions may differ quantitatively, they share the common feature of lacking the prominent two-peak feature.In fact, both the inferred distributions tend to peak in the UNM temperature regime, contrary to almost all simulations.
We now use the results of this work to show that the above differences are unlikely to arise solely from the uncertainties related to observational data analysis and the inferences thereof.Our analysis Table 2.The true and inferred gas phase fractions, for various cases, along the 200 lines of sights in the 512 3 simulation.cWF and mWF denote the constant and maximum WF effect cases, respectively.We primarily use two definitions of gas phases.First is in terms of   (following Murray et al. 2015, 2018, SPONGE21 survey) and used for the JGD method.The second is in terms of   (following Koley & Roy 2019, GMRT/WSRT/ATCA survey) and used for the KR method.For comparison, we provide the estimated phase fractions with these surveys.We also provide the phase fractions estimated by two other surveys: Arecibo survey (Heiles & Troland 2003a,b;Nguyen et al. 2019) and HI4PI (Kalberla & Haud 2018).We note that these works use different definitions of gas phase temperatures.We explicitly mention the definitions of phases used to compute the gas fraction.Additionally, we also provide the number of Gaussian components used for analysis for each of the cases.shows that both JGD and KR methods (even when the various observational effects are considered, see Appendix B, D and Table 2) are reliable in recovering the gas distribution of these phases reasonably well in most cases (see Figure 5, and also Figures B2, D1).Only with a very large emission beam averaging (over a rectangular beam of ∼ 6 pc, see Appendix D) JGD fails to recover the true distribution well.But such a case is unphysical, as argued in Appendix D, and unlikely to occur with observations.Even then, a qualitative twophase distribution is recovered, and the inferred gas phase fractions are broadly consistent with the other cases (see Table 2).The analysis methods, in any case, do not tend to overestimate the amount of unstable gas over the cold gas (inferred  CNM /  UNM is always ≥ the true value, see the values of  JGD and  KR in §4.2 and §4.3 respectively).Thus, at this stage, the indication of a higher (lower) amount of unstable (cold) gas in ISM from observations seems improbable to be an artifact of the analysis methods.

𝑓
There is also an additional discrepancy with the WNM gas.The inferred warm gas fraction in observations using both JGD (without including emission-only components) and KR methods is surprisingly low (see  WNM for "SPONGE" and "GMRT/WSRT/ATCA" rows in Table 2).Although our study suggests that we tend to underestimate the warm gas fraction with these analyses (e.g., some emission-only components are missing in absorption), it still does not explain the very low observationally inferred warm gas fractions.Observations suggest that warm gas is unexpectedly subdominant in absorption spectra.This may occur if the warm gas has very high temperatures through efficient WF effect, which decreases its optical depth.However, in our study, even with the maximum WF effect case, we do not see such a severe underestimation of warm gas.This puzzle was also noted by Murray et al. (2018).This behavior is clearly still an open question and is another point of difference between the simulations and observations.
When the emission-only components are included in JGD analysis, the inferred phase fractions change significantly (see "SPONGE" and "SPONGE+Emission-only" rows in Table 2).Our study shows that inclusion of emission-only components can help us recover the column density of warm gas, which is otherwise subdominant in absorption spectra (see Table 2 and Figures 7 and B3).In fact, the inferred warm gas fraction by SPONGE21 with emission-only components is in very good agreement with that in the simulation (see Table 2), and the values of  CNM and  UNM also come closer to that in the simulation.

Limitations and Future Scopes
The differences between the simulations and observational inferences, as discussed in the preceding section, are mostly open questions.The limitations of the current simulations, both in terms of capturing all the essential ISM physics or insufficient numerical resolution, may lead to such disagreements.There may be several shortcomings with observations, too, which may lead to erroneous inferences and are yet to be properly studied.Below, we discuss some of the aspects that demand future work in order to better understand and/or resolve these discrepancies.

Simulations
There are mainly two sources of limitations in simulations -the ability to include all essential ISM physics and the effect of numerical resolution.We discuss them separately.
ISM Physics: Several small-scale simulations aimed at studying ISM properties, like the one used in this work, potentially suffer from inaccuracies with regard to the proper implementation of the various physical processes occurring in ISM, especially on larger scales.One such aspect is the way turbulence is driven in the medium.These simulations use a constant turbulent forcing uniformly for the whole simulation domain.However, this may not be an appropriate description of turbulence driving in ISM, where transient supernova explosions are one of the main sources of turbulence.The gas temperature distribution depends on the position of the supernovae relative to the ISM thermal phases (Gatto et al. 2015).Supernovae exploding within cold clouds may heat the surrounding gas, leading to higher amounts of unstable phase gas.The intermittent nature of supernovae may also influence the amounts of cold and unstable gas: a significant amount of unstable gas may form in the decaying phase of the turbulence when the driving is not active (see, for example, Figure 12 in Saury et al. 2014).Additionally, strong magnetic fields generated at supernova shock fronts may stabilize thermal instability (e.g., Sharma et al. 2010) preventing the formation of cold clouds and leading to higher amounts of isobarically unstable phase gas (for observational evidence of high magnetic field regions see Thomson et al. 2019;Bracco et al. 2020).Other than this, underestimation of the strength of turbulence forcing in the ISM can affect the gas distribution to a large extent.Increasing the turbulence strength can increase (decrease) the amount of unstable (cold) gas and can also lead to a complete wipe-out of the bimodal gas distribution (Seifried et al. 2011).Galactic potential also affects the gas properties in ISM.
It leads to the accumulation of denser cold clouds nearer to the galactic plane, which may lead to higher amounts of unstable or warm gas at higher latitudes.Such effects are not captured in small-scale simulations due to the absence of an applied galactic potential.Numerical Resolution: Insufficient simulation resolution often does not lead to convergence of the amount of gas in different phases and may lead to incorrect conclusions.We refer to the resolution study in this work (Appendix C), which shows that with finer resolution, the amount of cold (unstable) gas increases (decreases) significantly when the other simulation conditions are kept unchanged.Though this shows that increasing resolution may not solve the problem of lower unstable gas in simulations (rather make it worse), it is necessary to attain convergence for comparison with other studies and observations.The large-scale ISM simulations, like those by Kim et al. (2023, TIGRESS-NCR and previous works) or Rathjen et al. (2021, SILCC VI and previous works), which otherwise incorporate more known ISM physics than us, suffer from this limitation.With resolutions of the order of a few parsecs, the cold and unstable gas regime is often not well-resolved, which may potentially lead to inaccurate cold and unstable gas fractions in such simulations.
We now note that the two sources of limitations discussed above cannot be resolved individually.To see how large-scale ISM physics affects the temperature distribution of the gas, it is essential to devise ways to incorporate such effects within high-resolution small-scale simulation domains, which can resolve all the essential gas structures.In other words, it is necessary to bridge the gap between the several large and small-scale simulations.Such a setup will help us to quantify the effect of various ISM physics on the amount of gas in different phases better and to understand the origin of the disagreements between simulations and observations.

Observations
Below, we discuss three main limitations associated with the observations.We note that our study includes the effects of the first two, and we do not witness significant alteration of the inferences in re-alistic scenarios.However, we acknowledge the possibility of larger effects in observation, which require further tests with different types of simulations.
Larger Emission Beam Width: One of the main limitations associated with the observations is the larger emission beam widths.Being mostly small-scaled, cold gas can be affected by averaging over larger areas as it can reduce their significance in the spectra, potentially leading to their underestimation (see Appendix D).Our study with a very large assumed emission beam width (averaging the emission spectra over 15 × 15 pixels around the single-pixel absorption spectra) shows that such effects can potentially lead to significant errors in the estimation of the column density distribution of the gas.Though such extreme beam smearing is unlikely to occur in observations (with the spatial resolution achieved with the current surveys, see Appendix D), it is important to recognize the errors that it may cause.We also note that, in reality, the averaging happens over a cone (corresponding to the angular beam width, unlike the implementation in this study), which may affect the inferences differently.Efforts have to be made to acquire emission spectra with beam sizes as small as possible and quantify the corrections from this effect through analysis with simulations.A comparison of observations with different emission beam sizes will also be useful to quantify the uncertainties.

Spectral Blending of Incoherent Components:
The data analysis techniques involving Gaussian decomposition have several limitations, though they are able to recover the ISM properties broadly.One of them is velocity crowding, where the spectral profiles of spatially incoherent components blend together to give a single apparent Gaussian component, leading to erroneous inferences (Dickey & Lockman 1990).Several attempts have been made to correct this in spectral line fitting (Haud 2000;Martin et al. 2015;Miville-Deschênes et al. 2017;Marchal et al. 2019).Another similar effect has also been noted in several studies where spatially incoherent structures at a velocity channel can blend to give apparent high-density structures, which may not arise from true cold components (Lazarian & Pogosyan 2000;Fukui et al. 2018;Hu et al. 2023).We encountered yet another effect in our work.We found that the blending may also take place between spatially coherent structures but with very different temperatures (thermally incoherent; in some cases, the blended components' temperatures were seen to span an order of magnitude).This breaks down the assumption that the Gaussian components are isothermal.The inferred spin temperature can be close to the harmonic mean of all the blended components, hence biased towards the low-temperature values.Such erroneous temperature estimations may affect the recovery of warm gas properties from the spectra.The frequency of such blending or its effect on the inferred properties, however, remains to be quantified.

Analysis Model Dependence:
The dependence of the quantitative inferences on the adopted analysis methodology also cannot be ruled out, especially those involving Gaussian decomposition, which may suffer from having degenerate solutions.Additionally, the success of these methods may vary across systems with different gas morphology.Such a possibility has already been noted in Murray et al. (2017).Using hydrodynamic simulations with a galactic potential by Kim et al. (2013), they showed that their JGD method does not work equally well for different latitudes (see §5.3 in Murray et al. 2017).On this front, the adoption of a common set of data analysis methodologies across different studies is preferable, and they should be tested against different simulations.

CONCLUSIONS
Several analysis methods are regularly applied on ISM H i 21cm observational data to infer the temperature and column density of the gas.These methods, however, need verification against simulation data and synthetic spectra to ascertain their reliability.In this work, we consider analysis methods both with and without the Gaussian decomposition of the spectra and test them against synthetic spectra generated from MHD simulations by Seta & Federrath (2022).Our work also resulted in new physical models to explain the behaviour of the spectral data.Recognizing the uncertainty regarding the extent of the Wouthuysen-Field (WF) effect in the ISM, we have tested the analysis methods and models with two choices of the WF effect: constant (following Liszt 2001) and maximum (  =   ).We primarily perform our analysis with noiseless data.However, to test how practical problems associated with observations may affect the inferences, we consider two other cases: spectra with noise (Appendix B) and spectra with larger emission beam width than absorption (Appendix D).To test the stability of our methods and the convergence of the results, we also consider simulations with different resolutions (Appendix C).Below, we summarize our main results: • We develop a new quantitative model that explains the right boundary of the   () −() distribution and fits both the simulation and observed ISM data quite well (see Figure 2).The model yields a relation between the maximum non-warm (cold and unstable) gas temperature,   and the optical depth :   ∼  −0.4 , which is similar to classical   −  correlations.On the other hand, a single model cannot explain the left boundary of the distribution for both the simulation and observed data.This is possibly due to the different cold gas morphology between the true and simulated ISM.Further extensions of our model and detailed comparison between ISM and synthetic data may constrain the ISM properties, including the nature of turbulence as well as the presence of very small-scale or highvelocity cold gas.
• We perform joint Gaussian decomposition (JGD) of the synthetic absorption and emission spectral lines.The properties of the inferred Gaussian components (Figure 4) are qualitatively similar to those obtained by earlier work, despite the different simulation conditions and analysis algorithms.This demonstrates the underlying commonality between different ISM simulations.JGD could recover the column density distribution of gas with   ≲ 2500 K quite well, whereas the warm column densities are underestimated (Figure 5).We further show that the inclusion of emission-only components (components recovered from emission spectra but undetectable in absorption) in column density estimation can help recover most of the warm gas column density (Figure 7 and also Figure B3 for spectra with noise).
• We test the newly proposed method by Koley & Roy (2019) to extract the kinetic temperature and column densities of the Gaussian components using only the absorption spectra.Like the JGD method, this method too can recover well the properties (temperature and column density) of gas with   /  ≲ 2500 K (Figure 5).At higher temperatures, the estimations become unreliable.The success of this method with the warm components is dependent on the knowledge of the strength of the WF effect (see discussion in §5.1).
• With noise and larger emission beam widths, though broadly, our results and inferences remain unchanged, the inferences suffered from quantitative differences (see Appendix B and D and the related figures).We show that larger emission beam widths may alter the parameter values of our model for the   () − () distribution and may lead to less accurate estimation of cold gas column densities with the JGD method.On the other hand, noise adds to the underestimation of warm gas by further suppressing the low-amplitude components in absorption.
• In observations, the inferred gas phase fractions from different surveys are often not consistent.The different gas phase definitions used in the different surveys may significantly contribute to such apparent inconsistencies.This establishes the need to adopt a common set of gas-phase definitions.Other factors like the analysis model dependence of the inference or bias from the choice of line of sights may contribute to quantitative differences in both phase fractions and temperature distribution.These factors demand corrections.Despite these, inferences from several surveys show common features that we compare to the simulation properties.
Comparison of the inferences from SPONGE21 (Murray et al. 2018, which uses the JGD method) and GMRT/WSRT/ATCA survey (Koley & Roy 2019, which uses the KR method) with the simulation shows quantitative disagreements (see §5.3.2 for the detailed discussion).The JGD-inferred column density distributions from both surveys lack the two-phase property, as seen in this and several other simulations, and tend to peak in the UNM regime.We also show that the ratio of the amount of CNM to UNM inferred from observations is a factor of ∼ 2 − 5 (depending on the gas phase definition) lower than in the simulation.Thus, observations suggest higher (lower) amounts of unstable (cold) gas than in the simulation.Our analysis shows that these disagreements are unlikely to be just an artifact of the observational data analysis methods.
• We discuss the scope of future studies to understand better the origin of the above-mentioned discrepancy between simulations and observations.We establish the need to develop ways to incorporate the large-scale turbulence driving effects in ISM (for example, intermittent supernova explosions) in smaller high-resolution simulation domains with resolution convergence studies as done in this work (see Appendix C).With observations, the analysis model dependence of the inferences needs to be quantified through verification against simulations.Such efforts will enable checking for consistency of inferences across various surveys and more robust comparisons of simulations and observations.ucts from the Leiden/Argentina/Bonn Galactic H i survey and the ATCA/GMRT/WSRT H i absorption survey for this work.We are also grateful to the anonymous reviewer for carefully reviewing our manuscript and providing constructive comments.

DATA AVAILABILITY
The 21 cm data used for this study are available publicly from the ATCA, GMRT and WSRT archives and the AIfA H i Surveys Data Server.The simulation data will be provided upon reasonable request to the corresponding author, AS, or CF.All the derived quantities and models produced in this study will be shared on reasonable request to the corresponding author.
We note that even after applying the termination checks, there can be over or under-fitting of the spectra.This, however, is inevitable, as the total spectrum may not exactly be a sum of Gaussians.Also, Gaussian decomposition is not a well-posed problem in general, as Gaussians do not form an orthonormal basis (Haud & Kalberla 2007;Lindner et al. 2015).

A2 Joint decomposition
For the joint decomposition, we first fit the absorption spectrum (method described in the previous section).We use the absorption spectrum Gaussian components to fit the emission spectrum by restricting the Gaussian center and width (allowing a variation of ±0.5 km s −1 ) and allow for up to 5 additional components to account for any warm components missed in the absorption decomposition, though we do not restrict the additional Gaussian components' width to be in the warm regime.For the additional components, we use initial parameter values of 0 km s −1 , 1 km s −1 and 1 K for the mean, , and amplitude respectively, and we place no restrictions on any of the parameter values.The termination in fitting is done following the termination criteria discussed in the previous section.

A3.1 JGD
The maximum temperature associated with a component is given by  In our decomposition, we do not force this condition while performing the fit to allow the algorithm to span a larger parameter space, which was seen to improve the quality of the fit, but for most components, the condition is seen to be satisfied naturally, which is also an indication that the Gaussian decomposition can extract the true physical Gaussian components.For the constant WF effect case, ⪆ 80% of the components were found to have  ≈  comp max when turbulence broadening is relatively small.Uncertainty in Gaussian decomposition and the related error may lead to some components having slightly higher estimated spin temperature than the maximum temperature.But in this case, too, we see that ≈ 80% of the component satisfies the relation Thus for most components, the spin temperature lies very close to the limit of  comp max .Such occurrences of a small fraction of components with  comp  >  comp max have also been noted with observed data (Heiles & Troland 2003a), even with careful decomposition of the data.These arise either due to misfitting of the spectra or due to incorrectly assigned Gaussian counterparts in absorption and emission spectra.For our analysis, in both cases, we eliminate all components for which  comp  > 1.1× comp max , with the 10% error window chosen to allow for errors related to Gaussian fitting, even for the true components.Additionally, we also discard all components with    < 0.1K and   < 5 × 10 −4 K.

A3.2 KR Method
The KR method considers that a fraction  of the total broadening is of non-thermal origin and iteratively solves for this fraction (for details, see Koley & Roy 2019).We impose the convergence that two consecutive values of  being within 5% of each other within 200 iterations.We disregard all the components for which convergence is not attained.We also discard all components for which the inferred kinetic temperature is > 15000 K. Additionally, we also discard all components   < 5×10 −4 K.We briefly note that these criteria yield a higher number of absorption components than the JGD criteria described in the previous section.However, no significant difference in the final inferences is seen if the JGD criteria-inferred absorption components are used for the KR method.
We note that the termination criteria for Gaussian fitting and subsequent component selection criteria were chosen mostly based on testing the algorithm on a few spectra.Small changes in the chosen values of the associated parameters were seen not to affect the results much, even quantitatively.

APPENDIX B: THE EFFECT OF NOISE
Noise in observed ISM spectra can potentially affect the analyses and the inferences.To test this, we add noise to the synthetic spectra (generated using the methodology described in §2.2) and perform our analyses.For the optical depth and emission spectra, we add random Gaussian noise with standard deviations of   = 10 −3 and    = 0.2 K respectively in each velocity channel, following the treatment in Murray et al. (2018).These values are in ballpark agreement with the noise levels in other surveys.We perform this analysis with the 512 3 simulation.
The   −  distribution is not affected by noise except at the low optical depth/brightness temperature regime of the warm gas.The values of the parameters  , and  (see Eq. 9) do not change significantly.However, the estimation of parameter   becomes difficult with higher noise levels.High-sensitivity absorption surveys (like Roy et al. 2013a, where   ≈ 2 × 10 −4 ), which can yield a  5).The inferred distribution agrees well with the true distribution in the lower temperature regime.The underestimation at higher temperatures is more prominent than with the noiseless spectra.We also test the Gaussian decomposition analysis methods.Gaussian decomposition of noisy spectra is challenging, and several dedicated algorithms have been devised for this task (Lindner et al. 2015;Marchal et al. 2019).In this work, we employ a much simpler method and perform a similar incremental Gaussian fitting as with the noiseless spectra (with appropriate modifications; see Appendix A for the details).Figure B1 shows a couple of examples of spectra with noise and the inferred Gaussian components.We perform the same analyses as done in §4.2 and §4.3 with the same 200 lines of sight we used previously.
Figure B2 shows the inferred column density distribution for the constant WF effect case.Even with noise, the recovered distributions with JGD and KR methods agree well with the true distribution in the lower temperature regime of   /  ⪅ 2500 K.This is mostly because the optical depth and brightness temperature values of these components are significantly higher than the noise level.Due to noise, however, the low optical depth warm components are further suppressed in the absorption spectra compared to noiseless spectra.This manifests both as the lower number of inferred absorption components (390 compared to 546 for noiseless spectra in constant WF effect case) and an increased underestimation of the warm gas column density from the noiseless spectra.The results are similar for the maximum WF effect case.See Table 2 for the summary of the results.
We check how the inclusion of the emission-only components affects the column density distribution over temperature.As these components are undetected in absorption, we assume that the peak optical depth of these components is close to the noise level,   .The spin temperature can be estimated from the following relation: where  est,peak  is the estimated peak brightness temperature of the warm components (see Equation 18) and  is a suitably chosen factor.Figure B3 shows the revised column density distribution over spin temperature for both the constant and maximum WF effect cases (top and bottom respectively, red histogram) when the emission-only components are included with the JGD analysis.For each case, we use the value of , which makes the resultant distribution closer to the actual distribution.For comparison, we also show the distribution without the emission-only components (orange histogram).As expected, the difference lies mostly in the warm region.We clearly see that by including the emission-only components, we recover significant amounts of the warm gas column density.

APPENDIX C: THE EFFECT OF SIMULATION RESOLUTION
As a convergence test, we used simulations with different resolutions, namely 128 3 and 252 3 .The amount of cold, unstable, and warm gas in the medium is dependent on the resolution, with the fraction of unstable gas in the medium decreasing with increasing resolution (see Table C1 for the mass and volume fractions of the gas phases).This is because simulations with lower resolution fail to adequately resolve the cooling fronts at scales comparable to the cooling length, which mostly occur on unresolved scales.We perform the analysis only for noiseless spectra.
The difference in cold and unstable gas morphology in the lower resolution affects the   () − () distribution of the spectral data.Figure C1 shows the distribution and the model for the boundaries along with the fit parameters for the constant WF effect case.The value of the parameter  (see Eq. 9) is higher than the value obtained for the 512 3 simulation data, while the other two parameter values do not differ significantly.For the maximum WF effect case, the parameter   takes the value of ≈ 6000 K, leaving the other parameters nearly unchanged from the constant WF effect distribution.The lowest resolution (128 3 grid) simulation domain contains a significant number of data points in the left region, which decreases with increasing resolution.The origin of the data points in the lowest resolution domain has been identified to be cold clouds with velocity and temperature distinctly different from its immediate boundaries (see Figure C2.This is an artifact of insufficient resolution, where the transition between surrounding gas components is often not well resolved.However, a similar effect may be achieved in the ISM with fast-moving cold clouds (see §3.2.3).The frequency of such cases, and consequently the number of data points in the left, decreases with increasing resolution.Broadly, the   () − () distribution and the model parameters also converge with increasing resolution.
We apply the Gaussian decomposition methods (both JGD and KR methods) to the spectra generated from the lower-resolution simulation of 200 randomly chosen lines of sight.Figure C3 shows the results for the same.Similar to all previous cases, the column density distribution of the non-warm gas is recovered well.The distribution is significantly different for the 128 3 simulation, however, the distribution converges with increasing simulation resolution.

APPENDIX D: EFFECT OF EMISSION BEAM WIDTH
As already mentioned in §5.4.2, the emission spectra are recorded over a much larger area in the sky than absorption spectra.We test how this may affect the inferences and the outcomes of the analysis techniques by averaging the emission spectrum over a region of    around the pixel used to generate the absorption spectrum.Here we consider three cases of  = 3, 5, and 15, which correspond to ∼ 1.2 pc, ∼ 2 pc, and ∼ 6 pc in spatial scale.We note here that the case of  = 15 has been included to demonstrate how very broad emission beams may affect the inferences.Averaging over such a spatial scale smears all the smaller-scale cold gas structures (most of which have scales of 1 − 10 pc).However, this may not happen in observations, and the cold structures are recovered well in spectra.
To substantiate our argument regarding the  = 15 case, we do a rough estimate of the averaging that may happen in observations.The high-resolution emission surveys like HI4PI or Arecibo, which are used in such studies, have beam widths of 3 − 4 ′ .The furthest ends of the Milky Way correspond to a distance of ∼ 10 kpc.This amounts to a spatial scale of ∼ 10 pc.Thus, this corresponds to the largest possible scale of averaging (at the farthest ends of the galaxy).Thus, averaging over ∼ 6 pc scale throughout for a 200 pc simulation domain is an extreme case, with which we try to magnify the effect (which also shows the strength of these numerical experiments).However, we recognize the possibility of similar spatial averaging effects in past surveys with large emission beam widths (e.g., LAB).This shows the importance of a systematic study utilizing surveys with different beam widths to constrain this effect (see §5.4.2 for a discussion).Such an analysis is beyond the scope of this paper, but we plan to do it in the future (see Appendix E).Additionally, we note that the various parameters (for joint fitting and termination, see Appendix A) used in our Gaussian decomposition algorithm may require changes, as the current version may not be optimized for such a large emission width.However, for a fair comparison, we have used the same parameters and algorithm throughout.
All parameters of the model for   () − () are affected when the emission beam size is increased, with the effect increasing with increasing . Figure D2 shows the distributions for the two cases, and the model fits for the constant WF effect case.For the right boundary model, the parameter  decreases, and the other two parameters increase.The left boundary also shifts towards the left (leading to a decrease in parameter  ′ ).These translate to an overall decrease (increase) in the non-warm (warm) gas brightness temperatures.However, similar to all previous cases, only the parameter   changes from constant to maximum WF effect case (6000 K, 7800 K and 15000 K for  = 3,  = 5 and  = 15 respectively).
Figure D1 shows the inferred column density using JGD for the same 200 lines of sight used in our work previously.The KR method does not depend on the emission spectra; thus, we do not include it.For comparison, we also show the JGD inference with the standard case of single-pixel emission spectra ( = 1).We see that with increasing emission beam size, there is a tendency to underestimate the non-warm gas, with the underestimation being severe for  = 15.No significant change in the warm gas estimation is seen.The quality of the joint Gaussian fits is seen to decrease with increasing .Overall, in all cases, a qualitative two-peak distribution is still recovered with a minimum in the unstable phase, though, for  = 15, the inferred distribution differs significantly from the true distribution and shows a somewhat flat distribution throughout the cold and unstable phases.
The reason behind the above effects possibly lies in the fact that the cooler gas components have smaller scales.Averaging over a larger region reduces the significance of these components in the spectra.The properties of the warm gas, having larger scales, do not change much over the beam width, and thus, their contribution to the spectra is not significantly affected.

Figure 1 .
Figure 1.An example of a line of sight.Left: Variation of the kinetic temperature,   , and number density, , along the line of sight, colored according to the line of sight velocity (  ).The cooler clouds are usually associated with higher density.Right: The corresponding emission spectrum,   (), and absorption spectrum,  (), with the data points coloured according to  iso  () (Equation7).The appearance of a broad warm component in the emission spectrum and its absence in the absorption spectrum displays the inability of the absorption spectrum to detect warm gas.

Figure 2 .
Figure 2. The   ()- () distribution for synthetic spectra for randomly chosen 1000 lines of sight (Left, the points are coloured according to  iso  (), Equation 7) and observed data from Roy et al. (2013a) (Right).The red horizontal line marks  = 0.005.The brown lines denote the models for the right boundary (see section 3.2.2).Both the simulation and observational data are well-fitted by the model.The parameters for synthetic and observed data are: (  , ,  , ) = (3200 K, 130 K, 9 K) and (5000 K, 200 K, 13 K) respectively.The black solid line shows the left boundary model (see section 3.2.3)which fails to fit the synthetic data satisfactorily.

Figure 3 .
Figure 3. Examples of absorption and emission spectra and their Gaussian decomposition results.The blue points are the spectral data points, and the orange line represents the net Gaussian fit (sum of all Gaussian components).The black dashed lines show the individual Gaussian components inferred.Several warm components subdominant in absorption shows up as extra components in emission (the emission-only components).On an average, there are about 2 − 4 absorption-emission joint Gaussian components and 0 − 2 emission-only components per line of sight.

Figure 4 .
Figure 4. Left: The width,   , and amplitude,   , of the absorption components from the KR method (green points) and those from JGD (orange points).For comparison, the absorption data from Roy et al. (2013a) are also plotted (black points).The observed components do not follow the distribution of the synthetic components, showing a quantitative difference between the observations and the simulation.( §5.3).Right: The peak brightness temperature,    , and   of the emission components from JGD and the emission-only components.As expected, warm gas with larger    and   is missing in the emission-only data.The histograms show the distribution of the parameters corresponding to the respective axes for both plots.

Figure 5 .
Figure 5. Inferred column density distributions (step histograms) compared to the true distribution (filled histogram).Top: Inferred distribution of the spin temperature using both the JGD method ( §4.2) and KR method on absorption components ( §4.3).Bottom: Inferred distribution of kinetic temperatures of the absorption components using the KR method.The inferred distributions provide a reasonable match to the true distributions for temperatures ≲ 2500 K, but are comparatively less reliable at warmer temperatures (≳ 2500 K).

𝑋=
∈  comp H i /  comp H i , where  can be CNM, UNM or WNM.For a fair comparison among the phases, we define  JGD  =  JGD  /  true  , where  true  denotes the true phase fractions in the simulation.Considering all the cases, we find  JGD CNM ,  JGD UNM and  JGD WNM take values in the range of 1.1 − 1.6, 1.1 − 1.8 and 0.4 − 0.9.

Figure 6 .
Figure 6.The true line of sight velocity dispersion in the simulation (blue points) and the model (red line) given by Equation 17 (with  = 0.64,  = 0.37 and ℓ  = 12 pc).The blue points were computed by taking several 1D lines of sight inside the simulation cube of varying lengths, computing the dispersion of the −axis velocity, and taking the median.The bars denote the 1 range.The vertical dashed line corresponds to the viscous cut-off length (see Eq.17).
Figure 6.The true line of sight velocity dispersion in the simulation (blue points) and the model (red line) given by Equation 17 (with  = 0.64,  = 0.37 and ℓ  = 12 pc).The blue points were computed by taking several 1D lines of sight inside the simulation cube of varying lengths, computing the dispersion of the −axis velocity, and taking the median.The bars denote the 1 range.The vertical dashed line corresponds to the viscous cut-off length (see Eq.17).
Figure 6.The true line of sight velocity dispersion in the simulation (blue points) and the model (red line) given by Equation 17 (with  = 0.64,  = 0.37 and ℓ  = 12 pc).The blue points were computed by taking several 1D lines of sight inside the simulation cube of varying lengths, computing the dispersion of the −axis velocity, and taking the median.The bars denote the 1 range.The vertical dashed line corresponds to the viscous cut-off length (see Eq.17).

Figure 7 .
Figure 7.The inferred line of sight column density using only the joint absorption-emission components ( JGD H i ), including the emission-only components ( JGD H i +  em H i ) and using only absorption components and the KR method ( KR H i ) (Top: constant WF effect case, Bottom:   =   case). JGD H i +  em H i agrees well with the true column density.The equality lines are shown as black lines.The KR method overestimates the line-of-sight column densities.
comp max = 121 2  .Due to non-thermal broadening, the kinetic temperature of the component  comp  is usually less than  comp max , and  comp  is, in turn, less than or equal to  comp  because of insufficient collisions or radiative coupling.Thus, a required physical condition for the Gaussian components is  comp  <  comp max .
for the maximum WF effect case (see §5.1), only around 66% of components have  comp  <  comp max .For the latter, for several components, we expect  comp

Figure B1 .
Figure B1.Examples of emission and absorption spectra with noise and the corresponding Gaussian components inferred using the JGD method.

Figure B2 .
Figure B2.Inferred column density distribution from spectra with noise (for figure details, see caption of Figure5).The inferred distribution agrees well with the true distribution in the lower temperature regime.The underestimation at higher temperatures is more prominent than with the noiseless spectra.

Figure B3 .
Figure B3.Inferred column density distribution with JGD with and without the inclusion of the emission-only components.The spin temperature of the emission-only components is estimated using Equation B1.For reasonable values of , the difference mostly lies in the warm gas regime.

Figure C2 .
Figure C2.Examples of discontinuities in physical parameters of adjacent grid cells (both in temperature and velocity in the top figure at ≈ 23 km s −1 , and in velocity in the bottom figure at ≈ 95 km s −1 ) arising from numerical effects, which leads to the data points in the left region of the   () −  () distribution.In ISM, however, similar effects can be expected with fastmoving cold clouds through the ambient medium (see §3.2.3).

Figure C3 .
Figure C3.The inferred column density distribution for 128 3 (Left) and 252 3 (Right) simulations for randomly chosen 200 lines of sight using both JGD and KR methods.The non-warm distribution is well recovered.This shows that the analysis methods are stable with resolution.For the distributions with 512 3 grids simulation, see Figure5.

Figure D1 .
Figure D1.The inferred column density distribution using JGD when the emission spectra are averaged over  ×  pixels around the absorption pixel for  = 3 (red),  = 5 (blue), and  = 15 (violet) for both the constant and maximum WF effect cases with the 512 3 grids simulation.For comparison, the inferred distribution with no averaging ( = 1) is provided (orange).

Figure E1 .
Figure E1.LAB and HI4PI emission data over-plotted for two representative lines of sight.Despite the different emission beam widths, the spectra are quite similar.Thus, we do not expect our inferences to change significantly if HI4PI data is included.

Table 1 .
The parameter values for the model of the right boundary of the   () −  () distribution for the different cases assumed in this work.cWF and mWF represent constant and maximum WF effect cases, respectively.The parameters  and  , do not change significantly between the two cases; thus, only one value is reported.
c Difficult to estimate with the assumed noise levels with a fraction  of the warm component being in front of the nonwarm component.Assuming negligible optical depth from the warm components, the brightness temperature then becomes Kim et al. (2013)7)5) (2013)simulation data and the Gausspy(Lindner et al. 2015)decomposition algorithm (see Figure4here compared to Figure11and 13 inMurray et al. 2017).Such a similarity is surprising, given the very different simulation setups ofKim et al. (2013)and this work and the different Gaussian decomposition algorithms used.This clearly shows that different ISM simulations have several underlying similarities in terms of observational signatures, which are largely successfully extracted by the Gaussian decomposition algorithms.

Table C1 .
The phase fractions in terms of mass (volume) for the three different resolutions.We use the definition of gas phases in terms of   as defined in §2 for the simulation domains with different resolutions.