The Halo21 Absorption Modeling Challenge: Lessons from “Observing” Synthetic Circumgalactic Absorption Spectra

In the Halo21 absorption modeling challenge we generated synthetic absorption spectra of the circumgalactic medium (CGM), and attempted to estimate the metallicity, temperature, and density ( 𝑍 , 𝑇 , and 𝑛 H ) of the underlying gas using observational methods. We iteratively generated and analyzed three increasingly-complex data samples: ion column densities of isolated uniform clouds, mock spectra of 1–3 uniform clouds, and mock spectra of high-resolution turbulent mixing zones. We found that the observational estimates were accurate for both uniform cloud samples, with 𝑍 , 𝑇 , and 𝑛 H retrieved within 0 . 1 dex of the source value for (cid:38) 90% of absorption systems. In the turbulent-mixing scenario, the mass, temperature, and metallicity of the strongest absorption components were also retrieved with high accuracy. However, the underlying properties of the subdominant components were poorly constrained because the corresponding simulated gas contributed only weakly to the H i absorption proﬁles. On the other hand, including additional components beyond the dominant ones did improve the ﬁt, consistent with the true existence of complex cloud structures in the source data.


INTRODUCTION
With the understanding that galaxies and their environments are part of an interconnected galactic ecosystem, one of the primary goals of galaxy-scale astrophysics for the next decade is to identify the processes that drive galaxy growth (National Academies of Sciences, Engineering, and Medicine 2021).Central to this effort are absorption system observations (e.g.Bahcall et al. 1993;Lanzetta et al. 1995;Lauroesch et al. 1996;Charlton & Churchill 1996;Churchill et al. 1996;Prochaska & Wolfe 1997;Rauch et al. 1997;Tumlinson et al. 2013;Werk et al. 2014;Prochaska et al. 2017;Kacprzak et al. 2019;Lehner et al. 2020a), one of the only constraints on the gaseous atmospheres of galaxies (the circumgalactic medium; CGM) as well as the intergalactic medium (IGM) between galaxies.Absorption system observations typically consist of ∼ 1 absorption spectrum per halo, each from a beam of background light (typically a quasar) that is partially absorbed by intervening gas en route to the observer.These observations provide essential data, but are also notoriously difficult to interpret.
The challenge in interpreting absorption spectra arises in two ways.The first challenge is that it is highly non-trivial to extract the prop-erties of the gas responsible for a given absorption spectrum.In observations the properties of the gas are derived based on spectra of individual ions (e.g.H , Mg , and O ), and their interpretation is hampered by line saturation, large and uncertain ionization corrections (e.g.Schaye 2006;Acharya & Khaire 2021), and more.One of the largest uncertainties is the structure of the absorbing gas-the simplest assumption is that the absorbing gas is a single cloud with uniform temperature, density, and metallicity, but many absorption spectra are best fit by assuming multiple clouds spanning a range of properties (e.g.Boksenberg et al. 1979;Muzahid et al. 2015;Liang & Kravtsov 2017;Lehner et al. 2019b;Wotta et al. 2019;Haislmaier et al. 2021;Sameer et al. 2021;Zahedy et al. 2021a;Marra et al. 2021;Narayanan et al. 2021;Nielsen et al. 2022), with possible overlapping spectra from physically-separated clouds (e.g.Marra et al. 2022).High-resolution simulations and ionization models predict a distribution of many small clouds instead of a few larger clouds (e.g.Fielding et al. 2020;Lehner et al. 2019a), consistent with nearby ( 100 pc to the sun) interstellar absorption systems best fit by multiple components (e.g.Welsh & Lallement 2010).
The second challenge is that the CGM is expected to be a highly complex, multiphase system that pushes the limits of available in-formation.The information obtainable from an absorption spectrum is typically restricted to the temperature, density, and metallicity as a function of line-of-sight velocity along 100 spectral skewers through the CGM of a given galaxy.If the CGM is described by a few discrete clouds (as proposed by earlier models for absorption system distributions, e.g.Srianand & Khare 1994;Das et al. 2001;Maller et al. 2003) a small number of sightlines per halo might be sufficient to constrain models of the CGM.Instead, observations suggest metallicities, densities, and temperatures vary over orders of magnitude across the CGM of different galaxies (e.g.Lehner et al. 2019bLehner et al. , 2022b) ) as well as within the CGM of a single galaxy (e.g.Lehner et al. 2020b).This is consistent with analytic theory and high-resolution simulations that predict fragmentation and mixing of circumgalactic and intergalactic clouds (e.g.Maller & Bullock 2004;McCourt et al. 2018;Hummels et al. 2019;van de Voort et al. 2019;Peeples et al. 2019;Mandelker et al. 2019;Mandelker et al. 2021).A complex multiphase CGM is also seen in cosmological simulations, which predict wind from the central galaxy, wind and stripping from satellite galaxies, and pristine accretion all give rise to absorbing gas, and in fact multiple origins likely overlap along a given sightline (e.g.Hafen et al. 2019Hafen et al. , 2020;;Saeedzadeh et al. 2023).As such, even given perfect information from each observed absorption line spectrum it may be challenging to use that information to gain a clear picture of cosmic ecosystems.
Despite the above concerns, there is reason to be optimistic, as interpretations of absorption spectra continue to improve (e.g.Churchill et al. 2015;Sameer et al. 2021), often with insight from synthetic spectra (e.g.Hummels et al. 2013;Liang et al. 2018).This paper marks the next step in improving interpretations of absorption spectra-the Halo21 Absorption Modeling Challenge, a mock-data challenge (e.g.Regimbau et al. 2012;Meacher et al. 2015;Hazboun et al. 2019) that combines the expertise of observers and theorists to test the limits of the current methodology for interpreting absorption spectra.
Attendees of the Halo21 KITP virtual conference1 showed significant support in addressing the above issues, and subsequently we organized the Halo21 Absorption Modeling Challenge.We invited the conference's 279 attending scientists to participate in one of two groups: theorists and observers.The premise of the Halo21 Absorption Modeling Challenge was for the theorists to generate mock data (ion column densities and spectra), and for the observers to estimate the source metallicity, density, and temperature of the provided mock data.By doing so we aimed to improve our understanding of the uncertainties and systematics involved in absorption-system parameter estimation.This allows for a validation of analysis techniques in isolation from real spectra, which may be affected by higher-amplitude noise, fixed-pattern noise, and other contaminations.The modeling challenge consisted of the following procedure: (1) theorists decide on and generate the synthetic spectra for the observers to analyze; (2) observers make their best effort at estimating the parameters of the underlying gas which produced the spectra; and (3) finally we reveal the underlying gas properties, compare the results, and revise the parameter estimation to improve agreement.In §2 we describe the data samples, in order of increasing complexity, and the process for generating them.In §2.4 we describe the absorption system modeling process.In §3 we compare the modeled properties to the actual properties, and discuss the results in §4.We conclude in §5.
Throughout the text we associate the color black with the source data and blue with the parameter estimates.Throughout the paper log refers to log 10 .

DATA GENERATION AND PARAMETER ESTIMATION
Participating theorists came from a wide variety of CGM-related backgrounds, from cosmological and idealized simulations to analytic models.Drawing on this expertise, theorists produced three mock data samples of increasing sophistication for observers to model: a sample of ion column densities for uniform clouds, a sample of multi-phase spectra intersecting one to three clouds per absorption system, and a sample of spectra drawn from a high-resolution simulation of a  ∼ 10 4 K filament embedded in a  ∼ 10 6 K halo.The metallicity, density, and temperature of all samples are shown in Figure 1.
During the production of all three samples, we used the synthetic spectrum generator (Hummels et al. 2017). is a Python package for creating synthetic absorption-line spectra from the distribution of gas in theoretical models and hydrodynamic simulations.It post-processes these data to approximate the abundance of different ions due to collisional and photo-ionization processes.The user selects an arbitrary sightline through the gas distribution to represent the path of a photon from a background quasar on its way to the synthetic telescope.Trident generates a corresponding synthetic spectrum, depositing Voigt profiles (VP) for each desired absorption line, as it steps through intervening gas parcels, accounting for the column density of the absorbers, thermal broadening, and cosmological and Doppler redshifts along the line of sight.Lastly, it processes the spectrum to account for instrumental limitations, including Gaussian noise, the spectral resolution and line spread function of the instrument, as well as adding a template quasar spectrum and Milky Way foreground.The resulting spectrum very closely mimics what is produced by real instruments like the Cosmic Origins Spectrograph (COS) aboard the Hubble Space Telescope (HST).
This was a highly collaborative project, and accordingly we highlight individual contributions in a footnote at the beginning of each section.

Column densities of uniform clouds -sample0 2
The first dataset, sample0, was a highly idealized sample consisting of 10 clouds each with uniform metallicity, density, and temperature.The data products made available to observers were total ion column densities for several relevant ions, as opposed to full spectra.The motivation for sample0 was to set a baseline for interpreting the subsequent data samples, absent complications from multiphase structure or interpreting spectra.
The physical properties of the 10 clouds were sampled from  = [0.01,1.5]  (where  = 0.014 is the total mass fraction in metals; Asplund et al. 2009),  H = [10 −6 , 0.1] cm −3 ,  = [10 4 , 10 6 ] K, and  H = [10 15 , 10 17 ] cm −2 .The values of density, temperature, metallicity, and H column density were chosen to be uncorrelated, which plays a role in the analysis ( §3.1).The sampled density, temperature, and metallicity are shown on the left side of Figure 1.We set the redshift for our mock sample to  = 0.25, typical for many CGM samples observed with the Cosmic Origins Spectrograph (COS, Green et al. 2012), and we included all ions The source properties (black) used to produce the synthetic absorption systems compared to the best estimates from parameter estimation (blue) across all three of our samples.The lines span the 16th to 84th percentiles of the parameter-estimation posteriors or source data when available.The samples range from uniform clouds to a turbulent mixing zone.For the turbulent mixing zone the thick black line spans the parameter range responsible for 68% of the H -absorption.The parameter-estimation best estimates and the source data agree to 0.1 dex in (  ,  ,  H ) for 90% of the systems, excluding the blinded sample0 estimates (red; §3.1).The grey lines connect the source clouds and the corresponding estimates. .The solid black line shows the range of temperatures and densities where gas is in cooling-heating equilibrium via photoheating.If cooling-heating equilibrium is assumed during parameter estimation then when the assumption is inaccurate the estimated metallicity tends to be inaccurate.However, in the simulation there is a large population of CGM gas that is in approximate cooling-heating equilibrium.
present in the instrument window of COS G130M and G160M at  = 0.25, including C , C , N , N , N , O , O , Si , Si , and Si .For each ion we evaluated the ion densities according to photo-ionization equilibrium, as tabulated by for single zone simulations produced with (Ferland et al. 2013) using the Haardt & Madau (2012) UV background.Note that photo-ionization equilibrium does not assume cooling-heating equilibrium (where gas is in heating-cooling balance, resulting in constant temperature), an assumption often employed in parameter estimation.To determine the total amount of absorbing gas given its properties, we calculated the length of the absorber from the H column density and its ionization state, ℓ =  H / H =  H / H .To avoid unphysically large absorbers we discarded absorbers with length > 1 Mpc (comparable to a conservative estimate of the diameter of the CGM of a MWmass galaxy).Note that the tabulated ion densities used to calculate the ionization state do not assume a size for the cloud, which is acceptable assuming the effects of self-shielding are small.Using an ionization table that includes self-shielding reveals that self-shielding has a small (typically much less than 0.1 dex) effect on ion column densities for the clouds considered here.
Prior to making the ion column densities available to the observers, we modified them to reflect uncertainty typical of similar real observations.The mock errors per cloud per ion,  log  ion , were estimated by sampling a gamma function fit to the distribution of errors estimated as part of the COS-Halos survey (Werk et al. 2013).Errors a factor of 1.5× beyond the COS-Halos range of errors were re-sampled.The actual ion column density per cloud per ion made available to observers was then calculated as log  ion, provided = log  ion +  (− log  ion ,  log  ion ), where  is a value drawn from a uniform distribution spanning ± log  ion .This is equivalent to assuming that the imaginary observers that observed the synthetic ion column densities calculated a conservative estimate of the error  log  ion , and the real ion column density per cloud per ion lies somewhere within the error bars.
We asked the observers to derive their best estimate for the metallicity, temperature, and density.Beyond providing the columns and errors on the columns, we informed the observers they may assume each sightline pierced the CGM of a MW-mass galaxy at  = 0.25, and we suggested using the Haardt & Madau (2012) UVB, because of the existence of systematics from the choice of UVB (Wotta et al. 2016(Wotta et al. , 2019;;Acharya & Khaire 2021;Gibson et al. 2022).

Spectra of multi-cloud systems -sample1 3
This sample was designed to test how well multiple clouds could be modeled.From sample0 to sample1 we transitioned from providing only total ion column densities to providing full spectra generated with (shown in Figure 3).We also moved from one discrete, uniform cloud per sightline to up to three discrete, uniform clouds per sightline.We used KDE resampling with (Kelley 2021) to draw the properties of the clouds from the distribution of properties typical for the CGM in the EAGLE cosmological simulations.The sampled metallicity, density, and temperature are shown in Figure 4. EAGLE ('Evolution and Assembly of GaLaxies and their Environments'; Schaye et al. 2015;Crain et al. 2015;McAlpine et al. 2016) is a cosmological, hydrodynamical simulation.It uses the Gadget3 "Tree-PM" scheme for gravity (Springel 2005) and the Anarchy implementation of smooth particle hydrodynamics (SPH) for fluid calculations (Schaye et al. 2015, appendix A;Schaller et al. 2015).The Ref-L100N1504 simulation, used here, was run in a (100 cMpc)3 periodic volume, with a mass resolution of 9.70 × 10 6 M for dark matter, an initial SPH particle mass of 1.81×10 6 M , and a Plummerequivalent gravitational softening length of 0.70 pkpc (at low redshift).In addition to gravity and hydrodynamics, the EAGLE simulations include radiative cooling and heating (Wiersma et al. 2009a), star formation (Schaye 2004;Schaye & Dalla Vecchia 2008), stellar feedback (Dalla Vecchia & Schaye 2012), and metal enrichment by stars (Wiersma et al. 2009b), as well as black hole seeding, growth (Rosas-Guevara et al. 2015), and AGN feedback (Booth & Schaye 2009).For more details, see the EAGLE core papers (Schaye et al. 2015;Crain et al. 2015;McAlpine et al. 2016).
To generate the distribution from which to sample the cloud properties we selected 200 random galaxies with halo masses  200 = 10 12 − 10 12.5  from the EAGLE Ref-L100N1504 volume at  = 0.27, where  200 is the halo mass such that the average enclosed density is 200× the critical density of the universe at that redshift.In the gas around these galaxies, the H fraction was calculated using fitting functions that approximate the effects of selfshielding (Rahmati et al. 2013) assuming the UVB from Haardt & Madau (2001).We extracted 200 sub-volumes around the 200 galaxies, and in each sub-volume we performed an H -mass-weighted projection of all gas with impact parameter <  200 and LOS distance < 2 200 relative to the galaxy.The projection was done along the z-axis of the simulation, which is random with respect to the galaxy orientation.For each halo, we obtained 'maps' of H column density and H -weighted temperature, density, and metallicity, with a pixel area of (3.125 comoving kpc) 2 .This corresponds to ∼ 10, 000 sightlines for a MW-mass galaxy.We then created a histogram of these sightline properties (simply weighted by sightline count) to try to capture a physically reasonable joint distribution of  temperature, density, metallicity, and H column density in UV absorbers.Note that EAGLE does not include molecular cooling, so temperatures below 104 K are rare, and the temperature of dense gas (n H > 10 −1 cm −3 ) is set by an equation of state and is therefore not physically realistic (Schaye et al. 2015).Therefore, when calculating the hydrogen neutral fraction we imposed a temperature of 10 4 K on the gas on this equation of state (following e.g., Rahmati et al. 2016).
We generated 100 sightlines of one to three clouds selected from the above distribution.We set  LOS for each cloud to be within ±150 km s −1 of  = 0.25, and randomly sampled temperature, density, metallicity, and  H from the cosmological distribution.To focus on interpreting sightlines with multiple cool clouds we only allowed a single cloud per sightline to have  > 10 5 K.We determined the ion densities and column densities as described for sample0, using an ionization table and constraining total column via  H .We used to produce spectra for both COS G130M and COS G160M, and we also provided spectra for the Mg doublet at  = 2796, 2803 Å to mimic complementary observations from the ground with a resolving power of ≈ 40,000.The  element abundances relative to other elements were set to solar, e.g.C / is solar.We applied the COS line spread function to each spectrum and included Gaussian noise with an signal-to-noise ratio of 30.Data with this low noise are unusual (as discussed in §4.1.1),so this sample provides a baseline for performance in a best-case scenario.The wavelength resolution for each spectrum was set to 0.01 Å.While we generated 100 sightlines, it is challenging to model that many.Accordingly observers sampled 10 random sightlines to focus on, excluding any damped Ly  absorbers to mitigate the effect of selfshielding.The only additional information provided to observers was that they could assume the absorbers were found in a galaxy-targeted survey with the host galaxies located at  = 0.25.

Spectra of high-resolution turbulent mixing -sample2 4
We designed the final sample to test modeling absorption systems with a realistic multiphase distribution of properties, as opposed to being described by a few discrete clouds.This choice reflects growing  2 1 2 1 1 1 3 2 2 1   n comp :   2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 3 2 2 1   n comp :   2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 3 2 2 1   n comp :   2 1 2 1 1 1 2 2 2 1   Figure 4. Synthetic data properties for multiple uniform clouds (sample1, black) compared to the distribution of expected values (the posteriors) from parameter estimation (blue).The wider the "violin" plots, the larger the value of the posterior at that value.As an example, for sightline 05 the parameter estimation suggests two components with metallicities log  / ∼ [1, −2] and log  / ∼ [0.5, 1.5].The numbers above each sightline indicate the actual number of source clouds compared to the estimated number of clouds.The maximum likelihood estimates (the peak values of the posteriors) typically align with the source parameters.
knowledge that instabilities may drive clouds to fragment and mix with surrounding gas (see Faucher-Giguere & Oh 2023 for a review).For sample2 we used data from a high-resolution simulation of a cool filament embedded in a hot halo (Mandelker et al. 2020).A turbulent mixing layer forms at the boundary of the filament and the hot gas, which produces gas with a complex continuous distribution of temperature, density, and metallicity.Figure 5 shows an example sightline.
The simulation used here is the M1.0_d100 simulation from Table 1 of Mandelker et al. (2020).This simulation uses the AMR code RAMSES (Teyssier 2002) to simulate a cold, dense cylindrical stream flowing supersonically through a hot, diffuse background, initially in pressure equilibrium.This represents a cold stream tracing a cosmicweb filament as it flows towards the central galaxy through its hot CGM, predicted to be the main mode of gas accretion in massive high- galaxies (Kereš et al. 2009;Dekel et al. 2009, though see Nelson et al. 2013).The initial density in the stream is  H,s = 10 −2 cm −3 and the density in the background is smaller by a factor of  = 100,  H,b = 10 −4 cm −3 .The temperature in the cold stream is set by cooling-heating equilibrium with a  = 2 Haardt & Madau (1996) UV background with no self-shielding for dense gas, resulting in  s ∼ 1.5 × 10 4 K.The hot and cold gas are initially in pressure equilibrium, which sets the temperature in the hot phase,  b ∼ 1.5 × 10 6 K.Note that at this temperature there is insufficient rest-frame UV absorption to accurately model the hot phase, which affected the parameter estimation.The stream is initialized with a metallicity of 0.03 (with  = 0.02), while the background is initialized with a metallicity of 0.1 .The individual abundances are set to solar composition, a fiducial choice for many simulations.The stream initially flows parallel to its axis with a velocity equal to the sound speed in the hot background.
The stream radius is set to  s = 3 kpc, and the simulation domain is a cube with sides  = 32 s , extending from −16 s to 16 s in all directions, with the stream axis corresponding to the -axis.We use periodic boundary conditions at  = ±16 s and outflow boundary conditions at  = ±16 s and  = ±16 s , such that gas crossing the boundary is lost from the simulation domain.At these boundaries, the gradients of density, pressure, and velocity are set to 0. The stream fluid initially occupies the region  <  s while the background fluid occupies the rest of the domain.We use a statically refined grid with resolution decreasing away from the stream axis.The resolution is highest in the region max(||, ||) < 1.5 s .Cells that lie within this region have a cell size Δ =  s /64 ≈ 47 pc.The cell size increases by a factor of 2 every 1.5 s in the  and -directions, up to a maximal cell size of Δ =  s /4 = 0.75 kpc.The resolution is uniform along the  direction, parallel to the stream axis.As shown in Mandelker et al. (2020), this resolution is sufficient to reach convergence in the evolution of the stream-background interaction.
As the simulation progresses, Kelvin-Helmholtz Instability (KHI) develops at the interface between the stream and the background, forming a radiative turbulent mixing layer around the stream where the two phases mix.For the chosen parameters, the cooling time in the mixing layer is shorter than the Kelvin-Helmholtz mixing timescale (Mandelker et al. 2020).As a result, initially hot CGM gas cools and condenses onto the stream through the mixing layer, and is entrained in the flow, causing the cold gas mass to increase with time.The snapshot used for spectra in this work is 10 stream sound crossing times into the evolution of the stream.At this time the stream is largely fragmented into small clouds, with significant mixing between phases.
To generate spectra we first used cloud-in-cell interpolation to deposit onto 20 sightlines the volume-weighted density and massweighted temperature, metallicity, and LOS velocity.The binning along each sightline was set to twice the smallest cell in the simulation, 2Δ ≈ 100pc.We selected the sightlines to maximize the variation in density, temperature, and LOS velocity along each sightline.All sightlines intersected the turbulent mixing zone at the boundary of the filament and the hot CGM, with five of them intersecting both the turbulent mixing zone and the filament core.All 20 sightlines extend half the box length and are parallel to the  direction.These are divided into four groups of five sightlines each, which intersect the  = 0 plane at four different  coordinates along the filament length.The five sightlines in each such group intersect the  = 0 plane at five different  values, extending from ±3.2 s , thus probing the turbulent mixing zone at different distances from the filament axis, with one sightline in each group passing through the filament core at  = 0. We generated spectra for all twenty sightlines, all of which show clear multiphase structure in the spectra for multiple ions.Of the twenty sightlines we provided ten to observers, chosen to span the breadth of output spectra.An example sightline is shown in Figure 5 and a summary of  LOS , ,  and  H for the sightlines is shown in Figure 10.
For the blinded data sample we changed the redshift to  = 0.13 to bring C into the observable range for the COS gratings.The simulations were run including a  = 2 UVB.However, during the initial analysis the ionization fractions were calculated in post-processing via Trident using a  = 0.13 Haardt & Madau (2012) UVB, and the observers were told to use the same UVB.The motivation for producing  = 0.13 spectra from a simulation run at  = 2 was two-fold: to avoid providing any clues regarding the origin of the blinded data, and to test the extent to which parameter estimation is successful at extracting gas properties independent of the mechanisms responsible for producing gas with a given , , and  H .For this sample, following the initial analysis we provided the observers with revised synthetic data.The revised data used a  = 2 UVB for calculating the ionization fractions, and was "observed" at  = 2.The revised spectra had a wavelength resolution of 0.05 Å, a line spread function with a FWHM of 7 km s −1 , and included the ions H , Si , Si , Si , C , C , C , O , O , N , N , Fe , Mg , and Ne .
uses absorption-line properties archived by the National Institute for Standards and Technology as a default, and we kept this default for the first two samples.For the third sample we aimed for greater consistency with the parameter estimation methodology by retrieving absorption line information from the commonly-used spectra analysis package (Prochaska et al. 2016).Between these two datasets the oscillator strengths could disagree by up to ∼ 2 dex for the lines O 922, 925, 930, and 989, N 1200, Si 1304, and S 1073.In the future the default behavior for will be to use the dataset.

Parameter estimation 5
Parameter For sample0, the initial modeling procedure involved adjustment of the three free parameters: H column density ( H ), metallicity (), and hydrogen number density ( H ). For each point in parameter space we used to calculate the temperature based on a balance of heating and cooling, and determined the column densities of all ions provided by the simulators, assuming photoionization equilibrium.A log-likelihood function was computed by summing the  Source properties are plotted with black 2D histograms, while the colors correspond to estimated components.The estimated components accurately describe the gas responsible for the vast majority of the absorption.However, gas aligned in velocity but physically separated can introduce confusion to the interpretation, especially when structure beyond the dominant absorption is loosely constrained.
chi-squared between the provided and the model column densities for all ions, accounting for the provided errors for each column density.This function was minimized to determine the distribution of the three parameters.The parameter estimation was performed using PyMultiNest (Buchner 2016).
For the revised parameter estimation of sample0 we also added in the temperature, , as a free parameter, no longer assuming a balance of heating and cooling.At temperatures significantly higher than   (see thick curve in Figure 2), the gas is predominantly collisionally ionized.The distribution of the four parameters,  H , ,  H , and , was then evaluated as constrained by the model column densities.§4.2 contains further discussion regarding equilibrium assumptions.
For the other two samples, sample1 and sample2, model spectra were provided.The shapes of the model line profiles provide an important constraint using the methods of Sameer et al. (2021) and Sameer et al. (2022).Though the parameter estimation methods employed by Zahedy et al. (2019) and Haislmaier et al. (2021) use a cloud-by-cloud approach, they do not use profile shapes, but instead use measured column densities.The shapes of the profiles, as employed here, give more leverage to separate phases that are coincident in velocity.The important transitions that were covered by the provided data included Mg , C , C , Si , Si , Si , N , and O for sample1, and additionally C , Ne , and Mg for sample2.The given spectra are inspected for absorption in various ionization transitions, and an initial VP fit was obtained for representative unsaturated transitions for a range of ionization states.The VP fit to the low ionization transitions is used to set a starting number of component locations for the modeling, and their initial positions in velocity space.
CLOUDY models are generated over the four-parameter space defined by  H = [10 10 , 10 21 ] cm −2 ,  = [10 −3 , 10 1.5 ]  ,  = [10 2 , 10 6.5 ] K, and  H = [10 −6 , 10 2 ] cm −3 .For the set of initial model clouds, a spectrum is synthesized using the superposition of all clouds.The VP for each cloud comes from the CLOUDY output at a given grid point, a Doppler parameter which is the sum of contributions of the thermal component, and a non-thermal Doppler parameter which is a fifth free parameter of the model.The velocities of each of the clouds are also allowed to vary, making the redshift a sixth parameter for each cloud.The grid of CLOUDY models is then explored for the combination of the multiple clouds to determine the parameters for which the log-likelihood function is minimized.In this case, the log-likelihood function is computed by comparing the model spectra to the provided spectra, and summing the differences pixel by pixel.If the starting number of component clouds from the low ionization VP fits do not completely account for the absorption in the higher ionization transitions, then additional components are included and the procedure is repeated.It is found, for example, that O seldom arises in the same phase as Mg .The posterior distribution of parameters for each of the clouds is then determined, again using PyMultiNest.Liang et al. (2018) used simulated spectra to show that Bayesian ionization modeling techniques could estimate properties in close agreement with the H mass-weighted averages along the LOS.This method follows earlier work (Crighton et al. 2015;Fumagalli et al. 2016), and additionally performs a careful treatment of nondetections.However, it assumes that the column density constraints for different ions are independent of each other.This presumption compels each component's observed column density to originate from a single gas cloud parcel.
However, multiphase gas may have contributions from one or more gas clouds in a single component.The CMBM approach enables more-precise constraints by using the shapes of the absorption profiles rather than only the measured component column densities.By accounting for their position in velocity relative to the Lyman series absorption, metallicities of different clouds are constrained.The constraint comes from the closest edge of the H profile if a metalline component is not centered on the H profile.Given that the  parameters of the metal lines and the H must be self-consistent, taking into account the derived non-thermal  from the optimization, the CMBM method is able to extract detailed information from the absorption profiles.

RESULTS
We discuss the per-sample results in the following subsections, while Figure 1 summarizes some of the key results.The overlap of points in Figure 1 reflects the agreement between the parameter-estimation best estimates and the source properties.For sample0 and sample1 we show one value per cloud for both the source and parameter estimations.For sample2 the source data is a distribution of clouds, and therefore we show the 16th to 84th percentiles of the H -weighted distributions compared to the H -weighted maximum likelihood estimates from parameter estimation.The parameter estimates and the source properties agree well across all three samples, and when there are disagreements, the estimated uncertainty is typically sufficiently conservative.

Column densities of uniform clouds -sample0
The left side of Figure 1 shows the results of parameter estimation for sample0.The blinded estimates (red) are the observers' first estimates given the data and no additional information.After the observers compared the blinded estimates to the revealed source properties they made adjustments to their methodology and performed parameter estimation again.The results are the revised estimates (blue).For sample0 the primary revision to methodology was removing the assumption of cooling-heating equilibrium.In addition, in the revision the same procedure was applied to all sightlines, instead of adapting the methodology for each sightline, as was done for the blind estimates.
The discrepancies between the blinded estimated properties and the source properties are driven by the assumptions made by both observers and theorists.Theorists created sample0 by randomly sampling properties from a uniform distribution for each property, assuming for simplicity the properties were independent.Subsequently, in isolation each property was characteristic of CGM gas, but in com-bination the properties described gas atypical for the CGM.This is seen in Figure 2, which compares the temperatures and densities of the source clouds to temperatures and densities typical for the CGM in a cosmological simulation.This particular data is part of the FIRE-2 public data release (Wetzel et al. 2022) and was chosen for convenience.
On the other hand, the observers modeled sample0 with assumptions informed by current knowledge of the CGM.For example, the observers tested if the column densities could be explained by cooling-heating equilibrium driven by photo-ionization equilibrium, in which case the estimated temperature is calculated from the estimated metallicity and density parameters.Under these assumptions the observers found reasonable fits to the provided column densities for sightlines 06 through 10.Observers found reasonable fits for three of the other sightlines after fixing the density to an assumed  H = 10 −3.9 cm −3 and assuming collisional ionization.In this case, the temperature can be determined, but the metallicity also depends on the density, which was not actually close to the assumed value.
In the revised modeling observers removed any assumptions of cooling-heating equilibrium, and found much better agreement with the source properties.Even still, in four of the ten sightlines (01, 02, 06, 07) the best-fit parameters resulted in failed CLOUDY runs with the error being "optically thick to electron scattering -Cloudy is not intended for this regime", which required observers to identify and scale solutions with the same ion densities but less total mass.
To summarize, the sample0 models were not realistic based on the parts of parameter space occupied by the synthetic data and the initial parameter estimation assumptions were too rigid.The lesson learned is that there will be places in the real universe where heating and cooling do not balance, and the solution that observers obtain may not be unique.This is particularly true for higher ionization gas (at  10 5 K and/or  H 10 −2 cm −3 ).Note that the effects percentiles of the posterior.The width of the violin plots and the opacity of the points are scaled logarithmically with  H , such that if the parameter estimation perfectly agreed with the data all visible MLE points would lie within the source data violins.As shown above, the strongest component(s) traces the average absorbing gas properties.The lower- H components trace some features of the underlying distribution, but arguably the lower- H structure present in the source data is not retrieved.
shown here may be stronger because only ion column densities were provided (as opposed to full spectra).

Spectra of multi-cloud systems -sample1
In Figure 3 we show an example spectrum provided to observers as part of sample1, and the VP from the parameter-estimation best fits.
The grey bands indicate areas that are masked and do not contribute to parameter estimation, either because there is a blend known to contaminate the spectrum or because there is no detected absorption in the region.For sample1 the parameter-estimation best fit was selected from models with varying numbers of clouds.In the displayed sightline, the data is best fit by two clouds, one with significant low-ion absorption (e.g.C ) and one with significant high-ion absorption (e.g.O ).In sample1 the observers did not perform a revised parameter estimation because the blinded parameter estimates agreed well with the source data, in part because cooling-heating equilibrium was not assumed.Figure 4 compares the posteriors from the parameter estimation to the properties of the source clouds.Posterior probability distributions, or posteriors for short, describe the predicted probability of the source properties having a given value.The posteriors are displayed via a violin plot, where the width of the vertical distribution at a given value scales with the posterior probability distribution at that value.Violin plots are equivalent to histograms rotated 90 • and reflected to be symmetric.In most cases the source properties reside well inside the posteriors, i.e. the posteriors are typically sufficiently extended.As with the blind modeling for sample0 some actual values lie outside the assumed range of possible values: some source clouds have  > 10 1.5  , while the parameters were estimated assuming  spans log / = [−3, 1.5] (where log / = 1.5 is the default maximum metallicity for Cloudy).However, the source  are within < 0.5 dex of the highest estimated metallicity, significantly less discrepant than blind modeling for sample0.
The most-extensive posteriors in sample1 are often associated with a hotter phase tracing the OVI for a few reasons.First, the cooling of such a gas phase resembles that of collisional time-dependent gas cooling with no external radiation, and hence is independent of density.Second, following the lessons learnt from sample0, the observers relaxed the priors such that the lowest temperature could be 10 2 K, while the source data has a lower bound of 10 4 K.The most poorly-constrained priors extended beyond the lower bound of the source data.Additional potential contributing factors to the wide posteriors include the absence of C lines in the provided spectra (C 1548, 1550 probe intermediate temperatures and are not covered by COS G130M or G160M at  = 0.25).
The modeling identifies the correct number of components in 9 of 10 sightlines.The exception is sightline 050, which has a hot, low-density, low-metallicity absorber that produces little absorption in the ions provided to the observers.

Spectra of high-resolution turbulent mixing -sample2
Figure 5 shows an example sightline (sightline 0015) passing through the simulation used to generate sample2.Figure 6 shows the resulting mock spectra, the best fit from parameter estimation, and the posteriors for derived properties.The best fit for sightline 0015 uses five clouds.Figure 7 compares the parameter-estimation posteriors for the sightline to the source property distributions.Both the source data and the posteriors are weighted by associated H column density.Integrating over the source data for any property yields the total H column density.Integrating over any property posterior for a component yields the median estimated H column for that component.The combined posterior is the result of co-adding the normalized singlecomponent distributions.As mentioned previously, the properties of cool gas are easier to estimate with UV absorption spectra than the properties of hot gas, so weighting by H focuses the comparison on the regime parameter estimation can probe best.Future work could focus on recovery of warm or hot phases by weighting by other ions.
In our analysis we typically bin the posterior distribution and use the peak as the maximum likelihood estimate (MLE).An alternative is to use a "point-estimate" MLE, i.e. the single set of parameters with the highest probability across all sets tested during parameter estimation.We do not use the point-estimate MLE because it can be noisy and not representative of the posterior, but Figure 6 shows this estimate as stars.
In sample2 the "original blinded" data provided to the observers was a  = 2 simulation "observed" at  = 0.13 ( §2.3).This choice complicated the interpretation of the parameter estimates, so the observers reran their parameter estimation on the same sightlines but now "observed" at  = 2.In our analysis we utilize the "rerun" parameter estimates, and relegate the original blinded parameter estimates to Appendix B. We emphasize that, while not strictly blinded, the observers did not revise their methodology prior to analyzing sample2 "observed" at  = 2.
The MLE of the combined distribution is usually dominated by the strongest component, and agrees relatively well with the peak values of the source data.The lower-column-density components are poorly constrained, but do enclose the source data.
Figure 8 shows for sightline 0015 the relationship between gas properties as a function of line-of-sight position vs as a function of line-of-sight velocity.The gas sightline 0015 passes through in Figure 5 shows up as ≈ 5 detectable regions in  LOS - space (the top left panel of Figure 8).This is the same number of regions as there are components derived from parameter estimation, though there is not a clear correlation between the regions and the number of components.Two of these regions are responsible for the majority of the H absorption, and the parameter estimation describes these regions with a log  H ≈ 17.2 component (yellow) and a log  H ≈ 17 component (pink).The density, temperature, and metallicity of these two strongest components are well-constrained, and consistent with the source data.The parameter estimation also correctly predicts these components as having kpc-scale pathlengths (Figure 6 lower right).The properties of the remaining, weaker absorption, components are not as well constrained but conservative enough to contain the source values ( H , , ) within ∼ 1 dex.Note that while the absorbing gas spans −7kpc  15 kpc the ( H , , ) distributions are similar throughout.To rephrase, cloud complexes have similar ( H , , ) distributions-there are not "cool" clouds with distinct properties.
Figure 9 extends the comparison between estimated parameters and source data properties from one sightline to the full sample.Each violin shape is a 1D distribution like those shown in Figure 7, rotated 90 degrees.The linear scale used here emphasizes the dominant  LOS , ,  and  H for both the source and parameter estimation.The shape of the  LOS posteriors closely traces the actual distribution of  LOS for both parameter estimations.
Figure 10 reveals the extent to which parameter estimation can capture the structure beyond the dominant components.Each violin is proportional to the logscale 1D source-data distribution, where only bins that contain  H > 10 12.5 cm −2 are shown.This value was selected because there are few parameter-estimation components with lower column densities.For example, for log  we set the thickness of the violin to zero for   H  log  < 10 12.5 cm −2 /Δ log  ≈ 10 12.5 cm −2 /0.145 ≈ 2.2×10 13 cm −2 .The points show the MLEs for each component the parameter-estimation fit is composed of, with the opacity scaled logarithmically by  H of the component.The components become completely invisible as  H → 10 12.5 cm −2 .The MLEs of the components with the highest  H contributions agree well with the properties at the peaks of the source data.However, MLEs of some components describe gas that is ∼ 1 dex cooler, denser, and more metal-poor than present in the source data.Note that the posterior percentiles of such components still overlap the source data.In the simulations this gas does not exist because the UVB provides an effective temperature floor, below which CGM gas has trouble cooling because it is photo-heated by background radiation.However, following sample0 the decision was made to perform parameter estimation without restricting the parameter space by assuming cooling-heating equilibrium.This is discussed further in §4.1.

Extent of recovery
The goal of parameter estimation is not just to estimate the average properties of absorbing gas, but also to disentangle the contributions of multiple clouds.The modeling challenge spans three regimes of "cloud structure": a single uniform cloud, multiple uniform clouds, and multiple distributions of clouds.Many quantities relevant to constraining the physics of the CGM only require cloud structure to be separated on the level of one or multiple uniform clouds.For example, metallicity can help determine if absorbing gas is pristine inflow or enriched outflow without requiring resolving many clouds (e.g.Hafen et al. 2017).On the other hand, the role complex multiphase gas plays in the physics of the CGM has been a focus of much recent research (e.g.Voit et al. 2015;Esmerian et al. 2021;Smith et al. 2023;Tan et al. 2023), and constraining this observationally may require identifying cloud structure on the level of multiple distributions of clouds.
Our analysis confirms that parameter estimation can successfully extract the properties of uniform clouds.This is apparent from the revised sample0 estimates.Parameter estimation was similarly successful at disentangling multiple uniform clouds (sample1).
For sample2 with its multiple distributions of clouds we ask whether parameter estimation can recover: a) the primary velocity components, b) the primary metallicities, c) the primary ionization states, and d) the shapes of the property distributions.Regarding capturing the bulk of the velocity structure, Figure 10 shows that in all sample2 sightlines the component best estimates span the full width of the  LOS distribution.Note that the error bars show only the 68th percentile of the component posteriors (see Figure 7 for an example of the range of the full posteriors).
Concerning metallicity, in all sample2 sightlines the metals were well-mixed, resulting in a narrow metallicity distribution peaked at log / ∼ −1.2.Parameter estimation identified at least one component consistent with this metallicity in all sightlines.However, in ∼ 6 / 10 sightlines parameter estimation also identified a significantly-contributing component with a metallicity > 0.5 dex higher or lower than the source metallicity, albeit with 16th-84th percentile uncertainties that are typically consistent with the source metallicities.
Regarding the main ionization states of distributions of clouds, the parameter estimation captures the majority of the low-ionization gas well in all three samples, but the properties of  10 4.5 K gas are more poorly-constrained.In sample1 the components with the largest uncertainties were  10 5 K gas, and the only cloud not identified had  ∼ 10 5.7 K.In sample2, all sightlines identify the  ∼ 10 4 K gas that dominates the absorption (Figure 10), and find that additional hard-to-constrain components improve the fit.Roughly half the sightlines suggest low-absorption  ∼ 10 3 K gas may be present, albeit with uncertainties 0.5 dex, which is a temperature not present in the simulation due to heating from the UVB.Regarding the hotter gas, the hot gas parameters are especially difficult to estimate in sample2 because the pathlength necessary to accrue significant absorption is longer than any sightline through the sample2 simulations.Difficulty recovering higher-temperature gas can also result from lack of access to high-ionization absorption lines such as Ne and Mg , although in sample2 the observers had access to these lines.In addition, at  10 5 K, gas cooling becomes independent of radiation field, and hence the density.Therefore, the inferred properties for high temperature gas are generally more uncertain.
Finally, regarding the shapes of the property distributions, the employed parameter estimation approximates the absorption as arising in several uniform clouds, and as such fundamentally cannot constrain distribution shape.This is an area of future work that some alternative parameter-estimation methods address (such as fitting the underlying density distribution as a power law; Stern et al. 2016).Another path to explore is to model each absorption system as a linear combination of turbulent-mixing-layer absorption profiles (Tan & Oh 2021),  ∼ 10 4 K absorption profiles, and  ∼ 10 6 K absorption profiles.

Factors affecting recovery
The velocity-space properties of the source gas significantly affected parameter estimation.All sightlines in sample2 had very similar source , , and  H distributions, similar column densities ( H ∼ 10 17 −10 18 cm −2 ), and widely varying source  LOS distributions.As such, sample2 provides a test of how line-of-sight velocities affect parameter estimation.From Figure 10 we can see that varying  LOS distributions drive significant variety in recovery of , , and  H , particularly for weak, low density, high temperature components.
The ions available for fitting determine much of how well cloud structure can be retrieved.Utilizing multiple Lyman-series lines reduces the contamination from noise during identification of coincident-in-velocity-absorption.Intermediate ionization states, especially strong ions such as C , play an important role in separating gas phases because they often have contributions from both a lower and a higher ionization transition.This is shown with the C absorption in Figure 3.
The sample2 analysis probes the regime of fractal cloud structure, which provided a challenge for parameter estimation built on uniform clouds.The gas probed in sample2 consists of both  ∼ 10 4 K and  ∼ 10 6 K gas, and turbulent mixing layers at the interface of the cool and hot gas.Analytic theory and numerical simulations predict that each turbulent mixing layer has a fractal structure (e.g.Fielding et al. 2020;Kanjilal et al. 2021;Tan & Oh 2021;Gronke et al. 2022).With a fractal structure any sightline that pierces a turbulent mixing layer will probe a distribution of temperatures and densities, regardless of the velocity resolution of the spectra.
The challenges found here are present despite performing parameter estimation under optimal conditions: for sample2 the lines available to the observers provided extensive coverage of multiphase gas, the provided data had a high signal-to-noise ratio of 30, and absorption was not contaminated by absorption associated with cosmological structures other than the source.However, some scenarios may enable better recovery of properties than the scenarios tested here, e.g. if the many small clouds have narrower profiles or are more widely distributed in velocity space than in sample2.Related, in sample0 the theorists provided ion column densities instead of full spectra.While this simplified the data it also removed important information.

Implications for existing results
In several studies, it has been found that small, dense clouds are required to explain observed absorption profiles (Rigby et al. 2002;Narayanan et al. 2008;Muzahid et al. 2018).Across all samples in the present study the density of the strongest absorbing component is not overestimated, suggesting that the high density estimates are not a common artifact of parameter estimation.Some of the components nonessential for a good fit can have overestimated best-estimates for density-in 4 / 10 of the sightlines of sample2 there exists such a component where the best estimate for the density is ∼ 1−2 dex higher than found in the source data.This suggests that the best estimate of poorly-constrained components is not good evidence for small, dense gas clouds.However, the 16th-84th percentile uncertainties of these components overlap the source density, so these components are not discrepant with the source data.We reiterate that none of the strong, well-constrained components have over-estimated densities.
Grid-based simulations such as those employed here can struggle to simulate high-density gas.This raises the question of whether the immediately-above conclusion regarding lack of density overestimates is subject to numerical issues.This is not the case in our analysis-the sharp cutoffs seen in the sample2 density distributions (e.g. Figure 10) at maximum densities of  H ∼ 10 −2 cm −3 are well below any numerical density "ceiling" of these simulations.Instead these cutoffs are driven by an effective density ceiling from photoionization and pressure equilibrium.More specifically, since the simulations included an ionizing UV background with no correction for self-shielding, gas cannot cool far below 10 4 K, even though the cooling function does allow for cooling down to ∼ 10 K. Since the simulation began in pressure equilibrium and there are no strong shocks generated during the simulation (the turbulence that develops in the mixing layer is subsonic), this implies that the density cannot increase much above  H ∼ 10 −2 cm −3 (see Mandelker et al. 2020).
In our analysis the parameter estimation excels at recovering the properties of the dominant absorbing gas but the properties of weaker components are poorly constrained, possibly due to limitations in the provided spectra.This has implications for the required resolution for generating mock observations.Because thermally unstable gas may have a fractal structure, fully resolving the gas may be effectively impossible.However, if spectra do not provide enough data to distinguish structure beyond the dominant properties then the resolution required to produce accurate mock observations may be less stringent.One suggestion for the target resolution for mock observations is the resolution at which the mass of cool gas converges (e.g.McCourt et al. 2018, though such a scale may not exist), which is still well below the resolution of most cosmological simulations.Further resolving the simulated gas may yield more complex cloud structure, but it may also be below the resolution of the observations, particularly for very cold gas (e.g.Jones et al. 2010).
Some works argue that it is possible to extract detailed information about the origin of absorbing gas with the aid of additional observations of the galaxy associated with the absorption system.For example, Péroux et al. (2013) argue that one of their identified absorption systems is dominated by gas originating in an outflow, and Péroux et al. (2017) suggest that a separate absorption system has little contribution from outflows.Making statements regarding the origin or fate of absorbing gas may require being able to identify and separate multiple clouds along a given line of sight, because sightlines are predicted to frequently penetrate multiple clouds with multiple origins (e.g.Hafen et al. 2019Hafen et al. , 2020)).Our results indicate that clouds with distinct, uniform properties can indeed be distinguished from one another successfully (Figure 4).On the other hand, sample2 is spectra of a filament falling in from the IGM and passing through a metal-enriched halo.In this case individual clouds that produce weak absorption cannot be easily identified in the provided spectra, but the average properties of the filament can be identified clearly.and these properties may allow observers to constrain the H -weighted origin of absorbing gas if constraining origin via e.g.metallicity.

Discrete uniform clouds versus distributions of clouds
Modeling absorption as arising in discrete components is in part motivated by observed profiles that are often quite symmetrical, suggesting a single region with a given density and temperature.This is especially relevant for colder clouds traced with low ionization absorption.Some components are also observed to be quite narrow (e.g.Churchill et al. 1999;Churchill & Vogt 2001), which requires cloud distributions to either be confined to specific regions of the CGM, or for separate regions to coincidentally have similar  LOS .The former is plausible: idealized simulations, analytic theory, and cosmological simulations indicate that mixing and perturbations from cold gas may be primary drivers of further cold gas growth (e.g.Nelson et al. 2020;Esmerian et al. 2021;Gronke et al. 2022;Gronke & Oh 2022;Ramesh et al. 2022;Saeedzadeh et al. 2023).This suggests that cool cloud distributions may indeed be limited to regions of the CGM with adjacent or already-present cool gas.
The most realistic among our samples, sample2, consists not of discrete clouds, but of a complex cloud structure with a distribution of properties.This is consistent with analytic theory and high-resolution simulations that predict turbulent mixing layers have fractal structure with a continuous distribution of temperatures (e.g.Tan & Oh 2021).The implication is that any fully-successful model of absorption systems should also include absorption from gas with a distribution of properties.Such models exist (e.g.Stern et al. 2016), but are not commonly employed in parameter estimation.Given a physicallymotivated parameter distribution (e.g. the temperature distribution expected in a mixing layer), these models have the potential to significantly improve the accuracy of the reproduced parameters.

Assumptions about thermal and ionization equilibrium
At the first step of our analysis the observers assumed cooling-heating equilibrium.This assumption resulted in inaccurate estimates of the parameters of sample0, though the magnitude of the errors was significantly exacerbated by the theorists providing source properties with unrealistic temperatures and densities.Removing this assumption produced very good fits to the data regardless of temperaturedensity phase-space location.In sample2, gas in the source simulation is allowed to cool below  ∼ 10 4 K, but photo-heating from the UVB provides an effective temperature floor.The estimated components with the strongest absorption have estimated  ∼ 10 4 K, consistent with the source data despite not assuming cooling-heating equilibrium.This suggests that, for modeling the majority of absorbing gas, the safest option is to forgo assumptions of cooling-heating equilibrium.However, in approximately half the sample2 sightlines the best estimates for some individual, lower- H components fall below the effective temperature floor (Figure 10), inconsistent with the source data.This may suggest that estimating the parameters of poorly-constrained clouds without requiring the clouds to exist in a physically-expected region of parameter space can result in unlikelyto-be-real estimated values.
Throughout the analysis, the participating observers and theorists assumed ionization equilibrium, with contributions from both collisional-and photo-ionization.On the data-generation side, this is a common assumption in cosmological and idealized hydrodynamical simulations, largely driven by the challenge in modeling non-equilibrium chemistry (e.g.Richings et al. 2014).However, in some cases non-equilibrium chemistry can have a significant effect on the properties of absorbing gas in the CGM, including providing one explanation for the observed bimodality in O around lowredshift galaxies (Oppenheimer et al. 2016).The effect of ionization equilibrium has also been studied in focused simulations of turbulent mixing layers (e.g.Ji et al. 2019).Note that the ionization and recombination timescales are different for every ion.It is possible that at a given (, ,  H ) some ions are ionization equilibrium while others are not.The effect of assuming ionization equilibrium on parameter estimation is a promising avenue of further research.

Absorption modeling "by hand"
There are a number of works that use a less-algorithmic, more peopledriven approach to model sightlines (e.g.Churchill & Charlton 1999;Charlton et al. 2000Charlton et al. , 2003;;Ding et al. 2003aDing et al. ,b, 2005;;Zonak et al. 2004;Masiero et al. 2005;Lynch & Charlton 2007;Misawa et al. 2008;Lacki & Charlton 2010;Jones et al. 2010;Muzahid et al. 2015;Richter et al. 2018;Rosenwasser et al. 2018;Norris et al. 2021).The methods employed by the observers in this work (CMBM; Sameer et al. 2021;Sameer et al. 2022) are based on these earlier models but with significant improvements.CMBM selects a set of possible properties, generates spectra, and repeats the process based on agreement with the observed spectra.This algorithmic approach covers a wider range of parameter space than feasible when modeling "by-hand", including quantifying the likelihood of the range of possible models.Additional beneficial features include a ∼ 10 − 100× automationenabled decrease in human hours, the ability to vary all parameters independently (including temperature), the output includes the full posterior for each cloud, and the self-consistent inclusion of all transitions, weighted by noise level.CMBM also handles blending well, an issue in real observations that is largely not present here (the only line with slight blending was Si 989 from N 989, and this was included in the modeling).

Comparison to similar analyses
Absorbing gas with the same LOS velocity is not necessarily cospatial, but instead a single absorption component can correspond to multiple clouds of gas that are physically separated, or a continuous distribution which spans a range of gas densities/temperatures.In an analysis of mock absorption spectra from cosmological simulations, Marra et al. (2022) show that only ∼ 30 − 50% of absorption components are composed of a single, contiguous cloud, and ∼ 50% of multiple clouds contributing to a single component are separated by 3 − 12 kpc depending on the ion.Further, Marra et al. (2022) find that the amount of mass in common between velocity-aligned absorption from different ions varies widely, with ∼ 1/3 of Si and C absorbers sharing > 50% of their mass.
Regarding estimates of the bulk properties of gas along a LOS, Liang et al. (2018) determined that a Bayesian method of modeling absorption lines was able to derive LOS H mass-weighted average properties.Marra et al. (2021) confirmed this finding using a blind study of spectra from cosmological simulations of a  = 1 Milky Way-type galaxy and a  = 0 dwarf galaxy.Approaching this problem from the observer side yields similar results: Sameer et al. (2021) showed that the HI-weighted average metallicities of multiple components agree with the single values that earlier modeling had found by using total column densities as constraints.However, a single valued property washes out the different physical processes that could be responsible for those different components.A number of analyses test the robustness of density and metallicity to changes in the UV background (Wotta et al. 2016(Wotta et al. , 2019;;Acharya & Khaire 2021;Gibson et al. 2022), and find that these factors are uncertain by factors of a few.

Recommendations
Based on our analysis we provide brief recommendations for observers estimating the parameters of real and mock spectra and for theorists producing mock spectra.Please note that many observers and theorists may already be following similar guidelines, independent of this analysis.

For observers
(i) Avoid restricting the parameter space of derived properties, including temperature, density, and metallicity ( §3.1).
(ii) The distribution of possible values is more informative than a single best value.Even when the best estimated value and the real value are not the same, the real value is typically among the likely values (Figure 4).
(iii) Be cautious when identifying multiple contributing cloudswhile the absorption-weighted averages can reliably be recovered, successful parameter estimation of multiple clouds depends on how distinct the cloud properties are and the per-cloud contribution to observable ion absorption (Figure 10).
(iv) Explore modeling absorption systems using physicallymotivated non-uniform systems (e.g.turbulent mixing layers) in addition to uniform clouds ( §4.2).

For theorists
(i) Avoid source properties that are overly pathological.While testing the limits of parameter estimation is useful, unusual source properties are disproportionately likely to be poorly modeled (Figure 2).
(iii) No single metric is likely to adequately describe the relationship between the source and estimated properties.Using multiple visualizations of the same data can help (e.g.Figures 5 to 10).

CONCLUSIONS
In the Halo21 absorption modeling challenge we generated "observed" column densities or spectra for three datasets, and tested the ability of parameter estimation to derive the source properties (, ,  H , and  LOS ) of the underlying gas.We started with simple source data and ramped up the complexity over the course of the challenge.For sample0 theorists provided the column densities of uniform clouds, with the properties randomly selected from (, ,  H ) typical values of the CGM, though the combined properties were not necessarily typical.For sample1 theorists provided absorption spectra of multiple uniform clouds, with properties drawn from gas distributions found in the CGM of the EAGLE cosmological simulations (e.g.Schaye et al. 2015).For sample2 theorists provided absorption spectra of a high-resolution simulation of a  ∼ 10 4 K filament embedded in a  ∼ 10 6 K halo (Mandelker et al. 2020).The sample2 spectra pierced turbulent-mixing zones, and subsequently had a complex cloud structure.For all three samples the observers estimated the gas parameters via cloud-by-cloud, multiphase, Bayesian ionization modeling (CMBM) methods (Sameer et al. 2021;Sameer et al. 2022).CMBM enabled observers to systematically explore the parameter space to identify the number of clouds and their properties that provided an acceptable fit to the data.Our conclusions are as follows.
(i) Best estimates: Across all three samples the maximumlikelihood estimates from parameter estimation agree well with the bulk properties of the absorbing gas (Figure 1), provided temperature and density are allowed to vary independently as part of the fitting process ( §3.1).
(ii) Uncertainty estimation: The inferred range of feasible gas properties is broad for weakly-absorbing components.The source parameters typically lie within the 16th to 84th percentiles of the posterior distribution (also Figure 1), suggesting errors are typically sufficiently conservative.
(iii) Recovery of detailed cloud structure: Parameter estimation successfully recovered the number and properties of multiple uniform clouds (Figure 4).Parameter estimation captured the one or two dominant absorption components in the turbulent-mixing scenario, however the parameters of additional absorbing gas were poorly constrained (Figure 10).Including multiple additional clouds beyond the dominant ones nevertheless did improve the fit, consistent with the true existence of complex cloud structures in the source.
(iv) Highlighted factors affecting parameter estimation: The imprint of the source line-of-sight velocity distribution on absorption spectra can strongly affect recovery of , , and  H ( §4.1.2).Ionization state also plays an important role, with better recovery of lower-ionization,  ∼ 10 4 K gas.
There are a number of avenues for future work.As briefly discussed in §4.1.3,provided the average temperature, density, and metallicity distributions are well-converged, hydrodynamic simulations may only need to surpass the "resolution" of the observations and parameter estimations to produce sufficiently realistic mock spectra.To rephrase, if the simulations are converged with respect to bulk properties it may not be necessary to further resolve the gas structure, as spectra may not provide enough information to extract that structure.However this warrants further analysis.
Reasonable agreement in ion column densities does not necessarily signal that the modeled properties are consistent with the actual properties ( §3.1, Appendix A).However, this result is found under unusual circumstances-observers were only provided with ion column densities rather than spectra, and the source data lay in a region of parameter space not typically occupied by CGM gas.A promising future direction may be to evaluate spectra goodness-of-fit as a predictor of accuracy and uncertainty in , , and  H .Such an analysis would be complementary to repeating the mock data challenge on additional data sources with properties differing from common parameter estimation assumptions, e.g.sources such as cosmological simulations or a density power law.
Finally, as discussed in §4.2, models composed of multiple nonuniform clouds have the potential to play an important role in future parameter estimation.The width of the violin plots and the opacity of the points are scaled logarithmically with  H , such that if the parameter estimation perfectly agreed with the data all visible MLE points would lie within the source data violins.The MLE of the strongest component typically aligns with the peak of the source distribution, but can be off by 1 dex for  and  H , especially for the blinded estimates.The best fit for parameter estimation consistently includes lower-column-density components with properties not found in the source data.
Figure1.The source properties (black) used to produce the synthetic absorption systems compared to the best estimates from parameter estimation (blue) across all three of our samples.The lines span the 16th to 84th percentiles of the parameter-estimation posteriors or source data when available.The samples range from uniform clouds to a turbulent mixing zone.For the turbulent mixing zone the thick black line spans the parameter range responsible for 68% of the H -absorption.The parameter-estimation best estimates and the source data agree to 0.1 dex in (  ,  ,  H ) for 90% of the systems, excluding the blinded sample0 estimates (red; §3.1).The grey lines connect the source clouds and the corresponding estimates.

Figure 2 .
Figure 2.Comparison between estimated and actual properties of sample0 in the context of the temperature-vs-density distribution of CGM gas in a FIRE-2 cosmological simulation (black background distribution; scales with log of mass and includes satellite ISM).The solid black line shows the range of temperatures and densities where gas is in cooling-heating equilibrium via photoheating.If cooling-heating equilibrium is assumed during parameter estimation then when the assumption is inaccurate the estimated metallicity tends to be inaccurate.However, in the simulation there is a large population of CGM gas that is in approximate cooling-heating equilibrium.

Figure 3 .
Figure3.The source spectrum (grey steps) and the parameter-estimation best-fit spectrum (dashed black) for sightline 076 in sample1.The data are best fit by a two-cloud model, and the Voigt profiles of the individual components are shown in dark blue and red.The dark blue curve traces the phase that produces the bulk of the H absorption, and the red curve traces the phase that produces O absorption.The spectrum wavelength was converted to line-of-sight velocity in this image, and the velocity of each component is indicated by a vertical dash.The grey bands show the portion of the spectrum with no influence on parameter estimation.

Figure 5 .
Figure 5. Density, temperature, metallicity, and line-of-sight velocity in a slice of the simulation used to generate sample2 (Mandelker et al. 2020).The dashed white line shows the location of one of the sightlines (sightline 15) forward-modeled to produce mock spectra.The simulation includes a turbulent mixing zone, and shows highly complex cloud structure.A grid with 1 kpc spacing is overlaid in white to help the viewer gauge cloud size.

Figure 6 .Figure 7 .
Figure6.Mock spectra for the sightline shown in Figure5, best fit absorption profiles found by observers, and the parameters of the best fits.The absorption along this sightline is best fit by five components.The stars in the right panel indicate the maximum likelihood estimate (MLE) according to best fit across all iterations tested during parameter estimation, which is subject to noise.The MLE displayed elsewhere in the analysis traces the peak of the posterior distribution.

Figure 8 .
Figure 8. Physical-and velocity-space cloud structure for the sightline shown in Figures 5, 6, and 7.The top left panel shows the relationship between position along the line of sight () and the line-of-sight velocity of the gas.Source properties are plotted with black 2D histograms, while the colors correspond to estimated components.The estimated components accurately describe the gas responsible for the vast majority of the absorption.However, gas aligned in velocity but physically separated can introduce confusion to the interpretation, especially when structure beyond the dominant absorption is loosely constrained.

Figure 9 .
Figure9.The distributions of  LOS ,  ,  , and  H for the synthetic source data (black) compared to the full posteriors from parameter estimation (blue).The width of the violin plots scales with  H .The parameter estimation accurately describes the properties of the gas responsible for the majority of the absorption.

Figure 10 .
Figure10.The distributions of  LOS ,  ,  , and  H for the synthetic source data (black) compared to the best estimates for individual components comprising the parameter estimation.Each point is the maximum-likelihood estimate (MLE) of an individual component, and the associated bars enclose the 16th to 84th percentiles of the posterior.The width of the violin plots and the opacity of the points are scaled logarithmically with  H , such that if the parameter estimation perfectly agreed with the data all visible MLE points would lie within the source data violins.As shown above, the strongest component(s) traces the average absorbing gas properties.The lower- H components trace some features of the underlying distribution, but arguably the lower- H structure present in the source data is not retrieved.

Figure A1 .Figure B1 .Figure B2 .
Figure A1.Error in properties of the absorbing gas vs error in column densities for sample0.The x-axis shows average log-space difference between column densities across all ions, weighted by the provided uncertainty.The original models fit the data decently (mean | log(   , modeled /   , actual ) | < 0.3), but do not accurately recover the properties of the absorbing gas (e.g.| log( H, modeled / H, actual ) | 1).The revised models (which do not assume coolingheating equilibrium) recover the properties of the gas much better (e.g.| log( modeled / actual ) | 0.1), and typically fit the data to mean | log(   , modeled /   , actual ) | 0.05.