Exploring the Dependence of Gas Cooling and Heating Functions on the Incident Radiation Field with Machine Learning

Gas cooling and heating functions play a crucial role in galaxy formation, but computing these functions exactly in the presence of an incident radiation field is computationally expensive. These computations can be greatly sped up by using interpolation tables of pre-computed values, at the expense of making significant and sometimes even unjustified approximations. Here, we explore the capacity of machine learning to approximate cooling and heating functions under a generalized incident radiation field. Specifically, we use the machine learning algorithm XGBoost to predict cooling and heating functions calculated with the photoionization code Cloudy at fixed metallicity, using different combinations of photoionization rates as features. We perform a constrained quadratic fit in metallicity to enable a fair comparison with traditional interpolation methods at arbitrary metallicity. We consider the relative importance of various photoionization rates through both a principal component analysis (PCA) and the calculation of SHapley Additive exPlanation (SHAP) values for our XGBoost models, and we use this feature importance information to select different subsets of rates for model training. Our XGBoost models outperform a traditional interpolation approach at each fixed metallicity, regardless of feature selection. At arbitrary metallicity, we are able to reduce the frequency of the largest cooling and heating function errors compared to an interpolation table. We find that the primary bottleneck to increasing accuracy lies in accurately capturing the metallicity dependence. This study demonstrates the potential of machine learning methods such as XGBoost to capture the non-linear behavior of cooling and heating functions.


INTRODUCTION
Galaxy formation involves many interacting processes, of which the primary one is the gravitational collapse of baryonic gas into the potential wells of dark matter halos after the decoupling of baryons and photons. Since baryonic gas can provide thermal pressure support, the rate at which the gas can dissipate energy (via cooling) is a critical factor in determining the density and temperature at which the gravitational collapse stops (Rees & Ostriker 1977). Cooling and heating functions are a traditional way of describing how the internal energy of the gas changes due to radiative processes (e.g. Cox & Tucker 1969; Sutherland & Dopita 1993; Lykins et al. 2013; Wang et al. 2014), and determine the thermal evolution of gas (e.g. Dalgarno & McCray 1972; Gnat & Sternberg 2007). Coupled with other relevant processes, cooling and heating functions help determine the overall evolution of the gas (e.g. Martínez-Serrano et al. 2008; Richings et al. 2014; Galligan et al. 2019; Romero et al. 2021). Hence, cooling and heating functions are important to theoretical modeling of galaxy formation (see the review of Benson 2010). For example, the comparison between the gas cooling time and the gravitational free-fall time introduces characteristic scales where the two are equal (e.g. Rees & Ostriker 1977; Silk 1977; White & Frenk 1991; Kauffmann et al. 1993). Cooling also regulates how efficiently additional gas can accrete onto a (proto)galaxy (e.g. Binney 1977; Bertschinger 1985; Cole et al. 1994; Croton et al. 2006; Brooks et al. 2009), and the radiation emitted by the accreting gas (Fardal et al. 2001). Gas mass can be lost from a (forming) galaxy due to heating by ionizing radiation from outside the galaxy (Okamoto et al. 2008).
Given an atomic and molecular composition, and using values for relevant atomic properties, the equilibrium populations of various ionization states and energy levels in a gas can be computed (e.g. Spitzer 1962; Arnaud & Rothenflug 1985; Ferland 1993, 2009). These populations can then be used to determine the cooling and heating rates of the gas (e.g. Cox & Tucker 1969; Dalgarno & McCray 1972; Sutherland & Dopita 1993; Ferland et al. 1998). In the absence of external sources of ionizing radiation, these populations are set by collisional ionization equilibrium (CIE) and become well-known and relatively simple functions of gas density, metallicity, and temperature only (Cox & Tucker 1969; Sutherland & Dopita 1993).
An incident radiation field can change the distribution of ionization states and energy level populations, and hence can alter cooling and heating functions from the CIE limit (e.g. Wiersma et al. 2009; Gnedin & Hollon 2012; Galligan et al. 2019; Robinson et al. 2022). Computing cooling and heating functions for an arbitrary radiation field is a highly complex and computationally expensive effort, requiring major development efforts and sophisticated software, in particular photoionization codes (see the review of Kallman 2001) such as Cloudy (Ferland et al. 1997, 1998, 2013, 2017; Chatzikos et al. 2023). Photoionization codes require a comprehensive database of atomic data related to ionization and recombination, such as XSTAR (Bautista & Kallman 2001). Photoionization codes have been developed for a variety of geometries, such as a spherical cloud with an illuminating source at the center (Kallman & McCray 1982), a slab illuminated from either side (Dumont et al. 2000), a cone illuminated from the apex (Kinkhabwala et al. 2003), pseudo-3D geometries with non-extended sources (Morisset et al. 2005), and fully general 3D geometries (e.g. Ercolano et al. 2003; Wood et al. 2004; Baes et al. 2005).
In this paper, we aim to accurately predict cooling and heating functions for an atomic gas in the presence of an incident radiation field, which may include a local component in addition to the extragalactic background. Local contributions can dominate the radiation field inside galaxies and in the circumgalactic medium (Draine 1978).
Since gas cooling (and heating) is a vital ingredient in galaxy formation, cooling and heating functions need to be included in numerical simulations of galaxy formation. However, the full calculation of level and ionization state populations is much too complex (and, hence, slow) for modeling cooling and heating functions on the fly in galaxy formation simulations. Hence, for simulation use, the cooling and heating functions must be approximated with a less computationally intensive method. One traditional approach to this is pre-computing cooling and heating functions with Cloudy on a grid of relevant parameters, and interpolating between table values in the simulation (e.g. Kravtsov 2003; Smith et al. 2008; Hopkins et al. 2011; Vogelsberger et al. 2014). Cooling and heating tables for gas in CIE generally include dimensions of temperature, density, and metallicity (e.g. Cox & Tucker 1969; Sutherland & Dopita 1993; Smith et al. 2008). The same table dimensions can be used for gas with a constant ionizing background at fixed redshift (e.g. Hopkins et al. 2011). Incorporating a spatially constant UV background adds one additional dimension, the redshift $z$ (or radiation field strength), to the tables (e.g. Kravtsov 2003; Robertson & Kravtsov 2008; Smith et al. 2017; Gutcke et al. 2021; Schaye et al. 2023). In particular, various simulations incorporate the redshift-dependent cooling tables of Wiersma et al. (2009), including the simulations described in Thomas et al. (2009); Schaye et al. (2014); McCarthy et al. (2017). More recent simulations, such as those of Gutcke et al. (2021) and Schaye et al. (2023), incorporate the cooling tables of Ploeckinger & Schaye (2020). These tables include an interstellar radiation field with a fixed frequency dependence and a normalization depending on the gas column density, in addition to a spatially constant UV background. Tables with more dimensions can also be constructed for gas illuminated by an incident radiation field which includes local contributions from a synthesized stellar spectrum and a quasar-like power law (Gnedin & Hollon 2012).
Interpolation tables are not the only approach to approximating cooling and heating functions. One alternative approach is simplified chemical networks, which follow only the most relevant atoms and molecules and can be evaluated on the fly (e.g. Anninos et al. 1997; Grassi et al. 2011; Richings et al. 2014; Salz et al. 2015; Bovino et al. 2016). Other works use Cloudy to calculate cooling and heating functions for gas properties sampled from a numerical galaxy formation simulation of interest (e.g. Robertson & Kravtsov 2008; Wiersma et al. 2010; Romero et al. 2021; Robinson et al. 2022). These 'true' cooling and heating functions can then be used to train a neural network to predict cooling and heating functions, at least for gas properties which are reasonably consistent with those seen in the training data (e.g. Grassi et al. 2011; Galligan et al. 2019).
Cooling and heating functions depend on several gas parameters, as well as on the incident radiation field, which is in general a function of frequency. The effect of the incident radiation field on the gas can be described by the photoionization rates for all possible ionization states of elements in the gas. Since an interpolation table in very high dimension is computationally impractical, we need to find a smaller number of photoionization rates that are still able to capture the dependence of cooling and heating functions on the incident radiation field. This is an example of the general class of dimensionality reduction problems. Such problems are common in machine learning, and several machine learning techniques exist to approach them (see reviews by Zebari et al. 2020; Jia et al. 2022).
Several additional factors which we do not include in this paper can also impact gas cooling and heating, such as cosmic rays, dust, and molecules. Cosmic rays can ionize gas particles, heating the gas. Dust grains contribute to the heating function via photoheating, and to cooling via collisions between gas particles and dust grains (Ferland 1993). Dust also impacts the cooling function indirectly, via the accretion of gas-phase metals onto dust grains. In general, this accretion will modify the relative abundances of metals (Dwek 1998). Note that UV radiation fields can destroy dust grains through sublimation (Guhathakurta & Draine 1989). Hence, low incident ionizing radiation is needed to maintain a significant dust mass. Molecules contribute to heating via photodissociation, and to cooling via vibrational and rotational line emission (Richings et al. 2014). In a hydrodynamic simulation, additional numerical approximations for the contributions of molecules, dust, and cosmic rays to the total cooling and heating functions could be added to the atomic cooling and heating approximations described in this paper.
In this work, we use the machine learning algorithm eXtreme Gradient Boosting (XGBoost; Chen & Guestrin 2016) to model the dependence of cooling and heating functions on gas properties and various radiation field parameters. We train XGBoost models on tabulated cooling and heating functions calculated with Cloudy (Ferland et al. 1998). XGBoost is known to perform well on this type of tabular training data (Shwartz-Ziv & Armon 2022; Grinsztajn et al. 2022). We also explore the SHapley Additive exPlanation (SHAP; Lundberg & Lee 2017; Lundberg et al. 2018, 2020) values of our model inputs, which we use to determine which radiation field parameters are most predictive of cooling and heating functions. The XGBoost algorithm has been used in the astrophysics literature for a variety of tasks in classification (e.g. Tamayo et al. 2016; Ivanov et al. 2021; Lucey et al. 2023; Luo et al. 2022), such as identifying pulsar candidates in gamma ray data (e.g. Mirabal et al. 2016; Wang et al. 2019) and separating stars, galaxies, and quasars in photometric data (e.g. Jin et al. 2019; Chang et al. 2021; Fu et al. 2021; Golob et al. 2021; Li et al. 2021; Nakoneczny et al. 2021; Hughes et al. 2022); and regression (as here; see also Calderon & Berlind 2019; Hayden et al. 2022; Machado Poletti Valle et al. 2021; Dang et al. 2022; Andrae et al. 2023), including predicting quasar redshifts from photometry (so-called "photo-z"s; e.g. Jin et al. 2019; Nakoneczny et al. 2021; Kunsági-Máté et al. 2022). Some works also include the use of SHAP values to analyze feature importances (e.g. Machado Poletti Valle et al. 2021; Heyl et al. 2023).
In this paper, we first describe our methodology, including the Cloudy calculations we use to train and evaluate XGBoost models, the input features we use and their distributions and correlations, how we determine which combinations of radiation field features to use, and how we compare and evaluate XGBoost models. Next, we present results for trained models compared to each other and to the interpolation table of Gnedin & Hollon (2012), on both the training data grid and an independent off-grid sample. Finally, we discuss our conclusions and potential future directions.

2.1 Model Data Input
In order to train machine learning models to predict cooling and heating functions, we use computations from the photoionization code Cloudy (Ferland et al. 1998). The training data used here is very similar to that used in Gnedin & Hollon (2012), with some additional information included.
For a given parcel of gas, we define cooling and heating functions through equation (1):

$$\frac{du}{dt} = n_b^2 \left[ \Gamma(T, \ldots) - \Lambda(T, \ldots) \right], \qquad (1)$$

where $u$ is the thermal energy density of the gas, $n_b = n_{\rm H} + 4 n_{\rm He} + \ldots$ is the number density of baryons (protons and neutrons), $T$ is the gas temperature, and $\Gamma$, $\Lambda$ are, respectively, the heating and cooling functions. The prefactor $n_b^2$ is included because of the importance of collisional two-body processes in radiative cooling and heating. In the limit of collisional ionization equilibrium (CIE), where there is no external ionizing radiation field on the gas, and accounting for two-body processes only, $\Lambda$ and $\Gamma$ are independent of $n_b$. However, even in CIE, multi-electron processes such as Auger ionization and dielectronic recombination can contribute to the cooling and heating functions when the gas contains metals (Ferland et al. 1998), introducing $n_b$ dependence to $\Lambda$ and $\Gamma$. Here, we consider the more general case of an external radiation field, and account for other relevant physical processes beyond two-body collisions. Hence, $\Lambda$ and $\Gamma$ also depend on $n_b$, metallicity, the incident radiation field, and possibly other physical characteristics; this complex dependence is captured by the dots in equation (1).
The input data we use consists of Cloudy computations of $\Lambda$ and $\Gamma$ evaluated on a grid of values for the gas properties from Gnedin & Hollon (2012). The relevant gas properties used are temperature $T$, hydrogen number density $n_{\rm H}$ (since Cloudy uses the hydrogen number density $n_{\rm H}$ as its density parameter instead of the baryon density $n_b$; Ferland et al. 1998), metallicity $Z$, and 4 parameters describing a model for the incident radiation field:

$$J_\nu = J_0 \left[ S_\nu + f_q\, x^{-\alpha} \right] e^{-\tau_\nu}, \qquad (2)$$

where $x = h\nu/(1\,{\rm Ry})$ is the photon energy in Rydbergs. The parameter $J_0$ in equation (2) accounts for the overall amplitude of the radiation field. The expression in square brackets contains two terms: a stellar spectrum $S_\nu$ and an AGN-like power law with slope $\alpha$. The parameter $f_q$ quantifies the ratio of the stellar and AGN-like contributions to $J_\nu$ at 1 Ry. The stellar spectrum $S_\nu$ (equation 3) is a fit to stellar spectra from the Starburst99 spectral synthesis library (Leitherer et al. 1999). The radiation field $J_\nu$ in equation (2) is also subject to attenuation due to neutral hydrogen and helium, with optical depth $\tau_\nu$ given by:

$$\tau_\nu = \frac{\tau_0}{\sigma_{\rm HI,0}} \left[ \sigma_{\rm HI}(\nu) + \sigma_{\rm HeI}(\nu) \right], \qquad (4)$$

where $\tau_0$ describes the overall optical depth, $\sigma_{\rm HI,0}$ is the photoionization cross section of neutral hydrogen at its ionization threshold, and $\sigma_{\rm HI}(\nu)$, $\sigma_{\rm HeI}(\nu)$ are the photoionization cross sections for neutral hydrogen and helium, respectively, as functions of frequency. A loose justification for this parameterization choice is that a random place in the universe may be irradiated by both stellar and AGN radiation, and the strongest nearby source may happen to lie behind a sufficiently dense absorbing cloud. More complex scenarios with multiple nearby sources of approximately equal strengths can obviously be imagined, but they are unlikely to be common.

Table 1. The parameters describing the training data table and the values they take. Here, $J_{\rm MW} = 10^6$ photons cm$^{-2}$ s$^{-1}$ ster$^{-1}$ eV$^{-1}$ (Gnedin & Hollon 2012). The $Z = 0$ case actually uses $Z = 10^{-4} Z_\odot$.
In total, the input data we use is fully described by 7 parameters: $T$, $n_{\rm H}$, $Z$, $J_0$, $f_q$, $\tau_0$, and $\alpha$. We define our training data on a grid of these parameters, described in Table 1.
The 4 radiation field parameters ($J_0$, $f_q$, $\tau_0$, and $\alpha$) are only well-defined for a radiation field with the functional form given by equation (2), and cannot necessarily be determined for a given radiation field from a galaxy formation simulation. Since photoionization of atoms in the gas plays an important role in radiative cooling and heating, we choose to represent the radiation field via the specific photoionization rates $\xi_i$ defined as:

$$\xi_i = \frac{c}{n_b} \int \sigma_i(\nu)\, n_\nu\, d\nu, \qquad (5)$$

where $n_\nu$ is the number density of photons with frequency $\nu$ (Gnedin & Hollon 2012). We explore 58 different such $\xi_i$, including rates for all the ionization states of magnesium (MgI − MgXII) and iron (FeI − FeXXVI). We also include the photodissociation rate of molecular hydrogen $\xi_{\rm LW}$ in order to sample the radiation field at UV energies below the threshold for hydrogen ionization. Since the subsequent ionization states of chemical elements usually have increasing ionization thresholds, ionization rates of all ionization states of a given element sample a wide range of photon energies and hence may serve as good proxies of a radiation field spectrum. These $\xi_i$ sample the radiation field across different wavelength ranges, and are determined both by the minimum photon frequency required for the particular ionization and by the shape of the ionization cross section $\sigma_i(\nu)$ at higher frequencies. The minimum and maximum values attained given the radiation field parameter ranges from Table 1 are given for rates of particular interest for our analysis in Table 2.
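The rate integral described above can be sketched numerically. Everything in this snippet (the cross-section shape, the photon spectrum, and all amplitudes) is a hypothetical stand-in rather than the Cloudy or Starburst99 inputs used in the paper; it only illustrates how a rate samples the spectrum above its ionization threshold:

```python
import numpy as np

RY_HZ = 3.2898e15      # 1 Ry in Hz
C_CM_S = 2.9979e10     # speed of light in cm/s

def sigma(nu, nu_thresh, sigma0=6.3e-18):
    """Toy hydrogen-like cross section: sigma0*(nu_thresh/nu)^3 above threshold."""
    nu = np.asarray(nu, dtype=float)
    return np.where(nu >= nu_thresh, sigma0 * (nu_thresh / nu) ** 3, 0.0)

def per_atom_rate(nu, n_nu, nu_thresh):
    """c * integral(sigma(nu) * n_nu dnu) by the trapezoidal rule (units: 1/s)."""
    integrand = sigma(nu, nu_thresh) * n_nu
    return C_CM_S * 0.5 * np.sum((integrand[1:] + integrand[:-1]) * np.diff(nu))

# Soft power-law photon number density spectrum (arbitrary amplitude)
nu = np.logspace(np.log10(RY_HZ), np.log10(1e3 * RY_HZ), 4000)
n_nu = 1e-30 * (nu / RY_HZ) ** -2     # photons cm^-3 Hz^-1

rate_HI = per_atom_rate(nu, n_nu, RY_HZ)          # threshold at 1 Ry
rate_HeI = per_atom_rate(nu, n_nu, 1.81 * RY_HZ)  # HeI threshold is ~1.81 Ry

# A higher threshold samples harder photons, so a soft spectrum gives a smaller
# rate; dividing a per-atom rate by a baryon density would give a specific rate.
```

Rates with higher thresholds respond only to the hard part of the spectrum, which is why a set of such rates can act as a coarse sampling of the spectral shape.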

2.2 XGBoost
For this work, we use the gradient-boosted tree algorithm XGBoost. XGBoost uses an ensemble of regression trees, where the trees are trained sequentially, with each new tree fit to the residuals of the current ensemble (Chen & Guestrin 2016); such gradient-boosted trees generally perform well on tabular data (e.g. Shwartz-Ziv & Armon 2022; Grinsztajn et al. 2022). Given the tabular setup of our training data as described in Section 2.1, XGBoost is an appropriate choice of machine learning model. XGBoost is also well-suited to the high-dimensional input data described in Section 2.1 because the structure of regression trees means that the splitting at any node of any tree is determined by the value of only one feature. Furthermore, XGBoost includes a hyperparameter which limits the fraction of the input features used for each tree (Chen & Guestrin 2016). An additional reason for the choice of XGBoost over other machine learning models is the availability of tools to compute feature importances, as discussed in Section 2.7 below. We train XGBoost regression models on a GPU using the tree method gpu_hist. Explanations of the XGBoost hyperparameters we varied from their default values can be found in Appendix A.

2.3 Training Data Preparation
The true cooling and heating functions $\Lambda$ and $\Gamma$ vary over several orders of magnitude on the training data, so we train our XGBoost models to predict $\log \Lambda$ and $\log \Gamma$ as functions of the gas temperature $T$, density $n_{\rm H}$, metallicity $Z$, and photoionization rates $\xi_i$. Since we only have very sparse sampling in $Z/Z_\odot$, we do not use metallicity as an input feature for our XGBoost models. Instead, we train separate models at each of the 5 metallicity values in Table 1, and interpolate between model predictions (with the input features fixed) at intermediate metallicities. This interpolation is discussed further in Section 2.8. We explored incorporating the metallicity $Z$ as an additional XGBoost feature, but found that we could predict cooling and heating functions more accurately by combining fixed-metallicity XGBoost models with interpolation in metallicity. This is somewhat unsurprising, since XGBoost models perform a piecewise constant interpolation in each input feature (Chen & Guestrin 2016). With only 5 data points in metallicity, we can outperform this by choosing a manual fitting function with some knowledge of the expected behavior.
We utilize gas temperature $\log(T/{\rm K})$ and density $\log(n_{\rm H}/{\rm cm^{-3}})$ as input features for all of our XGBoost models. To describe the radiation field $J_\nu$, we use photoionization rates $\xi_i$ to capture how the incident radiation field impacts various elements in the gas. As suggested by the structure of the interpolation table in Gnedin & Hollon (2012), we expect to need at least 4 distinct $\xi_i$ to adequately describe the radiation field dependence. However, all rates $\xi_i$ scale linearly with the radiation field amplitude $J_0$. Hence, to keep our features as uncorrelated as possible, we use the photodissociation rate of molecular hydrogen $\xi_{\rm LW}$ as a reference rate. That is, we use $\xi_{\rm LW}$ as an input to all of our XGBoost models, and scale all other photoionization rates as $\xi_i/\xi_{\rm LW}$ to remove the overall amplitude. The radiation field features we use are $\log(\xi_{\rm LW}/{\rm cm^3\,s^{-1}})$ and various sets of $\log(\xi_i/\xi_{\rm LW})$ for $i \neq {\rm LW}$.
As seen in Table 1 and Table 2, these features span large, and very different, ranges. For example, $1 \leq \log(T/{\rm K}) \leq 9$ and $-6 \leq \log(n_{\rm H}/{\rm cm^{-3}}) \leq 6$. To put all features on a comparable scale, we linearly rescale each feature to lie between 0 and 1 on the training data, using MinMaxScaler() from the preprocessing module of scikit-learn (Pedregosa et al. 2011). The resulting distributions for 6 of the $\log(\xi_i/\xi_{\rm LW})$ features are shown in Fig. 1.
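The rescaling step can be sketched as follows, with synthetic values standing in for the actual feature table (the column ranges mimic the temperature and density limits quoted above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)

# Two features with very different ranges, mimicking log T in [1, 9]
# and log n_H in [-6, 6] (values are synthetic).
features = np.column_stack([
    rng.uniform(1.0, 9.0, 1000),    # log10(T / K)
    rng.uniform(-6.0, 6.0, 1000),   # log10(n_H / cm^-3)
])

scaler = MinMaxScaler()                   # maps each column onto [0, 1]
scaled = scaler.fit_transform(features)
```

The same fitted scaler can then be applied to new inputs with scaler.transform(...), which matters because (as described below) the scaler here is fit before the train/validation/test split.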

2.4 Principal Component Analysis of Rates
Many of the 58 photoionization rates $\xi_i$ we consider have similar ionization thresholds and a similar frequency dependence of their absorption cross sections, $\sigma_i(\nu)$. This means that many of the sampled wavelength ranges overlap with each other. As such, we should expect many of the $\xi_i$ to be highly correlated with one another. As discussed in Section 2.3, we always use $\log(\xi_{\rm LW}/{\rm cm^3\,s^{-1}})$ as a feature to encode the overall radiation field amplitude $J_0$, and consider the scaled rates $\log(\xi_i/\xi_{\rm LW})$ for $i \neq {\rm LW}$ as candidate features. There are still 57 of these scaled rates, with highly overlapping wavelength ranges and similar absorption cross sections. An example of this can be seen visually in the right-hand column of Fig. 1, where the feature distributions of $\log(\xi_i/\xi_{\rm LW})$ for $i = {\rm CVI}$ and NaXI appear very similar by eye.
In order to measure how strongly correlated the photoionization rate features are, we evaluate the correlation matrix of $\log(\xi_i/\xi_{\rm LW})$, $i \neq {\rm LW}$, for the data described in Section 2.1. We use the Spearman correlation coefficient, so that the correlations are the same whether or not we take the logarithm of the scaled rates (since the logarithm is a monotonically increasing function). The correlation matrix (for the same scaled rate features $\log(\xi_i/\xi_{\rm LW})$ as in Fig. 1) is shown in Fig. 2. Note the high Spearman correlations of 0.95 or above for several pairs of features, and that the Spearman correlation is positive for all pairs.
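A minimal sketch of this computation, with synthetic stand-ins for the scaled rates; the exponentiated column illustrates the point made above, that the rank-based Spearman coefficient is unchanged under any monotone transform such as the logarithm:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Synthetic stand-ins for three scaled rates log(xi_i / xi_LW); the second
# column is a monotone transform of the first, so their Spearman
# correlation is exactly 1, just as log vs. linear rates would be.
base = rng.uniform(-5.0, 0.0, 500)
rates = np.column_stack([
    base,
    np.exp(base),                  # monotone transform of the first column
    rng.uniform(-5.0, 0.0, 500),   # an independent rate
])

corr, pval = spearmanr(rates)      # 3x3 Spearman correlation matrix
```

For the 57 scaled rates, the analogous call produces the 57 × 57 matrix whose eigenstructure is used below.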
To limit both the number of photoionization rate features we need to include and the correlations between them, we utilize tools from principal component analysis (PCA). In particular, we determine the eigenvalues $\lambda_i$ and normalized eigenvectors $\vec{e}_i$ of the $57 \times 57$ Spearman correlation matrix described above. We find that the principal components of this correlation matrix depend non-trivially on almost all of these rates. That is, the correlation matrix is close to degenerate, and the principal components are poorly constrained. For this reason, we depart from PCA and instead rank each $\log(\xi_j/\xi_{\rm LW})$ by a "relative importance" $I_j$, defined as the sum of the magnitudes of the $j$-th components of the eigenvectors $\vec{e}_i(j)$, weighted by the corresponding eigenvalues:

$$I_j = \sum_i \lambda_i\, |\vec{e}_i(j)|. \qquad (6)$$

As defined by equation (6), the relative importance for a given scaled rate is essentially a weighted average of the contribution of the scaled rate to each principal component of the correlation matrix, where the weights are the eigenvalues of the principal components. However, since we do not normalize the sum in equation (6), only comparisons between different $I_j$ are physically meaningful, and not the magnitude of any individual $I_j$. In particular, a scaled rate with a larger relative importance $I_j$, on average, contributes more to the more important principal components of the correlation matrix. Limiting the scaled rates we use as features for our XGBoost models to those with sufficiently large relative importance effectively limits the number of scaled rates utilized in our analysis. In contrast, even selecting only the principal component with the largest eigenvalue would technically require the use of all 57 scaled rates, due to the near-degeneracy of the correlation matrix.
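The relative-importance ranking described above can be sketched with numpy's symmetric eigendecomposition; the 3 × 3 correlation matrix below is invented for illustration (the paper's matrix is 57 × 57):

```python
import numpy as np

def relative_importance(corr):
    """I_j = sum_i lambda_i * |e_i(j)| from the eigenpairs of a correlation matrix."""
    lam, vecs = np.linalg.eigh(corr)   # symmetric matrix, so eigh is appropriate
    # vecs[:, i] is the normalized eigenvector belonging to eigenvalue lam[i],
    # so vecs[j, i] is the j-th component of eigenvector i.
    return np.abs(vecs) @ lam

# Toy correlation matrix with one strongly correlated pair (synthetic,
# all-positive correlations as in the paper's Fig. 2)
corr = np.array([
    [1.00, 0.95, 0.30],
    [0.95, 1.00, 0.25],
    [0.30, 0.25, 1.00],
])

importance = relative_importance(corr)
ranking = np.argsort(importance)[::-1]   # feature indices, most important first
```

Only the ordering of the resulting values is meaningful, matching the remark above that the unnormalized sum has no physical scale of its own.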
We show the relative importances $I_j$ in Fig. 3. We choose to select all rates with relative importance $I_j$ larger than that of $i = {\rm FeII}$ (i.e. those to the left of the dashed line in Fig. 3), using a natural break in the distribution of $I_j$. Note that all the photoionization rates for iron, and most magnesium rates (with the exception of MgI − III and MgV), have very low relative importance $I_j$ and are excluded from the remainder of our analysis.
This leaves 21 scaled rates, which we include as features in a set of XGBoost models at each metallicity. We perform a feature importance analysis on these models, discussed in Section 2.7, to guide further down-selection of the set of scaled rate features. We use these models as a baseline comparison for models incorporating fewer scaled rate features.

2.5 How to Define an Error?
To train and quantitatively evaluate the performance of our XGBoost models, we must define the error in a predicted cooling or heating function. We generally define this as:

$$\Delta \log \mathcal{F} = \log \mathcal{F}_{\rm predicted} - \log \mathcal{F}_{\rm true}, \qquad (7)$$

where $\mathcal{F} = \Lambda$ or $\Gamma$.
Figure 3. Relative importances $I_j$ (equation 6) for the scaled rates $\log(\xi_i/\xi_{\rm LW})$, in decreasing order. For simplicity, only the 35 scaled rates with the highest relative importance are shown. We use the 21 rates with the highest significance (those to the left of the vertical dashed line) in our analysis, as all other rates have similarly small significance.

In order to train our XGBoost models, we must choose a 'loss function' to minimize on the training set (which is described below in Section 2.6). This loss function compares the XGBoost predictions to the true values in the data table described in Section 2.1, averaged over a set of predictions. For this purpose, we use the mean squared error (MSE):

$${\rm MSE} = \left\langle (\Delta \log \mathcal{F})^2 \right\rangle, \qquad (8)$$

where the average is over all points in the specific data set in question. However, we must also keep in mind that for any set of model predictions we will have a distribution of errors $\Delta \log \mathcal{F}$, which cannot be completely characterized by the MSE in general. Instead, we can consider the cumulative error distribution function $F(> \Delta \log \mathcal{F})$, defined by:

$$F(> \Delta \log \mathcal{F}) = \frac{\text{Number of points with error above } \Delta \log \mathcal{F}}{\text{Total number of points}}. \qquad (9)$$

In particular, the tail of the cumulative error distribution function defined in equation (9) reveals very large "catastrophic errors". While these may have a minimal effect on the overall MSE, minimizing both the frequency and magnitude of such catastrophic errors is crucial for robust prediction of cooling and heating functions.
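A minimal sketch of this cumulative error distribution, using synthetic errors with a small outlier tail; taking the absolute value of the errors is an assumption of this sketch:

```python
import numpy as np

def cumulative_error_fraction(errors, thresholds):
    """Fraction of points whose |error| exceeds each threshold."""
    errors = np.abs(np.asarray(errors))
    return np.array([(errors > t).mean() for t in thresholds])

rng = np.random.default_rng(3)

# Mostly small errors plus a few large outliers (synthetic): the bulk
# dominates the MSE, while the tail fraction isolates the rare large errors.
errors = np.concatenate([
    rng.normal(0.0, 0.05, 990),   # well-predicted bulk
    rng.normal(0.0, 1.00, 10),    # "catastrophic" outliers
])

thresholds = np.array([0.1, 0.5, 1.0])
frac = cumulative_error_fraction(errors, thresholds)   # non-increasing in threshold
```

Comparing such tail fractions between two models is how one can state that the frequency of the largest errors has been reduced even when the MSEs are similar.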

2.6 Model Training
For each set of features, we must perform model validation (hyperparameter optimization), training, and testing for both the cooling function and the heating function at $Z/Z_\odot = 0, 0.1, 0.3, 1, 3$. That is, we validate, train, and test 10 different XGBoost models for each set of features.
After calculating the photoionization rates, evaluating the cooling or heating function at the desired metallicity, and scaling the features as described above, we split the data into validation, training, and test sets. Note that we scale the features before splitting, so that the minimum and maximum values used to scale each feature could come from any of the testing, training, or validation data. To split the data, we use train_test_split() from the model_selection module of scikit-learn. We set aside 20% of the data as the testing set. Of the remaining 80% of the data, we use 10% for hyperparameter validation (i.e. 8% of the total data). Our hyperparameter tuning procedure is described in Appendix A. The remaining 72% of the data is used for model training, using hyperparameters obtained from the validation step. These fractions are chosen so that the set used for model training contains the majority of the data, the testing set is large enough to be reasonably representative of the feature distributions in the entire data set, and the separation of the validation set does not significantly reduce the amount of data available for model training. We do not expect small numerical changes to these fractions to significantly affect our results, as long as they satisfy these qualitative constraints. We use the same train-test-validation split for all models.
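The nested split can be sketched with scikit-learn; the data and random_state values below are synthetic and arbitrary, chosen only to reproduce the 72/8/20 percentages:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.uniform(size=(1000, 4))   # synthetic stand-in for the scaled features
y = rng.uniform(size=1000)        # synthetic stand-in for log Lambda or log Gamma

# First split off 20% as the test set; then take 10% of the remainder
# (8% of the total) for hyperparameter validation, leaving 72% for training.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.1, random_state=0)
```

Fixing random_state gives the same train-test-validation split for every model, as required above.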
Using a suitable set of hyperparameters found as described in Appendix A, we then retrain XGBoost models on the 80% training set (but without splitting off any of this data for hyperparameter validation). We also retrain these XGBoost models on the entire grid for interpolating in $Z$, where we have independent testing data at intermediate values of $Z$.

2.7 Feature Importances with SHAP Values
To analyze which scaled photoionization rates have the most impact on the cooling and heating functions, we calculate SHAP values (Lundberg & Lee 2017; Lundberg et al. 2018, 2020), using models with the 21 scaled rate features described in Section 2.4 and retrained with optimal hyperparameters using 80% of the training data table, as described in Section 2.6. We evaluate SHAP values for all features on 500 randomly sampled points from the 20% test set using the Python package shap (Lundberg & Lee 2017; Lundberg et al. 2020).
The formula for computing the exact SHAP value for a particular model prediction and input feature comes from game theory, and describes the difference between the true model prediction and what would be expected without including the feature in question. These feature importances have several desirable properties, including summing exactly to the model prediction in question, and always giving a value of 0 for a feature that does not affect the model prediction. However, the exact computation involves evaluating machine learning models analogous to the one we wish to explain using every possible subset of features. Since this is computationally infeasible for complex models with many features, SHAP values must often be approximated in practice (Lundberg & Lee 2017). The specific approximation for tree-based models implemented in the shap package, which we use here, is described in Lundberg et al. (2020).
To compare the importance of the various features, we evaluate the mean absolute SHAP value (since SHAP values can be positive or negative) across the 500 random test points. Ranking these (in decreasing order) gives a description of the most important features (for a given XGBoost model predicting cooling or heating, at a given metallicity). The 7 most important features by this ranking are shown in Fig. 4 for all models trained using the 21 rates from the PCA described in Section 2.4. For all these models, temperature $\log(T/{\rm K})$ and radiation field amplitude (described by $\log(\xi_{\rm LW}/{\rm cm^3\,s^{-1}})$) are the two most important features. For the cooling function at metallicities $Z/Z_\odot \geq 0.1$, density $\log(n_{\rm H}/{\rm cm^{-3}})$ is the third-most important feature by SHAP value, but it is less important at $Z/Z_\odot = 0$ and for the heating function at all metallicities. The three most important scaled photoionization rates, along with their SHAP value importance ranks, are shown in Table 3 for the cooling function and Table 4 for the heating function. Several scaled rates (MgII, CI, HI, HeI) have high SHAP values at all metallicities for both cooling and heating, while CaXX has high SHAP values for heating at all metallicities except $Z/Z_\odot = 0$. At the highest metallicities ($Z/Z_\odot = 1, 3$), the rate CVI reaches high SHAP values. Since the models in Fig. 4 are trained at fixed metallicity values $Z/Z_\odot = \{0, 0.1, 0.3, 1, 3\}$, these trends hold only in the metallicity range $0 \leq Z/Z_\odot \leq 3$, and should not be extrapolated beyond this range. Within this metallicity range, we also do not determine the functional form of the dependence of SHAP values on metallicity, and instead describe only trends of increasing or decreasing SHAP values with metallicity.
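The ranking by mean absolute SHAP value can be sketched as follows. The SHAP matrix here is a synthetic stand-in for what shap.TreeExplainer would produce for a trained XGBoost model, and the feature names are illustrative, not the paper's exact feature set:

```python
import numpy as np

# Hypothetical feature names; in practice these would be the model's inputs.
feature_names = ["logT", "log_nH", "log_xi_LW", "xi_MgII_scaled", "xi_CI_scaled"]

rng = np.random.default_rng(5)

# Stand-in SHAP values of shape (n_samples, n_features); per-feature spreads
# are chosen so that "logT" clearly dominates, mimicking the paper's finding.
shap_values = rng.normal(0.0, [2.0, 0.3, 1.0, 0.5, 0.1], size=(500, 5))

# SHAP values can be positive or negative, so rank by mean absolute value.
mean_abs = np.abs(shap_values).mean(axis=0)
order = np.argsort(mean_abs)[::-1]            # most important first
ranked = [feature_names[i] for i in order]
```

With real models, shap_values would come from shap.TreeExplainer(model).shap_values(X_sample); the ranking step is unchanged.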
To limit the data size of the XGBoost models, and to compare with the 4 photoionization rates used in the interpolation table in Gnedin & Hollon (2012) (P_LW, P_HI, P_HeI, P_CVI), we choose to use the 3 scaled rates with the highest SHAP values to train new models (as above, the feature log(P_LW/cm³ s⁻¹) is always used). For cooling and heating at each metallicity, we train models using the 3 most important scaled rate features from our SHAP value analysis in Fig. 4, Table 3, and Table 4, following the procedure described in Section 2.6. We also select several candidate sets of 3 scaled rates that appear frequently in Table 3, Table 4, or both.

Table 3. The three most important scaled rates, by mean absolute SHAP value, for 500 randomly sampled test points on cooling function models with the 21 scaled rate features from Section 2.4, retrained on 80% of the training data table using the optimized hyperparameters found via the procedure in Section 2.6.

Table 4. Same as Table 3, but now for the heating function.

Interpolation in Metallicity
To approximate Λ and Γ at values of Z besides the sample points, we must interpolate between the predictions of the 5 fixed-Z models evaluated at the relevant feature values. Following Gnedin & Hollon (2012), we perform a quadratic fit to Λ and Γ at the five metallicity values Z/Z⊙ = {0, 0.1, 0.3, 1, 3}. However, we found that their quadratic fit sometimes yields unphysical negative predictions for Λ or Γ.
For this reason, we trained our XGBoost models to predict Λ and Γ at the five metallicities for which we have cooling and heating functions calculated with Cloudy, and perform a quadratic fit between XGBoost predictions that is constrained so as to never predict negative values on the metallicity domain of interest, i.e. 0 ≤ Z/Z⊙ ≤ 3. Specifically, we fit a quadratic in metallicity,

F_quad(Z) = a + b (Z/Z⊙) + c (Z/Z⊙)²,   (10)

where F stands for Γ or Λ. We find the best-fit values of the parameters (a, b, c) by numerically minimizing a loss function consisting of a χ² term plus a constraint term C(a, b, c). The χ² term is the error between F predicted by XGBoost and the quadratic fit,

χ² = Σ_i [F_XGBoost(Z_i) − F_quad(Z_i)]²,

where the sum is over Z_i/Z⊙ = {0, 0.1, 0.3, 1, 3}. The constraint function C(a, b, c) penalizes any value of the quadratic on 0 ≤ Z/Z⊙ ≤ 3 that falls outside the range spanned by the F_XGBoost(Z_i), where again Z_i/Z⊙ = {0, 0.1, 0.3, 1, 3}. This constraint ensures that the quadratic fit is bounded by the minimum and maximum XGBoost predictions at fixed metallicity. Our XGBoost models actually predict log F, so the values of F predicted by XGBoost are always positive. Hence, the constraint C(a, b, c) also ensures that the quadratic fit always yields positive values for the metallicity range where we make predictions (0 ≤ Z/Z⊙ ≤ 3).
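A minimal sketch of such a constrained fit is below. The soft-penalty form, its weight, and the example "XGBoost predictions" are assumptions for illustration only, and scipy's Nelder-Mead minimizer stands in for whatever optimizer is used in practice:

```python
import numpy as np
from scipy.optimize import minimize

Z = np.array([0.0, 0.1, 0.3, 1.0, 3.0])       # Z/Zsun sample points
F_xgb = np.array([0.2, 0.35, 0.6, 1.4, 3.9])  # hypothetical XGBoost predictions of F

def quad(params, z):
    a, b, c = params
    return a + b * z + c * z**2

def loss(params):
    # chi^2 between the fixed-metallicity predictions and the quadratic.
    chi2 = np.sum((F_xgb - quad(params, Z)) ** 2)
    # Soft constraint: penalize excursions outside [min(F_xgb), max(F_xgb)]
    # anywhere on 0 <= Z/Zsun <= 3. Since the predictions are positive
    # (the models predict log F), this also keeps the fit positive.
    zfine = np.linspace(0.0, 3.0, 301)
    q = quad(params, zfine)
    penalty = np.sum(np.clip(q - F_xgb.max(), 0, None) ** 2
                     + np.clip(F_xgb.min() - q, 0, None) ** 2)
    return chi2 + 100.0 * penalty

# Start from the unconstrained quadratic fit, then refine under the penalty.
x0 = np.polyfit(Z, F_xgb, 2)[::-1]
fit = minimize(loss, x0=x0, method="Nelder-Mead")
a, b, c = fit.x
```

Starting from the unconstrained `np.polyfit` solution means the optimizer only has work to do when that solution actually violates the bounds.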
To evaluate the performance of our XGBoost models combined with a quadratic fit in metallicity, we use a sample of off-grid data, randomly sampled in the density, radiation field amplitude, and spectral shape parameters over the same ranges as in Table 1, and in the range [−3, log 3] for log(Z/Z⊙). We use 8666 such random samples, evaluated at the same 81 values of log(T/K) as in Table 1, for a total of 701,946 evaluation points.

Comparison on training data
We first compare the performance of our XGBoost models at fixed metallicity to the interpolation table in Gnedin & Hollon (2012) on the training data described in Section 2.1. To make this comparison, we use XGBoost models trained on the entirety of this data set, as Gnedin & Hollon (2012) use that same set to construct their interpolation table. To understand how accurately the XGBoost models can generalize outside of the training data, we also consider the performance of models trained on only 80% of the training grid (as described in Section 2.6) on the 20% test set withheld from model training.
To quantify the performance of these models, we consider the cumulative distribution function of errors F(> Δ log F) defined in equation (9). This is simply the frequency of errors at least as large as Δ log F. We also consider the mean squared error (MSE, defined in equation (8)).
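Both metrics can be written down in a few lines. In this sketch the "true" and "predicted" log cooling/heating values are synthetic placeholders, not model output:

```python
import numpy as np

# Synthetic stand-ins for true and predicted values of log F.
rng = np.random.default_rng(2)
log_F_true = rng.uniform(-2.0, 2.0, size=10000)
log_F_pred = log_F_true + rng.normal(scale=0.05, size=10000)

errors = np.abs(log_F_pred - log_F_true)
mse = np.mean((log_F_pred - log_F_true) ** 2)

def error_cdf(threshold):
    """Fraction of points with |Delta log F| at least as large as threshold."""
    return np.mean(errors >= threshold)

print(mse, error_cdf(0.1))
```

The complementary CDF is the more informative diagnostic for the high-error tail, since a handful of catastrophic errors barely move the MSE but show up directly in the tail of the distribution.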
At each fixed metallicity, we compare the interpolation table in Gnedin & Hollon (2012) with 3 different XGBoost models: a model trained using the same inputs as Gnedin & Hollon (2012), i.e. using the scaled photoionization rates log(P_i/P_LW) for i = {HI, HeI, CVI}; a model trained using the 21 scaled rates identified from the PCA analysis described in Section 2.4; and a model trained with the top 3 scaled rates from the SHAP value analysis of Section 2.7.
Fig. 5 shows the cumulative distribution function of training errors for the interpolation table in Gnedin & Hollon (2012) and the three types of XGBoost model described above, for the cooling and heating functions at Z/Z⊙ = 0 and 1 (for the same plots at all metallicities in the training data, see Appendix B). The MSEs for all these models are shown for the cooling function in Table 5 and the heating function in Table 6.
From Fig. 5, Fig. B1, Table 5, and Table 6, we see that all 3 types of XGBoost model greatly outperform the interpolation table in Gnedin & Hollon (2012) in all cases, with lower MSEs and a lower cumulative error distribution function for all values shown on the plots (10⁻² < Δ log F < 2). The MSE on the 20% of the training table withheld from model training is always larger than that for analogous models trained on the entire table, but only by a factor of order unity. These MSE values are still always smaller than for the interpolation table in Gnedin & Hollon (2012), and the cumulative error distribution function is similarly lower at the same error values.
The comparison between the 3 different types of XGBoost model is less clear-cut. In particular, the cumulative error distribution functions for the XGBoost models (on the entire training table) sometimes intersect in Fig. 5 and Fig. B1 (see the cooling function at Z/Z⊙ = 0 in the upper left of both figures), meaning that we cannot unambiguously identify a single 'best-performing' XGBoost model.

Comparison on off-grid data
Next, we compare the performance of our XGBoost models, combined with the interpolation in metallicity described in Section 2.8, to the interpolation table in Gnedin & Hollon (2012) on the sample of off-grid data (also described in Section 2.8). Note that we are now evaluating the XGBoost models on points not seen in model training, and introducing interpolation between individual XGBoost predictions; both factors decrease model performance. Since the rate of change of the energy density of a gas cloud is proportional to the heating function minus the cooling function, as seen in equation (1), the physical evolution will usually be dominated by whichever of the cooling or heating function is larger. Hence, we also consider the error in the maximum of the cooling and heating functions at each point in the evaluation data.
For the interpolation in metallicity, we must select a common set of scaled rate features to use in the XGBoost models at each Z value. Here, we consider two different sets of 3 scaled rates, {CaXX, HI, CVI} and {CI, HI, NaXI}, in addition to the set {HI, HeI, CVI} that gives XGBoost the same inputs as the interpolation table in Gnedin & Hollon (2012). The choice of these sets is motivated by considering rates with high SHAP value importance across the range of metallicity values, and by the heuristic that a photoionization rate with an X-ray threshold is needed to capture the quasar part of the ionizing radiation field (parameterized by the power-law slope α).
In the left panel of Fig. 6, we plot the cumulative error distribution function on the off-grid evaluation data described in Section 2.8 for the interpolation table in Gnedin & Hollon (2012) and our constrained quadratic interpolation between XGBoost models using the same inputs. In the right panel, we compare the cumulative error distribution functions for our constrained quadratic interpolation between XGBoost models with the three different sets of scaled rates described above. In Table 7, we show the MSEs corresponding to each of the cumulative error distribution functions in Fig. 6.
From the left panel of Fig. 6, we see that our XGBoost models plus constrained quadratic interpolation reduce the frequency of the largest cooling and heating function errors compared to the interpolation table in Gnedin & Hollon (2012), using the same inputs. From Table 7, we see that these XGBoost models (using the scaled rates {HI, HeI, CVI}) yield MSE values comparable to the interpolation table in Gnedin & Hollon (2012), while (by construction) completely removing the occurrence of 'catastrophic errors' for cooling or heating. However, this improvement does not extend to the prediction of the maximum of the cooling and heating functions: for that quantity, both the cumulative error distribution function at each error value shown (10⁻² < Δ log F < 2) and the overall MSE are higher for XGBoost than for Gnedin & Hollon (2012).
The performance of all three XGBoost calculations in the right panel of Fig. 6 is similar. Neither alternative rate set produces a significant improvement in the performance for the maximum of the cooling or heating function compared to {HI, HeI, CVI}, in either MSE or the cumulative error distribution function. However, the scaled rate set {CaXX, HI, CVI} results in a lower MSE, and a slightly lower cumulative error distribution in the high-error tail, for cooling and heating individually. Hence, judicious choices of scaled rate features can produce some improvement in overall performance.
Comparison of Fig. 5 and Fig. 6 suggests that, for the training data grid used in this project and in Gnedin & Hollon (2012), the interpolation in Z is the bottleneck in model performance on off-grid data. This is perhaps unsurprising, given that we have only 5 values of Z to interpolate over, and that we should expect the true dependence of Γ and Λ on Z to be more complicated than a quadratic.

CONCLUSIONS AND DISCUSSION
In this paper, we train XGBoost models to predict cooling and heating functions at the fixed metallicities Z/Z⊙ = {0, 0.1, 0.3, 1, 3}, using the same pre-computed Cloudy tables as Gnedin & Hollon (2012). We perform a constrained quadratic fit (with the input features held constant) between these XGBoost models to predict cooling and heating functions at arbitrary metallicity. Using model features selected from a principal component analysis and SHAP value feature importances, we compare both fixed-metallicity performance and performance at arbitrary metallicity with results from the interpolation tables in Gnedin & Hollon (2012), in order to assess the potential of XGBoost as an alternative approach for evaluating cooling and heating functions in the presence of generalized radiation fields. We summarize our findings as follows:
• Using the same photoionization rates as Gnedin & Hollon (2012) (P_i for i = LW, HI, HeI, CVI), our XGBoost models significantly reduce the frequency of errors 0.01 < Δ log F < 2, as well as the mean squared error, at all 5 fixed metallicities. This can be seen in the comparison between solid and dashed thick lines in Fig. 5 and Fig. B1.
• Using an eigenvalue and eigenvector analysis of the correlation matrix of scaled photoionization rates, we identified 21 particularly significant scaled photoionization rates (see Fig. 3).
• For models trained using these 21 rates, we identified the most important features using SHAP values (see Fig. 4, Table 3, and Table 4).
• At arbitrary metallicity, our constrained quadratic fit between fixed-metallicity XGBoost models reduces the frequency of the largest cooling and heating function errors relative to the interpolation table in Gnedin & Hollon (2012), with further reduction with other rate sets suggested by a combination of SHAP values and physical heuristics, such as {CaXX, HI, CVI} (see Fig. 6).

Table 7. Mean squared errors on the off-grid data set described in Section 2.8 for the models in Fig. 6. For the GH12 column, the values in parentheses are the fraction of 'catastrophic errors' on the evaluation set where the prediction is negative (so Δ log F = ∞). We remove these points from the MSE calculation.
The reduction in the frequency of cooling or heating function errors Δ log F ≥ 1 that we achieve at arbitrary metallicity (see Fig. 6 and Table 7) is much more modest than that seen for our individual fixed-metallicity XGBoost models (see Fig. 5, Table 5, and Table 6). This is likely due to the small number of samples available to perform a fit in metallicity (the five Z/Z⊙ values in Table 1).
Our metallicity fit, equation (10), assumes that the dependence of cooling and heating functions on metallicity (with all other parameters fixed) is quadratic. If the only processes involved were two-body collisions, this assumption would be exactly correct for the cooling and heating functions computed by Cloudy. However, the Cloudy training data include additional processes involving multiple electrons (Ferland et al. 1998). Furthermore, even small errors in our XGBoost predictions at fixed metallicity imply that the dependence of these predictions on metallicity will not exactly match the metallicity dependence of the Cloudy calculations they approximate. Finally, we introduced an additional unphysical constraint on our quadratic fits in order to prevent negative predicted cooling and heating functions. All of this suggests that a different, more complex metallicity fitting function is needed to accurately capture the metallicity dependence.
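The two-body argument can be made explicit. Every two-body contribution scales as a product of two species densities; splitting species into primordial (H, He, and the electrons they supply) and metals, whose abundances scale linearly with Z, yields a quadratic in Z. The schematic decomposition below is a standard argument, not an equation from this paper:

```latex
% Radiative losses from two-body collisions, grouped by whether each
% collision partner is primordial (H, He, e) or a metal (abundance ∝ Z):
\Lambda(Z) \;=\;
  \underbrace{\Lambda_{0}}_{\text{primordial--primordial}}
  \;+\; \underbrace{\Lambda_{1}\,(Z/Z_\odot)}_{\text{metal--primordial}}
  \;+\; \underbrace{\Lambda_{2}\,(Z/Z_\odot)^{2}}_{\text{metal--metal}}
```

Processes whose rates involve more than two colliding bodies, or whose ion populations respond non-linearly to the metal content, break this exact quadratic form, which is why the Cloudy results deviate from it.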
Hence, this paper motivates future work running Cloudy at intermediate metallicity values between the five used here, in order to implement a more flexible and accurate fit in metallicity and to better understand the metallicity dependence of cooling and heating functions. This additional Cloudy data could also enable further exploration of how the importances of various radiation field parameters change with metallicity.

Figure 3 .
Figure 3. The relative importance I_i defined by equation (6) for scaled rates log(P_i/P_LW), in decreasing order. For simplicity, only the 35 scaled rates with the highest relative importance are shown. We use the 21 rates with the highest significance (those to the left of the vertical dashed line) in our analysis, as all other rates have similarly small significance.

Figure 4 .
Figure 4. Mean absolute SHAP value for 500 randomly selected points in the 20% of the training grid withheld from model training (see Section 2.3), for models trained with the 21 scaled rate features selected from the PCA analysis described in Section 2.4. Cooling function models are shown in blue in the left column, heating function models in red in the right column. The rows correspond to models at the 5 metallicity values in the training data. Only the 7 features with the largest mean absolute SHAP value are shown. Note that the full descriptions of the features are log(T/K) for T, log(n_H/cm⁻³) for n_H, log(P_LW/cm³ s⁻¹) for LW, and log(P_i/P_LW) otherwise.

Figure 6 .
Figure 6. Cumulative distribution function of errors Δ log F on the evaluation data described in Section 2.8. Left panel: comparison of results from the interpolation table in Gnedin & Hollon (2012) and the constrained quadratic fit described in Section 2.8 between XGBoost models trained with the same scaled rates (HI, HeI, CVI). Right panel: comparison of the same constrained quadratic fit between XGBoost models trained with three different sets of scaled rates.

Table 2 .
Table 2. Values of log(P_i/P_LW) for some rates of interest attained in the training data table described in Table 1.

XGBoost trains decision trees sequentially to predict the residual between the true value and the prediction of the previous trees (Chen & Guestrin 2016). XGBoost is known to perform very well, including outperforming deep learning models, on tabular training data (Shwartz-Ziv & Armon 2022).

Table 5 .
Many rates have consistently high importance for both cooling and heating across all five metallicity values, especially MgII, CI, HI, and HeI. The rate CaXX has high importance for the heating function, but not cooling. The rate CVI has low importance at low metallicities, but emerges as important at the highest metallicities.

Table 5. Mean squared errors (MSEs) for the cooling function models shown in Fig. 5 and Fig. B1.

Table 6 .
Same as Table 5, but now for heating function models.