A fast neural emulator for interstellar chemistry

Astrochemical models are important tools to interpret observations of molecular and atomic species in different environments. However, these models are time-consuming, precluding a thorough exploration of the parameter space, leading to uncertainties and biased results. Using neural networks to simulate the behavior of astrochemical models is a way to circumvent this problem, providing fast calculations that are based on real astrochemical models. In this paper, we present a fast neural emulator of the astrochemical code Nautilus based on conditional neural fields. The resulting model produces the abundance of 192 species for arbitrary times between 1 and 10 7 years. Uncertainties well below 0.2 dex are found for all species, while the computing time is of the order of 10 4 smaller than Nautilus. This will open up the possibility of performing much more complex forward models to better understand the physical properties of the interstellar medium. As an example of the power of these models, we ran a feature importance analysis on the electron abundance predicted by Nautilus. We found that the electron density is coupled to the initial sulphur abundance in a low density gas. Increasing the initial sulphur abundance from a depleted scenario to the cosmic abundance leads to an enhancement of an order of magnitude of the electron density. This enhancement can potentially influence the dynamics of the gas in star formation sites.


INTRODUCTION
Interstellar matter is continuously processed describing a life-cycle that connects stellar and interstellar environments.Interstellar matter forms filamentary clouds which fragment and collapse to form dense cores, where protostars are born.These protostars develop accretion disks, these latter being at the origin of planetary systems.The life-cycle is closed when, at the end of their lives, stars inject a good fraction of their mass back to the interstellar medium, either through gentle winds, in the case of low-mass stars, or as supernovae explosions, in the case of massive stars.
Molecules serve as excellent tracers of the physical and chemical evolution of interstellar matter (Jørgensen et al. 2020).To date, more than 200 different molecules have been identified in space 1 .The molecules observed in space go from the simplest hydrides, such as OH, OH + , CH, and CH + , present in diffuse interstellar clouds (Gerin et al. 2016(Gerin et al. , 2019)), to the complex organic molecules, some of them with prebiotic relevance, that are detected in pre-stellar cores, young protostars, and planet-forming disks ( 2022; Crockett et al. 2015;Rivière-Marichalar et al. 2020;Öberg et al. 2023).In environments that are heavily obscured at ultraviolet, optical, and near-infrared wavelengths, the observation of molecules becomes the only means to get access to the physical and chemical processes at work.Observations have demonstrated that the chemical nature of molecules varies drastically as objects evolve and their environment change.Some molecules are routinely used as chemical diagnostics of critical aspects such as, e.g., the gas ionization degree (see, e..g., Caselli et al. 1998;Bergin et al. 1999;Fuente et al. 2019;Pineda et al. 2024), the local flux of cosmic rays (Favre et al. 2017;Cabedo et al. 2023), the gas kinetic temperature (Hacar et al. 2020), and the density (Navarro-Almaida et al. 2021).However, the reliability of molecular diagnostics used to determine the properties of the interstellar medium depends on our ability to understand the interstellar chemistry and the predictive power of our chemical models.Complex chemical models including thousands of reactions are routinely used in astrophysics (Le Petit et al. 2006;Taquet et al. 2012;Ruaud et al. 2016;Holdship et al. 2017a;Laas & Caselli 2019).These codes are time-consuming and computationally expensive, which limits the total number of computed models.This limitation challenges the interpretation of the data provided by current telescopes which allows to simultaneously observe tens, even hundreds, of molecular lines (Fuente et al. 2014(Fuente et al. , 2019;;McGuire et al. 2020;Cernicharo et al. 2023), and map large areas in affordable amounts of time (Friesen et al. 2017;Gratier et al. 2021).
Molecules have an important influence on the heating and cooling of interstellar gas and, therefore, on the cloud structure of the interstellar medium in galaxies.Molecular clouds are the site of ongoing star formation to a large extent because molecules are efficient gas coolers, thereby diminishing thermal support relative to self-gravity.Moreover, molecules suppress the degree of ionization in these clouds and, therefore, decouple the gas from any supporting magnetic fields.Molecules, thus, play a key role in the formation process of stars and planets.Steady state models have been widely used to interpret molecular observations in the last decades.Many of the processes linked to the formation of a star, such as bipolar outflows and disk formation, have timescales (10 3 −10 5 yr) smaller than the time required for most species to reach a steady state in their abundances (∼1 Myr).Coupling dynamics and chemistry is therefore of paramount importance to progress in our understanding of the physical and chemical evolution from the initial cold pre-stellar core to a protoplanetary disk, where planets are formed.
Numerical simulations have become a valuable tool to study, in a realistic manner, the evolution of interstellar matter during the star formation process.This dynamical evolution has been investigated following different approaches: analytical one-dimensional models (see, e.g., Priestley et al. 2018), two-dimensional semi-analytical models (Drozdovskaya et al. 2014(Drozdovskaya et al. , 2016)), and three-dimensional radiation-magnetohydrodynamical (R-MHD) simulations (see, e.g., Commerçon et al. 2012;Masson et al. 2016;Hincelin et al. 2016;Zhao et al. 2018a;Hennebelle et al. 2020;Gong et al. 2020Gong et al. , 2021;;Lebreuilly et al. 2023).MHD simulations have become the stateof-the-art approach for the analysis of the gravitational collapse of prestellar cores into protostellar objects probing the decisive role of the magnetic field in the process of collapse, fragmentation, and disk formation.A few of these state-of-the-art simulations already incorporate a reduced chemical network that provides important parameters for MHD calculations, such as the gas ionization degree (Marchand et al. 2016;Zhao et al. 2018b).However, full coupling of the MHD evolution of the gas with a full chemical reaction network with thousands of reactions, is challenging with the current computational capabilities (Hsu et al. 2021(Hsu et al. , 2023)).As a feasible approximation, an alternative method based on the usage of virtual particles has been used.These particles store the key parameters required for the computation of the chemistry (namely temperature and density) and a full state-of-the-art chemical model is then used to predict their chemical composition as a function of time (Hincelin et al. 2016;Coutens et al. 2020;Navarro-Almaida et al. 2024).Yet, these calculations are expensive in terms of computing time since a large number of particles is needed to sample all the protostellar components and only a handful of cases have been completed, thus far.Consequently, a full comparison of these 3D chemo-MHD simulations with observations has not been done yet, so that their predictive power remains largely unexplored.
Astrochemistry has become a necessary tool for understanding the interstellar medium.Chemical models consist of a system of coupled ordinary differential equations (ODE) that represent a chemical network.This system is resolved for a set of input parameters which include the gas physical conditions (density, temperature), cosmic ray ionization rate per H atom, incident UV field, visual extinction, grain properties, initial chemical composition, and elemental abundances.Computing large grids of complex chemical models, varying input parameters to describe all astrophysical environments, and predicting the chemistry for each time step is unaffordable.It is therefore interesting to develop fast chemistry emulators which allow to address some problems in a fast way, which is precisely the purpose of this work.We are not the first team to explore emulators in chemistry.The vast majority of accurate enough emulators are based on neural networks, although other methods like random forest regression have been applied (Bron et al. 2021). de Mĳolla et al. (2019) used a neural network to emulate astrochemical models in equilibrium after 10 7 years obtained with UCLCHEM (Holdship et al. 2017b).Holdship et al. (2021) trained a neural network to predict the change in abundance during a small time step, allowing the user to evolve the chemistry for long times by iteratively applying the neural network.Grassi et al. (2022) and Sulzer & Buck (2023) used autoencoders to project large chemical networks into a low-dimensional latent space, where the chemistry is potentially evolved faster.A similar approach is followed by Maes et al. (2024) for emulating the chemistry in dynamical environments.Branca & Pallottini (2023) proposed neural networks as ansatzs for the time evolution of the chemistry and trained them using the ODEs in the loss function.Along similar lines, Branca & Pallottini (2024) used Neural Operators to emulate the chemical network (Lu et al. 2021).
Despite the amount of work done in the field, we think our architectural decisions, training strategy, and size of the chemical network can be of impact in the community.Moreover, we have trained this emulator to explore the impact of the sulphur elemental abundance on the physics and chemistry of the interstellar medium.In this paper, we focus on the gas ionization degree, an essential parameter for the dynamics of interstellar clouds.

CHEMICAL MODEL
We performed the abundance calculations using Nautilus 1.1 (Ruaud et al. 2016), a three-phase model, in which gas, grain surface and grain mantle phases, and their interactions, are considered.It solves the kinetic equations for the gas-phase and the solid species at the surface of interstellar dust grains.Species from the gas-phase can physisorb at the surface of dust grains with an energy that depends on the binding energy of surface species (Wakelam et al. 2017).They can be released to the gas phase by thermal desorption (Hasegawa et al. 1992), grain heating induced by cosmic-rays (following Hasegawa & Herbst 1993, impact of UV photons (photodesorption, Ruaud et al. 2016), exothermicity of surface reactions (chemical desorption, Minissale et al. 2016), and sputtering by cosmic-rays (following Wakelam et al. 2021).Detailed descriptions of the non-thermal desorption processes included in Nautilus can be found in Wakelam et al. (2021).The ice surface phase is composed of the first few monolayers of species on top of the grains (four in our case), while the rest of the molecules below these surface layers represents the bulk of the ice.The refractory part of the grains (below the bulk) is chemically inactive.The species on the surface can desorb into the gas phase, contrary to the species in the bulk.Only sputtering by cosmic-rays can directly desorb species from the bulk ice.The version used in this paper allows to calculate the dust temperature using the expression of Hocuk et al. (2017) and the cosmic ray ionization rate per H 2 molecule () as a function of A  following the expresion of Neufeld & Wolfire (2017) as described in Clément et al. (2023).However, we have disabled these options to create our database since we were interested in determining the variation of the chemical abundances with these parameters.

NEURAL EMULATOR
Learning emulators for computationally heavy numerical procedures via neural networks is a well-established technique in the field of machine learning.The idea exploits the well-known fact that neural networks are universal function approximators (Cybenko 1988) to train them to learn the mapping between the input (physical parameters) and the output (the result of the numerical procedure).In theory, they can learn the mapping between input and output spaces with arbitrary precision provided that the network is large enough and the training set is sufficiently rich.In practice, this is often limited and only a limited precision is reachable.
In our model, the solution to the kinetic equations give the abundance of a set of 192 species as a function of time .The abundance is also dependent on the external physical conditions, that are characterized with the following parameters: the temperature (), the initial gas density (  ), the cosmic ray ionization rate ( H 2 ), the sulfur abundance ([]) and the scaling factor for the surrounding UV flux ().As a consequence, our aim is to end up with a neural network (|M, ) that approximates the abundance for molecule M at time , given the physical conditions summarized in the vector  = (,  H ,  H 2 , [] , ).

Architecture
A crucial ingredient to end up with a succesful emulator is to find a suitable architecture.We propose to emulate the interstellar chemistry using a neural field (NF), also known as implicit neural representations (INR) or coordinate-based representations (CBR) (see Mildenhall et al. 2020;Bintsi et al. 2022;Jarolim et al. 2022;Asensio Ramos 2023).These are fully connected (FC) neural networks that are used to map coordinates on the space (or space-time) to coordinate-dependent field quantities.NFs have desirable properties for serving as emulators over other parametrizations.They are very efficient in terms of the number of free parameters.Most important, they produce continuous (and differentiable 2 ) fields, which can then be evaluated at arbitrary inputs.Finally, they have a strong implicit bias, favoring specific signals.
In our case, we use a NF in which the input is the time at which we want to predict the abundance of a specific species and the output is the abundance of that species.Obviously, the NF needs to be conditioned on the physical parameters of the cloud and on the specific species.We show later how we do this conditioning technically.A typical NF gives the abundance at time  of a certain species M and for a set of physical conditions  by the following function composition: where we explicitly state that each layer of the NF is conditioned on the vector , that contains information about the physical conditions and the species, and the specific species to be predicted, M. Note that ) refers to the function composition operation.Typically, each layer of the NF is given by: where W  and b  are weight matrices and bias terms, respectively, while  is a non-linear activation function.In our case, we choose the rectified linear unit (ReLU; Nair & Hinton 2010) as the activation function.
2 Or, more precisely, sub-differentiable in case of using activation functions that are not differentiable everywhere.Light purple color refer to the Fourier encoding layer explained in Eq. ( 5).
The conditioning vectors enter into the fully connected layers following the FiLM paradigm.
The conditioning is done via two FC neural networks, one that deals with the physical parameters and the other one with the species.The input of the first neural network is the vector , all of them properly normalized as described in Section 3.2.The input of the second neural network is a representation of the molecule in a 1hot encoding3 (i.e., a vector of dimension equal to the number of species, with a 1 in the component corresponding to the species and 0 in the rest).The outputs of each one of these neural networks are two scaling vectors,   phys and  M sp , and two bias vectors,   phys and  M sp .We follow the ideas behind feature-wise linear modulation (FiLM; Perez et al. 2017), so that each layer of the NF is modified using the previous vectors according to: where  M, =   phys ⊕  M sp and  M, =   phys ⊕  M sp are obtained via concatenation (⊕ is the vector concatenation operator).One of the advantages of the FiLM approach to conditioning is that it is very easy to implement and it does not require a significant modification of the NF architecture.Additionally, all layers of the NF receive conditioning information, which facilitates the adaptation of the NF to the complex dependence of the output on the physical conditions.
NFs, with the previous definitions, are known to suffer from the so-called spectral bias (Rahaman et al. 2019;Wang et al. 2021), which prevents them from learning high-frequency functions.We follow Tancik et al. (2020) and solve this problem by first passing the time coordinate through a Fourier feature mapping that increases the  dimensionality from 1 to 2 (with  the number of Fourier features): where We also include the time coordinate itself, to allow for outputs with very low-frequency time dependencies.The matrix elements of the matrix B ∈ R ×1 are randomly sampled from a normal distribution with zero mean and standard deviation   .After a non-exhaustive parameter search, we choose   = 2 and  = 512, which produces a Fourier feature mapping of dimensionality 1025.
Although we have found success with adding Fourier features and using ReLU activation functions, other solutions like the SIRENs architecture (Sinusoidal Representation Networks; Sitzmann et al. 2020), which uses periodic activation functions, can also be found in the literature and could be explored for this problem in the future.
The graphical representation of the final model is depicted in Fig. 1.The two conditioning neural networks contain three hidden layers with 512 neurons.They produce the two conditioning vectors   and   with 256 elements each.After concatenating them, they are used in the NF emulator to condition every hidden layer of the NF with 512 neurons.The final output is a single number, the abundance of the molecule for the specified physical conditions.The total number of trainable parameter is 4.83M, with 2.62M associated with the NF, and 1.05M and 1.15M associated with the conditioning networks for the physical conditions and the species, respectively.Thanks to the interpolation capabilities of the NF, we can predict the abundance of any molecule at any arbitrary time between 1 and 10 7 yr.Additionally, given that the NF is sub-differentiable, one can seamlessly compute derivatives of the abundance of any arbitrary species with respect to both the physical conditions and time.This can be of great potential interest for the analysis of the chemistry and also for using the chemistry emulator in other downstream tasks.

Training set
The training set is built by solving the interstellar chemistry using Nautilus for a set of physical conditions and a total of 192 species.We add atomic and molecular species, as well as ions and ices.The chemistry network is solved for 64 times equispaced in log space between 1 and 10 7 yr.A total of 150k models are run for building the training set.Roughly 16% of these models do not pass a hard timeout threshold of 5 min and are then discarded4 .This leaves us with 126k models, which is still a sufficiently large amount of models for training.The total number of training examples is, therefore ∼126k×192, which amounts to 24.2M.The space covering of the training models is depicted in Fig. 2, where we witness a region of a reduced number of models for low densities and high temperatures, where Nautilus has problems converging the kinetic equations.
The physical conditions are selected randomly in the intervals defined in Table 1.The temperature is sampled uniformly without transformation, while the rest of the parameters are sampled uniformly in log space given that they are positive quantities and span several orders of magnitude.A subset of 10% of the models is left out of the training set and used as a validation set.A final set of 5k models is obtained for testing purposes and it is used to compute the final performance of the emulator.We point out that other methods to sample the physical conditions, such as Latin hypercube sampling, could have been used, but we have found that random sampling is sufficient for this problem.Additionally, it allows us to seamlessly extend the training set outside the currently considered regions.
The input physical conditions (in linear or log scales) are normalized to the interval [−1, 1] when used as input to the conditioning neural networks.Likewise,  is also normalized to the same interval after being transformed to log scale.The output abundances are similarly treated in log scale.However, since a large variability is found in the abundances, we normalize each time bin and molecule separately.This normalization is stored and applied in reverse when the model is used in production.

Training
The training proceeds by minimizing the mean squared error between the output of the NF and the abundance for all considered molecules.The initial learning rate is set to 3×10 −4 and it is slowly reduced to 3×10 −5 using a cosine annealing schedule (Loshchilov & Hutter 2017).The optimization is carried out using the Adam optimizer (Kingma & Ba 2014) with a batch size of 1024 for 100 epochs.We utilize an NVIDIA GPU 4090 with 24 GB of memory.Given that the training takes around 24 hours, we did not do an exhaustive hyperparameter search.The evolution of the training and validation losses during training are displayed in Fig. 3, showing no evidence of overtraining.We found the saturation of the validation loss but with a tiny difference with respect to the training loss.Although after epoch ∼ 40 the decrease in the validation loss is relatively slow, we preferred to continue training for 100 epochs.The reason is that the diagnostics shown in the following sections carried out during intermediate epochs showed that the training was still improving.
In order to adapt the complexity of the NF to the specific problem, we opt for starting the optimization taking only into account the low frequency components in the Fourier feature mapping, while gradually turning on higher frequency components during training.To this end, we extend the approach used by Lin et al. (2021) as follows.We reorder the rows of the matrix B in terms of their ℓ 2 norm, with low frequencies appearing first.We then weight the Fourier features as follows: where the -th element ( = 0, 1, . . ., −1) of the weighting function is given by and  ∈ [0, ] is slowly increased during training from 0 to .We choose to linearly increase  from 0 to  during the first 25000 batches, and set  =  (no weighting) for the rest of the training.This procedure allows the NF to start learning low frequencies first and then gradually incorporate high frequencies.We have found that this procedure gives more consistent and faster results during training than simply starting with all frequencies active in the Fourier encoding from the beginning.

VALIDATION
Once the emulator is trained, we evaluate the performance using the 5k Nautilus models computed specifically for this purpose.Figure 4 shows the time evolution of five randomly selected models.We only display a subset of species of certain relevance, although the output is obtained for all 192 species.The time evolution of the abundance computed with the neural emulator is displayed with solid lines, while the time evolution computed with Nautilus is shown with dashed lines.A simple visual comparison demonstrates that the emulator is able to correctly reproduce the time evolution of these species.
The neural emulator provides a significant gain in computing time.A typical Nautilus model requires of the order of 120-240 s to produce the output for all considered molecules at 64 timesteps equispaced in logarithm between 1 and 10 7 yr.This number fluctuates depending on the physical conditions, which change the stiffness of the kinetic equations and impacts the total computing time.Our emulator is able to compute 4.8 models/s in an Intel Xeon CPU E5-2660@2.60GHz, a rather standard desktop CPU from 2017.We clarify that a "model" evaluation for the emulator requires using the emulator for all 192 species at 64 timesteps.This number goes up to 87 models/s when using a GPU if models are computed one by one.This does not sat-urate the computing power of the GPU, which are specially tailored to compute many models in parallel.This saturation happens, in our specific conditions, when ∼32 models are evaluated simultaneously.At this rate, we are able to compute 146 models/s.Therefore, our neural emulator is between 10 4 and 3 × 10 4 times faster than Nautilus.Additionally, the computing time is not sensitive to the specific physical conditions, carrying out exactly the same number of operations for the whole parametric volume defined in Tab. 1.
An arguably better representation of the performance of the emulator is given in Fig. 5, together with Figs.A1, A2 and A3 in Appendix A. We display the distribution of differences in logarithm between the abundance predicted by the emulator and the original one computed with Nautilus.For simplicity, we characterize the distributions showing the percentiles.The black solid line shows the median difference, which is interestingly close to zero for all species.This means that deviations above and beyond are equally likely.This is confirmed with the shaded areas, which display the range between percentiles 1 and 99 in blue, 5 and 95 in orange, and 16 and 84 in green.The figure shows that the difference lies below 0.2 dex in the majority of species of relevance for all times with 98% probability.Only for a few species (like H 2 S 3 , HC 7 N, . . . ) this difference goes up to 0.4 dex for the latest phases of the chemical evolution.Note that we also show species in the ices (characterized with the letter J at the beginning when the species is in the surface, and K for those in the ice bulk).
The difference is consistently very small for the initial phases of the evolution for all species.The reason is that the initial abundance for all species is fixed for all models so it is easy to be learned by the emulator.The specific final abundance depends on the time evolution of the chemistry, which turns out to be more difficult to learn.The abundance at very large times converges to the equilibrium value, which should only depend on the physical conditions.Despite this fact, we find that the uncertainty in the emulator is often larger in the final stages of the evolution, although still consistently below 0.4 dex.This is because the equilibrium value is often reached at very large times so the uncertainty in the emulator is larger.

IMPORTANCE OF THE IONIZATION DEGREE AND INITIAL SULPHUR ABUNDANCE
Neural networks are often described as black boxes in the sense that, even though they can approximate the results well, the structure of the neural network itself generally does not provide any insight into the model being approximated.It is a reality (although the machine learning community is actively working towards solving this issue) that it is hard to understand the learned relationship between the input parameters and the outcome of the model.This information is sometimes of great interest in astrochemistry, as questions like what physical conditions favor or hinder the production of a given chemical species are frequently faced.
A potential avenue to understand the relationship between inputs and outputs in astrochemical models was addressed in Heyl et al. (2023) using Shapley values (Shapley 1953).The Shapley value is an approach from game theory that provides a way to fairly distribute the outcome of a cooperative game among the players.In the context of astrochemical emulation, the chemical emulator is the cooperative game, the outcome (or prediction) is the abundance of a given chemical species, and the players, commonly referred as features, are the input parameters of the emulator.Shapley values therefore allow us to quantify the importance of the features for a given prediction.In the following, we give a brief description of the method applied to the chemical emulator.For a full description of the Shapley value approach, we refer to Shapley (1953).
Let us define  = {,  H ,  H 2 , [] , } as the full set of input parameters of the neural emulator and   as the vector of input parameters of training example .Under the Shapley value framework (Lundberg & Lee 2017;Heyl et al. 2023), the explanation of the prediction  (  ) is assumed to be given by the following linear model: where we made the notational simplification  (  ) = (|M,   ) once  and M are fixed.Put in words, the simplified explanation model is built by adding ⟨  ⟩, the average value of the prediction, together with the Shapley values    associated with all the -th input parameter at the -th training example.This linear model allows one to extract the importance of each input parameter for a given prediction by simply comparing the Shapley values.
Shapley values can be obtained from the predictions of our model.Let  ⊆  be a subset of  (e.g.,  = {,  H }). The vector    is obtained by keeping the elements from the subset  as they are and setting the rest of parameters to a random value extracted from the data set.With these definitions, the Shapley value    associated with the input parameter  and training example  is the weighted average of the marginal contribution of the feature  over all possible coalitions of features excluding the feature : As indicated, the summation is carried out for all subsets that do not include the input parameter of interest.Note that | | and || are the number of elements in sets  and , respectively.
Once the Shapley values are computed, the global importance of feature  in the prediction, denoted as   , is given by the mean absolute value of the Shapley values for that feature over the entire dataset: where  is the number of examples in the dataset.Obviosuly, the relative importance of feature  can be obtained as: Sulphur atoms, with an ionization potential of 10.36 eV, are easily ionized in translucent gas, becoming an important donor of electrons in these areas inside molecular clouds (see, e.g., Goicoechea et al. 2006;Fuente et al. 2019;Bulut et al. 2021).We expect that in denser and shielded gas, where sulphur is mostly found in sulphur-bearing molecules, the initial sulphur abundance becomes less important to set the ionization degree of the gas.The ionization degree is a key quantity that influences the coupling between the gas and the interstellar magnetic field.Consequently, it strongly impacts the dynamics of the gas and, potentially, the star formation process.With the advent of numerical simulations of core collapse, this quantity is present in diffusive phenomena like ambipolar diffusion, Ohmic dissipation, and Hall effect (Masson et al. 2012).These diffusive phenomena were proposed as a solution to the magnetic braking catastrophe, in which the implementation of ideal magnetohydrodynamics (MHD) prevents disks from being formed (see, e.g., Allen et al. 2003;Galli et al. 2006  It is therefore interesting to examine the impact of the initial sulphur abundance on the number of electrons for physical conditions corresponding to different layers of a molecular cloud and discuss the possible implications in the star formation process. To investigate the impact of the model features on the number of electrons, we used the Python package SHAP (Lundberg & Lee 2017).We first created a wrapper around the emulator so that the input features are scaled to the range [0, 1] and the output is the decimal logarithm of the chemical abundances at a given time instant.Unless specified, the values of the output are taken at 1 Myr.We performed a random uniform sampling of the parameter space with 2048 predictions.The SHAP values of the five features for the 2048 predictions and 192 chemical species were calculated with the Explainer object of the SHAP library.The importances of the features (see Eq. 10) setting the number of electrons are shown in Fig. 6.According to the ranking of features (left hand side of Fig. 6), the density is the feature that makes the greatest impact on the number of electrons, with a contribution of 51.2% (Eq.11), followed by the cosmic ray ionization rate (21.6%), the FUV field strength (9.6%), and the initial sulphur abundance and gas temperature, both with a contribution of 8.8%.In the right hand side of Fig. 6 we show the beeswarm plot of the ionization degree, that is, the SHAP values for all the features that correspond to each one of the 2048 randomlysampled predictions, color-coded by the feature value.Our results indicate that SHAP values for density are anti-correlated with the feature (density) value, that is, high gas densities have a negative impact on the number of electrons.This is in agreement with our previous discussion: as density increases, sulphur is no longer in its ionized form but found in sulphur bearing molecules, thus reducing its contribution to the ionization state of the gas.For the rest of features, this behavior is inverted, as their SHAP values are correlated with their value.The cosmic-ray ionization rate and the FUV field strength have a positive correlation with their corresponding SHAP values.As expected, stronger ionizing agents have a positive impact in the ionization of the gas.Likewise, higher temperatures are associated with a positive impact on the number of electrons of the gas, although this correlation is weaker.
Correlations are quantified in Fig. 7.A strong correlation,  = 0.99, is seen between the SHAP and feature values of the cosmic-ray ionization rate, while a strong anti-correlation,  = −0.97, is present between the SHAP and feature values of the gas density.Correlations on the rest of features are weaker, meaning that predictions with a similar feature value may have different SHAP values for that feature.This indicates that the SHAP value for a feature may not be only dependent on that feature but on several interacting features.We investigate feature interaction in Fig. 7, showing the pairs of SHAP and feature values for a given feature, color-coded by the rest of features.Gas density  H shows a clear interaction with the initial sulphur abundance [S] and the FUV field strength  (see plots in the second column, fourth and fifth row, respectively, in Fig. 7).In the case of the initial sulphur abundance, for any given value of this feature, lower densities yield higher (in absolute value) SHAP values for [S] than higher gas densities.Since SHAP values tell us how each feature contributes to the deviation of a given prediction from the average outcome (Eq.8), this indicates that the number of electrons is susceptible to change more when the initial sulphur abundance is modified in low-density gas than when it is modified in denser gas, where the number of electrons is more robust against changes in [S].A similar behavior is also observed in the interaction between density  H and the FUV field strength .
We can provide a quantitative estimation of how different densities alter the variations in the ionization degree when the initial sulphur abundance is modified.We first considered the electron density   in two low-density scenarios, with low and high initial sulphur abundances respectively.The family of vectors of input parameters with low density and low initial sulphur abundance, both normalized to the range [0, 1], following the standard ordering of features given by the set  = {,  H ,  H 2 , [] , }, is given by with temperature , cosmic-ray ionization rate  H 2 , and FUV field strength  set as free parameters.Similarly, the family of vectors of input parameters with low density and high initial sulphur abundance is given by We estimated the electron density in each case by averaging over the free parameters using a set of 30000 uniformly sampled vectors of input parameters following Eqs.12 and 13.The resulting electron  That is, in the low density scenario ( H ∼ 10 4 cm −3 ), increasing the initial sulphur abundance from [] = 8 × 10 −8 to [] = 1.5 × 10 −5 leads to an enhancement of the electron density by an order or magnitude.We can perform a similar analysis in a high density scenario ( H = 10 7 cm −3 , normalized density of 1).Given the following two families of input vectors: with temperature , cosmic-ray ionization rate  H 2 , and FUV field strength  set as free parameters and averaged over 30000 random samples, their respective electron densities are: This results in a relative difference of ⟨  (  )⟩ ⟨  (  )⟩ = 10 −8.57+8.60 ∼ 1.07, that is, in the high density scenario, the density of electrons is only increased by a 7% when the initial sulphur abundance is enhanced from [] = 7.5 × 10 −8 to [] = 1.5 × 10 −5 .
It is now worth discussing whether the link between sulphur abundance, density, and the ionization degree has relevant implications in the star formation process.As mentioned above, non-ideal MHD simulations are the state-of-the-art tool to model core collapse and star formation.They include diffusive phenomena like ambipolar diffusion, Ohmic dissipation, and Hall effect that regulate the dissipation of the magnetic flux (Masson et al. 2012).These resistivities depend on the ionization degree of gas and dust (see, e.g., Marchand et al. 2016).At densities  H < 10 7 cm −3 , electrons are the dominant charge carriers while at higher densities metallic ions and grains overcome electrons becoming the main charged species (Marchand et al. 2016).This density is precisely the largest considered in our parameter space and, accordingly, the possible effects of the initial sulphur abundance on the number of electrons and the dynamics of star formation should start early on in the isothermal core collapse.
In this stage, in the low density scenario  H ∼ 10 4 cm −3 , the initial sulphur abundance can be more effective increasing or decreasing the number of electrons and, consequently, modifying the magnetic resistivities.This is particularly relevant for the role of ambipolar diffusion in the star formation process (see Fig. 8): a high initial sulphur abundance in a low-density medium leads to a higher ionization of the gas that becomes better collisionally coupled with the neutral species, reducing the ambipolar diffusion resistivity.Likewise, low initial sulphur abundances could lead to a lower electron abundance that enhances ambipolar resistivity.As density increases, the initial sulphur abundance loses efficiency at setting the amount of electrons, and eventually, metallic ions and grains become the main charge carriers.We conclude that, at the early times during the isothermal collapse that precedes the First Hydrostatic Core phase, the initial sulphur abundance can modify the ambipolar resistivity effectively in low density environments.This ability is greatly reduced as collapse proceeds and the amount of electrons decreases.While the ambipolar diffusion does not seem to have an impact on the mass and radius of the first Larson core when compared to the ideal MHD case (Masson et al. 2016), it defines the ambipolar-diffusion timescale, the timescale in which filaments fragment and cores form (Ciolek & Mouschovias 1993;Tassis & Mouschovias 2004;Kudoh & Basu 2014;Burge et al. 2016).In the simulations carried out by Burge et al. (2016), the collapse rate is inversely proportional to the ionization degree.According to their results, a difference of an order of magnitude in the ionization degree, which is the difference we predict between the high and low initial sulphur abundance scenarios for the low density case (see Eq. 14), leads to different collapse dynamics: a magnetically-regulated collapse, where gas is strongly coupled to the magnetic field and the gas collapses at velocities much lower than the sound speed, and a gravity-dominated one where gas is weakly coupled to the magnetic field and reaches higher velocities.Consequently, the initial sulphur abundance and its impact on the ionization state of the gas in low density environments could potentially play a role in the dynamics of core collapse.

CONCLUSIONS
Astrochemistry is a fundamental tool for the interpretation of observations in the interstellar medium.The solution to the system of nonlinear coupled ordinary differential equations describing the timedependent chemistry in large chemical networks is feasible nowadays but requires lots of computing power.Having access to a fast emulator will open the possibility of improving our understanding of the physical conditions in the interstellar medium.
We have developed such an emulator making use of NFs conditioned on the physical conditions and trained with computations carried out with the Nautilus code.We have demonstrated, through validation studies, that the emulator does an excellent job in approximating the time evolution of 192 species.The logarithm of the abundance of almost all species can be obtained with uncertainties well below 0.2 dex for all times below 10 7 yr.The resulting model is able to provide the time evolution for all species ∼ 10 4 times faster than Nautilus.In addition, the computing time is not sensitive to the specific physical conditions, carrying out exactly the same number of operations for any combination of the input parameters inside the ranges defined in Tab. 1.This allows us to perform thousands of simulations in a reasonable computing time to fit spectroscopic observations or explore the influence of different input parameters in chemical predictions.In this paper, we have used Shapley values to investigate the influence of the sulphur elemental abundance on the gas ionization degree, key parameter in the dynamical evolution of interstellar clouds.We conclude that the initial sulphur abundance can modify the ambipolar difussion resistivity in low density environments (n H ∼10 4 cm −3 to a few 10 5 cm −3 ), hence determining the timescale for fragmentation and collapse.This ability is reduced as collapse proceeds and density achieves values close to n H ∼10 7 cm −3 .In this high density regime the influence of sulphur abundance on the dynamics is negligible.
Additionally, the model is fully differentiable.This opens the possibility of including the chemistry in more elaborate forward models, to better understand the observations.One example is the Bayesian inference of physical properties in the interstellar medium via the interpretation of observed spectral lines.The speed of our model make it feasible to systematically determine the physical and chemical conditions over large regions of the sky to characterize star forming regions of ours and external galaxies.
Emulators, obviously, trade speed for flexibility.Since these models require a training set, the chemical network, the number of species, the range of physical parameters and all remaining hyperparameters have to be fixed in advance.Updating the chemical network (either changing the reaction rates or adding/removing reactions), including more species or exploring different regions of the space of parameters requires a retraining of the neural model.Anyway, the training time is negligible when compared with the amount of time saved in any downstream task carried out with the emulator.10 0 10 2 10 4 10 6 0.5 0.0 0.5 KHCCO 10 0 10 2 10 4 10 6 KHCCS 10 0 10 2 10 4 10 6 KNH2CN 10 0 10 2 10 4 10 6 10 0 10 2 10 4 10 6 10 0 10 2 10 4 10 6 10 0 10 2 10 4 10 6

Figure 1 .
Figure1.Architecture used in the emulator.The "Emulator" block encapsulates the NF that predicts the abundance of a certain species at a given time.The two conditioning blocks produce vectors that are concatenated together and used in the neural field.Yellow blocks are input layers.Orange blocks are fully connected layers, with the number of neurons indicated for clarity.Light purple color refer to the Fourier encoding layer explained in Eq. (5).The conditioning vectors enter into the fully connected layers following the FiLM paradigm.

Figure 2 .Figure 3 .
Figure 2. Two-dimensional histogram showing the covering of all 126k models of the training set shown as combinations of temperatures and the rest of parameters.

Figure 4 .
Figure 4. Five samples per species of those considered of more interest.We show the time evolution of the abundance computed with the neural emulator in solid line, while the original time evolution of the test set is shown in dashed lines.These have been selected randomly with no cherry-picking.

Figure 5 .
Figure5.Difference in logarithm between the emulator and the original run for selected species.The solid line shows the median difference, while the shaded area shows the quantiles 1-99 in blue (approximately 3), 5-95 (approximately 2) in orange and 16-84 (approximately 1) in green.

Figure 6 .
Figure 6.Plots summarizing the SHAP values and feature importance for the 2048 predictions of the number of electrons.Left: feature importance as defined in Eq. 10 for each feature of the emulator.Right: a so-called beeswarm plot where dots show the SHAP values of each prediction for all features, colored depending on their feature value.

Figure 7 .
Figure 7. Correlations between SHAP and feature values of the 2048 predictions of the number of electrons for gas temperature (first row), gas density (second row), cosmic-ray ionization rate (third row), initial sulphur abundance (fourth row), and FUV field strength (fifth row).In each row, the 2048 predictions are color-coded by the rest of the features.The histogram on the x-axis shows the uniform sampling in the range of feature values.

Figure 8 .
Figure 8. Magnetic resistivities and their relevance at different stages of the star formation process as a function of density.Light blue: negative Hall resistivity, dark blue: positive Hall resistivity, red: ambipolar diffusion resistivity, and green: Ohm resistivity.Taken from Marchand et al. (2016).

TimeFigure A3 .
Figure A3.Difference in logarithm between the emulator and the original run for selected species.The solid line shows the median difference, while the shaded area shows the quantiles 1-99 in blue (approximately 3), 5-95 (approximately 2) in orange and 16-84 (approximately 1) in green.

Table 1 .
Physical conditions considered in the emulator, together with the range of values and the transformation used for sampling them.