The GUAPOS project. V: The chemical ingredients of a massive stellar protocluster in the making

Most stars, including the Sun, are born in rich stellar clusters containing massive stars. Therefore, the study of the chemical reservoir of massive star-forming regions is crucial to understand the basic chemical ingredients available at the dawn of planetary systems. We present a detailed study of the molecular inventory of the hot molecular core G31.41+0.31 from the project GUAPOS (G31.41+0.31 Unbiased ALMA sPectral Observational Survey). We analyze 34 species for the first time plus 20 species analyzed in previous GUAPOS works, including oxygen, nitrogen, sulfur, phosphorus, and chlorine species. We compare the abundances derived in G31.41+0.31 with those observed in other chemically-rich sources that represent the initial and last stages of the formation of stars and planets: the hot corino in the Solar-like protostar IRAS 16293–2422 B, and the comets 67P/Churyumov-Gerasimenko and 46P/Wirtanen. The comparative analysis reveals that the chemical feedstock of the two star-forming regions are similar. The abundances of oxygen-and nitrogen-bearing molecules exhibit a good correlation for all pair of sources, including the two comets, suggesting a chemical heritage of these species during the process of star formation, and hence an early phase formation of the molecules. However, sulfur-and phosphorus-bearing species present worse correlations, being more abundant in comets. This suggests that while sulfur-and phosphorus-bearing species are predominantly trapped on the surface of icy grains in the hot close surroundings of protostars, they could be more easily released into gas phase in comets, allowing their cosmic abundances to be almost recovered.


INTRODUCTION
One of the fundamental open questions in astrochemistry is to understand which is the chemical feedstock of star and planet birthsites, and how it is transferred during the different phases of evolution of star and planetary systems: from parental molecular clouds, to starless/prestellar cores, to protostars, to planet-forming disks, and to planetary objects (planets, comets and asteroids).Nowadays, it is debated to what extent the molecular content is conserved or significantly reprocessed between the different evolutionary stages.
It has been shown that water ices from the Solar Nebula were widely available for the formation of the subsequent planetary system (Cleeves et al. 2014;Altwegg et al. 2017), and that at least most of them could survive to the process of accretion in planet-forming disks (Öberg et al. 2011;Visser et al. 2009).The detection of molecules in ★ E-mail: alvarolg@cab.inta-csic.esprotoplanetary disks also suggest that the chemical reservoir found in planet-forming sites might be directly inherited from earlier phases (e.g., Loomis et al. 2018;Bianchi et al. 2019;Booth et al. 2021;van der Marel et al. 2021;Ceccarelli et al. 2023;Tobin et al. 2023).Moreover, the molecular abundances of some species derived in planet-forming disks are similar to those found in comets (Öberg et al. 2015).Supporting this chemical heritage, Bockelée-Morvan et al. (2000) found that the abundances of some selected molecules present in the comet Hale-Bopp are similar to those of several interstellar high-mass star-forming regions.More recently, Drozdovskaya et al. (2019) showed that the molecular abundances of the comet 67P/Churyumov-Gerasimenko and the Solar-like protostellar environment IRAS 16293-2422 B are reasonably well correlated.Furthermore, Coletta et al. (2020) found that the molecular ratios of two complex organic molecules (COMs, species with >5 atoms; Herbst & van Dishoeck 2009), methyl formate and dimethyl ether, are nearly constant in multiple sources at different evolutionary stages of star and planetary system formation (molecular clouds, prestellar cores, low-, intermediate-and high-mass star-forming regions, protostellar shocks, and comets).
Since there is a growing evidence that our Sun was born in a dense stellar cluster that included also massive stars (Carpenter 2000;Lada & Lada 2003;Porras et al. 2003;Reach et al. 2009;Adams 2010;Öberg et al. 2011;Dukes & Krumholz 2012;Pfalzner & Vincke 2020;Korschinek et al. 2020;Brinkman et al. 2021), the study of the chemical composition of massive star-forming regions can give us information about the molecular feedstock of an environment that might resemble the birthsite of our Solar System.This is the main purpose of the project GUAPOS (G31.41+0.31Unbiased ALMA sPectral Observational Survey), which carried out a spectral survey of the whole ALMA Band 3 towards the massive star-forming region G31.41+0.31 (hereafter G31.41).This region, with a mass of 70 M ⊙ (Cesaroni 2019), harbors a hot molecular core (HMC) where at least four massive protostars are forming (Beltrán et al. 2021).G31.41 is located at 3.75 kpc distance (Immer et al. 2019), and has a luminosity of ∼ 4.4 × 10 4 L ⊙ (Osorio et al. 2009).The G31.41 hot core is one of the most chemically-rich sources in the Galaxy (Beltrán et al. 2009;Rivilla et al. 2017;Suzuki et al. 2023).The initial exploitation of GUAPOS data were focused on selected categories of COMs, like the isomers of glycolaldehyde (HCOCH 2 OH; Mininni et al. 2020; hereafter GUAPOS I), peptide-like bond molecules (Colzi et al. 2021; hereafter GUAPOS II), several oxygen-and nitrogen-bearing COMs (Mininni et al. 2023; hereafter GUAPOS III); and phosphorusbearing molecules (Fontani et al. 2024;hereafter GUAPOS IV).
The aim of this work, GUAPOS V, is to extend the chemical chemical census of the G31.41 massive star-forming region already presented in previous GUAPOS works, with the aim of comparing it with those of other well-studied astronomical objects at different evolutionary stages of the star-and planet-formation process.We have chosen sources that have been extensively studied in a homogeneous way, i.e. using the same dataset analyzed with the same method, and providing chemical censuses of tens of species, which will allow us to study the chemical budgets of different chemical families in a statistical way.The chosen sources to perform the comparison are: the low-mass proto-Solar analogue IRAS 16293-2422 B (hereafter IRAS16B; Jørgensen et al. 2016;Drozdovskaya et al. 2019), and the comets 67P/Churyumov-Gerasimenko (hereafter 67P/C-G; Rubin et al. 2019;Drozdovskaya et al. 2019), and 46P/Wirtanen (hereafter 46P/W; Biver et al. 2021).In Sect. 2 we present the ALMA observations of G31.41 used in our analysis.In Sect. 3 we describe the census of molecules that we have used to compare the chemical reservoir of G31.41 with those of other sources.In Sect. 4 we present the molecular data analysis of the G31.41 spectra.In Sect.5, we quantitatively compare the molecular abundances derived in G31.41 with those of other three sources using three complementary statistical tests.In Sect.6 we discuss the implications of our results, and in Sect.7 we summarize the main conclusions of this work.

ALMA OBSERVATIONS OF G31.41+0.31
The observations were carried out with ALMA (Atacama Large Millimeter Array) during the Cycle 5 as part of the project 2017.1.00501.S (PI: M. T. Beltrán).The phase center was  J2000 = 18 h 45 m 34 s and  J2000 = −0.1 • 12 ′ 45 ′′ .We fully covered the Band 3 (86.05obtaining an unbiased spectral survey.The frequency resolution of the correlator setup was 0.49 MHz equivalent to a velocity resolution of ∼ 1.6 km s −1 at 90 GHz.The datacubes of the whole survey were created using a common restoring beam of 1.2", about 4500 au.The uncertainties in the flux calibration are ∼5% (from Quality Assesment 2 reports), which is in good agreement with flux uncertainties at ALMA band 3 reported in Bonato et al. (2018).For more details regarding the observations we refer to Mininni et al. (2020).

SAMPLE OF MOLECULES
The molecules considered in this study were selected because they have been detected or searched for (providing an upper limit for their column density) towards at least two of the four sources considered (the high-and low-mass star-forming regions G31.41 and IRAS16B, respectively, and the comets 67P/C-G and 46P/W).The total number of molecules considered for the comparative study are 57 and the information about the detection or non detection of the molecules towards the four astronomical sources are listed in Table B1.For IRAS16B, we have used the abundances and upper limits reported in Drozdovskaya et al. (2019) (see also references therein), Martín-Doménech et al. (2017), Coutens et al. (2018Coutens et al. ( , 2019)), Calcutt et al. (2019) and Manigand et al. (2021).Regarding 67P/C-G, we used the results presented in Calmonte et al. (2016), Dhooghe et al. (2017), Fayolle et al. (2017), Altwegg et al. (2019), Hadraoui et al. (2019), Rubin et al. (2019), Schuhmann et al. (2019) and Rivilla et al. (2020); while for 46P/W we considered the information reported in Biver et al. (2021).For G31.41, we have used the molecules already reported in previous works (GUAPOS I, Mininni et al. (2020); II Colzi et al. (2021); III, Mininni et al. (2023); García de la Concepción et al. 2022;and Cesaroni et al. 1994), and the ones analyzed in this work (see Sect. 4).For NH 3 , we used the data presented by Cesaroni et al. (1994), based on VLA observations with a beam of 1.3" x 1", which is very similar to that of the GUAPOS ALMA observations, and hence allows a fair comparison.

DATA ANALYSIS OF THE G31.41+0.31 SPECTRA
We present here the analysis of the molecular emission of 34 molecules in the GUAPOS spectral survey, along with other 20 molecules analyzed in previous works (see Table 1).The ALMA data were calibrated and imaged with CASA 1 (the Common Astronomy Software Applications package, McMullin et al. 2007).Since G31.41 presents an extremely rich spectrum, with no line-free channels, we did not perform the continuum substraction in the uv-space before imaging.For each observed basebands, the spectrum including the continuum level was extracted from an area equal to the synthesized beam (1.2") and centered on the continuum peak of our source.The root mean square (rms) noise of the spectra is 7-27 mK.More details are described in Mininni et al. (2020).To subtract the continuum in the extracted spectra, as described in Colzi et al. (2021), we applied to each baseband spectrum the sigma clipping method (c-SMC) of the python based tool STATCONT2 (Sánchez-Monge et al. 2018).
To perform the spectral analysis (line identification and fitting) we use the Madrid Data Cube Analysis (MADCUBA) software3 (Martín et al. 2019).We searched for different molecules in the spectrum, and we fit them using the SLIM (Spectral Line Identification and Modeling) tool of MADCUBA, which incorporates the Cologne Database for Molecular Spectroscopy4 (CDMS, Müller et al. 2001Müller et al. , 2005;;Endres et al. 2016) and the Jet Propulsion Laboratory molecular catalogs5 (JPL, Pickett et al. 1998).
SLIM produces a synthetic spectrum assuming local thermodynamic equilibrium (LTE) conditions, and taking into account the line opacity (see formalism in Martín et al. 2019).Considering the high density of the G31.41 molecular core, which is ∼ 10 8 cm −3 (Mininni et al. 2020), LTE is a good approximation.SLIM compares the synthetic LTE spectrum with the observed spectrum, and finds the best fit applying the AUTOFIT tool based on a Levenberg-Marquardt algorithm.The free physical parameters of the fit are: the column density (), the excitation temperature ( ex ), the velocity ( −  0 ), where  0 is the systemic velocity of G31.41 (96.5 km s −1 , Beltrán et al. 2018), and the full width at half maximum (FWHM).As discussed in Mininni et al. (2020) and Colzi et al. (2021), the molecular emission fills the beam, and hence no beam dilution correction was applied.
G31.41 is a chemically-rich source, so the blending among lines of different species is frequent.To take this into consideration, we performed the fit of each molecule accounting also for the emission of all other species detected.To run AUTOFIT, we have selected the most unblended transitions of each molecule, and also those transitions that, together with the emission from other known species, well reproduce the observed spectrum.If possible, we have left free the four physical parameters of the fit.In case the AUTOFIT algorithm did not converge, we fixed some of the parameters.Unfortunately, in many cases AUTOFIT does not converge if  ex is set as a free parameter due to the lack of enough transitions free of contamination and covering a broad range of energy levels.Thus,  ex cannot be directly derived, and we assumed a value for it.For the most complex molecules with ≥6 atoms, which are expected to trace the hot molecular core, we have assumed 150 K (similar to the values derived by Rivilla et al. 2017;Mininni et al. 2020;Colzi et al. 2021); while for molecules with < 6 atoms, which likely trace colder gas, we have assumed 50 K.To evaluate the impact of these assumptions on the derived column densities, we have also repeated the fits using a range of temperatures: 100 and 200 K for molecules with ≥6 atoms, and 25 and 75 K for molecules with < 6 atoms, respectively.As shown in Table 2 and in Table A1, the resulting column densities in these temperature ranges are, in most cases, within less than a factor of 2 with respect to the values derived using the assumed values of 50 K or 150 K.In addition, to allow the convergence of AUTOFIT in some cases,  −  0 and/or FWHM were fixed to 0 km s −1 and 7 km s −1 , respectively, which fit properly the most unblended transitions.
Some of the molecules analysed present high abundances and thus they are optically thick.As a consequence, the derived  should be considered as a lower limit.In these cases, we have searched for and analysed their optically-thinner isotopologues to obtain a better estimation of the column density.If more than one isotopologue is well detected, we have adopted the optically thinnest to derive the  of the main isotopologue.We used the following values of the isotopic ratios, which depends on the galactocentric distance ( GC ), assuming  GC for G31.41 of 5.02 kpc (calculated from the heliocentric distance of 3.75 kpc, Immer et al. 2019): 12 C/ 13 C = 39.5 ± 9.6 (Milam et al. 2005;Yan et al. 2019), 14 N/ 15 N = 340 ± 90 (Colzi et al. 2018), 32 S/ 34 S = 14.6 ± 2.1 (Yu et al. 2020) and 16 O/ 18 O = 330 ± 140 (Wilson 1999).Moreover, for isotopic ratios that are not sensitive to  GC , we used: 34 S/ 33 S = 5.9 ± 1.5 (Yu et al. 2020), 34 S/ 36 S = 115 ± 17 (Mauersberger et al. 1996) and 18 O/ 17 O = 3.6 ± 0.2 (Wilson 1999, taking the value of local ISM).
The molecular abundances were obtained by using the column density of (H 2 ) = (1.0 ± 0.2)×10 25 cm −2 derived from the continuum emission within the same region used to extract the spectrum (more details in Mininni et al. 2020).
In the following sections we describe in detail the fitting procedure and the results obtained for each molecular species (including isotopologues) detected, the tentative detections and the upper limits derived from the non-detected molecules.The spectroscopic information of all the molecules used in this work is summarized in Table C1.The transitions used for fitting each species are listed in Table D1.

Detected molecules
We summarize in Table 2 the results of the line fitting of the different molecules.For optically thick molecules, we show in the table only the result of the optically thin isotopologue used to derive the column density of the main isotopologue.The results of the fits of the remaining isotopologues, including the main ones, are shown in Table A1 instead.The detailed analysis of each molecule (and its isotopologues) are described in the following sections.

Hydrogen cyanide (HCN)
The spectral profiles of the =1−0 rotational transitions of HCN and H 13 CN show clear absorptions (left and middle upper panels of Fig. F1), which are likely due to filtering of extended emission by the interferometer and/or possible infall motions.To derive the molecular abundance of HCN, we have then used the 15 N isotopologue, which shows a Gaussian emission profile (upper right panel of Fig. F1).The results of the fit are shown in Table 2.The derived line opacity () is ∼1.7, which indicates that even the 15 N isotopologue is optically thick.The column density of HCN derived, assuming the 14 N/ 15 N ratio, is (1.5 ± 0.4)×10 18 cm −2 .We warn that, due to the opacity, this value is likely a lower limit of the true column density.

Hydrogen isocyanide (HNC)
Similarly to HCN, the spectral profiles of the =1−0 rotational transitions of HNC and HN 13 C show absorptions (left and middle lower panels of Fig. F1).Unlike HC 15 N, the =1−0 transition of H 15 NC is heavily contaminated with a transition of CH 3 NH 2 (right lower panel of Fig. F1).Therefore, we derived an upper limit for its column density (procedure explained in Sect.4.2) using the H 15 NC isotopologue, and we retrieve the upper limit of the HNC column density applying the 14 N/ 15 N ratio (Table 2).We obtain a high HCN/HNC ratio >136, as expected in hot gas (Colzi et al. 2018;Hacar et al. 2020).

Carbon monoxide (CO)
Similarly to HCN and HNC, the spectral profiles of the =1−0 rotational transitions of CO and 13 CO show strong absorptions (Fig. F2), so we searched for optically thinner isotopologues.The isotopologue C 17 O is heavily blended with two bright transitions of CH 3 COOH (Fig. F2), and thus we report an upper limit (procedure explained in Sect.4.2).The transition of the C 18 O isotopologue is detected, although partially blended with emission of CH 3 OCHO (see Fig. F2).The transition of the optically thinner double isotopologue 13 C 18 O is also detected (Fig. F2), only slightly blended with a transition of CH 3 NCO.We used the 13 C 18 O isotopologue to derive the column density of CO taking into account the slight contribution of CH 3 NCO, and using the corresponding carbon and oxygen isotopic ratios.The results of the fits are shown in Table 2 and Table A1.The derived CO/H 2 ratio is 2.6×10 −4 , which is consistent with the typical value found in molecular clouds (e.g.Dickman 1978;Lis & Goldsmith 1988;Duan et al. 2023).

Propyne (CH 3 CCH)
We fitted the two -ladders of the =5−4 and =6−5 rotational transitions showed in Fig. F4.Some of the transitions are blended with CH 3 COCH 3 , aGg'-(CH 2 OH) 2 and CH 3 OCHO.The observed spectrum is well reproduced with the combined emission of CH 3 CCH and the blending species, as shown in Fig. F4.If we leave  ex as a free parameter the AUTOFIT algorithm does not converge, hence we fixed it to 150 K.We derived a column density of (1.20 ± 0.09)×10 17 cm −2 (see Table 2).

Methyl isocyanide (CH 3 NC)
We fitted the -ladder of the =5−4 rotational transition.As shown in Fig. F5, the =0 and =1 transitions (at 100.5265GHz and 100.5242GHz respectively, see Table D1) are slightly blended with aGg'-(CH 2 OH) 2 .In addition, the =2 transition (at 100.5174GHz) is blended with CH 3 OCHO and =3 (at 100.5061GHz) is blended with HCOCH 2 OH and CH 3 COCH 3 .The blending of the higher energy transitions of the -ladder (see Table D1) does not allow convergence of the AUTOFIT algorithm leaving  ex as a free parameter, so we fixed it to 150 K.We retrieved a column density of (1.1 ± 0.4)×10 15 cm −2 (see Table 2).

Cyanamide (NH 2 CN)
We show in Fig. 1 the most intense and less blended transitions of this species (Table D1).The 5 1,5 → 4 1,4 transition at 99.3112 GHz is only marginally contaminated by CH 3 COOH.The other transitions of the right panels of Fig. 1, are partially blended with CH 3 COOH, C 2 H 5 OCHO, C 2 H 5 OH and with a transition not identified, respectively.We note that all other transitions predicted by the LTE model are consistent with the observed spectra, but they are heavily blended with emission from other species.The transitions used for the fit are partially optically thick ( = 0.04 -0.34;Table 2).We searched for the 13 C isotopologue, but it is not detected.The NH 2 CN/NH 2 CHO ratio in G31.41 is 0.007, using the value derived here and that reported by Colzi et al. (2021) for formamide.For comparison, in IRAS16B is 0.02 (Coutens et al. 2018), which is only a factor three of difference suggesting that the NH 2 CN column density that we have derived for G31.41 is more or less reliable.

Carbon monosulphide (CS)
The only rotational transition of CS in the GUAPOS spectral range is the 2 → 1 at 97.9810 GHz (see Table D1), which is optically thick and it shows a clear absorption probably due to filtering of extended emission and/or infall (see Fig. 2).The transitions of the isotopologues 13 CS, C 34 S and C 33 S are unblended, but they also are optically thick, with  of 2.5, 1.17 and 1.0 respectively (see Table D1).The isotopologues C 36 S and 13 C 34 S are also detected with no or very little contamination (Fig. 2).For the fit of C 33 S and C 36 S isotopologues, we fixed the FWHM to 9.5 km s −1 (see Table 2 and Table A1), derived from the fit of C 34 S, instead of the 7 km s −1 generally used when no further information are available and the AUTOFIT do not converge.We used the optically thinner isotopologue, C 36 S ( = 0.08), to derive the column density of CS, which is shown in Table 2.

Ethylene oxide (c C 2 H 4 O)
Figure 3 shows selected transitions of c C 2 H 4 O detected towards G31.41, which are unblended or partially blended and intense, with emission from other species whose joint contribution explains well the observed spectrum.We note that the remaining transitions predicted by the LTE model are consistent with the observed spectrum, but appear more blended with transitions of other species.The transitions are optically thin ( ≤ 0.10; see Table D1).They cover a wide range of transitions energies from 9.5 K to 290 K, allowing us to leave  ex as a free parameter.We found  ex = 132 ± 7 K, similar to the assumed value of 150 K assumed for the molecules with > 5 atoms.The other derived parameters are shown in Table 2.

Thioformaldehyde (H 2 CS)
The fit of the detected transitions of H 2 CS (see Fig. F6 and Table D1) gives a  ex = 38 K (see Table A1), and indicates that the emission of the molecule is optically thick.We have thus searched for optically thinner isotopologues with 13 C, 33 S and 34 S, which are also detected and shown in Fig. F6.Since AUTOFIT does not converge leaving  ex as a free parameter, we fixed it.We do not use the value derived for the main isotopologue because of its optical thickness, but we used instead 50 K, following our general criteria.The transitions of the 13 C isotopologue are partially blended, while the transitions of H 2 C 33 S are less blended with other species (Fig. F6).The isotopologue H 2 C 34 S is slightly optically thick ( = 0.03-0.33),whereas H 2 C 33 S has a lower  of 0.01-0.07.Therefore, we used the H 2 C 33 S isotopologue to retrieve the column density of H 2 CS (Table 2).We note that the H 2 CS column density derived by using H 2 C 34 S would be a factor 3.6 lower, confirming that it is partially optically thick.

Nitrous acid (HONO)
Some few transitions of this species are detected with no or slight blending (see Table D1), such as the the 6 1,5 → 6 0,6 transition at 98.0877 GHz shown (fourth panel of Fig. F7), the 4 0,4 → 3 0,3 at 93.9528 GHz (slightly blended with CH 3 CONH 2 ; upper right panel of Fig. F7), and the 8 1,7 → 8 0,8 transition at 111.5519 GHz (second leftmost upper panel of Fig. F7).We note that all other transitions predicted by the LTE model are consistent with the observed spectra, but they are heavily blended, with only one possible discrepancy of the 3 1,2 → 3 0,3 at 85.7329 GHz, in which the LTE model overestimate the observed spectrum by ∼1.5 K (middle right panel of Fig. F7).This difference might be due to a non optimal baseline substraction, considering that the uncertainty of the continuum level of the ALMA baselines in the GUAPOS survey is of the same order (1.2 K, as described in Colzi et al. 2021).Since the number of unblended transition detected of this species is not enough to claim a robust detection, we consider it as a tentative detection.We note that this species has been only detected previously towards IRAS16B by Coutens et al. (2019).In this source the HONO/CH 3 OH ratio is 9×10 −5 , whereas the ratio in G31.41 we derived a similar value of 6×10 −5 , which further supports that the molecule is tentatively detected.

Sulphur monoxide (SO)
Four intense and unblended transitions of SO are detected (see Fig. F8 and Table D1).The AUTOFIT converged leaving all parameters free (see Table A1).The results indicate that three of the transitions are optically thick ( > 0.8; Table D1), and that only the 5 4 → 4 4 transition is optically thin (=0.15).Therefore, we used the isotopologue 34 SO, whose detected transitions are shown in the lower panels of Fig. F8.Since the main isotopologue is optically thick, we do not use the  ex derived for it, but assume a value of 50 K, following our general criteria.The 2 3 →1 2 transitions at 97.7153 GHz (lower left panel) is completely unblended, while the other three are partially blended.These transitions are optically thin ( < 0.2).We applied the 32 S/ 34 S ratio to derive the column density of SO (see Table 2).

Methyl mercaptan (CH 3 SH)
We have detected several unblended transitions of CH 3 SH (see Fig. 4 and Table D1).These transitions are optically thin ( < 0.042).The remaining CH 3 SH transitions, along with the contribution from other species, such as C 2 H 13 5 CN or C 2 H 5 CHO, are consistent with the observed spectra.The derived parameters of the fit are shown in Table 2.

Cyanoacetylene (HC 3 N)
The spectral coverage of the GUAPOS survey contains the =10−9, =11−10 and =12−11 transitions of HC 3 N.The transitions =10−9 and =12−11 are unblended, while the =11−10 transitions is partially blended with CH 3 OCHO (see Fig. F9 and Table D1).By fitting these transitions we obtained a value for  ex of 58 ± 8 K (see Table A1), and line opacities of =1.6-2.0, which indicate that the main isotopologue is optically thick.We have also searched for the three 13 C isotopologues: H 13 CCCN, HC 13 CCN and HCC 13 CN.We fixed the  ex to 50 K because the values obtained from the HC 3 N fit are not reliable due to the high optical depth.The transitions of the three isotopologues are unblended, or they are blended with emission from other identified species that explain properly the observed spectrum.The transtition =12−11 of HCC 13 CN is blended with HCOCH 2 OH; the transtition =11−10 of H 13 CCCN is blended with C 2 H 3 CN and CH 3 COOH; and the transtition =12−11 of HC 13 CCN is blended with aGg'-(CH 2 OH) 2 .The emission of the 13 C isotopologues is optically thinner than that of the main isotopologue ( = 0.26−0.40).The column densities obtained for the three isotopologues are similar (Table 2).To derive the column density of HC 3 N we have thus done the average of the column densities of the 13 C isotopologues, and applied the isotopic ratio.Since the transitions have intermediate opacities, the HC 3 N column density might be slightly underestimated.We also searched for optically thinner isotopologues ( 15 N or the double 13 C isotopologues), but they are not detected.We note that the upper limits obtained for those isotopologues are consistent with the column density we report for HC 3 N (Table 2).

Propanal (C 2 H 5 CHO)
The transitions 12 0,12 → 11 1,11 E and 12 0,12 → 11 1,11 A, both at 114.1011 GHz (Table D1), shown in upper middle panel of Fig. F10, are unblended.Figure F10 also shows other several transitions of C 2 H 5 CHO that reproduce well the observed spectra when the contribution of the emission from other species is considered.The other transitions predicted by the LTE model are consistent with the observed spectra, but they are heavily blended, and thus they are not used for the LTE model fit.Therefore, we consider that this species is tentatively detected.The results of the fit using the transitions shown in Fig. F10 are presented in Table 2.This molecule has been detected previously in other interstellar sources, including IRAS16B, in which Lykke et al. (2017) found a C 2 H 5 CHO/CH 3 OH ratio of 2.2×10 −4 , very similar to the value derived here for G31.41 (4.0×10 −4 ), which supports the tentative detection.

Carbonyl sulphide (OCS)
Several transitions of OCS, O 13 CS, OC 34 S are clearly detected with no contamination (see Fig. F11 and Table D1).For OCS and O 13 CS AUTOFIT converges leaving  ex as a free parameter, giving 74±5 K and 35±16 K, respectively (Table A1).However, both are optically thick ( > 1.1), and hence they do not provide a reliable value for the temperature.Thus, we have fixed  ex = 50 K for the other isotopologues.The transitions of 18 OCS and OC 33 S are optically thinner ( = 0.16-0.22),and also unblended, except for the 8 → 7 transition of OC 33 S, which is blended with CH 3 OCHO, and for the 10 → 9 transition of 18 OCS which is blended with an unidentified species.The results of the fits for the isotpologues and the main molecule are shown in Table 2 and Table A1.The column density of OCS using the isotopologue 18 OCS or OC 33 S are consistent within a factor of 1.3.We used the isotopologue 18 OCS to estimate the column density of OCS because it is slightly optically thinner.

Methoxymethanol (CH 3 OCH 2 OH)
The LTE model using the parameters shown in Table 2 is depicted in Fig. F12, which shows several transitions of this species that might be detected towards G31.41.The inclusion of the contribution of the CH 3 OCH 2 OH emission, along with that of other molecules previously identified, explain better the observed spectrum than without considering it.We note that the remaining transitions predicted by the LTE model are consistent with the observed data.However, there is not enough unblended transitions of this species to provide a robust detection, and hence we consider it as tentative.This molecule has been only previosly detected in the MM1 core of NGC6334-I by McGuire et al. (2017), in Orion KL by Tercero et al. (2018), and in IRAS16B by Manigand et al. (2020).These works found a CH 3 OCH 2 OH/CH 3 OH ratios of 0.014, 0.014 and 0.029 respectively.In this work we have derived a very similar ratio of 0.034 towards G31.41, supporting the tentative detection.

Sulphur dioxide (SO 2 )
Figure F13 shows the transitions of SO 2 , 34 SO 2 and 33 SO 2 detected towards G31.41 (see also Table D1).Several transitions of SO 2 and 33 SO 2 are blended with other species, while the two transitions of 34 SO 2 are unblended.Moreover, this isotopologue is the only for which AUTOFIT converges leaving  ex as a free parameter, giving a value of 56±14 K, and a low opacity of =0.15±0.05.We have then fixed this derived value of  ex also for SO 2 and 33 SO 2 .The main isotopologue SO 2 is optically thick (=0.64),while 33 SO 2 is optically thin (=0.08).In this case, although the opacity of the 34 SO 2 is slightly higher than that of 33 SO 2 , we have used the former to derive the column density of SO 2 because their detected lines are unblended (Fig. F13).The results are shown in Table 2 and Table  A1.

Non detected molecules
Besides the 18 molecules with positive detections (including 3 tentative detections) towards G31.41 presented in the previous section, we have also searched for other 16 species (of the molecule sample described in Sect.3) that were not detected in this or any other GUAPOS works (see Table 1).We have thus derived upper limits for their column densities, which are listed in Table 2.The transitions of these species from which we have obtained the upper limits are mostly blended.Therefore, to compute the upper limits, we have selected the brightest transition of each species according to the LTE model that do not appear heavily blended.Since there are no linefree channels in the G31.41 spectra, to derive the upper limits we have fixed the values of T ex ,  −  0 and FWHM to the values previ-ously explained in Sect.4, and we have increased the value of  that makes the LTE model (including also the contribution from other species) visually compatible with the observed spectra.We list in Table D1 the transition used for each molecule, and in Fig. G1 we show the observed spectra at the corresponding frequency.To assess how constraining are the derived upper limits, in Fig. G2 (Appendix G) we have compared them, normalized to CH 3 OH, with the molecular ratios (or upper limits) obtained in IRAS16B.The details of this comparison are presented in Appendix G.
Table 2: Results of the SLIM fits of the molecules analysed towards G31.41, ordered by increasing molecular mass, from which we have retrieved their abundances.For optically thick molecules, we include the optically thin isotopologue that we have used to derive the column density of the main isotopologue (the results of the other isotopologues are presented in Table A1 instead).The resulting physical parameters, along with their associated uncertainties, are presented.The values of the parameters fixed to perform some fits are shown without uncertainties.In the case of non detections, we indicated the upper limit in the column density with <.In the case of tentative detections, they are denoted with ∼.When  ex is fixed, we calculate in what factor the value of  changes when using the different temperatures  min and  max : 25-75 K and 100-200 K for fixed temperatures of 50 K and 150 K, respectively.We also present the maximum line opacity of the transitions of each molecule (  ), and the molecular abundances compared to H 2 .The transitions used to obtain these values are in Table D1.(1.8 ± 0.8)×10 2 0.9-1.218 ± 9 F11 18 OCS 50 0.55 ± 0.07 0.9-1.22.9 ± 0.5 7 0.16 ± 0.06 (5.5 Notes.Isotopic ratios used: 12 C/ 13 C = 39.5 ± 9.6 (mean value of isotopic ratios from Milam et al. 2005 andYan et al. 2019), 14 N/ 15 N = 340 ± 90 (Colzi et al. 2018), 32 S/ 34 S = 14.6 ± 2.1 (Yu et al. 2020), 34 S/ 33 S = 5.9 ± 1.5 (Yu et al. 2020), 34 S/ 36 S = 115 ± 17 (Mauersberger et al. 1996), 16 O/ 18 O = 330 ± 140 (Wilson 1999), 18 O/ 17 O = 3.6 ± 0.2 (Wilson 1999, taking the value of local ISM). Maximum  of all the transitions detected in each species. Value of (H 2 ) = (1.0 ± 0.2)×10 25 cm −2 (Mininni et al. 2020). The transition of H 2 S used has a high value of  up of 520 K (see Table D1), which is completely out of the range of the  ex adopted in the fits (25, 50 and 75 K); and thus the derived  is extremely sensitive to the value assumed.* This molecule, HNC, is detected, but the profiles of the main isotopologue and the 13 C isotopologue show absorptions (see Fig. F1).We have derived an upper limit using the 15 N isotopologue (see details in Section 4.1.2).

COMPARISON WITH OTHER INTERSTELLAR SOURCES
We have compared the chemical reservoir of G31.41, using the molecular abundances derived here and in previous works (Cesaroni et al. 1994;Mininni et al. 2020;Colzi et al. 2021;Mininni et al. 2023) with that of the hot corino surrounding the low-mass Solar-like protostar IRAS16B, and the two comets 67P/C-G and 46P/W.While G31.41 and IRAS16B are two archetypical examples of a hot core and a hot corino, respectively, namely, the natal environments where massive and low-mass (Solar-like) protostars are born, the two comets (67P/C-G and 46P/W) are representatives of the outcoming products of the process of star and planet formation.We select these three sources to be compared with G31.41 because its chemical reservoir have been previously extensively studied, providing molecular catalogs of tens of molecules.Their molecular abundances were derived from deep surveys based on homogeneous data, which were analysed using the same procedure, like we have performed in the GUAPOS project for G31.41.This is a fundamental requisite to obtain reliable values of molecular abundance ratios, and fair comparisons of the chemical content of the four sources.IRAS16B is the prototypical example of a low-mass star-forming region, and its rich molecular content has been obtained using ALMA data in multiple works, especially from the Protostellar Interferometric Line Survey (PILS) (e.g.Jørgensen et al. 2016).IRAS16B is a member of a multiple protostellar system (Maureira et al. 2020), and it is commonly considered a proto-Solar analog given its low mass, which is 0.1 M ⊙ (Jacobsen et al. 2018).IRAS16B is the most chemically rich hot corino known, and several molecules have been detected towards it for the first time in the ISM (e.g.Coutens et al. 2019;Zeng et al. 2019), or in low-mass star-forming regions (e.g.Martín-Doménech et al. 2017;Ligterink et al. 2017;Calcutt et al. 2018;Coutens et al. 2018).For our comparison, we have used the molecular abundances of 55 molecular species towards IRAS16B (41 detected molecules; and 14 non detected molecules for which we used column density upper limits, see Table B1), obtained from Drozdovskaya et al. (2019) (see also references therein), Martín-Doménech et al. (2017), Coutens et al. (2018Coutens et al. ( , 2019)), Calcutt et al. (2019) and Manigand et al. (2021).For those molecules for which the uncertainties of the derived molecular column densities are not available, we have assumed an uncertainty of 20%.
The comet 67P/C-G was visited by the European Space Agency (ESA) spacecraft Rosetta, which provided us with unique data thanks to the instrument on board ROSINA (Rosetta Orbiter Spectrometer for Ion and Neutral Analysis; Le Roy et al. 2015).This instrument has provided an unprecedented view of the volatile chemical composition of a comet (Rubin et al. 2019) by obtaining in-situ data from the volatile emanated from the nucleus of the comet.The abundances were obtained by mass spectroscopy, thus isomer groups cannot be distinguished.We used here the data from Calmonte et al. (2016), Dhooghe et al. (2017), Fayolle et al. (2017), Altwegg et al. (2019), Hadraoui et al. (2019), Rubin et al. (2019), Schuhmann et al. (2019) and Rivilla et al. (2020).For the comparison presented in this work we have considered the molecular abundances of 32 isomeric groups (29 detections and 3 non detected species for which abundance upper limits were derived, see Table B1).
The chemical composition of the comet 46P/W was studied in detail with the IRAM 30-m telescope and the NOEMA interferometer when its distance was less than 0.1 au from Earth by Biver et al. (2021).We have used the molecular abundances reported in Biver et al. (2021), which provided information of 31 species (11 detections and 20 non detections for which upper limits are available, see Table B1).

Comparison of molecular abundances
We have directly compared the molecular abundances derived towards the four sources in Fig. 5, in which we have normalized the column densities of all the species with respect to methanol (CH 3 OH).To compute the molecular ratio uncertainties we have used error propagation for those column densities with symmetric uncertainties (e.g.those of G31.41 and IRAS16B).For the molecules with column densities reported with asymmetric uncertainties in the comet 67P/C-G (e.g.S-bearing from Calmonte et al. 2016), the error bars have been computed by considering the minimum and maximum ratios calculated considering the column density uncertainties, as also done by Drozdovskaya et al. (2019).
Figure 5 is composed of six panels that show the molecular abundance ratios for each possible pair of sources.In the cases in which the molecules are not detected, but upper limits of their column densities have been reported (see Section 4.2 for G31.41), we denote them with triangles.Due to the use of the mass spectroscopy technique for the comet 67P/C-G, we summed the column densities of the molecules with the same molecular mass of the other three sources for the comparison.In the case that the column density of one of the species is an upper limit, we have considered the sum as an upper limit as well.
In Figs. 6, 7, and 8 we have separated different families of molecules: i) oxygen(O)-bearing species; ii) nitrogen(N)bearing species (including molecules with O); and iii) sulfur(S)-, phosphorus(P)-and chlorine(Cl)-bearing species.We have normalized the column density of each molecule by the reference molecule indicated in the axis label: CH 3 OH for O-bearing, CH 3 CN for Nbearing, and CH 3 SH for S-, P-and Cl-bearing species.The column densities of CH 3 OH and CH 3 CN in G31.41 are calculated by Mininni et al. (2023) using CH 18  3 OH and 13 CH 3 CN istopologues respectively, while CH 3 SH has been analized in this work (Sect.4.1.13).
Figures 5, 6, and 7 shows in general a good correlation of Oand N-bearing species for all pair of sources, spanning a remarkably broad range of molecular abundance ratios of up to six orders of magnitude.Most of the points fall in the ±1 order of magnitude region from the 1:1 correlation (gray shaded area in the figures).
The case of S-and P-bearing species is different.Figure 5 shows that these species, when normalized to CH 3 OH, lie well below the 1:1 correlation, namely they are underabundant, in the star-forming regions (G31.41 and IRAS16B) compared to 67P/C-G (upper middle and lower left panels in Fig. 5).However, when the abundance ratios are computed by using CH 3 SH instead of CH 3 OH, they lie closer to the 1:1 line as shown in Fig. 8.This suggests that the chemical content of S-and P-bearing species, unlike O-and N-bearing, is different in the star-forming regions and the comet 67P/C-G.We will discuss in detail the implications of this in Sect 6.

Correlation between sources using statistical tests
To quantify the goodness of the molecular abundance correlations between the different sources, and hence to investigate how their chemical contents behave, we used three statistical tests: Spearman ( s ), Kendall () and Theil-Sen (T-S).Each test checks different properties of the correlations: while Spearman and Kendall report whether the data points are monotonically increasing or not (by comparing the ranges and by analyzing if the pairs of data are concordant      or discordant, respectively), Theil-Sen indicates if the data can be fitted by a linear fit, similarly to the Pearson correlation coefficient, but with the advantage of being less sensitive to the outliers.We present a detailed description of the three tests in Appendix E.
We applied the correlation tests using the molecular abundance ratios of the detected molecules.For each statistical test we calculated its -value, which quantifies the reliability of the derived correlation coefficient by taking into account the number of data points and the goodness of the correlation.If the correlation coefficient calculated is perfectly reliable, then the -value is 0, whereas a value of 1 indicates that the correlation coefficient is not reliable at all.We considered that a value below 0.1 is good enough to trust the correlation coefficient.
To analyse the results of the correlation tests we have computed four correlation matrices, one for each test, and one for the average correlation coefficient () defined as: where  s , , and T-S are Spearman, Kendall and Theil-Sen statistical tests, respectively.Each matrix is divided in cells, with each cell corresponding to the correlation coefficient for a pair of sources.The correlation matrices have been obtained for different groups of molecules: all molecules (Fig. 9), O-bearing (Fig. 10), N-bearing (Fig. 11), and S-bearing (Fig. 12).As shown in these figures, the results obtained using the three different statistical tests and the average one () are overall similar, which stress the robustness of the correlation analysis.For this reason, we will restrict the discussion to the average correlation matrices.

Full chemical reservoir
We first discuss the results found considering all the molecules analyzed.
The average correlation matrix (Fig. 9) indicates that the pairs of sources with very good correlations are IRAS16B and 46P/W (=0.78, values < 0.01, based on 10 molecular species), and the two comets (=0.7, = 0.1−0.23,indicating a low reliability because is based on only 5 species).Good correlations are also found between G31.41 and IRAS16B (=0.63, < 0.01 based on 35 molecular species), and G31.41 and 46P/W (=0.54, = 0.07−0.12,based on 9 species).The poorest correlations involve the 67P/C-G comet with the two star-forming regions: IRAS16B (=0.45, < 0.1, based on 17 molecular species), and G31.41 (=0.22,based on 15 molecular species, with high  values ranging 0.21−0.7,which indicated a low reliability).As discussed in the next subsections, the main reason for these discrepancies are the different molecular abundance ratios for S-bearing species found in 67P/C-G compared to those in the star-forming regions.
In the following subsections, we discuss the results obtained for the different families of molecules.

O-bearing species
Considering only O-bearing species, the correlation matrix (Fig. 10) indicates that this family of molecules has in general better correlations between sources than the full chemical reservoir (compare with Fig. 9).The best correlation is found between IRAS16B and comet 67P/C-G (= 0.86,  ≤ 0.01, using 8 molecules); in agreement with the results previously found by Drozdovskaya et al. (2019).There is also a very good correlation between G31.41 and 46P/W (= 0.84,  < 0.1, based on 5 molecules).
A very good correlation is also found between IRAS16B and 46P/W (=0.72,using 5 molecules).The lower middle panel of Fig. 6 shows that the detected molecules are inside the grey shaded area around the 1:1 correlation line, although the  values (0.06−0.23) indicate that the correlation should be taken with some caution.
The correlation between G31.41 and IRAS16B is good (=0.65, < 0.001, based on 16 molecules).Most of the molecular abundance ratios (with the only exceptions of CH 3 CHO and CH 3 COOH) fall within the area of ± 1 order of magnitude around the 1:1 line, confirming a very similar chemical composition of O-bearing molecules between these two star-forming sources.CH 3 CHO is less abundant in G31.41 by a factor of 26.1, which might be partially due to opacity effects, since Mininni et al. (2023) derived that its emission is partially optically thick, with line opacities up to =0.2.CH 3 COOH is more abundant in G31.41 than in IRAS16B by a factor of 40.5.However, this result is compatible with the ones obtained by Mininni et al. (2020), in which CH 3 COOH was more abundant in G31.41 compared with other 3 sources by a factor of 2 and also was more abundant on G31.41 than other 18 sources.
The poorest correlation found is between G31.41 and 67P/C-G (=0.26,based on 7 groups with same molecular mass).The reason for this poor correlation (see upper middle panel of Fig. 6) is the presence of two outliers: t HCOOH and H 2 CO.In the case of t HCOOH, the transitions analyzed by García de la Concepción et al. ( 2022) are slightly optically thick (=0.05-0.2),so the true column density would be higher, and thus closer to the 1:1 region.These two outlier molecules might be either underabundant in G31.41 or overabundant in 67P/C-G.Comparing G31.41 with IRAS16B and the comet 46P/W, both molecules fall in the shaded area around the 1:1 line (Fig. 6; being t HCOOH an upper limit in 46P/W).In addition, when comparing the two comets (67P/C-G and 46P/W), H 2 CO and the upper limit of t-HCOOH are above the 1:1 line.Thus, these two species seem to be overabundant in comet 67P/C-G compared to the other sources.
We did not perform correlation tests for the 67P/C-G and 46P/W pair because only 2 molecular abundance ratios are available (lower right panel of Fig. 6).

N-bearing species
The correlation matrices of N-bearing species are shown in Fig. 11.The comparison between the two star-forming regions (G31.41 and IRAS16B) shows a very good correlation (=0.78, ≤ 0.01, based on 11 molecules).N-bearing As shown in Fig. 7, most of the molecular abundance ratios are within the area of ± 1 order of magnitude with respect to the 1:1 line, with only two outliers: HC 3 N and NH 2 CN.Both molecules fall only slightly outside of the ±1 order of magnitude area.As already mentioned in Sect.4.1.7,this discrepancy in NH 2 CN might be due to optical depth effects.
Only HCN, HNC, CH 3 CN, and NH 2 CHO are detected in both comets, and also NH 3 and CH 3 NC were reported in 67P/C-G.As a consequence, only three molecular abundance ratios are available for the comparisons of G31.41 vs. 67P/C-G, G31.41 vs. 46P/W, and IRAS16B vs. 46P/W, and so the correlation tests performed should be taken with caution.The average correlation coefficients are in the three cases =1 (Fig. 11), but the low number of points (only three), make these correlations not fully reliable.
For the two comparisons left (IRAS16B vs. 67P/C-G and 67P/C-G vs. 46P/W), we did not preform statistical tests because only two molecular abundance ratios are available.

S-, P-and Cl-bearing species
Figure 12 shows the correlation matrices for S-bearing molecules.The correlations are systematically poorer than those for O-and N-bearing molecules.
The comparison between the two star-forming regions, based on 6 molecules (upper left panel in Fig. 8), gives a good correlation in the Spearman and Kendall correlation matrix ( s =0.64 and =0.55 respectively).However, the Theil-Sen correlation matrix value gives no correlation (T-S=0.2).This is because Theil-Sen searches for a linear fit, while the other two methods check if the points are monotonically increasing.The low correlation derived by Theil-Sen decreases the average coefficient to =0.46, which is slightly below to the threshold we have considered for a good correlation.In any case, the low number of molecules considered (6), and the high -values (=0.13-0.70),advice to take this result with some caution.
We cannot retrieve the correlation coefficients in the comparisons involving 46P/W due to the low number of S-bearing species detected (only H 2 S and CS), and the non detection of the reference molecule, CH 3 SH.
The comparison between the comet 67P/C-G and the two starforming regions gives no correlation (=-0.2for G31.41, and =0.09for IRAS16B; Fig. 12), confirming that the S chemical content in the comet significantly differs from that of the star-forming regions, unlike the O and N content, as already mentioned in Sect.5.1.
Regarding Cl-and P-bearing species, we did not perform correlation tests given the low number of species detected.Only CH 3 Cl has been detected towards IRAS16B and 67P/C-G (Fayolle et al. 2017), and PO towards 67P/C-G (Rivilla et al. 2020), preventing us to compare among the different sources.Grey dotted line is the 1:1 relation on each plot.Dark and light grey are half and one order of magnitude difference with respect to the grey dotted line.In the legend:  s is the Spearman correlation coefficient,  is the Kendall one, T-S is the Theil-Sen one,  is the correspondent -value.We note that the limits of the axis are different to those shown in Fig. 5 for display purposes.
and planetary systems.On the one hand, the natal cores in a highmass and a low-mass star-forming region (G31.41 and IRAS16B, respectively).On the other hand, two Solar System comets (67P/C-G and 46P/W), whose chemical composition is thought to be almost pristine, namely it informs us about the primitive molecular ingredients of our planetary system that could have been delivered to the young Earth (Altwegg et al. 2016(Altwegg et al. , 2019)).In the following, we discuss the implications of our findings about the evolution of the chemical feedstock during the process of star and planet formation.
Complementing the correlations done in Sect.5, and to follow how the molecular abundance ratios vary from source to source, we have plotted them in Fig. 13, separating O-bearing, N-bearing and S-bearing species.The two former are normalized by CH 3 OH (upper right panel) and CH 3 CN(upperleftpanel),respectively • Thelatterisnormalized by CH 3 OH (lower left panel) and CH 3 SH (lower right panel).As shown in Fig. 13, and already mentioned in Sect.5, we have found an overall very good correlation between the molecular abundance ratios of the two star-forming regions (G31.41 and IRAS16B), using a large sample of molecules (35 common detected species in both sources).Most of the molecular ratio pairs in these two regions are consistent within an order of magnitude, with variation factors of 1.0±0.6 for O-bearing species, and of 2.3±1.6 for N-bearing molecules.This similarity is remarkable because these two star-forming regions have very different physical properties.They differ in the masses of their gas envelopes (70  ⊙ vs. 4  ⊙ ; Cesaroni 2019 and Jacobsen et al. 2018;respectively), their protostellar masses (15-26 M ⊙ vs. 0.1 M ⊙ ), and their luminosities (4.5×10 4  ⊙ vs. 3  ⊙ ), from Beltrán et al. (2021) and Jacobsen et al. (2018), respectively.Their location within the Galaxy is also different, and the local environment clearly differs in terms of clustering: while G31.41 is a massive protostellar cluster (Beltrán et al. 2021), IRAS16B is an isolated low-mass triple protostellar system (see e.g., Maureira et al. 2020).This implies that the expected protostellar feedback (e.g. from protostellar molecular outflows, and UV radiation) is significantly different.As well, the protostellar heating timescales of the hot core G31.41 is expected to be much faster than that of the hot corino IRAS16B (Viti et al. 2004;Awad et al. 2010).Moreover, it should be noted that the spatial scales traced by the observations used in this work are very different: 4400 au for G31.41 and 60 au for IRAS16B (Mininni et al. 2020 andJørgensen et al. 2018, respectively).In principle, one should expect that all these above-mentioned differences in the physical conditions might have an imprint in the molecular ratios in the gas phase, due to e.g.different desorption histories (thermal and/or shock-induced), UV processing, or spatial scale effects.However, we have found that the molecular ratios of O-and N-bearing species towards the hot core and hot corino of G31.41 and IRAS16B, respectively, are remarkably similar.This result is in good agreement with previous works that have found similar chemical reservoirs comparing tens of star-forming regions, based on smaller samples of molecules (e.g.Coletta et al. 2020;van Gelder et al. 2020;Colzi et al. 2021;Nazari et al. 2021Nazari et al. , 2022;;and references therein).
When the comparison is extended to comets, we have found different results depending on the chemical family.O-and N-bearing  Theil-Sen  Urquhart et al. (2019) considered up to 120 line intensity ratios (which can be used as an approximate proxy of abundance ratios) towards massive clumps at different stages, and found that 13 of them show some trends with evolutionary stage (see their Fig.20).The ratios change only by a factor of ∼2−4 at most, with only a couple of exceptions, for which the factor is >5.These variations are in agreement with the results found in our analysis for the molecular ratios for N-and O-bearing species, when comparing the two starforming regions (low and high-mass) and the two comets.Although a few species appear as outliers, as discussed in Section 5, most of the molecular ratios are in general well correlated in the four sources analyzed, with variations less than one order of magnitude (see Figs. 6, 7 and 13).
Regarding S-and P-bearing species, clear differences were observed between the star-forming regions and the comets, as already mentioned in Sect. 5.The two lower panels of Fig. 13 are very illustrative of the behaviour of the S-bearing family.The abundance ratios with respect to CH 3 OH (lower left panel) have significantly higher values in 67P/C-G than in the two star-forming regions by 1-2 orders of magnitude (factors of 7-377).However, once normalized with CH 3 SH (lower right panel), the molecular abundance ratios in 67P/C-G are very similar to those of the two star-forming regions (CH 3 SH was not detected towards 46P/W).
The thiol/alcohol ratio CH 3 SH/CH 3 OH is 0.18 in 67P/C-G.This value is three orders of magnitude higher than those derived in G31.41 and IRAS16B: (8±4)×10 −4 and (4.8±0.8)×10−4 , respectively.This suggests an overabundance of S-bearing species, compared to O-bearing, in the comet 67P/C-G.This is in good agreement with the high S/O ratio of 0.015 found by Calmonte et al. (2016) in the coma of 67P/C-G, and in the comet C/1995 O1 Hale-Bopp of 0.02 (Bockelée-Morvan et al. 2000).Both values are very close to the cosmic abundance S/O ratio measured from solar photospheric abundances (Anders & Grevesse 1989;Lodders 2010).As discussed in Calmonte et al. (2016), this strongly suggests that S is not depleted in these comets, in contrast to the results usually found in dense clouds and star-forming regions (Gondhalekar 1985;Jiménez-Escobar & Muñoz Caro 2011;Laas & Caselli 2019;Fuente et al. 2023), like G31.41 and IRAS16B.This means that in star-forming regions most of the sulfur is trapped in ice grains, since S-bearing species are more refractory than O-and N-bearing species, and thus they are not easily thermally desorbed into the gas phase in the hot cores/corinos present in high-and low-mass star-forming regions, respectively.In contrast, the results found in comets indicate that these S-bearing species can escape the surface, and hence we can have access to the refractory S-bearing content.
A similar behaviour is also observed for P-bearing species, which are also thought to have a refractory character.In this work, we have not performed the statistical tests for P-bearing molecules because only a single species, phosphorus monoxide (PO), has been detected towards only one of the sources considered, the comet 67P/C-G (Rivilla et al. 2020).The derived P/O ratio is (0.5−2.7)×10 −4 , close to the solar value of ∼6×10 −4 , which indicates that P, similarly to S, is only slightly or not depleted.In contrast, the middle upper and left lower panels of Fig. 8 show that the PO upper limits derived towards G31.41 (this work) and IRAS16B (Drozdovskaya et al. 2019) are below the 1:1 line, indicating that this species is underabundant in the gas phase of star-forming regions compared to the 67P/C-G comet, similarly to the S-bearing species.
Indeed, P-bearing molecules are not detected in hot cores (Rivilla et al. 2020) or hot corinos (Bergner et al. 2022).The interferometric high-angular resolution of these works has shown that the emission of P-bearing molecules (PN and PO) does not arise from the hot gas surrounding the protostars, but from more distant gas affected by shocks, which are able to sputter the grains and release the P content of their surfaces.Very recently, Fontani et al. (2024) have confirmed that PN, and tentatively PO, are indeed detected in shocked gas associated with molecular outflows powered by the protostars embedded in the G31.41 cluster.
Finally, we note that the analysis performed here presents evident caveats that need to be addressed in the near future.We have considered the molecular ratios of only four sources (two star-forming regions and two comets), whose chemical composition is very well known.This has allowed to study tens of different molecular ratios, but with poor statistics in terms of number of sources.Therefore, we stress that it is mandatory to perform unbiased and sensitive spectral surveys towards large samples of low-and high-mass star-forming regions.This will allow an extensive analysis of their full chemical census, including tens of molecules, like the ones done for G31.41 and IRAS16B by the GUAPOS and PILS survey, respectively.This will complement previous efforts that have been focused on a handful of selected species towards large samples of hot corinos (e.g.Bouvier et al. 2021or Yang et al. 2021), and hot cores (e.g.Chen et al. 2023).
Moreover, there is still a key step in the process of star and planet formation whose chemical content is largely unknown: the planetforming disk phase.Numerous efforts have been done in the last years (e.g.Banzatti et al. 2020;Grant et al. 2021;Pegues et al. 2021;Booth et al. 2021;Brunken et al. 2022), including the detection of complex molecules (Brunken et al. 2022;Tabone et al. 2023), which have suggested a direct inheritance between protoplanetary disks and previous phases e.g.(Booth et al. 2021).However, the number of species detected in these objects, which are the cradles of new planets, is still low compared with the ISM census (see McGuire 2022).Hence, direct comparisons based on a large sample of molecules, like the ones done in this work, are not possible yet, and should be addressed in the forthcoming years.As well, current and future space in-situ and sample-return missions, targeting new comets and asteroids, like the on-going JAXA Hayabusa2 mission to the asteroid Ryugu (e.g.Potiszil et al. 2023), or the OSIRIS-REx mission to the asteroid Bennu (Lauretta et al. 2017), will significantly widen our current view of the chemical ingredient of Solar System objects, and thus they will allow us to attain a complete view of the chemical thread during the whole evolution of planetary systems.

SUMMARY AND CONCLUSIONS
In this work we provide a comprehensive view of the chemical feedstock of the massive star-forming protocluster G31.41+0.31.We have analyzed the emission of 34 molecules using data from the GUAPOS project (G31.41+0.31Unbiased ALMA sPectral Observational Survey).We have derived the column densities and abundances of 18 molecular species, including 25 isotopologues and 3 tentative detections (HONO, C 2 H 5 CHO and CH 3 OCH 2 OH).For the 16 molecules not detected, we derived column density upper limits.We have added these molecules to those already reported in previous GUAPOS works, to compile a total molecule sample of 57 species.Then, we have performed a comparative study of the G31.41 molecular abundance ratios with those of sources that represent different evolutionary stages of star formation, from the protostellar stage to Solar System objects: the prototypical Solar-like protostar (IRAS 16293-2422 B), and two comets (67P/Churyumov-Gerasimenko and 46P/Wirtanen).To quantitatively study the correlation between the chemical content of the sources we performed three complementary statistical tests: Spearman, Kendall and Theil-Sen.Our results shows that the molecular abundances of the two star-forming regions, G31.41 (high-mass) and IRAS16B (low-mass), are in general well correlated, including O-, N-and S-bearing species.This similarity, based on a large sample of molecules, denotes that molecular abundance ratios do not vary by factors larger than one order of magnitude, despite of the very different physical properties of these two highand low-mass star-forming regions, such as the mass or the luminosity, level of clustering, protostellar feedback, UV radiation, evolution timescales, or location within the Galaxy.
The comparison with the two comets (67P/C-G and 46P/W) shows that their O-and N-bearing molecular abundance ratios are also well correlated with those of the star-forming regions.This suggests that the abundances ratios of these chemical families are primarily set at initial phases of star formation, and subsequently inherited without variations larger than one order of magnitude during the process of star and planet formation.However, the S-bearing molecules behave differently, being significantly more abundant in 67P/C-G and 46P/W than in the gas phase of star-forming regions.This result can be explained in terms of sulfur-depletion in the ISM, namely that sulfur is trapped on the icy surfaces of dust grains.On the contrary, Sbearing species are believed to have been freely desorbed into the coma of the comets, recovering the expected cosmic abundance.A similar behaviour is also suggested for P, based on the PO detection in the comet 67P/C-G, and the upper limits derived in the two starforming regions.

APPENDIX A: SLIM FITS OF OTHER ISOTOPOLOGUES
Table A1: Results of the SLIM fits of the isotopologues of the molecules analysed towards G31.41 that were not used to derive the abundances of the main isotopologues, ordered by increasing molecular mass.The resulting physical parameters, along with their associated uncertainties, are presented.The values of the parameters fixed to perform some fits are shown without uncertainties.In the case of non detections, we indicated the upper limit in the column density with <.When  ex is fixed, we calculate in what factor the value of  changes when using the different temperatures  min and  max : 25-75 K and 100-200 K for fixed temperatures of 50 K and 150 K, respectively.We also present the maximum line opacity of the transitions of each molecule (  ).The transitions used to obtain these values are in Table D1.Notes. Maximum  of all the transitions detected in each species.

APPENDIX B: SAMPLE OF MOLECULES USED FOR COMPARISONS BETWEEN SOURCES
The complete list of molecules used in this work (detections, tentative detections and upper limits) is given in Table B1 for the four sources used in this work (G31.41,IRAS16B, 67P/C-G and 46P/W).The species are ordered by increasing molecular mass.

APPENDIX D: MOLECULAR TRANSITIONS USED IN THE ANALYSIS
The transitions used for the MADCUBA fits of the molecular emission of the analysed species in this work towards G31.41+0.31(see Table 2 and Table A1) are listed in Table D1.
Table D1: List of the transitions of the molecules analysed in this work that were used to perform the MADCUBA fits to obtain their physical parameters (see Table 2 and Table A1).Frequency is the rest frequency of the transitions in GHz; Quantum numbers indicates the quantum number used to label the rotational energy levels; log  is the logarithm of the line intensity on nm 2 MHz;  up is the energy of the state of higher energy in K; Area is the integrated intensity of the line in K km s −1 and  is the optical depth of the transition.In case of upper limits, the transitions listed are only those used to retrieve the upper limit (see Fig G1

APPENDIX E: STATISTICAL TESTS
We have used three complementary correlation tests, Spearman ( s ), Kendall () and Theil-Sen (T-S), to perform a quantitative analysis of the comparison of the chemical content in the different interstellar sources.We explain here the basics of these correlation tests, and how to interpret their results.

E1 Spearman
Spearman correlation test ( s ) is a non-parametric test or a range correlation test.This test orders the   and   data from lower to higher separately.Once the data of   and   is reordered, each data are replaced by its range, i.e. an integer number starting by 1 for   and   following the order.If two or more data have the same value (  or   ), the number assigned is the average number of the ranges.The formula which calculates the Spearman correlation coefficient depends on the range of both data, in this case the range of   and   is   and   respectively.Therefore, this is the formula: The range of this correlation test is from -1 to 1. -1 is perfect anti-correlation, 0 no correlation at all and 1 perfect correlation.

E2 Kendall
Kendall correlation test () is based on Kendall correlation coefficient which is a non-parametrical correlation test.In this case, we calculate by pairs of  data (  ,   ).We have the  data (, ) with all the possible pair of points (  ,   ) -(  ,   ) having ( − 1)/2 different pairs, which are classified in two different categories.Any pair of data (  ,   ) and (  ,   ) where  ≠  are concordant if   >   and   >   or   <   and   <   .Otherwise, the pair of data are discordant.
To calculate the value of this coefficient the equations is the following: Where  C is the concordant pair,  D is the discordant one and  is the number of the data (, ).This coefficient has the same range and interpretation as Spearman correlation coefficient.

E3 Theil-Sen
The Theil-Sen estimator (T-S) is an alternative for the conventional linear regression (i.e.Pearson correlation coefficient).This method provides us with a non-parametric linear regression which is very robust.It is very effective against outliers.The idea is to calculate the slope (   ) for each possible pair of points, using the following formula:  and 4.1.2.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition, the green curve is the upper limit obtained using that transition and the blue curve is the cumulative fit considering all detected species.The red/green dashed lines indicate the frequency of the molecular transitions.
The final estimation for the slope ( T−S ) of the linear regression is the median of all the slopes.The intercept point is calculated by doing the median of all the intercept points of the data.
Once we have the slope, we use formula which associates the slope of a linear regression with a correlation coefficient (this is used to obtain Pearson correlation coefficient), Theil-Sen in this case.The formula is the following: By doing so, we obtain a correlation coefficient from a linear regression, but more robust and less dependent on outliers.This correlation coefficient has the same range and interpretation as Spearman and Kendall.The difference is that this coefficient is evaluating if the points are close to a straight line, and the other coefficients evaluates is the points are monotonically increasing or decreasing.
To calculate the -value of Theil-Sen, we use the -value of Pearson correlation coefficient, due to this estimator is a modified linear regression (Pearson), so for -value we used the t distribution: where  is the coefficient value and  is the number of observations.To calculate the -value the formula is 2 x ( > ) where  follows a t distribution with  − 2 degrees of freedom. is the probability of a certain value of being higher than ,  follows the density distribution of a  distribution.We integrated the probability density from  to infinite to obtain the , the expression used is the following: Where  is the degrees of freedom,  is the t distribution described in equation E5, and  is .
To obtain the -value we solve the equation E6 and we multiply the result by 2 because the -value is from two tails of the Gaussian.

APPENDIX F: REMAINING SPECTRA OF DETECTED MOLECULES
We show here the spectra of the detected molecules towards G31.41 analyzed in this work that have not been presented in the main text.The results of the parameters obtained from the fits are summarized in Table 2 and Table A1, and the information about the transitions used are listed in Table D1.spectra (explained in more detail in Sect 4.2).To assess how constraining (or not) are the derived upper limits, in Fig. G2 we show how they compare, normalized to CH 3 OH, with the fractional abundances or upper limits derived towards IRAS16B.Most of the molecules not detected towards G31.41 were also reported as upper limits towards IRAS16B, and they fall within or very close to the region of ±1 order of magnitude around the 1:1 line (light grey shaded area of Fig. G2).Only five species of the sample are not detected towards G31.41 but detected in IRAS16B: H 2 S, HOCH 2 CN, C 2 H 3 CHO, CH 3 Cl, and C 3 H 6 .The upper limit of H 2 S we derived in G31.41 is not constraining because there are not adequate transitions of this species in the ALMA Band 3. As shown in Table D1, the transition used to derive the upper limit, although it is the best in the spectra analyzed, it is relatively weak and has a high energy level; and as a consequence the derived upper limit is high.The upper limit derived for HOCH 2 CN is not particularly constraining either, around one order of magnitude higher than the value derived from its detection in IRAS16B.In contrast, for C 2 H 3 CHO, CH 3 Cl and C 3 H 6 , the upper limits are below the values derived in IRAS16B, indicating that these species are less abundant in G31.41.

Figure 1 .Figure 2 .
Figure 1.NH 2 CN described in Sect.4.1.7.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines are the frequency of the transitions that we are fitting.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure 3
Figure 3. c C 2 H 4 O described in Sect.4.1.9.Selected transitions of c C 2 H 4 O are shown, more transitions are detected but are not part of this figure.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure 4 .
Figure 4. CH 3 SH described in Sect.4.1.13.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure 5 .
Figure 5. Molecular abundances with respect to CH 3 OH (of detected molecules and upper limits) classified by chemical families on different pair of sources.The errorbars marked with arrows are the upper limits.Grey dotted line is the 1:1 relation on each plot.Dark and light grey are half and one order of magnitude difference with respect to the grey dotted line.In the legend:  s is the Spearman correlation coefficient,  is the Kendall one, T-S is the Theil-Sen one,  is the correspondent -value.

Figure 6 .
Figure6.Same as in Fig.5but only plotting O-bearing molecules with respect to CH 3 OH.The errorbars marked with arrows are the upper limits.Grey dotted line is the 1:1 relation on each plot.Dark and light grey are half and one order of magnitude difference with respect to the grey dotted line.In the legend:  s is the Spearman correlation coefficient,  is the Kendall one, T-S is the Theil-Sen one,  is the correspondent -value.We note that the limits of the axis are different to those shown in Fig.5for display purposes.

Figure 7 .
Figure7.Same as in Fig.5but only for N-bearing molecules (including molecules containing O) with respect to CH 3 CN.The errorbars marked with arrows are the upper limits.Grey dotted line is the 1:1 relation on each plot.Dark and light grey are half and one order of magnitude difference with respect to the grey dotted line.In the legend:  s is the Spearman correlation coefficient,  is the Kendall one, T-S is the Theil-Sen one,  is the correspondent -value.We note that the limits of the axis are different to those shown in Fig.5for display purposes.

6
DISCUSSION: THE CHEMICAL THREAD DURING STAR FORMATIONWe have compared in this work the chemical composition of four sources that cover initial and final steps of the formation of stars N / N CH3SH (

Figure 8 .
Figure8.Same as in Fig.5but only for the P-, S-, and Cl-bearing molecules with respect to CH 3 SH.The errorbars marked with arrows are the upper limits.Grey dotted line is the 1:1 relation on each plot.Dark and light grey are half and one order of magnitude difference with respect to the grey dotted line.In the legend:  s is the Spearman correlation coefficient,  is the Kendall one, T-S is the Theil-Sen one,  is the correspondent -value.We note that the limits of the axis are different to those shown in Fig.5for display purposes.

Figure 9 .Figure 10 .Figure 13 .
Figure 9. Correlation matrices considering all the molecules analysed in this work.The goodness of the correlation is color-coded, the bluer the color, the better the correlation.The value between brackets on each cell is the number of molecules considered.

Figure F1 .
Figure F1.HCN and HNC isotopologues described in Sect.4.1.1 and 4.1.2.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition, the green curve is the upper limit obtained using that transition and the blue curve is the cumulative fit considering all detected species.The red/green dashed lines indicate the frequency of the molecular transitions.

Figure F2 .
Figure F2.CO isotopolgues described in Sect.4.1.3.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition, the green curve is the upper limit obtained using that transition and the blue curve is the cumulative fit considering all detected species.The red/green dashed lines indicate the frequency of the molecular transitions.

H 2 C 18 O
Figure F3.H 2 CO isotopolgues described in Sect.4.1.4.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure F4 .
Figure F4.CH 3 CCH described in Sect.4.1.5.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure F5 .Figure F6 .Figure F7 .Figure F8 .
Figure F5.CH 3 NC described in Sect.4.1.6.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.

Figure F9 .
Figure F9.HC 3 N isotopologues described in Sect.4.1.14.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure F11 .Figure F12 .
Figure F11.OCS isotopologues described in Sect.4.1.16.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure F13 .
Figure F13.SO 2 isotopologues described in Sect.4.1.18.The black histogram and its grey shadow are the observational spectrum.The red curve is the fit of the individual transition and the blue curve is the cumulative fit considering all detected species.The red dashed lines indicate the frequency of the molecular transitions.The plots are sorted by decreasing intensity of the corresponding species (red line).

Figure G1 .Figure G2 .
Figure G1.Molecules not detected in G31.41.The black histogram and its grey shadow are the observational spectrum.The LTE synthetic spectra using the derived upper limits of  are indicated with green curves.To compute the upper limits of their molecular abundances we have used the brightest and less blended transitions, which are shown here.The green dashed lines indicate the frequency of the molecular transitions.The transitions of this plots are in TableD1on the not detected molecules section.

Table B1 :
Molecules used in this work.Molecules with a [✓] implies that they are detected; [∼ ✓] indicates a tentative detection; and [ < ] denotes non detections.The molecules indicated with * in the 67P/C-G column are members of isomeric groups .

Table C1 :
TableC1lists the spectroscopic catalog entries of all the molecules analysed in this work towards G31.41.The species are ordered by increasing molecular mass.Spectroscopic entries used to perform the analysis, including the molecular catalog, tag of the species, version number, and date of the version.

Table D1 :
).The quantum numbers used are:  which indicates the total angular momentum (fine structure effects imply the use of both  and ),  a and  c are the projections of  onto the  and  inertial axis (if the molecule is symmetric top only  is needed),  specifies the vibrational states,  t is the torsional state and  designates the spin quanta.The  and  labels refers to the A-and E-symmetry states, respectively, if present.We label as 0 + and 0 − the symmetric and antisymmetric tunneling states, respectively, if present.The molecules are in the ground state by default in this table unless specified.Continued.