Near-Infrared Transmission Spectroscopy of HAT-P-18 b with NIRISS: Disentangling Planetary and Stellar Features in the Era of JWST

The JWST Early Release Observations (ERO) included a NIRISS/SOSS (0.6–2.8 𝜇 m) transit of the ∼ 850 K Saturn-mass exoplanet HAT-P-18 b. Initial analysis of these data reported detections of water, escaping helium, and haze. However, active K dwarfs like HAT-P-18 possess surface heterogeneities — starspots and faculae — that can complicate the interpretation of transmission spectra, and indeed, a spot-crossing event is present in HAT-P-18 b’s NIRISS/SOSS light curves. Here, we present an extensive reanalysis and interpretation of the JWST ERO transmission spectrum of HAT-P-18 b, as well as HST/WFC3 and Spitzer /IRAC transit observations. We detect H 2 O (12.5 𝜎 ), CO 2 (7.3 𝜎 ), a cloud deck (7.4 𝜎 ), and unocculted starspots (5.8 𝜎 ), alongside hints of Na (2.7 𝜎 ). We do not detect the previously reported CH 4 (log CH 4 < -6 to 2 𝜎 ). We obtain excellent agreement between three independent retrieval codes, which find a sub-solar H 2 O abundance (log H 2 O ≈ − 4 . 4 ± 0 . 3). However, the inferred CO 2 abundance (log CO 2 ≈ − 4 . 8 ± 0 . 4) is significantly super-solar and requires further investigation into its origin. We also introduce new stellar heterogeneity considerations by fitting for the active regions’ surface gravities — a proxy for the effects of magnetic pressure. Finally, we compare our JWST inferences to those from HST/WFC3 and Spitzer /IRAC. Our results highlight the exceptional promise of simultaneous planetary atmosphere and stellar heterogeneity constraints in the era of JWST and demonstrate that JWST transmission spectra may warrant more complex treatments of the transit light source effect


INTRODUCTION
In the works for over two decades, the James Webb Space Telescope (JWST) is finally operational.Astronomers can now count on space instruments with modes designed to study exoplanetary atmospheres.Transmission spectroscopy is a commonly used method to reveal the composition and structure of an atmosphere and, therefore, to enable inferences about a planet's formation and evolution history.During a transit, part of the starlight is blocked by the planet, whereas some light passes through the planetary atmosphere and can introduce measurable absorption features (Seager & Sasselov 2000;Brown 2001).The resulting transmission spectrum can reveal key information regarding the abundance of molecular and atomic species and the presence of clouds and hazes (e.g., Charbonneau et al. 2002;Tinetti et al. 2007;Wakeford & Sing 2015;Sing et al. 2016).
Astronomers have faced many challenges when conducting atmospheric studies with the Hubble (HST) and Spitzer Space Telescopes since neither observatory was designed for exoplanet observations.Numerous technical difficulties, instrument systematics, as well as narrow wavelength range and spectral resolution, have limited atmospheric inferences.For example, observations with the Wide Field Camera 3 (WFC3) instrument aboard HST are complicated by systematic trends, such as "HST breathing" effects, visit-long slopes, and the "ramp" effect (Wakeford et al. 2016).Nevertheless, the efforts and ingenuity of scientists have led to astonishing discoveries, such as the detection of several chemical species, including water vapour on hot Jupiters (e.g., Tinetti et al. 2007;Tsiaras et al. 2018) and even on a sub-Neptune (e.g., Benneke et al. 2019b), the inference of clouds and hazes in several gas giants (e.g., Wakeford & Sing 2015;Sing et al. 2016), and the observation of atmospheric escape on hot Neptune-like exoplanets (e.g., Ehrenreich et al. 2015;Spake et al. 2018).
HAT-P-18 b was discovered in 2010 by Hartman et al. (2011) using the Hungarian-made Automated Telescope Network.It is of approximately Saturn mass ( = 0.197  J ), but with an inflated radius ( = 0.995  J ), due to its higher temperature ( eq = 852 K) relative to the Solar System giants, which is a consequence of the planet's comparatively short orbital period ( = 5.5 days).Groundand space-based transmission spectroscopy has been performed on this target.Kirk et al. (2017) suggested a high-altitude haze consistent with the detection of Rayleigh scattering and the absence of the sodium absorption feature using the Auxiliary-port CAMera (ACAM) instrument on the William Herschel Telescope (WHT).Tsiaras et al. (2018) detected the presence of water vapour and a grey, opaque cloud deck in HAT-P-18 b's atmosphere using HST/WFC3, reporting a water abundance of log H 2 O = −2.63 ± 1.18 and a cloudtop pressure of log  cloud [Pa] = 2.82 ± 0.91.Fu et al. (2022) presented an analysis of the transit observed in the Single Object Slitless Spectroscopy (SOSS) mode (Albert et al. 2023) of the Near Infrared Imager and Slitless Spectrometer (NIRISS) instrument (Doyon et al. 2023) on board the JWST and detected water (with an abundance of log H 2 O = −3.03+0.31  −0.25 ), hints of methane (Δ log  = 3.79, or a 3.2  confidence), as well as excess helium absorption and tail in an otherwise very hazy atmosphere.
HAT-P-18 is an active K dwarf with an effective temperature of 4803 K and a slightly super-solar metallicity ([Fe/H] = 0.10 ± 0.08; Hartman et al. 2011).Stellar active regions, such as starspots and faculae, can introduce spectral features in transmission spectra that overlap those of exoplanetary atmospheres.Occulted active regions were often masked when fitting transit models to spectroscopic light curves; however, the impact of the occulted spot on the transmission spectrum is still present despite that (e.g., Oshagh et al. 2014;Bixel et al. 2019).This can also lead to a biased transmission spectrum by impacting not only the transit depth but possibly the mid-transit time, the scaled semi-major axis, and the impact parameter (e.g., Barros et al. 2013;Alexoudi et al. 2020).Recent studies have moved towards joint inferences of transit and active region properties (e.g., Bixel et al. 2019;Espinoza et al. 2019a).The NASA Study Analysis Group on the effect of stellar contamination on space-based transmission spectroscopy (SAG 21) of the Exoplanet Exploration Program Analysis Group (ExoPAG) recommends performing these joint in-ferences with future observations instead of masking active region occultations (Rackham et al. 2023).
Unocculted stellar active regions have long been recognized as a significant obstacle to exoplanet transmission spectroscopy.Early HST studies recognized that unocculted cool starspots can cause strong transit depth slopes towards short visible wavelengths (e.g., Pont et al. 2007Pont et al. , 2013;;McCullough et al. 2014).Similarly, unocculted hot faculae can imprint a negative slope in transmission spectra (e.g., Rackham et al. 2018;Kirk et al. 2021).The physical origin of this 'stellar contamination' is a mismatch between the intensity of the stellar surface sampled by the planet during transit and the average spectrum of the star (including the photosphere, starspots, and faculae).Since the contrast ratio between the spectra of stellar regions at different temperatures increases as shorter wavelengths, this 'transit light source effect' (TLSE; Rackham et al. 2018; Barclay et al. 2021) is wavelength-dependent and more significant at visible wavelengths.The most common approach to deal with the TLSE in early studies was to correct the transmission spectrum based on activity monitoring or occulted starspot properties (e.g., Pont et al. 2008;Berta et al. 2011;Sing et al. 2011).Barstow et al. (2015) demonstrated that not accounting for starspots when modelling transmission spectra of giant planets can bias retrieved molecular abundances, while (Rackham et al. 2018) further showed that the TLSE can dominate over absorption features for terrestrial planets.Moran & Stevenson et al. 2023 provide a recent example of this prediction, finding degenerate interpretations between unocculted starspots and atmospheric H 2 O for JWST observation of a terrestrial exoplanet.
Recent years have seen a renewed focus on incorporating unocculted stellar regions into atmospheric retrieval codes, allowing simultaneous inferences of stellar and atmospheric properties.Pinhas et al. (2018) developed a transmission spectrum retrieval framework to jointly fit a single population of unocculted stellar heterogeneities and a planetary atmosphere.Subsequent retrieval studies have explored the fidelity of TLSE retrieval assumptions (e.g., Iyer & Line 2020;Thompson et al. 2023;Rackham & de Wit 2023) and applied these joint retrievals to interpret observations from HST and Spitzer (e.g., Bruno et al. 2020;Rathcke et al. 2021), ground-based telescopes (e.g., Bixel et al. 2019;Jiang et al. 2021;Kirk et al. 2021), and JWST (Moran & Stevenson et al. 2023).However, the SAG 21 report (Rackham et al. 2023) notes that there is considerable scope to improve the realism and complexity of retrieval prescriptions for unocculted stellar active regions.
In this work, we aim to disentangle stellar and planetary atmosphere signals by including stellar heterogeneities in transit fits and atmospheric retrievals.We present and compare two independent atmospheric reanalyses of HAT-P-18 b, one using JWST NIRISS/SOSS Early Release Observations (ERO) transit observation and another combining transit observations from HST/WFC3 and the Infrared Array Camera (IRAC) of Spitzer.We describe the observations and data reduction approach in Section 2 and detail our JWST NIRISS/SOSS light curve fitting and occulted starspot analysis in Section 3. Section 4 describes our joint stellar heterogeneity and atmospheric retrieval method and presents results from three independent retrieval codes.We summarize and discuss our results in Section 5 and conclude in Section 6.

OBSERVATIONS & DATA REDUCTION
The scientific legacy of HST and Spitzer is considerable, particularly for exoplanet studies; JWST will, in many ways, build on this legacy.In this work, we present transit observations with JWST and its pre- decessors, HST and Spitzer, to show the potential of NIRISS/SOSS to characterize exoplanetary atmospheres and to cope with challenges such as stellar contamination.

JWST NIRISS/SOSS
HAT-P-18 b was observed in transit using the SOSS mode of the NIRISS instrument (Albert et al. 2023;Doyon et al. 2023)  15 hours, which covered the 2.7 hours transit as well as some baseline before and after the transit.The GR700XD grism and the CLEAR filter were used, along with the SUBSTRIP256 subarray, which captures both diffraction orders 1 and 2. There are 469 integrations, each consisting of 9 groups and lasting 54.94 seconds.In addition, a second exposure with the GR700XD grism and F277W filter was taken directly after the end of the main CLEAR filter TSO.This exposure had the exact same readout parameters as the CLEAR exposure, except it consisted of only 11 integrations lasting 10.07 minutes.The TSO was previously published by Fu et al. (2022), the major findings of which are summarized in Section 1.

HST/WFC3 + Spitzer/IRAC
Transit observations of HAT-P-18 b were obtained using HST/WFC3 with the G141 grism in the spatial scan mode.These come from two visits, consisting of five orbits each, made as part of the Cycle 23 HST General Observer campaign (PI: Drake Deming; Deming et al. 2015) in February 2016 and January 2017.These observations spanned the wavelength range from 1.1 to 1.7 m and were downloaded from the Mikulski Archive for Space Telescopes (MAST).The raw light curves show characteristic, non-linear systematics because of thermal effects induced by the telescope's 96-minute orbit (Foster et al. 1995).This leads us to follow standard practice (e.g., Benneke et al. 2019b) and discard from our analysis the first orbit as well as the first two measurements in each subsequent orbit.This leaves one orbit on either side of the transit and two orbits corresponding to the transit itself, including significant portions of the ingress and egress.These HST transits were previously published by Tsiaras et al. (2018).
Transit observations with Spitzer come from two visits in 2013 using the IRAC instrument in the 3.6 and 4.5 m channels (PI: Jean-Michel Désert; Desert et al. 2012).The data were obtained from the Spitzer Heritage Archive.

JWST NIRISS/SOSS Data Reduction
We reduced the NIRISS/SOSS TSO using the supreme-SPOON pipeline (Feinstein et al. 2023;Coulombe et al. 2023;Radica et al. 2023;Lim et al. 2023), starting from the raw, uncalibrated files downloaded from the MAST archives.Some steps in the supreme-SPOON pipeline are handled by the official JWST pipeline.We follow a nearly identical procedure for the reduction as was presented in Feinstein et al. (2023) and Radica et al. (2023): we first process the TSO through supreme-SPOON Stage 1, which performs the detector level calibrations.This includes, in particular, the subtraction of the column-correlated 1/  noise, during which we ensure to properly mask several undispersed (zeroth order) as well as the sole dispersed (likely a second order, slightly above the target third order) contaminants of background field stars, and the ramp fitting to convert from 3D "ramp" to 2D "slope" images.
Stage 2 of supreme-SPOON performs further reductions on the slope-level products, including the background subtraction, for which we use the SOSS SUBSTRIP256 background model provided by the Space Telescope Science Institute (STScI)1 , and tracing of the target orders to define the extraction boxes as well as the stability of the target trace (changes in  and/or  position, changes in morphology, etc.) over the course of the TSO.For further details on these steps, please see Radica et al. (2023).We note, though, that using a constant scaling of the STScI background model, such as in Radica et al. (2023), did not perfectly remove the background step.We, therefore, separately scaled the STScI model blueward and redward of the step (e.g., Lim et al. 2023).We find values for the best-fitting background scaling to be 0.4384 and 0.4080 for pre-and post-jump, respectively.
We extract the stellar spectra using the ATOCA algorithm (Darveau-Bernier et al. 2022) using a specprofile reference file created specifically for this observation using the APPLESOSS code (Radica et al. 2022).ATOCA explicitly takes into account the overlap between the first and second orders of the target spectrum on the detector during the extraction, though the expected dilution introduced from this overlap is predicted to be near-negligible for relative measurements such as exoplanet transmission spectroscopy (Darveau-Bernier et al. 2022;Radica et al. 2022).
We use the updated APPLESOSS v2.0.0 in this work.The initial version of APPLESOSS presented in Radica et al. (2022) used the WebbPSF package (Perrin et al. 2014) to simulate the extended wings of the SOSS PSF (Point Spread Function).However, due to concerns that the simulated PSFs underpredict the SOSS wings2 , we update the APPLESOSS framework to use the wings of order 1 profiles bluewards of ∼ 1.1 µm where there is no contribution from order 2 on the detector.We note, however, that the transmission spectrum is unchanged whether simulated or empirical wings are used, as the strength of the self-dilution is significantly smaller than the transit depth precision.Finally, after the extraction, we clip any data points which deviate by more than 5  from their neighbours in the time direction; < 2% of points are rejected in this way.We find no significant deviation in the target trace position with the positions included in the spectrace reference file 3 , and we, therefore, use the default SOSS wavelength solution.A summary of the major reduction steps is shown in Figure 1.HAT-P-18 has a co-moving white dwarf companion separated by only ∼ 2.66 ′′ at a position angle of ∼ 186 • , as revealed by GAIA astrometry (Mugrauer & Michel 2021)  4 .Given the aperture position angle of the SOSS observation (252.09• ), the spectral trace of this companion is offset by (−9, +40) pixels relative to the trace of HAT-P-18, i.e., mostly out of the extraction aperture throughout order 1 and for all wavelengths < 0.85 m at order 2. Combined with its measured contrast of 8 mag at 1.4 m (Mugrauer & Michel 2021), it has a negligible effect on the flux extracted for HAT-P-18.No particular action was thus taken to deal with it.

HST/WFC3 + Spitzer/IRAC Data Reduction & Light Curve Fitting
Following standard procedure for HST/WFC3 observations (e.g., Deming et al. 2013;Benneke et al. 2019b), we perform the necessary data reduction and light curve fitting by using the modular Exoplanet Transit Eclipses and Phase curves (ExoTEP) framework (Benneke et al. 2017(Benneke et al. , 2019a)), which employs the batman transit light curve model (Kreidberg 2015).Using a Markov chain Monte Carlo (MCMC) method with the emcee package (Foreman-Mackey et al. 2013), ExoTEP jointly fits the transit and systematics models for the two visits along with photometric noise parameters.We construct spectrophotometric light curves by spectrally binning at 30 nm intervals (Kreidberg et al. 2014) the extracted flux for each visit.This binning was chosen because it offered the best trade-off between the number of data versus error bar size, as compared to 10 and 60 nm bin widths.We follow Benneke et al. (2019b) for the procedure of the white and spectrophotometric light curve fits.The latter for both visits are shown in Figure B1, whereas the best-fitting values from the white light curve fit are quoted in Table A1.The resulting HST/WFC3 transmission spectrum is displayed in Figure 3.
We follow standard procedure for Spitzer/IRAC image processing (e.g., Benneke et al. 2019b).As was done for the HST data, we employ ExoTEP to reduce the data further and fit the Spitzer light curves.After correcting for the Spitzer-specific systematics, we obtain a final light curve for each of the two channels as displayed in Figure B2.We used 80-second bins for the time axis due to the relatively high cadence of the Spitzer data.Compared to the HST spectrophotometric light curves, we find that the scatter is higher, resulting in larger errors for the Spitzer data points in the combined transmission spectrum.We kept the transit depths from Spitzer/IRAC in the HST transmission spectrum in order to better constrain the atmospheric retrievals.
3 spectrace reference file jwst_niriss_spectrace_0023.fits was used in this work. 4This companion was also previously reported as a candidate by Ginski et al. (2016) based on lucky imaging.

Light Curve Fitting with Spot-Crossing Masked
For the SOSS data, we construct separate white light curves for orders 1 and 2 by summing the flux from all wavelengths in order 1 (0.85-2.8 µm) and from 0.6-0.85µm in order 2. We then fit a transit model to each white light curve using the juliet package (Espinoza et al. 2019b).We fix the orbital period to 5.508023 d, the eccentricity to 0.084, and the argument of periastron to 120 • (Hartman et al. 2011; all their values used in this paper are listed in Table 1), and put wide, uninformative priors on the time from midtransit,  0 , the impact parameter, , the scaled orbital semi-major axis, / * , and the scaled planet radius,   / * .We fit two parameters of the quadratic limb darkening law following the parameterization of Kipping (2013) There is a spot-crossing event clearly visible just after the time from mid-transit.Here we do not include a model of this occulted starspot and instead mask the integrations associated with the spotcrossing (integrations 240 -270).Therefore, we fit eight parameters for each order.The reduced Chi-squared statistic for the fits is  2  = 1.08 for order 1 and 1.06 for order 2. The best-fitting transit models for orders 1 and 2 are overplotted in Figure 2.
We then proceed to fit the spectrophotometric light curves at the pixel level -that is, we fit one light curve per pixel column on the detector.This results in 2038 light curves for order 1 and 567 for order 2. For these fits, we fix the orbital parameters and the transit baseline's zero point to their best-fitting values from the order 1 white light curve fit because of the better signal-to-noise (S/N).We leave only the scaled planet radius, limb-darkening parameters, and jitter term free.We put Gaussian priors on the limb-darkening parameters based on calculations from the ExoTiC-LD package (Wakeford & Grant 2022) using the 3D stagger grid (Magic et al. 2015).The widths of the Gaussian priors are set to 0.1 (e.g., Pontoppidan et al. 2022;Espinoza et al. 2022).We tested larger widths of 0.2 following Patel & Espinoza (2022) and the retrieved transmission spectrum is consistent within less than 1 5 .We also experimented with freely fitting the limb darkening coefficients and found that this results in a slight (∼20 ppm) offset to the spectrum while preserving the relative amplitudes of the spectral features.We again mask the integrations associated with the spot-crossing in each fit.The resulting transmission spectrum is displayed in Figure 3.
Although we do not directly use the F277W exposure for science purposes in this study, as in Radica et al. (2023), it is useful to pinpoint undispersed (order 0) contaminants of field stars.Several bright contaminants are clearly visible in the F277W data frame, many of which are also readily visible in the CLEAR frame.However, unlike in Radica et al. (2023), we are unable to post-process our transmission spectrum to correct for the diluting effects of field star contamination.The contaminants are too bright preventing good reconstruction of the target trace.Moreover, as pointed out by Fu et al. (2022), one contaminant is time-varying.We, therefore, mask the affected regions of the transmission spectrum as was done in Feinstein et al. ( 2023) and Fu et al. (2022).The wavelength regions masked in this way are as follows: 0.714 -0.724 µm and 0.841 -0.85 µm in order 2, and 0.853 -0.870 µm, 1.048 -1.061 µm, 1.366 -1.384 µm, and 1.972 -2.011 µm in order 1.A frame of the F277W exposure, highlighting the masked regions is shown in Figure B3.
We also produce a transmission spectrum using an independent reduction pipeline, NAMELESS (Feinstein et al. 2023;Coulombe et al. 2023;Radica et al. 2023), as shown in Figure A1, to verify the selfconsistency of our results (see Appendix A).The best-fitting transit parameters from NIRISS/SOSS white light curve fits are shown in Table A1.

Occulted Starspot Method
We also perform a second light curve fit, enabling a joint inference of the starspot and planet properties to model the spot and study its impact on the transmission spectrum.To this end, we first compute a single broadband light curve by summing the flux from wavelengths 0.65-0.85m of order 2 together with wavelengths 0.85-1.5 m of order 1.We exclude wavelengths bluewards of 0.65 m to maximize the S/N and wavelengths redwards of 1.5 m because the effect of spot crossings is stronger at shorter wavelengths where the spot contrast with respect to the stellar photosphere is larger.We fit a transit model with a spot-crossing event using spotrod (Béky et al. 2014), which we have implemented into the juliet package (Espinoza et al. 2019b).We fix the period, the eccentricity and the argument of periastron to the Hartman et al. (2011) values and fit the mid-transit time, the impact parameter, the scaled semi-major axis, the scaled planet radius, the two parameters of the quadratic limb darkening law, and the jitter term with wide, uninformative priors.We also fit the zero point of the transit baseline, though the best-fitting value is again very close to zero.We fit four additional parameters to model the starspot: the -and -positions of the spot, its radius, and its spotto-stellar flux contrast.The positions and the radius of the spot are in stellar radii units.The centre of the star is at (0, 0).We use uniform priors ranging from 0 to 1.We employ dynesty (Speagle 2020) to sample the parameter space with 5000 live points.The reduced Chi-squared statistic for the fit with the highest likelihood is   = 1.08.There is strong evidence (Δ log  = 95.7, or a > 5  preference) for a transit model with an occulted spot.The best-fitting parameters for the broadband light curve fit with the spot-crossing modelled are displayed in Table A1.The choice of wavelength range (0.65-1.5 m) results in best-fitting values that are more precise and overall consistent at 1  to the best-fitting values with the entire wavelength ranges of order 1 and 2. We note that the -position of the spot is not well constrained, as shown in the corner plot in Figure B4, so we use the set of parameters with the highest likelihood instead of the medians of the posteriors from the broadband light curve fit, to ensure that the parameters are self-consistent.This best-fitting transit model is overplotted in the top panel of Figure 4, and a cartoon of the spot-crossing event is shown in the bottom panel.
We then fit the spectrophotometric light curves at a resolving power of  = 100, fixing the orbital parameters ( 0 , , / * ), as well as the spot positions and radius from the above broadband light curve fit.We fit the scaled planet radius, the limb-darkening parameters, the spot contrast and the jitter coefficient with the priors described above.The spectrophotometric light curves for 14 bins with their best-fitting transit models are shown in Figure 5 for the highest likelihood model.The spot contrast is correlated with the -position (see Figure B4) because there is a degeneracy between the radius and the contrast of the spot given the strong correlation between the spot temperature and filling factor6 (Bruno et al. 2022).To capture the uncertainties coming from the degeneracies in the spot -position, size and contrast, we repeat the fit of the spectrophotometric light curves 50 times by fixing the same parameters ( 0 , , / * , spot positions and radius) to 50 random sets taken from the posterior samples of the broadband light curve fit.We draw the 50 sets from the 68 % of the weighted samples with the highest likelihoods.
We next constrain the temperature of the occulted spot by fitting PHOENIX stellar atmosphere model spectra (Husser et al. 2013) to the spot contrast spectrum resulting from the spectrophotometric light curves fits.Starspots' spectra are thought to be represented by stellar models with lower temperatures and 0.5-1 dex lower log  than the host star.The increased magnetic pressure in active features decreases the gas pressure (Solanki 2003;Bruno et al. 2022), which is akin to a stellar model with a lower surface gravity.In fitting for the spot temperature, we thus also left the spot surface gravity to vary as a free parameter.For the spot, we use stellar model spectra with temperatures from 4000 to 5000 K, logarithmic surface gravities from 1 to 5 dex and a fixed metallicity of 0.1 from Hartman et al. (2011).We interpolated these spectra in temperature and log  as needed during the fitting process.For the star, we use a stellar model spectrum with the temperature, log  and metallicity fixed to the corresponding values from Hartman et al. (2011).The model spot contrast spectrum is then simply the ratio of the model spot spectrum to the model star spectrum.We fit the temperature of the spot and the log  with wide, uninformative priors using the emcee MCMC package (Foreman-Mackey et al. 2013).

Inferred Occulted Starspot Properties on HAT-P-18
From the aforementioned process, we obtained 51 starspot contrast spectra from the 51 spectrophotometric fits performed: one from the highest likelihood values and 50 with random draws from the broadband light curve fit.Each model has its own set of parameters (i.e.,  0 , , / * , spot positions and radius).The resulting density of spot contrast spectra is shown in Fig. 6.By inspecting the individual contrast spectra, we see that 28 of the 50 random fitting models (56 %) have a retrieved contrast spectrum within 1  of the one retrieved with the highest likelihood set of parameters.We lump these solutions together as the "most likely" one (blue model in Figure 6).The mean and standard deviation of the spot -and position, radius, temperature, and the model log  are given in Table 2 for this group of solutions.This corresponds to a spot radius of 0.116 ± 0.014 stellar radii, a temperature Δ = -93 ± 15 K colder than the star (or T spot = 4710 ± 15 K) and a lower log  by 1.16 ± 0.19 dex.The retrieved log  for the starspot model is consistent with Bruno et al. (2022) expectation.
Among the remaining contrast spectra, those not within 1  of the one with the highest likelihood, we can identify three other broad families of solutions corresponding to a colder spot because of a smaller filling factor; see Table 2.The transmission spectra for these four families of solutions do not significantly change (they are consistent at the 1  level) from the transmission spectrum obtained with the spot occultation features masked (as done in Section 3.1).This is expected for a spot of that size and temperature, as the TLSE from occulting such a spot results only in a small slope toward bluer wavelengths in the transmission spectrum, as shown in Figure B5.This   effect, estimated to be of the order of 10 to 20 ppm from Equation 4in Section 4, is smaller than our transit depth uncertainties.

RETRIEVAL ANALYSIS
We now present the inferences from our retrieval analysis of HAT-P-18 b's transmission spectrum with NIRISS/SOSS (from Section 3.1).
Our retrievals jointly model the influence of the planetary atmosphere and unocculted stellar heterogeneities on the transmission spectrum   and thus yield simultaneous constraints.We conducted this analysis with three independent retrieval codes to ensure robust results.
In what follows, we first describe our approach for joint retrievals of a planetary atmosphere and stellar contamination.We then describe the setup of our three retrieval codes before presenting our resulting constraints on HAT-P-18 b's atmosphere and unocculted stellar active regions.Finally, we compare retrieval results from our HST + Spitzer transmission spectrum to those yielded from NIRISS/SOSS.

A Joint Planetary Atmosphere and Stellar Contamination Retrieval Method
The influence of active stellar regions on exoplanet transmission spectra -the transit light source effect (e.g., Rackham et al. 2018) -can be modelled simultaneously with the planetary atmosphere in retrievals.This has the advantage of yielding joint constraints on the planetary atmosphere and unocculted stellar heterogeneities while also propagating uncertainties from the star into the derived atmospheric constraints (see Rackham et al. 2023 Thompson et al. 2023), with the most common being a threeparameter model that fits for the temperature and covering fraction of a single heterogeneity (Pinhas et al. 2018).While these treatments have generally sufficed for Hubble and ground-based data, the exceptional data quality and wavelength coverage of JWST motivates the consideration of more complex stellar contamination models.We investigate a range of unocculted stellar heterogeneity prescriptions during our HAT-P-18 b retrievals.Our goal is to determine the appropriate level of starspot and/or faculae model complexity required to interpret the JWST/NIRISS transmission spectrum of HAT-P-18 b while also informing future JWST retrieval studies of hot Jupiters.We begin by summarizing the equations underlying the transit light source effect.

The Transit Light Source Effect
The transmission spectrum of an exoplanet transiting a star with a heterogeneous stellar surface can be written as (MacDonald & Lewis 2022) where Δ  is the observed transmission spectrum,  , atm is the transmission spectrum from the planetary atmosphere,  , het is the wavelength-dependent "stellar contamination factor" from a heterogeneous stellar surface, and  , night accounts for thermal emission from the planetary nightside (we assume this final term negligibly deviates from unity for HAT-P-18 b).For a planetary atmosphere with properties varying only with altitude (the 1D assumption),  , atm is given by where  p, top is the planetary radius at the top of the modelled atmosphere,  is the ray impact parameter,  * is the stellar radius, and  , slant is the slant optical depth.For a stellar surface with  het unocculted heterogeneous active regions (e.g., spots or faculae), the stellar contamination factor,  , het , can be expressed as where  het,  is the fractional stellar disc coverage of the  th heterogeneous region, with corresponding specific intensity  , het,  , while  , phot is the specific intensity of the stellar photosphere.Equation 3 demonstrates that the stellar contamination factor deviates from unity when a heterogeneity possesses a different intensity from the photosphere.In general,  , het,  is shorthand for  , het,  ([Fe/H]  , log   ,   ), where [Fe/H]  is the local metallicity of the heterogeneity, log   is the local log surface gravity, and   is the heterogeneity temperature.
Most retrieval studies consider a single population of unocculted heterogeneities with a temperature different from that of the photosphere (but with the same metallicity and surface gravity).Given these assumptions, the stellar contamination factor is given by (e.g., Rackham et al. 2018;Pinhas et al. 2018) where the three free parameters are the heterogeneity coverage fraction,  het , the heterogeneity temperature,  het , and the photosphere temperature,  phot .
We also consider here a more general treatment of stellar contamination.Motivated by several strategic gaps identified in the NASA ExoPAG SAG 21 community report (Rackham et al. 2023), we also consider a two-heterogeneity model with the assumption of a common surface gravity relaxed.By defining these heterogeneities as starspots and faculae ( spot <  phot <  fac ), we can write the stellar contamination factor as The addition of a second heterogeneity population and separate surface gravities increases the number of free parameters for this stellar contamination model to eight.

Unocculted Stellar Heterogeneity Models for HAT-P-18 b
We consider five treatments for unocculted stellar heterogeneities when retrieving HAT-P-18 b's transmission spectrum: (i) No stellar contamination.
(ii) One heterogeneity, defined by three free parameters:  het ,  het , and  phot .
(iv) One heterogeneity with free surface gravity, defined by five free parameters:  het ,  het , log  het ,  phot , and log  phot .
(v) Two heterogeneities with free surface gravity, defined by eight free parameters:  spot ,  spot , log  spot ,  fac ,  fac , log  fac ,  phot , and log  phot .

Retrieval Configuration
We applied three exoplanet retrieval codes to HAT-P-18 b's transmission spectrum: Poseidon (MacDonald & Madhusudhan 2017; MacDonald 2023), Aurora (Welbanks & Madhusudhan 2021), and SCARLET (Benneke & Seager 2012;Benneke 2015).We initially conducted independent analyses to establish which atmospheric and stellar properties provide the necessary complexity to describe HAT-P-18 b's transmission spectrum.After comparing our results, we chose a common set of model assumptions for the three codes: chemical opacity from Na, K, H 2 O, CO, CO 2 , CH 4 , HCN, and NH 3 (the most prominent opacity sources expected at HAT-P-18 b's equilibrium temperature under equilibrium chemistry and/or with chemical quenching, see e.g., Madhusudhan et al. 2016); an inhomogeneous grey cloud-deck combined with a power-law haze; and one unocculted stellar heterogeneity (i.e., the one heterogeneity model).Our retrievals were conducted on the  = 100 binned variant of HAT-P-18 b's transmission spectrum with the spot-crossing masked (see Figure 3), but we found consistent results when retrieving the fullresolution NIRISS data.We describe the specific configuration used by each retrieval code below and list the priors used in Table 3.

Poseidon
We conducted a series of Poseidon retrievals to assess the model complexity required to fit HAT-P-18 b's JWST/NIRISS transmission spectrum.This includes the five stellar contamination models listed in Section 4.1.2,nested models to compute detection significances for each chemical species, and additional robustness tests (e.g., a retrieval of the unbinned pixel-resolution spectrum).
Note: All three retrieval codes adopt the one heterogeneity stellar contamination model for this comparison. ref is defined at 10 mbar for Poseidon, 10 −8 bar for Aurora, and is the isothermal temperature for SCARLET.All pressure parameters are expressed in units of bar and temperatures in K.For the priors, we adopt  p = 0.995   ,  * = 4803 K, log  * = 4.57 (cgs),   * = 80 K, and  log  * = 0.04 (cgs).All retrievals have a fixed stellar metallicity of [Fe/H] = 0.1, while those without free surface gravity assume log  = log  * = 4.57 in all regions.'-' indicates that the parameter did not feature in the retrieval.All log parameters are base 10.

SCARLET
We performed retrievals using the SCARLET retrieval framework (Benneke 2015;Benneke et al. 2019a).We improved the SCARLET framework from previous work on low-resolution observations (Benneke et al. 2019a,b;Piaulet et al. 2023) with the addition of nonuniform cloud coverage and of the contribution of stellar contamination to the transmission spectrum.
In the SCARLET retrieval, the atmosphere is parameterized by the abundances of the spectrally-active chemical species of interest (Na, K, H 2 O, CO, CO 2 , CH 4 , HCN, NH 3 ) which are assumed to be wellmixed and constant with pressure, and an isothermal temperature structure.The ratio of He and H 2 , which make up the remainder of the gas, is set to Jupiter's He/H 2 of 0.157.
The retrieval includes three parameters describing aerosols (log  cloud ,  cloud , log  haze ).The cloud parameterization consists of a grey cloud, opaque across all wavelengths, with a cloud top pressure  cloud .Potential fractional cloud coverage is captured by the  cloud parameter and corresponds to a weighted average of a cloud-free and a cloudy model.The contribution of haze to the transmission spectrum via a slope towards shorter wavelengths is parameterized using the  haze parameter, which is a factor that multiplies the Rayleigh contribution to the scattering coefficient (unity corresponds to standard Rayleigh scattering).
Finally, two parameters describe the stellar heterogeneity: Δ spot and  spot .Stellar heterogeneity is implemented following the standard approach (see Section 4.1.1)assuming that the contribution of stellar contamination to the spectrum can be represented by that of spots with a temperature of  spot =  phot + Δ spot (where  phot is the photosphere temperature) covering a fraction  spot of the star.Stellar models are queried from the PHOENIX grid of stellar atmosphere models (Husser et al. 2013).
The forward models calculated by the retrieval have a resolving power of 15,625.We use opacity sampling from cross-section tables at a resolving power of 250,000 to build the opacity tables used, and the opacities are taken from the ExoClimes Simulation Platform database.We choose HITEMP opacities for H 2 O, CO 2 , CH 4 , NH 3 (Rothman et al. 2010), and ExoMol opacities for all other species (Tennyson et al. 2016).We use Nested Sampling to sample the parameter space and its Python implementation in the nestle module (Skilling 2004;Mukherjee et al. 2006;Shaw et al. 2007).

Aurora
We also perform atmospheric retrievals with Aurora (Welbanks & Madhusudhan 2021) to interpret the transmission spectrum of HAT-P-18 b.The application of Aurora to transmission spectra of exoplanets is described in Welbanks & Madhusudhan (2022) following the general setup presented Welbanks & Madhusudhan (2019); Welbanks et al. (2019).We follow the methods of Pinhas et al. (2018) to consider the impact of stellar heterogeneities as implemented in previous works (see e.g., Ahrer et al. 2023).
Aurora computes line-by-line radiative transfer in transmission geometry for a parallel-plane atmosphere assuming hydrostatic equilibrium.Our atmospheric model is computed using a 100 layer pressure grid uniformly spaced in log-pressure between 10 −8 and 100 bar, and a wavelength grid from 0.55 m to 2.9 m, covering this NIRISS observation, at a constant resolution of  = 20,000.We parameterize the P-T profile of the atmosphere using the prescription from Madhusudhan & Seager (2009) (Yurchenko et al. 2011), computed as described in Gandhi & Madhusudhan (2017, 2018) and Welbanks et al. (2019).Following Pinhas et al. (2018), we model the heterogeneous stellar photosphere by interpolating stellar models from the PHOENIX database (Husser et al. 2013).The parameter estimation is performed using the nested sampling algorithm (Skilling 2004) with MultiNest (Feroz et al. 2009) and its python implementation PyMultiNest (Buchner et al. 2014).

An Explanation for HAT-P-18 b's Transmission Spectrum
We first provide an intuitive explanation for HAT-P-18 b's transmission spectrum.Figure 7 shows a spectral decomposition of the maximum likelihood (best-fitting) model transmission spectrum from the Poseidon one heterogeneity retrieval (model (ii) in Section 4.1.2).We detect multiple H 2 O absorption features near 0.95 m, 1.15 m, 1.4 m, 1.9 m, and 2.6 m (with a combined significance of 12.5 ).
We further detect a CO 2 absorption feature near 2.7 m (7.3 ) and infer evidence of Na at lower significance (2.7 ).The small amplitude of the absorption features is due to an optically thick cloud deck which is uniform around the terminator (7.4 ).The inference of non-patchy clouds is driven by the wing shape of molecular bands (MacDonald & Madhusudhan 2017) -especially the H 2 O band centred near 1.4 m -and the flat continuum from 1.6-1.8m.Finally, the spectral slope shortwards of 1.65 m is caused by the combination of unocculted starspots (5.8 ) and the uniform cloud deck.The quoted detection significances are from Bayesian model comparisons between the reference Poseidon one heterogeneity model and another retrieval with one model component removed (e.g., Benneke & Seager 2013).We do not identify statistically significant evidence for K, CO, CH 4 , HCN, NH 3 , nor a scattering haze.We note that these conclusions are independently retrieved by the SCARLET and Aurora retrievals, which produce similar best-fitting spectra (see Section 4.4).Finally, we determine below that a single high data point near 1.1 m, not fit by our retrieval models, is attributable to metastable helium.We verified that removing this data point does not alter the retrieval results.

Evidence of Helium Absorption
We assessed evidence of helium absorption in HAT-P-18 b's atmosphere via an independent fit to the pixel-resolution transmission spectrum near 1.083 m. Figure 8 shows that a Gaussian fit, with free amplitude and width, favours an excess absorption of 0.13 ± 0.03 % (4.3 ) centred around the near-infrared helium triplet at 1.083 m.
Our result is consistent within 1  to Fu et al. (2022), Vissapragada et al. (2022), andParagas et al. (2021).Since the helium line profile is unresolved with NIRISS/SOSS, even at pixel resolution, we do not conduct more complex modelling of the helium line.Consequently, we cannot unambiguously attribute this signature to the planetary atmosphere nor disentangle the layers probed by the helium triplet.Nonetheless, we confirm that HAT-P-18 b exhibits a longer transit duration with more absorption post transit -as also observed by Fu et al. (2022) -which could indicate an extended atmosphere with the presence of a cometary-like tail as exhibited by WASP-107 b (Allart et al. 2019).Future observations of HAT-P-18 b at higher spectral resolution can confirm the helium absorption.A confirmation would indicate that HAT-P-18 b is losing its atmosphere and is undergoing significant mass loss.We estimated the intrinsic excess absorption arising from the resolved helium triplet to be 3.17 ± 0.67 % (Figure 8).This value was derived by fitting a Gaussian with a fixed width of 1 Å (the typical width of the helium triplet; e.g., Allart et al. 2018Allart et al. , 2019;;Nortmann et al. 2018;Salz et al. 2018) and a free intrinsic amplitude, which was then convolved to the pixel-resolution NIRISS/SOSS data.Such a strong helium signature would be readily detectable with ground-based high-resolution spectrographs, which would provide complementary observations to study the unique processes shaping HAT-P-18 b's upper atmosphere.

The Atmosphere of HAT-P-18 b
Our retrieved atmospheric properties for HAT-P-18 b are summarized in Figure 9.All three retrieval codes obtain retrieved transmission spectra consistent with the picture presented previously: HAT-P-18 b's observed spectrum is shaped by absorption from H 2 O and CO 2 (and likely Na), unocculted starspots, and a cloud deck.The posterior distributions in Figure 9, and corresponding confidence regions in Table 4, provide our quantitative constraints on HAT-P-18 b's atmospheric properties.
We also obtain robust upper limits on the abundances of several important chemical species.Our retrievals strongly disfavour the presence of K (log  K ≲ −10 to 2 ), which could be due to condensation out of the gas phase.Similarly, we establish that CH 4 is at least 100× less abundant than the expectation for a solar-composition atmosphere in chemical equilibrium (log  CH 4 ≲ −6 to 2  vs. the expected log  CH 4 ∼ −4; Woitke et al. 2018), which could indicate photochemical dissociation (e.g., Moses et al. 2011;Venot et al. 2012).Although we see a tentative hint of CO in our posteriors (see Figure 9), additional observations at longer wavelengths, covering the 4.6 m CO band, would be required to robustly constrain the CO abundance (and hence the atmospheric C/O ratio).We also find upper limits on HCN and NH 3 (log  HCN ≲ −5; log  NH 3 ≲ −6), which are consistent with both equilibrium and disequilibrium ex-  For each of the three retrieval codes (Poseidon; purple, SCARLET; orange, and Aurora; blue), the median retrieved spectrum (solid lines) and 1  confidence intervals (shaded contours) are shown.The most important model features required to explain HAT-P-18 b's NIRISS transmission spectrum are annotated.All three codes adopt the one heterogeneity model for this comparison.Bottom: Posterior probability distributions corresponding to the retrieval model in the top panel.The top row shows retrieved atmospheric properties with constrained values, the middle row shows non-detected chemical species with abundance upper limits, and the bottom row shows the retrieved unocculted starspot properties.Each histogram is annotated with the retrieved median and ± 1  confidence intervals from the Poseidon retrieval for reference (see Table 4 for the results from all three codes).
pectations for hot giant planets (e.g., Moses et al. 2011;Venot et al. 2012).These upper limits underline that SOSS data is of sufficiently high quality to provide scientifically meaningful constraints even on the abundances of non-detected chemical species.

Unocculted Stellar Heterogeneities
We next turn to constraints on unocculted stellar heterogeneities.We first present results under the assumption of a single unocculted heterogeneity population with fixed log  -a common approach in the literature -before relaxing this assumption by considering more complex treatments of the TLSE.Finally, we examine the impact of different unocculted stellar heterogeneity models on the retrieved atmospheric properties of HAT-P-18 b.

More Complex Stellar Contamination Models
We find that the best-fitting stellar contamination model for HAT-P-18 b is two distinct populations of unocculted heterogeneitiesspots and faculae -with different surface gravities.We established this via a series of Bayesian model comparisons between the five stellar contamination models listed in Section 4.1.2.Our results, summarized in Table 5, demonstrate a significant improvement in the Bayesian evidence for the two heterogeneities with free surface gravity model compared to the fiducial one heterogeneity model we have focused on thus far (a Bayes factor of 77.5, equivalent to 3.4 ).
We also find that the preference for unocculted heterogeneities vs. no stellar contamination is greater for the two heterogeneities with

Spots Faculae
(free log g) Figure 10.Unocculted stellar heterogeneity properties from the Poseidon retrieval analysis.Posterior distributions from four retrieval models are shown, from top to bottom: (i) a single population of stellar heterogeneities (spots) with log  spot = log  phot = 4.57, as in Figure 9 (red); (ii) two distinct populations of spots and faculae, all with log  = 4.57 (orange); (iii) same as the spot model, but with the log  of the spots and photosphere as free parameters (gold); and (iv) same as the spot + faculae model, but with free log  for all three stellar components (green).The histograms are ordered such that the columns (within each row) have the same physical interpretation between the different retrievals (e.g.,  het from the one heterogeneity models corresponds to  spot in the two heterogeneities models).The retrieved atmospheric properties for HAT-P-18 b are consistent across all four unocculted stellar heterogeneity models.
free surface gravity model compared to the one heterogeneity model (6.5  vs. 5.8 ).Nevertheless, we show below that adopting the preferred stellar contamination model does not significantly alter the atmospheric constraints for HAT-P-18 b presented in Figure 9.
We show in Figure 10 that the choice of stellar contamination model can significantly alter inferences about unocculted stellar active regions.The most significant change is that retrievals with free log  favour much cooler spots (Δ ≈ −800 K vs. ≈ −300 K) than those with log  spot = log  phot .Indeed, the retrieved spot surface gravity is lower than the photosphere by > 2 .We similarly find that fitting for log  fac allows a more accurate determination of the faculae temperature compared to fixing the surface gravities.Overall, our preferred solution for HAT-P-18's unocculted active regions is as follows: spots and faculae cover  spot = 12 +5 −3 % and  fac > 4 % (2  lower limit) of HAT-P-18's surface, with temperatures of  spot = 4045 +162 −116 K and  fac = 4952 +62 −64 K and surface gravities of log  spot < 4.17 (2  upper limit) and log  fac > 4.45 (2  lower limit).These compare to a background photosphere with  phot = 4840 +50 −51 K and log  phot = 4.56 +0.03 −0.03 .These results demon-strate that more complete stellar contamination models can provide deeper insights into the nature of unocculted stellar active regions.

Sensitivity of Atmospheric Properties to Stellar Contamination Models
Finally, we find that, provided the spectroscopic imprint of unocculted heterogeneities is considered in the atmospheric model, the adopted stellar contamination model only minimally impacts the retrieved atmospheric properties for this data.Figure 11 shows that our preferred stellar contamination model (two heterogeneities with free surface gravity) produces retrieved temperatures, cloud pressures, and H 2 O and CO 2 abundances consistent with the simpler spotonly model with fixed surface gravity (one heterogeneity).However, a retrieval not including stellar contamination would attribute the spectral slope (caused by spots and a cloud deck) to an atmospheric haze and erroneously retrieve a H 2 O abundance 1-2  higher than when spots are included.Given that we do not see evidence of a haze when including a stellar heterogeneity model, we conclude that our NIRISS/SOSS data have sufficient wavelength coverage and preci- Note: We do not list results for unconstrained parameters (e.g., the other P-T profile parameters, since the retrieved profiles are essentially isothermal).'<' and '>' represent 2  upper and lower limits, respectively.sion to lift the degeneracy between an atmospheric haze power law and unocculted starspots.This mirrors a conclusion from Rackham et al. (2023), where they demonstrated that haze-spot degeneracies can be lifted with sufficiently high-quality transmission spectra (see their Figure 22).Therefore, while the finer details of the stellar contamination model are less crucial if one only wishes to constrain the planetary atmosphere, a simple stellar contamination model (e.g., one heterogeneity, as considered in studies such as Pinhas et al. 2018 andRathcke et al. 2021) is vital to obtain reliable atmospheric inferences for exoplanets transiting active stars.

A Comparison between HST + Spitzer and JWST
We additionally ran retrievals on our newly reduced HST and Spitzer observations to contextualize our JWST results.We consider two Poseidon retrievals, a model without stellar contamination and with a single unocculted stellar heterogeneity, adopting the same retrieval configuration and priors as used in the JWST analysis (see section 4.2 and Table 3).The only difference compared to our previous approach is an altered model wavelength grid (0.9-5.3 m at  = 20,000) to cover the Spitzer data at longer infrared wavelengths.Figure 12 compares our retrieved spectrum and atmospheric constraints from HST and Spitzer to our previously presented JWST results.We focus on the abundances of H 2 O, CO 2 , and the cloud pressure for this comparison since these are the most well-constrained model parameters from our JWST/NIRISS data (see section 4.4).Both retrieval models infer an attenuated absorption feature in the HST/WFC3 data, suggesting a degree of cloud opacity (log  cloud ≲ −2 to 1).The model without stellar contamination attributes the absorption near 1.1 and 1.4 m to H 2 O, with an abundance constraint at the order-of-magnitude level (log  H 2 O = −2.43+0.87  −1.05 ).However, the unocculted starspot model, which obtains an improved fit to the Spitzer data, finds the HST/WFC3 data can be explained by either H 2 O absorption or CH 4 in combination with starspots.This three-way degeneracy between  het , log  H 2 O , and log  CH 4caused by the lack of data shortwards of 1.0 m -removes the H 2 O detection one could claim under the assumption of no stellar contamination.Therefore, HAT-P-18 b's H 2 O abundance is essentially unconstrained from the HST and Spitzer observations.In contrast, our JWST/NIRISS observation has the wavelength coverage to break the degeneracies between unocculted starspots and the atmospheric composition, affording detailed characterization of HAT-P-18 b's atmosphere.9) are overlaid, and the retrieved median and ± 1  parameter constraints annotated, demonstrating that NIRISS/SOSS data provides significantly improved constraints on the atmospheric composition and cloud pressure compared to Hubble and Spitzer data.

SUMMARY & DISCUSSION
This work has led to important inferences on both the stellar activity of HAT-P-18 and the atmosphere of its hot Saturn.Our main results are as follows: • The transit light curves show evidence of a spot-crossing event (> 5 ); we inferred that the most likely parameters for that occulted spot are a position on the projected stellar surface of (x, y) = (0.090 ± 0.005, 0.42 ± 0.05) R * , with a radius of 0.116 ± 0.014 R * , and a temperature colder than the star of Δ = -93 ± 15 K.
• The main features in the transmission spectrum retrieved from NIRISS/SOSS observation are multiple absorption features produced by water vapour (12.5 ) with a retrieved abundance of log H 2 O ≈ −4.4 ± 0.3, a rise at redder wavelengths due to a CO 2 absorption feature (7.3 ) and evidence of Na (2.7 ).Also, there is a slope towards bluer wavelengths caused by unocculted starspots (5.8 ) and a uniform, grey cloud deck (7.4 ) that mutes some absorption features.
• Modelling stellar heterogeneities led to four slightly different solutions for the occulted spot, and we showed that different treatments of the unocculted active regions could be used.Fortunately, we found that the different solutions for the active regions had no significant impact on the retrieved transmission spectrum and atmospheric properties of HAT-P-18 b.
• Modelling spot spectra is best achieved with stellar models with lower surface gravities than the stellar photosphere.For the most likely solution of the occulted spot, that is a Δ log  = 1.16 ± 0.19 dex, which is in perfect agreement with our solutions for the unocculted spots with free log .
We proceed to discuss the implications of our results and their potential impact on exoplanetary atmosphere studies in transmission spectroscopy.

The Atmosphere of HAT-P-18 b in Context
The best-fitting atmosphere model confirms the presence of H 2 O and clouds in the terminator region of HAT-P-18 b, which is in agreement with previous work (Tsiaras et al. 2018).As in Fu et al. (2022), our transmission spectrum shows a slope towards bluer wavelengths; however, to explain it, instead of unocculted stellar heterogeneities, these authors inferred Rayleigh haze scattering, as also suggested by Kirk et al. (2017) from a ground-based observation.The TLSE, which was not taken into consideration in those studies, is likely responsible for that difference, as well as the 10× higher water abundance in Fu et al. (2022).We also find 10× less CO 2 than was reported by Fu et al. (2022), and we furthermore cannot confirm their detection of CH 4 ; both differences could come from the data reduction and modelling differences.
We show in Figure 13 our retrieved chemical abundances in comparison to an equilibrium chemistry model from FastChem 2 (Stock et al. 2022) at 600 K with a metallicity of 0.1 dex and a solar C/O ratio.We selected 10× sub-solar metallicity for this reference equilibrium model based on our retrieved sub-solar H 2 O abundance (which is our most precisely constrained abundance), allowing us to compare the other retrieved molecular abundances to equilibrium expectations.The retrieved Na abundance is consistent with this equilibrium model, as are the non-detections of CO, HCN and NH 3 .On the other hand, our retrieved abundance of CO 2 is significantly more abundant, whereas CH 4 and K are depleted by at least two orders of magnitude.We also computed equilibrium chemistry models with different C/O ratios, but the abundances did not change significantly.The differences between our free-retrieval results and the expectations from equilibrium chemistry may be indicative of either 1) non-equilibrium effects, like photochemistry for CH 4 and clouds for K or 2) a demonstration of the sensitivity of free-retrievals to individual points which could be driving an anomalously high CO 2 abundance.Methane, which would be expected at the temperature of HAT-P-18 b, was also not detected in the atmosphere of the slightly warmer hot-Saturn WASP-39 b (Feinstein et al. 2023).The C/O ratio is still uncertain for HAT-P-18 b's atmosphere and the need for longer wavelength observations with other instruments, such as NIRSpec G395H, are required to measure all the main carbon-bearing species and further investigate possible CH 4 signatures in the infrared.
Our detection of CO 2 may be spurious, and the abundance estimate unreliable.An edge effect in the NIRISS observation may be the culprit, as the CO 2 band is right at the red end of the SOSS spectrum, even exceeding it slightly.Longer wavelength data with the Near Infrared Spectrograph (NIRSpec) G395H grating would allow the detection of CO 2 to be confirmed and its abundance refined.
Our atmospheric analyses consistently find a uniform cloud deck as the only aerosol required to explain HAT-P-18 b's NIRISS/SOSS transmission spectrum.Atmospheric studies of hot Jupiters often predict a degree of patchy clouds in hot Jupiter atmospheres (e.g., Parmentier et al. 2016;Komacek et al. 2022).However, despite allowing for patchy clouds, our retrievals all favor terminator cloud fractions > 83% (to 2).Future general circulation model studies focused on HAT-P-18 b can further investigate whether uniform cloud coverage is more favoured for this Saturn-mass planet with a colder equilibrium temperature (∼ 850 K) than many hot Jupiters.Additional observations in the infrared will also be informative.

Legacy of Hubble & Spitzer in Light of JWST Results
For our HST and Spitzer transmission spectrum, the model without stellar contamination finds an abundance of H 2 O (log H 2 O = −2.43+0.87 −1.05 ) and a cloud pressure (log  cloud ≲ −2 ≈ 10 mbar) in good agreement with the findings of Tsiaras et al. (2018) (log H 2 O = -2.63 ± 1.18 and log  cloud = -2.18± 0.91 ≈ 6.6 mbar).However, we showed that a model considering the stellar contamination with a single heterogeneity removes the H 2 O detection because of a three-way degeneracy between the coverage fraction of heterogeneities and the abundances of H 2 O and CH 4 due to the lack of the visible spectrum with HST/WFC3.
The impact of spots on optical transmission spectra has been known for some time to mimic a Rayleigh scattering slope (e.g., McCullough et al. 2014;Rackham et al. 2018).Also, for exoplanets orbiting red dwarf stars, the 1.4 m water absorption feature observed with HST/WFC3 was sometimes debated to come from potential water molecules in their cold starspots (e.g., Barclay et al. 2021).Now, with NIRISS/SOSS, we can break the degeneracy between a hazy atmosphere and starspots, and potentially for colder stars, we could confirm if water absorption features come from starspots or the exoplanetary atmosphere.
HST and Spitzer inference studies have shown that water vapour, alkali metals, as well as clouds are common in hot giant planets (e.g., Madhusudhan 2019).Still, there often remained a degeneracy between water and clouds, preventing robustly retrieving the abundance of H 2 O and the location of the cloud deck (e.g., Benneke & Seager 2012, 2013).In this work, we confirmed these detections of water vapour, clouds and potentially sodium in a hot giant atmosphere, and we demonstrate that the improved wavelength coverage provided by NIRISS/SOSS is sufficient to break the cloud-metallicity degeneracy.

Challenges of Occulted Spot Analysis in the Era of JWST
Our work adds to the growing body of literature demonstrating that the inference of the transit and occulted spots properties with JWST observations can and should be done to avoid possible biases in atmospheric estimates (e.g., Bixel et al. 2019;Rackham et al. 2023).A semi-analytic tool like spotrod (Béky et al. 2014), used here, is less computationally expensive compared to other tools that use a pixelation approach to model the stellar disc.Parallelization of those tools could be a simple solution to further improve the computation time.
Moreover, we found that modelling active features from light curves, with the current tools and knowledge, is a degenerate problem.The standard practice of fixing the orbital parameters using a white light curve fit, which also fixes the occulted spot size and position, may prevent ruling out some families of solutions.Therefore, we are currently developing a package to simultaneously fit all spectral channels using both wavelength-dependent and wavelengthindependent parameters.Still, theoretical advances are needed to understand the limits of inferences from occultations and break the degeneracy between starspot size and temperature.Fortunately, those degeneracies do not impact the retrieved transmission spectrum significantly.Furthermore, spotrod assumes that active features have a circular shape and the same limb-darkening law as the star.This can also impact the retrieved properties and make it more difficult to lift some degeneracies.The occulted feature may, moreover, have a more complex structure consisting of a cool spot and a hot facula, which could explain the smaller temperature difference retrieved for the occulted spot compared to the unocculted spots.Nonetheless, the retrieved stellar fraction of the occulted spot  oc.spot = 1.35 ± 0.02 % is coherent with our preferred solution for the coverage fraction of unocculted spots  unoc.spot = 12 +5 −3 %.

Accounting for Unocculted Heterogeneities in JWST Transmission Spectra
Our best-fitting model suggests that the slope towards bluer wavelengths in this NIRISS/SOSS observation comes mainly from unocculted stellar heterogeneities.Our work shows that we should constrain planetary atmosphere and stellar contamination properties simultaneously with JWST NIRISS/SOSS data; otherwise, there is a risk of biasing atmospheric inferences.However, we could not distinguish clearly which of the four stellar contamination treatments investigated (one and two heterogeneities, one and two heterogeneities with free surface gravity) is the most plausible, but we were able to show that a particular choice of treatment does not significantly impact the atmospheric inferences.Furthermore, the spectra of active regions are currently modelled with 1D stellar models, which is another limitation, particularly for faculae (Norris et al. 2017;Johnson et al. 2021), because they fail to capture aspects of facular contrasts, such as the center-to-limb variation (e.g., Yeo et al. 2013) and the dependence on stellar metallicity (Witzke et al. 2018).More theoretical work, such as magnetohydrodynamic simulations of magnetic features, is needed to understand better and, thus, model the contrast of active regions more accurately as a function of stellar fundamental parameters and activity.In the meantime, we suggest that spots' flux spectra be modelled with stellar models with lower surface gravities than the star.

CONCLUSION
We presented atmospheric retrieval analyses of two transmission spectra of HAT-P-18 b obtained with JWST NIRISS/SOSS, and HST/WFC3 with Spitzer/IRAC.We also modelled a spot-crossing event in the transit observed with NIRISS/SOSS.We confirmed that the wavelength coverage and spectral resolution of NIRISS/SOSS considerably improve the detection significance and the abundance constraints on chemical species compared to observations from HST and Spitzer combined.In including stellar heterogeneities in transit fits and atmospheric retrievals, we implemented new model considerations designed to fit the local surface gravity of stellar heterogeneities.This work is instructive for the upcoming JWST transit observations, particularly by informing the community on disentangling stellar and planetary atmosphere signals.

Figure B5
. Stellar contamination on the transmission spectrum due to the occulted spot.The contamination factor (blue) represents the squared ratio of the planet's apparent radius R p over the planet's true radius R p,0 .The former corresponds to a transit depth computed with a spot-crossing masked and the latter with a spot-crossing modelled.The theoretical factor (red) was derived from Eq. 4.

Figure 1 .
Figure 1.Data products at different stages of the reduction process.(A): A raw, uncalibrated data frame in data numbers (DN).(B): Data frame after superbias subtraction and reference pixel correction.(C): 1/  noise.(D): After ramp fitting and flat field correction.(E): Final calibrated data product after background subtraction and bad pixel correction.

Figure 3 .
Figure3.Transmission spectra of HAT-P-18 b; one with JWST NIRISS/SOSS at pixel resolution (faded grey) and binned to a resolving power of  = 100 (blue) and another with HST/WFC3 (red).Note that no offset was applied to the HST spectrum.

Figure 4 .
Figure 4. Modelling of the spot-crossing event.Top: NIRISS/SOSS broadband light curve (blue), along with the best-fitting model overplotted (black) and the fit statistics listed in the bottom left corner ( 2  ; the reduced Chisquared, ; the average error bar, and e; the error multiple needed to obtain a  2  equal to unity).Middle: Residuals to the transit fit with the RMS scatter.Bottom: Physical representation of the maximum likelihood solution for the occulted starspot (black circle) on the star (yellow circle), along with the transit motion in black (dashed lines representing the transit chord) of the planet (orange circle).

Figure 5 .Figure 6 .
Figure 5. JWST NIRISS/SOSS binned spectrophotometric light curves, along with the best-fitting transit models using spotrod (black).Left: Normalized spectrophotometric light curves at a resolving power of  = 100.Right: Associated residuals from the light curve fit in each bin with the RMS scatter indicated.

Figure 8 .
Figure 8. Pixel-resolution transmission spectrum of HAT-P-18 b around the 1.083 m helium triplet.We model the helium triplet as a Gaussian convolved to the resolution of NIRISS/SOSS and fitted to the data.The best-fit is shown in blue, and the unconvolved Gaussian in green.The central wavelength of the metastable helium triplet is indicated as the vertical dashed line.

Figure 9 .
Figure 9. Atmospheric and stellar retrieval results for HAT-P-18 b.Top: Retrieved model transmission spectra from our JWST/NIRISS spectra (black errors).For each of the three retrieval codes (Poseidon; purple, SCARLET; orange, and Aurora; blue), the median retrieved spectrum (solid lines) and 1  confidence intervals (shaded contours) are shown.The most important model features required to explain HAT-P-18 b's NIRISS transmission spectrum are annotated.All three codes adopt the one heterogeneity model for this comparison.Bottom: Posterior probability distributions corresponding to the retrieval model in the top panel.The top row shows retrieved atmospheric properties with constrained values, the middle row shows non-detected chemical species with abundance upper limits, and the bottom row shows the retrieved unocculted starspot properties.Each histogram is annotated with the retrieved median and ± 1  confidence intervals from the Poseidon retrieval for reference (see Table4for the results from all three codes).

Figure 13 .
Figure 13.Comparison between an equilibrium chemistry model (coloured curves) and the retrieved mixing ratios from the Poseidon one heterogeneity model (error bars and arrows).The error bars correspond to 1 constraints for the inferred chemical species (H 2 O, CO 2 , and Na), whilst the arrows correspond to 2 upper limits for the non-detected species.The FastChem 2 equilibrium model shown assumes a solar C/O ratio and 10× sub-solar metallicity (chosen based on the retrieved H 2 O mixing ratio) and a temperature consistent with the retrieved terminator temperature (600 K).

Figure B1 .
Figure B1.HST/WFC3 binned spectrophotometric light curves, along with the best-fitting transit models overplotted (black) for both visits.Light curves are arbitrarily offset for clarity.Right: Residuals from the different light curve fits with the associated root-mean-square (RMS) scatter.

Figure B2 .
Figure B2.Spitzer/IRAC light curves.Top: Best-fitting transit models (black), overlaid with the systematics-corrected data (red circles), showing channel 1 (3.6 m) and channel 2 (4.5 m) in the left and right panel, respectively.Bottom: Residuals from the light curve fits.

Figure B3 .
Figure B3.Exposure in the NIRISS/SOSS F277W filter (2.5 ≤  ≤ 2.85), highlighting the positions of undispersed contaminants.The extraction apertures for orders 1 and 2 are denoted in red and blue respectively.Regions of each order that are masked due to the presence of a contaminant are denoted by the black dashed lines (see text for exact wavelength ranges that are masked).

Figure B4 .
Figure B4.Posterior probability distributions from the NIRISS/SOSS broadband light curve fit with spotrod.This corresponds to the joint fit of the orbital and starspot parameters.

Table 1 .
Parameters of the HAT-P-18 planetary system used in this analysis Hartman et al. 2011omHartman et al. 2011

Table 2 .
Families of solutions for the occulted starspot

Table 3 .
Retrieval parameters and priors.

Table 4 .
Retrieval results for the one heterogeneity model.

Table 5 .
Benneke & Seager 2013)erogeneity Bayesian model comparison.Note: ln Z i is the Bayesian evidence of the i th model.B ref,i is the Bayes factor of the reference model with respect to the i th model."Det.Sig."indicates the detection significance, expressed in equivalent "" (e.g.,Benneke & Seager 2013), for the reference model, highlighted in bold, vs. the i th model.
Impact of unocculted stellar contamination model on retrieved atmospheric properties.Posterior distributions are shown from three Poseidon retrieval models: no stellar contamination (black), one heterogeneity (red), and two heterogeneities with free surface gravity (green).The one heterogeneity ("spots") model corresponds to the results in Figure9.Without considering stellar contamination, one would retrieve a H 2 O abundance biased 1-2  higher and find no cloud deck.The CO 2 abundance and atmospheric temperature are less sensitive to the stellar contamination model.
Atmospheric retrieval of Hubble and Spitzer transmission spectra of HAT-P-18 b.Left: Retrieved spectrum assuming no stellar contamination (black) and with a single heterogeneity (red), shown via their median (solid lines) and 1  confidence intervals.Right: Corresponding posterior probability distributions.The JWST NIRISS/SOSS retrieval results from the single heterogeneity Poseidon retrieval (see Figure