Exoplanet Transit Spectroscopy with JWST NIRSpec: Diagnostics and Homogeneous Case Study of WASP-39 b

The JWST has ushered in a new era of exoplanet transit spectroscopy. Among the JWST instruments, the Near-Infrared Spectrograph (NIRSpec) has the most extensive set of configurations for exoplanet time series observations. The NIRSpec Prism and G395H grating represent two extremes in NIRSpec instrument modes, with the Prism spanning a wider spectral range (0.6-5.3 $\mu$m at lower resolution (R$\sim$100) compared to G395H (2.87-5.14 $\mu$m; R$\sim$2700). In this work, we develop a new data reduction framework, JexoPipe, to conduct a homogeneous assessment of the two NIRSpec modes for exoplanet spectroscopy. We use observations of the hot Saturn WASP-39 b obtained as part of the JWST Transiting Exoplanets ERS program to assess the spectral quality and stability between the two instrument modes at different epochs. We explore the noise sources, effect of saturation, and offsets in transmission spectra between the different instrument modes and also between the two G395H NRS detectors. We find an inter-detector offset in G395H of $\sim$ 40-50 ppm, consistent with recent studies. We find evidence for correlated noise in the Prism white light curve. We find the G395H spectrum to be of higher precision compared to the Prism at the same resolution. We also compare the JexoPipe spectra with those reported from other pipelines. Our work underscores the need for robust assessment of instrument performance and identification of optimal practices for JWST data reduction and analyses.


INTRODUCTION
Exoplanetary atmospheres are key to understanding the physical conditions, chemical composition and origins of exoplanets.Transmission spectroscopy (Seager & Sasselov 2000;Brown 2001) has emerged over the past two decades as the most widely applied technique used to obtain exoplanet spectra.Since the first detection of sodium in the upper atmosphere of the hot Jupiter HD 209458 b (Charbonneau et al. 2002), transmission spectroscopy has progressed with dozens of exoplanet atmospheres probed, and remains an extremely promising method of planetary remote sensing (Madhusudhan 2019).In particular over the last decade the Hubble Wide Field Camera 3 (WFC3) near-infrared (NIR) instrument G141 grism (1.1-1.7 µm) has delivered high precision detections of H 2 O in numerous hot Jupiters (e.g., Deming et al. 2013;McCullough et al. 2014;Sing et al. 2016) as well as in the sub-Neptune K2-18 b (Benneke et al. 2019;Tsiaras et al. 2019).
Transmission spectroscopy depends on wavelength-dependent absorption and scattering of stellar light by atmospheric atomic and molecular species as the planet transits in front of its host star.During the transit, a proportion of the star's light is effectively blocked in the line-of-sight by the atmosphere which can be represented as an opaque annulus which adds an apparent extra height to the bulk ★ E-mail: subhajit.sarkar@astro.cf.ac.uk (SS) radius of the planet.Due to the wavelength dependence of the various opacities, the transit depth varies with wavelength and thus the apparent (  /  ) 2 (where   is the planet radius and   is the star radius).This technique probes the high altitude atmosphere at the planet day-night terminator.In addition, atmospheric modelling and spectral retrieval methods have progressed to allow increasingly more sophisticated interpretation of these spectra (Madhusudhan & Seager 2009;Madhusudhan 2018).Transmission spectroscopy is particularly powerful in the visible and near-infrared wavelength ranges, where the host star flux is maximal (minimising fractional photon noise), and where there are numerous atomic and molecular spectral signatures of molecules expected in planetary atmospheres.
While only MIRI provides substantial wavelength coverage in the mid-infrared (MIR) beyond 5 µm, there is a choice of three instruments in the near-infared (NIR) range: ∼ 1-5 µm.In addition, each of these NIR instruments have different configurations, i.e. combinations of dispersion element, filter, detector subarray and readout pattern, that further expands the choices presented to an observer.While the choice of configuration may at least partly depend on wavelength coverage, there are overlapping wavelength ranges between the different NIR instrument configurations: see Fig. 1 in Sarkar & Madhusudhan (2021).One of the key questions in this early stage of JWST operations is the relative performance of the different instruments and their configurations particularly in regions of overlapping wavelength and also the optimal data reduction methods to process each configuration.
Of the available NIR instruments, NIRSpec presents the largest number of possible configurations with 7 different dispersive elements available.Two of these elements: the Prism and the G395H grating, present two extremes.The Prism mode gives the broadest wavelength coverage in a single pass (0.6-5.3 µm) with high pixel count rates, but has the lowest spectral resolving power (R) of ∼ 100, with the spectrum concentrated within a 512 pixel-wide detector subarray on the NRS1 detector (the SUB512 subarray which is 32 × 512 pixels in size) which causes it to saturate more easily than other modes.In contrast G395H has a high R of ∼ 2700 but covers about half the wavelength range of Prism (2.87-5.14µm), with the spectrum dispersed widely over the two NRS (2048 × 2048 pixel) detectors (using the SUB2048 subarray which is 32 × 2048 pixels in size over each of the two detectors).
In this paper we aim to compare these two key NIRSpec modes and uncover how similar the final results are between the two.We do this by performing a homogenized analysis of two observations taken of the hot Saturn mass planet, WASP-39 b, obtained as part of the JWST Transiting Exoplanet Community ERS program 1366 (PI: N. Batalha), one with NIRSpec Prism and the other with G395H.
To facilitate this work we developed an end-to-end data processing framework, JexoPipe, which takes the raw uncalibrated files and produces final transmission spectrum.We minimise differences in processing between the two configurations, allowing us to better control and assess the causes of any differences between the final spectra in their overlapped regions.We further examine JexoPipe against previously developed pipelines by comparing the final spectra obtained.The ERS NIRSpec Prism observation of WASP-39b was originally presented in JWST Transiting Exoplanet Community Early Release Science Team et al. (2023) and Rustamkulov et al. (2023) and the G395H observation in Alderson et al. (2023).
In the following we first describe WASP-39 b and summarise previous results in Section 2 and then review NIRSpec and its performance in Section 3. We describe JexoPipe in Section 4. In Section 5, we present the final spectra obtained and compare these to those from other pipelines.In Section 6 we discuss any differences in the spectra obtained from the different NIRSpec modes.Section 7 describes application of an atmospheric forward model to the data.We summarise our findings and conclusions in Section 8.

WASP-39 B
WASP-39 b is a highly inflated Saturn-mass planet discovered in 2011 (Faedi et al. 2011) transiting a G8 star ([Fe/H] = -0.12)located at a distance of 213.982 pc1 .It has an equilibrium temperature of 1166 K with an orbital period of 4.0552941 days (Mancini et al. 2018).With a radius of 1.279  J and mass of 0.281  J , its bulk density is roughly four times less than of Saturn at 0.167 g/cm 3 (Mancini et al. 2018).
Prior to JWST, NIR transmission spectra had been obtained with Hubble WFC3 Wakeford et al. (2018), STIS andSpitzer IRAC Fischer et al. (2016); Sing et al. (2016).Ground-based facilities have also contributed with a UV-optical transmission spectrum from VLT-FORS2 Nikolov et al. (2016) and UV-optical multi-band photometry (Ricci et al. 2015).These spectra indicated a cloud-free atmosphere with the detection of sodium and potassium (Fischer et al. 2016;Sing et al. 2016;Nikolov et al. 2016).Water vapour was also detected with an estimated metallicity of 151 +48 −46 × solar (Wakeford et al. 2018).A suite of JWST ERS NIR observations have been performed on WASP-39 b.Using NIRISS SOSS (0.6-2.8 µm) (Feinstein et al. 2023) reported a super-solar metallicity with results varying from 10-30 × solar dependent on the atmospheric models applied, a sub-solar C/O ratio and super-solar K/O ratio.Using the G395H grating Alderson et al. (2023) reported a somewhat lower metallicity of 3-10 × solar, with a sub-solar to solar C/O ratio.They also report detection of CO 2 , CO, H 2 O and SO 2 .Rustamkulov et al. (2023) and JWST Transiting Exoplanet Community Early Release Science Team et al. (2023) reported results using the Prism configuration (0.6-5 microns), with detection of CO 2 , CO, H 2 O SO 2 and Na.The best-fitting model corresponded super-solar metallicity and super-solar C/O ratio with moderate cloud opacity.The SO 2 feature has also been discussed in Tsai et al. (2023) who reported that the feature could be explained by the photochemical breakdown of H 2 S. Using NIRCam F322W2 (2-4 µm), Ahrer et al. (2023) reported detection of water vapour and that best-fit chemical equilibrium models favoured a metallicity of 1-100 × solar with a substellar C/O ratio.

NIRSPEC
NIRSpec is a complex instrument with multiple configurations which cover (in total) the wavelength range from 0.6-5.3µm.Due to the wide wavelength coverage in the NIR (covering the spectral features of key atmospheric molecules at high SNR as well as the Rayleigh scattering slope) and the many choices of configuration (giving flexibity for different targets and brightnesses), NIRSpec will be one of the principal instruments used in JWST transmission spectroscopy.It incorporates an integral field unit (IFU) and a micro-shutter array (MSA) for multi-object spectroscopy and five slits (apertures) for individual spectroscopy.The observing template for exoplanet timeseries is the bright object time-series (BOTS) mode.This utilises the large aperture S1600A (1600 x 1600 mas 2 ) minimizing slit losses, in combination with up to nine different disperser/filter combinations.
There are three high resolution (R ∼ 2700), each with an associated filter, and three medium resolution (R∼1000) gratings, where G140H and G140M have two possible filters, and the others have one filter.This gives eight possible grating/filter configurations.In addition, the Prism mode is combined with the CLEAR filter to give the ninth configuration.Two Teledyne H2RG 2048 × 2048 HgCdTe detector arrays are used called NRS1 and NRS2.Spectra from the medium resolution gratings and Prism project only onto NRS1.
A choice of detector subarrays exist with different frame times.The detectors are read up-the-ramp producing non-destructive reads (NDRs) separated by the frame time.Up-the-ramp sampling allows for cosmic ray detection and possible recovery of affected slopes (Giardino et al. 2019).Most time-series observations are expected to use the NRSRAPID read pattern, where there is no on-board frame averaging, giving one NDR or frame per 'group', and where one group time per integration ramp is lost to reset.The observation proceeds as a sequence of integrations constituting an exposure.Thermal settling at the beginning of the exposure can occur (Birkmann et al. 2022a).If the SUB2048 subarray is used (with the high resolution gratings), both detectors are implemented and a small gap will appear in any spectrum due to the physical separation between the two detectors.While the detectors are identical in manufacture, the commissioning study by Espinoza et al. (2023) using G395H and HAT-P-14 b as the target found differences in the slope of the systematic trend (being stronger in NRS1) and how closely the measured scattered in light curves matched the calculated noise (being a closer match in NRS1).All NIRSpec spectra have a slight curvature and tilt on the detector.
One of the main sources of systematics in NIRSpec is '1/f noise' which correlates counts during readout mainly the fast readout direction (and to a lesser extent in the slow direction) and manifests as vertical banding seen in raw images.Espinoza et al. (2023) recommended producing light curves at the sampling resolution of the instrument (rather than binning across pixel columns) to minimize the degradation in SNR from correlated 1/f noise across columns, and then binning the results at a post-processing stage.However, Holmberg & Madhusudhan (2023) tested this strategy for NIRISS and found that the result is the same, regardless of the order of the binning, if one takes into account the covariance.
While pointing jitter and drift combined with intra-pixel variations was considered another possible correlated noise source, commissioning studies found that the line-of-sight pointing stability is very good ∼ 1 mas radial (Lallo & Hartig 2022).This minimizes the need to apply de-jittering algorithms to the data.
While persistence is expected in H2RG detectors producing a similar form of temporal effect as seen in the Hubble WFC3 detectors, the effect is expected to be smaller in the newer JWST detectors (Birkmann et al. 2022a).Commissioning studies indicate the effect of persistence may not be a major concern for science programs (Böker et al. 2023).
Bad pixels can exist of various types, e.g.'hot' pixels with high dark signal, pixels with poor response to light, or dead pixels.From commissioning Böker et al. (2023) found an operability rate of pixels of 99.59 % on NRS1 and 99.80 % on NRS2.These give 16948 and 8275 non-operable pixels on NRS1 and NRS2 respectively.Cosmic rays (CRs) are another source of abnormal pixel counts during an observation.Böker et al. (2023) report an average CR hit rate of about 5.5 cm −2  −1 with a typical hit area of about 10.5 pixels.In addition the so-called "snowballs" (Birkmann et al. 2022b) have been identified: cosmic ray events with a heavily saturated core of 2-5 pixels in radius surrounded by a halo.The large number of bad pixels combined with CR hits require pipelines to have a robust management method for dealing with bad pixel counts.Time-dependent systematics and time-correlated noise are important to characterise for time-series exoplanet observations.Espinoza et al. (2023) performed an Allan deviation analysis of the bandintegrated light curve fit residuals using NIRSpec G395H.These were consistent with uncorrelated noise.The Alderson et al. (2023) analysis of the G395H data for WASP-39 b found minimal systematics but there was a mirror segment tilt event that caused a change point spread function (PSF) and a jump in flux.For the Prism Rustamkulov et al. (2023) found a high-gain antenna (HGA) movement event that affected a few integrations during the WASP-39 b Prism observation.

JEXOPIPE
JexoPipe is a recently developed data reduction framework that incorporates selected JWST Science Calibration Pipeline steps which it combines with its own customised steps, procedures and pathways.JexoPipe remains under development and will evolve to be applied for different instrument configurations.In this paper we describe its use for the Prism and G395H configurations.
We applied JexoPipe to the Prism and G395H observations of WASP-39 b from the ERS program 1366 (PI: N. Batalha).Details of these observations are given in JWST Transiting Exoplanet Community Early Release Science Team et al. ( 2023), Alderson et al. (2023) and Rustamkulov et al. (2023).To summarize, the Prism observation of WASP-39 b began at 15:30:59 on 10th July 2022 and ended at 23:37:13 UTC.The NRSRAPID readout mode was used with group time of 0.22616 seconds and 5 groups per integration ramp, with a total of 21500 integrations.The G395H observation began at 22:04:06 on 30th July 2022 and ended at 06:20:26 on 31st July 2022 UTC.NRSRAPID was used with a group time of 0.902 seconds and 70 groups per integration, with a total of 465 integrations.The duty cycle efficiencies were 82% and 98.6% respectively for Prism and G395H.
The pipeline is summarised in Figure 1, with four stages of processing.To allow a homogenised approach to facilitate comparison of spectra from the two configurations we keep pathways for the two configurations as similar as possible.The main differences arise in Stage 1 due to management of saturated pixels in the Prism data.The G395H pathway also applies a Reference Pixel Correction step and Jump Detection step in Stage 1 which are not applied in the Prism pathway.Reference files will also differ between the two configurations (and between the two G395H NRS detectors), such as the superbias file.The final transmission spectra were produced using steps from version 1.11.3 of the JWST Science Calibration Pipeline.

Stage 1
Stage 1 begins with .uncalFITS files obtained from the Mikulski Archive for Space Telescopes (MAST) archive 2 .These contain the uncalibrated non-destructive reads (i.e. group level images) per integration, and are provided as contiguous segments: four for the Prism and three for the G395H grating data.The G395H data is additionally divided between the two detectors: NRS1 and NRS2.The end products of Stage 1 are .rateintsFITS files containing the count rate (in DN/s) per pixel per integration.
For both Prism and G395H we process each segment through the steps shown in Figure 1.The functions of the official JWST pipeline Note that for G395H NRS2 the dark step was applied but automatically 'skipped' because the dark reference frame does not have enough groups.We expect this framework to evolve as our understanding of JWST performance improves and to be tailored for different instrument configurations.
(blue and grey) steps are described in the JWST pipeline package documentation3 .We use the default settings for these steps except where described below.
For the Prism we first run the pipeline for all segments up to completion of the Saturation Detection step.At that point the group data quality (DQ) arrays from all the segments are combined into a 'super DQ array'.This super DQ array is subsequently used in the Group Control step.
The pipeline then proceeds to the Superbias Subtraction step.We used the default superbias files in all cases.Given previous concern with the G395H NRS1 superbias file as cited in Alderson et al. (2023) for this mode, we tried using a custom superbias created from the median of all first group images.However we found no significant difference in the final spectrum offset or noise for G395H compared to using the default file (Table 5).For Prism, there was a very small increase in the average spectrum transit depth when using the custom file (Table 5).
The G395H pipeline applies the Reference Pixel Correction step as side reference pixels are available, however the top and bottom reference pixels are not available due to the subarray cutting these off.The SUB512 subarray used in the Prism observation has no reference pixels available (either at the sides or at the top and bottom) to apply the Reference Pixel Correction step.The Linearity Correction and Dark Current Subtraction steps are then applied.We found no significant difference in the spectral baseline if the dark current step was omitted compared to that if it was included (Table 5).
In common with the pipelines in Rustamkulov et al. (2023), the Jump Detection step is omitted for Prism (but not for G395H), as it results in a large number of false positives.Instead cosmic rays in Prism are managed by detecting outliers in Stage 2 (in the Flag Bad Pixels step which is also applied to G395H).For G395H we trialled the default rejection threshold of 4 and also a threshold of 15, as used in Alderson et al. (2023).There were statistically insignificant differences in the final spectrum (Table 5).For the final baseline case we use the results from the 15 threshold.
The Background Subtraction step is performed to mitigate 1/f noise and remove diffuse and sky backgrounds.This is applied at the group level in both the Prism and G395H modes.Column-bycolumn background subtraction at the group level has been proposed to mitigate the vertical banding that arises from 1/f noise particularly for the Prism where no reference pixels are available (Birkmann et al. 2022a).Background subtraction is performed as follows.For the Prism, a median image is first obtained from the median of all the final groups in each integration (per data segment).A spatial profile of the median image is obtained by taking the sum in the y-axis and is used to identify the pixel row with the maximum count.In each group image, we mask out the spectrum, leaving the peripheral 5 rows on the top and bottom of the image, We also mask out bad pixels identified in the group DQ array and the pixel DQ array.Then we mask out outliers to minimize the impact of cosmic rays: half of the 16th-84th percentile range of all unmasked pixel values is taken to be   and pixels ± 10  beyond the median value of all unmasked pixels are then masked.The column mean of unmasked pixels is then subtracted from all pixels in that column.We use this method since the spectral trace for the Prism is nearly linear and parallel to the x-axis of the detector.
For G395H, a median image is obtained in the same way as for the Prism.While the Prism spectral trace is roughly linear and parallel to the x-axis of the detector, the trace for G395H is slightly curved.As a result the application of the mask for G395H images required first that the trace be mapped on the detector.On the median image we divide the subarray into 10 pixel-wide column-wise slices and obtain a median profile in the y-axis for each slice.The pixel row with the maximum count in this profile is identified for each slice, and this is used to assign an approximate initial maximum point for each individual pixel column.We then fit a 4th order polynomial to these points.This polynomial refines the pixel row in each column closest to the maximum of the spectrum and also allows the trace to be extended to the edges of the image where the signal is low.In each group-level image, a mask is applied ± 10 pixels around this central pixel per column.Bad pixels and outliers are masked out in the same way as for the Prism, and the mean of the unmasked pixels in each column is then subtracted from all pixels in that column.
As noted in Radica et al. ( 2023) while photons from diffuse backgrounds are affected by detector non-linearity, 1/f noise is not, so ideally 1/f noise correction should occur before the Linearity Correction step and diffuse background subtraction after.For NIRISS SOSS, Radica et al. ( 2023) present a way to do this with multiple steps, however since the main purpose of this study was to compare two data sets with the same pipeline we decided implementing a single stage of background subtraction which happens after the Linearity Correction step, to be sufficient, thus conflating 1/f noise and diffuse background subtraction.Radica et al. (2023) noted that if 1/f correction was performed after the non-linearity correction it did not result in any biases in the spectrum but increased noise.Thus this single-step approach may result in increased noise.

Saturated pixels
Prism data presents a challenge in that a large number of pixels near the peak of the stellar spectrum are saturated.We identify a 'persistently saturated region' of pixels, which are pixels where at least 10% of the integrations in the timeline have at least one saturated group.This defines a region around the peak of the spectrum between 0.69 and 1.91 µm (Figure 2).The management of this region is challenging and final results from this saturated region need to be interpreted with added caution.
We take a different approach to managing saturation compared to the pipelines in Rustamkulov et al. (2023).All four pipelines used in that paper expand the saturation flags along entire columns if a pixel in that column was saturated.Instead, here we make use of the Saturation Detection step argument n_pix_grow_sat which we describe further below.
The Saturation Detection step compares the count on each pixel (before linearity correction) to a pre-determined saturation level in a reference file.If during an integration a pixel exceeds its reference level in a particular group, then it is flagged as saturated in the DQ array (DQ flag = 2) in that group and all remaining groups after that, but not in the preceding groups.Then at the Ramp Fitting step, the saturated groups are ignored when fitting the ramp.If all but the first group is saturated, there is the option of using this first group alone to obtain a 'ramp' value.To do this the Ramp Fitting step argument suppress_one_group must be changed to False from the default of True.Since the 'slope' value using just one group may be quite different compared to that from two or more groups, this can potentially lead to noise in pixel timelines if a pixel flips between a single group and two groups in different integrations in its timeline.To reduce the noise impact of such an effect we implement a Group Control step for the Prism pipeline which is described further below.
If a pixel becomes saturated, exceeding its full well capacity, charge can leak from the pixel diffusing into surrounding pixels.This horizontal charge overflow from saturated pixels is known as 'blooming' (Cohen et al. 2020).These neighbouring pixels may not be truly saturated (as defined by exceeding their saturation thresholds) however due to this leaked charge their electron counts now become unreliable in relation to their incident photon counts.By default the Saturation Detection step flags all neighbouring pixels within 1 pixel of a truly saturated pixel as 'saturated' in the same groups as the truly saturated pixel, i.e. 3 x 3 box of pixels centred on the affected pixel is flagged as saturated.The size of this box is controlled by the step argument n_pix_grow_sat, which has the default value of 1.
We ran our pipeline for the Prism data varying the value of n_pix_grow_sat between 1 and 4, and the final spectra obtained are shown in Figure 3 up to 3 µm for three different conditions: a) no group control and suppress_one_group set to True (excludes single group integrations), b) no group control and suppress_one_group set to False (includes single group integrations), c) with group control and suppress_one_group set to False (includes single group integrations).We explain group control below.In both cases a) and b) there are wide point-to-point variations in the spectrum within the persistently saturated region for all values of n_pix_grow_sat, while outside this region, all values of n_pix_grow_sat result in similar final spectra.In case c), implementing group control greatly reduces the point-to-point variations, however we also notice that increasing n_pix_grow_sat to values > 1 reduces the spectrum baseline which leads to the emergence of distinct peaks at 1.4 and 1.9 µm consistent with water bands (as well as further reducing point-to-point variation).The fact that increasing n_pix_grow_sat to values > 1 appears to bring out some spectral features would suggest that the effect of the saturated pixels influences the counts on neighbouring pixels greater than one pixel away.We realise however that this a somewhat subjective assessment of the improvement in the spectrum, and thus we exercise caution in the interpretation of the saturated region.
We decided to proceed in the final analysis using a value of n_pix_grow_sat = 3 for all saturated pixels in the Prism data.However we use the default value of 1 for the G395H data.This is because count rates in the G395H data are much lower than in the Prism, and rare persistently saturated pixels are those isolated pixels with unusually low saturation thresholds rather than those with excessively high photon counts in large groups.Thus we assume that the leakage into surrounding pixels is less of an issue.However to be sure we also ran the pipeline with n_pix_grow_sat = 3 for G395H and found no significant difference in the spectrum transit depths or noise (Table 5).

Group control step and 'flipping' noise
We find that some pixels in the Prism data exhibit a type of noise in their timelines that seems to result from flipping of the number of unsaturated groups per integration, and which can be controlled if the number of groups fitted for in the Ramp Fitting step is kept constant for the entire timeline (Figure 4).This also appears to give the wide point to point variations seen in Figure 3 in cases a) and b).For flips between 1 and 2 groups, this 'flipping noise' is also controlled by the default ramp fit step argument suppress_one_group, which is by default set to True, where 1 group integrations are given NaN values, and which explain the slightly less noisy spectra seen for case a) in Figure 4 compared to case b).For this reason, and also to manage pixels where there are a significant number of single unsaturated groups, we apply a Group Control step for the Prism data.For pixels in the persistently saturated region, the super DQ array is used to find the minimum number of unsaturated groups in any integration (Figure 2, top).If the minimum number of unsaturated groups is 2 or more, all integrations of that pixel are forced to use this number of groups.For pixels where the minimum number of unsaturated groups is 1, if the pixel exhibits ≥ 10% of its timeline as single unsaturated groups, then the full timeline is fixed to a single group per integration.If a pixel has single groups for < 10% of its timeline these groups are given NaN values (resulting in NaN values for the integration after the ramp fit step) (and flagged as 'bad' in the group DQ array) and the number of groups fixed at 2 for the remainder of the timeline.
The NaNs act as effective flags for correction in the Stage 2 Fix Bad Pixels step.Outside of the persistently saturated region, we do not apply group control as saturation is less frequent.If a pixel outside the persistently saturated region has a rare integration where the number of groups is different from the majority of its integrations, this may manifest as an outlier in the timeline, and thus managed in the same way as for other outliers in the Stage 2 Flag Bad Pixels step.There are a few pixels outside the persistently saturated region that have single unsaturated groups but these are for only 1-2 integrations (possibly from cosmic ray hits) and are flagged by applying NaN values at this stage to these groups.
When we apply group control, the point to point variations seen in the persistently saturated region are largely suppressed: Figure 3  case c).The Group Control step was not applied to the G395H data as saturation was much less frequent, and the longer ramp probably favours more stability against changes in the ramp gradient (and thus flipping noise) if the number of groups changes from integration to integration.

Stage 2
Stage 2 begins with the .rateintsFITS files for each segment and first combines them into one file for the entire observation (Figure 1).This facilitates the production of a rolling median image used in the Fix Bad Pixels step (see below).On completion of Stage 2 a .calintsFITS file is produced containing calibrated, wavelength-assigned 2-D slope images in flux units of DN/s.

Flag bad pixels
After combining all segments, the Assign WCS step is applied, but does not change the science data.We then apply a custom Flag Bad Pixels step.We flag all pixels that have abnormal DQ flags.We however do not include pixels flagged just for saturation or jumps as these effects will have been managed in the Ramp Fitting step, except in cases where additional flags, e.g."DO NOT USE" have been generated.NaN values are applied to these flagged pixels.Next we check for outliers not picked up in DQ flagging in each integration image on a row-by-row basis.For each row in each integration image we obtain a rolling median and rolling standard deviation ( 1 ) ± 5 pixels around a given pixel in the x-direction.We also find a line "sigma" ( 2 ) based on the half of the 16th-84th percentile range in the entire row.We flag as outliers (and give NaN values to) any pixel ± 3 1 or ± 3 2 from its corresponding rolling median value.This is iterated 3 times.For our baseline case, before this step the proportion of NaN values (for all pixels over all integrations) is 0.6% in Prism, no group control and suppress_one_group set to False.Bottom: case a) with group control and suppress_one_group set to False.We settle on using an n_pix_grow_sat value of 3 for our final reduction.0.8% in G395H NRS1 and 0.9% in G395H NRS2.After the Flag Bad Pixels this increases to 3.5% in Prism, 3.6% in G395H NRS1 and 1.6% in G395H NRS2.

Fix bad pixels
We adopt the approach of filling in NaNs which have flagged bad pixels and outliers, rather than leaving these values open.This contrasts with the approach where such values are not filled and bad pixels are de-weighted.We fill in any NaN values in two ways.Firstly if a pixel timeline has < 10% NaN values these are filled in by linear interpolation of good values.If a pixel timeline has ≥ 10% NaN values, then the full timeline for that pixel is made NaN.Secondly, any remaining NaNs are filled in spatially by linear interpolation of good values on a row-by-row basis in each integration image.Finally, we deal with any remaining pixel-level light curve outliers thus far not identified.For each pixel on each integration image, a rolling median value is produced from neighbouring images spanning ± 100 integrations for the Prism data set and ± 10 integrations for the G395H data.In addition for each pixel, we obtain a rolling standard deviation (   ) over the same range, and also the median of the rolling standard deviation values (  ).We identify pixel values which are either ± 5   or ± 5  beyond the corresponding rolling median value.These outlier pixel values are replaced by their value in the corresponding rolling median image.As explained in the text, and as can be seen in the top row of figures, this leads to noise in the timeline.The number of single group integrations is ≥ 10% (20%) of the timeline for this pixel, so per the rules adopted we force all the integrations to use only 1 group.This gives the result shown in the middle panel, where we see the noise has been mitigated.Pixel (109,17) also flips between 1 and 2 groups with the same kind of associated noise in the OOT phase.However it has < 10% (8.4%) of its timeline with single group integrations.For such cases we flag the single group integrations as 'bad' by giving them NaN values in Stage 1, and these are later filled in during the Stage 2 Fix Bad Pixels correction step as described in the text.Pixel (142,17) occurs at the edge of the persistently saturated region, and flips between 4 and 5 unsaturated groups in its OOT phase.This again results in noise which is controlled by fixing all integrations to 4 groups (middle panel).The noise resulting from flipping between different groups per integration may be due in part to the small number of total groups per integration operating near the saturation threshold, giving noticeably different slopes per integration.

Remaining steps
We then proceed to utilise steps from the JWST Science Calibration Pipeline that apply the 2-D wavelength solution(Figure 1) ending with the Wavelength Correction step.The flux units after this step remain as DN/s.The Photometric Calibration step in the JWST Science Calibration Pipeline is not strictly required since we are interested in relative flux changes.
In common with other pipelines we have not applied a flat field step as the reference files were not complete (Alderson et al. 2023;Lustig-Yaeger et al. 2023).The effect of not applying a flat field may result in increased noise due to uncorrected pixel quantum efficiency and gain variations that manifests during movement (jitter) of the spectral image on the detector (Sarkar & Madhusudhan 2021).The JWST pointing system has line-of-sight pointing stability of ∼1 mas (Lallo & Hartig 2022) which is 1/100th the pixel scale of NIRSpec (0.1").The jitter noise impact of not applying the flat field should thus be negligible.

Stage 3
Stage 3 begins with a manual examination of engineering data to identify relevant events in the timeline that may require addressing at the light curve level.The .calints file from Stage 2 is then opened

Analysis of engineering data
We analyse the engineering data associated with each observation4 , in particular the guide star x and y centroids, guide star flux, and HGA movement flags.As previously noted in Rustamkulov et al. (2023) in the Prism data there is HGA movement, and we find this occurs between 59770.9722 and 59770.9723MJD UTC.Therefore in Stage 4 we exclude any light curve points that fall into this time period.As explained below, we bin the Prism timeline to every 25 integrations, so this amounts to exclusion of two of the final binned light curve points.In the G395H data, there are no HGA movements, however we can see the impact of the previously noted (Alderson et al. 2023) mirror segment tilt event as a step change in the guide star flux (Figure 5).This is reflected in the light curves of individual pixels, where the change in count can be positive or negative and varies between pixels (Figure 6).We did not detect an associated change in the guide star centroid positions.The tilt event is corrected in the Stage 4 Prepare Light Curves step.

1-D spectral extraction
Next, a custom Extract 1-D Spectra step is applied.Extraction is obtained through a 'box' extraction method and also an optimal extraction method.
For box extraction, using the assigned 2-D wavelength solution per pixel, we obtain the mean wavelength per pixel column giving us a nominal 1-D wavelength grid.We then assume wavelength bins centred on these values with boundaries being the mean of adjacent values.We then sum up the counts on all pixels that fall in a given wavelength bin.This method allows flexibility to obtain the 1-D spectra from potentially tilted or curved spectral traces.We found the wavelength variations over a pixel column to be as follows: in Prism the standard deviation of the wavelength ranged from 0.002 to 0.013 µm per column, and for G395H this was ∼ 0.0003 µm per column in both NRS1 and NRS2.The resulting 1-D spectra look virtually identical to those obtained through simple summation of For optimal extraction, we apply the principles in (Horne 1986) to obtain the spectra and variance for each integration image.We use a rolling median image of 12 integrations for G395 and 100 integrations for Prism to provide the column-wise profile for each integration image.No background subtraction is included, and 5 outliers were rejected iteratively.The final optimal extraction transmission spectra did not show any significant differences in offset and comparable or slightly noise compared to the baseline cases (Table 5).

Error propagation
Regarding error propagation up to the end of stage 3, the ERR array in stages 1 and 2 propagates the calculated photon noise and read noise errors through the different steps.We do not adjust the error in the custom Background Subtraction step since the proportion of signal removed is small will not impact the photon noise substantially.In the Fix Bad Pixels step, the variances of bad pixels are corrected in the same way as the pixel values themselves.In Stage 3, box extraction of the 1-D spectra proceeds with quadrature addition of the corresponding ERR array values, whereas in optimal extraction, the errors are obtaining directly from the optimal extraction algorithm.

Stage 4
In Stage 4, the 1-D spectra from Stage 3 are extracted from the Stage 3 FITS file to produce a data cube of 1-D spectra vs time for each dataset.In the time axis, this constitutes the set of native (i.e.pixellevel) resolution spectral light curves.These spectral light curves are then fitted with light curve models to extract the transit depth per wavelength, resulting in the final output consisting of the planet transmission spectrum.
In Stage 4, When calculating the errors on white light curve data points, these are obtained by taking the quadrature sum of the errors on the spectral data points which are summed to produce the white light curve point.Further error handling is discussed below.

Prepare light curves
Prior to light curve fitting we prepare the light curves in various ways, e.g.removal of any unwanted sections or data points, and/or binning in time or wavelength space as needed.
We perform one more stage of outlier removal by identifying white light curve outliers.A rolling median and rolling standard deviation () of 200 data points in Prism and 20 data points in G395H are obtained, and outliers identified as being those ± 4 from the median.No outliers were identified for G395H this way and five data points identified in Prism.The 1-D spectrum corresponding to the outlying integrations were rejected and replaced by the average of the two integrations either side of it.
For the Prism, the white light curve shows a clear non-linear trend in the out-of-transit (OOT) data.By excluding the first 1500 points, we found we could reasonably fit this trend with a second order polynomial, so this clipping was applied to all spectral light curves as well as the white light curve.Given the high cadence rate of the Prism data, we bin down each spectral light curve in time every 25 points, obtaining the mean mid-BJD time stamp of the 25 binned points.The time step between the binned points is 34.438 seconds.Two points are then excluded around the HGA movement event as previously mentioned.
For G395H, we do not bin down the timelines, the time step between points being 64.056 seconds.Although systematic trend appears more linear than in the Prism light curves, we fitted the trend with a second-order polynomial to keep the pipeline as consistent with the Prism pipeline as possible.
A short 'hook' appears at the start of the G395H white light curves, which we remove by excluding the first 10 integrations.To correct for the mirror-segment tilt event in G395H we first identify visually the event on the white light curves, and exclude three integrations around the tilt event.The event of the event is then corrected during light curve fitting as explained below.

White light curve fitting
The white light curve for the Prism was obtained by co-adding the Stage 4 spectral light curves.We trialled this twice: 1) including only wavelengths above 2 µm (and thus excluding the persistently saturated region), and 2) including all wavelengths.For light curve fitting we use a Mandel-Agol transit model applied using PyLightcurve5 within a Monte Carlo Markov Chain (MCMC) algorithm applied using emcee (Foreman-Mackey et al. 2013).In addition to the planetstar radius ratio (  /  ), we fit for mid-transit time ( 0 ), the ratio of semi-major axis to star radius ( ′ /  ), the inclination angle (), and two quadratic limb darkening coefficients ( 1 and  2 ).We also fit for the systematic trend and out-of-transit baseline using a second-order polynomial (and thus three coefficients, ,  and , where the trend is represented by (1 +  +  2 ) (where  is time since the first integration included).The error on the data is obtained by fitting the OOT data only with the systematic model (using a least squares method), dividing this out and obtaining the standard deviation of the resulting points.The error bars on the individual light curve data points are then scaled by a factor such that the average error bar matches the standard deviation of points.
This initial fit to the systematic trend also gives the signs of the coefficients  and .These signs are held in variables and applied later in the final model fit, while the positive magnitudes of  and  are converted to natural logarithmic values for the MCMC algorithm.A log-likelihood function is used with uniform priors for all free parameters.We fit for the natural logarithms of all free parameters except  0 ,  1 and  2 .The period () is fixed to 4.0552941 days (Mancini et al. 2018).Eccentricity is set to 0 and argument of periastron to 90 • .We use 64 walkers with a burn-in of 1000 steps, followed by 4000 steps for the production run.Mean acceptance fractions of 44 and 43 %s respectively for the all wavelength and the > 2 µm case are obtained consistent with good convergence.Figure A1 shows the posterior distributions for these cases and Figure 8 shows the white light curves with best-fitting model and residuals.Table 1 summarises the parameter estimation results from the Prism white light fits.We find that there are no significant differences between in the system parameters obtained ( ′ / and ) using all wavelengths vs just > 2 µm, however given the uncertainties in managing the saturated region, we choose to use the > 2 µm values for the rest of the study.The residuals for the all wavelength case have a standard deviation of 120 ppm which is ∼ 1.8 × the estimated noise based on propagating the ERR array values, and for the > 2 µm case, the residuals are 140 ppm, 1.6 × the estimated noise.
For G395H, the white light curves for NRS1 and NRS2 were obtained by co-adding all spectral light curves.We performed two sets of fits 1) where the system parameters  ′ / and  are fixed to those from the Prism > 2 µm result, and 2) 'independent' white light curve fits where  ′ / and  were not fixed to the Prism values but obtained directly from the fits.In both cases we fitted for  0 ,  1 ,  2 and the natural logarithms of   /  and three polynomial coefficients, ,  and .To correct for the tilt event, a 'shift' parameter is added to the light curve model that adds an offset to the post-event section of the light curve model prior to multiplication by the systematic model.
Other aspects of the fit were as for the Prism above.We obtain mean acceptance rates of 31 and 32% for NRS1 and NRS2 fixedto-Prism cases and 29% for the independent cases.The results are summarised in Table 2. Figure A2 shows the posterior distributions and Figure 9 shows the light curve fits.For the fixed-to-Prism cases, the residuals for NRS1 and NRS2 have standard deviations of 142 and 195 ppm respectively (1.4 and 1.5 × the estimated noise).For the independent cases these are almost the same, 141 ppm and 195 ppm respectively (1.4 and 1.5 × the estimated noise).
We find that the system parameters  ′ /  and  have somewhat lower median values in the 'independent' G395H fit than in the Prism fit, however these are both highly correlated in the corner plots.To allow an unbiased comparison of Prism and G395H final spectra we thus need to control for  ′ /  and , which we do by adopting the Prism > 2 µm values for  ′ /  and  for all final spectral light curve fits.The median   /  results for the G395H results are somewhat greater than for the Prism all wavelength result, but closer to the Prism > 2 µm result.This could be explainable by the high amplitude spectral features in the G395H range that pushing up the average apparent radius over NRS1 and NRS2 wavelength ranges.

Allan deviation analysis
We also performed an Allan deviation analysis of white light curves to examine for correlated noise.We look at the following cases: 1) Prism all wavelengths, 2) Prism > 2 µm, 3) Prism 0.65-2 µm, 4) Prism <0.65 µm, 5) G395H independent NRS1, 6) G395H independent NRS 2. The Allan deviation plots are shown in Figure 10.To produce these plots, the residuals are binned into progressively larger bin sizes.We find that the G395H noise bins down fairly closely to what we would expect for uncorrelated noise, consistent with Espinoza et al. (2023).For the Prism, when we include all wavelengths in the white light curve, there is a markedly shallower gradient than expected for uncorrelated noise.For the fractionated white light curves, we see that the deviation is worst for 0.65-2 µm, which encompasses the persistently saturated region.The deviation is less for > 2 µm and lesser still for < 0.65 µm.These results suggest that the saturated region suffers the greatest correlated noise which may be related to the saturation or our data reduction methodology for that region.However some correlated noise extends beyond this region and this is therefore not attributable to saturation.It may be related to 1/f noise, our chosen systematic model or an additional unidentified cause but further investigation is needed to elucidate this.In Figure 8, we see a slight upward 'bump' in the residuals around the start of ingress (more noticeable in the all wavelengths case).We repeated the Prism white light curve fits with this region (containing 43 light curve points) removed, and found the Allan deviations for the different cases to be unaffected, indicating that the correlated noise is not due to a poorer model fit in this region.
We use the Monte Carlo prayer bead method (Gillon et al. 2007;Manjavacas et al. 2018) to estimate the error inflation if we account for the correlated noise.For each case we perform 50 trials.In each trial we produce 1000 model realisations of the white light curve and then fit for all system (except period, eccentricity and argument of periastron which are set to the previously stated values), light curve and systematic parameters using LMFIT (Newville et al. 2016).Each realisation is produced as follows.The residuals from the MCMC best-fit model and the light curve are obtained.The residuals are then shifted in sequence by a random value picked from a uniform distribution ranging from zero up to the total number of light curve points.The shifted residuals are added to the best-fit model to construct a 'new' light curve.This light curve is then fitted using LMFIT to extract the planet-star radius ratio.1000 values of the planet-star radius ratio are thus obtained and we take the standard deviation of the distribution as an estimate of the 1 uncertainty that incorporates correlated noise.We then take the ratio of this to the error given by LMFIT on the original light curve, to obtain an error inflation factor.We then obtain the mean and standard deviation of the inflation factor from the 50 trials.The inflation in errors on   /  as a result of this are as follows: Prism (all wavelengths) × 1.57 ± 0.03, Prism (> 2 µm) × 1.36 ± 0.03, Prism (0.65 − 2 µm) × 1.45 ± 0.03, Prism (< 0.65 µm) × 1.29 ± 0.03.We confirm the relative lack of correlated noise in the G395H data by finding inflation factors of × 1.09 ± 0.02 in NRS1 and × 1.119 ± 0.025 in NRS2.

Use of model 4-factor limb-darkening coefficients
To see if any improvement in accuracy or precision occurs using model limb-darkening coefficients (LDCs) we repeated the following cases using model 4-factor LDCs obtained from ExoCTK6 : Prism (all wavelengths), Prism (> 2 µm), G395H NRS1 and NRS2 (independent of Prism values).The resulting corner plots and light curve fits are shown in Figures A3 and 11, with the parameters summarised in Table 3 For the Prism, there is close agreement with the baseline case in the > 2 µm case.In the all wavelengths case, the transit depth,  ′ /  and  are distinct at 1 but not at 2.In G395H NRS1, the transit depths again agree at 2, while in NRS2 they agree at 1.The residuals on the Prism show more systematic variation than in the baseline case indicating a poorer fit to the data: the standard deviation of the residuals are 133 ppm and 141 ppm for the all wavelength and >2 µm cases respectively.For G395H, the fits appears similar to the baseline cases with residual standard deviations of 145 and 196 ppm for NRS1 and NRS2 respectively.As a goodness-of-fit test to compare the baseline case with the 4-factor LDC case, we employ the reduced chi-squared, keeping the noise constant between the two comparisons.We use the absolute noise from the baseline cases for this purpose.(i.e. the standard deviation of the residuals).Table 4 summarises the results.The reduced chi-square is larger in the 4-factor model cases in all four cases, with the greatest increase being in the Prism all wavelength case.This would indicate that the empirically-derived quadratic LDCs give a better fit to the data than the model 4-factor LDCs.

Spectral light curve fitting
We obtained model quadratic LDCs from the ExoCTK website, which were used for preliminary studies and to give initial values for fits, however for the final analysis we chose to obtain empiric LDCs for the full wavelength range covered by both Prism and G395H from light curve fitting.This is partially motivated by the fact that for white light curves the empiric quadratic fit gave a better fit to data than the 4-factor model LDC coefficients.Empiric 4-factor fits would be challenging due to the additional two free-parameters required, and quadratic fits have been used in previous fits to WASP-39 data (Rustamkulov et al. 2023;Alderson et al. 2023;Ahrer et al. 2023).In addition, Rustamkulov et al. (2023) found that WASP-39 is 6% brighter at the limb than predicted by models, indicating an empiric approach is preferred.
The spectra from Prism and G395H were binned to R=66 (slightly lower than the lowest native spectral power across the Prism subarray) to optimise the SNR.We then fitted each binned spectral light curve as described below and obtained the two limb darkening coefficients,  1 and  2 for the binned wavelengths.Figure 12 shows the extracted LDCs with error bars.Despite differences in instrument transmission, we find the empiric LDCs for G395H match those from Prism within the 1 errors.As a result we proceed using just the Prism-derived LDCs for both modes.
A smoothing function was then applied to the Prism LDC vs wavelength plots to obtain the final LDCs7 Figure 13 shows the final empirically derived LDCs and those from the ExoCTK model.On running the final spectral light curve fits for Prism and G395H, the LDCs were obtained for each spectral channel by interpolating the empiric LDC vs wavelength plots to the central wavelength of each light curve.
We fit the spectral light curves for both Prism and G395H at their native (pixel column level) resolutions.To allow comparison with the Prism, G395H spectral light curves were also binned to the Prism resolution.We exclude NRS2 wavelengths > 5.1 µm, due the rapid fall off in the transmission above that wavelength resulting in low SNR (Figure 7).Example posterior plots and light curve fits for full resolution cases, are shown in Figure B1 ).In fitting the spectral light curves for each configuration (and NRS detector) we fix  0 to the values in Tables 1 and 2 (i.e. to the Prism > 2 µm case, and the G395H fixed-to-Prism cases).We fix the values for  ′ /  and  to those obtained from the Prism white light curve fit (> 2 µm case) and fix the period to 4.0552941 days (Mancini et al. 2018).Eccentricity is set to 0 and argument of periastron to 90 • .For Prism and both G395H detectors we fit for   /  and three polynomial coefficients for the systematic fit, ,  and .Additionally for G395H we fit for the light curve 'shift' correcting for the mirror tilt event.
Figure 14 shows how the systematic coefficients ,  and  and the shift parameter in G395H vary with wavelength.The results have been binned to R=60 for clarity. follows the shape of the stellar spectrum, giving an out-of-transit baseline flux.The variation of  and  is complex in Prism, and appear negatively correlated to each other in wavelength, but not obviously to spectral features.In G395H,  and  are within 1 of zero, consistent with no time-dependent systematic trend.The shift parameter appears fairly constant with wavelength in NRS1, but displays an increasing trend with wavelength in NRS2.

TRANSMISSION SPECTRA
The final transmission spectra obtained are shown in Figure 15.The uppermost plot shows the unbinned 'native resolution' spectra.In the second plot the G395H spectra are rebinned to the resolution of the Prism spectrum, with the residuals between the two configurations and the average errors shown in the lower two plots.

Prism
Using the strategy of group control and increasing n_pix_grow_sat to 3, we were able to recover data from the persistently saturated region including the 1.4 µm H 2 O feature.However, given there is no consensus on the optimal strategy to manage the saturated region, the results in this region should be considered tentative.We recover the previously observed water features at 1.8 and 2.8 µm, the large CO 2 feature at 4.3 µm and the likely SO 2 feature at 4 µm.Figure 16 compares our Prism spectrum with those publicly-available spectra from the four pipelines used in the study by Rustamkulov et al. (2023).The errors on the residuals are the quadrature sum of the 1 error bars from the comparison pipeline spectrum and those from the re-binned JexoPipe spectrum.There is some disagreement within the persistently saturated region, where JexoPipe gives a shallower transit depth compared to the other pipelines on the blue-ward side of the 1.4 µm water feature, but a deeper transit depth over the rest of the region.This may reflect the difference in approach taken in processing the saturated region between JexoPipe and the four other pipelines.
Another consideration for the difference in the saturated region could be possible correlations between our empirically-derived LDCs and transit depth which could have resulted in a transit depth bias in the saturated region.Further investigation is needed to fully understand the differences.
Excluding the saturated region by comparing wavelengths above 2 µm only, if we consider the residuals (i.e.JexoPipe minus the alternate pipeline), the closest match is to Eureka where the average residual above 2 µm is 28±20 ppm.When comparing the average errors on the transit depths (at the binned resolution of the comparison pipeline spectrum), the smallest difference is with Eureka (JexoPipe average error above 2 µm is 5 ppm higher), and the largest difference is with Tiberius (JexoPipe average error is 24 ppm lower).
Limb Darkening: Given the structure of the residuals presented in Figures 8 and 11 (no evident symmetry around the mid-time), I believe that the differences between the baseline analysis and the quadratic or 4-factor model LDC cases are due to the correlated noise.In practice, by fitting for empiric LDCs, the model is probably absorbing part of the correlated noise caused by the saturated pixels, resulting in better statistics.At a first glance this can be overlooked but I am concerned that it can cause biases in the final spectrum.Such a bias can be seen in the saturated region of the Prism data.In Figure A1, we can see that c1 is correlated with depth, while c2 and depth are anti-correlated.Around 1.4um the empiric c1 is significantly lower than the theoretical c1 and the empiric c2 is significantly higher than the theoretical c2.By combining the two observations we can expect that if the LDC calculations are biased, then the spectrum will be negatively biased.There are two data points in the current analysis that are consistently lower compared to any other result close to 1.4 um (Figure 16).Therefore, I believe that at least these two points are biased due to the choice of empirical LDCs instead of model LDCs.I do not expect the authors to change their analyses based on this observation, but they could note it in the text as something to further investigate in the future.

G395H
The much higher native resolution of G395H compared to Prism is evident in Figure 15 (upper plot).To make it easier to visualise the spectral features and to allow comparison to the Prism, the spectral light curves were rebinned to the Prism native resolution (Figure 15, second plot).The SO 2 and CO 2 features are evident.The gap in the middle of the spectrum is due to the physical division between the NRS1 and NRS2 detectors.
In Figures 17 and 18 we compare the JexoPipe spectrum to 12 publicly-available spectra from the 11 pipelines and weighted mean used in the study by Alderson et al. (2023).The errors on the residuals are the quadrature sum of the 1 error bars from the comparison pipeline spectrum and those from the re-binned JexoPipe spectrum.We find good agreement with the amplitudes of spectral features obtained, however there are differences in the spectrum baseline that vary depending on the pipeline being compared.There is a broad range of spectrum baselines across the 11 comparison pipelines.The closest matches in terms of average offset are with  and Evans (-4±17 ppm).The Jex-oPipe result has a lower spectrum baseline than the weighted mean result with an average offset of -194±17 ppm.Comparing average errors on the transit depths, JexoPipe has similar errors to the weighted mean result, the average error being 12 ppm lower in JexoPipe.

COMPARISON OF PRISM AND G395H SPECTRA
We now examine the differences in the baseline Prism and G395H spectra (Figure 15).An offset is visible between the Prism and G395H result.The average offset (G395H-Prism) is -188±20 ppm for NRS1 and -139±31 for NRS2.We can see that when binned to the same resolution, the transit depth precision is higher for G395H than for Prism (Figure 15, lowest plot).The average difference in noise compared to Prism is -29 ppm in G395H NRS1 and -50 ppm in G395H NRS2.The noise in Prism is on average ∼ 1.3 × that in G395H at the Prism resolution.
The cause of the offset between the two modes is not immediately apparent.Since we homogenised this study by using the same LDCs and system parameters,  ′ /  and , for both datasets, we can attribute any differences to instrumental or astrophysical variations between the two observations or differences in data processing.
We kept the data processing differences to a minimum, however there were differences in Stage 1, where the Reference Pixel Correction stage was applied in the G395H pathway but not in the Prism pathway, and where the Prism pathway applied the Group Control step and a different value of n_pix_grow_sat.Group control however is only applied to the persistently saturated region of Prism, and so would not be a cause of the offset seen with G395H which are at wavelengths beyond this region.
We investigated the effects of various changes to pipeline processing and light curve fitting from the baseline case.The results are summarised in Table 5.The light curve fits for these comparisons were performed as previously described but with modified MCMC parameters (400 burn-in/ 2000 production steps, 32 walkers), and Prism spectra were obtained only at > 2µm, with G395H binned to Prism resolution to allow comparison.We re-ran the baseline case with the changed MCMC run parameters and obtained an offset of -185± 20 ppm between G395H NRS1 and Prism as compared to -188± 20 ppm mentioned above, while the offset between G395H NRS2 and Prism is unchanged.
The noise in all the cases is comparable to the baseline cases.In all cases the G395H NRS2-Prism offset is greater than the G395H NRS1-Prism offset, indicating an offset between the two G395H NRS detectors which is ∼ 40-50 ppm in most cases, including the baseline case.We note this is consistent with inter-detector offsets reported in previous studies (Moran et al. 2023;Madhusudhan et al. 2023).
An interesting finding is that if background subtraction is performed just before the linearity correction step8 , then the offset between Prism and G395H is greatly reduced or eliminated.Compared to the baseline, the G395H NRS1-Prism offset falls from -185±20 to -96±20 ppm and the G395H NRS2-Prism offset falls from -139±31 to -1±31 ppm (Table 5).This reduction in offset is due to a change in the Prism spectrum from the baseline case (changing on average by -102±21 ppm).There is no significant change in the G395H spectra from baseline.The reason for this change in Prism (and not G395H) is not immediately clear.
Omitting the reference pixel stage impacts G395H NRS1 by causing a significant negative offset compared to the baseline case of -74±18 ppm, and increasing the G395H NRS1-Prism offset to -259±21 ppm.However omitting the reference pixel stage does not seem to impact NRS2, where the change from the baseline case is not significant (-2±29 ppm).The reason for this discrepant response between the two NRS detectors is not clear.Applying n_pix_grow_sat =3 to G395H also does not significantly change the offset (see Table 5).The other variations in Table 5 do not show significant changes from the baseline case in terms of average offsets and noise.These include using a 'null' systematic for G395H where no time-dependent trends are included (i.e., we fit only for systematic coefficient  and not  or ) and the use of model LDCs.
We thus find offsets between Prism and G395H and also between the two G395H detectors.The causes of the inter-modal and interdetector offsets are not immediately clear.The G395H-Prism offsets could be due to astrophysical or instrumental variations between the two observations.However they may also be due to differences in data processing, e.g.use of different reference files.Alternate systematic models to the second-order polynomial used here may be investigated to see if these might affect the offsets seen, though as noted above use of a null systematic model (with no time-dependent trend) in G395H gives similar results to the baseline case.While the offset between Prism and G395H has several possible causes, we can rule out astrophysical variation for the G395H inter-detector offset between NRS1 and NRS2, which we detect here through the comparison with the Prism spectrum.While the cause of offsets like these need continued investigation, they represent a systematic uncertainty that needs to be accounted for when analysing such spectra, and when comparing spectra taken at different times and/or with different instrument modes.

ATMOSPHERIC MODELLING
We use the AURA framework (Pinhas et al. 2018) to generate a number of simulated transmission spectra to compare to the obtained JWST NIRSpec observations.AURA treats the terminator as a 1D plane-parallel atmosphere in hydrostatic equilibrium with a uniform composition.For the models considered in this work, we treat the atmospheric regions giving rise to the transmission spectrum as isothermal.
We consider atmospheric opacity contributions from a number of   gaseous species previously reported in the atmosphere of WASP-39 b (e.g.(Borysow et al. 1988;Orton et al. 2007;Abel et al. 2011;Richard et al. 2012), which set the spectral baseline in the absence of aerosols.
For the nominal model shown in figure 19, the abundances of all gaseous species except SO 2 correspond to a 10× enhancement over solar values under thermochemical equilibrium (Burrows & Sharp 1999; Lodders & Fegley 2002;Madhusudhan & Seager 2011;Moses et al. 2013).For SO 2 , which is the product of disequilibrium processes (Zahnle et al. 2009;Wang et al. 2017;Polman et al. 2022;Tsai et al. 2023), we use a volume mixing ratio of 10 −5 , which is largely consistent with literature findings (Constantinou et al. 2023;Rustamkulov et al. 2023;Tsai et al. 2023).We set the atmosphere's isothermal temperature to 900 K.The ZnS aerosols have a 40% coverage fraction of the terminator and full vertical extent, with a mixing ratio of 5×10 −7 and a modal particle radius of 0.01 µm.
The forward model shown in Figure 19 provides a good fit to the observations and displays prominent spectral features consistent with previous studies (Alderson et al. 2023;Rustamkulov et al. 2023;Tsai et al. 2023;Constantinou et al. 2023) .The data and model show absorption features from H 2 O near 0.9, 1.1, 1.4, 1.9 and 2.9 µm.The latter feature also contains secondary spectral contributions from CO 2 and H 2 S. Additionally, the blue end of the observations prominently shows a significant absorption feature from Na.Both NIRSpec Prism and G395H datasets show a highly prominent CO 2 Figure 19.The obtained NIRSpec Prism and G395H transmission spectra of WASP-39 b, shown as errorbars with green and blue centres, respectively.Also shown is a nominal model assuming 10× solar elemental abundances and Mie scattering aerosols, as discussed in section 7. The observations are binned to R∼100 for visual clarity, while the grey area denotes the persistent saturation region for NIRSpec Prism as shown in Figure 2 absorption feature at ∼ 4.3 µm, which is reproduced by the forward model.There are also spectral contributions from SO 2 , CO and H 2 O on either side of the large CO 2 feature.Specifically, SO 2 is responsible for a small peak at 4 µm, while CO and H 2 O provide atmospheric opacity towards the red end of the spectrum.Lastly, ZnS aerosols are responsible for significant truncation of spectral features, particularly those of H 2 O at wavelengths smaller than 2 µm.
While the above forward modelling does not provide a definitive retrieval of the present dataset, the findings are broadly consistent with prior analyses of JWST observations of WASP-39 b with NIR-Spec (JWST Transiting Exoplanet Community Early Release Science Team et al. 2023;Rustamkulov et al. 2023;Alderson et al. 2023;Constantinou et al. 2023;Niraula et al. 2023) and other instruments (Ahrer et al. 2023;Alderson et al. 2023;Feinstein et al. 2023).Specifically, prior works also find highly prominent spectral features from H 2 O and CO 2 , with additional contributions from SO 2 , CO and H 2 S.Moreover, many of the above works find that the observations are best explained by a super-solar atmospheric metallicity, with additional spectral contributions from non-grey clouds.A more comprehensive retrieval analysis of the present data can confirm this agreement with prior works.
As noted earlier, we find the greatest time-correlated noise in the Prism white light curve composed of wavelengths between 0.65-2 µm, which would suggest that the transit depth error bars in this range may be underestimated, precisely where we find largest discrepancy between the forward model and the data.

CONCLUSIONS
In this work, we present JexoPipe, a newly developed JWST pipeline for exoplanet tranist spectroscopy.We applied it to observations of the warm Saturn WASP-39 b obtained in the ERS program 1366 using the NIRSpec instrument in two contrasting configurations, Prism and G395H.We use JexoPipe to apply consistent pipeline procedures, LDCs and system parameters to both datasets, enabling a comparative analysis of the spectra and instrument configurations.
We find a significant offset between Prism and G395H spectra which is more pronounced for G395H NRS1.The Prism data baseline also reveals an offset between G395H NRS1 and NRS of the order of 40-50 ppm.This is consistent with an intra-detector offset reported by Moran et al. (2023) and supports the incorporation of detector offsets when interpreting such spectra (Madhusudhan et al. 2023).We cannot rule out astrophysical causes for the G395H-Prism offset, however instrumental changes or differences in processing may be potential causes.
We note the effect of omitting the reference pixel stage on the spectral baseline was more pronounced for NRS1 than NRS2.We also found that performing the group-level background subtraction before the linearity correction step resulted in a significant fall in the Prism spectrum baseline, such that the offsets with G395H were reduced (NRS1) or eliminated (NRS2), but no such change is seen for G395H itself.Further investigation of these differences is warranted.
We address the Prism saturation through a combination of increasing the n_pix_grow_sat argument to counteract detector 'blooming' and a custom group control stage.In the Prism, we find 'flipping' noise, which appears to result from the variation in number of groups with integration.Group control is used to mitigate this.
On choosing between Prism and G395H each offers advantages and disadvantages.The Prism avoids any inter-detector offsets, but the final spectra have somewhat more noise, and correlated noise was detected in this study.The average error on the spectrum transit depths are about 1.3 × higher in Prism compared to those on G395H when binned to the Prism resolution.It also saturates easily.G395H is superior in terms of noise and allows for higher resolution spectra, however there is the potential for inter-detector spectrum baseline offsets of the order of 10s of ppm.
Using a nominal forward atmospheric model with 10× solar elemental abundances we show that we recover water peaks at 1.1, 1.4 and 1.9 µm within the saturated region, although the scatter and deviation from the model is somewhat higher in this region than outside.We note this is also the region where time-correlated noise is highest likely leading to an underestimation of the error bars on the spectrum and which might explain at least some of this added deviation.It remains to be seen if the method used in this paper provides a more or less accurate recovery of the saturated region than previous methods.
The examination of JexoPipe involved a comparison with spec-tra obtained from previously developed pipelines.The greatest disagreement with pipelines used in Rustamkulov et al. (2023) occurs in the persistently saturated region.JexoPipe has reasonably good agreement with two pipelines used in Alderson et al. (2023) but has appreciable baseline differences with others.In this early stage of JWST observations, development of independent pipelines such as JexoPipe allows us to compare results, ultimately leading to more robust scientific conclusions and the elucidation of optimal and best practice approaches for the processing of data from JWST exoplanet transit observations.Prism (all wavelengths) Prism (>2 µm)  MNRAS 000, 1-23 (2015)

Figure 2 .
Figure2.Determing a 'persistently saturated region' on the Prism subarray.Top: per pixel, the minimum number of unsaturated groups in any integration after completing the Saturation Detection step as implemented in this framework.Bottom: yellow region shows pixels where ≥ 10% of the timeline have at least one saturated group.We term this the 'persistently saturated region'.

Figure 3 .
Figure 3.The effect of changing the Saturation Detection step argument, n_pix_grow_sat, on the transmission spectrum for Prism.The shaded area shows the persistently saturated region which extends from 0.69 to 1.91 µm.Top: case a) no group control and suppress_one_group set to True.Middle: case b)no group control and suppress_one_group set to False.Bottom: case a) with group control and suppress_one_group set to False.We settle on using an n_pix_grow_sat value of 3 for our final reduction.

Figure 4 .
Figure 4. 'Flipping' noise as the rationale for the Group Control step in Prism processing.Three pixels are shown, all of which fall in the persistently saturated region.suppress_one_group was set to False.The top plots show the timelines of pixel counts without any group control.Middle plots show the timelines with the Group Control step implemented.The timelines are shown after the Fix Bad Pixels step in Stage 2. The bottom panel shows the number of unsaturated groups per integration (and thus the default ramp length).For pixel (101,17), the number of groups flips between 1 and 2 during the out-of-transit (OOT) phase.As explained in the text, and as can be seen in the top row of figures, this leads to noise in the timeline.The number of single group integrations is ≥ 10% (20%) of the timeline for this pixel, so per the rules adopted we force all the integrations to use only 1 group.This gives the result shown in the middle panel, where we see the noise has been mitigated.Pixel (109,17)  also flips between 1 and 2 groups with the same kind of associated noise in the OOT phase.However it has < 10% (8.4%) of its timeline with single group integrations.For such cases we flag the single group integrations as 'bad' by giving them NaN values in Stage 1, and these are later filled in during the Stage 2 Fix Bad Pixels correction step as described in the text.Pixel (142,17) occurs at the edge of the persistently saturated region, and flips between 4 and 5 unsaturated groups in its OOT phase.This again results in noise which is controlled by fixing all integrations to 4 groups (middle panel).The noise resulting from flipping between different groups per integration may be due in part to the small number of total groups per integration operating near the saturation threshold, giving noticeably different slopes per integration.

Figure 5 .
Figure 5. Guide star flux during the G395H observation.Flux has been normalized to the mean value, and smoothed by convolving with a 1000 step box-shaped kernel for clarity.

Figure 6 .
Figure 6.G395H mirror segment tilt event at pixel level.Two example pixel timelines from NRS2 are shown, with a step clearly visible and corresponding to that in the guide star flux timeline.In pixel (1000, 15), there is an increase in flux and in pixel (1400, 15) there is a decrease in flux after the event.The amount and direction of flux change is thus pixel-dependent.

Figure 7 .
Figure 7. Example 1-D stellar spectra from the first integration of each time series.These are the end result of Stage 3 of the pipeline.The flux counts on the Prism have been reduced by a factor of 100 to allow all spectra to be shown on same plot.

Figure 8 .
Figure 8. Prism White light curves and best fit solutions using median values from the MCMC posterior distribution, with residuals.

Figure 9 .
Figure 9. G395H white light curves and best fit solutions using median values from the MCMC posterior distribution, with residuals for G395H.

Figure 10 .
Figure 10.Allan deviation analysis of white light curve residuals.The fractional noise is normalised to the first data point for comparison purposes.The dotted black line has a gradient of -0.5 in log-log space and indicates how uncorrelated noise would appear to integrate down.

Figure 11 .
Figure 11.White light curves and best fit solutions when using model 4-factor LDCs.

Figure 12 .
Figure 12.Empirically-derived quadratic LDCs from R=66 Prism data (faded red and blue points), and those from R=66 G395H data (bolder red and blue points). 1 is in blue and  2 in red.The values between Prism and G395H match within the 1 error bars.

Figure 13 .
Figure 13.Empirically-derived quadratic LDCs from Prism data and those from the ExoCTK website model.The latter uses the Kurucz ATLAS9 model grid ( eff =5000K, log=4.45,Fe/H =0.01).Shaded regions give the 1 error on the empiric values.

Figure 14 .
Figure 14.Systematic coefficients.Data have been binned to R=60.Blue = Prism.Red = G395H.Transmission spectra are shown for comparison in top panel.For G395H, values for  have been increased by a factor of 5000 for clarity.The light curve shift applied during light curve fitting for G395H to correct for the tilt event is shown in the bottom panel.

Figure 15 .
Figure 15.Transmission spectra of WASP-39 b obtained using JexoPipe.Uppermost plot shows the spectra obtained at the native resolution without any wavelength binning.The shaded grey area indicates the region of persistent saturation for the Prism.Central points are the median of the posterior distributions and the error bars span the 16th-84th percentile range.In the second plot down, the Prism spectrum is shown at native resolution, while the G395H spectra are binned to the Prism wavelength grid to allow comparison of the transit depths with wavelength.To permit a difference comparison of central points and quadrature sum of errors, the error bars show the average error (half the 16th-84th percentile range) and the central points are the average of the 16th and 84th percentiles in the posterior distribution.The third plot gives the difference between the binned G395H spectrum and the Prism spectrum central points (i.e.residuals = G395H minus Prism).The error bars on the residuals are the quadrature sum of the error bars as shown in the second plot.The fourth plot gives the average error on the transit depth at the native resolution of the Prism.

Figure 16 .
Figure 16.Comparison of our Prism result with other Prism pipelines presented in Rustamkulov et al. (2023).The JexoPipe result is rebinned to the wavelength grid of the comparison pipeline's publicly available spectrum.The grey shaded area indicates the region of persistent saturation.Residuals are JexoPipe minus the comparison pipeline spectrum.The average errors on the transit depths for the comparison spectrum and the JexoPipe spectrum when binned to the resolution of the comparison spectrum are shown in the lowest plots.

Figure 17 .
Figure 17.Comparison of our G395H result with six other G395H pipelines (listed by author and/or pipeline name) included in the study by Alderson et al. (2023).The JexoPipe result is rebinned to the wavelength grid of the comparison pipeline's publicly available spectrum.Residuals are JexoPipe minus the comparison pipeline spectrum.The average errors on the transit depths for the comparison spectrum and the JexoPipe spectrum when binned to the resolution of the comparison spectrum are shown in the lowest plots.

Figure 18 .
Figure 18.Comparison of our G395H result with five further G395H pipelines (listed by author and/or pipeline name) included in the study by Alderson et al. (2023) and the weighted mean result in that study.The JexoPipe result is rebinned to the wavelength grid of the comparison pipeline's publicly available spectrum.Residuals are JexoPipe minus the comparison pipeline spectrum.The average errors on the transit depths for the comparison spectrum and the JexoPipe spectrum when binned to the resolution of the comparison spectrum are shown in the lowest plots.

Figure B1 .
Figure B1.Joint posterior probability distributions and light curve fits for three example spectral light curve fits at full resolution.
JexoPipe: the data reduction framework used in this work.Blue boxes are steps utilised from the JWST Science Calibration Pipeline.Red boxes are customized steps.In Stage 1 the Prism and G395H pathways differ in the handling of saturated pixels, the application of reference pixels and jump detection.

Table 1 .
Summary of retrieved Prism white light curve parameters.

Table 2 .
Summary of retrieved G395H white light curve parameters.

Table 3 .
Summary of retrieved white light curve parameters using model 4-factor LDCs

Table 4 .
White light curve goodness-of-fit for 4-factor model LDCs vs a quadratic LDC fit.
To permit comparison with the other cases the baseline case was repeated here with the shorter MCMC runs used for the comparisons; hence the G395H NRS1 offset is slighly different from that quoted in the text for the longer MCMC run used for Figure15.