Gemini/GMOS Optical Transmission Spectroscopy of WASP-121b: signs of variability in an ultra-hot Jupiter?

We present ground-based, spectroscopic observations of two transits of the ultra-hot Jupiter WASP-121b covering the wavelength range $\approx$500 - 950 nm using Gemini/GMOS. We use a Gaussian process framework to model instrumental systematics in the light curves, and also demonstrate the use of the more generalised Student's-T process to verify our results. We find that our measured transmission spectrum, whilst showing overall agreement, is slightly discrepant with results obtained using HST/STIS, particularly for wavelengths shortward of $\approx$650 nm. In contrast to the STIS results, we find evidence for an increasing blueward slope and little evidence for absorption from either TiO or VO in our retrieval, in agreement with a number of recent studies performed at high-resolution. We suggest that this might point to some other absorbers, particularly some combination of recently detected atomic metals, in addition to scattering by hazes, being responsible for the excess optical absorption and observed vertical thermal inversion. Our results are also broadly consistent with previous ground-based photometry and 3D GCM predictions, however, these assumed different chemistry to our retrievals. In addition, we show that the GMOS observations are repeatable over short periods (days), similarly to the HST/STIS observations. Their difference over longer periods (months) could well be the result of temporal variability in the atmospheric properties (i.e. weather) as predicted by theoretical models of ultra-hot Jupiters; however, more mundane explanations such as instrumental systematics and stellar activity cannot be fully ruled out, and we encourage future observations to explore this possibility.


INTRODUCTION
Close-in giant exoplanets are amongst the most promising targets for detailed atmospheric studies of planets outside our own Solar System, owing to their favourable combinations of short periods, extended atmospheres and large planet-to-star radius ratios -all of which helps to maximise the achievable signal-to-noise of what are very challenging measurements. The extreme temperatures experienced by the most highly irradiated hot-Jupiters may also lead to a simplification of the atmospheric chemistry on these planets, due to the thermal dissociation of a large fraction of molecular species into their constituent elements -aiding in the interpre-★ Contact e-mail: jwilson34@qub.ac.uk tation of their atmospheric observations (Lothringer et al. 2018;Kitzmann et al. 2018). The technique of transmission spectroscopy, which measures the variations in the effective size of the planet as a function of wavelength, is an enormously successful method for investigating the chemical composition and physical structure of a planet's atmosphere (Brown 2001;Seager & Sasselov 2000). Transmission spectroscopy observations have provided many highsignificance detections of atomic and molecular species for a range of exoplanets, including detections of Na, K and H 2 O both from the ground and using space-based instrumentation (e.g. Charbonneau et al. 2002;Sing et al. 2015;Deming et al. 2013;Kreidberg et al. 2014;Stevenson et al. 2014;Nikolov et al. 2016Nikolov et al. , 2018Evans et al. 2016Evans et al. , 2018Spake et al. 2018;Carter et al. 2020). However, other observations have resulted in partially muted or even completely featureless spectra due to the influence of clouds and/or hazes (e.g. Sing et al. 2011Sing et al. , 2016Bean et al. 2011;Pont et al. 2013;Gibson et al. 2013b).
A number of recent studies have also focused on investigating the temperature structures of highly irradiated exoplanets and the presence of vertical thermal inversions (e.g. Knutson et al. 2010;Madhusudhan et al. 2011;Mollière et al. 2015;Beatty et al. 2017;Parmentier et al. 2018;Lothringer et al. 2018;Gandhi & Madhusudhan 2019). These inversion layers occur high in the planet's atmosphere where the temperature is increasing with altitude and are present in the atmospheres of almost all of the Solar System planets (Gillett et al. 1969;Wallace et al. 1974;Ridgway 1974). The Earth's stratosphere for example is due to absorption of UV photons high in the atmosphere by ozone and hazes. For very highly irradiated exoplanets it has been predicted that the molecules titanium oxide (TiO) and/or vanadium oxide (VO) could provide both the required opacity to incoming optical radiation and the necessary abundances to be strong candidates for the drivers of these thermal inversions (Hubeny et al. 2003;Fortney et al. 2008). Though thus far, and despite much effort, detections exist only for a limited number of planets (e.g. Nugroho et al. 2017;Sedaghati et al. 2017), and in most cases have been disputed (e.g. Herman et al. 2020;Espinoza et al. 2019). The presence of a thermal inversion can be inferred from planetary emission spectra by observing bands in emission rather than absorption, and this was first achieved for the hot Jupiter WASP-121b from observations of H 2 O on the dayside , and has since been verified using TESS optical phase curves and other observations (Daylan et al. 2019;Bourrier et al. 2020b).
WASP-121b is a highly irradiated ultra-hot Jupiter discovered by Delrez et al. (2016). It orbits a bright F6-type star (V = 10.4) and has a highly inflated atmosphere, both of which make it particularly amenable to transmission spectroscopy observations. It has a mass similar to that of Jupiter and a radius significantly larger -1.18 M J and 1.7 R J respectively, and an equilibrium temperature above 2400 K. WASP-121b is amongst the most extensively studied of the transiting planets with numerous observations having being acquired from the ground and from space and at both low-resolution and using high-resolution Doppler-resolved spectroscopy. At lowresolution, observations obtained with the Hubble Space Telescope (HST) have resulted in detections of H 2 O along with excess optical absorption which was tentatively attributed to TiO/VO . Follow-up observations revealed further evidence for VO (Evans et al. 2018) along with an unknown blue absorber suggested to be attributed to the SH molecule. However, additional secondary eclipse measurements have not been able to confirm the previous VO detection (Mikal-Evans et al. 2019 and instead suggest H − emission. It has not yet been possible to definitively show that either VO or TiO are responsible for the thermal inversion in the atmosphere of WASP-121b. Additionally, Beatty et al. (2017) and Parmentier et al. (2013) both show how cold-trap processes can interfere with the circulation of TiO/VO and suppress the formation of inversions, and it has also been demonstrated (Parmentier et al. 2018;Lothringer et al. 2018) that for the very hottest planets much of the TiO and VO will be thermally dissociated on the dayside and in these cases thermal inversions could instead be driven by NUV and optical absorption by gas phase metals such as Fe and Mg. Indeed, at high-resolution, the atmosphere of WASP-121b has been observed using both the HARPS (Bourrier et al. 2020a;Cabot et al. 2020) and UVES Merritt et al. 2020) spectrographs, which has resulted in significant detections of the atomic metals FeI and NaI and absorption from H-alpha but non-detections of both TiO and VO. FeI in particular is a strong optical absorber and its presence could therefore provide the source of the heating required to produce the observed temperature inversion Pino et al. 2020).
Even more recent observations at high-resolution have confirmed the previous metal detections and also uncovered evidence for multiple other atomic species including Mg, Na, Ca, Cr, Fe, Ni, V (Ben-Yami et al. 2020;Hoeĳmakers et al. 2020) and a very recent detection of Li using ESPRESSO on the VLT (Borsa et al. 2020). In the UV, SWIFT/UVOT observations revealed tentative evidence of metal ions high in the atmosphere (Salz et al. 2019), whilst HST/STIS observations detected strong absorption lines from iron and magnesium atoms ) extending far higher than the optical and near-infrared features and demonstrating the existence of an extended and possibly escaping atmosphere. In the present study we report optical transmission spectroscopy observations of WASP-121b using the Gemini Multi-Object Spectrograph (GMOS). Our observations aim to search for evidence of absorption by TiO/VO or other metals in order to infer the main driver of the thermal inversion by resolving the shape of these molecular features.
This paper is structured as follows: we describe our observations and data reduction steps in Section 2 and detail our light curve analysis in Section 3; in Section 4 we describe our atmospheric retrieval approach and discuss our results in Section 5. Finally, we offer our conclusions in Section 6.

GMOS OBSERVATIONS AND DATA REDUCTION
Two transits of the ultra-hot Jupiter WASP-121b were observed on the nights 2017 January 4 and 2017 January 9 (hereafter Transit 1 and 2, respectively) using the 8-m Gemini-South telescope with the Gemini Multi-Object Spectrograph (GMOS, Hook et al. 2004) as part of programme GS-2016B-Q-42 (PI: Gibson). GMOS consists of three 2k × 4k CCDs arranged side by side and separated by small detector gaps and has an imaging field-of-view of 5.5 × 5.5 arcminutes squared. Both transits were observed using the R400 grism + OG515 filter with a central wavelength of 725 nm in 2 × 2 binning mode covering the spectral range of 515 − 940 nm. To reduce readout time, we read out only three regions of interest (ROI) 1 on the chip including the target and two comparison stars. For Transit 1 we obtained 580 science exposures over a total period of 340 minutes, whilst for Transit 2 we obtained a total of 505 science exposures covering a total period of 290 minutes. For both transits we used exposure times of 10 s with a readout time of 20 s. For Transit 1 the on-sky separation of the Moon from the target was ≈ 97 degrees with lunar illumination of ≈ 44%. For Transit 2 the Moon was separated by ≈ 63 degrees with illumination of 93%.
To enable differential spectroscopy we observed two comparison stars simultaneously with WASP-121 for each transit and used a custom mask with wide slit widths (40 × 15 arcsec) in order to reduce the impact of differential slit losses. Only the brighter of our two comparison stars was used in the subsequent analysis since the other was found to be significantly fainter. The average FWHM for Transit 1 was found to be ≈ 4 pixels but varied between a maximum of ≈ 7 pixels and a minimum of ≈ 3 pixels, whilst for Transit 2 the average was ≈ 5 pixels, also varying between a maximum of ≈ 8 1 To speed up the readout time only pixels containing flux from the slits were recorded into separate regions of interest (ROIs) 5500 6000 6500 7000 7500 8000 8500 9000 9500 Wavelength ( pixels and a minimum of ≈ 4 pixels. The seeing-limited spectral resolution was found to be R ≈ 690 -1500 and R ≈ 600 -1200 for Transit 1 and 2 respectively, and we observed the target from an airmass of 1.51 to 1.01 for Transit 1 and 1.25 to 1.01 for Transit 2. The standard GMOS pipeline contained in the Gemini IRAF 2 /PyRAF 3 package was used to carry out the initial data reduction. We first converted the ROI images to standard GMOS format, and then proceeded with basic reduction procedures to de-bias and flat field the raw images using the calibration frames obtained at the beginning and end of each night. To extract the spectra we used a custom pipeline in IRAF/PyRAF and experimented with various aperture widths and background regions in order to minimise the residual scatter. For the final analysis we used an optimum aperture radius of 18 pixels and two background regions located 80 -100 pixels either side of the spectral trace. The background contribution was estimated by taking the median value within these regions. Figure 1 shows example spectra for WASP-121 and the comparison star for both transits obtained after spectral extraction.
To align each of our spectra we use the x-shifts obtained from a process of cross-correlation using suitable absorption features in the stellar spectra after normalising the continua. We repeated this cross-correlation procedure for a number of different absorption features but found the results to be insensitive to the specific feature chosen, in addition, we did not notice any significant stretching of the spectra when compared to the width of our bins over the timeseries, and we use the alignment obtained using the O 2 feature in our final analysis.
To calibrate the wavelength scales, we first tried the wavelength solution derived from arc frames obtained with a calibration mask with much narrower 1 arcsec slit widths. However, inspection of our spectra revealed residual offsets between the target and comparison 2 IRAF is distributed by the National Optical Astronomy Observatory, which is operated by the Association of Universities for Research in Astronomy (AURA) under cooperative agreement with the National Science Foundation 3 PyRAF is a product of the Space Telescope Science Institute, which is operated by AURA for NASA spectra and we instead constructed our solution by first identifying a set of well-resolved absorption lines in the mean spectrum, and then fitting a Gaussian to each of these lines to accurately retrieve the position of the line centres. Our solution is obtained by fitting these line centres using a Gaussian process (GP) with a standard squared exponential kernel (see Section 3.1 for a detailed description of our implementation of GPs) and we use the solution derived from this fit in our final analysis.
To produce the differential white light curves for each transit we bin the flux along the dispersion axis for the target and comparison star over the wide wavelength ranges shown in grey in Figure 1, and then divide the flux of the target star by the flux of the comparison star for the entire time-series to correct for variations in observing conditions and telluric effects. Multiple spectroscopic light curves are constructed in a similar way by integrating over each of the narrower bins shown in blue (avoiding areas which overlapped the detector gaps). We extracted 53 spectroscopic channels for both transits and the resulting differential white light curves and spectral light curves are shown in Figures 2, 3 and 4.
Our limb darkening coefficients and their associated uncertainties were obtained using the PyLDTk toolkit (Parviainen & Aigrain 2015), which uses the PHOENIX models of Husser et al. (2013). For this we used the stellar parameters for WASP-121 and the system parameters and uncertainties as given in (Delrez et al. 2016). Finally, we also examined various diagnostic measurements, including the FWHM of the spectral trace and the shifts in the dispersion and spatial axes to check for correlations with the instrumental systematics (e.g. Brown 2001;Gilliland & Arribas 2003;Pont et al. 2007;Swain et al. 2009;Stevenson et al. 2010;Gillon et al. 2012;Huitson et al. 2013;Nikolov et al. 2016), though in this case no obvious correlations were found.

White Light Curve Analysis
To model our white light curves we use a very similar procedure to that previously outlined in Wilson et al. (2020), which implements the methodology introduced in Gibson et al. (2012). In brief, we fit using a time-dependent GP 4 simultaneously with the analytic transit model of Mandel & Agol (2002) with quadratic limb darkening. GPs have been successfully implemented in a large number of similar studies (e.g. Gibson et al. 2013a;Evans et al. 2017Evans et al. , 2018Nikolov et al. 2018) and have been shown to be extremely useful for modelling correlated noise in exoplanet time-series, whilst offering robust uncertainty estimates. Our joint posterior distribution is given by the multivariate Gaussian: where and are the vectors of time and flux measurements respectively, T is the analytic transit mean function with parameters φ and is the covariance matrix with hyperparameters θ. Correlations between data points are described using the covariance matrix which is calculated using a kernel function with parameters θ (for a more detailed review of GPs and kernels see Rasmussen & Williams 2005). In this work we use the Matérn 3/2 kernel which can be viewed as a less smooth version of the more commonly used squared exponential kernel and which is defined as: where is the height scale, Δ t is the time difference of observations, is the inverse length scale, is the Kronecker delta and is the white noise term which is identical for each data point. As a check we repeated our analysis using a squared exponential kernel but found that our results were unaffected. Since no obvious correlations were observed between the form of the systematics in the light curves and the diagnostic measurements, we proceeded by using time as our only input variable, which has the added advantage of requiring fewer assumptions. We obtain the posterior probability distributions of the parameters of interest by first specifying prior distributions for the hyperparameters of the kernel function and then multiplying by the marginal log-likelihood.
Our GP mean function assumes a circular orbit with period fixed to that reported by Delrez et al. (2016). For each white light curve we allow the mid-transit time (T c ) and linear baseline parameters (f oot , T grad ) to vary as free parameters and fixed the system scale (a/R ★ ) and impact parameter b to the values previously constrained in Evans et al. (2018). For the planet-to-star radius ratio ( = R p /R ★ ) we use a Gaussian prior centered on the reported value in Evans et al. (2018). Constraining the white light curve parameters to previously reported values allows us to more easily compare the respective results and should help to improve the accuracy of our systematics models.
To account for the effects of stellar limb darkening, we used a quadratic limb darkening law (Claret 2000) and placed Gaussian priors on the quadratic coefficients c 1 and c 2 with a mean and standard deviation determined using PyLDTk. We also tried repeating our analysis having both fixed the limb darkening parameters to their best-fit values and leaving the parameters completely free in the fits, but found that this variation in treatment did not affect our results (see Section 5 where we discuss the impact of our limb darkening treatment further). We summarise the assumed parameter values for the white light curves in Table 1. Similar to previous studies (e.g. Gibson et al. 2017;Evans et al. 2017), we fit for log and log 4 For the implementation of our Bayesian inference we made extensive use of the Python modules G P and I which are freely available from https://github.com/nealegibson with uniform priors and constrain the length scales to lie between the cadence and twice the total duration of our observations.
We performed a joint fit to both individual transits, allowing for only a single planet-to-star radius ratio and limb darkening coefficient pair, whilst allowing the hyperparameters to vary separately for each transit. To check the validity of our assumed parameter values, we also carried out independent fits to both of our individual white light curves finding the values to be consistent within 1 sigma to those of Evans et al. (2018) for both transits, though the measured planet-to-star radius ratio was found to be higher for one transit and slightly lower for the other (discussed further in Section 5). This ultimately results in an offset in the mean levels of the individual transmission spectra, though it does not affect their relative values and we find that both transits produce spectra with a consistent shape. We further tested this by again fitting each transit independently, but this time using the same priors as for our joint fit (shown in Figure 5). We confirm that the shape of the individual spectra are not affected by imposing these priors and find that both transits produce spectra which are in excellent agreement, though with quite different average uncertainties. We computed the reduced 2 between the two spectra to be 0.7 by taking the difference between each pair of points and adding the uncertainties in quadrature 5 . The overall mean level of the transmission spectrum can also be influenced by inaccuracies in the common-mode correction, though this similarly should not affect the relative values, but can potentially lead to offsets, especially when including multiple datasets from different instruments, and needs to be carefully considered when interpreting results. The individual transmission spectra for Transits 1 and 2 are shown in Figure 5 and the results from the joint fits are shown in Figure 6. We discuss these results further in Section 5.
To find the best-fitting models for each of our white light curves we began by using a differential evolution algorithm to optimise the log-likelihood with respect to the transit and kernel parameters using the values from Delrez et al. (2016) and Evans et al. (2018) as the initial points. Next, we used a Nelder-Mead simplex algorithm (Nelder & Mead 1965) to fine-tune these estimated values and, finally, we ran a Markov-Chain Monte-Carlo (MCMC) method to marginalise our posterior distributions. Four independent chains with length 80,000 were initiated for each of our light curves with the first 40% of samples in the chain being considered as burn-in and discarded. We used the Gelman-Rubin statistic to confirm mutual convergence for each free parameter. We show the best-fit white light curve models and derived systematics models and residuals for both transits in Figure 2.

Spectroscopic Light Curve Analysis
To construct our spectroscopic light curves, we began by binning over the narrow wavelength regions shown in blue in Figure 1. For each transit, we extracted 53 individual light curves using a uniform bin width of 75 Å whilst avoiding the GMOS detector gaps. We show the resulting raw light curves in the left panels of Figures 3 and 4. The raw light curves for both transits show significant systematics which are largely independent of wavelength. This is a similar situation to that encountered for the FORS2 dataset in Wilson et al. (2020) and we proceeded in much the same way by dividing each of the spectroscopic light curves by the common-mode 5 where the difference between pairs of data points will follow a normal  corrections derived from the white light curves. Additionally, to remove any remaining high-frequency systematics, we also subtract the residuals from the white light curves fits. The net result of this procedure is to provide significant improvements to the precision of our transmission spectrum whilst preserving the relative values of the planet-to-star radius ratios. After applying the common-mode correction described above we find the best-fitting model for each spectroscopic light curve using the same process and the same systematics model as described in Section 3.1. The main difference is that we fix the mid-transit time to the best-fit value from the white light curve analysis, whilst allowing the planet-to-star radius ratio, limb darkening parameters, linear baseline parameters and kernel hyperparameters to vary for each fit. As for the white light curves, we perform a joint analysis allowing for only a single planet-to-star radius ratio and limb darkening coefficient pair for each individual wavelength channel, but allow the hyperparameters to vary separately for each transit. To add another layer of flexibility to our model we use wide Gaussian priors with a standard deviation of 0.1 for the limb darkening coefficients, with a mean given by the best fit values determined using PyLDTk. We again used the same differential evolution algorithm to optimise the log-likelihood as for the white light curves, and fine-tuned using a Nelder-Mead simplex algorithm. Outliers deviating more than 4 from the predictive distribution were removed for each fit (this procedure typically clipped 1-2 points per light curve) and we used the same MCMC method described above to marginalise our posterior distributions. The best-fit common-mode corrected GP models for the spectroscopic light curves are shown in the middle panels of Figures 3 and 4 and we list the measured planet-to-star radius ratios and uncertainties in Table 2. Our GMOS transmission spectrum for WASP-121b is shown in Figure 6 and the results are discussed further in Section 5.

Modelling Systematics with Student's-T Processes
To investigate the robustness of the analysis outlined above, we performed some additional modelling using a Student's-T Process (STP), implemented with a modified version of our GeePea code. STPs have already been successfully applied to Bayesian optimisation problems and aerospace design (e.g. Shah 2013; Tracey & Wolpert 2018) and have been shown to come at no additional computational cost over GPs (Shah et al. 2014) but, to our knowledge, have not yet been applied to the analysis of systematics in transit light curves.
Whilst the basis of a GP is the multivariate Gaussian distribution, the basis of a STP is instead the multivariate Student's-T distribution (Genz & Bretz 2009) and therefore, similarly to a GP, a STP defines a prior over functions. A STP differs however from a GP in two ways: firstly, a Student's-T distribution can have a higher kurtosis than the corresponding Gaussian distribution, and therefore a STP prior can assign higher probability to extreme outliers. Secondly, with a GP, the conditioned posterior variance (excluding white noise) depends exclusively on the location of the input variables, whilst with a STP the posterior variance will also depend explicitly on the observed values. To account for this extra complexity, a STP has an additional parameter, , which describes the degrees of freedom of the distribution. As the degrees of freedom approaches infinity, the multivariate Student's-T distribution converges to the corresponding multivariate Gaussian distribution. Thus, a STP can be considered as a generalisation of a GP with an additional parameter. Following the derivation outlined in Tracey & Wolpert (2018), the pdf of a STP is given by: where and are the mean and degrees of freedom of the distribution respectively and is the dimension. is a symmetric positive definite shape parameter which is related to the covariance matrix by: set of observations = {( 1 , 1 ), ( 2 , 2 ), ( 3 , 3 ),...}, the posterior mean and covariance will be given by: and: These two equations are the same as that which would be obtained for the equivalent GP, except for an additional term modifying the covariance, which can be seen as a corrective factor which depends explicitly on the observed values. In short, if the observed values are consistent with a GP then the posterior covariance of the STP will be roughly equal to that of a GP. Conversely, if the variation in the observed values is greater or smaller than expected then the STP posterior covariance will be larger or smaller respectively. In other words, for the same kernel, the posterior means will be identical, but the variance will differ, depending on the observed values.
For our implementation of STPs we used the same Matérn 3/2 kernel and hyperparameter priors as for our GP analysis described in Sections 3.1 and 3.2. For the degrees of freedom parameter we set a prior condition of > 4 to ensure that the distribution has both finite variance and finite kurtosis. We then carry out the same procedure as before to perform a joint fit to both transits. We compare the results from the STP analysis with those from our GP analysis in Figure  A1 in the appendix. As can be seen from the figure our results are almost identical across the full spectrum, with a mean uncertainty that is inflated by only a few percent and a posterior mean for that was typically a very large number. Our results therefore verify that the observed data can indeed be well described by a GP in this case, and gives us greater confidence in our estimated uncertainties.
For the analysis as implemented here we should acknowledge that the value for is common for both the stochastic component, and for the noise. Ideally, these should be treated independently with separate instances of for each. However, the fact that we recover almost identical results to our GP analysis is reassuring. We will return to examine this potential shortcoming in greater detail in future work. A more detailed discussion on the comparisons between STPs and GPs and their respective strengths and weaknesses is beyond the scope of the present study.

ATMOSPHERIC MODELLING WITH PETRA
To obtain constraints on the vertical temperature structure and terminator composition of WASP-121b, we performed an atmospheric retrieval using an adaptation of the retrieval code PETRA (PHOENIX ExoplaneT Retrieval Algorithm; Lothringer & Barman 2020). In addition to the optical GMOS data presented in Table 2, our retrieval also incorporates the near-IR HST/WFC3 data of Evans et al. (2016). Atmospheric retrieval codes typically comprise a forward model which is used to generate the predicted atmospheric spectrum and include a statistically robust inference method for parameter estimation. PETRA incorporates a modified version of the well-tested and self-consistent atmosphere model PHOENIX as its forward model, which has been widely used to study the atmospheres of both stellar and sub-stellar atmospheres (Hauschildt et al. 1997(Hauschildt et al. , 1999Allard et al. 2011;Barman et al. 2001Barman et al. , 2011Lothringer et al. 2018;Lothringer & Barman 2019).
The model assumes hydrostatic equilibrium and a plane parallel atmosphere and solves for radiative transfer in a line-by-line  . The red line shows the best-fit from our GMOS/WFC3 retrieval analysis using PETRA along with 1 significance contours (light red). We also show the best-fit model from the STIS/WFC3 retrieval (purple) described in Lothringer et al. (2020a) for comparison. Astrophysical Chemical Equilibrium Solver (ACES) to calculate the equation of state for each species with elemental abundances scaled uniformly by a free-parameter for the bulk metallicity. We treat H 2 O, TiO, and VO abundances as free and independent parameters. Line profiles are calculated using Voigt profiles. Details of the sources of the various opacity cross-sections can be found in Lothringer et al. (2018) and Lothringer & Barman (2019).
For the p-T profile we make use of the three-channel Eddington approximation outlined in Parmentier & Guillot (2014) as implemented in Line et al. (2013). This approach has five free parameters which describe the Planck mean thermal IR opacity, two independent downwelling visible channels of radiation, the partition of the flux between the two visible streams and a catch-all term which describes the albedo, emissivity, and day-night redistribution. Our model also considers the contributions from cloud/haze coverage as described in MacDonald & Madhusudhan (2017), with parameters cloud (the cloud-deck altitude in bars), a parameter which describes the scattering enhancement factor and a parameter which describes the scattering slope. For a full description of the retrieval framework see Lothringer & Barman (2020).
We use a Differential Evolution Markov Chain (DEMC) algorithm (Ter Braak 2006) with "snooker" updates (Ter Braak & Vrugt 2008) as a statistical sampling algorithm to obtain the posterior distributions of the forward model parameters, checking for convergence using the Gelman-Rubin statistic. Lastly, we also include an offset between the GMOS/HST datasets as another free parameter to account for potential discrepancies. This is particularly important given that the Evans et al. (2016) analysis adopted slightly different values for the transit shape parameters to those adopted in Evans et al. (2018). We used wide uniform priors for the temperature parameters, the H 2 O, TiO, and VO abundances (between 10 −12 and 10 −1 ) and [Fe/H] (between 10 −1 and 10 4 ). For our final parameter and uncertainty estimates, we also inflated the error bars by ∼ 13% so that our fits were approaching a chi-square of 1.0.
Our retrieval results are summarised in Table 3. We find little evidence for either TiO or VO with 1 upper limits on the best-fit absolute terminator abundances of log(X TiO ) < -9.60 and log(X VO ) < -10.25 respectively. We also constrain the H 2 O abundance to log(X H 2 O ) = -5.05 +0.19 −0.18 and find a best-fit mean temperature of 2664 +120 −100 K. In contrast to the results in Evans et al. (2018), our retrieval favours a scattering/haze slope rather than absorption by TiO/VO, with best-fit scattering enhancement and slope of 2.43 +0.31 −0.29 and -7.51 +1.72 −1.86 respectively. This is perhaps not all that surprising given the overall smoother, sloping shape of our transmission spectrum. For our best-fit model we reach a reduced 2 of 1.485 (for 14 free parameters) which is comparable to the value in Evans et al. (2018). Interestingly, our retrieval also tries to fit the metal hydride CaH to the apparent feature at ∼0.7 microns, although we find only weak evidence for it (ΔBIC = 2.14). CaH absorption is observed in some cool stars and brown dwarfs (e.g. Kirkpatrick et al. 1999;Reiners et al. 2007;West et al. 2011).
The best-fit from our GMOS/WFC3 retrieval along with 1 sigma confidence contours are shown in Figure 6 and the full marginalised posterior probability densities are presented in Figure A3 in the appendix. A similar retrieval using PETRA was also carried out on the original STIS and WFC3 data and is presented in detail in Lothringer et al. (2020a). In that study the Fe/H, H 2 O and VO abundances were found to be slightly reduced from that reported in Evans et al. (2018) though the overall interpretation remained largely the same. We also show the best-fit model from this retrieval in Figure 6 for comparison.

DISCUSSION
WASP-121b is a close-in, ultra-hot Jupiter with a mass and radius of 1.18 M J and 1.7 R J respectively, an equilibrium temperature over 2400 K and an orbital period of 1.27 days (Delrez et al. 2016). In Evans et al. (2017) strong evidence was found for the presence of a vertical thermal inversion using secondary eclipse measurements and this was subsequently confirmed in (Bourrier et al. 2020b).
The cause of such thermal inversions in the atmospheres of hot-Jupiters has traditionally been suggested to be due to absorption by TiO and/or VO (Fortney et al. 2008), though a convincing detection of either molecule has yet to be made for WASP-121b. Evans et al. (2018) presented a STIS optical transmission spectrum of WASP-121b which showed features consistent with absorption by VO. However, follow-up secondary eclipse observations and recent observations at high-resolution have been unable to confirm this detection (Mikal-Evans et al. 2020;Merritt et al. 2020). This has led to some speculation that other species, such as the plethora of atomic metals including FeI that have recently been detected, might be the drivers for WASP-121b's temperature inversion (Lothringer et al. 2018;Gandhi & Madhusudhan 2019;Gibson et al. 2020). The transmission spectrum obtained from our joint analysis of two transits of WASP-121b is shown in Figure 6, with the STIS data of Evans et al. (2018) and the results of a re-analysis of the groundbased photometry from Delrez et al. (2016) (as presented in Evans et al. (2016) overplotted. For a small number of our points the values and uncertainties appear to deviate slightly from what would be expected from a simple weighted average of the two individual transits. This is likely due to more accurate limb darkening constraints from the joint fits, which should also improve the constraints on the corresponding systematics models. Overall, our transmission spectrum shows some agreement with that of Evans et al. (2018) for wavelengths longward of ∼ 650 nm, with some similar features being apparent in the middle portion of the spectrum. However, the results begin to diverge noticeably at wavelengths shorter than about ∼ 650 nm, with the GMOS values being consistently larger than the STIS measurements across this short wavelength portion of the spectrum. We considered a number of factors that might be responsible for this apparent disagreement. The first of these is the treatment of limb darkening, which can potentially introduce small discrepancies in the transmission spectrum if not treated correctly, and will be particularly true at shorter wavelengths where the effects are strongest. For the analysis we have presented here we used the best-fit values calculated using PyLDTk and imposed wide Gaussian priors with a width of 0.1 to add some flexibility to the model fits. However, we also tried both fixing the limb darkening coefficients to the best-fit values and leaving them completely free in the fits, and found that this made almost no difference to our final transmission spectrum. In the Evans et al. (2018) study they used 3D limb darkening coefficients from the STAGGER grid (Magic et al. 2013), so, as a check to ensure that our results were not overly dependent on our particular choice of limb darkening treatment, we re-extracted all light curves using identical bins as used for the G750L dataset in Evans et al. (2018), and repeated the analyses having fixed the limb darkening coefficients to the same best-fit values obtained from the STAGGER 3D stellar model. A comparison of our original spectrum and that produced from this subsequent analysis is shown in Figure A2 in the appendix. We find that adopting the STAGGER 3D limb darkening coefficients has a minimal influence on the overall shape of our transmission spectrum and is unable to explain the discrepancy between the GMOS and STIS datasets. This, together with the fact that we see little variation from fixing the limb darkening parameters or leaving them free, leads us to conclude that our results appear to be relatively insensitive to the treatment of limb darkening. Stellar activity, including the possibility of unocculted spots or plages, has the potential to introduce both systematic offsets and wavelength-dependent biases in the measured transmission spectrum, with the effect being strongest for shorter wavelengths. In addition, changes in the wavelength dependence, as a result of timevariable spot coverages, could plausibly explain the observed discrepancy between the GMOS and STIS datasets given the separation in time of the individual observations. In Rackham et al. (2019) the estimated contamination for an F6 dwarf was found to be around a factor of ∼ 1.001, which corresponds to an offset in transit depth of ∼ 0.0025 %. The authors conclude that, whilst the effects of stellar contamination are more pronounced at shorter wavelengths, unocculted spots on typically active FGK dwarfs should only result in minor transit depth changes. In the WASP-121b discovery paper, Delrez et al. (2016) used the 60 cm TRAPPIST telescope to investigate the photometric variability of WASP-121 over a six-week period, finding standard deviations of 1.6 mmag in the B band, 1.3 mmag in the V band and 1.1 mmag in the z band for the nightly photometry, and found no evidence for periodic variability above the ∼ 1 mmag level. Despite finding the photometry to be quiet, they nonetheless found that WASP-121 shows high scatter in its RV residuals, CCF bisector spans and FWHM from an analysis of CORALIE spectra, and suggested that this could point to the photosphere being plage dominated, since the lower flux ratios with respect to spots would result in smaller brightness variations (e.g. Dumusque et al. 2014).
Similarly, Evans et al. (2018) also carried out photometric monitoring of WASP-121 using the Automated Imaging Telescope (AIT) during two separate campaigns in 2017 and 2018 and found a standard deviation about the yearly mean of 4.6 mmag for the 2017 campaign and 3.0 mmag for the 2018 campaign, with no significant periodicity detected for either campaign. They concluded that host star activity was unlikely to have significantly affected their transmission spectrum. In addition, they demonstrated that unocculted spots were unable to explain the shape of their measured spectrum under reasonable assumptions. Given this relatively quiet photometry and considering that both of our visits (separated by five days) analysed individually result in transmission spectra with very similar shapes, we likewise do not expect stellar activity to have significantly affected our results. However, taking into account the signatures of activity presented in the discovery paper, together with the temporal separation of the observations, we cannot entirely rule out the possibility that variable activity levels could at least partially be responsible for the discrepancies between our results and those obtained using STIS.
In order to quantify the potential effect of varying activity, we follow a similar approach to that outlined in Evans et al. (2018), which is an equivalent method to calculating spot correction factors as used in previous analyses (e.g. Sing et al. 2011;Huitson et al. 2013). Using this approach we fit for the chromatic bias ( ), which we here define as the difference between the measured GMOS transmission spectrum and the weighted average of the G750L STIS points and is given by: where is the fractional spot coverage (also known as the filling factor), is the wavelength-dependent spot-to-photosphere flux ratio and D is the true transit depth in the absence of spots. In our fits we set D equal to the weighted average of the STIS measurements and allow alpha to vary as a free parameter. This simple model assumes a fractional spot coverage and fixed spot temperature, and we use a model atmosphere obtained from the Phoenix grid (Husser et al. 2013) with parameters T ★ = 6500 K, logg = 4.0 cgs, [Fe/H]= 0 dex to estimate the flux for the stellar photosphere. We use the same model to estimate the spot fluxes for a range of spot temperatures from 3500 K up to 6000 K and repeat our fits for each spot temperature. We found that none of our bias models were able to adequately explain the difference between the GMOS and STIS datasets, though the fits improved as the spot temperature was increased, however, this also required the spot fractions to increase from at least 2% coverage to about 8% coverage. We repeated this process subtracting the datasets in the opposite order, finding similar results. Given the suggestion in Delrez et al. (2016) that the activity might be dominated by plage, we also tried fitting for higher temperature spots but found that a similar level of coverage was required in this case. This level of spot coverage variation would almost certainly be expected to result in larger features in the long-term monitoring provided by TRAPPIST, AIT and TESS which, although limited by both photometric precision and temporal coverage, suggest modulations no larger than about 5 mmag. Given the complexity involved in modelling the effects of stellar activity, we cannot completely rule out the possibility of changing activity levels with the simple model used here, although we consider it unlikely that it could account for the full variability observed between the data sets.
Another likely explanation for the disagreement is the possibility of remaining, unaccounted-for systematics in one or both of the GMOS transits, or within the STIS datasets, and perhaps even within both. Although independent analyses of our two individual transits already reveals an overall offset between them, we nonetheless recover transmission spectra with a consistent shape for both (see Figure 5), and we have shown that this does not change with our choice of priors for the transit shape parameters. It therefore seems strange that unaccounted-for systematics could conspire to alter the shape of our final transmission spectrum, unless they affected both individual transits in a similar way. However, this is equally true of the STIS datasets (which were analysed using a similar Gaussian process technique), where good agreement was found for separate fits to the individual G430L visits. We therefore cannot rule out the possibility that some unaccounted-for systematics in the GMOS and/or STIS transits are the cause of the discrepancy at short wavelengths. We further discuss potential causes of the disagreement between datasets in Sect. 5.1.
Whilst we fail to detect any significant evidence for TiO or VO absorption in our PETRA retrieval, our transmission spectrum clearly shows excess absorption in the optical relative to that in the near-IR as reported in previous studies. Our results would therefore tend to support the hypothesis that other species are responsible for this optical excess. In particular, a number of atomic metals, including FeI, have already been detected in the atmospheres of WASP-121b and several other ultra-hot Jupiters at high-resolution (e.g. Hoeĳmakers et al. 2019;Cauley et al. 2019;Sing et al. 2019;Gibson et al. 2020;Hoeĳmakers et al. 2020;Nugroho et al. 2020) and would be expected to be strong sources of optical absorption (Fortney et al. 2008;Lothringer et al. 2020b). The best-fit from our retrieval analysis favours a scattering/haze slope, and absorption by FeI and other atomic metals could potentially help to strengthen this slope. Additionally, UV observations of WASP-121b have shown evidence of atmospheric escape ) and this could also enhance the effect of metals on the transmission spectrum if there is substantial material escaping from the atmosphere.
It has not been possible to uniquely identify these absorbing species in our low-resolution observations, though it seems likely that one or more of the reported species could be contributing to the observed optical excess and driving the thermal inversion Gibson et al. 2020;Lothringer et al. 2020b). It is also possible that our non-detection of either TiO or VO could be explained simply by these molecules being condensed out on the cooler nightside and terminator regions. In Figure 7 we show a comparison of the GMOS transmission spectrum and WFC3 data from Evans et al. (2016) with the predictions of a three-dimensional general circulation model (GCM) including TiO/VO/FeH opacity from Parmentier et al. (2018). The two models correspond to a temperature map produced by a solar composition SPARC/MITgcm simulation with asymmetric temperatures between the west and east limbs and assume either a clear atmosphere or the presence of CaTiO 3 clouds. It is clear from this comparison that the GMOS data is in broad agreement with both the ground-based photometry and GCM predictions, even though the models have not themselves been fine-tuned to the data. Given this broad agreement we cannot entirely rule out TiO/VO despite not resolving such features in our transmission spectrum. Additionally, the reasonably strong haze slope favoured by PETRA is inconsistent with the short wavelength turn down in the GCM. There thus remains a number of plausible scenarios that could explain the current data and further observations at shorter wavelengths may help to determine if the absorption keeps rising (as favored by PETRA) or turns back down (as favored by the GCM).
A clearer picture of the overall atmospheric composition of WASP-121b may also emerge with broader wavelength coverage including in the IR with e.g. JWST. Accurate modelling of the full optical-infrared continuum is crucial for reliable retrievals, helping to break some of the degeneracies between model parameters (Pinhas et al. 2018;Wakeford et al. 2018). A further way to break some of these degeneracies and help constrain the atmospheric properties could be achieved by combining the low-resolution observations with those obtained at high-resolution, either by joint fitting or by combining the posterior distributions of parameters, for example using direct likelihood evaluation techniques (e.g. Brogi & Line 2019;Gibson et al. 2020).

Variability in the transmission spectrum?
A further, and somewhat intriguing, interpretation of the differences in the STIS and GMOS transmission spectra is that they are direct evidence of temporal variability in the atmospheric properties of WASP-121b, on timescales similar to that of the separation between the individual datasets. Such time-variability has previously been evoked to explain the changing phase curve offsets observed In Parmentier et al. (2013), the authors use a 3D GCM to simulate dynamical mixing and vertical settling for the well-studied hot-Jupiter HD 209458b, and show that strong variability of condensable chemical constituents is expected both spatially and temporally, driven mainly by the large day-to-night temperature contrast, and could result in observable variations during transit measurements. Furthermore, these results may also apply to silicate cloud/haze coverage, which could also display variability from epoch to epoch. Depending on the size of the condensates involved, the simulations predict an expected period of fifty to one hundred days for the largest amplitude variations, which is similar to the separation between the STIS/GMOS observations. Similarly, in Komacek & Showman (2020), the authors used GCM simulations to investigate the time-variability of hot-Jupiter atmospheres, finding ∼0.1%-1% variations in globalaverage, dayside-average, and nightside-average temperatures, and ∼1%-10% variations in globally averaged wind speeds, even when ignoring variability from magnetic effects and clouds. They also show how these variations could result in time-variable secondary eclipse depths, phase-curve amplitudes and offsets and terminatoraveraged wind speeds. From phase-curve retrievals, WASP-121b is expected to have wind speeds of ∼7 km/s and a p-T profile which lies near the condensation curves of a number of species (e.g. Parmentier et al. 2018;Daylan et al. 2019). It is therefore perhaps not all that surprising that small temperature fluctuations could result in significant spatial and temporal variations in atmospheric constituents and could lead to measurable variations in transit measurements.
Taken together the STIS/GMOS observations represent a somewhat unique dataset in that they both consist of sets of repeat observations separated by several days, with a longer period of ∼60 days between these sets. The STIS dataset comprises two repeat observations using the G430L grating and one with the G750L grating with individual analyses of the G430L data giving consistent results and showing a good level of agreement both with the NUV data from Sing et al. (2019) and with the G750L dataset across the overlapping wavelength range. In their analysis Evans et al. (2018) carried out a comprehensive range of tests to evaluate the robustness of their transmission spectrum, including a sensitivity analysis of their limb darkening treatment, the inclusion of additional GP inputs and a consideration of the effects of stellar activity. Evans et al. (2018) also noticed a small discrepancy between their results and the ground-based photometry results obtained by Delrez et al. (2016), and also briefly speculated that intrinsic variability of the atmosphere from epoch to epoch could be responsible (these observations were separated by over 100 days). Both sets of individual observations for each instrument show excellent repeatability on short timescales, whilst the results over a longer period do not. It therefore seems plausible that the discrepancies between the STIS, GMOS and ground-based photometry could be the signature of time-variable weather conditions in the atmosphere of WASP-121b. However, we cannot entirely rule out the possibility that coherent systematics could be affecting the individual observations for one or both of the datasets in a similar way. Whilst a number of studies have shown some disagreement between ground/spacebased instrumentation likely due to systematics (e.g. Gibson et al. 2017Gibson et al. , 2019, others have shown good agreement (e.g. Alam et al. 2020) and we therefore consider temporal variability to represent a credible and intriguing possibility.
The fact that both the GMOS and STIS datasets can be fit with physically plausible PETRA models, are repeatable (over ∼ day timescales), appear insensitive to limb darkening treatment and are a reasonable match to theoretical GCM predictions, means that we have no strong reason to discount the validity of either dataset and is therefore suggestive of the presence of time-variability. However, neither can we fully rule out the possibility of more mundane causes, namely the influence of stellar activity and instrumental systematics, given the difficulty in robustly accounting for them even with our best instrumentation (e.g. Gibson et al. 2017Gibson et al. , 2019Espinoza et al. 2019). We would therefore highly recommend further transit observations in the optical regime using appropriate timescales to specifically search for signs of such variable weather conditions.

CONCLUSION
We have presented ground-based Gemini/GMOS observations of the ultra-hot Jupiter WASP-121b covering two full transits and extracting a transmission spectrum over the wavelength range ≈ 500 -950 nm using the technique of differential spectrophotometry. We used a Gaussian process to simultaneously model the deterministic transit component and the instrumental systematics thereby avoiding the need to specify a specific functional form. We also introduced a new analysis technique for dealing with instrumental systematics in exoplanet transit light curves using Student's-T processes, which can be thought of as a generalisation of the more commonly employed Gaussian processes. We used this new method to verify the analysis we have described here, finding that the results are in excellent agreement. We used the derived systematics models to correct our spectroscopic light curves by the application of common-mode corrections to improve the precision of our transmission spectrum. In contrast to the STIS results, we find evidence for an increasing blueward slope and little evidence for absorption from either TiO or VO from an atmospheric retrieval using PETRA, in agreement with a number of recent studies performed at high-resolution. We suggest that this might point to other absorbers, such as some combination of atomic metals, being responsible for the optical excess and the vertical thermal inversion observed for WASP-121b. The scattering/haze slope favoured by our retrieval could be enhanced by both absorption from FeI and other atomic metals and from the atmospheric escape of material.
We considered a number of factors that might be responsible for the apparent discrepancy between the GMOS and STIS observations and conclude that, while it is still possible that the disagreement could be caused by either contamination from host star activity or some unaccounted-for systematic effects, the excellent repeatability of both individual datasets over day-long periods, and difference over longer month-long periods suggests that it is plausible that this discrepancy represents time-variable weather conditions in the atmosphere of WASP-121b, as predicted by theoretical models of ultra-hot Jupiters. We recommend further observations to both confirm this time-variability and help to further constrain the atmospheric properties, ultimately revealing the driver of the vertical thermal inversion.  Figure A2. Our original GMOS transmission spectrum of WASP-121b compared with the result of a re-analysis using identical bins and the same STAGGER 3D limb darkening coefficients as in Evans et al. (2018). The green points show our original transmission spectrum whilst the red points are the result of the re-analysis.