Features in the Broadband Eclipse Spectra of Exoplanets: Signal or Noise?

A planet's emission spectrum contains information about atmospheric composition and structure. We compare the Bayesian Information Criterion (BIC) of blackbody fits and idealized spectral retrieval fits for the 44 planets with published eclipse measurements in multiple thermal wavebands, mostly obtained with the Spitzer Space Telescope. The evidence for spectral features depends on eclipse depth uncertainties. Spitzer has proven capable of eclipse precisions better than 1E-4 when multiple eclipses are analyzed simultaneously, but this feat has only been performed four times. It is harder to self-calibrate photometry when a single occultation is reduced and analyzed in isolation; we find that such measurements have not passed the test of repeatability. Single-eclipse measurements either have an uncertainty floor of 5E-4, or their uncertainties have been underestimated by a factor of 3. If one adopts these empirical uncertainties for single-eclipse measurements, then the evidence for molecular features all but disappears: blackbodies have better BIC than spectral retrieval for all planets, save HD 189733b, and the few planets poorly fit by blackbodies are also poorly fit by self-consistent radiative transfer models. This suggests that the features in extant broadband emission spectra are due to astrophysical and instrumental noise rather than molecular bands. Claims of stratospheric inversions, disequilibrium chemistry, and high C/O ratios based solely on photometry are premature. We recommend that observers be cautious of error estimates from self-calibration of small data sets, and that modelers compare the evidence for spectral models to that of simpler models such as blackbodies.


INTRODUCTION
An exoplanet on an edge-on orbit periodically passes behind its host star. The decrement in thermal flux that occurs during such an eclipse is a measure of the dayside brightness temperature of the planet at that wavelength. The brightness temperature of a planet varies with wavelength because of the wavelength-dependent opacity and vertical temperature profile of the atmosphere. If different wavelengths probe the same atmospheric layer (e.g., a cloud deck) then the planet will appear to have a blackbody spectrum. In the absence of clouds, a planet may still have a blackbody spectrum if the atmospheric layers probed are isothermal.
At sufficiently high spectral resolution, we do not expect any planet to emit like a blackbody, because molecular absorption lines dictate that the atmospheric opacity varies dramatically between neighboring wavelengths and temperature cannot be isothermal throughout the atmosphere. So far, however, the vast majority of eclipse measurements have been broadband photometry.
In principle, the detection of molecular bands in the infrared emission spectrum of a planet enables the retrieval of greenhouse gas abundances and the vertical temperature profile of the planet. A typical retrieval model uses 10 parameters to describe the atmospheric composition and vertical temperature profile (Madhusudhan & Seager 2009), while a typical hot Jupiter has only been observed in 2-4 thermal broadbands. The retrieval problem is thus under-constrained.
A widely noted consequence of the parameter-data mismatch is that exact atmospheric properties cannot be uniquely determined. This has not stopped researchers, however, from placing interesting limits on certain parameters (principally C/O ratios and the presence of temperature inversions; Madhusudhan et al. 2011), and from finding trends between these atmospheric properties and those of the host star (Knutson et al. 2010).
The more troubling aspects of under-constrained retrieval are that (1) there is no way to reject erroneous measurements, and (2) the estimated uncertainties on eclipse depths directly affect the uncertainties on atmospheric parameters. This is in stark contrast to typical over-constrained problems such as fitting an occultation model to time-series data, for which it is normal to σ-clip the data, and for which the uncertainties on individual data are largely irrelevant (indeed photometric uncertainties are typically estimated in the process of fitting a model to the data, rather than taken on faith).
Since we are merely concerned with the emergent spectra of the bodies, it is immaterial if a planet is on an eccentric orbit (GJ 436b, HAT-P-2b, XO-3b) or even if it is a highly-irradiated brown dwarf (KELT-1b). The majority of these observations (in particular, all those at 3.6, 4.5, 5.8, 8.0, and 24 µm) were made with the Spitzer Space Telescope (Werner et al. 2004).
We fit a blackbody spectrum to the eclipse depths for each planet using the published transit depth and stellar effective temperature. We assume symmetric, Gaussian error bars for the eclipse depths; in the few cases where asymmetric error bars were published, we take the mean of the upper and lower error bars.
The transit depth and stellar effective temperature have associated uncertainties that are important if one is trying to estimate the planet's bolometric flux, but these errors tend to have a gray impact on the planet's spectrum and hence we optimistically neglect them for the current analysis.
In the interest of simplicity, we also ignore the detector spectral response functions and instead compute the Planck function at the central wavelength of each photometric observation. Moreover, by using the stellar effective temperature rather than a detailed stellar model, we are treating the star as a blackbody. For broadband measurements in the infrared these assumptions are reasonable.
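The blackbody fit described above can be sketched in a few lines. This is a minimal illustration under the stated simplifications (Planck function evaluated at the central wavelength, star treated as a blackbody), not the code used for the analysis. The function names, the brute-force temperature grid, and its bounds are assumptions made for the example.

```python
import math

H = 6.62607015e-34    # Planck constant [J s]
C = 2.99792458e8      # speed of light [m / s]
K_B = 1.380649e-23    # Boltzmann constant [J / K]


def planck(wavelength_m, temp_k):
    """Planck function B_lambda(T) in SI units."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return 2.0 * H * C**2 / wavelength_m**5 / math.expm1(x)


def eclipse_depth(wavelength_m, t_planet, t_star, transit_depth):
    """Blackbody eclipse depth: transit depth (Rp/R*)^2 times B(Tp)/B(T*)."""
    ratio = planck(wavelength_m, t_planet) / planck(wavelength_m, t_star)
    return transit_depth * ratio


def chi_squared(t_planet, wavelengths_m, depths, sigmas, t_star, transit_depth):
    """Chi-squared of a single-temperature blackbody against eclipse depths."""
    return sum(
        ((d - eclipse_depth(w, t_planet, t_star, transit_depth)) / s) ** 2
        for w, d, s in zip(wavelengths_m, depths, sigmas)
    )


def fit_blackbody(wavelengths_m, depths, sigmas, t_star, transit_depth,
                  t_min=300.0, t_max=5000.0, t_step=1.0):
    """Grid search for the best-fit temperature (assumed bounds and step)."""
    best_t, best_chi2 = None, math.inf
    n_steps = int(round((t_max - t_min) / t_step))
    for i in range(n_steps + 1):
        t = t_min + i * t_step
        c = chi_squared(t, wavelengths_m, depths, sigmas, t_star, transit_depth)
        if c < best_chi2:
            best_t, best_chi2 = t, c
    return best_t, best_chi2
```

The only free parameter is the planet temperature, so a one-dimensional grid search is enough here; a proper optimizer would work equally well.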

THE SIGNIFICANCE OF SPECTRAL FEATURES
In order to quantify the significance of spectral features, we turn to the Bayesian Information Criterion (BIC; Schwarz 1978). The BIC is a simple way to compare the goodness-of-fit of models with different numbers of parameters: $\mathrm{BIC} = \chi^2 + k \ln N$, where $k$ is the number of free parameters and $N$ is the number of data. It is similar in spirit to the reduced $\chi^2$ in that it penalizes models with many parameters, but it remains well-defined when there are fewer data than there are parameters (as is the case for hot Jupiter photometric eclipse retrieval). As a baseline, we compute the BIC for each planet in our sample by fitting a blackbody to each planet and adopting the quoted uncertainties (the only unknown is the blackbody temperature, so $k = 1$).
The blackbody BIC values are plotted in Figure 1 against the number of wavelengths available for each planet. The black lines denote the quality of the blackbody fit: the solid line is a good fit ($\chi^2/N = 1$), while the dashed line is a perfect fit ($\chi^2 = 0$). Planets that lie above the solid black line are poorly fit by a blackbody. The dashed red line is $\mathrm{BIC} = 10 \ln N$, where $k = 10$ is representative of an idealized spectral retrieval model that can perfectly fit the observations (i.e., $\chi^2 = 0$). The solid red line shows the same but assuming a more realistic goodness-of-fit: $\chi^2/N = 1$.
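As a concrete illustration of the definition, the BIC can be evaluated directly from the chi-squared, the number of free parameters, and the number of data. The function name and the example value of four wavebands are assumptions made for this sketch.

```python
import math


def bic(chi2, n_params, n_data):
    """Bayesian Information Criterion: chi^2 + k ln N."""
    return chi2 + n_params * math.log(n_data)


# Illustrative case: a planet observed in N = 4 wavebands (assumed value).
n_data = 4
bic_blackbody_good = bic(chi2=float(n_data), n_params=1, n_data=n_data)  # chi^2/N = 1
bic_retrieval_perfect = bic(chi2=0.0, n_params=10, n_data=n_data)        # chi^2 = 0
```

With so few data, the parameter penalty of a 10-parameter model dominates, so even a perfect retrieval fit carries a larger BIC than a merely good blackbody fit.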
Planets that lie above the red lines show Bayesian evidence of spectral features according to the following rule of thumb: $\Delta\mathrm{BIC} < 2$ is not worth more than a bare mention, $2 < \Delta\mathrm{BIC} < 6$ is positive evidence, $6 < \Delta\mathrm{BIC} < 10$ is strong evidence, and $\Delta\mathrm{BIC} > 10$ is very strong (Kass & Raftery 1995).
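The comparison between a blackbody and an idealized retrieval can be sketched as follows. This is an illustration only: the function names are hypothetical, the treatment of values falling exactly on a category boundary is an assumption, and the defaults (a 10-parameter retrieval with $\chi^2 = 0$) mirror the idealized case described above.

```python
import math


def bic(chi2, n_params, n_data):
    """Bayesian Information Criterion: chi^2 + k ln N."""
    return chi2 + n_params * math.log(n_data)


def evidence_label(delta_bic):
    """Kass & Raftery (1995) rule of thumb; boundary handling is assumed."""
    if delta_bic < 2:
        return "not worth more than a bare mention"
    if delta_bic < 6:
        return "positive"
    if delta_bic <= 10:
        return "strong"
    return "very strong"


def evidence_for_retrieval(chi2_blackbody, n_data,
                           chi2_retrieval=0.0, k_retrieval=10):
    """Delta BIC = BIC(blackbody, k=1) - BIC(retrieval), with its label."""
    delta = bic(chi2_blackbody, 1, n_data) - bic(chi2_retrieval, k_retrieval, n_data)
    return delta, evidence_label(delta)
```

For example, `evidence_for_retrieval(30.0, 4)` describes a planet with four wavebands whose blackbody fit has $\chi^2 = 30$; a positive ΔBIC means the blackbody has the larger BIC and the retrieval is favored.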
If published eclipse values and uncertainties are taken at face value, then a handful of hot Jupiters have broadband emission spectra that demand a full spectral retrieval: CoRoT-2b, GJ 436b, HAT-P-8b, HD 189733b, HD 209458b, WASP-1b, WASP-8b, and WASP-12b. The bulk of hot Jupiters, however, lie below the red lines, implying that the data do not warrant spectral retrieval.

EMPIRICAL ESTIMATE OF ECLIPSE UNCERTAINTIES
For a handful of the best and brightest targets, multiple Spitzer observations have been obtained with the same instrument. These observations are listed in Table 1. For any reshoot or reanalysis, we list the discrepancy between the new measurement and the original published value.
The published uncertainties are too low, but by how much? If we demand that 68% of repeated measurements fall within the 1σ error bars, then we obtain an empirical eclipse uncertainty of $4.7 \times 10^{-4}$. This value is greater than every single quoted uncertainty in the table (let alone the Poisson noise limits), so we conclude that it is a systematic uncertainty. Given the small number statistics, we round the above value to a single significant figure, and adopt it as our estimate of systematic error in broadband photometric eclipse measurements with existing facilities: $\sigma_{\rm syst} = 5 \times 10^{-4}$.
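One plausible reading of the 68% criterion is to take the smallest uncertainty that encloses 68% of the absolute discrepancies between repeated measurements. The sketch below implements that reading; it is an assumption about the procedure, not a description of the exact calculation, and the function name is hypothetical.

```python
import math


def empirical_sigma(discrepancies, coverage=0.68):
    """Smallest sigma such that at least `coverage` of |discrepancies| <= sigma."""
    ordered = sorted(abs(d) for d in discrepancies)
    if not ordered:
        raise ValueError("need at least one discrepancy")
    # Small tolerance guards against floating-point round-up in coverage * n.
    n_inside = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[n_inside - 1]
```

Applied to the discrepancies of the repeated measurements, this returns a single number that can be compared with the quoted uncertainties; with only a handful of repeats the result is coarse, which is why rounding to one significant figure is appropriate.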

SYSTEMATIC ERRORS
In Figure 2 we revisit the distribution of blackbody BIC vs. $N_\lambda$ in light of systematic errors. We add the systematic uncertainty of $\sigma_{\rm syst} = 5 \times 10^{-4}$ in quadrature to the quoted uncertainties to obtain realistic error bars and recompute the BIC for each planet.
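Adding the systematic term in quadrature and recomputing the goodness-of-fit can be sketched as below. The function names are hypothetical, and the residuals are assumed to be the differences between measured and model eclipse depths.

```python
import math

SIGMA_SYST = 5e-4  # empirical systematic uncertainty adopted in the text


def inflate(sigma_quoted, sigma_syst=SIGMA_SYST):
    """Combine quoted and systematic uncertainties in quadrature."""
    return math.hypot(sigma_quoted, sigma_syst)


def chi_squared_with_systematics(residuals, sigmas_quoted, sigma_syst=SIGMA_SYST):
    """Chi-squared of (data - model) residuals using the inflated error bars."""
    return sum(
        (r / inflate(s, sigma_syst)) ** 2
        for r, s in zip(residuals, sigmas_quoted)
    )
```

Because the systematic term exceeds the quoted uncertainties, the inflated error bars are dominated by it, and the chi-squared of any model drops accordingly.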
All planets lie below the red lines, meaning that there are no planets for which a spectral retrieval is warranted. The two planets that are worst fit by a blackbody are CoRoT-2b and WASP-1b. It is worth noting that the emergent spectrum of CoRoT-2b is also poorly fit by 1D atmospheric models (

DISCUSSION
It is worth summarizing why observers tend to underestimate eclipse depth uncertainties. First and foremost, the instruments being used are being pushed orders of magnitude beyond their design specifications. This is easy to see in the raw photometry, which suffers from detector systematics that are comparable to, and sometimes dwarf, the astrophysical signal of interest. It is routine to remove these detector systematics so effectively that the quoted uncertainties are within 10-20% of Poisson (photon-counting) noise.
Fig. 1. The Bayesian Information Criterion (BIC) of a blackbody fit is plotted against the number of wavebands for which photometric eclipse measurements have been obtained; each blue dot represents one of the 44 transiting planets in our sample. The left panel uses the published uncertainties for each planet, while the right panel adds an empirically-determined systematic error of $\sigma_{\rm syst} = 5 \times 10^{-4}$ in quadrature to each eclipse measurement. The red lines show the BIC one would obtain by fitting the data with a 10-parameter spectral retrieval model (the solid line is a good fit: $\chi^2/N = 1$; the dashed line is a perfect fit: $\chi^2 = 0$). The black lines denote the quality of the blackbody fit: the solid line is a good fit ($\chi^2/N = 1$), while the dashed line is a perfect fit ($\chi^2 = 0$). Planets that lie above the solid black line are poorly fit by a blackbody. Planets that lie above the red lines warrant a full spectral retrieval.
Note. (a) Magnitude of discrepancy between an observation and the original published value. (b) The "null hypothesis" fit for which ellipsoidal variations are fixed to zero. This is approximately equivalent to analyzing the eclipse in isolation, as was done in Campo et al. (2012).
It is undeniable that careful calibration can help make up for an imperfect instrument, but if it were really possible to extract Poisson-limited performance out of a systematics-riddled instrument, then there would be no advantage in building better instruments: astronomers would content themselves with large light buckets. The real uncertainties must lie in between the detector systematic noise ($\sim 10^{-2}$ for IRAC) and the Poisson limit ($5 \times 10^{-5}$ for the brightest hot Jupiter systems).
With more repeat observations it might be possible to make separate systematic error estimates for each telescope+instrument combination, but there is no evidence that these errors are grossly different for the various instruments on Spitzer. This is unsurprising since the detector effects for IRAC and MIPS are all at about the 1% level, and none of these systematics are entirely understood.
The secondary sources of systematic errors are correlated parameters and conditional solutions. Correlated parameters can be accounted for by running a Markov Chain Monte Carlo (MCMC) and marginalizing over nuisance parameters; this has become standard in the field. "Conditional solutions" refers to the fact that the best-fit parameters are obtained given many assumptions that are not varied as part of the MCMC: aperture, sky pixels, σ-clipping scheme, detector and astrophysics parametrization, lack of binary companion or accretion disk, etc. The choices made by researchers are defensible, but adopting a reasonable meta-parameter is similar to slicing through correlated parameters: it invariably leads to under-estimated error bars.
When one is pushing instruments two orders of magnitude beyond their specifications, the most robust measurements are those repeatedly made, and analyzed by multiple distinct groups. It is possible to push well beyond the systematic noise floor and eventually to establish a firm value with robust error bars.
It should be noted that there is still science to be done using broadband eclipse photometry, even in the presence of systematic uncertainties. The orbital phase of eclipse places useful constraints on orbital eccentricity and hence planet formation/migration scenarios (Charbonneau et al. 2005). Moreover, dayside bolometric flux can be readily estimated, since integrating noise is less damning than differentiating it. In fact, the systematic errors involved in bolometric flux estimates are small for hot Jupiters precisely because these planets have approximately blackbody spectra (Cowan & Agol 2011). The dayside effective temperature of a planet, in turn, allows us to infer Bond albedo and/or heat transport.
In light of this study, spectral resolution offers two significant advantages over photometry: (1) a high-resolution emission spectrum is more likely to deviate significantly from a blackbody, and (2) the retrieval problem is over-constrained. This bodes well for current and future efforts to perform bona fide emission spectroscopy.

CONCLUSIONS
The retrieval of parameters from disk-integrated broadband photometry hinges on planets not looking like blackbodies. If published uncertainties are taken at face value, then many of the brightest hot Jupiters have distinctly non-blackbody broadband spectra.
In order to perform under-constrained spectral retrieval, however, it is critical to have believable error bars. One can empirically check the accuracy of published error estimates by considering the measurements that have been replicated. We performed this comparison and found that the empirical 1σ systematic uncertainty in broadband eclipse depths is approximately $5 \times 10^{-4}$. If one combines this uncertainty with published eclipse measurements, then all hot Jupiters are featureless, including the brightest targets. We conclude that statements about atmospheric composition based solely on broadband emission measurements are premature. Temperature inversions and odd compositions were inferred for short-period planets based on photometric eclipse spectra. Our results call these phenomena into question. Undoubtedly, many planets have stratospheric inversions and interesting chemistry, but there is no robust evidence for this in the photometry of short-period exoplanets.
NBC is indebted to the participants of ExoPAG-9 for discussions of instrument systematics, as well as to M.R. Line and N. Madhusudhan for discussions of spectral retrieval. This research has made use of the Exoplanet Orbit Database and the Exoplanet Data Explorer at exoplanets.org.