Theoretical foundations of emergent constraints: relationships between climate sensitivity and global temperature variability in conceptual models

There is as yet no theoretical framework to guide the search for emergent constraints. As a result, there are significant risks that indiscriminate data-mining of the multidimensional outputs from GCMs could lead to spurious correlations and less than robust constraints on future changes. To mitigate against this risk, Cox et al (hereafter CHW18) proposed a theory-motivated emergent constraint, using the one-box Hasselmann model to identify a linear relationship between ECS and a metric of global temperature variability involving both temperature standard deviation and autocorrelation ($\Psi$). A number of doubts have been raised about this approach, some concerning the theory and the application of the one-box model to understand relationships in complex GCMs which are known to have more than the single characteristic timescale. We illustrate theory driven testing of emergent constraints using this as an example, namely we demonstrate that the linear $\Psi$-ECS proportionality is not an artifact of the one-box model and rigorously features to a good approximation in more realistic, yet still analytically soluble conceptual models, namely the two-box and diffusion models. Each of the conceptual models predict different power spectra with only the diffusion model's pink spectrum being compatible with observations and the complex CMIP5 GCMs. We also show that the theoretically predicted $\Psi$-ECS relationship exists in the \texttt{piControl} as well as \texttt{historical} CMIP5 experiments and that the differing gradients of the proportionality are inversely related to the effective forcing in that experiment.

find a relationship via a scatter plot between an observable plotted on the x axis and the future projection plotted on the y axis, each point on the plot being one member of the model ensemble. The model ensemble derived relationship or emergent relationship, and the uncertainty in it, can then be determined from regression on the scatter plot. A measurement of the observable in the real world can be combined with the model-derived emergent relationship to produce an emergent constraint on the climate projection. Hall and Qu 3 published one of the first emergent constraints relating the strength of the snow albedo feedback in the seasonal cycle (the observable) to the strength of the snow albedo feedback in climate projections within the multimodel ensemble used in IPCC AR4 5 . Since then many others have been published including studies on sea-ice 6,7 , tropical precipitation extremes 8 , equilibrium climate sensitivity (ECS) 1, [9][10][11][12] , carbon loss from tropical land under warming 13 , zonal shift of Southern Hemisphere westerlies 14 , cloud feedbacks [15][16][17][18][19][20] , strengthening of the hydrological cycle 21 , the climate-carbon cycle feedback 22 and CO 2 fertilization effect 23 , future changes in ocean net primary production 24 , permafrost melt 25 , and changes in natural sources and sinks of CO 2 26 .
Some scepticism about emergent constraints is healthy, particularly when they are not founded on well understood physical processes. There are significant risks that indiscriminate data-mining of the multidimensional outputs from models could lead to spurious correlations 27 and less than robust constraints on future changes 28 . Care is also needed drawing statistical inferences from ensembles of small numbers of models. The problem is compounded if models within the ensemble share common components giving a smaller effective ensemble size [29][30][31] . Observations used to guide model development also may lead to dependencies 32 .
To minimise these risks, a theoretical framework for finding and evaluating emergent constraints is needed. The approach described here involves a form of hypothesis testing, in which an underlying simple, conceptual model is proposed, the model is used to predict an emergent relationship between an observable and an unknown future projection, and the predicted emergent relationship is tested against results from an ensemble of more complex models. Emergent relationships are usually assumed to be univariate and linear, but these are not necessary simplifications.
As an example, we illustrate this theory led approach using simple conceptual models of the global mean temperature as emergent constraints on ECS and test the theoretically predicted relations against observations and the CMIP5 models.
In CHW18, the theoretical linear relationship between a measure of the variability of annual mean global surface air temperature, the observable Ψ, and the equilibrium climate sensitivity (the future projection) was used to derive an emergent constraint on ECS. Colman and Power 33 also found a correlation between the tropical decadal temperature standard deviation and ECS in the CMIP5 models. A number of doubts have been raised about CHW18 34-37 , some concerning the theory and the application of the one-box model to understand relationships in complex GCMs which are known to have more than the single characteristic timescale 38,39 . In section II we investigate whether the relation in CHW18 derived for the one-box model still holds in more realistic yet still analytically soluble conceptual models, namely the often used two-box and diffusion models. It is known the two-box and diffusion models unlike the onebox model are able to reproduce the long term transient behaviour of the CMIP5 GCM abrupt4xCO2 and 1pcCO2 simulations 39,40 . Although we find the one-box linear proportionality between Ψ and ECS is generally no longer true in the two-box and diffusion models, we show the linear proportionality holds to a good approximation for both when the range of their parameters are applicable to the complex CMIP5 GCMs. This gives us increased confidence in the theoretical foundation of CHW18.
It is important to note that each of these conceptual models differ structurally, predict different temperature responses and will not be able to reproduce all of the features of the global mean temperature response. One could loosely view these conceptual models as zeroth order approximations and GCMs as higher order approximations of the real world. The often used quote 'all models are wrong but some are useful' is quite apt as a guiding principle for this manuscript. The usefulness of the model will depend on the question asked of it.
In section III conceptual model predictions are compared with the CMIP5 ensemble and observations, particularly the power spectra and autocorrelation functions. The pink power spectrum of global mean temperature in observations and CMIP5 models can only be reproduced by the diffusion model. However, if one is interested in the shorter time scale behaviour for use as an emergent constraint on ECS, we find the simplest conceptual one-box model will serve as a good approximation.
Also in section III Ψ vs ECS emergent relationships for both piControl and historical CMIP5 experiments are shown. Both have the theoretically predicted linear proportionality although they have differing gradients. This difference in gradient is theoretically expected to scale inversely with the effective forcing and this is also observed in the CMIP5 models.
The current paper can be seen as a companion paper to CHW18, as it examines and tests the appropriateness of the theory used to inform that study.

II. CONCEPTUAL MODELS RELATING GLOBAL TEMPERATURE VARIABILITY TO EQUILIBRIUM CLIMATE SENSITIVITY
Caldeira and Myhrvold 40 (hereafter CM13) fitted three different conceptual models, namely the one-box, two-box and diffusion models to the annual global mean air temperature time series of the CMIP5 abrupt4xCO2 experiments 4 . These fits were then tested against the 1pcCO2 CMIP5 experiments. CM13 showed that while the one-box model was a poor fit to either experiment on longer timescales both the two-box and diffusion models did good jobs. Here we use these conceptual models to analyse the annual global mean air temperature variability in the CMIP5 historical experiments with a view to obtaining ECS as a function of Ψ as found in CHW18 for the one-box model.
Each of the conceptual models have differing numbers of free parameters, the one-box and diffusion model have three and the two-box has five. None of these are assumed to be fixed. These parameters are essentially fitted to each of the CMIP5 models and the observations in the historical period via Ψ (introduced in equation 12). The models are introduced in order of complexity and completeness, the one-box being the simplest analytically while the diffusion model is harder to solve but reproduces more of the observed temperature response.
The historical time series can be approximated as the sum of the responses to the forcing resulting from changes in the atmospheric composition. These include greenhouse gases, tropospheric and stratospheric aerosols from large volcanic eruptions and solar variability. There is also a response to fast, internal variability which is parameterized here as the response to random, white noise forcing. We seek to isolate the response to the latter and relate it to equilibrium climate sensitivity (ECS). A relation between the system response to random fluctuations and its sensitivity, is essentially a fluctuation-dissipation theorem (FDT) 41,42 .
ECS is defined as the equilibrium temperature change due to the constant forcing Q 2×CO2 from the doubling of CO 2 , and λ is the climate feedback factor. Linearity of the conceptual models allows each temperature response T i (t) to each forcing Q i (t) to be added to give the total response i.e. if the total forcing is Q(t) = i Q i (t) then the total temperature response is just the sum of the temperature responses to each of the individual forcings T (t) = i T i (t) (principle of superposition). Linearity means that by suitable detrending the response from the trend in emissions can be removed from the total temperature response to leave the residual response, ∆T (t), to the random forcing. For this study, we assume this detrending can be carried out to a good approximation and work with just the residual temperature. For notational ease we also refer to ∆T as T i.e. ∆T := T .
Although the theory we derive here assumes external, random forcing, we have shown that the Ψ-ECS linear proportionality will theoretically become more tightly defined in the presence of common (non-random) forcing across a model ensemble 37 . The gradient of the relationship does however change, being roughly inversely proportional to the amplitude of the forcing (see section III).
The superposition principle implies the response to any forcing can be written as the convolution of the linear response function g(t) (the response to delta function forcing) with the forcing i.e. (2) Each model can therefore be characterized by g(t). We will be interested in their response in the stationary limit i.e. when t ≫ τ where τ is the longest timescale in the model. The residual response is found when Q(t) is a Gaussian random variable with zero mean and a standard deviation of σ Q , Q(t) = σ Q dW t , turning equation 2 into a stochastic integral where W s is a Wiener process.
In the following we choose to use the two observables variance σ 2 T and autocorrelation α T (t) as fitting parameters for the conceptual models as they can be easily estimated for a given time series, be it a CMIP5 model or observations. These can be computed for the residual temperature by using equation 3 and the relevant model g(t) via the autocovariance R(t) (4) Another useful quantity we use to compare the simple models to the CMIP5 models and observations is the power spectrum of T , |T (ω)| 2 , which can be found from the Fourier transform of the autocovariance A. Hasselmann one-box model The simplest, one-box (or one-timescale) model for the evolution of T (t) is In this model the climate system can be thought of as a single well-mixed box with effective heat capacity C forced by Q(t) and adjusting to this forcing with climate sensitivity λ proportional to the temperature anomaly. The single well-mixed box can be roughly thought of as representing the atmosphere, surface mixed ocean layer and the land. The linear response function for this model is where the timescale in the model τ H = C λ and Θ(t) is the Heaviside step function. When the forcing Q(t) is Gaussian white noise, equation 8 is known as the Hasselmann model 43 . Variance and autocorrelation for the one-box model can be computed from equations 3 and 4 to be These equations can be combined with equation 1 to give 1 where Ψ is defined as and α 1T = α T (1 year). It was equation 12, namely the linear proportionality between the observable Ψ, estimated from timeseries of T and the future projection ECS, that was used to guide the search for an emergent constraint in CHW18. The magnitude of proportionality between Ψ and ECS, the ratio of the effective forcing due to doubling CO 2 and the mean amplitude of the effective forcing in the experiment σ Q , √ 2 Q2×CO 2 σQ cannot be observed but is fortunately weakly correlated to ECS (r = −0.02) across the CMIP5 model ensemble 1 . By linearly regressing Ψ against ECS, the magnitude of proportionality is therefore determined by the model ensemble itself.
The power spectrum of the one-box model is This model predicts a red power spectrum temperature response, that is, the power scales inversely to the square of the forcing frequency ω.

B. Two-box model
The two-box model 39,44,45 consists of two well-mixed layers, one representing the upper mixed layer of the ocean plus the lower atmosphere, with effective heat capacity C and temperature T , and a second well-mixed box representing the deep ocean with heat capacity C 0 and temperature T 0 . Heat transport between the two boxes is proportional to the temperature difference between the two boxes with constant of proportionality γ. The equations describing the evolution of temperature are therefore This model has a two timescales, a fast τ f and slow one τ s . The linear response function is the sum of the fast and slow modes with amplitudes a f τ f and as τs , This model has been extensively used in previous climate applications and here we use the notation and expressions for the amplitudes and timescales in terms of the quantities in equation 15 as given in Geoffroy et al 39 . They also fitted this model to abrupt4xCO2 CMIP5 experiments for which they found two widely separated timescales, typical values being τ f ∼ 4 yrs and τ s ∼ 250 yrs while the dimensionless mode parameters, a f and a s , were roughly of equal size (a f ∼ 3/5 and a s ∼ 2/5). The autocovariance function for Gaussian white noise forcing can be found by using equations 3 and 4 and in contrast to the one-box model features two modes: giving This general result simplifies for typical fitted parameters to the CMIP5 models 39 as the variance and the autocorrelation are dominated by the fast mode. These can be approximated in the limit (τ s ≫ τ f , a s ∼ a f ) by: The approximate expressions are therefore very similar to the one-box model for the CMIP5 models. Combining these expressions with the equation for ECS gives so that the linear relationship between Ψ and ECS found in the one-box model also approximately holds for the twobox model. The constant of proportionality is however different, it has an extra factor in the denominator a f , which is roughly constant and is approximately a f ∼ λ/(λ + γ) over the CMIP5 model range of parameters. Relative standard deviation in a f is 13%. The reason for the approximate equivalence between the ECS relations in one-and two-box models is due to the wide separation in timescales between the two modes fitted to the CMIP5 models. As in the one-box case, the 'constant' of proportionality between ECS and Ψ, √ 2 Q2×CO 2 σQa f is weakly correlated to ECS (r = 0.03) across the CMIP5 models and one can linearly regress Ψ against ECS for a theoretical emergent relationship.
In contrast to the one-box, the two-box power spectrum is which, depending on the size of terms can give red and ω −4 scaling although when fitted to the CMIP5 models, the spectrum is approximately red.

C. Diffusion equation
The diffusion equation (or heat equation) model 38,40 consists of a continuous vertical layer, z ≥ 0, where radiative forcing at the surface (z = 0) causes heating which is transported down through the water column by diffusion (parameterized by diffusivity D), representing heat uptake by the deep ocean. A mixed-layer surface box has also been added in previous studies to add realism 46-48 although here we use just the diffusion equation for simplicity. The model is described by a partial differential equation: with flux boundary conditions where ρ and c p are the density and specific heat capacity of water respectively. The maximal depth of the ocean, z max , is taken to be infinite. The temperature is now a function of both depth and time, T (z, t) although our interest is only in the surface temperature T (0, t).
In contrast to the one-and two-box models, the ocean is modelled as a vertical continuum rather than a finite number of well-mixed boxes. As heat is diffused down the water column with time, the effective heat capacity increases as the heat sees more ocean. This model can also be thought of as an M -box model where M is very large and each well-mixed box is very thin resulting in a continuum of (M ) timescales. The diffusion model reduces to a one-box model when times of interest are larger than the time taken for heat to be well diffused throughout the water column i.e. when t > z 2 max /D. For ocean depths of z max = 4000 m and typical diffusivities of D = 5 × 10 −5 m 2 s −1 this happens when t > 10,000 years and so this limiting case is not met for the application here.
The linear response function for surface temperature T (0, t) can be found using Laplace transforms on equations 25 and 26 to be A timescale, τ D , can be identified in this model as τ D = ρ 2 c 2 p D λ 2 . τ D ∼ 25 years for the mean value of D found in CM13 40 . One needs to be aware of an unphysical infinity in g(0, t) at t = 0 because of the time dependence of the effective heat capacity. At t = 0 this results in zero effective heat capacity and therefore an infinite response. In reality energy is not absorbed in an infinitely thin surface layer and thus care needs to be taken when calculating at t = 0.
The power spectrum at the surface can be also be found using either Laplace or Fourier transforms. This is given by The diffusion model therefore predicts a pink spectrum i.e. power scales inversely proportional to ω −1 in contrast to the red spectra predicted by the one-and two-box models.
To obtain the autocovariance function at z = 0 we start with the power spectrum and Fourier transform it using equation 7: this rather long exact expression can be well approximated more compactly as Unfortunately R(t = 0) = σ 2 T is also infinite because of the unphysical zero effective heat capacity. σ 2 T can however be approximated by taking a very small but finite time, t 0 . Starting with equation 31 and Taylor expanding the exponential integral to zeroth order around t = 0 results in where c 0 = −2γ EM −log(πt 0 ), γ EM ≈ 0.577 is the Euler-Mascheroni constant and in the second line, the approximation log x = c 0 x 1 c 0 − c 0 has been used. This approximation gets better for larger c 0 (smaller t 0 ). So for small t 0 34) and the autocorrelation function is which is purely a function of τ D . Rearranging equation 34 and combining with the equation for ECS (eq 1) gives where Or in terms of observables where E −1 1 (x) is defined as the inverse of the exponential integral. The linear Ψ-ECS proportionality is not true for the diffusion model. However, comparing eq. 37 with similar for one-and two-box models (eq. 13), if is approximately true then the linear ECS-Ψ proportionality is also approximately true for the diffusion model. By plotting one against the other in figure 1 this is the case for the range of values of τ D applicable to CMIP5 models (τ D ∈ [10, 60] years in CM13). α 1T is calculated from the exact formula, equation 30.

III. COMPARISON WITH CMIP5 MODELS AND OBSERVATIONS
Theoretical autocorrelation functions and power spectra predicted by the conceptual models are shown in figure  2 for typical values found in fits to the CMIP5 models 39,40 . One-and two-box autocorrelation functions and power spectra are very similar for timescales less than 100 years. Power spectra in these models have the same |T (ω)| 2 ∝ ω −2 red power spectra. In contrast to the box models, the diffusion model has a faster drop off in autocorrelation but a slower approach to equilibrium and a power spectrum that predicts a |T (ω)| 2 ∝ ω −1 pink power spectrum.
For comparison with the conceptual plots, the CMIP5 historical runs (coloured lines) and the HadCRUT4 historical observational dataset 49 (thick black line) are shown in figure 3. The power spectra of the HadCRUT4 observations and CMIP5 models show approximately a |T (ω)| 2 ∝ ω −1 pink spectrum most closely resembled by the diffusion model. The dotted white line is shown as a guide to this proportionality. High sensitivity CMIP5 models also generally have higher autocorrelation. The HadCRUT4 autocorrelation is more representative of the low sensitivity models, consistent with the findings of CHW18.
We have used detrended CMIP5 historical simulations as a comparison to observations can also be made and an emergent constraint obtained. However, conceptual model theoretical relations have been derived assuming white noise external forcing as a parameterization of internal variability. The CMIP5 piControl experiments are the closest analogue to this simplification and one may wonder whether the same relations hold in these experiments as it is known the forced response may not always be the same as the response to internal variability 50,51 . Power spectra and autocorrelation functions for the piControl experiments are broadly the same as figure 3 (not shown). The linear Ψ-ECS emergent relationships are also similarly strong in both piControl and historical simulations having correlations of r = 0.68 and r = 0.77 respectively. The higher correlation in the historical experiment resulting in a reduced uncertainty emergent constraint is theoretically expected when there is common forcing across the model ensemble (see Cox et al 37 ). In this case the common forcing in the historical experiment is provided by the increasing concentrations of greenhouse gases, aerosols and volcanic eruptions.
There are differences in the emergent relationships however. In figure 4 (a) the emergent relationships for the piControl and historical have different gradients. This is due to increased effective forcing in the historical simulations from residual volcanic, aerosol and greenhouse gas forcing remaining after the detrending procedure.   table  I). The emergent relationship is calculated between 1880-2015 for the historical experiment and for the first 135 years of piControl using the same methodology of CHW18. The gradient of the emergent relationship (dashed line) is theoretically predicted to be smaller with increased forcing in the experiment. This is why the historical run with its volcanic, greenhouse gas, aerosol and internal forcing has a shallower gradient than the control run. (b) is the same plot with Ψ renormalized by the estimated forcing from σN , the standard deviation of the top-of-atmosphere radiative forcing. Emergent relationships then have a very similar gradient illustrating the inverse proportionality of the gradient to the forcing in (a). the CMIP5 models from the net top-of-atmosphere radiative flux N where N ≈ Q − λT with standard deviation σ N .

IV. DISCUSSION AND CONCLUSIONS
All three conceptual models have both physical similarities and deficiencies relative to the CMIP5 models and the real Earth system. The one-box model only really has any physical justification when the timescales of interest are dominated by the well-mixed atmosphere and ocean surface layer. This has led some to question the use of the onebox model by CHW18 to motivate their emergent constraint between equilibrium climate sensitivity (ECS) and Ψ, a statistic dominated by the fast timescale processes e.g. Rypdal et al 36 . However, in this paper we have shown that a near-linear relationship is to be expected between ECS and Ψ for more realistic conceptual models. For the oneand two-box models we were able to find analytical relations to show this. Semi-analytical relations for the diffusion model also show a similar near linear relationship.
Even though a linear proportionality between Ψ and ECS is expected in the conceptual models for regions of parameter space applicable to CMIP5 models, each of the conceptual models predicts different temperature responses. The one-box model cannot reproduce the long timescale behaviour of the two-box or diffusion model and neither the one-or two-box models can mimic the observed and CMIP5 power spectra. Of the three, the diffusion model reproduces the power spectra of the CMIP5 models and the observations most closely although it is more difficult to work with and has some deficiencies as an analogue to the real climate system. Combining a well-mixed atmosphere-surface ocean box with a diffusive continuous deep ocean [46][47][48] , although adding another layer of complexity and making the model less analytically amenable, would add physical realism. We suspect this would produce a similar linear relation to the one-and two-box models as well as mimicking the CMIP5 and HadCRUT4 power spectra due to the timescale separation between surface mixed and deep layers.
In conceptual models, we therefore expect to find emergent relationships between ECS and short-term variability (e.g. as measured by Ψ). However, the underlying models considered here remain deliberately very simple compared to the GCMs we are using to define emergent constraints. It is therefore vital that we continue to check that our conceptual models provide useful insights into the spread of projections from GCMs. We see this as a form of hypothesis testing, in which a conceptual model is proposed, an emergent relationship between variability and sensitivity is predicted based-on that conceptual model, and then that predicted emergent relationship is checked against the ensemble of full-form GCMs. This approach requires that the search for emergent constraints becomes more theory-led than it has been to date, but would also guard against spurious relationships that could easily arise from blind data-mining of the many diagnostics available from modern GCMs. Most importantly, in our view, such theory-led hypothesis testing is much more likely to improve understanding of the climate system than purelystatistically-derived emergent constraints.