## Abstract

Dynamical systems methodology is a mature complementary approach to forward simulation which can be used to investigate many aspects of climate dynamics. With this paper, a review is given on the methods to analyse deterministic and stochastic climate models and show that these are not restricted to low-dimensional toy models, but that they can be applied to models formulated by stochastic partial differential equations. We sketch the numerical implementation of these methods and illustrate these by showing results for two canonical problems in climate dynamics.

## 1. Introduction

Climate research starts with ‘reality’ which we humans experience incompletely and inaccurately (Peixoto and Oort, 1992). With the instrumental record, a very rough view of the climate system has been obtained over the last 150 years, with up to the 1950s, mainly through sea surface temperature and sea surface pressure observations. Global coverage of several surface quantities, for example, of sea surface height is monitored since the satellite era. Proxy data such as oxygen isotope records probe a very incomplete development of the climate system in the (geological) past (Zachos et al., 2001). Based on this data and our imagination, we want to build a picture how particular phenomena arise through the interaction of more elementary processes, i.e. we want an understanding (Ghil, 2001). For some phenomena, such as the flows in the ocean and atmosphere, we can make use of more general formulated ‘laws of nature’, such as mass, momentum and energy balances. For other phenomena, such as phytoplankton blooms and in general the behaviour of the marine biosphere, it is difficult to derive precise laws from first principles.

Combining data and imagination leads to models of phenomena. Actually, a whole hierarchy of models usually exists, depending on the detail of the phenomenon, which is aimed to be described (Dijkstra, 2013). Models on the macroscopic level of the hierarchy only describe very gross characteristics of the behaviour, for example only a particular timescale of variability. Other models are aimed to describe the spatial structure of the phenomenon and its context, e.g. its coupling to other phenomena. Which model to develop or pick from the zoo of models available at each (vague) level of the hierarchy depends on the particular question asked. Development of new models can be an art and may depend on the imagination of the scientist. What is important is that the model is ‘fit for purpose’, i.e. it captures all processes which are relevant for answering the question posed with the ‘right’ of amount of detail (Katzav et al., 2012).

Eventually, every climate model (whether discrete or continuous) can be formulated as a dynamical system. This dynamical system may be autonomous, non-autonomous or stochastic. Each such system has parameters and may contain delay terms. After a model has been specified, efforts are directed to determine the structure of its solutions for different initial conditions and varying parameters. In climate models, non-autonomous effects easily arise through diurnal, seasonal and/or through Milankovitch astronomical forcing, with known frequencies (Berger and Loutre, 1991). Based on the behaviour of linear systems, one might anticipate that solutions of the model will display these ‘forced’ timescales of variability. However, in many models, the solution behaviour is contrary to what one expects. Variability can appear with frequencies very different from those in the forcing. A prominent example is the El Niño/Southern Oscillation interannual variability (Philander, 1990) in the equatorial Pacific, which arises due to coupled ocean–atmosphere processes. Also, transitions in observables can occur on much shorter timescales than those of the forcing, for example the changes in the Atlantic Ocean circulation (Rahmstorf, 2000). Non-linear elementary processes, for example advection and radiation, are responsible for this ‘unexpected’ behaviour.

One may find changes in the behaviour of a model by changing its parameters. Interesting behaviour appears when qualitative changes occur once a parameter is changed only gradually. For example, for $\mu =\mu 0$, all initial conditions may converge in time to a stationary solution, whereas for the $\mu =\mu 1$, suddenly all initial conditions converge to a periodic orbit. Critical transitions, also called ‘tipping points’ (Gladwell, 2000), appear when sufficiently large or drastic changes in the dynamics are observed as parameters are changed (Kuehn, 2013). Climate phenomena in which such tipping behaviour can occur, such as the decay of the Greenland Ice Sheet, are referred to as ‘tipping elements’ (Lenton et al., 2008).

For autonomous finite-dimensional dynamical systems—where the dimension or the number d.f. refers to the number of dependent variables in the state vector—there is a beautiful mathematical theory of transition behaviour, i.e. bifurcation theory (Guckenheimer and Holmes, 1990). The concept of a bifurcation of a fixed point (or steady state solution of the dynamical system) is directly defined through the behaviour of eigenvalues of the Jacobian matrix. The theory of normal forms provides a classification of types of bifurcations which is sufficiently detailed for most practical applications. For non-autonomous finite dimensional systems, bifurcation theory is much less advanced and is still in development (Ghil et al., 2008). An exception is the theory of fast–slow systems, where a decoupling can be made between the autonomous behaviour and the slow transient changes in the system (Berglund and Gentz, 2006; Kuehn, 2011). For random dynamical systems, bifurcation theory development is a highly active area of research and currently a complete normal form theory is not available (Arnold, 1998).

Bifurcation theory is widely applied to study the behaviour of models with a relatively few degrees of freedom (d.f.) and is considered as a useful approach to analyse low-dimensional conceptual models in the climate community. However, for answering the most relevant scientific questions posed on climate variability and climate change, only high-dimensional models are considered fit for purpose. The main aim of this review paper is to demonstrate that a stochastic dynamical systems approach can also be applied to high-dimensional climate models. The setup of the paper is to first introduce two relevant climate variability phenomena (section 2), and then show the methods of the numerical approach to determine the behaviour in high-dimensional models of these phenomena (section 3). These methods will then be applied to the phenomena in the sections 4 and 5 and a summary and discussion concludes the paper (section 6).

## 2. Two canonical problems

The ocean circulation plays an important role in the climate system, in particular regarding climate variability on interannual-to-millennial timescales. Ocean flows are forced by the surface wind stress and affected by density differences, the latter mainly caused by the surface buoyancy flux. The wind-stress forcing leads to strong western boundary currents in the ocean, such as the Gulf Stream in the North Atlantic and the Kuroshio in the North Pacific. The effects of density lead to a meridional overturning circulation (MOC) which is mainly responsible for the meridional heat transport in the oceans. The two problems here deal with variations in the path of the Kuroshio and transitions in the Atlantic MOC (AMOC).

### 2.1. Path transitions of the Kuroshio

The Kuroshio, the western boundary current in the North Pacific, flows south of the islands of Japan. After separating from the coast, the Kuroshio flows almost directly eastwards into the open ocean and is known as the Kuroshio Extension. The Kuroshio Current system as a whole has been observed to have multiple paths (Taft, 1972). The path taken by the Kuroshio south of Japan is divided into two categories, the Large Meander (LM, or meandering path), where the current meanders offshore and then returns to the coast before its separation point (Fig. 1a), and the Non-Large Meander (NLM, or alongshore path), where the current stays close to the coast (Fig. 1b) until its separation point (Kawabe, 1995). Every few years there are transitions between the LM and NLM Kuroshio path states and several mechanisms have been proposed. In Qiu and Miao (2000), it has been proposed that a self-sustained internal variability mechanism is responsible, whereas in Schmeits and Dijkstra (2001), it has been suggested that they are caused by switches between multiple steady Kuroshio states.

The Kuroshio Extension also displays interannual-to-decadal variability between so-called ‘elongated’ and ‘contracted’ states. In the elongated state, the current has larger eastward surface transport, a more northerly path and a more intense southern recirculation gyre. The contracted state has smaller eastward surface transport, a more southerly path and a weaker southern recirculation gyre (Qiu and Chen, 2005). The eddy kinetic energy of the contracted state is also generally higher than that of the elongated state. By analysing altimetry data over the period 1993–2009, Qiu and Chen (2010) clearly demonstrate that the transitions between the two Kuroshio Extension states coincide with the arrival of sea surface height anomalies associated with long baroclinic Rossby waves. These waves have been generated by wind variations such as these related to the Pacific Decadal Oscillation (PDO) (Mantua et al., 1997). Using a 1950–2003 hind cast simulation with a high-resolution ocean model, Taguchi et al. (2007) showed that the broad scale response in the Kuroshio Extension could indeed be attributed to the Rossby wave mechanism. However, also strong internal variability is found in the same high-resolution simulation in the absence of any PDO wind variations.

Given the fact that the low-frequency changes of the Kuroshio Current play an important role in air–sea interaction and fisheries, it is important to determine the predictability of these changes and improve model prediction skill (Nonaka et al., 2012). A sound theoretical framework, where the role of the forced Rossby waves, the internal variability, the mesoscale eddy field and the interaction with the bathymetry is clarified, is highly desired. This theoretical framework should not only clarify what causes the variability in the Kuroshio Extension but also explain the processes causing the transitions between the NLM and LM states of the Kuroshio.

### 2.2. Transitions of the AMOC

In the North Atlantic, the Gulf Stream transports relatively warm and saline waters northwards. Part of the heat is taken up by the atmosphere, making the water denser. In certain areas (e.g. the Labrador Sea), when there is strong cooling in winter, the water column becomes unstably stratified resulting in strong convection. The interaction of this convection with boundary currents (Spall, 2003) eventually leads to the formation of deepwater, which overflows the various ridges that are present in the topography and enters the Atlantic basin (Fig. 2). This deepwater is transported southwards at a depth of about 2 km, where it enters the Southern Ocean. Through upwelling, the water is slowly brought back to the surface and eventually transported back in the upper ocean to the sinking areas in the North Atlantic. In the Southern Ocean, also bottom water is formed which has a higher density than that from the northern North Atlantic and therefore appears in the abyssal Atlantic and Pacific; in the North Pacific, no deep water is formed (Kuhlbrodt et al., 2007).

The AMOC is the zonally integrated volume transport at each latitude and depth, and its strength is usually taken as the maximum of the meridional overturning stream function. This transport is mainly responsible for the meridional heat transport in the Atlantic. There are no observations available to reconstruct the pattern of the AMOC but its strength at 26°N is now routinely monitored (Cunningham et al., 2007). The currently available time series of the AMOC strength indicates a mean of about 19 Sv (1 Sv = 10^{6} m^{3} s^{−1}), a SD of 5 Sv and a decreasing trend of about 1.6 Sv over the last decade (Smeed et al., 2014). At 26°N, the heat transport associated with the AMOC is estimated (Johns et al., 2011) to be 1.2 PW (1 PW = 10^{15} W). This heat is transferred to higher latitudes leading to a relatively mild climate over Western Europe, compared to similar latitudes on the eastern Pacific coast.

The AMOC is sensitive to freshwater anomalies (Stommel, 1961). Freshening of the surface waters in the Nordic and Labrador Seas diminishes the production of deepwater that feeds the deep southward branch of the AMOC. The weakening of the MOC leads to reduced northward salt transport freshening the northern North Atlantic and amplifying the original freshwater perturbation. An AMOC collapse, where a transition to a different equilibrium (weak AMOC) state occurs within a few decades, may occur due to the existence of a saddle-node bifurcation associated with this salt-advection feedback. If the AMOC is in a multiple equilibrium regime, it may undergo transitions due to the impact of noise (Cessi, 1994). The stochastic nature of the forcing (wind-stress and/or buoyancy flux) may even lead to transitions before the actual saddle-node bifurcation has been reached (Ditlevsen and Johnsen, 2010). Hence, it is of central importance to determine what perturbations can cause a transition to the weak AMOC state, in particular, which spatial and temporal correlations in the noise are most effective.

## 3. Numerical stochastic dynamical systems approach

Most climate models are formulated as a set of partial differential equations (which may be deterministic or stochastic) with boundary and initial conditions. To determine solutions of these equations, they are discretized, e.g., with a finite difference, finite element or spectral method. After discretization, a (large) set of ordinary differential equations with algebraic constraints will result which can be written as

where $x\u2208\mathbb{R}d$ is the state vector and $p\u2208\mathbb{R}p$ the parameter vector. In addition, *L* represents the discretized linear part of the equations, *N* its non-linear part, *F* represents the forcing and *M* is in most models a singular diagonal matrix of which every zero on the diagonal is associated with an algebraic constraint (e.g. an incompressibility condition).

To answer questions on flow transitions (such as discussed in section 2 for the Kuroshio and the AMOC) with these models, we follow the systematic approach provided by dynamical systems theory. The first step is to determine the fixed points of the model versus specific control parameters and analyse their linear stability. Methods to perform this numerically, and to detect simple bifurcation points (i.e. pitchfork, transcritical, saddle-node and Hopf), are described in section 3.1. To understand the physics of the occurrence of these bifurcation points, it is helpful to have locally reduced models (with fewer d.f.) and methods to determine these models numerically are given in section 3.2. In regimes where complicated behaviour occurs (even in deterministic models), one is interested in the probability density function of specific observables and techniques to obtain information on these distributions through transfer operator methods are described in section 3.3. In the sections 3.4 to 3.6, we turn to stochastic dynamical systems and provide methods to determine local (Gaussian) probability density functions near deterministic fixed points (section 3.4), methods to numerically determine reduced stochastic models (section 3.5) and methods to determine approximate probability density functions by directly solving the full stochastic equations numerically (section 3.6).

### 3.1. Continuation methods

Often the first step in the analysis of the dynamical system (3.1) is the determination of its fixed points, or steady states. One can simply integrate the model in time to get to an equilibrium for different values of parameters, but there are more efficient techniques to determine the dependence of fixed points upon parameters, i.e. continuation methods.

Finding steady states of the system (3.1) versus a single parameter *p* amounts to solving the algebraic system

The idea of pseudo-arclength continuation (Keller, 1977; Seydel, 1994) is to parametrize branches of solutions $\Gamma (s)\u2261(x(s),p(s))$ with an arclength parameter *s* (or an approximation of it, thus the term ‘pseudo’) and choose *s* as the continuation parameter instead of *p*. Because of the extra d.f., an extra equation is added by approximating the normalization condition of the tangent $\Gamma \u02d9(s)=(x\u02d9(s),p\u02d9(s))$ to the branch $\Gamma (s)$, where the dot refers to the derivative with respect to *s*, with $|\Gamma \u02d9|2=1$. More precisely, for a given solution $(x0,p0)$, the next solution $(x,p)$ is required to satisfy the constraint

where $\Gamma \u02d90=(x\u02d90,p\u02d90)$ is the normalized direction vector of the solution family $\Gamma (s)$ at $(x0,p0)$ and Δ*s* is the step size. There are many approaches to finding a starting point and initial tangent on a branch of solutions (Dijkstra, 2005; Kuehn, 2015).

The solution of the system (3.2) and (3.3) is obtained through a Newton-Raphson iteration, with iteration index *k*. Starting from $x0=x0+\Delta sx\u02d90$, $p0=p0+\Delta sp\u02d90$, at each iteration a linear system of the form

has to be solved, with the Jacobian matrix $J=Dx\Phi $. Here $rk=\Delta s-x\u02d90T(xk-x0)-p\u02d90(pk-p0)$. Once $(\Delta xk+1,\Delta pk+1)$ is found, one sets $xk+1=xk+\Delta xk+1$ and $pk+1=pk+\Delta pk+1$.

When a steady state has been determined, the linear stability of the solution is considered and transitions that mark qualitative changes such as transitions to multiple equilibria (pitchfork bifurcations or limit points) or periodic behaviour (Hopf bifurcations) can be detected. The linear stability analysis amounts to solving a generalized eigenvalue problem of the form

The matrix $J$ is in general non-symmetric, *M* is an, often singular, diagonal matrix and σ is a complex number $\sigma =\sigma r+i\sigma i$. Simple bifurcations can be detected by monitoring the sign of $\sigma r$ and they can be distinguished by computing additional properties (Seydel, 1994).

The advantage of pseudo-arclength methods over traditional continuation methods is that the Jacobian of the extended system $[J\u2202\u2202p\Phi ]$ has rank $d+1$, even at saddle-node bifurcations where the Jacobian $J$ becomes singular. Hence, one can easily continue around these bifurcations (Keller, 1977). Another advantage of pseudo-arclength continuation is that it can compute branches of unstable as well as stable solutions (Seydel, 1994). The unstable solutions are often useful to understand the dynamical behaviour of the system. Continuation methods can also be used to determine periodic solutions versus parameters and to analyse their stability (Dijkstra et al., 2014; Doedel, 1980; Doedel and Tuckermann, 2000).

### 3.2. Local amplitude equations

The discretized system of equations (3.1) will be referred to be in this subsection as the ‘full model’. Suppose that at some value of the control parameter, say at $p=pB$, a bifurcation has been detected within this full model using the methods described above. The idea of reduced models is to construct a low-order model locally near the bifurcation which is capable of representing the dynamics of the full model. The main advantage of such reduced models is that the elementary processes leading to the bifurcation can be studied in detail.

The Jacobian matrix $J$ near a fixed point has an eigenvector decomposition,

Here $R$ and $L$ denote the right- and left-hand eigenspaces of the problem (3.5), and $LH$ is the conjugate transpose of $L$. $I$ is the identity and the diagonal matrix $\Sigma $ contains the corresponding eigenvalues, i.e. $R=(r1r2\cdots rr),L=(l1l2\cdots lr)$ and $\Sigma =diag(\sigma 1\sigma 2\cdots \sigma r)$, where $r=rank(\Sigma )<d$; the latter inequality is due to the singular nature of *M*. Relation (3.6) states that ** L** and

**are a bi-orthogonal set of (generalized) eigenvectors.**

*R*The equations governing the perturbations ** v** from the steady state can be written as

where $N$ represents the non-linear terms. With the use of the eigenbasis, such a perturbation is expanded in *n* right-hand eigenvectors, according to

where $a=LnHv$. The matrix $Rn$ denotes the *n*-dimensional subspace of ** R** of suitably chosen right-hand vectors, and $Ln$ is its adjoint subspace.

Substitution of (3.8) into (3.7), using (3.2), subsequent projection onto the left-hand eigenbasis $Ln$ and the use of the bi-orthogonality relation (3.6), results in the set of coupled equations for the $aj,j=1,\u2026,n$, i.e.

where the coefficients in the projected system are defined as $bjk=ljHJrk$ and $cjkl=ljHN(rk,rl)$.

While this approach is fairly standard, an issue appears regarding the domain in parameter space for which a close correspondence exists between the dynamical behaviour contained in (3.9), for a chosen value of *n*, and the full model (3.1) exists. In addition, the selection of the appropriate eigenmodes is also highly problem dependent (Van der Vaart et al., 2002).

### 3.3. Transfer operator methods

In the previous sections, focus was on individual trajectories and their asymptotic behaviour (e.g. to fixed points or periodic orbits). However, the climate system displays chaotic behaviour in many of its subsystems, e.g. in its midlatitude atmospheric flow. One is then interested in statistical properties of trajectories, or ergodic properties, of a model. For example, a common approach to improve the skill of weather forecasts and quantify their uncertainty is to use an ensemble of trajectories with slightly different initial conditions (Slingo and Palmer, 2011).

Mathematically, let the model satisfy a system of well-posed ordinary differential equations

where $X=\mathbb{R}d$ is the Euclidean state space of dimension *d* and $\Phi :X\u2192X$ a smooth vector field with associated flow $St$, which associates to an initial state $x0$ the solution of (3.10) at some time $t\u22650$. An initial ensemble is then represented by the initial distribution $\rho 0:X\u2192\mathbb{R}$ with

where the $x0(i)$ in *X* are the initial states of the members of the ensemble and δ is the Dirac distribution. The average of any smooth observable $g:X\u2192\mathbb{R}$, such as the sea surface temperature at some location, over the initial ensemble $\rho 0$ is then given by

where the right-hand side is the usual formula for numerical applications. Similarly, the ensemble average of *g* after a given time $t\u22650$, i.e. after propagating the ensemble members by the flow $St:x0(i)\u2192x(i)(t)$, is

The latter can then be used to define an ensemble $\rho (x,t)$ at time *t* such that

which, for $\rho 0$ defined by (3.11), gives

More generally, there exists (Lasota and Mackey, 1994) a linear operator $Pt:L1(X)\u2192L1(X)$ such that (3.14) holds with $\rho (x,t)=Pt\rho 0(x)$, for any density $\rho 0$ (our initial ensemble) in $L1(X)$ and observable *g* in $L\u221e(X)$ (Fig. 3). In particular, taking *g* in (3.14) to the characteristic function $\chi A$ of any Borel set *A*, i.e. $\chi A(x)=1$ if *x* is in *A*, 0 otherwise, one has that

which expresses the fact that the probability to find a member $x$ sampled from an initial ensemble $\rho 0$ in a set *A* after some time *t* is none other than the probability of this member to be initially in the preimage of this set by the flow.

The family $Pt,t\u22650$ inherits from the semigroup property of the flow, i.e.

and satisfies the Liouville equation

which expresses the fact that probabilities are conserved in phase space, thus playing the role of the continuity equation for hydrodynamic flows in physical space.

A key concept in ergodic theory is that of an invariant measure for the non-singular flow $St$, i.e. a probability measure μ such that $\mu (St-1A)=\mu (A)$, for any Borel set *A*. In other words, a measure is invariant if the probability of a state to be in some set does not change as this state is propagated by the flow. It follows then that the average $g\xaf$ with respect to the invariant measure μ of any integrable observable *g* is also invariant with time, i.e.

A flow $St$ with an invariant measure μ has interesting statistical properties when μ is ergodic, i.e. when the sets *A* which are invariant, i.e. $St-1A=A$, are either of measure 0 or 1. Then, by the celebrated individual ergodic theorem of Birkhoff, the average of any integrable observable *g* is such that

where the right-hand side is none other than the infinitely long version of the time mean used in numerical applications, for which the ergodic theorem is in fact implicitly invoked. When μ is ergodic, the time mean thus depends on the initial state ** x** only for a set of measure 0.

Unfortunately, there may exist many ergodic measures. For example, multi-stable dissipative systems, i.e. with more than one attractor, such as climate models with both a warm and a snow-ball steady state, may have several ergodic measures supported by different attractors. In this case, the equality (3.20) between ensemble averages and time averages only holds for initial states belonging to the attractor supporting the measure. On the other hand, time averages for two initial conditions in different basins of attractions would not coincide in general. This leads to the definition of a physical measure for which the equality (3.20) between ensemble averages and time averages holds almost everywhere with respect to the Lebesgue measure $dx$. When several physical measures exist, as may be the case when several attractors coexist, one may choose to restrict the phase space to the basin of attraction of only one of the attractors, e.g. by discarding all initial data converging to the snow-ball attractor, in the case of multi-stable climate models. Let us then assume for convenience that μ is the unique physical measure of the $St$ and denote the time mean of *g* in the right-hand side of (3.20) by $g\xaf$.

In climate studies, one is not only interested in the mean but also in the variability of observables. This can be done by looking at the correlation function $Cf,g$ between two observable *f* and *g* in $L\mu 2(X)$, defined as

For the physical measure μ, the correlation function can be evaluated as the cross-correlation function

which is much used in practice (Von Storch and Zwiers, 1999). Furthermore, in the definition of the correlation function, only information about the dynamics on the support of the measure μ is required. A family of transfer operators $Lt,t\u22650$ can be defined on $L\mu 2(X)$ by

Note that compared to the semigroup $Pt,t\u22650$ defined by (3.14) with respect to the Lebesgue measure $dx$, the semigroup of transfer operators $Lt,t\u22650$ is defined with respect to μ. As will become clear, this comes as a restriction of the method presented here. In particular, when μ is supported by an attractor, no information on the dynamics away from the attractor is carried by $Lt$. On the other hand, when μ has a density $\rho \u221e(x,t)$ with respect to the Lebesgue measure, then both semigroups are related by

Particularly important are the eigenfunctions of the transfer operators $Lt$ and their adjoints, the Koopman operators $Ut:g\u2192g\u2218St$. A function $\psi k$ is an eigenfunction of $Lt$ if

where the complex number $\zeta k(t)$ is the associated eigenvalue. To each $\psi k$ corresponds an eigenfunction $\psi k*$ of $Ut$ associated with the complex conjugate of the eigenvalue $\zeta k(t)$ and such that

It follows from the semigroup property that the eigenfunctions $\psi k*$ and $\psi k$ are independent of time. Moreover, under sufficient continuity conditions for the spectral mapping theorem to hold (Engel and Nagel, 2001), the eigenvalues $\zeta k(t)$ of the semigroup $Lt,t\u22650$ are also related to the eigenvalues $\lambda k$ of its generator *K* by (for all *k*)

and $Re(\lambda k)\u22640$, since $|\zeta k(t)|\u22641$. In particular, one can see from (3.19) that the invariance of μ implies that constant functions on the support of μ are eigenvectors of $Lt$ and $Ut$ associated with the eigenvalue 1.

The mixing spectrum, denoting the set eigenvalues $\lambda k$ and eigenvectors $\psi k$ and $\psi k*$ of the generator and its adjoint, provides deep insight into the statistical properties of the trajectories of the dynamical system. In particular, when the eigenvectors form a bi-orthonormal family, i.e. when the eigenvalues are all semi-simple, the correlation function $Cf,g$ can be decomposed as

so that mixing eigenvalues close to the imaginary axis are responsible for a slow decay of correlations, at a rate given by the real part $-Re(\lambda k)$ and with oscillations at an angular frequency given by the imaginary part $Im(\lambda k)$.

In addition, the correlation spectrum, given by the Fourier transform of the correlation function, decomposes as

The largest peaks in the spectrum are thus related to the subset of eigenvalues $\lambda k$ closest to the imaginary axis. The position of each peak is located at $\omega =Im(\lambda k)$ and the half width of the amplitude is given by $-Re(\lambda k)$.

In order to get information on the mixing spectrum for high-dimensional systems, projections of the transfer operators on a reduced space *Y* subset of the phase space *X* will be approximated from time series. After defining an observation operator $h:Y\u2192X$, Ulam’s method is applied which relies on a Galerkin approximation of the infinite-dimensional transfer operators by finite-dimensional transition matrices (Dellnitz and Junge, 1999; Froyland, 1998; Ulam, 1961). For this purpose, the reduced space is discretized into a family $G={Bi}1\u2264i\u2264mb$ of $mb$ grid boxes corresponding to the orthogonal basis ${\chi B1,...,\chi Bmb}$ of characteristic functions. The transfer operator $L\tau $ is then approximated by a time-homogenous Markov chain with transition matrix $P\tau $ whose elements are the transition probabilities

These probabilities can be estimated from a long time series ${xt}1\u2264t\u2264N$ of ** x** using the Maximum Likelihood Estimator

where $#{(h(xt)\u2208Bi)\u2227(h(xt+\tau )\u2208Bj)}$ is the number of observations $h(x)$ in box $Bi$ such that the observation $h(xt+\tau )$ (a time τ later) is in box $Bj$ and $#{h(xt)\u2208Bi}$ is the total number of observations $h(x)$ in $Bi$.

From the eigenvalues $\zeta ^k(\tau )$ of the matrix $P^\tau $, for a chosen delay time τ, eventually the $\lambda ^k$ are determined through

There are all kind of technical issues in this approximation of the transfer operator and the ergodicity eigenvalues. One important issue is that the semigroup property of $Lt$ may no longer hold in the reduced space because memory enters the reduced system due to projection into the low-dimensional subspace. If the projection does not introduce any non-Markovian effects, the family of reduced transfer operators should inherit from the semigroup property and the $\lambda ^k$ should be constant. If this is not the case, however, one can choose the lag τ in a range in which at least the leading eigenvalues $\lambda ^k$ vary as little as possible, so that the latter behave as Markovian. Another issue is the choice of the delay time τ. Apart from this, the reduced subspace should be quite small as otherwise the computation of the transition probabilities becomes intractable. Hence, in each application, the robustness of the mixing spectrum has to be addressed carefully (Tantet et al., 2015).

### 3.4. Covariance matrix determination through Lyapunov solvers

When noise is added to the forcing, the evolution of the flow (described by the stochastic process $Xt$) can generally be formulated (after discretization of the continuous equations) by a stochastic differential-algebraic equation (SDAE) of the form

where $\Phi (x)=Lx+N(x)+F$ is the deterministic right-hand side of (3.1) and $Wt$ represents standard *d*-dimensional multivariate Brownian motion (Gardiner, 2002) and $\Theta (Xt)$ the state-dependency of the noise. Memory effects in the driving noise can be accounted for by increasing the dimension of the state vector $X$.

Suppose that the deterministic part of (3.34) has a stable fixed point $x\u2217$ for a given range of parameter values. Then linearization around the deterministic equilibrium, and assuming the multiplicative noise part is higher order in the linearization of $\Theta (Xt)$, yields (Kuehn, 2012)

where $J$ is again the Jacobian matrix and $B=\Theta (x\u2217)$. Hence, in this section, we consider only the additive noise case.

When *M* is a singular diagonal matrix, the stochastic part is assumed non-zero only on the respective components where *M* is non-singular. In this case, (3.35) can be written as

where $M22$ represents the non-singular part of *M*. Consequently, for non-singular $J11$ (3.36) can be separated into an algebraic part and an explicitly time-dependent part according to

where $S=J22-J21J11-1J12$ is the Schur complement of $J$. Since $M22$ is non-singular, the stationary covariance matrix $C22$ can be determined by solving the corresponding generalized Lyapunov equation (Kuehn, 2012)

In order to find the full covariance matrix one can use $Cij=E[vivj\u22a4]$ together with (3.37a) which gives $C12=-J11-1J12C22,C21=-C22J12\u22a4J11-\u22a4$ and $C11=-J11-1J12C21$.

An estimate of the covariance matrix *C* of the stochastic dynamical system (3.34) can hence be obtained by providing the matrices $J$, $B$ and *M* and solving the corresponding generalized Lyapunov equation (3.38). Once the covariance matrix *C* is computed, the stationary probability density function, indicated by $q(x)$, of the approximating Ornstein-Uhlenbeck process follows from (Gardiner, 2002; Kuehn, 2011) as

The limitations of this approach are, firstly, that only a probability density function estimate is provided valid on a subexponential timescale before large deviations occur and, secondly, that only the local (i.e. near the steady state) and Gaussian stochastic behaviour of the system are obtained. When large deviations away from the fixed point are of interest, other approaches must be considered (Freidlin and Wentzell, 2012).

In Baars et al. (2017), a new efficient Lyapunov equation solver was presented which solely relies on matrix-vector products based on a Galerkin projection. The covariance matrix *C* is approximated by a low-rank approximation $C\u02dc=VTV\u22a4$ and the projected system solved is of the form

With $J\u02dc=V\u22a4JV$, $M\u02dc=V\u22a4MV$, $B\u02dc=V\u22a4B$ is taken, a smaller Lyapunov equation results, i.e.

is obtained which can be solved by a dense solution routine (Bartels and Stewart, 1972). Details on the computation of the space *V* are presented in Baars et al. (2017).

### 3.5. Non-Markovian reduced stochastic models

The Galerkin method in section 3.2 was used to numerically reduce a high-dimensional system locally to a low-dimensional system to study the physics of the local bifurcations. Such methodology was recently extended to stochastic case by Chekroun et al. (2015) and improved by the consideration of non-Markovian terms. We here only consider the full multiplicative noise case, where the noise is only applied to components of the state vector for which the diagonal element of *M* is non-zero. To the evolution equations of the perturbation state vector $v$ (with respect to the deterministic fixed point), a specific state-dependent noise term is consider here. This gives the SDAE

where $Wt$ denotes the scalar Wiener process. Here, $v\u2218dWt$ is understood in the Stratonovich sense, and $\epsilon \u22650$ quantifies the intensity of the noise term.

The eigenvalues of the linearized problem (as in section 3.2) are ordered by their real part, with $Re(\sigma 1)\u2265Re(\sigma 2)\u2265\u2026$ and again $ri$ and $li$ are the right and left eigenvector belonging to eigenvalue $\sigma i$. Next the projection of the perturbation ** v** on the space spanned by the

*m*first eigenvectors is denoted by $vc$; these are the ‘resolved variables’. The projection of

**on the space spanned by the $m+1$-th until**

*v**r*-th eigenvector is indicated by $vs$; these are the ‘unresolved variables’.

Expanding $v$ in *r* eigenvectors gives $v=vc+vs=\u2211i=1rairi=Rrar$. Substituting this in equation (3.41) and multiplying by $LrH$ gives:

In one version of the parameterized manifold approach developed in Chekroun et al. (2015), the unresolved part of the solution $vs$ is represented by

where ξ is the vector variable in the parameterized manifold and for $n\u2265m+1$:

Here the $Qni1,i2$ are determined by solving the stochastic differential equation

and the coefficients $Npqk$ are determined from $Npqk=<lk,N(rp,rq)>=lkHN(rp,rq)$.

Substituting this approximation for $vs$ into (3.42) and projecting again eventually gives the stochastic reduced model

These random coefficients $Qni1,i2$ convey exogenous memory effects resulting from the interactions of the noise with the bilinear term in (3.41) (these coefficients depending on the past of the noise) as explained in Chekroun et al. (2015). For $\epsilon =0$, the deterministic case is obtained and compared to the result in section 3.2. One observes that, different from the Galerkin projection, now also the higher order interactions of the modes are taken into account in the reduced model (3.45).

### 3.6. Dynamical Orthogonal (DO) field methods

If a general random state vector is denoted by $Xt$, the starting point of the dynamical orthogonal (DO) method is to use a generalized, time-dependent, Karhoenen-Loeve expansion

where the coefficients $Yi(t)$ are scalar stochastic processes with arbitrary probability distribution functions and the $Xi$ are deterministic vectors. The employed representation follows from the assumption that the stochastic part of the solution ‘lives’ in a finite-dimensional space, the stochastic subspace $VS$ (Fig. 4).

The finite-dimensionality of the representation is crucial towards the derivation of closed field equations although it is not sufficient. The choice to have all unknown terms varying with time leads to redundancy issues and therefore additional constraints are necessary in order to obtain independent equations for all the quantities involved. In (Sapsis and Lermusiaux, 2009), it is shown that a suitable constraint, which arises naturally to overcome this redundancy issue, is the dynamical orthogonality condition. It requires that the time-dependent basis should vary orthogonally to the space $VS$ and hence on every time instant:

and $<,$ is the standard inner product. The use of time-dependence in the basis elements (the so-called DO modes) allows the representation of the transient character of the solution using much fewer modes. From a physical point of view, the constraint (3.47) imposes the natural need for the computational algorithm to evolve the shape of the perturbations only if there is an important new direction, which is not already covered by the existing modes. In terms of computational performance, it is a trade-off between, on the one hand, fewer modes to consider, but on the other hand, more equations (including interactions between all the considered modes) to solve.

The orthogonality condition leads to a closed set of equations that allows for the evolution of the mean field $X\xaf(t)$, the DO modes $Xi(t)$ and the stochastic coefficients $Yi(t)$. In this way the solution is obtained in the form of random realizations for the scalar stochastic coefficients $Yi(t)$ (having low-computational and storage cost since they are low-dimensional) and in terms of $s+1$ deterministic fields containing the full deterministic information (mean field) and the *s* time-dependent directions (the DO modes) where stochasticity is important. From the mean, the time-dependent patterns of the DO modes plus the distribution of the stochastic coefficients *Y* (at a certain time *t*), an approximation to the probability density function of the state vector can be obtained.

The dimension *s* can be estimated during the time-development by monitoring the eigenvalues of the covariance matrix $CYi(t)Yj(t)$ (Sapsis and Lermusiaux, 2012). The criteria for evolving the subspace size are based on stability arguments which follow directly from the system differential equations. When an eigenvalue of the covariance matrix $CYi(t)Yj(t)$ exceeds a critical value, *s* is increased by one; when it drops below a critical value, *s* is decreased by one. The algorithm also provides the instantaneously most unstable perturbation, which is not included into the stochastic subspace. More details on the methodology can be found in Sapsis (2010).

Having the random coefficients expressed in the form of random realizations has important advantages especially for climate applications. It allows for the simple and efficient representation of any interesting statistical quantity of the solution such as single or joint moments of any order and probability density functions of the velocity field in specific points of the domain but also spatiotemporal correlation functions and length-scales. Based on these realizations, one may compute statistical moments or the associated probability density functions.

## 4. Application: the path variations of the Kuroshio

In this section, we apply part of the methodology to an idealized model of the wind-driven ocean circulation. After presenting the model and its bifurcation diagram in the sections 4.1 and 4.2, respectively, we consider results for the DO method in section 4.3 and of a stochastic reduced model in section 4.4. From the bifurcation diagram in section 4.2 we learn that symmetry breaking can lead to multiple asymmetric Kuroshio jets, even under symmetric wind forcing, and that intrinsic variability (meandering of the jet) can occur through Hopf bifurcations. With the DO analysis in section 4.3 the effects of noise in the wind stress on the transitions between the different Kuroshio states are studied. With the local stochastic model near the first symmetry breaking pitchfork the local probability density function is calculated (section 4.4).

### 4.1. Model of the Kuroshio

We consider ocean flows with constant density $(\rho o)$ in a square basin of horizontal dimension *L* and with constant depth *H*, which is situated on a midlatitude β-plane with Coriolis parameter $f=f0+\beta 0y$. The ocean circulation in the basin is forced at the surface through a wind-stress vector $\tau =\tau 0(\tau x(x,y),\tau y(x,y))$.

In a quasi-geostrophic approximation, the flow can be modelled by the well-known barotropic vorticity equation (Pedlosky, 1987) for the geostrophic stream function ψ. With introduction of the horizontal velocities *u* and *v* and the relative vorticity ζ, the model can be written as

where $AH$ is the lateral friction coefficient. On the lateral zonal boundaries $x=0,L$, no-slip conditions are prescribed, whereas at the meridional boundaries ($y=0,L$), we apply slip conditions

The deterministic (so-called double-gyre) wind stress considered in this section is

where the parameter *a* controls the asymmetry of the wind-stress field with respect to the mid-axis of the basin. The standard values of the parameters are provided in Table 1 and we use $AH$ (or the Reynolds number $Re=UL/AH$) as a control parameter.

H = 6.0 ∙ 10^{2} m | ρ = 1.0 ∙10^{3} kg m^{−3} |

L = 1.0 ∙ 10^{6} m | $\beta 0$ = 2.0 ∙ 10^{−11} (ms)^{−1} |

U = 1.6 ∙ 10^{−2} m s^{−1} | $\tau 0$ = 1.0 ∙ 10^{−1} Pa |

H = 6.0 ∙ 10^{2} m | ρ = 1.0 ∙10^{3} kg m^{−3} |

L = 1.0 ∙ 10^{6} m | $\beta 0$ = 2.0 ∙ 10^{−11} (ms)^{−1} |

U = 1.6 ∙ 10^{−2} m s^{−1} | $\tau 0$ = 1.0 ∙ 10^{−1} Pa |

### 4.2. Deterministic bifurcation diagram

The equations (4.1) were discretized using a 128 × 128 equidistant grid and the steady states are computed using a pseudo-arclength continuation method as described in section 3.1. The dimension of the state vector $d=128\xd7128\xd72$ as both ψ and ζ are considered in the state vector. The matrix *M* in (3.1) has zeroes on the diagonal for each ψ equation, i.e. the discretized version of (4.1b).

For the wind-stress forcing (4.3), the structure of the steady solutions is shown through the bifurcation diagram in Figure 5a, where the value of the stream function (scaled with $LHU$) at a point in the southwest part of the domain ($\psi R=\psi (x=L/4,y=L/4)/(LHU)$) is plotted versus $Re$. At small $Re$, the anti-symmetric double-gyre flow (Fig. 5b) is a unique state. When lateral friction is decreased, this flow becomes unstable at the pitchfork bifurcation $P1$ and two branches of stable asymmetric states appear for smaller values of $AH$ (larger $Re$). The solutions on these branches have the jet displaced either southward or northward (Fig. 5c) and are exactly symmetrically related for the same value of $Re$. The asymmetric solutions become unstable at a Hopf bifurcation *H* located at $Re=52$.

The branch of unstable steady solutions at $P1$ is the continuation of the anti-symmetric solution branch. For even smaller friction, this anti-symmetric flow becomes inertially dominated and $\psi R$ increases rapidly while going through the saddle-node bifurcation *L*. A pitchfork bifurcation $P2$ occurs on the anti-symmetric branch where an additional pair of asymmetric solution branches appears; all these solutions are unstable.

### 4.3. Wind-stress noise: DO analysis

The wind-stress field over the North Atlantic and North Pacific contains a very strong variability on many timescales. Hence, to understand the variability of wind-driven flows, a stochastic component of the wind-stress needs to be represented. A stochastic wind-stress product $\tau s$ was suggested in Sura et al. (2001) as

where $\rho a$ is the air density and $CD$ the drag coefficient, with

where λ is a scale factor controlling the spatial extent of the stochastic forcing and α controls its amplitude. The quantities $\eta x$ and $\eta y$ represent Gaussian white noise as a generalized (distribution valued) stochastic process [see section 3 in Doering (1990)]. Numerically, at each time value $ti=i\Delta t,i=1,\u2026,n$, the quantities $\eta x(ti),i=1,\u2026,n$ are independent and normally distributed, and the stochastic process has a non-zero autocorrelation time that is shorter than the time step $\Delta t$; the same holds for $\eta y(t)$.

Similarly to the choice in Sura et al. (2001), the amplitude α is normalized so that it is the same for every value of λ, i.e. α is computed from $\alpha =1/f(0,0;1/4)$. The reference amplitude for $\lambda =1/4$ is chosen such that the variance in the noise $\epsilon 2$ = 28 m^{2} s^{−2}, similar to that in Sura et al. (2001), with values of $\rho a=1.2$ kg m^{−3} and $CD=2.0\u22c510-3$. The DO results for $Re=35$ are shown in Figure 6 for $\lambda =0.25$. The amplitudes of all DO modes approach steady values in time and the first DO mode has the largest amplitude. This mode has the spatial pattern of the *P*-mode in Simonnet and Dijkstra (2002), the normal mode destabilizing the steady state at the pitchfork bifurcation $P1$. It was shown in Sapsis and Dijkstra (2013) that the spatial structure of the DO modes is the same for each value of λ. Also, the geometry of the probability density remains the same for different noise forcing although the probability measure is more spread for smaller values of λ.

### 4.4. (Stochastic) reduced models

We will show results for the reduced models just above the value of $Re$ at the first pitchfork bifurcation $P1$ in Figure 5. Thereto the equations (4.1) are first non-dimensionalized using a spatial scale *L*, a velocity scale *U* and a timescale $L/U$. For $Re=37$, there are two (dimensionless) real eigenvalues $\sigma 1=6.98\xd710-2$ and $\sigma 2=-4.196$. In the deterministic case, using the methodology of section 3.2, we obtain the equations (3.9)

with $c112=1.49\xd710-2$, $c121=-1.53\xd710-2$, $c211=1.37\xd710-3$, $c222=-7.36\xd710-4$. Apart from $(0,0)$, there are two additional steady states and the total solution for the dimensionless stream function based on $a1$ and $a2$ gives $\psi max=20.93$, whereas from the numerical solution of the full model, we find $\psi max=21.16$ indicating that the deterministic reduced model is well capable of approximating the pitchfork bifurcation quantitatively (Van der Vaart et al., 2002).

For the stochastic case the amplitude $a1$, as determined from (3.45) using $m=1$ and $r=2$, is shown versus time for a noise amplitude $\epsilon =0.1$ in Figure 7a; the initial amplitude $a1(0)=1$. The function $Q21,1$ (not shown) varies around 0.23, $a1$ around $710.8$ and the probability density function $p+$ is plotted in Figure 7b. In this multiplicative noise case considered, there are no stochastically induced transitions between fixed points (note that $Re=37$ is in a multiple equilibrium regime) and the probability density function depends on the sign of the initial amplitude $a1(0)$.

## 5. Application: the transitions of the MOC

In this subsection, we apply part of the methodology to an idealized model of the meridional overturning ocean circulation. After presenting the model and its bifurcation diagram of the deterministic version in the sections 5.1 and 5.2, respectively, we consider the Lyapunov solver approach in section 5.3 and the transfer operator approach in section 5.4.

### 5.1. Model of the MOC

The spatially quasi two-dimensional model of the Atlantic circulation as described in den Toom et al. (2011) is fit for purpose to consider transitions of the AMOC (see section 2.2). In the model, there are two active tracers: temperature *T* and salinity *S*, which are related to the density $\rho o$ by a linear equation of state

where $\alpha T$ and $\alpha S$ are the thermal expansion and haline contraction coefficients, respectively, and $\rho 0$, $T0$ and $S0$ are reference quantities. The mixing of momentum and tracers due to eddies is parameterized by simple anisotropic diffusion. In this case, the zonal velocity $u=0$, the longitudinal derivatives are zero and the equations for the meridional velocity *v*, vertical velocity *w*, pressure *p*, and the tracers *T* and *S* are given by

Here *t* is time latitude, *z* the vertical coordinate, $r0$ the radius of Earth, *g* the acceleration due to gravity, $AH$ ($AV$) the horizontal (vertical) eddy viscosity and $KH$ ($KV$) the horizontal (vertical) eddy diffusivity. The terms with CA represent convective adjustment contributions as specified in den Toom et al. (2011).

The equations are solved on an equatorially symmetric, flat-bottomed domain. The basin is bounded by latitudes $\theta =-\theta N$ and $\theta =\theta N=60\xb0$ and has depth *H*. Free-slip conditions apply at the lateral walls and at the bottom. Rigid lid conditions are assumed at the surface and atmospheric pressure is neglected. The wind stress is zero everywhere, and ‘mixed’ boundary conditions apply for temperature and salinity,

This formulation implies that temperatures in the upper model layer (of depth $Hm$) are relaxed to a prescribed temperature profile $T\xaf$ at a rate $\tau -1$, while salinity is forced by a net freshwater flux *Q*, which is converted to an equivalent virtual salinity flux by multiplication with $S0$. The numerical values of the fixed model parameters are given in Table 2.

H = 4.0 ∙ 10^{3} m | $Hm$ = 2.5 ∙10^{2} m |

$r0$ = 6.371 ∙ 10^{6} m | $T0$ = 15.0 °C |

g = 9.8 m s^{−2} | $S0$ = 35.0 psu |

$AH$ = 2.2 ∙ 10^{12} m^{2} s^{−1} | $\alpha T$ = 1.0 ∙ 10^{−4} K^{−1} |

$AV$ = 1.0 ∙ 10^{−3} m^{2} s^{−1} | $\alpha S$ = 7.6 ∙ 10^{−4} psu^{−1} |

$KH$ = 1.0 ∙ 10^{3} m^{2} s^{−1} | $\rho 0$ = 1.0 ∙ 10^{3} kg m^{−3} |

$KV$ = 1.0 ∙ 10^{−4} m^{2} s^{−1} | τ = 75.0 days |

H = 4.0 ∙ 10^{3} m | $Hm$ = 2.5 ∙10^{2} m |

$r0$ = 6.371 ∙ 10^{6} m | $T0$ = 15.0 °C |

g = 9.8 m s^{−2} | $S0$ = 35.0 psu |

$AH$ = 2.2 ∙ 10^{12} m^{2} s^{−1} | $\alpha T$ = 1.0 ∙ 10^{−4} K^{−1} |

$AV$ = 1.0 ∙ 10^{−3} m^{2} s^{−1} | $\alpha S$ = 7.6 ∙ 10^{−4} psu^{−1} |

$KH$ = 1.0 ∙ 10^{3} m^{2} s^{−1} | $\rho 0$ = 1.0 ∙ 10^{3} kg m^{−3} |

$KV$ = 1.0 ∙ 10^{−4} m^{2} s^{−1} | τ = 75.0 days |

### 5.2. Deterministic bifurcation diagram

For the deterministic case, the equatorially symmetric boundary forcing is fixed as

where μ is the strength of the mean freshwater forcing (which is taken as bifurcation parameter) and $F0=0.1$ m yr^{−1} is a reference freshwater flux.

The equations are discretized on a latitude-depth equidistant $Mo\xd7Lo$ grid using a second-order conservative central difference scheme. An integral condition expressing the overall conservation of salt is also imposed, as the salinity equation is only determined up to an additive constant. The total number of d.f. $d=6MoLo$, as there are six unknowns per point and the standard spatial resolution used is $Mo=32,Lo=16$.

The bifurcation diagram of the deterministic model for parameters as in Table 1 is shown in Figure 8a. On the *y*-axis, the sum of the maximum ($\Psi +$) and minimum $\Psi -$ values of the meridional stream function Ψ is plotted, where Ψ is defined through

For the calculation of the transports, the basin is assumed to have a zonal width of $64\xb0$. The value of $\Psi ++\Psi -$ is zero when the MOC is symmetric with respect to the equator.

For small values of μ, a unique equatorially anti-symmetric steady MOC exists of which a pattern at location *a* is shown in Figure 8b. This pattern destabilizes at a supercritical pitchfork bifurcation and two asymmetric pole-to-pole solutions appear. An example of the MOC at location *b* in Figure 8b shows a stronger asymmetric overturning with sinking in the northern part of the basin. The pole-to-pole solutions cease to exist beyond a saddle-node bifurcation near $\mu =0.13$ and both branches connect again with the anti-symmetric solution at a second subcritical pitchfork bifurcation. At this bifurcation, the anti-symmetric solution with equatorial sinking (see MOC at location *c* in Fig. 8b) appears which is stable for larger values of μ. The value of μ at the point *b*, $\mu b=0.110$, is here considered as the reference freshwater forcing.

### 5.3. Lyapunov solver approach

Next, the symmetric freshwater forcing is chosen as

where $\eta (t)$ represents Gaussian white noise [again in the generalized distribution sense (Doering, 1990)], i.e. with $E[\eta (t)]=0$ and $E[\eta (t)\eta (s)]=\delta (t-s)$, where δ is the Dirac distribution. For the numerical implementation and the interpretation of the stochastic process, the same holds for $\eta (t)$ as for $\eta x(t)$ in (4.5). In this case, the noise matrix $B$ in (3.35) simply represents additive noise, which is (i) only active in the freshwater component, (ii) only present at the surface, (iii) uncorrelated and (iv) at each latitude θ the magnitude σ is $10%$ of the deterministic freshwater forcing amplitude [see (5.6)]. Note that the noise in (5.6) is assumed to be highly correlated spatially which is not very realistic considering the spatial irregularity of the atmospheric precipitation field.

Using the available Jacobian $J$ of the deterministic continuation, the mass matrix *M*, which is a diagonal matrix which is only non-zero on the *T* and *S* parts, and the forcing $B$ as described above, we can determine the local probability distribution of a steady state using the generalized Lyapunov equation (3.38). The patterns of the stream function, temperature and salinity of the first eigenvector of the covariance matrix for $\mu =\mu b$ using the method as described in section 3.4 are shown in Figure 9 and are the same as the first empirical orthogonal function (EOF) (Baars et al., 2017) determined from a long time series. The patterns of the first eigenvector of the covariance matrix are here closely related to the first normal mode [the first eigenvector of the generalized stability problem (3.5)]. Perturbing the system along this direction with a small amplitude will induce the longest decay time back to the steady state in the presence of the salt-advection feedback. This is not a generic result; there may be a sum of eigenmodes which may end up in the first EOF due to the existence of non-normal dynamics (Alexander and Monahan, 2009; Sévellec et al., 2007; Tziperman et al., 2008).

### 5.4. Transfer operator approach

In the bifurcation diagram Figure 8a, a saddle-node bifurcation occurs when μ is increased slightly from the value of $\mu b$, say at $\mu S$. When the freshwater forcing is such that $\mu >\mu S$, then the MOC will collapse into a flow with sinking at the equator (Fig. 8d), with severe changes in the meridional heat transport. Hence, when μ slowly increasing from $\mu b$, one wants to design an early warning indicator of the approaching collapse. Traditional methods make use of the increase in lag-1 autocorrelation or the increase in variance (Lenton, 2011), which are expected near a saddle-node bifurcation as one eigenvalue of the Jacobian matrix approaches zero. However, in practical applications this method may lead to false alarms and too early warning. It is therefore interesting to consider the behaviour of the correlation function (and hence the ergodicity spectrum) near the saddle-node bifurcation.

From (3.29), it is seen that the decay in the correlation function is controlled by the real part of the eigenvalue of the generator $Re(\lambda 2)$—note that $\lambda 1=0$—near the imaginary axis. To approximate the transfer operator, time series of 50000 years of the stream function Ψ are used with a resolution of 1 year. First, the eigenfunctions of the covariance matrix are calculated and next the time series are projected into the (EOF1, EOF2) space using the principle component time series (PC1, PC2). Second, the transition matrix is calculated as described in section 3.3 using a grid of $40\xd740$ and the time lag τ of $51$ years.

In the spectrum of the generator at $\mu b$ (Fig. 10a), there is large spectral gap with a strong decay of correlations. When μ is increased, $\lambda 2$ becomes real and approaches the imaginary axis (Fig. 10b) leading to long decay rates and explaining the critical slowdown near the saddle-node bifurcation (Gaspard et al., 1995). The results are robust for different grid sizes in the (EOF1, EOF2) space and are also robust for time series with a length equal to or bigger than 40000 years. While this slowing down is here visible from the decay of the correlation function alone, the spectrum of appropriately reduced transfer operators can reveal a more complete information about the system, such as the presence of resonance for different observables. However, as this method is ‘data hungry’, its applicability to determine early warning signals for transitions in the AMOC is very limited.

## 6. Summary and discussion

In this paper, we have given a review of a number of numerical techniques to study transition behaviour in high-dimensional stochastic dynamical systems, such as those arising from the discretization of stochastic partial differential equations. This allows the application of tools from dynamical systems theory (both bifurcation theory and ergodic theory) to problems involving spatially extended systems in many domains.

At the basis of all dynamical systems analysis is a determination of the fixed points of the autonomous ‘backbone’ system in parameter space (i.e. for which the forcing and parameters are assumed stationary). While these fixed points may be less relevant to explain complicated spatial-temporal behaviour of practical interest, the bifurcation diagram indicates possible intervals in parameter space where multiple states can occur and where transitions to time-dependence take place. Continuation methods, combined with sophisticated solvers for linear systems and generalized eigenvalue problems, are now capable of handling these computations. Practical difficulties in evaluating Jacobian matrices and choices for preconditioners may be application dependent but general strategies exist (Dijkstra et al., 2014).

Further analyses depend on the specific questions asked and whether the mathematical model system is autonomous, non-autonomous or stochastic. Interest in specific mechanisms of transitions in deterministic systems may benefit from the deterministic reduction techniques as presented in section 3.2, where a system of local ordinary differential equations is generated numerically by Galerkin projection. Analysis of this reduced model in the form of amplitude equations enables a very detailed analysis of instability mechanisms (Dijkstra and Katsman, 1997), such as the symmetry breaking at the pitchfork $P1$ in Figure 5.

When a non-autonomous system is considered, e.g., by adding a time-dependent forcing, the behaviour can often be interpreted with help of the autonomous bifurcation diagram. This includes so-called fast–slow systems, where the timescale of the forcing/parameters is much slower than the internal response timescales of the autonomous system. This timescale is determined by the eigenvalues of the linear stability problem of the fixed points and hence is available from the bifurcation analysis. In many cases, rate-dependent tipping behaviour (Ashwin et al., 2012) can also be explained in terms of classical bifurcation theory although one may also need other theoretical approaches or full trajectory computation.

Under stochastic noise, one can either look at the probability density function under linearized dynamics by solving a Lyapunov equation or directly tackle the full equations by DO field methods. The matrix formulation of the latter method allows a relatively easy application as the generation of the equations for the stochastic coefficients and the DO modes is carried out numerically. Methods to obtain stochastic reduced (non-Markovian) equations, where the interaction of the noise and the non-linearities can be investigated near any background flow, can also be very useful to study the interaction of noise and non-linearities near a (deterministic) bifurcation.

We have demonstrated the potential of the methods by applying them to two canonical problems in physical oceanography, the midlatitude Atlantic wind-driven ocean circulation and the AMOC. Which method to use depends on the eventual question and the type of noise included in a particular model. Each method has its limitations in terms of applicability (e.g. only additive noise) and in the information eventually provided (e.g. only local dynamics). In addition, the methods presented here can be technically demanding for a particular model chosen as Jacobian matrices have to be computed, or transfer operators have to be reconstructed.

The techniques in principle also allow to tackle problems in control and predictability. The availability of the Jacobian matrix is crucial to determine transfer functions in optimal control problems. For predictability studies, an estimate of the evolution of the probability density function, even in a reduced space, is valuable. Furthermore, the adjoint of the tangent linear model is easily constructed from the Jacobian matrix as it is its transpose. Hence, optimal modes, in any norm, can be computed by solving the corresponding optimization problem (van Scheltinga and Dijkstra, 2008).

To conclude, efficient numerical methods are available to study non-linear behaviour in a broad class of climate models which are described by stochastic partial differential equations and we hope that this paper will stimulate their usage, the improvement of current methods and the development of new tools.