Summary

Wavelets provide the flexibility for analysing stochastic processes at different scales. In this article we apply them to multivariate point processes as a means of detecting and analysing unknown nonstationarity, both within and across component processes. To provide statistical tractability, a temporally smoothed wavelet periodogram is developed and shown to be equivalent to a multi-wavelet periodogram. Under a stationarity assumption, the distribution of the temporally smoothed wavelet periodogram is demonstrated to be asymptotically Wishart, with the centrality matrix and degrees of freedom readily computable from the multi-wavelet formulation. Distributional results extend to wavelet coherence, a time-scale measure of inter-process correlation. This statistical framework is used to construct a test for stationarity in multivariate point processes. The methods are applied to neural spike-train data, where it is shown to detect and characterize time-varying dependency patterns.

1. Introduction

We adopt the construction of Hawkes (1971), which presents a |$p$|-dimensional (⁠|$p\geq1$|⁠) multivariate point process as a counting vector |$N(t)= \{N_{1}(t),\ldots,N_{p}(t)\}^{{ \mathrm{\scriptscriptstyle T} }}$|⁠, where the random element |$N_{i}(t)$||$(i=1,\ldots,p)$| represents the number of events of type |$i$| over the interval |$(0,t]$|⁠. Its first-order properties are characterized by its rate |$\lambda(t)\in\mathbb{R}^p$|⁠, defined as |$\lambda(t)= E\{{\mathrm d} N(t)\}/{\mathrm d} t$| where |${\mathrm d} N(t) = N(t+{\mathrm d} t)-N(t)$|⁠, and its second-order properties at times |$s$| and |$t$| are characterized by its covariance density matrix

The process |$N(t)$| is second-order stationary, henceforth referred to simply as stationary, if |$\lambda(t)$| is constant for all |$t$| and |$\Gamma(t,s)$| depends only on |$\tau = s-t$|⁠. In this setting we will denote the covariance density matrix by |$\Gamma(\tau)$|⁠.

The spectral domain provides a rich environment for representing this second-order structure, and is based on the fact that a stationary stochastic process can be regarded as a composite of subprocesses operating at different frequencies. The spectral density matrix of a stationary point process is the Fourier transform of its covariance density matrix (Bartlett, 1963), namely
A fundamental summary of the second-order relationship between a pair of component processes, |$N_i(t)$| and |$N_j(t)$| say, is their coherence, defined as
(1)

This provides a normalized measure on |$[0,1]$| of the correlation structure between the processes in the frequency domain. For time-series data, coherence has been used extensively in several disciplines, including climatology, oceanography and medicine. For event data, it has served as an important tool in neuroscience for the analysis of neuron spike-train data.

Estimation of the coherence can be achieved by substituting smoothed spectral estimators into (1). Failure to smooth, i.e., simply using the periodogram, will result in a coherence estimate of 1 for all frequencies, irrespective of whether correlation exists between the pair of processes or not. Tractability of the coherence estimator’s distribution is crucial for principled statistical testing and is dependent on the smoothing procedure used (Walden, 2000).

Often, stochastic processes do not conform to the assumptions of stationarity. This may be due to simple first-order trends in the underlying data-generating process or, more typically, complex changes in its second- or higher-order structure. For point processes, the key objective is to analyse how correlations within and across component processes change over time. One approach is to model the process and its associated time-varying spectrum, with the locally stationary Hawkes process (Roueff et al., 2016; Roueff & Von Sachs, 2019) representing an important recent advancement. The present paper complements this by developing a versatile nonparametric approach for exploring second-order structures in a time-localized way. Wavelets form a natural basis with which to do this because of their inherent ability to trade off time and frequency resolution. Wavelet methods designed for time-series analysis (e.g., Nason et al., 2000) could be deployed on discrete-time count sequences. However, should precise timestamps for the events be available, the necessary binning procedure required to implement them would discard information. This motivates the development of continuous-time methods that preserve all temporal information.

For a wavelet |$\psi(t)$|⁠, the continuous wavelet transform at scale |$a>0$| and translation, or time, |$b\in\mathbb{R}$| of |$N(t)$|⁠, observed on the interval |$(0,T]$|⁠, is defined by Brillinger (1996) as
(2)
where |$^*$| denotes the complex conjugate. The |$i$|th element of this stochastic integral is computed as |$w_i(a,b) = \sum_{k=1}^{N_i(T)}\psi^{*}_{a,b}(s_{i,k})$|⁠, where |$s_{i,1},\ldots,s_{i,N_i(T)}$| are the ordered event times of |$N_i(t)$| and |$\psi_{a,b}(t)=a^{-1/2}\psi\{(t-b)/a\}$|⁠. Thus, working with the continuous-time process is possible if the finite set of event times are known. The wavelet periodogram is then defined as |$W(a,b) = w(a,b)w^{{ \mathrm{\scriptscriptstyle H} }}(a,b),$| where |$^{{ \mathrm{\scriptscriptstyle H} }}$| denotes the complex conjugate transpose.
As is the case with the Fourier periodogram, smoothing is required for two reasons, firstly to control variance, and secondly to give meaningful values of the wavelet coherence estimator. Wavelet coherence is an analogue of coherence that provides a normalized measure on |$[0,1]$| of the correlation between a pair of processes in time-scale space. It is defined as
where |$\Omega$| is a smoothed version of |$W$|⁠. In the time-series setting, wavelet coherence has been extensively applied in a wide range of disciplines (e.g., Torrence & Webster, 1999; Grinsted et al., 2004). Understanding the distributional properties of these smoothed coherence estimators is vital for rigorous statistical analysis and testing. In the Gaussian discrete-time setting, the asymptotic distribution of coherence has been extensively studied (Cohen & Walden, 2010a, b); however, the point-process case has received little attention.

There are a wide range of ways in which nonstationarity can occur. Hence, rather than assume a specific model of nonstationarity, we propose to study the properties of the temporally smoothed wavelet periodogram and coherence for stationary point processes. This provides a statistical framework in which we deploy methods for exploratory data analysis, and construct a test for second-order stationarity, complementing stationarity tests in time-series analysis (e.g., Von Sachs & Neumann, 2000; Nason, 2013; Preuss et al., 2013).

2. Temporally smoothed wavelet periodogram

2.1. Formulation

 
Assumption 1

The wavelet |$\psi(t)$| is a real- or complex-valued continuous function that satisfies (i) |$\int_{-\infty}^{\infty}\psi(t)\,{\mathrm d} t =0$|⁠, (ii) |$\|\psi\|_2 = 1$|⁠, and (iii) the admissibility condition |$\int_{-\infty}^\infty f^{-1}|\Psi(f)|^2 \,{\mathrm d} f<\infty$| where |$\Psi(f)$| is the Fourier transform of |$\psi(t)$|⁠.

 
Assumption 2

The smoothing function |$h(t)$| is a nonnegative symmetric function supported and continuous on |$(-1/2,1/2)$| and normalized such that |$\int_{-\infty}^{\infty}h(t)\,{\mathrm d} t=1$|⁠.

Let |$\psi(t)$| and |$h(t)$| satisfy Assumptions 1 and 2, respectively. We define the temporally smoothed wavelet periodogram as
(3)
where |$h_{\xi (t)}= \xi^{-1}h(t/\xi)$| with |$\xi>0$| controlling the level of smoothing. This is a wavelet analogue of Welch’s weighted overlapping sample averaging spectral estimator for stationary time series (Welch, 1967; Carter, 1987). It will prove convenient to have the level of smoothing scale with |$a$|⁠; we therefore let |$\xi = \kappa a$| with |$\kappa>0$|⁠.
For a particular choice of |$\kappa$|⁠, and defining the Hermitian kernel function, at scale |$a=1$|⁠, as
(4)
the temporally smoothed wavelet periodogram in (3) can be expressed as
where |$ K_{a,b}(s,t)=a^{-1}K\{(s-b)/a,(t-b)/a\} $|⁠. The |$(i,j)$|th element of |$\Omega(a,b)$| is computed as

Given a choice for |$h(t)$| and |$\kappa$|⁠, the form of |$K(s,t)$| will depend on |$\psi(t)$|⁠. Throughout this paper, we use the complex-valued Morlet wavelet and the real-valued Mexican hat wavelet as examples, since they are wavelets for which |$K(s,t)$| is analytically tractable.

2.2. Practical implementation

For continuous-time wavelet analysis, the wavelets themselves are often noncompactly supported. However, the region of significant support is typically well localized, and a close approximation to |$w(a,b)$| can be obtained through utilizing the approximating wavelet

For example, the Morlet wavelet |$\psi(t) = \pi^{-1/4}\exp({-t^2/2})\exp({{\mathrm i}\, 2\pi t})$| shown in Fig. 1 has infinite support, but can be well approximated by |$\bar\psi(t)$| for |$\alpha=8$|⁠. In practice, to speed up computation, it can make sense to use the approximating wavelet, as only a subset of the data is required to compute the wavelet transform. Hereafter, to simplify notation, we will use |$\psi(t)$| to represent both the original and the approximating wavelets, assuming that |$\alpha$| is chosen appropriately.

(a) The Morlet wavelet, showing its real (solid) and imaginary (dashed) parts; (b) the valid region for analysis, $\mathcal{T}_{\alpha,\kappa,T}$, plotted with time $b$ on the horizontal axis and scale $a$ on the vertical axis, as is convention.
Figure 1

(a) The Morlet wavelet, showing its real (solid) and imaginary (dashed) parts; (b) the valid region for analysis, |$\mathcal{T}_{\alpha,\kappa,T}$|⁠, plotted with time |$b$| on the horizontal axis and scale |$a$| on the vertical axis, as is convention.

In a finite data setting, we are restricted to regions of the time-scale space in which we can fairly evaluate (2) without the consequences of edge effects at either end of the data. These issues are compounded when smoothing across time; for a smoothing window |$h_\kappa(t)$| with |${\mathrm{supp}}(h) = (-\kappa/2,\kappa/2)$|⁠, the effective size of the support for |$K(s,t)$| is |$\alpha+\kappa$|⁠, so we restrict ourselves to values of |$a$| and |$b$| for which |${\mathrm{supp}}(K_{a,b})= (b-a(\alpha+\kappa)/2,\,b+a(\alpha+\kappa)/2)\times(b-a(\alpha+\kappa)/2,\, b+a(\alpha+\kappa)/2)\subseteq (0,T]\times (0,T]$|⁠. This defines an isosceles triangle |$\mathcal{T}_{\alpha,\,\kappa,\,T}\subset\mathbb{R}^2$| with vertices |$(0,0)$|⁠, |$(0,T)$| and |$(a_{\mathrm{max}}(T),T/2)$|⁠, where |$a_{\mathrm{max}}(T) = T/(\alpha+\kappa)$|⁠. This is an adaptation of the cone of influence of Mallat & Peyré (2008, p. 215) that also mitigates for smoothing distances. In practice, a positive minimum value of |$a$| should be imposed to ensure that a reasonable amount of event data exists in the smoothing range.

3. Multi-wavelet representation

3.1. Formulation

Since the kernel |$K(s,t)$| is continuous and nonnegative definite by construction, we can associate with |$K(s,t)$| a Hermitian linear operator |$T_K$|⁠, defined as |$ [T_K g](s) = \int_{-\infty}^{\infty}K(s,t)g(t)\,{\mathrm d} t. $| It follows from Mercer’s theorem (Mercer, 1909) that |$ K(s,t) = \sum_{l=0}^{\infty}\eta_{l} \varphi_{l}(s)\varphi^{*}_{l}(t) $|⁠, where |$\{\varphi_{l}(t),\, l=0,1,\ldots\}$| are the orthonormal eigenfunctions of |$T_K$| with nonzero eigenvalues |$\{\eta_{l},\, l=0,1,\ldots\}$| ordered by decreasing size. Because |${\mathrm{tr}} (T_K)= \int_{-\infty}^{\infty}K(t,t)\,{\mathrm d} t = 1$|⁠, it follows that |$\sum_{l=0}^{\infty} \eta_{l} = 1$|⁠. From here on, we refer to |$\{\varphi_{l}(t),\, l=0,1, \ldots\}$| as the eigenfunctions of |$K(s,t)$|⁠. The following proposition shows that these orthonormal eigenfunctions are themselves wavelets.

 
Proposition 1

Suppose that |$\psi(t)$| satisfies Assumption 1, |$h(t)$| satisfies Assumption 2, and for |$\kappa>0$| the corresponding nonnegative-definite kernel |$K(s,t)$| has eigenfunctions |$\{\varphi_{l}(t),\, l=0,1,\dots \}$|⁠. Every eigenfunction |$\varphi_{l}(t)$| with a nonzero eigenvalue is a wavelet that satisfies the conditions of Assumption 1.

We call the functions |$\{\varphi_l(t), \, l=0,1,\ldots\}$| eigen-wavelets.

Turning our attention back to the temporally smoothed wavelet periodogram, it is straight- forward to show that
Hence, the scaled and shifted versions |$\varphi_{l,a,b}(t)=a^{-1/2}\varphi_{l}\{(t-b)/a\}$||$(l=0,1,\ldots)$| of the eigen-wavelets are themselves the eigenfunctions of |$K_{a,\,b}$|⁠, and again from Mercer’s theorem we have |$ K_{a,b}(s,t) = \sum_{l=0}^{\infty}\eta_{l} \varphi_{l,a,b}(s)\varphi^{*}_{l,a,b}(t). $| The temporally smoothed wavelet periodogram can thus be represented as
(5)
where |$v_l(a,b) = \int_0^T \varphi_{l,a,b}(t)\,{\mathrm d} N(t)$| is the continuous wavelet transform of |$N(t)$| at scale |$a$| and translation |$b$| with respect to eigen-wavelet |$\varphi_l(t)$|⁠. Therefore, the temporally smoothed wavelet periodogram is equivalent to the weighted sum of wavelet spectra arising from the orthonormal eigen-wavelet system. This is analogous to multi-tapering (Thomson, 1982), and comparisons can also be drawn to the multi-wavelet spectrum of Cohen & Walden (2010b). In that setting, multiple orthogonal wavelets were derived in Olhede & Walden (2002) from a time-frequency concentration problem, whereas here we have shown that they can be generated by any arbitrary wavelet |$\psi(t)$| and smoothing window |$h(t)$|⁠.

The representation in (5) will be crucial for deriving the distributional results in § 4, as well as affording computational speed-up. In particular, we will make use of the following proposition, which shows that the effective frequency response of the eigen-wavelet system is equal to the frequency response of the generating wavelet |$\psi(t)$|⁠.

 
Proposition 2

Suppose that |$\psi(t)$| satisfies Assumption 1, |$h(t)$| satisfies Assumption 2, and for |$\kappa>0$| the corresponding nonnegative-definite kernel |$K(s,t)$| has eigenfunctions |$\{\varphi_{l}(t),\, l=0,1, \ldots\}$| and eigenvalues |$\{\eta_l,\,l=0,1,\ldots\}$|⁠. Then |$\sum_{l}\eta_{l}|\Phi_{l}(f)|^{2}=|\Psi(f)|^{2}$|⁠, where |$\Phi_l(f)$| and |$\Psi(f)$| are the Fourier transforms of |$\varphi_l(t)$| and |$\psi(t)$|⁠, respectively.

In general, closed-form expressions for the eigen-wavelets |$\{\varphi_l(t),\,l=0,1,\ldots\}$| will be unobtainable, and numerical procedures need to be used to find the solutions of |$ \int_{-\infty}^\infty K(s,t)\varphi(t)\,{\mathrm d} t = \eta\varphi(s). $| Details of an implementation of the Nystrom method for doing just this can be found in the Appendix.

3.2. Worked example

The Morlet wavelet can be viewed as a complex sinusoid enveloped with a Gaussian window, and so the wavelet transform at scale |$a>0$| and translation |$b$| is the Fourier transform of the tapered process, localized at |$b$| and evaluated at frequency |$1/a$|⁠. The temporally smoothed wavelet periodogram using a rectangular smoothing function
(6)
emits the kernel |$ K(s,t) = k(s,t)\exp\{-{\rm i}\, 2\pi(t-s)\}, $| where
with |$\mathrm{erf}(x) = \pi^{-1/2}\int_{-x}^{x}\exp(-t^2)\,{\mathrm d} t$| denoting the Gaussian error function. The real part of this kernel is shown in Fig. 2(a).
The kernel and first five eigen-wavelets for the Morlet wavelet in (a) and (b), and for the Mexican hat wavelet in (c) and (d), subject to a rectangular smoothing window of width $\kappa=10$. In panel (b) the solid and dashed lines represent the real and imaginary components, respectively.
Figure 2

The kernel and first five eigen-wavelets for the Morlet wavelet in (a) and (b), and for the Mexican hat wavelet in (c) and (d), subject to a rectangular smoothing window of width |$\kappa=10$|⁠. In panel (b) the solid and dashed lines represent the real and imaginary components, respectively.

The function |$k(s,t)$| is itself a real-valued nonnegative kernel with its own set of real-valued orthonormal eigenfunctions |$\{\phi_l(t),\,l=0,1,\ldots\}$| and associated eigenvalues |$\{\eta_l,\,l=0,1,\ldots\}$|⁠. It follows that |$\varphi_l(t) = \exp({{\rm i}\, 2\pi t})\phi_l(t)$| is an eigenfunction of |$K(s,t)$| with corresponding eigenvalue |$\eta_l$|⁠, and hence |$\{\varphi_l(t)=\exp({{\rm i}\, 2\pi t})\phi_l(t),\, l=0,1,\ldots\}$| is the eigen-wavelet system emitted by the Morlet wavelet with a rectangular smoothing function. The first five of these eigen-wavelets for |$\kappa=10$| are shown in Fig. 2(b). This eigen-wavelet system is in the same spirit as the generating Morlet wavelet, with the eigen-wavelets themselves being complex sinusoids enveloped by a taper. Thus, performing a continuous wavelet transform with one of the eigen-wavelets is equivalent to a time-localized tapered Fourier transform evaluated at frequency |$1/a$|⁠, and the temporally smoothed wavelet periodogram as represented in (5) is equivalent to a time-localized multi-taper spectral estimator. For comparison, the kernel and associated eigen-wavelets of the Mexican hat wavelet using a rectangular smoothing function are shown in Fig. 2 panels (c) and (d), respectively.

4. Statistical properties under stationarity

4.1. Preliminaries

Define the |$k$|th-order cumulant |$q$| of the differential process as

The following mixing condition (Assumption 2.2 in Brillinger, 1972) is sufficient for the asymptotic results that follow. It ensures that the dependency structure in the point process decays at a sufficient rate for central limit arguments to be invoked.

 
Assumption 3
The |$p$|-dimensional point process |$N(t)$| is strictly stationary, i.e., |$q_{i_1,\ldots,i_k}(t_1+t,\ldots,t_{k}+t) = q_{i_1,\ldots,i_k}(t_1,\ldots,t_k)$|⁠, and we set |$r_{i_1,\ldots,i_k}(u_1,\ldots,u_{k-1}) = q_{i_1,\ldots,i_k}(u_1,\ldots,u_{k-1},0)$|⁠. Furthermore, all moments exist, the cumulant function satisfies
and

The distributional results that follow differ slightly depending on whether a real-valued wavelet, such as the Mexican hat wavelet, or a complex-valued wavelet, such as the Morlet wavelet, is chosen. We present the results for a complex-valued wavelet and relegate the derivation for a real-valued wavelet to the Supplementary Material. For wavelet |$\psi(t)$| with Fourier transform |$\Psi(f)$|⁠, its central frequency is defined as the first moment of |$|\Psi(f)|^2$|⁠, namely |$f_{0}=\int_{0}^{\infty}f|\Psi(f)|^{2}\,{\mathrm d} f$| (Cohen & Walden, 2010a). The central frequency of |$\psi_{a,b}$| is therefore |$f_0/a$|⁠, and can be interpreted as the central analysing frequency of the wavelet at scale |$a$|⁠. For example, the Morlet wavelet has a central frequency of |$f_0 = 1$|⁠, and the Mexican hat wavelet has a central frequency of approximately |$f_0 = 0.21$|⁠. We further define the frequency concentration of |$\psi(t)$| to be the second central moment of |$|\Psi(f)|^2$|⁠, namely |$\sigma_\psi = \|(f-f_0)\Psi(f)\|_2$|⁠. It immediately follows from Proposition 2 that the central frequency and frequency concentration of the eigen-wavelet system are |$f_0$| and |$\sigma_\psi$|⁠, respectively.

 
Assumption 4

The wavelet |$\psi(t)$| is complex-valued, satisfies Assumption 1, has approximating support |$(-\alpha/2,\alpha/2)$| for some finite |$\alpha>0$|⁠, and is such that |$\sigma_\psi<\infty$|⁠. Furthermore, there exists a finite |$C$| such that |$\int|\psi(t+u)-\psi(t)|\,{\mathrm d} t<C|u|$| for all real |$u$|⁠, and |$\psi$| is orthogonal to its complex conjugate, i.e., |$\int_{-\infty}^{\infty}\psi(t)\psi(t)\,{\mathrm d} t = 0$|⁠.

The Morlet wavelet is an example of a complex-valued wavelet that satisfies Assumption 4.

 
Assumption 5

The smoothing function |$h(t)$| satisfies Assumption 2 and, moreover, there exists a finite |$C'$| such that |$\int|h(t+u)-h(t)|\,{\mathrm d} t<C'|u|$|⁠.

4.2. Asymptotic distributional results

We allow the wavelet to scale with |$T$| by defining |$\psi^{\scriptscriptstyle{(T)}}(t) = \{(\alpha+\kappa)/T\}^{1/2}\psi\{t(\alpha+\kappa)/T\}$|⁠, and appropriately normalize the scale and translation parameters as |$\tilde a = a(\alpha + \kappa)/T$| and |$\tilde b = b/T$|⁠, respectively. Under this rescaling (2) becomes
and the temporally smoothed wavelet periodogram becomes

For any |$T$|⁠, the valid region of analysis is normalized to |$\tilde{\mathcal{T}}_{\alpha,\,\kappa}$|⁠, an isosceles triangle with vertices |$(0,0)$|⁠, |$(0,1)$| and |$(1,1/2)$|⁠, whose interior contains all valid pairs of |$(\tilde a,\tilde b)$|⁠. Asymptotic results are presented for any fixed point |$(\tilde a,\tilde b)\in \tilde{\mathcal{T}}_{\alpha,\kappa}$| as |$T\rightarrow\infty$|⁠. In doing so we define the frequency |$f_{\tilde a} = f_0^{\scriptscriptstyle{(T)}}/\tilde a$|⁠, where |$f_0^{\scriptscriptstyle{(T)}}=\int_{0}^{\infty}f|\Psi^{\scriptscriptstyle{(T)}}(f)|^{2}\,{\mathrm d} f$| with |$\Psi^{\scriptscriptstyle{(T)}}(f)$| denoting the Fourier transform of |$\psi^{\scriptscriptstyle{(T)}}(t)$|⁠.

 
Proposition 3
Let |$N(t)$| be a |$p$|-dimensional stationary process with spectral density matrix |$S(f)$|⁠. Let |$\psi(t)$| be a wavelet satisfying Assumption 1, and let |$h(t)$| be a smoothing function satisfying Assumption 2. For all |$\kappa>0$| and for all |$(\tilde a,\tilde b)\in \tilde{T}_{\alpha,\kappa}$|⁠,
and |$E\{\Omega^{\scriptscriptstyle{(T)}}(\tilde a,\tilde b)\} = S(f_{\tilde a}) + O(T^{-2})$| as |$T\rightarrow\infty$|⁠.

In the following theorem, |$\mathcal{N}^{\mathcal{C}}_p(\mu,\Sigma)$| denotes the circular |$p$|-dimensional complex normal distribution with mean |$\mu$| and covariance matrix |$\Sigma$|⁠.

 
Theorem 1

Let |$N(t)$| be a |$p$|-dimensional stationary process satisfying Assumption 3 with spectral density matrix |$S(f)$|⁠, and let |$\psi(t)$| be a wavelet with central frequency |$f_0$| satisfying Assumption 4. The continuous wavelet transform |$w^{\scriptscriptstyle{(T)}}(\tilde a,\tilde b)$| is asymptotically |$\mathcal{N}^{\mathcal{C}}_p\{0,S(f_{\tilde a})\}$| as |$T\rightarrow\infty$|⁠, for all |$(\tilde a,\tilde b)\in\tilde{\mathcal{T}}_{\alpha,\kappa}$|⁠.

Let |$\mathcal{W}^{\mathcal{C}}_p(n,\Sigma)$| denote the |$p$|-dimensional complex Wishart distribution with |$n$| degrees of freedom and centrality matrix |$\Sigma$|⁠.

 
Theorem 2

Let |$N(t)$| be a |$p$|-dimensional stationary process satisfying Assumption 3 with spectral density matrix |$S(f)$|⁠. Let |$\psi(t)$| be a wavelet with central frequency |$f_0$| satisfying Assumption 4, let |$h(t)$| be a smoothing function satisfying Assumption 5, and for |$\kappa>0$| let |$\{\eta_l,\,l=0,1,\ldots\}$| be the eigenvalues of the kernel |$K(s,t)$| defined in (4). The temporally smoothed wavelet periodogram |$\Omega^{\scriptscriptstyle{(T)}}(\tilde a,\tilde b)$| is asymptotically |$(1/n)\mathcal{W}^{\mathcal{C}}_p\{n,S(f_{\tilde a})\}$| as |$T\rightarrow\infty$| for all |$(\tilde a,\tilde b)\in \tilde{\mathcal{T}}_{\alpha,\kappa}$|⁠, where |$n=1/(\sum_{l=1}^\infty\eta_l^2)$|⁠.

The following distributional result for the wavelet coherence is now immediate from Theorem 2 and Goodman (1963). Let |$_2 F_1(\alpha_1,\alpha_2;\beta_1;z)$| denote the hypergeometric function with 2 and 1 parameters |$\alpha_1$|⁠, |$\alpha_2$|⁠, |$\beta_1$| and scalar argument |$z$|⁠.

 
Corollary 1
Under the conditions of Theorem 2, the temporally smoothed wavelet coherence |$ \gamma_{ij}^2(\tilde a,\tilde b) = |\Omega^{\scriptscriptstyle{(T)}}_{ij}(\tilde a,\tilde b)|^2/\{\Omega^{\scriptscriptstyle{(T)}}_{ii}(\tilde a,\tilde b)\Omega^{\scriptscriptstyle{(T)}}_{jj}(\tilde a,\tilde b)\}$| between component processes |$N_i(t)$| and |$N_j(t)\ (i\neq j) $| asymptotically has density function
where |$\rho^2$| is shorthand for |$\rho_{ij}^2(f_{\tilde a})$|⁠, the spectral coherence between |$N_i(t)$| and |$N_j(t)$| at frequency |$f_{\tilde a}$|⁠.

In the case of the rectangular smoothing function given in (6), the effective degrees of freedom |$n$| scales linearly with |$\kappa$| according to the following proposition.

 
Proposition 4

Let |$\psi(t)$| satisfy Assumption 4, let |$h(t)$| be the rectangular smoothing function given in (6), and for |$\kappa>0$| let the corresponding kernel |$K(s,t)$| have ordered eigenvalues |$\{\eta_{l},\, l=0,1,\dots \}$|⁠. If |$\kappa>\alpha$|⁠, then |$ n = (\sum_{l=0}^{\infty}\eta_l^2)^{-1} = \kappa \{\int^{\infty}_{-\infty} |\mathcal{P}(x)|^2\,{\mathrm d} x\}^{-1} $|⁠, where |$\mathcal{P}(x) = \int_{-\infty}^\infty\psi(t)\psi^*(t-x)\,{\mathrm d} t$|⁠.

5. Test for stationarity

5.1. Formulation

Consider testing the null hypothesis |$H_0$| that |$N(t)$| is a stationary process against the alternative hypothesis |$H_{\rm A}$| that |$N(t)$| is nonstationary. Under |$H_0$| and from Proposition 3, it is true that |$E\{\Omega(a,b)\}$| is constant in |$b$|⁠. We therefore consider testing for stationarity at different scales. For convenience, we perform a dyadic partition of the time-scale , conducting a test at each scale in the set |$\{\tilde a_j=2^{-j},\, j=1,\ldots,J\}$|⁠. At scale |$\tilde a_j$|⁠, we partition time into |$2^j$| non-overlapping segments of equal size centred at time-points |$\{\tilde b_{j,\,k} = (2k-1)/(2^{j+1}),\,k=1,\ldots,2^j\}$|⁠, each with width equal to the approximate support of the wavelet at that scale. Our test at scale |$\tilde a_j$| therefore becomes a test of the null hypothesis
where |$\Omega_j$| is unspecified.

We construct a likelihood ratio test based on the asymptotic distribution of |$\Omega^{\scriptscriptstyle{(T)}}(\tilde a,\tilde b)$| stated in Theorem 1. To obtain asymptotic results we require the smoothing parameter |$\kappa$|⁠, and hence the degrees of freedom |$n$| given in Proposition 4, to grow with |$T$|⁠. Furthermore, we permit the maximum resolution of analysis |$J$| to also grow.

 
Assumption 6

We have |$\kappa = O(T^c)$| and |$2^{J} = o(T^{1/2-c})$| for some |$c\in(0,1/2)$|⁠.

 
Proposition 5
Let |$B_1,\ldots,B_K$| be independent samples where |$B_i\sim(1/n)\mathcal{W}^{\mathcal{C}}_p(n,\Sigma_i)\ (i=1,\ldots,K)$|⁠. The likelihood ratio test statistic for the null hypothesis |$H:\Sigma_1=\cdots =\Sigma_K=\Sigma$| with unspecified |$\Sigma$| is

Furthermore, when |$H$| is true, |$-2\log \{U(K)\}$| is asymptotically |$\chi^2_{f}$| as |$n\rightarrow\infty$|⁠, where |$f = (K-1)p^2$|⁠.

In the following proposition, we let |$\tilde U_j$| be a random variable that is equal in distribution to |$U(2^j)$| under the null hypothesis.

 
Proposition 6
Suppose that Assumptions 3–6 hold, and for a fixed |$j\in\{1,\ldots,J\}$| define the test statistic for |$H_j$| as
where |$K = 2^j$| and |$n$| is as given in Theorem 2. Under |$H_j$| we have |$V_j= \tilde U_j + o(1)$| in distribution as |$T\rightarrow\infty$|⁠.
 
Theorem 3

Suppose that Assumptions 3–6 hold. For a fixed |$j\in\{1,\ldots,J\}$|⁠, under |$H_j$| we have |$\mathrm{pr}\{-2\log (V_j)\leq x\} = \mathrm{pr}(\chi^2_{\nu_j}\leq x) + o(1)$| as |$T\rightarrow\infty$|⁠, where |$\nu_j = (2^j-1)p^2$|⁠.

With the following assumption, we can combine the test statistics for |$H_1,\dots ,H_J$| to form |$V= \prod_{j=1}^J V_j$| with which to test |$H_0$|⁠.

 
Assumption 7

The wavelet |$\psi(t)$| satisfies |$\int_{-\infty}^{\infty}\psi_{j,k}^{\scriptscriptstyle{(T)}}(t)\psi_{l,m}^{*\scriptscriptstyle{(T)}}(t)\,{\mathrm d} t = 0$| for all |$(j,k)\neq(l,m)$|⁠, where |$\psi^{\scriptscriptstyle{(T)}}_{j,k}(t)$| denotes the wavelet at the |$j$|th scale and |$k$|th translation (⁠|$j=1,\ldots,J$|⁠; |$k=1,\ldots,K$|⁠).

We remark that this is only an approximation for the Morlet wavelet.

 
Corollary 2

Suppose that Assumptions 3–7 hold. Under |$H_0$| we have |$\mathrm{pr}\{-2\log (V)\leq x\} = \mathrm{pr}(\chi^2_{\nu}\leq x) + o(1)$| as |$T\rightarrow\infty$|⁠, where |$\nu = \sum_{j=1}^{J}\nu_j = p^2(2^{J+1}-2-J)$|⁠.

5.2. Simulations

The following simulations all use a Morlet wavelet, and tests are conducted on |$H_1$|⁠, |$H_2$|⁠, |$H_3$| and the combined hypothesis |$H_0$| at the |$\alpha = 0.05$| level.

To demonstrate that the test asymptotically attains the nominal level, we consider a pair of independent homogeneous Poisson processes, each with intensity |$\lambda=1$|⁠, and increase |$T$|⁠. We let the smoothing parameter |$\kappa$| grow as |$\kappa = 10\, T^{1/4}$|⁠. The Type I error rate of the tests as a function of |$T$| is shown in Fig. 3(a). The effective support of a smoothed wavelet periodogram at scale |$j$| is |$2^{-j}T$|⁠; hence, convergence is fastest at |$j=1$|⁠.

Results for the simulation studies in § 5.2: (a) Type I error rate for a pair of independent homogeneous Poisson processes; (b) power as a function of $\theta$, the slope of the intensity function for the pair of independent inhomogeneous Poisson processes; (c) power as a function of $\alpha_{12}$, and $\alpha_{21}$, the mutual excitation parameter in the second segment of the piecewise-stationary bivariate Hawkes process; (d) power as a function of $\delta_0$, the magnitude of the excitation delay effect in the locally stationary Hawkes process. Tests are performed at the $\alpha=0.05$ level (grey solid line) on hypotheses $H_1$ (dashed), $H_2$ (dotted) and $H_3$ (dot-dash), and on the combined hypothesis $H_0$ (solid). In all cases, 1000 realizations of the processes were used.
Figure 3

Results for the simulation studies in § 5.2: (a) Type I error rate for a pair of independent homogeneous Poisson processes; (b) power as a function of |$\theta$|⁠, the slope of the intensity function for the pair of independent inhomogeneous Poisson processes; (c) power as a function of |$\alpha_{12}$|⁠, and |$\alpha_{21}$|⁠, the mutual excitation parameter in the second segment of the piecewise-stationary bivariate Hawkes process; (d) power as a function of |$\delta_0$|⁠, the magnitude of the excitation delay effect in the locally stationary Hawkes process. Tests are performed at the |$\alpha=0.05$| level (grey solid line) on hypotheses |$H_1$| (dashed), |$H_2$| (dotted) and |$H_3$| (dot-dash), and on the combined hypothesis |$H_0$| (solid). In all cases, 1000 realizations of the processes were used.

To study the power with respect to first-order nonstationarity, we consider a pair of independent inhomogeneous Poisson processes. The first has a linearly increasing intensity function |$\lambda_1(t) = \tan(\theta) t + 1 - (T/2)\tan(\theta)$|⁠; the second is linearly decreasing with intensity function |$\lambda_2(t) = \tan(-\theta) t + 1 - (T/2)\tan(-\theta)$|⁠. We vary |$\theta$|⁠, the slope of the intensity functions, between |$-\!\arctan(2/T)$| and |$\arctan(2/T)$|⁠; the process is stationary if and only if |$\theta = 0$|⁠. This model ensures that the expected number of events in |$(0,T]$| is the same for all |$\theta$|⁠. We set |$T = 1024$| and again take |$\kappa = 10\, T^{1/4}$|⁠, resulting in 22 degrees of freedom. The power is plotted in Fig. 3(b) as a function of |$\theta$|⁠. As expected, the power increases with |$|\theta|$|⁠, with the nominal level attained for small |$j$| at |$\theta =0$|⁠.

To study the power with respect to second-order nonstationarity, we consider a bivariate piecewise-stationary Hawkes process made up of three segments; see the Appendix for details. The first and third segments are the time intervals |$(0,256]$| and |$(768,1024]$|⁠, respectively. On these segments there exist a bivariate process with baseline intensities |$\nu_1=\nu_2 = 1$|⁠, and an exponential excitation kernel with excitation parameter |$\alpha_{11} = \alpha_{22} = 1$| and decay parameter |$\beta_{11} = \beta_{22} = 2$|⁠; there is no mutual-excitation between the processes, i.e., |$\alpha_{12} = \alpha_{21} = 0$|⁠. On the second segment, namely the time interval |$(256,768]$|⁠, there exists a bivariate Hawkes process with self-excitation parameters identical to those for the first and third segments, but now with mutual excitation between the individual processes. The decay parameters are set at |$\beta_{12} = \beta_{21} = 2$|⁠, and we examine the power as |$\alpha_{12}$| and |$\alpha_{21}$| are increased together to introduce increasing levels of mutual excitation. This leads to increasing deviation from the distribution of the process in the first and third segments. The baseline intensity on |$(256,768]$| is decreased accordingly to ensure that the expected number of events is constant across the entirety of the time interval |$(0,1024]$|⁠. Fig. 3(c) shows how the power increases with the value of |$\alpha_{12}$| in the second segment, i.e., as the distribution of the process in the second segment deviates further from that of the first and third segments.

The final example we consider consists of a pair of independent locally stationary Hawkes processes as developed in Roueff et al. (2016). We let each process have a baseline intensity of |$\nu_1 = \nu_2 = 0.5$| and a gamma excitation kernel of the form

This excitation kernel only allows an offspring event to be generated after a delay of |$\delta(\cdot)$|⁠, which varies as a function of time to introduce nonstationary second-order structure. In this example we set |$\delta(s)=1+\delta_0|1-2s|$|⁠; when |$\delta_0=0$| the process is stationary, and we can introduce time-varying second-order behaviour of increasing magnitude by increasing |$\delta_0$|⁠. The power as a function of |$\delta_0$| is plotted in Fig. 3(d). As expected, the power increases with |$\delta_0$|⁠, and the nominal level is attained at |$\delta_0 =0$|⁠.

5.3. Practical implementation

In practice, one must make a choice of |$\kappa$| and |$J$| that is suitable for both the dataset under analysis and the time scales on which one wishes to detect deviations from stationarity, while accounting for traditional trade-offs. Assumption 6 requires the growth rates of |$\kappa$| and |$J$| to be balanced such that |$2^J\kappa = o(T^{1/2})$|⁠. If analysing variation on small time scales, one benefits from lowering |$\kappa$| and increasing |$J$|⁠, at the cost of asymptotic limits being less appropriate. Conversely, asymptotic limits can be practically attained at the cost of losing temporal resolution. Therefore, a universal heuristic for choosing |$\kappa$| and |$J$| does not exist. If attaining the nominal level of the test is important, experiments with independent Poisson processes over a duration and intensity indicative of the data can be used to select values of |$\kappa$| and |$J$|⁠. Example results for experiments of this type are presented in the Supplementary Material.

6. Application to neural signalling

To illustrate the methods in practice, we analyse signalling regions within the lateral geniculate nucleus of a mouse. Specifically, we consider a set of neurons studied by Tang et al. (2015), who were primarily concerned with analysing firing properties in order to understand how visual signals are encoded and transmitted throughout the brain. To demonstrate the ability of our smoothed coherence estimator and stationarity test to operate with a single trial, we consider only a single firing sequence from Tang et al. (2015). In this case, the mouse was given a visual stimulus in the form of a liquid crystal display screen, showing a sinusoidal monochromatic drifting grating with spatial stimulus at a frequency of 0.04 cycles per second and a temporal flicker of 1 Hz. The firing pattern is 7 s in length and represents data for cells 108 and 117, which we selected for our example because they demonstrate relatively high firing rates. We use the Morlet wavelet with temporal smoothing parameter |$\kappa=10$| and approximating support parameter |$\alpha=8$|⁠. For completeness, this example was performed using exact kernel sampling; however, an approximate computation based on the Nystrom method gave visually indistinguishable results and |$p$|-values; see the Appendix.

The results of the analysis are displayed in Fig. 4. Tests for stationarity were performed at scale levels |$j=1$| and |$2$|⁠, informed by the analysis in the Supplementary Material, with dyadic sampling points marked by the crosses. With a |$p$|-value of 0.032, there is strong evidence that the process demonstrates nonstationary behaviour at the coarsest scale |$j=1$|⁠, corresponding to a scale of 0.25 s and a frequency of 4 Hz through the |$f=1/a$| relationship for a Morlet wavelet. However, there is not enough evidence to reject the hypothesis that the wavelet spectrum stays constant at the finer scale |$j=2$|⁠. With the parameters specified, the 95th percentile of the distribution for zero coherence is 0.593 and is represented by the black contour line. This indicates that the nonstationarity at |$j=1$| involves a change in the correlation between the two processes half-way through the experiment, with significant coherent signalling becoming present in the latter half.

Wavelet coherence estimated from the mouse neuron firing data with the Morlet wavelet. Black lines represent contours drawn at the 95th percentile of the distribution for zero coherence; see Corollary 1. Crosses mark the dyadic sampling points for the stationarity test, with arrows indicating the scale-specific $p$-values.
Figure 4

Wavelet coherence estimated from the mouse neuron firing data with the Morlet wavelet. Black lines represent contours drawn at the 95th percentile of the distribution for zero coherence; see Corollary 1. Crosses mark the dyadic sampling points for the stationarity test, with arrows indicating the scale-specific |$p$|-values.

Acknowledgement

This work was funded by the U.K. Engineering and Physical Sciences Research Council. The authors thank the reviewers for their insightful comments and suggestions. Special thanks go to Leigh Shlomovich of Imperial College London for developing the Hawkes process simulation code, and to Heather Battey, Dean Bodenham and Andrew Walden for stimulating conversations.

Supplementary material

Supplementary Material includes proofs, results for real-valued wavelets, a guide to practical implementation, and a link to a Matlab package for implementing the presented methods.

Appendix

Computing eigen-wavelets and eigenvalues

The Nystrom method (Kythe & Puri, 2001, Ch. 1) is an efficient method for computing the eigenfunctions of the kernel |$K(s,t)$| for the multi-wavelet representation described in § 3. We can approximate the integral by using the quadrature rule to solve the approximate eigenproblem |$\sum_{j=1}^n w_j K(s,t_j)\tilde{\varphi}_l(t_j)=\tilde{\eta} \tilde{\varphi}_l(s)$| for a discrete set of values for |$s$|⁠. The quadrature points |$\{t_1,\dots ,t_n\}$|⁠, where |$n$| is large, are regularly spaced across |$(-(\alpha+\kappa)/2,(\alpha+\kappa)/2)$|⁠, and the weights are set to |$w_j=(\alpha+\kappa)/n$|⁠. For simplicity, the Nystrom points |$\{ s_1,\ldots,s_n\}$| are taken to be |$\{t_1,\ldots,t_n\}$|⁠. In matrix form, the eigenproblem now becomes
where |$K$| is the |$\mathbb{R}^{n\times n}$| matrix |$\{K(s_{i},t_{j})\}$|⁠, |$\tilde\varphi=\{\tilde{\varphi}(t_{1}),\ldots,\tilde{\varphi}(t_{n})\}^{{ \mathrm{\scriptscriptstyle T} }}$| and |$W=\mathrm{diag}(w_{1},\ldots,w_{n})$|⁠. Solving this problem gives approximations to the first |$n$| eigenvalues and eigen-wavelets of |$K(s,t)$|⁠.
Should it be required, the Nystrom extension of the sampled vector |$\tilde\varphi_{l}=\{\tilde{\varphi}(s_{1}),\ldots,\tilde{\varphi}(s_{n})\}$| is the function

The sum in (5) is over an infinite set of eigen-wavelet periodograms. However, in practice, the size of the eigenvalues drops away rapidly, indicating that the kernel can be accurately reconstructed using only a small number of its eigen-wavelets; hence (5) can be approximated with only a small number of terms. For example, in the case of |$\kappa = 10$|⁠, the first nine eigenvalues contain 99.9|$\%$| of the overall energy.

Hawkes process

A stationary bivariate Hawkes process (Hawkes, 1971) contains both inter- and cross-dependencies, and is defined through its stochastic intensity function
where |$\nu=(\nu_1,\nu_2)^{ \mathrm{\scriptscriptstyle T} }$| is the baseline intensity vector and |$G(\cdot)$| is the matrix-valued excitation kernel. The diagonal elements of |$G(\cdot)$| characterize the self-exciting behaviour of each individual process, and the off-diagonal elements characterize the mutually exciting behaviour. The exponential excitation kernel used in § 5.2 is

For simulation we use the thinning algorithm of Ogata (1981).

References

Bartlett,
M. S.
(
1963
).
The spectral analysis of point processes
.
J. R. Statist. Soc. B
25
,
264
96
.

Brillinger,
D. R.
(
1972
).
The spectral analysis of stationary interval functions
. In
Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. Volume 1: Theory of Statistics
.
Berkeley, California
:
University of California Press
, pp.
483
513
.

Brillinger,
D. R.
(
1996
).
Some uses of cumulants in wavelet analysis
.
J. Nonparam. Statist.
6
,
93
114
.

Carter,
G.
(
1987
).
Coherence and time delay estimation
.
Proc. IEEE
75
,
236
55
.

Cohen,
E. A. K.
&
Walden,
A. T.
(
2010a
).
A statistical analysis of Morse wavelet coherence
.
IEEE Trans. Sig. Proces.
58
,
980
9
.

Cohen,
E. A. K.
&
Walden,
A. T.
(
2010b
).
A statistical study of temporally smoothed wavelet coherence
.
IEEE Trans. Sig. Proces.
58
,
2964
73
.

Goodman,
N.
(
1963
).
Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction)
.
Ann. Math. Statist.
34
,
152
77
.

Grinsted,
A
,
Moore,
J. C.
&
Jevrejeva,
S.
(
2004
).
Application of the cross wavelet transform and wavelet coherence to geophysical time series
.
Nonlin. Proces. Geophys.
11
,
561
6
.

Hawkes,
A. G.
(
1971
).
Spectra of some self-exciting and mutually exciting point processes
.
Biometrika
58
,
83
90
.

Kythe,
P. K.
&
Puri,
P.
(
2001
).
Computational Methods for Linear Integral Equations
.
New York
:
Springer
.

Mallat,
S.
&
Peyré,
G.
(
2008
).
A Wavelet Tour of Signal Processing
.
Amsterdam
:
Elsevier
, 3rd ed.

Mercer,
J.
(
1909
).
Functions of positive and negative type, and their connection with the theory of integral equations
.
Phil. Trans. R. Soc. A
209
,
415
46
.

Nason,
G.
(
2013
).
A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series
.
J. R. Statist. Soc. B
75
,
879
904
.

Nason,
G. P.
,
von Sachs,
R.
&
Kroisandt,
G.
(
2000
).
Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum
.
J. R. Statist. Soc. B
62
,
1
28
.

Ogata,
Y.
(
1981
).
On Lewis’ simulation method for point processes
.
IEEE Trans. Info. Theory
27
,
23
31
.

Olhede,
S. C.
&
Walden,
A. T.
(
2002
).
Generalized Morse wavelets
.
IEEE Trans. Sig. Proces.
50
,
2661
70
.

Preuss,
P.
,
Vetter,
M.
&
Dette,
H.
(
2013
).
Testing semiparametric hypotheses in locally stationary processes
.
Scand. J. Statist.
25
,
417
37
.

Roueff,
F.
&
Von Sachs,
R.
(
2019
).
Time-frequency analysis of locally stationary Hawkes processes
.
Bernoulli
25
,
1355
85
.

Roueff,
F.
,
Von Sachs,
R.
&
Sansonnet,
L.
(
2016
).
Locally stationary Hawkes processes
.
Stoch. Proces. Appl.
126
,
1710
43
.

Tang,
S.
,
Ardila Jimenez,
S. C.
,
Chakraborty,
S.
&
Schultz,
S. R.
(
2015
).
Visual receptive field properties of neurons in the mouse lateral geniculate nucleus
.
PLoS One
11
,
1
34
.

Thomson,
D. J.
(
1982
).
Spectrum estimation and harmonic analysis
.
Proc. IEEE
70
,
1055
96
.

Torrence,
C.
&
Webster,
P.
(
1999
).
Interdecadal changes in the ESNO-monsoon system
.
J. Climate
12
,
2679
90
.

Von Sachs,
R.
&
Neumann,
M. H.
(
2000
).
A wavelet-based test for stationarity
.
J. Time Ser. Anal.
21
,
597
613
.

Walden,
A. T.
(
2000
).
A unified view of multitaper multivariate spectral estimation
.
Biometrika
87
,
767
88
.

Welch,
P.
(
1967
).
The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms
.
IEEE Trans. Audio Electroacoust.
15
,
70
3
.

This is an Open Access article distributed under the terms of the Creative CommonsAttribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data