CUSUM-Based Monitoring for Explosive Episodes in Financial Data in the Presence of Time-Varying Volatility

We generalize the Homm and Breitung (2012) CUSUM-based procedure for the real-time detection of explosive autoregressive episodes in financial price data to allow for time-varying volatility. Such volatility can heavily inflate the false positive rate (FPR) of the standard CUSUM-based procedure, leading it to spuriously signal the presence of an explosive episode. Our modified procedure involves replacing the standard variance estimate in the CUSUM statistics with a nonparametric kernel-based spot variance estimate. We show that the sequence of modified CUSUM statistics has a joint limiting null distribution which is invariant to any time-varying volatility present in the innovations, and that this delivers a real-time monitoring procedure whose theoretical FPR is controlled. Simulations show that the modification is effective in controlling the empirical FPR of the procedure, yet sacrifices only a small amount of power to detect explosive episodes, relative to the standard procedure, when the shocks are homoskedastic. An empirical illustration using Bitcoin price data is provided.

variance estimate used by Homm and Breitung (2012) in calculating the CUSUM statistics with a nonparametric kernel-based spot variance estimate, designed to model the unknown variance path of the underlying innovations.
Under quite general conditions we show that the resulting sequence of modified CUSUM statistics has a joint limiting null distribution which is invariant to any time-varying volatility present in the innovations and that, as a result, this delivers a real-time monitoring procedure whose theoretical FPR is controlled. Indeed, these quantities are shown to coincide with those which obtain for the standard CUSUM procedure in the case of homoskedastic innovations. Monte Carlo methods are used to examine the empirical FPR and true positive rate (TPR) of our proposed monitoring procedure. These results show that the empirical FPR of the modified procedure is well controlled in practice. Moreover, the efficacy of the modified procedure in detecting an explosive episode, as measured by the empirical TPR, is shown to be little altered in the homoskedastic case, so that the cost (in terms of ability to detect an emerging explosive episode) of this additional robustness to time-varying volatility appears relatively small. We also show here that the presence of an explosive episode prior to the start of the monitoring period has little impact on the properties of our modified CUSUM procedure, but can very substantially lower the empirical TPR of both the CUSUM-based procedure and the procedure of Astill et al. (2018).
The remainder of the paper is organized as follows. Section 1 introduces the autoregressive data generating process (DGP) we work with and outlines the assumptions under which our analysis will be conducted. In Section 2, we briefly review the CUSUM-based procedure of Homm and Breitung (2012) and demonstrate that it does not, in general, have a controlled FPR when time-varying volatility is present in the innovations. We then outline our modified CUSUM procedure and establish its large sample validity. Issues concerning its practical implementation, including the selection of the bandwidth and kernel used in the context of the nonparametric spot variance estimator, are also discussed in this section. Our Monte Carlo study is reported in Section 3. An empirical illustration of our modified CUSUM monitoring procedure, using Bitcoin price data, is provided in Section 4. Section 5 concludes.
The specification of the DGP in (1) and (2) defines the series $y_t$ separately over two subsample periods: the period $t = 1, \dots, T$, which will later form the training period in our analysis, and the period $t = T+1, \dots, \lfloor kT \rfloor$, which will form the monitoring period for our procedure. Our model is such that $y_t$ follows a unit root process over the training period $t = 1, \dots, T$, while over the monitoring period $y_t$ again follows a unit root process over the sub-periods $t = T+1, \dots, \lfloor \tau_1 T \rfloor$ and $t = \lfloor \tau_2 T \rfloor + 1, \dots, \lfloor kT \rfloor$, but crucially is subject to potentially explosive behavior in the period $t = \lfloor \tau_1 T \rfloor + 1, \dots, \lfloor \tau_2 T \rfloor$ when $\delta > 0$. In total there are $\lfloor kT \rfloor$ observations, with $k > 1$ a fixed constant. When $\delta > 0$, if $\tau_1 = 1$ then the explosive regime begins at the start of the monitoring period, while if $\tau_2 = k$ the explosive regime is still ongoing at the end of the monitoring period. In the context of monitoring for explosive autoregressive behavior during the monitoring period, our null hypothesis is given by $H_0: \delta = 0$, with the corresponding alternative hypothesis being $H_1: \delta > 0$.
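To fix ideas, the two-regime structure just described can be sketched in Python. This is a minimal illustration, not the authors' code: the additive decomposition $y_t = \mu + u_t$, the initialization, and the exact placement of the explosive window are our assumptions, since the display equations (1) and (2) are not reproduced here.

```python
import numpy as np

def simulate_dgp(T, k, tau1, tau2, delta, mu=0.0, u0=100.0, sigma=None, seed=0):
    """Sketch of the two-regime structure described in the text: y_t = mu + u_t,
    with u_t a unit root process except over t = floor(tau1*T)+1,...,floor(tau2*T),
    where the autoregressive root is 1 + delta (explosive when delta > 0). The
    additive y = mu + u form and the initialization are our assumptions."""
    rng = np.random.default_rng(seed)
    n = int(np.floor(k * T))                     # floor(kT) observations in total
    if sigma is None:
        sigma = np.ones(n)                       # homoskedastic benchmark, sigma_t = 1
    e = sigma * rng.standard_normal(n)           # e_t = sigma_t * eps_t
    lo, hi = int(np.floor(tau1 * T)), int(np.floor(tau2 * T))
    u = np.empty(n)
    prev = u0
    for t in range(1, n + 1):                    # 1-indexed dates
        rho = 1.0 + delta if lo < t <= hi else 1.0
        prev = rho * prev + e[t - 1]
        u[t - 1] = prev
    return mu + u
```

Under the null ($\delta = 0$) the path is a pure unit root throughout; for $\delta > 0$ only observations inside the window are generated from the explosive root.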
Remark 1. The model considered in (1) and (2) does not allow for a collapse following the termination of the explosive regime. The model could easily be extended to allow for either an instantaneous collapse (as in, e.g., Phillips, Wu, and Yu, 2011), or a stationary collapse regime (as in, e.g., Harvey et al., 2016). However, when monitoring for an emerging explosive regime in real time, the nature of any post-explosive collapse has no bearing on the detection properties of the monitoring procedures. While some differences will arise when monitoring beyond the point at which an explosive regime terminates, this is a secondary consideration for the purposes of this paper and, as a consequence, we focus on the case of a non-collapsing explosive period for simplicity. Simulation results for models with collapse regimes in the monitoring period are available on request.
With respect to the error, $e_t$, we allow for the possibility of non-constancy in its unconditional volatility by setting $e_t = \sigma_t \varepsilon_t$, such that $\sigma_t^2$ is the unconditional (spot) variance of $e_t$ and where $\varepsilon_t$ is a homoskedastic innovation sequence. Precisely, we make the following assumptions regarding $\varepsilon_t$ and $\sigma_t$, respectively:

Assumption 1. $\varepsilon_t$ is a martingale difference sequence with respect to the natural filtration $\{\mathcal{F}_t\}$ generated by the sequence of $\varepsilon_t$, such that $\mathrm{Var}(\varepsilon_t \mid \mathcal{F}_{t-1}) = 1$ and $E(\varepsilon_t^4) < \infty$.
Assumption 2. The function $\sigma(\cdot)$, where $\sigma_t = \sigma(t/T)$, has support $[0, k]$ and is strictly positive, continuously differentiable, and uniformly bounded by a constant $M$. Furthermore, the derivative of $\sigma(\cdot)$ is Lipschitz continuous over $(0, k)$.
Remark 2. Assumption 1 imposes conditional homoskedasticity on the innovation sequence $\varepsilon_t$. This assumption is standard in the time-varying volatility literature; see, for example, Hansen (1995), Phillips and Xu (2006), Xu and Phillips (2008), Harris and Kew (2017), Boswijk and Zu (2018), Harvey, Leybourne, and Zu (2020), Harris, Kew, and Taylor (2020), and Boswijk and Zu (2021). The moment condition, $E(\varepsilon_t^4) < \infty$, imposed by Assumption 1 is weaker than is usually made in the literature, where an assumption of the existence of the 8th moment is standard; an exception is Beare (2018), who makes a comparable finite $4 + d$, $d > 0$, moment assumption in connection with the unit root tests he develops for cases where the errors display time-varying volatility.
Remark 3. Assumption 2 allows for time-varying behavior in the unconditional volatility of $e_t$ including, among other things, smooth transition single or multiple level shifts and trending volatility, which may also be subject to smooth breaks in the trend coefficient. The case of constant volatility, where $\sigma_t = \sigma$ for all $t$, also satisfies Assumption 2, because here $\sigma(s) = \sigma$ for all $s$. Discrete jumps in volatility are formally ruled out under Assumption 2, which imposes continuity on the volatility path $\sigma(\cdot)$. This smoothness requirement on the volatility function is needed to obtain the uniform consistency results for our nonparametric kernel-based volatility estimator, which is in turn needed for our main result given in Theorem 1 to hold. The smoothness assumption is not restrictive in practice, because one can always approximate discontinuities in $\sigma(\cdot)$ arbitrarily well using smooth transition functions. The conditions imposed on the errors, $e_t$, by Assumptions 1 and 2 are therefore considerably weaker than those of Homm and Breitung (2012), who assume that $e_t$ is independent and identically distributed (IID) with mean zero and constant variance, $\sigma^2$.

Real-Time Explosive Episode Detection Procedures
In this section, we briefly review the CUSUM monitoring procedure of Homm and Breitung (2012) and propose a modification to this procedure to allow for the possibility of time-varying volatility in the innovations. As mentioned in the introduction, Homm and Breitung (2012) also propose a second monitoring procedure, which they label FLUC, based on sequential DF statistics; see Equations (27) and (31) of Homm and Breitung (2012). We will not consider this DF-based procedure any further in this paper, for the following reason. Where no detrending is undertaken, one could use the approach taken in Beare (2018) to develop a heteroskedasticity-robust version of the DF statistic used by Homm and Breitung (2012), based on the same estimator of $\sigma_t^2$ used to modify the CUSUM statistic in Section 2.2. The resulting FLUC procedure would then share the same large sample properties as attained by the standard FLUC procedure under homoskedasticity. However, this approach does not seem extendable to the case where detrending is used, as in Remark 10. In particular, Beare (2018) demonstrates that the limiting null distribution of his modified DF statistic in that case still depends on the volatility path, $\sigma(\cdot)$. It would then appear infeasible to standardize this statistic in the way done in the homoskedastic case by Homm and Breitung (2012, p. 212) to ensure the boundary function $\kappa_t$ used in Equation (31) of Homm and Breitung (2012) is positive when detrending is undertaken.

The Homm-Breitung CUSUM-Based Procedure
Under the additional assumption that $e_t$ is an IID process with mean zero and variance $\sigma^2$, and assuming a training period of $t = 1, \dots, T$ as in (1) and (2), Homm and Breitung (2012) propose testing for explosive behavior in the monitoring period using the following CUSUM statistic:
$$S_t^T := \frac{1}{\hat{\sigma}_t} \sum_{j=T+1}^{t} \Delta y_j, \qquad (3)$$
where $t > T$ is the monitoring observation and $\hat{\sigma}_t^2$ is a consistent estimate of $\sigma^2$; in their numerical work, Homm and Breitung (2012) use the first-difference estimator, $\hat{\sigma}_t^2 := (t-1)^{-1}\sum_{j=2}^{t} \Delta y_j^2$. Homm and Breitung (2012) show that if the CUSUM statistic, $S_t^T$, is computed sequentially at dates $t = T+1, \dots, \lfloor kT \rfloor$, then under the null hypothesis, $H_0$, of no explosive behavior,
$$T^{-1/2} S_{\lfloor Tr \rfloor}^T \Rightarrow W(r) - W(1), \quad 1 < r \le k, \qquad (4)$$
for any $k > 1$ and, hence, from Theorem 3.4 of Chu, Stinchcombe, and White (1996),
$$\lim_{T \to \infty} P\big(S_t^T > c_t \sqrt{t} \ \text{for some}\ t \in \{T+1, \dots, \lfloor kT \rfloor\}\big) \le \exp(-b_\alpha/2), \qquad (5)$$
where $c_t := \sqrt{b_\alpha + \log(t/T)}$. The CUSUM monitoring procedure proposed in Homm and Breitung (2012) then rejects $H_0$ if $S_t^T > c_t \sqrt{t}$ for some $t > T$, with an explosive episode signaled at the first time point $t$ in the monitoring period for which such an exceedance occurs.
For such a (one-sided upper tail) test at the $\alpha = 0.05$ significance level, the appropriate asymptotic setting for $b_\alpha$ used to compute $c_t$ is $b_\alpha = 4.6$. Henceforth, we will refer to a monitoring procedure based on the $S_t^T$ statistic as the (standard) CUSUM monitoring procedure.
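A minimal sketch of the monitoring rule just described, assuming the CUSUM statistic cumulates the monitoring-period first differences standardized by the first-difference variance estimate (function and constant names are ours; this is an illustration, not the authors' Gauss code):

```python
import numpy as np

B_ALPHA_05 = 4.6  # b_alpha giving a nominal 0.05-level one-sided test

def cusum_monitor(y, T, b_alpha=B_ALPHA_05):
    """Standard CUSUM monitor. y holds observations t = 1,...,floor(kT) (0-indexed);
    monitoring runs over t = T+1,...,len(y). Returns the first detection date
    (1-indexed) or None if the boundary is never crossed."""
    dy = np.diff(y)                                   # dy[i] = Delta y_{i+2}
    for t in range(T + 1, len(y) + 1):
        sig2 = np.mean(dy[: t - 1] ** 2)              # first-difference variance estimate
        S = dy[T - 1 : t - 1].sum() / np.sqrt(sig2)   # CUSUM of Delta y_{T+1},...,Delta y_t
        c = np.sqrt(b_alpha + np.log(t / T))          # Chu-Stinchcombe-White boundary
        if S > c * np.sqrt(t):
            return t
    return None
```

A steadily drifting path accumulates positive standardized differences and eventually crosses the boundary, while a mean-zero sawtooth path never does.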

A Time-Varying Volatility-Robust CUSUM Procedure
A major drawback with this CUSUM monitoring procedure in practice is that the variance estimator, $\hat{\sigma}_t^2$, is only appropriate for the homoskedastic case. More specifically, under heteroskedasticity of the form given in Assumption 2, the result in Equation (4) no longer holds: the limiting process now depends on the time-varying volatility process, $\sigma(\cdot)$, reducing to the result in Equation (4) only in the case where $\sigma(\cdot)$ is constant. As a consequence, Theorem 3.4 of Chu, Stinchcombe, and White (1996) can no longer be applied to give the result in Equation (5). Monte Carlo simulation evidence provided in Astill et al. (2018) confirms this asymptotic prediction, with the empirical FPR of the CUSUM procedure shown to be severely impacted under various patterns of time-varying volatility. The simulation results we report in Section 3 provide further confirmation of this. In order to address this problem, we propose robustifying the standard CUSUM procedure to time-varying volatility by modifying the CUSUM statistic in Equation (3) such that each observation on $\Delta y_j$ in Equation (3) is standardized by a Nadaraya-Watson-type kernel-based nonparametric spot variance estimator, as is done in, among others, Hansen (1995), Xu and Phillips (2008), and Beare (2018). Specifically, we consider the following heteroskedasticity-robust version of the CUSUM statistic:
$$SV_t^T := \sum_{j=T+1}^{t} \frac{\Delta y_j}{\hat{\sigma}_{j,N}}, \qquad (6)$$
where $\hat{\sigma}_{j,N}^2$ is a kernel smoothing estimator for the spot variance $\sigma_j^2 := \sigma^2(j/T)$. The kernel smoothing estimator is defined as follows for $j \ge N+1$:
$$\hat{\sigma}_{j,N}^2 := \frac{\sum_{s=0}^{N} K(s/N)\, \Delta y_{j-s}^2}{\sum_{s=0}^{N} K(s/N)}, \qquad (7)$$
where $K(\cdot)$ is a kernel function and where $N$ denotes the bandwidth (such that the number of observations used in the kernel smoothing is $N+1$). For completeness, we define $\hat{\sigma}_{j,N}^2 := \hat{\sigma}_{N+1,N}^2$ for $j \le N$, although in practical situations such definitions are not generally required, as the point where monitoring begins ($t = T+1$) is typically (much) larger than the bandwidth parameter $N$.
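The spot variance estimator and the resulting robust statistic can be sketched as follows. This is our own illustration: the truncated-Gaussian weight shape below is one possible parameterization, and the normalization of the weighted average is our reconstruction of the Nadaraya-Watson form described in the text.

```python
import numpy as np

def trunc_gauss_kernel(x):
    """One-sided kernel on (0, 1) with K(0) = K(1) = 0, so the current observation
    is left out (cf. Assumption 3). This truncated-Gaussian shape, centred at 0.5,
    is one possible parameterization."""
    inside = (x > 0) & (x < 1)
    return np.where(inside, np.exp(-0.5 * ((x - 0.5) / 0.25) ** 2), 0.0)

def spot_var(dy2, j, N, kernel=trunc_gauss_kernel):
    """Kernel-smoothed spot variance estimate at index j from squared first
    differences dy2, using the N+1 lags s = 0,...,N (weight zero at s = 0)."""
    s = np.arange(N + 1)
    w = kernel(s / N)
    return np.sum(w * dy2[j - s]) / np.sum(w)

def cusum_v(y, T, N, b_alpha=4.6):
    """Heteroskedasticity-robust CUSUM monitor: each first difference is
    standardized by its estimated spot volatility before cumulation."""
    dy = np.diff(y)
    dy2 = dy ** 2
    for t in range(T + 1, len(y) + 1):
        idx = np.arange(T - 1, t - 1)                 # Delta y_{T+1},...,Delta y_t
        sv = np.array([spot_var(dy2, j, N) for j in idx])
        S = np.sum(dy[idx] / np.sqrt(sv))
        if S > np.sqrt((b_alpha + np.log(t / T)) * t):
            return t
    return None
```

When the squared first differences are constant, the weighted average reproduces that constant exactly, so under homoskedasticity the statistic behaves like the standard CUSUM.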
Henceforth, we will refer to a monitoring procedure based on the modified CUSUM statistic, $SV_t^T$, as the CUSUM$_V$ monitoring procedure. We will make the following technical assumptions on the kernel function used in Equation (7):

Assumption 3. $K(\cdot)$ is continuously differentiable over the interval $(0, 1)$, with $K(x) = 0$ for $x \notin (0, 1)$, $\int_0^1 |K(x)x|\,dx < \infty$, and the characteristic function $\phi(t) := \int_{-\infty}^{\infty} \exp(itx) K(x)\,dx$ of $K$ satisfies $\int_{-\infty}^{\infty} |\phi(t)|\,dt < \infty$. $K'(\cdot)$, the derivative of the $K(\cdot)$ function, also has a characteristic function that is absolutely integrable.
Remark 4. Our assumption of a one-sided kernel function $K(\cdot)$ which is positive on the unit interval, $(0, 1)$, implies that we are using a rolling window type filter to estimate the spot variance $\sigma_j^2$. This one-sided kernel assumption should not be viewed as a restriction, however, because in our real-time monitoring context we would obviously not have access to future data. Allowing for a kernel which is positive on $(0, \infty)$ is technically possible, but we prefer to work with a rolling window using recent lagged data, as this has computational advantages in our monitoring context, especially at long monitoring horizons. Notice that our conditions on the kernel function impose that $K(0) = 0$ and $K(1) = 0$, which implies that the current observation is left out when estimating the volatility. These restrictions are necessary for proving our crucial intermediate result in Lemma 2. Examples of kernels which satisfy Assumption 3 include the rectangular, Epanechnikov, Bartlett, and truncated Gaussian kernels (with their boundary values adjusted to 0 in each case).
Before deriving our main result in Theorem 1, in Lemmas 1 and 2 we first provide two important intermediate results relating to the large sample properties of the nonparametric variance estimator, $\hat{\sigma}_{j,N}^2$.
Lemma 1. Let $y_t$ be generated according to Equations (1) and (2) under $H_0: \delta = 0$ and let Assumptions 1-3 hold. Then, if $T, N \to \infty$ such that $N/T \to 0$ and $N^2/T \to \infty$,
$$\max_{T+1 \le j \le \lfloor kT \rfloor} |\hat{\sigma}_{j,N}^2 - \sigma_j^2| = o_p(1).$$

Lemma 2. Let $y_t$ be generated according to Equations (1) and (2) under $H_0: \delta = 0$ and let Assumptions 1-3 hold. Then, if $T, N \to \infty$ such that $N/T \to 0$ and $N^{3/2}/T \to \infty$,
$$\max_{T+1 \le j \le \lfloor kT \rfloor} \big|(\hat{\sigma}_{j-1,N}^2 - \sigma_{j-1}^2) - (\hat{\sigma}_{j,N}^2 - \sigma_j^2)\big| = o_p(T^{-1}).$$

Remark 5. Lemma 1 establishes the rate condition on the bandwidth, $N$, under which our nonparametric variance estimator, $\hat{\sigma}_{j,N}^2$ of Equation (7), is uniformly consistent. Lemma 2 establishes a similar result for the numerical derivative of the nonparametric variance estimator, showing that a stronger rate condition is needed on the bandwidth than was needed for the consistency result in Lemma 1.
Theorem 1. Let the conditions of Lemma 2 hold. Then,
$$T^{-1/2} SV_{\lfloor Tr \rfloor}^T \Rightarrow W(r) - W(1), \quad 1 < r \le k.$$

Remark 6. The result in Theorem 1 demonstrates that, for any pattern of time-varying volatility satisfying Assumption 2, the joint limiting null distribution of the sequence of modified CUSUM statistics in Equation (6) coincides with that which obtains for the standard CUSUM statistics in Equation (3) when $e_t$ is homoskedastic. An immediate consequence of this, given in Corollary 1 below, is that we can therefore apply Theorem 3.4 of Chu, Stinchcombe, and White (1996) to obtain a result corresponding to Equation (5), with the implication that our modified CUSUM monitoring procedure will have a controlled theoretical FPR even in the presence of time-varying volatility of the form specified by Assumption 2.
Remark 7. The proof strategy used to establish the result in Theorem 1 extends the approach of Beare (2018), which is based on uniform consistency results for the nonparametric variance estimator and its derivative (cf. Lemmas 4.1 and 4.2 of Beare, 2018). By construction, our nonparametric variance estimator $\hat{\sigma}_{\lfloor \tau T \rfloor, N}^2$ is not differentiable with respect to $\tau$ and, hence, the proof strategy of Beare (2018) is not directly applicable. We therefore build our proof upon a corresponding result for the numerical derivative of $\hat{\sigma}_{\lfloor \tau T \rfloor, N}^2$, as given in Lemma 2.
By Theorem 3.4 of Chu, Stinchcombe, and White (1996), Theorem 1 implies the following result for $SV_t^T$.
Corollary 1. Under the same conditions as Theorem 1,
$$\lim_{T \to \infty} P\big(SV_t^T > c_t \sqrt{t} \ \text{for some}\ t \in \{T+1, \dots, \lfloor kT \rfloor\}\big) \le \exp(-b_\alpha/2).$$

Remark 8. The result given in Corollary 1 has two main implications. First, in the case where the innovations are homoskedastic, using the nonparametric spot volatility estimator $\hat{\sigma}_{j,N}$ leads to the same limiting null distribution and crossing probabilities for both the CUSUM and CUSUM$_V$ procedures. Second, and more importantly, when the innovations display time-varying volatility of the form outlined in Assumption 2, both the limiting null distribution and crossing probability for the CUSUM$_V$ procedure are unchanged relative to those which obtain in the homoskedastic case. This stands in contrast to the CUSUM procedure, which requires homoskedasticity for Equation (4), and therefore Equation (5), to hold. The CUSUM$_V$ procedure therefore offers robustness to time-varying volatility in a way that the standard CUSUM procedure does not.
Remark 9. It is interesting to note that the asymptotic results given in Theorem 1 and Corollary 1 remain valid in the case where a finite number of level breaks are present in the DGP for $y_t$ in Equation (1). This holds by virtue of the fact that $SV_t^T$ is calculated using the first differences of $y_t$. Consequently, any level breaks present in the levels data, $y_t$, are transformed to one-period outliers in the first-differenced data, which therefore have no impact on the large sample properties of $SV_t^T$.
Remark 10. The model in Equations (1) and (2) allows for a non-zero mean in $y_t$ through the presence of $\mu$ in Equation (1). In most applications of tests for explosive episodes, allowing for a non-zero mean in $y_t$ is considered sufficient. However, in some circumstances it may be desirable to allow for the possibility that the expected value of $y_t$ follows a linear trend. In this case, we would replace $\mu$ in Equation (1) by $\mu + \beta t$. As discussed in Homm and Breitung (2012, p. 212), the CUSUM-type procedures discussed in Section 2.1 can be modified to allow for the presence of a linear trend in Equation (1) by replacing $\Delta y_j$ in the CUSUM statistic in Equation (3) with the standardized recursive residuals $\sqrt{\tfrac{j-1}{j}}(\Delta y_j - \hat{\mu}_{j-1})$, and redefining the variance estimator as $\hat{\sigma}_t^2 := (t-2)^{-1}\sum_{j=2}^{t}(\Delta y_j - \hat{\mu}_t)^2$, where, for any $s$, $\hat{\mu}_s$ denotes the sample mean of $\Delta y_2, \dots, \Delta y_s$. The large sample properties of the resulting sequence of CUSUM statistics are unchanged from those given in Section 2.1. The heteroskedasticity-robust CUSUM statistic defined in Equation (6) can be similarly modified to allow for a linear trend by again replacing $\Delta y_j$ in the numerator of Equation (6) by $\sqrt{\tfrac{j-1}{j}}(\Delta y_j - \hat{\mu}_{j-1})$ and replacing $\Delta y_{j-s}^2$ in Equation (7) by $(\Delta y_{j-s} - \hat{\mu}_j)^2$, and similarly in Equations (8) and (9) in Section 2.3; again, this will not affect the large sample results stated in Section 2.2. We repeated all of the simulation experiments reported in Section 3 (which only allow for a non-zero mean) with this correction for a linear trend implemented and found these results to be almost identical to those reported. These results are available on request.
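Our reading of the recursive-residual construction in this remark can be sketched as follows (the definition of $\hat{\mu}_{j-1}$ as the recursive mean of the earlier first differences is our interpretation of the garbled formula, so the indexing below is an assumption):

```python
import numpy as np

def recursive_residuals(y):
    """Standardized recursive residuals sqrt((j-1)/j) * (Delta y_j - mu_{j-1}),
    where mu_{j-1} is taken to be the mean of Delta y_2,...,Delta y_{j-1}
    (our reading of Remark 10). A pure linear trend yields zero residuals."""
    dy = np.diff(y)                       # dy[i] = Delta y_{i+2}
    out = []
    for i in range(1, len(dy)):
        j = i + 2                         # 1-indexed date of this first difference
        mu_prev = dy[:i].mean()           # recursive mean of earlier first differences
        out.append(np.sqrt((j - 1) / j) * (dy[i] - mu_prev))
    return np.array(out)
```

Demeaning the first differences recursively removes a linear trend in levels, which is why a deterministic trend produces identically zero residuals.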
Remark 11. Thus far $e_t$ in Equation (2) has been assumed serially uncorrelated. This can be weakened to allow for the case where $e_t$ admits a finite-order autoregression with lag polynomial $\phi(z) := 1 - \sum_{j=1}^{p} \phi_j z^j$, such that $\phi(z) \ne 0$ for all $|z| \le 1$, and where $\varepsilon_t$ and $\sigma_t$ continue to satisfy Assumptions 1 and 2, respectively. The CUSUM-type procedures discussed in Sections 2.1 and 2.2 can be modified to allow for such serial correlation by using pre-whitening. This is done by replacing $\Delta y_j$ in Equations (3) and (6) by $\tilde{e}_j := \Delta y_j - \hat{\phi}_{1,j}\Delta y_{j-1} - \cdots - \hat{\phi}_{p,j}\Delta y_{j-p}$, where $\hat{\phi}_{i,j}$, $i = 1, \dots, p$, are the OLS autoregressive lag estimates from the pre-whitening regression of $\Delta y_s$ on $\{\Delta y_{s-i}\}_{i=1}^{p}$, over the sample data $s = p+2, \dots, j$. Similarly, the variance estimator is redefined using the squared pre-whitened residuals $(\Delta y_j - \hat{\phi}_{1,t}\Delta y_{j-1} - \cdots - \hat{\phi}_{p,t}\Delta y_{j-p})^2$. With regard to the kernel-based estimator in Equation (7), we need to replace $\Delta y_{j-s}^2$ by $(\Delta y_{j-s} - \hat{\phi}_{1,j}\Delta y_{j-s-1} - \cdots - \hat{\phi}_{p,j}\Delta y_{j-s-p})^2$, $s = 0, \dots, N$, and similarly in the associated cross-validation criteria in Equations (8) and (9) in Section 2.3. If the lag order, $p$, is known, then the estimates of the autoregressive lag coefficients defined above are consistent at rate $T^{1/2}$ under $H_0$; see Phillips and Xu (2006). As a result, we conjecture that the limiting results given previously will continue to hold under this modification. In most practical applications a very low autoregressive order, either $p = 0$ or $p = 1$, is typically assumed. In practice, $p$ could in principle be determined using any consistent model selection criterion, an obvious example being the Bayesian information criterion (BIC) of Schwarz (1978).

The next theorem establishes the large sample behavior of $SV_t^T$ under the alternative hypothesis.

Theorem 2. Under the same conditions as Theorem 1, but under $H_1: \delta > 0$,
$$\lim_{T \to \infty} P\big(SV_t^T > c_t \sqrt{t} \ \text{for some}\ t \in \{T+1, \dots, \lfloor kT \rfloor\}\big) = 1.$$

The result in Theorem 2 demonstrates that our modified CUSUM monitoring procedure is consistent under $H_1$, rejecting the false null of no explosivity with probability one in the limit.
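The pre-whitening step of Remark 11 amounts to an OLS autoregression of the first differences on their own lags, re-estimated as the sample end point moves. A sketch (function name and indexing conventions are ours):

```python
import numpy as np

def prewhiten(dy, p, j):
    """OLS pre-whitening as in Remark 11: regress dy_s on its first p lags over the
    sample ending at (0-indexed) j, and return the lag estimates together with the
    residuals e_s = dy_s - phi_1 dy_{s-1} - ... - phi_p dy_{s-p}."""
    X = np.column_stack([dy[p - i - 1 : j - i - 1] for i in range(p)])  # lagged regressors
    yvec = dy[p:j]                                    # left-hand-side observations
    phi, *_ = np.linalg.lstsq(X, yvec, rcond=None)    # OLS lag coefficient estimates
    resid = yvec - X @ phi                            # pre-whitened residuals
    return phi, resid
```

On an exact AR(1) sequence the regression recovers the lag coefficient and the residuals vanish, which is the behavior the conjectured asymptotic invariance relies on.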

Implementation Issues: Bandwidth and Kernel Selection
The practical implementation of $SV_t^T$ requires choices to be made for both the kernel and the bandwidth used in constructing the nonparametric estimator $\hat{\sigma}_{j,N}^2$. We will now discuss these two choices, providing recommendations for each.
In general, the choice of kernel tends to be much less crucial for the finite sample performance of nonparametric kernel-based estimators than is the bandwidth, and we found this general rule to hold true for our particular nonparametric estimator, $\hat{\sigma}_{j,N}^2$. We conducted finite sample simulations using a number of kernels (in particular the boundary value adjusted truncated Gaussian, rectangular, Epanechnikov, and Bartlett kernels), and found little difference between the empirical FPR and TPR profiles of our proposed monitoring procedure across these different choices. Throughout the remainder of the paper, results are reported for the truncated Gaussian kernel; results for the other kernels mentioned above are available on request.
In practice, it is the choice of bandwidth that is crucial to the performance of nonparametric estimators such as $\hat{\sigma}_{j,N}^2$. Other things being equal, too large a bandwidth results in oversmoothing, which increases the bias of the volatility estimator, while too small a bandwidth results in undersmoothing, which increases its variance; both have a detrimental impact on the empirical FPR and TPR of the resulting procedure. As is commonly done in the literature, we adopt a data-driven method for selecting the bandwidth in order to automate the decision on how to trade off the bias and variance of the estimator. To this end, we propose selecting the bandwidth according to a standard cross-validation procedure. Specifically, for a given time period $t = T+1, \dots, \lfloor kT \rfloor$ in the monitoring period, first define the cross-validation criterion
$$CV_t(N) := \sum_{j=T+1}^{t} \big(\Delta y_j^2 - \hat{\sigma}_{j,N}^2\big)^2. \qquad (8)$$
(A linear trend can also be allowed for, as described in Remark 10, by analogous demeaning of the pre-whitened residuals; here, an intercept also needs to be included in the pre-whitening regression.)
The $CV_t(N)$ criterion is essentially an estimate of the mean integrated squared error (MISE) of the variance estimator for a given bandwidth $N$. The automated bandwidth, denoted $N_t^{cv}$, is then chosen as the bandwidth that minimizes the (estimated) MISE and is therefore defined as $N_t^{cv} := \arg\min_N CV_t(N)$. Following Härdle, Hall, and Marron (1988), $N_t^{cv} = O(T^{4/5})$, and so this choice of bandwidth easily satisfies the rate restrictions placed on the bandwidth in Theorem 1.
The MISE-minimizing cross-validation procedure discussed above can be interpreted as a "global" procedure, in that it attempts to minimize the error we make in estimating the spot variance from time $j = T+1$ to the current period $j = t$. In the context of monitoring, however, minimizing $CV_t(N)$ across the full range of $\hat{\sigma}_{j,N}^2$, $j = T+1, \dots, t$, may not be appropriate. What is important is not how well we estimate the spot variance over the entire monitoring period, but how well it is estimated in the immediate neighborhood of the monitoring time period $t$. As such, we may also consider a "local" cross-validation procedure based on the criterion
$$CV_t^*(N) := \sum_{j=t-H+1}^{t} \big(\Delta y_j^2 - \hat{\sigma}_{j,N}^2\big)^2, \qquad (9)$$
and select the bandwidth, denoted $N_t^{cv*}$, according to $N_t^{cv*} := \arg\min_{N \in [1, H]} CV_t^*(N)$, so that the bandwidth is instead selected to minimize the estimation error of the spot variance over the most recent $H$ observations; cf. Hall and Schucany (1989). We recommend the use of the automated bandwidth $N_t^{cv*}$ in practice and will use this choice in both our numerical simulations and empirical exercise. Implementation of $N_t^{cv*}$ requires the user to make a choice for the tuning parameter, $H$; this will be explored further in Section 3.
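As a sketch of the local cross-validation rule, using a rectangular kernel for simplicity (the exact form of the criterion, comparing squared first differences with the leave-current-out spot variance estimate, is our reconstruction of Equation (9)):

```python
import numpy as np

def spot_var_rect(dy2, j, N):
    """Rectangular-kernel spot variance: mean of the N squared first differences
    strictly before index j (current observation left out, since K(0) = 0)."""
    return dy2[j - N : j].mean()

def local_cv_bandwidth(dy2, t, H, N_grid):
    """Pick the bandwidth N minimizing the local criterion: the sum, over the last
    H indices j, of (dy2[j] - sigma_hat^2_{j,N})^2 -- our reconstruction of
    CV*_t(N). Ties resolve to the first N listed in N_grid."""
    def cv(N):
        return sum((dy2[j] - spot_var_rect(dy2, j, N)) ** 2 for j in range(t - H, t))
    return min(N_grid, key=cv)
```

Near an abrupt increase in variance the criterion favors a smaller bandwidth, which adapts faster, illustrating the bias-variance trade-off the text describes.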

Finite-Sample Simulations
In this section, we compare the finite sample performance of our proposed CUSUM$_V$ monitoring procedure with the standard CUSUM monitoring procedure of Homm and Breitung (2012), and also with the MAX$_m$ monitoring procedure of Astill et al. (2018); the latter is, like CUSUM$_V$, robust to the presence of time-varying volatility in the errors.
The MAX$_m$ monitoring procedure of Astill et al. (2018) is based on the sequential application of a statistic, $S_{t,m}$, computed over a rolling window of the most recent $m$ observations, where $m$ is a user-chosen window width. The MAX$_m$ procedure then signals the presence of an explosive episode if, at any point $t$, $T < t \le \lfloor kT \rfloor$, during the monitoring period, the statistic $S_{t,m}$ exceeds the maximum value across the corresponding sequence of statistics computed over the training period. Astill et al. (2018) demonstrate that an approximation to the FPR of this procedure is available; see Equation (10). The FPR of the MAX$_m$ monitoring procedure at any point $t$, $T < t \le \lfloor kT \rfloor$, in the monitoring period can be computed using Equation (10) by replacing $\lfloor kT \rfloor$ with $t$. Observe that this FPR is a function of the length of the training period, $T$, and the window width, $m$, used in the $S_{t,m}$ statistics.
In our simulations, data were generated according to the DGP in Equations (1) and (2) with $\varepsilon_t \sim NIID(0, 1)$, setting $\mu = 0$ without loss of generality, and with the DGP initialized at $u_0 = 100$ so that the generated data remain positive; any explosive episodes will therefore typically display upward trajectories, mimicking what is observed in an asset price bubble. We assume that monitoring begins at time $t = 220$ and set the training period sample size to $T = 219$. We set $\lfloor kT \rfloor = 255$, such that we have a (maximum) monitoring period of 36 observations. For the MAX$_m$ procedure, we use a single window width of $m = 10$, as recommended by Astill et al. (2018). By assuming a common monitoring start date of $t = 220$ for all procedures, we treat the sample $t = 1, \dots, T$ as the training period for the CUSUM and CUSUM$_V$ procedures, while the training period for the MAX$_{10}$ procedure is given by the sample $t = 1, \dots, T - m + 1$. Homm and Breitung (2012) show that the asymptotic critical values for the CUSUM procedures implied by Equation (5) are very conservative in practice, being based on an assumption of a monitoring period of infinite length. As such, and to aid comparison with the MAX$_{10}$ procedure, finite sample critical values for the CUSUM and CUSUM$_V$ monitoring procedures were obtained by choosing a value of $b_\alpha$ such that, for a homoskedastic DGP ($\sigma_t = 1$), the empirical FPR of these procedures is equal to the FPR of the MAX$_{10}$ procedure, determined by Equation (10), when the latter has an FPR of 0.10. Therefore, in the simulations that follow, the CUSUM and CUSUM$_V$ monitoring procedures were performed using $b_\alpha = 0.147$ and $b_\alpha = 0.177$, respectively. All simulations were conducted in Gauss 9.0 using 10,000 replications.
The bandwidth used in connection with the kernel-based spot variance estimator in the CUSUM$_V$ procedure was selected at each point in the monitoring period using the local cross-validation procedure in Equation (9). We experimented with the tuning parameter $H$ and found the robustness of the empirical FPR of the CUSUM$_V$ procedure to time-varying volatility to be decreasing in the choice of $H$, whereas the empirical TPR of the procedure was found to be increasing in $H$. Our experiments suggested that setting $H = 20$ delivered a procedure with the best trade-off between these two considerations, and we will use this choice in all of the numerical and empirical work that follows. As discussed in Section 2.3, the reported results are for a truncated Gaussian kernel.

Empirical FPRs under $H_0: \delta = 0$
We first simulate the FPRs of the CUSUM, CUSUM$_V$, and MAX$_{10}$ procedures under the null hypothesis $H_0: \delta = 0$ for cases where: (i) the errors $e_t$ are homoskedastic ($\sigma_t = 1$), and (ii) the errors exhibit time-varying volatility. In the latter case, we first consider smooth shifts in volatility of the logistic form
$$\sigma_t = 1 + a\{1 + \exp(-h(t - T_b))\}^{-1},$$
that is, a logistic smooth transition in volatility from 1 to $1+a$ when $h > 0$, and from $1+a$ to 1 when $h < 0$. In each case, the transition speed and timing of the transition are governed by $|h|$ and $T_b$, respectively. Figure 1 reports the empirical FPRs of the three procedures when a smooth shift in volatility occurs, with $|h| = 0.25$ and $T_b = T$, so that the transition is centered around the starting date of the monitoring period. For each time point $e$, $T+1 \le e \le \lfloor kT \rfloor$, the corresponding point on the curves in the figure represents the empirical FPR of the particular procedure run from time $t = T+1$ until time $t = e$. We consider the cases $a = \{0, \sqrt{2}-1, \sqrt{3}-1, \sqrt{4}-1\}$; here, $a = 0$ represents the benchmark case of homoskedasticity (in which case the value of $h$ is irrelevant), while for $a \ne 0$, the variance increases from 1 to 2, 3, or 4 when $h = 0.25$, and decreases from 2, 3, or 4 to 1 when $h = -0.25$. The red vertical dashed line on each graph represents the time at which the FPRs of the CUSUM, CUSUM$_V$, and MAX$_{10}$ procedures are equal to 0.10 for the case of a homoskedastic DGP. Figure 1(a) reports the homoskedastic case ($a = 0$), and here all three procedures have very similar FPR profiles across the range of end-of-monitoring dates $e$, $T+1 \le e \le \lfloor kT \rfloor$. Figure 1(b)-(d) reports the FPRs of the procedures when an upward shift in volatility occurs with $h = 0.25$.
The FPR of the CUSUM procedure is seen to be inflated to a large degree relative to the homoskedastic case; for the largest value of $a$ considered, this FPR exceeds 0.33 at time $e = 241$, this being the point at which the procedure is calibrated to have an FPR of 0.10 under homoskedasticity. The FPR of the CUSUM$_V$ procedure is inflated to some extent relative to the case where the errors are homoskedastic, but to nowhere near the extent of the CUSUM procedure. For the largest value of $a$ considered, the FPR of the CUSUM$_V$ procedure at time $e = 241$ is about 0.13. The FPR of the MAX$_{10}$ procedure is barely impacted by any shifts in volatility. Figure 1(e)-(g) reports the FPRs of the procedures for the smooth downward shift in volatility cases with $h = -0.25$. The FPR of the CUSUM procedure is severely deflated relative to the homoskedastic case, with this feature again most apparent for larger values of $a$. For the largest $a$ considered, the FPR of the CUSUM procedure does not exceed 0.05 even by the very end of the monitoring period. The FPR of the CUSUM$_V$ procedure is slightly deflated relative to the homoskedastic case, but again this is modest in comparison with the CUSUM procedure. As with upward volatility shifts, the FPR of the MAX$_{10}$ procedure is little affected.
We next consider cases where the volatility shift is centered on a date before or after the monitoring period commences. Figure 2 reports the FPRs of the three procedures when T_b = T + 10, so that the mid-point of the volatility shift occurs shortly after the start of monitoring. Figure 2(a) repeats Figure 1(a) for reference, while Figure 2(b)-(d) reports the FPRs when the volatility shift is upward. The relative FPR inflation exhibited by the procedures is broadly similar to the case where the smooth volatility shift is centered on the start of the monitoring period, with the CUSUM procedure displaying the largest degree of FPR inflation and the CUSUM_V procedure again displaying only a modest degree of FPR inflation relative to the homoskedastic case. The FPR of the MAX_10 procedure is again almost unchanged from the homoskedastic case. Figure 2(e)-(g) reports the FPRs for downward shifts in volatility; again we see broad similarity with the corresponding panels of Figure 1. Finally, Figure 3 reports results for T_b = T − 10, so that the volatility shift mid-point occurs shortly before the commencement of monitoring. Once again, the FPR of the CUSUM procedure is severely impacted, with a large degree of FPR inflation or deflation exhibited for θ = 0.25 and θ = −0.25, respectively. The FPR of the CUSUM_V procedure is less impacted by a smooth shift in volatility centered at this point in time than by shifts centered at dates after monitoring has commenced; this is explained by the fact that volatility shifts occurring before the commencement of monitoring allow greater time for the spot variance estimate in Equation (7) to adapt to the transitioning volatility path. Again, the FPR of the MAX_10 procedure is very similar to the homoskedastic case.
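As a concrete illustration of the inflation mechanism, the following sketch simulates a random-walk null whose innovations undergo a logistic smooth transition in volatility at the start of monitoring, and tracks the rejection frequency of a stylised CUSUM-type detector. The logistic parameterisation, the standardised cumulative-sum statistic, and the simulation-based boundary calibration below are our own simplified stand-ins, not the paper's exact statistic or critical values.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n_mon, reps = 200, 100, 2000          # training length, monitoring length, MC reps

def sigma_path(n, a, theta, T_b):
    """Logistic smooth transition in volatility between 1 and 1 + a,
    with speed |theta| and mid-point T_b (assumed parameterisation)."""
    t = np.arange(1, n + 1)
    return 1.0 + a / (1.0 + np.exp(-theta * (t - T_b)))

def max_stat(dy):
    """Stylised CUSUM monitor: variance estimated from the training
    differences, then the maximum standardised cumulative sum of the
    monitoring-period differences."""
    s2 = np.var(dy[:T])
    csum = np.cumsum(dy[T:])
    return np.max(np.abs(csum) / np.sqrt(s2 * np.arange(1, n_mon + 1)))

# calibrate the critical value so the homoskedastic FPR over the window is ~0.10
crit = np.quantile([max_stat(rng.standard_normal(T + n_mon))
                    for _ in range(reps)], 0.90)

fprs = {}
for a in [0.0, np.sqrt(2) - 1, np.sqrt(4) - 1]:   # variance moves from 1 to 1, 2, 4
    sig = sigma_path(T + n_mon, a, theta=0.25, T_b=T)
    fprs[a] = np.mean([max_stat(sig * rng.standard_normal(T + n_mon)) > crit
                       for _ in range(reps)])
    print(f"a = {a:.3f}: empirical FPR = {fprs[a]:.3f}")
```

The point of the sketch is the mechanism: under an upward shift the monitoring-period differences are noisier than the training-period variance estimate allows for, so the standardised statistic crosses its homoskedasticity-calibrated boundary too often, mirroring the inflation reported in Figure 1.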

Empirical TPRs under H_1: δ > 0
We now turn to an examination of the empirical TPRs of the CUSUM, CUSUM_V, and MAX_10 procedures in detecting an emergent explosive episode in the monitoring period. We initially concentrate on the homoskedastic case, a = 0. We consider two possible starting dates for the explosive regime in Equations (1) and (2), namely ⌊τ₁T⌋ = {220, 230}, and generate explosive regimes of length 25 observations, so that ⌊τ₂T⌋ = ⌊τ₁T⌋ + 25. We also vary the magnitude of the offset to the autoregressive parameter driving the explosive regime by considering the settings δ = {0.004, 0.006, 0.008, 0.010}. The results are reported in Figure 4. Henceforth, the time periods over which an explosive regime is present are identified by gray shaded areas in each figure. As would be expected, the best overall TPR profile is displayed by the CUSUM procedure, as this procedure is specifically calibrated for data generated under homoskedasticity. The TPR of the CUSUM_V procedure is, encouragingly, very close to that of the CUSUM procedure, so it appears that the FPR robustness of the CUSUM_V procedure to time-varying volatility in the errors does not come at the expense of significantly reduced power to detect an explosive regime. The TPR profile of the CUSUM_V procedure is far superior overall to that of the MAX_10 procedure, with the TPR of MAX_10 only marginally higher than that of the CUSUM-based procedures for a small number of observations at the beginning of the explosive episode, at which point all of the TPRs are still very close to the corresponding FPRs. We next consider the TPRs of the three procedures when a smooth volatility shift is present in the data.
Figure 5 reports the TPRs when the volatility shift is upward and centered on the commencement of the explosive episode; this timing ensures that the start of the explosive episode coincides with the middle of the volatility transition, a situation that is arguably of substantial empirical relevance given that periods of explosivity are often accompanied by large changes in volatility. Specifically, we set ⌊τ₁T⌋ = {220, 230}, ⌊τ₂T⌋ = ⌊τ₁T⌋ + 25, θ = 0.25, and T_b = ⌊τ₁T⌋, focusing on the case δ = 0.007 (the average of the δ values considered in Figure 4). Results are reported for the same set of values of a as in Figures 1-3 (including the homoskedastic case a = 0). When a smooth upward volatility shift occurs, the TPR of the CUSUM procedure is much higher than those of the other two procedures, but this is of course purely an artifact of the significant FPR inflation exhibited by the CUSUM procedure when such a volatility shift is present in the data. Of the two procedures which offer broad FPR robustness to an upward volatility shift, namely CUSUM_V and MAX_10, it is the CUSUM_V procedure which exhibits by far the superior TPR profile. Figure 6 reports the TPRs for the case of a smooth downward shift in volatility, using the same settings as for Figure 5 but with θ = −0.25. Here, we observe that the TPR of the CUSUM procedure is severely impacted, again a consequence of its FPR deflation in the case of a downward volatility shift, with this impact more pronounced the larger the value of a.
The best overall TPR profile is arguably displayed by the CUSUM_V procedure, although as a increases the region early in the explosive regime where the MAX_10 procedure offers TPR advantages over CUSUM_V becomes somewhat more pronounced; this feature is in line with the modest FPR deflation exhibited by the CUSUM_V procedure in the presence of a downward volatility shift.
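The alternative considered in this subsection can be simulated directly: a unit-root path that switches to a mildly explosive root 1 + δ for 25 observations inside the monitoring window. The snippet below is our own minimal rendering of that DGP (a single replication, an illustrative δ, and the same stylised standardised cumulative-sum statistic as a stand-in for the monitored CUSUM sequence), intended only to show how the statistic begins to climb once the episode starts.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 200, 250                               # training end, total sample size
tau1, tau2, delta = 220, 245, 0.01            # explosive regime on (tau1, tau2]

# unit root outside the episode, autoregressive root 1 + delta inside it
e = rng.standard_normal(n + 1)
u = np.zeros(n + 1)
for t in range(1, n + 1):
    rho = 1.0 + delta if tau1 < t <= tau2 else 1.0
    u[t] = rho * u[t - 1] + e[t]
y = u[1:]

dy = np.diff(y)                               # n - 1 first differences
s2 = np.var(dy[:T - 1])                       # training-period variance estimate
csum = np.cumsum(dy[T - 1:])                  # monitoring-period cumulative sum
stat = np.abs(csum) / np.sqrt(s2 * np.arange(1, len(csum) + 1))

pre = stat[: tau1 - T]                        # statistic before the episode starts
post = stat[tau1 - T :]                       # statistic from the onset onwards
print("max statistic pre-episode :", round(float(pre.max()), 2))
print("max statistic post-onset  :", round(float(post.max()), 2))
```

During the episode each first difference acquires the drift δu_{t−1}, so the cumulative sum picks up a systematic component on top of the noise; how quickly it crosses a detection boundary in any one replication depends on the level of the path when the episode begins.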

The Impact of an Explosive Episode in the Training Period
We next assess the impact that a single collapsed explosive episode in the training period has on both the empirical FPR and the empirical TPR of the CUSUM, CUSUM_V, and MAX_10 detection procedures. To that end, data were generated according to y_t = u_t, with

u_t = u_{t−1} + e_t,  t = 1, …, ⌊τ_{1,p}T⌋,
u_t = (1 + δ_p)u_{t−1} + e_t,  t = ⌊τ_{1,p}T⌋ + 1, …, ⌊τ_{2,p}T⌋,
u_t = u_{⌊τ_{1,p}T⌋} + e_t,  t = ⌊τ_{2,p}T⌋ + 1,

and e_t = ε_t ∼ NIID(0, 1), thereby focusing on the homoskedastic case (cf. a = 0 in the previous subsections). The series y_t therefore admits a single collapsed explosive episode in the training period, of length ⌊τ_{2,p}T⌋ − ⌊τ_{1,p}T⌋ =: l_p, driven by an explosive offset δ_p. A further explosive episode will occur in the monitoring period if δ > 0. Figure 7 reports the empirical FPRs of the procedures in the case where ⌊τ_{1,p}T⌋ = 95, for two possible lengths, l_p = {10, 15}, and four explosive offsets, δ_p = {0.004, 0.006, 0.008, 0.010}. In all cases, the FPR of the CUSUM_V procedure is seen to be unaffected by the presence of these explosive episodes in the training period. This is because the local cross-validation procedure in Equation (9) used to select the bandwidth ensures that these explosive observations from the training period receive zero weight in the construction of the spot variance estimator used in the CUSUM_V monitoring statistics. By contrast, the empirical FPRs of both the CUSUM and MAX_10 procedures are very significantly deflated when a training-period explosive episode is present, as is clear from a comparison of Figure 7 with Figure 1(a). The impact of a training-period explosive episode is greater the longer that episode and the larger the explosive offset driving it. Figure 8 reports the TPRs of the three procedures in detecting an explosive episode in the monitoring period when an explosive episode occurred during the training period.
We report results for a single setting of the training-period episode, l_p = 15 and δ_p = 0.007; qualitatively similar patterns emerge for other settings. The explosive episodes generated in the monitoring period for this setting are identical to those reported in Figure 4. A comparison of the results in Figure 8 with those in Figure 4 shows that the TPR of the CUSUM_V procedure is unchanged relative to the case where no training-period episode is present, for the same reasons discussed above in the context of the robustness of its empirical FPR to training-period explosive episodes. By contrast, the TPRs of both the CUSUM and MAX_10 procedures are significantly negatively impacted by the presence of a training-period explosive episode, so that the CUSUM_V procedure easily offers the best TPR profile of the three procedures in such cases.
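The collapsed-episode DGP above is straightforward to sketch; the key feature is that the episode and its collapse leave atypically large first differences inside the training sample, which contaminate any variance estimate that weights those observations. The DGP below follows the design above, while the naive-versus-episode-excluded variance comparison is our own illustration, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 200
tau1p, tau2p, delta_p = 95, 110, 0.01         # episode of length l_p = 15

# unit root, explosive root 1 + delta_p on (tau1p, tau2p], then a one-off
# collapse back to the pre-episode level u[tau1p]
e = rng.standard_normal(T + 1)
u = np.zeros(T + 1)
for t in range(1, T + 1):
    if tau1p < t <= tau2p:
        u[t] = (1.0 + delta_p) * u[t - 1] + e[t]
    elif t == tau2p + 1:
        u[t] = u[tau1p] + e[t]
    else:
        u[t] = u[t - 1] + e[t]
y = u[1:]
dy = np.diff(y)

naive = np.var(dy)                            # weights episode/collapse obs fully
clean = np.var(np.delete(dy, np.arange(tau1p - 1, tau2p + 1)))  # episode dropped
print(f"variance estimate: all obs {naive:.2f}, episode excluded {clean:.2f}")
```

A variance estimator whose bandwidth selection effectively zero-weights the episode observations, as described for CUSUM_V above, behaves like the episode-excluded estimate here, which is why its FPR and TPR are insulated from the training-period episode.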

Additional Simulations
In addition to the simulation results discussed in Sections 3.1-3.3, we performed a large number of further experiments which we do not report in detail here, but which are available on request. We now summarize these findings:

i. The simulations reported in Sections 3.1-3.3 all set ε_t to be an NIID(0, 1) process. We repeated the experiments for a number of other distributions for ε_t, including highly skewed distributions such as (χ²₁ − 1), and found these all to yield very similar results to those reported.

ii. We also examined the impact of other forms of time-varying volatility on the finite-sample properties of the CUSUM, CUSUM_V, and MAX_10 procedures. In particular, we considered linearly trending volatility paths, abrupt shifts in volatility, and multiple smooth or abrupt volatility shifts. Upward (downward) trending volatility paths led to FPR inflation (deflation) for the CUSUM procedure but had little impact on the FPR of the CUSUM_V procedure. Abrupt upward (downward) shifts in volatility again led to significant FPR inflation (deflation) for the CUSUM procedure but had only a modest impact on the FPR of the CUSUM_V procedure, in spite of such abrupt volatility shifts not being permitted by Assumption 2. With multiple smooth or abrupt volatility shifts, the FPRs of the CUSUM monitoring procedures were governed by the relative average volatility in the training and monitoring periods implied by each volatility path: where the average volatility in the monitoring period exceeded that in the training period the CUSUM procedures exhibited FPR inflation, whereas if the average volatility in the monitoring period was less than that in the training period the CUSUM procedures exhibited FPR deflation. In all instances, the inflation or deflation exhibited by the CUSUM procedure was far more severe than that exhibited by the CUSUM_V procedure.
The FPR of the MAX_10 procedure was unaffected by trending volatility paths, abrupt shifts in volatility, or multiple volatility shifts.

iii. When allowing for trending volatility under the alternative, the TPRs of the CUSUM_V and MAX_10 procedures were little different from those obtained in the homoskedastic baseline. In line with the FPR distortions outlined above, under upward (downward) trending volatility the TPR of the CUSUM procedure was relatively higher (lower) than those of the other two procedures. In the case of abrupt upward variance shifts, the power ordering of the three procedures was unchanged relative to the homoskedastic baseline, albeit with the differences in power between the procedures amplified somewhat, owing to the modest FPR inflation displayed by the CUSUM_V procedure and the severe FPR inflation exhibited by the CUSUM procedure. For abrupt downward variance shifts under the alternative, the TPR of the MAX_10 procedure was largely unchanged, whereas the powers of the CUSUM and CUSUM_V procedures were deflated relative to the homoskedastic baseline, modestly so in the case of the CUSUM_V procedure and severely so in the case of the CUSUM procedure.

Empirical Application
In this section, we illustrate the methods discussed in this paper using empirical data on Bitcoin. Bitcoin is a digital cryptocurrency that, much like government-backed currencies, is envisaged as a medium of exchange. The price of Bitcoin has, however, been subject to a great deal of volatility since its inception and, as such, it is regarded as a speculative asset. We apply our procedures directly to the Bitcoin price with no adjustment for fundamentals, since there is no consensus as to what the appropriate fundamental for the price of Bitcoin would be, or indeed whether it could even be measured (the cost of mining has, e.g., been suggested). Consequently, we examine the Bitcoin data for an emerging explosive episode; while formally this does not allow us to determine whether a bubble is present, it does nonetheless provide some evidence that a bubble might be present. We obtained daily data on the price of Bitcoin from https://finance.yahoo.com/quote/BTC-GBP. We concentrate attention on monitoring for explosive episodes in the year 2017, as the price of Bitcoin rose markedly over the course of this year and also appears subject to time-varying volatility.
We apply both the CUSUM and CUSUM_V monitoring procedures to this dataset. Results for the MAX_10 procedure are omitted as this procedure fails to find evidence of explosivity in either of the two example monitoring exercises we consider. To account for potential serial correlation in the data, we apply pre-whitening to the first differences Δy_t, as discussed in Remark 11, selecting the lag order, p, using BIC with a maximum lag order of 4.
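The pre-whitening step just described can be sketched as a small OLS exercise: fit AR(p) models to the differenced series for p = 0, …, 4, select p by BIC, and pass the residuals to the monitoring statistics. This is a generic implementation under our own conventions (intercept included, common estimation sample across candidate p); the paper's exact regression in Remark 11 may differ.

```python
import numpy as np

def prewhiten(dy, pmax=4):
    """Fit AR(p) to the differenced series by OLS for p = 0..pmax,
    select p by BIC, and return (p_hat, residuals)."""
    n = len(dy)
    best = None
    for p in range(pmax + 1):
        Y = dy[pmax:]                         # common sample across all p
        X = np.column_stack([np.ones(n - pmax)] +
                            [dy[pmax - k : n - k] for k in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        bic = len(Y) * np.log(np.mean(resid ** 2)) + (p + 1) * np.log(len(Y))
        if best is None or bic < best[0]:
            best = (bic, p, resid)
    return best[1], best[2]

# illustration: AR(1) differences with coefficient 0.5
rng = np.random.default_rng(3)
e = rng.standard_normal(500)
dy = np.empty(500)
dy[0] = e[0]
for t in range(1, 500):
    dy[t] = 0.5 * dy[t - 1] + e[t]

p_hat, resid = prewhiten(dy)
print("selected lag order:", p_hat)
```

For a strongly autocorrelated difference series like this one, BIC will typically select a positive lag order, and the residual series passed on to the monitoring statistics is then approximately serially uncorrelated.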
Before considering the results of our monitoring exercises, we first plot in Figure 9 the price and estimated volatility path of the Bitcoin series for the period January 1, 2017-November 30, 2017. The estimated volatility path is computed using the kernel smoothing estimator in Equation (7), with the same choices for the kernel function and H as in the Monte Carlo exercise reported in Section 3. It can be seen from Figure 9 that the Bitcoin price rose a great deal over 2017, beginning the year at £809 and rising to £7,565 by November 30, 2017, leading to widespread belief that the series may have been subject to one or more explosive episodes over the period in question. Figure 9 also highlights the presence of considerable time-variation in the estimated volatility path of the Bitcoin price series. It is therefore of considerable importance to allow for the presence of time-varying volatility in the data when investigating whether or not the general upward movement in the Bitcoin price series is due to explosive episodes.
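The spot volatility estimate plotted in Figure 9 comes from Equation (7), which we do not reproduce; the sketch below shows only the generic idea — a leave-one-out, backward-looking kernel average of squared first differences — using an illustrative kernel K(u) = 6u(1 − u) on (0, 1) (consistent with the support and K(0) = K(1) = 0 conditions used in the Appendix) and an arbitrary fixed bandwidth, neither of which is the paper's data-driven choice.

```python
import numpy as np

def spot_variance(dy, N):
    """Leave-one-out kernel spot variance: at each t, a weighted average of
    the squared differences dy[t-N..t-1], with weights proportional to
    K(s/N) = 6(s/N)(1 - s/N) on lags s = 1..N (illustrative kernel and
    bandwidth; the current observation is excluded)."""
    s = np.arange(1, N + 1)                  # lags 1..N only: leave-one-out
    k = 6.0 * (s / N) * (1.0 - s / N)
    w = k / k.sum()
    out = np.full(len(dy), np.nan)
    for t in range(N, len(dy)):
        out[t] = np.sum(w * dy[t - s] ** 2)  # squared diffs at t-1, ..., t-N
    return out

# illustration: variance doubles abruptly halfway through the sample
rng = np.random.default_rng(5)
n = 400
sigma = np.where(np.arange(n) < 200, 1.0, 2.0)
dy = sigma * rng.standard_normal(n)
v = spot_variance(dy, N=60)
print("avg estimate, low-vol regime :", round(float(np.nanmean(v[100:200])), 2))
print("avg estimate, high-vol regime:", round(float(np.nanmean(v[300:])), 2))
```

The leave-one-out construction (weights on lags 1 to N only) is what produces the one-period delay noted in the first monitoring exercise: a volatility spike at time t only enters the estimate from t + 1 onwards.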
We report results for two monitoring exercises performed on the Bitcoin data. In each case, the length of the training period is set to the same value as in the Monte Carlo simulations in Section 3. We first consider how a real-time monitoring exercise that began on July 15, 2017 and ended on August 19, 2017 would have played out, with the training period for this monitoring exercise given by data from December 8, 2016 to July 14, 2017. This monitoring period is identified by the first shaded area in Figure 9, with the solid and dashed black lines plotted within this period identifying the first points at which the CUSUM_V and CUSUM monitoring procedures, respectively, signal a rejection. The price of Bitcoin increases slightly from July 15 to July 20 and is then relatively flat until the end of July, at which point it begins to increase again until the end of the monitoring period. The estimated volatility path of the series shows a marked increase at the start of the monitoring period, before gradually declining over the remainder of the monitoring period. In this example, both the CUSUM and CUSUM_V monitoring procedures signal an explosive episode, with the CUSUM procedure first rejecting on July 20 and CUSUM_V first rejecting on August 11. In Section 3.1, we saw that upward volatility shifts can significantly increase the FPR of the standard CUSUM procedure relative to CUSUM_V. It is also worth noting that the spike in the volatility estimate on July 21 implies that the volatility of the series increased a great deal on July 20, which is precisely the date on which the CUSUM procedure first rejects.³ It therefore seems likely that in this example the CUSUM procedure signals an explosive episode much earlier than the CUSUM_V procedure because the former is running off a much higher FPR than the latter, owing to an increase in volatility at the start of the monitoring period.
That the CUSUM procedure signals an explosive episode at exactly the point where a large spike in volatility is observed, while CUSUM_V does not, further suggests that this may be a spurious detection.
For our second illustration, we investigate how a real-time monitoring exercise that began on August 30, 2017 and ended on November 8, 2017 would have played out, with the training period for this monitoring exercise given by data from January 23, 2017 to August 29, 2017. This monitoring period is identified by the second shaded area in Figure 9 with, again, the solid and dashed black lines plotted within this period identifying the first points at which the CUSUM_V and CUSUM monitoring procedures, respectively, signal a rejection. The price of Bitcoin declines slightly at the start of the monitoring period, but then increases from September 14 until the end of the monitoring period. The estimated volatility path of the series first increases until September 15, before dropping rapidly until October 9; volatility then increases again until October 13, before stabilizing for the remainder of the monitoring period. In contrast to our first monitoring exercise, the CUSUM_V procedure is the first to signal an explosive episode, on October 14, with the CUSUM procedure not rejecting until October 29. So, while in our first example an upward movement in volatility coincided with the CUSUM procedure rejecting first, here we observe the opposite outcome. In this scenario, the decline in volatility seen from mid-September until early October seems likely to have effected a lower rejection probability for the CUSUM procedure relative to CUSUM_V, consistent with the simulation results in Section 3.1 for the case of downward shifts in volatility, allowing the latter to deliver a far earlier rejection of the null. We also note that the results of the first monitoring exercise indicate that there may be an explosive episode present in the training period for this second monitoring exercise, which may also be contributing to the significant delay in CUSUM rejecting relative to CUSUM_V and would again be consistent with the simulation results reported in Figures 7 and 8.

³ The kernel smoothing estimator in Equation (7) imposes a leave-one-out construction, so any increase in volatility enters the volatility estimator with a one-period lag.

Conclusions
We have generalized the CUSUM-based real-time explosive episode detection procedure of Homm and Breitung (2012) to allow for the presence of time-varying volatility in the innovations. Such patterns were shown to cause potentially severe inflation in the true FPR of the CUSUM procedure. Our proposed modification involves replacing the first-difference estimator of the variance used in the CUSUM statistics with a Nadaraya-Watson-type nonparametric estimator. The resulting sequence of modified CUSUM statistics was shown to have a pivotal joint limiting null distribution coinciding with that of the sequence of standard CUSUM statistics under homoskedasticity, with the result that the theoretical FPR of the procedure is controlled. A discussion of the bandwidth and kernel choices associated with the nonparametric variance estimator was also provided, with a cross-validation choice recommended for the former, whereby the bandwidth is selected to minimize the estimation error of the spot variance over the most recent observations. Simulation evidence, for a variety of time-varying volatility processes, suggested that the FPR of the modified procedure is well controlled in finite samples. Where the innovations were homoskedastic, the power of the modified procedure to detect an emergent explosive episode was shown to be only slightly lower than that of the standard procedure. In contrast to both the standard CUSUM procedure and the procedure of Astill et al. (2018), the modified CUSUM procedure was also shown to be robust to the presence of explosive episodes in the training period. An application to a Bitcoin price series was used to illustrate the possible advantages of our proposed procedure relative to the standard CUSUM procedure, with our proposed procedure signaling the presence of an explosive episode sooner than the standard CUSUM procedure in a period of downward-transitioning volatility, and avoiding a potential early false rejection in a period of upward-transitioning volatility.

APPENDIX A: MATHEMATICAL PROOFS
Throughout the proofs, unless otherwise stated, we use max_j as shorthand notation for max_{T+1 ≤ j ≤ ⌊λT⌋}. We also denote by (σ²)′(·) the derivative of σ²(·).

A.1 Preparatory Lemmas
Lemma A1. Let the conditions of Theorem 1 hold. Then, under H_0,

max_j | Σ_{s=0}^{N} w_s σ²_{j−s} (ε²_{j−s} − 1) | = o_p(1).

Proof of Lemma A1. The lemma is proved using the Fourier-transformation-based method of Theorem 2.8 of Pagan and Ullah (1999), where we use the change of variable l = j − s. In fact we know that there are zero terms in the sum, as K is only non-zero on (0, 1), but we keep the sum free of the index j as a mechanism for deriving the max rate. Consider first the numerator of (A.1). For this, we have

K((j − l)/N) σ²_l (ε²_l − 1) = ∫ exp(itl) σ²_l (ε²_l − 1) exp(−itj) φ(tN) dt,

where we have used the change of variable s = tN. Thus, using Jensen's inequality, which gives E Z^{1/2} ≤ (E Z)^{1/2} for Z > 0, and since {ε²_l − 1} is a martingale difference sequence indexed by l, it follows from Burkholder's inequality (e.g., Shiryaev, 1996, p. 499) that, for a positive constant C,

E | Σ_{l=1}^{⌊λT⌋} cos(tl) σ²_l (ε²_l − 1) |² ≤ C E Σ_{l=1}^{⌊λT⌋} cos²(tl) σ⁴_l (ε²_l − 1)² = O(T),

by the uniform boundedness of the volatility function and the existence of the fourth moment of ε_l. A similar bound holds for the sine term.

Here u_{⌊τ₁T⌋} is the last observation in the unit root regime (and also serves as the initial value for the explosive regime). Since {e_j} is a martingale difference sequence, using Burkholder's inequality we have

E φ^{−2(j−⌊τ₁T⌋)} | e_j + φ e_{j−1} + ⋯ + φ^{j−⌊τ₁T⌋−1} e_{⌊τ₁T⌋+1} |² = O(1),

and it follows that φ^{−(j−⌊τ₁T⌋)} ( e_j + φ e_{j−1} + ⋯ + φ^{j−⌊τ₁T⌋−1} e_{⌊τ₁T⌋+1} ) = O_p(1) for any ⌊τ₁T⌋ + 1 ≤ j ≤ ⌊τ₂T⌋. Then, by Doob's maximal inequality for martingales, we also have

max_{⌊τ₁T⌋+1 ≤ j ≤ ⌊τ₂T⌋} | φ^{−(j−⌊τ₁T⌋)} ( e_j + φ e_{j−1} + ⋯ + φ^{j−⌊τ₁T⌋−1} e_{⌊τ₁T⌋+1} ) | = O_p(1).

An analogous bound is satisfied for the initial value of the explosive regime, so the effect of the initial point is dominant in Equation (A.3), and we have

max_{⌊τ₁T⌋+1 ≤ j ≤ ⌊τ₂T⌋} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | = O_p(1).   (A.4)

We first prove the result for ⌊τ₁T⌋ + N + 1 ≤ j ≤ ⌊τ₂T⌋.
When ⌊τ₁T⌋ + N + 1 ≤ j ≤ ⌊τ₂T⌋, since Δu_j = δ u_{j−1} + e_j in the explosive regime,

σ̂²_j = δ² Σ_{s=0}^{N} w_s u²_{j−s−1} + Σ_{s=0}^{N} w_s e²_{j−s} + 2δ Σ_{s=0}^{N} w_s u_{j−s−1} e_{j−s} =: D_1 + D_2 + D_3.

Notice that D_1 satisfies

max_{⌊τ₁T⌋+N+1 ≤ j ≤ ⌊τ₂T⌋} N T^{−1} φ^{−2(j−1−⌊τ₁T⌋)} D_1 = O_p(1).

For the upper bound, notice that

Σ_{j=⌊τ₁T⌋+1}^{t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | ≤ max_{⌊τ₁T⌋+1 ≤ j ≤ t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | Σ_{j=⌊τ₁T⌋+1}^{t} 1 = O_p(t − ⌊τ₁T⌋ − 1).

For the lower bound part of the proof, notice that

Σ_{j=⌊τ₁T⌋+1}^{t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | ≥ min_{⌊τ₁T⌋+1 ≤ j ≤ t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | Σ_{j=⌊τ₁T⌋+1}^{t} 1.

From Equation (A.4), it is known that | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | = O_p(1) for any ⌊τ₁T⌋ + N + 1 ≤ j ≤ t and is nondegenerate at 0. Therefore, O_p(t − ⌊τ₁T⌋ − 1) is also a lower bound rate, and the proof of the lemma is finished.

A.2 Proof of Lemma 1
First consider the decomposition

σ̂²_j − σ²_j = Σ_{s=0}^{N} w_s e²_{j−s} − σ²_j = Σ_{s=0}^{N} w_s σ²_{j−s} ε²_{j−s} − σ²_j = Σ_{s=0}^{N} w_s σ²_{j−s} (ε²_{j−s} − 1) + ( Σ_{s=0}^{N} w_s σ²_{j−s} − σ²_j ) =: A_{1,j} + A_{2,j},   (A.5)

where A_{1,j} and A_{2,j} are defined implicitly. By Lemma A1, we have max_j |A_{1,j}| = o_p(1). Next consider A_{2,j}:

max_j | Σ_{s=0}^{N} w_s σ²_{j−s} − σ²_j | = max_j | ∫_0^1 K(u) [ σ²((j − uN)/T) − σ²(j/T) ] du | + o(1),

where we have used the convergence of the sum to the Riemann integral; the approximation error clearly depends on N and is independent of j. Using the continuous differentiability of the σ(·) function, we have that

max_j | ∫_0^1 K(u) [ σ²((j − uN)/T) − σ²(j/T) ] du | ≤ C (N/T) ∫_0^1 K(u) du + o(1) = o(1),

by our assumption that N/T → 0. Taken together, these results establish that max_j | σ̂²_j − σ²_j | = o_p(1).

A.3 Proof of Lemma 2
Using the decomposition in Equation (A.5), we have σ̂²_j − σ²_j = A_{1,j} + A_{2,j}. The object of interest can therefore be written as

| (σ̂²_{j−1} − σ²_{j−1}) − (σ̂²_j − σ²_j) | = | (A_{1,j−1} − A_{1,j}) + (A_{2,j−1} − A_{2,j}) |.

Consider first the difference A_{1,j} − A_{1,j−1}:

A_{1,j} − A_{1,j−1} = Σ_{s=0}^{N} w_s σ²_{j−s} (ε²_{j−s} − 1) − Σ_{s=0}^{N} w_s σ²_{j−1−s} (ε²_{j−1−s} − 1),

where we have used the fact that K(0) = K(1) = 0. Because K is continuously differentiable over (0, 1), we can employ the mean value theorem to show that the foregoing expression becomes

[ (1/N) Σ_{s=1}^{N} K′(s_s) σ²_{j−s} (ε²_{j−s} − 1) ] / [ Σ_{s=0}^{N} K(s/N) ],

where s_s ∈ ((s − 1)/N, s/N). Using the same strategy as in Lemma A1, coupled with the absolute integrability assumption placed on the characteristic function of K′(·), an analogous bound is obtained. For the difference A_{2,j} − A_{2,j−1}, the mean value theorem, based on the differentiability of the σ²(·) function, yields terms evaluated at points s_{j−1−s} ∈ ((j − 1 − s)/T, (j − s)/T) and s_{j−1} ∈ ((j − 1)/T, j/T). By the Lipschitz assumption made on the (σ²)′(·) function, we have that max_j | A_{2,j} − A_{2,j−1} | is bounded by a term which is clearly o(1/T), because of our assumption that N/T → 0. Finally, we use the result that min_{⌊τ₁T⌋+1 ≤ j ≤ t} | N^{1/2} T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} σ_{j,N} | is nondegenerate at 0, from Lemma A2, together with the result of Lemma A3. Using the same argument, we can show that B_{t2} is dominated by B_{t1} in order, so the order of B_t is determined by B_{t1}.
Next we derive a lower bound for the divergence rate of B_{t1}. Notice that |B_{t1}| also satisfies

|B_{t1}| ≥ δ min_{⌊τ₁T⌋+1 ≤ j ≤ t} ( 1 / | N^{1/2} T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} σ_{j,N} | ) N^{1/2} Σ_{j=⌊τ₁T⌋+1}^{t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} | = [ δ / max_{⌊τ₁T⌋+1 ≤ j ≤ t} | N^{1/2} T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} σ_{j,N} | ] N^{1/2} Σ_{j=⌊τ₁T⌋+1}^{t} | T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} u_{j−1} |.

From Lemma A2, we have max_{⌊τ₁T⌋+1 ≤ j ≤ t} | N^{1/2} T^{−1/2} φ^{−(j−1−⌊τ₁T⌋)} σ_{j,N} | = O_p(1), and, using the result of Lemma A3, it follows that B_t diverges at a rate at least as fast as O_p(N^{1/2}(t − ⌊τ₁T⌋ − 1)). Now, since A_T = O_p(T^{1/2}) and clearly does not grow with t, B_t dominates under the alternative, and the derived divergence rate N^{1/2}(t − ⌊τ₁T⌋ − 1) is clearly higher than the boundary function c_t √t; the claim of the proposition follows.