R-estimators in GARCH models; asymptotics and applications

Summary The quasi-maximum likelihood estimation is a commonly-used method for estimating the GARCH parameters. However, such estimators are sensitive to out-liers and their asymptotic normality is proved under the ﬁnite fourth moment as- sumption on the underlying error distribution. In this paper, we propose a novel class of estimators of the GARCH parameters based on ranks of the residuals, called R- estimators, with the property that they are asymptotically normal under the existence of a ﬁnite 2+ δ moment of the errors and are highly eﬃcient. We propose fast algorithm for computing the R-estimators. Both real data analysis and simulations show the superior performance of the proposed estimators under the heavy-tailed and asymmetric distributions.


Robust estimation based on ranks
Consider observations {X t ; 1 ≤ t ≤ n} from a financial time series with the following representation where { t ; t ∈ Z} are unobservable i.i.d. non-degenerate error r.v.'s with mean zero and unit variance and with ω 0 , α 0i , β 0j > 0, ∀ i, j. In the literature, such models are known as the GARCH (p, q) model and we assume that {X t ; t ∈ Z} is stationary and ergodic. Estimation of parameters based on ranks of the residuals was discussed by Koul and Ossiander (1994) for the homoscedastic autoregressive models and Mukherjee (2007) for the heteroscedastic models. Andrews (2012) proposed a class of R-estimators for the GARCH model using a log-transformation of the squared observations and then minimizing a rank-based residual dispersion function. However, such square-transformation, not being one-to-one transformation, may lead to loss of information under an asymmetric error distribution as demonstrated later in this paper. This motivates us to define R-estimators for the GARCH model that uses the data directly without requiring such transformation.
Robust estimation of the GARCH parameters has been studied extensively in the literature although the attention has been focused exclusively on the class of M-estimators. See, for example, Berkes and Horváth (2004), Mukherjee (2008), Francq et al. (2011), Fan et al. (2014) and Zhu and Ling (2011) and the references in those papers. Some simulation study by Fan et al.(2014, Section 7.2) to compare M-estimators with the Restimators proposed by Andrews (2012) revealed that these two classes of estimators have almost indistinguishable asymptotic performance while the rank-based estimators are slightly better and this provides another motivation for considering R-estimators. As shown in the motivating example of Section 1.2 and Table 2, our proposed R-estimators are highly efficient for symmetric error distributions while it can be more efficient than those of Andrews (2012) when the error distribution is asymmetric.
Since the M-estimators in GARCH models are obtained by maximizing a pseudolikelihood of the data directly without requiring any (squared) transformation of the observations, we define R-estimators without transforming the original data. Consequently, similar to the linear regression and autoregressive models, the asymptotic normality of the R-estimators are derived under smoothness conditions on the error probability density function (pdf) while assumptions in Andrews (2012) are related to the pdf of the squared and logged error which are the errors of a transformed model. However, unlike for the R-estimators defined by Andrews (2012), the limiting covariance matrices of our proposed class of R-estimators do not have explicit forms and consequently, we can employ, for example, bootstrap methods to approximate their distributions.
Unlike the commonly-used quasi-maximum likelihood estimator (QMLE) which is asymptotically normal under the finite fourth moment assumption of the error distribution, the proposed class of R-estimators turn out to be asymptotically normal under the assumption of only a finite 2 + δ-th moment for some δ > 0. The efficiency property of the R-estimators is studied based on the simulated data from the GARCH (1, 1) model. Analysis of real data shows that the numerical values of R-estimates can be different from the QMLE and the subsequent analysis of the GARCH residuals shows that such difference may be attributed to the infinite fourth moment of the error distribution, which leads to the failure of the QMLE.
One conspicuous issue with previous studies is related to the estimation of an identifiable scale parameter that leads to often more than one stage of estimation. As pointed out by Fan et al. (2014, Section 7.2), such estimation is important for comparing the bias performance. Preminger and Storti (2017) also pointed out the importance of estimating the scale parameter although such estimates may not always be needed in some applications such as estimating the 'Value at Risk'. In Section 2.3 of this paper, we propose a simple consistent estimate of the scale based on R-estimators and the general principle can be applied to the M-estimation as well; see Andrews (2012, Remark 9) and Preminger and Storti (2017, Section 3) for other estimates of the scale.
The main contributions of the paper are as follows. First, a new class of robust and efficient estimators for the GARCH model parameters is proposed. Second, the asymptotic distributions of the proposed estimators are derived based on weak assumption on the error moments. Third, we propose an algorithm for computing the R-estimators which is computational friendly and easy to implement.

A motivating example
To illustrate the advantages of our proposed R-estimator over the commonly-used QMLE and R-estimator of Andrews (2012) (BA, henceforth), we consider below some simulation results corresponding to the GARCH (1, 1) model with underlying standardized error density being (i)the standard normal distribution and (ii) the skew normal distribution (see e.g. Azzalini and Dalla Valle (1996) for details of such distribution). We generate R = 1000 samples of size n = 1000 with parameter values (ω, α, β) = (6.50 × 10 −6 , 0.177, 0.716) as in Section 3.2. Simulation results described below are similar to various other choices of the true parameters. To make a fair comparison, for both R-estimators we use the van der Waerden score (vdW, henceforth); our vdW score is given in Section 2.4 while BA's one has a different form; see Andrews (2012, Section 3). The resulting boxplots of all estimators are displayed in Figure 1 under the skew normal (upper panel) and standard normal (lower panel) error densities and the MSE ratios of the QMLE over other estimators are reported.
An inspection of these plots reveals superiority of our R-estimator over the QMLE and BA's. Under the normal error distribution, the distribution patterns of the R-estimators are quite similar to the QMLE around the true parameter value, and the MSE ratios of the QMLE over the R-estimators are all close to one. However, under the skew normal errors where asymmetry is present, our proposed R-estimator has the least dispersion and with a gain of around 44%-92% efficiency over the QMLE, while the BA's R-estimator has around 12%-23% gain in efficiency over the QMLE. Therefore, although the BA's R-estimator achieves efficiency similar to ours under the normal distribution, it is not as efficient as ours under an asymmetric distribution.

Outline of the paper
The rest of the paper is organized as follows. Section 2 defines a class of R-estimators of scaler-transformed parameters based on an asymptotic linearity result of a rank-based central sequence. A consistent estimator of the unknown scalar c ϕ defined by (2.4) is provided and the asymptotic distribution and efficiency of the R-estimator are discussed. We describe an algorithm for computing the R-estimators. Section 3 contains empirical and simulation results of the R-estimators. Conclusion is given in Section 4. The technique used to establish the asymptotic distribution is included in the online Appendix A. The Appendix contains some additional numerical results.

THE CLASS OF R-ESTIMATORS FOR THE GARCH MODEL
In this section, we first define a central sequence based on ranks of the residuals of the GARCH model. We prove the asymptotic uniform linear expansion (2.7) of this central sequence, which enables us to define one-step R-estimatorθ nϕ in (2.9) as a rootn consistent estimator of a scale-transformed GARCH parameter θ 0ϕ = (c ϕ ω 0 , c ϕ α 01 , ..., c ϕ α 0p , β 01 , ..., β 0q ) (2.1) with a constant c ϕ > 0 satisfying (2.5). Based onθ nϕ and {X t }, we are able to derive a consistent estimator of c ϕ . We discuss some computational aspects and propose a recursive algorithm for computation in Section 2.5. NOTATIONS: Throughout the paper, for a function g, we useġ andg to denote its first and second derivatives whenever they exist. We use c, b, c 1 to denote positive constants whose values can possibly change from line to line. Let be a generic random variable (r.v.) with the same distribution as { t } and let F and f denote the cumulative distribution function (c.d.f.) and probability density function (p.d.f.) of , respectively. Let η t := t / √ c ϕ and η be a generic r.v. with the same distribution as {η t }. Let G and g be the c.d.f. and p.d.f. of η, respectively. A sequence of stochastic processes {Y n (·)} is said to be u P (1) (denoted by Y n = u P (1)) if for every c > 0, sup{|Y n (b)|; ||b|| ≤ c} = o P (1), where || · || stands for the Euclidean norm.

Define the variance function by
where the coefficients {c j (θ); j ≥ 0} are given in (3.1) of Berkes et al. (2003) with the property c j (θ 0 ) = c j , j ≥ 0, so that the variance functions satisfy Let H * (x) = x{−ḟ (x)/f (x)}. From Berkes and Horvath (2004, Example 2.4) and Mukherjee (2008Mukherjee ( , p. 1534, the maximum likelihood estimator (MLE) is a solution of ∆ n,f (θ) = 0, where . However, f in H * is usually unknown and we therefore consider an approximation to ∆ n,f (θ). Let ϕ : (0, 1) → R be a score function satisfying some regularity conditions which will be discussed later. Examples of ϕ are given in Section 2.4. Let R nt (θ) denote the rank of In linear regression models, the MLE has the same asymptotic efficiency as an R-estimator based on the score function ϕ(u) = −ḟ (F −1 (u))/f (F −1 (u)). For the estimation of the scale parameters, a correspondence to the MLE is through the central sequence .

One-step R-estimators and their asymptotic distributions
To define the R-estimator in terms of the classical Le Cam's one-step approach as in Hallin and La Vecchia (2017) and Hallin et al. (2020), we derive the asymptotic linearity of the rank-based central sequence under the following assumptions. Let c ϕ > 0 be defined by The following conditions on the distribution of η t are assumed for the proof of Theorem A.1 in Appendix on the approximation of a scale-perturbed weighted mixedempirical process by its non-perturbed version.
We remark that Assumption (i) entails that µ(x) is uniformly Lipschitz continuous in scale in the sense that for some constant 0 < c < ∞ and for every s ∈ R, we have sup x∈R |µ(x + xs) − µ(x)| ≤ c|s|.
In particular, Assumptions (i), (ii) and (iii) hold for a wide range of error distributions, including normal, double-exponential, logistic and t-distributions with degrees of freedom more than 2 which are considered for simulation study.
We also need the following assumptions on the parameter space and the score function ϕ.
ASSUMPTION 2.2. Denoting by Θ 0 the set of interior points of Θ, we assume that θ 0 , θ 0ϕ ∈ Θ 0 . ASSUMPTION 2.3. The score function ϕ is non-decreasing, right-continuous with only a finite number of points of discontinuity and is bounded on (0, 1).
We now compare our assumptions with those made by Andrews (2012), to be called as BA1, BA2 etc. Assumption BA1 states the stationarity and ergodicity of {X t } similar to what we assume but not necessarily finiteness of the variance of X t . However, estimation of the intercept parameter α 00 appearing in Andrews (2012, Equation (2.2)) is not considered there. In this paper, we estimate the intercept parameter under the finite variance assumption of X t as in Assumption A4 below. Higher moment assumptions were also made in Fan et al. (2014) while estimating the equivalent scale parameter η f .
Assumptions BA2 and BA3 are related to the uniqueness of parameters and the existence of the non-degenerate errors and observations. We assume non-degenerate models as is common in the literature. Assumptions BA4 and BA7 are on the score functions which are bounded, non-decreasing and left-continuous. In Assumption 2.3, we assume that the score functions are bounded, non-decreasing and right-continuous with finite number of discontinuity. Assumptions BA5 and BA6 are on the cdf and pdf of the log of the squared error distribution. We have made analogous assumptions on the error pdf itself and uniform continuity on the µ −1 -transformed axis in Assumption 2.1(i) and (ii); the later came up as a part of the technical assumptions in using some convergence results of empirical processes to derive the asymptotic distribution of the R-estimators. Assumptions BA8, BA9 and BA10 describe various scenarios related to some of the component parameters equal to zero as in Francq and Zakoian (2007). In Assumption 2.2, we assume that all parameters are positive as in Berkes and Horváth (2004), Mukherjee (2008) and Francq et al. (2011) in relation to the M-score and this corresponds to BA8.
To state the asymptotic linearity ofR n (θ), we introduce the following notation. Let is the standard Brownian bridge. Then Z has mean zero and variance γ(ϕ); see the proof in Lemma A.5 for details. Let G n (x), x ∈ R be the empirical distribution function of {η t } (which is unobservable), The following proposition states the asymptotic uniform linearity ofR n (θ).
The above asymptotic linearity allows us to define a class of R-estimators through the one-step approach. Let {Υ n } be a sequence of consistent estimators of Υ ϕ,g (θ 0ϕ ) := (1/2 + ρ(ϕ)/2)J (θ 0ϕ ); see Section 2.5 for a construction ofΥ n . Letθ n be a root-n consistent estimator of θ 0ϕ and, for technical reasons, we assumeθ n is asymptotically discrete. More precisely, a sequence {θ n } is called discrete if there exists K ∈ N such that independent of n ∈ N,θ n takes on at most K different values in see Kresis (1987, Section 4) for details. We remark that here asymptotically discreteness is only of theoretical interest since in practiceθ n always has a bounded number of digits; see Le Cam and Yang (2000, Chapter 6) and van der Vaart (1998, Section 5.7) for more details. Then the one-step R-estimator is defined aŝ (2.9) Note that strictly speaking, the R-estimators based on this definition are not functions of the ranks of the residuals only. However, we borrow the terminology from the regression and the homoscedastic-autoregression settings and still call them (generalized) R-estimators. When, for example, ϕ(u) = u − 1/2,θ nϕ is an analogue of the Wilcoxon type R-estimator.
The following theorem shows that the R-estimator defined in (2.3) is √ n-consistent estimator of θ 0ϕ . The proof is given in Appendix A.

Estimation of c ϕ
To obtain a consistent estimator of θ 0 , we estimate the unknown scalar c ϕ under the following mild moment assumption on X t .
From Bollerslev (1986, Theorem 1), a sufficient condition for the GARCH model to be second-order stationary (hence to satisfy Assumption 2.4) is Using the ergodicity property, E(X 2 t ) is estimated from the data by X 2 n := n −1 n t=1 X 2 t .
Also, c ϕ ω 0 and c ϕ α 0i are estimated byω nϕ andα nϕi , respectively. Solving the following equation for c we obtain an estimate of c ϕ aŝ nϕi . (2.11) Consequently writeθ nϕ in its component-wise form ..,β nq and letθ The following theorem states thatĉ nϕ is a consistent estimator of c ϕ and thusθ n is a rootn consistent estimator of θ 0 with asymptotically normal distribution; see Appendix A for its proof.
THEOREM 2.2. Let Assumptions 2.1-2.4 hold. Then, as n → ∞, Hence as n → ∞, √ n θ n − θ 0 is normal with mean 0 and covariance matrix Note that by making Assumption 2.4 and following similar approach as in the proof of Theorem 2.2, one can also obtain consistent estimators of unknown scalars for robust estimators in GARCH models (e.g., M-estimators of Liu and Mukherjee (2020) and Restimator of Andrews (2012)). For illustration, recall that the R-estimator of Andrews (2012) is root-n consistent estimator of where ω 0 is the intercept parameter that plays the role of the unknown scalar. Denoting the R-estimator of Andrews (2012) bŷ θ BA;n := (â BA;n1 , ...,â BA;np ,β BA;n1 , ...,β BA;nq ) , a consistent estimator of ω 0 can be obtained aŝ We remark that from Theorem 2.2, the asymptotic covariance matrix ofθ n has a complicated form. Hence we can consider, for example, bootstrap methods to approximate the limit distribution of √ n θ n − θ 0 .

Examples of the score functions
Below we cite examples of three commonly-used R-scores; for similar examples of scores in other models, see Mukherjee (2007) and Hallin and La Vecchia (2017).
Example 1 (sign score). Let ϕ(u) = sign(u − 1/2). Then for symmetric error distribution, c ϕ = (E| |) 2 , which coincides with the scale factor of the LAD estimator in Mukherjee (2008). Therefore, the sign R-estimator is expected to be close to the LAD estimator. This is demonstrated later in the real data analysis.
Example 3 (van der Waerden (vdW) or normal score). One might also set ϕ(u) = Φ −1 (u), with Φ(·) denoting the c.d.f. of the standard normal distribution. Notice that unlike the sign and Wilcoxon score, the vdW score is not bounded as u → 0 and u → 1. It thus does not satisfy Assumption 2.3. However, an approximating sequence of bounded score functions of ϕ on (0, 1) can be constructed as in Andrews (2012). For example, letting ϕ m (u) satisfies Assumption 2.3 and converges pointwise to the vdW score on (0, 1). It is demonstrated later using both real data analysis and extensive simulation that the vdW has superior performance compared with the QMLE.
We now provide heuristics for the definition of the R-estimator in (2.2). When the underlying error distribution is known, one can obtain efficient R-estimator by choosing the score function as ϕ(u) = −ḟ (F −1 (u))/f (F −1 (u)). Since for large n, the empirical distribution function R nt (θ 0ϕ )/(n + 1) of . Therefore, the criteria function of the R-estimator gets close to the MLE which is efficient. This leads to the choice of the vdW, sign and Wilcoxon under the normal, double exponential (DE) and logistic distributions, respectively. This is observed later in simulation study of the R-estimator.

Computational aspects
Here we discuss some key computational aspects and propose an algorithm to computê θ nϕ andθ n . Codes are available from the authors' GitHub page https://github.com/ HangLiu10/RestGARCH. First, since c ϕ depends on the unknown density f , it is difficult to have a √ n-consistent initial estimatorθ n of θ 0ϕ . One possible choice of the initial consistent estimatorθ n is the two-stage type least squares estimator as proposed by Preminger and Storti (2017). In practice, due to the finite sample size, the one-step procedure in (2.9) is usually iterated a number of times, takingθ nϕ as the new initial estimate, until it stabilizes numerically. This iteration process would mitigate the impact of different initial estimates; see van der Vaart (1998, Section 5.7) and Hallin and La Vecchia (2017) for similar comments. In fact, we observed during our extensive simulation study that irrespective of the choice of the QMLE, LAD or θ 0 as initial estimates, only few iterations result in the same estimates.
Second, to computeθ nϕ of (2.9), we needΥ n which is a consistent estimator of (1/2 + ρ(ϕ)/2)J (θ 0ϕ ). The matrix J (θ 0ϕ ) can be consistently estimated bŷ For estimating ρ(ϕ) which is a function of the density g, we can use the asymptotic linearity in (2.7). Here with an arbitrarily chosen b, we can substituteθ n for θ 0ϕ and then solve the equation for ρ(ϕ) based on (2.7). A more delicate approach for estimating ρ(ϕ) can be found in Cassart et al. (2010) and Hallin and La Vecchia (2017, Appendix C). Based on our extensive simulation study and real data analysis, it appears that different values of ρ(ϕ) would finally lead to same estimate after some iterations. Consequently, we set ρ(ϕ) = 1 during the computation which is the value corresponding to the vdW score under the normal distribution. In summary, we propose the following iterative Algorithm 1 to computeθ nϕ , with which we can obtainĉ ϕ using (2.11) and henceθ n .
Algorithm 1: R-estimation for GARCH models Input: a sample {X t ; 1 ≤ t ≤ n}, orders p and q of the GARCH process, number k of iterations in the one-step procedure. Output: R-estimatorθ n Step 1. Compute a preliminary root-n consistent estimatorθ n and setθ nϕ =θ n .

REAL DATA ANALYSIS AND SIMULATION RESULTS
This section examines the performance of the R-estimators and compare them with the QMLE by analyzing three financial time series and by carrying out extensive Monte Carlo simulation.

Real data analysis
In this section we fit GARCH (1, 1) model to three financial time series and compare the proposed three R-estimators with the M-estimators QMLE and LAD discussed in Mukherjee (2008), where the unknown scalar of the LAD can also be estimated by (2.11).
In an earlier work, Muller and Yohai (2008) fitted the the GARCH (1, 1) model to the Electric Fuel Corporation (EFCX) time series for the period of January 2000 to December 2001 with sample size n = 498. The parameters of the model are estimated by M-estimators based on various score functions. It turned out that the M-estimates of the parameter β differ widely depending on the score functions and so it is difficult to assess which estimate should be relied on in similar situations. Here we compare various M-estimates and R-estimates of the GARCH (1, 1) parameters for the EFCX series again shedding light on which could be some possible reasons for the difference in estimates and finally which estimation methods can be relied upon. We also compare M-estimates of the GARCH (1, 1) parameters when fitted to two other dataset, namely, the S&P 500 stock index from June 2013 to May 2017 with n = 1005 and the GBP/USD exchange rate from June 2013 to May 2017 with n = 998 to illustrate that the M-and R-estimates do not differ widely when the underlying theoretical assumptions hold in general.
In Table 1, we report the QMLE computed using the fGarch package in R program, the M-estimates QMLE and LAD and the R-estimates proposed in Examples 1-3 of Section 2.4. For the EFCX data, the R-estimates for all score functions are quite close to the LAD estimate, but they are very different than the QMLE. On the contrary, for the S&P 500 and GBP/USD data, M-and R-estimates are close to each other.
To investigate why the QMLE is different from the other R-estimates and LAD for the EFCX data, we check the assumption E 4 < ∞ for this data by using the QQ-plots of the residuals (based on the vdW R-estimates) against t distributions. We consider the vdW score only because the R-estimates based on two other score functions and the LAD are close to the vdW estimates. For comparison, we have also provided QQ-plots for the S&P 500 data. The main idea behind the QQ-plots of the residuals against the t(d) distribution is simple: Suppose ∼ t(d) distribution; then E| | ν < ∞ if and only if ν < d. Hence, residuals with heavier tail than the t(d) distribution correspond to the errors with the infinite d-th moment while those with thinner tail than the t(d) distribution have the finite d-th error moment.
The top-left panel of Figure 1 (see Appendix C) shows the QQ-plot of the residuals against the t(4.01) distribution for the EFCX data. The residuals have heavier right tail than the t(4.01) distribution which implies that the fourth moment of the error term may not exist. On the other hand, the QQ-plot against the t(3.01) distribution reveals lighter tail as shown at the bottom-left panel of Figure 1 and this implies that E| | 3 < ∞.
For the S&P 500 data, the QQ-plot against t(4.01) distribution at the top-right panel of Figure 1 shows that the residuals have lighter tails than t(4.01) distribution. For the QQ-plot against t(6.01) distribution, as shown at the bottom-right panel of Figure 1, the residuals fit the distribution better. Therefore, we may conclude that E| | 4 < ∞ holds for the S&P 500 data.

Simulation study of the R-estimators
We now evaluate the performance of the R-estimators based on simulated data from various error distributions in the GARCH (1, 1) model. Let R denote the number of replications andθ nr = (ω r ,α r1 , ...,α rp ,β r1 , ...,β rq ) denote the R-estimator computed from the r-th data, 1 ≤ r ≤ R. We throughout compare the R-estimators with the QMLE by using the bias and MSE based on averages over 1 ≤ r ≤ R. We also compare the relative efficiency of the R-estimators wrt the QMLE under a finite sample size, as an estimate of the ARE, by using the formula Simulation for the GARCH (1, 1) model. Here we run simulation with R = 500, n = 1000 and θ 0 = (6.50 × 10 −6 , 0.177, 0.716) , where our choice of θ 0 is motivated by the estimate given by the fGarch package for the S&P 500 data in Table 1. The estimates of the bias and MSE of the R-estimators and QMLE under various symmetric error distributions are reported in Table 2, where the estimates of the ARE with respect to the QMLE are shown in the parentheses. When a data is generated under the t(3) error distribution, the fourth moment of the error is infinite and consequently the algorithm for computing the QMLE does not converge for many replications, while Algorithm 1 for computing the R-estimators always converges. Therefore, the bias and MSE are obtained using the data of those replication cases when the QMLE converges.
To explain our simulation results in Table 2 corresponding to the vdW score, recall the classical Chernoff and Savage (1958) phenomenon which appears in linear regression models as discussed in Jurečková and Sen (1996, Section 3.4) and linear autoregressive models as discussed in Mukherjee and Bai (2002). Accordingly, the ARE of the R-estimator (based on the unbounded vdW score ϕ(u) = Φ −1 (u)) with respect to the least squares estimator is 1 at the standard normal error and more than 1 for all other symmetric error distributions. In this paper we consider a bounded approximating score of the vdW score in the nonlinear GARCH model where the QMLE is analogous to the least squares estimator. Simulation results in Table 2 reflects that phenomenon closely. In particular, the vdW achieves almost the same efficiency as the QMLE under the nor-mal distribution and it is more efficient under symmetric heavier-tailed distributions. In general, the sign score is most efficient under the DE and t(3) distributions, while the Wilcoxon score is optimal under the logistic distribution. Under the t(3) distribution with infinite fourth moment, the R-estimators yield smaller bias and significantly smaller MSE than the QMLE.
For a demonstration of our asymptotic result with increasing sample size, we have reported simulation results for larger sample sizes n = 3000 and n = 5000 under t(3) error distribution in Table C.1 of the Appendix. The QMLE failed to converge even for large sample size as the fourth moment was not finite; for example, with n = 5000, the QMLE did not converge for around 8% of replications. From Table C.1, when n increases, the performance of the R-estimators becomes even better in terms of both the ARE and MSE.

CONCLUSION
We propose a new class of R-estimators for the GARCH model and derive the asymptotic normality of these estimators under mild moment and smoothness conditions on the error distribution. We exhibit the robustness and efficiency of R-estimators with respect to the QMLE through simulation and real data analysis. To approximate the distribution of R-estimators, we can consider, for example, a general type of weighted bootstrap for such estimators which is computational-friendly and easy-to-implement. The theoretical analysis such as the asymptotic validity of the weighted bootstrap is an interesting but challenging problem that can be explored in the future. A 'Value at Risk' backtesting exercise could be performed to indirectly compare the predictive performances of various estimators under consideration.
Jurečková, J. and P. Sen (1996).  We will use the following facts from Berkes et al. (2003) for the proofs: FACT 1. For any ν > 0, FACT 2. There exist random variables Z 0 , Z 1 and Z 2 , all independent of { t ; 1 ≤ t ≤ n} and a number 0 < ρ < 1, such that FACT 3. Let {(A t , B t , C t ); t ≥ 0} be a sequence of identically distributed random variables. If E log + A 0 + E log + B 0 + E log + C 0 < ∞, then for any |r| < 1, Idea of the proof of Theorem 2.1. We first derive the following Theorem A.1, Corollary A.1.1 and Theorem A.2 on empirical processes where a scale-perturbed weighted mixed-empirical process is approximated by its non-perturbed version. With θ nϕ = θ 0ϕ + n −1/2 b, we derive asymptotic expansion of the difference between two quantities T 1n (θ nϕ ) and T 2n (θ nϕ ) which are defined later. We then show that T 1n (θ nϕ ) can be approximated by a r.v., which is asymptotic normal, plus a term linear in b. Also, we use T 2n (θ nϕ ) to approximate R n (θ nϕ ) and show that asymptotically their difference is a r.v. with mean zero. Finally, we prove that the difference of R n (θ nϕ ) and R n (θ nϕ ) converges in probability to zero. Using these results, we are able to derive the asymptotic linearity of R n (θ nϕ ) as shown in Proposition 2.1. Finally, using the definition of the one-step R-estimator in (2.9), we are able to derive the asymptotic distribution of θ nϕ .
Let C n := n t=1 E|γ nt | q for some q > 2. Let a with 0 < a < q/2 be such that The following theorem shows that uniformly over the entire real line, the perturbed processŨ n can be approximated by U * n .
THEOREM A.1. Under the above set-up and Assumptions (A.6)-(A.12) and Assumption 2.1, Proof. The proof is similar to the proof in Mukherjee (2007, Theorem 6.1). In particular, we show point-wise convergence for each x and then invoke the monotone structure of the mean processes to achieve the uniform convergence. For weighted empirical, the monotonically increasing mean process is given by the distribution function. Although µ in the present case is not a monotone function on (−∞, ∞), we use its monotone property separately on (−∞, 0] and [0, ∞).
We remark that this theorem is different from Koul and Ossiander (1994, Theorem 1.1) and Mukherjee (2007, Theorem 6.1) where weighted empirical processes were considered for the estimation of the mean parameters. For the estimation of the scale parameters, in this paper we consider weighted mixed-empirical process which is a weighted sum of the mixture of error and its indicator process.
The following corollary describes a Taylor-type expansion of the weighted sum of indicator functionsṼ n (x). Hence, Proof. Here (A.15) follows from (A.14) and (A.13). Therefore, it remains to prove (A.14).
Notice that the LHS of (A.14) equals =o P (1) due to (A.12) and Assumption 2.1.
The next theorem provides an extended version of (A.13) when the weights are functions on appropriately scaled parameter space. We define the following processes of two arguments as follows.
Probabilistic framework: Let {η t , 1 ≤ t ≤ n} be i.i.d. with the c.d.f. G, {l nt ; 1 ≤ t ≤ n} be an array of measurable functions from R m to R such that for every b ∈ R m and 1 ≤ t ≤ n, (l nt (b), u nt (b)) are independent of η t . For x ∈ R and b ∈ R m , let Here U * (·, ·) is a sequence of ordinary non-perturbed weighted mixed-empirical processes with weights {l nt (·)} andŨ(·, ·) is a sequence of perturbed weighted mixed-empirical processes with scale perturbations {u nt (·)}. In Theorem A.2 below it is shown thatŨ can be uniformly approximated by U * under the following conditions (A.16)-(A.24) for {l nt (·)} and {u nt (·)}. Note that the statements on assumptions and convergence hold point-wise for each fixed b ∈ R m .
There exist numbers q > 2 and a (both free from b) satisfying 0 < a < q/2 such that For some positive random process (b), ∀ b and ε > 0, ∃ δ > 0, and n 1 ∈ N whenever s ≤ b, and n > n 1 , (A.23) ∀ b and ε > 0, ∃ δ > 0, and n 2 ∈ N whenever s ≤ b, and n > n 2 , The uniform convergence with respect to b over compact sets can be proved as in Mukherjee (2007, Lemma 3.2) using conditions (A.23) and (A.24).
The following facts are useful in the proofs of various results of this paper. Let m = 1 + p + q be the total number of parameters and fix b ∈ R m . Let θ nϕ = θ 0ϕ + n −1/2 b, for some θ * = θ * (n, t, b) in the neighbourhood of θ 0ϕ for large n. The n −1/2 -factor is used later for deriving convergence of some sequence of random vectors. Similarly, for some θ * , For δ > 0 in Assumption 2.1 and any c > 0, since all moments of {|ξ nt |} are finite and η t and ξ nt are independent for all t. Therefore max Ifv t (θ nϕ )/v t (θ nϕ ) appears as the coefficients, we replace it byv t (θ 0ϕ )/v t (θ 0ϕ ) and the difference is controlled as follows. Notice that all derivatives below exist with bounded moments and sov Only the term n −1/2 A t (θ 0ϕ )b is of our interest since others are of higher order than n. Take l nt (b) to be equal to the j-th coordinate (1 ≤ j ≤ m = 1 + p + q) of and u nt (b) as in (A.26). We now show that (A.17)-(A.24) hold with such choice.
For each t with 1 ≤ t ≤ n, {L nt (b), u nt (b)} are independent of η t . Using a Taylor expansion of l nt (b) at θ 0ϕ for each 1 ≤ t ≤ n and noting the existence of all moments of v t (θ 0ϕ ) and its derivatives of all higher orders, (A.17) and (A.18) hold. Existence of all higher moments of {l nt (b), u nt (b)} ensure conditions (A.19)-(A.21).
To verify (A.22), we use (A.27) and that for each t, v t (·) is a smooth function with derivatives of all order to conclude that Conditions (A.23) and (A.24) can be verified using the mean value theorem.
The following lemmas and their proofs represent the intermediate steps in the proofs of Proposition 2.1 and Theorem 2.1.
The following lemma states that the difference between T n1 (θ nϕ ) and N n (θ 0ϕ ) is asymptotically linear in b.
Using the independence of v t and η t for each t, N n (θ 0ϕ ) is a sum of the vectors of martingale differences and so (A.35) follows from the martingale CLT. Now consider the rank-based counterpart of T n2 (θ nϕ ) The following lemma provides the difference between T n2 (θ nϕ ) and R n (θ nϕ ). It shows that the effect of replacing observations in T 2n (θ nϕ ) by ranks is asymptotically a r.v. with mean zero.
Proof. Consider the following decomposition Using the n −1/2 -factor in (A.30) and (A.28), D n1 , D n2 and D n4 are u P (1). We next prove that D n3 = Q n (θ 0ϕ ) + u P (1) in detail. Recall thatG n (x) is the empirical distribution function of {η t }. Let G n (x), x ∈ R be the empirical distribution function of n1 is the weighted sum of the difference of a c.d.f. evaluated at two different r.v.'s and integrated wrt ϕ, using the same technique for proving (A.33), Write w t =v t (θ 0ϕ )/v t (θ 0ϕ ). Since D * n2 is the weighted sum of the difference of two different c.d.f.'s evaluated at two different r.v.'s and integrated wrt ϕ, Using (A.29), sup G −1 n (u) −G −1 n (u) ; u ∈ (0, 1) = u P (1). Hence, by Theorem A.2, Finally consider D * n3 written as For (A.37), note that {M n (.)} converges weakly to a Brownian Bridge since for each fixed u, M n (u) converges to a normal distribution using the martingale CLT and it is tight using the bound on the moment of the difference process in Billingsley (1968, Theorem 12.3).
LEMMA A.6. Let Assumptions 2.1-2.3 hold. Then, as n → ∞, Proof. Note that R n (θ nϕ ) − R n (θ nϕ ) equals Hence, in view of (A.1) and (A.4), for every 0 < b < ∞, which implies that (A.41) is u P (1). Since ϕ is bounded, (A.42) is u P (1). For (A.43), since there is a n −1/2 factor fromv it suffices to prove that Let x denotes the greatest integer less than or equal to x. We split the sum in K n into two parts: in the first part, t runs till n k − 1 where 0 < k < 1/2. We show that this part is u p (1) by noting that its expectation is of the form n k n −1/2 = o(1) multiplied by expectation ofv t (θ 0ϕ )/v t (θ 0ϕ )η t and a bounded quantity because ϕ is bounded. The number of summands in the second term is n − n k which is large but there we bound expectation of the sum of by a quantity of the form nρ n k with 0 < k < 1/2 and 0 < ρ < 1 and this is o(1). Accordingly To show (A.45) is u P (1), we prove that for every 0 < b < ∞, For n k ≤ t ≤ n, we decompose ranks as where the first sum is O P (n k−1 ). For the second sum, writing , the modulus of (A.48) is bounded above by sup x∈R ||b||<b 1 n + 1 n i= n k where the set A i,x,θ is defined as Therefore, it suffices to prove that n i= n k I(A i,x,θ ) = o P (1) uniformly with respect to both x and θ. We show this with sets containing A i,x,θ .
Recall that using (A.2), v t (θ) ≥ c 0 (θ) > c for a positive constant c and so .

Now using the triangular inequality
.
Consider r.v.s X and L ≥ 0 with X independent of η. Then P[X < η < X + L] ≤ sup Notice that since n k ≤ t ≤ n, n i= n k E ρ t Z 4 |η i | ≤ nρ n k E(Z 4 )E|η| = o(1) due to 0 < ρ < 1 and E|η| < ∞. Hence, n i= n k I(A i,x,θ ) converges in mean to zero uniformly with respect to both x and θ, which entails n i= n k I(A i,x,θ ) = o P (1) uniformly.
With all the results above, we can easily prove Proposition 2.1 and the asymptotic result for the R-estimator as follows. which, by letting b = 0, entails R n (θ 0ϕ ) = Q n (θ 0ϕ ) + N n (θ 0ϕ ) + u P (1).
where the first equality is due to independence of η i and η j for i = j, independence of v j and η j , and Assumption 2.1. The second equality is due to independence of v i and η i . The last equality is due to Assumption 2.1. Recall the definition of λ(ϕ) in (2.6), which can also be written as We then have Hence, by recalling (A.38) and in view of (2.10), the asymptotic covariance matrix of √ n θ nϕ − θ 0ϕ is (1 + ρ(ϕ)) 2 (J (θ 0ϕ )) −1 .
Proof of Theorem 2.2.
For the vdW and Wilcoxon R-estimators, the AREs are more difficult to calculate since γ(ϕ) and λ(ϕ) are non-zero. However, in the simulation study in Table 2, the estimated AREs reveal that the vdW R-estimator, compared with the QMLE, does not lose any efficiency which is a reflection of the well-known Chernoff-Savage phenomenon in the literature on the R-estimation in linear models.  Note: The residuals are obtained by using the vdW R-estimator.