## Abstract

Using a new Bayesian method for the analysis of diffusion processes, this article finds that the nonlinear drift in interest rates found in a number of previous studies can be confirmed only under prior distributions that are best described as informative. The assumption of stationarity, which is common in the literature, represents a nontrivial prior belief about the shape of the drift function. This belief and the use of “flat” priors contribute strongly to the finding of nonlinear mean reversion. Implementation of an approximate Jeffreys prior results in virtually no evidence for mean reversion in interest rates unless stationarity is assumed. Finally, the article documents that nonlinear drift is primarily a feature of daily rather than monthly data, and that these data contain a transitory element that is not reflected in the volatility of longer-maturity yields.

The drift of the short-term interest rate is an important determinant of a wide variety of asset prices, both inside and outside the boundaries of what is often called fixed income. While volatilities may be estimated relatively accurately using high-frequency observations of the short rate, the short rate's extreme persistence makes identifying the true shape of the drift function a particularly elusive goal.

If our goal is merely to fit prices, then the difficulty in estimating a drift function can be avoided by backing out implied drifts using a no-arbitrage approach such as Hull and White (1990) or Heath, Jarrow, and Morton (1992). These methods bypass the true distribution entirely, focusing on solving for the risk-neutral drift function that is consistent with the cross section of bond prices. If our goal, however, is to learn from prices and to be able to assess theories, such as the expectations hypothesis, that link short and long rates, then estimating the drift function under the true measure is unavoidable.

In several recent articles, a variety of sophisticated econometric techniques have been brought to bear on the problem. In particular, Aït-Sahalia (1996a) and Stanton (1997) propose nonparametric-based methods for estimating nonlinear drift and diffusion functions of the short rate. Both articles find that the nonlinearity in the drift function is important. In fact, Aït-Sahalia (1996b, p. 385) concludes, in reference to the linear drift models of Chan et al. (1992) and others, that “the principal source of rejection of existing models is the strong nonlinearity of the drift.” He finds that interest rates behave like a random walk over nearly their entire historical range, reverting toward the middle of this range only when they become very high or very low. In a fully nonparametric analysis, Stanton (1997) estimates a comparable drift function, with very little mean reversion for all rates below 15% but substantial negative drift for higher rates.

Similar results are reported by Conley et al. (1997; hereafter CHLS), who estimate a drift function that is nonzero only for rates below 3% or above 11%. Jiang and Knight (1997) find a comparable pattern of nonlinear mean reversion in a sample of Canadian interest rates.

Figure 1 plots the drift functions — the expected change in the short rate per year as a function of the level of the rate — estimated by Aït-Sahalia and CHLS. While there is general agreement that higher interest rates tend to drift downward and low rates upward, how high and how low rates must be for this to happen remains a point of contention, with Aït-Sahalia assigning random walk-like behavior for a much wider range of interest rates.

One possible criticism of all of these articles is that each assumes the stationarity of the interest rate process, a characteristic that has a great deal of economic appeal but which fails to receive strong support in formal tests.^{1,2} Aït-Sahalia's estimator, since it requires the nonparametric estimation of the marginal density of the spot rate, is undefined if rates are nonstationary. The CHLS approach relies on the moment conditions of Hansen and Scheinkman (1995), which can loosely be interpreted as statements of the fact that functions of stationary processes have an unconditionally zero drift. Because imposing stationarity of the short rate puts restrictions on the possible shape of its drift function, any analysis that imposes this restriction runs the risk of mechanically assuming away the question of interest, no matter how appealing the restriction seems.

Even if the short rate is stationary, its high degree of persistence may make small-sample inference problematic. Several recent Monte Carlo studies have examined the finite sample performance of estimators used in the previous articles and have concluded that this performance can be deficient.

Pritsker (1998) finds that the asymptotic significance levels of Aït-Sahalia's (1996b) specification test are often inappropriate in finite samples. He notes that nonparametric procedures have been predominantly studied in an *i.i.d.* setting, and that little is known about optimal implementation of these procedures (particularly the choice of bandwidth) when the data generation process is highly persistent, as is the case with interest rates. Persistence is also unrecognized by Aït-Sahalia's test statistic, since it is not a concern in large samples. In a careful consideration of the case of Vasicek (1977) interest rates, Pritsker finds that the asymptotic test rejects the true null approximately 50% of the time in some cases.

More relevant to the current study, Chapman and Pearson (2000) find that both Stanton's and Aït-Sahalia's estimators display a finite sample bias toward finding nonlinearity in a drift that is actually linear. Chapman and Pearson attribute this bias to the nonparametric procedures that underlie both of these articles' estimation methods and contend that the evidence provided by Stanton and Aït-Sahalia is insufficient to conclude that nonlinear drift is a “robust stylized fact.”

While both Pritsker (1998) and Chapman and Pearson (2000) suggest that nonparametric methods may be unreliable in the detection of nonlinearities, there is a more general concern in estimating time-series models that has nothing to do with nonparametric methods. The problem is that standard estimators such as ordinary least squares and maximum likelihood are generally biased for time-series models. In the first-order autoregressive model, for example, it is well known that in finite samples the estimate of the autoregressive coefficient is biased toward zero. How this bias generalizes to more complicated nonlinear models is unknown.
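The finite-sample bias in the first-order autoregressive model can be illustrated with a small Monte Carlo experiment. The sketch below is not taken from the article: the persistence value of 0.98, the sample length of 261 (chosen to mirror a monthly sample), the inclusion of an intercept in the regression, and the function name `ar1_ols_estimates` are all illustrative assumptions.

```python
import numpy as np

def ar1_ols_estimates(rho, n_obs, n_paths, seed=0):
    """Simulate AR(1) paths and return the OLS slope estimate for each path."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal((n_paths, n_obs))
    y = np.zeros((n_paths, n_obs))  # every path starts at zero (an assumption)
    for t in range(1, n_obs):
        y[:, t] = rho * y[:, t - 1] + shocks[:, t]
    # Regress y_t on y_{t-1} with an intercept by demeaning both series.
    lagged = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    current = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    return (lagged * current).sum(axis=1) / (lagged ** 2).sum(axis=1)

true_rho = 0.98
estimates = ar1_ols_estimates(rho=true_rho, n_obs=261, n_paths=2000)
mean_bias = estimates.mean() - true_rho
print(f"average estimate minus true value: {mean_bias:.4f}")
```

Under these assumptions the average estimate falls below the true coefficient, which is the direction of bias described in the text.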

In spite of these problems, analysis of nonlinear mean reversion remains important simply because of its relevance for so many economic issues. Nonlinear drift offers potential improvements in fixed income pricing, as Ahn and Gao (1999) have recently demonstrated, and is also compelling because it has the potential to explain, at least in part, a number of the outstanding puzzles about the term structure. Bekaert, Hodrick, and Marshall (2000) propose to explain empirical findings of the expectations hypothesis using a regime-switching model that Ang and Bekaert (2002) have shown is capable of capturing nonlinear behavior in the short rate. Pfann, Schotman, and Tschernig (1996) observe that the nonlinear relations that exist between short and long yields and also between their volatilities are also consistent with nonlinear models of the short rate. Finally, nonlinearity in the short-rate drift might explain why standard tests of stationarity generally do not reject the unit root. Because Dickey–Fuller tests are based on the assumption of a linear autoregressive model, they could have little power to reject the unit root when the data are generated by a stationary model with nonlinear drift.

In order to remain consistent with previous literature, I focus on representations of the short-term interest rate as a continuous-time diffusion process. This decision reflects the fact that diffusions are the modeling framework of choice for much of modern asset pricing. With this asset pricing theory, prices of related fixed-income securities can be calculated without resorting to linear or log-linear approximations that may not hold accurately. More importantly, diffusions provide a parsimonious framework for examining data of different frequencies, since a single diffusion model automatically determines conditional distributions of the process at all time horizons.

Because of the problems that have been attributed to the use of asymptotic frequentist methods, particularly those which rely on the stationarity of the process under consideration, this article takes a Bayesian perspective. A primary task of the article is therefore to introduce a new method for the Bayesian analysis of diffusion processes that will generate exact finite-sample inferences even for nonstationary models.

Using this method I reassess the evidence for the drift nonlinearities first identified by Aït-Sahalia using a time series of short-term interest rates. Robustness of the results will be evaluated by comparing results generated under a variety of priors, where each is chosen to represent some notion of prior ignorance. This type of Bayesian analysis, suggested by Leamer (1985) and Poirier (1995), interprets the sensitivity of results to the specification of the prior as evidence against the availability of an “objective” conclusion.

The results of this article demonstrate that fully efficient parametric analysis may be no less problematic than nonparametric analysis, and that conclusions in favor of nonlinear drift may largely be driven by implicit prior beliefs that contain a nontrivial amount of information about the shape of the drift function. This article shows that the priors that generate nonlinear drift may reasonably be interpreted as informative, and that under other priors the result disappears completely.

Lastly, the article identifies that the evidence favoring nonlinear drift is primarily a feature of high-frequency data, and that these data contain a transitory noise component that accounts for roughly half the daily variation in the short rate. The analysis reveals an obvious misspecification of the one-factor model, so I propose a simple two-factor extension with a latent nonlinear stochastic mean process. The generalized model reconciles the different sampling interval results and provides further evidence against the nonlinearities identified previously.

The article proceeds as follows: Section 1 reviews previous work in modeling nonlinear interest rate processes. Section 2 develops a Bayesian method for estimating parameters of discretely observed diffusion processes. The method is applied in Section 3 to analyze nonlinear mean reversion under the different prior distributions. Section 4 checks for model misspecification and introduces the stochastic mean model. Section 5 concludes.

## 1. Modeling Nonlinear Drift in the Short-Term Interest Rate

Within the class of one-factor models, the interest rate process has traditionally been modeled as having a linear drift, often with a constant elasticity of variance. As a diffusion, the process is written as
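A linear drift combined with a constant elasticity of variance corresponds to a diffusion of the following form. The symbols are an assumption made here to match the parameter names used for the nonlinear model later in the article ($$\alpha_0$$, $$\alpha_1$$, $$\sigma$$, $$\gamma$$), with $$W_t$$ denoting a standard Brownian motion:

$$dr_t = (\alpha_0 + \alpha_1 r_t)\,dt + \sigma r_t^{\gamma}\,dW_t.$$

In this notation, mean reversion requires $$\alpha_1 \lt 0$$, and $$\gamma$$ governs how strongly volatility rises with the level of the rate.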

For simple linear models such as these, estimating the drift may be as simple as running a least squares regression. These same models, however, have often been found to be unsatisfactory in their description of short-rate dynamics and their implications for other security prices. The alternatives that have been proposed are often a great deal more complex. Gray (1996), Pfann, Schotman, and Tschernig (1996), and Naik and Lee (1993), for example, have generalized standard models to include regime shifts; Das (2002) and Johannes (2002) add jumps; and Andersen and Lund (1997), Balduzzi, Das, and Foresi (1998), and Jegadeesh and Pennacchi (1996) consider multifactor models in which volatility is stochastic or interest rates revert in a linear fashion toward a stochastic attractor. Articles too numerous to mention have explored other generalizations.

The primary model considered in this article, while more general than those first proposed by Vasicek (1977) and Cox, Ingersoll, and Ross (1985), remains in the single-factor class. This choice reflects a belief that this class of models has not yet been fully explored. At the very least, it seems natural to ask how much of the dynamics of both short- and long-term yields can be explained by a more general one-factor model before considering multifactor models.

Because there is little reason a priori to assume particular specifications of either the drift or diffusion functions, Aït-Sahalia (1996b) advocates the use of flexible functional forms to approximate their true unknown shapes. He proposes the following model of the short rate process:

Because it is a characteristic that has generally been assumed in previous work, it is useful to consider the parameter restrictions that are required to generate stationarity. In fact, stationarity of the nonlinear drift model can be achieved in several ways. A simple sufficient condition is that $$\alpha_2 \lt 0$$ and $$\alpha_3 \gt 0.$$ In Aït-Sahalia's (1996b) treatment of this model, these are the parameter restrictions he employs. CHLS note, however, that the restriction on $$\alpha_2$$ is unnecessary when $$\gamma \gt 1.5.$$ In this case the stationarity of the process may be “volatility induced” rather than “drift induced.” I will examine the implications of imposing each type of stationarity in the estimation of the model.

## 2. A Bayesian Method for the Analysis of Diffusion Processes

The primary difficulty in estimating diffusion processes stems from the intractability of their transition densities and hence likelihood functions.^{3} Because the Bayesian posterior distribution is typically obtained as the normalized product of the prior distribution and the likelihood function, the unknown form of the likelihood impedes Bayesian analysis as well. I address this problem by using a combination of simple numerical techniques: the Euler approximation, the Gibbs sampler, and the Metropolis–Hastings algorithm. By combining these tools appropriately, posterior distributions of the parameters of the diffusion process can be generated to any desired degree of accuracy. This is accomplished by generating thousands of draws from these multivariate posteriors. Given a large enough set of such draws, moments, confidence intervals, and marginal densities of the parameters can be computed easily.

The econometric approach of this article is based in a strand of statistics known as Markov chain Monte Carlo (MCMC). Appendix A provides a brief introduction to the Gibbs sampler, perhaps the simplest example of MCMC.^{4} The following sections assume a casual familiarity with this technique.

### 2.1 Augmenting with high-frequency data

The approach of this article to estimating diffusions is based on a simple intuition: if the diffusion,

In theory, therefore, we could avoid the elaborate econometrics of continuous-time processes by simply restricting our analysis to high-frequency data. Unfortunately, high-frequency data are not always available, particularly for less recent historical periods. And even if, say, daily data were available, would they be of sufficiently high frequency to render discretization bias insignificant? There is in general no way to answer this question except through empirical investigation with a method that can be used to account for this bias.

Even if the data were available, there are a number of reasons why the use of high-frequency asset price data may be undesirable. Price discreteness, infrequent trading, intraday volatility periodicity, bid-ask bounce, and periodic market closure are all difficult to reconcile with the simple and elegant properties of the diffusion process. Although some of these problems can be corrected for using simple modifications of the procedure proposed here, each would invalidate the simple estimation of the discretized process of Equation (5).

The resolution proposed in Jones (1999) is to use Tanner and Wong's (1987) data augmentation algorithm to augment the observed data with paths of much higher frequency data — for example, augmenting monthly with daily data.^{5} As these augmented data are added at closer and closer intervals, the likelihood of the discretized approximation will converge to that of the true diffusion likelihood, following the results of Pedersen (1995) and Brandt and Santa-Clara (2002). In practice, the frequency of the Euler approximation will be chosen to be high enough so that it will have approximately the same distribution as the diffusion of interest. This will generally mean that the observed data are of a lower frequency than the frequency at which the Euler approximation operates. For example, we may be working with month-end data, but a reasonable diffusion approximation might require 10 discrete time transitions per month, making 9 out of every 10 data points unobserved.
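The bookkeeping behind this augmentation can be sketched in a few lines. The example below is only an illustration: the function name `build_augmented_grid`, the four hypothetical month-end rates, and the choice to initialize the unobserved points by linear interpolation are assumptions made here, not details taken from the article.

```python
import numpy as np

def build_augmented_grid(observed, steps_per_interval):
    """Return a fine-grid path and a mask marking which points are observed.

    Latent points are initialized by linear interpolation between the
    observed values (an illustrative starting point only).
    """
    observed = np.asarray(observed, dtype=float)
    n_points = (observed.size - 1) * steps_per_interval + 1
    index = np.arange(n_points)
    path = np.interp(index / steps_per_interval,
                     np.arange(observed.size), observed)
    is_observed = index % steps_per_interval == 0
    return path, is_observed

monthly = [0.080, 0.085, 0.079, 0.082]  # hypothetical month-end rates
path, is_observed = build_augmented_grid(monthly, steps_per_interval=10)
n_latent = int((~is_observed).sum())
print(path.size, int(is_observed.sum()), n_latent)
```

With 10 transitions per month, each observation interval contains 9 latent points, so the three intervals in this toy sample contribute 27 unobserved values.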

Conditional on this unobserved high-frequency data in addition to the observable low-frequency data, a distribution for the model parameters may usually be obtained quite easily. We then integrate out, using a Gibbs sampler-like Markov chain, the dependence on particular paths of unobserved data to get posteriors conditional on only the observed data.

The idea of augmenting with high-frequency data may be considered the Bayesian counterpart of the simulation-based classical literature on continuous-time econometrics, which typically uses the Euler approximation to compute by simulation objective functions that are analytically intractable. Examples include Duffie and Singleton (1993), Gourieroux, Monfort, and Renault (1993), Pedersen (1995), Gallant and Tauchen (1996), and Brandt and Santa-Clara (2002). These approaches use the Euler approximation to simulate forward paths of artificial data. Simulated moment-based procedures, for example, use the Euler approximation to simulate long paths of the diffusion which are then used to calculate unconditional moments of the model. Simulated maximum likelihood uses the Euler approximation to compute each one-period transition density numerically, requiring a large number of short simulated paths.

In contrast, the simulations in this article merely “bridge” the observed low-frequency data with short paths of high-frequency data. Each simulation is entirely consistent with the low-frequency data, automatically preserving many of the stylized facts observable in the original data: the general historical shape, the patterns of volatility, and the degree of persistence, for example. Figure 2 illustrates the comparison of high-frequency data augmentation with two classical methods, the simulated method of moments [e.g., Duffie and Singleton (1993)] and simulated maximum likelihood [e.g., Brandt and Santa-Clara (2002)]. It is clear that by pinning down both ends of the simulated paths the variance of the latent high-frequency data can be reduced dramatically relative to other methods. Since all methods require some form of Monte Carlo integration, the lower variance of augmented data results in greater computational efficiency.

It should be emphasized that the purpose of augmenting with high-frequency data is to reduce discretization bias, not add information to the sample. Although each path of high-frequency data will add information to the relatively scarce low-frequency data, by integrating out the dependence on particular high-frequency paths, this information is washed out of the final posterior distribution.

### 2.2 Details of the Markov chain

To explain the details of the procedure it is necessary to have a more precise statement of the Euler approximation. For maximum intuition, the procedure is described for a univariate process $$r$$, although a multivariate generalization is simple and is pursued later in the article. A discrete time process operating on a unit of time of length $$h$$, the Euler approximation of Equation (4) may be written as
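Writing $$\mu(r; \phi)$$ and $$\sigma(r; \phi)$$ for the drift and diffusion functions of the process and $$\epsilon$$ for independent standard normal shocks (these symbols are assumed here for illustration), an Euler approximation on a grid of step $$h$$ has the standard form

$$r_{(k+1)h} = r_{kh} + \mu(r_{kh}; \phi)\,h + \sigma(r_{kh}; \phi)\sqrt{h}\,\epsilon_{(k+1)h}, \qquad \epsilon_{(k+1)h} \sim N(0, 1).$$

Each transition is therefore conditionally Gaussian, with mean $$r_{kh} + \mu(r_{kh}; \phi)h$$ and variance $$\sigma(r_{kh}; \phi)^2 h$$.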

From Pedersen (1995) or Brandt and Santa-Clara (2002) we know that under regularity conditions the likelihood of the Euler approximation converges to that of the diffusion as $$h \to 0.$$ The approach will therefore allow $$h$$ to be arbitrarily small regardless of the frequency of the observed data.

Let $$\textbf {R}^{\textbf {o}}$$ denote the set of all the observed low-frequency data, corresponding to integer values of $$kh$$. Let $$\textbf {R}^{\textbf {u}}$$ denote the unobserved high-frequency data, corresponding to noninteger $$kh$$. Following the intuition of the Gibbs sampler, the Markov chain will alternate between drawing from the conditional distributions $$p(\phi|\textbf {R}^{\textbf {o}}, \textbf {R}^{\textbf {u}})$$ and $$p(\textbf {R}^{\textbf {u}}|\phi, \textbf {R}^{\textbf {o}}).$$

We draw from the distribution of the model parameters conditional on both observed and augmented data. From Bayes' rule,
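With $$p(\phi)$$ denoting the prior density of the parameters (a symbol assumed here), Bayes' rule for this conditional distribution takes the familiar form

$$p(\phi|\textbf {R}^{\textbf {o}}, \textbf {R}^{\textbf {u}}) \propto p(\textbf {R}^{\textbf {o}}, \textbf {R}^{\textbf {u}}|\phi)\, p(\phi).$$

Under the Euler approximation, the likelihood term on the right-hand side is a product of one-step Gaussian transition densities over the full high-frequency path.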

If it were possible to draw directly from the distribution $$p(\textbf {R}^{\textbf {u}}|\phi, \textbf {R}^{\textbf {o}}),$$ then the specification of the Markov chain would be complete. In even the simplest cases, however, this high-dimensional distribution is of unknown form, meaning that an additional numerical technique must be applied.

I adapt a technique proposed by Jacquier, Polson, and Rossi (1994) for the analysis of discrete-time stochastic volatility models. It is termed a cyclic Metropolis chain because it “cycles” through the individual elements of $$\textbf {R}^{\textbf {u}},$$ drawing values of $$\textbf {R}^{\textbf {u}}$$ point by point using the Metropolis–Hastings algorithm at each step. In essence, we make each element of $$\textbf {R}^{\textbf {u}}$$ a separate block in the Markov chain. Thus if there were 1000 elements of $$\textbf {R}^{\textbf {u}},$$ we would have 1001 block draws in the Markov chain: 1000 draws of high-frequency data points and one draw of $$\phi$$.^{6}
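A minimal sketch of one sweep of such a point-by-point update is given below. It is not the article's implementation (which is described in Appendix B): the random-walk proposal, the rule of rejecting non-positive proposals, the drift and diffusion forms, the parameter values, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def drift(r, alpha):
    """Assumed drift: a0 + a1*r + a2*r^2 + a3/r."""
    a0, a1, a2, a3 = alpha
    return a0 + a1 * r + a2 * r**2 + a3 / r

def log_euler_density(r_next, r_prev, alpha, sigma, gamma, dt):
    """Log of the Gaussian one-step Euler transition density."""
    mean = r_prev + drift(r_prev, alpha) * dt
    var = (sigma * r_prev**gamma) ** 2 * dt
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (r_next - mean) ** 2 / var

def log_target(x, left, right, alpha, sigma, gamma, dt):
    """Log conditional density (up to a constant) of one latent point
    given its two neighbors on the fine grid."""
    return (log_euler_density(x, left, alpha, sigma, gamma, dt)
            + log_euler_density(right, x, alpha, sigma, gamma, dt))

def cyclic_metropolis_sweep(path, is_observed, alpha, sigma, gamma, dt, step, rng):
    """Update every latent interior point once with a random-walk
    Metropolis step; observed points and endpoints are left untouched."""
    path = path.copy()
    n_accepted = 0
    for k in range(1, path.size - 1):
        if is_observed[k]:
            continue
        proposal = path[k] + step * rng.standard_normal()
        if proposal <= 0.0:
            continue  # assumption: non-positive rates are always rejected
        args = (path[k - 1], path[k + 1], alpha, sigma, gamma, dt)
        log_ratio = log_target(proposal, *args) - log_target(path[k], *args)
        if np.log(rng.uniform()) < log_ratio:
            path[k] = proposal
            n_accepted += 1
    return path, n_accepted

# Hypothetical inputs: four month-end rates, 10 Euler steps per month.
observed = np.array([0.080, 0.085, 0.079, 0.082])
steps = 10
index = np.arange((observed.size - 1) * steps + 1)
path = np.interp(index / steps, np.arange(observed.size), observed)
is_observed = index % steps == 0
rng = np.random.default_rng(0)
new_path, n_accepted = cyclic_metropolis_sweep(
    path, is_observed, alpha=(0.016, -0.2, 0.0, 0.0), sigma=1.55,
    gamma=1.36, dt=1.0 / 120.0, step=0.003, rng=rng)
```

In a full sampler, a sweep like this would alternate with a draw of $$\phi$$ conditional on the completed path, so that each latent point and the parameter vector form separate blocks of the chain.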

Appendix B describes the data augmentation procedure in greater detail.

## 3. Estimating the Short-Rate Model

The primary model of nonlinear drift considered in the remainder of the article is
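Consistent with the stationarity conditions on $$\alpha_2$$, $$\alpha_3$$, and $$\gamma$$ discussed in Section 1 and with the six parameters reported in Table 1, the model combines a four-term nonlinear drift with a constant elasticity of variance diffusion. With $$W_t$$ denoting a standard Brownian motion (notation assumed here), it can be written as

$$dr_t = \left(\alpha_0 + \alpha_1 r_t + \alpha_2 r_t^2 + \frac{\alpha_3}{r_t}\right)dt + \sigma r_t^{\gamma}\,dW_t.$$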

The Euler approximation of the nonlinear drift model is given by
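For a drift of the form $$\alpha_0 + \alpha_1 r + \alpha_2 r^2 + \alpha_3 / r$$ and a diffusion function $$\sigma r^{\gamma}$$, and with $$\epsilon$$ denoting independent standard normal shocks (the indexing convention is assumed here), the approximation on a grid of step $$h$$ reads

$$r_{(k+1)h} = r_{kh} + \left(\alpha_0 + \alpha_1 r_{kh} + \alpha_2 r_{kh}^2 + \frac{\alpha_3}{r_{kh}}\right)h + \sigma r_{kh}^{\gamma}\sqrt{h}\,\epsilon_{(k+1)h}.$$

The conditional variance $$\sigma^2 r_{kh}^{2\gamma} h$$ depends on the level of the rate, which is the source of the heteroscedasticity.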

In particular, the appendix shows that augmenting with high-frequency data is especially important when looking at monthly data, as interest rates generated by a naive discretization $$(h = 1)$$ can easily be rejected as coming from the corresponding diffusion process. By reducing the discretization interval to .05 or .2, however, the tests no longer result in rejections. In the simulation of daily data, discretization bias is not detected, implying that discretization bias may not be very important for these data. With the support these results provide, we proceed with the use of the discretization scheme.

The heteroscedasticity in Equation (9) may be eliminated by rearranging the Euler approximation as
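One way to carry out this rearrangement, sketched here under the assumption that the drift is $$\alpha_0 + \alpha_1 r + \alpha_2 r^2 + \alpha_3 / r$$, the diffusion function is $$\sigma r^{\gamma}$$, and the shocks $$\epsilon$$ are standard normal, is to divide the Euler increment by $$r_{kh}^{\gamma}\sqrt{h}$$:

$$\frac{r_{(k+1)h} - r_{kh}}{r_{kh}^{\gamma}\sqrt{h}} = \left(\alpha_0 r_{kh}^{-\gamma} + \alpha_1 r_{kh}^{1-\gamma} + \alpha_2 r_{kh}^{2-\gamma} + \alpha_3 r_{kh}^{-1-\gamma}\right)\sqrt{h} + \sigma\,\epsilon_{(k+1)h}.$$

Conditional on $$\gamma$$, this is a linear regression in the drift parameters with homoscedastic errors of standard deviation $$\sigma$$. The exact scaling used in the article may differ from this sketch.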

### 3.1 The data

The time series used to proxy for the short-term interest rate is the same seven-day Eurodollar rate series used by Aït-Sahalia (1996b). The data are graphed in Figure 3. This daily series, with 5505 observations, covers the period from June 1, 1973, to February 25, 1995.

One goal of this article is to determine the robustness of nonlinear mean reversion to different sampling intervals. In addition to estimating the model using the entire daily sample, I will repeat the estimations using only the 261 month-end observations. While the daily data have the potential to add information, they appear to be very noisy with many highly transitory shocks, especially in the first half of the sample. Part of this noise appears to be microstructure-related, since the reported rates are usually approximate multiples of one-sixteenth of 1%. Monthly data should allow us to mitigate the effects of this predominantly high-frequency noise. In any case, if our primary concern is to learn about the drift of the process, it is likely that monthly and daily data will yield similar results, as higher-frequency observation tends to add little information about parameters of the drift.

### 3.2 Prior distributions

I will consider several prior distributions with the goal of determining how different prior beliefs affect conclusions about the shape of the drift function. The two classes of priors considered, the flat prior and an approximate Jeffreys prior, are both chosen to represent different notions of prior ignorance. Within each class I will consider differing prior beliefs about stationarity. The first is a prior that is not informative about stationarity. The second is a prior that contains a belief that the process is stationary with probability one. The last is a belief that the process is stationary, and furthermore, that the stationarity is drift induced, corresponding to the parameter restrictions imposed by Aït-Sahalia (1996b). Differences in conclusions across the six priors will be taken as evidence of a Bayesian small sample problem, in which no “objective” Bayesian inference is possible.

The first class, the flat prior, is particularly easy to work with and is interesting for a variety of reasons. Flat priors allow us to examine most directly the shape of the likelihood function. Since the flat prior mode is typically very close to the maximum-likelihood estimate, sometimes identical to it, flat prior results have a frequentist interpretation. In addition, the flat prior is also a natural choice since it is the prior that is often favored by applied Bayesian researchers.

In the case of exogenous regressors, the flat prior has a more theoretical grounding as well, since it is synonymous with the Jeffreys prior, which is known to have many desirable properties. One such property is that the Jeffreys prior is invariant to reparameterizations of the model — two models parameterized differently will yield the same results if each is analyzed under the Jeffreys prior derived under its own parameterization. Another is the fact that the Jeffreys prior is the prior distribution that minimizes Shannon's commonly used measure of information, giving a more formal justification for the view that the Jeffreys prior is maximally ignorant.

When regressors are endogenous, the flat and Jeffreys priors no longer coincide.^{7} As has been argued forcefully by Phillips (1991a, 1991b), flat priors can be quite informative for time-series models. In his analysis of the first-order autoregressive model, $$y_t = \rho y_{t - 1} + \epsilon_t,$$ Phillips notes that the data should be expected to do a better job distinguishing nearby values of the autoregressive parameter $$\rho$$ when the true value of $$\rho$$ is close to or within the explosive region $$|\rho| \ge 1.$$ Intuitively, if $$y$$ explodes then the ratio of signal to noise about $$\rho$$ goes to infinity, since the mean is level dependent but the variance is not. In a frequentist setting this behavior leads to the superconsistency and downward bias of the maximum likelihood estimator.

Phillips argues that the flat prior, by ignoring this property of the model, effectively imposes a prior view that explosive behavior is improbable. Mechanically, under the flat prior the maximum likelihood estimate of $$\rho$$ is identical to its Bayesian posterior mean. By not anticipating and correcting for the bias of the maximum likelihood estimator, the researcher is implicitly taking an informed view that this bias is somehow desirable.

Proposing to use the Jeffreys prior as a better representation of prior ignorance, Phillips derives the Jeffreys prior for the AR(1) model and finds that it assigns much higher prior densities to values of $$\rho$$ in the explosive region than to nonexplosive values of $$\rho$$. In effect, the Jeffreys prior offsets the finite sample bias of MLE. Phillips finds that the conclusions that result from using the Jeffreys prior are similar to those made using frequentist unit root econometrics. Namely, the rejections of unit roots that result from flat prior Bayesian analysis are generally overturned when using the Jeffreys prior.

Whether or not the short rate actually has a unit root, its high degree of persistence makes concerns about the flat prior relevant for our analysis. I therefore consider an approximation of the Jeffreys class of priors as an alternative to flat priors.^{8} Again, I consider the case in which the prior belief contains no information about stationarity and the cases in which parameter combinations that generate stationary or drift-stationary behavior are viewed as having prior probability one.

Without a prior belief about stationarity, the flat prior is given by

^{9}

The Jeffreys prior, as discussed in Appendix E, does not have a closed-form representation and must be computed by simulation. If we let $$p_J$$ denote the Jeffreys prior that does not impose stationarity, then the corresponding stationary prior is given by

The Jeffreys prior that imposes drift-induced stationarity is then
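In indicator notation, the two restricted Jeffreys priors can be sketched as follows. The labels $$p_{JS}$$, $$p_{JD}$$, $$S$$, and $$S_D$$ are introduced here for illustration and are not necessarily the article's notation: $$S$$ denotes the set of parameter vectors $$\phi$$ that generate a stationary process, and $$S_D \subseteq S$$ the subset for which stationarity is drift induced.

$$p_{JS}(\phi) \propto p_J(\phi)\,\mathbf{1}\{\phi \in S\}, \qquad p_{JD}(\phi) \propto p_J(\phi)\,\mathbf{1}\{\phi \in S_D\}.$$

Each restricted prior therefore agrees with $$p_J$$ up to a normalizing constant inside its region and is zero outside it.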

These “restricted” priors used to impose stationarity are particularly easy to work with. Following Box and Tiao (1973, pp. 67–69), it can be shown that, in the region in which the restricted prior is nonzero, a posterior which incorporates a restricted prior is proportional to the corresponding posterior using an unrestricted prior. Where the prior probability is zero, so must be the posterior probability. This result suggests the simple approach of accept/reject as a way of drawing the parameters in the restricted case: draw the vector of parameters as if the prior were unrestricted and accept only those parameter vectors for which the stationarity restrictions hold.
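The accept/reject step can be sketched as below. The restriction functions encode only the simple sufficient conditions stated in Section 1 ($$\alpha_2 \lt 0$$ and $$\alpha_3 \gt 0$$ for drift-induced stationarity, with the condition on $$\alpha_2$$ dropped when $$\gamma \gt 1.5$$) and ignore boundary cases. The parameter ordering, the function names, and the Gaussian stand-in for the unrestricted conditional draw are hypothetical.

```python
import numpy as np

# Assumed parameter order: (alpha0, alpha1, alpha2, alpha3, sigma, gamma).

def is_drift_stationary(phi):
    """Sufficient condition: alpha2 < 0 and alpha3 > 0."""
    return phi[2] < 0.0 and phi[3] > 0.0

def is_stationary(phi):
    """alpha3 > 0, with alpha2 < 0 not required when gamma > 1.5."""
    return phi[3] > 0.0 and (phi[2] < 0.0 or phi[5] > 1.5)

def draw_restricted(draw_unrestricted, restriction, max_tries=100_000):
    """Redraw from the unrestricted sampler until the restriction holds."""
    for _ in range(max_tries):
        phi = draw_unrestricted()
        if restriction(phi):
            return phi
    raise RuntimeError("no draw satisfied the restriction")

rng = np.random.default_rng(0)
center = np.array([0.0, 0.0, -0.5, 0.001, 1.5, 1.4])  # illustrative values
spread = np.array([0.1, 1.0, 1.0, 0.005, 0.1, 0.1])   # illustrative values

def draw_unrestricted():
    """Stand-in for a draw from the unrestricted conditional posterior."""
    return center + spread * rng.standard_normal(center.size)

phi_stationary = draw_restricted(draw_unrestricted, is_stationary)
phi_drift = draw_restricted(draw_unrestricted, is_drift_stationary)
```

Because the drift-induced conditions are a special case of the stationarity conditions, any draw accepted under `is_drift_stationary` also satisfies `is_stationary`.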

### 3.3 Results

Markov chains were simulated to length 110,000 and the first 10,000 draws were discarded to negate the effects of initial conditions. To facilitate numerical computations only 1 out of every 10 iterations of the chain was saved, leaving 10,000 draws from the posterior distribution for each prior. A natural concern in any Markov chain Monte Carlo method is that the posterior draws are too highly autocorrelated, an indication that the chain may be slow to converge to its invariant distribution. The autocorrelation of the 10,000 draws saved is not high, however. In fact, the first-order autocorrelations of the drift parameter draws are almost exactly zero. The draws of $$\sigma$$ and $$\gamma$$ have first-order autocorrelations of about .5, declining to about .02 at the 10th lag, values that should not raise concerns about convergence.
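The burn-in and thinning arithmetic, together with a simple autocorrelation check, can be sketched as follows. The chain used here is a stand-in (a persistent Gaussian AR(1) series with a hypothetical coefficient of 0.93), not output from the article's sampler, and the function names are illustrative.

```python
import numpy as np

def burn_and_thin(chain, n_burn, thin):
    """Drop the first n_burn draws, then keep every thin-th draw."""
    return np.asarray(chain)[n_burn::thin]

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
n_total = 110_000
shocks = rng.standard_normal(n_total)
chain = np.zeros(n_total)
for t in range(1, n_total):
    chain[t] = 0.93 * chain[t - 1] + shocks[t]  # stand-in for MCMC output

saved = burn_and_thin(chain, n_burn=10_000, thin=10)
print(saved.size)
print(autocorrelation(chain[10_000:], 1), autocorrelation(saved, 1))
```

Keeping every 10th draw after burn-in leaves 10,000 values, and the retained draws are markedly less autocorrelated than consecutive iterations of the same chain.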

Given the results in Appendix C, discretization bias is eliminated by setting $$h$$ equal to .2 for all analysis with daily data and .05 for analysis with monthly data. Smaller values of $$h$$ have no noticeable impact on any of the results.

Table 1 lists descriptive statistics on the posterior draws for the annualized parameters for both sampling frequencies and each of the six priors. Specifically, I report the means, standard deviations, and 95% highest posterior intervals.^{10}
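A highest posterior density interval can be approximated from a set of draws by finding the shortest interval that contains the required share of them. The sketch below assumes a unimodal posterior and uses exponential draws purely as a skewed stand-in; the function name `hpd_interval` is hypothetical, and the article's own definition is the one given in its footnote.

```python
import numpy as np

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing at least `mass` of the draws."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float))
    n = sorted_draws.size
    n_inside = int(np.ceil(mass * n))
    # Width of every window of n_inside consecutive order statistics.
    widths = sorted_draws[n_inside - 1:] - sorted_draws[:n - n_inside + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_inside - 1]

rng = np.random.default_rng(0)
draws = rng.exponential(size=10_000)  # skewed stand-in for posterior draws
lower, upper = hpd_interval(draws, mass=0.95)
equal_tailed = np.quantile(draws, [0.025, 0.975])
print(lower, upper, equal_tailed)
```

For a skewed distribution such as this one, the interval is shorter than the equal-tailed interval with the same coverage, which is the reason for preferring it as a summary.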

| Parameter | Flat prior | Stationary flat prior | Drift-stationary flat prior | Jeffreys prior | Stationary Jeffreys prior | Drift-stationary Jeffreys prior |
|---|---|---|---|---|---|---|
| Panel A: Daily data | | | | | | |
| Posterior means | | | | | | |
| $$\alpha_0 \times 10$$ | −3.62 | −4.14 | −4.14 | 0.75 | −0.66 | −0.66 |
| $$\alpha_1$$ | 6.91 | 7.66 | 7.66 | 0.43 | 2.83 | 2.83 |
| $$\alpha_2 \times 10^{-1}$$ | −3.74 | −4.05 | −4.05 | −0.96 | −2.04 | −2.04 |
| $$\alpha_3 \times 10^3$$ | 6.40 | 7.40 | 7.40 | −2.21 | 0.08 | 0.08 |
| $$\sigma$$ | 1.55 | 1.55 | 1.55 | 1.56 | 1.62 | 1.62 |
| $$\gamma$$ | 1.36 | 1.36 | 1.36 | 1.36 | 1.38 | 1.38 |
| Posterior standard deviations | | | | | | |
| $$\alpha_0 \times 10$$ | 2.60 | 2.19 | 2.19 | 1.34 | 0.37 | 0.37 |
| $$\alpha_1$$ | 3.95 | 3.38 | 3.38 | 2.36 | 1.16 | 1.16 |
| $$\alpha_2 \times 10^{-1}$$ | 1.77 | 1.55 | 1.55 | 1.23 | 0.80 | 0.80 |
| $$\alpha_3 \times 10^3$$ | 4.98 | 4.17 | 4.17 | 2.29 | 0.29 | 0.29 |
| $$\sigma$$ | 0.09 | 0.09 | 0.09 | 0.08 | 0.07 | 0.07 |
| $$\gamma$$ | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| Posterior 95% HPD intervals | | | | | | |
| $$\alpha_0 \times 10$$ | (−8.51, 1.61) | (−8.19, −0.18) | (−8.19, −0.18) | (−1.18, 3.62) | (−1.18, −0.17) | (−1.18, −0.17) |
| $$\alpha_1$$ | (−1.13, 14.27) | (1.59, 14.23) | (1.59, 14.23) | (−4.21, 4.56) | (0.99, 4.50) | (0.99, 4.50) |
| $$\alpha_2 \times 10^{-1}$$ | (−7.35, −0.44) | (−7.07, −1.16) | (−7.07, −1.16) | (−3.09, 1.51) | (−3.01, −0.70) | (−3.01, −0.70) |
| $$\alpha_3 \times 10^3$$ | (−3.71, 15.71) | (0.00, 14.79) | (0.00, 14.79) | (−7.12, 0.14) | (0.00, 0.17) | (0.00, 0.17) |
| $$\sigma$$ | (1.39, 1.72) | (1.40, 1.73) | (1.40, 1.73) | (1.40, 1.71) | (1.49, 1.71) | (1.49, 1.71) |
| $$\gamma$$ | (1.32, 1.40) | (1.32, 1.40) | (1.32, 1.40) | (1.32, 1.40) | (1.34, 1.40) | (1.34, 1.40) |
| Panel B: Monthly data | | | | | | |
| Posterior means | | | | | | |
| $$\alpha_0 \times 10$$ | −1.14 | −1.56 | −1.58 | 0.31 | −0.16 | −0.15 |
| $$\alpha_1$$ | 2.11 | 2.75 | 2.80 | −0.29 | 0.46 | 0.48 |
| $$\alpha_2 \times 10^{-1}$$ | −1.11 | −1.38 | −1.41 | 0.05 | −0.31 | −0.31 |
| $$\alpha_3 \times 10^3$$ | 1.95 | 2.73 | 2.77 | −0.77 | 0.02 | 0.01 |
| $$\sigma$$ | 1.49 | 1.50 | 1.50 | 1.63 | 1.84 | 1.84 |
| $$\gamma$$ | 1.63 | 1.64 | 1.64 | 1.67 | 1.72 | 1.72 |
| Posterior standard deviations | | | | | | |
| $$\alpha_0 \times 10$$ | 1.23 | 0.95 | 0.94 | 0.47 | 0.08 | 0.07 |
| $$\alpha_1$$ | 1.95 | 1.54 | 1.51 | 0.89 | 0.29 | 0.28 |
| $$\alpha_2 \times 10^{-1}$$ | 0.92 | 0.75 | 0.74 | 0.53 | 0.28 | 0.27 |
| $$\alpha_3 \times 10^3$$ | 2.28 | 1.75 | 1.74 | 0.79 | 0.03 | 0.02 |
| $$\sigma$$ | 0.33 | 0.33 | 0.33 | 0.34 | 0.23 | 0.23 |
| $$\gamma$$ | 0.08 | 0.08 | 0.08 | 0.08 | 0.05 | 0.05 |
| Posterior 95% HPD intervals | | | | | | |
| $$\alpha_0 \times 10$$ | (−3.67, 1.17) | (−3.41, −0.02) | (−3.41, −0.07) | (−0.27, 1.25) | (−0.32, −0.10) | (−0.31, −0.09) |
| $$\alpha_1$$ | (−1.62, 6.05) | (0.09, 5.81) | (0.19, 5.69) | (−2.10, 1.06) | (−0.22, 1.16) | (0.22, 1.17) |
| $$\alpha_2 \times 10^{-1}$$ | (−2.96, 0.65) | (−2.92, −0.02) | (−2.82, −0.10) | (−0.81, 1.18) | (−0.84, −0.03) | (−0.84, −0.03) |
| $$\alpha_3 \times 10^3$$ | (−2.37, 6.61) | (0.00, 5.97) | (0.00, 5.98) | (−2.15, 0.04) | (0.00, 0.03) | (0.00, 0.03) |
| $$\sigma$$ | (0.87, 2.14) | (0.85, 2.13) | (0.85, 2.13) | (1.07, 2.25) | (1.27, 2.17) | (1.27, 2.17) |
| $$\gamma$$ | (1.48, 1.81) | (1.48, 1.80) | (1.48, 1.81) | (1.52, 1.81) | (1.60, 1.80) | (1.60, 1.79) |

The table reports means, standard deviations, and 95% highest posterior density intervals (the shortest interval containing 95% of all posterior mass) for each of the six parameters of the model.

Comparison of panels A and B reveals major differences between the parameter values implied by the daily and monthly data. First, the drift parameter posterior means are significantly closer to zero for the monthly data than they are for the daily data. Surprisingly, the monthly data generate much lower standard deviations for the drift parameter posteriors than do the daily data.

The obvious cause of this difference is the much higher annualized volatility of the daily Eurodollar rates. Using the posterior mean values of $$\sigma$$ and $$\gamma$$ obtained under the flat prior, I compute and plot the time series of $$\sigma r_t^\gamma.$$ Figure 4 compares the annualized volatility paths that result from using daily and monthly posterior means. The differences are striking, with daily data implying an average annualized volatility of about 5.6%, compared with just 2.7% implied by the monthly data.
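The level-dependent volatility $$\sigma r_t^\gamma$$ can be evaluated directly from the flat-prior posterior means in Table 1. The sketch below assumes rates are expressed as decimals (6% is 0.06); the rate levels in the loop are hypothetical illustration points, not observations from the sample.

```python
def annualized_vol(r, sigma, gamma):
    """Annualized volatility sigma * r**gamma at short-rate level r (decimal)."""
    return sigma * r ** gamma


# Flat-prior posterior means of (sigma, gamma) from Table 1.
DAILY = (1.55, 1.36)
MONTHLY = (1.49, 1.63)

# Hypothetical rate levels chosen only for illustration.
for r in (0.03, 0.06, 0.085, 0.12, 0.20):
    vol_d = annualized_vol(r, *DAILY)
    vol_m = annualized_vol(r, *MONTHLY)
    print(f"r = {r:5.3f}   daily-data vol = {vol_d:.4f}   monthly-data vol = {vol_m:.4f}")
```

Because the monthly $$\gamma$$ is larger, the gap between the two volatility curves is widest at low rates and narrows as the rate rises.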

In addition to simply being higher overall, volatility in the daily data is less level dependent than it is for monthly data. For daily rates, the posteriors of $$\gamma$$ for different prior distributions are tight around means between 1.35 and 1.4, slightly lower than the values reported by Chan et al. That monthly rates imply a somewhat higher $$\gamma$$ is consistent with the presence of transitory noise that is less level dependent.

One possibility is that this noise is simply a product of a bid-ask effect or the existence of a discrete grid on which rates or prices are quoted. A preliminary version of this article calculated that such a grid would have to be fairly coarse for this to be a plausible explanation. A quick calculation yields a similar result: Suppose the observed interest rate, $$r_t,$$ is the sum of some “true” unobserved rate, $$r_t^*,$$ and an *i.i.d.* error, $$\eta_t,$$ that is normally distributed with mean zero. Provided that $$\eta_t$$ is independent of the true rate, the variance of the change in observed rates is therefore equal to

$$\text{Var}(r_t - r_{t-1}) = \text{Var}(r_t^* - r_{t-1}^*) + 2\,\text{Var}(\eta_t).$$

Rough calculations reveal that raising the annualized volatility from 2.7% to 5.6% as the sampling frequency increases from once per month to once per day would require the standard deviation of $$\eta_t$$ to be around 0.2 percentage points, which would seem to be a large amount in the liquid Eurodollar market.
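One way to reproduce this rough calculation is sketched below. It assumes that the variance of an observed change equals the variance of the true change plus twice the noise variance (which holds for *i.i.d.* noise independent of the true rate), that the true rate has constant annualized volatility, and that there are 250 trading days per year. The day count and the function name `implied_noise_sd` are assumptions made for illustration.

```python
import math

DAYS_PER_YEAR = 250    # assumed number of trading days per year
MONTHS_PER_YEAR = 12


def implied_noise_sd(vol_daily, vol_monthly):
    """Solve two variance equations for the noise s.d. and the true volatility.

    Per-observation variances satisfy
        vol_daily**2   / DAYS_PER_YEAR   = true_var / DAYS_PER_YEAR   + 2 * noise_var
        vol_monthly**2 / MONTHS_PER_YEAR = true_var / MONTHS_PER_YEAR + 2 * noise_var
    """
    var_d = vol_daily ** 2 / DAYS_PER_YEAR
    var_m = vol_monthly ** 2 / MONTHS_PER_YEAR
    true_var = (var_m - var_d) / (1 / MONTHS_PER_YEAR - 1 / DAYS_PER_YEAR)
    noise_var = (var_d - true_var / DAYS_PER_YEAR) / 2
    return math.sqrt(noise_var), math.sqrt(true_var)


noise_sd, true_vol = implied_noise_sd(0.056, 0.027)
print(f"implied noise s.d.: {100 * noise_sd:.2f} percentage points")
print(f"implied annualized volatility of the true rate: {100 * true_vol:.2f}%")
```

Under these assumptions the implied noise standard deviation is a little above 0.2 percentage points, in line with the figure quoted above.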

While the choice of prior has little impact on the posteriors of the variance parameters $$\sigma$$ and $$\gamma$$, the prior has a major effect on the posterior means of the drift parameters $$(\alpha_0, \alpha_1, \alpha_2, \alpha_3).$$ Posterior standard deviations are also affected by the prior, with larger differences in the monthly results. In general, the flat prior results in posteriors for the drift parameters that are further away from zero than those of the Jeffreys prior, although with somewhat higher standard deviations.

Because they impose sign restrictions, it is not surprising that the priors that impose stationarity result in posteriors that are more conclusive about the signs of the drift parameters. Nevertheless, for both sampling frequencies and for each prior, the large dispersion of the posteriors often makes inferences about the exact magnitudes of individual parameters difficult, especially the parameters of the drift function.

Because the multivariate posterior distribution exhibits strong correlations, sometimes above .95 in absolute value, looking at marginal posteriors may understate the informativeness of the joint posterior. In addition, the parameters $$(\alpha_0, \alpha_1, \alpha_2, \alpha_3)$$ have, individually, little economic interpretation, so a more illuminating view of the posterior is desirable. A natural quantity of interest is the drift function itself,

$$\mu(r) = \alpha_0 + \alpha_1 r + \alpha_2 r^2 + \frac{\alpha_3}{r}.$$

Panel A of Figure 5, for example, reveals a pattern of nonlinear mean reversion similar to that reported in previous studies. Little positive or negative drift is found for rates between 3% and 15%, while very strong negative drift is found for higher rates. The magnitude of the effect is striking. When the short rate is at 20%, its posterior median drift is −45% per year. Even the upper bound of the 95% confidence interval is about −10% per year.
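A rough check on these magnitudes is sketched below. It assumes the drift takes the form $$\alpha_0 + \alpha_1 r + \alpha_2 r^2 + \alpha_3 / r$$ with rates in decimals, and it plugs in the flat-prior daily posterior means from Table 1 after undoing the table's scaling. Evaluating the drift at posterior means is only an approximation to the posterior median drift reported in the figure, which requires evaluating the drift draw by draw.

```python
def drift(r, a0, a1, a2, a3):
    """Assumed drift specification: a0 + a1*r + a2*r**2 + a3/r (annualized)."""
    return a0 + a1 * r + a2 * r ** 2 + a3 / r


# Flat-prior daily posterior means from Table 1, with the scaling undone:
# alpha_0 x 10 = -3.62, alpha_1 = 6.91, alpha_2 x 10^-1 = -3.74, alpha_3 x 10^3 = 6.40.
FLAT_DAILY = dict(a0=-0.362, a1=6.91, a2=-37.4, a3=0.0064)

for r in (0.03, 0.05, 0.08, 0.10, 0.15, 0.20):
    print(f"r = {r:4.2f}   drift at posterior means = {drift(r, **FLAT_DAILY):+.3f}")
# At r = 0.20 the value is roughly -0.44, that is, about -44% per year.
```

The drift stays within a few percentage points of zero at moderate rates and turns sharply negative at high rates, matching the shape described above.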

In comparison, the same data, when analyzed under the Jeffreys prior, produce much weaker evidence for nonlinear drift. Panel B of Figure 5 shows that the drift posterior computed under the Jeffreys prior has substantial mass above zero even for interest rates above 15%.

Given that the flat prior analysis suggests highly stationary parameter values, imposing stationarity does not substantially affect any results, as is apparent in Table 1 and panels C and E of Figure 5. Under the Jeffreys prior, however, stationarity is no longer as obvious, so a prior belief that imposes stationarity has a significant impact. In panels D and F, we see that nonlinear drift is restored even under the Jeffreys prior.

Given the form of the parameter restrictions imposed by drift-induced stationarity, the nonlinearity found in panel F is not totally unexpected, since the restriction that $$\alpha_2 \lt 0$$ ensures a negative drift for sufficiently high levels of the interest rate. What is interesting is that this negative drift is inferred for values of $$r$$ that are not too extreme, with reliably negative drifts for interest rates as low as 15%. Because the posterior distributions of $$\gamma$$ lie below 1.5, stationarity must be induced by the drift rather than the volatility of interest rates. Therefore there is little difference between the results generated by the two different types of stationarity restriction.

Comparing the daily results of Figure 5 with the monthly results of Figure 6 reveals a relation similar to that found in the parameter estimates themselves: nonlinear mean reversion appears much stronger in daily data than it does in monthly data, despite the fact that confidence intervals are larger for daily data. As measured by the width of the 95% HPD intervals, the monthly data are actually more informative, and they suggest that nonlinear drift, if it exists, is not as large as one would conclude after looking only at higher-frequency data.

As with daily data, monthly data support nonlinear drift more strongly under the flat prior than under the Jeffreys prior. In fact, panel B of Figure 6 shows that monthly data provide no evidence of any drift when viewed under the Jeffreys prior, generating a drift posterior that is almost perfectly centered around zero. When a stationarity restriction is added to either prior, whether that stationarity is drift induced or volatility induced, nonlinear drift is again observed, but with a magnitude far below that implied by daily data.

These results suggest that the finding of nonlinear drift is highly dependent on the choice of the sampling frequency, the type of prior — flat or Jeffreys — and the prior belief about whether interest rates are stationary. Only for daily data under a flat prior can this negative drift in high interest rates be inferred without imposing stationarity.

For both daily and monthly data, discretization bias is evident when comparing the above results with those generated under the naive discretization $$(h = 1).$$ For daily data analyzed under the flat prior, for example, the posterior mean of $$\gamma$$ rises from 1.31 when $$h = 1$$ to 1.36 when $$h = .2,$$ a movement of more than two posterior standard deviations. Reducing $$h$$ even further to .05, however, does not further change this mean. For monthly data, the mean of $$\gamma$$ under the flat prior rises from 1.56 with $$h = 1$$ to 1.63 with $$h = .05,$$ and then rises slightly to 1.64 as $$h$$ is decreased further to .01.

Drift inferences also change as discretization bias is reduced through data augmentation. Panels A and B of Figure 7 show the drift posteriors obtained using daily data under the stationary Jeffreys prior with $$h$$ set either to 1 or to .2. While the differences are not large, the drift nonlinearity is slightly more severe with $$h = 1,$$ and the posterior variance appears smaller as well. Differences are much more pronounced in monthly data, as is evident in panels C and D, where a conclusion of drift nonlinearity appears to hinge on the value of $$h$$ chosen, with smaller $$h$$ now making nonlinear drift significantly more likely.

### 3.4 What belief does the flat prior represent?

Although the flat prior is intuitively appealing and has a natural interpretation as being similar to maximum likelihood, it cannot be justified formally as uninformative. As Phillips (1991a, 1991b) argued for the first-order autoregressive model, the flat prior for the nonlinear drift model is likely to represent an informed belief about the probabilities of different parameter vectors that are near the boundaries of the stationary parameter space.

A natural question is whether a bias like that found in the simple AR(1) model might appear in the more complicated model considered here. I will then ask how results generated under the Jeffreys prior should be expected to differ.

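Suppose interest rates are generated according to the linear drift model

$$dr_t = (\alpha_0 + \alpha_1 r_t)\,dt + \sigma r_t^{\gamma}\,dW_t,$$

so that the true values of $$\alpha_2$$ and $$\alpha_3$$ in the nonlinear drift model are zero.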

While sampling distributions are of obvious interest to the frequentist econometrician, they are useful to the Bayesian as well. Because sampling distributions are known a priori, they are revealing about the properties of the prior. In particular, biases can be interpreted as evidence of a prior that is not completely uninformative.

The Monte Carlo experiment performed is designed to capture some characteristics of the daily sample of Eurodollar data. One thousand 5505-day samples were simulated under the parameter values $$\alpha_0 = .0072, \alpha_1 = - .12, \sigma = 1.55,$$ and $$\gamma = 1.36.$$ While the values of $$\sigma$$ and $$\gamma$$ are equal to their posterior means from Table 1, $$\alpha_0$$ and $$\alpha_1$$ are chosen to generate a highly persistent process that slowly reverts to a long-run mean of 6%.^{11} Because discretization bias appears to be negligible for daily data, the process was both simulated and estimated with $$h = 1,$$ meaning that no data augmentation was used.
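The simulation step can be sketched as follows, assuming a linear drift $$\alpha_0 + \alpha_1 r$$, volatility $$\sigma r^\gamma,$$ annualized parameters, and one Euler step per trading day. The day length of 1/250 of a year, the starting rate, the positivity floor, and the function name `simulate_path` are assumptions made for illustration and are not taken from the article.

```python
import math
import random

ALPHA0, ALPHA1, SIGMA, GAMMA = 0.0072, -0.12, 1.55, 1.36
DT = 1 / 250       # assumed length of one trading day, in years
N_DAYS = 5505      # sample length used in the Monte Carlo experiment


def simulate_path(r0, rng, n_days=N_DAYS):
    """Euler scheme with one step per day (h = 1, no data augmentation)."""
    path = [r0]
    for _ in range(n_days):
        r = path[-1]
        shock = SIGMA * r ** GAMMA * math.sqrt(DT) * rng.gauss(0.0, 1.0)
        r_next = r + (ALPHA0 + ALPHA1 * r) * DT + shock
        path.append(max(r_next, 1e-6))  # crude floor (an assumption) keeps r**GAMMA defined
    return path


rng = random.Random(42)
paths = [simulate_path(0.06, rng) for _ in range(5)]  # the article uses 1,000 samples

print("long-run mean implied by the drift:", -ALPHA0 / ALPHA1)
print("range of the first simulated path:", round(min(paths[0]), 4), round(max(paths[0]), 4))
```

Each simulated sample would then be passed to the estimation routine under both priors; that step is omitted here.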

Posterior distributions were computed under both the flat and Jeffreys priors used above. With posterior means chosen as point estimates, the top half of Table 2 contains bias and root mean squared error summaries for the six parameters of the model. The bottom half addresses the frequencies with which the true parameters are within the top 5% or bottom 5% of the posterior distributions, where values near 5% are clearly desirable for each.

| | $$\alpha_0$$ | $$\alpha_1$$ | $$\alpha_2$$ | $$\alpha_3$$ | $$\sigma$$ | $$\gamma$$ |
|---|---|---|---|---|---|---|
| True parameters | 0.0072 | −0.12 | 0 | 0 | 1.55 | 1.36 |
| Bias | | | | | | |
| Flat prior | −0.0979 | 2.740 | −25.48 | 0.00117 | 0.0136 | 0.00124 |
| Jeffreys prior | 0.0288 | −0.452 | −1.11 | −0.00042 | 0.0178 | 0.00197 |
| Root mean squared error | | | | | | |
| Flat prior | 0.1406 | 4.049 | 41.66 | 0.00185 | 0.1057 | 0.01981 |
| Jeffreys prior | 0.0759 | 1.818 | 15.67 | 0.00108 | 0.1105 | 0.02053 |
| Probability that true parameter is in top 5% of posterior | | | | | | |
| Flat prior | 0.277 | 0.003 | 0.246 | 0.000 | 0.053 | 0.055 |
| Jeffreys prior | 0.035 | 0.096 | 0.047 | 0.607 | 0.089 | 0.093 |
| Probability that true parameter is in bottom 5% of posterior | | | | | | |
| Flat prior | 0.000 | 0.269 | 0.001 | 0.284 | 0.074 | 0.070 |
| Jeffreys prior | 0.174 | 0.051 | 0.048 | 0.031 | 0.124 | 0.119 |

The table reports results from the Monte Carlo simulation of the nonlinear drift model.

While the volatility parameters are precisely estimated under both priors, the results show substantial bias under the flat prior for all four parameters of the drift. Using the Jeffreys prior results in biases that are uniformly smaller, in some cases by wide margins. For instance, under the flat prior the parameter $$\alpha_2$$ is on average estimated to be equal to −25.48, even though its true value is zero. The Jeffreys prior results are much better behaved, with a bias of just −1.11 for the same parameter. Root mean squared errors are also much lower under the Jeffreys prior, generally around half of their values under the flat prior.

It is also interesting to look at the frequencies with which the true parameter values lie in the tails of the posterior distributions. In this dimension, both priors exhibit difficulties. Ideally an uninformative prior would have the property that the true parameter value would be contained in the upper 5% of the posterior mass in approximately 5% of the Monte Carlo samples. Table 2 shows, however, that in 1000 Monte Carlo samples, the true value of $$\alpha_3,$$ zero, was *never* in the upper tail of the posteriors computed under the flat prior, while it was in 60% of the upper tails using the Jeffreys prior. Less extreme but still problematic results are obtained for other parameters. Neither prior therefore adequately represents a completely uninformed view.
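The tail-frequency check can be sketched as follows. The example uses a hypothetical textbook setting rather than the interest rate model: one observation $$y \sim N(\theta, 1)$$ per Monte Carlo sample with a flat prior on $$\theta,$$ so that the posterior is $$N(y, 1)$$ and the tail frequencies should each be close to 5%. The function name `tail_frequencies` and all numerical settings are assumptions made for illustration.

```python
import random


def tail_frequencies(posteriors, true_value, tail=0.05):
    """Share of samples in which the true value lies in the top or bottom
    `tail` fraction of the posterior draws."""
    top = bottom = 0
    for draws in posteriors:
        s = sorted(draws)
        k = round(tail * len(s))
        lower_cut = s[k]                 # approximate 5th percentile
        upper_cut = s[len(s) - 1 - k]    # approximate 95th percentile
        top += true_value > upper_cut
        bottom += true_value < lower_cut
    n = len(posteriors)
    return top / n, bottom / n


# Hypothetical calibrated example (not the paper's model).
rng = random.Random(1)
TRUE_THETA = 0.0
posteriors = []
for _ in range(1000):                                    # Monte Carlo samples
    y = rng.gauss(TRUE_THETA, 1.0)                       # simulated data
    posteriors.append([rng.gauss(y, 1.0) for _ in range(400)])  # posterior draws

top_freq, bottom_freq = tail_frequencies(posteriors, TRUE_THETA)
print("top-tail frequency:   ", top_freq)
print("bottom-tail frequency:", bottom_freq)
```

Frequencies far from 5% in either tail, like several of those in Table 2, indicate a posterior that is biased, too sharp, or both.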

More important is how these biases are translated into biases about the drift as a whole. Following the procedure in Section 3.3, I compute a posterior mean for the drift function for each of the Monte Carlo samples. The average of the drifts computed under each prior, as well as the true drift, are plotted in panel A of Figure 8.

The graph reveals that the biases apparent in the elements of $$\alpha$$ under the flat prior generate strong biases toward nonlinear drift. Furthermore, the magnitudes of the nonlinear drift typically estimated using the flat prior are not unlike those estimated by Aït-Sahalia (1996b), CHLS (1997), and Stanton (1997), as well as in the current article. Panel A also shows that while the Jeffreys prior does not completely eliminate this sort of bias, it reduces it considerably. Panel B shows that the standard deviation of the Jeffreys prior “estimator” is less than half of that of the flat prior.

Panels C and D report the frequency with which the true drift falls in the upper 5% and lower 5% of the posterior distribution, respectively. Ideally, if a prior is truly uninformative these frequencies should each be close to 5%. Unfortunately the figure shows that both can be far from that value for both priors.

For the flat prior, panel C shows that the true drift for interest rates above 10% is in the posterior distribution's upper tail in 15% to 22% of all samples. Meanwhile, the probability that the true drift is in the lower tail of the posterior distribution is too low for the flat prior. Furthermore, summing these frequencies reveals that the true drift, for high interest rates, is in the middle 90% of the posterior distribution less than 80% of the time. The holder of a flat prior, in addition to exhibiting bias, therefore shows a tendency to be overly confident in his conclusions.^{12}

Panels C and D show that drift posteriors computed under the Jeffreys prior are comparatively well behaved for high interest rates, but that they have deficiencies at low to moderate rates. Specifically, the frequencies with which the true drift lies in the upper and lower tails of the posterior distribution are both far too high. At an interest rate of 5%, for instance, there is roughly a one in three chance that the true drift will lie outside the middle 90% of the posterior. Since both the bias and estimator standard deviations are very small in this region, the result can only be explained by the Jeffreys prior generating inferences that are overly sharp. When using the Jeffreys prior, inferences that small but significant positive or negative drift exists in low to moderate rates should therefore be discounted.

Before looking at the data, the holder of a flat prior expects to conclude in favor of the existence of nonlinear drift even when it is not a true feature of the data. As in the autoregressive model, the flat prior therefore represents an informative prior belief that the model is stationary. In particular, the flat prior in this case corresponds to a belief that the drift is nonlinear.

We can find some intuition for the directions of these biases in an analogy with linear time-series models. In the case of the AR(1), finite sample bias tends to make the process appear more mean reverting than it actually is, with the magnitude of this bias decreasing as the sample size grows [see, e.g., Marriott and Pope (1954)]. Since drift nonlinearity is a feature of the tails of the empirical distribution of short rates, the parameters that determine the degree of nonlinearity in the model, $$\alpha_2$$ and $$\alpha_3,$$ are effectively estimated with less data than the parameters $$\alpha_0$$ and $$\alpha_1,$$ which substantially affect the drift of the short rate throughout that distribution. As with the AR(1), finite sample biases lead us to find spurious mean reversion, but with biases most severe in the nonlinear parameters $$\alpha_2$$ and $$\alpha_3,$$ we also incorrectly characterize this mean reversion as nonlinear.

These results provide a natural interpretation of the drift posteriors graphed in Figures 5 and 6. In panels A and C of Figure 5, we saw that adding a belief in stationarity to the flat prior resulted in few changes. The Monte Carlo exercise suggests that this is because the flat prior is already informative about stationarity. The same effect is present in panels A and C of Figure 6, although not as strongly. The Jeffreys prior, meanwhile, represents less of a belief in stationarity, so the addition of this information to the prior has large effects. In Figure 6, panels B and D, for example, assuming stationarity leads to the conclusion that nonlinear drift is highly probable, even when no drift was evident without that assumption.

## 4. Specification Analysis and an Extension to the Model

The very different inferences drawn using daily and monthly data are compelling evidence for model misspecification, since for diffusion models, all sampling frequencies should generate similar parameter estimates, although possibly of differing precision. In this section I present additional evidence of model misspecification and explore an alternative model that reconciles some earlier results.

### 4.1 A specification check

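A direct specification analysis may be performed by examining the normalized residuals that are generated in the estimation process at each step of the Markov chain. In the Euler approximation,

$$r_{(k + 1)h} - r_{kh} = h\mu(r_{kh}, \phi) + \sqrt{h}\,\sigma(r_{kh}, \phi)\epsilon_k,$$

the residuals $$\epsilon_k$$ are *i.i.d.* standard normal if the model is correctly specified.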

Following Zellner (1975), we may view $$\epsilon$$ as a parameter vector and compute the posterior distributions of various functions of it. For model diagnostic purposes, these functions should include moments and autocorrelations. A violation of either independence or normality is indicative of model misspecification.

Posterior distributions of these functions are obtained similarly to posteriors of the model parameters. At each iteration of the Markov chain, given the current draw of the parameter vector and the augmented data, the time series of $$\epsilon_k$$ may be calculated.^{13} The mean, standard deviation, skewness, and kurtosis of the $$\epsilon$$ vector are then computed for comparison with their theoretical values of 0, 1, 0, and 3, respectively. In addition, the first and 1/*h*th order autocorrelations are calculated to detect violations of independence, where the first-order autocorrelation primarily captures within-period dependence and the 1/*h*th order autocorrelation captures dependence between adjacent periods.
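For a single draw of the parameters and augmented data, the residual diagnostics can be sketched as below. The path is simulated from an assumed one-factor model with linear drift and volatility $$\sigma r^\gamma,$$ so the recovered residuals should look *i.i.d.* standard normal. The time units, the number of substeps, and the helper names are assumptions made for illustration.

```python
import math
import random


def euler_residuals(path, step, mu, sigma):
    """Normalized residuals eps_k = (dr - step*mu(r)) / (sqrt(step)*sigma(r))."""
    return [(path[k + 1] - path[k] - step * mu(path[k]))
            / (math.sqrt(step) * sigma(path[k]))
            for k in range(len(path) - 1)]


def moments(eps):
    """Mean, standard deviation, skewness, and total kurtosis (3 for a normal)."""
    n = len(eps)
    mean = sum(eps) / n
    dev = [e - mean for e in eps]
    sd = math.sqrt(sum(d * d for d in dev) / n)
    skew = sum(d ** 3 for d in dev) / n / sd ** 3
    kurt = sum(d ** 4 for d in dev) / n / sd ** 4
    return mean, sd, skew, kurt


def autocorr(eps, lag):
    n = len(eps)
    mean = sum(eps) / n
    var = sum((e - mean) ** 2 for e in eps)
    return sum((eps[i] - mean) * (eps[i + lag] - mean) for i in range(n - lag)) / var


# Assumed model and settings, for illustration only.
mu = lambda r: 0.0072 - 0.12 * r
sigma = lambda r: 1.55 * r ** 1.36
SUBSTEPS = 5                      # h = 0.2 in the article's notation
STEP = (1 / 250) / SUBSTEPS       # assumed day length of 1/250 year

rng = random.Random(7)
path = [0.06]
for _ in range(5000 * SUBSTEPS):
    r = path[-1]
    path.append(r + STEP * mu(r) + math.sqrt(STEP) * sigma(r) * rng.gauss(0.0, 1.0))

eps = euler_residuals(path, STEP, mu, sigma)
mean, sd, skew, kurt = moments(eps)
rho_within = autocorr(eps, 1)          # first-order (within-period) dependence
rho_between = autocorr(eps, SUBSTEPS)  # 1/h-th order (adjacent-period) dependence
print(mean, sd, skew, kurt, rho_within, rho_between)
```

Repeating this calculation at every iteration of the Markov chain yields posterior distributions for each diagnostic.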

Panel A of Table 3 lists the posterior means and standard deviations of these functions of $$\epsilon$$. For daily data, only the mean and standard deviation of the standardized residuals appear to be close to their theoretical values. Residuals exhibit positive skewness and pronounced excess kurtosis, and their autocorrelations appear to be negative, particularly between adjacent days. Taken together, these observations suggest there is a transient and fat-tailed component of interest rates that is not captured by the current model specification.

Panel A: $$r_{(k + 1)h} - r_{kh} = h\mu(r_{kh}, \phi) + \sqrt {h}\sigma (r_{kh}, \phi)\epsilon_k$$

| | $$\text {Mean}(\epsilon_k)$$ | $$\text {StDev}(\epsilon_k)$$ | $$\text {Skew}(\epsilon_k)$$ | $$\text {Kurt}(\epsilon_k)$$ | $$\rho_1(\epsilon_k)$$ | $$\rho_{1/h}(\epsilon_k)$$ |
|---|---|---|---|---|---|---|
| Daily data | 0.0000 (0.0061) | 1.0000 (0.0042) | 0.0656 (0.0255) | 4.7368 (0.4501) | −0.0169 (0.0060) | −0.0639 (0.0056) |
| Monthly data | −0.0002 (0.0138) | 0.9999 (0.0096) | 0.0013 (0.0339) | 3.0036 (0.0678) | 0.0002 (0.0137) | 0.0025 (0.0135) |

Panel B:

$$\begin{aligned} r_{(k + 1)h} - r_{kh} &= h\mu^r(r_{kh}, \theta_{kh}, \phi) + \sqrt {h}\sigma^r(r_{kh}, \theta_{kh}, \phi)\epsilon_k^r \\ \theta_{(k + 1)h} - \theta_{kh} &= h\mu^{\theta}(\theta_{kh}, \phi) + \sqrt {h}\sigma^{\theta}(\theta_{kh}, \phi)\epsilon_k^{\theta} \end{aligned}$$

| | $$\text {Mean}(\epsilon_k^r)$$ | $$\text {StDev}(\epsilon_k^r)$$ | $$\text {Skew}(\epsilon_k^r)$$ | $$\text {Kurt}(\epsilon_k^r)$$ | $$\rho_1(\epsilon_k^r)$$ | $$\rho_{1/h}(\epsilon_k^r)$$ |
|---|---|---|---|---|---|---|
| Daily data | −0.0003 (0.0061) | 1.0000 (0.0042) | −0.0779 (0.0169) | 3.3503 (0.0698) | −0.0016 (0.0060) | −0.0003 (0.0060) |
| Monthly data | 0.0019 (0.0277) | 0.9999 (0.0198) | −0.0002 (0.0678) | 3.0059 (0.1370) | −0.0012 (0.0277) | −0.0022 (0.0279) |

| | $$\text {Mean}(\epsilon_k^{\theta})$$ | $$\text {StDev}(\epsilon_k^{\theta})$$ | $$\text {Skew}(\epsilon_k^{\theta})$$ | $$\text {Kurt}(\epsilon_k^{\theta})$$ | $$\rho_1(\epsilon_k^{\theta})$$ | $$\rho_{1/h}(\epsilon_k^{\theta})$$ |
|---|---|---|---|---|---|---|
| Daily data | −0.0001 (0.0060) | 1.0001 (0.0043) | −0.0001 (0.0147) | 3.0123 (0.0301) | 0.0003 (0.0060) | 0.0005 (0.0060) |
| Monthly data | −0.0000 (0.0278) | 0.9998 (0.0198) | −0.0125 (0.0679) | 3.0556 (0.1463) | −0.0038 (0.0278) | −0.0025 (0.0283) |

| | $$\rho(\epsilon_k^r, \epsilon_k^{\theta})$$ |
|---|---|
| Daily data | 0.0001 (0.0059) |
| Monthly data | −0.0008 (0.0275) |

The table reports posterior means and standard deviations (in parentheses) of various moments of the residuals of the one- and two-factor models. A correct specification implies that the average residual, $$\text {Mean}(\epsilon_k),$$ should be zero. The residual standard deviation, $$\text {StDev}(\epsilon_k),$$ should be one, $$\text {Skew}(\epsilon_k)$$ should be zero, and $$\text {Kurt}(\epsilon_k)$$ should be three (since it represents total rather than excess kurtosis). Within-period autocorrelation, $$\rho_1(\epsilon_k),$$ between-period autocorrelation, $$\rho_{1/h}(\epsilon_k),$$ and cross-equation correlation, $$\rho(\epsilon_k^r, \epsilon_k^{\theta}),$$ should all equal zero.

Results from monthly data reveal none of these problems, as the *i.i.d.* normal assumption appears to be well satisfied. This further supports the notion that the source of the model misspecification is a transient component that ceases to be relevant at a one-month horizon. The possible sources of such a component include bid-ask bounce and feedback from the reserve requirement cycle effects in the Federal funds market identified by Hamilton (1996).

The existence of this noisy component of high-frequency interest rates casts strong doubt on the relevance of some of the previous results and those of the studies that use the same data. As the data come from Aït-Sahalia (1996b), the criticisms are relevant for this article in particular, but they are also applicable to some of the parametric analysis of Chapman and Pearson (2000), which also uses the daily Eurodollar data to estimate a nonlinear one-factor model.

### 4.2 A nonlinear stochastic mean model of interest rates

Durham (2002) also finds that interest rate drift nonlinearity is more associated with noisy interest rate data, and he has suggested that the apparent transitory component not currently captured by the model motivates the adoption of a stochastic mean model of interest rates. These models posit that interest rates are driven by a persistent process, but that rates deviate from this process in a random but highly transient way. Examples of stochastic mean models may be found in the articles by Andersen and Lund (1997), Balduzzi, Das, and Foresi (1996), Jegadeesh and Pennacchi (1996), and Piazzesi (2001), among others.

The stochastic mean model considered in this article,

^{14}The volatility elasticity is allowed to differ between the two processes, since earlier results suggested that more transient dynamics may have a lower elasticity. As simplifying assumptions, both variances are assumed to depend on $$\theta_t$$ only, rather than on both $$\theta_t$$ and $$r_t,$$ and the two Brownian motions are assumed to be independent.

The stochastic mean model is somewhat more difficult to estimate since the $$\theta_t$$ process is latent. Nevertheless, the econometric approach described previously and in Appendix B is easily extended to such models. Following this algorithm, parameter estimates were obtained by again setting $$h = .2$$ for daily data and $$h = .05$$ for monthly data.
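To make the structure of a stochastic mean model concrete, the following sketch simulates a two-factor system by Euler steps of length $$h$$. The functional forms are assumptions chosen to match the verbal description (linear reversion of $$r_t$$ toward $$\theta_t$$ at rate $$\kappa$$, a polynomial drift in $$\theta_t$$ with coefficients $$\alpha_0, \ldots, \alpha_3$$, both volatilities depending on $$\theta_t$$ only, and independent shocks); they are not asserted to be the article's exact equations. The function name and all parameter values are hypothetical and are not taken from Table 4.

```python
import numpy as np

def simulate_stochastic_mean(r0, theta0, params, h, n_steps, rng):
    """Euler simulation of an ASSUMED two-factor stochastic mean model:

        dr     = kappa * (theta - r) dt + xi * theta**delta dW_r
        dtheta = (a0 + a1*theta + a2*theta**2 + a3/theta) dt
                 + sigma * theta**gamma dW_theta

    with independent Brownian motions W_r and W_theta.
    """
    kappa, xi, delta, a0, a1, a2, a3, sigma, gamma = params
    r = np.empty(n_steps + 1)
    theta = np.empty(n_steps + 1)
    r[0], theta[0] = r0, theta0
    sqrt_h = np.sqrt(h)
    for k in range(n_steps):
        th = theta[k]
        eps_r, eps_th = rng.standard_normal(2)  # independent shocks
        drift_th = a0 + a1 * th + a2 * th ** 2 + a3 / th
        r[k + 1] = r[k] + kappa * (th - r[k]) * h + xi * th ** delta * sqrt_h * eps_r
        theta[k + 1] = th + drift_th * h + sigma * th ** gamma * sqrt_h * eps_th
    return r, theta

# Hypothetical per-day parameters: fast reversion of r toward theta,
# slow linear reversion of theta toward 0.05.
params = (0.3, 0.02, 1.0, 2e-5, -4e-4, 0.0, 0.0, 0.006, 1.0)
rng = np.random.default_rng(0)
r_path, theta_path = simulate_stochastic_mean(
    0.05, 0.05, params, h=0.2, n_steps=5 * 250, rng=rng
)
```

With $$h = .2$$, every fifth element of `r_path` corresponds to a daily observation, while `theta_path` would be entirely latent in estimation.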

Panel B of Table 3 reveals that the two-factor stochastic mean model shows much less evidence of misspecification than did the previous one-factor model. While the interest rate equation [Equation (21)] generates some excess kurtosis in its standardized residuals when estimated from daily data, it is far less than that reported for the original model. No violations of *i.i.d.* normality are apparent for the stochastic mean equation or for either equation when estimated with monthly data.

Parameter posterior statistics for the stochastic mean model estimated with daily data are reported in panel A of Table 4. Figure 9 contains the corresponding drift posterior graphs, where the drift shown is now the drift of the stochastic mean process, $$\theta_t,$$ rather than the interest rate. As before, a variety of priors are used, with the flat prior now given by $$p(\kappa, \xi, \delta, \alpha, \sigma, \gamma) \propto 1/(\xi \sigma).$$ Instead of deriving a new Jeffreys prior on the combined set of drift parameters $$\kappa$$ and $$\alpha$$, I use the approximate Jeffreys prior on $$\alpha$$ derived for the univariate process. While this is not the true Jeffreys prior for the two-factor model, it is the Jeffreys prior for the drift parameters in $$\alpha$$ conditional on the vector $$(\kappa, \xi, \delta, \sigma, \gamma).$$ The high precision of the posterior distributions of these parameters suggests that conditioning on these parameters should be relatively harmless.

Parameter | Flat prior | Stationary flat prior | Drift-stationary flat prior | Jeffreys prior | Stationary Jeffreys prior | Drift-stationary Jeffreys prior |
---|---|---|---|---|---|---|
Panel A: Daily data | | | | | | |
Posterior means | | | | | | |
$$\kappa \times 10^{- 3}$$ | 3.15 | 3.14 | 3.14 | 3.13 | 3.03 | 3.03 |
$$\xi$$ | 1.66 | 1.66 | 1.66 | 1.65 | 1.69 | 1.69 |
$$\delta$$ | 1.35 | 1.35 | 1.35 | 1.35 | 1.36 | 1.36 |
$$\alpha_0 \times 10$$ | −1.08 | −1.51 | −1.54 | 0.34 | −0.19 | −0.19 |
$$\alpha_1$$ | 2.01 | 2.68 | 2.73 | −0.36 | 0.65 | 0.65 |
$$\alpha_2 \times 10^{- 1}$$ | −1.06 | −1.35 | −1.39 | 0.10 | −0.48 | −0.48 |
$$\alpha_3 \times 10^3$$ | 1.83 | 2.63 | 2.68 | −0.80 | 0.01 | 0.01 |
$$\sigma$$ | 1.62 | 1.62 | 1.62 | 1.66 | 1.63 | 1.63 |
$$\gamma$$ | 1.67 | 1.68 | 1.68 | 1.69 | 1.69 | 1.69 |
Posterior standard deviations | | | | | | |
$$\kappa \times 10^{- 3}$$ | 0.23 | 0.23 | 0.23 | 0.23 | 0.16 | 0.16 |
$$\xi$$ | 0.12 | 0.12 | 0.12 | 0.13 | 0.09 | 0.09 |
$$\delta$$ | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 0.02 |
$$\alpha_0 \times 10$$ | 1.22 | 0.93 | 0.92 | 0.46 | 0.09 | 0.09 |
$$\alpha_1$$ | 1.94 | 1.52 | 1.48 | 0.87 | 0.35 | 0.35 |
$$\alpha_2 \times 10^{- 1}$$ | 0.92 | 0.75 | 0.73 | 0.53 | 0.32 | 0.32 |
$$\alpha_3 \times 10^3$$ | 2.25 | 1.71 | 1.70 | 0.77 | 0.05 | 0.05 |
$$\sigma$$ | 0.27 | 0.27 | 0.27 | 0.28 | 0.22 | 0.22 |
$$\gamma$$ | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 |
Posterior 95% HPD intervals | | | | | | |
$$\kappa \times 10^{- 3}$$ | (2.71, 3.63) | (2.68, 3.60) | (2.68, 3.60) | (2.68, 3.61) | (2.64, 3.34) | (2.64, 3.34) |
$$\xi$$ | (1.43, 1.90) | (1.43, 1.90) | (1.43, 1.90) | (1.41, 1.90) | (1.55, 1.90) | (1.55, 1.90) |
$$\delta$$ | (1.29, 1.41) | (1.29, 1.41) | (1.29, 1.41) | (1.28, 1.40) | (1.33, 1.41) | (1.33, 1.41) |
$$\alpha_0 \times 10$$ | (−3.55, 1.23) | (−3.31, 0.02) | (−3.31, −0.06) | (−0.37, 1.23) | (−0.36, −0.09) | (−0.36, −0.09) |
$$\alpha_1$$ | (−1.75, 5.84) | (0.12, 5.81) | (0.32, 5.73) | (−2.23, 1.07) | (0.38, 1.36) | (0.38, 1.36) |
$$\alpha_2 \times 10^{- 1}$$ | (−2.87, 0.77) | (−2.94, −0.03) | (−2.72, −0.03) | (−0.83, 1.18) | (−1.10, −0.10) | (−1.10, −0.10) |
$$\alpha_3 \times 10^3$$ | (−2.54, 6.31) | (0.00, 5.78) | (0.00, 5.81) | (−2.39, 0.04) | (0.00, 0.04) | (0.00, 0.04) |
$$\sigma$$ | (1.12, 2.18) | (1.12, 2.18) | (1.12, 2.18) | (1.24, 2.31) | (1.20, 1.82) | (1.20, 1.82) |
$$\gamma$$ | (1.56, 1.81) | (1.56, 1.80) | (1.56, 1.80) | (1.59, 1.81) | (1.57, 1.74) | (1.57, 1.74) |
Panel B: Monthly data | | | | | | |
Posterior means | | | | | | |
$$\kappa \times 10^{- 3}$$ | 0.29 | 0.29 | 0.29 | 0.27 | 0.21 | 0.19 |
$$\xi$$ | 0.45 | 0.44 | 0.44 | 0.46 | 0.58 | 0.58 |
$$\delta$$ | 1.31 | 1.30 | 1.30 | 1.28 | 1.37 | 1.36 |
$$\alpha_0 \times 10$$ | −1.27 | −1.68 | −1.71 | 0.29 | −0.17 | −0.19 |
$$\alpha_1$$ | 2.32 | 2.95 | 3.01 | −0.29 | 0.62 | 0.69 |
$$\alpha_2 \times 10^{- 1}$$ | −1.21 | −1.49 | −1.52 | 0.06 | −0.49 | −0.56 |
$$\alpha_3 \times 10^3$$ | 2.19 | 2.95 | 2.99 | −0.70 | 0.03 | 0.04 |
$$\sigma$$ | 1.95 | 1.97 | 1.97 | 2.53 | 2.97 | 3.14 |
$$\gamma$$ | 1.70 | 1.71 | 1.71 | 1.82 | 1.89 | 1.92 |
Posterior standard deviations | | | | | | |
$$\kappa \times 10^{- 3}$$ | 0.09 | 0.09 | 0.09 | 0.10 | 0.05 | 0.04 |
$$\xi$$ | 0.35 | 0.35 | 0.35 | 0.33 | 0.18 | 0.19 |
$$\delta$$ | 0.26 | 0.26 | 0.26 | 0.24 | 0.13 | 0.13 |
$$\alpha_0 \times 10$$ | 1.30 | 1.03 | 1.02 | 0.48 | 0.16 | 0.16 |
$$\alpha_1$$ | 2.09 | 1.70 | 1.66 | 0.90 | 0.33 | 0.28 |
$$\alpha_2 \times 10^{- 1}$$ | 1.01 | 0.85 | 0.82 | 0.54 | 0.25 | 0.17 |
$$\alpha_3 \times 10^3$$ | 2.40 | 1.89 | 1.88 | 0.81 | 0.30 | 0.32 |
$$\sigma$$ | 0.69 | 0.69 | 0.69 | 0.83 | 0.74 | 0.57 |
$$\gamma$$ | 0.13 | 0.13 | 0.13 | 0.13 | 0.11 | 0.08 |
Posterior 95% HPD intervals | | | | | | |
$$\kappa \times 10^{- 3}$$ | (0.12, 0.47) | (0.12, 0.46) | (0.12, 0.46) | (0.13, 0.48) | (0.14, 0.30) | (0.14, 0.23) |
$$\xi$$ | (0.07, 1.15) | (0.07, 1.15) | (0.07, 1.15) | (0.08, 1.14) | (0.27, 0.74) | (0.27, 0.74) |
$$\delta$$ | (0.94, 1.68) | (0.94, 1.68) | (0.94, 1.68) | (0.94, 1.65) | (1.13, 1.51) | (1.13, 1.48) |
$$\alpha_0 \times 10$$ | (−3.81, 1.27) | (−3.66, 0.02) | (−3.65, −0.05) | (−0.34, 1.36) | (−0.28, −0.04) | (−0.28, −0.12) |
$$\alpha_1$$ | (−1.73, 6.47) | (0.12, 6.42) | (0.19, 6.20) | (−2.34, 1.16) | (0.04, 1.13) | (0.49, 1.13) |
$$\alpha_2 \times 10^{- 1}$$ | (−3.19, 0.75) | (−3.20, 0.04) | (−3.04, −0.03) | (−0.82, 1.26) | (−0.96, 0.09) | (−0.96, −0.42) |
$$\alpha_3 \times 10^3$$ | (−2.55, 6.86) | (0.00, 6.40) | (0.00, 6.42) | (−2.52, 0.02) | (0.00, 0.01) | (0.00, 0.01) |
$$\sigma$$ | (0.86, 3.40) | (0.86, 3.40) | (0.86, 3.40) | (1.32, 3.85) | (1.43, 3.85) | (2.40, 3.85) |
$$\gamma$$ | (1.46, 1.96) | (1.46, 1.96) | (1.46, 1.96) | (1.59, 2.02) | (1.64, 2.02) | (1.84, 2.02) |

The table reports means, standard deviations, and 95% highest posterior density intervals (the shortest interval containing 95% of all posterior mass) for each of the nine parameters of the stochastic mean model.

Prior robustness is also checked by varying the prior belief about stationarity. Since it turns out that $$\kappa \gt 0$$ with posterior probability one for all prior distributions, stationarity of the joint process depends in practice solely on the conditions on $$\alpha$$ and $$\gamma$$ described in Section 1.

Both Figure 9 and panel A of Table 4 reveal that the dynamics of the stochastic mean process estimated from daily data are almost identical to the dynamics of the original nonlinear interest rate process when estimated using *monthly* data. The transient dynamics identified earlier therefore appear to be well captured in the difference between the $$r_t$$ and $$\theta_t$$ processes.

Given the unobservability of $$\theta_t,$$ estimating the stochastic mean model using 261 monthly observations should be imprecise, at best. In addition, the transitory nature of the deviations of $$r_t$$ from $$\theta_t$$ induces a potential *aliasing* problem, since high-frequency dynamics of $$r_t$$ should be difficult, if not impossible, to estimate using low-frequency data. Nevertheless, for completeness, parameter estimates obtained using monthly data are reported in panel B of Table 4. The corresponding drift plots are in Figure 10.

The table shows that there are large differences between the values of $$\kappa$$ and $$\xi$$ supported by daily and monthly data, although this is somewhat to be expected due to aliasing. In the frequency domain, it is known that cycles of higher frequency than that of the observed data will be incorrectly attributed to lower frequency cycles [see Hamilton (1994)]. Since the deviations of $$r_t$$ from $$\theta_t$$ implied by daily data have half-lives well under one month, the aliasing problem is likely to be severe here.
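The aliasing effect can be illustrated with a deterministic example. The sketch below uses hypothetical frequencies unrelated to the interest rate data: a cycle completing 0.9 revolutions per observation interval, which is above the Nyquist limit of 0.5, is numerically indistinguishable at the observation dates from a cycle completing 0.1 revolutions per interval.

```python
import numpy as np

# Observation dates, in units of the sampling interval (e.g., months).
t = np.arange(24)

# A fast cycle (0.9 cycles per interval) and a slow one (0.1 cycles per
# interval). For integer t, cos(2*pi*0.9*t) = cos(2*pi*t - 2*pi*0.1*t)
# = cos(2*pi*0.1*t), so the two coincide at every observation date.
fast = np.cos(2 * np.pi * 0.9 * t)
slow = np.cos(2 * np.pi * 0.1 * t)

max_gap = np.max(np.abs(fast - slow))  # zero up to floating-point error
```

Data sampled at these dates therefore cannot distinguish the fast cycle from the slow one, which is why dynamics faster than the sampling interval tend to be attributed to lower frequencies.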

The monthly parameter estimates of the stochastic mean process, however, capture a much lower frequency dynamic and are almost identical to the daily estimates. In addition, the precision of the posterior distributions of the drift parameters is almost identical to the precision obtained with daily data, suggesting again that high-frequency data have little information to add over monthly data about the shape of the drift function.

Figures 9 and 10 confirm earlier results that nonlinear drift is primarily a feature of a misspecified model of high-frequency data. While sufficient stationarity assumptions can be imposed through the prior to generate nonlinear drift, the magnitude of this nonlinearity is much less than that found under the original model using daily data. Using monthly data, conclusions are largely unaffected by the choice of model, as the short-term deviations from the stochastic mean process become irrelevant.

### 4.3 An economic specification test

One of the primary reasons for estimating the short-rate process is to use that process to price other fixed-income securities. A natural evaluation of a model might therefore be based on how well it describes the prices or price dynamics of these securities.

Specifically, I consider whether the models and parameter estimates reported above are consistent with the observed volatility of three-month interest rates. While this maturity is relatively short, it is still substantially longer than the seven-day rates used to estimate the model.

Model bond prices are calculated under the local expectations hypothesis. Longstaff (2000) has argued that the expectations hypothesis is an accurate characterization of three-month repo rates, but may fail to hold for Treasury bills because of institutional demand for the high liquidity they provide. Duffee (1996) documents liquidity-driven volatility in Treasury bills that appears absent from other short-term debt. Because repo rates are difficult to obtain over a long sample, I use Eurodollar loan rates instead, which should similarly be unaffected by liquidity effects. Although these Eurodollar rates contain a credit risk component, if this component is relatively smooth it should not influence the calculation of daily interest rate volatilities.^{15}

Given a level, $$r_0,$$ of the current short rate, three-month bond prices $$B(r_0)$$ for the one-factor model are obtained by simulating 10,000 three-month paths of the short rate and then calculating the Monte Carlo estimate

Volatilities for the two-factor model are calculated similarly under the assumption that $$r_0 = \theta_0.$$^{16} As before, simulation is used to obtain $$R(r_0, \theta_0), R(r_0 + \epsilon, \theta_0),$$ and $$R(r_0, \theta_0 + \epsilon)$$ for each value of $$r_0.$$ Together, these may be used to numerically calculate the partial derivatives $$\partial R/\partial r_0$$ and $$\partial R/\partial \theta_0.$$ Because of the independence of the processes for $$r_t$$ and $$\theta_t,$$ the three-month interest rate variance is given by

These calculations were performed using both models and both daily and monthly posterior distributions computed under the flat prior. From each posterior, 500 sets of model parameters were drawn at random to construct posterior distributions of the three-month rate's volatility as a function of its level. Figure 11 plots the mean of this distribution, along with its 5th and 95th percentiles, as solid lines.
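For a single parameter draw in the one-factor case, the calculation can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: the drift and diffusion functions and their parameter values are hypothetical, time is measured in years, the bond price is the Monte Carlo average of $$\exp(-\int r_t \, dt)$$ implied by the local expectations hypothesis, and the three-month rate's volatility is approximated as $$|\partial R/\partial r_0|$$ times the short-rate diffusion, with the derivative taken numerically using common random numbers.

```python
import numpy as np

def three_month_rate(r0, drift, diffusion, tau=0.25, n_steps=63,
                     n_paths=10_000, seed=0):
    """Monte Carlo three-month yield R(r0) under the local expectations
    hypothesis: B = E[exp(-integral of r)], R = -log(B) / tau."""
    rng = np.random.default_rng(seed)  # fixed seed: common random numbers
    h = tau / n_steps
    r = np.full(n_paths, r0, dtype=float)
    integral = np.zeros(n_paths)
    for _ in range(n_steps):
        integral += r * h  # left-point rule for the integral of r
        eps = rng.standard_normal(n_paths)
        r = r + drift(r) * h + diffusion(r) * np.sqrt(h) * eps
        r = np.maximum(r, 1e-8)  # crude positivity guard (an assumption)
    bond_price = np.mean(np.exp(-integral))
    return -np.log(bond_price) / tau

def three_month_vol(r0, drift, diffusion, bump=1e-4, **kw):
    """Delta-method volatility: |dR/dr0| times the short-rate diffusion."""
    dR_dr = (three_month_rate(r0 + bump, drift, diffusion, **kw)
             - three_month_rate(r0, drift, diffusion, **kw)) / bump
    return abs(dR_dr) * diffusion(r0)

# Hypothetical linear drift and CEV-type diffusion, for illustration only.
drift = lambda r: 0.5 * (0.06 - r)
diffusion = lambda r: 0.8 * r ** 1.5

R = three_month_rate(0.06, drift, diffusion)
vol = three_month_vol(0.06, drift, diffusion)
```

Repeating the calculation over a grid of `r0` values and over many posterior parameter draws would trace out a distribution of the volatility function of the kind summarized in Figure 11.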

These model-implied volatilities are compared to a locally linear nonparametric regression estimate of the daily volatility of changes in the three-month rate. This regression estimate is calculated using the Federal Reserve's time series of three-month Eurodollar rates over the same time period used to estimate the models. Figure 11 plots these curves as dashed lines, along with the 5th and 95th percentiles for the nonparametric estimate calculated from 5000 draws of the Künsch (1989) block bootstrap with a block size of 100.^{17}
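The moving block bootstrap resamples overlapping blocks of consecutive observations so that short-range dependence within each block is preserved. The sketch below is a generic version applied to a simple statistic (the standard deviation of changes) instead of the locally linear regression estimate; the function name, the placeholder data, and the number of draws are hypothetical.

```python
import numpy as np

def moving_block_bootstrap(x, block_size, n_draws, stat, rng):
    """Moving block bootstrap distribution of stat(x)."""
    x = np.asarray(x)
    n = len(x)
    n_blocks = int(np.ceil(n / block_size))
    n_starts = n - block_size + 1  # number of overlapping blocks available
    offsets = np.arange(block_size)
    out = np.empty(n_draws)
    for d in range(n_draws):
        # draw block starting points with replacement, concatenate the
        # blocks, and trim the result to the original sample length
        starts = rng.integers(0, n_starts, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        out[d] = stat(x[idx])
    return out

rng = np.random.default_rng(1)
changes = 0.001 * rng.standard_normal(2_000)  # placeholder for rate changes
draws = moving_block_bootstrap(changes, block_size=100, n_draws=500,
                               stat=np.std, rng=rng)
lo, hi = np.percentile(draws, [5, 95])
```

The pair `(lo, hi)` plays the role of the 5th and 95th percentile band; in the application described above, `stat` would be replaced by the regression estimate evaluated at a given interest rate level.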

While comparison of Bayesian posteriors to frequentist confidence intervals is somewhat informal, the top left panel shows clearly that the volatility in daily Eurodollar rates is largely absent from three-month Eurodollar rates, as the one-factor model fitted to daily data grossly overpredicts the level of volatility of this longer-maturity yield. When fitted to monthly data (top right panel), the model-implied volatilities come very close to matching the nonparametric estimates, implying again that transient movements in the short rate do not impact the three-month rate.

The bottom panels contain results for the stochastic mean model. This model produces three-month volatilities that come very close to matching the nonparametric estimates regardless of what sampling interval is used to estimate the model, suggesting that it is better specified than the one-factor model.

Given the problems with transient noise in the seven-day Eurodollar rate, one might argue for using a different short-term rate, such as the Federal funds rate. Unfortunately other very short-term rates generate similar results, not reported here. The Federal funds rate and, to a lesser extent, the 30-day Eurodollar rate are both more volatile and have much stronger nonlinear mean reversion when sampled daily versus monthly. Another alternative would be to use a longer maturity rate, such as the three-month Treasury bill rate, to proxy for the short rate. Chapman, Long, and Pearson (1999) argue, however, that the three-month yield is a poor substitute for the “instantaneous” rate of interest when the model under consideration is nonlinear, as is ours. Using the longer yield “can significantly affect both estimates of the diffusion function and discount bond prices.” The use of noisy short-maturity rates may therefore be unavoidable.

## 5. Conclusion

Taken together, the results of the article suggest that objective evidence for nonlinear mean reversion in the short-term interest rate is weak. The conclusion that high interest rates exhibit strong negative drift is extremely sensitive to the choice of prior, even when the choice is made between priors that could all be defended as representing relatively uninformed views. Results are also sensitive to the sampling frequency and model, with daily data implying much stronger nonlinearities and a level of interest rate volatility almost twice that apparent in monthly data.

Results in Chapman and Pearson (2000) suggest that nonparametric methods are biased toward finding nonlinear mean reversion even when it is not present. This article establishes that fully efficient parametric inference (such as maximum likelihood) may be just as vulnerable to such false inferences. From a frequentist perspective, this vulnerability arises in the form of biases similar to those found for simpler linear time-series models. In the nonlinear drift model, however, these biases affected nonlinear terms most severely, often generating spurious nonlinearity.

From a Bayesian perspective, we may attribute the tendency to find spurious nonlinearity to the selection of an informative prior distribution, possibly one that does not accurately reflect the investigator's actual prior belief. This view suggests that alternative priors be considered, and the article considered a number of variations. While it is impossible to say which prior is the “correct” one, several characteristics of the prior distributions are important to note.

The flat prior effectively represents a prior belief that the drift function is nonlinear, with the same shape and possibly the same magnitude as the drift function that is estimated in the data. Similar to the AR(1) model, the posterior means of the drift parameters are biased in repeated samples. A flat prior, by not anticipating and correcting for this bias, is implicitly taking an informed view that this bias is desirable.

The Jeffreys prior, which Phillips (1991b) argues is the best representation of true prior ignorance, suggests no evidence for nonlinear drift unless stationarity is imposed.

Imposing stationarity in a prior distribution represents a nontrivial amount of prior information. While such a prior is not unreasonable, it must be recognized that conclusions drawn about nonlinear drift under this prior are not entirely data based. For the monthly interest rate sample and also for the two-factor stochastic mean model, a stationarity restriction was required to generate the conclusion that high rates have a negative drift.

It was also shown that changing the sampling frequency can result in very different inferences about both the drift and volatility of interest rates. Specifically, daily data appear to contain a volatile transitory component that is unreflected in longer-term dynamics or the volatilities of the three-month Eurodollar rate. A nonlinear stochastic mean model of interest rates appears to fit the data much better and suggests that it is the unmodeled transitory component of short rates that is largely responsible for the finding of nonlinear drift.

While many of the problems with high-frequency data could be avoided, without losing much sample information, by looking solely at month-end observations, the data augmentation procedure was crucial for eliminating discretization bias in these estimates. Under some priors, discretization bias was sufficiently severe to substantially change one's inferences about drift nonlinearity.

Although a definitive conclusion about the existence of nonlinear drift cannot be made solely by observing the short rate itself, there exists a variety of information in long-term yields and interest rate options that may be much more revealing than the short rate itself. While incorporating these data into the analysis of nonlinear drift remains a challenge, it is called for by the fact that although more than 5000 observations of daily data are available, the current data sample is effectively small. With these data alone, precise statements about the shape of the drift function — statements that different individuals with different prior beliefs can agree on — are impossible to make.

### Appendix A: An Introduction to the Gibbs Sampler and Data Augmentation

The Gibbs sampler is motivated by the frequent need to draw from intractable multivariate distributions. For simplicity, consider the bivariate case in which we desire to draw from the distribution $$p(\alpha, \beta|X),$$ where $$X$$ represents the observed data. In many cases the density $$p(\alpha, \beta|X)$$ is of an unknown form, while the conditional densities $$p(\alpha|\beta, X)$$ and $$p(\beta|\alpha, X)$$ are of standard forms.

A Gibbs sampling chain is formed as follows:

1. Choose some arbitrary value for $$\alpha$$ and label it $$\alpha_0.$$
2. Draw $$\beta_0$$ from the distribution $$p(\beta|\alpha_0, X).$$
3. Draw $$\alpha_1$$ from the distribution $$p(\alpha|\beta_0, X).$$
4. Repeatedly draw $$\beta_n$$ from $$p(\beta|\alpha_n, X)$$ and $$\alpha_{n + 1}$$ from $$p(\alpha|\beta_n, X).$$

Under very mild conditions, the pairs $$(\alpha_n, \beta_n)$$ converge in distribution to $$p(\alpha, \beta|X).$$ Posterior means, for example, may therefore be calculated by simulating a long chain of $$(\alpha_n, \beta_n),$$ discarding the values at the beginning of the chain (the “burn-in period”), and then averaging the remaining draws.
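The chain described above can be sketched for a target whose conditionals are known in closed form. The example below is hypothetical and unrelated to the interest rate models: the target is a bivariate standard normal with correlation `rho`, for which each conditional is normal with mean `rho` times the other variable and variance `1 - rho**2`.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_draws, burn_in, rng, alpha0=0.0):
    """Gibbs sampler for a bivariate standard normal with correlation rho."""
    sd = np.sqrt(1.0 - rho ** 2)
    alpha = alpha0  # arbitrary starting value
    draws = np.empty((n_draws, 2))
    for n in range(burn_in + n_draws):
        beta = rng.normal(rho * alpha, sd)   # beta_n  | alpha_n
        if n >= burn_in:                     # discard the burn-in period
            draws[n - burn_in] = (alpha, beta)
        alpha = rng.normal(rho * beta, sd)   # alpha_{n+1} | beta_n
    return draws

rng = np.random.default_rng(0)
draws = gibbs_bivariate_normal(rho=0.8, n_draws=20_000, burn_in=1_000, rng=rng)
posterior_means = draws.mean(axis=0)
sample_corr = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]
```

Averaging the retained draws approximates the means of the target (zero here), and the sample correlation of the pairs approximates `rho`.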

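A simple example of the usefulness of the Gibbs sampler is provided by the following discrete time version of Vasicek's interest rate model:

$$r_t - r_{t - 1} = \kappa(\mu - r_{t - 1}) + \sigma \epsilon_t, \qquad \epsilon_t \sim N(0, 1). \tag{25}$$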

Note, however, that were $$\mu$$ a known constant, then the equation would conform to the standard linear regression framework, with the quantity $$\mu - r_{t - 1}$$ filling the role of the regressor. Under the flat prior $$p(\kappa, \sigma) \propto 1/\sigma,$$ the posterior distribution of $$\kappa$$ and $$\sigma$$ is well known; $$\kappa$$ is distributed as a Student's $$t$$ and $$\sigma$$ as an inverted gamma.

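Similarly, if $$\kappa$$ and $$\sigma$$ were known, then Equation (25) could be rearranged as

$$r_t - (1 - \kappa) r_{t - 1} = \kappa \mu + \sigma \epsilon_t,$$

in which $$\mu$$ is the only unknown.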

By alternately drawing from the conditional distributions $$p(\kappa, \sigma|\mu, R)$$ and $$p(\mu|\kappa, \sigma, R),$$ the Gibbs sampler may be used to obtain draws from the joint posterior, $$p(\kappa, \sigma, \mu|R).$$^{18} Averaging these draws, for example, would produce an estimate of the posterior mean.
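A minimal sketch of this two-block sampler follows. It assumes the discrete-time model $$r_t - r_{t-1} = \kappa(\mu - r_{t-1}) + \sigma \epsilon_t$$ with standard normal errors, the prior $$p(\kappa, \sigma) \propto 1/\sigma,$$ and, as an additional assumption made only for this illustration, a flat prior on $$\mu,$$ under which the conditional posterior of $$\mu$$ is normal. The function name and the simulated data are hypothetical.

```python
import numpy as np

def gibbs_vasicek(r, n_draws, burn_in, rng):
    """Two-block Gibbs sampler for (kappa, sigma, mu) in a discrete Vasicek
    model, assuming flat priors on kappa and mu and p(sigma) ~ 1/sigma."""
    r = np.asarray(r, dtype=float)
    dr, lag = np.diff(r), r[:-1]
    T = len(dr)
    mu = r.mean()  # arbitrary starting value
    out = np.empty((n_draws, 3))
    for n in range(burn_in + n_draws):
        # Block 1: (kappa, sigma) | mu. Regression of dr on x = mu - r_{t-1};
        # sigma^2 is scaled inverse chi-square, kappa | sigma^2 is normal
        # (so kappa is marginally Student's t).
        x = mu - lag
        sxx = x @ x
        kappa_hat = (x @ dr) / sxx
        ssr = np.sum((dr - kappa_hat * x) ** 2)
        sigma2 = ssr / rng.chisquare(T - 1)
        kappa = rng.normal(kappa_hat, np.sqrt(sigma2 / sxx))
        # Block 2: mu | kappa, sigma. With z_t = r_t - (1 - kappa) r_{t-1}
        # = kappa * mu + sigma * eps_t, mu is normal around mean(z) / kappa.
        # (Ill-behaved if kappa is drawn very close to zero.)
        z = r[1:] - (1.0 - kappa) * lag
        mu = rng.normal(z.mean() / kappa, np.sqrt(sigma2 / T) / abs(kappa))
        if n >= burn_in:
            out[n - burn_in] = (kappa, np.sqrt(sigma2), mu)
    return out

# Simulated data with known (hypothetical) parameter values.
rng = np.random.default_rng(42)
kappa_true, mu_true, sigma_true = 0.1, 0.05, 0.005
r = np.empty(2001)
r[0] = mu_true
for t in range(1, len(r)):
    r[t] = r[t - 1] + kappa_true * (mu_true - r[t - 1]) + sigma_true * rng.standard_normal()

draws = gibbs_vasicek(r, n_draws=2000, burn_in=500, rng=rng)
post_mean = draws.mean(axis=0)  # estimates of (kappa, sigma, mu)
```

The columns of `draws` hold the retained values of $$\kappa,$$ $$\sigma,$$ and $$\mu,$$ so that `post_mean` estimates the posterior means from the chain.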

In principle, the Gibbs sampler may be used to draw from any distribution $$p(\theta^1, \theta^2, \ldots, \theta^{\text {k}}|X)$$ in which the conditional distributions $$p(\theta^i|\theta^j, j \ne i, X)$$ are of standard forms. Furthermore, each parameter “block,” $$\theta^i,$$ may be uni- or multivariate. The Gibbs sampler may therefore be used to analyze very complex posteriors when decomposition into simpler conditionals is possible. A variety of examples may be found in Chib and Greenberg (1996).

A particularly powerful incarnation of the Gibbs sampler has been coined “data augmentation” by Tanner and Wong (1987). This approach is motivated by the fact that many posterior distributions could be calculated more easily if some unobserved variable were in the researcher's dataset. Although the researcher does not observe these latent data, he may know (or be able to draw from) their distribution conditional on the observed data and the unobserved model parameters. The solution is to form a Gibbs sampling chain, alternately drawing from the conditional distribution of the model parameters given the observed and augmented data, and the conditional distribution of the augmented data given the real data and the model parameters.

Jacquier, Polson, and Rossi (1994) used this technique in a well-known analysis of stochastic volatility models. In this case, estimation of the price and volatility equations would be straightforward were volatility an observed variable, that is,

^{19}

In essence, the latent, or “augmented,” data are treated as a high-dimensional parameter vector. The data augmentation scheme therefore generates the joint posterior distribution of the parameters and the augmented data given the observed data. This makes it possible to construct marginal posteriors not only of the parameters but of the latent variables as well. In applications such as stochastic volatility, this may be a very useful by-product of the estimation scheme.

### Appendix B: Details of the Data Augmentation Procedure

Let $$X_t$$ denote an $$L$$-dimensional diffusion process satisfying the stochastic differential equation

The Euler approximation of this model is given by

As stated above, the approach followed in this article will be to estimate the discretized process of Equation (28) while allowing $$h$$ to be arbitrarily small. If the discretization interval $$h$$ is smaller than the frequency of the observed data, Tanner and Wong's (1987) data augmentation algorithm will be used to augment the observed low-frequency data with unobserved high-frequency data.

Suppose the vector $$X_k$$ represents the time $$kh$$ realization of the $$L$$-dimensional process generated by the Euler approximation of Equation (28). Divide the $$L$$-dimensional vector $$X_k$$ into subvectors, $$X_k^o$$ and $$X_k^u,$$ based on whether the realization of the component of the process at that time is observed $$(X_k^o)$$ or unobserved $$(X_k^u).$$ If $$kh$$ is a noninteger, then $$X_k$$ is completely unobserved, implying $$X_k^u = X_k$$ and $$X_k^o = \emptyset.$$ In other cases, $$X_k$$ may be partially observed, as in a stochastic volatility model, where a price may be observed while volatility remains latent.

To perform the data augmentation, the Markov chain cycles through all $$k$$ for which $$X_k^u$$ is nonempty and uses the Metropolis–Hastings algorithm to replace old values of $$X_k^u$$ with new ones.

To draw the new value of $$X_k^u,$$ let $$\textbf {X}_{- \textbf {k}}^{\textbf {u}}$$ denote the set of all unobserved realizations save $$X_k^u,$$ the unobserved part of the process realized at time $$kh$$. Let $$\textbf {X}^{\textbf {o}}$$ denote the set of all observed data.

Our goal is to draw from the conditional distribution, $$p(X_k^u|\textbf {X}_{- \textbf {k}}^{\textbf {u}}, \textbf {X}^{\textbf {o}}, \phi).$$ Because the Euler approximation is a Markov process (reflecting our assumption about the underlying diffusion), only the contemporaneous and adjacent observations are relevant conditioning variables, meaning that

$$p(X_k^u|\textbf {X}_{- \textbf {k}}^{\textbf {u}}, \textbf {X}^{\textbf {o}}, \phi) = p(X_k^u|X_{k - 1}, X_k^o, X_{k + 1}, \phi). \tag{29}$$

Bayes' rule and the Markov property can be applied to show that this density is proportional to

$$p(X_k^u|X_{k - 1}, X_k^o, \phi) \, p(X_{k + 1}|X_k^u, X_k^o, \phi). \tag{30}$$

For every candidate-generating density, the Metropolis–Hastings algorithm specifies the acceptance probability required for convergence. The acceptance probability depends on both the target and candidate-generating densities evaluated at both the current and candidate draws. This probability is higher for candidate draws that have higher probability under the target density, but is lessened for draws that are generated too frequently by the candidate distribution. If $$q(X_k^u)$$ denotes the density of the candidate generator and $$\pi(X_k^u)$$ the target density (up to a constant of proportionality), then the acceptance probability (the probability of replacing the current draw $$X_k^u$$ with a new draw $$X_k^{u*}$$) is equal to

$$\min \left\{{\frac{{\pi(X_k^{u*}) \, q(X_k^u)}} {{\pi(X_k^u) \, q(X_k^{u*})}},1} \right\}.$$

The main advantage of the candidate-generating density proposed is simply that it reduces the number of calculations required to implement the algorithm, since the candidate density cancels out one of the kernels in the target density of Equation (30). The candidate density $$p(X_k^u|X_{k - 1}, X_k^o, \phi),$$ along with a target density that is proportional to Equation (30), therefore results in a very simple implementation of Metropolis–Hastings: Essentially we simulate the process forward from time $$(k - 1)h$$ to time $$kh$$ to generate the candidate draw $$X_k^{u*},$$ then accept $$X_k^{u*}$$ over the current draw $$X_k^u$$ depending on how likely each one is to have preceded $$X_{k + 1}.$$

Draw a candidate value, $$X_k^{u*},$$ from $$p(X_k^u|X_{k - 1}, X_k^o, \phi)$$ as a possible replacement of the current value, $$X_k^u.$$

Replace the current value, $$X_k^u,$$ with the new draw, $$X_k^{u*},$$ with probability

$$\min \left\{{\frac{{p(X_{k + 1} |X_k^{u*}, X_k^o, \phi)}} {{p(X_{k + 1} |X_k^u, X_k^o, \phi)}},1} \right\}.$$ (31)

Otherwise, retain the old value.

One of the important characteristics of a candidate-generating density is that its tails dominate those of the target. If this is not the case, then the algorithm may display high rejection rates or even become “stuck” for many draws. The candidate generator chosen naturally has fatter tails than the target because it is conditioned on less information ($$X_{k - 1}$$ and $$X_k^o$$) than the target density ($$X_{k - 1}, X_k^o,$$ and $$X_{k + 1}$$), so we do not experience such problems here. Typically the acceptance rate for draws in the univariate case is about .6, while for the bivariate case it is about .4.

Lastly, in the interest rate diffusions considered in the article, negative interest rates are prohibited. Because the interest rate volatility rapidly declines as $$r \to 0$$ for the models considered, it is extremely rare for the candidate generator to produce a negative candidate draw for the interest rate. In these rare cases we simply reject the draw.
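The update described in this appendix can be sketched for the univariate case as follows. This is a minimal illustration, not the implementation used in the article: the function names, the callables `mu` and `sig` (drift and diffusion functions), and the boolean mask `observed` are all hypothetical, and `h` is taken to be the Euler step in the same time units as `mu` and `sig`. Nonpositive candidates are rejected outright, which is slightly stricter than rejecting only negative ones.

```python
import numpy as np

def euler_logpdf(x_next, x, mu, sig, h):
    """Log density of the Euler transition N(x + mu(x) h, sig(x)^2 h) at x_next."""
    mean = x + mu(x) * h
    var = sig(x) ** 2 * h
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x_next - mean) ** 2 / var

def update_latent(path, observed, mu, sig, h, rng):
    """One sweep over the unobserved interior points of `path` (modified in place).

    Returns the number of accepted candidates. The first and last points are
    assumed to be observed, so every latent point has two neighbours.
    """
    accepted = 0
    for k in range(1, len(path) - 1):
        if observed[k]:
            continue
        prev, cur, nxt = path[k - 1], path[k], path[k + 1]
        # Candidate: simulate the Euler scheme forward from time (k-1)h to kh.
        cand = prev + mu(prev) * h + sig(prev) * np.sqrt(h) * rng.standard_normal()
        if cand <= 0.0:
            continue  # nonpositive rate: reject and keep the current value
        # Acceptance ratio of Equation (31): how likely each value is to precede X_{k+1}.
        log_ratio = (euler_logpdf(nxt, cand, mu, sig, h)
                     - euler_logpdf(nxt, cur, mu, sig, h))
        if np.log(rng.uniform()) < log_ratio:
            path[k] = cand
            accepted += 1
    return accepted
```

A caller would initialise the latent points (for example by linear interpolation between observations) and call `update_latent` once per iteration of the Markov chain, between parameter draws.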

### Appendix C: Convergence of the Euler Approximation

Because Lipschitz and growth conditions are not satisfied by the drift or diffusion functions of the nonlinear model, standard sufficient conditions for the convergence of the Euler approximation are not met, raising the possibility that the approximation may not converge. In this appendix I briefly consider two moment-based tests of convergence intended to provide some validation of the Euler discretization.

For a given set of parameter values satisfying stationarity conditions, $$N$$ paths of the interest rate process, each 100 years long, are simulated using the Euler discretization of Equation (9). The terminal value of the *i*th simulation, $$r_{T, i},$$ is taken as a single draw from the unconditional distribution of the discretized process.

I first test whether the unconditional first through fourth moments of the discretized process match those of the corresponding diffusion. Following the work of Aït-Sahalia (1996a and 1996b), tools for solving for stationary densities have become well known. In particular, we know that the stationary density of any stationary diffusion process $$dr_t = \mu(r_t)dt + \sigma(r_t)dB_t$$ is proportional to

$$\frac{1} {{\sigma^2(r)}}\exp \left\{{\int^r {\frac{{2\mu(u)}} {{\sigma^2(u)}}du}} \right\}.$$ (32)

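Letting $$M_j$$ denote the *j*th uncentered moment of $$r_T$$ calculated from Equation (32), define the vector $$g_i$$ as

$$g_i = \left({r_{T, i} - M_1,\; r_{T, i}^2 - M_2,\; r_{T, i}^3 - M_3,\; r_{T, i}^4 - M_4} \right)'.$$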

The second test uses moment restrictions derived by Hansen and Scheinkman (1995) for stationary diffusion processes. Let $$\phi(x) = x$$ and $$\phi^*(x) = x^2$$ denote two “test functions” and $$\mathcal {A}$$ the infinitesimal generator of the interest rate diffusion process for a given set of parameter values satisfying stationarity conditions.^{20} Hansen and Scheinkman's results may be applied to show that if the values $$r_{T, i}$$ are generated by the diffusion process corresponding to $$\mathcal {A},$$ then the random vector

While the first test is used to check that the Euler approximation and diffusion process produce the same marginal distribution for $$r_T,$$ the second test, since it relies on the joint distribution of $$r_{T - 1}$$ and $$r_T,$$ should also detect discrepancies between the transition probabilities of the Euler approximation and those of the diffusion.
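The first test can be sketched in a few lines. Everything here is an illustrative assumption rather than the article's code: the function names, the positivity floor in the simulator, and the Wald form of the statistic, $$N \bar g' S^{-1} \bar g$$ with $$\bar g$$ and $$S$$ the sample mean and covariance of the four moment discrepancies, referred to a chi-square distribution with four degrees of freedom. The stationary moments are obtained by numerically integrating the standard stationary-density formula $$\sigma^{-2}(r)\exp \{\int^r 2\mu(u)/\sigma^2(u)du\}.$$

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoid rule, written out to avoid version-specific NumPy names."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def stationary_moments(mu, sig, grid, n_moments=4):
    """Uncentered moments implied by the stationary density on `grid`."""
    integrand = 2.0 * mu(grid) / sig(grid) ** 2
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)
    log_dens = np.concatenate(([0.0], np.cumsum(steps))) - 2.0 * np.log(sig(grid))
    dens = np.exp(log_dens - log_dens.max())  # unnormalised, rescaled for stability
    norm = trapezoid(dens, grid)
    return np.array([trapezoid(grid ** j * dens, grid) / norm
                     for j in range(1, n_moments + 1)])

def simulate_terminal(mu, sig, r0, years, steps_per_year, n_paths, rng, floor=1e-4):
    """Terminal values of independent Euler paths (floor is an assumed safeguard)."""
    dt = 1.0 / steps_per_year
    r = np.full(n_paths, float(r0))
    for _ in range(int(years * steps_per_year)):
        r = r + mu(r) * dt + sig(r) * np.sqrt(dt) * rng.standard_normal(n_paths)
        r = np.maximum(r, floor)
    return r

def moment_test(r_T, M):
    """Wald statistic for E[r^j] = M_j, j = 1..4, with its chi-square(4) p-value."""
    g = np.column_stack([r_T ** j - M[j - 1] for j in range(1, 5)])
    g = g / g.std(axis=0)  # column rescaling leaves the statistic unchanged
    gbar = g.mean(axis=0)
    S = np.cov(g, rowvar=False)
    stat = len(r_T) * gbar @ np.linalg.solve(S, gbar)
    p_value = np.exp(-stat / 2.0) * (1.0 + stat / 2.0)  # exact chi-square(4) tail
    return stat, p_value
```

Passing the drift and diffusion functions as callables keeps the sketch independent of any particular parameterization; a large statistic (small $$p$$-value) would indicate that the discretized process has not converged to the diffusion at the chosen step size.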

Each test was implemented using the flat-prior posterior means from both the daily and the monthly data to simulate data and construct the $$g_i$$ and $$z_i$$ variables. One hundred thousand independent simulations were performed for each of several values of $$h$$ ranging from 1 to .05. Test statistics and $$p$$-values are displayed in Table 5.

Daily data | $$h = 1$$ | $$h = .5$$ | $$h = .33$$ | $$h = .2$$ |
---|---|---|---|---|
Test 1 ($$p$$-value) | 8.61 (0.07) | 0.42 (0.98) | 4.30 (0.37) | 8.44 (0.08) |
Test 2 ($$p$$-value) | 4.07 (0.67) | 1.19 (0.98) | 10.69 (0.10) | 5.26 (0.51) |

Monthly data | $$h = 1$$ | $$h = .5$$ | $$h = .2$$ | $$h = .05$$ |
---|---|---|---|---|
Test 1 ($$p$$-value) | 107.62 (0.00) | 15.34 (0.00) | 3.21 (0.52) | 5.99 (0.20) |
Test 2 ($$p$$-value) | 77.00 (0.00) | 44.98 (0.00) | 8.14 (0.23) | 6.23 (0.40) |

The table reports test statistics and $$p$$-values for two tests of the convergence of the Euler approximation of the model

Overall, convergence does not seem to be much of an issue for data sampled at a daily frequency, as none of the test statistics are large enough to reject the null hypothesis that the simulated data are equivalent to data generated by the limiting diffusion process. Even the daily simulations with $$h = 1$$ produce a distribution that is indistinguishable from the true diffusion.

With monthly data, discretization bias is clearly evident, as both tests easily reject the null that the Euler approximation, simulated with either $$h = 1$$ or $$h = .5,$$ generates the same distribution as the diffusion process. Convergence appears extremely likely though, since the same tests do not result in rejections for smaller values of $$h$$. I conclude that concerns about the validity of the Euler approximation for this model are not large enough to avoid its use, though $$h$$ should preferably be set equal to a number smaller than .2 for monthly data.

### Appendix D: Drawing the Variance Parameters of the Short-Rate Model

The Euler approximation of the nonlinear spot rate model is given by

Were $$\gamma$$ known, however, the Euler approximation could be rearranged as

Construction of a Metropolis step requires the specification of a candidate-generating density for $$(\sigma, \gamma).$$ Our choice of candidate generator is driven by the availability of analytical draws from

Given a candidate-generating density for $$\gamma$$, say $$q(\gamma),$$ a joint candidate generator is given by

The Metropolis–Hastings acceptance probability, that is, the probability of moving from one draw $$(\sigma, \gamma)$$ to a new draw $$(\sigma^*, \gamma^*),$$ is therefore equal to
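A joint update of this kind can be sketched under several assumptions that are not taken from the article: a prior $$p(\sigma, \gamma) \propto 1/\sigma,$$ a symmetric random-walk candidate for $$\gamma$$ (so that $$q$$ cancels from the ratio), and a drift supplied as an array of values $$\mu(r_t)$$ so that no drift specification is imposed. With $$e_t = (r_{t + h} - r_t - \mu(r_t)h)/(r_t^\gamma \sqrt h)$$ and $$S(\gamma) = \sum e_t^2,$$ the conditional for $$\sigma^2$$ is inverse gamma, and because $$\sigma^*$$ is drawn from its exact conditional, the acceptance ratio reduces to the ratio of the marginal posterior of $$\gamma,$$ which is proportional to $$S(\gamma)^{- n/2}\prod r_t^{- \gamma}.$$ All function names are hypothetical.

```python
import numpy as np

def residual_ss(r, drift, h, gamma):
    """Sum of squared scaled Euler residuals S(gamma)."""
    e = (np.diff(r) - drift[:-1] * h) / (r[:-1] ** gamma * np.sqrt(h))
    return np.sum(e ** 2)

def log_marginal_gamma(r, drift, h, gamma):
    """Log marginal posterior of gamma (sigma integrated out), up to a constant."""
    n = len(r) - 1
    return (-gamma * np.sum(np.log(r[:-1]))
            - 0.5 * n * np.log(residual_ss(r, drift, h, gamma)))

def draw_sigma(r, drift, h, gamma, rng):
    """Exact draw of sigma given gamma: S(gamma) / sigma^2 is chi-square(n)."""
    n = len(r) - 1
    return np.sqrt(residual_ss(r, drift, h, gamma) / rng.chisquare(n))

def metropolis_step(r, drift, h, sigma, gamma, rng, step=0.05):
    """Propose gamma* by a random walk and sigma* from p(sigma | gamma*, data)."""
    gamma_star = gamma + step * rng.standard_normal()
    log_ratio = (log_marginal_gamma(r, drift, h, gamma_star)
                 - log_marginal_gamma(r, drift, h, gamma))
    if np.log(rng.uniform()) < log_ratio:
        return draw_sigma(r, drift, h, gamma_star, rng), gamma_star
    return sigma, gamma  # candidate rejected: retain the current pair
```

Here `r` would be the augmented path (observed and latent points together) and `drift` the drift function evaluated along it at the current drift parameters.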

### Appendix E: Implementing the Jeffreys Prior

As a preliminary step, we discuss the calculation of the Jeffreys prior. While the full Jeffreys prior is formulated as the square root of the determinant of the information matrix, for multiparameter models it is common to define the Jeffreys prior for a subset of the parameters of the model. What is called the “Jeffreys prior” in this article actually consists of a flat prior on $$\sigma$$ and $$\gamma$$, or $$p(\sigma, \gamma) \propto 1/\sigma,$$ multiplied by the square root of the determinant of the block of the information matrix that pertains to $$\alpha$$.

Using the Euler approximation likelihood, we calculate the 4 × 4 information matrix for the $$\alpha$$ vector, whose $$(i, j)$$ element is given by

Evaluation of these expressions is problematic for several reasons. First, although these partial derivatives can be evaluated easily, it is not possible to compute the expectations of these expressions analytically. We must therefore resort to simulation to take expectations.

Second, the likelihood of the process is only computable after augmenting with high-frequency data, while the expression above is an expectation of a function of the observed data only. In order to maintain tractability, an approximate Jeffreys prior is therefore derived under the assumption that the discretized process is observed continuously rather than only once per period. Since more frequent observation of a process does not generally result in sharper inference about mean parameters, the effect of assuming more frequent observation is most likely unimportant.

The Jeffreys prior is defined as the square root of the determinant of the information matrix. Given data observed at intervals of length $$h$$, the likelihood function may be written as

To compute the Jeffreys prior in practice, the expectations in $$N(p)$$ must be computed by simulation. To evaluate the prior for a given set of parameters, 1000 interest rate paths, based on 500 paths of standard normal deviates, were simulated using antithetic random variables. To prevent negative interest rates, paths were truncated at .1%.^{21}
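The simulation step might look like the following sketch. It rests on assumptions that are not confirmed by this appendix: a drift of the form $$\alpha_0 + \alpha_1 r + \alpha_2 r^2 + \alpha_3 /r$$ with diffusion $$\sigma r^\gamma,$$ so that the Euler log likelihood is quadratic in $$\alpha$$ and the information block is the expectation of $$\sum_t f(r_t)f(r_t)'h/(\sigma^2 r_t^{2\gamma})$$ with $$f(r) = (1, r, r^2, 1/r)'.$$ The function name, the starting value `r0`, and the path length are hypothetical.

```python
import numpy as np

def log_jeffreys_alpha(alpha, sigma, gamma, r0, n_steps, h, rng,
                       n_base=500, floor=0.001):
    """Log of sqrt(det I(alpha)), with the expectation taken over simulated paths."""
    z = rng.standard_normal((n_base, n_steps))
    z = np.vstack([z, -z])                       # antithetic pairs: 2 * n_base paths
    r = np.full(2 * n_base, float(r0))
    info = np.zeros((4, 4))
    for t in range(n_steps):
        f = np.stack([np.ones_like(r), r, r ** 2, 1.0 / r])  # shape (4, n_paths)
        w = h / (sigma ** 2 * r ** (2.0 * gamma))             # inverse Euler variance times h^2
        info += (f * w) @ f.T / r.size                        # average over paths
        r = r + (alpha @ f) * h + sigma * r ** gamma * np.sqrt(h) * z[:, t]
        r = np.maximum(r, floor)                              # truncate paths at .1%
    _, logdet = np.linalg.slogdet(info)
    return 0.5 * logdet
```

When the prior is compared across parameter values, reusing the same random-number seed for each evaluation (common random numbers) would reduce simulation noise in the ratio.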

Rather than redoing the parameter draws under the Jeffreys prior, we can make use of the 10,000 parameter draws made for the flat prior. Let the subscript $$J$$ denote the Jeffreys prior and $$F$$ the flat prior, so

From our flat prior analysis, we already have many draws from $$p_F(\phi|\textbf {R}^{\textbf {o}}).$$ To “convert” these draws into draws from $$p_J(\phi|\textbf {R}^{\textbf {o}}),$$ we turn once again to the Metropolis–Hastings algorithm. Using our empirical distribution of $$p_F(\phi|\textbf {R}^{\textbf {o}})$$ as the candidate generator, the Metropolis acceptance probability of moving from $$\phi = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \sigma, \gamma)$$ to $$\phi^* = (\alpha_0^*, \alpha_1^*, \alpha_2^*, \alpha_3^*, \sigma^*, \gamma^*)$$ takes a simple form:
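As a hedged illustration of this conversion: when candidates are drawn from the flat-prior posterior itself, the likelihood cancels from the Metropolis–Hastings ratio, so the acceptance probability depends only on the prior ratio $$w(\phi) = p_J(\phi)/p_F(\phi)$$ and equals $$\min \{w(\phi^*)/w(\phi), 1\}.$$ The sketch below assumes the log prior ratio has already been evaluated at every stored draw; the function and argument names are hypothetical.

```python
import numpy as np

def convert_draws(draws, log_weight, n_iter, rng):
    """Independence Metropolis-Hastings over stored flat-prior posterior draws.

    draws      : array whose first axis indexes the stored parameter draws
    log_weight : log p_J(phi) - log p_F(phi) evaluated at each stored draw
    Returns n_iter draws that target the Jeffreys-prior posterior.
    """
    n = len(draws)
    cur = rng.integers(n)                 # index of the current draw
    kept = np.empty(n_iter, dtype=int)
    for t in range(n_iter):
        cand = rng.integers(n)            # candidate from the empirical distribution
        if np.log(rng.uniform()) < log_weight[cand] - log_weight[cur]:
            cur = cand
        kept[t] = cur
    return draws[kept]
```

Because candidates come only from the stored draws, the result can be no richer than the flat-prior sample; if the two posteriors differ sharply, a few draws with large weights will be repeated many times.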
