Two-Stage Least Squares as Minimum Distance

The Two-Stage Least Squares (2SLS) instrumental variables (IV) estimator for the parameters in linear models with a single endogenous variable is shown to be identical to an optimal Minimum Distance (MD) estimator based on the individual instrument-specific IV estimators. The 2SLS estimator is a linear combination of the individual estimators, with the weights determined by their variances and covariances under conditional homoskedasticity. It is further shown that the Sargan test statistic for overidentifying restrictions is the same as the MD criterion test statistic. This provides an intuitive interpretation of the Sargan test. The equivalence results also apply to the efficient two-step GMM and robust optimal MD estimators and criterion functions, allowing for general forms of heteroskedasticity. It is further shown how these results extend to the linear overidentified IV model with multiple endogenous variables. JEL Classification: C26, C13, C12


Introduction
For a single endogenous variable linear model with multiple instruments, the standard IV estimator is the Two-Stage Least Squares (2SLS) estimator, which is a consistent and asymptotically efficient estimator under standard regularity assumptions and conditional homoskedasticity, see e.g. Hayashi (2000, p 228). This means that the 2SLS estimator combines the information from the multiple instruments asymptotically optimally under these conditions. An alternative estimator is the optimal Minimum Distance (MD) estimator, using an estimator of the variance matrix of the individual instrument-specific IV estimators of the parameter of interest. It is shown in the next section, Section 2, that this optimal MD estimator, with the variance specified under conditional homoskedasticity, is identical to the 2SLS estimator. It is further shown that the Sargan test statistic for overidentifying restrictions is the same as the MD criterion test statistic, providing another intuitive interpretation of the Sargan test.
Surprisingly, it appears that these equivalence results are not available in the literature, and are not discussed in standard textbooks. Angrist (1991) derives similar results, but for the special case of orthogonal binary instruments, see also the discussion in Angrist and Pischke (2009, Section 4.2.2), whereas the results here are for general designs.
Recently, Chen, Jacho-Chávez and Linton (2016) used this setting and the two estimators as an example in their much wider-ranging paper, but they did not realise their equivalence and the results obtained in Section 2 modify the statements of Chen et al. (2016, pp 48-49).
In Section 2.2, the result is extended to the equivalence of the two-step GMM estimator and the optimal minimum distance estimator based on a robust variance-covariance estimator of the vector of instrument-specific IV estimates, robust to general forms of heteroskedasticity in the cross-sectional setting considered here. The two-step Hansen J-test statistic for overidentifying restrictions (Hansen, 1982) is also shown to be the same as the robust MD criterion test statistic.
Section 3 derives equivalence results for the multiple endogenous variables case. The setting considered there can best be characterised by the following simple example. Consider a linear model with two endogenous variables, and there are four instruments available. In principle, there are then six distinct sets of two, just identifying instruments.
However, a collection of three sets of two instruments that span all instruments is sufficient to provide all information needed. For example, if the instruments are denoted by $z_1$, $z_2$, $z_3$, and $z_4$, then the collection of sets $\{(z_1, z_2), (z_2, z_3), (z_3, z_4)\}$ is sufficient. This results in three just-identified IV estimates of the two parameters of interest, and Section 3 shows that the per-parameter optimal minimum distance estimators are identical to the 2SLS estimators.

Equivalence Result for Single Endogenous Variable Model
We have a sample $\{(y_i, x_i, z_i')\}_{i=1}^{n}$ and consider the model

$$y_i = x_i \beta + u_i.$$

Note that other exogenous variables in the model, including the constant, have been partialled out. The $k_z > 1$ instrument vector $z_i$ satisfies $E(z_i u_i) = 0$ and is related to $x_i$ via the linear projection, or first-stage model

$$x_i = z_i' \pi_x + v_i. \tag{1}$$

Let $y$ and $x$ be the $n$-vectors $(y_1, y_2, \ldots, y_n)'$ and $(x_1, x_2, \ldots, x_n)'$, and $Z$ the $n \times k_z$ matrix with $i$-th row $z_i'$ and $j$-th column $z_j$, $i = 1, \ldots, n$, $j = 1, \ldots, k_z$.
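The well-known Two-Stage Least Squares (2SLS) instrumental variables estimator is then obtained as

$$\hat\beta_{2sls} = \arg\min_{b} \, (y - xb)' P_Z (y - xb) \tag{2}$$

and is given by

$$\hat\beta_{2sls} = (x' P_Z x)^{-1} x' P_Z y, \tag{3}$$

where $P_Z = Z (Z'Z)^{-1} Z'$. Next consider the individual instrument-specific IV estimators for $\beta$, given by

$$\hat\beta_j = (z_j' x)^{-1} z_j' y$$

for $j = 1, \ldots, k_z$, and let $\hat\beta_{ind} = (\hat\beta_1, \ldots, \hat\beta_{k_z})'$. As $x' P_Z y = \hat\pi_x' Z' y = \sum_{j=1}^{k_z} \hat\pi_{x,j} (z_j' x) \hat\beta_j$, with $\hat\pi_x = (Z'Z)^{-1} Z'x$ the OLS estimator of $\pi_x$ in (1), it follows that the 2SLS estimator is a linear combination of the individual estimators,

$$\hat\beta_{2sls} = \sum_{j=1}^{k_z} w_{2sls,j} \hat\beta_j,$$

$$w_{2sls,j} = \frac{\hat\pi_{x,j} \, z_j' x}{\sum_{s=1}^{k_z} \hat\pi_{x,s} \, z_s' x}, \tag{5}$$

so that $\sum_{j=1}^{k_z} w_{2sls,j} = 1$. Under standard regularity assumptions and conditional homoskedasticity, the limiting distribution of $\hat\beta_{ind}$ is

$$\sqrt{n} \, (\hat\beta_{ind} - \iota \beta) \xrightarrow{d} N(0, \Omega), \tag{6}$$

where $\iota$ is a $k_z$-vector of ones, and an estimator of the variance of $\hat\beta_{ind}$ is given by

$$\hat\Omega = \hat\sigma_u^2 \, D^{-1} Z'Z D^{-1}, \qquad D = \mathrm{diag}(z_j' x), \qquad \hat\sigma_u^2 = \hat u' \hat u / n, \qquad \hat u = y - x \hat\beta_{2sls},$$

with $n \hat\Omega \xrightarrow{p} \Omega$. The optimal minimum distance (MD) estimator for $\beta$ is then given by

$$\hat\beta_{md} = \arg\min_{b} \, (\hat\beta_{ind} - \iota b)' \hat\Omega^{-1} (\hat\beta_{ind} - \iota b) \tag{7}$$

$$\hat\beta_{md} = (\iota' \hat\Omega^{-1} \iota)^{-1} \iota' \hat\Omega^{-1} \hat\beta_{ind} = \sum_{j=1}^{k_z} w_{md,j} \hat\beta_j. \tag{8}$$

It is clear that the MD estimator is also a linear combination of the individual instrument-specific estimators, with $\sum_{j=1}^{k_z} w_{md,j} = 1$. The next proposition states the main equivalence result.

Proposition 1. Let $\hat\beta_{2sls}$ and $\hat\beta_{md}$ be as defined in (2), (7), (3) and (8). Then $\hat\beta_{md} = \hat\beta_{2sls}$ and $w_{md,j} = w_{2sls,j}$ for $j = 1, \ldots, k_z$.

Note that the equivalence results obtained in Proposition 1 do, given the choice of $\hat\Omega$, not rely on any high-level assumptions. For example, whilst the limiting distribution of $\hat\beta_{ind}$ in (6) can only be derived under the assumption that $\gamma_j \neq 0$ for all $j = 1, \ldots, k_z$, where $\gamma_j$ is the coefficient in the instrument-specific first-stage model $x = z_j \gamma_j + v_j$, the numerical equivalence results hold also when this assumption is violated and even if $\gamma_j = 0$ for all $j$.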
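The numerical equivalence can be checked on simulated data. The following is a minimal sketch, not the paper's own code: the data-generating process, seed and variable names are illustrative assumptions, and the variance estimator is taken to be $\hat\sigma_u^2 D^{-1} Z'Z D^{-1}$ with $D = \mathrm{diag}(z_j'x)$, the form implied by conditional homoskedasticity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, kz = 500, 4

# Simulated data (illustrative assumption): correlated instruments,
# one endogenous regressor, true beta = 1.
Z = rng.normal(size=(n, kz)) + 0.5 * rng.normal(size=(n, 1))
v = rng.normal(size=n)
u = 0.6 * v + rng.normal(size=n)      # u correlated with v: x is endogenous
x = Z @ np.array([0.8, 0.5, -0.4, 0.3]) + v
y = 1.0 * x + u

# 2SLS: (x'P_Z x)^{-1} x'P_Z y, computed from first-stage fitted values.
pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)
x_hat = Z @ pi_hat
b_2sls = (x_hat @ y) / (x_hat @ x)

# Instrument-specific IV estimators b_j = z_j'y / z_j'x.
zx = Z.T @ x
b_ind = (Z.T @ y) / zx

# Assumed homoskedastic variance estimator of b_ind.
resid = y - x * b_2sls
s2 = resid @ resid / n
D_inv = np.diag(1.0 / zx)
Omega = s2 * D_inv @ (Z.T @ Z) @ D_inv

# Optimal MD weights and estimator.
iota = np.ones(kz)
Oi = np.linalg.solve(Omega, iota)
w_md = Oi / (iota @ Oi)
b_md = w_md @ b_ind

# 2SLS weights: proportional to pi_hat_j * z_j'x.
w_2sls = pi_hat * zx / (pi_hat @ zx)

print(b_2sls, b_md)
print(w_2sls, w_md)
```

The two estimates and the two weight vectors should agree up to floating-point error. With the negative first-stage coefficient chosen here, some weights may turn out negative, which is allowed since the weights only need to sum to one.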
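Let $w_j = w_{md,j} = w_{2sls,j}$. Whilst $\sum_{j=1}^{k_z} w_j = 1$, the weights $w_j$ can be negative, in which case $\hat\beta_{2sls}$ is not a weighted average of the $\hat\beta_j$. From the definition of $w_j$ in (5) it follows that $\mathrm{sign}(w_j) = \mathrm{sign}(\hat\pi_{x,j} \, z_j' x)$, where $\hat\pi_{x,j}$ is the $j$-th element of the OLS estimator of $\pi_x$ in (1). Wlog, we can code the instruments such that $\hat\pi_{x,j} \geq 0$ for all $j$, and standardise such that $z_j' z_j / n = 1$. It then follows that $\mathrm{sign}(w_j) = \mathrm{sign}(\hat\gamma_j)$, where $\hat\gamma_j$ is the OLS estimator of $\gamma_j$ in the first-stage specification $x = z_j \gamma_j + v_j$. Therefore, as the OLS residual $x - Z \hat\pi_x$ is orthogonal to $z_j$,

$$\hat\gamma_j = \frac{z_j' x}{n} = \hat\pi_{x,j} + \sum_{l \neq j} \hat\rho_{jl} \, \hat\pi_{x,l}, \qquad \hat\rho_{jl} = \frac{z_j' z_l}{n}.$$

A sufficient condition for $w_j \geq 0$ for all $j$ is then that $\hat\rho_{jl} \geq 0$ for all $j, l$, $l > j$, i.e. the instruments are uncorrelated or positively correlated with each other.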
The weights for the minimum distance estimator are obtained from the constrained minimisation problem

$$w_{md} = \arg\min_{w} \, w' \hat\Omega w \quad \text{s.t.} \quad \iota' w = 1.$$

Imposing the constraints $w_j \geq 0$, for $j = 1, \ldots, k_z$, results in a standard quadratic programming problem. If there are negative weights in the original solution, then imposing nonnegativity will lead to some of the $w_{md,j}$ being set equal to zero. The resulting estimator is then equal to a weighted average of the $\hat\beta_j$ for a subset of the instruments that minimises the variance over the subsets for which the 2SLS and MD estimators are weighted averages of the instrument-specific estimates.
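For the problem without sign constraints, the solution has a closed form. The sketch below assumes the problem is to minimise $w' \hat\Omega w$ subject to $\iota' w = 1$ with $\hat\Omega$ positive definite; the multiplier $\lambda$ and the Lagrangian $\mathcal{L}$ are notation introduced here for illustration only.

```latex
\begin{aligned}
\mathcal{L}(w, \lambda) &= w' \hat\Omega w - 2 \lambda (\iota' w - 1), \\
\frac{\partial \mathcal{L}}{\partial w} = 0
  &\;\Rightarrow\; \hat\Omega w = \lambda \iota
  \;\Rightarrow\; w = \lambda \hat\Omega^{-1} \iota, \\
\iota' w = 1
  &\;\Rightarrow\; \lambda = (\iota' \hat\Omega^{-1} \iota)^{-1}, \\
w_{md} &= \frac{\hat\Omega^{-1} \iota}{\iota' \hat\Omega^{-1} \iota},
\qquad
w_{md}' \hat\Omega w_{md} = (\iota' \hat\Omega^{-1} \iota)^{-1}.
\end{aligned}
```

Once the inequality constraints $w_j \geq 0$ are added, no such closed form is available in general, which is consistent with the text's reference to quadratic programming.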

Test for Overidentifying Restrictions
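The standard test for the null hypothesis $H_0: E(z_i u_i) = 0$ is the Sargan test statistic given by

$$Sar(\hat\beta_{2sls}) = \frac{\hat u' P_Z \hat u}{\hat\sigma_u^2},$$

where $\hat u = y - x \hat\beta_{2sls}$ and $\hat\sigma_u^2 = \hat u' \hat u / n$. Under the null, standard regularity assumptions and conditional homoskedasticity, $Sar(\hat\beta_{2sls})$ converges in distribution to a $\chi^2_{k_z - 1}$ distributed random variable, see e.g. Hayashi (2000, p 228).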

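Next consider the MD criterion

$$MD(\hat\beta_{md}) = (\hat\beta_{ind} - \iota \hat\beta_{md})' \hat\Omega^{-1} (\hat\beta_{ind} - \iota \hat\beta_{md}).$$

Under the null hypothesis $H_0: \beta_1 = \beta_2 = \ldots = \beta_{k_z} = \beta$, and the assumptions stated above for the limiting distribution (6) to hold, $MD(\hat\beta_{md})$ converges in distribution to a $\chi^2_{k_z - 1}$ distributed random variable, see e.g. Cameron and Trivedi (2005, p 203).
It follows directly from the results of Proposition 1 that $Sar(\hat\beta_{2sls}) = MD(\hat\beta_{md})$.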
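The equality of the two test statistics can also be illustrated numerically. This is a sketch under assumptions: simulated data with an arbitrary seed, $\hat\sigma_u^2 = \hat u'\hat u / n$ from 2SLS residuals used in both statistics, and the homoskedastic variance estimator $\hat\sigma_u^2 D^{-1} Z'Z D^{-1}$ for the instrument-specific estimates. The helper `proj_Z` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
n, kz = 400, 3

# Simulated data (illustrative assumption), true beta = 0.5.
Z = rng.normal(size=(n, kz))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
x = Z @ np.array([0.7, 0.4, 0.6]) + v
y = 0.5 * x + u


def proj_Z(a):
    """Project the vector a onto the column space of Z (P_Z a)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ a)


# 2SLS estimate, residuals and error variance estimate.
b_2sls = (proj_Z(x) @ y) / (proj_Z(x) @ x)
resid = y - x * b_2sls
s2 = resid @ resid / n

# Sargan statistic: u_hat' P_Z u_hat / s2.
sargan = resid @ proj_Z(resid) / s2

# MD criterion evaluated at the optimal MD estimate.
zx = Z.T @ x
b_ind = (Z.T @ y) / zx
D_inv = np.diag(1.0 / zx)
Omega = s2 * D_inv @ (Z.T @ Z) @ D_inv
iota = np.ones(kz)
Oi = np.linalg.solve(Omega, iota)
b_md = (Oi @ b_ind) / (Oi @ iota)
dev = b_ind - iota * b_md
md = dev @ np.linalg.solve(Omega, dev)

print(sargan, md)
```

The reason the two agree is that $\hat\beta_{ind} - \iota b = D^{-1} Z'(y - xb)$, so the factors $D^{-1}$ cancel against $\hat\Omega^{-1}$ and the MD criterion collapses to the Sargan quadratic form.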

Efficient Two-Step Estimation
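The equivalence results extend to the efficient two-step GMM estimator. For the cross-sectional setup considered here, this would cover the case of general conditional heteroskedasticity. Using $\hat\beta_{2sls}$ as the initial consistent one-step GMM estimator, the efficient two-step GMM estimator is defined as

$$\hat\beta_{gmm} = \arg\min_{b} J(b), \qquad J(b) = (y - xb)' Z \left( \sum_{i=1}^{n} \hat u_i^2 z_i z_i' \right)^{-1} Z' (y - xb),$$

where $\hat u_i = y_i - x_i \hat\beta_{2sls}$, and, as $\hat\beta_{md} = \hat\beta_{2sls}$, a robust variance estimator for $\hat\beta_{ind}$ is given by

$$\hat\Omega_r = D^{-1} \left( \sum_{i=1}^{n} \hat u_i^2 z_i z_i' \right) D^{-1}, \qquad D = \mathrm{diag}(z_j' x).$$

Define the robust MD estimator as

$$\hat\beta_{md,r} = \arg\min_{b} MD_r(b), \qquad MD_r(b) = (\hat\beta_{ind} - \iota b)' \hat\Omega_r^{-1} (\hat\beta_{ind} - \iota b).$$

Under the null as specified above, $H_0: \beta_1 = \beta_2 = \ldots = \beta_{k_z} = \beta$, and the assumptions stated above, $MD_r(\hat\beta_{md,r})$ converges in distribution to a $\chi^2_{k_z - 1}$ distributed random variable.
It follows directly from the proof of Proposition 1 that, for $b \in \mathbb{R}$, $MD_r(b) = J(b)$ and hence $\hat\beta_{gmm} = \hat\beta_{md,r}$ and $J(\hat\beta_{gmm}) = MD_r(\hat\beta_{md,r})$.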
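A numerical check of the robust case can be sketched in the same way. The assumptions here are the simulated heteroskedastic design, the seed, and the standard heteroskedasticity-robust weight matrix $\sum_i \hat u_i^2 z_i z_i'$ built from 2SLS residuals; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, kz = 600, 3

# Simulated data (illustrative assumption) with error variance
# depending on the first instrument.
Z = rng.normal(size=(n, kz))
v = rng.normal(size=n)
u = (0.5 * v + rng.normal(size=n)) * np.sqrt(0.5 + Z[:, 0] ** 2)
x = Z @ np.array([0.7, 0.4, 0.6]) + v
y = 0.5 * x + u

# Step 1: 2SLS and its residuals.
x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
b_2sls = (x_hat @ y) / (x_hat @ x)
resid = y - x * b_2sls

# Robust moment variance: S = sum_i u_i^2 z_i z_i'.
S = (Z * resid[:, None] ** 2).T @ Z

# Step 2: efficient two-step GMM and the J statistic.
Zx, Zy = Z.T @ x, Z.T @ y
S_inv_Zx = np.linalg.solve(S, Zx)
b_gmm = (S_inv_Zx @ Zy) / (S_inv_Zx @ Zx)
g = Zy - Zx * b_gmm
J = g @ np.linalg.solve(S, g)

# Robust MD on the instrument-specific IV estimates.
b_ind = Zy / Zx
D_inv = np.diag(1.0 / Zx)
Omega_r = D_inv @ S @ D_inv
iota = np.ones(kz)
Oi = np.linalg.solve(Omega_r, iota)
b_md_r = (Oi @ b_ind) / (Oi @ iota)
dev = b_ind - iota * b_md_r
md_r = dev @ np.linalg.solve(Omega_r, dev)

print(b_gmm, b_md_r)
print(J, md_r)
```

Any normalisation by $n$ in the GMM weight matrix cancels in both the estimator and the J statistic, so it is omitted in this sketch.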
Remark 2. An alternative "one-step" robust variance estimator for the MD estimator is [...], for $j, l = 1, \ldots, k_z$. The resulting minimum distance estimator, $\hat\beta_{md,ind}$, has the same limiting distribution as $\hat\beta_{md,r}$, but differs in finite samples.
Note that the minimum distance objective function we consider here is different from the minimum distance approach that leads for example to the LIML and Continuously Updating (CU) GMM estimators. Consider the OLS estimators $\hat\pi_x$ and $\hat\pi_y$ for $\pi_x$ in model (1) and $\pi_y$ in the specification $y_i = z_i' \pi_y + \varepsilon_i = z_i' (\pi_x \beta) + u_i + \beta v_i$. Then consider the minimum distance estimator

$$(\hat\beta_{nmd}, \hat\pi_{x,nmd}) = \arg\min_{b, \pi} \begin{pmatrix} \hat\pi_y - \pi b \\ \hat\pi_x - \pi \end{pmatrix}' \hat V^{-1} \begin{pmatrix} \hat\pi_y - \pi b \\ \hat\pi_x - \pi \end{pmatrix},$$

where $\hat V$ is an estimator of the variance of $(\hat\pi_y', \hat\pi_x')'$. If $\hat V$ is a valid variance estimator under conditional homoskedasticity only, $\hat\beta_{nmd}$ is equal to the LIML estimator, see Goldberger and Olkin (1971). If $\hat V$ is a robust variance estimator, $\hat\beta_{nmd}$ is the CU-GMM estimator, see the discussion in Windmeijer (2018). Other recent approaches to minimum distance estimation are Sølvsten (2017) and Kolesár (2018).

Multiple Endogenous Variables
Consider next the multiple endogenous variables model

$$y_i = x_i' \beta + u_i,$$

where $x_i$ is a $k_x$ vector of endogenous variables. There are $k_z > k_x$ instruments $z_i$ available. Let $X$ be the $n \times k_x$ matrix of explanatory variables, with $l$-th column $x_l$, then the 2SLS estimator is obtained as

$$\hat\beta_{2sls} = \arg\min_{b} \, (y - Xb)' P_Z (y - Xb)$$

and is given by

$$\hat\beta_{2sls} = (X' P_Z X)^{-1} X' P_Z y.$$

[...] for $t = 1, \ldots, k_z - k_x + 1$, where [...] for $t = 1, \ldots, k_z - k_x + 1$, and hence $\iota' \tilde D_l = \tilde x_l' \tilde X_l$. Therefore, from (13),

$$\hat\beta_{l,md} = \left( \tilde x_l' P_{\tilde X_l} \tilde x_l \right)^{-1} \tilde x_l' P_{\tilde X_l} y.$$

As the sets of instruments $\{Z_{[t]}\}_{t=1}^{k_z - k_x + 1}$ contain all $k_z$ instruments, it follows that $\tilde x_l$ is in the column space of $\tilde X_l$, and so $P_{\tilde X_l} \tilde x_l = \tilde x_l$. Therefore, $\hat\beta_{l,md} = \hat\beta_{l,2sls}$ for $l = 1, \ldots, k_x$.
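Next, consider the Sargan test statistic, given by

$$Sar(\hat\beta_{2sls}) = \frac{\hat u' P_Z \hat u}{\hat\sigma_u^2},$$

where $\hat u = y - X \hat\beta_{2sls}$ and $\hat\sigma_u^2 = \hat u' \hat u / n$. The corresponding MD criterion for the $l$-th parameter, $MD(\hat\beta_{l,md})$, converges in distribution to a $\chi^2_{k_z - k_x}$ distributed random variable under the null $H_0: \beta_{l,ind} = \iota \beta_l$, but as $\hat\sigma_u^2$ has to be a consistent estimator of $\sigma_u^2$, the maintained assumptions are that $\beta_{s,ind} = \iota \beta_s$ for $s = 1, \ldots, k_x$, $s \neq l$.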
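The following proposition states the equivalence of $Sar(\hat\beta_{2sls})$ and $MD(\hat\beta_{l,md})$ for $l = 1, \ldots, k_x$.

Proposition. Let $Sar(\hat\beta_{2sls})$ and $MD(\hat\beta_{l,md})$ be as defined in (14) and (15). Then $MD(\hat\beta_{l,md}) = Sar(\hat\beta_{2sls})$ for $l = 1, \ldots, k_x$.

As for the single-endogenous variable case, these results can be extended to the two-step GMM and robust MD estimators.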
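The multiple endogenous variables case can be illustrated with two endogenous variables and four instruments, using the sets $(z_1, z_2)$, $(z_2, z_3)$, $(z_3, z_4)$ from the introduction. This is a sketch under stated assumptions, not the paper's own construction: the simulated design is arbitrary, and the matrix `A` is notation introduced here. Its $t$-th column $a_t = Z_{[t]} (X' Z_{[t]})^{-1} e_l$ satisfies $\hat\beta_{l,[t]} = a_t' y$, so that under conditional homoskedasticity the variance of the per-parameter estimates is estimated by $\hat\sigma_u^2 A'A$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, kz, kx = 500, 4, 2

# Simulated data (illustrative assumption), true beta = (1, -0.5).
Z = rng.normal(size=(n, kz))
V = rng.normal(size=(n, kx))
u = V @ np.array([0.5, -0.3]) + rng.normal(size=n)
Pi = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, -0.4], [-0.2, 0.5]])
X = Z @ Pi + V
y = X @ np.array([1.0, -0.5]) + u

# 2SLS, residuals and the Sargan statistic.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
resid = y - X @ b_2sls
s2 = resid @ resid / n
sargan = resid @ (Z @ np.linalg.solve(Z.T @ Z, Z.T @ resid)) / s2

# Just-identifying instrument sets: (z1, z2), (z2, z3), (z3, z4).
sets = [[t, t + 1] for t in range(kz - kx + 1)]
iota = np.ones(len(sets))

b_md = np.empty(kx)
md = np.empty(kx)
for l in range(kx):
    # Column t of A gives the l-th element of the IV estimate from set t.
    A = np.column_stack(
        [Z[:, s] @ np.linalg.inv(X.T @ Z[:, s])[:, l] for s in sets]
    )
    b_ind = A.T @ y
    Omega = s2 * (A.T @ A)     # assumed homoskedastic variance estimator
    Oi = np.linalg.solve(Omega, iota)
    b_md[l] = (Oi @ b_ind) / (Oi @ iota)
    dev = b_ind - iota * b_md[l]
    md[l] = dev @ np.linalg.solve(Omega, dev)

print(b_2sls, b_md)
print(sargan, md)
```

In this sketch each per-parameter MD estimate should coincide with the corresponding element of the 2SLS estimate, and both MD criterion values should equal the single Sargan statistic, provided the columns of `A` are linearly independent, which is the role played by the requirement that the sets span all instruments.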