Abstract

In a recent paper Donohue and Wolfers (D&W) critique a number of modern econometric studies purporting to demonstrate a deterrent effect of capital punishment. This paper focuses on D&W's central criticism of a study by Zimmerman; specifically, that the estimated standard errors on the subset of his regressions that suggest a deterrent effect are downward biased due to autocorrelation. The method that D&W rely upon to adjust Zimmerman's standard errors is, however, potentially problematic, and is also only one of several methods to address the presence of autocorrelation. To this end, Zimmerman's original models are subjected to several parametric corrections for autocorrelation, all of which result in statistically significant estimates that are similar in magnitude to his original estimates. The paper also presents results obtained from an alternative model whose specification is motivated on theoretical and statistical grounds. These latter results also provide some evidence supporting a deterrent effect. Finally, the paper discusses D&W's use of randomization testing and their contention that executions are not carried out often enough to plausibly deter murders.

Introduction

In a recent article, John Donohue and Justin Wolfers (hereinafter “D&W”) (2005) review a number of modern econometric studies that purport to demonstrate a deterrent effect of capital punishment. The majority of these studies, including one by Zimmerman (2004), rely upon panel data methods. D&W are, for the most part, highly critical of these papers, arguing in each instance that their published results are highly sensitive to minor changes in model specification.

D&W certainly warrant commendation for the scale and scope of their critique, which has received extensive attention (and often praise) from other scholars as well as the mainstream media. The authors compiled and analyzed the data from a majority of the recent empirical studies on the deterrent effect of capital punishment, and subjected these data to a wide range of diagnostic testing/sensitivity analyses including, inter alia, changes in functional form, sample periods, comparison groups, variable construction, and instrument selection. Indeed, D&W's methodical approach to addressing the gamut of empirical complexities raised in attempting to estimate the deterrent effect of capital punishment in itself marks the paper as an important contribution to the literature. However, while D&W offer a convincing challenge to the findings and conclusions of several of the panel data studies they chose to review, their criticism of Zimmerman (2004) is, in most regards, wanting.

The purpose of this paper is to address and respond to several criticisms made by D&W with regard to Zimmerman (2004) specifically and, more generally, to the issue of whether or not the “application” of the death penalty as currently practiced in the United States could be reasonably expected to deter potential offenders. Zimmerman, like D&W, ultimately concluded that the estimated deterrent effect of capital punishment was highly sensitive to model specification.1 Thus, there is no substantive disagreement between Zimmerman and D&W on this issue.2

D&W's primary criticism of Zimmerman (2004) concerns the statistical variability of those estimations that suggest a deterrent effect of capital punishment, i.e., they posit that the reported standard errors are not estimated with sufficient precision to be deemed “statistically significant” (due to the presence of autocorrelation in the data). D&W arrive at this conclusion by simply adjusting Zimmerman's standard error estimates through the use of a “clustering correction.” These types of corrections have become commonplace in the applied econometrics literature as a result of the influential paper by Bertrand et al. (2004). (See also Pepper (2002) and Wooldridge (2003) for further discussion on the application of clustering corrections in applied econometrics.)

Clustering (or cluster-robust) corrections, however, are only one of several ways to account for autocorrelation in panel data models. Further, recent econometric research has highlighted some potential shortcomings of cluster-robust corrections. For example, these methods are (in certain cases) now understood to be relatively low-powered, which might lead a researcher to conclude that an “intervention measure” (such as executions) imparts no statistically significant effect even when the null hypothesis is false.3 In particular, when the number of clustering groups is relatively small, cluster-robust corrections can result in estimated standard errors that are themselves severely biased. Therefore, revisiting the robustness of Zimmerman's original results with respect to alternative approaches to controlling for autocorrelation is warranted.4

The paper proceeds as follows. Section 2 briefly reviews Zimmerman's (2004) study. In Section 3, it is shown that applying several parametric corrections for autocorrelation to Zimmerman's original empirical specifications (i.e., those regressions demonstrating a deterrent effect of executions) leads to point estimates of the deterrent effect that are statistically significant and similar in magnitude to those presented in Zimmerman (2004). Thus, the results of the latter study are robust to parametric corrections for autocorrelation, which in some instances may be preferable to the clustering approach relied upon by D&W.

Section 4 then proposes an alternative regression specification that accounts for potentially important dynamics in the evolution of crime rates in addition to addressing the concerns with autocorrelation raised by D&W. The results of these estimations also provide evidence for a deterrent effect of the death penalty, although they too are found to be somewhat sensitive to functional form. In this sense the empirical evidence of capital punishment's deterrent effect is best regarded as “mixed.” But, again, it is important to recognize that this is effectively the same conclusion originally reached by Zimmerman.

Sections 5 and 6 respectively consider two other facets of D&W's criticism that also concern the statistical variability of Zimmerman's (2004) estimates, one being methodological and the other conceptual. With regard to the former, in their critique D&W rely upon a “randomization test” that ostensibly demonstrates the spurious nature of the deterrent effect found in previous studies. However, the manner in which D&W implement their test does not provide any dispositive insights and precludes any scientifically valid determination of whether or not capital punishment deters murder. Section 6 concerns D&W's argument that capital punishment in the United States is applied “too infrequently” to allow for any causal identification of a deterrent effect. It is shown that D&W effectively reach this conclusion by relying on data that are simply too aggregated; when one looks at the more appropriate disaggregated (i.e., state-level) data the relevant “execution risk” is likely to be higher, and possibly high enough to impart a deterrent effect. Finally, Section 7 provides concluding remarks.

A Brief Overview of Zimmerman's Study

Zimmerman (2004) employs a panel of state-level data over the years 1978–1997 to estimate the effect of state executions on the rate of homicide. He constructs proxies of three “deterrence probabilities” based on Ehrlich's (1975) theoretical framework: (1) the probability of arrest for murder, Pr(a); (2) the probability of being convicted of murder conditional on being arrested, Pr(c|a); and (3) the probability of being executed for murder conditional on conviction, Pr(e|c). The structural regression model Zimmerman estimates is given by:

(1)
formula
where the subscripts i and t index states and years, respectively. The dependent variable (M_{i,t}) is the number of reported Uniform Crime Report murders per 100,000 state residents. The variables [formula] and [formula] denote the constant and error term, respectively. The variable E_{i,t} denotes a vector of law enforcement covariates, C_{i,t} a vector of other per capita crime measures that may affect the rate of murder, and X_{i,t} a vector of economic and demographic covariates typically included in state-level studies of crime.5 The variables [formula], [formula], and [formula] denote vectors of state indicators, year indicators, and state-specific time trends, respectively. All other variables reflect (vectors of) coefficients to be estimated.
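Because equation (1) and several of its symbols are not legible in this copy, the following is only a sketch of the general form implied by the definitions above. It assumes a specification that is linear in levels, and every Greek letter is a placeholder label chosen here rather than the notation of Zimmerman (2004).

```latex
% Assumed notation: \alpha (constant), \beta_k (deterrence coefficients),
% \gamma, \delta, \theta (coefficient vectors), \mu_i (state indicators),
% \lambda_t (year indicators), \tau_i t (state-specific time trends),
% \varepsilon_{i,t} (error term).
M_{i,t} = \alpha
        + \beta_1 \Pr(a)_{i,t}
        + \beta_2 \Pr(c \mid a)_{i,t}
        + \beta_3 \Pr(e \mid c)_{i,t}
        + E_{i,t}\,\gamma + C_{i,t}\,\delta + X_{i,t}\,\theta
        + \mu_i + \lambda_t + \tau_i\, t + \varepsilon_{i,t}
```

Under this reading, the coefficient on the execution probability (written here as \beta_3) is the one whose sign and significance are at issue in the remainder of the paper.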

Table 1 shows how each of the deterrence probabilities is constructed. In the “lagged models,” the variable in the numerator and denominator of each deterrence probability is the relevant once-lagged annual value. Thus, in these instances all deterrence probabilities are defined in terms of the underlying variables’ values in the previous year and, therefore, presumed to be “exogenous” to the current year's murder rate.6 For the sake of clarity, the characteristics of each empirical specification considered by Zimmerman (i.e., with respect to the construction of the various deterrence probabilities) are summarized in Table 2. Models 1–4 are the “base models,” which are estimated by OLS. Models 5 and 6, which incorporate the contemporaneous deterrence probability measures, are estimated using two-stage least squares (2SLS).

Table 1

Construction of Deterrence Probabilities

Deterrence Probability  Contemporaneous Models  Lagged Models
Probability of arrest [Pr(a)]  [formula]  [formula]
Probability of conviction given arrest [Pr(c|a)]  [formula]  [formula]
Probability of execution given conviction [Pr(e|c)]  [formula]  [formula]

Notes: Sources and data are as described in Zimmerman (2004, Table 2). All data used in constructing the variables presented above correspond to state-level observations.

Table 2

Summary of Per Capita Murder Regressions Estimated in Zimmerman (2004)

Specification  Unadjusted Contemporaneous Deterrence Probabilities  Adjusted Contemporaneous Deterrence Probabilities  Unadjusted Lagged Deterrence Probabilities  Adjusted Lagged Deterrence Probabilities  Estimation Method
Model 1  Yes  No  No  No  OLS
Model 2  No  Yes  No  No  OLS
Model 3  No  No  Yes  No  OLS
Model 4  No  No  No  Yes  OLS
Model 5  Yes  No  No  No  2SLS
Model 6  No  Yes  No  No  2SLS

Notes: In Models 5 and 6 the endogenous (i.e., instrumented) variables are Pr(a), Pr(c|a), and Pr(e|c) as defined in Table 1. In some instances the variables in the denominators of the various deterrence probabilities were either zero or missing. A backward-looking adjustment is used to replace the undefined value of the observations with the last value where it was defined (i.e., on a state-year basis). This method was applied to both the contemporaneous and lagged deterrence probabilities models, which are referred to as the “adjusted” models above.
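The backward-looking adjustment described in the notes above can be sketched as follows. This is a minimal illustration using made-up numbers for a single hypothetical state; the function names are introduced here for exposition and are not taken from Zimmerman (2004).

```python
from typing import List, Optional, Sequence


def ratio_or_none(numerator: Optional[float],
                  denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def backward_looking_adjustment(
        values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each undefined entry with the last defined value for the state.

    Entries before the first defined value have nothing to look back to,
    so they remain None.
    """
    adjusted: List[Optional[float]] = []
    last_defined: Optional[float] = None
    for value in values:
        if value is not None:
            last_defined = value
        adjusted.append(last_defined)
    return adjusted


# Hypothetical series for one state, in year order (illustrative values only).
executions = [1, 0, 2, 1, 0]
death_sentences = [10, 0, 8, None, 5]  # zero or missing denominators occur

unadjusted = [ratio_or_none(e, d) for e, d in zip(executions, death_sentences)]
adjusted = backward_looking_adjustment(unadjusted)
print(unadjusted)
print(adjusted)
```

In a full panel the same function would be applied separately to each state's series, so that a value is never carried across states.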

Zimmerman first obtains OLS estimates of Models 1–4, and finds that all point estimates of the relevant execution probability measures are negative but statistically insignificant at conventional levels (Zimmerman, 2004, Table 3). Recognizing that the OLS estimates of Models 1 and 2 may suffer from simultaneity bias (i.e., since the deterrence probabilities in those specifications rely on contemporaneous values of the underlying variables used in the numerator of each respective probability, and may therefore be contemporaneously correlated with the murder rate corresponding to the same year), Zimmerman proceeds by employing a set of instrumental variables to determine whether the state murder rate affects the relevant probability of execution proxy. These instruments, motivated by a “public choice” perspective of the capital punishment system,7 are used to estimate a simultaneous equations model where each of the deterrence probabilities in Models 5 and 6 is treated as endogenous.8

Zimmerman's 2SLS estimates of the (contemporaneous) execution probability measure, Pr(e|c), in Models 5 and 6 are larger in magnitude (i.e., more negative) than their corresponding OLS estimates in Models 1 and 2, respectively. And, unlike their corresponding OLS estimates, the estimates of Pr(e|c) in Models 5 and 6 are statistically significant at conventional levels. Specifically, the coefficient estimate on Pr(e|c) is −1.48 (t-statistic = −2.24) in Model 5 and −2.15 (t-statistic = −2.94) in Model 6. These results suggest that the murder rate might impart a positive (upward) effect on the contemporaneous execution probability measures or, all else equal, that a higher murder rate will (contemporaneously) result in a higher execution rate. As noted by D&W, this positive “reverse causation” running from the murder rate to the execution rate may occur, e.g., if high murder rates make the public frustrated enough to increase the use of the death penalty (Donohue and Wolfers, 2005, p. 819). In summary, the results of Zimmerman (2004) suggest that executions might bring about a noticeable decrease in the number of murders. But, at the same time, there is sufficient variation in the estimated effects across models such that the deterrent effect of capital punishment cannot be unequivocally determined.
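To make “statistically significant at conventional levels” concrete, the t-statistics quoted above can be converted to two-sided p-values. The sketch below assumes a standard normal reference distribution, which is an approximation adopted here for illustration; the original study may have used a t distribution with its own degrees of freedom.

```python
import math


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal.
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistics on Pr(e|c) reported in the text for Models 5 and 6.
reported_t_stats = {"Model 5": -2.24, "Model 6": -2.94}

for model, t_stat in reported_t_stats.items():
    print(model, round(two_sided_p_value(t_stat), 4))
```

Under this approximation the Model 5 statistic clears the 5% threshold and the Model 6 statistic clears the 1% threshold.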

Alternative Approaches to Estimating Standard Errors in Panel Data Models to Account for the Effects of Autocorrelation on Statistical Inference

Over the past several years, applied econometric research has paid increasing attention to the effects that autocorrelation can exert on the statistical inference drawn from fixed effects or “difference-in-differences” models. Bertrand et al. (2004) show that failing to account for a highly autocorrelated “treatment effect” in the context of a difference-in-differences framework (specifically, where the treatment measure is constructed as a dummy variable and the panel data are of the state/year variety) can lead to OLS estimates “over-exaggerating” the amount of independent variation in the data that can be exploited by a researcher. This effect may result in the spurious finding of a statistically significant impact of the treatment effect.

To overcome the above problem, Bertrand et al. advocate the use of cluster-robust corrections to estimate standard errors in panel data, i.e., corrections that account for the fact that the data may be correlated either across space and/or over time. Cluster-robust standard errors (when appropriately applied) are advantageous in that they provide estimates that are robust to heteroskedasticity and (within-group) autocorrelation of arbitrary form. Applying this method to Zimmerman's original 2SLS models has a dramatic effect on the estimated standard errors of the conditional probability of execution variable: the relevant absolute t-statistic pertaining to Model 5 falls from 2.24 to 0.74 and that pertaining to Model 6 falls from 2.94 to 0.81.
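The mechanism can be illustrated with a small simulation. The sketch below is not the model estimated by Zimmerman or D&W: it assumes a pooled bivariate regression with no fixed effects, a true slope of zero, AR(1) persistence of 0.8 in both the regressor and the error within each state, and a cluster-robust variance without any finite-sample correction. All of these are choices made here for illustration.

```python
import math
import random


def simulate_panel(n_states=50, n_years=20, rho=0.8, seed=0):
    """Simulate (state, x, y) rows with AR(1) persistence within each state.

    The true slope is zero, so y is just the autocorrelated error term.
    """
    rng = random.Random(seed)
    innovation_scale = math.sqrt(1.0 - rho ** 2)  # keeps unit variance
    rows = []
    for state in range(n_states):
        x = rng.gauss(0.0, 1.0)  # draw from the stationary distribution
        e = rng.gauss(0.0, 1.0)
        for _ in range(n_years):
            rows.append((state, x, e))
            x = rho * x + innovation_scale * rng.gauss(0.0, 1.0)
            e = rho * e + innovation_scale * rng.gauss(0.0, 1.0)
    return rows


def slope_with_standard_errors(rows):
    """OLS slope with conventional and state-clustered standard errors."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    slope = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx

    rss = 0.0
    cluster_scores = {}  # sum of (demeaned x * residual) within each state
    for state, x, y in rows:
        resid = (y - y_bar) - slope * (x - x_bar)
        rss += resid ** 2
        cluster_scores[state] = (cluster_scores.get(state, 0.0)
                                 + (x - x_bar) * resid)

    conventional_se = math.sqrt(rss / (n - 2) / sxx)
    cluster_se = math.sqrt(sum(s ** 2 for s in cluster_scores.values())) / sxx
    return slope, conventional_se, cluster_se


slope, conventional_se, cluster_se = slope_with_standard_errors(simulate_panel())
print(slope, conventional_se, cluster_se)
```

When both the regressor and the error are persistent within states, the conventional formula treats each state-year as independent information and reports a smaller standard error than the clustered formula, which is the pattern behind the fall in the t-statistics described above.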

While the Bertrand et al. (2004) study has made a substantial impact on modern econometric practice, subsequent research has highlighted potential shortcomings of cluster-robust standard errors. For example, it is now relatively well understood that these methods can be low-powered, meaning that the probability of rejecting the null when it is false is low.9 Put differently, cluster-robust corrections may lead to the null hypothesis being under-rejected when the alternative hypothesis is true (i.e., when the treatment actually imparts a real and substantive causal effect). In addition, it is not even clear whether clustering at the state-level (as is done by D&W) can be assumed to satisfy the asymptotic properties upon which these methods are justified.10

For panel data regressions controlling for unit (i.e., group) specific effects, Imbens and Wooldridge (2007) note that Hansen's (2007b) study implies that cluster-robust corrections will tend to work well when the cross-section and time-series dimensions of the data are similar and not “too small.” Again, it is not clear whether state-level data provide for a sufficiently large number of clustering groups and, clearly, Zimmerman's (2004) dataset does not consist of a (roughly) equal number of groups (states) and time periods (years). If time fixed effects are also controlled for in the panel regression (as is done in Zimmerman's estimations), then the asymptotics must apply with both the cross-section and time dimensions becoming “large” (Imbens and Wooldridge, 2007)—an even stricter requirement. But this latter condition also cannot be said to hold for Zimmerman's data, which further brings the efficacy of D&W's preferred cluster-robust correction into question.

It has also been shown that the Bertrand et al. study is unduly pessimistic about the ability of parametric AR(1) corrections for autocorrelation (e.g., the Prais–Winsten correction) to provide correct rejection rates of the null while offering higher statistical power.11 As such, examining the effects on the standard errors pertaining to Zimmerman's original models but using various (parametric) corrections for autocorrelation (i.e., as opposed to the cluster-robust estimates relied upon by D&W) warrants consideration.12 Indeed, doing as much seems entirely consistent with D&W's stance regarding the need to conduct sensitivity checks on econometric work that may inform important public policy decisions (such as the use of capital punishment).

Table 3 presents the results of re-estimating Models 5 and 6 using five alternative parametric approaches to correct for autocorrelation: (1) Prais–Winsten regression (panel-specific AR(1) process); (2) linear fixed-effects regression (common AR(1) process); (3) generalized least squares (GLS) regression (common AR(1) process); (4) GLS regression (panel-specific AR(1) process); and (5) Newey–West regression (lag order of autocorrelation set to one or two).13 All of the first- and second-stage regression specifications used to estimate the models in Table 3 are identical to those used to obtain Zimmerman's original estimates of Models 5 and 6 (i.e., save for the estimation method).14 Further, in each instance the same estimation method used to obtain the results from the second-stage regression (e.g., GLS) is also used to estimate each first-stage regression.15
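As a rough guide to what a parametric AR(1) correction does, the sketch below shows the textbook Prais–Winsten transformation for a single state's series. It is an illustration under stated assumptions rather than the procedure used to produce Table 3: the function names are hypothetical, and the autocorrelation coefficient is estimated by a simple no-intercept regression of each residual on its predecessor.

```python
import math
from typing import List, Sequence


def estimate_rho(residuals: Sequence[float]) -> float:
    """AR(1) coefficient from regressing e_t on e_{t-1} without an intercept."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(residuals[t - 1] ** 2
                      for t in range(1, len(residuals)))
    return numerator / denominator


def prais_winsten_transform(series: Sequence[float], rho: float) -> List[float]:
    """Quasi-difference a series, rescaling (not dropping) the first value."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Illustrative residuals that halve each period, so the estimate is 0.5.
rho_hat = estimate_rho([1.0, 0.5, 0.25, 0.125])
transformed = prais_winsten_transform([2.0, 4.0, 6.0], rho_hat)
print(rho_hat, transformed)
```

In this framing a “panel-specific” process would estimate a separate coefficient for each state, while a “common” process would pool the residuals across states and apply one coefficient to every series. The dependent variable and each regressor are transformed in the same way before re-estimating the regression.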

Table 3

Second-Stage Regression Estimates of the Deterrent Effect of Capital Punishment Using Parametric AR(1) Corrections for Autocorrelation

Column groups, left to right (each group reports Model 5, then Model 6): Prais–Winsten regression (panel-specific AR(1) process); linear fixed-effects regression (common AR(1) process); GLS regression (common AR(1) process); GLS regression (panel-specific AR(1) process); Newey–West regression (lag order of autocorrelation = 1); Newey–West regression (lag order of autocorrelation = 2)
Independent variable Model 5 Model 6 Model 5 Model 6 Model 5 Model 6 Model 5 Model 6 Model 5 Model 6 Model 5 Model 6 
(Murder arrests in year t)/(Murders in year t) −1.558** −1.750*** −1.289** −1.268* −1.772** −1.855*** −1.594** −1.631*** −1.956** −2.125** −1.956** −2.125** 
 (2.33) (2.61) (2.05) (1.88) (2.53) (2.61) (2.53) (2.63) (2.19) (2.33) (2.11) (2.24) 
(Death sentences in year t)/(Murder arrests in year t−1) [unadjusted] −5.56  −1.743  −4.192  −4.696  −1.849  −1.849  
 (1.09)  (0.26)  (0.71)  (0.97)  (0.25)  (0.25)  
(Executions in year t)/(Death sentences in year t−1) [unadjusted] −1.646***  −1.809***  −1.645***  −1.768***  −2.126***  −2.126***  
 (4.30)  (2.85)  (4.16)  (5.33)  (4.02)  (4.04)  
(Death sentences in year t)/(Murder arrests in year t−1) [adjusted]  −4.539  −3.923  −4.969  −5.346  −2.914  −2.914 
  (0.87)  (0.52)  (0.83)  (1.11)  (0.38)  (0.38) 
(Executions in year t)/(Death sentences in year t−1) [adjusted]  −1.927***  −1.622**  −1.694***  −1.935***  −2.206***  −2.206*** 
  (3.91)  (1.98)  (3.78)  (4.86)  (3.59)  (3.60) 
Prisoners per capita −0.006*** −0.006*** −0.006*** −0.005*** −0.006*** −0.006*** −0.006*** −0.006*** −0.006*** −0.006*** −0.006*** −0.006*** 
 (3.51) (3.46) (2.90) (2.60) (5.63) (5.48) (6.23) (6.03) (3.99) (3.84) (4.02) (3.86) 
Police per capita −0.001 −0.001 0.001 0.002 −0.002 −0.001 −0.001 −0.001 −0.003 −0.002 −0.003 −0.002 
 (0.18) (0.16) (0.32) (0.53) (0.54) (0.46) (0.29) (0.24) (0.63) (0.52) (0.61) (0.51) 
% Unemployed −0.084** −0.084** −0.144*** −0.139*** −0.121*** −0.112*** −0.083** −0.082** −0.139*** −0.129*** −0.139*** −0.129*** 
 (2.07) (2.07) (3.54) (3.39) (3.42) (3.19) (2.47) (2.46) (2.97) (2.75) (2.85) (2.65) 
Income per capita 0.287** 0.278** 0.312** 0.277** 0.356*** 0.330*** 0.319*** 0.281** 0.359** 0.304** 0.359** 0.304** 
 (2.07) (2.01) (2.40) (2.06) (3.03) (2.80) (2.91) (2.54) (2.49) (2.07) (2.42) (2.01) 
% Metro 0.015 0.015 −0.009 −0.008 −0.001 −0.002 0.021 0.016 −0.005 −0.007 −0.005 −0.007 
 (0.51) (0.51) (0.31) (0.27) (0.03) (0.09) (0.89) (0.68) (0.18) (0.23) (0.20) (0.25) 
% Poverty 0.007 0.010 0.024 0.016 0.017 0.01 0.010 0.009 0.026 0.019 0.026 0.019 
 (0.33) (0.46) (1.17) (0.78) (0.79) (0.46) (0.50) (0.46) (0.85) (0.60) (0.84) (0.59) 
% Black 1.666*** 1.800*** 0.381 1.068* 1.950*** 2.026*** 1.723*** 1.802*** 2.235*** 2.346*** 2.235*** 2.346*** 
 (3.64) (3.69) (1.03) (1.95) (5.49) (5.30) (5.22) (5.12) (4.65) (4.47) (4.48) (4.32) 
% Ages 18–24 0.135 0.161 −0.09 0.044 0.075 0.089 0.129 0.157 0.074 0.104 0.074 0.104 
 (1.08) (1.29) (0.71) (0.38) (0.65) (0.77) (1.17) (1.43) (0.63) (0.88) (0.61) (0.86) 
% Ages 25–44 0.087 0.091 −0.055 −0.073 0.135 0.096 0.104 0.088 0.179 0.138 0.179 0.138 
 (0.54) (0.57) (0.29) (0.37) (0.84) (0.60) (0.71) (0.60) (0.91) (0.69) (0.88) (0.67) 
% Ages 45–64 0.721*** 0.696*** 0.193 0.230 0.943*** 0.830*** 0.789*** 0.702*** 1.045*** 0.925*** 1.045*** 0.925*** 
 (2.90) (2.81) (0.71) (0.84) (3.75) (3.38) (3.47) (3.11) (3.11) (2.79) (2.95) (2.65) 
% Ages 65 and over −0.011 −0.11 0.501* 0.410 0.067 0.013 0.004 −0.110 0.167 0.062 0.167 0.062 
 (0.04) (0.37) (1.91) (1.55) (0.31) (0.06) (0.02) (0.51) (0.63) (0.23) (0.62) (0.23) 
Robberies per capita 0.017*** 0.017*** 0.020*** 0.020*** 0.016*** 0.016*** 0.016*** 0.017*** 0.016*** 0.016*** 0.016*** 0.016*** 
 (12.23) (12.22) (12.12) (12.05) (16.83) (16.88) (17.13) (17.28) (9.52) (9.48) (8.62) (8.58) 
Aggravated assaults per capita 0.005*** 0.005*** 0.003*** 0.003** 0.005*** 0.005*** 0.005*** 0.005*** 0.006*** 0.006*** 0.006*** 0.006*** 
 (4.53) (4.55) (2.63) (2.36) (5.31) (5.26) (5.75) (5.77) (4.15) (4.07) (3.93) (3.86) 
Constant −27.149** −25.789** −15.451 −22.020** −34.780*** −30.097*** −30.558*** −26.062*** −39.674*** −33.650** −39.674*** −33.650** 
 (2.53) (2.41) (1.54) (2.12) (3.40) (3.00) (3.22) (2.74) (2.89) (2.49) (2.82) (2.43) 
Observations 1000 1000 950 950 1000 1000 1000 1000 1000 1000 1000 1000 

Notes: The data set comprises state-level observations based over the years 1978–1997. The dependent variable is the number of reported UCR murders per capita. All per capita variables are defined per 100,000 state residents. Absolute value of z-statistics or t-statistics in parentheses. The symbols “*”, “**”, and “***” denote statistical significance at the 10%, 5%, and 1% levels, respectively, in a two-tailed test. All regressions are estimated using state populations as weights (except for those pertaining to the linear fixed-effects model, which are unweighted) and include full sets of state indicators, year indicators, and state-specific time trends (estimates not shown). The same holds true for each of the respective models' first-stage regressions (estimates not shown). All regressions are statistically significant at (better than) the 1% level in a two-tailed test.

Each of the estimated execution risk measures in Table 3 takes a negative sign and all are statistically significant at conventional levels. Specifically, nine out of the 10 individual point estimates are statistically significant at the 1% level and one is statistically significant at the 5% level.16 As expected, the probability of arrest variable is also consistently negative and statistically significant across all specifications. The sentencing risk measure is also consistently negative as expected, but is never statistically significant at conventional levels. These latter findings are also consistent with Zimmerman (2004).

Measured across all 10 regressions, the estimates imply (on average) that each additional state execution deters approximately 15 murders.17,18 The average value of the lower bound of the 95% confidence interval as measured across all eight regressions is seven additional murders deterred, while the corresponding average upper bound is 23 additional murders. Clearly, these regressions provide strong evidence that Zimmerman's original results and conclusions are effectively unchanged when employing what may be a more appropriate type of correction for autocorrelation.

An Alternative Model Specification for Estimating the Deterrent Effect of Capital Punishment

Including the lagged value of the dependent variable as an explanatory measure is a common and simple approach to account for autocorrelation that is frequently employed in econometric studies (Keele and Kelly, 2006). In the context of the economic model of crime, this treatment may also be motivated on theoretical grounds. For instance, Moody (2001, p. 806) notes that:

Given the very real possibility that crime causes crime, it would appear to be necessary to incorporate momentum, lags, and other dynamics into the analysis. An increase in crime can cause the law enforcement sector to be overwhelmed. The probability of arrest and conviction declines. Everyone seems to be committing crime and getting away with it. The result is a multiple increase in the crime rate.

As such, Models 1–6 of Zimmerman (2004) are re-estimated with the lagged murder rate as an additional explanatory measure.19 These alternative specifications are denoted (respectively) with a prime symbol (“′”) in the ensuing discussion. Of course, the interpretation of the estimated coefficients from these alternative specifications is nominally different from those obtained from the original specifications (i.e., since the former hold the lagged influence of the murder rate constant, whereas the latter do not). Similarly, the fact that the identification of these models is somewhat different relative to Zimmerman's original specifications (as discussed further below) should be taken into account when interpreting the results. In any event, the OLS specifications (Models 1′–4′) again provided no evidence to suggest a deterrent effect of capital punishment, which is consistent with Zimmerman's original results.20
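Constructing the once-lagged murder rate in a state-year panel amounts to pairing each observation with the same state's value from the previous year. The sketch below uses made-up rates for two hypothetical states; the data layout and function name are assumptions made here for illustration.

```python
from typing import Dict, List, Tuple


def add_lagged_murder_rate(
        panel: Dict[str, List[Tuple[int, float]]]
) -> List[Tuple[str, int, float, float]]:
    """Return (state, year, murder_rate, lagged_murder_rate) rows.

    The lag is taken within each state, so the first observed year of every
    state has no lagged value and drops out of the estimation sample.
    """
    rows = []
    for state, series in panel.items():
        ordered = sorted(series)  # sort by year
        for (prev_year, prev_rate), (year, rate) in zip(ordered, ordered[1:]):
            if year == prev_year + 1:  # use only consecutive years
                rows.append((state, year, rate, prev_rate))
    return rows


# Hypothetical murder rates per 100,000 residents (illustrative values only).
panel = {
    "State A": [(1978, 5.0), (1979, 5.5), (1980, 6.1)],
    "State B": [(1978, 9.0), (1979, 8.4)],
}
rows = add_lagged_murder_rate(panel)
for row in rows:
    print(row)
```

Taking the lag within each state matters: a lag computed on the stacked data would carry the last year of one state into the first year of the next.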

In Zimmerman (2004), the same set of instruments was used to estimate both Models 5 and 6. The test of overidentifying restrictions (“OIR”) was found to be within conventional bounds for Model 5 (p-value = 0.16), but not so for Model 6 (p-value < 0.01). In the present case, when using the instrument set considered in Zimmerman (2004) to estimate Model 6′, the OIR test was again found to be outside conventional bounds. However, it is possible to explore the “failure” of the OIR test for Model 6′ by considering the statistical significance of the (excluded) instruments in a regression of the second-stage residuals on the (excluded) instruments and all other exogenous variables.21 Two of the (excluded) instruments were found to be statistically significant at conventional levels in this latter regression. When these two instruments are then dropped from the simultaneous equations system, the OIR test for Model 6′ becomes satisfied at conventional levels (p-value = 0.37).22 Finally, if the same instrument set that satisfied the OIR test for Model 6′ is also used to estimate Model 5′, the resultant OIR test is also passed at conventional levels (p-value = 0.37).23 As such, all of the estimates discussed below rely upon the set of instruments that satisfy the OIR test for Model 6′.

Table 4 presents the 2SLS estimates of Models 5′ and 6′. The coefficient estimate on the probability of execution is −0.96 in Model 5′ (approximately 35% smaller than the original Model 5 estimate) and statistically significant at the 10% level (t-statistic = −1.67). The corresponding estimate for Model 6′ is −1.38 (approximately 36% smaller than the original estimate) and statistically significant at the 5% level (t-statistic = −2.30).24 These estimates imply that each additional execution will deter approximately eight and 11 murders on average, respectively.25 That the estimated magnitudes of the execution risk coefficients are now smaller than their “original” counterparts is hardly surprising since (again) inclusion of the lagged dependent variable as a covariate tends to bias coefficient estimates toward zero. Further, each coefficient estimate resides within the 95% confidence interval associated with Zimmerman's original “preferred” estimate.

Table 4

Second-Stage Regression Estimates of the Deterrent Effect of Capital Punishment Using the Lagged Value of the Dependent Variable to Correct for Autocorrelation

| Variable | Model 5′ | Model 6′ |
| --- | --- | --- |
| (Murder arrests in year t)/(Murders in year t) | −1.80** (2.07) | −1.70* (1.91) |
| (Death sentences in year t)/(Murder arrests in year t−1) [unadjusted] | −9.07 (0.82) | |
| (Death sentences in year t)/(Murder arrests in year t−1) [adjusted] | | −6.05 (0.72) |
| (Executions in year t)/(Death sentences in year t−1) [unadjusted] | −0.96* (1.67) | |
| (Executions in year t)/(Death sentences in year t−1) [adjusted] | | −1.38** (2.30) |
| Once-lagged per capita murders | 0.31*** (8.03) | 0.31*** (8.40) |
| Prisoners per capita | −3.88E−03*** (3.06) | −4.16E−03*** (3.45) |
| Police per capita | −2.85E−03 (0.70) | −1.41E−03 (0.36) |
| % Unemployed | −0.13*** (2.87) | −0.09* (1.89) |
| Income per capita | 0.23 (1.52) | 0.25* (1.72) |
| % Metro | 3.90E−03 (0.11) | 3.06E−03 (0.10) |
| % Poverty | −0.01 (0.35) | 0.01 (0.18) |
| % Black | 1.41*** (2.77) | 1.79*** (3.37) |
| % Ages 18–24 | 0.02 (0.14) | 0.18 (1.30) |
| % Ages 25–44 | −0.10 (0.39) | −0.02 (0.07) |
| % Ages 45–64 | 0.47 (1.31) | 0.44 (1.45) |
| % Ages 65 and over | 0.49* (1.83) | 0.34 (1.33) |
| Robberies per capita | 0.01*** (10.32) | 0.01*** (10.08) |
| Aggravated assaults per capita | 3.74E−03*** (2.83) | 4.84E−03*** (3.88) |
| Constant | −18.02 (1.12) | −22.32* (1.72) |
| Observations | 751 | 920 |
| F(H0: All slopes = 0) | 113.91*** | 121.21*** |
| p(OIR) | 0.37 | 0.37 |

Notes: The data set comprises state-level observations based over the years 1978–1997. The dependent variable is the number of reported UCR murders per capita. All per capita variables are defined per 100,000 state residents. The symbols “*”, “**”, and “***” denote statistical significance at the 10%, 5%, and 1% levels, respectively, in a two-tailed test. All regressions are estimated using state populations as weights and include full sets of state indicators, year indicators, and state-specific time trends (estimates not shown). The same holds true for each of the respective models' first-stage regressions (estimates not shown).

Of course, it remains to be determined whether the inclusion of the lagged murder rate actually controls for the influence of autocorrelation in the above estimations. As noted by Wooldridge (2002), statistically testing the series of idiosyncratic errors in fixed-effects models for the presence of autocorrelation is a very complex problem. However, when the number of time periods in the panel is greater than or equal to three, Wooldridge suggests the following approach. First, estimate the OLS regression  

(2)
$$\hat{u}_{i,t} = \rho\,\hat{u}_{i,t-1} + e_{i,t}$$
where $i$ indexes states and $t$ indexes years in Zimmerman's data. The variable $\hat{u}_{i,t}$ denotes the residual obtained from estimating the second-stage equation associated with Model 5′ or 6′, while $e_{i,t}$ denotes the error term.

Under the null hypothesis that the (unobserved) errors are serially uncorrelated (assuming the fixed-effects model is estimated in levels) the estimated fixed-effects (time-demeaned) errors will have (first-order) correlation $-1/(T-1)$, where T denotes the number of time periods in the panel (Wooldridge, 2002).26 As such, the presence of autocorrelation may be determined by testing the null hypothesis $H_0\colon \hat{\rho} = -1/(T-1)$, where $\hat{\rho}$ denotes the estimated first-order autocorrelation coefficient obtained from equation (2). If the absolute value of the computed t-statistic is sufficiently large (i.e., considering standard levels of statistical significance) then H0 would be rejected. This finding would suggest the presence of autocorrelation and raise the concern that the estimated standard errors might still be biased downward.

Since T = 20 in Zimmerman's data, the value of $\hat{\rho}$ to be tested under the null hypothesis is −0.05. For Model 5′, the computed value of the (absolute) t-statistic does reject the null hypothesis of “no autocorrelation,” but only at the 10% level. However, the corresponding t-statistic with regard to Model 6′ fails to reject the null hypothesis at any conventional level of statistical significance. These findings suggest that the influence of autocorrelation on the estimated standard errors is likely to be small at best after controlling for the lagged impact of the murder rate.
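The mechanics of this residual-based test can be illustrated with a short sketch. It assumes a balanced panel, simulated and serially uncorrelated errors, and a conventional OLS standard error; the function name `fe_serial_correlation_test` is hypothetical, and the sketch uses every available pair of adjacent periods. It is not a replication of the tests reported for Models 5′ and 6′.

```python
import numpy as np

def fe_serial_correlation_test(resid):
    """Test for first-order serial correlation in fixed-effects residuals.

    resid: (N, T) array of time-demeaned residuals from a balanced panel,
    one row per unit (state) and one column per period (year).
    Returns (rho_hat, rho_null, t_stat).
    """
    n_units, n_periods = resid.shape
    y = resid[:, 1:].ravel()     # residual in period t
    x = resid[:, :-1].ravel()    # residual in period t-1
    # Pooled OLS (no intercept) of the residual on its first lag.
    rho_hat = (x @ y) / (x @ x)
    e = y - rho_hat * x
    se = np.sqrt((e @ e) / (y.size - 1) / (x @ x))
    # Value of the coefficient implied by serially uncorrelated errors.
    rho_null = -1.0 / (n_periods - 1)
    return rho_hat, rho_null, (rho_hat - rho_null) / se

# Simulated example (assumption): 50 units, T = 20, no serial correlation.
rng = np.random.default_rng(0)
errors = rng.normal(size=(50, 20))
resid = errors - errors.mean(axis=1, keepdims=True)   # time-demeaning
rho_hat, rho_null, t_stat = fe_serial_correlation_test(resid)
print(f"rho_hat={rho_hat:.3f}, null value={rho_null:.3f}, t={t_stat:.2f}")
```

With T = 20 the null value is −1/19, or roughly −0.05, which is why the test compares the estimated coefficient with that value and not with zero. In applied work a standard error that is robust to within-unit correlation would normally be preferred to the conventional one used here.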

Finally, recall that when Zimmerman's original “preferred” specification was estimated using a (pseudo) double-logarithmic functional form the estimated coefficient on the execution risk turned statistically insignificant at conventional levels. Applying the double-log specification to Model 5′ also turns the estimated impact of the execution risk statistically insignificant. However, this finding does not hold for Model 6′, i.e., the estimated deterrent effect is robust to the change in functional form. Specifically, the p-value associated with the estimated impact of the execution risk is 0.06.

The above results suggest that the estimated deterrent effect of capital punishment is sensitive to functional form, as was noted by Zimmerman (2004). In this sense one could reasonably argue that the “overall” evidence for a deterrent effect of capital punishment is “mixed” at best. On the other hand, Dezhbakhsh et al. (2003, p. 353) argue that the linear functional form may be the most theoretically appropriate in estimating aggregate economic models of crime since: (1) the underlying theory that motivates the aggregate empirical specification is derived from the maximizing behavior of an individual offender; and (2) only the linear functional form is invariant to aggregation (e.g., whereas the double-log functional form is not). If one accepts these arguments then the above estimations (considered in their totality) could be taken as evidence supporting a deterrent effect of capital punishment.

Randomization Testing

One particularly interesting analysis conducted by D&W in order to evaluate the precision of the Dezhbakhsh et al. (2003) and Zimmerman (2004) IV estimates involves a so-called “randomization test,” which involves conducting a counterfactual or “randomized” experiment on a set of “artificial” data.27 Specifically, D&W use the latter authors’ original panel data but match each state's homicide rate to a random state's independent variables (and instruments).28 With these artificial data, D&W re-ran each of the authors’ (alleged) preferred 2SLS regression models, repeating the process 1000 times. This process results in 1000 different “artificial” point estimates of the conditional execution probability. D&W then consider the distribution of these coefficient estimates (i.e., in terms of the estimated “life–life tradeoff”)29 relative to the value of Dezhbakhsh et al.'s and Zimmerman's preferred estimates.

The results of D&W's randomized experiment appear, prima facie, to be quite damning. The mass of the distribution of their artificial conditional execution probability coefficient estimates is skewed toward the negative scale of the “life–life tradeoff” (i.e., implying, in most instances, that the artificial estimated execution probability is positively correlated with the murder rate, or that a “brutalization effect” of capital punishment dominates).30 Relying on these estimates, D&W conclude that there is no “real” deterrent effect associated with executions.

The conclusions D&W draw from their randomization test appear to be questionable at best, because D&W's method of interpreting their results is not consistent with that prescribed by the received econometric literature on randomization testing. Specifically, D&W only address the magnitude and signs of the various estimated “artificial” coefficients. However, randomization testing formally tests the relevant null hypothesis by considering the size of the estimated test statistics associated with the “artificial” estimates relative to the size of the test statistic associated with the “actual” coefficient estimate.31 See Kennedy (1995, 2003) for further discussion.

Intuitively, in the artificial or “shuffled” datasets, there is, by construction, no “real” relationship between the dependent and independent variables. Thus, if the alternative hypothesis were actually true, one would not expect to find very many “artificial” estimated t-statistics that were larger (in absolute value) than the (absolute value) of the “original” estimated t-statistic.

Approximately $\alpha X$ of the artificial t-statistics, where X denotes the total number of t-values generated from the randomization test and (say) $\alpha = 0.05$, would have to be greater than or equal to the estimated t-statistic (both being expressed in absolute values) in order to fail to reject the null hypothesis (i.e., to statistically infer that there is no deterrent effect of capital punishment).

However, D&W never report or discuss the estimated standard errors (or t-statistics) pertaining to any of their artificial coefficient estimates, thus making it impossible to determine whether the relevant null hypothesis is or is not formally rejected by their randomization test.32 Again, the authors only take into account the overall distribution (i.e., density function) of their artificial coefficient estimates (i.e., not the relevant test statistics that must be relied upon to draw proper statistical inference from a randomization test) and consider the proportion of the estimates that were at least as large in magnitude as Zimmerman's estimated (actual) coefficient.33 Clearly, D&W do not appear to have conducted their randomization test in a meaningful fashion, and as such, their reported results cannot be used to infer that Zimmerman's estimates are statistically spurious.34
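The test-statistic comparison described in this section can be sketched as follows. The example is a deliberately simplified assumption: a bivariate OLS regression on simulated data in which the dependent variable is shuffled across observations. The function names `ols_t_stat` and `randomization_p_value` are hypothetical, and the sketch does not reproduce D&W's state-level reshuffling or the 2SLS models.

```python
import numpy as np

def ols_t_stat(y, x):
    """t-statistic on the slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta[1] / np.sqrt(sigma2 * xtx_inv[1, 1])

def randomization_p_value(y, x, n_draws=1000, seed=0):
    """Share of shuffled-data |t| values at least as large as the actual |t|."""
    rng = np.random.default_rng(seed)
    t_actual = abs(ols_t_stat(y, x))
    t_artificial = np.array(
        [abs(ols_t_stat(rng.permutation(y), x)) for _ in range(n_draws)]
    )
    return float(np.mean(t_artificial >= t_actual))

# Simulated example (assumption): a genuine negative relationship.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = -0.5 * x + rng.normal(size=200)
p_value = randomization_p_value(y, x)
print(f"randomization p-value: {p_value:.3f}")
```

The null hypothesis is rejected at level α when the returned share falls below α. The comparison is made on the test statistics, not on the signs or magnitudes of the shuffled coefficient estimates.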

The Frequency of Executions and Deterrence

Some brief discussion regarding a central tenet of D&W's criticism is also warranted: namely, that the number of executions actually carried out is so low that one cannot reasonably extract the execution-related “signal” from the overall “noise” explaining the large year-to-year fluctuations in the murder rate. For example, D&W point out that “[i]n 2003, there were 16,503 homicides … but only 144 inmates were sentenced to death … of the 3374 inmates on death row at the beginning of the year, only 65 were executed” (Donohue and Wolfers, 2005, p. 795). It appears that D&W inappropriately include one Federal execution and two Federal death sentences in their reported figures.35 Removing these Federal cases results in a death sentence to homicide ratio (i.e., which would presumably be relied upon by D&W as a proxy of a death penalty “sentencing risk”) of approximately 0.86% [ = (142/16,503)*100], which one might argue is a seemingly “small” probability.

It is worth noting, however, that the number of homicides D&W report for 2003 corresponds to the national number of homicides reported in the United States, i.e., the number measured across all states.36 Further, given that only some states actually have the death penalty (and, as noted by D&W, some states that have the death penalty do not routinely conduct executions, which might suggest that death sentences handed down in those states would not entail a strong deterrent effect), a “sentencing risk” estimate based upon national homicide counts is necessarily biased downward. If one considers only those states that have the death penalty, the sentencing risk increases to approximately 0.95%.37 While the latter estimate is also seemingly “low” in magnitude, it is higher than the one based upon data from all states. And if one considers the subset of death penalty states that actually conducted an execution in 2003, the sentencing risk is: (number of persons received under sentence of death in states that executed an offender in 2003)/(number of murders in states that issued death sentences in 2003) = (79/5750)*100 = 1.4%.38

Again, while one might argue that this latter figure is seemingly “small” in magnitude, D&W do concede that “… certain catastrophic events that occur with low frequency [may be] given greater prominence in decision making than their likelihood warrants if individuals are given frequent vivid reminders of these events, which could conceivably make the death penalty more of a deterrent than a rational calculation of the risk … would suggest …” (Donohue and Wolfers, 2005, note 23) and that “there is little evidence on how criminals form their expectations” (Donohue and Wolfers 2005, note 67). As such, it may be entirely possible that even very rare occurrences of “severe punishments” (such as being executed) are sufficient to deter the behavior of a marginal offender (in this case a potential murderer).

Of course, one could reasonably argue that calculating the sentencing risk is most appropriate when done on a state-by-state basis since death sentences handed down (or executions carried out) in a given death penalty state should not induce a deterrent effect in any other state. For the subset of states that sentenced at least one offender to death in 2003, the sentencing risk ratio ranges from 0.15% (Georgia) to 13.33% (Montana).39 While D&W are absolutely correct in noting that social scientists do not have a good sense of how criminals (whether they be actual or “potential”) form their perceptions of risk arising from the presence or application of criminal sanctions (be they executions or otherwise), these disaggregated figures nonetheless highlight the potentially misleading inferences that can be drawn when regarding such probabilities and failing to examine the theoretically appropriate unit of observation.

A similar exercise can also be applied to the 2003 death row population and execution figures cited by D&W. D&W include 23 Federal prisoners under sentence of death in their death row population total of 3374. After removing these cases the national-level ratio of executions to death row inmates would imply an average “execution risk” in 2003 of approximately 1.94% [ = (65/3351)*100]. Again, when one considers the more theoretically relevant state-level unit of observation, for the subset of states in which at least one execution was carried out in 2003, the execution risk ranges from 0.82% (Florida) to 13.73% (Oklahoma). The (unweighted) average execution risk across the latter set of states is approximately 4.42%.40 Of course, whether or not the execution-related signal that can be extracted from individual states’ application of the death penalty is actually “strong” enough to induce a deterrent effect of capital punishment is not, in any sense, directly answered (either theoretically or empirically) by D&W (or by anyone else) at this time, and thus remains an open and important question for future research.
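The aggregate risk ratios quoted in this section follow directly from the counts given in the text, and the arithmetic can be checked with a short sketch. The helper name `risk_pct` is hypothetical, and only figures stated above are used.

```python
def risk_pct(events, exposed):
    """Ratio of events to the exposed population, expressed in percent."""
    return 100.0 * events / exposed

# Death sentences / homicides, all states, Federal cases removed.
national_sentencing = risk_pct(142, 16503)
# Death sentences / murders, states that carried out an execution in 2003.
executing_states_sentencing = risk_pct(79, 5750)
# Executions / death row inmates, Federal prisoners removed.
national_execution = risk_pct(65, 3351)

print(f"national sentencing risk:         {national_sentencing:.2f}%")
print(f"executing-states sentencing risk: {executing_states_sentencing:.2f}%")
print(f"national execution risk:          {national_execution:.2f}%")
```

The three values are approximately 0.86%, 1.37% (1.4% after rounding), and 1.94%, matching the figures reported in the text.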

Concluding Remarks

D&W's recent critique of various empirical studies in the economics literature suggesting a deterrent effect of capital punishment highlights both the inherent difficulties in identifying causal effects from states’ application of the death penalty and the inherent danger in drawing conclusions from potentially “fragile” estimates. These factors in themselves mark the paper as a significant contribution to the literature. This paper addresses several criticisms made by D&W with regard to the statistical variability of the estimates suggesting a deterrent effect of capital punishment as presented in Zimmerman (2004) while also discussing several apparent shortcomings in D&W's critique.

Recent econometric research has shown that the cluster-robust methods that D&W rely upon may not result in correct statistical inference when applied to U.S. state-level panel data. Some researchers have suggested that parametric corrections to address autocorrelation, when available, might be more appropriate than cluster-robust methods for the purpose of adjusting estimated standard errors in the presence of autocorrelation (e.g., Hansen 2007b). It is shown that Zimmerman's (2004) 2SLS estimates of the deterrent effect of capital punishment are robust to a wide variety of alternative parametric corrections for autocorrelation. An alternative model specification that incorporates a lagged dependent variable structure also provides some evidence of a deterrent effect of state executions. And while not all the point estimates of the deterrent effect presented herein are statistically significant, one could argue that the preponderance of the evidence suggests that state executions do in fact impart an appreciable reduction in the number of per capita state murders.

Clearly, the proverbial “death penalty debate” is hardly decided as a result of D&W's critique.41 This paper shows that many of D&W's criticisms of Zimmerman's original work do not hold up under scrutiny, and other authors have also rebutted D&W's criticisms of their research.42 Further, it is worth noting that more recent studies (which appeared after D&W's critique was published) also provide empirical evidence suggesting a deterrent effect of capital punishment.43 In any event, D&W's study will likely stimulate much more investigation in this topic (which is certainly needed in order to gain a clearer understanding of the effects of the death penalty). This outcome, of course, is another important contribution of their work.

1
Indeed, this fact was noted explicitly by Zimmerman (2004, p. 189) (“[T]he results appear to be highly sensitive to functional form. When the simultaneous equations model is specified in double-logs the estimated deterrent effect of capital punishment disappears. While other recent studies report a deterrent effect of capital punishment using either linear or logarithmic functional forms … these estimates [i.e., those obtained by Zimmerman] nonetheless highlight the longstanding difficulty in conclusively determining whether or not capital punishment deters murder, a difficulty which is unlikely to be resolved anytime soon.”) (emphasis added). Zimmerman informed D&W that the pre-publication version of their paper provided an incomplete and inaccurate synopsis of his conclusions. See e-mail from Paul R. Zimmerman to Justin Wolfers (December 2, 2005, 11:13 pm) (on file with author). However, D&W (for reasons known only to them) chose to ignore the message.
2
D&W do, however, make several errors of commission in discussing Zimmerman's analysis. For example, the authors state that Zimmerman's preferred estimate “suggests that each execution saves 19 lives ….” Donohue and Wolfers (2005, p. 835). However, Zimmerman's actual preferred estimate suggested that each execution saves 14 lives. D&W appear to have based their statement on the results of a model that was not emphasized by Zimmerman. See Zimmerman (2004, p. 185): “Taking the unadjusted probabilities model as the preferred case …” (emphasis added, internal citations omitted). Further, it does not appear that D&W based any of their analyses on the model upon which Zimmerman actually focused. See, e.g., Donohue and Wolfers (2005, note 108) (also misreporting Zimmerman's preferred estimate).
3
See, e.g., Hansen (2007a, 2007b) and the references cited therein, as well as Imbens and Wooldridge (2007) for further discussion.
4
It is worth noting that some researchers have questioned whether correcting estimated standard errors through clustering or parametric methods is appropriate at all when considering the types of panel data models most often considered in the economics of crime literature. See, e.g., National Research Council (2004).
5
Specifically, the additional covariates include measures of per capita police, per capita prisoners, per capita income, the state unemployment rate, the percent of the state residing in metropolitan areas, the state poverty rate, the proportion of the state population that is black, various measures reflecting the state age distribution, per capita robberies, and per capita assaults.
6
In some instances the variables in the denominators of the various deterrence probabilities were either zero (e.g., a death penalty state may not have sentenced any offenders to death in the previous year) or missing (i.e., no value of the variable was recorded in the published sources). In either case the deterrence probability would be “undefined,” and these observations would need to be dropped from the sample. In order to “recover” these undefined observations, Zimmerman follows an approach similar to Dezhbakhsh et al. (2003) and employs a backward-looking “adjustment” to replace the undefined value of the observations with the last value where it was defined (i.e., on a state-year basis). This method was applied to both the contemporaneous and lagged deterrence probability models, in which case they are referred to as the “adjusted models.” See Zimmerman (2004, p. 171) for further discussion.
7
The instruments include contemporaneous and once-lagged values of the percentage of murders classified as committed by strangers, under nonfelonious circumstances, and by non-whites, and indicators of whether an offender was released from death row in the previous year (due to doubts about his/her guilt) and whether there was a “botched” execution in the previous year.
8
See Zimmerman (2004, pp. 171–78) for further discussion of how the simultaneous equations system is specified and the hypothesized relationships between the instruments and the contemporaneous deterrence probabilities.
9
The author thanks an anonymous referee for pointing this out.
10
Some researchers have argued that the use of clustering corrections is only justified when the number of clusters is relatively large. See, e.g., National Research Council (2004, p. 138, note 11) (“[A] commonly used method for making these [standard error] corrections is reliable only when the number of “clusters” (here states) is large, and there is reason to think that the 50 states do not constitute a large enough set of clusters to make these methods reliable.”). Other studies draw similar inferences. See, e.g., Donald and Lang (2007), Cameron et al. (2007), and Dezhbakhsh and Rubin's 2007 unpublished paper, “From the ‘Econometrics of Capital Punishment’ to the ‘Capital Punishment’ of Econometrics: On the Use and Abuse of Sensitivity Analysis”. It is also worth noting that studies examining the finite-sample properties of cluster-robust corrections in simulations have been largely considered in terms of OLS or GLS estimation, but D&W apply these corrections to Zimmerman's instrumental variables (2SLS) regressions. As noted in Wooldridge's unpublished paper, “Cluster-Sample Methods in Applied Econometrics: An Extended Analysis” (2006, p. 14): “Unfortunately, for any of the IV methods there appears to be little simulation evidence for how ‘large G’ [where G denotes the number of clustering groups] standard errors work for IV methods when G is not so large.”
11
See Hansen (2007b, pp. 611–16) (“[T]he simulations reveal that a potential weakness of the clustered estimator is a relatively high variance. The [clustered covariance matrix estimator] estimates have a substantially higher standard deviation than the other estimators … This behavior … does suggest that if a parametric estimator is available, it may have better properties for estimating the variance of $\hat{\beta}$.”).
12
It is possible that AR(1) corrections may not reasonably control for the biasing effect imparted on estimated standard errors from serial correlation if the observed degree in the latter does not “dampen” quickly beyond first-order effects. However, in the present context it may be possible to gain some insight into this matter by following an approach suggested by Bertrand et al. (2004, p. 255). An OLS regression of the state murder rate ($M_{i,t}$) on the state dummies, year dummies, and state-specific time trends (i.e., the controls for “fixed effects” in Zimmerman's models) is estimated in order to obtain the predicted values of the dependent variable (denoted by $\hat{M}_{i,t}$). The estimated residuals (denoted $\hat{e}_{i,t}$) are then computed as the difference between the actual and predicted values of the murder rate (i.e., $\hat{e}_{i,t} = M_{i,t} - \hat{M}_{i,t}$). Finally, the first-, second-, and third-order autocorrelation coefficients are estimated from an OLS regression of the estimated residuals on their respective first-, second-, and third-order lagged values (i.e., $\hat{e}_{i,t-1}$, $\hat{e}_{i,t-2}$, and $\hat{e}_{i,t-3}$, respectively). This procedure yields an estimated first-order autocorrelation coefficient of 0.37 for the state murder rate data used in Zimmerman's analysis. The corresponding second-order estimated autocorrelation coefficient is also positive but approximately 76% smaller in magnitude. Finally, the third-order autocorrelation coefficient turns negative while remaining small in absolute magnitude. As such, these results do not suggest that the bias introduced by including the once-lagged murder rate in the covariate set is likely to be substantial, or that there is likely to be a problem with autocorrelation beyond that pertaining to first-order effects. This same procedure was also applied to the contemporaneous execution probabilities employed in Models 5 and 6.
The estimated first-, second-, and third-order autocorrelation coefficients (in their respective order) are as follows: Model 5 (0.17, −0.25, −0.15); Model 6 (0.31, −0.26, −0.14).
13
The terms “common” and “panel-specific” as applied to these models refer to the way in which Stata (the statistical software used by both the author and D&W) allows the autocorrelation structure to be specified. The “common” structure assumes an AR(1) process that is identical across all panels (states), whereas the “panel-specific” structure is more flexible in that it allows for an AR(1) correlation within panels and for the coefficient of the AR(1) process to be specific to each panel.
14
The one exception is the linear fixed-effects regressions with a common AR(1) disturbance. Stata requires that the weighting variable (state population) be constant across panels in these models. As such, these particular regressions are not weighted by state population.
15
The predicted values obtained from estimating the three first-stage regressions are then entered in place of the respective endogenous deterrence probability measures in the second-stage regression to derive consistent estimates of the arrest, sentencing, and execution risk effects. This estimation approach follows the one used in the Dezhbakhsh et al. (2003) study and in D&W's replication of that study (see http://islandia.law.yale.edu/donohue/pubsdata.htm).
16
An attempt was also made to estimate Models 5 and 6 using Prais–Winsten regression with a common AR(1) process. In this case, coefficient estimates for Model 5 could not be obtained because the estimated variance–covariance matrix was not positive definite. However, estimates for Model 6 were obtained using this specification. In this case, the coefficient estimate on the probability of execution proxy was −1.683 and was also statistically significant at conventional levels (t-statistic = −2.80).
17
Using the notation from equation (1), the estimated number of murders deterred in each model is calculated as
forumla,
where Population1997 is the total population (in thousands) of the death penalty states in 1997 and Death sentences1996 is the number of persons sentenced to death in 1996.
18
The point estimates on the execution risk measures presented in Table 3 differ slightly from the corresponding estimates originally presented in Zimmerman (2004). This discrepancy is due to the fact that there are missing data on murder arrests and death sentences, which implies that the dataset consists of unbalanced panels. Most of the panel data estimators in Stata require the use of the “force” option with unbalanced panels, which must be applied when the correlation structure requires information on the time dimension and that the observations be equally spaced. When this option (or other similar options) is invoked, the effective sample size used to estimate the model increases, which explains the difference in the original point estimates and the Newey–West estimates (whereas the Prais–Winsten and GLS estimates would tend to be somewhat different regardless). On the other hand, the fixed-effects AR(1) regression models require that a year of data be dropped in order to estimate, so for these specifications the effective sample size is smaller than that corresponding to the original sample. In any event, the null hypothesis that the respective point estimates are equal to those obtained from Zimmerman's corresponding original estimates cannot be rejected, with the computed p-values ranging from 0.31 to 0.93.
19
This approach was also used to derive the estimates presented within Zimmerman (2006) (disaggregating the effect of executions by their method of application).
20
Note that including the lagged value of the dependent variable can potentially lead to coefficient estimates in the structural model being downward biased. However, Keele and Kelly (2006) present Monte Carlo evidence suggesting that this bias is likely to be small unless the degree of (first-order) autocorrelation present in the original errors is relatively high. As such, the estimates presented here may be regarded as conservative approximations of the deterrent effect of capital punishment. Further, while some commentators have raised concerns regarding the use of lagged crime rates as instrumental variables, similar concerns are unlikely to apply to the inclusion of the lagged crime rate as a covariate in the structural equation. For instance, it is not obvious why the inclusion of a predetermined crime rate ought to be regarded as “endogenous” in this instance. But even if this is not the case, the analysis presented herein is not primarily concerned with identifying consistent estimates on the lagged crime rate, i.e., it is only concerned with deriving inference on the key explanatory variable of interest, namely the execution risk.
21
The second-stage residuals are those obtained from a regression of the dependent variable (i.e., the state murder rate) on the predicted values of the endogenous variables (which are obtained from the first-stage regressions) and all other exogenous variables from the structural regression. See Klepinger et al. (1995) for further discussion of this approach to instrument selection.
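As a rough illustration of how such second-stage residuals can be computed, the following Python sketch runs both stages by ordinary least squares on simulated data. The function name, the simulated variables, and the coefficient values are all hypothetical and are not taken from Zimmerman's data or code; the sketch only mirrors the definition given in this note.

```python
import numpy as np

def second_stage_residuals(y, endog, exog, instruments):
    """Residuals from regressing y on first-stage fitted values and exogenous regressors.

    Hypothetical helper for illustration; not the author's code.
    """
    # First stage: project the endogenous regressor(s) on instruments + exogenous variables.
    Z = np.column_stack([instruments, exog])
    first_stage_coef, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ first_stage_coef
    # Second stage: regress y on the fitted endogenous values and the exogenous variables.
    X_hat = np.column_stack([endog_hat, exog])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Residuals are computed with the predicted (not actual) endogenous values.
    return y - X_hat @ beta, beta

# Simulated example (all numbers are assumptions made for this sketch).
rng = np.random.default_rng(0)
n = 2000
exog = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant + one control
instruments = rng.normal(size=(n, 2))                      # two excluded instruments
u = rng.normal(size=n)                                     # structural error
endog = (instruments @ np.array([1.0, -0.5]) + 0.5 * exog[:, 1]
         + 0.8 * u + rng.normal(size=n))                   # correlated with u, hence endogenous
y = 2.0 * endog + exog @ np.array([1.0, 0.3]) + u

resid, beta = second_stage_residuals(y, endog, exog, instruments)
```

In an instrument-selection exercise of the kind described in this note, residuals such as `resid` would then be examined against candidate instruments; that step is not shown here.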
22
The resulting instrument set consisted of the lagged percentage of murders committed by strangers, the contemporaneous and lagged percentage of murders committed under nonfelonious circumstances, the contemporaneous and lagged percentage of murders committed by non-whites, and the indicator for the occurrence of a botched execution in the previous year. As such, the models remain overidentified.
23
D&W argue that some of Zimmerman's instruments may be invalid if, e.g., “certain classes of homicides simply vary more than others, their share in the total will be directly correlated with the homicide rate, invalidating the use of these variables as instruments.” See Donohue and Wolfers (2005, note 104). However, it was precisely these sorts of concerns that led Zimmerman to note that the theoretical arguments motivating his instruments “are not considered so sufficiently compelling that they warrant ignoring the appropriate testing of the instruments excluded from the structural per-capita murder equation.” See Zimmerman (2004, p. 174) (emphasis added).
24
The F-statistics pertaining to the joint significance (relevance) of the instruments in the first-stage regressions of the endogenous deterrence probabilities are all statistically significant at (better than) the 1% level in a two-tailed test. In addition, the 2SLS coefficient estimates pertaining to the deterrence probabilities in Model 5′ and Model 6′ are substantially larger in magnitude relative to their respective OLS estimates, which should not be the case if the instruments are too “weak.”
25
The 95% confidence intervals associated with the execution probability estimates obtained from Model 5′ and Model 6′ are [5,23] and [9,29], respectively.
26
Alternatively, if the model is estimated in first-differences the errors will have correlation equal to −0.5. See Drukker (2003).
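The value of −0.5 can be verified with a short derivation. It is a sketch that assumes the errors in levels, written here as $\varepsilon_{it}$, are serially uncorrelated with common variance $\sigma^{2}$ (the notation is introduced for this illustration only):

```latex
\begin{align*}
\Delta\varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1}, \\
\operatorname{Var}(\Delta\varepsilon_{it}) &= \sigma^{2} + \sigma^{2} = 2\sigma^{2}, \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\;
                        \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\operatorname{Var}(\varepsilon_{i,t-1}) = -\sigma^{2}, \\
\operatorname{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \frac{-\sigma^{2}}{2\sigma^{2}} = -0.5 .
\end{align*}
```

A first-difference residual autocorrelation that departs markedly from −0.5 therefore points to serial correlation in the level errors.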
27
In fairness, D&W's randomized experiment as applied to Zimmerman’s data was included only in their 2005 pre-publication manuscript, “Uses and Abuses of Empirical Evidence in the Death Penalty Debate,” and was subsequently removed from the published version (Donohue and Wolfers, 2005). However, D&W did not remove the discussion of the randomization test that was applied to the Dezhbakhsh et al. (2003) study. As such, some discussion of D&W's test is still germane.
28
See John J. Donohue and Justin Wolfers' 2005 unpublished manuscript, “Uses and Abuses of Empirical Evidence in the Death Penalty Debate,” p. 832.
29
See Sunstein and Vermeule (2005, p. 706) for discussion of the life–life tradeoff concept.
30
See Figure 8 of Donohue and Wolfers' 2005 unpublished manuscript.
31
See, e.g., Kennedy (2003, p. 77):
An alternative computer-based means of estimating a sampling distribution of a test statistic is that associated with a randomization/permutation test. The rationale behind this testing methodology is that if an explanatory variable has no influence on a dependent variable then it should make little difference to the outcome of the test statistic if the values of this explanatory variable are shuffled and matched up with different dependent variable values. By performing this shuffling thousands of times, each time calculating the test statistic, the hypothesis can be tested by seeing if the original test statistic value is unusual relative to the thousands of test statistic values created by the shufflings … Hypothesis testing is based on viewing the test statistic as having resulted from a game of chance … (emphasis added).
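The shuffling procedure described in this passage can be sketched in a few lines of Python. The example below is a minimal illustration on simulated data, assuming a simple bivariate regression and using the slope t-statistic as the test statistic; the function names, sample size, and effect size are hypothetical and have no connection to the data or code used by D&W.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic on the slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # conventional OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

def permutation_p_value(x, y, n_shuffles=1000, seed=0):
    """Two-sided randomization p-value for the slope t-statistic."""
    rng = np.random.default_rng(seed)
    observed = slope_t_stat(x, y)
    # Shuffle the explanatory variable and recompute the statistic each time.
    shuffled = np.array([slope_t_stat(rng.permutation(x), y)
                         for _ in range(n_shuffles)])
    # The +1 terms count the observed statistic as one of the possible outcomes.
    return (1 + np.sum(np.abs(shuffled) >= abs(observed))) / (n_shuffles + 1)

# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y_effect = 0.5 * x + rng.normal(size=200)   # x influences y
y_null = rng.normal(size=200)               # x has no influence on y

p_effect = permutation_p_value(x, y_effect)
p_null = permutation_p_value(x, y_null)
```

When the explanatory variable matters, the observed statistic sits far in the tail of the shuffled distribution and `p_effect` is small; under `y_null` the observed statistic is just another draw from that distribution.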
32
The author attempted to obtain the data and code used by D&W to generate the 1000 artificial coefficient estimates (as well as the actual underlying data used to construct the distribution of these estimates, i.e., the estimated coefficients and their estimated standard errors, in addition to all of the other material employed by D&W in addressing this study), by explicitly requesting said information from Justin Wolfers on two separate occasions several months apart. See e-mail from Paul R. Zimmerman to Justin Wolfers (December 9, 2005, 3:38 pm) (on file with author); e-mail from Paul R. Zimmerman to Justin Wolfers (May 22, 2006, 11:34 am) (on file with author). Because none of the requested data or computer code employed by D&W was ever provided to the author (despite the repeated requests), the discussion herein is (by necessity) based solely on the language contained within Donohue and Wolfers (2005) and their unpublished manuscript from 2005, “Uses and Abuses of Empirical Evidence in the Death Penalty Debate.”
33
Randomization tests require the use of a pivotal statistic, such as the t-statistic, in order to draw proper inference (which would not be obtained by relying on a coefficient estimate).
34
There are other issues obfuscating any assessment of D&W's randomization test. For instance, the authors state that their randomization test employs “block randomization.” See Donohue and Wolfers (2005, note 99). Presumably, this means that D&W reshuffled the data only within two or more groups or “blocks” of observations, and then obtained their 1000 artificial coefficient estimates from the 1000 “block-shuffled” datasets. This approach is ordinarily taken when the researcher believes that the effect of the “other” explanatory variables (e.g., the variables other than the execution rate) on the outcome measure (the murder rate) is so strong that it is difficult to detect the effect of the key explanatory measure (the execution rate). Rejection of the null in the instant case would imply that executions had a significant effect within treatment groups. Yet, while it appears D&W may have based their blocks (treatment groups) on years, this is never stated explicitly, nor was the sensitivity of the randomization results to other block constructions (e.g., death penalty versus non-death penalty states, states that have conducted executions versus those that have not, by decade, etc.) discussed, even though these appear to be reasonable alternatives. As such, it is difficult (if not impossible) to assess the validity or robustness of D&W's results.
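To make the notion of reshuffling within blocks concrete, the Python sketch below permutes a variable only among observations that share a block label. The use of years as blocks is an assumption made purely for illustration (as noted above, D&W never state their blocks explicitly), and the function name and toy data are hypothetical.

```python
import numpy as np

def shuffle_within_blocks(values, blocks, rng):
    """Return a copy of `values` permuted separately within each block."""
    values = np.asarray(values)
    blocks = np.asarray(blocks)
    shuffled = values.copy()
    for b in np.unique(blocks):
        idx = np.flatnonzero(blocks == b)            # positions belonging to block b
        shuffled[idx] = rng.permutation(values[idx]) # reorder only inside the block
    return shuffled

# Toy panel: four states observed in each of three years (assumed layout).
rng = np.random.default_rng(0)
years = np.repeat([1978, 1979, 1980], 4)
execution_rate = np.arange(12, dtype=float)

shuffled_rate = shuffle_within_blocks(execution_rate, years, rng)
```

Each artificial dataset in a block randomization test would be built from one such draw; replacing `years` with another grouping (for example, an indicator for death penalty states) yields the alternative block constructions mentioned above.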
35
See Capital Punishment, 2003, U.S. Bureau of Justice Statistics, http://www.ojp.usdoj.gov/bjs/pub/pdf/cp03.pdf.
36
See http://www.fbi.gov/filelink.html?file=/ucr/cius_03/xl/03tbl05.xls.
37
Following D&W, all the numbers relied upon in the ensuing discussion reflect sentencing and execution data as of December 31, 2003.
38
It is worth noting that 2003 was a somewhat “unusual” year in the following sense. As of December 31, 2002, Illinois had 159 prisoners under the sentence of death, but by December 31, 2003, all 159 of these prisoners had been removed from death row (none by executions). Thus, these 159 additional death sentences are not included in D&W's figures. In addition, only two persons were received on death row during 2003 in Illinois. As such, the entire year-end death row population in Illinois at the end of 2003 consisted of only two individuals. See Capital Punishment, 2003.
39
The (unweighted) average across all the relevant individual states is approximately 2.8%.
40
It is not being argued that the construction of the “execution risk” discussed here is appropriate. Rather, its consideration in the above discussion is made only to illustrate the need to consider the proper unit of analysis when drawing any inferences regarding the “signal” associated with state executions.
41
Indeed, two eminent scholars in the economics of crime, Judge Richard Posner and Nobel Laureate Gary Becker, have recently asserted that they believe capital punishment deters (at least some) murders. See Posner (2006) and Becker (2006), respectively.
42
See, e.g., Cloninger and Marchesini (forthcoming).
43
See, e.g., the recent contribution of Ekelund et al. (2006), which finds evidence in favor of the deterrence hypothesis while considering a broad range of econometric specification issues.

References

Becker, Gary S. 2006. “On the Economics of Capital Punishment,” Economists’ Voice, vol. 3, article 4. http://www.bepress.com/ev/vol3/iss3/art4/ (accessed April 2, 2009).
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We Trust Difference-in-Difference Estimates?” Quarterly Journal of Economics, vol. 119 (pg. 249-75).
Cameron, Colin, Jonah B. Gelbach, and Douglas L. Miller. 2007. “Bootstrap-Based Improvements for Inference with Clustered Errors.” Law and Economics Paper No. 07/002, Florida State University College of Law.
Cloninger, Dale O., and Roberto Marchesini. Forthcoming. “Reflections on a Critique,” Applied Economics Letters.
Donohue, John J., and Justin Wolfers. 2005. “Uses and Abuses of Empirical Evidence in the Death Penalty Debate,” Stanford Law Review, vol. 58 (pg. 791-846).
Dezhbakhsh, Hashem, Paul H. Rubin, and Joanna M. Shepherd. 2003. “Does Capital Punishment Have a Deterrent Effect? New Evidence from Post-Moratorium Panel Data,” American Law and Economics Review, vol. 5 (pg. 344-76).
Donald, Stephen G., and Kevin Lang. 2007. “Inference with Difference-in-Differences and Other Panel Data,” Review of Economics and Statistics, vol. 89 (pg. 221-33).
Drukker, David M. 2003. “Testing for Serial Correlation in Linear Panel Data Models,” Stata Journal, vol. 3 (pg. 168-77).
Ehrlich, Isaac. 1975. “The Deterrent Effect of Capital Punishment: A Question of Life and Death,” American Economic Review, vol. 65 (pg. 468-74).
Ekelund, Robert B., Jr., John D. Jackson, Rand W. Ressler, and Robert D. Tollison. 2006. “Marginal Deterrence and Multiple Murders,” Southern Economic Journal, vol. 72 (pg. 521-41).
Hansen, Christian B. 2007. “Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects,” Journal of Econometrics, vol. 140 (pg. 670-94).
Hansen, Christian B. 2007. “Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T Is Large,” Journal of Econometrics, vol. 141 (pg. 597-620).
Imbens, Guido W., and Jeffrey M. Wooldridge. 2007. “What's New in Econometrics?” Lecture Notes, NBER Summer Institute 2007, Cambridge, MA.
Keele, Luke, and Nathan J. Kelly. 2006. “Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables,” Political Analysis, vol. 14 (pg. 186-205).
Kennedy, Peter E. 1995. “Randomization Tests in Econometrics,” Journal of Business and Economic Statistics, vol. 13 (pg. 85-94).
Kennedy, Peter. 2003. A Guide to Econometrics, 5th edition. Cambridge, MA: MIT Press.
Klepinger, D., S. Lundberg, and R. Plotnick. 1995. “Instrument Selection: The Case of Teenage Childbearing and Women's Educational Attainment.” Institute for Research on Poverty Discussion Paper No. 1077-95, Univ. of Wisconsin at Madison.
Moody, Carlisle. 2001. “Testing for the Effects of Concealed Weapons Laws: Specification Errors and Robustness,” Journal of Law and Economics, vol. 44 (pg. 799-813).
National Research Council (Charles F. Wellford, John V. Pepper, and Carol V. Petri). 2004. Firearms and Violence: A Critical Review. National Academy of Sciences. http://www.nap.edu/catalog/10881.html?onpi_newsdoc12162004#toc (accessed April 2, 2009).
Newey, Whitney K., and Kenneth D. West. 1987. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, vol. 55 (pg. 703-08).
Pepper, John V. 2002. “Robust Inferences from Random Clustered Samples: An Application Using Data from the Panel Study of Income Dynamics,” Economics Letters, vol. 75 (pg. 341-45).
Posner, Richard A. 2006. “The Economics of Capital Punishment,” Economists’ Voice, vol. 3, article 3. http://www.bepress.com/ev/vol3/iss3/art3/ (accessed April 2, 2009).
Sunstein, Cass R., and Adrian Vermeule. 2005. “Is Capital Punishment Morally Required?: Acts, Omissions, and Life-Life Tradeoffs,” Stanford Law Review, vol. 58 (pg. 701-48).
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data, 1st edition. Cambridge, MA: MIT Press.
Wooldridge, Jeffrey M. 2003. “Cluster-Sample Methods in Applied Econometrics,” American Economic Review, vol. 93 (pg. 133-38).
Zimmerman, Paul R. 2004. “State Executions, Deterrence, and the Incidence of Murder,” Journal of Applied Economics, vol. 7 (pg. 163-93).
Zimmerman, Paul R. 2006. “Estimates of the Deterrent Effect of Alternative Execution Methods in the United States: 1978–2000,” American Journal of Economics and Sociology, vol. 65 (pg. 909-42).

Author notes

The author thanks Bruce Benson, Larry Kenny, John Lott, and an anonymous referee for helpful comments on earlier versions of this paper and the editor (John Pepper) for his generous advice and guidance. Elisabeth Murphy provided excellent research assistance. The views expressed in this paper are not necessarily those of the FTC, its commissioners, or any other staff. The usual caveat applies.