## Abstract

In a recent paper Donohue and Wolfers (D&W) critique a number of modern econometric studies purporting to demonstrate a deterrent effect of capital punishment. This paper focuses on D&W's central criticism of a study by Zimmerman; specifically, that the estimated standard errors on the subset of his regressions that suggest a deterrent effect are downward biased due to autocorrelation. The method that D&W rely upon to adjust Zimmerman's standard errors is, however, potentially problematic, and is also only one of several methods to address the presence of autocorrelation. To this end, Zimmerman's original models are subjected to several parametric corrections for autocorrelation, all of which result in statistically significant estimates that are similar in magnitude to his original estimates. The paper also presents results obtained from an alternative model whose specification is motivated on theoretical and statistical grounds. These latter results also provide some evidence supporting a deterrent effect. Finally, the paper discusses D&W's use of randomization testing and their contention that executions are not carried out often enough to plausibly deter murders.

## Introduction

In a recent article, John Donohue and Justin Wolfers (hereinafter “D&W”) (2005) review a number of modern econometric studies that purport to demonstrate a deterrent effect of capital punishment. The majority of these studies, including one by Zimmerman (2004), rely upon panel data methods. D&W are, for the most part, highly critical of these papers, arguing in each instance that their published results are highly sensitive to minor changes in model specification.

D&W certainly warrant commendation for the scale and scope of their critique, which has received extensive attention (and often praise) from other scholars as well as the mainstream media. The authors compiled and analyzed the data from a majority of the recent empirical studies on the deterrent effect of capital punishment, and subjected these data to a wide range of diagnostic testing/sensitivity analyses including, *inter alia*, changes in functional form, sample periods, comparison groups, variable construction, and instrument selection. Indeed, D&W's methodical approach to addressing the gamut of empirical complexities raised in attempting to estimate the deterrent effect of capital punishment in itself marks the paper as an important contribution to the literature. However, while D&W offer a convincing challenge to the findings and conclusions of several of the panel data studies they chose to review, their criticism of Zimmerman (2004) is, in most regards, wanting.

The purpose of this paper is to address and respond to several criticisms made by D&W with regard to Zimmerman (2004) in particular and, more generally, to the issue of whether or not the “application” of the death penalty as currently practiced in the United States could be reasonably expected to deter potential offenders. Zimmerman, like D&W, ultimately concluded that the estimated deterrent effect of capital punishment was highly sensitive to model specification.^{1} Thus, there is no substantive disagreement between Zimmerman and D&W on this issue.^{2}

D&W's primary criticism of Zimmerman (2004) concerns the statistical variability of those estimations that suggest a deterrent effect of capital punishment, i.e., they posit that the reported standard errors are understated (due to the presence of autocorrelation in the data) and that, once corrected, the estimates are too imprecise to be deemed “statistically significant.” D&W arrive at this conclusion by simply adjusting Zimmerman's standard error estimates through the use of a “clustering correction.” These types of corrections have become commonplace in the applied econometrics literature as a result of the influential paper by Bertrand *et al.* (2004). (See also Pepper (2002) and Wooldridge (2003) for further discussion on the application of clustering corrections in applied econometrics.)

Clustering (or cluster-robust) corrections, however, are only one of several ways to account for autocorrelation in panel data models. Further, recent econometric research has highlighted some potential shortcomings of cluster-robust corrections. For example, these methods are (in certain cases) now understood to be relatively low-powered, which might lead a researcher to conclude that an “intervention measure” (such as executions) imparts no statistically significant effect even when the null hypothesis is false.^{3} In particular, when the number of clustering groups is relatively small, cluster-robust corrections can result in estimated standard errors that are themselves severely biased. Therefore, revisiting the robustness of Zimmerman's original results with respect to *alternative* approaches to controlling for autocorrelation is warranted.^{4}

The paper proceeds as follows. Section 2 briefly reviews Zimmerman's (2004) study. In Section 3, it is shown that applying several parametric corrections for autocorrelation to Zimmerman's original empirical specifications (i.e., those regressions demonstrating a deterrent effect of executions) leads to point estimates of the deterrent effect that are statistically significant and similar in magnitude to those presented in Zimmerman (2004). Thus, the results of the latter study are robust to various parametric corrections for autocorrelation, which, in some instances, may be preferable to the clustering approach relied upon by D&W.

Section 4 then proposes an *alternative* regression specification that accounts for potentially important dynamics in the evolution of crime rates *in addition to* addressing the concerns with autocorrelation raised by D&W. The results of these estimations also provide evidence for a deterrent effect of the death penalty, although they too are found to be somewhat sensitive to functional form. In this sense the empirical evidence of capital punishment's deterrent effect is best regarded as “mixed.” But, again, it is important to recognize that this is effectively the *same* conclusion originally reached by Zimmerman.

Sections 5 and 6 respectively consider two other facets of D&W's criticism that also concern the statistical variability of Zimmerman's (2004) estimates, one being methodological and the other conceptual. With regard to the former, in their critique D&W rely upon a “randomization test” that ostensibly demonstrates the spurious nature of the deterrent effect found in previous studies. However, the manner by which D&W implement their test does not provide any dispositive insights and precludes any scientifically valid determination of whether or not capital punishment deters murder. Section 6 concerns D&W's argument that capital punishment in the United States is applied “too infrequently” to allow for any causal identification of a deterrent effect. It is shown that D&W effectively reach this conclusion by relying on data that are simply too aggregated; when one looks at the more appropriate disaggregated (i.e., state-level) data, the relevant “execution risk” is likely to be higher, and possibly high enough to impart a deterrent effect. Finally, Section 7 provides concluding remarks.

## A Brief Overview of Zimmerman's Study

Zimmerman (2004) employs a panel of state-level data over the years 1978–1997 to estimate the effect of state executions on the rate of homicide. He constructs proxies of three “deterrence probabilities” based on Ehrlich's (1975) theoretical framework: (1) the probability of arrest for murder [Pr(*a*)]; (2) the probability of being convicted of murder conditional on being arrested [Pr(*c*|*a*)]; and (3) the probability of being executed for murder conditional on conviction [Pr(*e*|*c*)]. The structural regression model Zimmerman estimates is given by:

$$
M_{i,t} = \alpha + \beta_{1}\Pr(a)_{i,t} + \beta_{2}\Pr(c|a)_{i,t} + \beta_{3}\Pr(e|c)_{i,t} + \mathbf{E}_{i,t}\boldsymbol{\gamma} + \mathbf{C}_{i,t}\boldsymbol{\delta} + \mathbf{X}_{i,t}\boldsymbol{\theta} + \mathbf{S}_{i}\boldsymbol{\mu} + \mathbf{T}_{t}\boldsymbol{\lambda} + \mathbf{R}_{i,t}\boldsymbol{\tau} + \varepsilon_{i,t}
$$

where the subscripts *i* and *t* index states and years, respectively. The dependent variable ($M_{i,t}$) is the number of reported Uniform Crime Report murders per 100,000 state residents. The variables $\alpha$ and $\varepsilon_{i,t}$ denote the constant and error term, respectively. The variable $\mathbf{E}_{i,t}$ denotes a vector of law enforcement covariates, $\mathbf{C}_{i,t}$ a vector of other per capita crime measures that may affect the rate of murder, and $\mathbf{X}_{i,t}$ a vector of economic and demographic covariates typically included in state-level studies of crime.^{5} The variables $\mathbf{S}_{i}$, $\mathbf{T}_{t}$, and $\mathbf{R}_{i,t}$ denote vectors of state indicators, year indicators, and state-specific time trends, respectively. All other variables reflect (vectors of) coefficients to be estimated.

Table 1 shows how each of the deterrence probabilities is constructed. In the “lagged models,” the variable in the numerator and denominator of each deterrence probability is the relevant once-lagged annual value. Thus, in these instances all deterrence probabilities are defined in terms of the underlying variables’ values in the *previous* year and, therefore, presumed to be “exogenous” to the current year's murder rate.^{6} For the sake of clarity, the characteristics of each empirical specification considered by Zimmerman (i.e., with respect to the construction of the various deterrence probabilities) are summarized in Table 2. Models 1–4 are the “base models,” which are estimated by OLS. Models 5 and 6, which incorporate the contemporaneous deterrence probability measures, are estimated using two-stage least squares (2SLS).

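| Deterrence Probability | Contemporaneous Models | Lagged Models |
|---|---|---|
| Probability of arrest [Pr(*a*)] | (Murder arrests in year t)/(Murders in year t) | (Murder arrests in year t−1)/(Murders in year t−1) |
| Probability of conviction given arrest [Pr(*c*\|*a*)] | (Death sentences in year t)/(Murder arrests in year t−1) | (Death sentences in year t−1)/(Murder arrests in year t−2) |
| Probability of execution given conviction [Pr(*e*\|*c*)] | (Executions in year t)/(Death sentences in year t−1) | (Executions in year t−1)/(Death sentences in year t−2) |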

Notes: Sources and data are as described in Zimmerman (2004, Table 2). All data used in constructing the variables presented above correspond to state-level observations.

| Specification | Unadjusted Contemporaneous Deterrence Probabilities | Adjusted Contemporaneous Deterrence Probabilities | Unadjusted Lagged Deterrence Probabilities | Adjusted Lagged Deterrence Probabilities | Estimation Method |
|---|---|---|---|---|---|
| Model 1 | Yes | No | No | No | OLS |
| Model 2 | No | Yes | No | No | OLS |
| Model 3 | No | No | Yes | No | OLS |
| Model 4 | No | No | No | Yes | OLS |
| Model 5 | Yes | No | No | No | 2SLS |
| Model 6 | No | Yes | No | No | 2SLS |

Notes: In Models 5 and 6 the endogenous (i.e., instrumented) variables are Pr(*a*), Pr(*c*|*a*), and Pr(*e*|*c*) as defined in Table 1. In some instances the variables in the denominators of the various deterrence probabilities were either zero or missing. A backward-looking adjustment is used to replace the undefined value of the observations with the last value where it was defined (i.e., on a state-year basis). This method was applied to both the contemporaneous and lagged deterrence probabilities models, which are referred to as the “adjusted” models above.
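The construction of the deterrence probabilities and the backward-looking adjustment described in the notes above can be sketched as follows. This is a minimal illustration rather than Zimmerman's actual code: the function names, the dictionary keys, and the toy figures are hypothetical, and the ratio definitions follow the row labels of Table 3 (with every year index shifted back by one for the lagged models).

```python
from typing import Optional


def safe_ratio(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """Return num/den, or None when the ratio is undefined (zero or missing)."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def deterrence_probabilities(rows, lagged=False, adjusted=False):
    """Compute Pr(a), Pr(c|a), Pr(e|c) for ONE state.

    rows: list of dicts sorted by year with the (hypothetical) keys
    'year', 'murders', 'arrests', 'sentences', 'executions'.
    """
    shift = 1 if lagged else 0
    last = {"pr_a": None, "pr_c_a": None, "pr_e_c": None}
    out = []
    for i, row in enumerate(rows):
        j = i - shift  # year supplying the numerators
        k = j - 1      # year supplying the once-lagged denominators
        cur = rows[j] if j >= 0 else None
        prev = rows[k] if k >= 0 else None
        probs = {
            "pr_a": safe_ratio(cur["arrests"], cur["murders"]) if cur else None,
            "pr_c_a": safe_ratio(cur["sentences"], prev["arrests"]) if cur and prev else None,
            "pr_e_c": safe_ratio(cur["executions"], prev["sentences"]) if cur and prev else None,
        }
        if adjusted:
            # Backward-looking adjustment: carry forward the last defined value.
            for key, value in probs.items():
                if value is None:
                    probs[key] = last[key]
                else:
                    last[key] = value
        out.append({"year": row["year"], **probs})
    return out


# Toy data for a single hypothetical state.
toy = [
    {"year": 1978, "murders": 100, "arrests": 80, "sentences": 4, "executions": 0},
    {"year": 1979, "murders": 120, "arrests": 90, "sentences": 0, "executions": 1},
    {"year": 1980, "murders": 110, "arrests": 88, "sentences": 8, "executions": 2},
]
unadjusted = deterrence_probabilities(toy)
adjusted = deterrence_probabilities(toy, adjusted=True)
lagged = deterrence_probabilities(toy, lagged=True)
```

In the toy data the 1980 execution probability is undefined (no death sentences in 1979), so the adjusted series reuses the 1979 value while the unadjusted series leaves it missing.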

Zimmerman first obtains OLS estimates of Models 1–4, and finds that all point estimates of the relevant execution probability measures are negative but statistically insignificant at conventional levels (Zimmerman, 2004, Table 3). Recognizing that the OLS estimates of Models 1 and 2 may suffer from simultaneity bias (i.e., since the deterrence probabilities in those specifications rely on contemporaneous values of the underlying variables used in the numerator of each respective probability, and may therefore be contemporaneously correlated with the murder rate corresponding to the same year), Zimmerman proceeds by employing a set of instrumental variables to determine whether the state murder rate affects the relevant probability of execution proxy. These instruments, motivated by a “public choice” perspective of the capital punishment system,^{7} are used to estimate a simultaneous equations model where each of the deterrence probabilities in Models 5 and 6 is treated as endogenous.^{8}

Zimmerman's 2SLS estimates of the (contemporaneous) execution probability measure, Pr(*e*|*c*), in Models 5 and 6 are larger in magnitude (i.e., more negative) than their corresponding OLS estimates in Models 1 and 2, respectively. And, unlike their corresponding OLS estimates, the estimates of Pr(*e*|*c*) in Models 5 and 6 are statistically significant at conventional levels. Specifically, the coefficient estimate on Pr(*e*|*c*) is −1.48 (*t*-statistic = −2.24) in Model 5 and −2.15 (*t*-statistic = −2.94) in Model 6. These results suggest that the murder rate might impart a positive (upward) effect on the contemporaneous execution probability measures or, all else equal, that a *higher* murder rate will (contemporaneously) result in a *higher* execution rate. As noted by D&W, this positive “reverse causation” running from the murder rate to the execution rate may occur, e.g., if high murder rates make the public frustrated enough to increase the use of the death penalty (Donohue and Wolfers, 2005, p. 819). In summary, the results of Zimmerman (2004) suggest that executions might noticeably reduce the number of murders. But, at the same time, there is sufficient variation in the estimated effects across models such that the deterrent effect of capital punishment cannot be unequivocally determined.

## Alternative Approaches to Estimating Standard Errors in Panel Data Models to Account for the Effects of Autocorrelation on Statistical Inference

Over the past several years, applied econometric research has paid increasing attention to the effects that autocorrelation can exert on the statistical inference drawn from fixed effects or “difference-in-differences” models. Bertrand *et al*. (2004) show that failing to account for a highly autocorrelated “treatment effect” in the context of a difference-in-differences framework (specifically, where the treatment measure is constructed as a dummy variable and the panel data are of the state/year variety) can lead to OLS estimates “over-exaggerating” the amount of independent variation in the data that can be exploited by a researcher. This effect may result in the spurious finding of a statistically significant impact of the treatment effect.

To overcome the above problem, Bertrand *et al.* advocate the use of cluster-robust corrections to estimate standard errors in panel data, i.e., corrections that account for the fact that the data may be correlated across space and/or over time. Cluster-robust standard errors (when appropriately applied) are advantageous in that they provide estimates that are robust to heteroskedasticity and (within-group) autocorrelation of arbitrary form. Applying this method to Zimmerman's original 2SLS models has a dramatic effect on the estimated standard errors of the conditional probability of execution variable: the relevant absolute *t-*statistic pertaining to Model 5 falls from 2.24 to 0.74 and that pertaining to Model 6 falls from 2.94 to 0.81.
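To give a sense of scale, the standard errors implied by these *t*-statistics can be backed out from the reported coefficients. The arithmetic below is only a back-of-the-envelope sketch: it assumes that the clustering correction leaves the point estimates (−1.48 and −2.15) unchanged and alters only the standard errors, and the implied values are rounded.

$$
\widehat{\mathrm{se}} = \left|\frac{\hat{\beta}}{t}\right|:
\qquad
\text{Model 5: } \frac{1.48}{2.24} \approx 0.66 \;\rightarrow\; \frac{1.48}{0.74} = 2.00,
\qquad
\text{Model 6: } \frac{2.15}{2.94} \approx 0.73 \;\rightarrow\; \frac{2.15}{0.81} \approx 2.65
$$

Under that assumption, clustering inflates the estimated standard errors by a factor of roughly 3.0 in Model 5 and 3.6 in Model 6.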

While the Bertrand *et al*. (2004) study has made a substantial impact on modern econometric practice, subsequent research has highlighted potential shortcomings of cluster-robust standard errors. For example, it is now relatively well understood that these methods can be low-powered, meaning that the probability of rejecting the null when it is false is low.^{9} Put differently, cluster-robust corrections may lead to the null hypothesis being *under*-rejected when the *alternative* hypothesis is true (i.e., when the treatment actually imparts a real and substantive causal effect). In addition, it is not even clear whether clustering at the *state-level* (as is done by D&W) can be assumed to satisfy the asymptotic properties upon which these methods are justified.^{10}

For panel data regressions controlling for unit (i.e., group) specific effects, Imbens and Wooldridge (2007) note that Hansen's (2007b) study implies that cluster-robust corrections will tend to work well when the cross-section and time-series dimensions of the data are similar and not “too small.” Again, it is not clear whether state-level data provide for a sufficiently large number of clustering groups and, clearly, Zimmerman's (2004) dataset does not consist of a (roughly) equal number of groups (states) and time periods (years). If *time* fixed effects are also controlled for in the panel regression (as is done in Zimmerman's estimations), then the asymptotics must apply with *both* the cross-section and time dimensions becoming “large” (Imbens and Wooldridge, 2007)—an even stricter requirement. But this latter condition also cannot be said to hold for Zimmerman's data, which further brings the efficacy of D&W's preferred cluster-robust correction into question.
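For readers who wish to see the mechanics, the following sketch contrasts classical OLS standard errors with a cluster-robust (“sandwich”) estimator on simulated state/year data. It is an illustration under stated assumptions, not a replication of either D&W or Zimmerman: the function names and the simulated AR(1) panel are hypothetical, the regressor has no true effect by construction, and no small-sample degrees-of-freedom correction is applied to the clustered estimator.

```python
import numpy as np


def ols_with_se(X, y, clusters=None):
    """OLS coefficients with classical or cluster-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    if clusters is None:
        # Classical covariance: assumes i.i.d. errors.
        cov = (resid @ resid / (n - k)) * xtx_inv
    else:
        # Sandwich covariance: sum of outer products of within-cluster scores.
        clusters = np.asarray(clusters)
        meat = np.zeros((k, k))
        for g in np.unique(clusters):
            idx = clusters == g
            score = X[idx].T @ resid[idx]
            meat += np.outer(score, score)
        cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


def simulate_panel(n_states=50, n_years=20, rho=0.8, seed=0):
    """Simulate a panel whose regressor and error both follow AR(1) processes."""
    rng = np.random.default_rng(seed)

    def ar1():
        z = np.zeros((n_states, n_years))
        z[:, 0] = rng.standard_normal(n_states)
        for t in range(1, n_years):
            z[:, t] = rho * z[:, t - 1] + rng.standard_normal(n_states)
        return z

    x, e = ar1(), ar1()
    y = 1.0 + e  # x has no true effect on y
    states = np.repeat(np.arange(n_states), n_years)
    X = np.column_stack([np.ones(x.size), x.ravel()])
    return X, y.ravel(), states


X, y, states = simulate_panel()
beta, se_classical = ols_with_se(X, y)
_, se_cluster = ols_with_se(X, y, clusters=states)
print("slope estimate:", beta[1])
print("classical se:", se_classical[1], "| cluster-robust se:", se_cluster[1])
```

The point estimates are identical under both approaches; only the estimated standard errors differ. With roughly 50 clusters, the reliability of the clustered estimate rests on the asymptotic arguments discussed above.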

It has also been shown that the Bertrand *et al.* study is unduly pessimistic about the ability of parametric AR(1) corrections for autocorrelation (e.g., the Prais–Winsten correction) to provide correct rejection rates of the null while offering higher statistical power.^{11} As such, examining the effects on the standard errors pertaining to Zimmerman's original models but using various (parametric) corrections for autocorrelation (i.e., as opposed to the cluster-robust estimates relied upon by D&W) warrants consideration.^{12} Indeed, doing as much seems entirely consistent with D&W's stance regarding the need to conduct sensitivity checks on econometric work that may inform important public policy decisions (such as the use of capital punishment).

Table 3 presents the results of re-estimating Models 5 and 6 using five alternative parametric approaches to correct for autocorrelation: (1) Prais–Winsten regression (panel-specific AR(1) process); (2) linear fixed-effects regression (common AR(1) process); (3) generalized least squares (GLS) regression (common AR(1) process); (4) GLS regression (panel-specific AR(1) process); and (5) Newey–West regression (lag order of autocorrelation set to one or two).^{13} All of the first- and second-stage regression specifications used to estimate the models in Table 3 are identical to those used to obtain Zimmerman's original estimates of Models 5 and 6 (i.e., save for the estimation method).^{14} Further, in each instance the *same* estimation method used to obtain the results from the second-stage regression (e.g., GLS) is also used to estimate each first-stage regression.^{15}
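As background on the first of these approaches, the textbook AR(1) quasi-differencing transformation that retains the first observation can be sketched as below. This is a simplified, hypothetical illustration: the function names are not from any particular package, a single pass is shown where statistical software typically iterates, and a “panel-specific” process would estimate the autocorrelation coefficient separately for each state whereas a “common” process would pool all states.

```python
import numpy as np


def estimate_rho(resid):
    """AR(1) coefficient from regressing e_t on e_{t-1} without an intercept."""
    resid = np.asarray(resid, dtype=float)
    return float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))


def quasi_difference(z, rho):
    """Transform one panel's time-ordered series (or matrix with time in rows).

    The first observation is rescaled by sqrt(1 - rho^2) rather than dropped;
    later observations become z_t - rho * z_{t-1}.
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    out[0] = np.sqrt(1.0 - rho**2) * z[0]
    out[1:] = z[1:] - rho * z[:-1]
    return out


# Toy residual series for one hypothetical state.
rho_hat = estimate_rho([8.0, 4.0, 2.0, 1.0])
transformed = quasi_difference([1.0, 2.0, 3.0, 4.0], rho=0.5)
```

After the dependent variable and every regressor (including the constant) have been transformed in this way within each state, OLS on the transformed data yields the feasible GLS estimates.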

| Independent variable | Prais–Winsten (panel-specific AR(1)), Model 5 | Prais–Winsten (panel-specific AR(1)), Model 6 | Linear fixed-effects (common AR(1)), Model 5 | Linear fixed-effects (common AR(1)), Model 6 | GLS (common AR(1)), Model 5 | GLS (common AR(1)), Model 6 | GLS (panel-specific AR(1)), Model 5 | GLS (panel-specific AR(1)), Model 6 | Newey–West (lag order = 1), Model 5 | Newey–West (lag order = 1), Model 6 | Newey–West (lag order = 2), Model 5 | Newey–West (lag order = 2), Model 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Murder arrests in year t)/(Murders in year t) | −1.558^{**} | −1.750^{***} | −1.289^{**} | −1.268^{*} | −1.772^{**} | −1.855^{***} | −1.594^{**} | −1.631^{***} | −1.956^{**} | −2.125^{**} | −1.956^{**} | −2.125^{**} |
| | (2.33) | (2.61) | (2.05) | (1.88) | (2.53) | (2.61) | (2.53) | (2.63) | (2.19) | (2.33) | (2.11) | (2.24) |
| (Death sentences in year t)/(Murder arrests in year t−1) [unadjusted] | −5.56 | | −1.743 | | −4.192 | | −4.696 | | −1.849 | | −1.849 | |
| | (1.09) | | (0.26) | | (0.71) | | (0.97) | | (0.25) | | (0.25) | |
| (Executions in year t)/(Death sentences in year t−1) [unadjusted] | −1.646^{***} | | −1.809^{***} | | −1.645^{***} | | −1.768^{***} | | −2.126^{***} | | −2.126^{***} | |
| | (4.30) | | (2.85) | | (4.16) | | (5.33) | | (4.02) | | (4.04) | |
| (Death sentences in year t)/(Murder arrests in year t−1) [adjusted] | | −4.539 | | −3.923 | | −4.969 | | −5.346 | | −2.914 | | −2.914 |
| | | (0.87) | | (0.52) | | (0.83) | | (1.11) | | (0.38) | | (0.38) |
| (Executions in year t)/(Death sentences in year t−1) [adjusted] | | −1.927^{***} | | −1.622^{**} | | −1.694^{***} | | −1.935^{***} | | −2.206^{***} | | −2.206^{***} |
| | | (3.91) | | (1.98) | | (3.78) | | (4.86) | | (3.59) | | (3.60) |
| Prisoners per capita | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.005^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} | −0.006^{***} |
| | (3.51) | (3.46) | (2.90) | (2.60) | (5.63) | (5.48) | (6.23) | (6.03) | (3.99) | (3.84) | (4.02) | (3.86) |
| Police per capita | −0.001 | −0.001 | 0.001 | 0.002 | −0.002 | −0.001 | −0.001 | −0.001 | −0.003 | −0.002 | −0.003 | −0.002 |
| | (0.18) | (0.16) | (0.32) | (0.53) | (0.54) | (0.46) | (0.29) | (0.24) | (0.63) | (0.52) | (0.61) | (0.51) |
| % Unemployed | −0.084^{**} | −0.084^{**} | −0.144^{***} | −0.139^{***} | −0.121^{***} | −0.112^{***} | −0.083^{**} | −0.082^{**} | −0.139^{***} | −0.129^{***} | −0.139^{***} | −0.129^{***} |
| | (2.07) | (2.07) | (3.54) | (3.39) | (3.42) | (3.19) | (2.47) | (2.46) | (2.97) | (2.75) | (2.85) | (2.65) |
| Income per capita | 0.287^{**} | 0.278^{**} | 0.312^{**} | 0.277^{**} | 0.356^{***} | 0.330^{***} | 0.319^{***} | 0.281^{**} | 0.359^{**} | 0.304^{**} | 0.359^{**} | 0.304^{**} |
| | (2.07) | (2.01) | (2.40) | (2.06) | (3.03) | (2.80) | (2.91) | (2.54) | (2.49) | (2.07) | (2.42) | (2.01) |
| % Metro | 0.015 | 0.015 | −0.009 | −0.008 | −0.001 | −0.002 | 0.021 | 0.016 | −0.005 | −0.007 | −0.005 | −0.007 |
| | (0.51) | (0.51) | (0.31) | (0.27) | (0.03) | (0.09) | (0.89) | (0.68) | (0.18) | (0.23) | (0.20) | (0.25) |
| % Poverty | 0.007 | 0.010 | 0.024 | 0.016 | 0.017 | 0.01 | 0.010 | 0.009 | 0.026 | 0.019 | 0.026 | 0.019 |
| | (0.33) | (0.46) | (1.17) | (0.78) | (0.79) | (0.46) | (0.50) | (0.46) | (0.85) | (0.60) | (0.84) | (0.59) |
| % Black | 1.666^{***} | 1.800^{***} | 0.381 | 1.068^{*} | 1.950^{***} | 2.026^{***} | 1.723^{***} | 1.802^{***} | 2.235^{***} | 2.346^{***} | 2.235^{***} | 2.346^{***} |
| | (3.64) | (3.69) | (1.03) | (1.95) | (5.49) | (5.30) | (5.22) | (5.12) | (4.65) | (4.47) | (4.48) | (4.32) |
| % Ages 18–24 | 0.135 | 0.161 | −0.09 | 0.044 | 0.075 | 0.089 | 0.129 | 0.157 | 0.074 | 0.104 | 0.074 | 0.104 |
| | (1.08) | (1.29) | (0.71) | (0.38) | (0.65) | (0.77) | (1.17) | (1.43) | (0.63) | (0.88) | (0.61) | (0.86) |
| % Ages 25–44 | 0.087 | 0.091 | −0.055 | −0.073 | 0.135 | 0.096 | 0.104 | 0.088 | 0.179 | 0.138 | 0.179 | 0.138 |
| | (0.54) | (0.57) | (0.29) | (0.37) | (0.84) | (0.60) | (0.71) | (0.60) | (0.91) | (0.69) | (0.88) | (0.67) |
| % Ages 45–64 | 0.721^{***} | 0.696^{***} | 0.193 | 0.230 | 0.943^{***} | 0.830^{***} | 0.789^{***} | 0.702^{***} | 1.045^{***} | 0.925^{***} | 1.045^{***} | 0.925^{***} |
| | (2.90) | (2.81) | (0.71) | (0.84) | (3.75) | (3.38) | (3.47) | (3.11) | (3.11) | (2.79) | (2.95) | (2.65) |
| % Ages 65 and over | −0.011 | −0.11 | 0.501^{*} | 0.410 | 0.067 | 0.013 | 0.004 | −0.110 | 0.167 | 0.062 | 0.167 | 0.062 |
| | (0.04) | (0.37) | (1.91) | (1.55) | (0.31) | (0.06) | (0.02) | (0.51) | (0.63) | (0.23) | (0.62) | (0.23) |
| Robberies per capita | 0.017^{***} | 0.017^{***} | 0.020^{***} | 0.020^{***} | 0.016^{***} | 0.016^{***} | 0.016^{***} | 0.017^{***} | 0.016^{***} | 0.016^{***} | 0.016^{***} | 0.016^{***} |
| | (12.23) | (12.22) | (12.12) | (12.05) | (16.83) | (16.88) | (17.13) | (17.28) | (9.52) | (9.48) | (8.62) | (8.58) |
| Aggravated assaults per capita | 0.005^{***} | 0.005^{***} | 0.003^{***} | 0.003^{**} | 0.005^{***} | 0.005^{***} | 0.005^{***} | 0.005^{***} | 0.006^{***} | 0.006^{***} | 0.006^{***} | 0.006^{***} |
| | (4.53) | (4.55) | (2.63) | (2.36) | (5.31) | (5.26) | (5.75) | (5.77) | (4.15) | (4.07) | (3.93) | (3.86) |
| Constant | −27.149^{**} | −25.789^{**} | −15.451 | −22.020^{**} | −34.780^{***} | −30.097^{***} | −30.558^{***} | −26.062^{***} | −39.674^{***} | −33.650^{**} | −39.674^{***} | −33.650^{**} |
| | (2.53) | (2.41) | (1.54) | (2.12) | (3.40) | (3.00) | (3.22) | (2.74) | (2.89) | (2.49) | (2.82) | (2.43) |
| Observations | 1000 | 1000 | 950 | 950 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |

Notes: The data set comprises state-level observations over the years 1978–1997. The dependent variable is the number of reported UCR murders per capita. All per capita variables are defined per 100,000 state residents. Absolute value of *z*-statistics or *t*-statistics in parentheses. The symbols “^{*}”, “^{**}”, and “^{***}” denote statistical significance at the 10%, 5%, and 1% levels, respectively, in a two-tailed test. All regressions are estimated using state populations as weights (except for those pertaining to the linear fixed-effects model, which are unweighted) and include full sets of state indicators, year indicators, and state-specific time trends (estimates not shown). The same holds true for each of the respective models' first-stage regressions (estimates not shown). All regressions are statistically significant at (better than) the 1% level in a two-tailed test.

Each of the estimated execution risk measures in Table 3 takes a negative sign and all are statistically significant at conventional levels. Specifically, nine out of the 10 individual point estimates are statistically significant at the 1% level and one is statistically significant at the 5% level.^{16} As expected, the probability of arrest variable is also consistently negative and statistically significant across all specifications. The sentencing risk measure is also consistently negative as expected, but is never statistically significant at conventional levels. These latter findings are also consistent with Zimmerman (2004).

Measured across all 10 regressions, the estimates imply (on average) that each additional state execution deters approximately 15 murders.^{17},^{18} The average value of the lower bound of the 95% confidence interval as measured across all eight regressions is seven additional murders deterred, while the corresponding average upper bound is 23 additional murders. Clearly, these regressions provide strong evidence that Zimmerman's original results and conclusions are effectively unchanged when employing what may be a more appropriate type of correction for autocorrelation.

## An Alternative Model Specification for Estimating the Deterrent Effect of Capital Punishment

Including the lagged value of the dependent variable as an explanatory measure is a common and simple approach to account for autocorrelation that is frequently employed in econometric studies (Keele and Kelly, 2006). In the context of the economic model of crime, this treatment may also be motivated on *theoretical* grounds. For instance, Moody (2001, p. 806) notes that:

Given the very real possibility that crime causes crime, it would appear to be necessary to incorporate momentum, lags, and other dynamics into the analysis. An increase in crime can cause the law enforcement sector to be overwhelmed. The probability of arrest and conviction declines. Everyone seems to be committing crime and getting away with it. The result is a multiple increase in the crime rate.

As such, Models 1–6 of Zimmerman (2004) are re-estimated with the lagged murder rate as an additional explanatory measure.^{19} These alternative specifications are denoted (respectively) with a prime symbol (“ ^{′} ”) in the ensuing discussion. Of course, the interpretation of the estimated coefficients from these alternative specifications is nominally different from those obtained from the original specifications (i.e., since the former hold the lagged influence of the murder rate constant, whereas the latter do not). Similarly, the fact that the identification of these models is somewhat different relative to Zimmerman's original specifications (as discussed further below) should be taken into account when interpreting the results. In any event, the OLS specifications (Models 1′–4′) again provided no evidence to suggest a deterrent effect of capital punishment, which is consistent with Zimmerman's original results.^{20}

In Zimmerman (2004), the *same* set of instruments was used to estimate *both* Models 5 and 6. The test of overidentifying restrictions (“OIR”) was found to be within conventional bounds for Model 5 (*p*-value = 0.16), but not so for Model 6 (*p-*value < 0.01). In the present case, when using the instrument set considered in Zimmerman (2004) to estimate Model 6′, the OIR test was again found to be outside conventional bounds. However, it is possible to explore the “failure” of the OIR test for Model 6′ by considering the statistical significance of the (excluded) instruments in a regression of the *second-stage* residuals on the (excluded) instruments and all other exogenous variables.^{21} Two of the (excluded) instruments were found to be statistically significant at conventional levels in this latter regression. When these two instruments are then dropped from the simultaneous equations system, the OIR test for Model 6′ becomes satisfied at conventional levels (*p*-value = 0.37).^{22} Finally, if the same instrument set that satisfied the OIR test for Model 6′ is also used *to estimate Model 5*′, the resultant OIR test is *also* passed at conventional levels (*p*-value = 0.37).^{23} As such, all of the estimates discussed below rely upon the set of instruments that satisfy the OIR test for Model 6′.

Table 4 presents the 2SLS estimates of Models 5′ and 6′. The coefficient estimate on the probability of execution is −0.96 in Model 5′ (approximately 35% smaller than the original Model 5 estimate) and statistically significant at the 10% level (*t*-statistic *=* −1.67). The corresponding estimate for Model 6′ is −1.38 (approximately 36% smaller than the original estimate) and statistically significant at the 5% level (*t*-statistic *=* −2.30).^{24} These estimates imply that each additional execution will deter approximately eight and 11 murders on average, respectively.^{25} That the estimated magnitudes of the execution risk coefficients are now smaller than their “original” counterparts is hardly surprising since (again) inclusion of the lagged dependent variable as a covariate tends to bias coefficient estimates toward zero. Further, each coefficient estimate resides within the 95% confidence interval associated with Zimmerman's original “preferred” estimate.

| | Model 5′ | Model 6′ |
|---|---|---|
| (Murder arrests in year t)/(Murders in year t) | −1.80^{**} (2.07) | −1.70^{*} (1.91) |
| (Death sentences in year t)/(Murder arrests in year t−1) [unadjusted] | −9.07 (0.82) | |
| (Death sentences in year t)/(Murder arrests in year t−1) [adjusted] | | −6.05 (0.72) |
| (Executions in year t)/(Death sentences in year t−1) [unadjusted] | −0.96^{*} (1.67) | |
| (Executions in year t)/(Death sentences in year t−1) [adjusted] | | −1.38^{**} (2.30) |
| Once-lagged per capita murders | 0.31^{***} (8.03) | 0.31^{***} (8.40) |
| Prisoners per capita | −3.88E−03^{***} (3.06) | −4.16E−03^{***} (3.45) |
| Police per capita | −2.85E−03 (0.70) | −1.41E−03 (0.36) |
| % Unemployed | −0.13^{***} (2.87) | −0.09^{*} (1.89) |
| Income per capita | 0.23 (1.52) | 0.25^{*} (1.72) |
| % Metro | 3.90E−03 (0.11) | 3.06E−03 (0.10) |
| % Poverty | −0.01 (0.35) | 0.01 (0.18) |
| % Black | 1.41^{***} (2.77) | 1.79^{***} (3.37) |
| % Ages 18–24 | 0.02 (0.14) | 0.18 (1.30) |
| % Ages 25–44 | −0.10 (0.39) | −0.02 (0.07) |
| % Ages 45–64 | 0.47 (1.31) | 0.44 (1.45) |
| % Ages 65 and over | 0.49^{*} (1.83) | 0.34 (1.33) |
| Robberies per capita | 0.01^{***} (10.32) | 0.01^{***} (10.08) |
| Aggravated assaults per capita | 3.74E−03^{***} (2.83) | 4.84E−03^{***} (3.88) |
| Constant | −18.02 (1.12) | −22.32^{*} (1.72) |
| Observations | 751 | 920 |
| F(H_{0}: All slopes = 0) | 113.91^{***} | 121.21^{***} |
| p(OIR) | 0.37 | 0.37 |

Notes: The data set comprises state-level observations based over the years 1978–1997. The dependent variable is the number of reported UCR murders per capita. All per capita variables are defined per 100,000 state residents. Absolute value of *t*-statistics in parentheses. The symbols “^{*}”, “^{**}”, and “^{***}” denote statistical significance at the 10%, 5%, and 1% levels, respectively, in a two-tailed test. All regressions are estimated using state populations as weights and include full sets of state indicators, year indicators, and state-specific time trends (estimates not shown). The same holds true for each of the respective models' first-stage regressions (estimates not shown).

Of course, it remains to be determined whether the inclusion of the lagged murder rate *actually* controls for the influence of autocorrelation in the above estimations. As noted by Wooldridge (2002), statistically testing the series of idiosyncratic errors in fixed-effects models for the presence of autocorrelation is a very complex problem. However, when the number of time periods in the panel is greater than or equal to three, Wooldridge suggests the following approach. First, estimate the OLS regression

$$\hat{\varepsilon}_{i,t} = \rho\,\hat{\varepsilon}_{i,t-1} + \nu_{i,t}, \qquad (1)$$

where $\hat{\varepsilon}_{i,t}$ denotes the estimated fixed-effects (time-demeaned) residual for state *i* in year *t*.

Under the null hypothesis that the (unobserved) errors are serially uncorrelated (assuming the fixed-effects model is estimated in levels) the estimated fixed-effects (time-demeaned) errors will have (first-order) correlation $-1/(T-1)$, where *T* denotes the number of time periods in the panel (Wooldridge, 2002).^{26} As such, the presence of autocorrelation may be determined by testing the null hypothesis $H_0\colon \rho = -1/(T-1)$ using $\hat{\rho}$, the estimated first-order autocorrelation coefficient obtained from equation (1). If the absolute value of the computed *t*-statistic is sufficiently large (i.e., considering standard levels of statistical significance) then *H*_{0} would be rejected. This finding would suggest the presence of autocorrelation and raise the concern that the estimated standard errors might still be biased downward.

Since *T =* 20 in Zimmerman's data, the value of $\rho$ to be tested under the null hypothesis is $-1/19 \approx -0.05$. For Model 5′, the computed value of the (absolute) *t*-statistic does reject the null hypothesis of “no autocorrelation,” but only at the 10% level. However, the corresponding *t*-statistic with regard to Model 6′ fails to reject the null hypothesis at *any* conventional level of statistical significance. These findings suggest that the influence of autocorrelation on the estimated standard errors is likely to be small at best after controlling for the lagged impact of the murder rate.
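The logic of this test can be illustrated with a minimal sketch. It assumes a pooled regression (without an intercept) of each time-demeaned residual on its own once-lagged value, a conventional OLS standard error, and simulated residuals in place of Zimmerman's data; the function name `fe_serial_correlation_test` and the simulated panel are hypothetical and are not taken from the original study. The null value is $-1/(T-1)$, as given by Wooldridge (2002).

```python
import math
import random


def fe_serial_correlation_test(residuals):
    """Test H0: rho = -1/(T-1) on time-demeaned residuals.

    residuals: list of per-state lists, each holding T residuals.
    Returns (rho_hat, rho_null, t_stat).
    """
    n_periods = len(residuals[0])
    # Pair each residual with its own once-lagged value, state by state.
    pairs = [(r[t - 1], r[t]) for r in residuals for t in range(1, n_periods)]
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    rho_hat = sxy / sxx  # pooled OLS slope, no intercept
    ssr = sum((y - rho_hat * x) ** 2 for x, y in pairs)
    # Conventional (non-robust) standard error: a simplifying assumption.
    se = math.sqrt(ssr / (len(pairs) - 1) / sxx)
    rho_null = -1.0 / (n_periods - 1)  # implied by "no autocorrelation"
    return rho_hat, rho_null, (rho_hat - rho_null) / se


# Simulated panel with serially uncorrelated errors (illustrative only).
rng = random.Random(0)
n_states, n_periods = 200, 20
panel = []
for _ in range(n_states):
    errors = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    mean = sum(errors) / n_periods
    panel.append([e - mean for e in errors])  # time-demeaning

rho_hat, rho_null, t_stat = fe_serial_correlation_test(panel)
print(f"rho_hat={rho_hat:.3f}, rho_null={rho_null:.3f}, t={t_stat:.2f}")
```

Because the simulated errors are serially uncorrelated by construction, the estimated coefficient should lie close to $-1/19$ rather than to zero, which is why the null value is not zero in the fixed-effects setting. In applied work a standard error that is robust to within-state correlation would likely be preferable to the conventional one used here.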

Finally, recall that when Zimmerman's original “preferred” specification was estimated using a (pseudo) double-logarithmic functional form the estimated coefficient on the execution risk turned statistically insignificant at conventional levels. Applying the double-log specification to Model 5′ also turns the estimated impact of the execution risk statistically insignificant. However, this finding does *not* hold for Model 6′, i.e., *the estimated deterrent effect is robust to the change in functional form*. Specifically, the *p*-value associated with the estimated impact of the execution risk is 0.06.

The above results suggest that the estimated deterrent effect of capital punishment is sensitive to functional form, as was noted by Zimmerman (2004). In this sense one could reasonably argue that the “overall” evidence for a deterrent effect of capital punishment is “mixed” at best. On the other hand, Dezhbakhsh *et al.* (2003, p. 353) argue that the *linear* functional form may be the most *theoretically* appropriate in estimating aggregate economic models of crime since: (1) the underlying theory that motivates the aggregate empirical specification is derived from the maximizing behavior of an *individual* offender; and (2) only the linear functional form is invariant to aggregation (e.g., whereas the double-log functional form is not). If one accepts these arguments then the above estimations (considered in their totality) could be taken as evidence *supporting* a deterrent effect of capital punishment.

## Randomization Testing

One particularly interesting analysis conducted by D&W in order to evaluate the precision of the Dezhbakhsh et al. (2003) and Zimmerman (2004) IV estimates involves a so-called “randomization test,” which involves conducting a counterfactual or “randomized” experiment on a set of “artificial” data.^{27} Specifically, D&W use the latter authors’ original panel data but match each state's homicide rate to a *random* state's independent variables (and instruments).^{28} With these artificial data, D&W re-ran each of the authors’ (alleged) preferred 2SLS regression models, repeating the process 1000 times. This process results in 1000 different “artificial” point estimates of the conditional execution probability. D&W then consider the distribution of these coefficient estimates (i.e., in terms of the estimated “life–life tradeoff”)^{29} relative to the value of Dezhbakhsh *et al.*'s and Zimmerman's preferred estimates.

The results of D&W's randomized experiment appear, *prima facie,* to be quite damning. The mass of the distribution of their artificial conditional execution probability coefficient estimates is skewed toward the negative scale of the “life–life tradeoff” (i.e., implying, in most instances, that the artificial estimated execution probability is *positively* correlated with the murder rate, or that a “brutalization effect” of capital punishment dominates).^{30} Relying on these estimates, D&W conclude that there is no “real” deterrent effect associated with executions.

The conclusions D&W draw from their randomization test appear to be questionable at best. This is simply due to the fact that D&W's method of interpreting their results is not consistent with that prescribed by the received econometric literature on randomization testing. Specifically, D&W *only* address the magnitude and signs of the various estimated “artificial” coefficients. However, randomization testing formally tests the relevant null hypothesis by considering the size of the estimated *test statistics* associated with the “artificial” estimates *relative to* the size of the *test statistic* associated with the “actual” coefficient estimate.^{31} See Kennedy (1995, 2003) for further discussion.

Intuitively, in the artificial or “shuffled” datasets, there is, by construction, no “real” relationship between the dependent and independent variables. Thus, if the alternative hypothesis were actually true, one would not expect to find very many “artificial” estimated *t-*statistics that were *larger* (in absolute value) than the (absolute value) of the “original” estimated *t*-statistic.

Approximately $\alpha X$ of the “artificial” *t*-statistics, where *X* denotes the total number of *t*-values generated from the randomization test and (say) $\alpha = 0.05$, would have to be greater than or equal to the estimated *t*-statistic (both being expressed in absolute values) in order to fail to reject the null hypothesis (i.e., to statistically infer that there is no deterrent effect of capital punishment). With *X* = 1000 and $\alpha = 0.05$, for example, this threshold is 50.
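The decision rule just described can be sketched in a few lines. The example below is a deliberately simplified illustration, not a replication of D&W's procedure: it uses a bivariate OLS regression rather than 2SLS, shuffles a synthetic dependent variable rather than state homicide rates, and the names `slope_t_stat` and `randomization_test` are hypothetical. What it shares with the formal test is that inference rests on the share of “artificial” *t*-statistics that are at least as large (in absolute value) as the actual one.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic on the slope from an OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    ssr = sum(((b - mean_y) - beta * (a - mean_x)) ** 2 for a, b in zip(x, y))
    return beta / math.sqrt(ssr / (n - 2) / sxx)


def randomization_test(x, y, n_shuffles=1000, seed=0):
    """Share of shuffled |t| values that are >= the actual |t|."""
    rng = random.Random(seed)
    t_actual = abs(slope_t_stat(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between y and x
        if abs(slope_t_stat(x, shuffled)) >= t_actual:
            exceed += 1
    return t_actual, exceed / n_shuffles


# Synthetic data with a genuine negative relationship (illustrative only).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [-0.5 * a + rng.gauss(0.0, 1.0) for a in x]

t_actual, p_value = randomization_test(x, y)
print(f"|t| actual = {t_actual:.2f}, randomization p-value = {p_value:.3f}")
```

The null hypothesis of no relationship is rejected at level $\alpha$ when the returned share falls below $\alpha$. Comparing coefficient magnitudes across shuffles instead of *t*-statistics would ignore differences in the precision of the artificial estimates.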

However, D&W never report or discuss the estimated standard errors (or *t*-statistics) pertaining to any of their artificial coefficient estimates, thus making it impossible to determine whether the relevant null hypothesis is or is not formally rejected by their randomization test.^{32} Again, the authors only take into account the overall *distribution* (i.e., density function) of their artificial *coefficient* estimates (i.e., *not* the relevant test statistics that must be relied upon to draw proper statistical inference from a randomization test) and consider the proportion of the estimates that were at least as large in magnitude as Zimmerman's estimated (actual) coefficient.^{33} Clearly, D&W do not appear to have conducted their randomization test in a meaningful fashion, and as such, their reported results cannot be used to infer that Zimmerman's estimates are statistically spurious.^{34}

## The Frequency of Executions and Deterrence

Some brief discussion regarding a central tenet of D&W's criticism is also warranted: namely, that the number of executions actually carried out is so low that one cannot reasonably extract the execution-related “signal” from the overall “noise” explaining the large year-to-year fluctuations in the murder rate. For example, D&W point out that “[i]n 2003, there were 16,503 homicides … but only 144 inmates were sentenced to death … of the 3374 inmates on death row at the beginning of the year, only 65 were executed” (Donohue and Wolfers, 2005, p. 795). It appears that D&W inappropriately include one Federal execution and two Federal death sentences in their reported figures.^{35} Removing these Federal cases results in a death sentence to homicide ratio (i.e., which would presumably be relied upon by D&W as a proxy of a death penalty “sentencing risk”) of approximately 0.86% [ = (142/16,503) × 100], which one might argue is a seemingly “small” probability.

It is worth noting, however, that the number of homicides D&W report for 2003 corresponds to the *national* number of homicides reported in the United States, i.e., the number measured across *all* states.^{36} Further, given that only *some* states actually have the death penalty (and, as noted by D&W, some states that have the death penalty *do not* routinely conduct executions, which might suggest that death sentences handed down in those states would not entail a strong deterrent effect), a “sentencing risk” estimate based upon national homicide counts is necessarily biased downward. If one considers *only* those states that have the death penalty, the sentencing risk increases to approximately 0.95%.^{37} While the latter estimate is also seemingly “low” in magnitude, it is higher than the one based upon data from *all* states. And if one considers the *subset* of death penalty states that actually *conducted* an execution in 2003, the sentencing risk is: (number of persons received under sentence of death in states that executed an offender in 2003)/(number of murders in states that issued death sentences in 2003) = (79/5750) × 100 = 1.4%.^{38}

Again, while one might argue that this latter figure is seemingly “small” in magnitude, D&W do concede that “… certain catastrophic events that occur with low frequency [may be] given greater prominence in decision making than their likelihood warrants if individuals are given frequent vivid reminders of these events, which could conceivably make the death penalty more of a deterrent than a rational calculation of the risk … would suggest …” (Donohue and Wolfers, 2005, note 23) and that “there is little evidence on how criminals form their expectations” (Donohue and Wolfers 2005, note 67). As such, it may be entirely possible that even very rare occurrences of “severe punishments” (such as being executed) are sufficient to deter the behavior of a marginal offender (in this case a potential murderer).

Of course, one could reasonably argue that calculating the sentencing risk is most appropriate when done on a state-by-state basis *since death sentences handed down (or executions carried out) in a given death penalty state should not induce a deterrent effect in any other state*. For the subset of states that sentenced at least one offender to death in 2003, the sentencing risk ratio ranges from 0.15% (Georgia) to 13.33% (Montana).^{39} While D&W are absolutely correct in noting that social scientists do not have a good sense of how criminals (whether they be actual or “potential”) form their perceptions of risk arising from the presence or application of criminal sanctions (be they executions or otherwise), these disaggregated figures nonetheless highlight the potentially misleading inferences that can be drawn when regarding such probabilities and failing to examine the theoretically appropriate unit of observation.

A similar exercise can also be applied to the 2003 death row population and execution figures cited by D&W. D&W include 23 Federal prisoners under sentence of death in their death row population total of 3374. After removing these cases the *national-level* ratio of executions to death row inmates would imply an average “execution risk” in 2003 of approximately 1.94% [ = (65/3351) × 100]. Again, when one considers the more theoretically relevant *state-level* unit of observation, for the subset of states in which at least one execution was carried out in 2003, the execution risk ranges from 0.82% (Florida) to 13.73% (Oklahoma). The (unweighted) average execution risk across the latter set of states is approximately 4.42%.^{40} Of course, whether or not the execution-related signal that can be extracted from individual states’ *application* of the death penalty is actually “strong” enough to induce a deterrent effect of capital punishment is not, in any sense, directly answered (either theoretically or empirically) by D&W (or by anyone else) at this time, and thus remains an open and important question for future research.
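For readers who wish to verify the aggregate ratios quoted in this section, the arithmetic can be reproduced with a short sketch. The helper name `risk_percent` is hypothetical, and the counts are simply those reported in the text above; the state-level figures are not reproduced because the underlying state counts are not listed here.

```python
def risk_percent(events, exposed):
    """Return events as a percentage of the exposed population."""
    return 100.0 * events / exposed


# National sentencing risk: 144 death sentences less two Federal sentences,
# relative to 16,503 homicides.
sentencing_risk_national = risk_percent(144 - 2, 16_503)

# Sentencing risk in death penalty states that conducted an execution in 2003.
sentencing_risk_executing_states = risk_percent(79, 5_750)

# National execution risk: 65 executions relative to the death row
# population of 3374 less 23 Federal prisoners.
execution_risk_national = risk_percent(65, 3_374 - 23)

print(f"{sentencing_risk_national:.2f}%")
print(f"{sentencing_risk_executing_states:.1f}%")
print(f"{execution_risk_national:.2f}%")
```

The point of the comparison is that the measured “risk” depends heavily on which jurisdictions enter the numerator and denominator, so the same national counts can understate the risk faced in the states that actually apply the sanction.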

## Concluding Remarks

D&W's recent critique of various empirical studies in the economics literature suggesting a deterrent effect of capital punishment highlights both the inherent difficulties in identifying causal effects from states’ application of the death penalty and the inherent danger in drawing conclusions from potentially “fragile” estimates. These factors in themselves mark the paper as a significant contribution to the literature. This paper addresses several criticisms made by D&W with regard to the statistical variability of the estimates suggesting a deterrent effect of capital punishment as presented in Zimmerman (2004) while also discussing several apparent shortcomings in D&W's critique.

Recent econometric research has shown that the cluster-robust methods that D&W rely upon may not result in correct statistical inference when applied to U.S. state-level panel data. Some researchers have suggested that parametric corrections to address autocorrelation, when available, might be more appropriate than cluster-robust methods for the purpose of adjusting estimated standard errors in the presence of autocorrelation (e.g., Hansen 2007b). It is shown that Zimmerman's (2004) 2SLS estimates of the deterrent effect of capital punishment are robust to a wide variety of alternative parametric corrections for autocorrelation. An alternative model specification that incorporates a lagged dependent variable structure also provides some evidence of a deterrent effect of state executions. And while not all the point estimates of the deterrent effect presented herein are statistically significant, one could argue that the preponderance of the evidence suggests that state executions do in fact impart an appreciable reduction in the number of per capita state murders.

Clearly, the proverbial “death penalty debate” is hardly decided as a result of D&W's critique.^{41} This paper shows that many of D&W's criticisms of Zimmerman's original work do not hold up under scrutiny, and other authors have also rebutted D&W's criticisms of their research.^{42} Further, it is worth noting that more recent studies (which appeared after D&W's critique was published) also provide empirical evidence suggesting a deterrent effect of capital punishment.^{43} In any event, D&W's study will likely stimulate much more investigation in this topic (which is certainly needed in order to gain a clearer understanding of the effects of the death penalty). This outcome, of course, is another important contribution of their work.

[*T*]*he results appear to be highly sensitive to functional form*. When the simultaneous equations model is specified in double-logs *the estimated deterrent effect of capital punishment disappears*. While other recent studies report a deterrent effect of capital punishment using either linear or logarithmic functional forms … these estimates [i.e., those obtained by Zimmerman] *nonetheless highlight the longstanding difficulty in conclusively determining whether or not capital punishment deters murder, a difficulty which is unlikely to be resolved anytime soon*.”) (emphasis added). Zimmerman informed D&W that the pre-publication version of their paper provided an incomplete and inaccurate synopsis of his conclusions. See e-mail from Paul R. Zimmerman to Justin Wolfers (December 2, 2005, 11:13 pm) (on file with author). However, D&W (for reasons known only to them) chose to ignore the message.

*unadjusted* probabilities model as the preferred case …” (emphasis added, internal citations omitted). Further, it does not appear that D&W based *any* of their analyses on the model upon which Zimmerman actually focused. See, e.g., Donohue and Wolfers (2005, note 108) (also misreporting Zimmerman's preferred estimate).

*et al.* (2003) and employs a backward-looking “adjustment” to replace the undefined value of the observations with the last value where it was defined (i.e., on a state-year basis). This method was applied to both the contemporaneous and lagged deterrence probability models, in which case they are referred to as the “adjusted models.” See Zimmerman (2004, p. 171) for further discussion.

*et al.* (2007), and Dezhbakhsh and Rubin's 2007 unpublished paper, “From the ‘Econometrics of Capital Punishment’ to the ‘Capital Punishment’ of Econometrics: On the Use and Abuse of Sensitivity Analysis”. It is also worth noting that studies examining the finite-sample properties of cluster-robust corrections in simulations have been largely considered in terms of OLS or GLS estimation, but D&W apply these corrections to Zimmerman's *instrumental variables* (2SLS) regressions. As noted in Wooldridge's unpublished paper, “Cluster-Sample Methods in Applied Econometrics: An Extended Analysis” (2006, p. 14): “Unfortunately, for any of the IV methods there appears to be little simulation evidence for how ‘large *G*’ [where *G* denotes the number of clustering groups] standard errors work for IV methods when *G* is not so large.”

*et al.* (2004, p. 255). An OLS regression of the state murder rate ($M_{i,t}$) on the state dummies, year dummies, and state-specific time trends (i.e., the controls for “fixed effects” in Zimmerman's models) is estimated in order to obtain the predicted values of the dependent variable (denoted by $\hat{M}_{i,t}$). The estimated residuals (denoted $\hat{e}_{i,t}$) are then computed as the difference between the actual and predicted values of the murder rate (i.e., $\hat{e}_{i,t} = M_{i,t} - \hat{M}_{i,t}$). Finally, the first-, second-, and third-order autocorrelation coefficients are estimated from an OLS regression of the estimated residuals on their respective first-, second-, and third-order lagged values (i.e., $\hat{e}_{i,t-1}$, $\hat{e}_{i,t-2}$, and $\hat{e}_{i,t-3}$, respectively). This procedure yields an estimated first-order autocorrelation coefficient of 0.37 for the state murder rate data used in Zimmerman's analysis. The corresponding second-order estimated autocorrelation coefficient is also positive but approximately 76% *smaller* in magnitude. Finally, the third-order autocorrelation coefficient turns negative while remaining small in absolute magnitude. As such, these results do not suggest that the bias introduced by including the once-lagged murder rate in the covariate set is likely to be substantial, or that there is likely to be a problem with autocorrelation beyond that pertaining to first-order effects. This same procedure was also applied to the contemporaneous execution probabilities employed in Models 5 and 6. The estimated first-, second-, and third-order autocorrelation coefficients (in their respective order) are as follows: Model 5 (0.17, −0.25, −0.15); Model 6 (0.31, −0.26, −0.14).

*not* weighted by state population.

*et al.* (2003) study and in D&W's replication of that study (see http://islandia.law.yale.edu/donohue/pubsdata.htm).

*t*-statistic = −2.80).

*Population*_{1997} is the total population (in thousands) of the death penalty states in 1997 and *Death sentences*_{1996} is the number of persons sentenced to death in 1996.

*dropped* in order to estimate, so for these specifications the effective sample size is smaller than that corresponding to the original sample. In any event, the null hypothesis that the respective point estimates are equal to those obtained from Zimmerman's corresponding original estimates *cannot* be rejected, with the computed *p*-values ranging from 0.31 to 0.93.

*downward* biased. However, Keele and Kelly (2006) present Monte Carlo evidence suggesting that this bias is likely to be small unless the degree of (first-order) autocorrelation present in the original errors is relatively high. As such, the estimates presented here may be regarded as *conservative* approximations of the deterrent effect of capital punishment. Further, while some commentators have raised concerns regarding the use of lagged crime rates as instrumental variables, similar concerns are unlikely to apply to the inclusion of the lagged crime rate as a covariate in the structural equation. For instance, it is not obvious why the inclusion of a *predetermined* crime rate ought to be regarded as “endogenous” in this instance. But even if this is not the case, the analysis presented herein is not primarily concerned with identifying consistent estimates on the lagged crime rate, i.e., it is only concerned with deriving inference on the key explanatory variable of interest, namely the execution risk.

*second-stage* residuals are those obtained from a regression of the dependent variable (i.e., the state murder rate) on the predicted values of the endogenous variables (which are obtained from the *first-stage* regressions) and all other exogenous variables from the structural regression. See Klepinger *et al.* (1995) for further discussion of this approach to instrument selection.

*are not considered so sufficiently compelling that they warrant ignoring the appropriate testing of the instruments excluded from the structural per-capita murder equation*.” See Zimmerman (2004, p. 174) (emphasis added).

*F*-statistics pertaining to the joint significance (relevance) of the instruments in the first-stage regressions of the endogenous deterrence probabilities are all statistically significant at (better than) the 1% level in a two-tailed test. In addition, the 2SLS coefficient estimates pertaining to the deterrence probabilities in Model 5′ and Model 6′ are substantially larger in magnitude relative to their respective OLS estimates, which should not be the case if the instruments are too “weak.”

*Zimmerman’s* data was included only as part of their pre-publication paper 2005's “Uses and Abuses of Empirical Evidence in the Death Penalty Debate” but was subsequently removed from the published version (Donohue and Wolfers, 2005). However, D&W did *not* remove any discussion of the randomization test that was applied to the Dezhbakhsh *et al.* (2003) study. As such, some discussion of D&W's test is still germane.

*test statistic* is that associated with a randomization/permutation test. The rationale behind this testing methodology is that if an explanatory variable has no influence on a dependent variable then it should make little difference to the outcome of the *test statistic* if the values of this explanatory variable are shuffled and matched up with different dependent variable values. By performing this shuffling thousands of times, each time calculating the *test statistic*, the hypothesis can be tested by seeing if the original *test statistic* value is unusual relative to the thousands of *test statistic* values created by the shufflings … Hypothesis testing is based on viewing the *test statistic* as having resulted from a game of chance … (emphasis added).

*t*-statistic, in order to draw proper inference (which would not be obtained by relying on a coefficient estimate).

*within* two or more groups or “blocks” of observations, and then obtained their 1000 artificial coefficient estimates from the 1000 “block-shuffled” datasets. This approach is ordinarily taken when the researcher believes that the effect of the “other” explanatory variables of interest (e.g., the variables other than the execution rate) on the outcome measure (the murder rate) is so strong that it is difficult to detect the effect of the key explanatory measure (the execution rate). Rejection of the null in the instant case would imply that executions had a significant effect within treatment groups. Yet, while it appears D&W may have based their blocks (treatment groups) on years, this is never stated explicitly, nor was the sensitivity of the randomization results to other block constructions (e.g., death penalty versus non-death penalty states, states that have conducted executions versus those that have not, by decade, etc.) discussed, even though these appear to be reasonable alternatives. As such, it is difficult (if not impossible) to assess the validity or robustness of D&W's results.

*Capital Punishment, 2003,* U.S. Bureau of Justice Statistics, http://www.ojp.usdoj.gov/bjs/pub/pdf/cp03.pdf.

*all* 159 of these prisoners had been *removed* from death row (none by executions). Thus, these 159 additional death sentences are not included in D&W's figures. In addition, only two persons were *received* on death row during 2003 in Illinois. As such, the entire year-end death row population in Illinois at the end of 2003 consisted of only two individuals. See *Capital Punishment, 2003*.

*not* being argued that the construction of the “execution risk” discussed here is appropriate. Rather, its consideration in the above discussion is made only to illustrate the need to consider the proper unit of analysis when drawing any inferences regarding the “signal” associated with state executions.

*et al.* (2006), which finds evidence in favor of the deterrence hypothesis while considering a broad range of econometric specification issues.