## Abstract

In many post-election surveys, the proportion of respondents who claim to have voted is greater than government-reported turnout rates. These differences have often been attributed to respondent lying (e.g., Burden 2000 ). In a search for greater accuracy, scholars have replaced respondent self-reports of turnout with government records of their turnout (a.k.a. turnout validation). Some scholars have interpreted “validated” turnout estimates as more accurate than respondent self-reports because “validated” rates tend to be lower than aggregate self-reported rates and tend to be closer to government-reported rates. We explore the viability of turnout validation efforts. We find that several apparently viable methods of matching survey respondents to government records severely underestimate the proportion of Americans who were registered to vote. Matching errors that severely underestimate registration rates also drive down “validated” turnout estimates. As a result, when “validated” turnout estimates appear to be more accurate than self-reports because they produce lower turnout estimates, the apparent accuracy is likely an illusion. Also, among respondents whose self-reports can be validated against government records, the accuracy of self-reports is extremely high. This would not occur if lying was the primary explanation for differences between reported and official turnout rates. These findings challenge the notion that the practice of “turnout validation” offers a means of measuring turnout that is more accurate than survey respondents’ self-reports.

## Introduction

In the United States, it is not uncommon for 70 to 90 percent of respondents in nationally representative sample surveys to report having voted in an election in which the actual turnout rate was 50 percent or lower, according to government figures (McDonald 2005). As Gera, Krosnick, and DeBell (2011) demonstrated, even gold-standard surveys (e.g., the General Social Survey, the American National Election Studies, and the Current Population Survey Voter Supplement) have overestimated turnout by 10 to 20 percent on average in recent decades (see table 1).

Table 1.

Average Margin of Overestimation in the General Social Survey (1972–2004), American National Election Studies (1952–2008), and the Current Population Surveys (1972–2008)

| Turnout gap (average margin of overestimation) | GSS | ANES | CPS |
| --- | --- | --- | --- |
| All past elections | 15% | 17% | 13% |
| Election within one year of the interview | N/A | 15% | 11% |

Source: Katie Gera, Jon A. Krosnick, and Matthew DeBell. 2011. “Overestimation of Voter Turnout in National Surveys: An Examination of Levels of Overestimation and Relevant Explanatory Factors.” Manuscript, Stanford University.

This pattern of error has led to an interest in turnout validation (TV) (e.g., Bernstein, Chadha, and Montjoy 2001; Green and Gerber 2005). TV is a process in which a survey respondent’s self-report of turnout in a specific election is compared to government records of that individual’s registration and turnout in the same election as a way to confirm or refute the veracity of the respondent’s report. When such checking has been done, government records have failed to validate some respondents’ reports of having voted (e.g., Belli, Traugott, and Beckmann 2001). This has been viewed as evidence of survey respondents lying about having turned out to vote.

TV has generally yielded aggregate turnout rates for survey samples that are lower than self-reported turnout rates and closer to officially reported turnout rates. Some scholars have used such findings to conclude that TV identifies liars and yields more accurate assessments of individuals’ turnout than do respondent self-reports. As a result, some researchers have used TV data instead of self-reports as the evidentiary basis for their voting research (Sigelman and Jewell 1986; Gerber and Green 2000; Gimpel and Schuknecht 2003; Dyck and Gimpel 2005; Haspel and Knotts 2005).

In the 1980s, the American National Election Studies (ANES) conducted TV. In the 1990s, high costs and other factors led the project’s leaders to discontinue these efforts. Today, a proliferation of electronic voter databases and the data-collection mandates in the Help America Vote Act (HAVA) reduce TV’s costs. We therefore took this opportunity to validate ANES survey respondents’ claims about their registration and turnout behavior in the 2008 general election and thereby to evaluate the effectiveness of TV.

This paper describes our methods and findings. We begin below by outlining steps that must be taken to match survey respondent self-reports to individual-level government records of registration and turnout. We offer reasons to question the reliability of past matching techniques and then describe the survey data and methods that we employed, as well as our findings and their implications.

Our main finding is that several apparently viable methods of matching survey respondents to government records severely underestimate the proportion of Americans who were registered to vote. Matching errors that severely underestimate registration rates also drive down “validated” turnout estimates. We also find that among respondents whose self-reports can be validated against government records, the accuracy of self-reports is extremely high. This would not occur if lying was the primary explanation for differences between reported and official turnout rates.

An implication of these findings is that when “validated” turnout estimates appear to be more accurate than self-reports because they produce lower turnout estimates, the apparent accuracy is most likely an illusion. The illusion is a combination of estimated registration rates that are too low and survey respondents turning out to vote at significantly higher rates than non-respondents. In sum, our findings challenge the notion that the practice of “turnout validation” offers turnout measures that are more accurate than survey respondents’ self-reports.

## Methods of Turnout Validation

If a researcher wants to use government records to conclude that a particular survey respondent did or did not vote in a particular election, two critical assumptions must be made. One assumption is that a respondent who turned out has a government record that can be unambiguously located. A second assumption is that the inability to locate a respondent’s turnout record means that he/she did not vote. As we outline below, several facts about government records suggest caution before endorsing such assumptions.

### HOW GOVERNMENT RECORDS ARE ASSEMBLED AND UPDATED

Every state maintains individual-level registration and turnout records, which are created initially when a person registers to vote. Federal law requires all registrants to provide either a driver’s license number or Social Security number. States vary in what other information they solicit. 1 Frequently elicited information includes current and former names, current and former residential addresses, mailing addresses, date of birth, place of birth, sex, race, telephone number, and party affiliation.

The Help America Vote Act of 2002 and the National Voter Registration Act of 1993 specify minimum requirements about how state records must be kept, but states have considerable latitude in how they manage their records and in what information they release to analysts. For example, a state might require a resident to provide a full date of birth on a registration form but might include only the year of birth or the age of the person in the released government record. States differ in how they manage their records; information provided by some states is not provided by other states.

States also differ in the kinds of people who are listed in their records. In all states, government records of registration and turnout include information about citizens who are registered to vote at the time the records are produced. These records do not include information about citizens who never registered. However, some states’ records include information about individuals who were previously registered but are no longer eligible to vote (e.g., individuals who have died, moved, or committed a felony), while other states purge information about these individuals. Some states provide information about individuals who have attempted to register but were denied (e.g., individuals who failed to report all required information on a registration form). Other states do not provide this information.

States also differ in terms of when and how they update their records. For example, all states have access to the United States Post Office’s National Change of Address (NCOA) directory. However, states are not required to use this information with equal frequency or at identical times. Some states update rarely, and others update frequently. States also vary in the frequency with which they report name changes of individuals, report that individuals have died, or report that people have been convicted of felonies (and are therefore no longer eligible to vote).

Some states rely on local jurisdictions to provide relevant information. Local jurisdictions vary in the timing and frequency of their data collection and release (see, e.g., McDonald 2007). Consequently, at any given time, state records can be more up to date for some jurisdictions than for others. Moreover, turnout histories for slower-to-report jurisdictions within a state may not be updated until well after a state has made other changes to its registration database.

These many variations across states challenge any attempt to validate survey respondents’ registration and turnout behaviors. Because states differ in how they collect, update, and release data, there may be no single point in time when multiple states’ records of who was registered to vote and who turned out in a given election are maximally accurate.

### THE CHALLENGE OF LOCATING GOVERNMENT RECORDS FOR SURVEY RESPONDENTS

If a state has an accurate registration and turnout record for a survey respondent, accurate validation requires locating that record. In this section, we discuss difficulties associated with locating such records.

One way of locating a record is to search a state’s registration and turnout databases for the full name that a respondent provided to survey researchers. However, government records list many people who have the same name. When two or more people in government records have the respondent’s name, researchers use other information about the respondent to attempt to identify the correct government record. But such additional information is not always available in government records and is sometimes inaccurate, which can make correct record identification less likely.

Locating the correct record becomes more difficult when mistakes are made during the survey data collection. An Internet survey respondent might mistype his or her street name or house number when completing a survey, or an interviewer might mishear or mistype what a respondent says during an oral interview. Likewise, a researcher might have difficulty deciphering what a respondent has written on a paper questionnaire. If these mistakes are part of a survey database, the mistakes can make it more difficult to locate the correct government record.

Government records are also incomplete and contain errors. McDonald (2007) reported that sex was missing from 50 percent of the 2004 California records and that race was missing from 50 percent of the 2004 Kentucky records. 2 McDonald (2007) also found government records with birthdates indicating that some people were several hundred years old, while other people had not yet been born. Such errors and omissions make it difficult or impossible to locate some survey respondents’ government records. These errors, in turn, cause a downward bias in the apparent rate at which respondents are registered to vote (e.g., Presser, Traugott, and Traugott 1990).

An additional complicating factor is that the frequencies of inaccurate and missing data in government records vary across states. In contrast to California’s 2004 records, none of the 2004 records from Iowa and Kentucky were missing a person’s sex (McDonald 2007). In contrast to Kentucky, none of North Carolina’s 2004 records were missing race (McDonald 2007). About 10 percent of the 2010 West Virginia records contained addresses that the US Post Office designated as undeliverable or probably undeliverable. Fewer than 2 percent of records in Maryland, Washington, and the District of Columbia displayed such problems (Ansolabehere and Hersh 2010). Differences in the proportions of inaccurate, problematic, and missing data across states mean that a method for locating records that works well in one state may not succeed in other states.

Audits of state records illustrate further inaccuracies in government records. Ansolabehere et al. (2010a, b) mailed questionnaires to registered persons in two jurisdictions between August 2008 and July 2009. They found that 8 to 25 percent of these government records included invalid entries.

Furthermore, information provided by a survey respondent about himself/herself and the information in his/her government record are not always identical, even if neither contains errors. For example, a government record might show an individual’s proper first name (e.g., Patrick), whereas he or she provided an informal version of that name to a survey researcher (e.g., Pat). This problem also occurs when a respondent provides an informal name or a nickname that is not commonly associated with a specific proper name (e.g., “Bud” or “Butch” is sometimes used to distinguish sons and fathers with the same name).
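The first-name problem described above can be illustrated with a minimal sketch. The nickname table and the function names below are hypothetical and are not part of the study's matching procedure; they only show why a nickname such as "Pat" can be reconciled with a formal name on record, while a nickname such as "Bud" cannot be resolved without further information.

```python
# Hypothetical nickname table for illustration only; a real matching effort
# would need a far larger list and would still miss idiosyncratic nicknames.
NICKNAMES = {
    "pat": {"patrick", "patricia"},
    "bill": {"william"},
    "liz": {"elizabeth"},
}


def candidate_first_names(reported):
    """Return every first name the reported name could plausibly stand for."""
    key = reported.strip().lower()
    # The reported name itself is always a candidate.
    return {key} | NICKNAMES.get(key, set())


def first_names_compatible(reported, on_record):
    """True if the name on the government record is a candidate expansion."""
    return on_record.strip().lower() in candidate_first_names(reported)


print(first_names_compatible("Pat", "Patrick"))  # nickname with a known formal form
print(first_names_compatible("Bud", "Robert"))   # no formal form in the table
```

Note that "Pat" maps to more than one formal name, so even a successful lookup can leave the match ambiguous unless other fields (address, birthdate) break the tie.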

To conclude that a specific respondent’s registration and turnout in a specific election can be accurately ascertained with government records requires that respondents who turned out have a record that can be unambiguously located and that the failure to locate a record for a respondent indicates that he/she did not vote. Given the many problems with government records and efforts to locate them described above, the accuracy of TV data cannot simply be assumed. We therefore evaluated the extent to which such problems affect various TV methods.

## The Present Study

### SURVEY DATA

For the ANES 2008–2009 Panel Study, Knowledge Networks (now GfK Custom Research) made RDD telephone calls to recruit two cohorts of respondents. Cohort 1 was recruited between September 26, 2007, and January 27, 2008. Cohort 2 was recruited between May 28 and September 9, 2008. A total of 2,367 eligible respondents from Cohort 1 completed an initial recruitment survey, as did 1,839 eligible respondents from Cohort 2. Thus, the initial sample combining the two cohorts included 4,206 eligible respondents. 3 Cohort 1 began completing monthly surveys online in January 2008, and Cohort 2 began doing so in September 2008. The panel ended in September 2009.

We analyzed data from respondents who answered at least one registration question and one turnout question (see the for exact question wording and response options). Several early waves included registration questions, but turnout was measured only in the October and November 2008 surveys. Thus, a respondent must have completed the October or November 2008 survey to be included in our analyses.

We used two methods to estimate cumulative response rates (AAPOR RR3). Method 1 used different estimates of eligibility rates among cases with unknown eligibility for English-speaking, non-English-speaking, and non-contact households. Method 2 used the proportional allocation strategy to estimate these rates. 4 Cumulative response rates for Cohort 1 ranged from 19.9 percent (Method 1) to 26.7 percent (Method 2) for both the October and November waves. The rates for Cohort 2 ranged from 18.0 to 25.5 percent for the October wave and 18.7 to 26.4 percent for the November wave for Methods 1 and 2, respectively.

### GOVERNMENT RECORDS

We obtained individual-level registration and turnout records from California ( N = 17,094,209), Florida ( N = 12,570,869), New York ( N = 11,660,114), North Carolina ( N = 6,154,773), Ohio ( N = 8,246,881), and Pennsylvania ( N = 8,444,317) between February and June 2010 (see online appendix 1 for details on these government records). 5 Each state delivered an electronic file that contained the name, registration information, and turnout information for every individual registered in those states at the time the records were obtained, plus information for some people who had previously been registered but were no longer registered. 6

### MATCHING GOVERNMENT RECORDS TO SURVEY RESPONDENTS

To use a government record to evaluate a survey respondent’s registration or turnout self-report, it is necessary to develop criteria by which a respondent is matched to a record. 7 Different criteria yield different numbers of respondents who can be considered to be matched to the correct record. Overly stringent criteria will fail to match some respondents who should be matched to a particular record. Overly lax criteria will match some respondents to records that are not theirs.

We chose to evaluate a range of matching criteria to provide a broad view of the types of estimates that various TV procedures produce. Our STRICT criteria matched respondents to government records with an identical name, address, and birthdate information. 8 Our LEAST criteria matched respondents to records with identical or similar names, similar (but not necessarily identical) birthdates, and different addresses in certain combinations. The strictness of our MOD criteria fell between the STRICT and LEAST methods (see online appendix 2 for details). The percentages of survey respondents matched to government records for each of the methods were 45.6 percent (STRICT), 65.1 percent (MOD), and 77.4 percent (LEAST).
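To make the idea of nested matching criteria concrete, the sketch below classifies a respondent-record pair by the strictest tier it satisfies. Only the STRICT rule (identical name, address, and birthdate) follows the description above. The MOD and LEAST rules, the similarity threshold, and the birthdate tolerance are illustrative assumptions, not the criteria in online appendix 2.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    name: str
    address: str
    birthdate: date


def _norm(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def names_similar(a, b, threshold=0.85):
    # Assumed similarity rule: character-level ratio from the standard library.
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio() >= threshold


def match_level(resp, rec):
    """Return the strictest tier the pair satisfies, or None if no tier applies."""
    same_name = _norm(resp.name) == _norm(rec.name)
    same_addr = _norm(resp.address) == _norm(rec.address)
    same_dob = resp.birthdate == rec.birthdate
    # Assumed tolerance: birthdates within roughly one year count as "similar".
    close_dob = abs((resp.birthdate - rec.birthdate).days) <= 366

    if same_name and same_addr and same_dob:
        return "STRICT"
    if same_name and same_dob:  # hypothetical MOD rule: address may differ
        return "MOD"
    if names_similar(resp.name, rec.name) and close_dob:  # hypothetical LEAST rule
        return "LEAST"
    return None


respondent = Person("Patrick Smith", "12 Oak St", date(1970, 5, 1))
moved = Person("Patrick Smith", "98 Elm Ave", date(1970, 5, 1))
print(match_level(respondent, respondent), match_level(respondent, moved))
```

Because the tiers are nested, a pair classified as STRICT would also count as matched under the looser tiers, which is why match rates can only rise as criteria are relaxed. The looser tiers also admit more false matches, which is the trade-off described above.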

### MAKING THE POPULATIONS COMPARABLE

#### Weighting:

The ANES 2008–2009 Panel Study sample was designed to be representative of the population of US citizens, aged 18 years or older on election day and living in a household served by a landline. Online appendix 3 describes the procedure (developed by DeBell and Krosnick [2010] ) used to build weights to adjust the survey data to match the population demographically in each of the states we examined. Post-stratification used sex, age, race, ethnicity, education, income, and marital status.

#### Benchmarks:

To compute true rates of registration and turnout to which to compare various survey-based estimates, it is standard to introduce several adjustments to each target state’s Voting Age Population (VAP) and officially released government reports of registration and turnout. 9 The adjustments remove individuals who were not able to participate in surveys and are designed to increase the comparability of the population of people whom a survey sample represents to the population of people described by benchmark registration and turnout statistics.

Specifically, each state’s VAP was adjusted by removing people who were not eligible to participate in the ANES survey because they were incarcerated, not US citizens, or not living in a household served by a landline. Officially released state registration numbers were reduced by removing eligible citizens who were located overseas and could not be interviewed for the survey, eligible citizens living in households not served by a landline, and people registered in the state who lived in a different state. 10 We also removed eligible citizens located overseas, in households not served by a landline, or living in a state other than the one in which they were registered from the officially released state turnout numbers. These adjustments are described in online appendix 4 .
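The logic of these benchmark adjustments can be sketched as simple arithmetic. All counts below are made-up numbers chosen for illustration, and the function name is hypothetical; the actual figures and procedures are those described in online appendix 4. The sketch also assumes that the excluded groups do not overlap, which a real adjustment would need to verify.

```python
def adjusted_rate(numerator, numerator_exclusions, denominator, denominator_exclusions):
    """Rate after removing people the survey could not have represented.

    Assumes the exclusion groups within each dict are mutually exclusive,
    so their counts can simply be summed and subtracted.
    """
    adj_num = numerator - sum(numerator_exclusions.values())
    adj_den = denominator - sum(denominator_exclusions.values())
    return adj_num / adj_den


# Hypothetical state: every count here is invented for illustration.
vap = 1_000_000
vap_exclusions = {"incarcerated": 10_000, "non_citizens": 80_000, "no_landline": 150_000}

registered = 800_000
reg_exclusions = {"overseas": 5_000, "no_landline": 110_000, "lives_in_other_state": 15_000}

rate = adjusted_rate(registered, reg_exclusions, vap, vap_exclusions)
print(f"Adjusted registration benchmark: {rate:.1%}")
```

The same function applies to the turnout benchmark by substituting the official turnout count and its exclusions for the registration figures.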

### RESULTS

We now describe three types of estimates. First, we used the three matching methods to produce turnout rate estimates of the type usually produced in TV studies. Next, we used the three methods to estimate registration rates. Then, we used the three methods to estimate turnout given registration estimates.

#### Turnout rates:

Consistent with previous studies, respondents in the 2008 ANES Panel Study said they turned out at higher rates than government statistics indicated. Of the 2,515 (weighted) ANES respondents who answered one of the turnout questions, 85.9 percent reported voting in the 2008 general election. The official turnout rate for the population of US citizens 18 years or older living in the United States in a household served by a landline was much lower: 61.7 percent (difference = 24.2 percent, p < .01). As table 2 shows, the discrepancy between self-reported turnout among respondents in the six target states and official turnout rates in those six states was nearly identical to the discrepancy found for the nation as a whole (rate among respondents in target states = 87.6 percent, official rate across target states = 62.6 percent, difference = 25.0 percent, p < .01).

Table 2.

Turnout Rates by Various Methods a

| State | Of the US citizens over 18 years of age who lived in a household in the state served by a landline telephone as of November 2008, the percent who voted (b) | Of the survey respondents whose address was in the state, the percent who said they voted in that state (c) | Of the survey respondents whose address was in the state, the percent matched to a government record indicating they voted: STRICT | Percent matched to a government record indicating they voted: MOD | Percent matched to a government record indicating they voted: LEAST | Weighted number of survey respondents whose Knowledge Networks address was in the state |
| --- | --- | --- | --- | --- | --- | --- |
| All target states | 62.6% | 87.6% | 43.1% | 59.3% | 69.3% | 753 |
| California | 61.7% | 88.4% | 42.8% | 58.2% | 67.4% | 255 |
| Florida | 65.0% | 88.6% | 41.2% | 58.8% | 72.9% | 113 |
| New York | 58.5% | 83.4% | 27.7% | 54.3% | 68.5% | 99 |
| North Carolina | 65.5% | 84.4% | 37.6% | 50.6% | 60.3% | 71 |
| Ohio | 65.9% | 87.5% | 51.3% | 64.4% | 75.2% | 104 |
| Pennsylvania | 62.1% | 90.4% | 55.0% | 67.5% | 71.2% | 111 |

a Rates for survey respondents are weighted to reflect within-state populations (see online appendix 3 ).

c Percentages based on respondents who provided substantive responses to registration and turnout questions.

As was the case in previous TV research, turnout rates derived from attempts to match individual respondents to their government records were much lower than self-reported rates. STRICT produced a turnout rate (43.1 percent) that was much lower than the actual rate (difference = 19.5 percent, p < .01). 11 MOD (59.3 percent) and LEAST (69.3 percent) produced estimates that were not significantly different than the actual rate across target states (MOD: difference = 3.3 percent, ns; LEAST: difference = 6.7 percent, ns). The rates produced by MOD and LEAST were also closer to official turnout rates than those derived from self-reports. From this perspective, the TV estimates might seem to be more accurate than the self-reports.

Additional analyses seem to support the relative accuracy of the MOD and LEAST turnout estimates. With the exception of the LEAST estimate in Ohio (actual rate = 65.9 percent, LEAST estimate = 75.2 percent, difference = 9.3 percent, p < .10), MOD and LEAST yielded turnout rates that were not significantly different than the official rate in every state. By contrast, self-reports overestimated turnout in every state, and STRICT underestimated turnout rates for every state except Pennsylvania (Pennsylvania: actual rate = 62.1 percent, STRICT estimate = 55.0 percent, difference = 7.1 percent, ns). 12

To further assess the accuracy of the different methods, we decomposed the turnout rates into two components: (1) rates at which survey respondents were registered to vote ; and (2) rates at which registered respondents turned out to vote . If LEAST and MOD are more accurate than self-reports, then these methods should produce estimates of (1) and (2) that are closer to official government rates than are estimates derived from self-reports. However, this did not occur.

#### Registration rates:

Table 3 shows official registration rates, self-reported registration rates, and estimates of these rates created using STRICT, MOD, and LEAST. 13 According to official statistics, 83.7 percent of US citizens, 18 years old or older, living in the target states in households served by a landline as of November 2008, were registered to vote. Self-reports yielded an estimate of 87.8 percent (difference = 4.2 percent, p < .05). LEAST produced an estimated registration rate (78.5 percent) that was almost as close to the true rate as the self-reported rate, and it also differed significantly from the actual rate (difference = 5.1 percent, p < .05). 14 The estimated registration rates from MOD (66.3 percent) and STRICT (49.1 percent) substantially underestimated the true registration rate (MOD: difference = 17.3 percent, p < .01; STRICT: difference = 35.6 percent, p < .01).

Table 3.

Registration Rates by Various Methods a

| State | Of the US citizens over 18 years of age who lived in a household in the state served by a landline telephone as of November 2008, the percent who were registered (b) | Of the survey respondents whose address was in the state, the percent who said they were registered in that state (c) | Of the survey respondents whose address was in the state, the percent matched to a government registration record: STRICT | Percent matched to a government registration record: MOD | Percent matched to a government registration record: LEAST | Weighted number of survey respondents whose Knowledge Networks address was in the state |
| --- | --- | --- | --- | --- | --- | --- |
| All target states | 83.7% | 87.8% | 49.1% | 66.3% | 78.5% | 753 |
| California | 74.6% | 87.5% | 45.1% | 61.7% | 73.3% | 255 |
| Florida | 82.8% | 83.4% | 49.7% | 67.4% | 84.4% | 113 |
| New York | 87.2% | 82.7% | 39.9% | 67.6% | 83.1% | 99 |
| North Carolina | 92.5% | 95.6% | 56.3% | 71.5% | 81.2% | 71 |
| Ohio | 91.4% | 94.1% | 55.0% | 68.4% | 84.3% | 104 |
| Pennsylvania | 87.9% | 86.7% | 55.8% | 69.5% | 73.3% | 111 |

a Rates for survey respondents are weighted to reflect within-state populations (see online appendix 3 ).

b Percentages based on registration statistics published by states (see online appendix 4 ).

c Percentages based on respondents who provided substantive responses to registration and turnout questions.

No criterion yielded registration rates that were consistently closest to the true registration rate in every state. For California, only LEAST generated a registration rate (73.3 percent) that was not significantly different from the actual rate (74.6 percent, difference = 1.3 percent, ns). For Pennsylvania, self-reports produced the only estimate with this quality (difference = 1.2 percent, ns). Both self-reports and LEAST yielded estimates that were not significantly different from the actual rates in the other states (Florida: actual = 82.8 percent, self-reports = 83.4 percent, LEAST = 84.4 percent, differences = .6 and 1.6 percent, respectively, both ns; New York: actual = 87.2 percent, self-reports = 82.7 percent, LEAST = 83.1 percent, differences = 4.5 and 4.0 percent, respectively, both ns; North Carolina: actual = 92.5 percent, self-reports = 95.6 percent, LEAST = 81.2 percent, differences = 3.1 and 11.3 percent, respectively, ns and p < .10, respectively; Ohio: actual = 91.4 percent, self-reports = 94.1 percent, LEAST = 84.3 percent, differences = 2.7 and 7.1 percent, respectively, ns and p < .10, respectively). STRICT and MOD, by contrast, substantially underestimated actual registration rates in every state. In sum, the matching procedures that produced closer-to-actual estimates of turnout than did self-reports did not have the same success in gauging registration.

#### Turnout given registration:

Table 4 describes the turnout behaviors of people who were registered. The true turnout rate of registered residents across the six states was 72.1 percent. Self-reports yielded a much larger estimate of the same behavior: 94.3 percent (difference = 22.2 percent, p < .01). All matching procedures also yielded much larger estimates than the officially reported rate (STRICT: difference = 15.6 percent, p < .01; MOD: difference = 17.3 percent, p < .01; LEAST: difference = 16.2 percent, p < .01).

Table 4.

Comparison of the Percent of Registered People Who Turned Out and What Government Records Suggest about the Percent of Registered Respondents Who Turned Out a

| State | Of the registered US citizens over 18 years of age who lived in a household in the state served by a landline telephone as of November 2008, the percent who voted (b) | Of the survey respondents whose address was in the state and who said they were registered in the state, the percent who said they voted in that state (c, d) | Of the survey respondents matched to government record in the state in which their Knowledge Networks address was located, the percent whose government record said they voted (d): STRICT | Percent whose government record said they voted (d): MOD | Percent whose government record said they voted (d): LEAST |
| --- | --- | --- | --- | --- | --- |
| All target states | 72.1% (*50,512,502*) | 94.3% (*662*) | 87.7% (*370*) | 89.4% (*500*) | 88.3% (*592*) |
| California | 79.4% (*13,623,965*) | 96.3% (*223*) | 95.0% (*115*) | 94.3% (*157*) | 91.9% (*187*) |
| Florida | 75.2% (*8,844,727*) | 99.7% (*95*) | 82.8% (*56*) | 87.3% (*76*) | 86.4% (*96*) |
| New York | 64.2% (*9,512,779*) | 85.7% (*82*) | 69.3% (*40*) | 80.3% (*67*) | 82.3% (*82*) |
| North Carolina | 69.9% (*5,045,307*) | 88.3% (*68*) | 66.7% (*40*) | 70.8% (*51*) | 74.3% (*58*) |
| Ohio | 69.7% (*6,523,018*) | 92.8% (*98*) | 93.3% (*57*) | 94.2% (*71*) | 89.2% (*88*) |
| Pennsylvania | 68.5% (*6,962,707*) | 97.8% (*96*) | 98.6% (*62*) | 97.1% (*77*) | 97.3% (*81*) |

a Population and sample sizes are italicized.

b Percentages are based on registration and turnout statistics published by states (see online appendix 4 ).

c Percentages based on respondents who provided substantive responses to registration and turnout questions.

d Respondents are weighted to reflect within-state populations (see online appendix 3 ).

None of the matching methods was consistently accurate across the six target states. STRICT yielded estimates that were not significantly different from the actual rates in Florida (actual = 75.2 percent, STRICT = 82.8 percent, difference = 7.6 percent, ns), New York (actual = 64.2 percent, STRICT = 69.3 percent, difference = 5.1 percent, ns), and North Carolina (actual = 69.9 percent, STRICT = 66.7 percent, difference = 3.2 percent, ns), but it significantly overestimated rates in the other three states (California: actual = 79.4 percent, STRICT = 95.0 percent, difference = 15.6 percent, p < .01; Ohio: actual = 69.7 percent, STRICT = 93.3 percent, difference = 23.6 percent, p < .01; and Pennsylvania: actual = 68.5 percent, STRICT = 98.6 percent, difference = 30.2 percent, p < .01). MOD estimates of turnout among registered people showed the same pattern of differences from the actual rates (Florida: difference = 12.2 percent, ns; New York: difference = 16.1 percent, ns; North Carolina: difference = 0.9 percent, ns; California: difference = 14.9 percent, p < .01; Ohio: difference = 24.5 percent, p < .01; and Pennsylvania: difference = 28.6 percent, p < .01). LEAST estimates were not significantly different from actual rates in only two of the states (Florida: difference = 11.2 percent, ns; and North Carolina: difference = 4.4 percent, ns) and significantly overestimated rates in the other four (California: difference = 12.5 percent, p < .01; New York: difference = 18.2 percent, p < .05; Ohio: difference = 19.5 percent, p < .01; and Pennsylvania: difference = 28.8 percent, p < .01).
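The gaps quoted above can be approximately recomputed from the rounded percentages in table 4. The sketch below is illustrative only: the names `ACTUAL`, `ESTIMATES`, and `absolute_gaps` are ours, and the code does not reproduce the significance tests, which depend on weighted respondent-level data not shown here. Because it works from rounded table entries, a few gaps differ by 0.1 from the figures in the text (for example, 30.1 rather than 30.2 for Pennsylvania under STRICT), presumably because the published differences were computed before rounding.

```python
# Percent turnout among registered people, copied from table 4.
ACTUAL = {
    "California": 79.4, "Florida": 75.2, "New York": 64.2,
    "North Carolina": 69.9, "Ohio": 69.7, "Pennsylvania": 68.5,
}

# Estimates from each matching method, copied from table 4.
ESTIMATES = {
    "STRICT": {"California": 95.0, "Florida": 82.8, "New York": 69.3,
               "North Carolina": 66.7, "Ohio": 93.3, "Pennsylvania": 98.6},
    "MOD": {"California": 94.3, "Florida": 87.3, "New York": 80.3,
            "North Carolina": 70.8, "Ohio": 94.2, "Pennsylvania": 97.1},
    "LEAST": {"California": 91.9, "Florida": 86.4, "New York": 82.3,
              "North Carolina": 74.3, "Ohio": 89.2, "Pennsylvania": 97.3},
}


def absolute_gaps(method):
    """Return |estimate - actual| in percentage points for each state."""
    return {
        state: round(abs(ESTIMATES[method][state] - ACTUAL[state]), 1)
        for state in ACTUAL
    }


if __name__ == "__main__":
    for method in ESTIMATES:
        print(method, absolute_gaps(method))
```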

Table 5 summarizes our findings so far. STRICT underestimated turnout and registration but overestimated turnout among people who were registered. MOD produced turnout estimates that were not significantly different from actual rates but significantly underestimated registration and significantly overestimated turnout among registered persons. LEAST overestimated turnout and overestimated turnout among registered persons but underestimated registration.

Table 5.

Summary of the Errors When Using Self-Reports and Matching to Estimate Turnout, Registration, and Turnout among People Registered

Method for estimating the actual rate across the six target states:

| Rate estimated | Self-reports | STRICT matching | MOD matching | LEAST matching |
|---|---|---|---|---|
| Turnout | + | – | 0 | + |
| Registration | | – | – | – |
| Turnout among people registered in the target states | + | + | + | + |

Note. Cell entry indicates the type of error in estimating an actual rate across the six target states. “0” indicates no difference between the estimated and actual rate, “+” indicates that the method overestimates the actual rate, and “–” indicates that the method underestimates the actual rate.

#### Turnout rates among people matched to government records:

Table 6 describes the apparent accuracy of self-reports. Among respondents who were matched to a government record using MOD, STRICT, or LEAST, the records confirmed 94 percent of their self-reports. This finding suggests that these people did not lie during their survey interviews. Across all target states, self-reports and STRICT both indicated that 88 percent of these respondents voted and 6 percent did not vote. Of the 6 percent of the respondents for whom self-reports and STRICT disagreed, all said they turned out when the government record indicated that they did not. 15 If we assume that record matches were accurate, or at least unbiased with respect to lying when they were inaccurate, we can infer that no more than 6 percent of respondents lied about having turned out to vote. 16 Similar results were obtained using MOD and LEAST: self-reports matched the government record for 94.0 and 93.9 percent of respondents, respectively. So whereas the ANES 2008–2009 Panel Study self-reports overestimated turnout by almost 30 percentage points, only 6 percent of matched respondents had government records contradicting their claims of having voted.

Table 6.

Agreement between Self-Reported Turnout and Turnout Based on Government Records among Respondents Matched to a Government Record a

| State | Matching method | Self-report: yes, Record: yes | Self-report: no, Record: no | Total consistent | Self-report: yes, Record: no | Self-report: no, Record: yes | Total inconsistent | N |
|---|---|---|---|---|---|---|---|---|
| All target states | STRICT | 87.6% | 5.9% | 93.5% | 6.5% | 0.0% | 6.5% | 370 |
| | MOD | 89.4% | 4.6% | 94.0% | 6.0% | 0.0% | 6.0% | 500 |
| | LEAST | 88.3% | 5.6% | 93.9% | 6.1% | 0.0% | 6.1% | 591 |
| California | STRICT | 95.6% | 2.6% | 98.2% | 1.8% | 0.0% | 1.8% | 114 |
| | MOD | 94.3% | 2.5% | 96.8% | 3.2% | 0.0% | 3.2% | 158 |
| | LEAST | 92.0% | 2.7% | 94.7% | 5.3% | 0.0% | 5.3% | 187 |
| Florida | STRICT | 83.9% | 0.0% | 83.9% | 16.1% | 0.0% | 16.1% | 56 |
| | MOD | 88.2% | 0.0% | 88.2% | 11.8% | 0.0% | 11.8% | 76 |
| | LEAST | 86.5% | 4.2% | 90.6% | 9.4% | 0.0% | 9.4% | 96 |
| New York | STRICT | 69.2% | 28.2% | 97.4% | 2.6% | 0.0% | 2.6% | 39 |
| | MOD | 80.6% | 17.9% | 98.5% | 1.5% | 0.0% | 1.5% | 67 |
| | LEAST | 81.9% | 15.7% | 97.6% | 2.4% | 0.0% | 2.4% | 83 |
| North Carolina | STRICT | 67.5% | 15.0% | 82.5% | 17.5% | 0.0% | 17.5% | 40 |
| | MOD | 72.0% | 12.0% | 84.0% | 16.0% | 0.0% | 16.0% | 50 |
| | LEAST | 75.4% | 10.5% | 86.0% | 14.0% | 0.0% | 14.0% | 57 |
| Ohio | STRICT | 93.1% | 1.7% | 94.8% | 5.2% | 0.0% | 5.2% | 58 |
| | MOD | 94.4% | 1.4% | 95.8% | 4.2% | 0.0% | 4.2% | 71 |
| | LEAST | 88.6% | 5.7% | 94.3% | 5.7% | 0.0% | 5.7% | 88 |
| Pennsylvania | STRICT | 98.4% | 0.0% | 98.4% | 1.6% | 0.0% | 1.6% | 62 |
| | MOD | 96.2% | 1.3% | 97.4% | 2.6% | 0.0% | 2.6% | 78 |
| | LEAST | 96.3% | 1.2% | 97.6% | 2.4% | 0.0% | 2.4% | 82 |

a Percentages based on respondents who provided substantive responses to registration and turnout questions. Respondents are weighted to reflect within-state populations (see online appendix 3 ).
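The consistency totals in table 6 follow directly from its four cell percentages. As a minimal sketch (the class and method names are ours, and only the "All target states" rows are used), the code below recomputes the totals and shows that, among matched respondents, self-reported turnout exceeds record-based turnout only by the share in the "self-report: yes, record: no" cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementRow:
    """One row of table 6, in percent of matched respondents."""
    yes_yes: float  # self-report: yes, record: yes
    no_no: float    # self-report: no, record: no
    yes_no: float   # self-report: yes, record: no
    no_yes: float   # self-report: no, record: yes

    def consistent(self) -> float:
        return round(self.yes_yes + self.no_no, 1)

    def inconsistent(self) -> float:
        return round(self.yes_no + self.no_yes, 1)

    def self_reported_turnout(self) -> float:
        return round(self.yes_yes + self.yes_no, 1)

    def record_turnout(self) -> float:
        return round(self.yes_yes + self.no_yes, 1)


# "All target states" rows copied from table 6.
ALL_TARGET_STATES = {
    "STRICT": AgreementRow(87.6, 5.9, 6.5, 0.0),
    "MOD": AgreementRow(89.4, 4.6, 6.0, 0.0),
    "LEAST": AgreementRow(88.3, 5.6, 6.1, 0.0),
}

if __name__ == "__main__":
    for method, row in ALL_TARGET_STATES.items():
        gap = round(row.self_reported_turnout() - row.record_turnout(), 1)
        print(method, row.consistent(), row.inconsistent(), gap)
```

Under every method the gap between self-reported and record-based turnout among matched respondents is about 6 percentage points, far smaller than the overall overestimate of almost 30 percentage points.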

To explain the rest of the discrepancy, we offer a visual summary of our main findings and their implications. Table 7 divides the population into 10 mutually exclusive and collectively exhaustive groups. Each group characterizes an individual by whether he/she participated in the survey, whether he/she was registered and turned out to vote (denoted “T&R”), whether he/she reported these behaviors accurately, and whether his/her survey response was matched to the correct government record. The groups are then allocated to three cells.

Table 7.

A List of Ten Mutually Exclusive and Collectively Exhaustive Groups of People

| Subgroups of respondents | Implications and findings |
|---|---|
| 1. Participant, T&R, accurate, matched.<br>2. Participant, ~T&R, accurate, matched.<br>3. Participant, ~T&R, inaccurate, matched.<br>4. Participant, T&R, inaccurate, matched. | If lying is the main cause of over-reporting and matching is accurate, then Group 3 should be large.<br>Finding: Group 3 is small. |
| 5. Participant, T&R, accurate, not matched.<br>6. Participant, ~T&R, accurate, not matched.<br>7. Participant, ~T&R, inaccurate, not matched.<br>8. Participant, T&R, inaccurate, not matched. | If TV data are accurate, these groups should be small.<br>Finding: Groups 5–7 are large and cause severe registration underestimates. |
| 9. Non-Participant, T&R.<br>10. Non-Participant, ~T&R. | $\frac{G_1}{G_{1\text{–}4}} \gg \frac{G_9}{G_{9\text{–}10}}$ |

The top row is the focus of many claims about TV estimates’ relative accuracy and respondent lying. In particular, many scholars have assumed that the percentage of people in Group 3 (liars) is large. We found that it was small.

The middle row of table 7 includes survey respondents who could not be matched to government records. We found that these groups were large, which caused MOD, STRICT, and LEAST to underestimate registration rates.

Like many previous studies, the present investigation revealed that aggregate estimates of turnout from record matching appear to be more accurate than aggregate estimates based on self-reports. However, we have now seen that this apparent accuracy is an illusion and offer evidence of an alternative explanation for the observed patterns. The alternative explanation focuses on the groups shown in table 7’s bottom row: survey non-respondents. Our findings suggest the following relation between non-respondents and members of groups 1–4:

$\frac{G_1}{G_{1\text{–}4}} \gg \frac{G_9}{G_{9\text{–}10}}$

Many past claims about the accuracy of TV data are based on the assumption that these ratios are equal: that the percentage of survey respondents who were registered and turned out to vote is equal to the percentage of non-respondents who did so. However, we found that among respondents who could be matched to a government record, people who participated in the survey voted at a much higher rate than did people who did not participate in the survey. So a key assumption underlying past TV evaluations—that survey respondents truly voted at the rate at which the population did—appears to be incorrect.

These findings cast doubt on a seemingly plausible interpretation of currently available TV data. Although some TV estimates were closer to officially reported turnout rates than were turnout rates derived from self-reports, this pattern appears to be the result of more error in the TV estimates rather than greater accuracy. People who participate in election surveys (and perhaps all surveys) are apparently more likely to vote than are people who do not participate in such surveys. Furthermore, more than 93 percent of survey respondents whose government records could be located told the truth about whether they voted. Thus, estimated population turnout rates were closer to official statistics not because TV data eliminated distortion caused by lying. Instead, the seeming accuracy of TV estimates appears to be a joint product of a relatively small amount of lying, severe underestimating of registration, and differences between the turnout behavior of people who choose to participate in post-election surveys and people who do not.

In sum, as in many past surveys, the ANES survey respondents reported turnout rates that were far higher than officially reported government tallies. Two factors appear to have contributed to this overestimation. First, a small proportion of respondents reported turning out when their government records suggest that they did not. Second, more than 93 percent of survey respondents who were registered told the truth about their turnout, and their actual turnout rate was much higher than that of individuals in the population who do not participate in surveys.

## Discussion

Why were the registration rate estimates produced by the matching procedures so low? One possible explanation is that the ANES’s participating sample of respondents was biased in favor of unregistered individuals. This seems implausible. No literature of which we are aware has suggested that people least interested in the topic of a survey are most likely to participate in that survey. Indeed, many studies indicate the opposite (see, for example, Groves et al. [2006] ). Much more likely is that TV matching procedures failed frequently when attempting to locate survey respondents’ government records.

The registration rates based on matching respondents to government records illustrate a failure-to-match problem. Ideally, the personally identifying information about a respondent who is registered would perfectly match information in a government record. If this were the case, then STRICT would have matched every registered respondent to the correct government record. However, the STRICT algorithm located records for less than half of the respondents, far below what would be expected if more than 80 percent of people in the population were registered. Had we used STRICT to measure registration and turnout, we would have concluded that registration and turnout rates among respondents were dramatically lower than the population rates. We would also have concluded that a sizable number of respondents misreported registration and turnout.

Relaxing the matching criteria using MOD and LEAST caused government records to be located for respondents whose records were not located using STRICT. The additional records matched to respondents overwhelmingly confirmed respondents’ reports of having been registered and turning out to vote. Many respondents who would have been coded as lying about registration and turnout using STRICT were coded as accurately reporting registration and turnout using MOD and LEAST. However, MOD and LEAST generated registration rates that were well below actual rates. This suggests that MOD and LEAST did not completely solve the failure-to-match problem.

Thus, the evidence reported here suggests that matching survey respondents to government records yields lower aggregate turnout rates than do self-reports not because the validated data are more accurate but because the combination of two biases misleadingly drove apparent rates down: (1) the process of matching government records often failed to locate records of respondents who were truly registered and had voted; and (2) survey respondents truly voted at a much higher rate than the general population.
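A rough arithmetic sketch can make the interaction of these two biases concrete. All inputs below are hypothetical values chosen for illustration; they are not estimates from this study. The sketch assumes that every respondent reports truthfully, that records are accurate for matched respondents, and that an unmatched respondent is coded as not having voted.

```python
# Hypothetical inputs, chosen only to illustrate the argument.
POPULATION_TURNOUT = 0.60       # assumed official turnout rate
RESPONDENT_TURNOUT = 0.85       # assumed true turnout among survey respondents
MATCH_RATE_AMONG_VOTERS = 0.70  # assumed share of voting respondents whose record is found


def validated_turnout(respondent_turnout, match_rate_among_voters):
    """Turnout a validation study would report when unmatched voters count as nonvoters."""
    return respondent_turnout * match_rate_among_voters


def miscoded_voters(respondent_turnout, match_rate_among_voters):
    """Share of all respondents who voted but are coded as nonvoters."""
    return respondent_turnout * (1.0 - match_rate_among_voters)


if __name__ == "__main__":
    validated = validated_turnout(RESPONDENT_TURNOUT, MATCH_RATE_AMONG_VOTERS)
    print("self-report error:", round(RESPONDENT_TURNOUT - POPULATION_TURNOUT, 3))
    print("validated error:", round(validated - POPULATION_TURNOUT, 3))
    print("voters miscoded:", round(miscoded_voters(RESPONDENT_TURNOUT, MATCH_RATE_AMONG_VOTERS), 3))
```

With these made-up inputs, the "validated" estimate lands within a point of the population rate even though roughly a quarter of respondents are miscoded as nonvoters, while the truthful self-reports look far less accurate in the aggregate.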

The higher rate of turnout among the ANES survey respondents than in the general public may be a result of biased survey non-response, or it could be the result of “conditioning” caused by completing several questionnaires about the election prior to election day. Past studies indicate that interviewing a person about politics increases the probability that he/she will vote (Greenwald et al. 1987; Greenwald et al. 1988; Groves, Cialdini, and Couper 1992; Mann 2005; Smith, Gerber, and Orlich 2003; Traugott and Katosh 1979). The design of the ANES 2008 Panel Study does not permit assessing the magnitude of such conditioning, so the explanation for high turnout remains an open question.

Also open is the question of whether other matching procedures can overcome the problems documented above. We evaluated a set of validation methods using a particular data set. Perhaps different results would be obtained if we used a different data set, focused on different states, or used different matching procedures. There are other ways to generate and evaluate TV methods that may be superior to ours, and we look forward to future work exploring such possibilities.

Ansolabehere and Hersh (2012) reported having found such a method. They analyzed turnout data sold by the firm Catalist. These investigators concluded that “the overreporting of turnout is attributable to misreporting rather than to sample selection bias” (2012, 438). However, Ansolabehere and Hersh did not explore the accuracy of registration rates and did not describe the details of the matching methods that yielded their data.

We have asked Catalist, as well as many other companies that sell validated turnout data, to describe their methods in enough detail to allow researchers to evaluate the companies’ claims about the relative accuracy of their data. However, Catalist and other such firms consider their matching methods to be proprietary. For example, at a National Science Foundation–sponsored workshop on “The Future of Survey Research” in 2012, Catalist’s representative was asked to describe their matching methods and declined to do so ( Blaemire 2012 ). Details of our attempts at obtaining such information from numerous firms are provided by Berent, Krosnick, and Lupia (2011 , 67–70, 88–90). To date, no distributor of this information has been willing to describe their procedures in as much detail as we have described our methods.

As a result, no scholar at present can accurately assess the extent to which the turnout estimates derived from commercially marketed TV data are a product of the kinds of record-matching errors identified in the investigation reported here. Hence, the scholarly community has no basis for understanding whether claims based on these commercial data reflect credible inferences from accurate data or whether they are the outcomes of the types of errors described in this paper. 17

We hope that in the future, research will live up to the standard of transparency that is vital for all of science and will implement TV methods in ways that can be rigorously evaluated by other scholars. Until there is greater transparency, it seems prudent to be cautious before presuming that TV estimates are more accurate than self-reports of registration and turnout.

## Conclusion

Although many scholars attribute survey overestimation of turnout to respondent lying, we have reported evidence for a different explanation in the ANES 2008–2009 Panel Study. Actual and self-reported turnout numbers were nearly identical among respondents whom we could match to a government record, suggesting high accuracy of the self-reports. So whereas the ANES 2008–2009 Panel Study overestimated turnout by almost 30 percentage points, only 6 percent of matched respondents had government records contradicting their claims of having voted, and some of this discrepancy could be due to errors in government records. Hence, respondent lying apparently contributed less to turnout overestimation than is commonly presumed.

Moreover, the seeming superiority of TV data over self-reports appears to have been an illusion caused by two biases. A downward bias comes from failures to match survey respondents to their government records. These failures generate implausibly low registration rate estimates. An upward bias comes from survey respondents turning out to vote at a higher rate than non-respondents (and telling the truth about their behavior when answering survey questions). The apparent accuracy of “validated” estimates is due to the downward bias being large and the upward bias being smaller.

This creates a dilemma for researchers hoping to identify and employ the most accurate measure of respondents’ turnout behaviors in their empirical investigations. On the one hand, self-reports lead a few respondents who did not vote to be erroneously coded as having turned out. On the other hand, government records lead many more respondents who did vote to be wrongly coded as not having turned out. The former inflates sample registration rates, and the latter attenuates those rates. We look forward to engaging with the research community to identify rigorous, transparent, and broadly applicable solutions to an important problem in the study of voting.

## Supplementary Data

Supplementary data are freely available online at http://poq.oxfordjournals.org/.

## References

Ansolabehere, Stephen, and Eitan Hersh. 2010. “The Quality of Voter Registration Records: A State-by-State Analysis.” Available at http://www.vote.caltech.edu/drupal/files/report/quality_of_voter_report_pdf_4c45d05624.pdf

Ansolabehere, Stephen, and Eitan Hersh. 2012. “Validation: What Big Data Reveal about Survey Misreporting and the Real Electorate.” Political Analysis 20:437–59.

Ansolabehere, Stephen, Eitan Hersh, Alan Gerber, and David Doherty. 2010a. “Voter Registration List Quality Pilot Study: Report on Detailed Results.” Available at http://votingtechnologyproject.org/sites/default/files/voter_registration_list_results_pdf_4c34b18160.pdf

Ansolabehere, Stephen, Eitan Hersh, Alan Gerber, and David Doherty. 2010b. “Voter Registration List Quality Pilot Study: Report on Methodology.” Available at http://votingtechnologyproject.org/sites/default/files/voter_registration_list_methodology_pdf_4c34b18186.pdf

Belli, Robert F., Michael W. Traugott, and Matthew N. Beckmann. 2001. “What Leads to Voting Overreports? Contrasts of Overreporters to Validated Voters and Admitted Nonvoters in the American National Election Studies.” Journal of Official Statistics 17:479–98.

Berent, Matthew K., Jon A. Krosnick, and Arthur Lupia. 2011. “The Quality of Government Records and Over-Estimation of Registration and Turnout in Surveys: Lessons from the 2008 ANES Panel Study’s Registration and Turnout Validation Exercises.” ANES Technical Report Series, No. nes012554.

Bernstein, Robert, Anita Chadha, and Robert Montjoy. 2001. “Over-Reporting Voting: Why it Happens and Why It Matters.” Public Opinion Quarterly 65:1–22.

Blaemire, Robert. 2012. “Linking Survey Data with Commercial Databases.” Transcript from National Science Foundation Conferences on “The Future of Survey Research.” Available at https://iriss.stanford.edu/sites/default/files/blaemire_transcript.pdf

Burden, Barry C. 2000. “Voter Turnout and the National Election Studies.” Political Analysis 8:389–98.

California Secretary of State. 2009. “Voter Participation Statistics by County.” Available at http://elections.cdn.sos.ca.gov/sov/2008-general/3_voter_part_stats_by_county.pdf

DeBell, Matthew, and Jon A. Krosnick. 2010. “Computing Weights for American National Election Study Survey Data.” Available at http://www.electionstudies.org/resources/papers/nes012427.pdf

DeBell, Matthew, Jon A. Krosnick, and Arthur Lupia. 2010. “Methodology Report and User’s Guide for the 2008–2009 ANES Panel Study.” Available at http://electionstudies.org/studypages/2008_2009panel/anes2008_2009panel_MethodologyRpt.pdf

Dyck, Joshua J., and James G. Gimpel. 2005. “Distance, Turnout, and the Convenience of Voting.” Social Science Quarterly 86:531–48.

Florida Division of Elections. 2009. “Voter Registration Statistics–By Election.” Available at http://election.dos.state.fl.us/voter-registration/statistics/elections.shtml

Gera, Katie, Jon A. Krosnick, and Matthew DeBell. 2011. “Overestimation of Voter Turnout in National Surveys: An Examination of Levels of Overestimation and Relevant Explanatory Factors.” Manuscript, Stanford University.

Gerber, Alan S., and Donald P. Green. 2000. “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94:653–63.

Gimpel, James G., and Jason E. Schuknecht. 2003. “Political Participation and the Accessibility of the Ballot Box.” Political Geography 22:471–88.

Green, Donald P., and Alan S. Gerber. 2005. “Recent Advances in the Science of Voter Mobilization.” Annals of the American Academy of Political and Social Science 601:6–9.

Greenwald, Anthony G., Catherine G. Carnot, Rebecca Beach, and Barbara Young. 1987. “Increasing Voting Behavior by Asking People if They Expect to Vote.” Journal of Applied Psychology 72:315–18.

Greenwald, Anthony G., Mark R. Klinger, Mark E. Vande Kamp, and Katherine L. Kerr. 1988. “The Self-Prophecy Effect: Increasing Voter Turnout by Vanity-Assisted Consciousness Raising.” Unpublished manuscript, University of Washington.

Groves, Robert M., Robert B. Cialdini, and Mick P. Couper. 1992. “Understanding the Decision to Participate in a Survey.” Public Opinion Quarterly 56:475–95.

Groves, Robert M., Mick P. Couper, Stanley Presser, Eleanor Singer, Roger Tourangeau, G. Piani Acosta, and Lindsay Nelson. 2006. “Experiments in Producing Nonresponse Bias.” Public Opinion Quarterly 70:720–36.

Haspel, Moshe, and H. Gibbs Knotts. 2005. “Location, Location, Location: Precinct Placement and the Costs of Voting.” Journal of Politics 67:560–73.

Mann, Christopher B. 2005. “Unintentional Voter Mobilization: Does Participation in Pre-Election Surveys Increase Voter Turnout?” Annals of the American Academy of Political and Social Science 601:155–68.

McDonald, Michael P. 2005. “Reporting Bias.” In Polling in America: An Encyclopedia of Public Opinion, edited by Benjamin Radcliff and Samuel Best. Westport, CT: Greenwood Press.

McDonald, Michael P. 2007. “The True Electorate: A Cross-Validation of Voter Registration Files and Election Survey Demographics.” Public Opinion Quarterly 71:588–602.

McDonald, Michael P. 2009. United States Elections Project [2008 general election turnout rates]. Available at http://www.electproject.org/2008g

New York State Board of Elections. 2009. “2008 Election Results.” Available at http://www.elections.ny.gov/NYSBOE/elections/2008/General/PresidentVicePresident08.pdf

North Carolina State Board of Elections. 2009. “2008 General Election.” Available at http://results.enr.clarityelections.com/NC/7937/21334/en/summary.html

Ohio Secretary of State. 2009. “Voter Turnout: November 4, 2008.” Available at http://www.sos.state.oh.us/sos/elections/Research/electResultsMain/2008ElectionResults/turnout_111808.aspx

Pasek, Josh. 2012. “Linking Knowledge Networks Web Panel Data with External Data.” Paper presented at the Future of Survey Research: A Pair of Conferences at the National Science Foundation. Washington, DC. Available at https://iriss.stanford.edu/sites/default/files/pasek_transcript.pdf

Pennsylvania Department of State. 2009. “Voter Registration Statistics: November 4, 2008.” Available at http://www.dos.state.pa.us/portal/server.pt/community/voter_registration_statistics/12725

Presser, Stanley, Michael W. Traugott, and Santa Traugott. 1990. “Vote ‘Over’ Reporting in Surveys: The Records or the Respondents?” ANES Technical Report Series No. nes010157. Available at http://electionstudies.org/resources/papers/documents/nes010157.pdf

Sigelman, Lee, and Malcolm E. Jewell. 1986. “From Core to Periphery: A Note on the Imagery of Concentric Electorates.” Journal of Politics 48:440–49.

Smith, Jennifer K., Alan S. Gerber, and Anton Orlich. 2003. “Self-Prophecy Effects and Voter Turnout: An Experimental Replication.” Political Psychology 24:593–604.

Traugott, Michael W., and John P. Katosh. 1979. “Response Validity in Surveys of Voting Behavior.” Public Opinion Quarterly 43:359–77.

1. For example, the Florida registration form requests that applicants supply sex and race/ethnicity information, whereas the Ohio registration form does not.
2. McDonald (2007) reviewed 2004 general election records that were available from Delaware, California, the District of Columbia (DC), Florida, Iowa, Kentucky, Maryland, North Carolina, Ohio, Oklahoma, and South Carolina. He chose not to review records from Connecticut, Missouri, and New Jersey, “due to a high amount of missingness” in the data (592).
3. A respondent was eligible to participate in the 2008–2009 ANES Panel Study if he or she was a US citizen born on or before November 4, 1990, and lived in a household served by a landline telephone number at the time of recruitment.
4. See DeBell, Krosnick, and Lupia (2010) for explanations of eligibility determination. Response rate calculations include attrition. For respondents who reported being registered, we analyzed data only from people who reported being registered in the same state in which they resided.
5. We selected the six states for two primary reasons. First, we were advised that these states had relatively accurate records. Second, we were able to obtain the records at little to no cost.
6. The records we obtained were requested on the dates listed in online appendix 1. We did not attempt to obtain government records that were current on the day of the 2008 general election for two reasons. First, recovering archived records for a specific past date required resources that some states were unwilling to allocate. Second, records recovered for a specific date may not necessarily be current or accurate as of that date. Some local jurisdictions in some states have taken up to 10 months to upload turnout data to the relevant state agency, by which time other local jurisdictions have purged records (McDonald 2007).
7. This is not a problem for samples drawn from lists of registered citizens.
8. Birthdate information in the California, Florida, New York, and Pennsylvania records included day, month, and year. Ohio records included only birth year, and North Carolina records included only age on the day the state made the records available. Addresses included apartment numbers.
9. A state’s VAP is the number of people residing in a state who were 18 years or older on election day 2008. State VAP estimates were taken from McDonald (2009).
10.
11. A respondent not matched to a government record using a particular method is treated as not registered, and as having not turned out, in the estimates that the method produces.
12. We also looked for patterns in the differences between official and estimated turnout rates across states. No method produced significant differences across states (see online appendix 2).
13. To estimate the proportion of US citizens, 18 years old or older, who lived in a household served by a landline and who were registered in each of the target states, we began with the number of people registered to vote as of November 2008, published by each state (California Secretary of State 2009; Florida Division of Elections 2009; New York State Board of Elections 2009; North Carolina State Board of Elections 2009; Ohio Secretary of State 2009; Pennsylvania Department of State 2009) and adjusted the published statistics (as described earlier in the text and in online appendix 4) to produce “true” population estimates.
14. Respondents unmatched to a government record using a particular method are considered not registered according to that method.
15. This does not necessarily indicate that a respondent’s self-report is incorrect. A respondent may have cast an absentee ballot that was rejected, and the respondent may be unaware that the ballot was rejected.
16. No respondents who reported not turning out were matched to a government record indicating that they did turn out.
17. Recent research provides an additional basis for concern about these data. Catalist uses data from a company that maintains a database of consumer records to augment registration and turnout records (see online appendix 5). Pasek (2012) compared consumer records purchased from three companies to self-reports from those consumers and found substantial discrepancies between self-reports and the commercially provided records, leading to the conclusion that commercially provided records “do not seem particularly accurate.” Because Catalist does not supply researchers with enough information to evaluate the extent to which their estimates are similarly afflicted, Pasek’s (2012) findings raise further questions about the suitability of the commercially provided data for scientific research.
18. “County” was replaced with “Washington, DC,” “Parish,” or “Borough” if the respondent lived in Washington, DC, or a state with parishes or boroughs instead of counties.
19. Each respondent selected his or her registration county and state from pulldown menus.
20. Four respondents who reported early turnout in the October survey reported having not turned out in the November survey. These respondents were coded as having voted.

### Appendix. Registration and Turnout Self-Reports

##### REGISTRATION

The January, February, June, and September 2008 waves of the ANES 2008–2009 Panel Study included questions about registration status:

“Are you registered to vote, or not?” (Yes, registered to vote; No, not registered; Don’t know)

If registered: “Your residence is located in [county]. Are you registered to vote in [county] or somewhere else?” 18 (Registered in [county]; Registered somewhere else)

If somewhere else: “In what county and state are you registered?” 19

The analyses described in this paper used each respondent’s latest answer to each of these questions. That is, self-reported registration was taken from the September 2008 survey for respondents who answered the questions during that month. Answers from the June survey were used for respondents who did not answer the questions in September. Similarly, answers from the January or February surveys were used for respondents who did not answer the questions during a later month. Respondents who did not answer the registration questions during any of the surveys are not included in the analyses we report.
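The "latest answer" rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analyses: the function name, the wave labels, and the dictionary layout are hypothetical, and the sketch assumes that a February answer takes precedence over a January answer, in keeping with the "latest answer" wording.

```python
from typing import Mapping, Optional

# Waves ordered from latest to earliest (labels are hypothetical).
# Assumption: February is preferred over January under the "latest answer" rule.
WAVES_LATEST_FIRST = ("september", "june", "february", "january")


def latest_answer(answers_by_wave: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the answer from the most recent wave in which the respondent answered.

    A missing key or a value of None means the question was not answered
    in that wave. Returns None if the question was never answered, in which
    case the respondent would be dropped from the analyses.
    """
    for wave in WAVES_LATEST_FIRST:
        answer = answers_by_wave.get(wave)
        if answer is not None:
            return answer
    return None
```

For example, a respondent who skipped the September questions but answered in June and January would be assigned the June answer.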

##### TURNOUT

Turnout in the 2008 general election was measured using questions asked in the October and November surveys. The October survey asked:

“This question is not about the primary elections and caucuses that were held a few months ago. Instead, we’d like to ask you about the election for President to be held on November 4, in which [BARACK OBAMA / JOHN MCCAIN] is running against [JOHN MCCAIN / BARACK OBAMA]. Have you already voted in that election, or not?” (ANSWER CHOICES: “Have already voted in that election” and “Have not voted in that election”)

The order in which the major party candidates were presented was randomly determined for each respondent.

In the November survey, respondents were asked:

“The next few questions are about the presidential election that was held on November 4.

In asking people about elections, we often find that a lot of people were not able to vote because they weren’t registered, they were sick, they didn’t have time, or something else happened to prevent them from voting. And sometimes, people who usually vote or who planned to vote forget that something unusual happened on election day one year that prevented them from voting that time. So please think carefully for a minute about the election held on November 4, and other past elections in which you may have voted, and answer the following questions about your voting behavior.

Which one of the following best describes what you did in this election?

Definitely did not vote.

Definitely voted in person at a polling place on election day.

Definitely voted in person at a polling place before election day.

Definitely voted by mailing a ballot to elections officials before election day.

Definitely voted in some other way.

Not completely sure whether you voted or not.”

Respondents who said that they were not completely sure whether they voted were asked a follow-up question:

“If you had to guess, would you say that you probably did vote in the election, or probably did not vote in the election?” (“Probably voted” and “Probably did not vote”)

Respondents were coded as having turned out if they selected “Have already voted in that election” during the October survey or selected “Definitely voted in person at a polling place on election day,” “Definitely voted in person at a polling place before election day,” “Definitely voted by mailing a ballot to elections officials before election day,” “Definitely voted in some other way,” or “Probably voted” during the November survey. Among the people who answered turnout questions in both the October and November surveys (n = 2,561), only those who selected “Have not voted in that election” during the October survey and “Definitely did not vote” or “Probably did not vote” during the November survey were labeled as having not turned out. 20 Of the 3,049 respondents, 84 percent answered both the October and November turnout questions. An additional 5 percent answered only the November turnout question, and 4 percent answered only the October question. The remaining 7 percent did not answer the turnout question during either survey. The respondents who did not answer the registration questions during any of the surveys, and respondents who reported having not turned out during the October survey and did not answer the turnout questions in the November survey, are not included in the analyses we report.
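The turnout coding rule above can be expressed as a small function. The sketch below is illustrative only: the function and constant names are hypothetical, and the November input is assumed to be the respondent's final answer, with the follow-up guess substituted for “Not completely sure.” The text does not say how respondents who answered only the November question and reported not voting were coded, so this sketch leaves them uncoded (`None`) instead of guessing.

```python
from typing import Optional

OCT_VOTED = "Have already voted in that election"
OCT_NOT_VOTED = "Have not voted in that election"

# November answers (after any follow-up) that count as having turned out.
NOV_VOTED = {
    "Definitely voted in person at a polling place on election day",
    "Definitely voted in person at a polling place before election day",
    "Definitely voted by mailing a ballot to elections officials before election day",
    "Definitely voted in some other way",
    "Probably voted",
}
NOV_NOT_VOTED = {"Definitely did not vote", "Probably did not vote"}


def code_turnout(october: Optional[str], november: Optional[str]) -> Optional[bool]:
    """Return True (turned out), False (did not turn out), or None (not coded).

    None stands for an unanswered question. A reported vote in either
    survey is enough to code the respondent as having turned out.
    """
    if october == OCT_VOTED or november in NOV_VOTED:
        return True
    # Coded as not turned out only when both surveys report no vote.
    if october == OCT_NOT_VOTED and november in NOV_NOT_VOTED:
        return False
    # Includes October non-voters with no November answer (excluded per the
    # text) and November-only non-voters (assumption: left uncoded here).
    return None
```

Under this rule, a respondent who reported early turnout in October but no turnout in November is coded as having voted, which matches the treatment of the four such respondents described in note 20.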