Abstract

Political commentators have offered evidence that the “polling misses” of 2016 were caused by a number of factors. This project focuses on one explanation: that likely-voter models—tools used by preelection pollsters to predict which survey respondents are most likely to make up the electorate and, thus, whose responses should be used to calculate election predictions—were flawed. Models employed by different pollsters vary widely, but they are difficult to study systematically because they are often considered part of pollsters’ methodological black box. In this study, we use Cooperative Congressional Election Study surveys since 2008 to build a probabilistic likely-voter model that takes into account not only respondents’ stated intentions to vote, but also other demographic variables that are consistently strong predictors of both turnout and overreporting. This model, which we term the Perry-Gallup and Demographics (PGaD) approach, shows that the bias and error created by likely-voter models can be reduced to a negligible amount. This likely-voter approach uses variables that pollsters already collect for weighting purposes and thus should be relatively easy to implement in future elections.

Forecasting who will actually vote is a particularly challenging problem because, unlike other estimation efforts, pollsters must make an inference about a population (all eventual voters) that does not yet exist. Unlike the standard sampling problem of identifying an existing population, selecting a suitable sampling frame that comes close to the true list of all population members, and conducting some variant of probability sampling, forecasting involves two distinct, but not independent, stages of estimation. Pollsters must identify the subset of citizens who will vote even though people do not correctly identify themselves as members of this subset. Added to this challenge is the fact that pollsters often treat their likely-voter models as proprietary, which makes it difficult for scholars to adjudicate among these models. For example, the American Association for Public Opinion Research (AAPOR) 2016 election postmortem report noted that likely-voter models may have contributed to polling errors, but due to the lack of available data the authors were unable to conduct a full exploration into this factor (Kennedy et al. 2018).

This paper builds on previous work (e.g., Keeter, Igielnik, and Weisel 2016) to develop a framework that incorporates indicators of the likelihood of voting and will be reasonably straightforward for pollsters to implement in elections. The study leverages the Cooperative Congressional Election Study (CCES) surveys taken during presidential and midterm election years over the last decade to build and evaluate different likely-voter models. Ultimately, the data argue against the common practice of hard classification of likely versus unlikely voters and in favor of weighting eligible voters’ potential vote choice by an estimated probability that they will vote, using a combination of respondent self-reported intention to vote, voting history, and demographic data that pollsters already collect. This approach, which we call PGaD (Perry-Gallup and Demographics), performs best at assessing individuals’ likelihood of voting, resulting in lower bias and improved accuracy in estimated candidate vote shares.

The Unique Problem of Likely Voters

Pollsters face a unique challenge when it comes to the task of making inferences about an electorate that has not yet formed. A survey firm’s typical approach is to sample from some larger population (e.g., adults or registered voters), then attempt to identify the subgroup from among that set of individuals who will actually vote. In practice, attempts to estimate a result for likely voters layer the problem of measurement error on top of the challenges of sampling error that pollsters already face. This is because pollsters are attempting to make an inference about a subgroup of the population that (1) often has different preferences than the larger population and (2) does not accurately identify itself when asked.

To illustrate the second point, table 1 indicates responses to a question on the 2016 CCES asking respondents whether they intend to vote in the upcoming general election. About four in five respondents report that they will definitely vote or have already voted (early). Yet, only about two-thirds of this group could later be matched to a valid vote record. Others have estimated that one out of every four nonvoters reports that they will vote (Freedman and Goldstein 1996). Such a degree of misidentification prevents pollsters from effectively using the question to filter out nonvoters. Indeed, if between 25 percent and 33 percent of respondents misreported their employment status, we would have little confidence in using surveys to draw inferences about the unemployed.

Table 1.

Validated turnout by intention to vote as reported in 2016 CCES

Do you intend to vote in the 2016 general election?    Validated turnout (%)
Yes, definitely (78% of the sample)                    64
I already voted (3%)                                   68
Probably (7%)                                          29
Undecided (5%)                                         18
No (7%)                                                 9

Note.—N = 64,600 respondents to the 2016 CCES. Percentages calculated using poststratification sampling weights.


The people most likely to overreport voting (to say they will vote when they will not) are “those who are under the most pressure to vote” (Bernstein, Chadha, and Montjoy 2001, p. 41). For example, Ansolabehere and Hersh (2012, p. 449) find that “well-educated, high-income partisans who are engaged in public affairs, attend church regularly, and have lived in the community for a while” retrospectively misreport their voting behavior.

Misreporting poses a problem for pollsters because it introduces systematic error—people who vote more often are demographically and politically different than people who vote less frequently (Keeter, Igielnik, and Weisel 2016)—despite a lack of substantial differences between self-reported voters and self-reported nonvoters (Rogers and Aida 2014). Since many of the variables tied to misreporting also are associated with partisanship, polls that do not attempt to infer likely voters generally overestimate support for Democrats (Newport 2000). Thus, simple reliance on self-reported voting intention can produce biased estimates. Furthermore, volatility in candidate preference may be greater among likely and unlikely voters, taken separately, than would be indicated when considering all registered voters in the aggregate, with preferences across these groups not necessarily moving in tandem (Erikson, Panagopoulos, and Wlezien 2004). To illustrate, among people who said they would definitely vote or had already voted in the 2016 preelection wave of the CCES, Hillary Clinton held a 7.6-point margin over Donald Trump. However, among those confirmed to have eventually voted, her margin was just 4.5 points. Thus, the issue that pollsters face is clear: Respondents do not accurately indicate whether they will actually become voters, and this misreporting is not independent of candidate preference.1

Existing Approaches to Identifying Likely Voters

Most likely-voter models utilize responses to a combination of questions about an individual’s vote intent, voting history, and interest in politics. Approaches vary, from a single vote-intention question to composite scores based on several questions (Keeter, Igielnik, and Weisel 2016), including ones on past voting behavior (Freedman and Goldstein 1996; Murray, Riley, and Scime 2009) and others gauging knowledge about the voting process, such as where the respondent’s polling place is (Kiley and Dimock 2009). One prominent example is the Perry-Gallup index, based on seven questions capturing how much thought respondents have given the upcoming election, whether they have ever voted in their current district, how closely they follow government and public affairs, frequency of voting, their vote intention for the upcoming election (two questions), and self-reported voting in the previous election (Keeter, Igielnik, and Weisel 2016). Respondents are assigned points based on their responses to those questions, and those who achieve a certain number of points are considered likely voters. Deterministic likely-voter models—typically called “threshold” or “cutoff” models—create a likely-voter score for each respondent and then create a decision rule for whether to include or exclude a response when calculating election predictions.

On one hand, the simplicity of deterministic models is a virtue. Some respondents end up voting and some will not, so it seems sensible to model this reality by including responses of those most likely to vote and excluding responses from those least likely to vote. Deterministic models are easy to explain to a broad audience because they resemble how an election works. On the other hand, cutoff approaches suffer from the loss of information inherent in labeling each respondent as either clearly a voter or a nonvoter. Probabilistic models, which are less frequently implemented by political polling firms, offer clear benefits. Each respondent is assigned an estimated probability that they will vote. This probability is then used as a weight: Responses from those who are more likely to vote are weighted more heavily than responses from those who are unlikely to vote, but all are included in the election prediction. Pollsters who employ such a strategy generally use the same predictors from deterministic models to predict validated voting in previous election cycles, applying the resulting model to current survey data to generate a predicted probability that each respondent will vote in the upcoming election (Keeter, Igielnik, and Weisel 2016).

Probabilistic approaches serve as a compromise between registered-voter and likely-voter methods; the preferences of all respondents are utilized only to an extent proportional to their assessed probability of actually voting. This is a principled way to actually estimate the behavior of a yet-to-be-determined population of interest. Put simply, if we want to estimate the proportion of voters who will vote for Candidate X, we should be ascertaining the probability that each eligible voter will vote for Candidate X and then take the mean over all voters. But the probability of voting for Candidate X is really a joint probability of voting and specifically voting for that candidate. Letting V = the event a person votes and X = the event of preferring candidate X, and applying the general product (or chain) rule, a basic probabilistic principle, we have P(V, X) = P(X|V)P(V). That is, the joint probability that a person will vote and cast that vote for Candidate X is simply the probability of casting a vote for Candidate X given they vote at all, multiplied by the probability that they vote. Since this cannot be estimated for every eligible voter, we sample eligible voters and apply poststratification sample weights in the usual way. From this perspective, the estimated probability of voting is not just another kind of weight to be added to sampling weights; instead, it is an essential component of a simple model for the as-yet-unseen population.
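
To make this concrete, the estimator this logic implies can be written as follows (a sketch consistent with the weighting formula given later in table 4), where $w_i$ is respondent $i$'s poststratification weight, $\widehat{P}(V_i)$ is the estimated probability that respondent $i$ votes, and the indicator $\mathbf{1}(X_i)$, standing in for $P(X_i \mid V_i)$, equals one when respondent $i$ states a preference for Candidate X:

```latex
\widehat{\Pr}(\text{vote for } X \mid \text{voters}) \;=\;
  \frac{\sum_{i=1}^{n} w_i \, \widehat{P}(V_i)\, \mathbf{1}(X_i)}
       {\sum_{i=1}^{n} w_i \, \widehat{P}(V_i)}
```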

A Demographics-Informed Probabilistic Model of Likely Voters

Although probabilistic models are seen as theoretically superior to deterministic models, how they fare empirically remains to be seen. To improve their prospects for success, we propose augmenting the set of questions typically used to assess likelihood of voting. In particular, pollsters regularly ask respondents a number of questions for the purpose of poststratification weighting that can also serve as powerful predictors of turnout. A vast literature demonstrates the primacy of socioeconomic status (education in particular) and age in differentiating voters from nonvoters (Verba and Nie 1972; Wolfinger and Rosenstone 1980; Blais 2006; Leighley and Nagler 2013). Political activism, ideological extremism, and race are related to likelihood of voting as well (Verba and Nie 1972). More importantly, conditioning on demographic features allows conventional features such as vote intention to be more informative, as propensity to accurately assess one’s own future voting behavior may depend on these features.

Thus, pollsters should incorporate these features into their likely-voter models. Specifically, we propose a probabilistic likely-voter model that not only uses the typical items (such as those from the Perry-Gallup index), but also adds demographic information about respondents such as age, education, race, income, gender, and strength of partisanship. Since these variables are consistently related to turnout and may provide valuable context for responses to conventional indicators, they should help produce more precise likely-voter probabilities. Indeed, as shown later, this approach provides a substantial improvement over more commonly employed likely-voter models.

Design and Modeling Approaches

In order to build and assess the performance of likely-voter models, we use CCES surveys fielded during presidential (2016, 2012, and 2008) and midterm (2014 and 2010) elections over the past decade. The CCES has as its target population US citizens,2 and offers unique data that allow for testing likely-voter models.

First, the CCES employs vote validation, considered the gold standard for studying individual-level election turnout. After the election, CCES respondents are matched to Catalist’s national voter file database, consisting of over 240 million unique voting-age individuals compiled from state voter-registration files and commercial records. The Catalist database allows clients to identify with a high level of accuracy the individuals who have voted in particular elections and those who have not (Ansolabehere and Hersh 2012; Enamorado and Imai 2019). A validated vote record for each respondent is the key dependent variable for the probabilistic likely-voter models: Respondents are coded as voters if they were successfully matched to the voter file and have a record of voting in the election, and as nonvoters if they were matched to the file but lack a record of voting or were not matched to the file at all.
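
A minimal coding sketch of this dependent variable is below; `matched_to_file` and `voted_general` are hypothetical column names standing in for the Catalist match fields in a CCES extract, not the survey's actual field names.

```r
# Code the validated-vote dependent variable as described above: voters are
# respondents matched to the voter file with a record of voting; everyone else
# (unmatched, or matched without a vote record) is a nonvoter.
# 'matched_to_file' and 'voted_general' are hypothetical column names.
cces$validated_vote <- factor(
  ifelse(cces$matched_to_file %in% 1 & cces$voted_general %in% 1,
         "voter", "nonvoter"),
  levels = c("nonvoter", "voter")
)
```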

Berent, Krosnick, and Lupia (2016) raise concerns about whether vote validation provides a true picture of who voted in an election. Specifically, their own validation attempts lead them to suspect that many people who claim to vote but who are unmatched to a government record may in fact be voters who cannot be matched to the file for other reasons. However, subsequent studies have cast some doubt on this conclusion. For example, Jackman and Spahn (2019) and Enamorado and Imai (2019) find that overreporting is the most important factor accounting for high self-reported turnout rates in the ANES. Additionally, Enamorado, Fifield, and Imai (2018) find that probabilistic matching techniques (like those used by Catalist) are highly accurate, but that deterministic matching approaches such as those used by Berent, Krosnick, and Lupia (2016) are more prone to failed matches.3 While we acknowledge that some unmatched individuals may be voters, we suspect that this number is quite small and are confident that the validated vote provides a significantly better measure of turnout than self-reports.

Second, the CCES is a high-quality nationally representative large-N survey, which allows for generalizability of the sample not only to the national population, but also to the populations in each state. Across the five CCES surveys we consider, there are 263,535 observations. After missing values are removed, the final data set contains 259,940 observations across five election cycles. The state with the fewest total respondents over the five years considered (Wyoming) still includes nearly 500 valid observations across the several elections. Most states have thousands of observations across these elections.

Finally, the CCES asks a wide range of demographic and attitudinal questions that may be useful for identifying likely voters based on the literature about misreporting and turnout. The survey items include vote intention, vote choice, self-reported voter registration status and voting history, interest in politics, age, gender, education, race, income, and partisanship.

Despite these advantages, there is a small degree of inconsistency in the operationalization of the vote history variable. For 2012 and 2016, the indicator variable is coded 1 if a respondent reported that they voted in the previous presidential election and 0 if they were unsure, did not recall, or reported that they did not vote. In 2010 and 2014, it is coded similarly, using a respondent’s vote history in the previous presidential election rather than the previous midterm election. In 2008, respondents were not asked about their voting behavior in the 2004 presidential election, so whether or not they reported that they voted in a primary election or caucus in 2008 is used instead. Although this is not a perfect substitute, there is substantial correlation between the two questions in the 2016 sample.4

Our analysis compares several approaches to defining likely voters. For each approach, we evaluate how well the models predict individual-level turnout and how well the models produce accurate estimates of the vote margin for president in 2016 and for the House of Representatives (in the case of our analysis of the 2014 midterm elections). For the purposes of predicting 2016 turnout, we use data from 2008, 2010, 2012, and 2014 as the training data. For predicting 2014 turnout, we use 2008, 2010, and 2012 as the training data.

In examining how well each likely-voter approach approximates the vote margin, we operationalize that margin as the difference between the percentage of validated voters in the CCES preelection poll who said they intended to vote for the Democratic candidate minus the percentage who intended to vote Republican. This benchmark is preferred over actual election outcomes because our interest is in examining how well we can approximate the electorate’s preference as it stood when the poll was conducted. Last-minute shifts in vote preferences or faulty poststratification weighting might cause a poll to miss the final election outcome even if that poll perfectly predicted who would vote. Since the goal is to evaluate the effectiveness of each likely-voter model, eliminating these other sources of error from consideration is desirable. Focusing on the vote share among the respondents in our survey who actually voted is the best way to do this.

This study considers four approaches to modeling likely voters. In the first and simplest approach, vote intention responses are compared to a threshold value in order to identify likely voters. Anyone who claims that they will definitely vote or already have done so is considered a likely voter. Lowering the threshold to include those who say simply that they will “probably” vote provides a far less accurate prediction of our quantities of interest than limiting the pool of likely voters to those in the “already voted” and “definitely” categories.5

The second cutoff approach is a reformulation of Pew’s Perry-Gallup index. The CCES does not contain all of the questions that are used in the index, and the question wording varies for the items that do appear. As an approximation of the Perry-Gallup approach, we use vote intent, self-reported turnout in a previous election, and political interest from the CCES to create a version of this index. Together these questions capture five of the seven items on the Perry-Gallup index; the one dimension they do not capture is self-reported historical voting behavior, as the CCES does not ask whether a respondent has voted in their district before or about their voting frequency. In this approach, respondents are assigned points based on the following criteria: two points for those who reported that they already voted (early or absentee) and those who said they will “definitely” vote, and one point for those who will “probably” vote in the election. Respondents who reported that they voted in the previous election are awarded one point. Those who follow what is going on in government and public affairs “most of the time” are given two additional points, while those who follow “some of the time” are awarded one additional point. We make two further adjustments. We give respondents who report that they are registered to vote one point. Further, since respondents who are younger than 22 would not have had the chance to vote in the previous election, they are given one additional point. The minimum score in this version of the Perry-Gallup index is zero, while the maximum score is six. We create likely-voter subsets based on these scores and compute the accuracy of our predictions. Here, we focus on two different cutoffs: (1) including only those who score a 6, or (2) including anyone who scores either a 5 or 6. These two groups provide the most accurate estimates of turnout and vote margins among the possible cutoff points (but see the Supplementary Material for results from all possible cutoffs).
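
A sketch of this scoring scheme is below; the column names (`intent`, `voted_prev`, `pol_interest`, `registered`, `age`) are placeholders for the corresponding CCES items.

```r
# Modified Perry-Gallup index (0-6 points), following the scoring rules above.
perry_gallup_score <- function(d) {
  pts <- integer(nrow(d))
  # Vote intention: 2 points for "already voted" or "definitely", 1 for "probably"
  pts <- pts + ifelse(d$intent %in% c("I already voted", "Yes, definitely"), 2,
               ifelse(d$intent == "Probably", 1, 0))
  # Self-reported turnout in the previous election: 1 point
  pts <- pts + ifelse(d$voted_prev == 1, 1, 0)
  # Respondents under 22 were ineligible last time, so they get 1 point instead;
  # in practice they cannot also hold the vote-history point, keeping the max at 6
  pts <- pts + ifelse(d$age < 22, 1, 0)
  # Political interest: 2 points for "most of the time", 1 for "some of the time"
  pts <- pts + ifelse(d$pol_interest == "Most of the time", 2,
               ifelse(d$pol_interest == "Some of the time", 1, 0))
  # Registration: 1 point
  pts <- pts + ifelse(d$registered == 1, 1, 0)
  pts
}

# Cutoff versions: likely voters are those scoring 6, or those scoring 5 or 6
cces$pg_score <- perry_gallup_score(cces)
likely_6  <- cces[cces$pg_score == 6, ]
likely_56 <- cces[cces$pg_score >= 5, ]
```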

The final two models estimated are the two probabilistic likely-voter models. Both employ random forests, a powerful machine-learning tool that pools a large number of decision trees, each fit to a random subset of the data and offered a random subset of the predictor variables at each split, and that can be used to compute vote-propensity scores in much the same way that logistic regression can.6 The benefit of random-forest algorithms is twofold. First, because the individual decision trees on which they are based (classification trees in the current setting) employ bootstrap aggregation (or “bagging”)—randomly sampling the data and the available predictor variables—they avoid much of the bias that traditional decision-tree approaches encounter, which makes them especially useful for prediction. Second, individual classification trees tend to produce predictions that, while unbiased, suffer from high variance. By pooling the predictions of many such trees—that is, a “random forest”—the resulting estimates tend to be both unbiased and low variance. The random-forest algorithm outputs a predicted class probability for each respondent; that is, each respondent is assigned a probability that they will vote. We then weight each respondent by that probability (as well as by their poststratification sampling weight). Thus, an individual who received a vote-propensity score of 0.2 would contribute only one-fourth as much influence to the likely-voter estimates as an individual with a vote-propensity score of 0.8.

The first probabilistic random-forest model that we produce uses the variables that constitute the aforementioned modified Perry-Gallup index. The second model includes all of these items, but adds a set of common demographic variables that most pollsters routinely collect for the purposes of weighting and analysis. These variables include age, race, education, gender, family income, and partisan strength.
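
A minimal sketch of fitting these two models with the randomForest package is below. The variable names (`intent`, `voted_prev`, `registered`, `pol_interest`, `age`, `race`, `educ`, `gender`, `faminc`, `pid_strength`) and the data-frame names (`train` for the pooled earlier CCES waves, `test2016` for the 2016 wave) are placeholders, not the survey's actual field names.

```r
library(randomForest)

# Perry-Gallup-only model
pg_formula <- validated_vote ~ intent + voted_prev + registered + pol_interest

# Perry-Gallup + Demographics (PGaD) model
pgad_formula <- update(pg_formula, . ~ . + age + race + educ + gender +
                         faminc + pid_strength)

set.seed(2016)
fit_pgad <- randomForest(pgad_formula, data = train, ntree = 500)

# Predicted class probabilities; the "voter" column is the vote-propensity score
test2016$vote_propensity <- predict(fit_pgad, newdata = test2016,
                                    type = "prob")[, "voter"]
```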

For the national validation, we pool all of the observations together as if they were fielded as part of a national poll. The models’ performance is evaluated using the full national sample in 2016, and the random-forest models are trained using all of the data from the previous CCES surveys. We also estimate state models, evaluating each type of likely-voter model 51 times (once for each state plus the District of Columbia). We train each state’s model with historical data from that state only to isolate its unique characteristics.
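
A sketch of the state-level version follows, under the same placeholder names as above plus a hypothetical `state` column: each state's model is trained only on that state's respondents from earlier CCES waves and then applied to that state's 2016 sample.

```r
# Train one model per state on that state's historical data, then score its
# 2016 respondents.
state_results <- lapply(sort(unique(test2016$state)), function(st) {
  fit_st  <- randomForest(pgad_formula, data = subset(train, state == st),
                          ntree = 500)
  test_st <- subset(test2016, state == st)
  test_st$vote_propensity <- predict(fit_st, newdata = test_st,
                                     type = "prob")[, "voter"]
  test_st
})
state2016 <- do.call(rbind, state_results)
```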

Results

Table 2 summarizes how each of the various likely-voter approaches fares in predicting the 2016 outcomes, both nationally and by state. The first column presents implied turnout, the percent of eligible voters expected to turn out under each model.7 For the cutoff models, this is a straightforward calculation indicating the share of respondents who meet a particular criterion. For example, 70.78 percent of the 2016 CCES respondents indicated that they had already voted or would definitely vote. For the probabilistic models, we take the average of the turnout propensity scores to calculate the implied turnout among the sample. As a baseline, 55.11 percent of the 2016 CCES respondents were validated voters.

Table 2.

Democratic bias in survey estimates of presidential vote preference based on different approaches to defining likely voters, 2016

Approach                                  Implied turnout (%)   National bias (%)   Avg. bias by state (%)   Avg. absolute error by state (%)   Predictive accuracy (A)
Cutoff approaches
  Already voted + will definitely vote            70.78               3.59                 2.46                     4.44                            –0.084
  Perry-Gallup 6s                                 41.66              –1.70                –2.76                     6.19                             0.041
  Perry-Gallup 6s + 5s                            60.26               2.05                 1.18                     3.98                            –0.046
Probabilistic approaches
  Perry-Gallup                                    66.55               3.29                 1.75                     4.17                            –0.075
  Perry-Gallup + Demographics                     59.86              –0.19                –0.36                     4.02                             0.007

Note.—Based on data provided by http://www.electproject.org, the turnout rate among the target population of nonincarcerated US citizens was 59.4 percent. A total of 55.1 percent of 2016 CCES respondents were validated voters.


The second and third column entries indicate observed Democratic bias (Republican bias if negative) corresponding to each likely-voter model as applied to national and state polls, respectively. Democratic bias is simply the difference between the margin by which a set of likely voters preferred Clinton over Trump and Clinton’s actual margin over Trump among all validated voters in the sample. Recall that Clinton’s margin over Trump among validated voters in the sample—rather than the actual election outcome—serves as our baseline and allows us to compare likely-voter models with respect to exhibited bias. Using the margin among validated voters in the sample allows us to eliminate other reasons that the survey may have produced an inaccurate result, such as sample composition or systematic changes in vote preferences after the survey was conducted. Among validated voters in the 2016 CCES, Clinton enjoyed a 2.92 percentage point advantage over Trump, compared with her actual margin of 2.1 percentage points. The fourth column reports the average absolute error for the state-by-state modeling. Finally, we include A, a measure of predictive accuracy introduced by Martin, Traugott, and Kennedy (2005) and applied in the most comprehensive study to date of 2016 preelection poll accuracy and bias (Panagopoulos, Endres, and Weinschenk 2018). A serves as a standardized metric that is comparable across different election outcomes.8
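
One way to compute A, following our reading of Martin, Traugott, and Kennedy (2005) and the odds-based interpretation used in the conclusion, is sketched below; the numerical shares in the example are hypothetical and for illustration only.

```r
# Predictive accuracy A: the natural log of the ratio of the
# Republican:Democratic odds implied by a likely-voter model to the same odds
# among validated voters in the sample. Negative values indicate a
# pro-Democratic bias; values near zero indicate little bias.
predictive_accuracy_A <- function(rep_model, dem_model, rep_valid, dem_valid) {
  log((rep_model / dem_model) / (rep_valid / dem_valid))
}

# Hypothetical two-party shares among a model's likely voters vs. validated voters
predictive_accuracy_A(rep_model = 44.0, dem_model = 47.6,
                      rep_valid = 44.5, dem_valid = 47.4)
```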

BASIC CUTOFF APPROACHES

The most basic approach to determining likely voters is simply to ask survey respondents if they intend to vote and then identify as voters those who say that they will definitely vote or, in a less restrictive case, those who will probably or definitely vote. Table 2 shows the results from identifying only those who said that they had already voted or would definitely vote as likely voters. Even with this most restrictive definition, 71 percent of CCES respondents would be considered likely voters. Additionally, this simple determination of likely voters leads to overstating Clinton’s national margin by 3.59 percentage points.

The Perry-Gallup approach to establishing a cutoff value for who should qualify as a likely voter relies on multiple indicators, but combines them in an ad hoc manner. The most restrictive Perry-Gallup cutoff method, “Perry-Gallup 6s” in table 2, counts as likely voters only those scoring the maximum score of 6 on the index. Under this approach, just 41.66 percent of CCES respondents are identified as likely voters. With this restrictive approach, Clinton’s margin is estimated to be 1.70 percentage points lower than it actually was. The more inclusive approach (counting those who score either a 5 or a 6) leads to 60.26 percent implied turnout. With this group of likely voters, Clinton’s margin is predicted to be 2.05 percentage points larger than it was among validated voters in the survey.

The third and fourth columns in table 2 show the average Democratic bias and the average absolute error across the states when we generate likely-voter sets state-by-state. For the first two cutoff models, the average bias exceeds two percentage points (the first favoring Democrats and the second favoring Republicans), while it drops to just 1.18 points for the lower Perry-Gallup threshold (6s + 5s). The average absolute error is also smallest for the more inclusive Perry-Gallup cutoff model. Among the cutoff approaches analyzed, the Perry-Gallup versions work better than relying only on intention to vote, though the more restrictive cutoff appears superior for the national estimate in this case, while the less restrictive version produces less bias and error in the state-by-state analysis.

PROBABILISTIC MODELS

We turn next to probabilistic approaches to addressing variation in likelihood of voting. We show results from two different types of probabilistic models—one based on Perry-Gallup variables alone and another that adds demographic predictors. We use the random-forest approach to predict validated turnout in previous years, then implement the fitted model to estimate each respondent’s probability of voting in the test year. The resulting vote-propensity scores are then incorporated as an additional weight in estimating the vote margin between Clinton and Trump.
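
A sketch of how the propensity scores enter the margin estimate is below; column names are placeholders, and the computation mirrors the weighting formula given in table 4.

```r
# Each respondent's poststratification weight is multiplied by the
# vote-propensity score, and the Clinton-Trump margin is computed with these
# combined weights.
w <- test2016$poststrat_weight * test2016$vote_propensity
clinton <- sum(w * (test2016$pres_choice %in% "Clinton")) / sum(w)
trump   <- sum(w * (test2016$pres_choice %in% "Trump"))   / sum(w)
margin  <- 100 * (clinton - trump)   # estimated Clinton margin, in points
```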

Table 2 shows that the Perry-Gallup probabilistic approach does not improve upon either Perry-Gallup cutoff method for the 2016 election. Indeed, using the Perry-Gallup items in a probabilistic model vastly overestimates implied turnout (66.55 percent) and produces a pro-Democratic bias of more than three percentage points, worse than either of the Perry-Gallup cutoff models.

Finally, we show the results from our proposed PGaD approach, which uses the Perry-Gallup items plus a variety of demographic variables to generate a vote-propensity score for each respondent. Table 2 demonstrates the value of this approach. First, this model produces an implied turnout of 59.86 percent, closer to the validated turnout among 2016 CCES respondents (55.11 percent) than any other approach under consideration and nearly matching the 59.4 percent turnout rate in the target population. More importantly, the bias produced by this approach is negligible. Indeed, the national vote margin from the Perry-Gallup plus demographics model is just two-tenths of a percentage point off the actual margin among voters in the 2016 CCES. In the state-by-state analysis, the PGaD model’s average bias is just 0.36 points in magnitude, with the average absolute error across the states a reasonable 4.02 points. Finally, A is much closer to zero for PGaD than for any of the other approaches, at 0.007; the others range in magnitude from 0.041 to 0.084. According to Panagopoulos, Endres, and Weinschenk (2018, p. 161), the average A for 2016 preelection polls was 0.024.

EVALUATING THE PROBABILISTIC MODELS

Figure 1 evaluates how well each of the probabilistic models fares at predicting actual turnout by plotting respondents’ vote-propensity scores against actual turnout rates. An extremely accurate model would produce a line that fell exactly along the dashed 45-degree diagonal line. The left-side plot shows why only using the Perry-Gallup items fails to produce particularly accurate results. This model struggles to sort voters from nonvoters in the middle of the vote-propensity scale. In fact, turnout rates were actually higher among respondents receiving a vote-propensity score of 40 than they were for those receiving a propensity score of 60. The model also underperforms among those to whom it assigns a high probability of voting, perhaps due to the fact that people tend to overreport on many of the key Perry-Gallup items. The line for the Perry-Gallup plus demographics (PGaD) model tracks much closer to the 45-degree diagonal and is monotonically increasing. Overall, the model slightly underpredicts turnout among those assigned a propensity score of 40 or lower and slightly overpredicts for those at 50 or above.

Figure 1. Validated turnout rate based on respondent’s turnout propensity score for 2016 CCES test. Propensity scores generated using random-forest models.

Figure 2 shows the distribution of predicted vote-propensity scores for 2016 CCES respondents generated by each model, providing additional evidence of PGaD’s value. Using Perry-Gallup items only, the resulting voting-propensity scores cluster near 0 and 1, with very few respondents in between; that is, little uncertainty is admitted. Adding demographic features to the model provides much more nuance. Indeed, the number of respondents assessed a probability of 1 (certainty) of voting drops by half and cases are more smoothly distributed across the range of possible propensities. Overall, the Perry-Gallup items alone provide a fairly blunt instrument that fails to reflect and incorporate uncertainty. PGaD better capitalizes on the probabilistic framework.

Figure 2. Distribution of propensity scores using each probabilistic model, 2016.

Which items are most influential in helping us identify likely voters? One useful metric produced by a random forest is the importance of each variable for predicting the outcome. Here we use the Mean Decrease in Accuracy metric. The basic logic of this measure is to see how much less predictive the random forest is at correctly classifying observations when each variable is randomly perturbed. If the predictive accuracy of the model drops considerably when a variable’s values are randomly reassigned, then that variable is deemed to be more important for predicting the dependent variable.
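
With the randomForest package, this measure is available when the forest is grown with permutation importance enabled; a minimal sketch (using the same placeholder names as above) follows.

```r
# Mean Decrease in Accuracy requires importance = TRUE when the forest is
# grown; type = 1 selects the permutation-based (accuracy) measure.
fit_pgad <- randomForest(pgad_formula, data = train, ntree = 500,
                         importance = TRUE)
importance(fit_pgad, type = 1)   # mean decrease in accuracy for each variable
varImpPlot(fit_pgad, type = 1)   # a plot along the lines of figure 3
```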

Figure 3 shows variable importance plots for both random-forest models that we estimated for the 2008, 2010, 2012, and 2014 data sets. The plot on the left shows the importance of each of the Perry-Gallup items used in our simple model. The plot on the right includes this set of variables as well as the demographic items from our more complex model. Larger values on the x-axis indicate that the variable is more predictive of being a validated voter.

Figure 3. Variable importance plots showing importance of each item to predicting validated turnout, 2016.

Note that a respondent’s stated intention to vote is the most important predictor in both models. As noted, this is largely because the question is a very good predictor of nonvoting; if people say that they do not intend to vote, it is very likely the case that they will in fact be a nonvoter. However, the vote-intention question is less powerful as a predictor of voting, since many people who say that they will definitely vote ultimately fail to do so. Among the Perry-Gallup items, registration status is the next most predictive variable, followed by interest in politics, then turnout history.

However, recall that the Perry-Gallup items alone fail to sufficiently distinguish between voters and nonvoters, producing a fairly large pro-Democratic bias in 2016. By contrast, adding demographics to the model reduced the bias considerably and decreased the implied turnout rate toward the actual value among 2016 CCES respondents. The right-side plot in figure 3 shows that among all of the additional demographic items, race and age were the most important for helping discern voters from nonvoters (and those two variables were second only to vote intention in terms of overall importance).

Why does adding demographic variables produce such an improvement in the probabilistic model? As shown, the vote-intention question is quite useful at identifying nonvoters: Only 9 percent of individuals who say they do not plan to vote nonetheless show up as validated voters. Unfortunately, other responses to the vote-intention question are not nearly as predictive. Only 29 percent of people who report that they will probably vote actually do so, and not even two in three who say they will definitely vote actually turn out. This is where the demographics help the model—by separating out people whose reported intention to vote is a stronger (or weaker) signal about what they will actually do. The hierarchical nature of decision trees allows them to capture nonlinear interactions where these have great predictive value and leave them out where they do not. It is worth emphasizing that the accuracy of models trained on data from previous elections should not be especially sensitive to changes in demographic turnout (e.g., a surge in young voters or a drop in male voters). This is because any information about propensity to vote that is captured by a demographic variable such as age, race, or gender will be employed conditional on more causally proximate variables such as intention to vote, vote history, registration, and political interest. Furthermore, the hierarchical structure of the trees composing the random forest allows for interactions to be captured, and we believe this is where demographics offer the greatest potential boost to prediction. To the extent that the information conveyed by variables such as vote intention and history may vary by characteristics correlated with demographics, our approach can take this into account.

Figure 4 demonstrates the value of adding a variable such as age to the model. In particular, age is strongly predictive of turnout among two key groups—those who say they will definitely vote and those who say they will probably vote. For example, among respondents who were 70 years old and who said they would definitely vote, 80 percent actually voted. However, among respondents who were 20 years old and said they would definitely vote, turnout was just about 50 percent. Our underlying assumption, then, is that useful multivariate relationships are stable between elections—for example, we trust that 70-year-olds who say they will definitely vote in the next election won’t suddenly change places with 20-year-old “definite” voters, with the former sitting out in large numbers and the latter now much more likely to be true to their word.

Figure 4. Actual turnout of 2016 CCES respondents based on age and intention.

The role of age is even clearer when one compares the percentage voting as a function of age across the “probably,” “undecided,” and “no” responses. Among 20-year-old respondents, there is no difference in turnout rates between those saying they do not plan to vote and those saying they are undecided. Yet, among 60-year-old respondents, the undecided group is twice as likely to vote as those saying no. Meanwhile, among 20-year-old respondents, those saying they will probably vote are about twice as likely to vote as those saying they will not. Yet, among 60-year-old respondents, the “probably” group is four times more likely to vote than the “no” group.

EXTENDING THE MODEL TO A MIDTERM ELECTION

So far, we have demonstrated that our probabilistic Perry-Gallup plus demographics (PGaD) approach produces accurate projections about turnout and the presidential vote margin in 2016. However, a reasonable question is whether this approach would work equally well in a non-presidential election year. To investigate, we reproduced the analysis from above, but this time focusing on predictions for the 2014 CCES midterm election survey. For this validation task, we trained our models only on the 2008, 2010, and 2012 CCES surveys and then used those models to make predictions about turnout in 2014. Our measure of bias for this analysis comes in the form of the national vote for the House of Representatives. In our 2014 CCES sample, 50.1 percent of respondents were validated voters and they preferred Republican House candidates over Democrats by a margin of 5.57 percentage points.

Table 3 shows the implied turnout and national Democratic bias estimates for each approach to defining likely voters. In almost every case, the implied turnout rates are much higher than the actual percentage of respondents who were validated as voters. Part of the reason for this discrepancy may be that our model is trained on data from one (relatively high-turnout) midterm and two presidential elections, whereas 2014 saw the lowest turnout rate of any federal election since 1942.9 Nevertheless, PGaD again produces the lowest bias in predicting the national House vote margin compared to the other approaches. In this case, the approach produces just a half-percentage-point pro-Democratic bias in terms of predicting the House vote margin and clearly has the lowest-magnitude A-value. The next most accurate approach is the Perry-Gallup cutoff that includes respondents scoring 5 points or higher. With this approach, the pro-Democratic bias was 1.20 percentage points. In both 2014 and 2016, the PGaD approach does the best, keeping bias well below 1 percentage point and A at a magnitude of roughly 0.01 or lower.

Table 3.

Democratic bias in survey estimates of House vote based on different approaches to defining likely voters, 2014

Approach                                  Implied turnout (%)   National bias (%)   Predictive accuracy (A)
Cutoff approaches
  Already voted + will definitely vote            73.01               2.41                –0.058
  Perry-Gallup 6s                                 44.26              –5.06                 0.109
  Perry-Gallup 6s + 5s                            66.35               1.20                –0.046
Probabilistic approaches
  Perry-Gallup                                    75.11               2.42                –0.058
  Perry-Gallup + Demographics                     69.89               0.50                –0.011

Note.—Based on data provided by http://www.electproject.org, the turnout rate among the target population of nonincarcerated US citizens was 36.2 percent. A total of 50.1 percent of respondents to the 2014 survey were validated voters.


SIMULATIONS

As a final method of comparing these approaches, we sought to examine how each model would perform on sample sizes that pollsters more typically deal with and across multiple samples. To do this, we took 1,000 random samples each from the 2016 CCES data with sample sizes of 500, 800, 1,000, and 1,200. For each sample, we calculated the Clinton vote margin produced by each of the five models. Figure 5 shows the distribution of vote margin estimates for each of these approaches at each of the four different sample sizes. Notably, the plots show that the PGaD model provides the best combination of unbiased estimates paired with lower variability. The cutoff method of taking only those scoring 6’s on the Perry-Gallup scale has, on average, low bias, but a wider distribution. This means that, on average, that approach will produce more error. For example, among the simulated samples with N = 1,000, the PGaD method produced a vote-share margin that was within three points of the true margin 43 percent of the time, while the Perry-Gallup cutoff approach did so just 34 percent of the time. Similarly, the average absolute error of the PGaD approach is at least 0.75 points lower than the Perry-Gallup cutoff (and all other methods) at each sample size. More detailed statistics from these simulations are available in the Supplementary Material.
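
A sketch of this resampling exercise is below, reusing the placeholder column names from earlier; it draws repeated subsamples at typical poll sizes and computes the weighted Clinton-Trump margin for each draw.

```r
# Repeated random subsamples of the 2016 CCES at a given size, returning the
# propensity-weighted Clinton-Trump margin (in points) for each draw.
sim_margins <- function(data, n, reps = 1000) {
  replicate(reps, {
    s <- data[sample(nrow(data), n), ]
    w <- s$poststrat_weight * s$vote_propensity
    100 * (sum(w * (s$pres_choice %in% "Clinton")) -
           sum(w * (s$pres_choice %in% "Trump"))) / sum(w)
  })
}

margins_by_n <- lapply(c(500, 800, 1000, 1200),
                       function(n) sim_margins(test2016, n))
```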

Figure 5. Distribution of vote margin estimates of simulated samples from 2016 CCES.

Conclusion

We have compared a number of common approaches to the likely-voter problem, including three examples of predominant cutoff/threshold techniques and two probabilistic models. The most successful Perry-Gallup-style threshold-based classification of likely voters led to a Democratic estimation bias for the national vote of 1.20 percentage points for the midterm election (2014) and 2.05 percentage points for the presidential election (2016). Among the probabilistic models, our Perry-Gallup + Demographics (PGaD) approach outperforms all the other estimates; its absolute bias (or aggregate error) is no more than half a percentage point in each election. Furthermore, PGaD has the smallest-magnitude A, by far, of all methods considered, meaning that the odds of a randomly selected validated voter casting a vote for the Republican versus the Democratic candidate are extremely close to the corresponding odds based on estimates generated via PGaD.

Based on these results, we encourage pollsters to consider implementing likely-voter models (such as PGaD) that make full use of the information at their disposal. Specifically, a likely-voter model that is probabilistic uses information from all respondents in the sample rather than discarding those who fail to meet a particular threshold. And a likely-voter model that makes use of demographic information for its predictions takes advantage of data that most pollsters collect anyway and which happen to be good predictors of turnout and overreporting. Because the PGaD model does make use of this additional information, it not only produces minimal bias, but in our simulations it also produces vote share estimates that are less variable across repeated samples. In other words, PGaD offers reduced bias and increased stability over traditional likely-voter models. Furthermore, this method is relatively simple to implement—we provide a road map in table 4. And while we present a random-forest approach in the paper, one may also produce such predictions with a basic logit or probit model.10
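
For readers who prefer the simpler route, a sketch of the logit alternative is below, using the same placeholder variable and data-frame names introduced earlier; the predicted probabilities serve as vote-propensity scores in exactly the same way as the random-forest output.

```r
# A basic logit fit to the same covariates as the PGaD random forest.
fit_logit <- glm(validated_vote ~ intent + voted_prev + registered +
                   pol_interest + age + race + educ + gender + faminc +
                   pid_strength,
                 data = train, family = binomial(link = "logit"))
test2016$vote_propensity <- predict(fit_logit, newdata = test2016,
                                    type = "response")
```

Ultimately, our findings suggest that future election predictions can be made more accurate by taking a more information-rich modeling approach to identifying likely voters.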

Table 4.

A practical guide to implementing the PGaD model

Design elements.—As it is necessary to use validated votes as the dependent variable when building the PGaD model, we assume that it is possible to match surveys from previous election cycles to voter files (or to access comparable publicly available surveys that have been matched to voter files, such as the CCES). It is also important to use validated-vote survey data from at least two election cycles so that there is variation in the covariates and the model does not simply optimize parameters to reduce error for one election cycle. This approach was designed and tested to be effective for surveys that sample the voting-eligible population. Accordingly, we include self-reported measures of vote history and registration as covariates in our model. These may also be available through administrative data, so if a survey is sampling from a voter file, there may be another optimal method to incorporate these variables. In order to follow our approach exactly, it would be necessary to include the same questions as detailed in the Supplementary Material.

Data cleaning and coding.—Survey data should be subset to US citizens before making predictions. The variables are coded as follows:
Validated vote: Matched to voter file and voted; Unmatched to voter file, or matched to voter file and did not vote
Intent to vote: Yes, definitely; Probably; I already voted (early or absentee); No
Vote history: Did vote in most recent, similar election; Did not vote in most recent, similar election
Voter registration: Registered; Not registered
Political interest (“I follow what’s going on in government and public affairs...”): Most of the time; Some of the time; Only now and then; Hardly at all/Don’t know
Gender: Male; Female
Age: Continuous
Race: White; Black; Hispanic; Asian; Native American; Other
Education: Did not graduate from high school; High school graduate; Some college, but no degree (yet); 2-year college degree; 4-year college degree; Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)
Income: Under $40k; $40k–$100k; Over $100k; Prefer not to say
Strength of partisanship: Strong; Not strong; Independent leaner; True independent

Fitting the model.—The model stores each of the decision trees it creates using each bagged sample. To predict, each new observation is run through each of these decision trees and all of the final classifications are saved. We then take the proportion of times each observation was predicted to be a voter and treat it as a predicted probability that the observation will be a voter. We use the randomForest package in R to perform our model training. To obtain vote-propensity scores, we request class probabilities from R’s predict method (type = "prob"), which returns, for each observation, the estimated probability of falling into each of the two classes—voted or did not vote. We use the former as the vote-propensity score.

Producing estimates.—Once each respondent has been assessed a vote-propensity score via the steps outlined above, this number should simply be treated as an additional weight on top of the poststratification weight. For example, suppose there are n survey respondents and we are interested in candidate A’s vote share. We would compute

  Σ_{i=1}^{n} [w_{pi} × w_{vi} × 1(c_i = A)] / Σ_{i=1}^{n} [w_{pi} × w_{vi}],

where w_{pi} is the poststratification weight, w_{vi} is the vote-propensity score, c_i is the candidate preference for respondent i, and 1(c_i = A) is an indicator equal to one when respondent i prefers candidate A.
StepGuidance
Design elementsAs it is necessary to use validated votes as the dependent variable when building the PGaD model, we assume that it is possible to match surveys from previous election cycles to voter files (or to access comparable publicly available surveys that have been matched to voter files, such as the CCES). It is also important to use validated vote survey data from at least two election cycles so that it is possible to have variation in the covariates so the model does not simply optimize parameters to reduce error for one election cycle. This approach was designed and tested to be effective for surveys that sample the voting-eligible population. Accordingly, we include self-reported measures of vote history and registration as covariates in our model. These may be also available through administrative data, so if a survey is sampling from a voter file, there may be another optimal method to incorporate these variables. In order to follow our approach exactly, it would be necessary to include the same questions as detailed in the Supplementary Material.
Data cleaning and codingSurvey data should be subset to US citizens before making predictions. Validated vote Matched to voter file and voted; Un-matched to voter file or matched to voter file and did not vote Intent to vote Yes, definitely; Probably; I already voted (early or absentee); No Vote history Did vote in most recent, similar election; Did not vote in most recent, similar election Voter registration Registered; Not registered Political interest (I follow what’s going on in government and public affairs...) Most of the time; Some of the time; Only now and then; Hardly at all/Don’t know Gender Male; Female AgeContinuous
Race White; Black; Hispanic; Asian; Native American; Other Education Did not graduate from high school; High school graduate; Some college, but no degree (yet); 2-year college degree; 4-year college degree; Postgraduate degree (MA, MBA, MD, JD, PhD, etc.) Income Under $40k; $40k–$100k; Over $100k; Prefer not to say Strength of partisanship Strong; Not Strong; Independent leaner; True independent
Fitting the modelThe model stores each of the decision trees it creates using each bagged sample. To predict, each new observation is run through each of these decision trees and all of the final classifications are saved. Then we can take the proportion of times each observation was predicted to be a voter and treat it as a predicted probability that the observation will be a voter. We use the randomForest package in R to perform our model training. When we wish to predict vote-propensity scores, using R’s built-in predict method, we specify prob=TRUE, which indicates that the estimated probability that each observation falls into either of the two classes—voted or did not vote—should be returned. This returns two numbers: the probability that the respondent voted and the probability that the respondent did not vote. We use the former as our vote-propensity score.
Producing estimatesOnce each respondent has been assessed a vote-propensity score via the steps outlined above, this number should simply be treated as an additional weight on top of the poststratification weight. For example, consider that there are n survey respondents and we are interested in candidate A’s vote share. We would simply compute i=1n[Wpi × Wvi | ci=Ai=1nWpi × Wvi] where wpi is the poststratification weight, wvi is the vote-propensity score, and ci is the candidate preference for respondent i.
Table 4.

A practical guide to implementing the PGaD model

StepGuidance
Design elementsAs it is necessary to use validated votes as the dependent variable when building the PGaD model, we assume that it is possible to match surveys from previous election cycles to voter files (or to access comparable publicly available surveys that have been matched to voter files, such as the CCES). It is also important to use validated vote survey data from at least two election cycles so that it is possible to have variation in the covariates so the model does not simply optimize parameters to reduce error for one election cycle. This approach was designed and tested to be effective for surveys that sample the voting-eligible population. Accordingly, we include self-reported measures of vote history and registration as covariates in our model. These may be also available through administrative data, so if a survey is sampling from a voter file, there may be another optimal method to incorporate these variables. In order to follow our approach exactly, it would be necessary to include the same questions as detailed in the Supplementary Material.
Step: Data cleaning and coding
Guidance: Subset the survey data to US citizens before making predictions. Code the variables as follows:
- Validated vote: Matched to voter file and voted; Un-matched to voter file, or matched to voter file and did not vote
- Intent to vote: Yes, definitely; Probably; I already voted (early or absentee); No
- Vote history: Did vote in most recent, similar election; Did not vote in most recent, similar election
- Voter registration: Registered; Not registered
- Political interest (“I follow what’s going on in government and public affairs...”): Most of the time; Some of the time; Only now and then; Hardly at all/Don’t know
- Gender: Male; Female
- Age: Continuous
- Race: White; Black; Hispanic; Asian; Native American; Other
- Education: Did not graduate from high school; High school graduate; Some college, but no degree (yet); 2-year college degree; 4-year college degree; Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)
- Income: Under $40k; $40k–$100k; Over $100k; Prefer not to say
- Strength of partisanship: Strong; Not strong; Independent leaner; True independent
Step: Fitting the model
Guidance: The random forest stores each of the decision trees grown from each bagged sample. To predict, each new observation is run through every tree and the resulting classifications are saved; the proportion of trees that classify an observation as a voter is treated as that observation's predicted probability of voting. We use the randomForest package in R for model training. To obtain vote-propensity scores, we call R's predict method with type = "prob", which returns the estimated probability that each observation falls into each of the two classes (voted or did not vote). We use the probability of having voted as the vote-propensity score. A minimal R sketch of this step follows the table.
Step: Producing estimates
Guidance: Once each respondent has been assigned a vote-propensity score via the steps above, treat that score as an additional weight on top of the poststratification weight. For example, with n survey respondents and interest in candidate A's vote share, compute

\[
\hat{V}_A = \frac{\sum_{i=1}^{n} w_{p_i}\, w_{v_i}\, \mathbf{1}(c_i = A)}{\sum_{i=1}^{n} w_{p_i}\, w_{v_i}},
\]

where \(w_{p_i}\) is the poststratification weight, \(w_{v_i}\) is the vote-propensity score, \(c_i\) is the candidate preference of respondent i, and \(\mathbf{1}(\cdot)\) is the indicator function. A short R sketch of this computation also follows the table.
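As a minimal sketch of the fitting step, assume a vote-validated training data frame `train` coded as in Table 4 and a new preelection survey `new_poll` containing the same covariates; all object and column names here are illustrative stand-ins, not the actual CCES variable names.

    library(randomForest)

    # Illustrative sketch only: `train`, `new_poll`, and their columns are
    # hypothetical stand-ins for vote-validated survey data coded as in Table 4.

    # Subset to US citizens and code the outcome as a two-level factor.
    train <- subset(train, citizen == "Yes")
    train$validated_vote <- factor(train$validated_vote,
                                   levels = c("Did not vote", "Voted"))

    # Fit the random forest on pooled data from at least two election cycles.
    rf_fit <- randomForest(
      validated_vote ~ intent + vote_history + registration + interest +
        gender + age + race + education + income + partisanship,
      data  = train,
      ntree = 500
    )

    # type = "prob" returns one column per class; the "Voted" column is the
    # vote-propensity score for each respondent in the new survey (which must
    # use the same factor codings as the training data).
    prop_scores <- predict(rf_fit, newdata = new_poll, type = "prob")[, "Voted"]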
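A correspondingly short sketch of the estimation step treats the vote-propensity score as a second weight on top of the poststratification weight, mirroring the formula in Table 4 (`post_weight` and `vote_pref` are again hypothetical column names).

    # Combined weight: poststratification weight times vote-propensity score.
    w <- new_poll$post_weight * prop_scores

    # Estimated vote share for candidate A among likely voters.
    share_A <- sum(w[new_poll$vote_pref == "A"]) / sum(w)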

Funding for this work was provided by the National Science Foundation [grant numbers 1225750, 1430505, and 1559125, all to B.F.S.].

References

Ansolabehere, Stephen, and Eitan Hersh. 2012. “Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate.” Political Analysis 20:437–59.

Ansolabehere, Stephen, Sam Luks, and Brian F. Schaffner. 2015. “The Perils of Cherry Picking Low Frequency Events in Large Sample Surveys.” Electoral Studies 40:409–10.

Berent, Matthew K., Jon A. Krosnick, and Arthur Lupia. 2016. “Measuring Voter Registration and Turnout in Surveys: Do Official Government Records Yield More Accurate Assessments?” Public Opinion Quarterly 80:597–621.

Bernstein, Robert, Anita Chadha, and Robert Montjoy. 2001. “Overreporting Voting: Why It Happens and Why It Matters.” Public Opinion Quarterly 65:22–44.

Blais, André. 2006. “What Affects Voter Turnout?” Annual Review of Political Science 9:111–25.

Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. 2018. “Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records.” American Political Science Review 113:1–19.

Enamorado, Ted, and Kosuke Imai. 2019. “Validating Self-Reported Turnout by Linking Public Opinion Surveys with Administrative Records.” Public Opinion Quarterly 83:723–48.

Erikson, Robert S., Costas Panagopoulos, and Christopher Wlezien. 2004. “Likely (and Unlikely) Voters and the Assessment of Campaign Dynamics.” Public Opinion Quarterly 68:588–601.

Freedman, Paul, and Ken Goldstein. 1996. “Building a Probable Electorate from Preelection Polls: A Two-Stage Approach.” Public Opinion Quarterly 60:574–87.

Jackman, Simon, and Bradley Spahn. 2019. “Why Does the American National Election Study Overestimate Voter Turnout?” Political Analysis 27:1–15.

Keeter, Scott, Ruth Igielnik, and Rachel Weisel. 2016. “Can Likely Voter Models Be Improved? Evidence from the 2014 US House Elections.” Pew Research Center. Available at www.pewresearch.org/2016/01/07/can-likely-voter-models-be-improved.

Kennedy, Courtney, Mark Blumenthal, Scott Clement, Joshua D. Clinton, Claire Durand, Charles Franklin, Kyley McGeeney, et al. 2018. “An Evaluation of 2016 Election Polls in the U.S.” Public Opinion Quarterly 82:1–33.

Kiley, Jocelyn, and Michael Dimock. 2009. “Understanding Likely Voters.” Pew Research Center. Available at https://www.pewresearch.org/wp-content/uploads/sites/4/2011/01/UnderstandingLikelyVoters.pdf.

Leighley, Jan E., and Jonathan Nagler. 2013. Who Votes Now? Demographics, Issues, Inequality, and Turnout in the United States. Princeton, NJ: Princeton University Press.

Martin, Elizabeth A., Michael W. Traugott, and Courtney Kennedy. 2005. “A Review and Proposal for a New Measure of Poll Accuracy.” Public Opinion Quarterly 69:342–69.

Murray, Gregg R., Chris Riley, and Anthony Scime. 2009. “Pre-Election Polling: Identifying Likely Voters Using Iterative Expert Data Mining.” Public Opinion Quarterly 73:159–71.

Newport, Frank. 2000. “How Do You Define ‘Likely Voters’?” The Gallup Poll 23. Retrieved September 11, 2017. Available at http://www.gallup.com/poll/4636/how-define-likely-voters.aspx.

Panagopoulos, Costas, Kyle Endres, and Aaron C. Weinschenk. 2018. “Preelection Poll Accuracy and Bias in the 2016 US General Elections.” Journal of Elections, Public Opinion and Parties 28:157–72.

Rogers, Todd, and Masahiko Aida. 2014. “Vote Self-Prediction Hardly Predicts Who Will Vote, and Is (Misleadingly) Unbiased.” American Politics Research 42:503–28.

Verba, Sidney, and Norman H. Nie. 1972. Participation in America: Political Democracy and Social Equality. New York: Harper & Row.

Wolfinger, Raymond E., and Steven J. Rosenstone. 1980. Who Votes? New Haven, CT: Yale University Press.

Footnotes

1.

We use the 2016 CCES for illustrative purposes, but similar patterns can be found in other years and in other surveys.

2.

Following Enamorado and Imai (2019), we compare the turnout rates for the CCES surveys to the rate for voting-eligible citizens plus those on probation or parole. We exclude from the calculation noncitizens and those who are imprisoned, since these groups are not part of the survey’s target population. While the CCES sample includes a small percentage of respondents who self-identify as noncitizens, many of those responses are due to measurement error, and Ansolabehere, Luks, and Schaffner (2015) show that only 0.45 percent of CCES respondents consistently identify as noncitizens.

3.

Enamorado and Imai (2019) also directly refute Berent, Krosnick, and Lupia’s (2016) claim that because registration rates are higher than successful match rates, this indicates that many people who actually voted are unable to be matched to a record. They point out that registration rates derived from voter files are overestimates of the true registration rates, due to inactive, deceased, or relocated voters who remain on the files.

4.

Virginia did not release turnout records to Catalist for the 2008 and 2010 election cycles. Thus, Virginia respondents were removed from the data set for those two years.

5.

See the Supplementary Material for results using other thresholds.

6.

These approaches can also be implemented with logistic regression, and doing so produces similar results, as shown in the Supplementary Material.

7.

All of these calculations are based on using the poststratification weights provided with the survey to ensure that the sample is nationally representative.

8.

A is the natural log of the ratio between the odds that a voter chooses Republican versus Democratic in a given forecast and the actual odds according to final results. When a forecast is perfectly accurate, the odds ratio will be 1 and its log, the A-measure, will be 0.
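Written out, with forecast two-party shares \(\hat{p}_R\) and \(\hat{p}_D\) and official shares \(p_R\) and \(p_D\) (the notation here is ours), the measure is

\[
A = \ln\!\left(\frac{\hat{p}_R/\hat{p}_D}{p_R/p_D}\right),
\]

so positive values indicate a forecast that overstates the Republican candidate and negative values one that overstates the Democratic candidate.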

10.

Pollsters can also use publicly available vote-validated surveys such as the CCES or the American National Election Studies if they cannot conduct vote validation on their own surveys.
