Do interviewers moderate the effect of monetary incentives on response rates in household interview surveys?

demographic characteristics. Our results show significant and substantial variability between interviewers in the effectiveness of monetary incentives on the probability of cooperation across all three surveys. However, none of the interviewer characteristics considered are significantly associated with more or less successful interviewers.

INTRODUCTION
It is widely acknowledged that low and declining response rates pose an existential threat to conventional approaches to data collection in survey research (Brick and Williams 2013; Couper 2013; Meyer, Mok, and Sullivan 2015; Miller 2017). In response to this pressing challenge, survey methodologists have invested considerable time and resources into investigating features of the survey process which can be leveraged to increase the probability of cooperation among sampled units (Groves and Couper 1998; Groves and Heeringa 2006). As the primary interface between survey organizations and sample members, interviewers are key to this endeavor (Morton-Williams 1993; Campanelli, Sturgis, and Purdon 1997; West and Blom 2017). They are a key factor in response rates for household interview surveys remaining substantially higher than those for all other available modes, although they also come at a commensurately higher cost. A large number of studies in a broad range of contexts have now established that demographic, attitudinal, and behavioral differences between interviewers can account for substantial variability in response rates (Hox and de Leeuw 2002; Hansen 2006; Durrant, Groves, and Steele 2010). For example, Campanelli and O'Muircheartaigh (1999) found that more experienced interviewers were more successful at obtaining contact and cooperation due to more effective calling patterns and an ability to tailor the survey request to sample members' motivations and concerns.
In addition to interviewers, monetary incentives of various kinds have played a central role in strategies for maximizing survey cooperation (Singer, Groves, and Corning 1999; Singer, Hoewyk, Gebler, Raghunathan, and Mcgonagle 1999; Singer 2002). Monetary incentives are considered to operate by acting as a replacement for other nonpecuniary motivations for survey participation such as interest in the survey topic, enjoyment of social interaction, or a sense of civic duty (Groves, Singer, and Corning 2000). A large body of evidence, predominantly based on randomized experiments, has established that monetary incentives exert a small to moderate positive effect on response rates and that larger incentives tend to produce more substantial effects but with diminishing marginal returns (Church 1993; Cantor, O'Hare, and O'Connor 2008; Singer and Ye 2013).
Given the sustained focus on the role of interviewers and monetary incentives in the existing survey methodological literature, it is surprising that their potential joint influence has seldom been considered. Because interviewers play such a key role in making contact with and persuading sample members to participate, it is plausible that interviewers vary in how effective they are at leveraging incentives to persuade sample members to provide an interview. For example, some interviewers may tailor their doorstep introductions to highlight the availability of a monetary incentive at households that are most likely to be sensitive to them (Campanelli et al. 1997; Groves and Couper 1998). Similarly, interviewers may feel more confident in their doorstep approach when they know an incentive is available, which may positively affect their persuasive efforts (Singer and Ye 2013). This joint influence is our focus in this article. We analyze data from three different face-to-face interview surveys that included a randomized incentive experiment to identify interviewer influences on the effectiveness of monetary incentives in promoting survey cooperation.
The remainder of the article is structured as follows: We first provide short reviews of the respective literatures on how interviewers and monetary incentives influence survey response, before setting out our expectations regarding the moderating effect of interviewers on the effectiveness of incentives. We then describe the three surveys that form the basis of our analysis and the administrative data on interviewers and areas to which they are linked. This is followed by an exposition of our analysis strategy and presentation of our key findings. We conclude with a consideration of the limitations of our study, a discussion of the implications of our findings for improving survey practice and suggestions on how future research in this area might usefully proceed.

The Effect of Interviewers on Response Rates
Face-to-face surveys consistently achieve higher response rates than those undertaken by self-administration or by telephone, a difference that is largely attributable to the role of interviewers. Interviewers locate and make repeated calls at sampled addresses, thereby keeping noncontacts to a minimum (Campanelli et al. 1997). Having made contact with a household, they undertake a number of additional tasks, including selecting respondents within households and conveying information about the survey such as the topic, sponsor, likely duration of the interview, and the availability of incentives (Couper and Schlegel 1998). They also often provide accompanying information about the survey in the form of copies of advance letters (which will not have been read by all sample members), provide reassurance about the bona fides of the survey, and show identity documentation (Groves and Couper 1998; Groves et al. 2000).
Interviewers also persuade reluctant respondents to provide an interview, thereby minimizing refusals. A range of dispositional factors and behavioral styles have been identified as important in determining how successful interviewers are at preventing refusals. These include an ability to maintain an interaction rather than accept a refusal and to tailor their approach on the doorstep to specific characteristics of sample units by identifying and presenting aspects of the survey that they judge are likely to be positively valued (Morton-Williams 1993; Campanelli et al. 1997; Groves and Couper 1998). For example, an interviewer may remark upon the respondent's garden if they perceive that gardening is likely to be a hobby of the householder, or they might highlight the topic of the survey if they judge from the observable characteristics of the sample member that it is likely to be of interest. Studies which have examined the causes of noncontact and refusal have consistently found significant interviewer effects across a range of sample designs and international contexts (Campanelli et al. 1997; Hox and de Leeuw 2002; Durrant and Steele 2009; Durrant et al. 2010). For example, Blom, de Leeuw, and Hox (2011) found interviewer intraclass correlation coefficients of 0.27 for noncontact and 0.08 for cooperation across ten countries in the 2008 European Social Survey.
Existing research has also considered which characteristics of interviewers are important in producing these effects (Blom and Korbmacher 2013). This has found that experienced interviewers tend to be better at tailoring their approaches to household idiosyncrasies and concerns (Groves and Couper 1998; Lemay and Durand 2002). More experienced interviewers, both in terms of experience on the particular survey and of interviewing more generally, have also been found to obtain higher response rates, even though they are often allocated to more difficult areas (Purdon, Campanelli, and Sturgis 1999; West and Blom 2017). Other studies have found that interviewers with higher levels of self-confidence and more positive appraisals of the likelihood of achieving interviews also obtain higher cooperation rates, an effect which is thought to arise from the positive effect of confidence on the quality of doorstep interactions (Singer and Kohnke-Aguirre 1979; Groves and Couper 1998; Hox and de Leeuw 2002). The existing evidence suggests that interviewer skills and experience in recognizing, interpreting, and addressing visual cues and the confidence and self-belief with which interviewers approach the task of obtaining cooperation on the doorstep are the key mechanisms through which interviewers influence individual cooperation decisions.

Using Incentives to Increase Response Rates
Under the influential "leverage-salience" theory of survey cooperation (Groves et al. 2000), incentives are postulated to work by acting as a replacement for nonfinancial motivating factors such as engagement in the topic of the survey, enjoyment of social interaction, and a sense of civic or moral obligation. Incentives may also invoke norms of reciprocity, such that respondents feel a sense of obligation to provide an interview when they are offered or receive an incentive before the interview request is made (Dillman, Smyth, and Christian 2009).
The field of survey research has benefited from a wealth of systematic reviews and meta-analyses of the effects of survey incentives which have yielded a robust set of conclusions. We know from this body of evidence that monetary incentives are more effective in motivating participation than nonmonetary incentives such as pens, calendars, diaries, and so on (Church 1993; Cantor et al. 2008; Singer and Ye 2013). It is also well established that prepaid (or unconditional) incentives tend to produce more substantial effects on response rates than those that are promised (or conditional) on completion of the survey (Church 1993; Cantor et al. 2008; Lavrakas 2008), though it does not follow from this that they are necessarily more cost-effective (Brick, Montaquila, Hagedorn, Roth, and Chapman 2005). It is also apparent from these studies that the effect of incentives is greater for self-completion surveys and surveys that have a low response rate when no incentive is offered, presumably because there is more scope for the incentive to act as a replacement for nonmonetary motivations among a larger pool of potential nonrespondents (Mercer, Caporaso, Cantor, and Townsend 2015).
Researchers have also established that the magnitude of the effect of incentives on response rates increases with the size of the incentive. For instance, in a meta-analysis of thirty-nine experimental studies, Singer, Groves, and Corning (1999) found that each dollar of incentive paid resulted in one-third of a percentage point increase in response rate compared with the no incentive condition. However, other studies have found that this "dose-response" relationship is curvilinear, with the size of the increase in the response rate declining with additional increases in the value of the monetary incentive (Gelman, Stevens, and Chan 2002; Cantor et al. 2008; Mercer et al. 2015). In sum, the existing evidence demonstrates that monetary incentives have a robust, positive effect on the probability of survey cooperation.

The Joint Effect of Interviewers and Incentives on Response Rates
We know that interviewers and incentives have a positive influence on response rates, but what of their joint effect? It seems plausible that interviewers might moderate the effect of incentives on cooperation probability for three inter-related reasons. First, interviewers are the primary conduit of information between survey organization and sample members and are, therefore, essential to ensuring that potential respondents are aware that an incentive is available. While most surveys will highlight incentives in an advance letter, many respondents do not open, let alone read, them (Stoop 2005). Furthermore, it seems reasonable to assume that those who do not read advance letters (those who are busy and/or uninterested in the survey topic) are also more likely to be susceptible to monetary incentives. Second, interviewers may have more confidence in the likelihood of obtaining an interview when a monetary incentive is offered. This might exert an additional positive effect on cooperation over and above the influence of the incentive on respondents because higher levels of confidence improve the quality of interviewer approaches (Groves and Couper 1996; Singer and Ye 2013). Third, interviewers may vary in the extent to which they tailor their doorstep introductions by highlighting the availability of the incentive at addresses where they believe it is likely to be effective. For example, some interviewers might ask sample members whether they received the letter with information about the payment at an early stage of the interaction, while others do not mention it at all.
Existing research, however, offers little in the way of hard evidence on the question of whether or not interviewers moderate the effects of monetary incentives on cooperation probability. An exception is Singer, Hoewyk, and Maher (2000), who investigated the influence of interviewer expectations on the effect of incentives on cooperation rates using data from the Survey of Consumer Attitudes, a telephone survey of the American public. Singer, Hoewyk, and Maher randomly assigned interviewers and respondents to three groups: in groups one and two, respondents received an advance letter and a $5 unconditional incentive, while respondents in group three received an advance letter but no incentive. Interviewers in group one were unaware of the incentive, but interviewers in groups two and three were made aware of the incentive level via messages on their computers. Interviewers in groups one and two achieved response rates of 76 percent and 75 percent, respectively, compared with 62 percent for interviewers in group three. Singer, Hoewyk, and Maher (2000) concluded that, although the unconditional incentive boosted response, interviewer expectations about the likely cooperativeness of sample members had no additional effect. Similarly, Lynn (2001) found, in a focus group of interviewers, that expectations about the likely impact of incentives on cooperation bore little resemblance to actual response outcomes. While these studies support the conclusion that incentives operate primarily or exclusively via their effects on respondents rather than on interviewers, they do not rule out the possibility that interviewers vary in the effectiveness with which they deploy incentives. We turn next to a direct empirical assessment of this question.

DATA
We use data from three different United Kingdom face-to-face interview surveys. These are the 2015 National Survey for Wales Field Test (NSW2015), the 2016 National Survey for Wales Incentive Experiment (NSW2016), and wave one of the United Kingdom Household Longitudinal Study Innovation Panel (UKHLS-IP). All three surveys use stratified random sampling, with addresses selected from the Postcode Address File. The two Welsh surveys randomly select one eligible adult (aged sixteen and over), while UKHLS-IP attempts interviews with all eligible adults (aged eighteen and over) in the household. For UKHLS-IP, a cooperating household is defined as one in which at least one eligible adult provided an interview. The NSW2015 randomly allocated 50 percent of addresses to receive no incentive and 50 percent to receive £10; NSW2016 also used a 50/50 allocation but with a treatment condition of £5 and a control condition of no incentive. The UKHLS-IP randomly allocated one-third of addresses to receive a £5 incentive, one-third of addresses to receive £5 rising to £10 conditional on all household members completing the survey, and one-third to receive £10. For the analyses presented here, the latter two groups are combined. Incentives in all three surveys were offered conditional on completion of the questionnaire, and allocation of addresses to experimental conditions was implemented within interviewer workloads. We use response outcomes before any reissuing in order to ensure that the random assignment of incentives within interviewers is maintained. However, detailed first issue outcomes were not available for the UKHLS-IP, so we are only able to model response/nonresponse for this survey rather than cooperation conditional on contact. Analysis of the two Welsh surveys shows that the results are substantively the same for both response and cooperation, so we do not consider this to be an important limitation.
More detailed information about the design of each survey is provided in the appendix.
Each survey was linked to administrative data held on interviewers by the respective survey agencies. These were age, sex, and experience (number of years working for the agency). We use these variables to assess whether interviewer characteristics are associated with variability in the effectiveness of deploying incentives. For the UKHLS-IP, we also link aggregate census variables from the 2011 census to the sample file. A total of twenty-one census count variables were combined using a factorial ecology model (Rees 1971), with a total of five neighborhood indices extracted. These measures cover the extent of "concentrated disadvantage" (areas with a higher number of single parent families, those on income support and unemployed, fewer people in managerial and professional occupations, and fewer owner-occupiers), "urbanicity" (high population density and domestic properties, and relatively little green space), and "population mobility" (higher levels of in- and out-migration and more single person households). We also account for differences in the neighborhood age structure (with higher scores for areas with a younger population), housing structure (higher scores for areas with more terraced and vacant properties), and the police recorded crime rate.
We model both the response rate and the cooperation rate. Response rate is defined based on AAPOR RR2 (AAPOR 2016) as

\[
\mathrm{RR} = \frac{I + P}{(I + P) + (R + \mathrm{NC} + O) + (\mathrm{UE(NC)} + \mathrm{UE})},
\]

where RR denotes response rate, I denotes interviews, P denotes partial interviews, R denotes refusals, NC denotes noncontacts, O denotes other unproductive, UE(NC) denotes unknown eligibility (noncontacted), and UE denotes unknown eligibility. The cooperation rate (CR) conditions on those contacted. Response outcomes for the three surveys are presented in table 1.
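As an illustration of how these rates are computed from final outcome codes, the following minimal Python sketch evaluates RR2 and a cooperation rate for one experimental arm. The counts are hypothetical and do not come from table 1. The cooperation rate formula shown (productive cases divided by productive cases plus refusals and other unproductive contacts, in the spirit of AAPOR COOP2) is an assumption made for the example, not necessarily the exact definition used in this article.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCounts:
    """Final outcome counts for one experimental arm (values are hypothetical)."""
    interviews: int            # I
    partials: int              # P
    refusals: int              # R
    noncontacts: int           # NC
    other_unproductive: int    # O
    unknown_noncontacted: int  # UE(NC)
    unknown_other: int         # UE


def response_rate_rr2(c: OutcomeCounts) -> float:
    """AAPOR RR2: complete plus partial interviews over all potentially eligible cases."""
    productive = c.interviews + c.partials
    denominator = (productive + c.refusals + c.noncontacts + c.other_unproductive
                   + c.unknown_noncontacted + c.unknown_other)
    return productive / denominator


def cooperation_rate(c: OutcomeCounts) -> float:
    """Cooperation among contacted cases (ASSUMED form, similar to AAPOR COOP2)."""
    productive = c.interviews + c.partials
    return productive / (productive + c.refusals + c.other_unproductive)


# Hypothetical arm of 1,000 issued addresses.
example = OutcomeCounts(interviews=580, partials=20, refusals=250, noncontacts=80,
                        other_unproductive=30, unknown_noncontacted=30, unknown_other=10)
print(f"RR2 = {response_rate_rr2(example):.3f}")
print(f"CR  = {cooperation_rate(example):.3f}")
```

Because noncontacts and cases of unknown eligibility drop out of the denominator, the cooperation rate is never lower than the response rate for the same set of cases.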
The response rates were higher in the incentive condition for all three surveys: cooperation rates for incentivized households were 3 and 2 percentage points higher in NSW2015 and NSW2016, respectively, and the response rate for UKHLS-IP was 5 percentage points higher in the incentive condition. The difference is statistically significant at the 95 percent level of confidence (using a $\chi^2$ test) for UKHLS-IP and NSW2015 but not for NSW2016. Table 1 demonstrates that the cooperation rate was higher in the incentivized condition for all three surveys, though the difference was statistically significant in only one. Next, we proceed to a multivariate analysis to assess whether these average differences in cooperation rates are constant across interviewers or whether some interviewers are more successful at using the incentive to convert refusals into interviews.
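A comparison of this kind can be sketched as a Pearson chi-square test on a 2x2 table of experimental condition by response outcome. The Python example below is a minimal illustration that uses only the standard library. The counts are hypothetical, and the choice of an uncorrected Pearson statistic (no continuity correction) is an assumption, since the article does not state which variant was used.

```python
import math


def chi_square_2x2(resp_a: int, nonresp_a: int, resp_b: int, nonresp_b: int):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns the test statistic and its p-value on 1 degree of freedom.
    """
    n = resp_a + nonresp_a + resp_b + nonresp_b
    row_a, row_b = resp_a + nonresp_a, resp_b + nonresp_b
    col_resp, col_non = resp_a + resp_b, nonresp_a + nonresp_b
    stat = n * (resp_a * nonresp_b - nonresp_a * resp_b) ** 2 / (
        row_a * row_b * col_resp * col_non
    )
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 1,000 addresses per arm, 60 percent versus 65 percent responding.
stat, p = chi_square_2x2(600, 400, 650, 350)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

With these invented counts, a 5 percentage point difference on 1,000 addresses per arm is significant at the 95 percent level; smaller samples or smaller differences would not necessarily be.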

ANALYSIS
The influence of interviewers on the effectiveness of incentives on survey cooperation and response is assessed using multilevel logistic regression models (Hox and de Leeuw 2002; Durrant and Steele 2009; Goldstein 2010). The model applied here has the following form. Let $y_{ij}$ denote the binary response for household $i$ ($i = 1, \ldots, I$), interviewed by interviewer $j$ ($j = 1, \ldots, J$), where $y_{ij}$ is assumed to follow a Bernoulli distribution, with conditional response probabilities $\pi_{ij} = \Pr(y_{ij} = 1)$ and $1 - \pi_{ij} = \Pr(y_{ij} = 0)$. The multilevel logistic regression model accounting for interviewer effects takes the form

\[
\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \beta_1 x_{1ij} + \mathbf{x}'_{ij}\boldsymbol{\beta} + \mathbf{z}'_{j}\boldsymbol{\alpha} + \mu_{0j} + \mu_{1j} x_{1ij},
\]

where $\beta_0$ is the intercept; $\beta_1$ is the coefficient for the incentive condition; $x_{1ij}$ is a dummy indicator of the incentive group for household $i$ within the assignment of interviewer $j$; $\mathbf{x}'_{ij}$ is a vector of household-level characteristics with coefficient vector $\boldsymbol{\beta}$; $\mathbf{z}'_{j}$ is a vector of interviewer-level covariates with coefficient vector $\boldsymbol{\alpha}$; $\mu_{0j}$ is a random intercept; and $\mu_{1j}$ is a random coefficient for the incentive dummy. The random intercept and slope, $\mu_{0j}$ and $\mu_{1j}$, are assumed to follow a normal distribution with zero mean and variance matrix $\Omega_\mu$ defined as

\[
\Omega_\mu = \begin{pmatrix} \sigma^2_{\mu 0} & \sigma_{\mu 01} \\ \sigma_{\mu 01} & \sigma^2_{\mu 1} \end{pmatrix},
\]

where $\sigma^2_{\mu 0}$ is the intercept variance, $\sigma^2_{\mu 1}$ is the variance in slope, and $\sigma_{\mu 01}$ is the covariance between intercepts and slope. Positive values of $\sigma_{\mu 01}$ indicate that the effect of the incentive is greater for interviewers with higher cooperation/response rates, while negative values indicate the opposite. Cross-level interactions between interviewer characteristic variables and the incentive variable are included to test whether observable characteristics of interviewers are associated with variability in the effectiveness of deploying incentives.
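To make the structure of this model concrete, the following Python sketch simulates household outcomes from a random-intercept, random-coefficient logistic model of the kind described above. It is an illustration only: all parameter values are hypothetical, household and interviewer covariates are omitted, and the incentive is alternated within each workload as a simple stand-in for the randomized 50/50 allocation.

```python
import math
import random


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate(n_interviewers=50, n_households=40, intercept=0.5, incentive_coef=0.15,
             var_intercept=0.30, var_slope=0.09, cov=0.0, seed=1):
    """Simulate (interviewer, household, incentive, outcome) rows.

    Each interviewer draws a random intercept and a random incentive coefficient
    from a bivariate normal with zero mean. All default values are hypothetical.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 variance matrix of the two random effects.
    l11 = math.sqrt(var_intercept)
    l21 = cov / l11
    l22 = math.sqrt(var_slope - l21 ** 2)
    rows = []
    for j in range(n_interviewers):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rand_intercept = l11 * z1
        rand_slope = l21 * z1 + l22 * z2
        for i in range(n_households):
            incentive = i % 2  # stand-in for random allocation within the workload
            eta = intercept + rand_intercept + (incentive_coef + rand_slope) * incentive
            outcome = 1 if rng.random() < inv_logit(eta) else 0
            rows.append((j, i, incentive, outcome))
    return rows


rows = simulate()
incentivized = [y for (_, _, x, y) in rows if x == 1]
control = [y for (_, _, x, y) in rows if x == 0]
print(f"cooperation, incentive arm: {sum(incentivized) / len(incentivized):.3f}")
print(f"cooperation, control arm:   {sum(control) / len(control):.3f}")
```

In this setup, the interviewer-specific incentive effect on the logit scale is the fixed incentive coefficient plus that interviewer's random coefficient, so a nonzero slope variance means the incentive helps some interviewers more than others.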
In standard face-to-face survey designs such as those considered here, identification of interviewer effects is complicated by the confounding of interviewer assignments and areas (Campanelli and O'Muircheartaigh 1999; Durrant et al. 2010). Failure to account for differences in the area-level composition of interviewer assignments can result in overestimation of the magnitude of interviewer effects (O'Muircheartaigh and Campanelli 1998). Where there is an overlap between interviewer assignments and areas, this can be mitigated using a cross-classified, multilevel model (Durrant and Steele 2009). However, this could not be done for the three datasets analyzed here because it was not possible to obtain geographic identifiers for the two Welsh surveys, and the UKHLS-IP did not contain sufficient crossing of interviewers and areas to implement a cross-classified model. We therefore control for area characteristics as fixed effects in the models for the UKHLS-IP data and assess the impact this has on the interviewer random effects.
Models are estimated using Markov Chain Monte Carlo (MCMC) methods in MLwiN software (Fearn, Gelman, Carlin, Stern, Rubin et al. 2004; Browne, Kelly, Charlton, and Pillinger 2016). The starting values for the fixed effects are the second-order penalized quasi-likelihood (PQL) estimates. Priors for the variance matrix are assumed to follow an inverse Wishart distribution, $p(\Omega_\mu^{-1}) \sim \mathrm{Wishart}_n(n, S)$, where $n$ is the number of rows in the variance matrix and $S$ is an estimate for the true value of the variance matrix $\Omega_\mu$ (Browne et al. 2016). Because we are using MCMC, we also assess significance of coefficient estimates using the change in model deviance information criterion (DIC) (Spiegelhalter, Best, Carlin, and van der Linde 2002). The DIC balances model fit and model complexity by taking the sum of the posterior expectation (mean) of the deviance function ($\bar{D}$) and the effective number of parameters ($p_D$). When comparing DIC values, a model with a DIC value at least 3 points lower than the previous model is considered to have a significantly better fit (Spiegelhalter et al. 2002; Rasbash, Steele, Browne, Goldstein, Charlton et al. 2012). The models were run for 200,000 iterations with a burn-in length of 10,000. In order to avoid undue influence of starting values, different burn-in lengths were tried, as recommended by Fearn et al. (2004). The Brooks-Draper and Raftery-Lewis diagnostics were checked to determine how long the chain must be run to obtain accurate posterior estimates (Browne et al. 2016).
Table 2 presents the coefficient estimates, their standard deviations, and the corresponding 95 percent credible intervals for the NSW2015 and NSW2016 models. As we saw in table 1, the coefficients for the incentive fixed effect are positive for both surveys, although only for NSW2015 does the 95 percent credible interval not include zero.
The random coefficient variances of 0.09 and 0.07 are both significant, indicating that interviewers vary in the effectiveness with which they deploy incentives. The DIC decreases by 8.0 for NSW2015 and by 15.1 for NSW2016 when the interviewer random coefficient is introduced, indicating an improvement in model fit.
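The DIC comparison used here can be sketched in a few lines of Python. The function names are illustrative, and the mean deviance and effective parameter counts are hypothetical values chosen only so that the difference equals the 8.0-point decrease reported for NSW2015; they are not estimates from the fitted models.

```python
def dic(mean_deviance: float, effective_params: float) -> float:
    """DIC = posterior mean deviance + effective number of parameters."""
    return mean_deviance + effective_params


def prefer_extended(dic_base: float, dic_extended: float, threshold: float = 3.0) -> bool:
    """Rule of thumb from the text: the extended model fits significantly better
    if its DIC is at least `threshold` points lower than the base model's."""
    return (dic_base - dic_extended) >= threshold


# Hypothetical components; only the 8.0-point difference mirrors the text.
base = dic(mean_deviance=4100.0, effective_params=60.0)      # random intercept only
extended = dic(mean_deviance=4082.0, effective_params=70.0)  # adds random coefficient

print(f"DIC decrease: {base - extended:.1f}")
print(f"Prefer model with random coefficient: {prefer_extended(base, extended)}")
```

The example also shows the trade-off the criterion encodes: the extended model is charged for ten additional effective parameters, so it is preferred only because its mean deviance falls by more than that penalty plus the threshold.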

RESULTS
The cross-level interactions between the three interviewer characteristic variables (age, sex, and experience) and the incentive dummy are all nonsignificant, indicating that these interviewer characteristics do not explain between-interviewer variability in the effectiveness of incentives on cooperation. The DIC changes when these interaction terms are added are −2 for NSW2015 and 4.1 for NSW2016, indicating a small improvement in model fit after the inclusion of these interactions for NSW2016. The covariance between the random intercept and random coefficient, $\sigma_{\mu 01}$, is nonsignificant for both surveys, with a point estimate of 0.02 for NSW2015 and of −0.02 for NSW2016. This indicates that the effectiveness of incentive deployment between interviewers is not related to the overall cooperation rate an interviewer achieves on their assignment of addresses.
Figure 1 plots the difference in the mean predicted rates of cooperation for each interviewer derived as fitted values from the models in table 2. Each dot in figure 1 represents an interviewer, with the left Y axis being the difference in the cooperation rates for households in the incentive and nonincentive conditions. The triangles show the mean overall cooperation rate (plotted against the right Y axis) for each interviewer across all eligible households in their assignment. There is substantial variability across interviewers in the effectiveness of the incentive in obtaining cooperation. For NSW2015, the difference in the cooperation rate ranges from −9 to +13 percentage points, with the corresponding values for NSW2016 being −7 and +16 percentage points.
Not all of this variability is attributable to how skillful interviewers are in deploying incentives; some of it simply reflects random variability in response propensities across interviewer assignments. We can get a better sense of the effect of interviewers on incentive effectiveness by taking the expected cooperation rate for an incentivized household using interviewers from the top and bottom deciles of the distribution of the random coefficient (with variance $\sigma^2_{\mu 1}$), while holding all other variables constant. For NSW2015, this shows that interviewers in the top performing decile achieve an expected cooperation rate of 67 percent for incentivized households compared with 64 percent for those in the bottom decile and compared with 68 percent for the median interviewer for nonincentivized households, a substantial difference. The corresponding figures for NSW2016 are 64 percent and 58 percent for the top and bottom deciles, respectively, and 65 percent for the median interviewer for nonincentivized households. There is no obvious relationship between the overall response rate and the effectiveness of the incentive within interviewers, so we find no evidence that interviewers who are, on average, better at obtaining cooperation are also more effective in deploying the incentive.
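The logic of this decile comparison can be sketched as follows. The Python example places an interviewer's random coefficient at a chosen percentile of a zero-mean normal distribution and converts the resulting linear predictor to an expected cooperation probability. The slope variance of 0.09 is the NSW2015 estimate quoted earlier, but the intercept and incentive coefficient are hypothetical, the random intercept is held at zero, and the use of the 10th and 90th percentiles to represent the bottom and top deciles is an assumption, so the output will not reproduce the percentages reported in the text.

```python
import math
from statistics import NormalDist


def inv_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def incentivized_cooperation(intercept: float, incentive_coef: float,
                             var_slope: float, percentile: float) -> float:
    """Expected cooperation probability for an incentivized household when the
    interviewer's random incentive coefficient sits at the given percentile.
    The random intercept and all covariates are held at zero."""
    slope_dev = NormalDist(mu=0.0, sigma=math.sqrt(var_slope)).inv_cdf(percentile)
    return inv_logit(intercept + incentive_coef + slope_dev)


# Hypothetical fixed effects; slope variance 0.09 as reported for NSW2015.
INTERCEPT, INCENTIVE_COEF, VAR_SLOPE = 0.6, 0.15, 0.09

top = incentivized_cooperation(INTERCEPT, INCENTIVE_COEF, VAR_SLOPE, 0.90)
median = incentivized_cooperation(INTERCEPT, INCENTIVE_COEF, VAR_SLOPE, 0.50)
bottom = incentivized_cooperation(INTERCEPT, INCENTIVE_COEF, VAR_SLOPE, 0.10)

print(f"top decile boundary:    {top:.3f}")
print(f"median interviewer:     {median:.3f}")
print(f"bottom decile boundary: {bottom:.3f}")
```

Holding everything else constant in this way isolates the contribution of the random coefficient, which is why the comparison is less affected by chance differences between assignments than the raw interviewer-level differences plotted in figure 1.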
Next, we turn to the same analysis of UKHLS-IP, which, as a household longitudinal survey, has a rather different design from the Welsh cross-sectional surveys, although we focus on wave one only. Table 3 presents the estimated coefficients, standard deviations, and corresponding 95 percent credible intervals. These are consistent with those presented in table 2; the fixed effect for the incentive predicting response is positive but nonsignificant, and the interviewer characteristics (age, gender, and experience) are all nonsignificant, as are the interactions between these variables and the incentive fixed effect.
Three of the area-level variables are significantly associated with response; the higher the urbanicity and population mobility, the lower the level of survey response, while areas with a housing structure comprising more terraced housing and vacant properties have higher levels of response. Even after controlling for these differences in area composition, the random coefficient for the incentive is significant, with a variance of 0.13 (95 percent credible interval 0.03 to 0.35). This suggests that the between-interviewer variability in the effectiveness of the incentive is caused by interviewer behavior rather than by differences in the sorts of people they have been allocated to interview. The model DIC decreases by 3.10 with the inclusion of the random coefficient, so we also find evidence of a between-interviewer difference in the effectiveness of the incentive on this alternative measure of statistical significance.
As with the Welsh surveys, the covariance between the random intercept and random slope is positive but with a 95 percent credible interval that includes zero. We therefore also find no support from UKHLS-IP for the idea that interviewers who, on average, obtain higher response rates might also be more effective in their deployment of incentives. Figure 2 plots the difference in the mean predicted rates of response for each interviewer derived as fitted values from the models in table 3. It shows a very similar pattern to what we saw in figure 1 for the Welsh surveys, with substantial between-interviewer variation in response rates between high and low incentive groups with a range of −21 to +18 percentage points. Visually, there is more evidence of a positive correlation between difference in response rates and the overall mean response rates for each interviewer, although this difference is not statistically significant.

DISCUSSION
John Wanamaker, the American department store magnate, once (apocryphally) observed that "half the money I spend on advertising is wasted; the trouble is I don't know which half." The same sentiment might also be applied to monetary incentives in surveys (Rossolatos 2013), although in this context, considerably more than half of the money is wasted. This is because incentives generally add only a few percentage points or so to the headline response rate. It follows therefore that the majority of respondents in any survey using a monetary incentive would have agreed to provide an interview anyway. A small minority, however, are susceptible to being converted from refusal to interview with the provision of an incentive, and this in turn raises the possibility that interviewers might play an important role in determining the rate of such "conversions." While there are other reasons for providing monetary incentives than boosting the response rate, this remains the primary rationale in most cases. It is therefore important to understand how best to maximize the effectiveness of monetary incentives in converting refusals to interviews. This is all the more pressing, given the likely need to place greater reliance on incentives to maintain response rates in the future. Our findings show that across three different UK face-to-face surveys, interviewers vary significantly in how effective they are at using incentives to increase rates of cooperation. The effects we observe are substantively and statistically significant; our model estimates show that exchanging interviewers from the bottom decile of interviewer performance for those from the top decile would yield an expected 14 to 15 percentage point increase in the effect of the incentive relative to the control condition.
We have speculated that this heterogeneity results from interviewer expectations and behavior, particularly the use of "tailoring" of doorstep interactions (Groves and Couper 1998) and greater confidence in the probability of obtaining an interview when an incentive is offered (Singer, Frankel, and Glassman 1983; Singer, Hoewyk, and Maher 2000). However, while the between-interviewer variability in the effectiveness of incentives was consistent across the three surveys, we found no significant predictor of this variance among the covariates considered: interviewer age, sex, and experience. Nor was variability in incentive effectiveness related to the overall response rate an interviewer achieved. Therefore, the mechanisms underpinning this effect remain unclear.
Our focus in this article has been on the effect of incentives on cooperation because incentives seem likely to exert their primary influence on the cooperation decision. However, it is possible that they also have an effect on contact rates and other categories of nonresponse. We have therefore also carried out the analyses reported here with the dependent variable specified as response/nonresponse for the two surveys where full outcome codes were available before any reissues. The results are substantively identical to those reported here, so we find no evidence of a differential effect of interviewers on cooperation relative to total nonresponse (see footnote 1).
Our findings have implications for survey practice. The approach we have implemented here to identify interviewer effectiveness in deploying incentives could be used as a way of identifying underperforming interviewers. This sort of monitoring is now implemented routinely in many large-scale survey operations, often in real time, as a way of identifying interviewers who show signs of missing fieldwork targets (Kreuter 2013; Edwards, Maitland, and O'Connor 2017). It should be feasible to include "incentive performance" alongside other forms of paradata to raise flags against particular interviewers on this performance dimension, although how this would be adapted to designs in which all households are offered the same incentive would require further consideration.
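As a rough illustration of what an "incentive performance" flag might look like in fieldwork monitoring, the sketch below computes, for each interviewer, the raw difference in cooperation rates between incentive and control households. It is a descriptive stand-in only and not the model-based estimates reported in this article; the record layout, the function names, and the `min_cases` and `threshold` parameters are all hypothetical choices made for the example. It also presupposes that incentives are randomized within interviewer assignments.

```python
from collections import defaultdict


def incentive_effects(cases, min_cases=10):
    """Return {interviewer_id: cooperation rate (incentive) - cooperation rate (control)}.

    `cases` is an iterable of (interviewer_id, incentive_offered, cooperated)
    tuples; this layout is hypothetical and chosen only for illustration.
    Interviewers with fewer than `min_cases` households in either condition
    are skipped because their raw rates are too noisy to compare.
    """
    # counts[interviewer][condition] = [cooperating households, all households]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for interviewer, incentive, cooperated in cases:
        cell = counts[interviewer][bool(incentive)]
        cell[0] += int(bool(cooperated))
        cell[1] += 1

    effects = {}
    for interviewer, by_condition in counts.items():
        coop_inc, n_inc = by_condition[True]
        coop_ctl, n_ctl = by_condition[False]
        if n_inc < min_cases or n_ctl < min_cases:
            continue
        effects[interviewer] = coop_inc / n_inc - coop_ctl / n_ctl
    return effects


def flag_low_performers(effects, threshold=0.0):
    """List interviewers whose raw incentive effect falls below `threshold`."""
    return sorted(i for i, effect in effects.items() if effect < threshold)
```

Raw differences of this kind are unstable for interviewers with small assignments, so in practice a flag would more plausibly be based on shrunken, model-based estimates than on the simple rates used here.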
Relatedly, the ability to identify interviewers at the top end of the performance distribution offers opportunities to better understand the types of strategies employed by more successful interviewers. Information on successful approaches to incentive use that are identified in this way could be integrated into sections of interviewer briefings that address doorstep approaches, both for generic and survey-specific training. Indeed, simply highlighting to interviewers that the way they administer incentives can have substantial effects on their response outcomes may, on its own, have some effect on their subsequent behavior.
While our methodological approach and findings represent an advance in our understanding of how interviewers and incentives interact to promote cooperation, this study is not without limitations, and these should be acknowledged. First, the surveys we have considered all use a relatively narrow range of incentive values which are administered to all households in the incentive condition. Caution should be exercised in generalizing to contexts where larger incentives are used or where incentives of varying values are targeted at different subgroups of the sample based on response propensities (Lavrakas, McPhee, and Jackson 2016). Our results also have little relevance to the use of incentives in online surveys, which comprise a large and growing proportion of total survey volume, both in the United Kingdom and internationally.
1. These analyses are available from the corresponding author upon request.
We were also able to link the sample file and response outcome data only to a limited range of area and interviewer characteristics. It is possible that with stronger controls for differences between interviewers in the composition of their allocated addresses, the magnitude of the effects we have observed might be reduced. The paucity of interviewer characteristic data available to us, particularly the absence of variables measuring interviewer attitudes, beliefs, and behaviors, means that our ability to explain why some interviewers are more effective in deploying monetary incentives than others is weak. These limitations, we contend, represent potentially fruitful avenues for future research.
The sample design of the NSW2015 used a stratified, single-stage, random selection of addresses across Wales drawn from the small user Postcode Address File (PAF). Adults 16 years old or over within each sampled household were interviewed face-to-face, and each interview lasted for an average of 25 minutes. When a household contained more than one adult, a single adult was randomly selected. The aim of the incentive experiment was to assess the extent to which response rates were improved by offering respondents a £10 gift card upon completing an interview. The experimental group (N = 2,965) received a £10 conditional incentive, and the control group (N = 2,830) received no incentive. The households which were randomly selected to be offered a conditional £10 received advance letters mentioning the incentive, while the other half of households received advance letters that contained no information about incentives. To ensure that differences in response rates between households offered £10 and those offered no incentive could not be attributed to differences in interviewer ability, the incentive was randomly allocated to addresses within each interviewer assignment.
The survey was implemented by a team of 86 interviewers with the number of households interviewed by each interviewer ranging between 14 and 134. Further details on the NSW2015 sample design can be found in Hanson, Sullivan, and Mcgowan (2015).
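The within-interviewer randomization described above can be sketched as follows. This is a minimal illustration and not the procedure actually used in the NSW2015; the function name, the data layout, and the assumption of an (approximately) equal split within each assignment are ours.

```python
import random


def allocate_incentives(assignments, seed=None):
    """Randomly split each interviewer's addresses into incentive and control.

    `assignments` maps a (hypothetical) interviewer id to a list of address ids.
    Returns {address_id: True if offered the incentive, False otherwise}.
    """
    rng = random.Random(seed)
    allocation = {}
    for addresses in assignments.values():
        shuffled = list(addresses)
        rng.shuffle(shuffled)
        n_incentive = len(shuffled) // 2
        # For odd-sized assignments, a coin flip decides which arm gets the extra address.
        if len(shuffled) % 2 == 1 and rng.random() < 0.5:
            n_incentive += 1
        for position, address in enumerate(shuffled):
            allocation[address] = position < n_incentive
    return allocation
```

Because every interviewer works both incentive and control addresses, a difference in cooperation between the two arms cannot simply reflect which interviewers happened to be assigned to which condition.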

NATIONAL SURVEY FOR WALES INCENTIVE EXPERIMENT 2016 (NSW 2016)
The Welsh government commissioned the Office for National Statistics (ONS) to conduct the National Survey for Wales 2016 (NSW 2016) incentive experiment between July and October 2016. The sample was drawn from the Postcode Address File (PAF) and stratified by local authority (LA), using an allocation designed to ensure that a minimum effective sample size was achieved in each LA based on the estimated response rate. Further details on the sample design can be found in Aumeyr et al. (2017). Half of the addresses in each odd-numbered quota (see footnote 2) were offered a £5 incentive conditional on participation (N = 3,604), and addresses with an even quota number were offered no incentive (N = 3,467). The experiment was originally intended to run until December 2016, but it was terminated at the end of October 2016 because response rates in both the experimental and control groups (55 percent and 54 percent, respectively) were lower than expected. With the aim of boosting response rates, a new £10 incentive conditional on participation was introduced for the full sample in November 2016. This study considers only the experimental sample from July to October 2016, which consists of 7,071 households across the two conditions. There were 85 interviewers working on the survey, with the number of interviews per interviewer ranging between one and 219. Sociodemographic characteristics were missing for ten (12 percent) interviewers, who conducted interviews at 249 (3.5 percent) households, because these interviewers did not provide consent. The final analysis sample had 6,122 households after excluding 742 (10.5 percent) ineligible households and those interviewed by interviewers with missing sociodemographic characteristics.

UK HOUSEHOLD LONGITUDINAL SURVEY INNOVATION PANEL WAVE ONE (UKHLS-IP)
The sample for wave one of the UK Household Longitudinal Survey Innovation Panel (UKHLS-IP) was clustered and stratified, consisting of 2,786 addresses from 120 primary sampling units (PSUs) drawn from the Postcode Address File (PAF). The incentive experiment comprised three conditions, each receiving a different conditional incentive: group one was offered £5 per adult, group two was offered £10 per adult, and group three was offered £5 per adult, rising to £10 per adult if all adults in the household completed interviews. Single-person households randomly assigned to group three received £5 initially, which increased to £10 if they participated. For the purposes of our analysis, groups two and three are combined. Note that all households were also sent an unconditional £5 incentive with the advance letter. There were twenty-seven households in the UKHLS-IP that did not successfully merge with interviewer data due to a lack of common unique identifiers. The neighborhood characteristic variables are drawn from the census and were available for England only. This resulted in the exclusion of 342 households (12.3 percent) from Wales and Scotland. In addition, thirty-one (1.1 percent) households in five MSOAs in England did not successfully merge with Innovation Panel data due to a lack of common unique identification codes. Therefore, the final analysis sample contained 2,123 households after excluding 263 (9.4 percent) ineligible households. The number of interviewers working on the UKHLS-IP was 107, with the number of households interviewed by each interviewer ranging between two and fifty. Further details about the UKHLS-IP can be found in Boreham and Constantine (2008).
2. Each quota contained between twenty and thirty addresses on average.
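The UKHLS-IP exclusions can be reconciled with the final analysis sample in a few lines. The sketch below assumes, as our reading of the text rather than something it states outright, that the four exclusion groups do not overlap and that each percentage is taken relative to the 2,786 issued addresses; the dictionary labels are our own shorthand.

```python
# Figures are those quoted in the text; labels are hypothetical shorthand.
ISSUED_ADDRESSES = 2786

EXCLUSIONS = {
    "no match to interviewer data": 27,
    "Wales and Scotland (no census variables)": 342,
    "no match to Innovation Panel data (five MSOAs)": 31,
    "ineligible": 263,
}


def sample_flow(issued, exclusions):
    """Return a list of (reason, excluded, remaining, percent of issued) steps."""
    steps = []
    remaining = issued
    for reason, excluded in exclusions.items():
        remaining -= excluded
        steps.append((reason, excluded, remaining, round(100 * excluded / issued, 1)))
    return steps


steps = sample_flow(ISSUED_ADDRESSES, EXCLUSIONS)
final_sample = steps[-1][2]  # households left after all exclusions
```

Under these assumptions the remaining count matches the 2,123 households reported, and the rounded percentages for the Wales and Scotland, MSOA, and ineligible groups match the 12.3, 1.1, and 9.4 percent quoted.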