Abstract

Survey interviewers are often tasked with assessing the quality of respondents’ answers after completing a survey interview. These interviewer observations have been used to proxy for measurement error in interviewer-administered surveys, yet how interviewers formulate these evaluations and how well the evaluations proxy for measurement error have received little empirical attention. According to dual-process theories of impression formation, individuals form impressions about others based on the social categories of the observed person (e.g., sex, race) and on individual behaviors observed during an interaction. Initial impressions start with heuristic, rule-of-thumb evaluations, whereas systematic processing is characterized by extensive incorporation of available evidence. In a survey context, if interviewers default to heuristic information processing when evaluating respondent engagement, then we expect their evaluations to be based primarily on respondent characteristics and the stereotypes associated with those characteristics. Under systematic processing, on the other hand, interviewers process and evaluate respondents based on observable respondent behaviors occurring during the question-answering process. We use the Work and Leisure Today Survey, including survey data and behavior codes, to examine proxy measures of heuristic and systematic processing by interviewers as predictors of interviewer postsurvey evaluations of respondents’ cooperativeness, interest, friendliness, and talkativeness. Our results indicate that CATI interviewers base their evaluations on actual behaviors during an interview (i.e., systematic processing) rather than on perceived characteristics of the respondent or the interviewer (i.e., heuristic processing). These results are reassuring for the many surveys that collect interviewer observations as proxies for data quality.

Introduction

At the end of a survey interview, interviewers often evaluate respondents on dimensions such as cooperativeness, comprehension, or friendliness. These assessments have been used as indicators of data quality (e.g., Barrett, Sloan, and Wright 2006) and interviewer engagement (Olson and Peytchev 2007). Yet the measurement properties of interviewer assessments and specific threats to their validity have not been directly evaluated. For example, interviewers’ assessments may represent a general impression based on stereotypes associated with respondent characteristics such as age, race, and sex (e.g., women are more talkative than men) rather than on behaviors during the interview. More generally, little research assesses whether the quality of interviewer assessments justifies the investment of interviewer time required to complete them.

In this paper, we develop competing hypotheses for how interviewers develop impressions that inform their end-of-survey evaluations of respondents. Those hypotheses are tested using CATI survey data about respondent and interviewer characteristics, respondents’ actual behavior during the survey, and interviewer evaluations of respondents’ cooperativeness, interest, friendliness, and talkativeness. While previous face-to-face and CATI studies have examined how respondent characteristics affect interviewer evaluations of respondent engagement and response quality (e.g., Hurtado 1994; Freedman et al. 2012), to our knowledge this is the first study to examine whether actual respondent behaviors predict postsurvey interviewer evaluations in a telephone survey.

In the next two sections, we provide the theoretical perspective of the dual process model of impression formation applied to interviewer evaluations. Then we describe the data and methods used, followed by analytic results. Finally, we discuss the findings and the implications for survey practice and future research.

Background

Interviewer observations collected in many surveys focus on two sources of error: nonresponse error and measurement error. First, most research on interviewer observations focuses on understanding nonresponse error, drawing on observable housing unit characteristics or statements that a householder makes during recruitment (e.g., Olson 2013). Research has shown that these interviewer observations predict unit nonresponse and panel attrition and are useful for nonresponse adjustment (e.g., Lepkowski and Couper 2002; West, Kreuter, and Trappmann 2014). Despite this utility, these observations are subject to significant interviewer variance effects and measurement errors (e.g., Sinibaldi, Durrant, and Kreuter 2013; West 2013). For instance, interviewers engage in social categorization when evaluating factual characteristics (West and Kreuter 2013).

Second, interviewers record observations about factors related to the measurement process, including respondent cooperativeness in answering questions or engagement in the interview (Olson and Parkhurst 2013). Surveys such as the General Social Survey (GSS) and the European Social Survey (ESS) ask interviewers to assess response quality after an interview, including the respondent’s level of cooperation and understanding (e.g., Smith 2009; ESS 2014). Investigators use these postsurvey evaluations to identify potential breakdowns in data quality (Olson and Parkhurst 2013). For example, in an incentive experiment, Medway and Tourangeau (2015) use interviewer assessments of a respondent’s level of effort in answering questions as an indicator of data quality and find no differences in ratings between the control and incentive groups. Holbrook and colleagues (2014) find a significant positive relationship between interviewer ratings of respondent intelligence and response heaping, but a negative relationship between interest and response heaping. Tarnai and Paxson (2005) show that interviewer ratings of respondent comprehension difficulty are significantly positively related to the number of missing items in a survey. Thus, researchers use interviewer observations of the quality of an interview to evaluate data quality.

Although interviewer observations of response quality are often used in analyses (e.g., Barrett, Sloan, and Wright 2006; Kaminska, McCutcheon, and Billiet 2010), few studies have assessed their measurement properties. Olson and Peytchev (2007) find significant interviewer variance effects for interviewer evaluations of respondent interest. Other studies have found significant associations between respondents’ characteristics and interviewer evaluations of friendliness, interest, comprehension, engagement, and uncertainty (Hurtado 1994; Freedman et al. 2012). While it is clear that interviewer evaluations vary across interviewers and respondents, no study has examined whether respondent behaviors during the interview itself influence these observations.

TYPES OF INTERVIEWER OBSERVATIONS

Interviewer evaluations of response quality fall into two general groups: task-related assessments of respondent engagement in the measurement process (e.g., Cannell, Miller, and Oksenberg 1981) and assessments of rapport or interpersonal affiliation between the interviewer and respondent (Garbarski, Schaeffer, and Dykema 2016). Many major US and European surveys, including the 2012 pre-election American National Election Studies (ANES 2013), the GSS (Smith 2009), the ESS (ESS 2014), the National Longitudinal Survey of Youth (NLSY 1997), and the Project on Human Development in Chicago Neighborhoods (PHDCN; Earls et al. 2000), ask interviewers to evaluate respondents’ cooperation with and interest in the interview as measures of engagement and motivation. In the GSS and PHDCN, interviewers also assess aspects of interactional rapport, such as respondents’ friendliness and talkativeness. These latter indicators reflect practices of everyday conversation that deviate from standardized survey interviewing and potentially affect data quality (Garbarski, Schaeffer, and Dykema 2016).

Continuum Model of Impression Formation

The social psychological literature on impression formation is useful for understanding which types of information interviewers draw on to make their postsurvey assessments. Dual-process models suggest that impressions about individuals can be formed through either heuristic or systematic processing (e.g., Chaiken 1980; Chaiken and Trope 1999). In particular, the continuum model of impression formation suggests that perceivers initially classify individuals into preexisting social categories (e.g., male, elderly, African American) and then generate impressions based on these observed social characteristics (e.g., Fiske, Lin, and Neuberg 1999). Additionally, Tversky and Kahneman’s (1974) “availability” heuristic suggests that people often rely on readily available experiences when making judgments about others. For example, a female interviewer who perceives herself as talkative could infer the same about a female respondent when forming an impression. These initial categorization processes occur heuristically, allowing impression formation to happen quickly and without much effortful thought (Chen, Duckworth, and Chaiken 1999; Fiske 2000; Fiske et al. 2002).

With sufficient motivation and available information, the perceiver will move beyond heuristic processing and process actual behaviors, updating the initial heuristic-based impressions through systematic processing or individuation. The process of individuation leads to a “piecemeal integration” of multiple sources of information to form the overall impression (Fiske, Lin, and Neuberg 1999).

INTERVIEWER OBSERVATIONS AND IMPRESSION FORMATION

We anticipate that survey interviewers conform to the continuum model when evaluating respondents. Namely, interviewers will first categorize respondents based on observed characteristics and the interviewers’ own characteristics. Interviewers will then update that initial impression based on the respondent’s behavior during the survey interview.

Whether or not interviewers proceed past the heuristic stage of impression formation is at the crux of how useful postsurvey interviewer evaluations are as indicators of data quality. If interviewers rely on heuristic impression formation, their evaluations will reflect stereotypes about respondents instead of respondents’ actual interview behaviors and data quality. That is, if interviewers rely primarily on heuristic processing to make their postsurvey evaluations, significant associations should exist between respondent and interviewer background characteristics and interviewer postsurvey evaluations. If interviewers proceed to systematic processing, their evaluations should be more indicative of data quality to the extent that respondent behaviors are associated with the quality of responses (Schaeffer and Dykema 2011). That is, if interviewers are engaging in more effortful systematic processing, significant associations should exist between observable respondent behaviors during the interview and postsurvey evaluations.

HEURISTIC PROCESSING DURING INTERVIEWS

For interviewers to use heuristic processing, they have to be able to perceive social group characteristics. In telephone interviews, interviewers rely exclusively on auditory cues and the content of the conversation (Conrad, Schober, and Dijkstra 2008). Research suggests that telephone interviewers can reliably perceive sociodemographic characteristics (e.g., Krauss, Freyberg, and Morsella 2002; Thomas and Reaser 2004). Furthermore, interviewers ask about respondents’ demographic characteristics during the interview, before any postsurvey interviewer evaluations.

We focus on respondent traits of age, sex, race, and socioeconomic status. Commonly held beliefs about older people are that they have reduced working memory capacity (e.g., Yan and Tourangeau 2008), are more verbose (Belli, Weiss, and Lepkowski 1999), have difficulty answering interview questions (Johnson et al. 2015), and are warmer than younger people (Fiske et al. 2002). Therefore, if interviewers rely on heuristic processing, they should evaluate elderly respondents as less cooperative but more friendly and talkative. For gender, men are thought to be more competent and rational whereas women are thought to be more warm, friendly, and expressive (e.g., Fiske et al. 2002). As such, interviewers should evaluate females as more friendly and talkative than their male counterparts. If being perceived as a racial minority triggers more hostile perceptions and other negative traits (Greenwald and Banaji 1995; Fiske et al. 2002), then we expect to see lower ratings of cooperativeness, interest, and friendliness for racial/ethnic minorities. Members of a higher socioeconomic status are typically perceived as more competent, intelligent, warm, or credible compared to lower socioeconomic status individuals (Tiedens, Ellsworth, and Mesquita 2000; Fiske et al. 2002). Thus, if interviewers engage in heuristic processing, respondents with higher socioeconomic status should be rated as more cooperative, interested, and friendly.

Because people rely on their own “available” experiences in attributing traits to others (Tversky and Kahneman 1974), an interviewer’s own characteristics can affect evaluations of respondents. If interviewers engage in heuristic processing based on their own fixed characteristics, significant associations should exist between the interviewers’ characteristics and their postsurvey evaluations of respondents.

SYSTEMATIC PROCESSING DURING INTERVIEWS

Four types of respondent behaviors are likely to influence postsurvey evaluations under systematic processing. First, question-answering behaviors are the most prevalent respondent behavior. For any given question, respondents can provide an adequate answer, qualify their answer (“about 5”), or provide responses that fail to fit into the response categories. Respondents can also elaborate on their answer by providing additional context (“5. I really like reading”).

Under the systematic-processing assumption, respondents who provide adequate answers should be evaluated as more cooperative, interested, and friendly, because such answers represent respondents doing exactly what is expected of them during an interview. Conversely, respondents who provide any kind of inadequate answer (qualified, uncodable, don’t know, or refusal) should be evaluated as less cooperative because interviewers face greater burden in resolving these problems (Japec 2008).

We expect that elaborations on any type of answer will be associated with higher ratings of friendliness and talkativeness because elaborations reflect engagement and a desire to interact with the interviewer. Elaborations on adequate answers, however, should be associated with lower ratings of cooperativeness if the interviewer perceives them as a violation of the paradigmatic question-response-feedback sequence. Elaborations on any kind of inadequate answer, by contrast, signal that the respondent is trying to help the interviewer: elaborations on qualified and uncodable responses should therefore be associated with higher ratings of cooperativeness and friendliness and, as with adequate answers, higher ratings of talkativeness. We have no clear expectations regarding other types of answers, such as answering a previous question.

The second type of respondent behaviors includes nonverbal utterances such as disfluencies and laughter. Nonverbal utterances are part of normal conversational behaviors and are not directly related to the task of responding (Jans 2010; Conrad et al. 2013). Speech disfluencies such as fillers (“ums” and “uhs”), stutters, and repairs are related to comprehension problems and difficulties with tasks requiring higher cognitive ability (e.g., Schober and Bloom 2004).

We have competing hypotheses for the association between disfluencies and postsurvey evaluations. Fluent speech styles produce positive ratings of speakers’ credibility, confidence, and social attractiveness (e.g., Oksenberg, Coleman, and Cannell 1986). Thus, respondent disfluencies should be associated with lower ratings of cooperativeness, interest, and friendliness. However, disfluencies have been shown to provide a “disfluency advantage,” as they give the listener more time to process what is being said and thus improve comprehension (Brennan and Schober 2001). If this is the case, then interviewers should have greater understanding of respondents with higher levels of disfluencies, which will lead to higher ratings of cooperativeness, interest, and friendliness. Inasmuch as laughter indicates rapport and affiliation between respondent and interviewer (e.g., Garbarski, Schaeffer, and Dykema 2016), we expect laughter to be positively associated with ratings of respondent cooperativeness, interest, and friendliness.

The third type of behavior comprises verbal measures of personal involvement and rapport. Rapport is inconsistently defined in the existing literature, but can include behaviors such as agreeing with the interviewer or providing personal disclosures (Garbarski, Schaeffer, and Dykema 2016). Personal involvement behaviors are expected to be associated with higher ratings of the interpersonal outcomes of friendliness and talkativeness but to have null or negative associations with the task-related outcomes of cooperativeness and interest.

The fourth type of behavior comprises requests for clarification, such as asking the interviewer to repeat a question or the response options. These expressions of cognitive difficulty may indicate decreased reliability and less accurate answers. On the other hand, requests for clarification may also indicate harder-working respondents who are trying to give the “best” answer possible (Dykema, Lepkowski, and Blixt 1997). We expect requests for clarification to be associated with higher ratings on the task-related assessments of cooperativeness and interest. We do not expect these clarification behaviors to be associated with the interpersonal assessments of friendliness or talkativeness.

In sum, if interviewers evaluate respondents based on their interview behaviors, then interviewers are engaging in systematic processing. Whether these evaluations are related to actual behaviors during an interview has not been assessed. We now empirically address this question.

Data and Methods

DATA

We use data from the Work and Leisure Today survey, a random-digit-dial telephone survey fielded by Abt SRBI in August 2013 targeting adults in landline households (n = 450; AAPOR RR3 = 6.3 percent). The landline frame excludes the 38 percent of US adults who lived in cell-phone-only households in 2013 (Blumberg and Luke 2013). The survey took an average of fourteen minutes and contained an average of 46.7 questions. Sampled cases were randomly assigned to twenty-two interviewers. The interviews were audiorecorded with respondent consent and then transcribed. Two interviews were not fully recorded and are excluded from the analyses. Following Olson and Peytchev (2007), interviewers who completed fewer than ten interviews were also excluded, leaving an analytic sample of 433 interviews conducted by nineteen interviewers.

DEPENDENT VARIABLES

After completing each interview, interviewers were asked to assess how cooperative, interested, friendly, and talkative the respondent was. Categories with small cell sizes were collapsed for analysis. Table 1 presents frequency distributions for each of the interviewer evaluations (see online appendix A for question wording and full distributions). Interviewers generally evaluate respondents as having very good cooperation, showing above-average and very high interest, as friendly and eager, and as neither talkative nor untalkative. Simple correlation analysis shows that assessments of cooperativeness, interest, and friendliness are strongly positively related (0.39 < Cramer’s V < 0.60), while the associations with talkativeness are generally weaker (0.11 < Cramer’s V < 0.25).

Table 1.

Descriptive Statistics of Interviewer Evaluations

Measure                                               Percentage
Cooperativeness
 Fair and below                                         7.62
 Good                                                  29.10
 Very good                                             63.28
Interest
 Average and below                                     41.34
 Above average                                         33.49
 Very high                                             25.17
Friendliness
 Cooperative but not particularly eager and below      40.18
 Friendly and eager                                    59.82
Talkativeness
 Very untalkative                                       3.00
 Somewhat untalkative                                  12.01
 Neither talkative nor untalkative                     45.27
 Somewhat talkative                                    30.95
 Very talkative                                         8.78

Note.—n = 433.

INDEPENDENT VARIABLES: HEURISTIC PROCESSING

Table 2 provides an overview of the respondent and interviewer characteristics that serve as indicators of heuristic processing. Respondents reported their age, gender, race, education, and income during the interview. Interviewer gender, race, and experience are included in the model as fixed characteristics of the interviewer and come from administrative records. Additionally, we include the interviewer’s cooperation rate, operationalized as the percentage of the interviewer’s call attempts with contact that yielded a completed interview, divided into higher and lower cooperation rate groups at the median (a 6.9 percent cooperation rate).
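As a concrete illustration, this cooperation-rate measure and its median split could be constructed from call-record data along the following lines (a minimal Stata sketch; the file and variable names are hypothetical, since the call-record layout is not shown here):

```
* Hypothetical call-record file: one row per call attempt, with an
* interviewer ID, a contact indicator, and a completed-interview indicator.
use call_records, clear
keep if contact == 1                               // attempts with contact made
collapse (mean) coop_rate = interview, by(intid)   // share of contacts interviewed
replace coop_rate = 100 * coop_rate                // express as a percentage
summarize coop_rate, detail
generate byte high_coop = coop_rate > r(p50)       // median split (median = 6.9)
```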

Table 2.

Percentage Distribution of Respondent and Interviewer Characteristics

Respondent characteristics and question wording (if applicable) (n = 433); percent by (recoded) response category

Age: What is your age?
 35 and less                  8.7
 36 to 50                    16.2
 51 and above                70.0

Gender: I have to read every question in this survey, even if it seems obvious. What is your sex?
 Male                        36.0
 Female                      64.0

Race: [IF NON-HISPANIC ASK:] What is your race? Are you white, black, Asian, or some other? [IF HISPANIC ASK:] Are you white Hispanic, black Hispanic, or some other race?
 White                       87.3
 Nonwhite                    12.7

Education: What is the last grade or class that you completed in school? [INTERVIEWER CODE, DO NOT READ: 1 None, or grade 1–8; 2 High school incomplete (grades 9–11); 3 High school graduate (grade 12 or GED certificate); 4 Business, technical, or vocational school AFTER high school; 5 Some college, no 4-year degree; 6 College graduate (BS, BA, or other 4-year degree); 7 Postgraduate training or professional schooling after college (e.g., toward a master’s degree or PhD; law or medical school)]
 High school and less        28.9
 Vocational                  29.3
 College and above           41.8

Income: Last year, that is, in 2012, what was your total family income from all sources, before taxes? Just stop me when I get to the right category. [READ: 1 Less than $10,000; 2 $10,000 to under $20,000; 3 $20,000 to under $30,000; 4 $30,000 to under $40,000; 5 $40,000 to under $50,000; 6 $50,000 to under $75,000; 7 $75,000 to under $100,000; 8 $100,000 or more]
 $49,999 and less            58.2
 $50,000 and above           41.8

Respondent controls

Married: Are you married, partnered, divorced, separated, widowed, or never been married?
 No                          52.2
 Yes                         47.8

Computer user: The next few questions are about leisure activities using a computer. Do you happen to have a desktop, laptop or tablet computer?
 No                          22.3
 Yes                         77.7

# of questions asked          Mean 46.7 (SD 4.50)

Interviewer characteristics (n = 19)

Gender
 Male                        52.6
 Female                      47.4

Race
 White                       47.4
 Nonwhite                    52.6

Experience
 0 years                     26.3
 1+ years                    73.7

Cooperation rate
 Low                         57.9
 High                        42.1

INDEPENDENT VARIABLES: SYSTEMATIC PROCESSING

We derive indicators of systematic processing from behavior codes, typically used to understand the interviewer-respondent interaction in survey interviews (e.g., Schaeffer and Dykema 2011). Each interview was digitally audiorecorded and transcribed. Then, a team of trained coders behavior coded each survey transcript. The behavior codes were assigned at the conversational-turn level, with codes assigned for the actor (respondent or interviewer); the initial action (e.g., answer provided); an assessment of the initial action (e.g., whether the answer provided was adequate, qualified, or uncodable); a more specific assessment of this action (e.g., whether the answer was provided with or without elaborations); laughter (whether the respondent laughed or not); any disfluencies during any part of the turn; and interruptions. Table 3 provides examples of each of these codes.

Table 3.

Kappa Statistics for Behavior Codes and Examples of Respondent Behaviors, Work and Leisure Today Survey

Behavior code                     Kappa          Example 1                                         Example 2
1) Actor                          0.998          Respondent                                        Respondent
2) Initial action                 0.88           Answers question                                  Asks for clarification or definition
3) Assessment of initial action   0.21 to 0.76   Provides adequate answer                          Asks to repeat response options
4) Details of action              0.56 to 0.68   Without elaboration                               n.a.
5) Laughter                       0.96           The respondent laughs                             No laughter
6) Disfluencies                   0.87           There are no disfluencies, stutters, or repairs   There are disfluencies, stutters, or repairs
7) Interruptions                  0.94           There are no interruptions                        The respondent interrupts the interviewer

Expert coders independently double-coded a 10 percent subsample of the survey transcripts to assess intercoder reliability. The reliability of these codes was quite high (table 3): all but one kappa value exceeded 0.56, surpassing a minimum kappa requirement of 0.40 (Bilgen and Belli 2010). The exception was the assessment of the type of clarification (kappa = 0.21); we therefore aggregate clarifications into four more general categories.
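For reference, turn-level agreement statistics of this kind can be computed with Stata’s kap command for two raters (a sketch assuming a file of double-coded turns with one variable per expert coder; all names are hypothetical):

```
* Hypothetical file of double-coded conversational turns
use double_coded_turns, clear
kap action_coder1 action_coder2     // Cohen's kappa for the initial-action code
kap assess_coder1 assess_coder2     // kappa for the assessment of the action
```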

We differentiate between four types of respondent behaviors with the behavior codes: (1) respondent answering behaviors, such as providing an adequate response with or without elaboration (e.g., stating “5” versus elaborating with “5. I really like swimming”) or providing an uncodable answer that cannot be coded into the response format; (2) nonverbal utterances, such as laughter or disfluencies; (3) personal involvement and rapport behaviors, reflecting more general conversational processes; and (4) requests for clarification, such as asking for the definition of a term, indicative of some form of cognitive difficulty.

For each respondent, we count the number of conversational turns on which each behavior occurred over the entire interview. Table 4 provides a summary of each behavior, its definition, and descriptive statistics.
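The respondent-level counts in table 4 can then be built by aggregating the turn-level codes (a sketch assuming one row per conversational turn; all file and variable names are placeholders):

```
* Hypothetical turn-level file: one row per conversational turn with actor,
* action, assessment, elaboration, laughter, and disfluency codes.
use behavior_codes, clear
keep if actor == "respondent"
generate byte adeq_elab = action == "answer" & assess == "adequate" & elab == 1
generate byte laugh     = laughter == 1
generate byte disfl     = disfluency == 1
collapse (sum) n_adeq_elab = adeq_elab n_laugh = laugh n_disfl = disfl, by(caseid)
merge 1:1 caseid using survey_data    // attach the counts to the survey file
```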

Table 4.

Mean and Standard Deviation of Total Number of Turns with Respondent Behaviors

Behavior code             Definition                                                                   Mean    SD
Respondent answering behaviors
 Adequate answer          Provides an answer that can be coded according to the response format        45.83   9.06
  With elaboration                                                                                      3.57   3.41
  Without elaboration                                                                                  42.18   9.51
 Qualified answer         Answers with a qualifier that shows uncertainty                               5.03   4.47
  With elaboration                                                                                      0.91   1.61
  Without elaboration                                                                                   4.12   3.61
 Uncodable answer         Provides an answer that cannot be coded according to the response format     10.03   8.57
  With elaboration                                                                                      4.86   5.77
  Without elaboration                                                                                   5.17   4.11
 Don’t know               States that they don’t know or don’t remember the answer                      1.00   1.34
 Refusal                  Refuses to answer the question                                                0.63   1.35
 “Other” answer           States that they have an answer to a previous question or disagree
                          with an interviewer                                                           0.15   0.55
Nonverbal utterances
 Laughter                 Respondent laughs                                                             5.13   5.68
 Disfluency               Whether there are any disfluencies, stutters, or repairs                     19.53  12.59
Personal involvement and rapport
 Agrees with interviewer  Agrees with interviewer, either as verification or as showing understanding   0.97   1.79
 Affirmative feedback     Provides an affirmative statement                                             7.00   5.68
 Acknowledging feedback   Thanks interviewer or gives indication that they are thinking                 2.85   4.18
 Task-related feedback    Task-, time-, and telephone quality-related feedback                          0.24   0.73
 Digression               Engages in off-topic conversation                                             1.27   2.86
 Personal disclosure      Makes statement about self or own attitudes (outside of response)             4.02   6.68
 “Other” feedback         States an apology or negation                                                 1.22   1.38
Requests for clarification
 Interrupts interviewer   Respondent interrupts the interviewer                                        12.45  12.56
 Clarification—repeat     Asks for repetition of the question, the response options, or definition      1.57   2.04
 Clarification—definition Asks for a definition of a term                                               0.48   0.92
 Clarification—what       Says “What?” or “What did you say?”                                           1.84   1.86
 Clarification—unit       Asks for unit of measurement for the response                                 0.68   1.10

Note.—n = 433.

Adequate responses occur on an average of 45.83 conversational turns, with respondents providing adequate responses without elaboration on an average of 42.18 conversational turns and with elaboration on 3.57 turns. Providing an uncodable response was the second most frequent response behavior, occurring on 10.03 turns, roughly equally split between uncodable responses without elaborations (mean = 5.17) and with elaborations (mean = 4.86). Disfluencies occur on an average of 19.53 conversational turns, and interruptions occur on 12.45 turns.

CONTROLS

We include household composition (proxied by marital status) and general questionnaire burden, measured by whether a respondent triggered a series of follow-up questions related to computer use and by the number of questions asked (see table 2).

All continuous independent variables are grand-mean-centered (Raudenbush and Bryk 2002).
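Grand-mean centering subtracts the overall sample mean from each continuous predictor; in Stata, for example (hypothetical variable name):

```
* Sketch: grand-mean center the number of questions asked
summarize n_questions, meanonly
generate n_questions_c = n_questions - r(mean)
```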

METHODS

We evaluate the association between indicators of heuristic and systematic processing and each of the interviewer assessments using a two-level ordered logistic regression model with an interviewer random effect, estimated with the meologit command in Stata 14 (Stata 2015). Each interviewer j = 1, …, M has i = 1, …, n_j respondents. With K possible outcomes, cutpoints $\kappa_1, \kappa_2, \ldots, \kappa_{K-1}$, covariates $x_{ij}$ for the fixed effects, and random effects $u_j$, the probability of observing outcome k for response $y_{ij}$ is

$$p_{ij} = \Pr(y_{ij} = k \mid \kappa, u_j) = H(\kappa_k - x_{ij}\beta - u_j) - H(\kappa_{k-1} - x_{ij}\beta - u_j)$$

where H(·) is the logistic cumulative distribution function (Stata 2015). Friendliness is a binary variable and is therefore analyzed with a traditional logistic regression model.
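In Stata 14 syntax, the estimation for one ordinal outcome looks roughly as follows (a sketch with placeholder variable names standing in for the covariates listed in table 5):

```
* Two-level ordered logit: respondents (level 1) nested in interviewers
* (level 2), with a random intercept for the interviewer.
meologit cooperative resp_age_c resp_female resp_nonwhite resp_hsless      ///
    resp_income int_female int_nonwhite int_exper int_highcoop married     ///
    computer_user n_questions_c n_adeq_elab n_adeq_noelab n_qual_elab      ///
    n_qual_noelab n_uncod_elab n_uncod_noelab n_dk n_refusal n_otheranswer ///
    n_laugh n_disfl n_agree n_affirm n_acknowledge n_taskfb n_digress      ///
    n_disclose n_otherfb n_interrupt n_clar_repeat n_clar_def n_clar_what  ///
    n_clar_unit || intid:

* Friendliness is binary, so a two-level (random-intercept) logit is used:
melogit friendly_eager resp_age_c resp_female /* same covariates */ || intid:
```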

Ordered-logistic models assume proportional odds across each pair of outcomes. Although this assumption is violated in some instances, our conclusions do not change, so we report the more parsimonious ordered-logistic models.1 Several robustness checks, including outlier diagnostics for the independent variables, show that all results generally hold when the models are re-estimated with high numbers of behaviors censored at the 95th percentile (results available on request).

Three models were estimated for each of the interviewer assessments. Model 1 is a null model as a baseline; model 2 includes variables associated with heuristic processing and controls; and model 3 adds variables capturing systematic processing. Unless indicated otherwise, none of the results from previously estimated models change using this stepwise approach. As such, we display results for model 3 only (for full results, see online appendix B).

We report the odds ratio and the average marginal effects (AME) for each statistically significant respondent behavior in the text. For each respondent, the AME calculates the difference in the predicted probability between each category of the outcome variable, holding the independent variable at a given value. This difference is then averaged across all respondents. Mathematically, the marginal change in probability is computed as (Long and Freese 2006):

$$\frac{\partial \Pr(y = k \mid x)}{\partial x_{ij}} = \frac{\partial F(\kappa_k - x\beta)}{\partial x_{ij}} - \frac{\partial F(\kappa_{k-1} - x\beta)}{\partial x_{ij}}$$

Holding all other variables constant, this is the slope of the curve relating xij to Pr(y=k|x) for each outcome. For categorical variables, the interpretation of the AME is straightforward—the effect of being in the focal category of the independent variable compared to the reference category. For continuous variables, the AME is related to a very small change (approximately the standard deviation of the variable divided by 1,000) in the independent variable. An AME of 5.0 would indicate that a very small unit increase in the independent variable (e.g., number of conversational turns with adequate responses) yields a five-percentage-point increase in the probability of a specific outcome occurring (e.g., the interviewer rating the respondent as very talkative). AMEs yield a straightforward interpretation of effect sizes and can be compared across models (Mood 2010). Because ordinal logistic regression models have multiple outcome categories, we only report the AME in percentage points for the highest category of each interviewer rating.
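Average marginal effects on the probability of the highest outcome category can then be computed with margins after estimation (a sketch; the predict() options shown, outcome() and marginal, are our assumption about the appropriate meologit postestimation syntax and should be checked against the documentation):

```
* AME of a focal covariate on Pr(highest category), e.g. "very talkative"
* (outcome 5), averaged over respondents and integrating out the
* interviewer random effect.
margins, dydx(resp_female) predict(outcome(5) marginal)
```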

Results: Which Strategies Do Interviewers Use?

HEURISTIC PROCESSING

In an empty two-level model, interviewer-related variance components account for between 27 and 36 percent of the total variance in evaluations (p < 0.05).2
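A standard way to express the interviewer share of variance in a multilevel logit, and presumably the calculation behind these percentages, fixes the level-1 residual variance of the latent outcome at $\pi^2/3$:

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}$$

For example, the full-model interviewer variance of 2.15 for talkativeness in table 5 implies $\rho = 2.15/(2.15 + 3.29) \approx 0.40$.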

Table 5 presents the coefficients for the heuristic-processing variables and controls from the full models for each of the four interviewer evaluations. Evidence of heuristic processing is limited for all outcomes in this study. Talkativeness is the outcome best explained by respondent characteristics: including them reduces the interviewer variance for talkativeness by 18.8 percent. The associations between heuristic-processing indicators and the other outcomes are more modest, with a 4 percent reduction in interviewer-level variance for cooperativeness and modest increases in interviewer-level variance for interest and friendliness.

Table 5.

Multilevel Ordered Logistic Regression Coefficients and Standard Errors Predicting Interviewer Evaluations of Cooperativeness, Interest, Friendliness, and Talkativeness with Indicators of Heuristic and Systematic Processing

CooperativenessInterestFriendlinessTalkativeness
Heuristic processing
 Respondent characteristics
  Age (cent.)0.00–0.000.000.00
  Female (ref. Male)–0.07–0.230.090.76***
  Nonwhite (ref. White)0.030.25–0.15–0.53
  High school or less–0.39–0.63*–0.580.67**
  Income–0.04–0.07–0.040.01
 Interviewer characteristics
  Female (ref. Male)0.35–1.080.190.59
  Nonwhite (ref. White)–0.47–0.090.23–0.74
  Interviewer experience 1+ year(s)–1.18–1.020.85–0.25
  Cooperation rate–0.040.48–0.65–0.62
 Respondent control variables
  Married (ref. Unmarried)0.19–0.10–0.24–0.37
  Computer user0.88*0.68*0.120.03
  # of questions asked (cent.)0.020.040.050.04
Systematic processing
 Respondent answering behaviors
  Adequate answer with elaboration–0.040.050.020.17***
  Adequate answer w/o elaboration–0.03–0.010.000.04
  Qualified answer with elaboration–0.120.070.40**0.10
  Qualified answer w/o elaboration–0.03–0.040.030.02
  Uncodable answer with elaboration–0.050.01–0.070.08*
  Uncodable answer w/o elaboration–0.09*–0.08*–0.03–0.03
  Don’t know–0.17–0.33***–0.38***–0.17*
  Refusal–0.37***–0.26**–0.30**–0.07
  “Other” answer0.240.070.00–0.03
 Nonverbal utterances
  Laughter0.060.08**0.14***–0.03
  Disfluency0.04**0.03**0.020.02
 Personal involvement and rapport
  Agrees with interviewer–0.08–0.030.08–0.03
  Affirmative feedback0.14***–0.030.05–0.05
  Acknowledging feedback0.040.040.090.04
  Task-related feedback–0.04–0.03–0.27–0.07
  Digression0.110.030.080.07
  Personal disclosure–0.02–0.03–0.050.08**
  “Other” feedback0.06–0.020.02–0.06
 Clarification behaviors
  Interrupts interviewer–0.040.01–0.000.01
  Clarification—repeat–0.020.020.08–0.00
  Clarification—definition–0.03–0.080.040.08
  Clarification—what–0.17*–0.21**–0.14–0.02
  Clarification—unit–0.13–0.04–0.180.02
Intercept 1–4.77***–1.810.41–5.24***
Intercept 2–1.500.55–2.99***
Intercept 30.45
Intercept 43.66**
Interviewer-level variance2.03*2.16**1.89*2.15**
Model fit:
AIC595.57790.28492.03937.98
Observations433433433433
CooperativenessInterestFriendlinessTalkativeness
Heuristic processing
 Respondent characteristics
  Age (cent.)0.00–0.000.000.00
  Female (ref. Male)–0.07–0.230.090.76***
  Nonwhite (ref. White)0.030.25–0.15–0.53
  High school or less–0.39–0.63*–0.580.67**
  Income–0.04–0.07–0.040.01
 Interviewer characteristics
  Female (ref. Male)0.35–1.080.190.59
  Nonwhite (ref. White)–0.47–0.090.23–0.74
  Interviewer experience 1+ year(s)–1.18–1.020.85–0.25
  Cooperation rate–0.040.48–0.65–0.62
 Respondent control variables
  Married (ref. Unmarried)0.19–0.10–0.24–0.37
  Computer user0.88*0.68*0.120.03
  # of questions asked (cent.)0.020.040.050.04
Systematic processing
 Respondent answering behaviors
  Adequate answer with elaboration–0.040.050.020.17***
  Adequate answer w/o elaboration–0.03–0.010.000.04
  Qualified answer with elaboration–0.120.070.40**0.10
  Qualified answer w/o elaboration–0.03–0.040.030.02
  Uncodable answer with elaboration–0.050.01–0.070.08*
  Uncodable answer w/o elaboration–0.09*–0.08*–0.03–0.03
  Don’t know–0.17–0.33***–0.38***–0.17*
  Refusal–0.37***–0.26**–0.30**–0.07
  “Other” answer0.240.070.00–0.03
 Nonverbal utterances
  Laughter0.060.08**0.14***–0.03
  Disfluency0.04**0.03**0.020.02
 Personal involvement and rapport
  Agrees with interviewer–0.08–0.030.08–0.03
  Affirmative feedback0.14***–0.030.05–0.05
  Acknowledging feedback0.040.040.090.04
  Task-related feedback–0.04–0.03–0.27–0.07
  Digression0.110.030.080.07
  Personal disclosure–0.02–0.03–0.050.08**
  “Other” feedback0.06–0.020.02–0.06
 Clarification behaviors
  Interrupts interviewer–0.040.01–0.000.01
  Clarification—repeat–0.020.020.08–0.00
  Clarification—definition–0.03–0.080.040.08
  Clarification—what–0.17*–0.21**–0.14–0.02
  Clarification—unit–0.13–0.04–0.180.02
Intercept 1–4.77***–1.810.41–5.24***
Intercept 2–1.500.55–2.99***
Intercept 30.45
Intercept 43.66**
Interviewer-level variance2.03*2.16**1.89*2.15**
Model fit:
AIC595.57790.28492.03937.98
Observations433433433433

Note.—Model 3. See online appendix B for full models. The intercepts refer to the cutpoints or thresholds of the latent underlying variable y*. When the value of y* is above this threshold, the observed category in the outcome variable y changes (Long and Freese 2006, p. 185). For cooperativeness, fair and below ≤ intercept 1 < good ≤ intercept 2 < very good. For interest, average and below ≤ intercept 1 < above average ≤ intercept 2 < very high. “Friendly and eager” takes the value of 1 in the friendliness model. For talkativeness, very untalkative ≤ intercept 1, …≤ intercept 4 < very talkative.

*p < 0.05, **p < 0.01, ***p < 0.00.

Table 5.

Multilevel Ordered Logistic Regression Coefficients and Standard Errors Predicting Interviewer Evaluations of Cooperativeness, Interest, Friendliness, and Talkativeness with Indicators of Heuristic and Systematic Processing

CooperativenessInterestFriendlinessTalkativeness
Heuristic processing
 Respondent characteristics
  Age (cent.)0.00–0.000.000.00
  Female (ref. Male)–0.07–0.230.090.76***
  Nonwhite (ref. White)0.030.25–0.15–0.53
  High school or less–0.39–0.63*–0.580.67**
  Income–0.04–0.07–0.040.01
 Interviewer characteristics
  Female (ref. Male)0.35–1.080.190.59
  Nonwhite (ref. White)–0.47–0.090.23–0.74
  Interviewer experience 1+ year(s)–1.18–1.020.85–0.25
  Cooperation rate–0.040.48–0.65–0.62
 Respondent control variables
  Married (ref. Unmarried)0.19–0.10–0.24–0.37
  Computer user0.88*0.68*0.120.03
  # of questions asked (cent.)0.020.040.050.04
Systematic processing
 Respondent answering behaviors
  Adequate answer with elaboration–0.040.050.020.17***
  Adequate answer w/o elaboration–0.03–0.010.000.04
  Qualified answer with elaboration–0.120.070.40**0.10
  Qualified answer w/o elaboration–0.03–0.040.030.02
  Uncodable answer with elaboration–0.050.01–0.070.08*
  Uncodable answer w/o elaboration–0.09*–0.08*–0.03–0.03
  Don’t know–0.17–0.33***–0.38***–0.17*
  Refusal–0.37***–0.26**–0.30**–0.07
  “Other” answer0.240.070.00–0.03
 Nonverbal utterances
  Laughter0.060.08**0.14***–0.03
  Disfluency0.04**0.03**0.020.02
 Personal involvement and rapport
  Agrees with interviewer–0.08–0.030.08–0.03
  Affirmative feedback0.14***–0.030.05–0.05
  Acknowledging feedback0.040.040.090.04
  Task-related feedback–0.04–0.03–0.27–0.07
  Digression0.110.030.080.07
  Personal disclosure–0.02–0.03–0.050.08**
  “Other” feedback0.06–0.020.02–0.06
 Clarification behaviors
  Interrupts interviewer–0.040.01–0.000.01
  Clarification—repeat–0.020.020.08–0.00
  Clarification—definition–0.03–0.080.040.08
  Clarification—what–0.17*–0.21**–0.14–0.02
  Clarification—unit–0.13–0.04–0.180.02
Intercept 1–4.77***–1.810.41–5.24***
Intercept 2–1.500.55–2.99***
Intercept 30.45
Intercept 43.66**
Interviewer-level variance2.03*2.16**1.89*2.15**
Model fit:
AIC595.57790.28492.03937.98
Observations433433433433
                                        Cooperativeness   Interest   Friendliness   Talkativeness
Heuristic processing
  Respondent characteristics
    Age (cent.)                              0.00          –0.00        0.00           0.00
    Female (ref. Male)                      –0.07          –0.23        0.09           0.76***
    Nonwhite (ref. White)                    0.03           0.25       –0.15          –0.53
    High school or less                     –0.39          –0.63*      –0.58           0.67**
    Income                                  –0.04          –0.07       –0.04           0.01
  Interviewer characteristics
    Female (ref. Male)                       0.35          –1.08        0.19           0.59
    Nonwhite (ref. White)                   –0.47          –0.09        0.23          –0.74
    Interviewer experience 1+ year(s)       –1.18          –1.02        0.85          –0.25
    Cooperation rate                        –0.04           0.48       –0.65          –0.62
  Respondent control variables
    Married (ref. Unmarried)                 0.19          –0.10       –0.24          –0.37
    Computer user                            0.88*          0.68*       0.12           0.03
    # of questions asked (cent.)             0.02           0.04        0.05           0.04
Systematic processing
  Respondent answering behaviors
    Adequate answer with elaboration        –0.04           0.05        0.02           0.17***
    Adequate answer w/o elaboration         –0.03          –0.01        0.00           0.04
    Qualified answer with elaboration       –0.12           0.07        0.40**         0.10
    Qualified answer w/o elaboration        –0.03          –0.04        0.03           0.02
    Uncodable answer with elaboration       –0.05           0.01       –0.07           0.08*
    Uncodable answer w/o elaboration        –0.09*         –0.08*      –0.03          –0.03
    Don’t know                              –0.17          –0.33***    –0.38***       –0.17*
    Refusal                                 –0.37***       –0.26**     –0.30**        –0.07
    “Other” answer                           0.24           0.07        0.00          –0.03
  Nonverbal utterances
    Laughter                                 0.06           0.08**      0.14***       –0.03
    Disfluency                               0.04**         0.03**      0.02           0.02
  Personal involvement and rapport
    Agrees with interviewer                 –0.08          –0.03        0.08          –0.03
    Affirmative feedback                     0.14***       –0.03        0.05          –0.05
    Acknowledging feedback                   0.04           0.04        0.09           0.04
    Task-related feedback                   –0.04          –0.03       –0.27          –0.07
    Digression                               0.11           0.03        0.08           0.07
    Personal disclosure                     –0.02          –0.03       –0.05           0.08**
    “Other” feedback                         0.06          –0.02        0.02          –0.06
  Clarification behaviors
    Interrupts interviewer                  –0.04           0.01       –0.00           0.01
    Clarification—repeat                    –0.02           0.02        0.08          –0.00
    Clarification—definition                –0.03          –0.08        0.04           0.08
    Clarification—what                      –0.17*         –0.21**     –0.14          –0.02
    Clarification—unit                      –0.13          –0.04       –0.18           0.02
Intercept 1                                 –4.77***       –1.81        0.41          –5.24***
Intercept 2                                 –1.50           0.55                      –2.99***
Intercept 3                                                                            0.45
Intercept 4                                                                            3.66**
Interviewer-level variance                   2.03*          2.16**      1.89*          2.15**
Model fit: AIC                             595.57         790.28      492.03         937.98
Observations                                  433            433         433            433

Note.—Model 3. See online appendix B for full models. The intercepts are the cutpoints (thresholds) of the underlying latent variable y*; the observed category of the outcome y changes when y* crosses a threshold (Long and Freese 2006, p. 185). For cooperativeness: fair and below if y* ≤ intercept 1, good if intercept 1 < y* ≤ intercept 2, and very good otherwise. For interest: average and below if y* ≤ intercept 1, above average if intercept 1 < y* ≤ intercept 2, and very high otherwise. “Friendly and eager” takes the value of 1 in the friendliness model. For talkativeness, the five categories from very untalkative through very talkative are defined analogously by intercepts 1 through 4.

*p < 0.05, **p < 0.01, ***p < 0.001.
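
To make the threshold mechanics in the note concrete, the following minimal sketch computes the category probabilities implied by an ordered logit model, using the cooperativeness cutpoints from the table; the baseline respondent with a linear predictor of zero is purely illustrative.

    import math

    def logistic(z):
        """Standard logistic CDF."""
        return 1.0 / (1.0 + math.exp(-z))

    # Cooperativeness cutpoints from Table 5; categories: fair and below, good, very good.
    kappa = [-4.77, -1.50]
    xb = 0.0  # illustrative linear predictor for a hypothetical baseline respondent

    # Ordered logit: P(y <= k) = logistic(kappa_k - xb); successive differences
    # of the cumulative probabilities give the category probabilities.
    cum = [logistic(k - xb) for k in kappa] + [1.0]
    probs = [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]
    for label, p in zip(["fair and below", "good", "very good"], probs):
        print(f"P({label}) = {p:.3f}")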

With respect to respondent characteristics, even after controlling for actual respondent behaviors, interviewers evaluate women as significantly more talkative than men in all models (OR = 1.96, p < 0.001; AME = 3.90), as expected: women are 3.9 percentage points more likely than men to be rated as “very talkative.” Additionally, interviewers rate respondents with a high school degree or less as significantly less interested (OR = 0.53, p < 0.05; AME = –7.56) and significantly more talkative (OR = 2.14, p < 0.001; AME = 3.46) than their more educated counterparts; that is, they are 7.56 percentage points less likely to be rated as “very interested” and 3.46 percentage points more likely to be rated as “very talkative.” Counter to the heuristic-processing hypotheses, none of the respondent characteristics are significantly related to interviewer assessments of respondents’ cooperativeness or friendliness, and none of the interviewer characteristics are statistically significantly associated with any of the evaluations.
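
The odds ratios reported throughout are the exponentiated coefficients from the table, and the AMEs restate those effects in percentage points of predicted probability. A minimal sketch of the conversion, using the education coefficient from the interest model:

    import math

    # Table 5 reports coefficients on the log-odds scale; the text reports
    # odds ratios, which are the exponentiated coefficients: OR = exp(b).
    b_hs_interest = -0.63  # "High school or less," interest model (Table 5)
    print(f"OR = {math.exp(b_hs_interest):.2f}")  # 0.53, as reported above

    # The corresponding AME of -7.56 means that, averaged over the observed
    # covariate values, a high school degree or less lowers the predicted
    # probability of the top interest category by 7.56 percentage points.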

The interviewer’s gender and race do not moderate the effects of the respondent’s gender and race on any of the four evaluations (results not shown): none of the interaction effects are statistically significant, and their inclusion does not change our substantive conclusions.

Regarding the control variables, interviewers evaluate computer users as significantly more cooperative (OR = 2.42, p < 0.05; AME = 1.2) and more interested (OR = 1.98, p < 0.05; AME = 8.2). Although computer use was included as a measure of questionnaire burden, these results suggest that it also proxies for higher socioeconomic status. The initially significant positive effect of the number of questions asked on talkativeness (OR = 1.09, p < 0.01) is fully attenuated once actual respondent behaviors are included. No other control variable is statistically significantly related to any of the evaluations in this study.

SYSTEMATIC PROCESSING

Table 5 presents the results for the systematic-processing hypotheses based on the full models for each interviewer evaluation. There is clear evidence of systematic, behavior-based processing for each of the interviewer evaluations in this study. As expected, behaviors related to the process of responding and nonverbal mannerisms are significant predictors of respondent cooperativeness, interest, friendliness, and talkativeness. Particularly important, and supportive of systematic processing, is that different responding behaviors predict each of the four assessments. Indicators of other conversational processes, including rapport building and clarification requests, are less consistently associated with interviewer assessments.

We hypothesized that adequate answering behaviors would be associated with higher ratings of cooperativeness, interest, and friendliness, whereas any form of inadequate answer would be associated with lower ratings of cooperativeness. Surprisingly, adequate and qualified answers—the most frequent types of respondent answering behaviors—are each associated with only one of the four interviewer evaluations. Elaborations were expected to be negatively associated with ratings of cooperativeness, but positively associated with friendliness and talkativeness. As hypothesized, adequate answers with elaboration (OR = 1.18, p < 0.001; AME = 0.9) are positively associated with respondents being perceived as more talkative, but not with friendliness (OR = 1.02, p = 0.751). As the number of adequate answers with elaboration increases slightly from the mean, the probability of being evaluated as very talkative increases by 0.9 percentage points. However, providing adequate answers without elaboration is not associated with evaluations of talkativeness at conventional significance levels (OR = 1.04, p = 0.103), nor with any other evaluation. Respondents who provide higher numbers of qualified answers with elaborations are evaluated as friendlier (OR = 1.49, p < 0.01; AME = 5.9). Qualified answers with elaborations are not associated with any other evaluation.

As anticipated, interviewers rate respondents more unfavorably the more often respondents leave the response task incomplete. Interviewers rate respondents who provide more uncodable answers with elaboration (OR = 1.08, p < 0.05; AME = 0.4) as more talkative; don’t know responses (OR = 0.84, p < 0.05; AME = –0.9) result in ratings of respondents as less talkative. Interviewers rate respondents as less cooperative when they provide more uncodable answers without elaboration (OR = 0.91, p < 0.05; AME = –1.3) or refuse to answer (OR = 0.69, p < 0.001; AME = –5.1). Similarly, respondents who provide more uncodable responses without elaboration (OR = 0.92, p < 0.05; AME = –1.0), who provide more “don’t know” responses (OR = 0.72, p < 0.001; AME = –4.0), and who refuse to respond to a question (OR = 0.77, p < 0.01; AME = –3.1) are evaluated as less interested. Respondents who provide more don’t know responses (OR = 0.68, p < 0.001; AME = –5.6) or refuse to provide a response (OR = 0.74, p < 0.01; AME = –4.4) are also rated as less friendly.

We anticipated that respondent laughter would be positively associated with evaluations of respondent cooperativeness, interest, and friendliness, whereas the expected effect of disfluencies was less straightforward. Interviewers generally evaluate respondents more favorably when respondents display more of these normal conversational behaviors. Respondents who laugh more are evaluated as more interested (OR = 1.08, p < 0.01; AME = 1.0) and more friendly (OR = 1.15, p < 0.001; AME = 2.0), but not as more cooperative (OR = 1.06, p = 0.070) or more talkative (OR = 0.98, p = 0.307). Respondents who speak with more disfluencies are rated as more cooperative (OR = 1.04, p < 0.01; AME = 0.6) and more interested (OR = 1.03, p < 0.01; AME = 0.4), confirming the “disfluency advantage.” Disfluencies are not associated with evaluations of friendliness (OR = 1.02, p = 0.246) or talkativeness (OR = 1.02, p = 0.149).

We expected verbal rapport and personal involvement behaviors to be associated with higher ratings of friendliness and talkativeness. Few behaviors related to personal involvement predict such assessments. Respondents who provide more affirmative feedback are perceived as more cooperative (OR = 1.15, p < 0.001; AME = 1.9), and respondents who disclose personal information more frequently are perceived as more talkative (OR = 1.08, p < 0.01; AME = 0.4). Contrary to expectations, requests for clarification are not associated with higher ratings of cooperativeness and interest: respondents who use more “What?” clarification requests (e.g., “What did you say?”) are perceived as less cooperative (OR = 0.84, p < 0.05; AME = –2.4) and less interested (OR = 0.81, p < 0.01; AME = –2.6). None of the other personal involvement or clarification behaviors are associated with any of the evaluations.

The AIC goodness-of-fit statistics show that while including indicators of heuristic processing improves model fit slightly, the drop in AIC, and hence the improvement in fit, is largest when information on respondent behaviors is incorporated, particularly behaviors related to the quality of the response and nonverbal mannerisms (online appendix B). Interestingly, the interviewer-level variance in these evaluations increases in all models once respondent behaviors are accounted for, indicating heterogeneity in respondent behaviors across interviewers (Raudenbush and Bryk 2002).
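
To give the interviewer-level variances in the table a more interpretable scale, one can convert them to intraclass correlations. A minimal sketch, assuming the standard latent-response formula for multilevel logit models (level-1 variance fixed at π²/3):

    import math

    # ICC on the latent-response scale for a multilevel logit model:
    # ICC = sigma_u^2 / (sigma_u^2 + pi^2 / 3), where pi^2 / 3 is the
    # variance of the standard logistic distribution at level 1.
    interviewer_variance = {"cooperativeness": 2.03, "interest": 2.16,
                            "friendliness": 1.89, "talkativeness": 2.15}
    logistic_variance = math.pi ** 2 / 3  # ~3.29

    for outcome, var_u in interviewer_variance.items():
        icc = var_u / (var_u + logistic_variance)
        print(f"{outcome}: ICC = {icc:.2f}")  # roughly 0.36-0.40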

Figure 1 shows the predicted probability of being in the two extreme categories of each interviewer evaluation. We set each significant independent variable at one standard deviation below the mean, at the mean, and at one standard deviation above the mean, holding all other variables at their observed values (online appendix C). For example, the predicted probability of being rated as very cooperative is 0.69 for someone who provides relatively few uncodable answers (one standard deviation below the mean), compared to 0.58 for someone who provides relatively many (one standard deviation above the mean); the corresponding probabilities of being rated fair and below are 0.05 and 0.08.

Figure 1. Adjusted Predictions for Significant Systematic Processing Indicators.
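
A sketch of the prediction exercise described above. The model object and design matrix are hypothetical stand-ins for a fitted ordinal model; only the counterfactual-averaging logic is illustrated.

    import numpy as np

    def average_adjusted_prediction(model, X, var_idx, value):
        """Fix one predictor at `value` for every case, keep all other
        covariates at their observed values, and average the per-category
        predicted probabilities over the sample."""
        X_cf = np.array(X, copy=True)
        X_cf[:, var_idx] = value
        return model.predict(X_cf).mean(axis=0)

    # Hypothetical usage, assuming `model` is a fitted ordinal model whose
    # .predict() returns an (n_obs, n_categories) array of probabilities and
    # `X` is the corresponding design matrix:
    #   m, s = X[:, k].mean(), X[:, k].std()
    #   for v in (m - s, m, m + s):
    #       print(average_adjusted_prediction(model, X, k, v))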

Our results suggest that interviewers perceive and use respondent behaviors to make their assessments rather than drawing on respondent attributes based on social categories. These results also suggest that interviewers’ assessments are predominantly influenced by respondents’ question-answering behaviors and nonverbal behaviors. Thus, interviewers differentiate across respondent behaviors in their assessments, incorporating those pieces of information that are most relevant to the judgment they are asked to make.

Conclusion and Discussion

Using the continuum model of impression formation, we investigated whether interviewers base their assessments of respondent engagement on stereotypes, on their own characteristics, or on interactions with the respondent. Overall, interviewer assessments vary systematically across interviewers. Despite this variation, for the task-related evaluation of cooperativeness there is no evidence of heuristic evaluation or inappropriate stereotyping beyond the respondents’ actual behavior. We find a similar lack of association for the more interpersonal assessment of friendliness. Education and gender are associated with the assessments of interest (education only) and talkativeness. None of the interviewer characteristics explains the statistically significant interviewer variation. One possible explanation is that interviewers do rely on their own traits and experiences, but we do not adequately measure the interviewer characteristics that produce these differences; for example, gender may matter less than an interviewer’s perceptual ability.

Instead of heuristic processing, interviewers rely on a more sophisticated information-processing strategy based on the quality of the data provided by the respondent and on other behaviors throughout the interview. Under systematic processing, interviewers rely primarily on behaviors associated with the immediate response task and on nonverbal communication in the interpersonal interaction. That is, although interviewers vary significantly in their assessments of respondents, the assessments are based on the actual interaction with the respondent, even when the relevant behaviors occur infrequently (e.g., don’t know responses). This is important because respondent behaviors such as uncodable, don’t know, and refusal answers are associated with lower data quality and, in particular, with lower accuracy (e.g., Mathiowetz 1998). Indicators of rapport or personal involvement and requests for clarification indicating cognitive difficulty are much less consistently associated with these four ratings of respondents.

Further analyses not presented here (see online appendix D) confirm that the assessments made by interviewers are valid. The variability across respondents is greater than the variability across interviewers, and the proportion of variance uniquely explained by indicators of systematic processing is substantially larger than the proportion uniquely attributable to heuristic processing. Further research is needed to account for the remaining unexplained variance at both the interviewer and respondent levels: the behaviors themselves explain about one-quarter or less of the total variance in the assessments and less than one-third of the within-interviewer variance. Overall, our findings provide insight into the cognitive processes interviewers use when assessing respondents’ engagement.

The implications of these findings are many. First, these results suggest that the extra effort and money survey organizations spend to collect evaluations yield assessments that reflect the interviewer–respondent interaction. A simple assessment of how the interviewer thinks the interview went provides insight at far lower cost than a more elaborate behavior coding study. Of course, the evaluations do not indicate what exactly went wrong during an interaction, and thus are not a full replacement for behavior coding. Additionally, this study does not assess how useful these indicators are for assessing measurement error directly; this will be examined in future research.

Second, the implications for the use of these postsurvey evaluations in measurement error models are mixed. It is clear from these data that refusals or reports of don’t know contribute to how interviewers answer these evaluation questions. Thus, studies that use these evaluations to predict item-nonresponse rates (e.g., Kaminska, McCutcheon, and Billiet 2010) use endogenous measures. That is, a significant association between these evaluations and item nonresponse (e.g., Tarnai and Paxson 2005) will not be surprising because the don’t know responses and refusals themselves were used by the interviewer to make these evaluations. To the extent that the ratings identify potential item nonrespondents and are associated with the survey variables of interest, these ratings are useful as covariates in imputation models (as suggested by Mathiowetz [1998]). That is, the endogeneity of these measures is a problem for causal models, but could be beneficial for imputation models.

Third, these interactional properties of an interview could be important to respondents and their willingness to continue to participate in longitudinal studies (e.g., Lepkowski and Couper 2002). The association of the interviewer ratings with interview behaviors suggests that future research should more thoroughly investigate the potential of these ratings in response propensity models (in longitudinal studies) and their potential utility for responsive designs (Groves and Heeringa 2006).

Fourth, survey organizations that want to reduce variation across interviewers due to factors other than these behaviors could train interviewers in how to complete these assessments. Such training would likely strengthen the association between behaviors and postsurvey evaluations and reduce inter-interviewer variance. Alternatively, if training is difficult or interviewer-related variance cannot be reduced, survey organizations that have collected interviewer assessments over multiple studies are advised to calculate interviewer-adjusted ratings. This kind of calibration allows research organizations to adjust the evaluations for the interviewer’s own perspective, separate from the behaviors themselves.3
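
One simple form such an interviewer adjustment could take is centering each rating on the interviewer’s own mean, so that a respondent’s score reflects how that respondent compares to others interviewed by the same interviewer. A minimal sketch with hypothetical column names; a model-based version would instead subtract estimated interviewer random effects.

    import pandas as pd

    # Toy data; "interviewer_id" and "talkativeness" are hypothetical columns.
    df = pd.DataFrame({
        "interviewer_id": [1, 1, 1, 2, 2],
        "talkativeness":  [5, 4, 3, 2, 1],
    })

    # Center each rating on the interviewer's own mean rating.
    df["talkativeness_adj"] = (
        df["talkativeness"]
        - df.groupby("interviewer_id")["talkativeness"].transform("mean")
    )
    print(df)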

Finally, although the interview behaviors are associated with the assessments of interest, friendliness, cooperativeness, and talkativeness, the variation explained by these behaviors, that is, the signal-to-noise ratio, is moderate (see online appendix D). For any given respondent, the measurement is imprecise: the confidence interval around a predicted value would be wide. Thus, using these observations to flag an individual respondent for potential removal from a dataset is unwise. Yet these ratings may successfully identify groups of respondents who should be investigated for potentially providing lower-quality data (e.g., evidence of straightlining, satisficing, or other kinds of inconsistent answers). Additionally, because the measures are valid but somewhat unreliable, survey organizations could compare across studies, across time, or across groups of respondents as long as there is approximate interpenetration (random assignment) of cases to interviewers, a design feature common in telephone surveys. If interpenetration is not achieved, an interviewer-adjusted score may be warranted before such comparisons are made.4

This study has limitations. First, the telephone setting potentially suppresses some stereotyping effects relative to a face-to-face survey, in which interviewers see the physical characteristics of respondents. Second, the sample is based on a landline RDD survey, yielding a more homogeneous set of respondents. Third, we examined a single study, although we expect our results to generalize to other telephone surveys with different topics or lengths; respondent behaviors may differ in another survey, but we anticipate that interviewers will still incorporate information about those behaviors into their evaluations. Future research should examine interviewer evaluations on questionnaires with different types of items (e.g., more sensitive or complex items). Fourth, interviewer perceptions and behaviors likely elicit corresponding respondent behavior, but interviewer behaviors were not included here. Fifth, our sample size and number of interviewers are limited; future research should replicate this study with a larger sample. Finally, our study investigates interviewer evaluations collected by a single survey organization and should be replicated across different organizations to strengthen our inferences.

Overall, our results show that in postsurvey evaluations, interviewers evaluate respondents based on their behaviors, distinguishing subtleties in those behaviors, rather than on their social categories. Telephone survey organizations and researchers can be confident that these evaluations provide a valid summary of the interaction between these two key actors.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

Antje Kirchner is a research survey methodologist in the Survey Research Division at RTI International, Research Triangle Park, NC, USA, and an adjunct research assistant professor in the Department of Sociology at the University of Nebraska–Lincoln, Lincoln, NE, USA. Kristen Olson is an associate professor in the Department of Sociology at the University of Nebraska–Lincoln, Lincoln, NE, USA. Jolene D. Smyth is an associate professor in the Department of Sociology, and director of the Bureau of Sociological Research at the University of Nebraska–Lincoln, Lincoln, NE, USA. This material is based upon work supported by the National Science Foundation [SES-1132015 to K.O.]. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. An earlier version of this paper was presented at the Annual Conference of the Midwest Association for Public Opinion Research, November 2015, Chicago, IL, USA. The authors thank the anonymous reviewers and the editors for feedback.

References

American National Election Studies (ANES). 2013. ANES 2012 Pre-Election Questionnaire. Available at http://www.electionstudies.org/studypages/anes_timeseries_2012/anes_timeseries_2012_qnaire_pre.pdf.

Barrett, Kirsten, Matt Sloan, and Debra Wright. 2006. “Interviewer Perceptions of Interview Quality.” In Proceedings of the ASA, Survey Research Methods Section, pp. 4026–33.

Belli, Robert F., Paul S. Weiss, and James M. Lepkowski. 1999. “Dynamics of Survey Interviewing and the Quality of Survey Reports: Age Comparisons.” In Cognition, Aging and Self-Reports, edited by Norbert Schwarz, Denise Park, Bärbel Knäuper, and Seymour Sudman, pp. 285–302. Philadelphia: Psychology Press.

Bilgen, Ipek, and Robert F. Belli. 2010. “Comparison of Verbal Behaviors between Calendar and Standardized Conventional Questionnaires.” Journal of Official Statistics 26:481–505.

Blumberg, Stephen J., and Julian V. Luke. 2013. “Wireless Substitution: Early Release of Estimates from the National Health Interview Survey, July–December 2013.” National Center for Health Statistics. Available at http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless201306.pdf.

Brennan, Susan E., and Michael F. Schober. 2001. “How Listeners Compensate for Disfluencies in Spontaneous Speech.” Journal of Memory and Language 44:274–96. Available at http://www.mfschober.net/BrennanSchober01.pdf.

Cannell, Charles F., Peter V. Miller, and Lois Oksenberg. 1981. “Research on Interviewing Techniques.” In Sociological Methodology, edited by Samuel Leinhardt, pp. 389–437. San Francisco: Jossey-Bass.

Chaiken, Shelly. 1980. “Heuristic Versus Systematic Information Processing and the Use of Source Versus Message Cues in Persuasion.” Journal of Personality and Social Psychology 39:752–66.

Chaiken, Shelly, and Yaacov Trope. 1999. Dual-Process Theories in Social Psychology. New York: Guilford Press.

Chen, Serena, Kimberly Duckworth, and Shelly Chaiken. 1999. “Motivated Heuristic and Systematic Processing.” Psychological Inquiry 10:44–49.

Conrad, Frederick G., Jessica S. Broome, José R. Benkí, Frauke Kreuter, Robert M. Groves, David Vannette, and Colleen McClain. 2013. “Interviewer Speech and the Success of Survey Invitations.” Journal of the Royal Statistical Society A 176:191–210.

Conrad, Frederick G., Michael F. Schober, and Wil Dijkstra. 2008. “Cues of Communication Difficulty in Telephone Interviews.” In Advances in Telephone Survey Methodology, edited by James M. Lepkowski, Clyde Tucker, J. Michael Brick, Edith de Leeuw, Lilli Japec, Paul J. Lavrakas, Michael W. Link, and Roberta L. Sangster, pp. 212–30. Hoboken, NJ: John Wiley & Sons.

Dykema, Jennifer, James M. Lepkowski, and Steven Blixt. 1997. “The Effect of Interviewer and Respondent Behavior on Data Quality: Analysis of Interaction Coding in a Validation Study.” In Survey Measurement and Process Quality, edited by Lars E. Lyberg, Paul P. Biemer, Martin Collins, Edith de Leeuw, Cathryn Dippo, Norbert Schwarz, and Dennis Trewin, pp. 287–310. Hoboken, NJ: John Wiley & Sons.

Earls, Felton J., Jeanne Brooks-Gunn, Stephen W. Raudenbush, and Robert J. Sampson. 2000. “Project on Human Development in Chicago Neighborhoods (PHDCN): Child and Adolescent Behavior Rating Scale, Wave 3, 2000–2002 (ICPSR 13678).” Available at http://www.icpsr.umich.edu/cgi-bin/file?comp=none&study=13678&ds=0&file_id=895548&path=ICPSR.

European Social Survey (ESS). 2014. “ESS Round 7 Source Questionnaire.” London: ESS ERIC Headquarters, Centre for Comparative Social Surveys, City University London. Available at http://www.europeansocialsurvey.org/docs/round7/fieldwork/source/ESS7_source_main_questionnaire.pdf.

Fiske, Susan T. 2000. “Stereotyping, Prejudice, and Discrimination at the Seam between the Centuries: Evolution, Culture, Mind, and Brain.” European Journal of Social Psychology 30:299–322.

Fiske, Susan T., Amy J. C. Cuddy, Peter Glick, and Jun Xu. 2002. “A Model of (Often Mixed) Stereotype Content: Competence and Warmth Respectively Follow from Perceived Status and Competition.” Journal of Personality and Social Psychology 82:878–902.

Fiske, Susan T., Monica Lin, and Steven L. Neuberg. 1999. “The Continuum Model: Ten Years Later.” In Dual-Process Theories in Social Psychology, edited by Shelly Chaiken and Yaacov Trope, pp. 231–54. New York/London: Guilford Press.

Freedman, Vicki, Frank P. Stafford, Frederick G. Conrad, Norbert Schwarz, and Jennifer Cornman. 2012. “Assessing Time Diary Quality for Older Couples: An Analysis of the Panel Study of Income Dynamics’ Disability and Use of Time (DUST) Supplement.” Annals of Economics and Statistics 105/106:271–89.

Garbarski, Dana, Nora Cate Schaeffer, and Jennifer Dykema. 2016. “Interviewing Practices, Conversational Practices, and Rapport: Responsiveness and Engagement in the Standardized Survey Interview.” Sociological Methodology 46:1–38.

Greenwald, Anthony G., and Mahzarin R. Banaji. 1995. “Implicit Social Cognition: Attitudes, Self-Esteem, and Stereotypes.” Psychological Review 102:4–27.

Groves, Robert M., and Steven G. Heeringa. 2006. “Responsive Design for Household Surveys: Tools for Actively Controlling Survey Errors and Costs.” Journal of the Royal Statistical Society A 169:439–57.

Holbrook, Allyson L., Sowmya Anand, Timothy P. Johnson, Young Ik Cho, Sharon Shavitt, Noel Chávez, and Saul Weiner. 2014. “Response Heaping in Interviewer-Administered Surveys: Is It Really a Form of Satisficing?” Public Opinion Quarterly 78:591–633.

Hurtado, Aída. 1994. “Does Similarity Breed Respect: Interviewer Evaluations of Mexican-Descent Respondents in a Bilingual Survey.” Public Opinion Quarterly 58:77–95.

Jans, Matthew E. 2010. “Verbal Paradata and Survey Error: Respondent Speech, Voice, and Question-Answering Behavior Can Predict Income Item Nonresponse.” PhD diss., University of Michigan.

Japec, Lilli. 2008. “Interviewer Error and Interviewer Burden.” In Advances in Telephone Survey Methodology, edited by James M. Lepkowski, Clyde Tucker, J. Michael Brick, Edith de Leeuw, Lilli Japec, Paul J. Lavrakas, Michael W. Link, and Roberta L. Sangster, pp. 185–211. Hoboken, NJ: John Wiley & Sons.

Johnson, Timothy P., Salma Shariff-Marco, Gordon Willis, Young Ik Cho, Nancy Breen, Gilbert C. Gee, Nancy Krieger, David Grant, Margarita Alegria, Vickie M. Mays, David R. Williams, Hope Landrine, Benmei Liu, Bryce B. Reeve, David Takeuchi, and Ninez A. Ponce. 2015. “Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination.” International Journal of Public Opinion Research 27:244–63.

Kaminska, Olena, Allan L. McCutcheon, and Jaak Billiet. 2010. “Satisficing among Reluctant Respondents in a Cross-National Context.” Public Opinion Quarterly 74:956–84.

Krauss, Robert M., Robin Freyberg, and Ezequiel Morsella. 2002. “Inferring Speakers’ Physical Attributes from Their Voices.” Journal of Experimental Social Psychology 38:618–25.

Lepkowski, James M., and Mick P. Couper. 2002. “Nonresponse in the Second Wave of Longitudinal Household Surveys.” In Survey Nonresponse, edited by Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little, pp. 259–72. Hoboken, NJ: John Wiley & Sons.

Long, J. Scott, and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata, 2nd ed. College Station, TX: Stata Press.

Mathiowetz, Nancy A. 1998. “Respondent Expressions of Uncertainty: Data Source for Imputation.” Public Opinion Quarterly 62:47–56.

Medway, Rebecca, and Roger Tourangeau. 2015. “Response Quality in Telephone Surveys: Do Prepaid Cash Incentives Make a Difference?” Public Opinion Quarterly 79:524–43.

Mood, Carina. 2010. “Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It.” European Sociological Review 26:67–82.

National Longitudinal Survey of Youth (NLSY). 1997. “Interviewer Remarks, Characteristics & Contacts.” Available at https://www.nlsinfo.org/content/cohorts/nlsy97/using-and-understanding-the-data/interviewer-remarks-characteristics-contacts.

Oksenberg, Lois, Lerita Coleman, and Charles F. Cannell. 1986. “Interviewers’ Voices and Refusal Rates in Telephone Surveys.” Public Opinion Quarterly 50:97–111.

Olson, Kristen. 2013. “Paradata for Nonresponse Adjustment.” Annals of the American Academy of Political and Social Science 645:142–70.

Olson, Kristen, and Brian Parkhurst. 2013. “Collecting Paradata for Measurement Error Evaluations.” In Improving Surveys with Paradata: Analytic Uses of Process Information, edited by Frauke Kreuter, pp. 43–72. Hoboken, NJ: John Wiley & Sons.

Olson, Kristen, and Andy Peytchev. 2007. “Effect of Interviewer Experience on Interview Pace and Interviewer Attitudes.” Public Opinion Quarterly 71:273–86.

Raudenbush, Stephen W., and Anthony S. Bryk. 2002. Hierarchical Linear Models, 2nd ed. Thousand Oaks, CA: Sage Publications.

Schaeffer, Nora Cate, and Jennifer Dykema. 2011. “Response 1 to Fowler’s Chapter: Coding the Behavior of Interviewers and Respondents to Evaluate Survey Questions.” In Question Evaluation Methods: Contributing to the Science of Data Quality, edited by Jennifer Madans, Kristen Miller, Aaron Maitland, and Gordon Willis, pp. 23–39. Hoboken, NJ: John Wiley & Sons.

Schober, Michael F., and Jonathan E. Bloom. 2004. “Discourse Cues That Respondents Have Misunderstood Survey Questions.” Discourse Processes 38:287–308.

Sinibaldi, Jennifer, Gabriele B. Durrant, and Frauke Kreuter. 2013. “Evaluating the Measurement Error of Interviewer Observed Paradata.” Public Opinion Quarterly 77:173–93.

Smith, Tom W. 2009. “An Analysis of Computer Assisted Recorded Interviews (CARI) on the 2008 General Social Survey.” GSS Methodology Report No. 117.

Stata. 2015. Stata Multilevel Mixed-Effects Reference Manual, 14th ed. College Station, TX: Stata Press.

Tarnai, John, and M. Chris Paxson. 2005. “Interviewer Judgments about the Quality of Telephone Interviews.” In American Statistical Association, Proceedings of the Survey Research Methods Section, pp. 3988–94.

Thomas, Erik T., and Jeffrey Reaser. 2004. “Delimiting Perceptual Cues Used for the Ethnic Labeling of African American and European American Voices.” Journal of Sociolinguistics 8:54–87.

Tiedens, Larissa Z., Phoebe C. Ellsworth, and Batja Mesquita. 2000. “Sentimental Stereotypes: Emotional Expectations for High- and Low-Status Group Members.” Personality and Social Psychology Bulletin 26:560–75.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment under Uncertainty: Heuristics and Biases.” Science 185:1124–31.

West, Brady T. 2013. “An Examination of the Quality and Utility of Interviewer Observations in the National Survey of Family Growth.” Journal of the Royal Statistical Society A 176:211–25.

West, Brady T., and Frauke Kreuter. 2013. “Factors Affecting the Accuracy of Interviewer Observations: Evidence from the National Survey of Family Growth.” Public Opinion Quarterly 77:522–48.

West, Brady T., Frauke Kreuter, and Mark Trappmann. 2014. “Is the Collection of Interviewer Observations Worthwhile in an Economic Panel Survey? New Evidence from the German Labor Market and Social Security (PASS) Study.” Journal of Survey Statistics and Methodology 2:159–81.

Yan, Ting, and Roger Tourangeau. 2008. “Fast Times and Easy Questions: The Effects of Age, Experience and Question Complexity on Web Survey Response Times.” Applied Cognitive Psychology 22:51–68.

Footnotes

1. We assessed the proportional odds assumption by analyzing mixed-effects multinomial logistic models. We constrained the slopes to be equivalent across the categories of the outcome variables (the proportional odds assumption) and freed the slope parameters in a traditional multinomial model to calculate the appropriate test statistic.
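
A single-level sketch of this style of comparison on simulated data (the authors' actual test used mixed-effects models, which this simplified, approximate version omits): an ordered logit with slopes constrained equal across cutpoints is compared against an unconstrained multinomial logit via a likelihood-ratio statistic.

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Simulated data with three ordered outcome categories (codes 0, 1, 2).
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = np.digitize(0.8 * x + rng.logistic(size=500), [-1.0, 1.0])

    # Constrained model: ordered logit, one slope shared across cutpoints
    # (the proportional odds assumption).
    ordered = OrderedModel(y, x[:, None], distr="logit").fit(method="bfgs", disp=False)

    # Unconstrained model: multinomial logit, slopes free across categories.
    multinom = sm.MNLogit(y, sm.add_constant(x)).fit(disp=False)

    # Approximate likelihood-ratio statistic for the freed slope parameters.
    lr = 2 * (multinom.llf - ordered.llf)
    df = multinom.params.size - ordered.params.size
    print(f"LR = {lr:.2f}, df = {df}, p = {stats.chi2.sf(lr, df):.3f}")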

2. These variance components were estimated based upon a two-level logistic model (melogit). Likelihood-ratio tests show sufficient variability between interviewers to justify mixed-effects models (available from the authors upon request).

3. We thank the editors for this insight.

4. We also thank the editors for this insight.
