Do Interviewer Postsurvey Evaluations of Respondents’ Engagement Measure Who Respondents Are or What They Do? A Behavior Coding Study

Table 1.

Descriptive Statistics of Interviewer Evaluations

Measure	Percentage
Cooperativeness
Fair and below	7.62
Good	29.10
Very good	63.28
Interest
Average and below	41.34
Above average	33.49
Very high	25.17
Friendliness
Cooperative but not particularly eager and below	40.18
Friendly and eager	59.82
Talkativeness
Very untalkative	3.00
Somewhat untalkative	12.01
Neither talkative nor untalkative	45.27
Somewhat talkative	30.95
Very talkative	8.78

Note.—n = 433.

INDEPENDENT VARIABLES: HEURISTIC PROCESSING

Table 2 provides an overview of respondent and interviewer characteristics indicative of heuristic processing. Respondents reported their age, gender, race, education, and income during the interview. Interviewer gender, race, and experience come from administrative records and are included in the model as fixed characteristics of the interviewer. Additionally, we include the interviewer’s cooperation rate, operationalized as the percentage of the interviewer’s call attempts with a contact that yielded a successful interview, divided into higher and lower cooperation rate groups based on a median split (6.9 percent cooperation rate).
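The cooperation-rate measure and its median split can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and the interviewer counts are hypothetical, and assigning rates equal to the median to the "low" group is an assumption the text does not specify.

```python
from statistics import median


def cooperation_rate(completed_interviews, contacted_attempts):
    """Percent of call attempts with a contact that yielded an interview."""
    return 100.0 * completed_interviews / contacted_attempts


def median_split(rates):
    """Label each rate 'high' if above the median, otherwise 'low'.

    Assumption: rates equal to the median fall in the 'low' group.
    """
    cut = median(rates)
    return ["high" if rate > cut else "low" for rate in rates]


# Hypothetical (completed, contacted) counts for five interviewers.
records = [(4, 100), (7, 100), (10, 100), (5, 100), (8, 100)]
rates = [cooperation_rate(done, contacts) for done, contacts in records]
groups = median_split(rates)
```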

Table 2.

Percentage Distribution of Respondent and Interviewer Characteristics

Respondent characteristics and question wording (if applicable) (n = 433)	(Recoded) response categories	Percent	Mean	SD
Age: What is your age?	35 and less	8.7
	36 to 50	16.2
	51 and above	70.0
Gender: I have to read every question in this survey, even if it seems obvious. What is your sex?	Male	36.0
	Female	64.0
Race: [IF NON-HISPANIC ASK:] What is your race? Are you white, black, Asian, or some other? [IF HISPANIC ASK:] Are you white Hispanic, black Hispanic, or some other race?	White	87.3
	Nonwhite	12.7
Education: What is the last grade or class that you completed in school? [INTERVIEWER CODE, DO NOT READ]	High school and less	28.9
	Vocational	29.3
	College and above	41.8
1 None, or grade 1–8
2 High school incomplete (grades 9–11)
3 High school graduate (grade 12 or GED certificate)
4 Business, technical, or vocational school AFTER high school
5 Some college, no 4-year degree
6 College graduate (BS, BA, or other 4-year degree)
7 Postgraduate training or professional schooling after college (e.g., toward a master’s degree or PhD; law or medical school)
Income: Last year, that is, in 2012, what was your total family income from all sources, before taxes? Just stop me when I get to the right category. [READ]	$49,999 and less	58.2
	$50,000 and above	41.8
1 Less than $10,000
2 $10,000 to under $20,000
3 $20,000 to under $30,000
4 $30,000 to under $40,000
5 $40,000 to under $50,000
6 $50,000 to under $75,000
7 $75,000 to under $100,000
8 $100,000 or more
Respondent controls
Married: Are you married, partnered, divorced, separated, widowed, or never been married?	No	52.2
	Yes	47.8
Computer user: The next few questions are about leisure activities using a computer. Do you happen to have a desktop, laptop or tablet computer?	No	22.3
	Yes	77.7
# of questions asked			46.7	4.50
Interviewer characteristics (n = 19)	(Recoded) response categories	Percent	Mean	SD
Gender	Male	52.6
Gender	Female	47.4
Race	White	47.4
Race	Nonwhite	52.6
Experience	0 years	26.3
Experience	1+ years	73.7
Cooperation rate	Low	57.9
Cooperation rate	High	42.1

INDEPENDENT VARIABLES: SYSTEMATIC PROCESSING

We derive indicators of systematic processing from behavior codes, typically used to understand the interviewer-respondent interaction in survey interviews (e.g., Schaeffer and Dykema 2011). Each interview was digitally audiorecorded and transcribed. Then, a team of trained coders behavior coded each survey transcript. The behavior codes were assigned at the conversational-turn level, with codes assigned for the actor (respondent or interviewer); the initial action (e.g., answer provided); an assessment of the initial action (e.g., whether the answer provided was adequate, qualified, or uncodable); a more specific assessment of this action (e.g., whether the answer was provided with or without elaborations); laughter (whether the respondent laughed or not); any disfluencies during any part of the turn; and interruptions. Table 3 provides examples of each of these codes.

Table 3.

Kappa Statistics for Behavior Codes and Examples of Respondent Behaviors, Work and Leisure Today Survey

Behavior code	Kappa	Example 1	Example 2
1) Actor	0.998	Respondent	Respondent
2) Initial action	0.88	Answers question	Asks for clarification or definition
3) Assessment of initial action	0.21 to 0.76	Provides adequate answer	Asks to repeat response options
4) Details of action	0.56 to 0.68	Without elaboration	n.a.
5) Laughter	0.96	The respondent laughs	No laughter
6) Disfluencies	0.87	There are no disfluencies, stutters, or repairs	There are disfluencies, stutters, or repairs
7) Interruptions	0.94	There are no interruptions	The respondent interrupts the interviewer

Expert coders independently double-coded a 10 percent subsample of the survey transcripts to assess intercoder reliability. The reliability of these codes was quite high (table 3): all but one of the kappa values were 0.56 or higher, above the minimum kappa requirement of 0.40 (Bilgen and Belli 2010). The exception was the assessment of type of clarification (kappa = 0.21); thus, we aggregate clarifications into four more general categories.
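To make the reliability statistic concrete, the sketch below computes a kappa for two coders who labeled the same conversational turns. It assumes Cohen's kappa for two coders, which the text does not state explicitly; the function name and the laughter codes are hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' labels on the same turns."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(codes_a)
    # Observed agreement: share of turns with identical codes.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the coders' marginal proportions.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical laughter codes from two coders on ten turns.
coder_1 = ["laugh", "none", "none", "laugh", "none",
           "none", "none", "laugh", "none", "none"]
coder_2 = ["laugh", "none", "none", "none", "none",
           "none", "none", "laugh", "none", "none"]
kappa = cohen_kappa(coder_1, coder_2)
meets_minimum = kappa >= 0.40  # threshold cited from Bilgen and Belli (2010)
```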

We differentiate between four types of respondent behaviors with the behavior codes: (1) respondent-answering behavior, such as providing an adequate response with or without elaborations (e.g., respondents stating “5” versus elaborating on their answer “5. I really like swimming”), or an uncodable answer that cannot be coded into the response format; (2) nonverbal utterances, such as laughter or disfluencies; (3) personal involvement and rapport, reflecting more general conversational processes; and (4) requests for clarification, such as asking for a definition of a term, which indicate some form of cognitive difficulty.

We calculate the number of conversational turns on which each respondent behavior occurred throughout the entire interview for each respondent. Table 4 provides a summary of each behavior, its definition, and descriptive statistics.
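The aggregation from turn-level codes to respondent-level counts might look like the sketch below. The record layout (respondent id, turn id, set of behaviors) and the behavior labels are hypothetical; the only feature taken from the text is that a behavior is counted once per conversational turn on which it occurs.

```python
from collections import Counter, defaultdict

# Hypothetical turn-level records: (respondent_id, turn_id, behaviors on turn).
turns = [
    (101, 1, {"adequate_answer", "disfluency"}),
    (101, 2, {"adequate_answer", "laughter"}),
    (101, 3, {"dont_know"}),
    (102, 1, {"adequate_answer"}),
    (102, 2, {"clarification_what", "disfluency"}),
]


def turns_per_behavior(records):
    """Count, per respondent, the turns on which each behavior occurred."""
    counts = defaultdict(Counter)
    for respondent_id, _turn_id, behaviors in records:
        # set() guards against a behavior listed twice on the same turn.
        counts[respondent_id].update(set(behaviors))
    return counts


counts = turns_per_behavior(turns)
```

Averaging these per-respondent counts across respondents would give means and standard deviations of the kind reported in table 4.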

Table 4.

Mean and Standard Deviation of Total Number of Turns with Respondent Behaviors

Behavior code	Definition	Mean	SD
Respondent answering behaviors
Adequate answer	Provides an answer that can be coded according to the response format	45.83	9.06
With elaboration		3.57	3.41
Without elaboration		42.18	9.51
Qualified answer	Answers with a qualifier that shows uncertainty	5.03	4.47
With elaboration		0.91	1.61
Without elaboration		4.12	3.61
Uncodable answer	Provides an answer that cannot be coded according to the response format	10.03	8.57
With elaboration		4.86	5.77
Without elaboration		5.17	4.11
Don’t know	States that they don’t know or don’t remember the answer	1.00	1.34
Refusal	Refuses to answer the question	0.63	1.35
“Other” answer	States that they have an answer to a previous question or disagree with an interviewer	0.15	0.55
Nonverbal utterances
Laughter	Respondent laughs	5.13	5.68
Disfluency	Whether there are any disfluencies, stutters, or repairs	19.53	12.59
Personal involvement and rapport
Agrees with interviewer	Agrees with interviewer, either as verification or as showing understanding	0.97	1.79
Affirmative feedback	Provides an affirmative statement	7.00	5.68
Acknowledging feedback	Thanks interviewer or gives indication that they are thinking	2.85	4.18
Task-related feedback	Task-, time-, and telephone quality-related feedback	0.24	0.73
Digression	Engages in off-topic conversation	1.27	2.86
Personal disclosure	Makes statement about self or own attitudes (outside of response)	4.02	6.68
“Other” feedback	States an apology or negation	1.22	1.38
Requests for clarification
Interrupts interviewer	Respondent interrupts the interviewer	12.45	12.56
Clarification—repeat	Asks for repetition of the question, the response options, or definition	1.57	2.04
Clarification—definition	Asks for a definition of a term	0.48	0.92
Clarification—what	Says “What?” or “What did you say?”	1.84	1.86
Clarification—unit	Asks for unit of measurement for the response	0.68	1.10

Note.—n = 433.

Adequate responses occur on an average of 45.83 conversational turns, with respondents providing adequate responses without elaboration on an average of 42.18 conversational turns and with elaboration on 3.57 turns. Providing an uncodable response was the second most frequent response behavior, occurring on 10.03 turns, roughly equally split between uncodable responses without elaborations (mean = 5.17) and with elaborations (mean = 4.86). Disfluencies occur on an average of 19.53 conversational turns, and interruptions occur on 12.45 turns.

CONTROLS

We include household composition (proxied with marital status) and general questionnaire burden, that is, whether a respondent triggered a series of follow-up questions related to computer use and the number of questions (see table 2).

All continuous independent variables are grand-mean-centered (Raudenbush and Bryk 2002).
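Grand-mean centering subtracts the overall sample mean from every observation, so that a value of zero corresponds to the average respondent. A minimal sketch, with a hypothetical function name and hypothetical counts of turns with laughter:

```python
from statistics import fmean


def grand_mean_center(values):
    """Subtract the overall (grand) mean from every observation."""
    grand_mean = fmean(values)
    return [value - grand_mean for value in values]


# Hypothetical number of turns with laughter for five respondents.
laughter_turns = [2, 5, 8, 1, 4]
centered = grand_mean_center(laughter_turns)
```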

METHODS

We evaluate the association of indicators of heuristic and systematic processing with each of the interviewer assessments using a two-level ordered logistic regression model with an interviewer random effect, estimated with the meologit command in Stata 14 (Stata 2015). Each interviewer $j = 1, \ldots, M$ has $i = 1, \ldots, n_{j}$ respondents. The outcome has $K$ possible categories separated by cutpoints $K_{1}, K_{2}, \ldots, K_{K - 1}$; $x_{i j}$ denotes the covariates for the fixed effects and $u_{j}$ the random effects. For response $y_{i j}$, the probability of observing outcome $k$ is $p_{i j} = \Pr (y_{i j} = k | K, u_{j}) = H (K_{k} - x_{i j} \beta - u_{j}) - H (K_{k - 1} - x_{i j} \beta - u_{j})$, where $H$ is the logistic cumulative distribution function (Stata 2015). Friendliness is a binary variable and is therefore analyzed with a traditional logistic regression model.
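The category probabilities in this model can be sketched numerically as differences of the logistic cumulative distribution function evaluated at adjacent cutpoints. The function names are hypothetical. The cutpoints are the four talkativeness intercepts from table 5, used purely for illustration, and the linear predictor and interviewer effect are set to zero as an assumption rather than taken from any respondent.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function H(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(linear_predictor, random_effect, cutpoints):
    """Pr(y = k) for k = 1..K, given K - 1 ordered cutpoints.

    Each probability is H(cut_k - xb - u) - H(cut_{k-1} - xb - u), with the
    cumulative probability fixed at 0 below the first cutpoint and at 1
    above the last.
    """
    eta = linear_predictor + random_effect
    cumulative = [0.0] + [logistic_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cumulative[k + 1] - cumulative[k]
            for k in range(len(cumulative) - 1)]


# Talkativeness cutpoints from table 5 (illustration only).
cutpoints = [-5.24, -2.99, 0.45, 3.66]
# Assumed values: covariates at reference/mean levels, interviewer effect 0.
probs = category_probabilities(0.0, 0.0, cutpoints)
```

A positive linear predictor or interviewer effect shifts probability mass toward the higher categories, because every cumulative term is evaluated at a smaller argument.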

Ordered-logistic models assume proportional odds across each pair of outcomes. Although this assumption is violated in some instances, our conclusions do not change, so we report the more parsimonious ordered-logistic models.¹ Several robustness checks, including outlier diagnostics for the independent variables, show that all results generally hold when re-estimating these models censoring cases with high numbers of behaviors at the 95th percentile (results available on request).

Three models were estimated for each of the interviewer assessments. Model 1 is a null model as a baseline; model 2 includes variables associated with heuristic processing and controls; and model 3 adds variables capturing systematic processing. Unless indicated otherwise, none of the results from previously estimated models change using this stepwise approach. As such, we display results for model 3 only (for full results, see online appendix B).

We report the odds ratio and the average marginal effects (AME) for each statistically significant respondent behavior in the text. For each respondent, the AME calculates the difference in the predicted probability between each category of the outcome variable, holding the independent variable at a given value. This difference is then averaged across all respondents. Mathematically, the marginal change in probability is computed as (Long and Freese 2006):

$$\frac{\partial \Pr (y = k | x)}{\partial x_{i j}} = \frac{\partial F (K_{k} - x \beta)}{\partial x_{i j}} - \frac{\partial F (K_{k - 1} - x \beta)}{\partial x_{i j}}$$

Holding all other variables constant, this is the slope of the curve relating $x_{i j}$ to $\Pr (y = k | x)$ for each outcome. For categorical variables, the interpretation of the AME is straightforward—the effect of being in the focal category of the independent variable compared to the reference category. For continuous variables, the AME is related to a very small change (approximately the standard deviation of the variable divided by 1,000) in the independent variable. An AME of 5.0 would indicate that a very small unit increase in the independent variable (e.g., number of conversational turns with adequate responses) yields a five-percentage-point increase in the probability of a specific outcome occurring (e.g., the interviewer rating the respondent as very talkative). AMEs yield a straightforward interpretation of effect sizes and can be compared across models (Mood 2010). Because ordinal logistic regression models have multiple outcome categories, we only report the AME in percentage points for the highest category of each interviewer rating.
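The averaging step behind an AME for a continuous variable can be sketched with a small finite difference. This is an illustration under stated assumptions, not the authors' Stata computation: the function names and the respondents' linear predictors are hypothetical, the interviewer random effect is ignored, and the coefficient (0.17) and top cutpoint (3.66) are the talkativeness values from table 5, borrowed only as plausible inputs.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_highest(eta, top_cutpoint):
    """Probability of the highest outcome category: 1 - F(cut - eta)."""
    return 1.0 - logistic_cdf(top_cutpoint - eta)


def average_marginal_effect(etas, beta, top_cutpoint, step=1e-6):
    """Average over respondents of the change in Pr(highest category)
    per unit of x, approximated with a very small increase in x."""
    effects = []
    for eta in etas:
        shifted = eta + beta * step  # x rises by `step`, so eta by beta*step
        change = prob_highest(shifted, top_cutpoint) - prob_highest(eta, top_cutpoint)
        effects.append(change / step)
    return sum(effects) / len(effects)


# Hypothetical linear predictors for four respondents.
etas = [-1.0, 0.0, 0.5, 2.0]
ame = average_marginal_effect(etas, beta=0.17, top_cutpoint=3.66)
ame_percentage_points = 100.0 * ame
```

Multiplying by 100 expresses the effect in percentage points, matching the way AMEs are reported in the text.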

Results: Which Strategies Do Interviewers Use?

HEURISTIC PROCESSING

In an empty two-level model, interviewer-related variance components account for between 27 and 36 percent of the total variance in evaluations (p < 0.05).²

Table 5 presents the coefficients for the heuristic-processing models and controls based on the full models for each of the four interviewer evaluations. Evidence of heuristic processing for any of the outcomes in this study is limited. Talkativeness is explained by respondent characteristics, with an 18.8 percent reduction in the interviewer variance for talkativeness due to the inclusion of respondent characteristics. The associations between heuristic processing and the other outcomes are more modest, reaching a 4 percent reduction in interviewer-level variance for cooperativeness and modest increases in interviewer-level variance for interest and friendliness evaluations.

Table 5.

Multilevel Ordered Logistic Regression Coefficients and Standard Errors Predicting Interviewer Evaluations of Cooperativeness, Interest, Friendliness, and Talkativeness with Indicators of Heuristic and Systematic Processing

	Cooperativeness	Interest	Friendliness	Talkativeness
Heuristic processing
Respondent characteristics
Age (cent.)	0.00	–0.00	0.00	0.00
Female (ref. Male)	–0.07	–0.23	0.09	0.76***
Nonwhite (ref. White)	0.03	0.25	–0.15	–0.53
High school or less	–0.39	–0.63*	–0.58	0.67**
Income	–0.04	–0.07	–0.04	0.01
Interviewer characteristics
Female (ref. Male)	0.35	–1.08	0.19	0.59
Nonwhite (ref. White)	–0.47	–0.09	0.23	–0.74
Interviewer experience 1+ year(s)	–1.18	–1.02	0.85	–0.25
Cooperation rate	–0.04	0.48	–0.65	–0.62
Respondent control variables
Married (ref. Unmarried)	0.19	–0.10	–0.24	–0.37
Computer user	0.88*	0.68*	0.12	0.03
# of questions asked (cent.)	0.02	0.04	0.05	0.04
Systematic processing
Respondent answering behaviors
Adequate answer with elaboration	–0.04	0.05	0.02	0.17***
Adequate answer w/o elaboration	–0.03	–0.01	0.00	0.04
Qualified answer with elaboration	–0.12	0.07	0.40**	0.10
Qualified answer w/o elaboration	–0.03	–0.04	0.03	0.02
Uncodable answer with elaboration	–0.05	0.01	–0.07	0.08*
Uncodable answer w/o elaboration	–0.09*	–0.08*	–0.03	–0.03
Don’t know	–0.17	–0.33***	–0.38***	–0.17*
Refusal	–0.37***	–0.26**	–0.30**	–0.07
“Other” answer	0.24	0.07	0.00	–0.03
Nonverbal utterances
Laughter	0.06	0.08**	0.14***	–0.03
Disfluency	0.04**	0.03**	0.02	0.02
Personal involvement and rapport
Agrees with interviewer	–0.08	–0.03	0.08	–0.03
Affirmative feedback	0.14***	–0.03	0.05	–0.05
Acknowledging feedback	0.04	0.04	0.09	0.04
Task-related feedback	–0.04	–0.03	–0.27	–0.07
Digression	0.11	0.03	0.08	0.07
Personal disclosure	–0.02	–0.03	–0.05	0.08**
“Other” feedback	0.06	–0.02	0.02	–0.06
Clarification behaviors
Interrupts interviewer	–0.04	0.01	–0.00	0.01
Clarification—repeat	–0.02	0.02	0.08	–0.00
Clarification—definition	–0.03	–0.08	0.04	0.08
Clarification—what	–0.17*	–0.21**	–0.14	–0.02
Clarification—unit	–0.13	–0.04	–0.18	0.02
Intercept 1	–4.77***	–1.81	0.41	–5.24***
Intercept 2	–1.50	0.55		–2.99***
Intercept 3				0.45
Intercept 4				3.66**
Interviewer-level variance	2.03*	2.16**	1.89*	2.15**
Model fit:
AIC	595.57	790.28	492.03	937.98
Observations	433	433	433	433

Note.—Model 3. See online appendix B for full models. The intercepts refer to the cutpoints or thresholds of the latent underlying variable y*. When the value of y* is above this threshold, the observed category in the outcome variable y changes (Long and Freese 2006, p. 185). For cooperativeness, fair and below ≤ intercept 1 < good ≤ intercept 2 < very good. For interest, average and below ≤ intercept 1 < above average ≤ intercept 2 < very high. “Friendly and eager” takes the value of 1 in the friendliness model. For talkativeness, very untalkative ≤ intercept 1, …≤ intercept 4 < very talkative.

*p < 0.05, **p < 0.01, ***p < 0.001.

Table 5.

Multilevel Ordered Logistic Regression Coefficients and Standard Errors Predicting Interviewer Evaluations of Cooperativeness, Interest, Friendliness, and Talkativeness with Indicators of Heuristic and Systematic Processing

	Cooperativeness	Interest	Friendliness	Talkativeness
Heuristic processing Respondent characteristics
Age (cent.)	0.00	–0.00	0.00	0.00
Female (ref. Male)	–0.07	–0.23	0.09	0.76***
Nonwhite (ref. White)	0.03	0.25	–0.15	–0.53
High school or less	–0.39	–0.63*	–0.58	0.67**
Income	–0.04	–0.07	–0.04	0.01
Interviewer characteristics
Female (ref. Male)	0.35	–1.08	0.19	0.59
Nonwhite (ref. White)	–0.47	–0.09	0.23	–0.74
Interviewer experience 1+ year(s)	–1.18	–1.02	0.85	–0.25
Cooperation rate	–0.04	0.48	–0.65	–0.62
Respondent control variables
Married (ref. Unmarried)	0.19	–0.10	–0.24	–0.37
Computer user	0.88*	0.68*	0.12	0.03
# of questions asked (cent.)	0.02	0.04	0.05	0.04
Systematic processing Respondent answering behaviors
Adequate answer with elaboration	–0.04	0.05	0.02	0.17***
Adequate answer w/o elaboration	–0.03	–0.01	0.00	0.04
Qualified answer with elaboration	–0.12	0.07	0.40**	0.10
Qualified answer w/o elaboration	–0.03	–0.04	0.03	0.02
Uncodable answer with elaboration	–0.05	0.01	–0.07	0.08*
Uncodable answer w/o elaboration	–0.09*	–0.08*	–0.03	–0.03
Don’t know	–0.17	–0.33***	–0.38***	–0.17*
Refusal	–0.37***	–0.26**	–0.30**	–0.07
“Other” answer	0.24	0.07	0.00	–0.03
Nonverbal utterances
Laughter	0.06	0.08**	0.14***	–0.03
Disfluency	0.04**	0.03**	0.02	0.02
Personal involvement and rapport
Agrees with interviewer	–0.08	–0.03	0.08	–0.03
Affirmative feedback	0.14***	–0.03	0.05	–0.05
Acknowledging feedback	0.04	0.04	0.09	0.04
Task-related feedback	–0.04	–0.03	–0.27	–0.07
Digression	0.11	0.03	0.08	0.07
Personal disclosure	–0.02	–0.03	–0.05	0.08**
“Other” feedback	0.06	–0.02	0.02	–0.06
Clarification behaviors
Interrupts interviewer	–0.04	0.01	–0.00	0.01
Clarification—repeat	–0.02	0.02	0.08	–0.00
Clarification—definition	–0.03	–0.08	0.04	0.08
Clarification—what	–0.17*	–0.21**	–0.14	–0.02
Clarification—unit	–0.13	–0.04	–0.18	0.02
Intercept 1	–4.77***	–1.81	0.41	–5.24***
Intercept 2	–1.50	0.55		–2.99***
Intercept 3				0.45
Intercept 4				3.66**
Interviewer-level variance	2.03*	2.16**	1.89*	2.15**
Model fit:
AIC	595.57	790.28	492.03	937.98
Observations	433	433	433	433

Note.—Model 3. See online appendix B for full models. The intercepts refer to the cutpoints or thresholds of the latent underlying variable y*. When the value of y* is above this threshold, the observed category in the outcome variable y changes (Long and Freese 2006, p. 185). For cooperativeness, fair and below ≤ intercept 1 < good ≤ intercept 2 < very good. For interest, average and below ≤ intercept 1 < above average ≤ intercept 2 < very high. “Friendly and eager” takes the value of 1 in the friendliness model. For talkativeness, very untalkative ≤ intercept 1, …≤ intercept 4 < very talkative.

*p < 0.05, **p < 0.01, ***p < 0.001.
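To make the cutpoint notation in the table note concrete, the following minimal Python sketch converts a linear predictor into category probabilities using the two cooperativeness intercepts reported in the table (–4.77 and –1.50). It assumes the standard ordered-logit parameterization, P(y ≤ j) = logistic(cutpoint j – η), which is consistent with the note's description of the thresholds. The values of η are hypothetical and chosen only for illustration (in the fitted multilevel model, η would combine the covariate effects and the interviewer random intercept), and the function names are not taken from the study.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(eta: float, cutpoints: list[float]) -> list[float]:
    """Return P(y = k) for each ordered category.

    Assumes P(y <= j) = logistic(cutpoint_j - eta) with ascending cutpoints.
    """
    cumulative = [logistic(tau - eta) for tau in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Cutpoints for cooperativeness as reported in the table.
COOPERATIVENESS_CUTPOINTS = [-4.77, -1.50]
LABELS = ["fair and below", "good", "very good"]

# Hypothetical values of the linear predictor, for illustration only.
for eta in (-1.0, 0.0, 1.0):
    probs = category_probabilities(eta, COOPERATIVENESS_CUTPOINTS)
    summary = ", ".join(f"{label}: {p:.3f}" for label, p in zip(LABELS, probs))
    print(f"eta = {eta:+.1f} -> {summary}")
```

A larger linear predictor shifts probability mass toward the highest category, which is why positive coefficients in the table correspond to more favorable ratings.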

With respect to respondent characteristics, even after controlling for actual respondent behaviors, interviewers evaluate women as significantly more talkative than men in all models (OR = 1.96, p < 0.001; AME = 3.90), as expected: women are 3.9 percentage points more likely than men to be rated as “very talkative.” Additionally, interviewers rate respondents with a high school degree or less as significantly less interested (OR = 0.53, p < 0.05; AME = –7.56), as expected, and significantly more talkative (OR = 2.14, p < 0.001; AME = 3.46) than their more educated counterparts; that is, they are 7.56 percentage points less likely to be rated as “very interested” and 3.46 percentage points more likely to be rated as “very talkative.” Counter to the heuristic-processing hypotheses, none of the respondent characteristics are significantly related to interviewer assessments of respondents’ cooperativeness or friendliness. None of the interviewer characteristics are statistically significantly associated with any of the evaluations.

The interviewer’s gender and race do not moderate the effect of the respondent’s gender and race on each of the four types of evaluations (results not shown). None of the interaction effects are statistically significant, and their inclusion does not change our substantive conclusions.

Regarding the control variables, interviewers evaluate computer users as significantly more cooperative (OR = 2.42, p < 0.05; AME = 1.2) and more interested (OR = 1.98, p < 0.05; AME = 8.2). Although computer use was included as a measure of questionnaire burden, these results suggest that computer use also proxies for higher socioeconomic status. The initially significant positive effect of number of questions asked (OR = 1.09, p < 0.01) on talkativeness is fully absorbed when actual respondent behaviors are included. No other control variable is statistically significantly related to any of the evaluations in this study.

SYSTEMATIC PROCESSING

Table 5 presents the results for the systematic-processing hypotheses based on the full models for each interviewer evaluation. There is clear evidence for systematic behavior-based processing for each of the interviewer evaluations in this study. As expected, behaviors related to the process of responding and nonverbal mannerisms are significant predictors of respondent cooperativeness, interest, friendliness, and talkativeness. Particularly important and supportive of systematic processing is that different responding behaviors predict each of the assessments. Indicators of other conversational processes, including rapport building and clarification requests, are less consistently associated with interviewer assessments.

We hypothesized that adequate answering behaviors would be associated with higher ratings of cooperativeness, interest, and friendliness, whereas any form of inadequate answer would be associated with lower ratings of cooperativeness. Surprisingly, adequate and qualified answers—the most frequent types of respondent answering behaviors—are each associated with only one of the four interviewer evaluations. Elaborations were expected to be negatively associated with ratings of cooperativeness, but positively associated with friendliness and talkativeness. As hypothesized, adequate answers with elaboration (OR = 1.18, p < 0.001; AME = 0.9) are positively associated with respondents being perceived as more talkative but not associated with friendliness (OR = 1.02, p = 0.751). As the number of adequate answers with elaboration increases slightly from the mean, the probability of being evaluated as very talkative increases by 0.9 percentage points. However, providing adequate answers without elaboration is associated neither with evaluations of talkativeness at traditional levels (OR = 1.04, p = 0.103) nor with any other evaluation. Respondents who provide higher numbers of qualified answers with elaborations are evaluated as friendlier (OR = 1.49, p < 0.01; AME = 5.9). Qualified answers with elaborations are not associated with any other evaluation.

As anticipated, interviewers rate respondents more unfavorably the more often the respondent leaves a task incomplete. Interviewers rate respondents who provide more uncodable answers with elaboration (OR = 1.08, p < 0.05; AME = 0.4) as more talkative, whereas more “don’t know” responses (OR = 0.84, p < 0.05; AME = –0.9) are associated with ratings of respondents as less talkative. Interviewers rate respondents as less cooperative when they provide more uncodable answers without elaboration (OR = 0.91, p < 0.05; AME = –1.3) or refuse to answer (OR = 0.69, p < 0.001; AME = –5.1). Similarly, respondents who provide more uncodable answers without elaboration (OR = 0.92, p < 0.05; AME = –1.0), who provide more “don’t know” responses (OR = 0.72, p < 0.001; AME = –4.0), and who refuse to respond to a question (OR = 0.77, p < 0.01; AME = –3.1) are evaluated as less interested. Respondents who provide more “don’t know” responses (OR = 0.68, p < 0.001; AME = –5.6) or refuse to provide a response (OR = 0.74, p < 0.01; AME = –4.4) are also rated as less friendly.

We anticipated that nonverbal utterances of respondent laughter would be positively associated with evaluations of respondent cooperativeness, interest, and friendliness, whereas the effect of disfluencies would be less straightforward. Interviewers generally evaluate respondents more favorably when respondents display more of these normal conversational behaviors. Respondents who laugh more are evaluated as more interested (OR = 1.08, p < 0.01; AME = 1.0) and friendlier (OR = 1.15, p < 0.001; AME = 2.0), but not as more cooperative (OR = 1.06, p = 0.070) or more talkative (OR = 0.98, p = 0.307). Respondents who speak with more disfluencies are rated as more cooperative (OR = 1.04, p < 0.01; AME = 0.6) and more interested (OR = 1.03, p < 0.01; AME = 0.4), confirming the “disfluency advantage.” Disfluencies are not associated with evaluations of being friendly (OR = 1.02, p = 0.246) or talkative (OR = 1.02, p = 0.149).

We expected verbal rapport and personal involvement behaviors to be associated with higher ratings of friendliness and talkativeness. Few behaviors related to personal involvement predict such assessments. Respondents who provide more affirmative feedback are perceived as more cooperative (OR = 1.15, p < 0.001; AME = 1.9), and respondents who disclose personal information more frequently are perceived as more talkative (OR = 1.08, p < 0.01; AME = 0.4). Contrary to expectations, requests for clarification are not associated with higher ratings of cooperativeness and interest: respondents who use more “What?” clarification requests (e.g., “What did you say?”) are perceived as less cooperative (OR = 0.84, p < 0.05; AME = –2.4) and less interested (OR = 0.81, p < 0.01; AME = –2.6). None of the other personal involvement behaviors are associated with any of the evaluations.
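The odds ratios quoted in the text are exponentiated logit coefficients from table 5. The following minimal Python sketch shows that conversion for three coefficients taken from the table; the dictionary labels and the function name are illustrative only and are not part of the study's analysis code.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logit (log-odds) coefficient into an odds ratio."""
    return math.exp(coefficient)


# Coefficients copied from table 5 (outcome in parentheses).
coefficients = {
    "Refusal (cooperativeness)": -0.37,
    "Laughter (friendliness)": 0.14,
    "Don't know (interest)": -0.33,
}

for label, b in coefficients.items():
    print(f"{label}: b = {b:+.2f}, OR = {odds_ratio(b):.2f}")
```

These three conversions reproduce the odds ratios of 0.69, 1.15, and 0.72 reported in the text. Because the published coefficients are rounded to two decimals, recomputed odds ratios for other rows can differ from the reported values in the second decimal.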

The AIC goodness-of-fit statistics show that although including indicators of heuristic processing improves model fit slightly, the drop in AIC, and hence the improvement in fit, is largest when information on respondent behaviors is incorporated, particularly behaviors related to the quality of the response and other nonverbal mannerisms (online appendix B). Interestingly, interviewer-level variance in these evaluations increases in all models once respondent behaviors are accounted for, indicating heterogeneity in respondent behaviors across interviewers (Raudenbush and Bryk 2002).

Figure 1 shows the predicted probabilities of the two extreme categories of each interviewer evaluation. We set each significant independent variable at one standard deviation below the mean, at the mean, and at one standard deviation above the mean, holding all other variables at their observed values (online appendix C). For example, the predicted probability of being rated as very cooperative is 0.69 for someone who provides fewer uncodable answers (one standard deviation below the mean) compared to 0.58 for someone who provides more uncodable answers (one standard deviation above the mean); the corresponding probabilities of being rated as fair and below are 0.05 and 0.08.

Figure 1.

Adjusted Predictions for Significant Systematic Processing Indicators.

Our results suggest that interviewers perceive and use respondent behaviors to make their assessments rather than drawing on respondent attributes based on social categories. These results also suggest that interviewers’ assessments are predominantly influenced by respondents’ question-answering behaviors and nonverbal behaviors. Thus, interviewers differentiate across respondent behaviors in their assessments, incorporating those pieces of information that are most relevant to the judgment they are asked to make.

Conclusion and Discussion

Using the continuum model of impression formation, we investigated whether interviewers base their assessment of respondent engagement on stereotypes, their own characteristics, or interactions with the respondent. Overall, interviewer assessments vary systematically across interviewers. Although this systematic variation across interviewers occurs, for the task-related evaluation of cooperativeness, there is no evidence of any heuristic evaluation or inappropriate stereotyping beyond the respondents’ actual behavior. We find a similar lack of association with the more interpersonal assessment of friendliness. Education and gender are associated with the assessments of interest (education only) and talkativeness. None of the interviewer characteristics explained the statistically significant interviewer variation. One possible explanation for this finding could be that while interviewers rely on their own traits and experiences, we do not adequately measure the interviewer characteristics that lead to these differences. For example, gender may not matter as much as an interviewer’s perceptual ability.

Instead of using heuristic processing, interviewers rely on a more sophisticated strategy of information processing based on the quality of the data provided by the respondent and other behaviors throughout the interview. While using systematic processing, interviewers rely primarily on behaviors associated with the immediate response task and measures of nonverbal communication in the interpersonal interaction. That is, although interviewers vary significantly in their assessments of respondents, the assessments are based on the actual interaction with the respondent even if those occur infrequently (e.g., don’t know responses). This is important because respondent behaviors such as uncodable, don’t know, and refusal answers are associated with lower data quality, and in particular, with lower accuracy (e.g., Mathiowetz 1998). Indicators of rapport or personal involvement and requests for clarification indicating cognitive difficulty are much less likely to be associated with these four ratings of respondents.

Further analyses not presented here (see online appendix D) confirm that the assessments made by interviewers are valid. The variability across respondents is greater than the variability across interviewers, and the proportion of variance uniquely explained by indicators of systematic processing is substantially larger than the proportion uniquely attributable to heuristic processing. Further research is needed to explain the remaining variance at both the interviewer and respondent levels: the behaviors themselves explain about one-quarter or less of the total variance in the assessments and less than one-third of the within-interviewer variance. Overall, our findings provide insight into the cognitive processes interviewers use when assessing respondents’ engagement.

The implications of these findings are many. First, these results suggest that the extra effort and money spent by survey organizations to collect evaluations results in assessments that reflect the interviewer-respondent interaction. A simple assessment of how the interviewer thinks the interview went is a less expensive insight than a more elaborate behavior coding study. Of course, the evaluations do not indicate what exactly went wrong during an interaction, and thus are not a full replacement for behavior coding. Additionally, this study does not assess exactly how useful these indicators are for assessing measurement error directly; this will be examined in future research.

Second, the implications for the use of these postsurvey evaluations in measurement error models are mixed. It is clear from these data that refusals or reports of don’t know contribute to how interviewers answer these evaluation questions. Thus, studies that use these evaluations to predict item-nonresponse rates (e.g., Kaminska, McCutcheon, and Billiet 2010) use endogenous measures. That is, a significant association between these evaluations and item nonresponse (e.g., Tarnai and Paxson 2005) will not be surprising because the don’t know responses and refusals themselves were used by the interviewer to make these evaluations. To the extent that the ratings identify potential item nonrespondents and are associated with the survey variables of interest, these ratings are useful as covariates in imputation models (as suggested by Mathiowetz [1998]). That is, the endogeneity of these measures is a problem for causal models, but could be beneficial for imputation models.

Third, these interactional properties of an interview could be important to respondents and their willingness to continue to participate in longitudinal studies (e.g., Lepkowski and Couper 2002). The association of the interviewer ratings with interview behaviors suggests that future research should more thoroughly investigate the potential of these ratings in response propensity models (in longitudinal studies) and their potential utility for responsive designs (Groves and Heeringa 2006).

Fourth, survey organizations that want to reduce the amount of variation across interviewers due to factors other than these behaviors could train interviewers in how to fill out these assessments. Such additional training would likely strengthen the association between behaviors and postsurvey evaluations and reduce inter-interviewer variance. Alternatively, if training is difficult or interviewer-related variance cannot be reduced, survey organizations that have collected interviewer assessments over multiple studies are advised to calculate interviewer-adjusted ratings. This kind of calibration will allow research organizations to adjust the evaluations for the interviewer’s own perspective, separate from the behaviors themselves.³
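As one illustration of what such a calibration could look like, the Python sketch below centers each rating on the rating interviewer's own mean and adds back the grand mean. This is only a simple assumed approach, not the procedure used in this study. It treats the ordinal ratings as numeric scores, presupposes that each interviewer has rated enough respondents for a stable mean, and uses hypothetical identifiers and data.

```python
from collections import defaultdict
from statistics import mean


def interviewer_adjusted(ratings):
    """Center ratings on each interviewer's mean, then add the grand mean.

    `ratings` is a list of (interviewer_id, respondent_id, rating) tuples.
    Returns a dict mapping respondent_id to the adjusted rating.
    """
    by_interviewer = defaultdict(list)
    for interviewer_id, _, rating in ratings:
        by_interviewer[interviewer_id].append(rating)

    interviewer_means = {i: mean(values) for i, values in by_interviewer.items()}
    grand_mean = mean(rating for _, _, rating in ratings)

    return {
        respondent_id: rating - interviewer_means[interviewer_id] + grand_mean
        for interviewer_id, respondent_id, rating in ratings
    }


# Hypothetical data: interviewer "B" rates more generously than interviewer "A".
example = [
    ("A", "r1", 3),
    ("A", "r2", 1),
    ("B", "r3", 5),
    ("B", "r4", 3),
]

for respondent, score in interviewer_adjusted(example).items():
    print(respondent, score)
```

After adjustment, respondents r1 and r3 receive the same score because each was rated one point above the respective interviewer's average, even though the raw ratings differ by two points.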

Finally, although the interview behaviors are associated with the assessments of interest, friendliness, cooperativeness, and talkativeness, the explained variation due to these behaviors—that is, the signal-to-noise ratio—is moderate (see online appendix D). That is, for any given respondent, the quality of the measurement is weak (the confidence interval around a predicted value would be wide given the poor measurement). Thus, using these observations to flag an individual respondent for potential removal from a dataset is unwise. Yet these ratings may successfully identify groups of respondents who should be investigated for potentially providing lower-quality data (e.g., evidence of straightlining, satisficing, or other kinds of inconsistent answers). Additionally, because the measures are valid, but somewhat unreliable, survey organizations could compare across studies, across time, or across groups of respondents as long as there is an approximate interpenetration (random assignment) of cases to interviewers, a design feature common in telephone surveys. If interpenetration is not achieved, then an interviewer-adjusted score may be warranted before such comparisons are made.⁴

This study has limitations. First, the telephone setting potentially suppresses some of the stereotyping effects relative to a face-to-face survey where interviewers see the physical characteristics of the respondents. Second, the sample is based on a landline RDD survey, leading to a more homogeneous set of respondents. Third, we looked at one study, but expect our results to generalize to other telephone surveys with different topics or lengths. Although the respondent behaviors may differ in another survey, we anticipate that interviewers will incorporate information about the respondents’ behaviors into their evaluations. Future research should examine interviewer evaluations on questionnaires with different types of items (e.g., more sensitive or complex items). Fourth, perceptions and behaviors of the interviewer likely elicit corresponding behavior by the respondent, but interviewer behaviors were not included here. Fifth, our sample size and number of interviewers are limited. Future research should replicate this study using a larger sample. Finally, our study investigates interviewer evaluations collected by an individual survey organization and should be replicated across different organizations to strengthen our inferences.

Overall, our results show that in postsurvey evaluations interviewers evaluate respondents based on their behaviors and distinguish subtleties in those behaviors, rather than their social categories. Telephone survey interview organizations and researchers can be confident that these evaluations provide a valid summary of the interaction between these two key actors.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

Antje Kirchner is a research survey methodologist in the Survey Research Division at RTI International, Research Triangle Park, NC, USA, and an adjunct research assistant professor in the Department of Sociology at the University of Nebraska–Lincoln, Lincoln, NE, USA. Kristen Olson is an associate professor in the Department of Sociology at the University of Nebraska–Lincoln, Lincoln, NE, USA. Jolene D. Smyth is an associate professor in the Department of Sociology, and director of the Bureau of Sociological Research at the University of Nebraska–Lincoln, Lincoln, NE, USA. This material is based upon work supported by the National Science Foundation [SES-1132015 to K.O.]. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. An earlier version of this paper was presented at the Annual Conference of the Midwest Association for Public Opinion Research, November 2015, Chicago, IL, USA. The authors thank the anonymous reviewers and the editors for feedback.

References

American National Election Studies (ANES). 2013. ANES 2012 Pre-Election Questionnaire. Available at http://www.electionstudies.org/studypages/anes_timeseries_2012/anes_timeseries_2012_qnaire_pre.pdf.

Barrett, Kirsten, Matt Sloan, and Debra Wright. 2006. “Interviewer Perceptions of Interview Quality.” In Proceedings of the ASA, Survey Research Methods Section, pp. 4026–33.

Belli, Robert F., Paul S. Weiss, and James M. Lepkowski. 1999. “Dynamics of Survey Interviewing and the Quality of Survey Reports: Age Comparisons.” In Cognition, Aging and Self-Reports, edited by Norbert Schwarz, Denise Park, Bärbel Knäuper, and Seymour Sudman, pp. 285–302. Philadelphia: Psychology Press.

Bilgen, Ipek, and Robert F. Belli. 2010. “Comparison of Verbal Behaviors between Calendar and Standardized Conventional Questionnaires.” Journal of Official Statistics 26:481–505.

Blumberg, Stephen J., and Julian V. Luke. 2013. “Wireless Substitution: Early Release of Estimates from the National Health Interview Survey, July–December 2013.” National Center for Health Statistics. Available at http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless201306.pdf.

Brennan, Susan E., and Michael F. Schober. 2001. “How Listeners Compensate for Disfluencies in Spontaneous Speech.” Journal of Memory and Language 44:274–96. Available at http://www.mfschober.net/BrennanSchober01.pdf.

Cannell, Charles F., Peter V. Miller, and Lois Oksenberg. 1981. “Research on Interviewing Techniques.” In Sociological Methodology, edited by Samuel Leinhardt, pp. 389–437. San Francisco: Jossey-Bass.

Chaiken, Shelly. 1980. “Heuristic Versus Systematic Information Processing and the Use of Source Versus Message Cues in Persuasion.” Journal of Personality and Social Psychology 39:752–66.

Chaiken, Shelly, and Yaacov Trope. 1999. Dual-Process Theories in Social Psychology. New York: Guilford Press.

Chen, Serena, Kimberly Duckworth, and Shelly Chaiken. 1999. “Motivated Heuristic and Systematic Processing.” Psychological Inquiry 10:44–49.

Conrad, Frederick G., Jessica S. Broome, José R. Benkí, Frauke Kreuter, Robert M. Groves, David Vannette, and Colleen McClain. 2013. “Interviewer Speech and the Success of Survey Invitations.” Journal of the Royal Statistical Society 176:191–210.

Conrad, Frederick G., Michael F. Schober, and Wil Dijkstra. 2008. “Cues of Communication Difficulty in Telephone Interviews.” In Advances in Telephone Survey Methodology, edited by James M. Lepkowski, Clyde Tucker, J. Michael Brick, Edith de Leeuw, Lilli Japec, Paul J. Lavrakas, Michael W. Link, and Roberta L. Sangster, pp. 212–30. Hoboken, NJ: John Wiley & Sons.

Dykema, Jennifer, James M. Lepkowski, and Steven Blixt. 1997. “The Effect of Interviewer and Respondent Behavior on Data Quality: Analysis of Interaction Coding in a Validation Study.” In Survey Measurement and Process Quality, edited by Lars E. Lyberg, Paul P. Biemer, Martin Collins, Edith de Leeuw, Cathryn Dippo, Norbert Schwarz, and Dennis Trewin, pp. 287–310. Hoboken, NJ: John Wiley & Sons.

Earls, Felton J., Jeanne Brooks-Gunn, Stephen W. Raudenbush, and Robert J. Sampson. 2000. “Project on Human Development in Chicago Neighborhoods (PHDCN): Child and Adolescent Behavior Rating Scale, Wave 3, 2000–2002 (ICPSR 13678).” Available at http://www.icpsr.umich.edu/cgi-bin/file?comp=none&study=13678&ds=0&file_id=895548&path=ICPSR.

European Social Survey (ESS). 2014. “ESS Round 7 Source Questionnaire.” London: ESS ERIC Headquarters, Centre for Comparative Social Surveys, City University London. Available at http://www.europeansocialsurvey.org/docs/round7/fieldwork/source/ESS7_source_main_questionnaire.pdf.

Fiske, Susan T. 2000. “Stereotyping, Prejudice, and Discrimination at the Seam between the Centuries: Evolution, Culture, Mind, and Brain.” European Journal of Social Psychology 30:299–322.

Fiske, Susan T., Amy J. C. Cuddy, Peter Glick, and Jun Xu. 2002. “A Model of (Often Mixed) Stereotype Content: Competence and Warmth Respectively Follow from Perceived Status and Competition.” Journal of Personality and Social Psychology 82:878–902.

Fiske, Susan T., Monica Lin, and Steven L. Neuberg. 1999. “The Continuum Model. Ten Years Later.” In Dual-Process Theories in Social Psychology, edited by Shelly Chaiken and Yaacov Trope, pp. 231–54. New York/London: Guilford Press.

Freedman, Vicki, Frank P. Stafford, Frederick G. Conrad, Norbert Schwarz, and Jennifer Cornman. 2012. “Assessing Time Diary Quality for Older Couples: An Analysis of the Panel Study of Income Dynamics’ Disability and Use of Time (DUST) Supplement.” Annals of Economics and Statistics 105/106:271–89.

Garbarski, Dana, Nora Cate Schaeffer, and Jennifer Dykema. 2016. “Interviewing Practices, Conversational Practices, and Rapport: Responsiveness and Engagement in the Standardized Survey Interview.” Sociological Methodology 46:1–38.

Greenwald, Anthony G., and Mahzarin R. Banaji. 1995. “Implicit Social Cognition: Attitudes, Self-Esteem, and Stereotypes.” Psychological Review 102:4–27.

Groves, Robert M., and Steven G. Heeringa. 2006. “Responsive Design for Household Surveys: Tools for Actively Controlling Survey Errors and Costs.” Journal of the Royal Statistical Society A 169:439–57.

Holbrook, Allyson L., Sowmya Anand, Timothy P. Johnson, Young Ik Cho, Sharon Shavitt, Noel Chávez, and Saul Weiner. 2014. “Response Heaping in Interviewer-Administered Surveys: Is it Really a Form of Satisficing?” Public Opinion Quarterly 78:591–633.

Hurtado, Aída. 1994. “Does Similarity Breed Respect: Interviewer Evaluations of Mexican-Descent Respondents in a Bilingual Survey.” Public Opinion Quarterly 58:77–95.

Jans, Matthew E. 2010. “Verbal Paradata and Survey Error: Respondent Speech, Voice, and Question-Answering Behavior Can Predict Income Item Nonresponse.” PhD diss., University of Michigan.

Japec, Lilli. 2008. “Interviewer Error and Interviewer Burden.” In Advances in Telephone Survey Methodology, edited by James M. Lepkowski, Clyde Tucker, J. Michael Brick, Edith de Leeuw, Lilli Japec, Paul J. Lavrakas, Michael W. Link, and Roberta L. Sangster, pp. 185–211. Hoboken, NJ: John Wiley & Sons.

Johnson

Timothy P.

Shariff-Marco

Salma

Willis

Gordon

Cho

Young Ik

Breen

Nancy

Gee

Gilbert C.

Krieger

Nancy

Grant

David

Alegria

Margarita

Mays

Vickie M.

Williams

David R.

Landrine

Hope

Liu

Benmei

Reeve

Bryce B.

Takeuchi

David

Ponce

Ninez A.

.

2015

. “

Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination

.”

International Journal of Public Opinion Research

27

:

244

–

63

.

Kaminska

Olena

McCutcheon

Allan L.

Billiet

Jaak

.

2010

. “

Satisficing among Reluctant Respondents in a Cross-National Context

.”

Public Opinion Quarterly

74

:

956

–

84

.

Krauss

Robert M.

Freyberg

Robin

Morsella

Ezequiel

.

2002

. “

Inferring Speakers’ Physical Attributes from Their Voices

.”

Journal of Experimental Social Psychology

38

:

618

–

25

.

Lepkowski

James M.

Couper

Mick P.

.

2002

. “

Nonresponse in the Second Wave of Longitudinal Household Surveys

.” In

Survey Nonresponse

, edited by

Robert M.

Groves

Don A.

Dillman

John L.

Eltinge

Roderick J. A.

Little

, pp.

259

–

72

.

Hoboken, NJ

:

John Wiley & Sons

.

Long

Scott J.

Freese

Jeremy

.

2006

.

Regression Models for Categorical Dependent Variables Using Stata

, 2nd ed.

College Station, TX

:

Stata Press

.

Mathiowetz

Nancy A

.

1998

. “

Respondent Expressions of Uncertainty Data Source for Imputation

.”

Public Opinion Quarterly

62

:

47

–

56

.

Medway

Rebecca

Tourangeau

Roger

.

2015

. “

Response Quality in Telephone Surveys: Do Prepaid Cash Incentives Make a Difference

?”

Public Opinion Quarterly

79

:

524

–

43

.

Mood

Carina

.

2010

. “

Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About it

.”

European Sociological Review

26

:

67

–

82

.

National Longitudinal Survey of Youth (NLSY)

.

1997

.

“Interviewer Remarks, Characteristics & Contacts.”

Available at https://www.nlsinfo.org/content/cohorts/nlsy97/using-and-understanding-the-data/interviewer-remarks-characteristics-contacts.

Oksenberg

Lois

Coleman

Lerita

Cannell

Charles F.

.

1986

. “

Interviewers’ Voices and Refusal Rates in Telephone Surveys

.”

Public Opinion Quarterly

50

:

97

–

111

.

Olson

Kristen

.

2013

. “

Paradata for Nonresponse Adjustment

.”

Annals of the American Academy of Political and Social Science

645

:

142

–

70

.

Olson

Kristen

Parkhurst

Brian

.

2013

. “

Collecting Paradata for Measurement Error Evaluations

.” In

Improving Surveys with Paradata: Analytic Uses of Process Information

, edited by

Frauke

Kreuter

, pp.

43

–

72

.

Hoboken, NJ

:

John Wiley & Sons

.

Olson

Kristen

Peytchev

Andy

.

2007

. “

Effect of Interviewer Experience on Interview Pace and Interviewer Attitudes

.”

Public Opinion Quarterly

71

:

273

–

86

.

Raudenbush

Stephen W.

Bryk

Anthony S.

.

2002

.

Hierarchical Linear Models

, 2nd ed.

Thousand Oaks, CA

:

Sage Publications

.

Schaeffer

Nora Cate

Dykema

Jennifer

.

2011

. “

Response 1 to Fowler’s Chapter: Coding the Behavior of Interviewers and Respondents to Evaluate Survey Questions

.” In

Question Evaluation Methods: Contributing to the Science of Data Quality

, edited by

Jennifer

Madans

Kristen

Miller

Aaron

Maitland

Gordon

Willis

, pp.

23

–

39

.

Hoboken, NJ

:

John Wiley & Sons

.

Schober

Michael F.

Bloom

Jonathan E.

.

2004

. “

Discourse Cues That Respondents Have Misunderstood Survey Questions

.”

Discourse Processes

38

:

287

–

308

.

Sinibaldi

Jennifer

Durrant

Gabriele B.

Kreuter

Frauke

.

2013

. “

Evaluating the Measurement Error of Interviewer Observed Paradata

.”

Public Opinion Quarterly

77

:

173

–

93

.

Smith

Tom W

.

2009

. “

An Analysis of Computer Assisted Recorded Interviews (CARI) on the 2008 General Social Survey

.” GSS Methodology Report No. 117.

Stata

.

2015

.

Stata Multilevel Mixed-Effects Reference Manual

, 14th ed.

College Station, TX

:

Stata Press

.

Tarnai

John

Paxson

M. Chris

.

2005

. “

Interviewer Judgments about the Quality of Telephone Interviews

.” In

American Statistical Association, Proceedings of the Survey Research Methods Section

, pp.

3988

–

94

.

Thomas

Erik T.

Reaser

Jeffrey

.

2004

. “

Delimiting Perceptual Cues Used for the Ethnic Labeling of African American and European American Voices

.”

Journal of Sociolinguistics

8

:

54

–

87

.

Tiedens

Larissa Z.

Ellsworth

Phoebe C.

Mesquita

Batja

.

2000

. “

Sentimental Stereotypes: Emotional Expectations for High- and Low-Status Group Members

.”

Personality and Social Psychology Bulletin

26

:

560

–

75

.

Tversky

Amos

Kahneman

Daniel

.

1974

. “

Judgment under Uncertainty: Heuristics and Biases

.”

Science

185

:

1124

–

31

.

West

Brady T

.

2013

. “

An Examination of the Quality and Utility of Interviewer Observations in the National Survey of Family Growth

.”

Journal of the Royal Statistical Society A

176

:

211

–

25

.

West

Brady T.

Kreuter

Frauke

.

2013

. “

Factors Affecting the Accuracy of Interviewer Observations: Evidence from the National Survey of Family Growth

.”

Public Opinion Quarterly

77

:

522

–

48

.

West

Brady T.

Kreuter

Frauke

Trappmann

Mark

.

2014

. “

Is the Collection of Interviewer Observations Worthwhile in an Economic Panel Survey? New Evidence from the German Labor Market and Social Security (PASS) Study

.”

Journal of Survey Statistics and Methodology

2

:

159

–

81

.

Yan

Ting

Tourangeau

Roger

.

2008

. “

Fast Times and Easy Questions: The Effects of Age, Experience and Question Complexity on Web Survey Response Times

.”

Applied Cognitive Psychology

22

:

51

–

68

.