Persuasive Message Pretesting Using Non-Behavioral Outcomes: Differences in Attitudinal and Intention Effects as Diagnostic of Differences in Behavioral Effects

Persuasive message designers would like to be able to pretest messages to see which will be more effective in influencing behavioral outcomes, but pretesting using behavioral measures is commonly not practical. Examination of within-study effect size comparisons from 317 studies of 22 message variations suggests that persuasive messages’ relative effectiveness is strikingly similar across attitudinal, intention, and behavioral outcomes—with messages’ relative persuasiveness with respect to intention outcomes especially indicative of relative persuasiveness with respect to behavioral outcomes. Intention measures thus provide a convenient and accurate means of persuasive message pretesting.

encouraging flu shots is unlikely to be able to pretest candidate messages by seeing which ones are actually more effective at producing behavioral uptake.
The primary purpose of this article is to provide evidence concerning the possibility of using non-behavioral outcomes as a basis for identifying messages that will be more persuasive with respect to behavioral outcomes. The focus is specifically on attitude outcomes (general evaluations, e.g., consumer product liking) and intention outcomes (behavioral intentions, e.g., purchase intentions) as potential pretesting devices. The next section contextualizes this project against the backdrop of other message-pretesting methods.

Persuasive message pretesting methods
Because pretesting persuasive messages by obtaining behavioral-outcome data is commonly not practical, three other general kinds of pretesting methods have been suggested.

Checklist methods
One potential means of advance identification of relatively more persuasive messages is through appropriate content-analytic methods. In this approach, trained coders use a checklist to assess message properties thought to be relevant to persuasiveness. The supposition is that if message A ranks higher than message B with respect to such properties, then message A will likely be more persuasive than message B when behavioral outcomes are assessed. For example, for health education materials, Paul, Redman, & Sanson-Fisher (1997) developed a 62-item "checklist of the content and design characteristics of effective print materials . . . based on a critical review of the relevant research literature" (p. 153). Armstrong (2010) identified 194 "evidence-based principles" for persuasive advertising, with these organized into a coding scheme yielding a Persuasive Principles Index score for advertisements. (For examples of similar methods, see Baumel, Faber, Mathur, Kane, & Muench, 2017; Cole, Keller, Reynolds, Schaur, & Krause, 2016.) To underwrite the message-pretesting use of such a method, the kind of research evidence needed would be evidence that those messages identified as relatively more persuasive using such a checklist are in fact ones relatively more effective when assessed using behavioral-outcome data. Unfortunately, such evidence is scant (see, e.g., Hartnett, Greenacre, Kennedy, & Sharp, 2020; Sharp & Hartnett, 2016).

Perceived persuasiveness
A second pretesting method has pretest participants from the target audience rate or rank messages on perceived persuasiveness (perceived message effectiveness, PME). The supposition is that if message A is rated higher in perceived effectiveness than message B, then message A will likely be more persuasive than message B when behavioral outcomes (i.e., actual effectiveness outcomes) are assessed. For example, Vaillancourt et al. (2019) had participants rate messages on scales with end-anchors such as persuasive/not persuasive, effective/ineffective, and convincing/not convincing; Drovandi, Teague, Glass, & Malau-Aduli (2019) had smokers rank-order cigarette warnings in terms of perceived effectiveness. (For some general discussions of such methods, see Baig et al., 2019; Dillard, Weber, & Vail, 2007.) To underwrite the message-pretesting use of such a method, the kind of research evidence needed would be evidence that those messages identified as relatively more persuasive using such perceived-persuasiveness measures are in fact ones relatively more effective when assessed using behavioral-outcome data. Unfortunately, evidence underwriting the predictive diagnosticity of such methods is not strong (O'Keefe, 2018; for some discussion, see Cappella, 2018; Noar, Barker, & Yzer, 2018; O'Keefe, 2020).

Non-behavioral outcomes
A third possibility is to assess non-behavioral persuasive outcomes (specifically, attitude and intention outcomes) as pretest measures. The supposition is that relative message persuasiveness will be similar across attitude, intention, and behavioral outcomes, and hence if message A is more persuasive than message B when attitude or intention outcomes are assessed, then message A will likely be more persuasive than message B when behavioral outcomes are assessed.
To underwrite the message-pretesting use of such a method, the kind of research evidence needed would be evidence that those messages identified as relatively more persuasive using such non-behavioral outcomes are in fact ones relatively more effective when assessed using behavioral-outcome data. Such evidence is not in hand. Two other kinds of evidence might seem to be relevant here, but upon closer inspection, neither in fact underwrites the use of non-behavioral measures in message pretesting.
Positive attitude-behavior and intention-behavior correlations. One kind of evidence that would appear to support the use of non-behavioral measures in message pretesting is the existence of generally positive correlations between non-behavioral and behavioral measures. Individuals' attitudes and behaviors are commonly positively (if imperfectly) correlated (e.g., Glasman & Albarracín, 2006), as are individuals' intentions and behaviors (e.g., Sheeran, 2002). Thus, one might reason, attitudes and intentions will be good indicators of messages' relative behavioral effectiveness.
But this is not sound reasoning. (The following discussion is phrased in terms of the relationship of intentions and behavior, but the same analysis applies to the relationship of attitudes and behaviors.) The correlation between individuals' intention scores and behavior scores speaks to the question of whether one can predict individuals' behavior scores on the basis of their intention scores, or more carefully, whether one can predict individuals' relative standing on behavioral outcomes on the basis of individuals' relative standing on intention outcomes. However, in message pretesting, one is trying to predict messages' relative standing on behavioral outcomes, not individuals' relative standing; the interest is with messages' means, not individuals' scores.
The distinction is important. Even if individuals' scores on intention and behavior measures are positively correlated, that does not imply that the relative standing of messages' means on intention measures will match the relative standing of messages' means on behavior measures. This point can be illustrated with a hypothetical dataset based on three messages (N = 12): For message A, participants have the following (intention, behavior) pairs of scores: (52,48), (62,58), (72,68), and (82,78). For message B, the scores are (50,50), (60,60), (70,70), and (80,80). For message C, the scores are (78,82), (68,72), (58,62), and (48,52). The correlation between individuals' intention scores and behavior scores is strongly positive, +.96. The ranking of messages on intention is A, B, C (means of 67.0, 65.0, and 63.0, respectively); the ranking of messages on behavior is C, B, A (means of 67.0, 65.0, and 63.0, respectively).1 As this illustrates, it is possible for individuals' intention and behavior scores to be strongly positively correlated and yet for messages' relative standing on intention outcomes to be the opposite of those messages' relative standing on behavioral outcomes (even in the same dataset). Plainly, knowing that individuals' intention and behavior scores are generally positively correlated provides no guarantee that messages' relative standing on the two measures will generally match up. How messages' means sort themselves out on the two measures is something conceptually different from how individuals' scores on the two measures are related.2
Thus the existence of positive correlations between individuals' intention scores and behavior scores cannot possibly show that messages' relative standing on intention outcomes is a good indicator of messages' relative standing on behavioral outcomes. It's one thing to say "the relative standing of individuals on intention positively covaries with the relative standing of individuals on behavior" and something quite different to say "the relative standing of messages on intention positively covaries with the relative standing of messages on behavior." The first could be true while the second is false.
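The hypothetical dataset can be checked directly. The following is a minimal sketch in Python using the twelve (intention, behavior) pairs given above; the variable and function names are illustrative choices, not part of the original analysis.

```python
from math import sqrt
from statistics import mean

# (intention, behavior) pairs from the hypothetical three-message dataset.
data = {
    "A": [(52, 48), (62, 58), (72, 68), (82, 78)],
    "B": [(50, 50), (60, 60), (70, 70), (80, 80)],
    "C": [(78, 82), (68, 72), (58, 62), (48, 52)],
}

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Individual-level relationship: pool all 12 participants.
pairs = [pair for scores in data.values() for pair in scores]
individual_r = pearson_r([i for i, _ in pairs], [b for _, b in pairs])

# Message-level relationship: compare the ordering of message means.
intention_means = {m: mean(i for i, _ in s) for m, s in data.items()}
behavior_means = {m: mean(b for _, b in s) for m, s in data.items()}
intention_rank = sorted(data, key=intention_means.get, reverse=True)
behavior_rank = sorted(data, key=behavior_means.get, reverse=True)

print(round(individual_r, 2))         # individual-level correlation
print(intention_rank, behavior_rank)  # message orderings on each outcome
```

The pooled individual-level correlation is strongly positive, while the two message orderings are exact reversals of each other, which is the contrast the example is meant to display.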
Similar mean effect sizes across outcomes. A second form of apparent evidence supporting the pretesting use of non-behavioral outcomes was offered by O'Keefe (2013), in the form of comparisons of meta-analytic averages of effect sizes for attitude, intention, and behavioral outcomes in studies of message-variable persuasive effects. Over a large number of different message variables, those meta-analytic averages were similar across the three different outcome variables. The conclusion was that for "practical persuasive-message pretesting" purposes, "campaign planners need not collect attitudinal, intentional, and behavioral outcome data. Any one of these three kinds of assessment will suffice to identify the most persuasive message" (p. 237).
But this is not sound reasoning, because the reported meta-analytic averages could potentially mask consequential within-study differences between effect sizes for different outcomes. For example, imagine that some studies had a large positive effect size for intention (indicating message A was more effective than message B with respect to the intention outcome) and a small negative effect size for behavior (indicating message B was more effective than message A with respect to the behavioral outcome), while other studies had a small negative effect size for intention and a large positive effect size for behavior. When effect sizes were averaged across such studies, the mean intention effect size and the mean behavior effect size would look similar, even though in every individual study, the intention outcome would have misidentified which of the two messages was more effective in behavioral terms. To see whether attitude, intention, and behavioral outcomes are equivalent for purposes of pretesting messages, meta-analytic averages are not informative; examination of within-study comparisons, however, can provide relevant evidence.
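The masking problem can be made concrete with a small sketch. The effect sizes below are invented purely for illustration, and a simple unweighted mean stands in for a meta-analytic average (an assumption made to keep the example short).

```python
from statistics import mean

# Hypothetical (intention ES, behavior ES) pairs, expressed as r.
# These values are invented for illustration; they are not data from any study.
studies = [
    (0.30, -0.05),  # intention favors message A, behavior favors message B
    (-0.05, 0.30),  # intention favors message B, behavior favors message A
    (0.30, -0.05),
    (-0.05, 0.30),
]

# Averages across studies (unweighted, as a stand-in for a meta-analytic mean).
mean_intention = mean(i for i, _ in studies)
mean_behavior = mean(b for _, b in studies)

# Within-study check: in how many studies do the two ESs point in opposite directions?
mismatches = sum(1 for i, b in studies if i * b < 0)

print(mean_intention, mean_behavior)
print(mismatches, "of", len(studies), "studies mismatch in direction")
```

The two averages coincide, yet the within-study tally shows that the intention outcome points the wrong way in every study, which is why only within-study comparisons bear on the pretesting question.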
Summary. In short, the use of non-behavioral measures in message pretesting is not justified, and cannot possibly be justified, either by positive correlations between individuals' scores on behavioral and non-behavioral measures or by the similarity of meta-analytic mean effect sizes across behavioral and non-behavioral outcomes. Other evidence is needed.

The present project
Expressed narrowly, the present project is aimed at providing evidence bearing on the suggestion that non-behavioral persuasive outcomes (attitude and intention outcomes specifically) provide a sound basis for diagnosing differences between messages in behavioral outcomes. As just intimated, the evidence to be examined consists of within-study comparisons of effect sizes for behavioral and non-behavioral outcomes. If, within individual studies, messages' relative standing on non-behavioral outcomes is usually indicative of those messages' relative standing on behavioral outcomes, then formative researchers would have a straightforward message pretesting protocol: to see which message will be more persuasive with respect to behavioral outcomes, see which message is more persuasive with respect to non-behavioral (attitude or intention) outcomes.
Expressed more broadly, the question addressed here is whether messages' relative standing with respect to one persuasive outcome is indicative of their relative standing with respect to other outcomes. Hence, in addition to examining the relationship of non-behavioral (attitude and intention) effect sizes to behavioral effect sizes (the comparisons of interest for message pretesting questions), the relationship between attitude effect sizes and intention effect sizes is also considered.

Method
As an overview: Existing meta-analyses concerning the persuasive effects of message variations were used to identify primary research studies that had effect sizes (ESs) for at least two outcomes of interest (attitude, intention, and behavior). Each such study yielded within-study comparisons of the effect sizes associated with the different outcomes. Comparisons were analyzed so as to assess whether messages' relative standing on one outcome was indicative of messages' relative standing on other outcomes.

Locating relevant studies
The data of interest are comparisons between message-variation persuasion ESs for different outcomes within a given study. In principle, any individual study (of the effects of a persuasive message variation) that contains assessments of at least two of the outcomes of interest would provide relevant evidence. Retrieving all such studies is not practical. However, many such studies can be efficiently located through examination of existing relevant meta-analyses. Hence this report is based on culling relevant within-study comparisons from meta-analyses of the persuasive effects of message variations.
The primary-research studies of interest were studies that compared the persuasiveness of two message forms (e.g., gain-framed vs. loss-framed) by assessing at least two of the three outcomes of interest (attitude, intention, behavior). Thus to be included, a meta-analysis had to have analyzed such studies. Meta-analyses of potential interest were identified by searches of the Web of Science, Medline, PsycINFO, PsycEXTRA, and ProQuest Dissertations and Theses databases, combining meta-analysis with such terms as persuasion, message, and attitude, through July 2020. Additional candidates were located through examination of reviews of relevant meta-analyses (e.g., Eisend & Tarrahi, 2016; Rains, Levine, & Weber, 2018).
For most message variables, only one appropriate meta-analysis was available. In cases where more than one meta-analysis was available, the one with the largest number of contributing studies was included. If more than one meta-analysis of the same variable were to have been included, then individual studies would have been counted multiple times because the same study's results would have been included in multiple meta-analyses.
Data were obtained for a total of 22 message variables: gain-loss framing (data from the meta-analysis of O'Keefe & Jensen, 2006;65

Extracting relevant comparisons
Each included meta-analysis yielded a set of effect sizes and associated sample sizes. The ESs were not adjusted, deleted, recomputed, or otherwise altered, except for converting all ESs to correlations (rs) for analysis.3 The unit of analysis was the comparison, within a given study, between the ES for one outcome and the ES for another outcome. A total of 317 studies reported results for at least two of the three outcomes of interest (attitude, intention, behavior) and so provided a basis for comparing effect sizes for different outcomes. Most studies (275, 87%) reported results for only two such outcomes and hence contributed only one comparison; 42 studies reported results for all three outcomes and hence contributed three comparisons. Thus a total of 401 comparisons were available.

Analyzing the comparisons
For each such comparison, several properties were of interest. One was whether the two effect sizes had the same direction of effect, that is, whether the message that appeared more persuasive on one outcome was also the message that appeared more persuasive on the other outcome; comparisons were coded as having the same direction of effect if the signs of the two ESs were the same or if either of the ESs was zero. A second was the size of the difference between the two ESs, described by Cohen's q (the difference between z-transformed rs).4 A third was whether the two ESs were statistically significantly different, that is, whether q was statistically significant (two-tailed test, .05 alpha).
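These three properties can be sketched in a few lines of Python. The direction rule and the definition of q follow the description above. The significance test, however, is an assumption: it uses the standard error for two independent correlations, which may not be the procedure actually applied here, particularly because two ESs from the same study may be statistically dependent. The sample sizes in the usage lines are hypothetical.

```python
from math import atanh, erfc, sqrt

def same_direction(r1, r2):
    """Same sign, or either ES exactly zero, counts as the same direction."""
    return r1 == 0 or r2 == 0 or (r1 > 0) == (r2 > 0)

def cohens_q(r1, r2):
    """Cohen's q: the difference between Fisher z-transformed correlations."""
    return atanh(r1) - atanh(r2)

def q_test(r1, n1, r2, n2, alpha=0.05):
    """Two-tailed test of q.

    ASSUMPTION: treats the two correlations as independent, with
    SE = sqrt(1/(n1 - 3) + 1/(n2 - 3)); this is an illustrative choice only.
    """
    q = cohens_q(r1, r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = q / se
    p = erfc(abs(z) / sqrt(2))  # two-tailed p-value from the standard normal
    return q, z, p, p < alpha

# Hypothetical comparison: ESs of .20 and .10, each based on n = 150.
q, z, p, significant = q_test(0.20, 150, 0.10, 150)
print(round(q, 3), same_direction(0.20, 0.10), significant)
```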
No one of these properties is necessarily informative with respect to assessment of message pretesting procedures. To concretize this point, consider the comparison between an intention ES and a behavior ES. Just because two such ESs are statistically significantly different in a given study does not necessarily suggest a weakness in using intention measures for message pretesting. Suppose that the intention ES (expressed as r) was +.15, the behavior ES was +.25, and these were significantly different. A pretest using intention outcomes would nevertheless have correctly identified the message that would be more persuasive with behavioral outcomes.
Similarly, just because two such ESs have different directions of effect does not necessarily suggest a weakness in using intention measures for message pretesting. Suppose that the intention ES was +.01, the behavior ES was −.01, and these were not significantly different. The two ESs have different signs, but plainly the two messages did not differ much in persuasiveness and so relying on an intention assessment in pretesting would likely not make for a dramatically bad message choice.
Hence, in addition to examining those three properties individually, the number of significant disjunctures between ESs was also tallied. A comparison was coded as a significant disjuncture under three conditions: (1) for the comparison of any two ESs, if the two ESs had different signs and were significantly different; (2) for the comparison of an attitude ES and an intention ES, if one of the ESs was zero, the other ES was non-zero, and the ESs were significantly different; (3) for the comparison of a non-behavior (i.e., attitude or intention) ES and a behavior ES, if the non-behavior ES was zero, the behavior ES was non-zero, and the ESs were significantly different. Each of these three conditions represents a circumstance in which relying on the relative standing of messages on one outcome as a guide to relative standing on the other outcome would be a consequential error.
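The three coding conditions can be written out as a small decision function. This is a sketch of the rule as stated, not the coding procedure actually used; the function name, argument order, and outcome labels are hypothetical, and the significance result is passed in as a boolean so that no particular test is presumed.

```python
NON_BEHAVIOR = {"attitude", "intention"}

def is_significant_disjuncture(es_a, outcome_a, es_b, outcome_b,
                               significantly_different):
    """Code one within-study comparison using the three stated conditions."""
    # All three conditions require a significant difference between the ESs.
    if not significantly_different:
        return False
    # Condition 1: any two ESs with opposite signs.
    if es_a * es_b < 0:
        return True
    outcomes = {outcome_a, outcome_b}
    # Condition 2: attitude vs. intention, exactly one ES is zero.
    if outcomes == NON_BEHAVIOR:
        return (es_a == 0) != (es_b == 0)
    # Condition 3: non-behavior vs. behavior, with the non-behavior ES zero
    # and the behavior ES non-zero (the reverse pattern is not counted).
    if "behavior" in outcomes and len(outcomes) == 2:
        if outcome_a == "behavior":
            behavior_es, other_es = es_a, es_b
        else:
            behavior_es, other_es = es_b, es_a
        return other_es == 0 and behavior_es != 0
    return False

# The two illustrative cases from the text: neither is a significant disjuncture.
print(is_significant_disjuncture(0.15, "intention", 0.25, "behavior", True))
print(is_significant_disjuncture(0.01, "intention", -0.01, "behavior", False))
```

Note the asymmetry in condition 3: a zero pretest ES paired with a non-zero behavior ES is counted, since the pretest would have missed a real behavioral difference, whereas the reverse pattern is not.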
One additional useful way of describing the relationships of interest is provided by rank-order correlations. For a comparison in an individual study, the rank-order correlation (or, for that matter, the Pearson correlation) is +1.00 if the two messages' relative standing is identical on the two outcomes, −1.00 if the messages' relative standing is reversed, and .00 if either of the two ESs is zero (i.e., if the two messages are tied on an outcome).5 Thus across studies, a random-effects meta-analytic mean correlation (Borenstein & Rothstein, 2005) can provide an indication of the degree to which standing on one outcome is diagnostic of standing on the other.6
For analytic purposes data were combined across message variables. The present interest is not with relationships between different outcome measures for this or that message variable in particular, but rather with whether in general these measures are equivalent indicators. And for any given variable, sometimes relatively few studies obtained data permitting comparison of effect sizes for different outcomes. For example, of the 50 studies in Lau et al.'s (2007) meta-analysis of negative political advertising, only three studies reported data on two outcomes of interest. Separate analysis of those cases would be uninformative.

Results
The mean difference between the ESs, expressed as Cohen's q, was .001; the mean of the absolute values of q was .120. The proportion of comparisons in which the two ESs had different directions of effect was .159 for attitude-intention comparisons, .183 for attitude-behavior comparisons, and .064 for intention-behavior comparisons. The attitude-intention proportion was not significantly different (two-tailed test, .05 alpha) from the attitude-behavior proportion (z = .44, p = .660). The intention-behavior proportion was significantly smaller than both the attitude-intention proportion (z = 2.45, p = .014) and the attitude-behavior proportion (z = 2.40, p = .016).
The proportion of comparisons in which the two ESs differed in direction and were statistically significantly different from each other was .039 for attitude-intention comparisons, .100 for attitude-behavior comparisons, and .009 for intention-behavior comparisons. The attitude-intention proportion was not significantly different (two-tailed test, .05 alpha) from either the attitude-behavior proportion (z = 1.91, p = .056) or the intention-behavior proportion (z = 1.51, p = .131). The intention-behavior proportion was significantly smaller than the attitude-behavior proportion (z = 2.84, p = .005).
The proportion of comparisons in which there was a significant disjuncture between the two ESs was .056 for attitude-intention comparisons, .100 for attitude-behavior comparisons, and .037 for intention-behavior comparisons. No two of these proportions were significantly different (two-tailed test, .05 alpha). The attitude-intention proportion was not significantly different from either the attitude-behavior proportion (z = 1.23, p = .219) or the intention-behavior proportion (z = .77, p = .441); the intention-behavior proportion was not significantly different from the attitude-behavior proportion (z = 1.67, p = .095).
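Proportion comparisons of this kind can be illustrated with a pooled two-proportion z-test. Two assumptions should be flagged. First, the text does not say which test was used; the pooled test is simply a common choice. Second, the counts below are reconstructed, not reported: they are inferred from the reported different-direction proportions (.159, .183, .064) together with the 401 total comparisons and the 60 attitude-behavior and 109 intention-behavior comparisons mentioned later in the article, which leaves 232 attitude-intention comparisons.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-tailed p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))

# ASSUMED counts (different-direction comparisons, total comparisons),
# reconstructed from the reported proportions; not taken from the article.
different_direction = {
    "attitude-intention": (37, 232),
    "attitude-behavior": (11, 60),
    "intention-behavior": (7, 109),
}

z_ab_ai, p_ab_ai = two_proportion_z(*different_direction["attitude-behavior"],
                                    *different_direction["attitude-intention"])
z_ai_ib, p_ai_ib = two_proportion_z(*different_direction["attitude-intention"],
                                    *different_direction["intention-behavior"])
z_ab_ib, p_ab_ib = two_proportion_z(*different_direction["attitude-behavior"],
                                    *different_direction["intention-behavior"])

print(round(z_ab_ai, 2), round(z_ai_ib, 2), round(z_ab_ib, 2))
```

Under these assumptions the three z values land close to the reported .44, 2.45, and 2.40, which suggests (but does not establish) that a test of this form underlies the reported comparisons.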

Similarity of effect size direction
The primary purpose of the present report is to provide evidence bearing on the suggestion that non-behavioral persuasive outcomes (attitude and intention outcomes specifically) can be a sound basis for diagnosing differences between messages in behavioral outcomes. These data indicate that these non-behavioral measures can indeed be effective ways of pretesting the relative behavioral impact of messages. When two messages have been compared for their persuasiveness in influencing both behavioral outcomes and these non-behavioral outcomes, the relative standing of the two messages does not vary much between the two kinds of outcome.
Specifically, the direction of difference between messages on attitude outcomes commonly matches the direction of difference on behavioral outcomes (82% of comparisons). Only rarely do attitude and behavioral assessments differ significantly and have different directions of effect (10% of comparisons) or otherwise exhibit significant disjunctures (10% of comparisons). The meta-analytic mean rank-order correlation was .96.
Similarly, the direction of difference between messages on intention outcomes quite commonly matches the direction of difference on behavioral outcomes (94% of comparisons). Only exceptionally rarely do intention and behavioral assessments differ significantly and have different directions of effect (1% of comparisons) or otherwise exhibit significant disjunctures (4% of comparisons). The meta-analytic mean rank-order correlation was .99.
The straightforward implication is this: The message choices that would have been made in formative research using attitude or intention outcomes are generally the same as those that would have been made if behavioral outcomes had been examined. That is, the message that appears more effective when examining non-behavioral outcomes is generally the same as the message that is more effective when examining behavioral outcomes.

Similarity of effect size magnitudes
In addition to seeing whether the direction of ESs (the direction of difference in effectiveness between messages) is similar in behavioral and non-behavioral outcomes, the present data can also shed light on the degree to which the magnitude of ESs (the size of the difference in effectiveness between messages) is similar in behavioral and non-behavioral outcomes. Specifically, the present data can address two questions concerning ES magnitudes.
One question is whether the ES for one kind of outcome is generally larger or smaller than another. These data indicate that there are no systematic differences in ES magnitudes between behavioral and non-behavioral ESs; it's not the case that (say) behavioral ESs are generally smaller than non-behavioral ESs. On the contrary: the mean value of q was −.01 for attitude-behavior ESs and was .01 for intention-behavior ESs. That is, averaged across comparisons, the difference between attitude and behavior ESs or between intention and behavior ESs is functionally zero.
A second question concerns the degree to which behavioral-outcome ESs and non-behavioral-outcome ESs differ in magnitude in individual applications. The mean absolute value of q was .13 for comparisons between attitude and behavior ESs and was .11 for comparisons between intention and behavior ESs; these represent an average difference of roughly .12 between two correlation coefficients of the magnitudes seen for persuasion message-variation effects.8 The implication is that the magnitude of a non-behavioral ES might provide at least a general guide to the expected magnitude of the behavioral ES. For example, if in pretesting using intention outcomes an ES (r) of .20 is observed, one might plausibly expect a behavior ES roughly between .08 and .32 (i.e., .20 ± .12).
In short, behavioral and non-behavioral ESs characteristically do not differ dramatically, and behavioral ESs are neither generally larger nor generally smaller than non-behavioral ESs.

Comparing attitude and intention as pretest measures
As between attitude and intention measures as possible ways of pretesting relative message persuasiveness, these data give some reason to favor intention measures. The direction of behavioral-outcome effects (the direction of difference between messages) was significantly more likely to match the direction of intention outcomes (94% of comparisons matched) than the direction of attitude outcomes (82% of comparisons matched). And the proportion of comparisons in which the direction of effect did not match and the two ESs were statistically significantly different was significantly smaller for intention outcomes (1%) than for attitude outcomes (10%). There was no significant difference in the proportion of significant disjunctures observed with intention and attitude measures or in the meta-analytic mean rank-order correlations, though in each case the direction of difference also favored intention measures.

Moderating factors?
Identifying factors that moderate the diagnosticity of non-behavioral measures as message pretesting devices could be useful, because then better guidance could be given to formative researchers about the use of such message pretests. However, it should immediately be acknowledged that these data show considerable consistency across studies: it is exceptionally common that messages' relative standing on non-behavioral measures matches their relative standing on behavioral measures. There might well be moderators of the diagnosticity of non-behavioral measures as message pretesting devices, but these will be difficult to ferret out.
In looking for variables that might be moderators here, one might naturally initially be attracted to factors that moderate the correlation between individuals' intention and behavior scores. But that is a misleading path. In message pretesting one seeks to predict not individuals' scores on behavioral outcomes, but messages' scores; it's the relative standing of messages on the behavioral outcome that's of interest, not the relative standing of individuals. As discussed above, just because individuals' scores on intention and behavior are strongly positively correlated does not necessarily imply that messages' means will covary similarly. And because these are different relationships (the relationship between individuals' intention and behavior scores on the one hand, and the relationship between messages' mean scores on intention and behavior on the other), a variable that moderates one relationship will not necessarily moderate the other relationship.
So, for example: As the time interval between the (pretest) intention assessment and the behavioral assessment increases, the correlation between individuals' intention and behavior scores will likely weaken (see, e.g., Sheeran & Orbell, 1998, pp. 234-235). Some people with initially positive intentions come, over time, to have negative intentions and hence at a delayed assessment have negative behavior scores; and some people who had negative intentions initially come, over time, to have positive intentions and hence at a delayed assessment have positive behavior scores. This reduces the predictability of individuals' relative standing on the behavioral outcome.
But in a message-pretesting setting, such changes over time will affect all participants, regardless of which message they saw in the pretest. Thus there is no reason to suppose that messages' means will be differentially affected by these processes, that is, no reason to suppose that messages' relative standing will be affected. With longer time intervals individuals' relative standing can be affected and hence the correlation between individuals' scores may weaken, but the relative standing of messages can be unaffected. The larger point here is: Just because a given variable moderates the relationship of individuals' intention and behavior scores does not necessarily imply that it will moderate the relationship of messages' means on intention and behavior. Identifying factors that moderate that latter relationship will require new avenues of approach.

Summary: non-behavioral measures as pretesting methods
These data clearly point to the usefulness of non-behavioral outcomes in persuasive message pretesting. Messages' relative standing on attitude and (especially) intention outcomes closely matches their relative standing on behavioral outcomes. Notably, both attitude assessments and intention assessments appear to be much better indicators of messages' relative standing on behavioral outcomes (meta-analytic mean rank-order correlations of .96 and .99, respectively, across 60 and 109 studies) compared to perceived-persuasiveness (PME) assessments as indicators of messages' relative standing on measures of actual effectiveness (AME); the mean PME-AME rank-order correlation was reported as −.05 across 35 studies (O'Keefe, 2018, p. 133).
At present, when a study reports differences between messages in persuasiveness based on non-behavioral outcomes, it is common for researchers to offer a disclaimer that their results do not speak to differences in behavioral effects.9 For example, using intention outcome measures, Ferrer, Klein, Zajac, Land, & Ling (2012, p. 459) found that affective boosters enhanced the persuasiveness of gain-framed appeals, but suggested that "research is necessary to determine whether affective boosters increase the persuasiveness of gain-framed messages for actual screening behavior." But given the present results, such disclaimers are unnecessary, indeed inappropriate. If two messages differ in persuasiveness with respect to attitude or intention outcomes, they are very likely to differ similarly with respect to behavioral outcomes. Thus these results suggest that non-behavioral outcome measures, especially intention measures, provide formative researchers with a convenient and accurate tool for identifying relatively more effective persuasive messages.
However, this conclusion needs to be tempered in several ways.First, these nonbehavioral measures are unlikely to be well-adapted to pretesting situations in which a very large number of messages are to be pretested and so a design is contemplated in which each pretest participant responds to multiple candidate messages (see Cappella & Kim, 2017).It is not clear that asking the same intention question after each message will yield results that could confidently be relied upon.It remains to be seen whether any pretesting method can be diagnostic in such a pretesting circumstance.
Second, if the messages being pretested differ only slightly on the non-behavioral pretest measure, the range of plausible differences in ES magnitudes between behavioral and non-behavioral outcomes suggests some caution. For example, if a non-behavioral pretest favors message A over message B but the ES corresponds to an r of .08, formative researchers should be prepared to see behavioral ESs in the range of −.04 to .20 (i.e., .08 ± .12), that is, a range that includes outcomes in which message B is superior in behavioral assessments. Such divergences appear to be rare (see Table 1), but formative researchers should be alert to the possibility.
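The arithmetic behind that range can be sketched as follows. This is only an illustration, not part of the original analysis: the function name `plausible_behavioral_range` is hypothetical, the default tolerance of .12 is the rough figure discussed in this article, and the shift is applied on the Fisher z scale on the assumption that the tolerance is best treated as a Cohen's q (a difference between z-transformed rs).

```python
import math

def plausible_behavioral_range(pretest_r, q_tolerance=0.12):
    """Return (low, high) behavioral rs within q_tolerance of pretest_r.

    Assumption: the tolerance is a Cohen's q, so it is added to and
    subtracted from the Fisher z-transformed r before converting back.
    """
    z = math.atanh(pretest_r)
    return math.tanh(z - q_tolerance), math.tanh(z + q_tolerance)

low, high = plausible_behavioral_range(0.08)
print(f"{low:.2f} to {high:.2f}")
```

For small correlations such as these, working on the z scale and simply adding or subtracting .12 from r give nearly the same range; the two approaches diverge as r grows.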
Third, as with pretesting methods generally, if the candidate messages do not differ much in effectiveness, large pretest samples will be needed to reliably detect such differences. For example, to have 80% power (two-tailed test, .05 alpha) for detecting a population effect corresponding to r = .10, nearly 800 pretest participants will be required.
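A rough check of that sample-size figure is sketched below. The article does not say which power procedure was used, so the Fisher z approximation here is an assumption, and the function name `n_for_correlation` is hypothetical. Under that approximation the required N for r = .10 comes out a little under 800, in line with the figure given above.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r.

    Assumption: two-tailed test using the Fisher z approximation,
    N = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

print(n_for_correlation(0.10))
```

Because the required N scales roughly with 1/r², halving the expected effect roughly quadruples the pretest sample needed.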
One final general caveat arises from the present analysis' acceptance of the underlying meta-analytic data as given. Those underlying data almost certainly contain error. In the present project, the decisions of the primary researchers (about the realization of experimental contrasts, outcome measurement, etc.) and the decisions of the meta-analysts (about inclusion criteria, how to compute effect sizes, etc.) are all embedded in the data set under analysis. And so all their errors, oversights, poor decisions, mistakes, etc., are inevitably part of these data.
However, there seems no reason to expect systematic error, that is, error that would bias the present results in some direction. And perhaps the diversity of the evidentiary base offers some reassurance on this point. Many different message variations, many different advocacy topics, many different kinds of assessments, many different primary researchers, many different meta-analysts are represented in these data. There surely is error in these data, but it seems likely to be haphazard.

Overall effects
The larger question addressed by these data is whether messages' relative standing with respect to one persuasive outcome is indicative of their relative standing with respect to other outcomes. As will be apparent, in general, attitude, intention, and behavior ESs do not systematically differ in the direction of effect. If message A is more persuasive than message B on any one of these outcomes, it is likely to be more persuasive on the other two outcomes as well.
Moreover, in general, attitude, intention, and behavior ESs do not systematically differ in magnitude. As discussed above, the mean differences (expressed as Cohen's q) between attitude and behavior ESs (mean q = −.01) and between intention and behavior ESs (mean q = .01) were essentially zero. A similar result obtains for the mean difference between attitude and intention ESs (mean q = .00). The mean absolute values of q for those three comparisons were, respectively, .13, .11, and .12. So one might expect that the ES obtained for one of these three outcomes (expressed as r) might vary by roughly ±.12 from either of the other two.
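To make the scale of a q of about .12 concrete, Cohen's q can be computed directly as the difference between two Fisher z-transformed correlations. The sketch below is illustrative only: the function name `cohens_q` is hypothetical, and the pairs of rs are the ones the article's notes give as examples of a q of .12.

```python
import math

def cohens_q(r1, r2):
    """Cohen's q: difference between two Fisher z-transformed correlations."""
    return math.atanh(r1) - math.atanh(r2)

# Pairs of rs that the article's notes describe as differing by q = .12.
pairs = [(0.17, 0.05), (0.20, 0.08), (0.31, 0.20), (0.45, 0.35), (0.67, 0.60)]
for larger, smaller in pairs:
    q = cohens_q(larger, smaller)
    print(f"r = {smaller:.2f} vs r = {larger:.2f}: q = {q:.2f}")
```

The same q corresponds to a smaller raw difference between rs as the correlations get larger, which is why "roughly ±.12" in r terms is most accurate for the small effects typical of message comparisons.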
These results thus extend O'Keefe's (2013) conclusion that "the relative persuasiveness of message types will be substantively identical if compared using attitudinal, intention, or behavioral outcomes" (p. 244, emphasis added): the present results indicate that the relative persuasiveness of individual messages will also be substantively identical if compared using attitudinal, intention, or behavioral outcomes.
Relative vs. absolute persuasiveness
So: Where claims about relative message persuasiveness are concerned, attitudinal, intention, and behavioral outcomes are interchangeable; conclusions about relative persuasiveness will generally be the same no matter which of these outcomes is examined. This result has been framed here by focusing on message pretesting, because questions of relative message persuasiveness naturally arise in that enterprise.
But claims about relative message persuasiveness can appear in contexts other than formative research. In particular, such claims are common in persuasion research aimed at testing theory-based hypotheses, because those hypotheses characteristically involve claims about the relative persuasiveness of different message forms.
Some such hypotheses predict simple (main-effect) differences in persuasiveness between two message forms. For example, Meyerowitz and Chaiken (1987) used phenomena such as negativity bias and loss aversion to underwrite the hypothesis that loss-framed appeals will generally be more persuasive than gain-framed appeals. The hypothesis was not that loss-framed appeals will generally be highly persuasive in any absolute sense, only that such appeals will generally be more persuasive than gain-framed appeals.
Other hypotheses predict that the direction of difference in persuasiveness between two message forms will vary depending on some moderating condition. For example, Rothman and Salovey (1997) predicted that where the advocacy subject is disease detection behaviors, loss-framed appeals will generally be more persuasive than gain-framed appeals, but where the advocacy subject is disease prevention behaviors, gain-framed appeals will generally be more persuasive than loss-framed appeals. The hypothesis concerned not the absolute persuasiveness of either message kind on a given subject, but rather the relative persuasiveness of the two kinds.
Still other hypotheses predict that the size (but not necessarily the direction) of difference in persuasiveness between two message forms will vary depending on some moderating condition. For example, the elaboration likelihood model predicts that the size of the difference in persuasiveness between strong-argument messages and weak-argument messages will be affected by the audience's level of involvement: as involvement increases, the size of the difference in persuasiveness between the two message types is predicted to increase (Petty & Cacioppo, 1986, p. 83). The hypothesis did not speak to the absolute persuasiveness of (say) strong-argument messages at any given level of involvement, but rather to the question of how the relative persuasiveness of strong-argument and weak-argument messages would be affected by involvement.
These various theoretically motivated hypotheses thus concern not the absolute persuasiveness of one message type, but rather the relative persuasiveness of two message types, and hence non-behavioral outcomes can appropriately be used in tests of such hypotheses. As the present data indicate, for assessing claims about relative message persuasiveness, data about non-behavioral outcomes can straightforwardly substitute for data about behavioral outcomes, because the relative persuasiveness of individual messages (or message kinds) is likely to be identical across these outcomes.
But for assessing claims or hypotheses about absolute message persuasiveness (that is, the absolute persuasiveness of a given message or message kind), behavioral outcome assessments are likely to be essential, because one cannot assume that non-behavioral-outcome data and behavioral-outcome data will yield identical conclusions where such claims are concerned.⁹ For example, imagine assessing the effectiveness of a given advertising campaign using post-campaign assessments of intention and behavior. The results for the two outcomes could be quite different: "90% of respondents reported positive intentions but only 20% performed the behavior," say.¹⁰
Behavioral-outcome data may be desirable for other purposes as well. Consider, for example, causal claims embedded in frameworks such as reasoned action theory (Fishbein & Ajzen, 2010). The claims of interest here are ones suggesting that (e.g.) attitudes influence intentions, which in turn influence behaviors. For assessing such claims, one useful form of evidence would come from experimental studies with longitudinal data about the effects of interventions (e.g., different persuasive messages) on attitudinal, intention, and behavioral outcomes (see Weinstein, 2007). Such data could provide evidence about both relative and absolute persuasiveness with different outcomes at different points in time.
In short, even though behavioral-outcome data might not be needed for assessing claims about relative message persuasiveness, such data might be essential for other purposes. Researchers should not thoughtlessly abandon behavioral assessments of persuasive outcomes.

Conclusion
In formative research, one would like to be able to compare possible messages to see which will be relatively more effective in influencing behavioral outcomes, but pretesting using behavioral measures is often not practical. Examination of within-study effect size comparisons finds that messages' relative standing is quite consistent across attitude, intention, and behavioral outcomes, and specifically indicates that messages' relative standing on intention outcomes is indicative of their relative standing on behavioral outcomes. The practical implication is that in formative research, intention outcomes can confidently be used in persuasive message pretesting as a convenient means of identifying those messages that will be relatively more persuasive with respect to behavioral outcomes.

Supporting Information
Additional Supporting Information may be found in the online version of this article. Please note: Oxford University Press is not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author.
the value for attitude minus the value for behavior. For the comparison of intention and behavior ESs, q was the value for intention minus the value for behavior. So, using the last as an example: positive values of q represent cases in which the intention ES was larger than the behavior ES; negative values of q represent cases in which the behavior ES was larger than the intention ES.
5. When there are only two cases, the Pearson r and the rank-order r are numerically identical, with only three possible values (+1.00, −1.00, .00). So these correlations might justifiably be described as Pearson correlations, but describing them as rank-order correlations is meant to make plain the connection to the usual message-pretesting protocol: choose whichever message ranks higher on the pretest assessment.
6. The Ns used for these meta-analytic computations were the attitude N (for attitude-intention and attitude-behavior comparisons) and the intention N (for intention-behavior comparisons).
7. The CIs for reported proportions are 95% adjusted Wald (Agresti-Coull) CIs.
8. Cohen's q is the difference between two z-transformed rs (not between two rs). A q of .12 represents the difference between rs of .05 and .17, or .08 and .20, or .20 and .31, or .35 and .45, or .60 and .67.
9. Researchers have sometimes sought to justify such use of non-behavioral outcomes by pointing to positive correlations between individuals' scores on non-behavioral and behavioral measures (e.g., Cho & Choi, 2010, p. 310; Panozzo, Head, Kornides, Feemster, & Zimet, 2020, p. 260). As discussed above, such reasoning is defective. Even if individuals' scores on non-behavioral measures and behavioral measures are positively correlated, that does not show or imply that the relative standing of messages' means on non-behavioral measures will match the relative standing of messages' means on behavioral measures.
10. And just to make the contrast with relative persuasiveness explicit: Imagine that with intervention A 90% of respondents reported positive intentions and 20% performed the behavior, whereas with intervention B 80% of respondents reported positive intentions and 10% performed the behavior. The relative effectiveness of the two interventions is identical across the two kinds of outcome (with intervention A outperforming intervention B), but the absolute effectiveness of a given intervention differs dramatically depending on which outcome is examined.
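The adjusted Wald (Agresti-Coull) interval mentioned in the notes can be sketched as below. This is a minimal illustration using the standard Agresti-Coull formula, not the author's own code; the function name `agresti_coull_ci` is hypothetical, and the counts (346 and 360 out of 401) are the ones reported in the text accompanying Table 1.

```python
import math
from statistics import NormalDist

def agresti_coull_ci(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2                        # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] so extreme proportions cannot yield impossible bounds.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

for count in (346, 360):
    low, high = agresti_coull_ci(count, 401)
    print(f"{count}/401 = {count / 401:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The interval is centered on the adjusted proportion rather than the raw one, which is why the reported bounds are not symmetric around the observed proportions.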

Table 1 provides results for each of the three kinds of comparisons of interest: between attitude ESs and intention ESs, between attitude ESs and behavior ESs, and between intention ESs and behavior ESs.⁷ Collapsed across the different kinds, a total of 401 ES comparisons were available. Of those, 346 (.863; 95% CI [.826, .893]) had the same direction of effect; 360 (.898; 95% CI [.864, .924]) were not statistically significantly
a q (Cohen's q) is the difference between two z-transformed rs.