Emotion and Reason in Political Language

This paper studies the use of emotion and reason in political discourse. Adopting computational-linguistics techniques to construct a validated text-based scale, we measure emotionality in 6 million speeches given in U.S. Congress over the years 1858-2014. Intuitively, emotionality spikes during times of war and is highest in speeches about patriotism. In the time series, emotionality was relatively low and stable in earlier years but increased significantly starting in the late 1970s. Across Congress Members, emotionality is higher for Democrats, for women, for ethnic/religious minorities, for the opposition party, and for members with ideologically extreme roll-call voting records.

"An emotional speaker always makes his audience feel with him, even when there is nothing in his arguments; which is why many speakers try to overwhelm their audience by mere noise." -Aristotle

"In politics, when reason and emotion collide, emotion invariably wins."

Introduction
In his treatise on Rhetoric, Aristotle suggests that persuasion can be achieved through either logical argumentation or emotional arousal in the audience; success depends on selecting the most appropriate strategy for the given context. Building on these early ideas, the classic dichotomy between emotions and affect (pathos) on the one side and rationality and cognition (logos) on the other has informed all realms of the social sciences, from social psychology (LeDoux, 1998), to political philosophy (Elster, 1999), to economics (Frank, 1988). In the day-to-day of political debate, politicians resort to a mix of emotion and reason and search for the right balance between these two elements.
The extent to which politicians engage with this trade-off, and what institutional, political, and psychological factors underlie their choices, is largely unknown. Providing empirical evidence on these questions has been difficult due to the lack of a reproducible, validated, and scalable measure of emotionality in political language. In this paper, we propose a measure that satisfies these requirements, and we extensively validate it against human judgement. Our approach builds on recently developed computational linguistics tools, which represent semantic dimensions in language as geometric dimensions in a vector space. The algorithm for this purpose, word embedding, transforms words and phrases into vectors, where similar words tend to co-locate and directions in the space (dimensions) correspond to semantically meaningful concepts (Collobert and Weston, 2008; Mikolov et al., 2013; Pennington et al., 2014). Our goal is to construct a dimension in this space corresponding to reason at one pole and emotion at the other. To this end, we take validated word lists for emotion and reason and construct the poles as the average vectors for these semantically coherent word groups. The relative emotionality of a word is its proximity to the emotion pole, relative to the reason pole. In turn, the emotionality of a document is the relative proximity of the document vector to the emotion pole. We compute scores for 6 million floor speeches reported in the U.S. Congressional Record for the years 1858 through 2014.
Our measure of emotionality in political language convincingly survives a rigorous sequence of validation steps, consistent with rising standards in empirical work using text data (Quinn et al., 2010;Grimmer and Stewart, 2013;Goet, 2019;Rodman, 2020;Rodriguez and Spirling, Forthcoming;Osnabrügge et al., 2021a). First, we qualitatively inspect the words and sentences that are most associated with the ends of the emotion-rationality spectrum.
The inspected examples are intuitive and satisfying. Second, we undertake a substantial human validation effort and ask human annotators to assess the relative emotionality of thousands of ranked sentence pairs. The ranking provided by our preferred measure agrees with human judgment over 90% of the time, an accuracy superior to that obtained by applying commonly used dictionary-based methods that count relevant terms.
Given the lengthy historic time period covered by our corpus, we also seek to validate comparisons over time. A targeted human validation shows that the emotionality score is historically valid for the whole time period of the Congressional Record back to the 1850s. In addition, we show that our measure of emotionality in politics is distinct from the political topics chosen (Quinn et al., 2010), from positive and negative sentiment (e.g. Rheault et al., 2016), from changes in the sophistication of political language (e.g. Benoit et al., 2019), and from emotionality trends in the broader society (e.g. Morin and Acerbi, 2017). These checks allow us to confidently make comparisons over time and to attribute the observed results to dynamics that are specific to emotion in political language.
In the empirical analysis section, we provide a rich description of how emotion and reason have varied over time, by topic, and across speakers in Congress. First, we look at long-run rhetorical history using our 150-year time series. Intuitively, emotional expression spikes in times of war. Further, we find a significant increase in emotionality since the late 1970s, coinciding with the introduction of televised Congressional floor debates via C-SPAN.
This descriptive evidence is consistent with C-SPAN motivating the use of more emotional rhetoric, in line with previous work on how television has reshaped politics (Gentzkow, 2006;DellaVigna and Kaplan, 2007;Martin and Yurukoglu, 2017;Durante et al., 2019).
Second, we compare emotionality across topics. We find, intuitively, that patriotism, foreign policy, and social issues are discussed with the most emotion, while procedure and federal organization are discussed with the least emotion. Within the realm of economic policy, issues related to taxation and redistribution have increased the most in emotionality in recent years (especially for Republicans), coincident with the post-Reagan increase in economic inequality. In light of recent debates on the ideological foundations of inequality (McCarty et al., 2016; Piketty, 2020), it is illuminating that Republicans use emotional rather than rational appeals when arguing about redistributive policies.
Third, we assess how emotionality varies across politicians' personal characteristics and institutional factors. Democrats, women, and racial/religious minorities tend to use more emotive language than Republican, male, white Protestant colleagues serving in the same chamber and year. Further, members of Congress use more emotional language when in the opposition (minority) party. Overall, emotion appears in situations of disempowerment, not just in terms of political party control, but also for disadvantaged identity groups. This result is consistent with explanations of emotional appeals from both behavioural economics and political economy: Emotion may help politicians deal with loss of control or frustration of expectations (e.g. MacLeod, 1996; Lin et al., 2006), or it may serve to push policy positions (Jerit et al., 2009) and complement minority representation strategies (see Swers, 2002; Tate, 2018).
Finally, we look at how emotionality is related to partisan polarization. We find that politicians with highly partisan roll-call-voting records (on either the left or the right) use more emotion than their more moderate colleagues. This higher emotional expressiveness is driven both by the speech topics chosen and by how the same topics are framed. Hence, trends in emotive rhetoric are linked to increasing polarization in U.S. politics (McCarty et al., 2016; Gentzkow et al., 2019b).
Overall, the paper provides both a methodological and a substantive contribution.
Methodologically, we push forward the use of text analysis in economics and political economy (Gentzkow et al., 2019a). The focus of most previous work has been partisan differences in language, taking a supervised learning approach (Gentzkow and Shapiro, 2010; Jensen et al., 2012; Ash et al., 2017; Gentzkow et al., 2019b). Baker et al. (2016) and Enke (2020) each use a dictionary approach, respectively to analyze policy uncertainty and moral norm priorities. Hansen et al. (2018) use a topic model to analyze central bank communications. Our new method, using word embeddings to scale emotionality, addresses the technical limitations of dictionary methods while still targeting a specific dimension of discourse.
Substantively, we add to the literature on rhetorical choices in political communication, and in particular the role of emotions in politics (e.g. Marcus, 2000;Lau and Rovner, 2009).
The previous literature has shown that political speech sentiment and emotional intensity respond to economic conditions (Rheault et al., 2016), ideological divisions (Kosmidis et al., 2019), institutional context (Osnabrügge et al., 2021b), and the characteristics of the speaker (Dietrich et al., 2019;Hargrave and Blumenau, 2020;Boussalis et al., 2021). Consistent with these findings, we find coherent descriptive evidence that emotional rhetoric corresponds to prevailing political opportunities and conditions, including party control, personal identity, and ideological polarization.
From a more historical perspective, a number of studies have attended to the long-run evolution of rhetoric in parliaments. The emerging theme is that of increasingly polarized language, accompanied by a general simplification. Upward trends in divisive language around party lines in U.S. Congress have been repeatedly replicated (Jensen et al., 2012; Gentzkow et al., 2019b; Rheault and Cochrane, 2020), with comparable trends also seen in U.K. Parliament (Peterson and Spirling, 2018; Goet, 2019). Meanwhile, the linguistic sophistication of political speeches has decreased over time (Lim, 2002; Benoit et al., 2019), and confidence among politicians has increased (Jordan et al., 2019). We add an important piece to this picture by observing that the secular trends in polarization, simplification, and confidence have been accompanied by more intense expression of emotion. All of these trends can be understood as a coherent shift toward a rhetoric that addresses voters rather than fellow politicians and elites.
More generally, this research adds to a long tradition on the dichotomy of emotion and reason in social theory and social science (Damasio, 1995;LeDoux, 1998;Elster, 1999). A classic view from economics is Frank (1988), who explores how various emotions support both self-interested and socially conscious decision-making. A subsequent line of work in behavioural economics has shown the role of emotions in supporting prosocial behaviour, for example through motivating costly punishment (MacLeod, 1996;Bosman and Van Winden, 2002;Xiao and Houser, 2005;Van Buskirk et al., 2012). Overall, emotions are complementary with rationality in supporting human decisions and communication (e.g. Elster, 1999;Loewenstein, 2000;Kahneman, 2011;Lerner et al., 2015;Wälde and Moors, 2017). Thus, it is not surprising to observe a pivotal role for emotions, along with reason, in political discourse.

Measuring Emotion and Reason in Text
This section outlines the approach to measuring dimensions of emotion and reason in unstructured text. After giving some details on the political speeches corpus, we describe the word lists for identifying emotion and cognition dimensions. Then, we introduce word embeddings and how they allow us to scale documents in this emotion-reason dimension.

Congressional Speeches Corpus
Our empirical corpus comprises digitized transcripts of the universe of speeches in the U.S. House and Senate between 1858 and 2014 (N = 7,336,112 speeches). The corpus includes all speeches from the U.S. Congressional Record, after removing those speeches that contain readings of pieces of legislation.
The corpus pre-processing can be summarized as follows (see Appendix F.1 for additional details). Each speech in the corpus is first segmented into sentences. To extract the most informative tokens, we tag parts of speech and take only nouns, adjectives, and verbs. Punctuation, capitalization, digits, and stopwords (including names for states, cities, months, politicians, and procedural words) are removed. Tokens are stemmed using the Snowball stemmer. After filtering out rare stems (those occurring in fewer than 10 speeches), we have 113,055 token types left in the vocabulary.
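A minimal sketch of this pipeline is below. It is illustrative only: the stoplist is a tiny hypothetical stand-in for the paper's full list, and the POS-tagging and Snowball-stemming steps (which require an NLP library) are noted in comments rather than implemented.

```python
import re

# Tiny illustrative stoplist; the paper's stoplist also includes names of
# states, cities, months, politicians, and procedural words.
STOPWORDS = {"the", "of", "to", "and", "a", "in", "that", "is", "mr"}

def preprocess(speech: str) -> list[str]:
    # Lowercase, strip punctuation/digits, drop stopwords and very short
    # tokens. (The paper additionally POS-tags and keeps only nouns,
    # adjectives, and verbs, then stems with the Snowball stemmer.)
    tokens = re.findall(r"[a-z]+", speech.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

print(preprocess("Mr. Speaker, I rise today in strong support of the bill."))
```

The surviving tokens are the content words that feed into the embedding model and the dictionary counts.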

Dictionaries for Emotion and Cognition
The first ingredient in our method is to use thematically categorized lists of words for emotion and reason. To build lists of emotive and cognitive words, we start with Linguistic Inquiry and Word Count (LIWC), a leading set of categorized dictionaries validated by linguistic psychologists (Pennebaker et al., 2015). LIWC researchers have collected coherent sets of words, word stems, and idiomatic expressions that map onto various structural, cognitive, and emotional components of language.
From LIWC we take two word lists. First, to get at reasoning we use the "Cognitive Processing" category, consisting of 799 words, phrases, and wildcard expressions. This category embraces concepts of insight, causation, discrepancy, certainty, inhibition, inclusion, and exclusion. Second, to get at emotion we use the "Affective Processing" category, comprising 1,445 tokens, phrases, and wildcard expressions. This category refers to emotions, moods, and other affective states -both positive (joy, gratitude) and negative (anxiety, anger, sadness).
We reviewed the raw LIWC dictionaries and adapted them to analysis of Congressional speeches (see Appendix F.2 for details). We excluded a number of inappropriate patterns (e.g. emojis, punctuation, digits, multi-word expressions), and a number of words that do not translate well to the Congressional Record (e.g., "admir*" matching to "admiral"). At the end of the process, we have a list of stemmed nouns, verbs, and adjectives representing affective processing (629 tokens) and cognitive processing (169 tokens). Let A and C represent these word lists. Appendix F.5 provides the two final dictionaries and the frequency of each dictionary word in the corpus.
The word lists for emotion and cognition can already be used to produce a dictionary-based measure of emotionality. This type of measure, where one counts the words from the dictionary to detect semantic domains in documents, is the previous standard in social science (Kosmidis et al., 2019; Osnabrügge et al., 2021b). For our analysis below, we produce such a measure based on the relative frequency of these words in each speech (Appendix B.1 describes how this measure is calculated). In the human validation below, we will show that a dictionary method compares poorly with human judgments about the emotionality of congressional speech snippets.
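To fix ideas, here is one simple count-based variant. The word lists and the smoothing are hypothetical stand-ins; the paper's exact dictionary formula is specified in its Appendix B.1.

```python
AFFECT = {"love", "fear", "proud", "tragic"}     # toy stand-ins for list A
COGNITION = {"therefore", "estimate", "cause"}   # toy stand-ins for list C

def dictionary_emotionality(tokens):
    # Count affect words relative to cognition words, with add-one
    # smoothing so that speeches containing neither are well-defined.
    # (Illustrative form only; see the paper's Appendix B.1.)
    a = sum(t in AFFECT for t in tokens)
    c = sum(t in COGNITION for t in tokens)
    return (a + 1) / (c + 1)

print(dictionary_emotionality(["proud", "love", "estimate"]))  # 1.5
```

Note how the measure depends entirely on exact list membership, which is the weakness discussed next.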
The problem with the dictionary approach, in our setting as in others, is that the method relies too heavily on the presence or absence of the particular listed words. The dictionary approach requires that the dictionary is reliably specified. This puts a lot of pressure on the researcher to identify all emotive words and their variants. This task could be especially difficult in historical contexts where the set of probative words might not be clear from a contemporary perspective.
A second problem is that the dictionary approach treats each word in the emotionality list as equally indicative of emotionality, and each word not in the list as equally indicative of emotionlessness. But this model of language is clearly wrong. For example, the word "like" could refer to preference or to similarity, while the word "dislike" only refers to preference (with "unlike" reserved for dissimilarity). A properly constructed emotion scale would give more weight to "dislike" than "like", but a dictionary approach assumes binary categories and cannot scale words continuously.

Embedding Approach to Scaling Emotionality
Word embeddings are well-suited to addressing the main problems with the dictionary approach. Rather than require that all words with emotive content be identified, word embeddings only require that a representative sample of emotive words be identified. In addition, word embeddings can flexibly learn from the corpus the intensity with which words are emotively associated, with no assumptions of discrete categories. In particular, if some emotive words in a historical period are missing from the specified list, their emotive association can still be learned by the model and accounted for by the resulting scale.
Beyond lexicon construction, a continuous scale can capture more subtle linguistic cues implied by full sentences. In contrast, a word-counting approach relies on the sparse, explicit, and intentional placement of emotion-laden words. Thus, as shown in Caliskan et al. (2017) and Ash et al. (2021), word embedding dimensions tend to reveal more about social attitudes than do word counts (see also Garg et al., 2018;Kozlowski et al., 2019).
Word Embeddings. More formally, word embeddings are a tool from natural language processing for learning numerical representations of words based on co-occurrence statistics in a given corpus (Mikolov et al., 2013;Pennington et al., 2014). A word, normally a string object drawn from a high-dimensional list of categories, is "embedded" in a lower-dimensional space, where the geometric location encodes semantic meaning. Semantically related words (e.g. "happy" and "joyful") will tend to have geometrically proximate vectors. Semantically unrelated words (e.g. "happy" and "econometrics") will tend to have geometrically distant vectors.
In the context of word embedding algorithms, semantic relatedness means that the words appear in similar contexts. The key intuition is: "You shall know a word by the company it keeps" (Firth, 1957). Take the sentence, "I was ___ to learn that I had won re-election." While "happy" and "joyful" would fit the blank nicely, "econometrics" would not. A word embedding algorithm learns word locations that predict which words would best complete any given sentence. The useful result is that directions in the embedding space correspond to semantic dimensions of language (e.g., emotion and rationality dimensions).
Thus we learn a vector w corresponding to each word w in the vocabulary based on how words co-occur in Congressional speeches. More technically, we learn word embeddings using the Word2Vec algorithm from Mikolov et al. (2013) applied to the full corpus. We use the Python gensim implementation with 300-dimensional vectors, an eight-word context window, and training for 10 epochs. These are all standard hyperparameter choices from the applied NLP literature. Rodriguez and Spirling (Forthcoming) and Ash et al. (2021) show that results produced from word embeddings are generally robust to variation in those choices.
Scaling Congressional Speeches by Emotion and Cognition. We now can use the embeddings and our dictionaries to scale speeches in the Congressional Record with an emotionality score. First, the word embeddings are combined with the thematic word lists to isolate directions in embedding space corresponding to emotion and reason. The vector A representing emotion is the average of the vectors w for the words in the emotion word list, w ∈ A. The vector C for cognition is defined analogously. 1 Second, we produce vector representations for each congressional speech, using the same specification as done for the emotion and rationality poles. Let the vector d i for speech i be the average of the vectors w of the words w in the speech. 2 Thus we construct a 300-dimensional vector for each speech in the Congressional Record.
1 Some recent papers have used this approach to extrapolate word lists more effectively to the political domain. Word embedding models can expand the dictionaries to larger lists of sentiment or emotion words (e.g. Rheault et al., 2016;Rice and Zorn, 2021;Osnabrügge et al., 2021b). This approach can address the problem of missing words in the dictionary, but it does not address the problem that dictionaries assume discrete categories of words, rather than a continuous scale of emotion. 2 In the preferred specification, the document vector averages (as well as the emotion and cognition vector averages) are weighted by the smoothed inverse frequency of each word, as done in Arora et al. (2016). That is, words that appear relatively often are down-weighted, while words that are relatively rare -and therefore distinctive -are up-weighted. This weighting improved performance of the metric in human validation, but does not make a difference in the downstream results. See Appendix F.3 and Appendix Table A6. We note, further, that this step of dimension reduction using word embeddings is an alternative to the regularization approach taken by Gentzkow et al. (2019b) to address sparsity in their high-dimensional n-gram representation of the Congressional speeches.
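The pole and document averages, including the smoothed-inverse-frequency weighting described in footnote 2, can be sketched as follows. The embeddings and word frequencies here are random toy values, and the function names are our own; only the construction (a SIF-weighted centroid, used for A, C, and each d_i alike) follows the text.

```python
import numpy as np

def sif_weight(freq, a=1e-3):
    # Smoothed inverse frequency (Arora et al., 2016): frequent words get
    # low weight, rare and therefore distinctive words get high weight.
    return a / (a + freq)

def weighted_centroid(words, emb, freqs):
    # SIF-weighted average of word vectors. The same construction gives
    # the poles A and C (from the word lists) and each speech vector d_i.
    kept = [w for w in words if w in emb]
    vecs = np.array([emb[w] for w in kept])
    wts = np.array([sif_weight(freqs[w]) for w in kept])
    return (vecs * wts[:, None]).sum(axis=0) / wts.sum()

# Toy embeddings and corpus frequencies (illustrative values only).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in ["proud", "love", "honor"]}
freqs = {"proud": 1e-4, "love": 1e-2, "honor": 1e-3}

A = weighted_centroid(["proud", "love", "honor"], emb, freqs)
print(A.shape)  # (300,)
```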

Taking these ingredients together, we can scale texts along the emotion and cognition dimensions. Our measure for the emotionality of speech i is

Y_i = (sim(d_i, A) + b) / (sim(d_i, C) + b),    (1)

where sim(v, w) is the cosine similarity between vectors v and w. The addition of a constant b in the numerator and denominator is for smoothing outliers; we set b = 1 but it can be set to any small positive number. An increase in Y_i indicates a shift towards the emotion pole relative to the cognition pole. In addition, we can produce separate measures for emotion and cognition, respectively, with sim(d_i, A) and sim(d_i, C).
A speech that is equally emotive and cognitive would take value Y i = 1. Appendix Figure A4 shows the distributions of the measure and its emotive and cognitive components.
Appendix Table A4 and Appendix Figure A3 show the distribution of Y i and how that evolved over time in U.S. Congress.
Alternative Emotionality Measures. For robustness and to assess better the performance of our measure, we calculate two alternative measures of emotionality in speeches based on the previous literature. First, as already mentioned, we compute a count-based measure using the frequencies that the words in our dictionaries appear in each speech. The count-based measure turns out to produce a quite different ranking of speeches than the embedding-based measure from Eq. (1). As shown in Appendix Figure A4, the distribution of the count measure is highly sparse and skewed because it relies on the presence or absence of words in the dictionaries. The correlation coefficient is 0.15 with our baseline measure in the full dataset. In addition, the count-based measure performs much worse in the human validation. Still, many of our central results hold when using the count-based measure (Appendix B.1).
Second, we compute an alternative distance metric from the embeddings, based more closely on Kozlowski et al. (2019) and Ash et al. (2021) and using the feature that analogous dimensions in vector space can be constructed with vector differences. In these previous papers, a gender dimension is constructed as the "male" vector minus the "female" vector.
Correspondingly, we isolate an emotion-to-cognition dimension as the vector difference AC = A − C (the vector for "affect" minus the vector for "cognition"). Then the emotion score for document i is the cosine similarity to the differenced vector, sim(d i , AC).
Conceptually, this "geometric" measure directly leverages the "analogy-solving" capacity of word embeddings. In subtracting cognition from emotion, the scale implicitly assumes a semantic dimension along which an increase in the relatedness to one concept pole (i.e. emotion) corresponds to a decrease in relatedness to the other concept pole (i.e. reason). In contrast, the baseline ratio measure from Eq. (1) relies only on the assumption that vector distances proxy for semantic distances, allowing for cases where an increase in emotion does not imply a decrease in reason. Thus we prefer the ratio measure as the baseline in our setting. Moreover, it performs slightly better than the geometric measure in the human validations. Practically, however, the data reveal that the two measures are highly correlated in the congressional speeches. The geometric measure's correlation coefficient with our main score is 0.95 in the full sample of speeches, and thus the measures can be used as substitutes. Unsurprisingly, our empirical results are robust to using the geometric measure.
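The geometric alternative differs from Eq. (1) only in projecting the speech vector onto the difference vector. A toy sketch with 2-dimensional vectors:

```python
import numpy as np

def cos_sim(v, w):
    return float(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))

def geometric_emotionality(d, A, C):
    # Cosine similarity of the speech vector to the difference vector
    # AC = A - C, following Kozlowski et al. (2019).
    return cos_sim(d, A - C)

A = np.array([1.0, 0.0])   # toy affect pole
C = np.array([0.0, 1.0])   # toy cognition pole
print(geometric_emotionality(A, A, C))   # positive: near the affect pole
print(geometric_emotionality(C, A, C))   # negative: near the cognition pole
```

Unlike the ratio measure, this projection forces emotion and cognition onto a single bipolar axis, which is exactly the assumption discussed above.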

Validation

This section reports our multiple validation exercises (as in Quinn et al., 2010; Goet, 2019; Osnabrügge et al., 2021a). First, we show qualitative evidence that our approach captures distinctive semantic dimensions that correspond to emotion and cognition. Second, we compare our measure to human judgements about the emotionality of short speech segments.
The pairwise rankings provided by our embedding-based measure agree with human rankings over 90% of the time, much higher than that for a more standard count-based measure.

Qualitative Evaluation of the Semantic Dimensions
We first ask: Do the vector dimensions underlying our measure capture qualitatively coherent and distinctive semantic dimensions in language? A simple test for semantic validity is to inspect the language associated with the geometric poles for cognitive and emotional language. For each word in the vocabulary outside the lexicon, we compute the relative similarity to the cognitive and emotive poles. This gives a ranking of the words along a single cognitive-to-emotive dimension. To evaluate these dimensions of language in context, we next inspect prototypical speech snippets that correspond to the emotional and rational poles. After sampling speeches from the top and the bottom of the distribution, we then sample the most emotive and cognitive sentences within those speeches for a qualitative analysis. 3 Appendix Table A1 provides lists of example sentences for the most emotional and most cognitive speeches, respectively. Consistent with the word clouds, there is a clear differential in tone, following intuitive language for logic and emotion. For example, the emotional sentences feature tributes to colleagues and to veterans, while the cognitive sentences include dry enumerations of policy details. 4 Differences in emotionality also emerge within specific topics: as illustrative examples, we report the most emotional and most cognitive sentences about taxation (Appendix Table A2) and abortion (Appendix Table A3).
A potential concern with our measure is that it might capture positive and negative sentiment, as opposed to cognition and emotion. Hence, we would like to demonstrate that emotionality is a separable dimension of language from sentiment. For this purpose, we construct positive and negative sentiment dimensions in our embedding space using our centroid method, with positive and negative seed lexicons taken from Demszky et al. (2019) (see Appendix A.4 for details). We can then assess whether emotionality and sentiment dimensions work independently.
First, we inspect the 2×2 semantic context around four centroids in our embedding space: cognitive-positive, cognitive-negative, emotive-positive, and emotive-negative. Appendix Figure A1 shows word clouds for the closest vectors to these four poles, revealing intuitive and distinctive words in each of these groups. The cognitive dimension has both positive tone (discern, knowledge, insight) and negative tone (contradict, vague, irrelevant). For emotion, the positive (serene, smile, thrill) and negative (frighten, disgust, sicken) are even more divergent.
3 Specifically, we select speeches that fall within the 1st and 99th percentiles of the score distribution. We then extract 10 random sentences from among the highest and lowest scoring sentences within the sample. 4 Appendix Table A5 shows additional examples where we have excluded any sentences containing a word from the lexicons. These sentences are still clearly and intuitively related to emotion and logic, respectively, yet they would be missed by a lexicon-based approach.
Similarly, in our dataset of speeches, emotionality and sentiment are separable. Appendix Figure A2 provides a scatter plot of speeches across the two dimensions and shows they are only weakly positively correlated. The R² from regressing emotionality on sentiment is just 0.011.

Validation with Human Judgment
This subsection reports the results of a human annotation task to assess the validity of our score in capturing emotion and cognition in language (e.g. Lowe and Benoit, 2013). The task is as follows. Coders are provided with pairs of sentences extracted from the corpus.
For each pair, they are asked which sentence is more emotional and which sentence is more cognitive. In particular, the coder is provided with three options: (i) sentence A is more emotional[logical] than sentence B, (ii) sentence B is more emotional[logical] than sentence A, and (iii) the sentences are equivalent or I don't understand one or both of the sentences.
No additional information is provided about where the snippets come from.
In the baseline validation check, sentence pairs are constructed as follows. We start by selecting the 5,000 most and least emotional speeches for each decade. From those speeches, we extract and score all the sentences. Finally, we randomly pair sentences that come from the top and bottom 5% of the score distribution. Pairs are always formed from sentences that come from the same decade, and all decades are roughly equally represented in the set of annotated snippets.
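The pairing step can be sketched as follows. This is an illustrative version under our own function names; the paper's exact sampling procedure (per-decade selection from 5,000 speeches) has more stages.

```python
import random

def make_validation_pairs(scored, n_pairs, tail=0.05, seed=0):
    # scored: list of (emotionality_score, sentence) from one decade.
    # Randomly pair sentences drawn from the top and bottom 5% of the
    # score distribution, as in the baseline validation check.
    rng = random.Random(seed)
    ranked = sorted(scored, key=lambda x: x[0])
    k = max(1, int(len(ranked) * tail))
    low = [s for _, s in ranked[:k]]     # bottom 5%: most cognitive
    high = [s for _, s in ranked[-k:]]   # top 5%: most emotional
    rng.shuffle(low)
    rng.shuffle(high)
    return list(zip(high, low))[:n_pairs]
```

Each returned tuple is one annotation task: a high-scoring and a low-scoring sentence from the same decade, shown to coders in random order.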
The annotators are Amazon Mechanical Turk workers born in the USA and whose primary language is English. Each coder is asked to code 10 sentence pairs (20 sentences).
To assess inter-coder reliability, each pair of sentences is annotated by two different coders.
In addition, each coder took a simple English comprehension test, which asked them to correctly separate a set of unambiguously emotional and cognitive words into two groups. 5 We obtain 1,714 annotations in total. The coders chose option (iii) (could not understand the snippets or judge the relative emotionality) for only 3.5% of the sentence pairs. These pairs are not considered for the computed accuracy statistics.
Restricting to annotated pairs where both assigned coders agree on the ranking (columns 7-9), accuracy reaches 93%.
Even if the measure is accurate overall, we must still confront the possibility that it is not valid for the older time periods in our sample. In particular, inconsistency may come from the use of modern seed dictionaries to evaluate dimensions of language in the past, since word meanings shift over time (Hamilton et al., 2016;Garg et al., 2018). We address this issue directly. In Panel B, we report analogous statistics when subsetting the pairs by the decade when they were spoken in Congress, starting from the first decade, i.e. 1858-1868, up until the sixteenth (incomplete) decade, i.e. 2008-2014. Importantly, there are no significant drops in the accuracy of our score in earlier decades. This temporal validation addresses a major concern with our method: Although it relies on recently developed seed dictionaries that use modern understandings of emotional and cognitive language, the final score produces a time-consistent measure of emotionality. Therefore we can produce meaningful long-run historical comparisons.
In Panel C, we compare the performance in human validation for the two alternative measures of emotionality, described above in the methods section. First, the geometric measure sim(d, AC) refers to the cosine similarity between each document vector d and the cognition-to-emotion dimension AC, as done in Kozlowski et al. (2019). This vector-distance alternative obtains very similar accuracy to our baseline measure in the human validation task. Second, Word Count refers to the count-based measure giving the ratio of emotion words to cognition words. The performance for the count-based measure is much worse than the embedding-based measures and comparable to random guessing. 6
6 Appendix A.7 provides some additional results on the human annotation validation. Appendix Table A6 reports a set of complementary assessments using alternative sentence-pairing procedures based on variants of the emotionality measure.
Comparisons across institutional positions show that members of the minority party resort systematically to more emotional rhetoric.
The use of emotional language also differs across individual politicians' characteristics, such as gender, race, and religion, and is positively correlated with ideological extremism.
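To make the validation comparison concrete, the three scores above can be sketched as follows. The embeddings, seed lists, and the exact functional form of the ratio measure here are toy stand-ins for illustration, not the paper's actual LIWC-derived dictionaries or trained corpus embeddings.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(unit(a) @ unit(b))

# Toy embedding table; in the paper, vectors come from word embeddings
# trained on the congressional corpus (words below are illustrative).
rng = np.random.default_rng(0)
vocab = ["anger", "fear", "love", "think", "reason", "evidence", "war", "budget"]
emb = {w: rng.normal(size=50) for w in vocab}

emotion_seeds = ["anger", "fear", "love"]           # stand-in emotion lexicon
cognition_seeds = ["think", "reason", "evidence"]   # stand-in cognition lexicon

# Poles: average vector of each semantically coherent seed group.
A = np.mean([unit(emb[w]) for w in emotion_seeds], axis=0)    # emotion pole
C = np.mean([unit(emb[w]) for w in cognition_seeds], axis=0)  # cognition pole

def doc_vector(tokens):
    return np.mean([unit(emb[t]) for t in tokens if t in emb], axis=0)

def emotionality_ratio(tokens):
    """Baseline-style score: proximity to the emotion pole relative to the
    cognition pole (this shifted ratio form is an assumed simplification)."""
    d = doc_vector(tokens)
    return (1 + cosine(d, A)) / (1 + cosine(d, C))

def emotionality_dimension(tokens):
    """Kozlowski-style alternative: cosine similarity with the
    cognition-to-emotion dimension A - C."""
    return cosine(doc_vector(tokens), A - C)

def emotionality_count(tokens):
    """Count-based alternative: ratio of emotion to cognition seed words
    (with add-one smoothing, an illustrative choice)."""
    e = sum(t in emotion_seeds for t in tokens)
    c = sum(t in cognition_seeds for t in tokens)
    return (1 + e) / (1 + c)
```

The dimension score is bounded in [-1, 1] by construction, while the ratio and count scores are positive and unbounded; any of the three can be computed per speech and averaged by speaker, year, or topic.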

Emotionality over Time
An initial descriptive question is how the relative use of emotion and reason has shifted over time. We use the long temporal range of the corpus to show the evolution of emotive language since the start of our data in 1858. These results add to other recent work on the evolution of party polarization in congressional speeches over this period (Gentzkow et al., 2019b). Emotionality spikes during major wars; the higher emotionality during these events is intuitive and lends credibility to the behaviour of the measure.
Next, consider the broader trends. Emotionality shows a slow but steady increase up until the 1950s, dips in the early 1970s, and then begins a more rapid increase in the late 1970s that continues until the present. This striking pattern appears in both chambers. The trend break is especially salient for the House of Representatives and is followed with some delay by the Senate.
We highlight that the emotionality trend is quite different from trends in text polarization -that is, differences in the language used by Democrats and Republicans. Jensen et al. (2012) and Gentzkow et al. (2019b) find historically low levels which then increase only starting in the mid 1990s. Thus, the rise of emotive rhetoric pre-dates the current wave of polarized language. In addition, as shown in Appendix B.4, we can rule out that the shift is due to changes in the readability or simplicity of language (Benoit et al., 2019). Emotional language is a distinct dimension of political rhetoric that evolves independently from other salient dimensions.
A potential concern is that these trends reflect changes in language generally, rather than changes in the political sphere. To check for this possibility, Appendix B.5 provides a comparison trend in emotionality for a more general historical corpus: Google Books.
Emotional language in Google Books actually declines up until the 1980s, after which it shows a small rebound (Appendix Figure A12).7 Thus, the trends we see in congressional speeches appear to be specific to politics. Appendix Figure A13 shows that we can normalize the congressional series by this general-language baseline without changing the picture.

Given that these trends are indeed politics-specific, the trend break in the late 1970s is especially noteworthy. An intriguing possible explanation is the introduction of C-SPAN, a public television network for Congress that started broadcasting from the House in 1979 and from the Senate in 1986. Zooming in on this time period, we note that the first Congress elected after the founding of C-SPAN takes office in 1977. This is the precise timing of the trend break in emotional language. It could be that when television comes online in Congress, the marginal benefit of emotional language in floor speeches rises, as more voters are now viewing them. While such a mechanism deserves additional investigation, it would be consistent with previous empirical work on the effectiveness of emotional appeals in influencing voters (Gross, 2008; Brader et al., 2008; Renshon et al., 2015).

7. This trend in emotional expression is similar to that estimated by Morin and Acerbi (2017), who also use Google Books but focus on fiction. They write: "Our data confirm that the decrease in emotionality in English-speaking literature is no artefact of the Google Books corpus, and that it pre-dates the twentieth century, plausibly beginning in the early nineteenth century". See also Acerbi et al. (2013).

Emotionality and Topics
Our second descriptive analysis looks at how the emotive-cognitive content of congressional debates varies by topic. In particular, the observed variation in emotionality over time may be driven by the selection of different topics, with politicians talking more about emotionally charged issues in recent years. Alternatively, politicians may have changed their rhetorical style in how the same topics are framed.
To understand the relationship between emotionality and topics, we apply an unsupervised topic model (latent Dirichlet allocation, or LDA; see e.g. Blei, 2012).8 We apply LDA to the full pre-processed corpus, with speeches treated as documents and assuming 128 topics. To get at non-emotive dimensions in language, we drop from the vocabulary all words in our emotive-cognitive lexicon. Appendix Table A11 lists the topics learned by the topic model and the most representative words for each. Overall, the quality is good: 119 of the 128 topics are recognizable as coherent topics. For ease of interpretation, we inspected the individual topics and aggregated them into eleven larger categories (also indicated in Appendix Table A11).

8. Note that the decreasing trend in Google Books also addresses another potential issue with our measure: that it is built with LIWC, a dictionary based on contemporary language as of 2015. On top of the consistent rates of human validation across decades (Table 1), this confirms again that our measure is not just picking up increasing use of the language used in LIWC; if that were the case, we would also see a similar increase in Google Books.
Using the trained model, we assign to each speech the topic with the highest probability given the speech content. Appendix Figure A17 shows the proportions of the eleven broad topic categories in congressional speeches over time. Speeches concerning procedural aspects of decision-making comprise the largest single category. 9 The share of procedural speeches shrinks slightly over time, mostly in favour of speeches on social issues and speeches that hinge upon a national narrative, historical heritage, or patriotism. Given the proportional importance of procedure, Appendix Figure A19 shows that our main time-series results are robust to dropping procedural speeches.
To show topic-level variation in emotional expression, we residualize out time fixed effects (to adjust for secular trends) and then compute the average topic-specific emotionality. Speeches about patriotism and national heritage rank highest in emotionality. On the other side of the spectrum, it is not surprising that speeches on internal Procedure and Governance (government organization) tend to rank low and to use more cognitive language.
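Residualizing on time fixed effects amounts to demeaning the score within each year before averaging within topics. A short pandas sketch (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "year":    [1990, 1990, 1990, 2000, 2000, 2000],
    "topic":   ["patriotism", "procedure", "patriotism",
                "procedure", "patriotism", "procedure"],
    "emotion": [1.4, 0.8, 1.2, 0.9, 1.6, 1.0],
})

# Demean by year: a one-way fixed effect that removes secular trends.
df["resid"] = df["emotion"] - df.groupby("year")["emotion"].transform("mean")

# Average residualized emotionality by topic gives the topic ranking.
topic_means = df.groupby("topic")["resid"].mean().sort_values(ascending=False)
```

By construction the residuals sum to zero within each year, so `topic_means` captures only the cross-topic variation net of aggregate time trends.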
How does this topic-level variation in emotionality vary by political party affiliation? Figure 3, Panel B reports the ratio of Republican emotionality to Democrat emotionality by topic. First, and perhaps most strikingly, fiscal policy is the most Republican-slanted topic in its emotional content, with the Republican score 2.5 times larger than the Democrat score. In comparison, most other topics are quite similar across parties in emotive content.

9. See Appendix Figure A18 for the time series of topic shares after excluding procedural speeches.
10. For the rankings of all 128 individual topics, see Appendix Figure A16.
The exceptions are two Democrat-slanted topics: social issues, which makes sense in light of Democrats' defence of civil rights and women's rights, and economic policy, a topic focused on the regulation of corporate misbehaviour. Thus emotionality helps capture partisan differences in policy priorities.
Next, we explore the time series in emotionality by topic. As illustrated in Figure 4, emotionality on fiscal policy rises markedly over time, consistent with the use of emotional rhetoric to defend inequality-increasing fiscal policies. In light of the evidence that economic inequality increases political polarization (Garand, 2010; McCarty et al., 2016; Piketty, 2020), it makes sense that divisive issues related to redistribution have become more emotionally charged.

Emotionality and Politician Characteristics
So far we have looked at the broad temporal, topical, and partisan factors explaining emotion and reason in U.S. legislative politics. In this section we assess how emotionality varies across politicians. First, we look at party opposition status. Second, we relate emotionality to partisanship in roll-call voting records. Third, we examine politician identity characteristics.
Opposition Status. First, we explore whether U.S. politicians resort to emotionality more when they are in the opposition. As discussed in Green (2015) and Lee (2016), minority-party politicians are engaged in crafting a national message to accrue electoral gains in upcoming campaigns. Emotional language can be used to communicate broad and consensual values (Jerit, 2004), and it is more likely to be reported by traditional and social media (Bennett, 2016; Brady et al., 2017). Thus politicians may use more emotional language when they are in the minority party.
As initial visual evidence on this point, Figure 5 plots average emotionality by party over time. Throughout the time series, changes in House majorities correspond to changes in relative emotionality in the two parties.11

Appendix Table A9 reports the estimates from a series of ordinary least squares regressions for the effect of opposition status on the emotion score, for both the House and the Senate. The regressions include chamber-year fixed effects, and standard errors are clustered by politician. These results confirm that the dynamic relation noted in Figure 5 is statistically significant when looking at both chambers. Including politician fixed effects reveals that the same politician uses more emotional appeals when her party is in a minority position, relative to her personal average level. Results are not driven by the choice of different topics, for example due to mechanical differences in responsibility for procedural functions.

Partisanship in Voting. To delve further into the role of emotional rhetoric in political division, we next explore its relation to ideological policy choices. Previous work has shown that ideological extremists use dissent with their own party to appeal to extreme voters (Kirkland and Slapin, 2018). Extremism can also be associated with simpler sentences and longer speeches (Slapin and Kirkland, 2020). We test whether Members of Congress who are more ideologically polarized are also more likely to use emotional rhetoric.

We measure ideological extremism using DW-NOMINATE, a standard measure constructed from roll-call votes. DW-NOMINATE summarizes the tendency of a congressman to vote with Republicans versus with Democrats. As initial visual evidence, an Appendix figure shows the relationship between DW-NOMINATE scores and emotionality. In the regressions, we include chamber-year fixed effects, and standard errors are clustered by politician to allow for serial correlation in the error term by politician across speeches and over time. Table 2 reports the results in the first row of estimates. There is a significant and positive relationship between ideological voting and emotionality (column 1), which holds when adjusting for politician demographics (column 6). The ideology effect is of the same magnitude even when conditioning on topic fixed effects (column 7), showing that polarized rhetoric comes through the framing of topics rather than the selection of topics.12

Identity Characteristics. The next question is: are there observable individual characteristics of Congress members who tend to use more emotional language? For example, members from demographic groups that are underrepresented in Congress are typically associated with distinctive policy positions and representation choices (e.g. Swers, 2002; Tate, 2018). We explore whether members of underrepresented groups are also more likely to use emotional language.
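The regression design, absorbing chamber-year fixed effects by demeaning and clustering standard errors by politician, can be sketched by hand. The variable names and the synthetic data-generating process below are illustrative assumptions, not the paper's actual estimation code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "politician":   rng.integers(0, 40, n),   # cluster identifier
    "chamber_year": rng.integers(0, 10, n),   # fixed-effect cell
    "dw_nominate":  rng.normal(size=n),       # ideological extremism (toy)
})
# Synthetic outcome with a true coefficient of 0.3 on DW-NOMINATE.
df["emotion"] = 0.3 * df["dw_nominate"] + rng.normal(size=n)

# Within transformation: absorb chamber-year fixed effects by demeaning.
def within(s, g):
    return s - s.groupby(g).transform("mean")

y = within(df["emotion"], df["chamber_year"]).to_numpy()
x = within(df["dw_nominate"], df["chamber_year"]).to_numpy()
X = x[:, None]

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta

# Cluster-robust (by politician) sandwich variance:
#   (X'X)^-1 [ sum_g (X_g' u_g)(X_g' u_g)' ] (X'X)^-1
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((1, 1))
for idx in df.groupby("politician").indices.values():
    s = X[idx].T @ u[idx]
    meat += np.outer(s, s)
V = bread @ meat @ bread
se = float(np.sqrt(V[0, 0]))
```

Clustering by politician allows arbitrary serial correlation in a speaker's errors across speeches and over time, which is exactly why speech-level observations cannot be treated as independent here.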

Demographic information on Members of Congress comes from the CQ Press Congress Collection. The explanatory variables of interest are indicator variables for party, gender, race, and religion. The regression specification is the same as that used above for DW-NOMINATE, with chamber-year fixed effects used to compare Congressmen to their contemporaneous colleagues.
As reported in Table 2, emotionality is higher for Democrats, for women, and for ethnic and religious minorities. Considering also the estimate for DW-NOMINATE, this evidence suggests that identity characteristics are more pivotal than political factors for variation in emotion across politicians.13 An important question, then, is whether the long-run changes in emotionality from Figure 2 are due to a changing composition of politician types. We explore this issue in Appendix Figure A8, where we compare the unconditional time trends with the emotionality score residualized on demographic variables (gender, race, religion). After this adjustment for demographics, the trends in emotionality remain virtually unchanged.14

Conclusion
This paper has provided an analysis of emotion and reason in the language of U.S. Members of Congress. We produced a new measure of emotive speech, which combines dictionary methods with word embeddings.

13. Appendix Table A10 reproduces the main results controlling for the length of the speech and positive or negative sentiment. The results are unaffected by the inclusion of these controls on rhetorical style: this suggests that the group differences in emotionality are not driven by a different use of language. Table A10 also shows that these relationships are roughly constant over time. To provide additional visual support for these estimates, Appendix Figure A20 shows the time series of emotionality by gender and race; the differences in the use of emotional language across demographic groups are constant over time. Appendix Table A8 reports the same results on the two separate components of the emotionality score, emotion by itself and reason by itself; see Appendix C for a discussion of the differences in those results.
14. In Figure A9 we also include topic fixed effects, which explain some, but not all, of the long-run changes in emotionality.
Second, we produce a series of results on how emotional rhetoric is related to power imbalance and conflict. Emotionality is higher for less empowered political minorities: women, Hispanics, Blacks, Jews, and Catholics. The status of being in the minority party, and therefore having less power over policy, increases emotional language. Relatedly, we find evidence for emotions as a response to conflict. They increase during wars. Income inequality is an ingredient of class conflict over redistribution, which we can observe in the high emotional intensity of fiscal-policy debates. And finally, we find that the more divisive and ideologically polarized Members of Congress tend to use more emotional rhetoric.
The new measurement approach and initial descriptive results set the stage for a number of further empirical studies. Notably, further research is needed to understand the role of television in increasing emotional rhetoric. This work could go beyond the introduction of C-SPAN, for example to focus on partisan cable news (e.g. Clinton and Enamorado, 2014; Arceneaux et al., 2016; Martin and Yurukoglu, 2017). Another important open question concerns the relation between emotionality and polarization. As affective polarization in the electorate is on the rise in the U.S. and Europe alike (Iyengar et al., 2019), more attention should be devoted to understanding the possible feedback loops between polarization and emotive speech in parliaments. Do politicians use more emotion when discussing people, policies, or principles?
Beyond these substantive avenues, the new emotionality metric could itself be a useful tool in other empirical contexts. In Congress, analysing committee debates would be a natural next step to delve deeper into congressional dynamics. Measuring emotional expression in the newsletters that congressmen send to their constituents would provide interesting insights into the linkages between a politician and her constituency. Outside of politics, news articles and television transcripts would be ideal candidates for evidence on how expressed emotion is used for different persuasive and professional purposes.
Finally, our methodology may inform experimental studies of how emotionality in political language influences voters. Using the emotion metric combined with generative language models (e.g. Radford et al., 2019;Brown et al., 2020), it is possible to identify or generate comparable political arguments that differ in their use of emotive language. More causal analysis of how emotions influence voters is needed to validate the mechanism of emotional rhetoric as a strategic response to voter preferences.

Supplementary data
The data and codes for this paper are available on the Journal repository. They were checked for their ability to reproduce the results presented in the paper. The authors were granted an exemption to publish parts of their data because access to these data is restricted. However, the authors provided a simulated or synthetic dataset that allowed the Journal to run their codes. The synthetic/simulated data and the codes for the parts subject to exemption are also available on the Journal repository. They were checked for their ability to generate all tables and figures in the paper; however, the synthetic/simulated data are not designed to reproduce the same results.