Social Media Public Opinion as Flocks in a Murmuration: Conceptualizing and Measuring Opinion Expression on Social Media

We propose a new way of imagining and measuring opinions emerging from social media. As people tend to connect with like-minded others and express opinions in response to current events on social media, social media public opinion is naturally occurring, temporally sensitive, and inherently social. Our framework for measuring social media public opinion first samples targeted nodes from a large social graph and identifies homogeneous, interactive, and stable networks of actors, which we call “flocks,” based on social network structure, and then measures and presents opinions of flocks. We apply this framework to Twitter and provide empirical evidence for flocks being meaningful units of analysis and flock membership predicting opinion expression. Through contextualizing social media public opinion by foregrounding the various homogeneous networks it is embedded in, we highlight the need to go beyond the aggregate-level measurement of social media public opinion and study the social dynamics of opinion expression using social media.

The results also inform social media opinion measurement: to measure social media public opinion, texts should be combined with social network structure so that opinion expression can be disaggregated and situated in its online social context.

Keywords: Public Opinion, Social Media, Social Network Structure, Social Network Analysis, Social Media Data Mining
https://doi.org/10.1093/jcmc/zmab021

As the cornerstone of democracy, public opinion has been predominantly treated as mass opinion, an aggregate of individual opinions gathered by survey-based public opinion polls. Though a powerful measure of the pulse of the public, this approach tends to yield a snapshot of private preferences that are prompted by pollsters and contingent on the artificial context of polling, thus overlooking the social context and the uneven influence of opinion expression (Blumer, 1948; Lin, Margolin, Keegan, & Lazer, 2013; Zaller, 1992). The drawbacks of survey-based polls have recently been amplified by rising non-response rates due to factors like changing patterns of technology use and public distrust in polling (Edwards-Levy, 2021; Groves & Peytcheva, 2008).
Social media can bring a new way to conceptualize and measure public opinion, complementing survey-based opinion polls (Chen & Tomblin, 2021; Klašnja, Barberá, Nagler, Beauchamp, & Tucker, 2015). Social media platforms like Twitter have emerged as one key battleground of public discourse, where people from different backgrounds actively comment on current events and public issues and strive to exert influence (Conway, Kenski, & Wang, 2015; Kim, Kim, Lee, Oh, & Lee, 2015; Tufekci, 2013). This leads to naturally occurring, temporally sensitive, and inherently social opinions (Anstead & O'Loughlin, 2015; McGregor, 2019), which are drastically different from those gathered by survey-based opinion polls. Another key feature of social media public opinion is its embeddedness in various homogeneous networks, where like-minded individuals interact with each other and reinforce opinions (Cinelli, Morales, Galeazzi, Quattrociocchi, & Starnini, 2021; Colleoni, Rozza, & Arvidsson, 2014). However, issues like a lack of demographic representativeness and opinion manipulation pose challenges for using social media as a public opinion measurement tool (Ferrara, Chang, Chen, Muric, & Patel, 2020; Klašnja et al., 2015).
To capture the unique characteristics of social media public opinion while addressing the measurement challenges, we introduce a methodological framework called "murmuration." This framework centers around identifying meaningful groups of social media actors and studying opinion expression by those groups. Given that homophily drives friendship formation on social media (Aiello et al., 2012; Barberá, 2015; De Choudhury, 2011) and that social network structure (i.e., structure of friendship ties) correlates strongly with individual characteristics (e.g., Al Zamal, Liu, & Ruths, 2012; Grabowicz, Ramasco, Moro, Pujol, & Eguiluz, 2012; Pennacchiotti & Popescu, 2011), this framework relies on social network structure to computationally identify focus groups. Drawing on the idiom "birds of a feather flock together," we call our focus groups "flocks." A flock can be conceptually defined as a collection of similar social media accounts situated in the same neighborhood of a social graph. We expect social media public opinion to exhibit homogeneity within a given flock or similar flocks and heterogeneity across different flocks. Therefore, the unfolding of the opinions of various flocks on social media in response to external events is akin to a murmuration of starlings whose formation changes fluidly.
In the following sections, we first review current research on social media public opinion and explain how the murmuration framework can fill the gaps and address the challenges of measuring public opinion on social media raised by existing studies. Then, we apply this framework to the case of political opinion leaders on Twitter, a unique social media platform featuring political commentary and contention (Yarchi, Baden, & Kligler-Vilenchik, 2021) and boasting instantaneous response to external events (Zhang et al., 2019). As information on Twitter primarily flows from elites to average accounts (Wu, Hofman, Mason, & Watts, 2011), opinion leader flocks in the Twitter sphere are essential for understanding the general opinion climate on Twitter. By analyzing social network structure and opinion expression, we provide empirical evidence for 1) flocks being homogeneous, interactive, and stable networks on Twitter and 2) flocks predicting opinion expression. Using three distinct news events, we further demonstrate the power of the murmuration framework in capturing a) the intensity and b) temporal dynamics of opinion expression by flocks and c) the opinion contestation between them. These results demonstrate how the murmuration framework can reveal the social dynamics of public opinion by foregrounding actors who express opinions and situating opinion expression in its social context. The theoretical implications for social media as the battleground of public discourse as well as the methodological implications for social media public opinion research are finally discussed.

Social media public opinion: conceptualization and measurement
Public opinion is a rather "nebulous" construct (Herbst, 2001; Zaller, 1994). From "deliberative opinion" emerging from rational critical debates (Habermas, 1989), "enlightened opinion" formed through rational thinking and with sufficient information (Berelson, 1952), to "latent opinion" affected by government decision-making and expressed in elections (Key, 1961), this construct is conceptualized differently by different scholars. However, the "dominant construct" of public opinion is "the aggregate of responses to national representative polls" (Zaller, 1994, p. 276). Herbst (2001) discerns that public opinion is tied to the public opinion infrastructures (opinion formation, measurement tools, and media), all of which are historically situated. For example, the dominant construct treats public opinion as a representative collection of individual opinions gathered by survey-based opinion polls and broadcast by mass media; this construct fits with the 20th-century mass society featuring detached and dispersed individuals whose common knowledge and awareness are shaped by mass media (Lang & Lang, 2009).
Opinions gathered through survey-based polls tend to be formed when prompted and by relying on sampling accessible considerations; therefore, how public opinion questions are phrased can greatly affect response (Zaller, 1992). In contrast to elite expression in coffee houses and salons in 18th-century Europe (Habermas, 1989), such mass opinion carries some level of artificiality and volatility. Blumer (1948) further argues that treating public opinion as a simple aggregate of individual opinions ignores how people are embedded in various social groups and institutions. According to him, individual opinions crystalize in group settings and are expressed vocally by social institutions. Moreover, despite public opinion being customarily broken down by demographics, such as age, gender, and party identification, these overarching demographic segmentations might not fully reveal the formation and distribution of public opinion. Instead, some latent features, that is, unidentified cultural categories yet to be observed in empirical data (Hallinan & Striphas, 2016), might better capture the opinions of the public.
Social media dramatically change the media landscape of the 21st century and consequently the way people communicate with each other, bringing a new way to conceptualize public opinion. Driven by homophily and organized by connective communication technologies like social media, networked publics emerge online both as a collection of individuals bound together by common ties and/or common identity, and as a techno-social space conditioning individual opinion expression (boyd, 2010; Marwick & boyd, 2011). As people express opinions in the social context of networked publics, such opinions go beyond the sum of atomic opinions and become "an on-going product of conversation, embedded in social relationships" (Anstead & O'Loughlin, 2015, p. 215). Existing research reveals that social media public opinion tends to be situated in echo chambers, where people with similar views talk to each other without reaching across the aisle, though cross-cutting talks can occur early on in non-political contexts and more frequently on certain platforms (Barberá, Jost, Nagler, Tucker, & Bonneau, 2015; Conover et al., 2011; Yarchi et al., 2021). Also, social media public opinion has a strong temporal dimension: people respond to breaking news in real time and comment on related social issues in the wake of various events (Chen & Tomblin, 2021; Klašnja et al., 2015; Zhang et al., 2019).
Social media not only provide a space for the active expression of opinions, but also serve as a tool for measuring public opinion. Journalists quote social media expression as "vox-populi" in their stories to represent the voices of the common people or use engagement metrics as quantitative indicators of public sentiments (Beckers & Harder, 2016; McGregor, 2019). Private polling firms and academic researchers alike also tap into social media to capture the buzz and sentiment of public discussion by using data mining techniques (Chen & Tomblin, 2021; Klašnja et al., 2015). For example, some firms specializing in social media semantic polling employ natural language processing to distill insights from vast amounts of social media data about opinions of different pockets of the public on various issues and candidates; such results often enter news media coverage as a form of public opinion (Anstead & O'Loughlin, 2015).
All these suggest that social media have become an inherent public opinion infrastructure in the 21st century, influencing opinion formation, measurement, and circulation. Based on the three-pronged framework by Herbst (2001), social media public opinion can be conceptualized as naturally occurring, temporally sensitive, and inherently social opinions emerging from the networked publics of social media, measured by data mining techniques, and broadcast by mass media to the larger public. Although social media users are not representative of the general public (Tufekci, 2013), those actively engaged in opinion expression on social media can shape public opinion (Dubois & Gaffney, 2014;Lasorsa, Lewis, & Holton, 2012). More importantly, blended into individual day-to-day practice and the social world, social media are an important public opinion domain in and of itself (Couldry, 2012).

The murmuration framework
Existing studies that leverage social media to study public opinion have primarily used natural language processing of texts to identify patterns of expression, such as sentiments or topics (Bollen, Mao, & Pepe, 2011; Cody, Reagan, Mitchell, Dodds, & Danforth, 2015), and to compare the results with survey-based opinion polls (Chen & Tomblin, 2021; O'Connor, Balasubramanyan, Routledge, & Smith, 2010). However, as the conception of public opinion is deeply intertwined with tools of opinion measurement, a blunt comparison between social media and survey-based opinion polls can be misleading. Additionally, the text-centric approach fails to take full advantage of social media data to reveal the social and conversational aspects of public opinion (Anstead & O'Loughlin, 2015) and the dynamic process of public opinion formation and shift (Lin et al., 2013). With only texts, we cannot accurately provide answers to questions like "Who talked about which events at a given time?," "Who led what discussion?," and "How did the discussion of an event by a certain group change over time?" We argue that the study of social media public opinion must go beyond the aggregate level and foreground networks of actors whose views about various issues and topics might be highly consistent internally yet quite distinct from each other. This focus on networks of actors can potentially address the issues of social media opinion manipulation and demographic unrepresentativeness. Abundant research points to how public opinion on social media can be manipulated by bots and bad actors, who can spread disinformation, polarize public discussion, and manufacture dominant opinion (e.g., Ferrara et al., 2020; Ross et al., 2019; Tucker et al., 2018). Research also shows that social media users differ in many ways from the general public, with the former being younger and more politically opinionated (Barberá & Rivero, 2015; Wojcik & Hughes, 2019).
However, a recent study that compares traditional public opinion measures with social media finds that social media afford the unique advantage of "mapping of public voices that may be neglected" by traditional methods (Chen & Tomblin, 2021, p. 11). By focusing on the networks of actors and sources of opinions, we can avoid treating social media users as a monolith and differentiate opinions emerging from different corners of social media, including bots, as they tend to cluster within networks (Hagen, 2020).
To measure social media public opinion in a way to fully capture its spontaneity, temporality, and sociality, and to minimize the conflation of distinct actors, we propose the murmuration framework. We envision social media public opinion as composed of opinions of different flocks, who band together via their shared ties and/or identities and express their opinions in the network context. At the core of the framework is identifying flocks of interest and studying their opinions over time. A conceptual illustration of the framework is shown in Figure 1, which includes network sampling of targeted accounts, identification of flocks, analysis of flock opinions over time, and presentation of analytical results. The first two modules are executed as needed and the last two modules can be performed on a regular basis.
The first module, targeted sampling (Figure 1a), samples from a targeted population using seed accounts known to be central to the population. For instance, if the target population is the political right, the seed accounts should cover the key conservative politicians, media personalities, and activists, like Donald Trump, Tucker Carlson, and Richard Spencer. The friendship ties of those seed accounts can help us locate those who are connected to them, constituting an efficient way to assemble the targeted sample.
The second module, flock identification (Figure 1b), detects the underlying communities or flocks of accounts based on social network structure. Different methods have been applied to identify communities of actors, relying on information like hashtag use (Lin, Keegan, Margolin, & Lazer, 2014; Joseph, Gallagher, & Welles, 2020), Twitter lists (Wu et al., 2011), or social network structure (Boutyline & Willer, 2017; Featherstone, Barnett, Ruiz, Zhuang, & Millam, 2020). Given the focus on the discovery of various persistent networks of actors, the relatively stable social network structure is preferred for flock identification.
The third module, opinion extraction (Figure 1c), analyzes daily opinions from the sample. Given the flocks identified in the previous module, their expression data can be collected and analyzed to see how different flocks respond to events of the day. This module presents a timely digest of flock opinions, and over time it offers a unique window into public opinion dynamics. The fourth module, murmuration demonstration (Figure 1d), presents social media public opinion by flock in response to major news events.
Since the murmuration framework studies public opinion at the level of flocks, it is crucial to assess whether flocks are meaningful analytical units that reflect the formation of networked actors. In other words, to study flocks rather than individual accounts, we must first ascertain that flock members are similar to each other, interact with each other, and have stable ties over time, all of which create a social context of opinion expression. Therefore, our first research question concerns those three features of flocks:

Is a flock a (a) homogenous, (b) interactive, and (c) stable network of actors? (RQ1)
Our second question turns to opinions expressed by flocks. If accounts within a flock are alike, interact with each other, and have stable friendship ties with each other, they are expected to share similar topic emphases and sentiments in opinion expression. If this holds, it provides empirical support for aggregating individual opinions into flock opinion. Therefore, we ask: Can patterns of opinion expression be captured by flocks? (RQ2)

Implementing the murmuration framework
Targeted sampling from the Twitter Friendship Network
In August 2018, we sampled political opinion leaders on Twitter who actively expressed political opinions in the Twitter following network using the personalized PageRank (PPR) sampling method (Chen, Zhang, & Rohe, 2020). The PPR sampling takes a set of seed accounts and examines their neighborhood through a personalized random walk. The random walker starts from a seed account. At each step, the walker either walks randomly to a Twitter account that is followed by the current account with probability $\alpha$ or teleports back to the seed account with probability $1 - \alpha$. Here, $\alpha$ is called the teleportation constant and is set to 0.15. The stationary probability distribution of the personalized random walk is defined as the PPR vector, and nodes with the highest scores are sampled. Under the degree-corrected stochastic block model (DCSBM), the PPR sampling can consistently locate members of a targeted population. Also, since the random walk goes from an account to other accounts it follows and since bots are less likely to attract followers, this method can find a list of accounts that is unlikely to be contaminated by social bots. Our analysis used a curated list of Twitter accounts including activists, pundits, journalists, and media outlets spanning the whole political spectrum in the United States (Online Appendix SI). We obtained a total of 267,117 Twitter accounts, with a total of 10,174,291 friends followed by them. Given that an account that follows or is followed by few accounts is difficult to classify, we removed any accounts following fewer than two friends and followed by fewer than five accounts. This resulted in a total of 193,120 Twitter accounts, which followed a total of 1,310,051 accounts.
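The personalized random walk described above can be illustrated with a minimal sketch. It is not the sampling code used in the study: the toy graph, the function names `personalized_pagerank` and `sample_top`, and the treatment of accounts that follow nobody are all assumptions made for illustration. The sketch also assumes that the teleportation constant of 0.15 is the probability of jumping back to the seed at each step, which is the usual PageRank convention.

```python
def personalized_pagerank(follows, seed, teleport_prob=0.15, n_iter=200):
    """Approximate the PPR vector of `seed` by power iteration.

    follows: dict mapping an account to the list of accounts it follows.
    teleport_prob: ASSUMED to be the chance of jumping back to the seed.
    """
    nodes = set(follows)
    for friends in follows.values():
        nodes.update(friends)
    scores = {node: 0.0 for node in nodes}
    scores[seed] = 1.0  # the walker starts from the seed account
    for _ in range(n_iter):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            friends = follows.get(node, [])
            if friends:
                # Move along a uniformly chosen "follows" edge ...
                share = (1.0 - teleport_prob) * mass / len(friends)
                for friend in friends:
                    nxt[friend] += share
                # ... or teleport back to the seed.
                nxt[seed] += teleport_prob * mass
            else:
                # Assumption: an account that follows nobody sends the
                # walker straight back to the seed.
                nxt[seed] += mass
        scores = nxt
    return scores


def sample_top(scores, k):
    """Return the k accounts with the highest PPR scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy graph: "c" follows the seed but has no followers itself.
follows = {"seed": ["a", "b"], "a": ["b"], "b": ["seed"], "c": ["seed"]}
scores = personalized_pagerank(follows, "seed")
sampled = sample_top(scores, 3)
```

Because the walk only moves from an account to the accounts it follows, the account `c`, which nobody follows, never receives any probability mass. This mirrors the argument that accounts unable to attract followers, such as many bots, are unlikely to be sampled.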

Flock identification and interpretation
Through PPR sampling, we obtained a bipartite network consisting of accounts and those they follow.
To identify flocks, we used vintage sparse principal component analysis (VSP), a simple algorithm for sparse principal component analysis where the loadings of principal components are sparse. In this algorithm, the estimates are initialized with a low-rank singular value decomposition (SVD), then the singular vectors are rotated with an orthogonal rotation (e.g., varimax rotation) to create sparsity. VSP can effectively cluster millions of accounts in less than an hour and provide a consistent estimate of community memberships (Rohe & Zeng, 2020). In particular, we applied the two-way version of VSP to detect 100 communities among the followers and the followees. The estimated communities of followers and followees are matched, that is, the kth follower community tends to follow members in the kth followee community. Additional details about VSP and a schema of the algorithm are provided in Online Appendix SII.
For the downstream analysis, we focused on communities of followees, that is, the flocks that we track in this study. This decision is based on the assumption that these communities are more likely to be prominent Twitter opinion leaders because they are followed by the presumably influential Twitter accounts that we sampled. In this 2018 Twitter sample, the size (number of member accounts) of 100 flocks is 13,101 on average, with only four flocks smaller than 1,000 and the largest being 56,943. Unless otherwise specified in the remaining analysis, we considered the top 1,000 central accounts from each flock to control for the effect sizes of individual flocks.
The flocks were interpreted based on the profile descriptions of all members. This process was assisted by a computational approach that identifies the keywords of each flock using the best feature function (BFF), a feature selection method (Wang & Rohe, 2016). A detailed description of BFF is provided in Online Appendix SIII. BFF takes tokenized unigrams in the profile descriptions of all accounts and extracts unigrams that are most unique to one flock when compared with all other flocks. Based on the best unigrams associated with each flock, validated by the authors using the actual profile descriptions, we interpreted and named each flock. We selected 50 flocks for the downstream analysis.

Evaluating the effectiveness of flock identification
To answer RQ1 about whether the flock structure is meaningful, we evaluated the effectiveness of flock identification through shared followers (RQ1a), retweeting network (RQ1b), and stability of flock membership (RQ1c). As people project their own identities through self-expression on social media (Marwick & boyd, 2011; Fox & Warber, 2015) and attract like-minded audiences with their messages, similar accounts should attract similar followers. To estimate the pattern of shared followers among flocks, we utilized the friendship information of the accounts who followed accounts in at least one flock. Specifically, we counted the number of shared followers between any pair of accounts in the 50 selected flocks. We then aggregated individual counts into flocks' shared follower counts: for each follower, if it follows $n_i$ member accounts of flock $i$ and $n_j$ of flock $j$, then we add $n_i n_j \times 10^{-6}$ to the shared follower counter between flock $i$ and $j$. To quantify this pattern, we calculated an in-and-out ratio: the average number of shared followers by two accounts within a flock divided by the average number of shared followers by one from the flock and one outside it.
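As a rough illustration of these two quantities, the sketch below computes cross-flock shared follower counts (the product of the two per-flock counts, scaled by 10^-6) and the in-and-out ratio on invented data. The account names, flock labels, and function names are hypothetical, and the handling of within-flock pairs in the flock-level counter is not specified here, so the sketch covers pairs of distinct flocks only.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def flock_shared_counts(followers, flock_of, scale=1e-6):
    """Cross-flock shared follower counters (pairs of distinct flocks only)."""
    following = defaultdict(set)  # follower -> accounts it follows
    for account, fans in followers.items():
        for fan in fans:
            following[fan].add(account)
    counts = defaultdict(float)
    for accounts in following.values():
        n = Counter(flock_of[a] for a in accounts)  # n_i for this follower
        for i, j in combinations(sorted(n), 2):
            counts[(i, j)] += n[i] * n[j] * scale
    return dict(counts)


def in_and_out_ratio(followers, flock_of, flock):
    """Mean shared followers within a flock over mean shared across it."""
    members = [a for a in followers if flock_of[a] == flock]
    outsiders = [a for a in followers if flock_of[a] != flock]
    within = [len(followers[a] & followers[b])
              for a, b in combinations(members, 2)]
    across = [len(followers[a] & followers[b])
              for a, b in product(members, outsiders)]
    # Assumes at least one within pair and a non-zero across average.
    return (sum(within) / len(within)) / (sum(across) / len(across))


# Hypothetical data: account -> set of follower ids.
followers = {
    "m1": {1, 2, 3, 4}, "m2": {1, 2, 3, 5}, "m3": {2, 3, 4, 6},
    "p1": {3, 7, 8}, "p2": {7, 8, 9},
}
flock_of = {"m1": "media", "m2": "media", "m3": "media",
            "p1": "partisan", "p2": "partisan"}
cross_counts = flock_shared_counts(followers, flock_of)
media_ratio = in_and_out_ratio(followers, flock_of, "media")
```

A ratio well above 1, as for the toy "media" flock here, indicates that members share markedly more followers with each other than with outsiders.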
To examine the retweeting relationship, we constructed a random sample containing all tweets on Mondays from 1 October 2018 to 1 October 2019 from the 50 flocks. This yielded 30,028,074 tweets, 15,846,255 of which were retweets. We computed the percentage of retweets that occurred between member accounts of a flock, that is, the percentage of within-flock retweeting. We also used the chi-square test, after multiplicity correction with the Benjamini-Hochberg (BH) procedure, to examine the statistical relationship between an account retweeting other accounts in the sample and it retweeting other accounts in its flock.
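The two computations in this step can be sketched as follows, under stated assumptions: retweets are represented as (retweeter, original author) pairs, both accounts belong to a known flock, and the p-values passed to the BH adjustment are invented. The chi-square test itself is omitted, and the function names are illustrative rather than taken from the study.

```python
from collections import Counter


def within_flock_share(retweets, flock_of):
    """Share of each flock's retweets whose original author is in the same flock."""
    totals, within = Counter(), Counter()
    for retweeter, author in retweets:
        flock = flock_of[retweeter]
        totals[flock] += 1
        if flock_of[author] == flock:
            within[flock] += 1
    return {flock: within[flock] / totals[flock] for flock in totals}


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical accounts, flocks, and retweets.
flock_of = {"a": "brexit", "b": "brexit", "c": "media", "d": "media"}
retweets = [("a", "b"), ("a", "b"), ("b", "a"),
            ("a", "c"), ("c", "d"), ("c", "a")]
shares = within_flock_share(retweets, flock_of)

# Hypothetical raw p-values, one per test.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```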
To examine the stability of flocks, we performed another targeted sampling in August 2019, a year after we first performed the targeted sampling. We updated the seed nodes by removing inactive seeds (such as @RealAlexJones and @RichardBSpencer) and added new seeds that emerged in the 2018 data (Online Appendix SI). We compared the 100 flocks (all member accounts) identified in the 2018 sample and the 2019 sample. Stability was determined by computing the number of accounts in the 2018 sample that recurred in the 2019 sample. Fidelity was calculated by first defining a matching between two sets of 100 flocks by maximizing the total number of shared accounts (i.e., accounts that appear in both flocks) between pairs of matched flocks, whose solution was computed with the Hungarian algorithm (Kuhn, 1955). Such matching was then used to calculate the percentage of accounts that fell into the same flock among the reoccurring accounts.
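A small sketch may make the matching step concrete. For brevity it finds the overlap-maximizing matching by brute force over permutations, which is a stand-in for the Hungarian algorithm and only feasible for a handful of flocks. It also reports a single overall recurrence share rather than per-flock figures. The flock names, memberships, and function names are hypothetical.

```python
from itertools import permutations


def match_flocks(flocks_old, flocks_new):
    """One-to-one matching that maximizes the total number of shared accounts.

    Brute force stand-in for the Hungarian algorithm; assumes there are at
    least as many new flocks as old ones.
    """
    names_old = list(flocks_old)
    best, best_overlap = None, -1
    for perm in permutations(flocks_new, len(names_old)):
        overlap = sum(len(flocks_old[o] & flocks_new[n])
                      for o, n in zip(names_old, perm))
        if overlap > best_overlap:
            best, best_overlap = dict(zip(names_old, perm)), overlap
    return best


def stability_and_fidelity(flocks_old, flocks_new):
    """Share of old accounts that recur, and share of those in a matched flock."""
    matching = match_flocks(flocks_old, flocks_new)
    new_flock_of = {acct: name
                    for name, members in flocks_new.items()
                    for acct in members}
    total = recurring = same = 0
    for name, members in flocks_old.items():
        for acct in members:
            total += 1
            if acct in new_flock_of:
                recurring += 1
                same += new_flock_of[acct] == matching[name]
    # Assumes at least one account recurs.
    return recurring / total, same / recurring


# Hypothetical flock memberships in two sampling waves.
flocks_2018 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
flocks_2019 = {"X": {5, 6, 9}, "Y": {1, 2, 3, 7}}
matching = match_flocks(flocks_2018, flocks_2019)
stability, fidelity = stability_and_fidelity(flocks_2018, flocks_2019)
```

For 100 flocks, the brute force search would have to be replaced by a proper assignment solver, since the number of permutations grows factorially.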

Studying opinion expression by flocks
To answer RQ2 about whether flocks can capture the patterns of opinion expression, we first relied on the Monday tweets. Given that hashtags are semantic markers of full tweets and indicators of values and identities (Freelon et al., 2018) and elite accounts are unlikely to use hashtags in unexpected ways like hashjacking, we focused on the pattern of the 50 flocks' use of 129 hashtags that appeared over 4,000 times in our tweet sample. These hashtags were grouped into six categories. For example, hashtags presumably used by progressive accounts, like #voteblue and #bluewave, were labeled "liberal." Likewise, hashtags often used by conservative accounts, such as #tcot and #votered, were categorized as "conservative." The "Trump campaign" category included hashtags like #maga and #trumptrain presumably used by Trump supporters to rally around Trump. "QAnon" hashtags, like #qanon and #thegreatawakening, were presumably used by people holding conspiracy beliefs that deep state traitors schemed to thwart the Trump presidency. The "issue/topic" category included miscellaneous hashtags concerning political issues or topics. The remaining hashtags, mainly related to popular culture, fell under the "other" category. The concentration of hashtag use within flocks can show the relationship between flock structure and opinion expression. We also used a different approach to examine opinion expression: we treated all of an account's tweets as a single document, fitted a Latent Dirichlet allocation (LDA) model (Blei, Ng, & Jordan, 2003) with 50 topics and the Gibbs sampling (Griffiths & Steyvers, 2004), and examined the pattern between the actual topics of tweets and flocks (Online Appendix SIV).
While the Monday tweets can show general patterns of expression, they do not provide insights into how flocks respond to specific events. Therefore, we selected three events to investigate the opinion expression by 10 flocks (including more influential media flocks and less influential activist flocks) over time. The three news events were selected to balance liberal and conservative political issues: (1) the concluding phase of the Mueller investigation, (2) the passing of anti-abortion laws, and (3) the killing of Khashoggi. To obtain a low-noise set of tweets about each news event (i.e., tweets that are highly likely to be about the event), we applied restrictive search strings to retrieve content. For the concluding phase of the Mueller investigation, any tweet containing "mueller" or "russia probe" (case insensitive) or any tweet quoting another tweet containing the same terms was included, resulting in a total of 1,160,120 tweets. For the passing of anti-abortion laws, we collected a total of 261,205 tweets using the search term "abortion." Lastly, for the killing of Khashoggi, "khashoggi" yielded 151,478 tweets. A comparison of tweets using restrictive search and extensive search does not show significant difference.
In this part of the analysis, we focused on the intensity, temporal dynamics, and difference of opinion expression by flocks. To study the intensity and temporal pattern of expression, we defined a measure to assess the daily activity level of opinion expression of a Twitter account in response to a news event: the number of tweets per thousand event tweets (TPK). The measure of TPK normalizes for the total number of event tweets and is thus comparable across different news events. Given a set of Twitter accounts and their event tweets, TPK is computed in two steps. First, calculate a "per thousand event tweet scaling factor," defined as the average number of daily event tweets divided by 1,000. Second, divide the event tweet counts of individual accounts by the event scaling factor. To examine the content of expression, we identified the keywords in each flock's tweets using BFF, the same procedure used to find keywords in each flock's profile descriptions. We conducted sentiment analysis (Liu & Zhang, 2012) using the AFINN lexicon (Nielsen, 2011). Specifically, we calculated the average of each word's sentiment score, weighted by the square root of its frequency in a given set of tweets (e.g., tweets grouped by flock). The square root was taken for variance stabilization under the Poisson rate model (Bartlett, 1947).
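The two measures in this paragraph can be sketched in a few lines, with several assumptions flagged: the daily average is taken over the event tweets of the given set of accounts, tweets are tokenized by whitespace, and the three-word lexicon with its scores is an invented stand-in for AFINN rather than an excerpt from it. The function names are illustrative.

```python
from collections import Counter
from math import sqrt


def tweets_per_thousand(daily_counts):
    """TPK for each account and day.

    daily_counts: dict day -> dict account -> number of event tweets.
    """
    daily_totals = [sum(counts.values()) for counts in daily_counts.values()]
    # Step 1: per thousand event tweet scaling factor.
    scaling = (sum(daily_totals) / len(daily_totals)) / 1000
    # Step 2: divide each account's daily count by the scaling factor.
    return {day: {acct: n / scaling for acct, n in counts.items()}
            for day, counts in daily_counts.items()}


def weighted_sentiment(tweets, lexicon):
    """Average word score, weighted by the square root of word frequency."""
    freq = Counter(word
                   for tweet in tweets
                   for word in tweet.lower().split()  # naive tokenizer (assumption)
                   if word in lexicon)
    weight_sum = sum(sqrt(n) for n in freq.values())
    if weight_sum == 0:
        return 0.0  # no lexicon words found
    return sum(lexicon[w] * sqrt(n) for w, n in freq.items()) / weight_sum


# Hypothetical event tweet counts for two accounts over two days.
daily_counts = {"day1": {"a": 30, "b": 70}, "day2": {"a": 100, "b": 200}}
tpk = tweets_per_thousand(daily_counts)

# Invented mini lexicon standing in for AFINN; the scores are illustrative only.
toy_lexicon = {"good": 3, "bad": -3, "scandal": -2}
score = weighted_sentiment(["good good good good report", "bad scandal"],
                           toy_lexicon)
```

The square-root weighting keeps a single very frequent word from dominating the average: in the toy example "good" appears four times but receives a weight of 2, not 4.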

Results
Based on the observed social network in our sample of influential political Twitter users, the 100 flocks cover various social, cultural, political, and geographical entities. The full list of 100 flocks and their inter-relationship are provided in Online Appendix SV. After excluding most regional flocks, the 50 selected flocks include media flocks, partisan flocks, issue flocks, and non-political flocks (Online Appendix SVI). The media flocks span the political spectrum, such as "Mainstream Media," "Progressive Media," and "Conservative Media/pundits." The partisan flocks include partisans of different stripes, ranging from ardent Trump supporters ("The Trump Train"), traditional conservatives ("Christian Constitutionalists"), far right actors ("White Nationalists"), to liberals who opposed the Trump presidency ("The Resistance"), supported Bernie Sanders ("Bernie Bros"), and consumed media voraciously ("News Junkies"). There are also various issue-centric flocks like "#blacklivesmatter," "Brexit," and "Middle East Correspondents."

Flocks are homogeneous, interactive, and stable networks
RQ1 asks whether a flock is a homogenous, interactive, and stable network. First, member accounts of a flock demonstrate homogeneity because markedly more followers were shared by members of the same flock than members of different flocks, answering RQ1a (Figure 2a) [...] minimum of 5.52. Notably, on average an account of the "#Uniteblue" flock shared 35.2 times more followers with accounts within the flock than with accounts outside the flock; similar results hold for the "Christian Constitutionalists" and "National Political Journalists" flocks with 31.4 and 20.2 times, respectively. In addition, flocks of the same category also shared more followers (e.g., "Mainstream Media" and "National Political Journalists" under the "media" category), revealing inter-flock structure.
Second, interaction in the form of retweeting is concentrated among member accounts of a flock, showing the similarity between flocks and offline social networks where interactions are localized (RQ1b). Among the 7,379,555 retweeting relationships between accounts in the 50 flocks, on average 44.1% were between accounts within a flock, with the "Brexit" flock having as high as 80.8% of within-flock retweeting (Figure 2b). We found strong statistical evidence of an association between an account retweeting other accounts in the 50 flocks and the retweeted post originating with other accounts within its own flock (p-value $< 2.2 \times 10^{-16}$). The flocks with low levels of within-flock retweeting retweeted flocks of the same category. For example, 50.6% of retweeting by "#Uniteblue" was of other flocks under the "liberals" category and 52.1% of retweeting by "Christian Constitutionalists" was of other flocks under the "conservatives" category (Online Appendix SVII).
Third, the flock structure we identified is stable and consistent even after 1 year (RQ1c). This means that despite Twitter users' ability to freely follow additional accounts or unfollow existing ones, flock members exhibited relative consistency in the accounts they followed. On average, 60.3% (median 71.6%) of member accounts across the 100 flocks of 2018 recurred among the 100 flocks of 2019 (Figure 3a). In particular, 68 flocks in 2018 had more than half of their members reappear after 1 year, and only 18 flocks retained less than 30% of their previous year's members. Furthermore, among the 2018 accounts that reappeared in 2019, on average 75.9% fell into a matched flock (two flocks were matched if they shared more than 90% of accounts). As many as 60 flocks in 2018 matched a new flock of 2019 (Figure 3b).
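The stability measures can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names and toy data are hypothetical, and the denominator of the 90% matching rule is assumed here to be the 2018 flock's members that reappear anywhere in 2019, which the text does not specify.

```python
def recurrence_rate(members_2018, flocks_2019):
    """Share of a 2018 flock's members that reappear in any 2019 flock."""
    all_2019 = set().union(*flocks_2019.values())
    return len(members_2018 & all_2019) / len(members_2018)


def matched_flock(members_2018, flocks_2019, threshold=0.9):
    """Return the 2019 flock matched to a 2018 flock, or None.

    Assumption: a match requires that more than `threshold` of the
    recurring 2018 members fall into a single 2019 flock.
    """
    all_2019 = set().union(*flocks_2019.values())
    recurring = members_2018 & all_2019
    if not recurring:
        return None
    # Pick the 2019 flock that holds the most recurring members.
    best = max(flocks_2019, key=lambda name: len(recurring & flocks_2019[name]))
    overlap = len(recurring & flocks_2019[best]) / len(recurring)
    return best if overlap > threshold else None


# Toy data (not from the study).
members_2018 = {"a", "b", "c", "d", "e"}
flocks_2019 = {"X": {"a", "b", "c", "z"}, "Y": {"d", "q"}}

rate = recurrence_rate(members_2018, flocks_2019)
strict = matched_flock(members_2018, flocks_2019)
loose = matched_flock(members_2018, flocks_2019, threshold=0.7)
print(rate, strict, loose)
```

Here four of the five 2018 members reappear, but only three of those four land in the same 2019 flock, so the flock is unmatched under the 90% rule and matched only under a looser, purely illustrative threshold.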

Flocks predict opinion expression
RQ2 asks whether patterns of opinion expression can be captured by the flock structure. We present a subset of hashtags representing the range of patterns observed in all hashtags in Figure 4 (see the full hashtag results in Online Appendix SVIII). Overall, we found a high level of correspondence between hashtags and the flocks that used them, suggesting the predictability of opinion expression by flock membership. Hashtags presumably used by liberals appeared most frequently in liberal flocks' tweets. Similarly, hashtags indicating conservative values, allegiance to Trump, and conspiracy beliefs appeared most frequently in conservative flocks' tweets. Issue- and topic-specific hashtags behaved similarly; for instance, #syria and #iran were overwhelmingly used by "Middle East Correspondents." Hashtags even validated the distinction between similar flocks: #bernie2020 and #notmeus were nearly exclusively used by the "Bernie Bros" flock on the liberal side. These results are consistent with previous research showing the similarity between friendship ties and tweets (Aiello et al., 2012). A similar pattern can be observed in the actual topics of tweets from the flocks (Online Appendix SIV).
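The quantity shaded in the Figure 4 heat map, the percentage of active accounts in a flock that used a given hashtag, can be illustrated with a short Python sketch. The function name, the input format, the regular expression, and the toy tweets are hypothetical, and "active" is assumed to mean an account with at least one tweet in the sample.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")


def hashtag_usage(tweets, flock_of):
    """Return {(flock, hashtag): % of the flock's active accounts using it}.

    tweets: iterable of (account, text) pairs.
    flock_of: dict mapping an account to its flock label (hypothetical input).
    """
    active = defaultdict(set)  # flock -> accounts with at least one tweet
    users = defaultdict(set)   # (flock, hashtag) -> accounts that used it
    for account, text in tweets:
        flock = flock_of.get(account)
        if flock is None:
            continue  # skip accounts outside the flocks
        active[flock].add(account)
        for tag in set(HASHTAG.findall(text.lower())):
            users[(flock, tag)].add(account)
    return {key: 100 * len(accounts) / len(active[key[0]])
            for key, accounts in users.items()}


# Toy data (not from the study).
flock_of = {"u1": "Bernie Bros", "u2": "Bernie Bros",
            "u3": "Middle East Correspondents"}
tweets = [("u1", "#NotMeUs rally tonight"),
          ("u1", "more #notmeus"),
          ("u2", "good morning"),
          ("u3", "Report from #Syria and #Iran")]
usage = hashtag_usage(tweets, flock_of)
print(usage)
```

Counting accounts rather than tweets keeps a single prolific account from dominating a flock's hashtag profile, which is consistent with the account-based percentage described in the figure caption.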
Although in general "The Trump Train" and "Christian conservatives" on the conservative side and the "#Uniteblue" and "The Resistance" on the liberal side were the most active, the pattern of the 10 flocks' expression intensity varied across events. For the Mueller investigation, the three conservative flocks and the three liberal flocks were nearly equally engaged in talking about the investigation, accounting for 45.6% and 42.8% of the conversations, respectively. In "The Trump Train," member accounts had on average 97.1 relevant tweets per month (TPM) and in "The Resistance," about 70.4 TPM. However, the passing of abortion laws was primarily a conservative issue: "The Trump Train" alone accounted for 46.5% of total tweets (with 50.3 TPM), and the three conservative flocks combined tweeted 60.2% of relevant content. In contrast to the abortion laws, the killing of Khashoggi stimulated discussion mostly among "Middle East Correspondents" and the three liberal flocks.
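The two intensity measures used above, a flock's share of all relevant tweets and its average relevant tweets per member per month (TPM), can be sketched in Python as follows. The function name, inputs, and toy numbers are hypothetical, and TPM is assumed to be the flock's relevant tweet count divided by its number of member accounts and by the number of months observed.

```python
from collections import Counter


def expression_intensity(tweet_flocks, flock_sizes, months):
    """Return {flock: (share of event tweets in %, tweets per member per month)}.

    tweet_flocks: one flock label per relevant tweet.
    flock_sizes: dict mapping a flock to its number of member accounts.
    months: length of the observation window in months.
    """
    counts = Counter(tweet_flocks)
    total = sum(counts.values())
    return {
        flock: (100 * n / total, n / (flock_sizes[flock] * months))
        for flock, n in counts.items()
    }


# Toy data (not from the study): 10 relevant tweets over 3 months.
tweet_flocks = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
flock_sizes = {"A": 2, "B": 3, "C": 1}
intensity = expression_intensity(tweet_flocks, flock_sizes, months=3)
print(intensity)
```

The two measures can diverge: a large flock may account for a high share of the conversation while its members tweet relatively little on average, which is why both are reported side by side.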
Besides the level of expression intensity, the temporal pattern of expression diverges across events (Figure 5c). For the Mueller investigation, all 10 flocks were relatively in sync in terms of tweets per day, suggesting that opinion expression about the Mueller investigation was driven by key moments, such as Attorney General William Barr sending his principal conclusions of the Mueller report to Congress, the redacted version of the full report going public, Mueller contemplating not testifying, and his testimony before House committees. However, the passing of anti-abortion laws generated a completely different temporal pattern of tweets. The conservative flocks, spearheaded by "The Trump Train," had remained agitated on the abortion bans, as evidenced in their constant hyperactivity. However, the liberal flocks did not join the conversations en masse until much later, when the Alabama governor signed the most extreme abortion ban. A different pattern can be observed in the killing of Khashoggi. His disappearance first and foremost concerned "Middle East Correspondents," spreading next to "National Political Journalists" and liberal flocks. Conservative flocks, unlike their response in the other two events, reacted to this event later than other flocks.
The drastically different words used by flocks in their opinions toward each event demonstrate how opinion expression is tied to the flock context (Online Appendix SX). For the Mueller report, conservative flocks saw it as a vindication of Trump (suggested by keywords like "#maga," "trump2020") and turned their attention to Democrats ("democrats," "witch," "hunt," "obama," "hillary"). However, liberals saw it as evidence for "obstruction" of "justice" and a reason for the "impeachment" of Trump. They also called upon the public to "read" the "report" and the Department of Justice to release the full report. Responding to the anti-abortion laws, conservative flocks emphasized the sanctity of life, while liberal flocks championed women's rights. Conservatives and conservative media invoked pro-life tropes, characterized by terms like "babies," "heartbeat," "life," "murder," and "infanticide," whereas liberals and other media couched their language in legal and activism terms, like "access," "rights," "ban," and "#stopthebans." For the Khashoggi event, while "Middle East Correspondents" and "National Political Journalists" mainly focused on the event itself, the three liberal networks and the three conservative networks politicized this event. The liberal flocks tied it to Trump's and Kushner's relationships with the Saudis, while the conservatives focused on Khashoggi's alleged tie with the Muslim Brotherhood and on Obama's "mistreatment" of western journalists, and tried to channel the attention back to the Benghazi attack.

The 50 selected flocks are those whose opinion expression was studied in this article.

Figure 4 Heat map of 53 hashtags frequently used by 50 flocks. Each column corresponds to one flock, with column panels indicating flock category and column strips on the bottom indicating the category name. Each row corresponds to one hashtag, with row panels indicating the hashtag category and row strips on the left indicating the category name. The shade of color indicates the percentage of active accounts in the flock that utilized the hashtag. The bar plot above the heat map reports the number of daily tweets from each flock; the bar plot on the right reports the number of hashtags observed per million tweets collected.

Discussion
In this article, we discuss how social media public opinion can be conceptualized as naturally occurring, temporally sensitive, and inherently social opinions embedded in homogeneous networks of actors. We introduce the murmuration framework for the large-scale measurement of such opinions. This framework goes beyond the aggregate-level measurement of social media public opinion and addresses conceptual and methodological challenges like a lack of demographic representativeness and opinion manipulation. It treats flocks, which encode social network structure information, as the unit of analysis of social media public opinion. We demonstrate that flocks are meaningful units of analysis. Not only do members of a flock identified through social network structure share similar followers and interact with each other through retweeting, but their friendship ties are also relatively stable over an extended period of time. This points to the high plausibility that flocks are networked publics whose opinion expression is shaped by the social context on Twitter.
Overall, our results speak to the effectiveness of the murmuration framework in capturing the temporal and social dynamics of public opinion on social media. Our analysis of opinion expression by the flocks across a period of one year shows that opinion expression can be predicted by flock membership. Both the hashtag use and the topic emphasis of tweets display concentration among certain flocks. We further show that the murmuration framework can reveal distinct patterns of opinion intensity, temporality, and contestation when people talk about real-world events. When it comes to abortion bans, conservative flocks were the most vocal, whereas the killing of the journalist Jamal Khashoggi stimulated outspokenness from liberals. While the Mueller investigation captured the attention of all media and partisan flocks in a similar fashion, the discussion surrounding the abortion bans and the killing of Khashoggi was initially driven by conservatives and liberals, respectively. Lastly, different flocks used distinct terms to frame those events, indicating their polarized interpretations.
These social network and opinion expression patterns portray social media as the space where like-minded people are densely connected and express highly consistent messages. With abundant literature laying bare the echo chamber tendency of social media (e.g., Song & Boomgaarden, 2017; Yarchi, Baden, & Kligler-Vilenchik, 2021), such a result is not surprising. However, these patterns add nuances to our existing knowledge. Networks on Twitter that talk politics exhibited shared attention, though at varying levels of intensity, to events that might not even align with their views; and they attempted to frame those events from different angles in line with their existing values and identities. Flocks might engage in such practices not so much to convince the outside world as to invoke their core beliefs or ideological priors to defend their egos against any ideologically disruptive evidence (Katz, 1960). Alternatively, they might seize the opportunities that those high-profile events afford them to jostle for discursive power by advancing certain definitions of issues and shaping the corresponding public response (Entman, 2004; Jungherr, Posegga, & An, 2019). By identifying different flocks and examining the intensity, temporal pattern, and content of their expression, we can gain deeper insights far beyond where liberals and conservatives stand on a certain issue. This is because these flocks are segments of the population, defined not by demographic variables of questionable salience (e.g., white women aged 18-29 years), but by their online connections and response to events. As such, we can observe opinion variations within an ideological camp and opinions of people that might not be typically assumed to have an opinion on certain issues.
Moreover, we can trace how the intensity of expression and framing of issues of different flocks might be temporally related to talking points of political figures and news media attention and coverage, which can further shed light on how information and communication flow in a hybrid media system (Chadwick, 2017). These findings are in line with Chen and Tomblin's (2021) observation that social media can not only help researchers identify more nuanced opinions but also uncover opinions from fine-grained "subpopulations with specialized knowledge and unique orientations toward a subject" (p. 1).
Methodologically, this study offers one way to study public opinion that is clearly different from survey-based public opinion polls. It offers insights into public opinion that can complement discoveries based on opinion polls. Moreover, it demonstrates how the prevailing text-centric and aggregate approach to mining social media public opinion might be an inadequate measure of the construct. In particular, randomly drawn tweets surrounding a topic can come from a cacophony of opposing activists, bots, propagandists, spammers, and comedians discussing the news with cynicism and sarcasm. Relying on just texts on the aggregate level amounts to studying words from no voice and expression with no context. Our synthetic approach that combines texts and social network structure can disaggregate opinion expression and give texts fuller meaning, helping discover patterns of expression and interaction linked to social actors and their networks. As a result, we can better take advantage of social media data to study public opinion as a form of social interaction and reveal underlying social dynamics.
This approach can be increasingly relevant given the variety of actors attempting to use social media to project their voices and grow their influence, such as social movement activists (Jackson & Foucault Welles, 2016), subcultural groups (Marwick & Lewis, 2017), and foreign disinformation actors (Zhang et al., 2021). By harnessing social network structure, we can potentially identify the flocks formed by those actors (Hagen, 2020) and their followers who might be susceptible to their opinion manipulation. This helps trace the actual influence of those disparate groups of actors on social media. However, we need to balance this approach by protecting the rights and privacy of ordinary Twitter users. Though we focus on relatively high-profile public figures and opinion leaders in this article, this framework can be applied in other contexts where ordinary Twitter users might be involved. In those cases, researchers must not conduct analyses that compromise the identities of individual users and serve microtargeting purposes.
We must note that the opinions that we measure in this article belong to the elite layer of public opinion. We see this as a feature, not a limitation. Given previous studies showing the two-step flow of opinions, understanding this stratum of opinion leaders is essential. Moreover, since these opinion leaders on social media might interact with mass media, this project in its next phase will examine how social media public opinion, in terms of both intensity and content, interacts with news media attention and coverage.

Data availability
The data underlying this article are available upon request. The R code for implementation of the PPR sampling and VSP is available at https://github.com/rohelab.

Supporting information
The Supplementary data are available at JCMCOM online.