Sílvia Majó-Vázquez, Ana S. Cardenal, Sandra González-Bailón, Digital News Consumption and Copyright Intervention: Evidence from Spain before and after the 2015 “Link Tax”, Journal of Computer-Mediated Communication, Volume 22, Issue 5, 1 September 2017, Pages 284–301, https://doi.org/10.1111/jcc4.12196
We analyze patterns of digital news consumption before and after a “link tax” was introduced in Spain. This new legislation imposed a copyright fee for showing snippets of content created by newspapers and resulted in the shutdown of Google News Spain. The Spanish copyright law is a precedent to the Copyright Directive currently submitted to the European Parliament, which is planning to impose a similar “link tax.” We offer empirical evidence that can help evaluate the impact of that sort of intervention. We analyze data tracking news consumption behavior to assess changes in audience reach and audience fragmentation. We show that the law has no discernible impact on reach, but we identify an increase in the fragmentation of news consumption.
In October 2014, the Spanish Congress approved a reformed Intellectual Property Law (IPL) regulating the creation of online content. The law, which took effect in January 2015, introduces a nonwaivable copyright fee to be paid by online aggregators for linking to content created by newspapers and publishers. The law is designed to affect any activity that relies on hyperlinks to online news content created by third parties, but makes an explicit exception for social media platforms like Facebook or Twitter (Ministerio Educación, 2014; Xalabarder, 2014). Immediate reactions to this law included Google closing down its News portal and removing Spanish media outlets from the service (Google, 2014; Rushe, 2014). Prior to this episode, similar regulatory efforts had been attempted in Belgium, France, and Germany. However, in those instances the decision to tax links was made optional. The Spanish legislation, in other words, opened the door for a more aggressive regulation of online content. An important consequence of this legislation is that it affects not only giants like Google but also smaller actors, especially “native” digital newspapers or news communities like Menéame (the Spanish version of Digg).
The negative effects of the law are already visible, according to a report commissioned by the Spanish Association of Periodical Publishers (AEEPP), and they affect both legacy and new media actors (Méndez, 2017; Posada de la Concha, Gutiérrez, & Hernández, 2015). More recent research offers a better quantification of the impact. In one study, the evidence suggests that the shutdown of Google News in Spain resulted in a decrease of daily visits to Spanish news outlets by 11% (Calzada & Gil, 2016). Another study shows a decrease of 20% in overall news consumption among Google News users in the period after the search engine decided to close its portal (Athey, Mobius & Pal, 2017). This decrease, the study finds, was concentrated around small publishers: large publishers did not see significant changes in their overall traffic as a consequence of the shutdown. This negative assessment was also echoed recently by the U.S. government in a 2016 report stating that the “link tax” requires “careful monitoring” since it threatens the success of emerging business models (USTR, 2016, p. 179). And yet, in spite of growing concerns about the impact that this sort of intervention can have on the online media environment, similar regulations are now being considered at the European level (European Commission, 2016, 2017; Schechner & Woo, 2016). As currently phrased, the legislation proposed by the EU executive could allow press publishers to charge for the reproduction of headlines or the mere indexing of their articles (Meyer, 2016).
Underlying these regulatory efforts there is an assumption that access to online information is mainly driven by the linking activity of news providers (Calzada & Gil, 2016; Posada de la Concha et al., 2015; Xalabarder, 2014; Athey, Mobius & Pal, 2017). Whether links are used to steal or boost audience flow depends on who makes the argument, i.e. legacy media or digital-born media organizations. This paper presents novel data that can help evaluate the empirical consequences of the “link tax” legislation, and determine whether the regulation of linking activity has a discernible impact on how audiences consume news online. This is an important question if we also consider the role that social media play in granting access to news. Social media were explicitly excluded from the Spanish legislation but they are becoming increasingly important as entry points to news and political information (Newman, Fletcher, Kalogeropoulos, Levy, & Nielsen, 2017).
The regulation of online content relates, more broadly, to academic discussions about how digital technologies are affecting news consumption and access to political news, which is one of the most important components of democratic life (Delli Carpini & Keeter, 1996). Early accounts of the dangers of personalized information (Katz, 1996; Resnick, 1997; Sunstein, 2009; Turow, 1998) and more recent discussions of the effects of the ‘filter bubble’ (Pariser, 2011) share a common argument: that new technologies are increasing audience fragmentation, and that this has negative consequences for society and democracy. In the background of these discussions lies a model of democratic engagement that stems from epistemic and deliberative conceptions of the public domain (Berelson, 1952; Converse, 1964; Gutmann & Thompson, 2009; Habermas, 1994). This democratic ideal puts information at the center of civic life: information becomes a vehicle for political engagement; the key to gain political knowledge; and the foundation for political action (Baum & Groeling, 2008; Delli Carpini & Keeter, 1997; Knobloch-Westerwick & Johnson, 2014; Prior, 2007; Verba & Nie, 1987). Access to a common and rich informational space is, consequently, an important democratic condition: the quality of decision-making depends on having a space for the discussion of public affairs which, in turn, requires access to political information. It is no wonder, then, that a lot of attention is being paid to how digital technologies are reshaping the public domain and reconfiguring access to news.
The extent to which the web and related technologies preserve this ideal of a public sphere (Habermas, 1994) depends on the behavior of two actors: the providers of information and the consumers of that information. The “link tax” responds, to a great measure, to the lobbying efforts of European publishers. As stated by one of the politicians helping the recent Copyright Directive advance through the channels of the European Parliament, “Newspapers help set the agenda, so politicians have to listen to them” (Scott & Clark, 2015). The economic costs of having to comply with a “link tax” are clear: they are the main reason why Google decided to withdraw its services from Spain (Google, 2014); the social news website Menéame has already stated that it will not be able to pay the costs derived from the “link tax” (Méndez, 2017); and El País, one of the legacy newspapers that promoted the law, recently withdrew its support for the regulation (Editors, 2017). But what is not so clear is how much of a difference links make to the other side of the equation, i.e. audiences, and to how they consume news and political information.
On the one hand, there is evidence that most online news consumption results from individuals visiting the home pages of their favorite news outlets, which tend to be mainstream media (Flaxman, Goel & Rao, 2016). There is also evidence that traffic to mainstream news sites from links and search engines accounts for a very small fraction of their total traffic (Allcott & Gentzkow, 2017). What this suggests is that imposing a copyright fee on links might not drastically change the way in which consumers access news online. On the other hand, there is evidence that the Google News shutdown did reduce the consumption of news, although this decrease affected mostly the smaller publishers (Calzada & Gil, 2016; Athey, Mobius & Pal, 2017).
In this paper, we conduct empirical analyses that add additional evidence to prior work and shed light on audience behavior in a unique regulatory framework. Using data from two different sources that track online browsing patterns, we show that the “link tax” did not have a significant impact on the reach of news sites, defined as the fraction of the online population accessing those sites. However, we also show that there is an increase in audience fragmentation, which we define as a reduction in the audience overlap of news media sites. Crucially, the approach we take to measuring audience fragmentation makes an important methodological improvement to how audience networks were mapped in prior work (explained in more detail in Mukerjee, Majó-Vázquez, & González-Bailón, 2017). Ultimately, our goal is to discuss whether the increase in fragmentation could result from the “link tax.” First, though, we start by contextualizing our research in the larger literature on news consumption in the digital age.
Digital News Consumption
Digital technologies have brought fundamental changes to the way people consume political information. The Internet allows citizens to have greater control over news selection; it offers a broad range of sources to keep up with political events; and it transforms the gatekeeping process so that it is no longer the monopoly of traditional media (Benkler et al., 2015; Farrell & Drezner, 2008; Groshek & Tandoc, 2017). Early theoretical accounts of these changes were, for the most part, optimistic. However, scholars soon homed in on some of the pernicious consequences of digital technologies, identified against the backdrop of traditional conceptions of the public sphere (Chaffee & Metzger, 2001; Gitlin, 2002; Prior, 2008; Sunstein, 2009; Turow, 1998). These critical accounts pay special attention to the increasing fragmentation of the online domain and the negative effects that fragmentation has on democracy.
The main claim of these negative accounts is that the fragmentation of news production provokes a fragmentation in the audience, which, in turn, weakens the foundations of a common space for the discussion of public affairs. However, this prior work usually disregards the decisions that users make on how to navigate online content; it offers, in other words, theoretical approaches to media fragmentation that do not take into consideration the consumption patterns revealed by users. Scholars who assess the behavior of online audiences have, in fact, offered evidence that contradicts the fears of fragmentation (Athey, Mobius, & Pal, 2017; Flaxman, Goel, & Rao, 2016; Gentzkow & Shapiro, 2011; Taneja, 2016; Trilling & Schoenbach, 2013; Webster & Ksiazek, 2012; Webster, 2014). These studies show that media diets are diverse and that they include the most prominent news media, which become the de facto common ground for news exposure. Determining how these two streams of evidence come together (i.e. the increasing fragmentation of news production, and the common grounds revealed by media diets) requires more research. The debate on how digital technologies shape audience behavior is, in other words, still open and unresolved.
Fragmentation in news consumption
In this study, fragmentation relates to the demand side of information. We define fragmentation not in terms of the aggregate number of media options available to consumers but in terms of how audiences distribute across those sources to obtain their news. We take a structural approach that looks at media networks and at how audiences navigate news sites. More formally, we measure fragmentation as a reduction in the number of ties capturing audience overlap between news media sites. If these ties exist it means that audiences consume news from a variety of sources; conversely, an absence of overlapping ties means that audiences self-select to consume news from a smaller range of sites.
Authors studying digital news from the supply side have overemphasized the specialization of news outlets, and assumed that the enhanced capacity for content personalization inevitably leads to increasing audience fragmentation (Chaffee & Metzger, 2001; Napoli, 2008; Sunstein, 2009). For example, Tewksbury (2005) sustains that the specialization of online media outlets leads to fragmentation because they attract homogeneous groups of users. Parallel to these studies, there is a related area of work that considers whether audience fragmentation responds to the informational structure built by news providers through their links (Ackland & Gibson, 2004; Williams, Trammell, Postelnicu, Landreville, & Martin, 2005). Theoretically, the provision of a link should be a journalistic activity: Through links, readers can trace sources of information; expand on the context that gives meaning to data; and elaborate on interpretations. But due to the crisis affecting the media industry worldwide (Pew Research Center, 2016; Newman et al., 2017), business criteria tend to dominate journalistic decisions. As a consequence, news sites rarely link to external competitors: They treat links not as journalistic objects that add value to the stories, but as economic assets or functional devices that can, in principle, keep audiences within corporate boundaries (Dimitrova, Connolly-Ahern, Williams, Kaid, & Reid, 2003; Karlsson, Clerwall, & Örnebring, 2014). The “link tax” that motivates this study, and the lobbying efforts that underlie its political sponsorship, are a good example of how business criteria dominate journalistic decisions – and of how the industry is trying to monetize emerging patterns of news consumption.
According to De Maeyer (2012), the tendency to avoid linking to outside sources shapes the network of news providers as “walled gardens” (Napoli, 2008, p. 63). This metaphor presumes that users will not venture beyond the walls established by hyperlinks. Yet again, this argument is not supported by evidence explicitly measuring the extent to which audiences respond to the connections created by media organizations. The “link tax” offers a good natural experiment to determine the impact that the existence of those links (or their absence, due to imposed copyright regulations) has on audience behavior.
The emergence of new players
These contradictory scenarios demand attention to the role that digital actors play in shaping online news consumption. Digital technologies have multiplied the number of sources that grant exposure to news – but also the means to get access to that news. News sources today can be classified into four main types: (a) legacy media (i.e. the online version of newspapers that predate the internet, or other mainstream sources, like public service broadcasters or commercial TV channels); (b) digital-born media (i.e. outlets that were born with the internet, like BuzzFeed or the Huffington Post); (c) social media platforms (e.g. Facebook, Twitter); and (d) search engines (e.g. Google). Sources (c) and (d) are mostly referral sites, which means that they direct users, through links, to sources (a) and (b). The “link tax” discussed in this paper affects the links created by news aggregators, which, depending on interpretation (and the law is ambiguous in this respect) could affect media classified as search engines (category d) or digital-born news outlets (category b). Social media (category c) were explicitly excluded from the regulation, although they are also now under public scrutiny for their role in the spread of misinformation (e.g. Brinkhurst-Cuff, 2017). In fact, Facebook recently decided to use third-party news organizations and fact checkers to fight fake news (Mosseri, 2016; Sharockman, 2016; Snopes, 2016). This decision clearly points to the distinctive role that legacy and digital-born media play in the production of news, a role that social media platforms cannot play on their own.
Figure 1 provides a ranking of the most important social media sites for news in Spain, according to the Digital News Report published by the Reuters Institute for the Study of Journalism (Newman et al., 2017). For reference, rankings are also provided for the UK and the US. As the figure suggests, Facebook is reported to be the most important social media platform to obtain news, especially in Spain. It is unclear, however, what fraction of this population also consumes news through other means, i.e. subscriptions or direct search; or whether Facebook refers those users more frequently to legacy media or digital-born outlets. Since Facebook does not produce original content (or not yet), users obtaining news through this platform are still referred to the outlets that produced the news. One caveat is that news content published directly through the Instant Articles service is hosted by Facebook and hence traffic is not referred to the website of news organizations (Goel & Ravi, 2015). As of today, there is no systematic data tracking how many users encountering news through Facebook actually click on the links to read the original content. But we do know through several sources, including the Reuters surveys and the observational data analyzed in this paper, that Facebook (and other social media) are definitely not the only sources of news online – they are not even the main source of referrals: search engines and direct access are still the most prominent sources of traffic (Flaxman, Goel, & Rao, 2016; Allcott & Gentzkow, 2017).
The question of how diverse the average media diet is can be answered by looking at two trends: changes in the audience reach of different news outlets; and changes in the audience overlap between those news sources. In this paper, we focus on categories (a) and (b) above, i.e. legacy and digital-born news outlets. We do not include social media platforms in our analyses because, as already stated, they were explicitly excluded from the Spanish “link tax” law. In addition, even if they are important entry points to news, they are still just, for the most part, referral sites. This is probably the reason why social media were excluded from the “link tax” in the first place. It is also the case that social media like Facebook have been more willing to reach agreements to share revenues with news organizations, of which the Instant Articles service is an example. In any case, most links circulating through social media are created by users themselves (which accentuates, in fact, the filter-bubble effect to a greater extent than automated feed algorithms, at least in the US, see Bakshy, Messing & Adamic, 2015; also Lazer 2015). With these caveats in mind, the following section introduces the data and the methods we use to analyze changes in media reach and audience overlap. Ultimately, our goal is to provide empirical evidence that casts light on the impact that copyright regulations have on news consumption patterns.
Data and Methods
The first step in our analyses was to compile a list with the most prominent Spanish news sites in terms of their traffic, that is, their percentage reach of the online population. We used Alexa Internet's traffic rankings and comScore audience measurement statistics to create this list. These two companies use different methodologies to create their estimates. Alexa's ranks are based on the browsing behavior of people in their global panel, which is a sample of all Internet users. The panel used by comScore, on the other hand, is restricted to users based in Spain, and their estimates refer to the Spanish online population. In spite of these differences, both rankings have a correlation of 0.86, giving us confidence that the news sites selected (legacy and digital-born) are the most influential in terms of readership. One caveat worth making is that rankings and audience metrics for the sites at the tail of the distribution (i.e. those with low traffic) are more unreliable and volatile. Since we analyze longitudinal data, a few of the sites in the original list appeared intermittently in the data. We removed these sites and, in the end, we were left with 93 sites, 36 classified as digital-born and 57 as legacy media.
Reach Data
We obtained daily reach data from Alexa for the period 1 September 2014 to 30 September 2015 (the “link tax” came into force on 1 January 2015; Google News closed on 16 December 2014). This gave us a time series with 395 observations for every news site. We aggregated these data for the two categories we are considering here: legacy and digital-born media. The purpose of analyzing these time series is to determine if there is a visible shift in the average reach of these sites after the copyright law came into force on 1 January 2015. In the light of prior research (Calzada & Gil, 2016; Athey, Mobius, & Pal, 2017), our hypothesis is that the “link tax” had a negative impact on the traffic of digital-born outlets (the younger and smaller publishers) and no impact on the more established legacy outlets. To determine if this hypothesis holds, we use time series modelling and, in particular, a technique known as intervention analysis (Cryer & Chan, 2008). The goal of this technique is to estimate if there is a statistically significant trend change after an intervention, controlling for seasonality and autocorrelation, which are typical features of time series data.
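As an illustration of the setup described above, the observation window and the step (dummy) regressor that an intervention model relies on can be sketched as follows. This is a minimal sketch and not the code used for the analyses: the function name `build_step_dummy` is hypothetical, and the sketch only builds the regressor, leaving the ARIMA fit itself to a time series package.

```python
from datetime import date, timedelta

def build_step_dummy(start, end, intervention):
    """Return (days, dummy); dummy is 0 before the intervention date and 1 from it onward."""
    n_days = (end - start).days + 1  # inclusive window
    days = [start + timedelta(days=k) for k in range(n_days)]
    dummy = [1 if d >= intervention else 0 for d in days]
    return days, dummy

# Window and intervention date taken from the text above.
days, step = build_step_dummy(date(2014, 9, 1), date(2015, 9, 30), date(2015, 1, 1))
print(len(days))  # 395 daily observations per site
print(sum(step))  # 273 days on or after 1 January 2015
```

The step regressor stays at 1 after the intervention date, so its coefficient captures a sustained shift in the level of the series rather than a one-off spike.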
Audience Overlap Data
We obtained audience overlap data from comScore. This statistic gives us information on the size of the audience that accesses any two news outlets in a given month. Audience overlap offers the building block to reconstruct networks that can then be used to analyze the cohesiveness or fragmentation of news consumption patterns, as depicted in Figure 2. Panel A contains an (idealized) example of dyadic overlap between two news sources. According to this example, 10,000 online users access both site i and site j: this is the absolute duplication, or the segment of the online population that gets exposed to these two news sources. Since media sites differ greatly in their audience reach, this raw overlap can be expressed as a percentage proportional to the number of people who read a given source, as depicted in panel B: here the network captures relative duplication and it tells us that 25% of site i's audience also reads site j, but 50% of those reading site j also read site i. This asymmetry in audience behavior is best depicted in panel C, which tells us that site j shares 25 percentage points more of its audience with site i than vice versa.
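The arithmetic behind the dyadic example can be made explicit with a short sketch. The audience sizes used here (40,000 and 20,000) are not stated in the text; they are the values implied by the 25% and 50% figures, and the function name is hypothetical.

```python
def relative_duplication(overlap, audience_i, audience_j):
    """Share of each site's audience that also visits the other site."""
    return overlap / audience_i, overlap / audience_j

# Implied audience sizes: 10,000 / 0.25 = 40,000 and 10,000 / 0.50 = 20,000.
overlap, audience_i, audience_j = 10_000, 40_000, 20_000
share_i, share_j = relative_duplication(overlap, audience_i, audience_j)
asymmetry = share_j - share_i  # panel C: gap between the two shares
print(share_i, share_j, asymmetry)  # 0.25 0.5 0.25
```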
Both the symmetrical and asymmetrical versions of this network (panels A and B) have been used in prior work (Ksiazek, 2011; Webster, 2014; Webster & Ksiazek, 2012; Taneja & Webster, 2016) following the methodology introduced by Ksiazek (2011). In this paper we use the undirected version of panel A to build our audience overlap networks, especially since we are also analyzing differences in reach through the time series. We aggregate audience data for the months of September 2014 and September 2015 – that is, at the beginning and at the end of the time series tracking changes in reach. Overall, in analyzing these networks we follow the intuition of previous work but, crucially, we propose an important improvement to the methodology, as explained in more detail elsewhere (Mukerjee, Majó-Vázquez, & González-Bailón, 2017). The analyses conducted in prior work are limited in one important respect: the construction of audience overlap networks does not eliminate overlapping ties that might result from random noise in the data or just random browsing behavior. In particular, there are two problems with the approach followed in previous studies: first, they calculate the difference between observed and expected duplication (as one would in an ANOVA table) using percentages, not frequencies, as the row and column marginals; and second, they do not test whether that difference (assuming it is meaningful) is statistically significant. Hence, prior work analyzed networks that contained substantial amounts of noise.
For a significance level of p < 0.01 (two-tailed), t values need to satisfy tij ≥ 2.58; overlapping ties with values below that threshold are eliminated as nonsignificant.
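The thresholding step can be sketched as follows. The text here gives only the critical value, so the test statistic in this sketch is an assumption made for illustration: a phi correlation between visiting site i and visiting site j, converted to a t value. The function names and the panel size of 1,000,000 users are hypothetical.

```python
import math
from statistics import NormalDist

def critical_value(alpha=0.01):
    """Two-tailed critical value under a normal approximation."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def overlap_t(n, reach_i, reach_j, overlap):
    """Assumed statistic: phi from the 2x2 visit table, then
    t = phi * sqrt(n - 2) / sqrt(1 - phi^2)."""
    num = overlap * n - reach_i * reach_j
    den = math.sqrt(reach_i * reach_j * (n - reach_i) * (n - reach_j))
    phi = num / den
    return phi * math.sqrt(n - 2) / math.sqrt(1 - phi ** 2)

def keep_tie(n, reach_i, reach_j, overlap, alpha=0.01):
    """Keep a tie only if its t value reaches the critical value."""
    return overlap_t(n, reach_i, reach_j, overlap) >= critical_value(alpha)

# Hypothetical panel of 1,000,000 users, with the audiences of the dyadic example.
print(round(critical_value(), 2))                   # 2.58
print(keep_tie(1_000_000, 40_000, 20_000, 10_000))  # True: overlap far above chance
print(keep_tie(1_000_000, 40_000, 20_000, 800))     # False: overlap equals chance level
```

In the second call the observed overlap (800) equals the overlap expected if visits to the two sites were independent (40,000 × 20,000 / 1,000,000), so the statistic is zero and the tie is dropped.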
Figure 3 illustrates how much the networks change before and after thresholding. Panel A shows that the test eliminates ∼25% of all the overlapping ties. Most importantly, the resulting networks are significantly more centralized, which is a measure we can use to capture status or asymmetry in relational structures (Freeman, 1979). In the context of these networks, the centrality of sites flags where audiences concentrate and which sources are more prominent – which offers a good metric to compare the two categories of news sources we are interested in: legacy and digital-born news media.
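For readers less familiar with the centralization measure, a minimal sketch of Freeman's degree centralization for an undirected network is given below. The toy graphs and the function name are hypothetical and only illustrate the two extremes of the measure; weighted or alternative variants of centralization would need a different normalization.

```python
def degree_centralization(adjacency):
    """Freeman degree centralization for an undirected graph
    given as {node: set_of_neighbors}."""
    n = len(adjacency)
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    max_degree = max(degrees)
    # Normalize by the star graph, the most centralized graph on n nodes.
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
clique = {k: {m for m in range(4) if m != k} for k in range(4)}
print(degree_centralization(star))    # 1.0: one hub concentrates all ties
print(degree_centralization(clique))  # 0.0: all sites are equally connected
```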
Results
Changes in Percentage Reach
Figure 4 shows the time series of the aggregated percentage reach for legacy and digital-born media. As stated above, we focus on the period September 2014 to September 2015 but the figure also shows, for illustration, the longer trends in reach. The time series show a dip that coincides with the beginning of our observation window. We do not have a substantive reason to explain this shift, but we suspect it is related to how the rise of mobile devices impacts Alexa's estimates (which rely on tracking software installed on desktops). The shorter time series we analyze are bounded by our two measurements of audience networks (assembled with comScore data and analyzed below).
The blue dots track the average reach for legacy media sites; the red triangles track the average for digital-born news outlets. The first message contained by this figure is that legacy media have consistently better rankings in audience reach: they are read by twice as many internet users for the whole time period. The second message is that the “link tax” intervention seems to have made a small difference to the preintervention trend. The fitted line plots the predicted values of a linear model that includes a dummy variable for 1 January 2015, i.e. the date when the “link tax” took effect. There is an overall decaying trend, but the line suggests a bump in the reach of legacy media and, to a lesser extent, digital-born media that coincides with the implementation of the law. Of course, this is a very rough approximation to the data and the question is whether this change of trend remains significant once we control for autocorrelation and seasonality.
Table 1 shows the results of ARIMA models fitted to the data aggregated weekly (and selected following an assessment of usual diagnostics). The estimated parameters suggest that, for both time series, the intervention does coincide with a slight increase in reach; however, as the standard errors and z values indicate, this increase does not reach the threshold of statistical significance. In other words, there is no strong evidence that the law changed patterns of news consumption – at least, as assessed through this global aggregated measure of audience reach.
| | ARIMA Model | Parameter | Estimate | SE | Z value |
|---|---|---|---|---|---|
| Legacy (weekly data) | (1,0,0) | Intercept | 3.113 | 0.231 | 13.502 |
| | | AR1 | 0.872 | 0.080 | 10.956 |
| | | Intervention | 0.017 | 0.147 | 0.910 |
| Digital-Born (weekly data) | (1,0,0) | Intercept | 1.650 | 0.127 | 12.983 |
| | | AR1 | 0.836 | 0.102 | 8.187 |
| | | Intervention | 0.059 | 0.106 | 0.579 |
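To make the significance statement concrete, the reported Z values for the intervention term can be converted into two-sided p-values. This is a sketch assuming a standard normal reference distribution; the function name is hypothetical and the Z values are the ones reported in Table 1.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Z values of the intervention term as reported in Table 1.
z_values = {"legacy": 0.910, "digital_born": 0.579}
p_values = {name: two_sided_p(z) for name, z in z_values.items()}
for name, p in p_values.items():
    print(name, round(p, 2))
```

Both p-values (roughly 0.36 and 0.56) are well above conventional thresholds such as 0.05, in line with the conclusion that the intervention term is not statistically significant.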
The analysis of the comScore data also shows that there are no substantive changes in reach, although in this case we identify some significant (albeit small) shifts. Figure 5 shows the difference in the percentage reach of news sites in 2015 compared to their reach in 2014 (panel A). There is evidence that legacy media sites increase their average reach during this year by about 0.4 percentage points, a small increase that is nonetheless statistically significant (panel C). This increase sets legacy media significantly apart from digital-born sites (panel B). Digital-born sites remain, on average, on the same reach levels but they become significantly less visible relative to legacy media sites. Overall, legacy news organizations seem to have improved in their ability to attract audiences while digital-born outlets remain on similar levels compared to 2014. It is important to remember, however, that the outliers that dominate both groups are the net winners of these trends: the least popular legacy news sites perform worse, in terms of reach, than the most popular digital-born sites.
Changes in Audience Overlap Networks
Figure 6 summarizes changes in the audience overlap networks – which, again, we use to assess changes in audience fragmentation. In particular, the figure pays attention to degree centrality, which measures the number of other sites a focal media outlet shares audience with. Another way to think of centrality in this network is in terms of public recognition: sites that are very well connected are also those that become the gravity centers of people that will then consult other, more diverse, secondary sources. Panel A shows there is a lot of variance in how central sites are in 2015 compared to 2014, and that there is a general tendency to lose overlapping ties. Both legacy and digital-born media lose audience overlap with other outlets (panel B) but this decrease is higher for digital-born media. In both cases, the decrease in centrality is statistically significant (panel C).
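The comparison of degree centrality across the two networks can be sketched as follows. The site names and edge lists are hypothetical toy data; only the logic (count each site's overlapping ties per year, then take the difference) reflects the description above.

```python
from collections import Counter

def degree(edges):
    """Number of overlap ties per site, from an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

# Hypothetical thresholded overlap networks for four sites.
ties_2014 = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
ties_2015 = [("A", "B"), ("A", "C")]

deg_2014, deg_2015 = degree(ties_2014), degree(ties_2015)
change = {site: deg_2015[site] - deg_2014[site] for site in "ABCD"}
print(change)  # negative values mean a site lost overlapping ties
```

A site that disappears from the edge list altogether (site D in 2015) gets a degree of zero, because a `Counter` returns 0 for missing keys.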
Table 2 gives some more statistics to compare these networks. Because of the lower density in the network of 2015, there is also a lower transitivity, that is, less clustering – which we interpret as additional evidence of increasing fragmentation. Still, in both networks transitivity levels are higher than those expected in random networks (the confidence intervals give the transitivity scores corresponding to the 2.5th and 97.5th percentiles of the simulated distributions, drawn from N = 1,000 random graphs that preserve the same size, density, and degree sequence). This means that, even though audience overlap goes down over time, it still reveals a clustering in how audiences behave that is not attributable to random browsing. The lack of significance of the homophily and degree correlation scores means that audiences do not self-select according to the type of media (legacy or digital-born) or to their prominence (core vs peripheral).
| Statistic | 2014 observed | 2014 bootstrapped CI | 2015 observed | 2015 bootstrapped CI |
|---|---|---|---|---|
| Number of Nodes | 93 | | 93 | |
| Number of Edges | 3189 | | 3032 | |
| Transitivity | 0.781 | (0.516, 0.536) | 0.758 | (0.502, 0.522) |
| Homophily | 0.007 | (-0.028, 0.004) | -0.018 | (-0.028, 0.005) |
| Degree Correlation | -0.143 | (-0.033, 0.035) | -0.177 | (-0.036, 0.034) |
Note: bootstrapped confidence intervals (CIs) are based on random networks (N = 1,000) that preserve the same number of nodes, edges and degree distribution of the observed networks. For the homophily statistic, CIs are based on random permutations (N = 1,000) of the node labels classifying the sites as legacy or digital-born media.
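The comparison against random networks described in the note can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the random graphs are generated with double-edge swaps (one common way to keep every node's degree fixed), it uses a small synthetic network rather than the audience data, and the function names `transitivity`, `rewire`, `adjacency`, and `null_interval` are hypothetical.

```python
import random
import statistics
from itertools import combinations

def adjacency(edges, nodes):
    """Build an undirected adjacency map from an edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def transitivity(adj):
    """Share of connected triples that are closed (3 * triangles / triples)."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        # Each triangle is counted once per corner, i.e. three times overall.
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return closed / triples if triples else 0.0

def rewire(edges, n_swaps, rng):
    """Randomize a simple graph with double-edge swaps; degrees are preserved."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def null_interval(edges, nodes, n_graphs=1000, seed=0):
    """2.5th and 97.5th percentiles of transitivity across rewired graphs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_graphs):
        shuffled = rewire(edges, 10 * len(edges), rng)
        scores.append(transitivity(adjacency(shuffled, nodes)))
    cuts = statistics.quantiles(scores, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]

# Synthetic example: five tightly knit groups of five sites joined in a ring.
nodes = list(range(25))
edges = []
for c in range(5):
    edges.extend(combinations(range(5 * c, 5 * c + 5), 2))
    edges.append((5 * c + 4, (5 * c + 5) % 25))

observed = transitivity(adjacency(edges, nodes))
lo, hi = null_interval(edges, nodes, n_graphs=200)  # 200 graphs to keep it fast
print(f"observed = {observed:.3f}, null interval = ({lo:.3f}, {hi:.3f})")
```

An observed score above the upper bound of the interval indicates more clustering than the degree sequence alone would produce. The homophily intervals would follow the same logic, except that the node labels (legacy or digital-born) are permuted while the network is held fixed.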
Discussion
Overall, our analyses do not show strong evidence that the “link tax” drastically changed the audience reach of news sites, but they show evidence of increasing fragmentation. The structure of the audience overlap network becomes sparser and more fragmented over time. A sparser and more fragmented network means that news sites share less audience, thus becoming more differentiated in who consumes their news. One possible explanation for this reduction in the number of overlapping ties is that, in the absence of aggregators like Google News, consumers do not explore the available news sources as widely, a possibility that prior research supports (Calzada & Gil, 2016; Athey, Mobius & Pal, 2017). The audience data we analyze would reflect this effect as a decreasing overlap between sites. Another explanation is that more people are consuming news through mobile devices that do not (yet) leave a trail in the comScore or Alexa panels. For the period of analysis considered here, however, smartphones and tablets still trailed behind desktops as entry points to the web (Newman et al., 2017).
One additional piece of evidence that can help discriminate between those possibilities comes in the form of referral data, as already discussed in the introduction to the paper. Figure 7 shows the sources of traffic for the news sites analyzed here, in the period following the “link tax” (and according to the Alexa global panel). As the numbers reveal, most people arrived at the news outlets either directly (by clicking on bookmarks or typing the address in the browser) or through search. Links and social media referrals amount to a small fraction of the total number of sources. In other words, the importance of links for directing online traffic might be overrated.
The figure also suggests that social media are not that prominent as gateways to news. Of course, referral data do not account for exposure to news within the platform, that is, for the number of social media users who never click on links but still read headlines and snippets from their feeds, or for those who read third-party news content hosted by social media (i.e., through the Instant Articles service). Be that as it may, social media content has not been the subject of regulation, or not yet. The “link tax” makes it more difficult for media outlets to refer to content they did not create, but the impact this regulation has on user behavior is unclear. For the time period we consider, branded legacy media are consistently the preferred source of information. There is also a range of secondary news outlets that users access to complement, not substitute, their primary sources. Given the clear and consistent evidence about this trend, both in this paper and in prior research, the question then is: why are more people not willing to pay for the news services of legacy media brands? This question is especially pertinent in Spain, where legacy media are weaker in terms of subscriptions compared to other European countries (Nicholls, Nabeelah, Nielsen, 2016). The causes that explain a weaker legacy media lie outside the scope of this paper, but it is unlikely that those causes respond only to the emergence of digital-born news outlets. The alternative hypothesis, i.e., that digital-born media emerge stronger where legacy brands are weaker, seems equally (if not more) plausible.
Future research should consider the impact of other regulatory initiatives, which are particularly aggressive in Europe, in redirecting the dynamics of news production and consumption. This policy dimension will become increasingly relevant as the impact of new technologies on the media landscape becomes more palpable. New technologies have allowed the emergence of unprecedented gateways to news consumption, and regulators are still trying to determine the extent of the transformations and whether (and how) to harness those changes, especially as they relate to shifting power structures. Modifications to intellectual property laws, as the Spanish “link tax” exemplifies, are also manifestations of a more general problem that will require more attention in the immediate future, namely the impact of technology on public life, especially as algorithms become increasingly relevant in the curation of news and access to politically relevant information. As social media sites become even more prominent in directing traffic to political news and even in hosting third-party news content, it will also be important to understand their role in the public domain, determine the extent of their accountability, and translate that knowledge into effective interventions.
Conclusion
Our analyses contribute new evidence to still open debates about how digital technologies affect access to news. We used data from two different sources to track online browsing patterns, and we analyzed the impact of a copyright regulation that taxed links to news. We found that this intervention did not have a significant impact on the reach of news sites. However, we also found an increase in audience fragmentation, defined as a reduction in the audience overlap of news media sites. Our approach to measuring audience fragmentation makes an important methodological improvement to how audience networks had been mapped in the past. We discussed whether the increase in fragmentation could result from the “link tax” or from the parallel rise of social media as sources of information. We conclude that more research is necessary to determine which factors are the most important drivers in observed changes in news consumption patterns.
Work on this paper has been partially funded by NSF grant #1729412 and by the Spanish Ministry of Economy and Competitiveness grant #CSO2013-47082-P. This work was conducted while the first author was visiting the Annenberg School for Communication at the University of Pennsylvania.
About the Author
Sílvia Majó-Vázquez is Research Fellow at the Reuters Institute for the Study of Journalism at the University of Oxford. Her research areas include news audience behavior and the role of digital-born and legacy media in European news markets. She applies tools from network science to the study of news consumption and production. She has been a research fellow at the DiMeNet group of the Annenberg School for Communication at the University of Pennsylvania and a pre-doctoral researcher at the Internet Interdisciplinary Institute (In3-UOC).
Ana S. Cardenal is an Associate Professor of Political Science in the Law and Political Science Department at the Universitat Oberta de Catalunya (UOC). She holds a Ph.D. in Political Science from the Universitat Autónoma de Barcelona (UAB), a degree in IR from Johns Hopkins University, and a B.Sc. in Journalism (UAB). She has been a visiting researcher at Stanford's Center for Latin American Studies and a Fulbright Fellow at New York University's Politics Department. Currently, she leads a research group at UOC on digital media, public opinion, and political behavior.
Sandra González-Bailón (corresponding author) is an Assistant Professor at the Annenberg School for Communication, University of Pennsylvania, where she leads the DiMeNet research group. Prior to joining Penn, she was a Research Fellow at the Oxford Internet Institute, where she is now a Research Associate. Her research lies at the intersection of network science, data mining, computational tools, and political communication. Her book Decoding the Social World is forthcoming with MIT Press (fall 2017).
Address: Annenberg School for Communication, University of Pennsylvania, 3620 Walnut Street, Philadelphia, PA 19104-6220, USA. E-mail: sgonzalezbailon@asc.upenn.edu.
Author notes
Editorial Record: First manuscript received on January 17, 2016. Revisions received on August 15, 2016 and February 27, 2017. Accepted by Noshir Contractor on July 6, 2017. Final manuscript received on July 17, 2017. First published online on September 15, 2017.