The impact of a national research assessment on the publications of sociologists in Italy

This article investigates the impact of the second national research assessment (VQR 2004–10), performed in 2011 by the Italian National Agency for the Evaluation of Universities and Research Institutes, on the publication strategies of sociologists in Italy. We reconstructed all publications by Italian sociologists in Scopus between 2006 and 2015, that is, five years before and after the assessment. We also checked academic tenure and promotions during the assessment. Our results showed the potentially distorting effect of institutional signals on publications: Italian sociologists published more in journals that were considered influential for the assessment, some of which, however, were of doubtful quality. Our findings suggest that the use of informed peer review and ad hoc journal rankings could stimulate adaptive responses based on strategic journal targeting to ensure publication.


Introduction
The dominant worldwide 'publish or perish' culture in academia is often associated with regular and pervasive quantitative research assessments of scientific outputs in terms of publications. The 'meritocratic culture' of academia requires systematic and impartial assessment as a means to legitimize funding allocation on a competitive basis (Abramo and D'Angelo 2015; Jappelli et al. 2017). These assessments are often key for promotion and career advancement (Abramo et al. 2014; Edwards and Roy 2017; Nederhof 2006).
Previous research has suggested that these assessments may have serious implications for scientists' behaviour, though it is difficult to measure the effect of these institutional impulses (Rijcke et al. 2016; Zacharewicz et al. 2019). Here, context matters, and developing comprehensive and robust evaluation indicators is always difficult, also considering that research is rarely pursued individually (Waltman 2018). Once introduced, the performative nature of quantitative evaluation can also trigger gamification (Seeber et al. 2019; Baccini et al. 2019), which in turn risks distorting academic freedom, encouraging short-termism in research investment and portfolios (Hicks et al. 2015; Rijcke et al. 2016), and even nurturing an excessively competitive environment (Edwards and Roy 2017).
In this respect, Rijcke et al. (2016) suggested that metrics-based evaluation may have four negative implications. Firstly, metrics could affect the strategic behaviour of scientists, leading them to confuse goals with means. This occurs when evaluation itself becomes a scientist's goal, and research activity is expected to produce publishable outcomes (Dupps and Randleman 2012; Supak Smolčić 2013) and maximize evaluation data/metrics. This would in turn increase scientists' risk aversion and disincentivize long-term, challenging research projects (e.g. Butler (2003); see also van den Besselaar et al. (2017) for a critical re-examination, and Hicks (2017) and Gläser (2017) for a critical response). Secondly, bibliometric indicators could embody bias against interdisciplinary research, especially when combined with peer review. Thirdly, these indicators could encourage scientists to reduce task complexity and standardize collaboration (Whitley 2007), reducing the diversity of approaches, methods, and standards, while imposing rigid constraints on publication formats. Finally, these evaluation metrics may also affect academic institutions, by increasing resource allocation towards more productive institutions capable of creating cumulative advantages (Abramo and D'Angelo 2015). To reconstruct the potential effect of these complex factors on scientists' behaviour, considering research assessment in contexts where productivity, quantification, and indicators are not intrinsic to the pre-existing reward and incentive structure of academia could give useful insights.
This article aims to analyse the impact of a recent national research assessment on a sample of Italian sociologists, members of an academic community considered relatively unfamiliar with the 'cult' of productivity, quantification, and indicators (Abramo et al. 2009). The metrics for assessment here were a mix of quantitative indicators and informed peer review, which could have made the assessment even more complex, given that the effect of a journal's prestige could be mixed with reviewers' personal criteria for quality and merit. Our analysis attempts to reveal adaptive responses by academics to ambiguous institutional signals, aimed at optimizing the balance between effort and possible outcomes in the most rational way possible, in line with the criteria used in the national research assessment.
This article is organized as follows: Section 2 presents our research background and the motivation of our study. Section 3 presents the methodology and data, while Section 4 presents the results. Finally, Section 5 summarizes the main findings and discusses certain limitations of the study.

Background
Valuations, assessments, and rankings are pillars of the academic 'audit culture', especially in research, where indicators are used to promote accountability and the public governance of resources, using a mix of tools and approaches (Sauder and Espeland 2009). For instance, certain European countries have adopted different research evaluation systems, often combining bibliometric indicators and peer review (Hicks 2012; Jonkers and Zacharewicz 2016; Zacharewicz et al. 2019). In a thorough analysis of performance-based research funding systems in EU28 member countries, Zacharewicz et al. (2019) found important differences in terms of evaluation objectives (e.g. only teaching activities, or a combination of research and teaching) and the temporal dimension (e.g. ex ante or ex post assessment). They suggested that any institutional design of evaluation needs to consider contexts and specificities, at the expense of generalization and comparison.
For instance, Sivertsen (2016) discussed the 'Norwegian model', followed by various countries, such as Belgium (Flanders), Denmark, and Norway. This model has three main components: (1) a national database of all publications; (2) a single publication indicator, with weights to accommodate certain field specificities and publishing traditions; and (3) a performance-based funding model that allocates resources based on each institution's weight in the country's total publications. It aims to reflect certain differences between the humanities and other fields, such as the greater diversity of scholarly communication channels in the former, for example, national journals and books. However, Engels and Guns (2018) highlighted differences in the Flemish adaptation of the Norwegian model, such as the use of a commercially curated database for the social sciences and humanities to complement current sources (e.g. Web of Science (WoS)). As suggested by Aagaard and Schneider (2016), concentrating on the specific case of Denmark, the evolving contexts of evaluation and the multiplicity of policies do not help reflection on the fundamental drivers of the scientific performance of either institutes or academics.
The Italian case is a good example. The country has recently made progress towards a viable national research evaluation system (Hicks 2012; Jonkers and Zacharewicz 2016). After preliminary attempts (i.e. the VTR), in October 2006 the Italian government established a public independent agency, the 'Italian National Agency for the Evaluation of University and Research Institutes' (henceforth ANVUR) (ANVUR 2006), although its founding law was published only four years later with DPR 76/2010. The agency started to operate in February 2011 with the nomination of its Board of Directors. ANVUR's mission was to assess the research productivity of universities and research institutes. Its mandate was to establish more competitive criteria to improve the allocation of university budgets and to link promotion and careers to research productivity (e.g. scientific output) and scientific merit. Besides assessing research output, it also evaluated teaching, administrative performance, social impact, and student competence, covering all areas from the hard sciences to music conservatories (Benedetto et al. 2017b). ANVUR members include well-known Italian scientists from various scientific fields, appointed by the Italian Ministry of University and Research (MIUR) (Ancaiani et al. 2015; ANVUR 2013; Benedetto et al. 2017b).
The first national assessment by ANVUR was VQR 2004–10, which started with a call for participation published on 7 November 2011, essentially overlapping with the introduction of the national scientific qualification as a requirement for the promotion and careers of all academics (Abramo and D'Angelo 2015; Marzolla 2016). About 185,000 research products were evaluated across fourteen research areas. In STEM and the hard sciences, nearly 90 per cent of products were journal articles. In the humanities and social sciences, journal articles covered only 26 per cent of the total number of products, the rest being mostly books and book chapters. Academics working in research institutes were required to submit six research products, while those in universities with teaching duties were required to submit three such products for evaluation. An exception was made for younger researchers, based on the length of their employment in academia (Ancaiani et al. 2015; ANVUR 2013).
ANVUR selected 450 experts for the 'Groups of experts in evaluation' (GEVs), which reflected the structure of the disciplinary sectors that dominate the organization of Italian academia. They were asked to define the details and methodology of evaluation, including deciding whether to use a combination of peer review, publications, and bibliometric analysis, or exclusively peer review, to assess scientists' production. Under pressure from GEVs and many academics in the humanities, ANVUR decided to follow a mix of quantitative metrics, such as the number of publications, citations, and h-index in the hard sciences, and qualitative metrics, such as peer review, in the humanities and social sciences. For the latter, VQR 2004–10 complied with so-called informed peer review. Scientists were required to preselect and submit their best published research products, while VQR peer reviewers could identify each author and check the impact of the journal or the prestige of the book series in which products were published via online sources. This also occurred when products were articles previously published in peer-reviewed, authoritative journals: they were peer-reviewed again by GEV experts after publication. Each product was categorized as Excellent, Good, Acceptable, or Limited (Ancaiani et al. 2015; ANVUR 2013; Bertocchi et al. 2015).
After the publication of the evaluation results, members of ANVUR found a certain degree of agreement between peer review and bibliometric analysis (Bertocchi et al. 2015). However, other studies re-examined the situation and found inconsistencies (Baccini et al. 2020; Baccini and De Nicolao 2016), while some concluded that 'bibliometric indicators could be used only to assess hard sciences' (Abramo et al. 2009). In addition, supporters of bibliometric evaluation outlined the excessive cost of peer review (Geuna and Piolatto 2016; Zacharewicz et al. 2019), insisted on the subjective bias of peer reviewers, and argued for the higher fidelity of quantitative exercises that measured each scientist's productivity without adding selection distortion (Abramo and D'Angelo 2011a; Zacharewicz et al. 2019).
In one study, Bertocchi et al. (2015), who were members of the panel that evaluated the research output of scientists in scientific area 13 (Economics, Management, and Statistics), selected 590 random papers from the 5,681 reviewed by VQR 2004–10 experts. The GEV in this field opted for informed peer review, while peer reviewers used bibliometric indicators and journal impact indices to assess the quality of research products. They found substantial agreement between bibliometrics and peer review. However, they also suggested a potential ambiguity of these non-blind processes, as it was impossible to establish whether the close agreement was due to reviewers' opinions or to the prestige of the outlets where articles were published.
Similarly, using a subset of ANVUR data from VQR 2011–14, Traag et al. (2020) examined the agreement between bibliometrics and peer review in assessing the performance of academic institutions. They found that bibliometric and peer review outcomes converged most when considering aggregate institutions rather than individual publications, and when estimating the degree of internal agreement between reviewers. While outlining possible biases and pitfalls, they concluded that in some fields metrics could be used as an alternative to peer review in research assessment.
Besides this interesting mix of quantitative metrics and criteria, Italy is worth reviewing for other reasons too. Firstly, while the ANVUR assessment aimed to evaluate academic and research structures rather than individual scientists, the Minister, the agency, and the press also explicitly stressed the importance of productivity and quantitative indicators (Turri 2014). Secondly, ANVUR was also involved in developing common standards for the national qualification of all new associate and full professors, which linked promotion and resources to research productivity. It should be noted that these standards generated widespread debate within the scientific community and were opposed by many academics, sometimes generating contrasting outcomes (Baccini et al. 2020; Baccini and De Nicolao 2016). However, all these initiatives marked the beginning of a cultural shift in the institutional environment of Italian academia, making quantitative research assessments and bibliometric databases more popular, for example, the h-index, WoS, and Scopus (Akbaritabar et al. 2018; Bonaccorsi 2018).
The case of sociologists could also give us important insights. While the impact of the ANVUR assessment has been extensively examined for the hard sciences and for economics, management, and statistics (Abramo et al. 2009; Bertocchi et al. 2015), sociologists have not yet been considered. On the one hand, ANVUR followed the dominant opinion of academics in considering bibliometric indicators unreliable for assessing the productivity of social scientists (Benedetto et al. 2017a). On the other hand, sociologists are part of a community that includes not only qualitatively oriented academics, who are predominantly anti-bibliometric and might preferably publish in national journals, but also quantitative social scientists, whose research standards are closer to those of hard scientists, who are familiar with bibliometric indicators, and who preferably publish in international journals. This co-existence of different epistemic communities within the same discipline could make research assessment even more problematic and worthwhile to study (Akbaritabar et al. 2018). Furthermore, the differing sub-communities of Italian sociologists have been considered disconnected from each other, even though they have overlapping thematic areas of focus, with drastically diverse collaboration behaviour (Akbaritabar et al. 2020). This makes them an interesting case for seeing how these sub-communities have reacted to the ANVUR evaluation.
Another point to consider was the decision by ANVUR, under pressure from GEVs, to allow these experts to classify the quality of journals in their own field, regardless of international prestige. This added a further layer of ambiguity to the policy signal. It is worth noting here that most Italian journals with a national academic readership were considered equal to international ones, even though their internal peer review procedures, impact factors, and citation rates are not comparable. This may have led rational, adaptive scientists to minimize the risk of delaying publication by targeting national outlets included in ANVUR's selected list of journals. It is also worth considering that assessment rewards and penalties were addressed to universities and departments rather than to teams or individual scientists. This may have created incentives for less intrinsically motivated, productive scientists to minimize effort.

Data and measurements
Data were collected from Scopus in September 2016 and included all records published by Italian sociologists between 2006 and 2015. This period covered five full years before and after ANVUR, whose original call for participation was published on 7 November 2011. By considering five years before and after the call, we aimed to trace pre-existing behaviour and examine scientists' reactions to institutional policies.
We started from the available institutional data on the MIUR website and gathered a list of all sociologists currently registered in Italian universities and research institutes, that is, 1,029 scholars. We reconstructed each academic's grade (i.e. assistant, associate, or full professor), the 'scientific disciplinary sector' to which they were formally assigned, gender, affiliation, department, and finally their first and last names. Promotions and careers were reconstructed by comparing data from 2010 to 2016. This was coded as a dichotomous variable in the dataset: 'academic level changed' or 'unchanged'.
Regarding publication records, Scopus included approximately 8,698 journals in the social sciences (Scopus 2017). These include some of the sociology journals published by Italian publishers or edited by Italian sociologists, such as Sociologica, Sociologia, Rassegna Italiana di Sociologia, Salute e Società, Studi Emigrazione, Stato e Mercato, Italian Sociological Review, Journal of Modern Italian Studies, Etnografia e Ricerca Qualitativa, and Polis. This confirmed that popular and mainstream Italian journals were represented in the dataset. However, in VQR 2004–10, ANVUR categorized some of the Italian journals as 'Fascia' journals at three levels, Fascia A, Fascia B, and Fascia C (ANVUR 2012); of these fifty journals, 50 per cent were indexed in Scopus (last checked on 1 August 2018). To complete our dataset, we then coded all articles in the sample as published in Fascia (either A, B, or C) or non-Fascia journals. Note that although ANVUR's list included some prestigious journals, it excluded many renowned international journals.
Following Akbaritabar et al. (2018), we used a productivity indicator developed by Abramo and D'Angelo (2011b), which takes a micro-economic stance to measure scientists' output and examine the effect of certain institutional and structural factors. This function takes time and academic work as inputs and publications and their impact as output, while also considering the number of contributors to any given scientific publication. This index, called Fractional Scientific Strength (FSS), is defined in Equation (1):

FSS = (1/t) · Σ_{i=1}^{N} (c_i / c̄) · f_i    (1)

where:
• t: the time window between the first and last publication of each researcher. If an author's publications all appeared in a single year and they had no other publications in the sample, we considered one year as the time input.
• N: the number of the sociologist's publications.
• c_i: the number of citations collected by publication i.
• c̄: the average number of citations of all other records published in the same year.
• f_i: the inverse of the number of authors (the fractional contribution of each author to paper i).
To measure the scientists' level of international collaboration (Akbaritabar et al. 2018; Katz and Martin 1997), we used an 'internationalization index', which considers the co-authors' affiliations and countries (Leydesdorff et al. 2014): for each paper i, we calculated the number of authors with non-Italian affiliations, a_fi, over the total number of authors of the paper, a_i. We aggregated this value by averaging the internationalization scores over all N publications, as in Equation (2):

Internationalization = (1/N) · Σ_{i=1}^{N} (a_fi / a_i)    (2)
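A minimal sketch of this index, assuming each paper is represented simply as a list of its authors' affiliation country codes (our own illustrative representation, not the authors' data structure):

```python
def internationalization(papers_countries: list[list[str]],
                         home: str = "IT") -> float:
    """Average over N papers of a_fi / a_i: the share of co-authors whose
    affiliation country differs from the home country."""
    if not papers_countries:
        return 0.0
    shares = [sum(c != home for c in authors) / len(authors)
              for authors in papers_countries]
    return sum(shares) / len(shares)
```

A sociologist with one half-foreign paper and one quarter-foreign paper would thus score 0.375, while a purely domestic publication record scores 0.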

Repeated measurement model with nested membership
In order to examine the importance of institutional embeddedness and compare individuals at different institutional levels before and after ANVUR, we followed Akbaritabar et al. (2018) in assuming that each scientist was nested in different clusters, possibly influencing their scientific productivity. We considered two cluster levels: (1) the department, as the first level of organizational embeddedness, and (2) the university, relevant for power, careers, and strategic relationships, and sometimes also for research and collaboration (Abramo et al. 2016; Akbaritabar et al. 2020). This was to understand whether differences at these levels could be associated with different reactions to institutional policies. Sociologists in similar departments in different universities could be exposed to similar cultural contexts, and sociologists in different departments of the same university could be exposed to similar organizational facilities and constraints. In order to accommodate this complexity, we used a nested membership random effects structure (Baayen et al. 2008). To model the impact of ANVUR across different departments and universities, we followed previous research (Faraway 2005; Snijders and Bosker 1999; Zuur et al. 2009) and used a repeated measurements mixed effect model with a nested membership structure in departments and universities. This created 'between' (different individuals in each condition) and 'within' (same individuals before and after ANVUR) group measures that could help us understand any possible heterogeneity in individual responses. We kept the same random effects structure in each model so that important variables could be compared without excessive complications or computational inefficiencies.
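As a rough sketch of this kind of nested design, the snippet below fits a mixed model on synthetic data with statsmodels, using a random intercept per university and a variance component for departments nested inside universities. All variable names and the simulated effect size are our own assumptions for illustration; this is not the authors' actual model specification (which may have been estimated with other software):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "university": rng.integers(0, 12, n).astype(str),
    "post_anvur": np.tile([0, 1], n // 2),  # before/after indicator
})
# Departments labelled so that each is nested within one university.
df["department"] = df["university"] + "_" + pd.Series(
    rng.integers(0, 3, n)).astype(str)
# Simulate a modest positive post-assessment shift in productivity.
df["fss"] = 0.3 * df["post_anvur"] + rng.normal(0, 1.0, n)

# Random intercepts for universities; departments enter as a variance
# component nested inside each university group.
model = smf.mixedlm(
    "fss ~ post_anvur", df,
    groups="university",
    vc_formula={"department": "0 + C(department)"},
)
fit = model.fit()
```

`fit.fe_params["post_anvur"]` then estimates the overall post-ANVUR shift, while the variance components capture department- and university-level heterogeneity.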
Given the count nature of the total number of publications, and to control for the robustness of our results, we ran separate models for this variable with a negative binomial distribution (keeping the random effects structure identical to that of our other models).
To investigate the probability of publishing in 'Fascia' journals after ANVUR, we used logistic regression models with differing control variables. To further check the robustness of our analysis and results, we ran Bayesian models with the baseline and some of the main control variables, whose results are presented in the next section.
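The baseline logistic specification can be sketched as follows on simulated data, where the binary outcome is whether a paper appeared in a 'Fascia' journal and the predictor is the before/after indicator. The effect size and variable names are illustrative assumptions, not the paper's estimates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({"post_anvur": rng.integers(0, 2, n)})
# Simulate higher log-odds of a 'Fascia' publication after the assessment.
true_logit = -1.6 + 0.8 * df["post_anvur"]
df["fascia"] = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

fit = smf.logit("fascia ~ post_anvur", df).fit(disp=0)
odds_ratio = float(np.exp(fit.params["post_anvur"]))
```

A positive fitted coefficient (odds ratio above 1) corresponds to the pattern reported in the Results: a higher probability of targeting 'Fascia' journals post-ANVUR. Further control variables would simply be added to the model formula.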

Results
It is important to note that despite the vast coverage of publications in Scopus (2017), our dataset only covered a fraction of the scientific productivity of Italian sociologists (limited even further when focusing only on the five years before and after ANVUR). Indeed, only 57.53 per cent of the 1,029 sociologists had at least one publication record indexed in Scopus. Most missing records were probably published in national outlets, including book series by national publishers (Akbaritabar et al. 2018, 2020). We excluded all Italian sociologists without any publications from our analysis.
Figure 1 shows that the distribution of publications was highly skewed, with a few sociologists publishing a considerable share of the total number of publications. This is in line with previous studies, which showed that productivity is cumulative and nonlinear (Coile 1977; Nygaard 2015; Ramsden 1994). Table 1 presents a descriptive view (mean and median) of the total number of publications and the FSS of sociologists, compared over geographical regions, universities of different ranks, sectors, departments, and academic levels, and contrasting those promoted to a new academic position between 2010 and 2016 (indicated as level change). It shows that the average FSS, average publications, and median publications increased post-ANVUR, while in some cases the median FSS decreased. It is worth noting that here we only focused on the most productive sample of Italian sociologists, since those without a single publication indexed in Scopus were excluded from our dataset.
We then considered multi-level repeated measures with a nested structure for each relevant variable, such as the FSS, overall citations, internationalization, number of co-authors, and number of publications (Fig. 2). Results showed that the FSS increased significantly after ANVUR. We found a lower rate of citations after ANVUR, but this simply reflects the dependence of citations on time (Wang 2013). Unlike previous studies, which suggested that the internationalization of scientific collaborations has recently increased (Akbaritabar et al. 2018; Leydesdorff et al. 2014), we found that the internationalization of this most productive subset of Italian sociologists actually decreased after ANVUR. However, the number of co-authors increased. Note that the number of co-authorships could increase for various reasons, including the increasing appeal of team science in the social sciences too (Greene 2007; Leahey 2016). The number of publications by Italian sociologists, which we modelled with a negative binomial distribution to accommodate the count nature of the data, increased after ANVUR. Note that these were baseline (NULL) models with the same random structure as those described in the Methods section, but without any fixed effects. This was to explore general trends after ANVUR (see Supplementary Appendix Table A.5 for further detail on these models).
We then considered the effect of 'within' and 'between' groups on the FSS and the number of papers, including the academic level and its change from 2010 to 2016. We also considered each subject's scientific disciplinary sector, gender, academic level in 2010 and in 2016, promotion, university rank, geographical region, and department, which are indicated in the column names in the tables of results. The differing categories of each variable are indicated in the row names.
Table 2 shows our results with the FSS as the dependent variable. In general, the FSS of sociologists increased after ANVUR in most models, while being statistically significant only for sector, level, and level in 2010 (see the third row, i.e. Post-ANVUR). When considering within-group effects (see the × post-ANVUR rows), full professors had a significantly decreasing FSS after ANVUR only in the academic level model. We then considered the total number of publications as the dependent variable rather than the FSS, as presented in Table 3. While in all models (except the model evaluating departments and the full model) the total number of publications significantly increased after ANVUR (see the third row, i.e. Post-ANVUR), we did not find any specific within- or between-group effect. Given the count nature of the total number of publications, and to control for the robustness of our results, we ran negative binomial models (see Supplementary Appendix Table B.6). The number of publications was significantly higher for sociologists from Northern universities than for those from universities located in central regions, and for those who were full professors in 2010 than for those who were assistant professors in 2010. In the within-group comparison, full professors had a higher number of publications after ANVUR.
As previously mentioned, while ANVUR ranked some Italian journals and assigned categories such as Fascia (either Fascia A, B, or C (ANVUR 2012)), we assigned non-Fascia to all remaining Italian or international journals. Of the 2,133 papers in our sample, 350 (16.40 per cent) were Fascia articles and 1,783 (83.60 per cent) were non-Fascia articles. We used these categories in a separate set of logistic models to calculate the probability of publication in 'Fascia' versus 'non-Fascia' journals before and after ANVUR. We tried to verify whether sociologists published differently in these journals. We hypothesized that, under assessment, sociologists could be induced to target national journals more, given that these were considered influential for evaluation (i.e. being listed among the 'Fascia' journals). Table 4 shows that Italian sociologists indeed preferably published in 'Fascia' journals after ANVUR. However, this probability increased less for SPS/10 compared with SPS/07 and for sociologists working in Northern universities. This probability remained statistically significant even in our full model, after controlling for all the different variables. To further check the robustness of these results, we ran Bayesian models (Figs 3-5). Our results confirmed the trend observed in the baseline models. Considering that our sample covered only journals and book series indexed in Scopus (i.e. international and national journals, including 50 per cent of the Fascia journals), it is reasonable to suppose that other 'active' sociologists targeted journals and book series not indexed in Scopus but perhaps included among the 'Fascia' journals.

Discussion and conclusions
This article provides a quantitative analysis of the potential impact of the ANVUR VQR 2004–10 research assessment on Italian sociologists' publications. We considered sociologists worth studying, as this community involves the co-existence of different sub-communities, some closer to the more qualitative sciences, others to the hard sciences (Akbaritabar et al. 2020). Considering that the assessment followed informed peer review in a context without any prior experience or consensual evaluation standards, it was interesting to examine scientists' reactions (Abramo and D'Angelo 2011a). Indeed, besides the pros and cons of this assessment (Abramo and D'Angelo 2017; Baccini et al. 2020; Baccini and De Nicolao 2016; Franceschini and Maisano 2017), evaluating the research output of scientists over similar periods can provide a picture of the interplay between institutional pressure and endogenous patterns of behaviour. This is why, regardless of the details, the VQR assessment by ANVUR could be considered the first policy experiment in 'nudging' Italian scientists towards higher scientific productivity (Ancaiani et al. 2015; Benedetto et al. 2017a, 2017b; Bertocchi et al. 2015; Bonaccorsi 2018).
It is also worth noting that the VQR assessment embodied a certain degree of institutional ambiguity (Boffo and Moscati 1998). The methods and procedures used by ANVUR for sociologists included redundant, sometimes even contradictory, signals, for example, discriminating among publication outlets according to their quality in principle while at the same time restricting the 'Fascia' (top-quality) list and excluding many renowned international journals. This would suggest that certain endogenous properties of the Italian academic system, such as a well-established system of national and local publication outlets, possibly controlled by close colleagues (Miniaci and Pezzoni 2020), may have discouraged international publications without penalizing promotion and careers (Abramo et al. 2017). Furthermore, considering that scientists have their own personal career agendas, influenced by local issues, and that there are fixed costs and endogenous forces that constrain productivity, it is not surprising that institutional pressures were interpreted differently by different individuals. This lack of coherence between performance evaluation and career advancement (Abramo et al. 2014) has led some observers even to question whether the Italian academic system is prepared to reward any 'performance based' scheme (Bertocchi et al. 2015). Although the assessment certainly had positive implications, that is, establishing a competitive resource allocation scheme for university structures, its function of signalling the importance of high-quality research was limited, with adaptive responses that probably reflected the rigidities and constraints of short-term adjustments.
Our findings suggest that ANVUR, and specifically the VQR 2004–10 research assessment, possibly stimulated strategic behaviour among Italian sociologists towards publishing more in 'Fascia' journals. The observed trends were consistent even after controlling for multiple contextual variables and across analytical techniques. This is in line with what Rijcke et al. (2016) hypothesized as strategic behaviour, where researchers might use metrics and quantitative evaluation procedures as a signal to target outlets considered more influential in evaluation. It is probable that the ambiguity of the institutional signals of the quality of publication outlets, the artificial definition of the 'Fascia' journals, often regardless of the 'objective' prestige of outlets, and the mismatch between VQR 2004–10 and the overlapping national qualification procedures were used strategically by academics (Jonkers and Zacharewicz 2016).
Secondly, while attempts at using VQR 2004-10 to link research assessment to resource allocation at the university level were successfully pursued by MIUR, positive and negative rewards did not affect individual academics. In Italy, universities could exploit their relative degree of autonomy over careers and promotions regardless of any performance indicator of applicants. Thus, comparative studies of similar EU research systems (Sandström and Van den Besselaar 2018), in which universities are embedded in different institutional regimes (e.g. more or less autonomy and centralized/decentralized university systems), could improve our understanding of the complex interplay between institutional policies and individual behaviour (Provasi et al. 2012).
It is therefore worth noting that initiatives aligning assessment and promotions at different academic organizational levels, based on more transparent evaluation procedures, could help reduce this institutional ambiguity. Systematic research comparing trajectories and adaptations in different fields could help us understand the interplay of pre-existing endogenous forces and the differing malleability of specific communities (Akbaritabar et al. 2020). This would help to contextualize evaluation methods (e.g. bibliometrics, informed peer review and the time scale of evaluations) at least to context-specific characteristics (Zacharewicz et al. 2019), if not to individual ones (see the case of gender in Marini and Meschitti (2018); see also Abramo et al. (2014, 2015); Jonkers and Zacharewicz (2016)).
Finally, certain limitations of this study should be considered. Firstly, while Scopus covers some international and national outlets, including international book series, many Italian sociologists publish in local outlets that are not indexed in Scopus and were therefore excluded from our sample (Bertocchi et al. 2015). By including datasets that cover national publications in more detail (e.g. Google Scholar or the MIUR database), future studies could verify whether Italian sociologists reacted to the institutional pressure of ANVUR by increasing publications in these outlets. Unfortunately, as far as we know, MIUR has not made its evaluation data publicly available.
Furthermore, it is worth mentioning that Scopus is progressively expanding its coverage, with implications for the number and type of products indexed. This in turn affects the observed level of scientists' production as reflected in our and similar samples. Controlling for these changes is impossible without proper access to micro-level, retrospective data on initial journal coverage. Secondly, our analysis of the effects of competition for promotion and career on publication patterns is incomplete. We were only able to identify academics who were eventually promoted during the assessment (Abramo et al. 2014), while we did not have any data on candidates who applied for the same positions but were unsuccessful. This means that the effect of competition for promotion and career may be over-estimated in our analysis.
In addition, other exogenous factors that we could not control for might have affected scientists' behaviour. One of these was probably the introduction of the national scientific qualification as a requirement for promotion and careers (Abramo and D'Angelo 2015; Marzolla 2016), which overlapped with the establishment of ANVUR and the research assessment. Furthermore, establishing exactly when ANVUR and VQR 2004-10 started to affect academics, and estimating their reactions within a common time frame, is difficult (Hicks 2017). For instance, the tax deduction incentives that the Italian government introduced to encourage academics, and other highly skilled nationals residing abroad, to return to the country could also have affected the rate of co-authorship observed in our study. Given the simultaneous changes in multiple policies and their often complex effects on adaptive behaviour (Aagaard and Schneider 2017; Gläser 2017), our results should be interpreted cautiously and any causal claim based on our correlational research design should be avoided.
Finally, we used a nested membership random effects structure, which enabled us to study organizational contexts along with other fixed effects while ruling out random noise from individual subjects (see Baayen et al. (2008) for a discussion). However, there are other methodological options for studying the effect of policy changes on scientists' behaviour, including difference-in-differences and regression discontinuity designs (Seeber et al. 2019). These efforts might provide further insight into the effect of ANVUR on scientists' reactions.
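To illustrate the logic of one such alternative, the sketch below computes a simple difference-in-differences estimate on simulated publication counts. All group labels and numbers here are illustrative assumptions for exposition, not data from this study, and a real application would also require a credible untreated comparison group and parallel-trends checks.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated mean annual publications per author, before/after the assessment.
# "treated" = a group exposed to the VQR signal; "control" = a hypothetical
# unexposed comparison group (purely illustrative values).
pre_control = rng.normal(2.0, 0.5, 200)
post_control = rng.normal(2.3, 0.5, 200)   # common time trend only (+0.3)
pre_treated = rng.normal(2.0, 0.5, 200)
post_treated = rng.normal(2.8, 0.5, 200)   # trend (+0.3) plus effect (+0.5)

def diff_in_diff(pre_t, post_t, pre_c, post_c):
    """DiD estimate: change in the treated group minus change in the control
    group, which nets out the shared time trend."""
    return (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())

effect = diff_in_diff(pre_treated, post_treated, pre_control, post_control)
print(f"Estimated assessment effect: {effect:.2f}")
```

By construction, the estimate recovers roughly the +0.5 simulated effect while the naive before/after difference in the treated group (about +0.8) conflates the effect with the background trend, which is precisely the confound the design addresses.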

Figure 1. Distribution of the total number of publications 2006-15 (Scopus data) (X = number of publications, Y = number of authors with that many publications; 437 authors without any publications were removed from the visualization and analysis).

Figure 2. Baseline repeated measures mixed effect analysis to check the overall ANVUR effect on research productivity (different dependent variables presented).

Figure 3. Bayesian model of publications in 'Fascia' journals checking the robustness of previous models.

Table 1. Descriptive statistics of the total number of publications and FSS (BF = before ANVUR, AF = after ANVUR).

Table 2. Comparative table of repeated measures mixed effect analysis to check the overall ANVUR effect on research productivity (dependent variable: FSS).

Table 4. Logistic regression models with differing control variables on publications in 'Fascia A' journals.