The bioinformatics wealth of nations

We quantify scientific output for bioinformatics across the world, using a range of bibliometric indices. The most prolific 40 countries generate 96% of all publications in the field, a fact also reflected in the number of citations and the country h-index. Remarkably, 30 of these countries have also been found to generate >98% of the world's top-cited publications. Smaller, productive countries attain a higher status when bibliometric indices are normalized for population size, without altering the overall picture. These 'productivity' patterns can be used for planning local, regional or international initiatives and a more effective development of the field.


SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.


Introduction
In bibliometrics, scientific output is typically measured in terms of quantity, e.g. number of publications, or quality, e.g. number of citations (Almeida et al., 2009). For individual researchers, absolute counts are regarded as sufficient, although it is well known that these numbers may vary by research field (Yang et al., 2012). The h-index, the largest number N such that N publications have each been cited at least N times (Hirsch, 2005), has also been shown to vary across scientific disciplines (Lillquist and Green, 2010). Other, more complex metrics have been devised, yet the h-index remains a widely used measure of academic 'success' or impact (Alonso et al., 2009), despite the fact that the primary metrics on which it depends are the number of publications and citations (Yong, 2014).
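The h-index definition above can be made concrete with a minimal sketch. The function name and the citation counts are illustrative assumptions, not data from this study; the logic follows the standard definition (Hirsch, 2005).

```python
def h_index(citations):
    """Return the largest h such that h publications have >= h citations each."""
    # Rank publications from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for six publications (not from the study).
example_citations = [10, 8, 5, 4, 3, 0]
print(h_index(example_citations))  # prints 4
```

The same function applies unchanged to a country: the input is then the list of citation counts for all publications carrying that country in the address field.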
To assess the standing of entire countries, similar measures are in use (Kahn, 2018). Numbers of publications, citations and the h-index have all been compared across nations, to investigate trends of scientific performance (Thelwall and Fairclough, 2017), identify the focus of research in countries, country groups or world regions (Lin et al., 2018), and monitor growth or decline patterns in research intensity (Jenab, 2016). For countries, normalization with econometric indices such as population size or gross domestic product (GDP) is usually necessary, if one needs to take into account relative, not absolute, performance (May, 1997). For large numbers such as publications or citations, this step is critical (Chasapi et al., 2019); it is less important for the h-index, which is a good measure of performance that reflects the impact of an entire country in science (Harzing and Giroud, 2014). The h-index can be compared against other measures, or used to rank-order countries in a comparative manner (Jacsó, 2009). Criticisms of the h-index, such as its variation across fields, a certain lack of discriminatory power and dependence on self-citation patterns, do not really apply to country-level statistics for a specific field, where these factors are mitigated, rendering it ideal for this type of comparison (Jacsó, 2009).
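As an illustration of how population normalization can change a ranking, the following minimal sketch computes publications per million inhabitants. The country names, publication counts and population sizes are hypothetical placeholders, not values from this study, and the function name is an assumption.

```python
def per_million(count, population):
    """Normalize an absolute count to a rate per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / (population / 1_000_000)


# Hypothetical (publications, population) pairs; not taken from the study.
countries = {
    "Country A": (12_000, 80_000_000),
    "Country B": (1_500, 5_000_000),
}

# Ranking by absolute output versus ranking by per-capita output.
ranked_absolute = sorted(countries, key=lambda c: countries[c][0], reverse=True)
ranked_relative = sorted(
    countries, key=lambda c: per_million(*countries[c]), reverse=True
)

print(ranked_absolute)  # ['Country A', 'Country B']
print(ranked_relative)  # ['Country B', 'Country A']
```

In this toy case the smaller country overtakes the larger one once output is expressed per million inhabitants, which is the kind of shift that normalization is meant to reveal.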

Materials and methods
To quantify the output of bioinformatics publications across countries, we have obtained the numbers of publications and citations and the h-index using the Web of Science (WoS) by Clarivate Analytics (formerly ISI Web of Knowledge) and a simple query, 'bioinformatics' for 'all fields' and 'country name' (slightly edited for accuracy) in the 'address' field (date: December 31, 2019; WoS Core Collection, across all years 1900-present; full list in Supplementary Table S1). This straightforward (and reproducible) query returns multiple counts for bilateral or multi-lateral collaborations. This does not affect the overall picture, as counts are kept high for the top performers and collaborations are in fact taken into account as a real component of total output (King, 2004).

Results
We have used a list of 288 countries and territories and queried WoS for publications containing the search terms, requesting the number of publications, the number of citations, the citations/publication ratio and the h-index for the returned results (four primary indices). The distribution of the h-index by rank follows an exponential decay curve, $y = 152.94\,e^{-0.0312x}$, where x is the rank of the entry and y is the h-index, with $R^2 = 0.9812$ (Supplementary Table S1). Of the 288 instances, 119 have h = 0 and 28 have h = 1 or 2; these are not further discussed (tiny countries or territories, or scientifically less active ones). The remaining 141 countries have an h-index > 2; 78 of those have an h-index > 11, just 53 of them have an h-index > 22 and 36 have an h-index > 44 (Fig. 1). The least active countries include those in the American, African and Asian tropics, as well as former Soviet republics and parts of the Middle East, which is unsurprising and consistent with previous findings (Radosevic and Yoruk, 2014). More needs to be done to establish and develop additional activity in these areas, where possible, through international collaboration (Hennemann et al., 2012). Examples of proposed activities and recommendations from our own experience for Greece and Cyprus have been provided elsewhere (Chasapi et al., 2019). Ultimately, the 'top' 78 countries generate 137 072/138 015 = 99% of the world's output in the field of bioinformatics ('all fields' in WoS query, as mentioned above).
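The figures quoted above can be checked with a short sketch. It evaluates the reported exponential fit and recomputes the share of output; it does not reproduce the fitting procedure itself, and all names are illustrative assumptions.

```python
import math

# Parameters of the reported fit: y = 152.94 * exp(-0.0312 * x).
AMPLITUDE = 152.94
DECAY_RATE = 0.0312


def fitted_h_index(rank):
    """Evaluate the reported exponential decay curve at a given rank."""
    return AMPLITUDE * math.exp(-DECAY_RATE * rank)


def share_percent(part, total):
    """Express part/total as a percentage."""
    return 100.0 * part / total


# Number of ranks over which the fitted h-index halves: ln(2) / decay rate.
halving_ranks = math.log(2) / DECAY_RATE

# Consistency check on the counts quoted in the text.
total_entries = 119 + 28 + 141  # h = 0, h = 1 or 2, h > 2

# Share of world output generated by the 'top' 78 countries.
top78_share = share_percent(137_072, 138_015)

print(total_entries)       # 288
print(round(top78_share))  # 99
```

Under the fitted curve, the predicted h-index halves roughly every 22 ranks, since ln(2)/0.0312 is slightly above 22.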
To examine whether the use of the h-index generates a certain bias as a single metric, we have further examined the top 78 countries for numbers of publications in the field and retained only those with >450 publications: this list includes 37 countries, all with an h-index ≥ 44 (>10% of the maximum: USA, h-index 427), with the exception of Iran (926 publications, h-index 37). We have also included three other entries in this list, namely Argentina (420 publications, h-index 44), Estonia (133 publications, h-index 44) and Hungary (350 publications, h-index 52) on the basis of their h-index performance (Fig. 2a). Interestingly, when a relative metric such as publications/million inhabitants is used, the resulting picture is slightly different, promoting smaller countries with high performance in terms of the number of publications per capita, such as Switzerland or Denmark (Fig. 2b; for details please refer to Supplementary Table S2). The h-index ranks of those can be examined in comparison to a group of 30 countries that produce >98% of the world's highly cited (top 1%) papers (EU15, before 2004 accession and the G8 group, 31 in total, EU excluded here) (King, 2004) and two derived, population-normalized indices (publications and h-index per million inhabitants) (Supplementary Table S2). These 40 'top'-producer countries generate 132 244/138 015 = 96% of all publications in bioinformatics, according to the WoS query (cf. 99% for the 78 countries, above; the remaining 38 have generated just 4828 publications, i.e. 3% of total, Fig. 1). The h-index ranking for bioinformatics against the ranking for the 1997-2001 contributions of the top 1% highly cited publications (arguably two independently produced sets) exhibits an astonishing similarity (Fig. 3). The rank (Spearman's rho) correlation coefficient for the two indices is 0.914 (P-value = 0), climbing to 0.964 if Greece, Iran, Italy and Russia are excluded (h-index minus top 1% rank difference > 5, Fig. 3).
Only Luxembourg (h = 27, in the top 1% list: rank 31) is missing (Table 1). Disparities between the two types of rankings may indeed arise from the significant impact of bioinformatics (Wren, 2016). It is worth noting that the 'elite' top 1% group has not changed significantly in the past 20 years, as reported recently (Bornmann et al., 2018). The correlation between the ranking of countries with the top 1% cited publications and the country h-index for bioinformatics suggests that the leading nations in science with the highest influence and impact in general are, by and large, also those most active in a highly specialized field such as bioinformatics, an expected yet hitherto unknown fact. Our findings also imply that much of the production in the field is generated by the wealthiest nations (GDP or GDP per capita, not shown), raising questions about barriers to entry, despite a wealth of opportunities for international collaboration, that will need to be addressed in the future.
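For readers who wish to reproduce a rank comparison of this kind, Spearman's rho can be computed as the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration under that standard definition; the function names and the two short rank lists are hypothetical and are not the rankings of Table 1.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ranks of five countries under two indices (not study data).
h_index_rank = [1, 2, 3, 4, 5]
top_cited_rank = [2, 1, 3, 5, 4]
rho = spearman_rho(h_index_rank, top_cited_rank)
```

Excluding entries with a large rank difference, as done above for four countries, amounts to filtering both lists before calling the function.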
As the field of bioinformatics has expanded across all of the life sciences (Ouzounis, 2012), the present analysis can form a basis upon which targeted policies for global research and training programs can be implemented, enhancing the productivity of lagging countries to align with the global activity elsewhere, where possible. Such policies might be formulated in alignment with sustainable development goals to match national priorities and perceived public views (Bain et al., 2019), while at the same time maintaining an appropriate balance between global trends and local needs (El-Chichakli et al., 2016).

Fig. 3. Bioinformatics h-index rank (blue) and top 1% highly cited papers rank (orange); lower is better. Thirty countries are listed (see Table 1, and Supplementary Table S2 for a full list of 40 countries).