The impact factor is based on citations of papers published by a scientific journal. It has been published since 1961 by the Institute for Scientific Information. It may be regarded as an estimate of the citation rate of a journal's papers, and the higher its value, the higher the scientific esteem of the journal. Although the impact factor was originally meant for comparison of journals, it is also used for assessment of the quality of individual papers, scientists and departments. For the latter a scientific basis is lacking, as we will demonstrate in this contribution.
The impact factor is a bibliometric parameter based on the number of times that papers in a particular journal are cited by all journals. It is considered a parameter of the scientific quality of a journal. We will define the impact factor more exactly and explain the details of its calculation in a later section. The Institute for Scientific Information (ISI) in Philadelphia (USA) has published the Science Citation Index (SCI) since 1961. The SCI covers all journals in the clinical and life sciences. It counts citations of individual papers based on the reference lists of all papers in journals indexed by the SCI itself, by the Social Sciences Citation Index (SSCI) and by the Arts and Humanities Citation Index (A&HCI). The SCI used to be published yearly in roughly 20 ‘telephone book’ volumes, but with the development of more powerful computers it is now released monthly, with data accumulating during the year.
The SCI was intended primarily as a bibliographic research tool for the retrieval of overlapping research, enabling scientists who worked in relative isolation to contact colleagues with comparable interests. Later it also developed into a research tool for the social sciences, and more recently administrators appear to have discovered the impact factor as a parameter of the quality of the work of (groups of) scientists.
2. Does citation reflect quality?
The assumption behind the use of the impact factor is that citation reflects quality. Efforts have been made to correlate ‘peer esteem’ with the ‘citation rate’ of selected individual authors. The ‘peer esteem’ was scored by means of questionnaires. The correlation coefficient between the two parameters varied from 0.53 for the field of physics to 0.70 for the field of biochemistry, with chemistry, psychology and sociology in between. The problem with these correlations is that the two parameters (‘peer esteem’ and ‘number of citations’) are probably not independent.
Another problem with the interpretation of the number of citations is that simple counting of citations does not take the context of the citation into account. It is obvious that citations like “we confirmed previous data of Opthof et al. …”, “by misinterpretation of their own data Opthof et al. erroneously suggest that …” or “the fraudulent work of Opthof has retarded the field of autonomic influences on heart rate for decades” constitute very different qualifications, even though each is scored as one citation.
A considerable amount of published scientific work is never cited. De Jong and Schaper analysed 137,019 papers on clinical cardiovascular science published between 1981 and 1992 by authors from the G7 countries (Canada, France, Germany, Italy, Japan, United Kingdom, USA) and from 7 smaller European countries (Belgium, Denmark, Finland, Netherlands, Norway, Sweden, Switzerland). Despite the fact that these papers had an average period of 6 years in which to be cited, 46% of them were never cited, with the best score for Norway (31% not cited) and the worst for Japan (69% not cited). These data alone do not reveal why; one possible explanation is that many publications within this field are redundant or of low quality.
Cole reported an average of 5.5 citations for all cited authors in the SCI of 1961 (Fig. GR1; left bar). This does not reflect the citation rate of all authors, because, as we saw in the previous paragraph, even over a much longer period than 1 year (1 to 12 years!) about half of the papers (and thus also many authors) are never cited. Still, if we take the citation rate of 5.5 for all cited authors in 1961 as a reference value, it is of interest that the number of citations of Nobel Prize winners in Physics who were awarded the prize between 1955 and 1965 was 58 in 1961 (Fig. GR1; middle bar). The number of citations of the subgroup of laureates who were awarded the Nobel Prize between 1962 and 1965, that is, after the count of their citations, was 62 in that same year (Fig. GR1; right bar). The latter excludes the possibility that their higher-than-average citation rate was caused by the fact that they had been awarded the Nobel Prize. These data corroborate the view that citation indeed reflects scientific quality.
3. Definition and calculation of the impact factor
The impact factor of journal X in year Y equals the average number of citations in year Y, scored in all journals, of papers published by journal X in the years (Y-1) and (Y-2). Table 1 shows the calculation of the impact factor of Journal A in 1994. The first column lists the first authors (1, 2, 3, 4, …, 557, 558, 559, 560) of papers published between January 1992 and December 1993. The second column lists the months of publication. The third column gives the number of citations of the individual papers in all SCI, SSCI and A&HCI journals. For brevity, the months of February 1992 to November 1993 have been summed. In total, these 560 authors obtained 3493 citations (Table 1), giving an impact factor of 3493/560 = 6.24 (Table 1). In fact, the ISI does not score citations to individual papers, but to journals. This yields 411 extra citations, increasing the total number of citations to 3904 and the (official ISI) impact factor of Journal A to 6.97. This discrepancy results from inaccuracies by citing authors. For example, if the name of the first author of the first paper in the Table is ‘Breithardt’ and a citing author by mistake refers to the paper as ‘Breithart’, the citation still goes to Journal A, although to the non-existent author ‘Breithart’. Interestingly, this type of mistake occurs in about 10% (411 of 3904) of the citations.
Author = listing of all papers by names of the first authors. Issue = all issues between January 1992 and December 1993. Cit. 1994 = citations of the first 4 and last 4 papers and of all issues published between January 1992 and December 1993. Impact 1994 = impact factor obtained by dividing the total number of citations by the total number of papers (560). The term ‘errors’ reflects citations that were correctly scored for Journal A, but were not correctly scored for the first author.
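The calculation above can be sketched in a few lines of code; this is a minimal illustration using the figures from Table 1 (the function and variable names are ours, not ISI's):

```python
# Sketch of the impact factor calculation: citations received in year Y
# by papers published in years Y-1 and Y-2, divided by the paper count.

def impact_factor(citations, papers):
    return citations / papers

# Figures for 'Journal A' from Table 1:
author_count = impact_factor(3493, 560)   # citations traced to individual papers
journal_count = impact_factor(3904, 560)  # ISI's lump-sum journal count

print(round(author_count, 2))   # 6.24
print(round(journal_count, 2))  # 6.97
# Share of citations that reached the journal but not the correct author:
print(round(411 / 3904, 2))     # 0.11, i.e. about 10%
```

The gap between the two counts is exactly the 411 miscited references discussed above.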
4. Use and abuse of the impact factor
It is tempting to use the impact factor as a tool for quality assessment not only for journals, but also for individual papers and for (groups of) scientists. One should be aware that time must elapse after publication of a paper before a meaningful citation analysis can be made. The latter is a major issue that makes citation analysis (which constitutes by definition an a posteriori measurement) less attractive. It is mainly for this reason that a priori systems were suggested for quality assessment. By assigning a ‘quality label’ to papers in the form of the impact factor of the journal at the time of publication, in theory a much faster quality assessment could be made. It must be emphasized, however, that the basic assumption is that an article published by a journal adequately represents the quality of the journal as a whole [1, 3, 4]. For individual parties such as (groups of) scientists, the outcome of quality assessment may have severe consequences. Therefore, analysis based on a priori determination of quality will provoke discussion on which (type of) paper will qualify for the a priori assessment. For example, how is a paper resulting from collaboration between groups graded? Do Editorials or (invited) Letters to the Editor qualify as scientific output? The following sections will discuss the suitability of a priori determination of quality for individual papers and (groups of) scientists. First we will focus on the significance of the impact factor for assessment of scientific journal quality.
4.1. Does the impact factor permit assessment of the quality of journals?
Calculation of the impact factor of a journal is performed by the ISI by counting the citations to the journal and not to the individual authors. In the example of Journal A (Table 1) we have seen that a ‘journal count’ produced an impact factor of 6.97. The total number of citations (3904) was obtained as a lump sum and it was divided by the number of papers (560). The ‘author count’ produced a lower impact factor (6.24). In the latter case the individual scores of the papers are known. Therefore it is possible to calculate not only the average, but also the standard deviation and the standard error of the mean (sem). This provides an estimate of the accuracy of the mean. For Journal A these figures were 6.24 ± 0.32 (s.e.m.). In Fig. GR2 the same has been done for another Journal B. The result was 2.69 ± 0.19. The difference between the impact factors of Journals A and B was highly significant (see legend, Fig. GR2). Therefore, it may be concluded that the impact factor indeed permits assessment of the quality of journals.
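As an illustration, the reported means and standard errors permit an approximate two-sample z-test (our choice of test for this sketch; the article does not state which test produced the significance in the legend of Fig. GR2):

```python
import math

mean_a, sem_a = 6.24, 0.32   # Journal A (n = 560 papers)
mean_b, sem_b = 2.69, 0.19   # Journal B (n = 484 papers)

# With samples this large, the difference of means divided by the combined
# standard error is approximately standard normal under the null hypothesis.
z = (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)
print(round(z, 1))  # 9.5, far beyond any conventional significance threshold
```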
4.2. Does the impact factor permit assessment of the quality of individual papers?
Fig. GR3 shows a comparison of the number of citations of individual articles in Journals A and B (the same journals as in Fig. GR2). The abscissa shows the number of citations obtained in 1994, from 0 to 9; papers with 10 or more citations were grouped into one bin. The ordinate shows the fraction of papers for each number of citations. Although the papers in Journal A were cited more frequently than those in Journal B (see also Fig. GR2), 35% of the papers in Journal A (summation of papers cited 0, 1 and 2 times) were actually cited less frequently than indicated by the impact factor of Journal B. Also, both journals published very successful papers that were cited 10 times or more, albeit at different percentages (Fig. GR3). Thus, the impact factor does not permit quality assessment of an individual paper.
4.3. Does the impact factor permit assessment of the quality of individual scientists?
Although comparison of single papers by different authors is not permitted on the basis of an a priori quality label in the form of the impact factor of the journal in which the work was published, one might argue that this does not necessarily hold for comparison of many papers of one or more authors. Fig. GR4 shows an analysis of the work of a single author over a 17-year period. The abscissa shows the impact of the journals in which the work of this author was published, expressed as citations per article of the journal per year. The ordinate shows the actual citation rate of the papers published by this author, expressed as citations per article of the author per year. On average, this author published in journals with a journal impact of 3.1, whereas his article impact was 7.0: his papers were cited more often than the average paper in the journals in which he published. Thus, one cannot use the impact factor for assessment of the quality of the work of an individual author. Fig. GR4 also shows that there was no relation between the impact of the journals and the eventual citation rate of this author's papers: the correlation coefficient was virtually zero.
4.4. Does the impact factor permit assessment of the quality of groups of scientists (departments, institutes, universities)?
To study the influence of the number of papers on a priori quality assessment, an experiment was performed. A listing of the contents of Journals A and B (Table 1, Fig. GR2 and Fig. GR3) was made. In 1992 and 1993 Journal A published 560 papers, whereas Journal B published 484 papers. By throwing dice (twice, 4 times and 6 times) particular papers were selected from the contents list and the associated number of citations in 1994 was scored. In addition, every 50th paper was scored in a separate subset. These ‘samples’ produced the 4 data points on the left in Fig. GR5 for Journals A (upper set) and B (lower set). As could be predicted from basic statistical rules, a sample size of about 15% (more than 50 papers) was needed before the ‘impact factor of the samples’ equalled the impact factor of the journals. The implication is that, even with random samples, more than 50 papers published in the two previous years are required before a priori assignment of impact factors to groups of papers can be used for quality assessment. Obviously, the papers of a group of scientists are far from a random selection. Therefore, the number of papers needed for assessment of the quality of groups of scientists is probably much larger than the number they would normally be able to produce. More research is needed to corroborate this view, but we propose that for groups of scientists, too, citation analysis is preferable to a priori labeling of papers. Possibly the a priori technique can provide a reliable estimate of the scientific quality of very large groups (such as universities).
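The sampling argument can be illustrated with a small simulation on synthetic data (the citation counts below are randomly generated with a mean near Journal A's impact factor; they are not the journals' real per-paper counts):

```python
import random

random.seed(1)
# Synthetic, skewed per-paper citation counts loosely mimicking Journal A.
population = [random.expovariate(1 / 6.24) for _ in range(560)]
true_mean = sum(population) / len(population)

# For each sample size, estimate how much the mean of a random sample of
# papers scatters around the full journal's mean citation rate.
spreads = {}
for n in (2, 6, 50, 200):
    sample_means = [sum(random.sample(population, n)) / n for _ in range(1000)]
    spreads[n] = (sum((m - true_mean) ** 2 for m in sample_means) / 1000) ** 0.5
    print(f"sample size {n:3d}: scatter of sample mean ~ {spreads[n]:.2f}")
```

The scatter shrinks roughly with the square root of the sample size, which is why small, non-random sets of papers cannot reliably be assigned the journal's impact factor.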
5. Citation bias: papers simultaneously published in more than one journal
The question of whether the impact factor reflects quality can be answered by studying cases in which the same paper is, simultaneously and on purpose, published in more than one journal. This applies to the reports of (combined) Working Groups of the European Society of Cardiology and of the American College of Cardiology or the American Heart Association [5–7]. These papers were published in the European Heart Journal and in either Circulation or the Journal of the American College of Cardiology, or in all three journals. In the consecutive years after publication they render a small but unique set of data for studying citation bias [5–7]. If scientific quality were the sole determinant of citation, the number of citations would be equal in all journals, yielding a citation ratio of 1 (Fig. GR6; left bar). If quality were unimportant and citation followed the impact factors of the journals, the citation ratio would equal the ratio of the impact factors of the journals at the time of publication (Fig. GR6; right bar). The observed citation ratio was about 1.8 (Fig. GR6; middle bar). Therefore, the quality of the paper is more important than the impact factor of the journal, but on the other hand the visibility of a journal may increase the citation rate of a paper by as much as 80% (compare the left and middle bars in Fig. GR6; the difference was significant).
6. Citation and grants
It has previously been shown that there is a poor relation between the ‘immediate past performance’ of research groups in the fields of chemistry and biology, measured by citation analysis, and the peer judgement of two Dutch ‘National Survey Committees’. Needless to say, the peer judgements of such committees have important consequences for grant applications. The lack of agreement between the two parameters in itself permits no preference between the two possible explanations. One explanation is that citation analysis simply does not reflect quality differences as peer judgement is supposed to do. The other is that peer judgement insufficiently takes into account the past performance and international position of research groups. On the basis of the previous sections we are inclined to have more confidence in the latter explanation.
7. International aspects of citation
De Jong and Schaper analysed the citation of papers published in the clinical cardiology category of the SCI (see also Section 2). The 137,019 papers were published between 1981 and 1992 by the G7 countries and 7 smaller European countries (see Section 2 for details). This type of analysis renders useful information on the success of research in clinical cardiology in those countries. Although the average number of citations of these papers over the 12 years of analysis was just below 6, there are major differences between the countries, with the USA at the top with 7.5 and Japan at the bottom with an average of only 2.0 citations per paper during 12 years (Fig. GR7). De Jong and Schaper also correlated these results with economic data on investments in research, which may further differentiate the data of Fig. GR7.
At the national level one already encounters difficulties when comparing the peer judgement of national committees with the results of citation analysis, as we saw in the previous section. Citation analysis applied to even smaller entities, such as research groups or individual scientists, should be performed with care and certainly cannot be replaced by a priori quality labeling of individual papers by means of the impact factor of scientific journals.
8. Conclusions
1. The impact factor is a valid tool for the quality assessment of scientific journals.
2. The impact factor is not valid for the assessment of the quality of individual papers.
3. The impact factor is not valid for the assessment of the quality of individual scientists.
4. The impact factor is not valid for the assessment of the quality of groups of scientists if they produce fewer than 100 papers in 2 years.
5. For quality assessment of individual papers, individual scientists and groups of scientists, citation analysis should be preferred to a priori assumptions on the quality of papers.
6. Citation analysis does not necessarily agree with peer judgement.
7. Citation analysis may render useful a posteriori information on the success of governmental and university research policy.
CAST and beyond. Implications of Cardiac Arrhythmia Suppression Trial. Task Force of the Working Group on Arrhythmias of the ESC. Circulation 1990;81:1123–1127. Eur Heart J 1990; 11:194–199.