## Abstract

There are many ecological diversity measures, but their suitability for use with highly diverse bacterial communities is unclear and seldom considered. We assessed a range of species richness and evenness/dominance indices, and the use of species abundance models using samples of bacteria from zinc-contaminated and control soils. Bacteria were assigned to operational taxonomic units (OTUs) using amplified ribosomal DNA restriction analysis of 236 clones from each soil. The reduced diversity apparent in the contaminated soil was reflected by the diversity indices to varying degrees. The number of clones analysed and the weighting given to rare vs. abundant OTUs are the most important considerations when selecting measures. Our preferences, arrived at using theory and practical experience, include: the log series index alpha; the *Q* statistic (but only if coverage is 50% or more); the Berger–Parker and Simpson's indices, although their ecological relevance may be limited; and, unexpectedly, the Shannon–Wiener and Shannon evenness indices, even though their meanings may not be clear and their values inaccurate when coverage is low. For extrapolation, the equation for the log series distribution seems the best for extrapolating from OTU accumulation curves while non-parametric methods, such as Chao 1, show promise for estimating total OTU richness. Due to a preponderance of single-occurrence OTUs, none of the five species abundance models fit the OTU abundance distribution of the control soil, but both the log and log normal models fit the less diverse contaminated soil. Species abundance models are useful, irrespective of coverage, because they address the whole distribution of a sample, aiding comparison by revealing overall trends as well as specific changes in particular abundance classes.

## Introduction

Studies of bacterial diversity from environments ranging from anaerobic digesters [1] to arid soils [2] are producing ever-larger libraries of 16S rDNA clones. Whatever method is used to define operational taxonomic units (OTUs), suitable measures are required to describe and, more importantly, compare these highly diverse communities. Torsvik et al. [3] estimated that soils and sediments contain in the order of 10 000 different bacterial ‘species’. There are many ways to measure diversity. Methods vary in the particular aspect, or aspects, of diversity that they measure, their sensitivity to different abundance classes and their failings. Particular measures can be chosen to suit the goal of the study, or a suite of measures can be applied to obtain a diversity profile.

A range of diversity indices have been used with bacterial communities, in particular the ubiquitous Shannon index, the evenness indices derived from it, and Simpson's dominance index (e.g., [2,4,5]). However, the suitability of these and other measures with diverse communities of bacteria has until recently been given little consideration [6,7].

Why choose the Shannon index (*H*′) as a default? The index, which is the negative sum of each OTU's proportional abundance multiplied by the log of its proportional abundance, is a measure of the amount of information (entropy) in the system and hence is a measure of the difficulty in predicting the identity of the next individual sampled [8]. It is positively correlated with species richness and evenness and gives more weight per individual to rare than common species. Outwardly then it seems a good general measure to use with diverse communities. A closer look, however, reveals major problems.
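For reference, the verbal definition above corresponds to the formula below. The symbols $p_i$ (proportional abundance of the $i$th OTU) and $n_i$ (its number of clones) are notation assumed here for illustration; $S$ is the number of OTUs and $N$ the total number of clones in the sample.

```latex
H' = -\sum_{i=1}^{S} p_i \ln p_i , \qquad p_i = \frac{n_i}{N}
```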

Crucially, “the difficulty with this statistic is to understand its meaning”[9]. This seems to be due to *H*′ being a measure but “not in any way a probability”[9] of the difficulty in predicting the identity of the next bacterial clone. As a result, discussions are typically limited to simply pointing out that a particular sample had the highest *H*′ and hence appears to be the most diverse. But what would it mean if the *H*′ of a hypothetical soil sample fell from 4.5 to 4.1, and would it be a cause for concern? An added complication is that the value of *H*′ obtained will be an underestimate of the true value due to incomplete coverage. The size of this error will differ between samples depending on the diversity and evenness in each, and be large with small samples. The value of *H*′ will also depend upon the resolution used to define OTUs, with the highest values generally occurring when sequencing is used. Moreover, why use an index that gives extra weighting to rare OTUs? Why discriminate against the more abundant species without any knowledge of their relative importance or usefulness as bioindicators?

The uncertainty surrounding the Shannon index is exemplified by the variety of names given to it. While it is fully termed the Shannon–Wiener function [8], after being derived independently by Claude Shannon and Norbert Wiener [10,11], it is often called the Shannon–Weaver index after Warren Weaver, Shannon's co-author. Recent papers on bacterial diversity have even named it the Shannon–Weiner or Shannon–Weaner index. Sometimes there are two spellings in the same paper. Paraphrasing Oscar Wilde [12]: “to lose valuable information by relying upon an enigmatic and perhaps inappropriate index may be regarded as a misfortune, but to lose its name as well looks like carelessness”.

The Shannon index belongs to the first of three general categories used by Magurran [13] to group the various diversity measures. These are species richness indices, indices based on the proportional abundances of species (evenness/dominance measures), and species abundance models. Using Magurran as a guide, we assessed a range of measures from these three categories for their suitability with highly diverse bacterial populations. We also included additional relevant indices, one extra species abundance model (to adequately cover the breadth of distributions that may exist) and three estimators of total species abundance. The impact of zinc contamination upon bacterial diversity in an agricultural soil [14] was used as a case study.

## Test datasets and the selection of diversity measures

We used amplified ribosomal DNA restriction analysis (ARDRA) of the extractable bacterial fraction to compare the diversity of a zinc-contaminated loamy sand (400 mg kg^{−1} Zn, pH 5.7) with that of a control loamy sand (57 mg kg^{−1} Zn, pH 6.2) from a long-term sewage sludge experiment established in 1982 at ADAS Gleadthorpe. Various arable and grass crops have been grown on the site, but in the year of sampling the site was green fallow. Full details of the methods are given in Moffett et al. [14]. Briefly, the extractable bacterial fraction was obtained from 50 g fresh soil (a bulked sample of five, 0–10 cm deep cores) from each plot using a method based on Steffan et al. [15]. DNA was then extracted and purified using a method adapted from Zhou et al. [16]. Each pellet was initially ground with sand under liquid nitrogen, then incubated in a phosphate buffer containing cetyltrimethyl-ammonium bromide to which proteinase K and SDS were added. After the lysis and the removal of cell debris, proteins were removed using chloroform–isoamyl alcohol and the DNA precipitated with isopropanol. Gel electrophoresis was used to simultaneously purify the DNA further and isolate the high-molecular-mass fraction. Approximately 530 bp of the bacterial 16S rRNA gene was amplified using a 24-cycle PCR with primers 27f and 519r. It was then cloned, the cloned sequences amplified with vector-specific primers, then digested simultaneously with *Hpa*II and *Eco*RI. Fragments were separated and sized in 3% agarose gels using the molecular size marker *Hin*fI-digested ΦX174. Clones that produced a very simple fragment pattern were re-analysed by subsequently digesting them with *Mbo*I and *Eco*RI. Fragment patterns were sorted and grouped into OTUs using the number and sizes of fragments. Potential sources of error associated with the molecular methods were minimised wherever possible (see [14]).

Two hundred and thirty six clones were analysed from each soil. In the sludged control 120 OTUs were identified while 90 were found in the zinc-contaminated soil. The combined library of 472 clones contained 168 OTUs. Table 1 and Fig. 1 illustrate the distribution of clones among the OTUs in each soil.

| Number of clones per OTU | Number of OTUs (control) | Number of OTUs (zinc-contaminated) |
| --- | --- | --- |
| 1 | 82 | 52 |
| 2 | 21 | 15 |
| 3 | 5 | 7 |
| 4 | 4 | 6 |
| 5 | 1 | 2 |
| 6 | 2 | 2 |
| 7 | 1 | 1 |
| 8 | 0 | 1 |
| 9 | 0 | 1 |
| 10 | 2 | 1 |
| 13 | 1 | 0 |
| 16 | 0 | 1 |
| 24 | 1 | 0 |
| 37 | 0 | 1 |

The decrease in diversity was accompanied by a decrease in sample evenness: in the control soil 86% of OTUs (103 of 120) were represented by one or two clones whereas in the contaminated soil only 74% of OTUs (67 of 90) were ‘singletons’ or ‘doubletons’. Concurrently, the more abundant OTUs, which tended to be common to both soils, became even more abundant in the contaminated soil, compounding the fall in evenness caused by the loss of rare groups. Eight clones from the most abundant OTU were sequenced. They possessed a mean sequence similarity of 89%. All were high-G+C Gram-positive bacteria of the *Rubrobacter radiotolerans* group most closely matching environmental clones obtained from agricultural soils [5,17]. Summarising, the chronic stress imposed by zinc produced two trends: a dilution of already rare OTUs and an increase in dominance of already abundant OTUs. Diversity measures enable us to further investigate these sample differences.

## Application of the diversity measures to the test data

As outlined above, Magurran [13] grouped diversity measures into three categories: species richness indices; indices based on the proportional abundances of species, which can be sub-divided according to whether they are influenced more by changes in evenness or dominance; and species abundance models. Species abundance models vary in the evenness of the distributions they describe, and are compared with the observed distribution to determine which, if any, fit.

The Shannon index introduces a problem for classification; while it is a proportional abundance index it can also be considered primarily a richness index. It seeks to “crystallize richness and evenness into a single figure”[13]. Indeed, Magurran characterised it as a richness measure in two summary tables ([13], tables 4.4 and 4.5). Classifying *H*′ as a richness index requires that the emphasis of the second category be changed subtly to evenness and dominance indices based on proportional abundances. The general formulae of the diversity measures applied to each dataset are given in Table 2.

Unless stated in the key, formulae are from Magurran [13] who also provides worked examples of most. Common symbols are: *S*, the number of OTUs in the sample; *S**, the predicted total number of OTUs; *N*, the total number of clones in the sample; and *n*, the number of clones in an OTU. References listed in this table can also be found in the reference list [18,22,23].

Results for the first two categories using the test data from the control and zinc-polluted soils are given in Table 3 and the goodness of fit of the data to the abundance models is given in Table 4. The reduced diversity of restriction fragment length polymorphism (RFLP) seen in the zinc-contaminated soil sample was reflected by all the diversity indices. The standard deviations of the log series index and the Shannon index are provisional because not all species in the population were sampled, a major source of error especially with *H*′. Richness-based indices were somewhat more discriminating with this data than evenness/dominance indices. Within the latter group the Shannon evenness and Lloyd and Ghelardi indices, which are both more sensitive to changes in evenness, were lowered by the reduction in number of rare OTUs in the polluted soil, while the Simpson's and Berger–Parker indices, which are sensitive to changes in dominance, were raised by the further increase in abundance of already abundant OTUs.

| Diversity measure | Control | Zinc |
| --- | --- | --- |
| Species richness indices | | |
| Recorded OTUs (S) | 120 | 90 |
| OTU richness estimators (S*) | | |
| Negative exponential function | 198 | 130 |
| Two-parameter hyperbola | 323 | 197 |
| Chao 1 | 280 (51) | 180 (35) |
| Margalef index (D_{Mg}) | 21.8 | 16.3 |
| Log series index (α) | 97.7 (8.9) | 53.1 (5.6) |
| Log normal index (λ) | 409 | 257 |
| Q statistic (Q) | 74.3 | 40.5 |
| Shannon index (H′) | 4.33 (0.07) | 3.91 (0.08) |
| Evenness and dominance indices | | |
| Shannon evenness index (E) | 0.90 | 0.87 |
| Lloyd and Ghelardi index (J) | 0.96 | 0.84 |
| Simpson's index (D) | 0.020 | 0.037 |
| Berger–Parker index (d) | 0.10 | 0.16 |

Values in parentheses are standard deviations.

| Species abundance model | Control | Zinc | Comments |
| --- | --- | --- | --- |
| Log series | No (P<0.008) | Yes | “The small number of abundant species and the large proportion of ‘rare’ species (the class containing one individual [one clone per OTU] is always the largest) predicted by the log series model suggest that … it will be most applicable in situations where one or a few factors dominate the ecology of a community.” ([13], p. 18) |
| Log normal distribution^{a} | No (P<0.005) | Yes | When OTUs are grouped, using a log scale, according to the number of clones that they contain (e.g. 1, 2, 3–4, 5–8…) the number of OTUs per group is normally distributed. This is a more even distribution than the log series and will be produced by the random variation of many independent factors that influence a large and diverse community [13,24]. |
| Broken stick model | No (P<0.001) | No (P=0.05) | This distribution would be created by randomly breaking a stick into pieces, and would arise from sampling a uniform distribution. It indicates “…that some major factor is being roughly evenly apportioned among the community's constituent species…” ([24], p. 94). It is a considerably more equitable division of niche space, or a limiting resource, than either of the above models. |
| Overlapping niche model | No (P<0.001) | No (P<0.001) | This distribution would be created if a separate, equal-length stick (or piece of uncooked spaghetti) was picked for each OTU, and the size of each OTU's niche space was proportional to the length of the centre piece produced by breaking the spaghetti in two random positions. “Species are assumed to be independent of one another and … each takes what it needs” ([21], p. 27). It is the most equitable of the four models. |

Chi-square analyses were used to test the fit of the data to each model by comparing observed with expected numbers of OTUs in five frequency abundance classes (upper boundaries of 1.5, 2.5, 4.5, 8.5 and 64.5 clones per OTU).

^{a} Fitted to the truncated log normal distribution.

The distribution of OTUs found in the control sample was not closely described by any of the theoretical distributions (Table 4 and Fig. 1a). This was because it had more singleton OTUs and fewer moderately frequent OTUs (three to eight clones each) than predicted by any model. These differences were less pronounced in the zinc-contaminated soil. As a result its OTU abundance distribution fit the log series and log normal models reasonably well (Fig. 1b), but was still too uneven (i.e., still too many singletons and too few moderately frequent OTUs) for the broken stick and overlapping niche models. A judgement about whether the fit of the log series or the log normal was better would be premature since both look similar for distributions that are so strongly skewed by singleton or doubleton OTUs, a consequence of low coverage.

The diversity measures and abundance models vary in their usefulness with such diverse communities. Some are more suited for use with limited-coverage datasets while others are ‘waiting in the wings’ for use with larger samples that provide at least 50% coverage of OTUs. Some are better with ARDRA since they minimise the limitations of the method. And, most importantly, some will generally be more reliable and sensitive indicators of ecologically relevant changes in bacterial diversity. Below, we assess the characteristics and applicability of each with respect to our own case study and other data.

### Richness indices

When a library of clones is analysed and coverage is low, the observed number of species (*S*) is not a reliable index of richness since it is influenced by the evenness (i.e., the OTU abundance distribution) of the sample. *S* values can only be compared if the OTU accumulation curves of the samples show that an asymptote has been reached [25]. Gotelli and Colwell [25] note, however, that when the accumulation curves do not plateau they themselves may often be compared, but using them to assign a rank order of richness to samples is dangerous since the curves may cross once or even twice as sample size increases [25]. To obtain an estimate of the total number of OTUs in the sample (*S**) three approaches can be used ([19], see also [6]).

Firstly, species accumulation curves can be extrapolated. Two asymptotic richness estimators, the negative exponential function and the two-parameter hyperbola, fit the accumulation curves of the control and zinc-polluted soils quite well, the latter being better. Each soil's mean accumulation curve was obtained by randomising the input order of clones 100 times, but it can more easily be derived by rarefaction using the RAREFACT.FOR programme (C.J. Krebs, Dept. of Zoology, University of British Columbia [http://www.biology.ualberta.ca/jbrzusto/rarefact.php]).
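A mean accumulation curve of this kind can also be approximated analytically. The sketch below is not the RAREFACT.FOR programme or the authors' randomisation procedure; it assumes Hurlbert's rarefaction formula and takes the OTU frequencies from Table 1. The function name `rarefied_richness` and the dictionary layout are illustrative choices.

```python
from math import comb

# Table 1 as {clones per OTU: number of OTUs}
CONTROL = {1: 82, 2: 21, 3: 5, 4: 4, 5: 1, 6: 2, 7: 1, 10: 2, 13: 1, 24: 1}
ZINC = {1: 52, 2: 15, 3: 7, 4: 6, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1, 16: 1, 37: 1}


def rarefied_richness(freq, n):
    """Expected number of OTUs in a random subsample of n clones."""
    total = sum(size * count for size, count in freq.items())
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the number of clones")
    denom = comb(total, n)
    # An OTU holding `size` clones is missed entirely with probability
    # C(total - size, n) / C(total, n); comb() returns 0 when n > total - size.
    return sum(count * (1 - comb(total - size, n) / denom)
               for size, count in freq.items())


# Points along each soil's expected accumulation curve
for n in (50, 100, 150, 200, 236):
    print(n, round(rarefied_richness(CONTROL, n), 1), round(rarefied_richness(ZINC, n), 1))
```

At the full sample size the function returns the observed number of OTUs, which is a convenient sanity check before comparing curves at smaller, equal sample sizes.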

Secondly, parametric models can be used. Fitting the truncated log normal model to each dataset generated estimates of the total number of OTUs for each soil. However, as discussed below, the method is unreliable with such incomplete distributions and indeed produced unrealistically low values of 140 for the control and 108 for the polluted soil (see Fig. 2). The equation for the log series distribution, which is non-asymptotic, described the sampling distributions very well (Fig. 2). This is not unexpected since May [24] points out that when relatively small samples are taken from a large area or, as in this case, volume, the species accumulation curve will likely obey the log series. The curve was produced using *S* = *α* ln(1 + *βN*) ([24], eq. 6.1), adjusting *α* and *β* to obtain a line of best fit. It also closely fitted the accumulation curves from two other studies of bacterial diversity ([1], B.F. Moffett, unpublished data). A non-asymptotic method may inherently be more appropriate in soil, due to the continuity and range of the soil's spatial and temporal gradients. By not providing a value for *S** it also removes the temptation of prematurely estimating total richness with modest samples. Over intermediate ranges, up to perhaps twice the number of sampled clones, it should accurately predict the number of new OTUs that would be found.
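As a rough illustration of how the log series equation is used for extrapolation, the sketch below evaluates it at the sampled size and at twice that size. The fitted values of *α* and *β* are not reported in the text, so the parameters here are assumptions: *α* is the log series index of the control soil from Table 3 and *β* is set to 1/*α*, the value implied by Fisher's log series.

```python
from math import log


def log_series_otus(n_clones, alpha, beta):
    """OTUs expected after n_clones clones under S = alpha * ln(1 + beta * N)."""
    return alpha * log(1 + beta * n_clones)


# Assumed, illustrative parameters (not the authors' fitted values)
alpha = 97.7
beta = 1 / alpha

for n in (236, 472):
    print(n, round(log_series_otus(n, alpha, beta)))
```

With least-squares fitting of both parameters to an observed accumulation curve the predictions would differ somewhat from this one-parameter shortcut.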

The third group of estimators are non-parametric [19,26]. These were designed for use with very diverse samples and use the frequency of occurrence of rarer OTUs to derive *S**. Chao 1 uses singletons and doubletons while the abundance-based coverage estimator (ACE) uses OTUs with one to 10 clones each. When used with diverse samples they provide a lower limit of total diversity [25]. Both are readily calculated using the EstimateS programme (version 5; R.K. Colwell, Dept. of Ecology and Evolutionary Biology, University of Connecticut [http://viceroy.eeb.uconn.edu/estimates]). They have recently been investigated in detail by Hughes et al. [6] who concluded that they “show particular promise for microbial data and in some habitats may require sample sizes of only 200 to 1,000 clones to detect richness differences of only tens of species”. Hughes et al. [6] found that Chao 1 stabilised as sample size increased in the three dissimilar bacterial data sets tested. By contrast, ACE did not plateau. We tested Chao 1 in the same way and found that *S** stabilised with the control soil but not with the zinc-contaminated soil. ACE did not stabilise with either.
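The Chao 1 estimates in Table 3 can be reproduced by hand from the singleton and doubleton counts in Table 1. The sketch below assumes the classic form of the estimator, *S** = *S* + *F*₁²/(2*F*₂), where *F*₁ and *F*₂ (names introduced here) are the numbers of singleton and doubleton OTUs; it does not compute the standard deviations, and it is not the EstimateS implementation.

```python
def chao1(observed_otus, singletons, doubletons):
    """Classic Chao 1 estimate of total richness: S* = S + F1^2 / (2 * F2)."""
    if doubletons == 0:
        raise ValueError("the classic form is undefined without doubletons")
    return observed_otus + singletons ** 2 / (2 * doubletons)


print(round(chao1(120, 82, 21)))  # control soil -> 280
print(round(chao1(90, 52, 15)))   # zinc-contaminated soil -> 180
```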

The total number of OTUs predicted in the zinc and control samples by all the estimators was a few hundred; lower than expected. We assumed that this was caused by the limited resolution of ARDRA. However, in Hughes et al.'s [6] estimation of bacterial diversity in two grassland soils analysed by McCaig et al. [5], where OTUs were defined with greater resolution, by a 3% difference in partial 16S rDNA sequences, total estimated richness was still in the hundreds, at 467 and 590. Where have all the OTUs gone (see [3])? Only the analysis of larger samples will resolve this intriguing finding.

Two general OTU richness indices, the Margalef index (*D*_{Mg}) and the log series index (*α*) require only *S* and *N* for calculation. *D*_{Mg} has a linear response to a change in *S*/*N* ratio. When equal-sized samples are compared, as should be done when coverage is low, *D*_{Mg} will closely correlate with the *S*/*N* ratio, effectively making it redundant (the same applies for the very similar Menhinick's index (see [13])). *α* is a fitted constant in the sequential equations of the log series that predict the number of OTUs per abundance category. It is easy to calculate and responds approximately exponentially to changes in *S*/*N* ratio, making it much more sensitive than *D*_{Mg} to changes in OTU richness. It also stabilised more with increasing sample size (Fig. 3). In particular, the relative difference between samples (their ratio) was very stable above about 40 clones. One drawback with *D*_{Mg} and *α* is that they do not detect changes in evenness if the overall *S*/*N* ratio remains the same. While this potential blindness should be kept in mind it is likely that any change in evenness will also alter the value of *S* in samples from a diverse population. Based on its performance with various datasets Magurran [13] recommended *α* as a possible universal diversity statistic. Krebs [8] also recommended it as a useful index even for populations that do not possess an exponential distribution but noted that there is a disagreement about its usefulness due to variability in its goodness of fit to data and whether it has a sound theoretical base (see [24]).
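Because both indices need only *S* and *N*, they are straightforward to compute. The sketch below assumes the standard definitions, *D*_{Mg} = (*S* − 1)/ln *N* and *α* as the solution of *S* = *α* ln(1 + *N*/*α*); the bisection solver and function names are illustrative choices rather than the procedure used for Table 3.

```python
from math import log, log1p


def margalef(s, n):
    """Margalef index D_Mg = (S - 1) / ln N."""
    return (s - 1) / log(n)


def log_series_alpha(s, n, tol=1e-10):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    if not 0 < s < n:
        raise ValueError("requires 0 < S < N")
    lo, hi = 1e-9, 1e9
    # The predicted S rises monotonically with alpha, so bisection converges.
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if mid * log1p(n / mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for name, s in (("control", 120), ("zinc", 90)):
    print(name, round(margalef(s, 236), 1), round(log_series_alpha(s, 236), 1))
```

For the two soils (*N* = 236) this gives values that agree with Table 3 to within rounding.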

By contrast, the log normal index (*λ*) has theoretical appeal but has practical drawbacks. *λ* is the ratio of two estimates, the predicted total number of OTUs (*S**) divided by the standard deviation (SD) of the predicted mean OTU abundance. Thus, it will increase if *S** increases and if the SD decreases (i.e., when evenness increases). Use of the truncated log normal model, a method of fitting the log normal distribution to incomplete data, is complicated. Also the very asymmetric, incomplete (veiled) distributions that result when modest or even quite large samples are taken from such diverse populations make the derivation of its two parameters inaccurate. Krebs [8] stressed that the truncated log normal is only reliable if there is some evidence of a peaking of the species abundance curve, and Pielou [21] recommended that *S** should only be estimated if the truncated log normal distribution fits the observed distribution well. Colwell and Coddington [19] raise other problems. Hence, our estimates of *λ* are illustrative.

The *Q* statistic uniquely measures the accumulation rate (the slope of the cumulative OTU abundance curve) of OTUs of intermediate abundance. Since it ignores 25% of OTUs with the lowest abundance it is less weighted than other richness indices by the long tail of singletons that will always occur. It also excludes the 25% most abundant species, which may be either good or bad: good if common OTUs are predominantly ephemeral *r*-selected species utilising readily degradable substrates, but bad if the dominants are *r*-selected species indicative of stress [27,28] or *K*-selected dominants whose exclusion would simply mean a loss of valuable data. Stress simultaneously causes some species to become relatively common while many others become marginalised. For example, Burkhardt et al. [29] found that heavy metal contamination led to an increase in abundance of bacteria that possessed relatively common degradative capabilities while concurrently causing a decline in abundance of bacteria that possessed unusual degradative capabilities. We found a similar pattern of a concentration of already abundant OTUs and a dilution of already rare OTUs (see [14]). This process of polarisation would cause a broadening of the range of abundance classes over which OTUs of intermediate abundance were found, lowering the value of *Q*. Therefore, *Q* may be a unique and valuable index for the detection of stress. Unfortunately, as with *λ*, the values derived here are inaccurate due to the overwhelming predominance of rare OTUs. In general, the *Q* statistic cannot be reliably derived when the samples are dominated by singletons, especially when they account for more than 25% of OTUs, or if less than 50% of all OTUs are sampled [13].

As indicated in Section 1, the Shannon index (*H*′) is the most commonly used diversity statistic, even though its meaning is not very clear and its use with diverse bacterial communities problematic. Reiterating, it is a measure, but not a probability, of the difficulty in predicting the identity of the next analysed clone (i.e., the *N*+1st clone) and it is positively correlated with diversity and evenness [8,13]. It gives more weight to rare than to common OTUs making it more sensitive to absolute (but not relative) changes in their abundance. As an aid to understanding, Usher [9] recommended comparing the observed value of *H*′ with the theoretical extremes that could be obtained using the same values of *S* and *N*. For the control soil these would be 3.1, for a community with one dominant OTU and all others singletons, and 4.79, if all OTUs were equally common. For the zinc-contaminated soil the respective values would be 2.36 and 4.50. More useful is expressing *H*′ as *e*^{H′} (10^{H′} with log_{10} and 2^{H′} with log_{2}) since it gives the number of equally common OTUs required to attain the same value of *H*′. For the contaminated soil *e*^{H′} was 50, indicating that due to unevenness it possessed a value equivalent to 50 equally abundant OTUs in a sample of 236.
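The values quoted in this paragraph can be checked directly from the frequencies in Table 1. The sketch below uses natural logarithms and the plug-in estimate of *H*′ without any bias correction; the function names and the dictionary layout are illustrative assumptions.

```python
from math import exp, log

# Table 1 as {clones per OTU: number of OTUs}
CONTROL = {1: 82, 2: 21, 3: 5, 4: 4, 5: 1, 6: 2, 7: 1, 10: 2, 13: 1, 24: 1}
ZINC = {1: 52, 2: 15, 3: 7, 4: 6, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1, 16: 1, 37: 1}


def shannon(freq):
    """H' = -sum(p * ln p) over all OTUs."""
    total = sum(size * count for size, count in freq.items())
    return -sum(count * (size / total) * log(size / total)
                for size, count in freq.items())


def shannon_bounds(s, n):
    """Lowest and highest H' attainable with s OTUs among n clones."""
    least_even = {1: s - 1}              # all but one OTU are singletons
    dominant = n - (s - 1)               # the remaining clones form one OTU
    least_even[dominant] = least_even.get(dominant, 0) + 1
    return shannon(least_even), log(s)   # ln S when all OTUs are equally common


for name, freq in (("control", CONTROL), ("zinc", ZINC)):
    h = shannon(freq)
    n_otus = sum(freq.values())
    n_clones = sum(size * count for size, count in freq.items())
    low, high = shannon_bounds(n_otus, n_clones)
    print(f"{name}: H'={h:.2f}, bounds=({low:.2f}, {high:.2f}), exp(H')={exp(h):.0f}")
```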

The main problem is that the *H*′ value calculated from any sample will be an underestimate of its true value due to incomplete coverage. Fig. 4 shows the effect of increasing sample size upon *H*′ for the data from this study as well as data from Godon et al.'s comprehensive sequence-based survey of archaeal and bacterial diversity in an anaerobic digester (139 OTUs among 556 clones) [1] and data from two soil samples, analysed using ARDRA of the first 530 bp, of the extractable bacterial fraction from a plant species-rich floodmeadow (175 OTUs among 245 clones in one and 155 OTUs among 231 clones in the other) [B.F. Moffett, unpublished data]. Perhaps 500–1000 clones need to be sampled to approach a plateau. Extrapolation to estimate the true *H*′ is not possible because the trajectory of the *H*′ versus *N* curve will be different for each sample depending on its OTU abundance distribution. Additional underestimation may occur when ARDRA is used instead of sequencing due to its lower resolution, but this error is minimal if the entire 16S rRNA gene is digested. Sequencing brings its own problem of deciding upon the level of similarity used to define OTUs. Watve and Gangal [30] showed that *H*′ was more sensitive to a change in this threshold level than was Simpson's *D*.

A fundamental concern about *H*′ is that it may simply be superfluous. Hill [31] showed how the Shannon index, expressed as *e*^{H′}, forms part of a continuum of evenness/dominance indices that differ only in the weighting they give to rarer species. Its value will always be bounded by *S*, which is weighted by rare OTUs, and by the reciprocal of Simpson's index. For the control soil these number series were 120, 76 and 49 for *S*, *e*^{H′} and 1/*D*, respectively. Since the Shannon index is essentially an intermediate between *S* and *D* he concluded that it conveys little extra information. Ironically though, this very intermediacy could be regarded as its strength. Because both *S* and *D* give strong weighting to rare and dominant components, respectively, they are polarised. By contrast *H*′ gives a more moderate and broad weighting to rare and intermediate abundance OTUs with respect to the dominants (Fig. 5). Unexpectedly then, *H*′ may actually be quite a useful, while not totally comprehensible, general index that is tuned to be more sensitive to changes in abundance of the rare groups. Since the values of *H*′ from a set of replicate samples will be normally distributed [13] this also allows a *t*-test to be used.
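The number series quoted for the control soil can be reproduced from Table 1. The sketch below assumes the finite-sample form of Simpson's index, *D* = Σ*n*(*n* − 1)/[*N*(*N* − 1)], which matches the tabulated values; Hill's own formulation uses Σ*p*², which gives a somewhat lower reciprocal. The function name is illustrative.

```python
from math import exp, log

# Table 1 as {clones per OTU: number of OTUs}
CONTROL = {1: 82, 2: 21, 3: 5, 4: 4, 5: 1, 6: 2, 7: 1, 10: 2, 13: 1, 24: 1}
ZINC = {1: 52, 2: 15, 3: 7, 4: 6, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1, 16: 1, 37: 1}


def hill_series(freq):
    """Return (S, exp(H'), 1/D), ordered from rare-weighted to dominant-weighted."""
    total = sum(size * count for size, count in freq.items())
    s = sum(freq.values())
    h = -sum(count * (size / total) * log(size / total)
             for size, count in freq.items())
    d = sum(count * size * (size - 1) for size, count in freq.items()) / (total * (total - 1))
    return s, exp(h), 1 / d


print([round(x) for x in hill_series(CONTROL)])  # -> [120, 76, 49]
print([round(x) for x in hill_series(ZINC)])
```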

### Evenness and dominance indices

Because the Shannon evenness index (*E*) is derived from *H*′, the meaning of its value is also unclear. Shannon evenness gives the ratio of *H*′ to the maximum possible value of *H*′ that could theoretically be obtained with the observed number of OTUs. This maximum value of *H*′ occurs when all OTUs have equal abundances and equals ln (*S*). Like *H*′, *E* is also more sensitive to changes in evenness of rare OTUs: an increase in abundance of a rare OTU will raise *E* more than the equivalent reduction in abundance of a dominant OTU. Just as *e*^{H′} is preferable to *H*′, a more understandable expression of *E* is *e*^{H′}/*S*. This gives the ratio of the Shannon index, expressed in terms of its equivalent number of equally common OTUs, to the actual number recorded. For the contaminated soil this would be 50/90=0.55, revealing that while 90 OTUs were found the unevenness in their abundances gave it the equivalent value of only 55% of that number of equally abundant OTUs. For the control soil *E* by this definition was 0.63 (76/120). When coverage is incomplete, *E* suffers from the same, but opposite, systematic bias as *H*′: it will always be an overestimate of the true value [32]. In fact it showed less sign of stabilising with increased sample size than did *H*′ (Fig. 6).
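Both expressions of evenness discussed above follow directly from *H*′ and *S*. The sketch below computes them from the Table 1 frequencies; the function names are illustrative, and small differences from the quoted figures in the second decimal place reflect whether *H*′ and *e*^{H′} are rounded before dividing.

```python
from math import exp, log

# Table 1 as {clones per OTU: number of OTUs}
CONTROL = {1: 82, 2: 21, 3: 5, 4: 4, 5: 1, 6: 2, 7: 1, 10: 2, 13: 1, 24: 1}
ZINC = {1: 52, 2: 15, 3: 7, 4: 6, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1, 16: 1, 37: 1}


def shannon(freq):
    """H' = -sum(p * ln p) over all OTUs."""
    total = sum(size * count for size, count in freq.items())
    return -sum(count * (size / total) * log(size / total)
                for size, count in freq.items())


def shannon_evenness(freq):
    """E = H' / ln S, the ratio of H' to its maximum for S OTUs."""
    return shannon(freq) / log(sum(freq.values()))


def equivalent_otu_ratio(freq):
    """exp(H') / S, the equivalent number of equally common OTUs per recorded OTU."""
    return exp(shannon(freq)) / sum(freq.values())


for name, freq in (("control", CONTROL), ("zinc", ZINC)):
    print(name, round(shannon_evenness(freq), 3), round(equivalent_otu_ratio(freq), 3))
```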

The Lloyd and Ghelardi index (*J*) was proposed as a refinement of *E* because it assumes that the most equitable distribution that could realistically be expected would be that of the broken stick model and not the completely equal, uniform distribution assumed by *E*. The diversity of RFLPs found by Dunbar et al. [2] in rhizosphere and ‘interspace’ arid, pine–juniper woodland soils translated to *J* values of 1.17–1.41 and the *J* values of partial 16S rDNA sequences found by McCaig et al. [5] in soil from natural and improved grassland plots were 1.41 and 1.37, respectively. That is, the index suggests that due to extraordinary evenness the samples contained up to 1.41 times the number of OTUs that would be expected from a broken stick distribution. However, these values are unreliable because inadequate coverage will also produce very high values for *J* [20]. Also, even though the control soil had a *J* value close to 1.0, the implicit conclusion that it also had a close fit to the broken stick distribution was not true. The Lloyd and Ghelardi index was, therefore, no improvement upon *E* and may even be misleading. Bulla's evenness index [33] was briefly checked since it is sensitive to changes in the proportional abundance of rare species. Unfortunately the index is so heavily weighted by singletons, interpreting an increase in their number as a sign of increased marginalisation of species, and thus greater inequality, that it erroneously rated the zinc-contaminated site as having greater evenness than the control. It does not seem reliable when diversity is very high and coverage poor.

Simpson's index (*D*) gives a strong weighting to the dominants. It is also easily understood: *D* gives the probability that two clones chosen at random will be from the same OTU. And the even narrower Berger–Parker index (*d*) is simply the relative abundance of the most abundant OTU. It was supported by May [24] as being as good as or better than any other index for characterising a distribution. An advantage of both *D* and *d* is that they should converge relatively quickly toward their true values as the number of clones increases. Using randomised subsets from the control and zinc-contaminated samples we found that *d* reached a plateau at 100 clones in both samples while *D* stabilised at 100 clones with the zinc soil and at 160 clones with the control soil. Both are suitable for use with ARDRA so long as the homogeneity of the dominant(s) is confirmed. Watve and Gangal [30] showed, however, that *D* was less sensitive to reductions in diversity than was *H*′ in four datasets. The question of the utility of *d* and *D* for bacteria in soil may really depend on whether changes among dominant OTUs are ecologically relevant, as discussed with the *Q* statistic.
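The two dominance indices, and the kind of randomised subsetting used to judge how quickly they stabilise, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names and example counts are hypothetical, and *D* is computed in the without-replacement form that matches the "two clones chosen at random" interpretation.

```python
import random

def simpson_d(counts):
    """Probability that two clones drawn without replacement share an OTU."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def berger_parker_d(counts):
    """Relative abundance of the most abundant OTU."""
    return max(counts) / sum(counts)

def subsample_counts(counts, k, rng):
    """Draw k clones at random without replacement; return per-OTU counts."""
    clones = [otu for otu, c in enumerate(counts) for _ in range(c)]
    tally = {}
    for otu in rng.sample(clones, k):
        tally[otu] = tally.get(otu, 0) + 1
    return list(tally.values())

# Hypothetical clone counts per OTU (assumption, not data from this study)
counts = [30, 12, 6, 3, 2, 1, 1, 1]
rng = random.Random(0)                         # fixed seed for repeatability
for k in (10, 20, 40, sum(counts)):
    sub = subsample_counts(counts, k, rng)
    print(k, round(berger_parker_d(sub), 3), round(simpson_d(sub), 3))
```

In practice each subset size would be drawn many times and the values averaged before judging where a plateau begins.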

### Species abundance models

Species abundance models are more sophisticated tools to investigate diversity because they examine the distribution of abundances in a population rather than distilling all this information down into a single number. Four models were tested, ranging downward in equitability from the niche overlap model to the log series. The imposition of a major stress would likely push the OTU abundance distribution toward a less equitable form. While no model fit the OTU distribution in the control soil, due to underestimation of the number of singletons and overestimation of moderately frequent OTUs, the log series and log normal distributions did fit the data from the contaminated soil (Table 4).

The log series also fit several other distributions. When applied to Godon et al.'s [1] anaerobic digester sample it accurately described the observed distribution of 139 OTUs among 556 clones (*χ*^{2}=3.8, d.f.=5, *P*=0.58). It also fit the two samples of bacteria from a floodmeadow soil (*χ*^{2}=2.6, d.f.=3, *P*=0.45 for the sample with 175 OTUs in 245 clones and *χ*^{2}=4.4, d.f.=4, *P*=0.35 for the sample with 155 OTUs in 231 clones) [B.F. Moffett, unpublished data]. The fit of the log series to all these datasets suggests that we may actually be looking at sampling distributions: as mentioned above, when relatively small samples are taken from a large population the species accumulation curve will likely obey the log series [24].
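The expected frequencies against which such fits are judged can be derived from *S* and *N* alone. The sketch below assumes the standard relations for Fisher's log series, *S*/*N* = ((1 − *x*)/*x*)(−ln(1 − *x*)) and *α* = *N*(1 − *x*)/*x*, with *αx*^n/*n* OTUs expected to be represented by *n* clones. The function names are hypothetical, and the grouping into abundance classes and the *χ*² test itself are not reproduced here.

```python
import math

def fit_log_series(s_obs, n_clones):
    """Solve S/N = ((1 - x)/x) * -ln(1 - x) for x by bisection; return (x, alpha)."""
    target = s_obs / n_clones
    if not 0 < target < 1:
        raise ValueError("requires 0 < S/N < 1")
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        ratio = (1 - mid) / mid * -math.log1p(-mid)
        if ratio > target:
            lo = mid       # ratio falls as x rises, so the root lies above mid
        else:
            hi = mid
    x = (lo + hi) / 2
    alpha = n_clones * (1 - x) / x
    return x, alpha

def expected_otus(x, alpha, max_abundance):
    """Expected number of OTUs represented by 1..max_abundance clones."""
    return [alpha * x ** n / n for n in range(1, max_abundance + 1)]

# S and N as quoted in the text for the anaerobic digester sample [1]
x, alpha = fit_log_series(139, 556)
print(expected_otus(x, alpha, 5))
```

Summed over all abundances, the expected values return the observed *S*, which offers a quick check that the fitted *x* is correct.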

Because the log series was the least equitable of the four, another and even less equitable model, the geometric series, was subsequently tested to check its performance. The geometric series describes the simple partitioning of one limiting resource, an unrealistic expectation in soil. Its fit was indeed poor with all the datasets, always greatly underestimating the number of singletons.

The log normal distribution is intuitively appealing, especially in soil, because it will result when many factors, and their random variation, control a variable. It also fit Godon et al.'s anaerobic digester distribution (*χ*^{2}=7.0, d.f.=4, *P*=0.13), but did not fit either of the two floodmeadow samples, although the discrepancies were not large. Intriguingly, Curtis et al. [7] have exploited the properties of the log normal distribution to enable total OTU diversity of a sample to be estimated for different breadths of the distribution by using the ratio of *N*/*n*_{max}, the inverse of the Berger–Parker index.

Niche overlap between bacterial species may allow an even more equitable distribution than the log normal to exist; that is, somewhere between the log normal and the broken stick or niche overlap models. The broken stick model reflects the distribution of abundances that would arise from sampling a uniform distribution (i.e., in which all OTUs are equally abundant), while the overlapping niche model assumes that species are independent of one another and not constrained by sharing some limiting resource and has, therefore, been considered unrealistic. Neither of the models matched any of the datasets, including the digester and floodmeadow samples. Even so, they may be useful as theoretical benchmarks to estimate the level of niche overlap, and hence functional redundancy, that exists.

A recent distribution model based on self-similarity (a pattern, such as a fractal, is self-similar if it does not vary with spatial scale) has a skew towards improbable events and so predicts a higher abundance of infrequent species than other models [34]. It will be interesting to see how well it fits more comprehensive samples of diverse environments.

The predominance of singletons and doubletons is a major problem that prevents a reliable assessment of the closeness of fit of the models. The cause, a lack of coverage, suggests that if more clones were analysed the full, symmetrical distribution would be revealed. However, Colwell and Coddington [19] note that even exhaustive sampling will leave many species represented by only a few specimens or a single specimen. Unfortunately, a substantially incomplete survey looks very much the same as a substantially complete one, in terms of persistence of singletons [19]. Thus the processing of 1000 or more clones may still produce a skewed, veiled distribution that is difficult to assign to one of the models with confidence.

### Indices combining proportional abundances with phylogenetic diversity

If sequencing is used to define OTUs, indices that combine abundance weightings with a measure of phylogenetic diversity can be used as principal measures. These include the dissimilarity-based index (*D*_{mean}) of Watve and Gangal [30], the quadratic entropy index (*Q*) of Izsák and Papp [35] and the phylogenetic moment measure (*M _{R}*) of Horn et al. ([36], cf. [37]; the programme to calculate it is available from Mark Horn, CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, 2001, Australia).

*D*_{mean} is a measure of the mean level of sequence dissimilarity between clones. It can also be used with RFLPs using band sharing coefficients but the results are unreliable [K. Walsh, unpublished data]. The quadratic entropy index is the sum of a product calculated for every possible pairwise combination of OTUs. For each pair the product is the proportional abundance of one OTU multiplied by the proportional abundance of the other multiplied by the percent sequence dissimilarity between the two. Thus, it is similar to Simpson's index with an additional phylogenetic weighting. But unlike Simpson's *D* it does not give disproportional weighting to the dominants. The phylogenetic moment (*M _{R}*) is the most advanced of the three and can be used as a measure of the phylogenetic diversity within each sample and/or as a measure of the efficiency with which the clones in each sample represent the overall diversity found among all clones from all samples combined.
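The pairwise sum that defines the quadratic entropy index can be written out in a few lines. This is a minimal sketch following the verbal description above; the function name, the proportions and the dissimilarity matrix are all hypothetical, and the dissimilarities may be in percent or as fractions so long as the units are consistent.

```python
def quadratic_entropy(props, dissim):
    """Sum of p_i * p_j * d_ij over every unordered pair of OTUs."""
    k = len(props)
    return sum(props[i] * props[j] * dissim[i][j]
               for i in range(k) for j in range(i + 1, k))

# Hypothetical proportional abundances and percent sequence dissimilarities
props = [0.5, 0.3, 0.2]
dissim = [[0, 5, 20],
          [5, 0, 18],
          [20, 18, 0]]
print(quadratic_entropy(props, dissim))
```

If every pair of OTUs is assigned a dissimilarity of 1, the sum reduces to (1 − Σ*p*_i²)/2, which makes the link to Simpson's index explicit.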

*M _{R}* has appeal because it appears to be positively weighted by the evenness of the phylogenetic diversity.

## Conclusions

Of course, there are other diversity indices (e.g., [31,38]) and, more importantly, abundance models [39–41] that need to be evaluated. In this review we have investigated the general suitability of the tested measures with diverse bacterial communities. Our preferences, based on a mix of theory, performance and some intuition, are listed below.

- Non-asymptotic extrapolators of species accumulation curves, such as the equation for the log series distribution, seem better than asymptotic extrapolators. Non-parametric richness estimators, especially Chao 1, are promising and need to be tested on larger samples.
- The log series index (*α*), recommended by both Magurran and Krebs [8,13], discriminated well between sites due to its high sensitivity to *S*/*N* ratio. The log series from which it is derived also fit most datasets tested.
- The *Q* statistic is very appealing because it uniquely focuses on OTUs of intermediate abundance, but low coverage makes it unreliable. It may be able to be used with larger samples or denaturing gradient gel electrophoresis data.
- Irrespective of its faults, the Shannon index (*H*′) seems a useful general diversity index that is influenced by both richness and evenness and is more sensitive to changes in abundance of the rare groups. The meaning of the number is more comprehensible when expressed as the exponential. The underestimation caused by incomplete coverage can be minimised by analysing several hundred clones per sample and by only comparing samples of equal size. The Shannon evenness index (*E*) is, likewise, more sensitive to rare OTUs. As with *H*′ its meaning is clearer when the exponential (*e*^{H′}/*S*) is used. Low coverage causes overestimation.
- By contrast, the Berger–Parker index (*d*) and Simpson's index (*D*) are heavily weighted by the dominant(s). Both are easily understood and are much less affected by coverage. Their worth may depend primarily on the ecological relevance of changes in abundance of dominant OTUs.
- Abundance models should be used even when coverage is low because they address the whole distribution, aiding comparison by revealing overall trends as well as specific changes in particular abundance classes. They thus avoid the over-simplification and mystery that can accompany indices.

Finally, Bengtsson [42] cautions that “it is naïve to contemplate that one single number – species richness, a diversity index, the number of functional groups or connection – can capture the complex relationships and interactions between many species and the functions performed by these interactions” in soil. The indices and, in particular, the abundance models may reveal more if used to measure diversity within key functional groups, or when applied to smaller, *alpha* diversity (i.e., homogeneous) habitats, such as microaggregates, primary root rhizospheres and worm casts, where a reasonable level of coverage may be obtained.

## Acknowledgements

We acknowledge the financial support of Professor John McGinnety for granting a sabbatical period (B.M.) and Professor Shawn Doonan during manuscript preparation (T.H.). We sincerely thank Kim Willis, Mark Horn and Tom Curtis for their invaluable instruction and advice about diversity measures.