Abstract

Deserts are not usually considered biodiversity hotspots, but desert microbiotic crust communities exhibit a rich diversity of both eukaryotic and prokaryotic life forms. Like many communities dominated by microscopic organisms, they defy characterization by traditional species-counting approaches to assessing biodiversity. Here we use exclusive molecular phylodiversity (E) to quantify the amount of evolutionary divergence unique to desert-dwelling green algae (Chlorophyta) in microbiotic crust communities. Given a phylogenetic tree with branch lengths expressed in units of expected substitutions per site, E is the total length of all tree segments representing exclusively desert lineages. Using MCMC to integrate over tree topologies and branch lengths provides 95% Bayesian credible intervals for phylodiversity measures. We found substantial exclusive molecular phylodiversity based on 18S rDNA data, showing that desert lineages are distantly related to their nearest aquatic relatives. Our results challenge conventional wisdom, which holds that there was a single origin of terrestrial green plants and that green algae are merely incidental visitors rather than indigenous components of desert communities. We identify examples of lineage diversification within deserts and at least 12 separate transitions from aquatic to terrestrial life apart from the most celebrated transition leading to the embryophyte land plants.

Microbiotic crust communities occur worldwide in arid habitats and include diverse photosynthetic and nonphotosynthetic organisms such as cyanobacteria, lichens, green algae, diatoms, bryophytes, fungi, and microarthropods. These surface-dwelling organisms often experience great extremes in environmental conditions such as moisture and temperature. In the Mojave Desert of North America, for example, summer soil temperatures can exceed 90°C, and rainfall can be under 45 mm per year (Rosentreter and Belnap, 2001), whereas crust organisms in Antarctica can experience summer temperatures that are below freezing (Broady, 1993). Crust communities are now considered important to nutrient cycling in deserts and are also known to play a key role in soil stabilization (Evans and Johansen, 1999; Schlesinger et al., 1996; West, 1990). A growing interest in understanding the ecological role of crusts has resulted in a need for more intensive characterization of the organisms of these communities.

Until recently, the biodiversity of green algae (Chlorophyta) in desert crust communities was thought to be represented by a small number of species. This perception arose because the vegetative stages of soil algae are morphologically simple, and the earliest studies on crust algae were based on light microscopy of these stages alone (Cameron, 1960, 1964; Metting, 1981; Shields and Drouet, 1962). Recent studies that incorporate data from additional life history stages, such as zoospores and gametes, have revealed greater numbers of distinct green algae in desert soils (e.g., Flechtner et al., 1998). The true taxonomic affiliations of microscopic desert green algae are only now being established with the aid of nucleotide data, particularly 18S data (Lewis and Flechtner, 2002, 2004). Many of the familiar genera of unicellular green algae that occur in deserts, such as Chlorella, Chlamydomonas, and Chlorococcum, were shown to be polyphyletic using 18S rDNA data, and these results were congruent with ultrastructural data obtained from alternate life history stages (Buchheim et al., 1996; Friedl, 1995; Friedl and Zeltner, 1994; Huss and Sogin, 1990; Lewis et al., 1992; Watanabe and Floyd, 1989). These results highlight the extent to which convergence in vegetative morphology obscures genetic diversity, and they illustrate the ability of 18S data to uncover broad-scale phylogenetic relationships within the green algae.

Quantifying the biodiversity of microscopic taxa can be problematic because most groups of microscopic organisms are poorly studied, and traditional biodiversity measures often rely on the assignment of taxa to known species (Hughes et al., 2001). Phylogenetic diversity (or phylodiversity) provides a surrogate measure that more accurately portrays the underlying genetic diversity of organisms and is standardized across the variety of life histories, reproductive strategies, and morphological variability that create problems for traditional species-counting measures. The concept of phylodiversity (PD) was proposed as a conservation biology tool to quantify the phylogenetic heritage captured by the organisms in a particular geographic area (Faith, 1992). The original measure was based primarily on parsimony analysis of morphological data, although one example in Faith's original paper involved mitochondrial DNA; the measure has recently been adapted for use with model-based phylogenetic methods (Shaw et al., 2003).

These quantitative measures of phylodiversity are based on the path length of the subtree connecting contemporary organisms of interest. This original definition of PD is appropriate for answering questions involving the phylogenetic heritage of a geographic area of conservation interest. Recent theoretical work (Steel, 2005) has bolstered this application of PD by showing that subsets of taxa maximizing PD can be found straightforwardly, using a greedy algorithm. PD is also useful for comparing divergence among monophyletic groups (Shaw et al., 2003). It fails, however, to adequately address questions involving the importance of groups that are neither monophyletic nor geographically circumscribed.
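Steel's (2005) result concerns a greedy procedure for choosing subsets of taxa that maximize PD. A minimal sketch follows, under stated assumptions; it is not a procedure used in this study. The five-taxon tree and its branch lengths are hypothetical, the tree is stored as rooted (child, parent, length) edges, and PD is computed in the unrooted sense as the total length of edges that separate members of the chosen subset. The names `leaves_below`, `phylo_diversity`, and `greedy_max_pd` are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# A rooted edge: (child, parent, length in expected substitutions per site).
Edge = Tuple[str, str, float]

# Hypothetical five-taxon tree; not data from this study.
EDGES: List[Edge] = [
    ("t1", "n2", 0.10), ("t2", "n2", 0.05), ("n2", "n1", 0.20),
    ("t3", "n1", 0.30), ("t4", "n3", 0.15), ("t5", "n3", 0.25),
    ("n1", "root", 0.05), ("n3", "root", 0.40),
]


def leaves_below(edges: List[Edge]) -> Dict[str, FrozenSet[str]]:
    """Map each edge (keyed by its child node) to the set of leaves beneath it."""
    children: Dict[str, List[str]] = {}
    for child, parent, _ in edges:
        children.setdefault(parent, []).append(child)
    memo: Dict[str, FrozenSet[str]] = {}

    def collect(node: str) -> FrozenSet[str]:
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = (frozenset([node]) if not kids
                          else frozenset().union(*(collect(kid) for kid in kids)))
        return memo[node]

    return {child: collect(child) for child, _, _ in edges}


def phylo_diversity(edges: List[Edge], subset: FrozenSet[str]) -> float:
    """Unrooted PD: total length of edges with members of `subset` on both sides."""
    below = leaves_below(edges)  # recomputed each call; fine for a toy tree
    taxa = frozenset().union(*below.values())
    return sum(length for child, _, length in edges
               if below[child] & subset and (taxa - below[child]) & subset)


def greedy_max_pd(edges: List[Edge], k: int) -> FrozenSet[str]:
    """Greedy choice of k >= 2 taxa: start from the most distant pair, then
    repeatedly add the taxon that increases PD the most."""
    taxa = sorted(frozenset().union(*leaves_below(edges).values()))
    if not 2 <= k <= len(taxa):
        raise ValueError("k must be between 2 and the number of taxa")
    chosen = max((frozenset(pair) for pair in combinations(taxa, 2)),
                 key=lambda s: phylo_diversity(edges, s))
    while len(chosen) < k:
        best = max((t for t in taxa if t not in chosen),
                   key=lambda t: phylo_diversity(edges, chosen | {t}))
        chosen = chosen | {best}
    return chosen


best_three = greedy_max_pd(EDGES, 3)
```

For a tree this small, the greedy subset of each size can be checked against an exhaustive search over all subsets. The sketch recomputes PD from scratch at every step, which is adequate for illustration but would be slow for a 150-taxon tree.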

We define exclusive molecular phylodiversity (E; Fig. 1a) as the sum of all branch lengths in a tree that support either individual taxa or clades composed exclusively of taxa in some group of interest. Inclusive molecular phylodiversity (I; Fig. 1b) is identical to a molecular version of Faith's (1992) PD in which path lengths are measured as expected numbers of substitutions per site. The total tree length (T; Fig. 1c) provides an upper bound for both E and I. If the set of taxa representing the group of interest is S, and S′ is the complement of S (i.e., the set of all taxa in the study but not in the group of interest), then E may be defined unambiguously as E = T − I′, where I′ (Fig. 1d) is I computed for S′.
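The definitions of T, I, and E, and the identity E = T − I′, can be illustrated with a small sketch. The tree, its branch lengths, and the assignment of taxa t1, t2, and t4 to the focal group S are hypothetical. I is computed here as the length of the minimal subtree spanning S (unrooted PD), and the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# A rooted edge: (child, parent, length in expected substitutions per site).
Edge = Tuple[str, str, float]

# Hypothetical five-taxon tree; suppose t1, t2, and t4 form the focal group S.
EDGES: List[Edge] = [
    ("t1", "n2", 0.10), ("t2", "n2", 0.05), ("n2", "n1", 0.20),
    ("t3", "n1", 0.30), ("t4", "n3", 0.15), ("t5", "n3", 0.25),
    ("n1", "root", 0.05), ("n3", "root", 0.40),
]


def leaves_below(edges: List[Edge]) -> Dict[str, FrozenSet[str]]:
    """Map each edge (keyed by its child node) to the leaves beneath it."""
    children: Dict[str, List[str]] = {}
    for child, parent, _ in edges:
        children.setdefault(parent, []).append(child)
    memo: Dict[str, FrozenSet[str]] = {}

    def collect(node: str) -> FrozenSet[str]:
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = (frozenset([node]) if not kids
                          else frozenset().union(*(collect(kid) for kid in kids)))
        return memo[node]

    return {child: collect(child) for child, _, _ in edges}


def tree_length(edges: List[Edge]) -> float:
    """T: the sum of all branch lengths."""
    return sum(length for _, _, length in edges)


def inclusive_pd(edges: List[Edge], subset: FrozenSet[str]) -> float:
    """I: length of the minimal subtree spanning `subset` (an edge counts
    when both of its sides contain at least one member of `subset`)."""
    below = leaves_below(edges)
    taxa = frozenset().union(*below.values())
    return sum(length for child, _, length in edges
               if below[child] & subset and (taxa - below[child]) & subset)


def exclusive_pd(edges: List[Edge], subset: FrozenSet[str]) -> float:
    """E = T - I', where I' is the inclusive PD of the complement S'."""
    taxa = frozenset().union(*leaves_below(edges).values())
    return tree_length(edges) - inclusive_pd(edges, taxa - subset)


desert = frozenset({"t1", "t2", "t4"})
total = tree_length(EDGES)
inclusive = inclusive_pd(EDGES, desert)
exclusive = exclusive_pd(EDGES, desert)
```

For this tree, T = 1.50, I = 0.95, and E = 0.50, up to floating-point rounding. Because S arises twice on the tree, E is smaller than I. If S is restricted to the clade {t1, t2}, E instead exceeds I by exactly the 0.20 stem branch subtending that clade.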

Figure 1

Measures of phylogenetic diversity for a subset of taxa S. (a) Exclusive phylodiversity (E) is the sum of only those branch lengths that subtend taxa in S, or clades solely composed of taxa in S; (b) traditional phylogenetic diversity (I) is calculated as the sum of branch lengths in the path connecting all of the taxa in S; (c) total tree length, T; (d) inclusive phylodiversity calculated for S′, the complement of S (I′). Note that E equals T minus I′.

If the focal group is monophyletic, E is greater than I by an amount equal to the length of the branch subtending the group, but E can be much smaller than I when the group is highly polyphyletic. E is a more appropriate measure than I if the only evolutionary diversity of interest is that uniquely accrued by organisms occupying a particular habitat, and not the diversity accrued by their ancestors occupying other habitats. In this case, E measures the amount of evolutionary divergence that can be unambiguously attributed to organisms living in the focal habitat. Stochastic character mapping (Nielsen, 2002; Huelsenbeck et al., 2003) provides an alternative means of defining exclusive phylodiversity. This Bayesian definition measures exclusive phylodiversity as the posterior mean number of substitutions per site accumulated while in the focal habitat. We denote the definition based on stochastic character mapping ES to distinguish it from E.
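Once a stochastically mapped history of the habitat character is available for each sampled tree (for example, from SIMMAP), computing ES reduces to bookkeeping, as sketched below. Several choices here are illustrative assumptions: the data layout, the state labels, and the assumption that substitutions accrue uniformly along a branch, so that the fraction of a branch spent in the desert state equals the fraction of its length attributed to the desert. The sketch does not perform the mapping itself, and its function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# One mapped branch: (branch length in substitutions/site,
#                     [(habitat state, fraction of the branch spent in it), ...]).
MappedBranch = Tuple[float, List[Tuple[str, float]]]


def exclusive_from_mapping(branches: List[MappedBranch], focal: str = "desert") -> float:
    """Substitutions per site accrued while in the `focal` state on one mapped tree,
    assuming substitutions accrue uniformly along each branch."""
    return sum(length * sum(fraction for state, fraction in segments if state == focal)
               for length, segments in branches)


def posterior_mean_es(samples: List[List[MappedBranch]], focal: str = "desert") -> float:
    """ES: the average of per-sample values over posterior (tree, mapping) samples."""
    return mean(exclusive_from_mapping(branches, focal) for branches in samples)


# Two hypothetical posterior samples of a three-branch mapping.
SAMPLE_1 = [(0.10, [("desert", 1.0)]),
            (0.20, [("aquatic", 0.5), ("desert", 0.5)]),
            (0.30, [("aquatic", 1.0)])]
SAMPLE_2 = [(0.10, [("desert", 1.0)]),
            (0.20, [("desert", 1.0)]),
            (0.30, [("aquatic", 1.0)])]
es = posterior_mean_es([SAMPLE_1, SAMPLE_2])  # about 0.25
```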

Phylodiversity is not the only way to assess biodiversity in situations where species counting is not feasible. Faith (2004) introduced a uniqueness measure intended for use in making conservation decisions about individual taxa. Martin (2002) introduced two measures for addressing the diversity and differentiation of two communities that, like our phylodiversity measures, do not depend on identifying species. Martin's FST measure can be used to compare community diversity to the total diversity overall, and his P-test measures the degree of differentiation between two communities. The phylodiversity measures introduced here are not intended for comparing two communities, but they nevertheless are correlated with Martin's measures.

In this paper, we compare our phylodiversity measures with those of Martin (2002) and with the uniqueness measure introduced by Faith (2004). In addition, we present new 18S rDNA sequence data obtained from desert green algae and use Bayesian phylogenetic analyses to address basic questions about green algae living in desert soils. Did the transition from aquatic to terrestrial habitats occur just once, or have green algae adapted to desert life numerous times independently? Are the green algae in deserts simply representatives of widespread ecological generalists, or have they diversified in arid habitats to form desert clades? What fraction of the total diversity of all green algae do these desert lineages represent?

Materials and Methods

Sample Collection

Desert green algae were collected from distinct geographic locations in western North America, ranging from Baja California, Mexico, into California, New Mexico, and Utah, USA (Appendix 1). Algae were isolated from the soils using a dilution plating method, as described in Flechtner et al. (1998). Nucleotide sequence data of the small subunit ribosomal RNA gene (18S) of nine isolates of desert green algae were obtained by direct sequencing of PCR amplifications. Briefly, this included DNA extraction from unialgal isolates (cultures grown from a single cell) using a modified CTAB extraction method, followed by PCR amplification and sequencing using the primers SSU1, SSU2, N18G, N18H, C18G, C18H, and C18J (Lewis and Flechtner, 2002; Shoup and Lewis, 2003). Base calls in the consensus sequences were verified from individual sequencing reactions in both the forward and reverse orientations, or from duplicate sequencing reactions in the same orientation. Over 98% of base calls had at least twofold coverage. To confirm the absence of contaminant sequences, each newly obtained consensus sequence was subjected to a BLAST search (Altschul et al., 1990). The 18S sequence data for each isolate were deposited in the GenBank database (Table 1).

Table 1

Summary of the published 18S ribosomal RNA gene sequence data used in this study. Taxa are divided into their major taxonomic groups and their corresponding GenBank accession numbers are shown.

Taxon GenBank accession number 
Desert Taxa (new sequences)  
 CNP2VF11b AY271675 
 EM1VF1 AY271673 
 LG2VF30 AY271676 
 LG3VF20 AY271674 
 MX219VF21 AY614713 
 NB1VF11 AY614714 
 SRS2VF14 AY377441 
 ZNP2VF21 AY377440 
 ZNP3VF36 AY377439 
Desert Taxa (previously published)  
 BC2-1 AF516676 
 BC4VF9 AF516675 
 BC8-8 AF516674 
Cylindrocystis brebissonii BC9-8 AF115439 
 CNP1VF2 AF513378 
 CNP2VF25 AF516677 
 H1VF1 AF513369 
 LG2VF16 AF513372 
 SEV2VF1 AF516678 
 SEV3VF14 AF513371 
 SEV3VF49 AF513373 
 SRS2VF18 AF513375 
 UT8-26 AF513376 
 ZNP1VF32 AF513379 
Prasinophyceae  
Cymbomonas tetramitiformis AB017126 
Pterosperma cristatum AJ010407 
Halosphaera sp. AB017125 
Nephroselmis olivacea X74754 
Tetraselmis striata X70802 
Pseudoscourfieldia marina X75565 
Mantoniella squamata X73999 
Dolichomastix tenuilepis AF509625 
Picocystis salinarum AF153313 
Charophyceae and Embryophytes  
Chaetosphaeridium globosum AJ250110 
Chlorokybus atmophyticus M95612 
Cylindrocystis crassa AJ428080 
Coleochaete orbicularis M95611 
Coleochaete scutata X68825 
Desmidium grevillii AJ428117 
Klebsormidium flaccidum AF408240 
Marchantia polymorpha AY342318 
Mesostigma viride AJ250108 
Mesotaenium kramstai AJ428079 
Mougeotia scalaris X70705 
Nitella capillaris AJ250111 
Nitellopsis obtusa AF408226 
Penium cylindrus AJ553930 
Raphidonema nivale AF448477 
Staurastrum sp. X74752 
Zygnema circumcarinatum X79495 
Ulvophyceae  
Acrosiphonia duriuscula AB049418 
Enteromorpha intestinalis AJ000040 
Gloeotilopsis sarcinoidea Z47998 
Hazenia mirabilis AF387156 
Monostroma grevillei AF015279 
Pseudendoclonium basiliense Z47996 
Pseudoneochloris marina U41102 
Quadrigula closterioides Y17924 
Ulothrix zonata Z47999 
Urospora penicilliformis AB049417 
Ulva curvata AF189078 
Trebouxiophyceae  
Amphikrikos sp. AF228690 
Chlorella ellipsoidea X63520 
Chlorella minutissima X56102 
Chlorella fusca X56104 
Chlorella saccharophila X63505 
Choricystis minor X89012 
Coenocystis inconstans AB017435 
Eremosphaera viridis AF387154 
Fusochloris perforatum M62999 
Golenkinia longispicula AF499923 
Koliella spiculiformis AF278744 
Marvania geminata AF124336 
Micractinium pusillum AF499921 
Microthamnion kuetzingianum Z28974 
Muriella aurantica AB005748 
Myrmecia biatorellae Z28971 
Myrmecia israeliensis M62995 
Oocystis heteromucosa AF228689 
Parietochloris pseudoalveolaris M63002 
Planktosphaeria gelatinosa AY044648 
Pleurastrum insigne Z28972 
Pleurastrum terrestris Z28973 
Prasiola crispa AJ416106 
Prototheca wickerhamii X56099 
Radiofilum conjunctivum AF387155 
Stichococcus chodati AB055867 
Trebouxia asymmetrica Z21553 
Trebouxia impressa Z21551 
Trebouxia magna Z21552 
Trochiscia hystrix AF277651 
Watanabea reniformis X73991 
Chlorophyceae  
Ankistrodesmus stipitatus X56100 
Ankyra judayi U73469 
Asteromonas gracilis M95614 
Atractomorpha echinata U73470 
Bracteacoccus medionucleatus U63098 
Bracteacoccus giganteus U63099 
Bracteacoccus aerius U63101 
Bulbochaete hiloensis U83132 
Carteria obtusa AF182818 
Chaetopeltis orbicularis U83125 
Chaetophora incrassata D86499 
Characiosiphon rivularis AF395437 
Characium hindakii M63000 
Chlamydomonas baca U70781 
Chlamydomonas reinhardtii M32703 
Chlamydomonas humicola U13984 
Chlamydomonas noctigama AF008241 
Chlamydopodium vacuolatum M63001 
Chlorococcum cf. tatrense AF514407 
Chlorogonium euchlorum AJ410443 
Chlorogonium capillatum AJ410442 
Chloromonas reticulata AJ410448 
Chlorosarcinopsis minor AB049415 
Coelastrum microporum AF388373 
Cylindrocapsa geminella U73471 
Desmodesmus communis X73994 
Dictyochloris fragrans AF367861 
Dunaliella parva M62998 
Ettlia minuta M62996 
Fritschiella tuberosa U83129 
Gloeococcus maximus U83122 
Gongrosira papuasica U18503 
Haematococcus zimbabwiensis U70797 
Heterochlamydomonas inaequalis AF367857 
Hormotila blennista U83123 
Hormotilopsis gelatinosa U83126 
Hydrodictyon reticulatum M74497 
Lobochlamys culleus AJ410461 
Lobochlamys segnis AJ410464 
Mychonastes homosphaera AB025423 
Neochloris aquatica M62861 
Oedogonium cardiacum U83133 
Ourococcus multisporus AF277648 
Paulschulzia pseudovolvox U83120 
Pediastrum duplex M62997 
Planophila terrestris U83127 
Polytoma uvella U22940 
Pseudodictyosphaerium jurisii AF106074 
Scenedesmus obliquus X56103 
Scenedesmus pupukensis X91267 
Scenedesmus rubescens X74002 
Schizomeris leibleinii AF182820 
Spermatozopsis similis X65557 
Sphaeroplea robusta U73472 
Spongiochloris spongiosa U63107 
Stigeoclonium helveticum U83131 
Tetraspora sp. U83121 
Uronema belkae AF182821 
Volvox carteri X53904 

Phylogenetic Analysis of Green Algae

The newly obtained desert algae sequences were combined with previously published sequences from 14 other desert green algae and from a broad representation of all orders of green plants for which 18S sequences have been published in GenBank, with the exception that embryophytes are represented only by the Marchantia polymorpha sequence. To ensure a conservative estimate of exclusive phylodiversity measures, the closest matching full 18S rDNA sequence for each desert isolate, as found from BLAST (Altschul et al., 1990) searches, was also included. The sequences and their corresponding taxa and GenBank accession numbers are listed in Table 1.

A final alignment of 150 taxa (23 desert taxa and 127 others from freshwater, marine, and soil habitats) was constructed initially in smaller subsets of taxa using ClustalW (Thompson et al., 1994) and then refined by eye. The 150-taxon alignment was 1839 nucleotides in length. Of 1839 sites, 188 were eliminated because of alignment uncertainty, leaving 1651 aligned sites of which 441 were parsimony informative and an additional 227 were variable but not informative. The alignment, MrBayes file, and resulting trees associated with this analysis are available as supplementary material at http://systematicbiology.org/.
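The site counts above follow the usual definitions: a site is variable if more than one nucleotide occurs at it, and parsimony informative if at least two nucleotides each occur in at least two sequences. A minimal sketch of that tally follows. Two details are assumptions for illustration: gaps and ambiguity codes are treated as missing data, and excluded (ambiguously aligned) columns are passed as a set of indices. The four-sequence alignment and the function name `site_summary` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

NUCLEOTIDES = frozenset("ACGT")


def site_summary(alignment: Dict[str, str],
                 excluded: Iterable[int] = ()) -> Tuple[int, int, int]:
    """Return (parsimony-informative, variable-but-uninformative, constant) counts.

    Gaps and ambiguity codes are ignored (treated as missing), so an all-gap
    column counts as constant; column indices in `excluded` are skipped.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    skip = set(excluded)
    informative = uninformative = constant = 0
    for i in range(len(seqs[0])):
        if i in skip:
            continue
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in NUCLEOTIDES)
        if len(counts) <= 1:
            constant += 1
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
        else:
            uninformative += 1
    return informative, uninformative, constant


# Hypothetical four-sequence alignment.
ALN = {"s1": "ACGTA", "s2": "ACGTT", "s3": "AGGAT", "s4": "ACCAT"}
```

For the toy alignment, `site_summary(ALN)` reports one informative site (column 4, with two T and two A), three variable but uninformative sites, and one constant site.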

Bayesian Phylogenetic Analyses

ModelTest 3.06 (Posada and Crandall, 1998), used in conjunction with PAUP* 4.0b10 (Swofford, 2001), determined that the GTR+I+G model (Lanave et al., 1994; Gu et al., 1995) provided the best fit to the data according to both the likelihood-ratio test and the Akaike information criterion (AIC). Two independent runs were performed using the GTR+I+G model (four rate categories) in MrBayes 3.0b4 (Huelsenbeck and Ronquist, 2001). Each run was started from an independent random starting tree and run for 25 million generations. Each run employed Metropolis-coupled MCMC (Geyer, 1991) using three heated chains (temperature parameter 0.2) in addition to the sampled (cold) chain. We used a flat Dirichlet prior for relative nucleotide frequencies and relative rate parameters, a discrete uniform prior for topologies, and an exponential prior (mean 1.0) for the gamma-shape parameter and all branch lengths. Trees were sampled every 1000 generations, and MrBayes was used to construct a majority-rule consensus tree from the 20,000 trees sampled during the last 10 million generations of the two runs (10,000 per run). Convergence of the tree topology was assessed by comparing the splits included in majority-rule consensus trees computed for each run separately. For continuous model parameters, we used Gelman and Rubin's estimated potential scale reduction approach (Gelman, 1996; Gelman and Rubin, 1992a, 1992b), which uses variation within and among independent MCMC runs to assess the degree to which the separate chains have converged.
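The heating in Metropolis-coupled MCMC can be made concrete with a short sketch. It assumes the incremental heating scheme described in the MrBayes documentation, in which chain i (i = 0 for the cold chain) has heat βi = 1/(1 + λi), where λ is the temperature parameter (0.2 here). It also uses the standard acceptance rule for a proposed swap of states between two chains. The function names are hypothetical.

```python
import math
from typing import List


def incremental_heats(n_chains: int, temp: float) -> List[float]:
    """Heat of chain i (i = 0 is the cold chain) under incremental heating:
    beta_i = 1 / (1 + temp * i). Assumed to match the MrBayes scheme."""
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


def swap_acceptance(log_post_i: float, log_post_j: float,
                    beta_i: float, beta_j: float) -> float:
    """Probability of accepting an exchange of states between chains i and j.

    log_post_i and log_post_j are the unheated log posterior densities of the
    states currently held by chains i and j; chain k targets posterior**beta_k.
    """
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


heats = incremental_heats(4, 0.2)  # one cold chain plus three heated chains
```

With four chains and λ = 0.2, the heats are 1, about 0.833, about 0.714, and 0.625. A swap that moves a better state into the cold chain is always accepted, whereas the reverse swap is accepted only with probability below 1. This is how heated chains help the cold chain escape local optima.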

Phylodiversity Measures

We calculated several measures related to Faith's original phylogenetic diversity statistic (Faith, 1992). The basic quantities calculated from phylogenetic trees were T (total tree length), E (exclusive phylodiversity), and I (inclusive or total phylodiversity, which is identical to Faith's original measure). E includes terminal branches associated with desert taxa and shared ancestral edges subtending clades of desert taxa. We also estimated ES using the program SIMMAP 1.0b1 (Bollback, 2004). ES is perhaps a more natural measure of exclusive phylodiversity given the Bayesian approach taken here, but we base most of our discussion on E because it is equally applicable in Bayesian and frequentist (e.g., maximum likelihood) contexts.

Three combinations of these basic measures are useful and were also computed. First, PEI = E/I is related to the number of independent evolutionary transitions into the focal environment, which is “desert” in this study. PEI is 0 if each sample represents an independent transition to the desert environment and no detectable evolution has occurred following the transition. At the other extreme, PEI is at least 1 if the phylogeny indicates only one transition to deserts, because E then includes the stem branch subtending the desert clade, which I omits. Intermediate values of PEI indicate that more than one transition occurred and that at least some substitutions accrued after at least some of the desert lineages were established.

The quantity PET = E/T describes the proportion of the total evolutionary history that apparently occurred in the desert environment. PET is important in this study for distinguishing between two hypotheses: (1) desert algae are simply transient algal spores carried on the wind and dropped onto deserts; and (2) desert algae are representatives of true desert-endemic lineages of green algae. If the first hypothesis is true, PET is expected to be zero, because the desert isolates would then be common, widespread taxa most likely already represented in GenBank. In that case, our practice of including the nearest GenBank sequence (by BLAST score) for each desert isolate should yield zero-length branches leading to that isolate, and each desert isolate would be expected to represent an independent transition to land. Under the alternative hypothesis, PET is expected to be greater than zero, because desert taxa will have had time to accumulate lineage-specific substitutions. The value of PET thus bears directly on the importance of desert green algae for understanding the evolution of green algae in general. PET = 1 would indicate that all of the evolutionary history represented in the tree was accrued by desert-dwelling green algae, whereas PET = 0 would mean that desert green algae contribute essentially nothing to our knowledge of green algal evolution (i.e., desert taxa represent minor tip branches on the tree). Of lesser interest in this study is PIT = I/T, which measures the proportion of the total tree length accounted for by the inclusive phylodiversity. This measure is of use primarily in comparing our phylodiversity measures to the biodiversity measures of Martin (2002).
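Because E, I, and T are computed on every sampled tree, the three ratios can be summarized like any other posterior quantity. The sketch below computes PEI, PET, and PIT for each sample and reports a posterior mean with a central 95% interval taken from percentiles. The (E, I, T) triples are hypothetical. The percentile method (Python's `statistics.quantiles` with the inclusive method) is one reasonable choice, not necessarily the one used in this study.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence, Tuple


def ratios(e: float, i: float, t: float) -> Dict[str, float]:
    """PEI = E/I, PET = E/T and PIT = I/T for one sampled tree."""
    return {"PEI": e / i, "PET": e / t, "PIT": i / t}


def summarize(samples: Sequence[Tuple[float, float, float]]
              ) -> Dict[str, Tuple[float, float, float]]:
    """(posterior mean, 2.5% point, 97.5% point) of each ratio.

    Each sample is an (E, I, T) triple from one tree drawn by MCMC; at least
    two samples are needed for the percentile cut points.
    """
    per_sample = [ratios(*s) for s in samples]
    summary = {}
    for name in ("PEI", "PET", "PIT"):
        values = [r[name] for r in per_sample]
        cuts = quantiles(values, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
        summary[name] = (mean(values), cuts[0], cuts[-1])
    return summary


# Hypothetical (E, I, T) values from three sampled trees.
SAMPLES = [(0.50, 0.95, 1.50), (0.55, 1.00, 1.60), (0.45, 0.90, 1.40)]
result = summarize(SAMPLES)
```

Computing each ratio per sample, rather than dividing posterior means, keeps each interval tied to the joint uncertainty in the ratio's numerator and denominator.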

Finally, we were interested in how much evolution occurred on desert versus nondesert terminal branches. Terminal branches were of particular interest because species limits are often arbitrarily determined by the amount of evolutionary divergence separating contemporary organisms from their nearest relatives. Although our phylodiversity measures do not depend on any species definition, we were interested in whether, on average, desert green algae would be considered separate species. The average and median terminal branch lengths were recorded for both desert and nondesert taxa as a way of assessing this, and discussed in light of divergence values already observed for different species of desert green algae (Lewis and Flechtner, 2004).
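A sketch of the terminal-branch summary follows. It assumes that terminal branch lengths have already been extracted from a tree into a dictionary keyed by taxon name; the names and values are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def terminal_branch_summary(terminal_lengths: Dict[str, float],
                            desert_taxa: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """(mean, median) terminal branch length for desert and nondesert taxa."""
    desert = set(desert_taxa)
    groups = {
        "desert": [length for taxon, length in terminal_lengths.items() if taxon in desert],
        "nondesert": [length for taxon, length in terminal_lengths.items()
                      if taxon not in desert],
    }
    return {name: (mean(values), median(values)) for name, values in groups.items()}


# Hypothetical terminal branch lengths (substitutions per site).
TERMINALS = {"t1": 0.10, "t2": 0.05, "t3": 0.30, "t4": 0.15, "t5": 0.25}
DESERT = {"t1", "t2", "t4"}
summary = terminal_branch_summary(TERMINALS, DESERT)
```

Reporting the median alongside the mean keeps a few unusually long terminal branches from dominating the comparison.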

Results and Discussion

Convergence in model likelihood was apparent in the two independent MCMC runs by 15 million generations. The majority-rule trees constructed from the last 10 million generations of the two runs differed primarily in the position of Koliella, which occupied either a position within the Trebouxiophyceae clade (Fig. 2) or a position outside the branch leading to Trebouxiophyceae, Chlorophyceae, and Ulvophyceae. The phylogenetic position of Koliella was previously investigated by Katana et al. (2001), who showed it to be a member of the Trebouxiophyceae. The trees were otherwise similar, disagreeing only about the inclusion of three splits, each with posterior probability less than 0.8. Gelman and Rubin's scale reduction parameter R was less than 1.14 for the tree length and all continuous GTR model parameters (Table 2). To put this value in perspective, R = 1 is ideal, indicating that the parallel Markov chains are completely exchangeable, whereas a value much larger than 1 is unacceptable, indicating that credible intervals might be much wider than in a comparable situation in which the chains had converged. The values we obtained indicate acceptable convergence according to the rule of thumb offered by Gelman (1996). Hereafter, all discussion of the tree and phylodiversity measures is based on a combined sample comprising the last 10 million generations of each of the two independent MCMC runs.
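The potential scale reduction factor can be computed from parallel chains of a single parameter, as sketched below. This is the basic Gelman and Rubin (1992) form, R = sqrt(V/W), where W is the mean within-chain variance and V = ((n − 1)/n)W + B/n pools W with the between-chain variance B. It omits the degrees-of-freedom correction used in some later variants. The chains are synthetic and the function name is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` holds m >= 2 independent chains, each of the same length n >= 2.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)          # W
    between = n * variance([mean(c) for c in chains])   # B
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n         # V
    return math.sqrt(pooled / within)


same = psrf([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])       # near 1
apart = psrf([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])  # far above 1
```

Chains sampling the same distribution give R near 1 (for identical short chains, slightly below it), whereas chains stuck in different regions of parameter space give values far above 1.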

Figure 2

Majority-rule consensus tree combining sampled trees from the last 10 million generations from each of two 25 million generation MCMC simulations. Details of the simulation conditions and substitution model are provided in the text. Desert lineages are bolded and classes of green algae labeled. A single embryophyte sequence is included (Marchantia polymorpha) and arises from within the Charophyceae clade. Numbers associated with branches represent posterior probabilities. Scale bar indicates the expected number of substitutions per site.

Table 2

Estimates of GTR+I+G model parameters. The sample size for the posterior mean (Mean), 95% credible interval (2.5%, 97.5%) and Gelman-Rubin measure (R) is 20,000, comprising 10,000 samples taken during the last 10 million generations from each of two MCMC simulations. The MLE column resulted from maximizing the likelihood on the MCMC consensus tree. Parameters: T is the total tree length; rCT, rCG, rAT, rAG, and rAC are relative rate parameters of the GTR model (MrBayes fixes rGT = 1); πA, πC, πG, and πT are the relative nucleotide frequency parameters; α is the shape parameter of the gamma (α, 1/α) distribution of relative rates across sites; and pinvar is the proportion of invariable sites.

Parameter MLE Mean 2.5% 97.5% R 
T 4.37050 10.7021 9.1200 12.3580 1.1385 
rCT 5.1468 5.7958 4.7430 6.6942 1.1276 
rCG 1.0497 1.2966 1.0417 1.5444 1.0409 
rAT 1.1564 1.3918 1.1350 1.6633 1.0079 
rAG 2.6609 3.6464 3.0437 4.2737 1.0077 
rAC 1.0735 1.5342 1.2041 1.8681 1.0299 
πA 0.2564 0.2332 0.2169 0.2492 1.0035 
πC 0.2079 0.2039 0.1905 0.2208 1.0547 
πG 0.2862 0.2854 0.2679 0.3021 1.0043 
πT 0.2494 0.2776 0.2608 0.2934 1.0066 
α 0.5551 0.4582 0.3739 0.5243 1.0488 
pinvar 0.3753 0.3226 0.2759 0.3614 1.0125 

Although we did not perform a maximum likelihood search, we did obtain maximum likelihood estimates of the GTR model parameters on the majority rule consensus tree resulting from the MCMC analysis using PAUP* 4.0b10 (Swofford, 2001) (Table 2). The maximum likelihood estimates for nearly all parameters are smaller than the corresponding posterior means. This presumably reflects the effects of the prior distributions assumed for these parameters. For example, the mean branch length based on the maximum likelihood estimate of tree length is 4.37/271 = 0.016, whereas the mean branch length based on the posterior distribution is about twice this (10.93/297 = 0.037). Although the effect of the prior was not strong, increasing each branch length on average only about 0.02 substitutions per site, it is clear that the exponential branch length prior (which had mean 1.0) is exerting an influence. Importantly for this study, the phylodiversity ratios PET, PEI, and PIT appear much less sensitive to the effects of the prior distributions than tree length and other model parameters (discussed below).
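The mean branch lengths compared above are tree length divided by the number of branches. For the posterior calculation, the divisor 297 is the number of branches in a fully resolved unrooted tree of n = 150 taxa, 2n − 3. The divisor 271 for the maximum likelihood calculation is taken as given in the text.

```latex
\bar{b}_{\mathrm{ML}} = \frac{4.37}{271} \approx 0.016, \qquad
\bar{b}_{\mathrm{post}} = \frac{10.93}{2n - 3} = \frac{10.93}{2(150) - 3} = \frac{10.93}{297} \approx 0.037, \qquad
\bar{b}_{\mathrm{post}} - \bar{b}_{\mathrm{ML}} \approx 0.02
```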

The MCMC consensus tree of green algae (Fig. 2) illustrates that the transition from aquatic to desert habitats occurred numerous times independently in green algae, and from diverse phylogenetic backgrounds. Desert lineages have arisen in three of the five classes of green algae: Chlorophyceae, Trebouxiophyceae, and Charophyceae. All of these transitions arise from within clades of aquatic freshwater organisms; the predominantly marine classes Ulvophyceae and Prasinophyceae apparently lack desert representatives, even though Ulvophyceae does have terrestrial lineages (e.g., Trentepohlia).

Although the consensus tree in Figure 2 provides a point estimate of the number of transitions to land, credible intervals can be constructed using the trees sampled during the MCMC analysis. Using PAUP* 4.0b10 (Swofford, 2001), we performed parsimony-based ancestral state reconstructions under both ACCTRAN and DELTRAN optimization for all 20,000 sampled tree topologies to obtain posterior probabilities for each possible combination of numbers of gains and losses of terrestriality (Table 3). Under ACCTRAN optimization (which favors reversals when there is homoplasy), the combination with the highest posterior probability (0.3425) was 14 gains and 4 losses; under DELTRAN (which favors parallelism when there is homoplasy), the most probable combination (0.4534) was 18 gains and no losses. Neither optimization strategy gave any support to fewer than 12 transitions to terrestriality from aquatic lifestyles. We also used SIMMAP 1.0b1 (Bollback, 2004) to map this character using a two-state Markov model on all 20,000 trees sampled during the MCMC analysis. SIMMAP does not currently distinguish between forward (aquatic to terrestrial) and reverse changes, but its overall estimated number of transitions (17.6) was consistent with the most probable ACCTRAN and DELTRAN parsimony reconstructions (18 changes in each case).
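
Turning per-tree reconstructions into the posterior probabilities of Table 3 is a counting exercise. Each sampled tree contributes one (gains, losses) pair, and the posterior probability of a pair is its relative frequency among the samples. The sketch assumes the per-tree pairs have already been obtained, for example from ACCTRAN or DELTRAN output. The function names are hypothetical. The ACCTRAN probabilities are copied from Table 3 and marginalized over losses.

```python
from collections import Counter


def posterior_table(reconstructions):
    """Relative frequency of each (gains, losses) pair across sampled trees."""
    counts = Counter(reconstructions)
    n = len(reconstructions)
    return {pair: c / n for pair, c in counts.items()}


def marginal_gains(table):
    """Sum posterior probabilities over numbers of losses for each number of gains."""
    out = Counter()
    for (gains, _losses), p in table.items():
        out[gains] += p
    return dict(sorted(out.items()))


# ACCTRAN posterior probabilities from Table 3A, keyed by (gains, losses).
ACCTRAN = {
    (12, 4): 0.0001, (12, 5): 0.0010, (12, 6): 0.0022,
    (13, 2): 0.0018, (13, 3): 0.0164, (13, 4): 0.0636, (13, 5): 0.0688,
    (14, 1): 0.0067, (14, 2): 0.0846, (14, 3): 0.3144, (14, 4): 0.3425,
    (15, 0): 0.0015, (15, 1): 0.0147, (15, 2): 0.0402, (15, 3): 0.0391,
    (16, 0): 0.0002, (16, 1): 0.0017, (16, 2): 0.0009,
}

mode = max(ACCTRAN, key=ACCTRAN.get)   # most probable (gains, losses) combination
gains = marginal_gains(ACCTRAN)        # posterior distribution of the number of gains
print(mode, gains)
```

Marginalizing in this way shows that under ACCTRAN about three quarters of the posterior mass falls on 14 gains, and that no sampled tree required fewer than 12.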

Table 3

Posterior probabilities associated with alternative reconstructions of the transition to desert from aquatic habitat under ACCTRAN (A) and DELTRAN (B) optimization. Rows (gains) are numbers of transitions from nondesert to desert; columns (losses) are numbers of transitions from desert back to nondesert. Combinations with a posterior probability of zero are shown as "-" or omitted. The combination with the highest posterior probability is marked with an asterisk.

A. ACCTRAN
Gains\Losses 0 1 2 3 4 5 6
12 - - - - 0.0001 0.0010 0.0022
13 - - 0.0018 0.0164 0.0636 0.0688 -
14 - 0.0067 0.0846 0.3144 0.3425* - -
15 0.0015 0.0147 0.0402 0.0391 - - -
16 0.0002 0.0017 0.0009 - - - -

B. DELTRAN
Gains\Losses 0 1
14 - 0.0072
15 0.0028 0.0707
16 0.0452 0.1465
17 0.2744 -
18 0.4534* -

Desert lineages are divergent enough at the 18S rDNA locus to be considered distinct species. The posterior mean of the average length of a terminal branch leading to a desert taxon is 0.0246 substitutions per nucleotide site (95% credible interval 0.0198–0.0294). Although we do not advocate a fixed threshold of sequence divergence for delimiting species, this value corresponds to nearly 5% pairwise divergence between two sister desert taxa (2 × 0.0246 ≈ 0.049). That is far greater than the divergence observed between different species of desert green algae (e.g., Lewis and Flechtner, 2004). The average length of a terminal branch leading to a nondesert taxon is 0.0507, roughly twice that of a terminal branch leading to a desert taxon. This result is expected given the sampling design. Each desert isolate was accompanied by its closest nondesert relative, which keeps desert terminal branches short, whereas the remaining nondesert taxa were sampled broadly across green algae.
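
The step from a mean terminal branch of 0.0246 to "nearly 5% pairwise divergence" assumes two sister taxa, each at the end of an average-length terminal branch. The path between them is then about 2 × 0.0246 expected substitutions per site. The sketch makes that step explicit. For illustration only, it also converts the path length into an expected proportion of differing sites under the Jukes–Cantor model. That conversion is an assumption on our part, since the analysis itself used GTR with rate heterogeneity. Either way, the proportion of differing sites comes out slightly below the substitution distance, yet still close to 5%.

```python
import math


def sister_path_length(terminal_branch):
    """Path length between two sister tips that each sit on a branch of this length."""
    return 2.0 * terminal_branch


def jc69_p_distance(d):
    """Expected proportion of differing sites for distance d under Jukes-Cantor (an illustrative assumption)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


d = sister_path_length(0.0246)   # about 0.049 substitutions per site
p = jc69_p_distance(d)           # just under 0.05 of sites expected to differ
print(f"d = {d:.4f}, p = {p:.4f}")
```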

In three cases (i.e., one in the class Charophyceae and two in the Chlorophyceae), lineages without apparent aquatic close relatives are evident, implying that novel organisms are being recovered from these understudied communities. These new lineages are themselves tremendously diverse. For example, isolates EM1VF1 and SEV3VF14 exhibit 76 sequence differences, and LG2VF30 and BC9-8 have 50 nucleotide differences. This level of divergence within a clade of desert algae exceeds the divergence between monocot and eudicot angiosperms (comparison of rice and tomato 18S rRNA gene sequences in the same algal alignment, data not shown). In addition, there are examples of diversification within three of the desert lineages, one involving charophyte algae and two within the Chlorophyceae. Together, these results indicate that consideration of desert lineages may be essential for understanding the evolution of green algae (and even green plants). Including these lineages increases taxon sampling in regions of the phylogeny not previously recognized as poorly sampled.

Table 4 provides posterior means and 95% credible intervals of the basic phylodiversity measures (E, I, and T) and their derivative measures (PET, PEI, and PIT) for the 150-taxon Bayesian MCMC analysis. For comparison, maximum likelihood estimates (MLEs) of these quantities were computed with PAUP* 4.0b10 (Swofford, 2001) on the majority-rule consensus tree from the Bayesian analysis (Fig. 2). The MLEs for E, I, and T differ considerably from the posterior means for these quantities. However, the MLEs of the ratio-based measures (PET, PEI, and PIT) all fall within their respective 95% Bayesian credible intervals. This suggests that the assumed prior distributions scaled branch lengths upward relative to the MLEs but had little effect on relative branch lengths. Hereafter, all discussion of phylodiversity measures will refer to the estimates based on the posterior distribution.
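
The summaries in Table 4 come from samples pooled across two MCMC runs. The sketch below assumes each sampled tree has already been reduced to an (E, I, T) triple. It computes the ratio measures per sample before summarizing and takes the 95% interval from the empirical 2.5% and 97.5% quantiles. The function names and the synthetic chains are hypothetical stand-ins for real post-burn-in MCMC output.

```python
import random
from statistics import mean


def credible_interval(samples, level=0.95):
    """Equal-tailed interval from empirical quantiles (nearest lower order statistic)."""
    xs = sorted(samples)
    tail = (1.0 - level) / 2.0
    last = len(xs) - 1
    return xs[int(tail * last)], xs[int((1.0 - tail) * last)]


def summarize(samples):
    """samples: (E, I, T) triples pooled over chains. Returns (mean, lower, upper) per ratio."""
    ratios = {
        "PEI": [e / i for e, i, _t in samples],
        "PET": [e / t for e, _i, t in samples],
        "PIT": [i / t for _e, i, t in samples],
    }
    return {name: (mean(v), *credible_interval(v)) for name, v in ratios.items()}


def fake_chain(n, rng):
    """Hypothetical stand-in for post-burn-in MCMC samples of (E, I, T)."""
    out = []
    for _ in range(n):
        t = rng.gauss(10.9, 0.8)
        i = t * rng.gauss(0.16, 0.006)
        e = i * rng.gauss(0.41, 0.02)
        out.append((e, i, t))
    return out


rng = random.Random(42)
pooled = fake_chain(10_000, rng) + fake_chain(10_000, rng)   # two runs, pooled
for name, (m, lo, hi) in summarize(pooled).items():
    print(f"{name}: mean {m:.4f}, 95% interval {lo:.4f} to {hi:.4f}")
```

Computing each ratio per sample, rather than dividing posterior means, keeps the interval for a ratio consistent with its own posterior distribution.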

Table 4

Summary of phylodiversity measures. The sample size for the posterior mean (Mean column) and credible interval (columns labeled 2.5% and 97.5%) is 20,000, comprising 10,000 samples taken during the last 10 million generations from each of two MCMC simulations. The maximum likelihood estimates (MLE column) were obtained by optimizing branch lengths on the majority rule consensus tree resulting from the MCMC analysis. Phylodiversity measures are described in the text.

Measure MLE Mean 2.5% 97.5% 
T 4.3705 10.9283 9.1199 12.3583 
I 0.6811 1.7669 1.4616 2.0378 
E 0.2904 0.7229 0.5856 0.8547 
PEI = E/I 0.4263 0.4092 0.3697 0.4492 
PET = E/T 0.0664 0.0662 0.0587 0.0741 
PIT = I/T 0.1558 0.1617 0.1501 0.1736 
Average desert tip 0.0100 0.0246 0.0198 0.0294 
Median desert tip 0.00471 0.0127 0.0080 0.0183 
Average nondesert tip 0.0205 0.0507 0.0422 0.0574 
Median nondesert tip 0.01417 0.0349 0.0284 0.0413 

PET revealed that 6.6% of the total tree length (i.e., of all expected substitutions per site) lies exclusively in desert green algal lineages. Obviously, this measure depends on taxon sampling: adding more nondesert taxa would decrease the apparent contribution of the desert taxa. Were it possible to include representative sequences of all green algae, the actual value of PET and other phylodiversity measures could be obtained. We are still far from adequate sampling with respect to green algae as a whole and desert lineages in particular. We intentionally included the closest nondesert sequence to each desert green algal isolate to introduce a conservative bias into phylodiversity estimates. Even so, the true value may be quite different from the 6.6% obtained in this study. Nevertheless, these measures allow some extreme hypotheses to be ruled out.

The fact that PET is clearly greater than zero (95% credible interval 0.0587–0.0741) means that a nontrivial amount of evolution occurred in deserts. This is evidence against the idea that desert green algal isolates are simply ephemeral and incidental desert inhabitants. Even though PET may change as our knowledge of both desert and nondesert green algae improves, it is unlikely that this conservative conclusion will be overturned by new data. The implication is that there are green algae endemic to deserts, and these represent an important part of a complex community that has been underappreciated, despite being right under our noses, so to speak.

The posterior mean of PEI was 0.4092, which indicates that less than one half of the inclusive phylodiversity can be attributed to substitutions accrued exclusively in desert lineages. The fact that PEI is much less than 1 means that desert green algal lineages have arisen numerous times, and in fact the tree shows that 14 transitions to desert environments from freshwater aquatic ancestors are required to explain the 23 desert isolates. Although PEI, like PET, will change with increased taxon sampling of nondesert green algae and increased isolation of desert lineages, the conservative conclusion that can be drawn (and that will not change with the addition of future data) is that desert green algae represent many independent transitions to land from aquatic green algal ancestors. This is significant because heretofore only one transition to land has been widely recognized, namely the one leading to the embryophytes and including the green plants most familiar to us (e.g., mosses, ferns, conifers, flowering plants). For those seeking to understand the evolutionary, developmental, and physiological changes that necessarily underlie the transition from aquatic to terrestrial existence in green plants, it is clear that more than one lineage should be examined.
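
The sampling arguments above can be checked on toy trees. The sketch is an illustration under explicit assumptions, not the code used in this study:

- Trees are rooted nested tuples of the form (name, branch length, children).
- E sums branches all of whose descendant tips are desert taxa.
- I sums branches lying on the path from at least one desert tip to the root.
- T sums all branches.

Tip names and branch lengths are hypothetical. Adding a nondesert tip lowers PET while E is unchanged, and splitting the desert tips between two origins pushes PEI below 1.

```python
def phylodiversity(tree, focal):
    """Return E, I, T and their ratios for the tips listed in `focal`.

    tree: (name, branch_length, children), with children == [] for tips.
    Assumed definitions: E = branches leading only to focal tips,
    I = branches on some focal tip's path to the root, T = all branches.
    """
    def visit(node):
        name, length, children = node
        if not children:
            hit = name in focal
            return (length if hit else 0.0), (length if hit else 0.0), length, hit, hit
        e = i = t = 0.0
        any_focal, all_focal = False, True
        for child in children:
            ce, ci, ct, c_any, c_all = visit(child)
            e, i, t = e + ce, i + ci, t + ct
            any_focal = any_focal or c_any
            all_focal = all_focal and c_all
        t += length
        if any_focal:   # branch lies on a path from some focal tip to the root
            i += length
        if all_focal:   # branch leads exclusively to focal tips
            e += length
        return e, i, t, any_focal, all_focal

    e, i, t, _, _ = visit(tree)
    return {"E": e, "I": i, "T": t, "PEI": e / i, "PET": e / t, "PIT": i / t}


DESERT = {"d1", "d2"}   # hypothetical desert tips

# One origin: both desert tips form a clade.
one_origin = ("root", 0.0, [("", 0.2, [("d1", 0.1, []), ("d2", 0.1, [])]),
                            ("a1", 0.3, [])])
# Same tree plus one more nondesert tip: T grows, E does not.
more_sampling = ("root", 0.0, one_origin[2] + [("a2", 0.5, [])])
# Two origins: each desert tip is sister to an aquatic tip.
two_origins = ("root", 0.0, [("", 0.2, [("d1", 0.1, []), ("a1", 0.1, [])]),
                             ("", 0.2, [("d2", 0.1, []), ("a2", 0.1, [])])])

for label, tree in [("one origin", one_origin), ("more sampling", more_sampling),
                    ("two origins", two_origins)]:
    r = phylodiversity(tree, DESERT)
    print(f"{label}: PET = {r['PET']:.3f}, PEI = {r['PEI']:.3f}")
```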

Choice of branch length prior distribution

The Bayesian approach taken here requires specifying a prior distribution for branch lengths; for this study we used an exponential distribution with mean 1.0. Because the choice of prior distribution affects the branch lengths sampled during the MCMC analysis, and hence the phylodiversity measures obtained from those samples, branch length priors deserve careful consideration. MrBayes 3.0b4 allows only exponential or uniform prior distributions on branch lengths. The use of a truncated uniform distribution (i.e., Uniform(0, b), where b is an arbitrarily large upper bound) has been shown to create serious artifacts (Felsenstein, 2004), for example yielding credible intervals for a parameter that exclude the maximum likelihood estimate. We chose an exponential distribution with mean 1.0 rather than one with a smaller mean to increase the variance (and thus decrease the influence) of the prior distribution. Because the standard deviation of an exponential distribution equals its mean, however, increasing the variance in this way unavoidably makes the prior mean branch length larger than what many would consider typical. Rather than the pure Bayesian approach taken here, one could use an empirical Bayes approach, basing the mean of the branch length prior on the maximum likelihood estimate of tree length. Alternatively, one could use a hierarchical model (e.g., letting the mean of the branch length prior be determined by a hyperprior distribution), as advocated by Suchard et al. (2001). As software for Bayesian phylogenetics evolves, more flexibility with respect to branch length prior distributions will become possible.
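
Two facts drive the trade-off described above. An exponential prior has standard deviation equal to its mean, and with independent branch priors the prior mean tree length is the number of branches times the prior mean branch length. The sketch checks the first fact by simulation. It then compares the prior mean tree length implied by an exponential prior with mean 1.0 on 297 branches against a hypothetical empirical Bayes choice. That choice sets the prior mean to the MLE mean branch length (4.3705/271, from Table 4 and the text). It is purely illustrative and was not used in this study.

```python
import random
from statistics import mean, stdev

N_BRANCHES = 297                 # fully resolved unrooted tree of 150 taxa
PRIOR_MEAN = 1.0                 # exponential branch-length prior used in this study

# For an exponential distribution the standard deviation equals the mean.
rng = random.Random(1)
draws = [rng.expovariate(1.0 / PRIOR_MEAN) for _ in range(50_000)]
print(f"sample mean {mean(draws):.3f}, sample sd {stdev(draws):.3f}")

# Prior mean tree length implied by independent branch priors.
prior_tree_length = N_BRANCHES * PRIOR_MEAN          # 297 expected substitutions per site

# Hypothetical empirical Bayes alternative: center the prior on the MLE mean branch length.
eb_mean = 4.3705 / 271                               # about 0.016
eb_tree_length = N_BRANCHES * eb_mean                # about 4.8
print(f"prior tree length {prior_tree_length:.0f} vs empirical Bayes {eb_tree_length:.1f}")
```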

Caveats to the use of phylodiversity summary measures

Any single statistic summarizing phylodiversity can be misleading, so these statistics should be viewed as tools rather than complete descriptions of phylodiversity. For example, single isolates with unusually high rates of substitution are outliers that could unduly influence measures of molecular phylodiversity. On the other hand, unusual levels of phylodiversity may point to interesting features, such as elevated rates of substitution associated with life in certain environments but not others. Phylodiversity measures should not be used in isolation, and new measures designed to identify specific types of outliers are needed. At the very least, simple graphical inspection of the tree with branch lengths drawn proportional to the expected number of substitutions (e.g., Fig. 2) allows many potential problems to be identified.

Relationship to previous approaches to quantifying phylodiversity

Previous approaches have used phylodiversity measures to assess the conservation value of a particular area (Faith, 1992) or to compare biodiversity in contrasting environments (Hughes et al., 2001; Martin, 2002). Our motivation for supplementing Faith's original PD measure arose from the need to answer a type of question not previously addressed, namely the importance of a group of taxa defined on the basis of environment in the context of the containing clade of all green plants. Although our intention is to supplement rather than replace existing measures of phylodiversity, we feel that a comparison of existing measures with ours is appropriate.

Comparison to Faith's uniqueness measure

The exclusive phylodiversity E is related to the uniqueness measure U discussed by Faith (1994). Whereas U represents the probability of at least one substitution (per site) unique to the focal lineages, E measures the expected number of unique substitutions (per site) in these segments. Faith's (1994) U is related to T and E as follows:

$$U = \left(1 - e^{-E}\right) e^{-(T - E)}$$
U and E capture the uniqueness of the focal lineages in different ways, reflecting the different contexts in which they are deemed useful. Faith's motivation was to measure the unique genetic contribution of a candidate taxon for conservation, for purposes of comparison with other candidate taxa. The winning candidate taxon contributes relatively more unique substitutions than other candidates against a reference subtree comprising already-conserved taxa. In our case, the reference subtree would necessarily include all nondesert taxa included in the study. As the size of the nonfocal group grows, the uniqueness according to U decreases because substitutions must occur on focal lineages and not occur on other lineages in order to be counted by U. Thus, for questions like those addressed in this study, keeping E separate from T is important.
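
The contrast between U and E can be made concrete if U takes the Poisson form U = (1 − e^−E) e^−(T−E). That form gives the probability of at least one substitution on the focal segments and none on the remaining T − E of the tree, which is how U is described above. Under that assumption, the sketch holds E fixed at its posterior mean (Table 4) and lets the total tree length grow. E stays constant while U falls toward zero.

```python
import math


def faith_u(e, t):
    """Assumed Poisson form of U: P(at least one change on focal length e) * P(no change on t - e)."""
    return (1.0 - math.exp(-e)) * math.exp(-(t - e))


E = 0.7229                                  # posterior mean of E (Table 4)
for t in (1.0, 2.0, 5.0, 10.9283):          # last value: posterior mean of T (Table 4)
    print(f"T = {t:7.4f}, E = {E:.4f}, U = {faith_u(E, t):.2e}")
```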

Comparison to Martin's measures

The measures described here are correlated with the quantities proposed by Martin (2002), and Figure 3 presents a comparison of our phylodiversity measures, PEI and PIT, to the FST and P-test approaches of Martin (2002). Figure 3 is essentially the same as Martin's (2002) figure 4, the only difference being that we approximated the relative branch lengths because they were not provided in Martin's paper.

Figure 3

Phylogenetic trees used to compare phylodiversity measures: (a) tree in which both the F-test and the P-test are significant: ((D:3.0,(C:1.25,(A:1.0,B:1.0):0.25):1.75):1.0,(E:0.5,F:0.5):3.5,((G:0.5,H:0.5):3.5,(I:3.0,(J:1.25,(K:0.9,L:0.9):0.35):1.75):1.0):9.0); (b) tree in which the F-test is significant but the P-test is not: ((D:1.67,(C:1.33,(A:1.0,B:1.0):0.33):0.34):5.66,(E:0.5,F:0.5):6.83,((G:1.0,H:1.0):6.25,(I:6.0,(J:1.25,(K:1.0,L:1.0):0.25):4.75):0.25):2.42); (c) tree in which the P-test is significant but the F-test is not: ((D:6.75,(C:6.25,(A:5.75,B:5.75):0.5):0.5):1.0,(E:7.25,F:7.25):0.5,((G:6.5,H:6.5):1.25,(I:6.67,(J:6.0,(K:5.67,L:5.67):0.33):0.67):1.08):1.5); (d) tree in which neither the F-test nor the P-test is significant: ((D:3.0,(C:1.25,(A:0.9,B:0.9):0.35):1.75):0.75,(E:0.5,F:0.5):3.25,((G:0.5,H:0.5):3.5,(I:3.0,(J:1.33,(K:1.0,L:1.0):0.33):1.67):1.0):9.25). Redrawn from Martin (2002: Fig. 4).


Unlike Martin (2002), we calculated FST assuming that distances between taxa are additive; that is, the length of the shortest path from one taxon to another through the tree exactly equals the pairwise distance. This additivity is unlikely in real data, but is justified for our (purely illustrative) purposes. Martin's P-test distinguishes the two cases on the left (Fig. 3a, Fig. 3c), which are characterized by few transitions between environments, from the two on the right (Fig. 3b, Fig. 3d), which are characterized by numerous transitions. Our measure PEI also distinguishes left from right, being high (1.00, 0.96) for the few-transitions cases and low (0.63, 0.29) for the cases involving numerous transitions.
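
Under the additivity assumption, the distance between two taxa is the sum of branch lengths on the path joining them in the tree. The sketch computes these path (patristic) distances and the total length T for the tree in Fig. 3a. It uses a small hand-written Newick reader, not any particular library, and handles neither quoted labels nor comments. The environment assignments of the tips appear only in the figure, so the sketch stops short of computing FST.

```python
def parse_newick(text):
    """Parse a Newick string with branch lengths into nested dicts (minimal reader)."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                                   # skip the closing ')'
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name, length = s[start:pos], 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def tip_paths(tree):
    """Map each tip to the nodes on its path up to, but not including, the root."""
    paths = {}

    def walk(node, above):
        chain = [node] + above
        if not node["children"]:
            paths[node["name"]] = chain
        for child in node["children"]:
            walk(child, chain)

    for child in tree["children"]:
        walk(child, [])
    return paths


def patristic(paths, a, b):
    """Additive distance: sum of branch lengths on the path joining tips a and b."""
    shared = {id(n) for n in paths[a]} & {id(n) for n in paths[b]}
    return sum(n["length"] for n in paths[a] + paths[b] if id(n) not in shared)


def tree_length(node):
    """Total branch length T of the (sub)tree."""
    return node["length"] + sum(tree_length(c) for c in node["children"])


FIG3A = ("((D:3.0,(C:1.25,(A:1.0,B:1.0):0.25):1.75):1.0,(E:0.5,F:0.5):3.5,"
         "((G:0.5,H:0.5):3.5,(I:3.0,(J:1.25,(K:0.9,L:0.9):0.35):1.75):1.0):9.0);")

tree = parse_newick(FIG3A)
paths = tip_paths(tree)
print(f"T = {tree_length(tree):.2f}")
print({pair: patristic(paths, *pair) for pair in [("A", "B"), ("A", "C"), ("A", "G")]})
```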

Martin's F-test (test of the hypothesis FST = 0) distinguishes top from bottom. FST is high (0.50, 0.17) (i.e., significantly greater than 0) for the two cases on the top (Fig. 3a, Fig. 3b), which are characterized by having at least one clade of closely related taxa that also share the same environment. FST is low (0.08) or even negative (−0.08) (i.e., not significantly greater than 0) for the two cases on the bottom (Fig. 3c, Fig. 3d), in which all pairs of taxa are nearly equally distantly related (Fig. 3c) or pairs that are relatively closely related include members from both environments (Fig. 3d).

Our measure PIT is related to FST. The highest values of FST occur when there are only two clades each representing a different environment, and these clades are shallow (i.e., pairs of taxa within each clade are closely related). These are the same conditions under which PIT is small. On the other hand, FST is lowest when all clades are heterogeneous, each being composed of taxa from different environments, so that paths between pairs of taxa from the same environment often pass through the root of the tree. Such situations are expected to yield large PIT values. In fact, the correlation between FST and the quantity 1 − PIT is either 0.99 or 0.74, depending on whether the open squares or closed squares, respectively, are considered the focal environment.

Although PIT can distinguish one extreme (Fig. 3a) from the other (Fig. 3d), it does not do as well as FST in distinguishing the top from the bottom two cases in Martin's figure 4. FST also has the advantage of symmetry: the choice of focal environment matters when calculating PIT, but does not matter for FST. We do not consider this a failing of our phylodiversity measures because they were not designed to compare biodiversity in two (or more) contrasting environments, but rather to compare the diversity in one focal environment to the total phylodiversity.

Bayesian credible intervals versus frequentist hypothesis tests

Using a Bayesian approach, we were able to compute Bayesian credible intervals for phylodiversity measures. This approach explicitly accounts for the fact that the phylogeny is not known without error. The interpretation of these 95% credible intervals is straightforward: given the data, the model, and the assumed prior distributions for model parameters, the true value of each phylodiversity measure lies within its credible interval with probability 95%. There are dangers associated with this approach, however: using a model that fails to capture an important feature of molecular evolution can substantially change the size and location of the credible intervals, as can changes in the assumed prior distributions. However, the ability to account for uncertainty in the phylogeny is a powerful motivation for taking the Bayesian approach.

Martin (2002) takes a frequentist approach, testing particular null hypotheses about FST or the number of changes in environment. The F-test is independent of phylogeny, as it depends only on pairwise comparisons. There is thus no reason to account for phylogenetic uncertainty in this case, although it may be advisable to account for multiple substitutions in some way when calculating FST. The P-test, however, is explicitly phylogenetic. The null hypothesis in this case is that the number of transitions between environments is what would be expected if the underlying phylogeny were random. If the number of inferred transitions is small relative to this expectation, then the P-test is significant. It is sometimes not entirely clear what the null distribution represents in randomization tests (e.g., Swofford et al., 1996), as is apparent from the ease with which a different but equally reasonable null distribution can be proposed. For instance, one could fix the topology of the tree and randomly shuffle the assignment of environments to the tips. This randomization approach seems just as sensible as randomizing the underlying topology, but counterexamples in which the two ways of randomizing produce different null distributions are easy to find. For example, consider a six-taxon tree in which two closest neighbors are in one environment and the other four taxa belong to a different environment. If topologies are randomized, the significance probability is 1/7. A single change can be inferred only in the 15 of the 105 possible unrooted topologies in which the two taxa are sister to each other. If instead the assignment of environments is shuffled on the fixed tree, the significance probability is either 1/5 or 2/15, depending on the shape of the tree: of the 15 ways to choose two of the six tips, 3 or 2 are sister pairs. Importantly, neither value equals the 1/7 obtained when topologies are randomized.
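
The six-taxon counterexample can be checked by brute force. The sketch is our own illustration, and the function names and tip labels A to F are hypothetical. It enumerates all 105 unrooted binary topologies by stepwise addition and counts the minimum number of habitat changes with Fitch parsimony. It then compares the two null models: randomizing topologies with fixed tip environments, and fixing a topology while shuffling which two tips carry the minority environment.

```python
from fractions import Fraction
from itertools import combinations


def all_unrooted_trees(taxa):
    """Yield every unrooted binary tree on `taxa` (as edge lists), built by stepwise addition."""
    def grow(edges, remaining, next_id):
        if not remaining:
            yield edges
            return
        leaf, rest = remaining[0], remaining[1:]
        for k, (u, v) in enumerate(edges):
            w = f"n{next_id}"
            # Subdivide edge (u, v) with a new internal node w and attach the new leaf to w.
            yield from grow(edges[:k] + edges[k + 1:] + [(u, w), (w, v), (w, leaf)],
                            rest, next_id + 1)

    star = [("n0", taxa[0]), ("n0", taxa[1]), ("n0", taxa[2])]
    yield from grow(star, list(taxa[3:]), 1)


def fitch_changes(edges, states):
    """Minimum number of state changes (Fitch parsimony) on an unrooted binary tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def down(node, parent):
        if node in states:                      # tip: its observed state
            return {states[node]}, 0
        (s1, c1), (s2, c2) = [down(ch, node) for ch in adj[node] if ch != parent]
        if s1 & s2:
            return s1 & s2, c1 + c2
        return s1 | s2, c1 + c2 + 1             # disjoint state sets cost one change

    root = next(iter(states))                   # root the tree on one tip's terminal edge
    (neighbor,) = adj[root]
    s, c = down(neighbor, root)
    return c if states[root] in s else c + 1


def topology_null(taxa, minority):
    """P(one change) when topologies are uniform and tip environments are fixed."""
    states = {t: int(t in minority) for t in taxa}
    changes = [fitch_changes(e, states) for e in all_unrooted_trees(taxa)]
    return Fraction(sum(c == 1 for c in changes), len(changes)), len(changes)


def shuffle_null(edges, taxa, k):
    """P(one change) when the tree is fixed and k random tips get the minority environment."""
    picks = list(combinations(taxa, k))
    hits = sum(fitch_changes(edges, {t: int(t in p) for t in taxa}) == 1 for p in picks)
    return Fraction(hits, len(picks))


TAXA = ["A", "B", "C", "D", "E", "F"]
# Three cherries around a central node: ((A,B),(C,D),(E,F)).
SYMMETRIC = [("c", "x1"), ("c", "x2"), ("c", "x3"), ("x1", "A"), ("x1", "B"),
             ("x2", "C"), ("x2", "D"), ("x3", "E"), ("x3", "F")]
# Caterpillar with two cherries: ((A,B),C,(D,(E,F))).
CATERPILLAR = [("y1", "A"), ("y1", "B"), ("y1", "y2"), ("y2", "C"), ("y2", "y3"),
               ("y3", "D"), ("y3", "y4"), ("y4", "E"), ("y4", "F")]

p_topology, n_trees = topology_null(TAXA, {"A", "B"})
print(n_trees, p_topology)
print(shuffle_null(SYMMETRIC, TAXA, 2), shuffle_null(CATERPILLAR, TAXA, 2))
```

For two tips in a binary character, one change suffices exactly when those tips form a cherry. Counting cherries therefore explains all three probabilities: 15 of 105 topologies, 3 of 15 pairs on the symmetric tree, and 2 of 15 pairs on the caterpillar.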
Maddison and Slatkin (1991: 1195) concluded in their review of null models for tests of this sort that “… there are several possible null models that can be used in evolutionary studies, and [that] they lead to somewhat different distributions of the number of character state changes.” Later (p. 1196), they state “if the tree is known and considered a given in the evolution of the characters, then the appropriate null model is one that randomizes characters.” Such problems with interpretation argue for a Bayesian approach because posterior probabilities provide a direct and unambiguous answer to the question posed. In this case, the question might be “What is the probability that the number of transitions between environments is X or fewer?” Simply noting the proportion of sampled trees from a Bayesian MCMC run in which the number of inferred transitions is X or fewer provides a direct answer to this question in the form of an approximated posterior probability.
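
Estimating such a posterior probability from MCMC output is a direct count over the sampled trees. The sketch uses a hypothetical list of per-tree transition counts. It also reports a naive Monte Carlo standard error that treats the samples as independent. Real MCMC samples are autocorrelated, so that error is optimistic.

```python
import math


def prob_at_most(counts, x):
    """Posterior probability that the number of transitions is x or fewer."""
    return sum(c <= x for c in counts) / len(counts)


def naive_mc_se(p, n):
    """Binomial standard error, ignoring autocorrelation among MCMC samples."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical per-tree numbers of inferred transitions (one value per sampled tree).
counts = [18, 18, 17, 18, 16, 18, 17, 15, 18, 17]
p = prob_at_most(counts, 16)
print(f"P(transitions <= 16) = {p:.2f} +/- {naive_mc_se(p, len(counts)):.2f}")
```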

Conclusions

Our results on green algae in deserts echo the surprising amount of eukaryotic diversity recently uncovered from such habitats as highly acidic rivers, anoxic mud, and deep sea vents (e.g., Zettler et al., 2002). Together, these studies greatly expand our knowledge of the range of environments in which eukaryotes can exist. Phylogenetic analyses of eukaryotes from extreme environments and their nearest relatives allow exploration of the number and rapidity of transitions to the habitat of interest, and can provide insights into the physiological traits important in these transitions. Our results indicate that desert green algae are not simply close relatives of aquatic taxa, but instead represent levels of divergence that could be interpreted as new species, new genera, or even higher order taxa. The exclusive molecular phylodiversity measure and related measures together provide a useful way to directly characterize both the distribution and extent of variation for a given set of taxa.

Acknowledgements

We thank S. Olm for technical help, R. Colwell and P. Turchin for comments made on an earlier version of the manuscript, and G. Burleigh and one anonymous reviewer for their helpful and constructive comments. The authors acknowledge support from National Aeronautics and Space Administration (EXB02-0042-0054) and the National Science Foundation (DEB9870201) to LAL, and Alfred P. Sloan Foundation/NSF grant (98-4-5 ME) to POL.

References

Altshul
S. F.
Gish
W.
Myers
E. W.
Miller
W.
Lipman
D. J.
Basic local alignment search tool
J. Mol. Biol.
 , 
1990
, vol. 
215
 (pg. 
403
-
410
)
Bollback
J. P.
SIMMAP version 1.0b1
 , 
2004
 
Distributed by the author at http://www.simmap.com/
Broady
P. A.
Friedmann
E. I.
Soils heated by volcanism
Antarctic microbiology
 , 
1993
New York
Wiley-Liss
(pg. 
413
-
432
)
Buchheim
M. A.
Lemieux
C.
Otis
C.
Gutell
R.
Chapman
R. L.
Turmel
M.
Phylogeny of the Chlamydomonadales (Chlorophyceae): A comparison of ribosomal RNA gene sequences from the nucleus and the chloroplast
Mol. Phylogen. Evol.
 , 
1996
, vol. 
5
 (pg. 
391
-
402
)
Cameron
R. E.
Communities of soil algae occurring in the Sonoran Desert in Arizona
J. Ariz. Acad. Sci.
 , 
1960
, vol. 
1
 (pg. 
85
-
88
)
Cameron
R. E.
Algae of southern Arizona. II. Algal Flora (exclusive of bluegreen algae)
Rev. Algol.
 , 
1964
, vol. 
7
 (pg. 
151
-
177
)
Evans
R. D.
Johansen
J. R.
Microbiotic crusts and ecosystem processes
Crit. Rev. Plant Sci.
 , 
1999
, vol. 
18
 (pg. 
183
-
225
)
Faith
D.
Conservation evaluation and phylogenetic diversity
Biol. Conserv.
 , 
1992
, vol. 
61
 (pg. 
1
-
10
)
Faith
D.
Genetic diversity and taxonomic priorities for conservation
Biol. Conserv.
 , 
1994
, vol. 
68
 (pg. 
69
-
74
)
Flechtner
V. R.
Johansen
J. R.
Clark
W. H.
Algal composition of microbiotic crusts from the central desert of Baja California, Mexico
Great Basin Naturalist
 , 
1998
, vol. 
58
 (pg. 
295
-
311
)
Friedl
T.
Inferring taxonomic positions and testing genus level assignments in coccoid green lichen algae: A phylogenetic analysis of 18S ribosomal RNA sequences from Dictyochloropsis reticulata and from members of the genus Myrmecia (Chlorophyta, Trebouxiophyceae cl. nov.)
J. Phycol.
 , 
1995
, vol. 
31
 (pg. 
632
-
639
)
Friedl
T.
Zeltner
C.
Assessing the relationships of some coccoid green lichen algae and the Microthamniales (Chlorophyta) with 18S ribosomal RNA gene sequence comparisons
J. Phycol.
 , 
1994
, vol. 
30
 (pg. 
500
-
506
)
Gelman
A.
Gilks
W. R.
Richardson
S.
Spiegelhalter
D. J.
Inference and monitoring convergence
Markov chain Monte Carlo in practice
 , 
1996
New York
Chapman & Hall/CRC
(pg. 
131
-
143
)
Gelman
A.
Rubin
D. B.
Inference from iterative simulation using multiple sequences
Stat. Sci.
 , 
1992
, vol. 
7
 (pg. 
457
-
472
)
Gelman
A.
Rubin
D. B.
Bernardo
J. M.
Berger
J. O.
Dawid
A. P.
Smith
A. F. M.
A single sequence from the Gibbs sampler gives a false sense of security
Bayesian statistics 4
 , 
1992
Oxford
Oxford University Press
(pg. 
625
-
631
)
Geyer
C. J.
Keramidas
E. M.
Markov chain Monte Carlo maximum likelihood
Computing science and statistics: Proceedings of the 23rd Symposium on the Interface
 , 
1991
Fairfax Station, Virginia
Interface Foundation
(pg. 
156
-
163
)
Gu
X.
Fu
Y. -X.
Li
W. -H.
Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites
Mol. Biol. Evol.
 , 
1995
, vol. 
12
 (pg. 
546
-
557
)
Huelsenbeck
J. P.
Nielsen
R.
Bollback
J. P.
Stochastic mapping of morphological characters
Syst. Biol.
 , 
2003
, vol. 
52
 (pg. 
131
-
158
)
Huelsenbeck
J. P.
Ronquist
F.
MrBayes: Bayesian inference of phylogenetic trees
Bioinformatics
 , 
2001
, vol. 
17
 (pg. 
754
-
755
)
Hughes
J. B.
Hellmann
J. J.
Ricketts
T. H.
Bohannan
B. J. M.
Counting the uncountable: Statistical approaches to estimating microbial diversity
Appl. Environ. Microbiol.
 , 
2001
, vol. 
67
 (pg. 
4399
-
4406
)
Huss
V. A. R.
Sogin
M. L.
Phylogenetic position of some Chlorella species within the Chlorococcales based upon complete small-subunit ribosomal RNA sequences
J. Mol. Evol.
 , 
1990
, vol. 
31
 (pg. 
432
-
442
)
Katana
A.
Kwiatowski
J.
Spalik
K.
Zakrys
B.
Szalacha
E.
Szymanska
H.
Phylogenetic position of Koliella (Chlorophyta) as inferred from nuclear and chloroplast small subunit rDNA
J. Phycol.
 , 
2001
, vol. 
37
 (pg. 
443
-
451
)
Lanave
C.
Preparata
G.
Saccone
C.
Serio
G.
A new method for calculating evolutionary substitution rates
J. Mol. Evol.
 , 
1984
, vol. 
20
 (pg. 
86
-
93
)
Lewis
L. A.
Flechtner
V. R.
Green algae (Chlorophyta) of desert microbiotic crusts: Diversity of North American taxa
Taxon
 , 
2002
, vol. 
51
 (pg. 
443
-
451
)
Lewis
L. A.
Flechtner
V. R.
Cryptic species of Scenedesmus (Chlorophyta) from Desert Soil Communities of Western North America
J. Phycol.
 , 
2004
, vol. 
40
 (pg. 
1127
-
1137
)
Lewis
L. A.
Wilcox
L. W.
Fuerst
P. A.
Floyd
G. L.
Concordance of molecular and ultrastructural data in the study of zoosporic chlorococcalean green algae
J. Phycol.
 , 
1992
, vol. 
28
 (pg. 
375
-
380
)
Maddison
W. P.
Slatkin
M.
Null models for the number of evolutionary steps in a character on a phylogenetic tree
Evolution
 , 
1991
, vol. 
45
 (pg. 
1184
-
1197
)
Martin
A. P.
Phylogenetic Approaches for describing and comparing the diversity of microbial communities
Appl. Environ. Microbiol.
 , 
2002
, vol. 
68
 (pg. 
3673
-
3682
)
Metting
B.
The systematics and ecology of soil algae
Bot. Rev.
 , 
1981
, vol. 
47
 (pg. 
195
-
312
)
Nielsen
R.
Mapping mutations on phylogenies
Syst. Biol.
 , 
2002
, vol. 
51
 (pg. 
729
-
739
)
Posada
D.
Crandall
K. A.
ModelTest: Testing the model of DNA substitution
Bioinformatics
 , 
1998
, vol. 
14
 (pg. 
817
-
818
)
Rosentreter
R.
Belnap
J.
Belnap
J.
Lange
O. L.
Biological soil crusts of North America
Biological soil crusts: Structure, function, and management
 , 
2001
Berlin
Springer-Verlag
(pg. 
31
-
50
)
Schlesinger
W. H.
Raikes
J. A.
Hartley
A. E.
Cross
A. E.
On the spatial pattern of soil nutrients in desert ecosystems
Ecology
 , 
1996
, vol. 
77
 (pg. 
364
-
374
)
Shaw
A. J.
Cox
C. J.
Goffinet
B.
Buck
W. R.
Boles
S. B.
Phylogenetic evidence of a rapid radiation of Pleurocarpous mosses (Bryophyta)
Evolution
 , 
2003
, vol. 
57
 (pg. 
2226
-
2241
)
Shields
L. M.
Drouet
F.
Distribution of terrestrial algae within the Nevada Test Site
Am. J. Bot.
 , 
1962
, vol. 
49
 (pg. 
547
-
554
)
Shoup
S.
Lewis
L. A.
Polyphyletic origin of parallel basal bodies in swimming cells of chlorophycean green algae (Chlorophyta)
J. Phycol.
 , 
2003
, vol. 
39
 (pg. 
789
-
796
)
Steel
M.
Phylogenetic diversity and the greedy algorithm
Syst. Biol.
 , 
2005
, vol. 
54
 (pg. 
527
-
529
)
Suchard
M. A.
Weiss
R. E.
Sinsheimer
J. S.
Bayesian selection of continuous-time Markov chain evolutionary models
Mol. Biol. Evol.
 , 
2001
, vol. 
18
 (pg. 
1001
-
1013
)
Swofford
D. L.
PAUP*: Phylogenetic analysis using parsimony (*and other methods), version 4.0b10
 , 
2001
Sunderland, Massachusetts
Sinauer Associates
Swofford
D. L.
Thorne
J. L.
Felsenstein
J.
Wiegman
B. M.
The topology-dependent permutation test for monophyly does not test for monophyly
Syst. Biol.
 , 
1996
, vol. 
45
 (pg. 
575
-
579
)
Thompson
J. D.
Higgins
D. G.
Gibson
T. J.
CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice
Nucleic Acids Res.
 , 
1994
, vol. 
22
 (pg. 
4673
-
4680
)
Watanabe S., Floyd G. L. Comparative ultrastructure of the zoospores of nine species of Neochloris (Chlorophyta). Plant Syst. Evol., 1989, vol. 168 (pg. 195-219).
West N. E. Structure and function of microphytic soil crusts in wildland ecosystems of arid to semi-arid regions. Adv. Ecol. Res., 1990, vol. 20 (pg. 179-223).
Zettler L. A. A., Gómez F., Zettler E., Keenan B. G., II, Amils R., Sogin M. L. Eukaryotic diversity in Spain's River of Fire. Nature, 2002, vol. 417 (pg. 137).

Appendix 1

Locality data for the isolates of desert green algae.

BCPᵃ ID | Locality | Latitude | Longitude | Elevation (m) | Collected
--- | --- | --- | --- | --- | ---
CNP2 | Canyonlands National Park, Needles District, Chesler Park, San Juan County, UT, USA | 38°06.204′N | 109°51.000′W | 1724 | 11 May 1999
EM1 | Mojave National Preserve, Cinder cone site, San Bernardino County, CA, USA | 35°11.671′N | 115°52.223′W | 641 | 4 June 1998
LG2 | Sierra San Pedro Mártir, Baja California, Mexico, La Grulla Meadow area | 30°54′N | 115°28′W | 2100 | 15 June 1998
LG3 | Sierra San Pedro Mártir, Baja California, Mexico, La Grulla Meadow area | 30°54′N | 115°29′W | 2100 | 15 June 1998
SRS2 | San Rafael Swell, Emery County, UT, USA | 39°08.574′N | 110°46.282′W | not given | 9 May 1999
ZNP2 | Zion National Park, Washington County, UT, USA | 37°12.932′N | 112°34.933′W | 1511 | 17 May 1999
ZNP3 | Zion National Park, Washington County, UT, USA | 37°20.556′N | 113°06.578′W | 2042 | 18 May 1999