## Abstract

Deserts are not usually considered biodiversity hotspots, but desert microbiotic crust communities exhibit a rich diversity of both eukaryotic and prokaryotic life forms. Like many communities dominated by microscopic organisms, they defy characterization by traditional species-counting approaches to assessing biodiversity. Here we use exclusive molecular phylodiversity (*E*) to quantify the amount of evolutionary divergence unique to desert-dwelling green algae (Chlorophyta) in microbiotic crust communities. Given a phylogenetic tree with branch lengths expressed in units of expected substitutions per site, *E* is the total length of all tree segments representing exclusively desert lineages. Using MCMC to integrate over tree topologies and branch lengths provides 95% Bayesian credible intervals for phylodiversity measures. We found substantial exclusive molecular phylodiversity based on 18S rDNA data, showing that desert lineages are distantly related to their nearest aquatic relatives. Our results challenge conventional wisdom, which holds that there was a single origin of terrestrial green plants and that green algae are merely incidental visitors rather than indigenous components of desert communities. We identify examples of lineage diversification within deserts and at least 12 separate transitions from aquatic to terrestrial life apart from the most celebrated transition leading to the embryophyte land plants.

Microbiotic crust communities occur worldwide in arid habitats and include diverse photosynthetic and nonphotosynthetic organisms such as cyanobacteria, lichens, green algae, diatoms, bryophytes, fungi, and microarthropods. These surface-dwelling organisms often experience great extremes in environmental conditions such as moisture and temperature. In the Mojave Desert of North America, for example, summer soil temperatures can exceed 90°C, and rainfall can be under 45 mm per year (Rosentreter and Belnap, 2001), whereas crust organisms in Antarctica can experience summer temperatures that are below freezing (Broady, 1993). Crust communities are now considered important to nutrient cycling in deserts and are also known to play a key role in soil stabilization (Evans and Johansen, 1999; Schlesinger et al., 1996; West, 1990). A growing interest in understanding the ecological role of crusts has resulted in a need for more intensive characterization of the organisms of these communities.

It was thought until recently that the biodiversity of green algae (Chlorophyta) in desert crust communities was represented by a small number of species. This perception arose because the vegetative stages of soil algae are morphologically simple, and the earliest studies of crust algae were based on light microscopy of these stages alone (Cameron, 1960, 1964; Metting, 1981; Shields and Drouet, 1962). Recent studies that incorporate data from additional life history stages, such as zoospores and gametes, have revealed a greater number of distinct green algae in desert soils (e.g., Flechtner et al., 1998). The true taxonomic affiliations of microscopic desert green algae are only now being established with the aid of nucleotide data, particularly 18S rDNA sequences (Lewis and Flechtner, 2002, 2004). Many of the familiar genera of unicellular green algae that occur in deserts, such as *Chlorella*, *Chlamydomonas*, and *Chlorococcum*, were shown to be polyphyletic using 18S rDNA data, and these results were congruent with ultrastructural data obtained from alternate life history stages (Buchheim et al., 1996; Friedl, 1995; Friedl and Zeltner, 1994; Huss and Sogin, 1990; Lewis et al., 1992; Watanabe and Floyd, 1989). These results highlight the extent to which convergence in vegetative morphology obscures genetic diversity, and they illustrate the ability of 18S data to uncover broad-scale phylogenetic relationships within the green algae.

Quantifying the biodiversity of microscopic taxa can be problematic because most groups of microscopic organisms are poorly studied, and traditional biodiversity measures often rely on the assignment of taxa to known species (Hughes et al., 2001). Phylogenetic diversity (or *phylodiversity*) provides a surrogate measure that more accurately portrays the underlying genetic diversity of organisms, and is standardized across the variety of life histories, reproductive strategies, and morphological variability that create problems with traditional species-counting measures. The concept of phylodiversity (PD) was proposed as a conservation biology tool in order to quantify the phylogenetic heritage captured by the organisms in a particular geographic area (Faith, 1992). The original measure was based primarily on parsimony analysis of morphological data, but one example in Faith's original paper involved mitochondrial DNA, and this measure has been recently adapted for use with model-based phylogenetic methods (Shaw et al., 2003).

These quantitative measures of phylodiversity are based on the path length of the subtree connecting contemporary organisms of interest. This original definition of PD is appropriate for answering questions about the phylogenetic heritage of a geographic area of conservation interest. Recent theoretical work has bolstered this application of PD by showing that a simple greedy algorithm finds subsets of taxa of a given size that maximize PD (Steel, 2005). PD is also useful for comparing divergence among monophyletic groups (Shaw et al., 2003). It does not, however, adequately address questions about the importance of groups that are neither monophyletic nor geographically circumscribed.
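
The result cited above, that PD-maximizing subsets of taxa are easy to find, can be illustrated with the greedy procedure analysed by Steel (2005). The sketch below is not part of the analyses reported here. The five-taxon tree `TOY_TREE`, its branch lengths, and all helper names are hypothetical. PD is computed in the unrooted sense, as the length of the minimal subtree spanning the chosen taxa. The procedure starts from the most distant pair of taxa and then repeatedly adds the taxon that gives the largest gain in PD.

```python
from itertools import combinations

# Hypothetical toy tree (invented branch lengths). Each node is a pair
# (leaf name or list of child nodes, length of the branch to its parent).
# The trifurcating root makes this an unrooted tree; the root's own length is unused.
TOY_TREE = ([("A", 1.0), ("B", 1.0),
             ([("C", 1.0),
               ([("D", 1.0), ("E", 1.0)], 0.5)], 2.0)], 0.0)


def collect_edges(node, edges):
    """Append (leaves below the edge, edge length) for every edge under `node`
    and return the set of leaves descended from `node`."""
    content, _ = node
    if isinstance(content, str):
        return frozenset([content])
    leaves = frozenset()
    for child in content:
        below = collect_edges(child, edges)
        edges.append((below, child[1]))
        leaves |= below
    return leaves


def pd(edges, taxa):
    """Faith's PD: length of the minimal subtree spanning `taxa`.
    An edge belongs to that subtree when chosen taxa lie on both of its sides."""
    taxa = frozenset(taxa)
    return sum(length for below, length in edges if taxa & below and taxa - below)


def greedy_max_pd(tree, k):
    """Greedy PD maximisation: start from the pair of taxa with the largest PD
    (their path length), then repeatedly add the taxon with the largest PD gain."""
    edges = []
    taxa = sorted(collect_edges(tree, edges))
    chosen = set(max(combinations(taxa, 2), key=lambda pair: pd(edges, pair)))
    while len(chosen) < k:
        best = max((t for t in taxa if t not in chosen),
                   key=lambda t: pd(edges, chosen | {t}))
        chosen.add(best)
    return chosen, pd(edges, chosen)


if __name__ == "__main__":
    for k in (2, 3, 5):
        chosen, value = greedy_max_pd(TOY_TREE, k)
        print(k, sorted(chosen), value)
```

On this toy tree the first pair chosen is (A, D), with PD 4.5. Every candidate third taxon then ties at 5.5, and ties are broken alphabetically here. Choosing all five taxa recovers the total tree length, 7.5.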

We define *exclusive molecular phylodiversity* (*E*; Fig. 1a) as the sum of all branch lengths in a tree that support either individual taxa or clades composed exclusively of taxa in some group of interest. *Inclusive molecular phylodiversity* (*I*; Fig. 1b) is identical to a molecular version of Faith's (1992) PD in which path lengths are measured in expected substitutions per site. The total tree length (*T*; Fig. 1c) provides an upper bound for both *E* and *I*. If the set of taxa representing the group of interest is *S*, and *S*′ is the complement of *S* (i.e., the set of all taxa in the study but not in the group of interest), then *E* may be unambiguously defined as *T* minus *I*′, where *I*′ (Fig. 1d) equals *I* computed for *S*′.
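
The relationships among *T*, *I*, *I*′, and *E* can be checked on a small example. The sketch below is a minimal illustration that assumes a hypothetical five-taxon tree (`TOY_TREE`, with invented branch lengths). The tree is stored as nested (children, branch length) pairs and read as unrooted. *I* for a set of taxa is the length of the minimal subtree spanning them, where an edge counts when members of the set lie on both of its sides. *E* is then obtained from the identity *E* = *T* − *I*′ given above.

```python
# Hypothetical toy tree (invented branch lengths). Each node is a pair
# (leaf name or list of child nodes, length of the branch to its parent);
# the trifurcating root makes the tree unrooted and its own length is unused.
TOY_TREE = ([("A", 1.0), ("B", 1.0),
             ([("C", 1.0),
               ([("D", 1.0), ("E", 1.0)], 0.5)], 2.0)], 0.0)


def collect_edges(node, edges):
    """Append (leaves below the edge, edge length) for every edge under `node`
    and return the set of leaves descended from `node`."""
    content, _ = node
    if isinstance(content, str):
        return frozenset([content])
    leaves = frozenset()
    for child in content:
        below = collect_edges(child, edges)
        edges.append((below, child[1]))
        leaves |= below
    return leaves


def inclusive_pd(edges, taxa):
    """I: length of the minimal subtree spanning `taxa` (unrooted Faith's PD)."""
    taxa = frozenset(taxa)
    return sum(length for below, length in edges if taxa & below and taxa - below)


def phylodiversity(tree, focal):
    """Return (T, I, E) for a focal group, using E = T - I' where I' is the
    inclusive phylodiversity of the complementary set of taxa."""
    edges = []
    all_taxa = collect_edges(tree, edges)
    focal = frozenset(focal)
    total = sum(length for _, length in edges)
    inclusive = inclusive_pd(edges, focal)
    exclusive = total - inclusive_pd(edges, all_taxa - focal)
    return total, inclusive, exclusive


if __name__ == "__main__":
    print(phylodiversity(TOY_TREE, {"D", "E"}))  # monophyletic group: (7.5, 2.0, 2.5)
    print(phylodiversity(TOY_TREE, {"A", "D"}))  # polyphyletic group: (7.5, 4.5, 2.0)
```

For the monophyletic focal group {D, E}, *E* exceeds *I* by 0.5, which is the length of the branch subtending the clade. For the polyphyletic group {A, D}, *E* reduces to the two terminal branches (2.0), whereas *I* rises to 4.5.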

If the focal group is monophyletic, *E* is greater than *I* by an amount equal to the length of the branch subtending the group, but *E* can be much smaller than *I* when the group is highly polyphyletic. *E* is a more appropriate measure than *I* if the only evolutionary diversity of interest is that uniquely accrued by organisms occupying a particular habitat, and not the diversity accrued by their ancestors occupying other habitats. In this case, *E* measures the amount of evolutionary divergence that can be unambiguously attributed to organisms living in the focal habitat. Stochastic character mapping (Nielsen, 2002; Huelsenbeck et al., 2003) provides an alternative means of defining exclusive phylodiversity. This Bayesian definition measures exclusive phylodiversity as the posterior mean number of substitutions per site accumulated while in the focal habitat. We denote the definition based on stochastic character mapping *E*_{S} to distinguish it from *E*.

Phylodiversity is not the only way to assess biodiversity in situations where species counting is not feasible. Faith (2004) introduced a uniqueness measure intended for use in making conservation decisions about individual taxa. Martin (2002) introduced two measures for addressing the diversity and differentiation of two communities that, like our phylodiversity measures, do not depend on identifying species. Martin's *F*_{ST} measure can be used to compare community diversity to the total diversity overall, and his *P*-test measures the degree of differentiation between two communities. The phylodiversity measures introduced here are not intended for comparing two communities, but they nevertheless are correlated with Martin's measures.

In this paper, we compare our phylodiversity measures with those of Martin (2002) and with the uniqueness measure introduced by Faith (2004). In addition, we present new 18S rDNA sequence data obtained from desert green algae and use Bayesian phylogenetic analyses to address basic questions about green algae living in desert soils. Did the transition from aquatic to terrestrial habitats occur just once, or have green algae adapted to desert life numerous times independently? Are the green algae in deserts simply representatives of widespread ecological generalists, or have they diversified in arid habitats to form desert clades? What fraction of the total diversity of all green algae do these desert lineages represent?

## Materials and Methods

### Sample Collection

Desert green algae were collected from geographically distinct locations in western North America, ranging from Baja California, Mexico, to California, New Mexico, and Utah, USA (Appendix 1). Algae were isolated from the soils using a dilution plating method, as described in Flechtner et al. (1998). Sequences of the small-subunit ribosomal RNA gene (18S rDNA) were obtained for nine isolates of desert green algae by direct sequencing of PCR products. Briefly, DNA was extracted from unialgal isolates (cultures grown from a single cell) using a modified CTAB extraction method, followed by PCR amplification and sequencing with the primers SSU1, SSU2, N18G, N18H, C18G, C18H, and C18J (Lewis and Flechtner, 2002; Shoup and Lewis, 2003). Base calls in the consensus sequences were verified from individual sequencing reactions in both the forward and reverse orientations, or from duplicate sequencing reactions in the same orientation; over 98% of base calls had at least twofold coverage. To confirm the absence of contaminant sequences, each newly obtained consensus sequence was subjected to a BLAST search (Altschul et al., 1990). The 18S sequence data for each isolate were deposited in the GenBank database (Table 1).

**Table 1.** Taxa and GenBank accession numbers of the 18S rDNA sequences analyzed.

Taxon | GenBank accession number |
---|---|
**Desert taxa (new sequences)** | |
CNP2VF11b | AY271675 |
EM1VF1 | AY271673 |
LG2VF30 | AY271676 |
LG3VF20 | AY271674 |
MX219VF21 | AY614713 |
NB1VF11 | AY614714 |
SRS2VF14 | AY377441 |
ZNP2VF21 | AY377440 |
ZNP3VF36 | AY377439 |
**Desert taxa (previously published)** | |
BC2-1 | AF516676 |
BC4VF9 | AF516675 |
BC8-8 | AF516674 |
Cylindrocystis brebissonii BC9-8 | AF115439 |
CNP1VF2 | AF513378 |
CNP2VF25 | AF516677 |
H1VF1 | AF513369 |
LG2VF16 | AF513372 |
SEV2VF1 | AF516678 |
SEV3VF14 | AF513371 |
SEV3VF49 | AF513373 |
SRS2VF18 | AF513375 |
UT8-26 | AF513376 |
ZNP1VF32 | AF513379 |
**Prasinophyceae** | |
Cymbomonas tetramitiformis | AB017126 |
Pterosperma cristatum | AJ010407 |
Halosphaera sp. | AB017125 |
Nephroselmis olivacea | X74754 |
Tetraselmis striata | X70802 |
Pseudoscourfieldia marina | X75565 |
Mantoniella squamata | X73999 |
Dolichomastix tenuilepis | AF509625 |
Picocystis salinarum | AF153313 |
**Charophyceae and Embryophytes** | |
Chaetosphaeridium globosum | AJ250110 |
Chlorokybus atmophyticus | M95612 |
Cylindrocystis crassa | AJ428080 |
Coleochaete orbicularis | M95611 |
Coleochaete scutata | X68825 |
Desmidium grevillii | AJ428117 |
Klebsormidium flaccidum | AF408240 |
Marchantia polymorpha | AY342318 |
Mesostigma viride | AJ250108 |
Mesotaenium kramstai | AJ428079 |
Mougeotia scalaris | X70705 |
Nitella capillaris | AJ250111 |
Nitellopsis obtusa | AF408226 |
Penium cylindrus | AJ553930 |
Raphidonema nivale | AF448477 |
Staurastrum sp. | X74752 |
Zygnema circumcarinatum | X79495 |
**Ulvophyceae** | |
Acrosiphonia duriuscula | AB049418 |
Enteromorpha intestinalis | AJ000040 |
Gloeotilopsis sarcinoidea | Z47998 |
Hazenia mirabilis | AF387156 |
Monostroma grevillei | AF015279 |
Pseudendoclonium basiliense | Z47996 |
Pseudoneochloris marina | U41102 |
Quadrigula closterioides | Y17924 |
Ulothrix zonata | Z47999 |
Urospora penicilliformis | AB049417 |
Ulva curvata | AF189078 |
**Trebouxiophyceae** | |
Amphikrikos sp. | AF228690 |
Chlorella ellipsoidea | X63520 |
Chlorella minutissima | X56102 |
Chlorella fusca | X56104 |
Chlorella saccharophila | X63505 |
Choricystis minor | X89012 |
Coenocystis inconstans | AB017435 |
Eremosphaera viridis | AF387154 |
Fusochloris perforatum | M62999 |
Golenkinia longispicula | AF499923 |
Koliella spiculiformis | AF278744 |
Marvania geminata | AF124336 |
Micractinium pusillum | AF499921 |
Microthamnion kuetzingianum | Z28974 |
Muriella aurantiaca | AB005748 |
Myrmecia biatorellae | Z28971 |
Myrmecia israeliensis | M62995 |
Oocystis heteromucosa | AF228689 |
Parietochloris pseudoalveolaris | M63002 |
Planktosphaeria gelatinosa | AY044648 |
Pleurastrum insigne | Z28972 |
Pleurastrum terrestris | Z28973 |
Prasiola crispa | AJ416106 |
Prototheca wickerhamii | X56099 |
Radiofilum conjunctivum | AF387155 |
Stichococcus chodati | AB055867 |
Trebouxia asymmetrica | Z21553 |
Trebouxia impressa | Z21551 |
Trebouxia magna | Z21552 |
Trochiscia hystrix | AF277651 |
Watanabea reniformis | X73991 |
**Chlorophyceae** | |
Ankistrodesmus stipitatus | X56100 |
Ankyra judayi | U73469 |
Asteromonas gracilis | M95614 |
Atractomorpha echinata | U73470 |
Bracteacoccus medionucleatus | U63098 |
Bracteacoccus giganteus | U63099 |
Bracteacoccus aerius | U63101 |
Bulbochaete hiloensis | U83132 |
Carteria obtusa | AF182818 |
Chaetopeltis orbicularis | U83125 |
Chaetophora incrassata | D86499 |
Characiosiphon rivularis | AF395437 |
Characium hindakii | M63000 |
Chlamydomonas baca | U70781 |
Chlamydomonas reinhardtii | M32703 |
Chlamydomonas humicola | U13984 |
Chlamydomonas noctigama | AF008241 |
Chlamydopodium vacuolatum | M63001 |
Chlorococcum cf. tatrense | AF514407 |
Chlorogonium euchlorum | AJ410443 |
Chlorogonium capillatum | AJ410442 |
Chloromonas reticulata | AJ410448 |
Chlorosarcinopsis minor | AB049415 |
Coelastrum microporum | AF388373 |
Cylindrocapsa geminella | U73471 |
Desmodesmus communis | X73994 |
Dictyochloris fragrans | AF367861 |
Dunaliella parva | M62998 |
Ettlia minuta | M62996 |
Fritschiella tuberosa | U83129 |
Gloeococcus maximus | U83122 |
Gongrosira papuasica | U18503 |
Haematococcus zimbabwiensis | U70797 |
Heterochlamydomonas inaequalis | AF367857 |
Hormotila blennista | U83123 |
Hormotilopsis gelatinosa | U83126 |
Hydrodictyon reticulatum | M74497 |
Lobochlamys culleus | AJ410461 |
Lobochlamys segnis | AJ410464 |
Mychonastes homosphaera | AB025423 |
Neochloris aquatica | M62861 |
Oedogonium cardiacum | U83133 |
Ourococcus multisporus | AF277648 |
Paulschulzia pseudovolvox | U83120 |
Pediastrum duplex | M62997 |
Planophila terrestris | U83127 |
Polytoma uvella | U22940 |
Pseudodictyosphaerium jurisii | AF106074 |
Scenedesmus obliquus | X56103 |
Scenedesmus pupukensis | X91267 |
Scenedesmus rubescens | X74002 |
Schizomeris leibleinii | AF182820 |
Spermatozopsis similis | X65557 |
Sphaeroplea robusta | U73472 |
Spongiochloris spongiosa | U63107 |
Stigeoclonium helveticum | U83131 |
Tetraspora sp. | U83121 |
Uronema belkae | AF182821 |
Volvox carteri | X53904 |


### Phylogenetic Analysis of Green Algae

The newly obtained desert algae sequences were combined with previously published sequences from 14 other desert green algae and from a broad representation of all orders of green plants for which 18S sequences have been published in GenBank, except that embryophytes are represented only by the *Marchantia polymorpha* sequence. To ensure a conservative estimate of exclusive phylodiversity, the closest-matching full-length 18S rDNA sequence for each desert isolate, as found by BLAST (Altschul et al., 1990) searches, was also included. The sequences and their corresponding taxa and GenBank accession numbers are listed in Table 1.

A final alignment of 150 taxa (23 desert taxa and 127 others from freshwater, marine, and soil habitats) was constructed initially in smaller subsets of taxa using ClustalW (Thompson et al., 1994) and then refined by eye. The 150-taxon alignment was 1839 nucleotides in length. Of 1839 sites, 188 were eliminated because of alignment uncertainty, leaving 1651 aligned sites of which 441 were parsimony informative and an additional 227 were variable but not informative. The alignment, MrBayes file, and resulting trees associated with this analysis are available as supplementary material at http://systematicbiology.org/.
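
The site categories above follow the usual parsimony definitions, which the sketch below illustrates. The counts reported here were not produced with this code. It assumes that gaps, `?`, and `N` are ignored when counting states, and it treats every other symbol, including other IUPAC ambiguity codes, as a distinct state. The four-sequence alignment in the example is hypothetical.

```python
from collections import Counter

# Assumption: gaps ("-"), missing data ("?") and N are ignored when counting states;
# any other symbol (including other ambiguity codes) is treated as a distinct state.
MISSING = set("-?N")


def classify_sites(alignment):
    """Tally constant, variable-but-uninformative and parsimony-informative columns.

    A column is parsimony informative when at least two different states each
    occur in at least two sequences. Columns with fewer than two observed states
    (including all-missing columns) are counted as constant.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    counts = Counter(constant=0, uninformative=0, informative=0)
    for column in zip(*alignment):
        states = Counter(b.upper() for b in column if b.upper() not in MISSING)
        if len(states) < 2:
            counts["constant"] += 1
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["uninformative"] += 1
    return counts


if __name__ == "__main__":
    toy = ["AAGTA-", "AAGCAA", "ACTTAA", "ACTCGC"]  # hypothetical 4-taxon alignment
    print(classify_sites(toy))
```

The exact numbers obtained from a real alignment depend on how gaps and ambiguities are treated. This sketch would therefore not necessarily reproduce the 441 and 227 sites reported above.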

### Bayesian Phylogenetic Analyses

ModelTest 3.06 (Posada and Crandall, 1998), used in conjunction with PAUP* 4.0b10 (Swofford, 2001), determined that the GTR+I+G model (Lanave et al., 1994; Gu et al., 1995) provided the best fit to the data according to both the likelihood-ratio test and the Akaike information criterion (AIC). Two independent runs were performed under the GTR+I+G model (four rate categories) in MrBayes 3.0b4 (Huelsenbeck and Ronquist, 2001). Each run was started from an independent random starting tree and run for 25 million generations. Each run employed Metropolis-coupled MCMC (Geyer, 1991) with three heated chains (temperature parameter 0.2) in addition to the sampled (cold) chain. We used a flat Dirichlet prior for relative nucleotide frequencies and relative rate parameters, a discrete uniform prior for topologies, and an exponential prior (mean 1.0) for the gamma shape parameter and all branch lengths. Trees were sampled every 1000 generations. MrBayes was used to construct a majority-rule consensus of the 20,000 trees sampled during the last 10 million generations of the two runs (10,000 per run). Convergence was assessed by comparing the splits included in the majority-rule consensus trees of each run separately. For continuous model parameters, we used Gelman and Rubin's potential scale reduction approach (Gelman, 1996; Gelman and Rubin, 1992a, 1992b). This approach uses variation within and among independent MCMC runs to assess the degree to which the separate chains have converged.
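
Gelman and Rubin's statistic compares within-run and between-run variation of a sampled parameter. The sketch below computes the basic potential scale reduction factor for one scalar parameter. It assumes that the post-burn-in samples of each run are available as equal-length lists. It omits the small-sample degrees-of-freedom correction used in some formulations, so its values may differ slightly from those reported by other software. The example chains are invented.

```python
from statistics import mean, variance


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one scalar parameter.

    `chains` holds m >= 2 equal-length sequences of post-burn-in samples, one per
    independent run. This is the simple form without a degrees-of-freedom
    correction; values near 1 suggest the runs are sampling the same distribution.
    """
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least two samples")
    w = mean(variance(c) for c in chains)        # W: mean within-chain variance
    b = n * variance([mean(c) for c in chains])  # B: n times variance of chain means
    var_hat = (n - 1) / n * w + b / n            # pooled estimate of posterior variance
    return (var_hat / w) ** 0.5


if __name__ == "__main__":
    well_mixed = [[0.9, 1.1, 1.0, 1.2, 0.8], [1.0, 0.8, 1.2, 1.1, 0.9]]
    stuck = [[0.9, 1.1, 1.0, 1.2, 0.8], [3.0, 2.8, 3.2, 3.1, 2.9]]
    print(round(psrf(well_mixed), 3), round(psrf(stuck), 3))
```

In the setting described here, each list would hold the sampled values of a parameter such as tree length from one of the two independent runs.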

### Phylodiversity Measures

We calculated several measures related to Faith's original phylogenetic diversity statistic (Faith, 1992). The basic quantities calculated from phylogenetic trees were *T* (total tree length), *E* (exclusive phylodiversity), and *I* (inclusive phylodiversity, which is identical to a molecular version of Faith's original measure). *E* includes terminal branches associated with desert taxa and shared ancestral edges subtending clades of desert taxa. We also estimated *E*_{S} using the program SIMMAP 1.0b1 (Bollback, 2004). *E*_{S} is perhaps a more natural measure of exclusive phylodiversity given the Bayesian approach taken here, but we base most of our discussion on *E* because it is equally applicable in Bayesian and frequentist (i.e., maximum likelihood) contexts.

Three combinations of these basic measures are useful and were also computed. First, *P*_{EI} = *E*/*I* is related to the number of independent evolutionary transitions into the focal environment, which in this study is “desert.” *P*_{EI} is 0 if each desert isolate represents an independent transition to the desert environment and no detectable evolution has occurred following the transition. At the other extreme, *P*_{EI} is at least 1 if the phylogeny indicates only one transition to deserts, because *E* then exceeds *I* by the length of the branch subtending the desert clade. Intermediate values of *P*_{EI} indicate that more than one transition occurred and that substitutions have accrued after at least some of the desert lineages were established.

The quantity *P*_{ET} = *E*/*T* describes the proportion of the total evolutionary history that apparently occurred in the desert environment. *P*_{ET} is important in this study for distinguishing between two hypotheses: (1) desert algae are simply transient algal spores carried on the wind and dropped onto deserts; and (2) desert algae are representatives of true desert-endemic lineages of green algae. If the first hypothesis is true, *P*_{ET} is expected to be zero, because the desert isolates would in this case be common, widespread taxa most likely already represented in GenBank. Our practice of including the nearest GenBank sequence (by BLAST score) for each desert isolate would then presumably yield zero-length terminal branches for the desert isolates, each of which would represent an independent transition to land. Under the alternative hypothesis, *P*_{ET} is expected to be greater than zero, because desert taxa will have had time to accumulate lineage-specific substitutions. The value of *P*_{ET} thus directly indicates the importance of desert green algae for understanding the evolution of green algae in general. *P*_{ET} = 1 would indicate that all knowledge of green algae comes from desert-dwelling green algae, whereas *P*_{ET} = 0 would mean that desert green algae contribute essentially nothing to our knowledge of green algal evolution (i.e., desert taxa represent minor tip branches on the tree). Of lesser interest in this study is *P*_{IT} = *I*/*T*, which measures the proportion of the total tree length accounted for by inclusive phylodiversity. This measure is useful primarily for comparing our phylodiversity measures with the biodiversity measures of Martin (2002).
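
The three ratios and their credible intervals can be summarized over an MCMC sample roughly as follows. This sketch assumes that *T*, *I*, and *E* have already been computed for every sampled tree and are supplied as (*T*, *I*, *E*) tuples. The four-tree input is invented purely for illustration. Each ratio is formed tree by tree and then summarized by its posterior mean and an equal-tailed 95% interval. The interval uses Python's `statistics.quantiles` with the inclusive method.

```python
from statistics import mean, quantiles


def summarize(values):
    """Posterior mean and equal-tailed 95% credible interval of a sample."""
    cuts = quantiles(values, n=40, method="inclusive")  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return mean(values), cuts[0], cuts[-1]


def ratio_summaries(samples):
    """Summaries of P_EI = E/I, P_ET = E/T and P_IT = I/T.

    `samples` is a list of (T, I, E) tuples, one per MCMC-sampled tree; each
    ratio is computed tree by tree before summarizing.
    """
    return {
        "P_EI": summarize([e / i for t, i, e in samples]),
        "P_ET": summarize([e / t for t, i, e in samples]),
        "P_IT": summarize([i / t for t, i, e in samples]),
    }


if __name__ == "__main__":
    # Invented (T, I, E) values for four sampled trees, for illustration only.
    toy_samples = [(10.0, 4.0, 2.0), (12.0, 6.0, 3.0), (11.0, 5.0, 2.5), (10.5, 4.2, 2.1)]
    for name, (avg, lo, hi) in ratio_summaries(toy_samples).items():
        print(f"{name}: mean {avg:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

Forming each ratio tree by tree keeps the correlation between numerator and denominator within each sample. This is one reason such ratios can be less sensitive to the branch-length prior than *T* itself.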

Finally, we were interested in how much evolution occurred on desert versus nondesert terminal branches. Terminal branches were of particular interest because species limits are often determined somewhat arbitrarily by the amount of evolutionary divergence separating contemporary organisms from their nearest relatives. Although our phylodiversity measures do not depend on any species definition, we wanted to know whether, on average, desert green algae would be considered separate species. To assess this, we recorded the average and median terminal branch lengths for both desert and nondesert taxa and interpret them in light of divergence values already observed between different species of desert green algae (Lewis and Flechtner, 2004).
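
The terminal-branch comparison can be sketched for a single sampled tree as shown below. The sketch assumes that terminal branch lengths have already been extracted into a dictionary keyed by taxon name. The taxon names and lengths are hypothetical. Repeating the calculation for every sampled tree and averaging would give posterior means of the kind reported in the Results.

```python
from statistics import mean, median


def terminal_branch_summary(terminal_lengths, desert_taxa):
    """Mean and median terminal branch length for desert and nondesert taxa.

    terminal_lengths: dict mapping taxon name -> length of its terminal branch
        (expected substitutions per site) on one sampled tree.
    desert_taxa: set of names of taxa classified as desert isolates.
    Returns {"desert": (mean, median), "nondesert": (mean, median)},
    omitting a group that has no taxa.
    """
    groups = {"desert": [], "nondesert": []}
    for taxon, length in terminal_lengths.items():
        groups["desert" if taxon in desert_taxa else "nondesert"].append(length)
    return {name: (mean(v), median(v)) for name, v in groups.items() if v}


if __name__ == "__main__":
    # Hypothetical taxa and branch lengths, for illustration only.
    lengths = {"desert_a": 0.02, "desert_b": 0.03, "desert_c": 0.01,
               "aquatic_a": 0.05, "aquatic_b": 0.07}
    print(terminal_branch_summary(lengths, {"desert_a", "desert_b", "desert_c"}))
```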

## Results and Discussion

Convergence in model likelihood was apparent in the two independent MCMC runs by 15 million generations. The majority-rule consensus trees constructed from the last 10 million generations of the two runs differed primarily in the placement of *Koliella*. In one run it was positioned within the Trebouxiophyceae (Fig. 2), and in the other it was positioned outside the clade comprising Trebouxiophyceae, Chlorophyceae, and Ulvophyceae. Katana et al. (2001) previously investigated the phylogenetic position of *Koliella* and showed it to be a member of the Trebouxiophyceae. The trees were otherwise similar, disagreeing only about the inclusion of three splits, each with posterior probability less than 0.8. Gelman and Rubin's scale reduction factor *R* was less than 1.14 for the tree length and all continuous GTR model parameters (Table 2). To put this value in perspective, *R* = 1 is ideal, indicating that the parallel Markov chains are completely exchangeable. A value much larger than 1 is unacceptable, indicating that credible intervals might be much wider than they would be had the chains converged. The values we obtained indicate acceptable convergence according to the rule of thumb offered by Gelman (1996). Hereafter, all discussion of the tree and phylodiversity measures is based on a combined sample comprising the last 10 million generations of each of the two independent MCMC runs.

**Table 2.** Maximum likelihood estimates (MLE; computed on the majority-rule consensus tree) and posterior summaries of tree length (*T*) and the GTR+I+G model parameters, with Gelman and Rubin's potential scale reduction factor (*R*). The 2.5% and 97.5% columns give the bounds of the 95% credible interval.

Parameter | MLE | Posterior mean | 2.5% | 97.5% | *R* |
---|---|---|---|---|---|
*T* | 4.37050 | 10.7021 | 9.1200 | 12.3580 | 1.1385 |
r_{CT} | 5.1468 | 5.7958 | 4.7430 | 6.6942 | 1.1276 |
r_{CG} | 1.0497 | 1.2966 | 1.0417 | 1.5444 | 1.0409 |
r_{AT} | 1.1564 | 1.3918 | 1.1350 | 1.6633 | 1.0079 |
r_{AG} | 2.6609 | 3.6464 | 3.0437 | 4.2737 | 1.0077 |
r_{AC} | 1.0735 | 1.5342 | 1.2041 | 1.8681 | 1.0299 |
π_{A} | 0.2564 | 0.2332 | 0.2169 | 0.2492 | 1.0035 |
π_{C} | 0.2079 | 0.2039 | 0.1905 | 0.2208 | 1.0547 |
π_{G} | 0.2862 | 0.2854 | 0.2679 | 0.3021 | 1.0043 |
π_{T} | 0.2494 | 0.2776 | 0.2608 | 0.2934 | 1.0066 |
α | 0.5551 | 0.4582 | 0.3739 | 0.5243 | 1.0488 |
pinvar | 0.3753 | 0.3226 | 0.2759 | 0.3614 | 1.0125 |


Although we did not perform a maximum likelihood search, we did obtain maximum likelihood estimates of the GTR model parameters on the majority-rule consensus tree from the MCMC analysis using PAUP* 4.0b10 (Swofford, 2001) (Table 2). The maximum likelihood estimates of tree length and of all five relative substitution rates are smaller than the corresponding posterior means. This presumably reflects the effects of the prior distributions assumed for these parameters. For example, the mean branch length based on the maximum likelihood estimate of tree length is 4.37/271 = 0.016. The mean branch length based on the posterior distribution is more than twice this (10.93/297 = 0.037), where 297 = 2 × 150 − 3 is the number of branches in a fully resolved unrooted tree of 150 taxa. Although the effect of the prior was not strong, increasing each branch length on average by only about 0.02 substitutions per site, the exponential branch length prior (mean 1.0) is clearly exerting an influence. Importantly for this study, the phylodiversity ratios *P*_{ET}, *P*_{EI}, and *P*_{IT} appear much less sensitive to the prior distributions than tree length and the other model parameters (discussed below).

The MCMC consensus tree of green algae (Fig. 2) illustrates that the transition from aquatic to desert habitats occurred numerous times independently in green algae, and from diverse phylogenetic backgrounds. Desert lineages have arisen in three of the five classes of green algae: Chlorophyceae, Trebouxiophyceae, and Charophyceae. All of these transitions arise from within clades of aquatic freshwater organisms; the predominantly marine classes Ulvophyceae and Prasinophyceae apparently lack desert representatives, even though Ulvophyceae does have terrestrial lineages (e.g., *Trentepohlia*).

Although the consensus tree in Figure 2 provides a point estimate of the number of transitions to land, credible intervals can be constructed using the trees sampled during the MCMC analysis. Using PAUP* 4.0b10 (Swofford, 2001), we performed parsimony-based ancestral state reconstructions under both ACCTRAN and DELTRAN optimization on all 20,000 sampled tree topologies. These reconstructions yielded posterior probabilities for each possible number of gains and losses of terrestriality (Table 3). Under ACCTRAN optimization (which favors reversals when there is homoplasy), the combination with the highest posterior probability (0.3425) was 14 gains and 4 losses. Under DELTRAN (which favors parallelism when there is homoplasy), the most probable combination (0.4534) was 18 gains and no losses. Neither optimization strategy gave any support to fewer than 12 transitions from aquatic to terrestrial lifestyles. We also used SIMMAP 1.0b1 (Bollback, 2004) to map this character under a two-state Markov model on all 20,000 trees sampled during the MCMC analysis. SIMMAP does not currently distinguish between forward (aquatic to terrestrial) and reverse changes, but the overall estimated number of transitions (17.6) was consistent with the most probable ACCTRAN and DELTRAN parsimony reconstructions.
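
Once each sampled tree has been reduced to a count of gains and losses, the posterior probabilities in Table 3 are relative frequencies over the sample. The sketch below shows only that tabulation step. It assumes the per-tree (gains, losses) counts were already obtained, for example from PAUP* ACCTRAN or DELTRAN reconstructions (not shown). The five-tree input is invented.

```python
from collections import Counter


def gain_loss_posterior(reconstructions):
    """Posterior probability of each (gains, losses) combination.

    reconstructions: list of (gains, losses) pairs, one per MCMC-sampled tree,
    e.g. counts read off a parsimony reconstruction of the desert character.
    """
    total = len(reconstructions)
    return {combo: n / total for combo, n in Counter(reconstructions).items()}


def prob_gains_below(posterior, threshold):
    """Posterior probability that the number of gains is below `threshold`."""
    return sum(p for (gains, _), p in posterior.items() if gains < threshold)


if __name__ == "__main__":
    # Invented per-tree counts for five sampled trees, for illustration only.
    toy = [(14, 4), (14, 4), (14, 3), (13, 5), (14, 4)]
    post = gain_loss_posterior(toy)
    print(max(post, key=post.get), post)
    print(prob_gains_below(post, 12))
```

Summing over all combinations with fewer than 12 gains gives the posterior probability of fewer than 12 transitions. In the analysis above, neither optimization gave any support to that outcome.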

Rows give the number of transitions from nondesert to desert (gains); columns give the number of transitions from desert back to nondesert (losses).

A. ACCTRAN optimization

Gains | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
12 | — | — | — | — | 0.0001 | 0.0010 | 0.0022 |
13 | — | — | 0.0018 | 0.0164 | 0.0636 | 0.0688 | — |
14 | — | 0.0067 | 0.0846 | 0.3144 | 0.3425 | — | — |
15 | 0.0015 | 0.0147 | 0.0402 | 0.0391 | — | — | — |
16 | 0.0002 | 0.0017 | 0.0009 | — | — | — | — |

B. DELTRAN optimization

Gains | 0 | 1 |
---|---|---|
14 | — | 0.0072 |
15 | 0.0028 | 0.0707 |
16 | 0.0452 | 0.1465 |
17 | 0.2744 | — |
18 | 0.4534 | — |
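ACCTRAN and DELTRAN are two ways of choosing among equally parsimonious reconstructions on the same tree, so on every sampled tree they should give the same total number of changes (gains plus losses). Only the split between gains and losses differs. The sketch below is a consistency check on the Table 3 entries under that reasoning. The dictionary layout is our own. Row 14 of panel A is read as one to four losses, which matches the "14 gains and 4 losses" quoted in the text.

```python
from collections import defaultdict

# Posterior probabilities from Table 3, keyed by (gains, losses).
acctran = {
    (12, 4): 0.0001, (12, 5): 0.0010, (12, 6): 0.0022,
    (13, 2): 0.0018, (13, 3): 0.0164, (13, 4): 0.0636, (13, 5): 0.0688,
    (14, 1): 0.0067, (14, 2): 0.0846, (14, 3): 0.3144, (14, 4): 0.3425,
    (15, 0): 0.0015, (15, 1): 0.0147, (15, 2): 0.0402, (15, 3): 0.0391,
    (16, 0): 0.0002, (16, 1): 0.0017, (16, 2): 0.0009,
}
deltran = {
    (14, 1): 0.0072, (15, 0): 0.0028, (15, 1): 0.0707,
    (16, 0): 0.0452, (16, 1): 0.1465, (17, 0): 0.2744, (18, 0): 0.4534,
}

def marginal_total_changes(table):
    """Posterior distribution of gains + losses (parsimony length of the character)."""
    totals = defaultdict(float)
    for (gains, losses), prob in table.items():
        totals[gains + losses] += prob
    return dict(totals)

for name, table in (("ACCTRAN", acctran), ("DELTRAN", deltran)):
    mode = max(table, key=table.get)            # most probable (gains, losses)
    min_gains = min(gains for gains, _ in table)
    print(name, "mode:", mode, "min gains:", min_gains)

a_tot, d_tot = marginal_total_changes(acctran), marginal_total_changes(deltran)
for total in sorted(set(a_tot) | set(d_tot)):
    print(total, round(a_tot.get(total, 0.0), 4), round(d_tot.get(total, 0.0), 4))
```

Up to rounding of the published probabilities, the two columns printed by the final loop should agree.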

Desert lineages are divergent enough at the 18S rDNA locus to be considered distinct species. The posterior mean of the average length of a terminal branch leading to a desert taxon is 0.0246 substitutions per nucleotide site (95% credible interval 0.0198–0.0294). Although we do not advocate any particular threshold of sequence divergence for delimiting species, this value corresponds to nearly 5% pairwise divergence and is far greater than what has been observed between different species of desert green algae (e.g., Lewis and Flechtner, 2004). The average length of a terminal branch leading to a nondesert taxon is 0.0507, roughly twice that of a terminal branch leading to a desert taxon. This is expected: apart from the close relatives of desert taxa, nondesert taxa were sampled broadly across green algae, so they tend to be more distantly related to their nearest sampled neighbors.
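One way to see how a mean terminal branch of 0.0246 substitutions per site maps onto "nearly 5% pairwise divergence" is to take two desert tips of average length as spanning a path of about 2 × 0.0246 substitutions per site. That path length can then be converted to an expected proportion of differing sites. The sketch uses the Jukes–Cantor (JC69) correction purely as a simple illustrative substitute; the study's analyses used the GTR model, and the two-tip reading is our assumption.

```python
import math

def jc69_p_distance(d):
    """Expected proportion of differing sites after d substitutions/site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

desert_tip = 0.0246             # posterior mean desert terminal branch (Table 4)
path_length = 2 * desert_tip    # two average desert tips: 0.0492 substitutions/site
p = jc69_p_distance(path_length)
print(f"path = {path_length:.4f} subst./site -> p-distance ~ {p:.3f}")
```

At these small distances the correction matters little: the expected proportion of differing sites stays just under 5%.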

In three cases (i.e., one in the class Charophyceae and two in the Chlorophyceae), lineages without apparent aquatic close relatives are evident, implying that novel organisms are being recovered from these understudied communities. These new lineages are themselves tremendously diverse. For example, isolates EM1VF1 and SEV3VF14 differ at 76 nucleotide positions, and LG2VF30 and BC9-8 differ at 50. This level of divergence within a clade of desert algae exceeds the divergence between monocot and eudicot angiosperms (comparison of rice and tomato 18S rRNA gene sequences in the same algal alignment, data not shown). In addition, there are examples of diversification within three of the desert lineages, one involving charophyte algae and two within the Chlorophyceae. Together, these results indicate that consideration of desert lineages may be essential for understanding the evolution of green algae (and even green plants), and that taxon sampling should be increased in regions of the phylogeny not previously recognized as poorly sampled.

Table 4 provides posterior means and 95% credible intervals of the basic phylodiversity measures (*E*, *I*, and *T*) and their derived measures (*P*_{ET}, *P*_{EI}, and *P*_{IT}) for the 150-taxon Bayesian MCMC analysis. For comparison, maximum likelihood estimates (MLEs) of these quantities were computed with PAUP* 4.0b10 (Swofford, 2001) on the majority-rule consensus tree from the Bayesian analysis (Fig. 2). The MLEs for *E*, *I*, and *T* are quite different from the posterior means for these quantities; however, the ratio-based measures (*P*_{ET}, *P*_{EI}, and *P*_{IT}) all fall within their respective 95% Bayesian credible intervals, suggesting that the assumed prior distributions scaled branch lengths upward relative to the MLEs but had little effect on relative branch lengths. Hereafter, all discussion of phylodiversity measures refers to estimates based on the posterior distribution.

Measure | MLE | Posterior mean | 2.5% | 97.5% |
---|---|---|---|---|
*T* | 4.3705 | 10.9283 | 9.1199 | 12.3583 |
*I* | 0.6811 | 1.7669 | 1.4616 | 2.0378 |
*E* | 0.2904 | 0.7229 | 0.5856 | 0.8547 |
*P*_{EI} = *E*/*I* | 0.4263 | 0.4092 | 0.3697 | 0.4492 |
*P*_{ET} = *E*/*T* | 0.0664 | 0.0662 | 0.0587 | 0.0741 |
*P*_{IT} = *I*/*T* | 0.1558 | 0.1617 | 0.1501 | 0.1736 |
Average desert tip | 0.0100 | 0.0246 | 0.0198 | 0.0294 |
Median desert tip | 0.00471 | 0.0127 | 0.0080 | 0.0183 |
Average nondesert tip | 0.0205 | 0.0507 | 0.0422 | 0.0574 |
Median nondesert tip | 0.01417 | 0.0349 | 0.0284 | 0.0413 |
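The claim that the prior rescaled branch lengths without changing their proportions can be checked directly against Table 4. The sketch recomputes the ratio measures from the MLEs of *E*, *I*, and *T*. It then tests which MLEs fall inside the posterior 95% intervals and shows that multiplying every length by a common factor leaves the ratios unchanged. The factor 2.5 is arbitrary.

```python
import math

mle = {"T": 4.3705, "I": 0.6811, "E": 0.2904}
posterior_ci = {                      # 2.5% and 97.5% columns of Table 4
    "T": (9.1199, 12.3583), "I": (1.4616, 2.0378), "E": (0.5856, 0.8547),
    "P_EI": (0.3697, 0.4492), "P_ET": (0.0587, 0.0741), "P_IT": (0.1501, 0.1736),
}
mle_ratios = {
    "P_EI": mle["E"] / mle["I"],
    "P_ET": mle["E"] / mle["T"],
    "P_IT": mle["I"] / mle["T"],
}

def inside(value, interval):
    low, high = interval
    return low <= value <= high

for name in ("T", "I", "E"):
    print(name, "MLE inside 95% CI:", inside(mle[name], posterior_ci[name]))  # False for all three
for name, ratio in mle_ratios.items():
    print(name, round(ratio, 4), "inside 95% CI:", inside(ratio, posterior_ci[name]))  # True

# Scaling every branch by a common factor leaves the ratio measures unchanged.
factor = 2.5
scaled = {name: factor * value for name, value in mle.items()}
assert math.isclose(scaled["E"] / scaled["I"], mle_ratios["P_EI"])
```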

*P*_{ET} revealed that 6.6% of all substitutions occurred exclusively in desert green algal lineages. This measure obviously depends on taxon sampling: adding more nondesert taxa would decrease the apparent contribution of the desert taxa. Only if representative sequences of all green algae could be included would the actual value of *P*_{ET} and the other phylodiversity measures be obtained. We are still far from adequate sampling with respect to green algae as a whole and desert lineages in particular. Although we intentionally introduced a conservative bias into phylodiversity estimates by including the closest nondesert sequence to each desert green algal isolate, the true value may be quite different from the 6.6% obtained in this study. Nevertheless, these measures allow some extreme hypotheses to be ruled out.

The fact that *P*_{ET} is greater than zero (95% credible interval 0.0587–0.0741) means that a nontrivial amount of evolution occurred in deserts, and is thus evidence against the idea that desert green algal isolates are simply ephemeral, incidental desert inhabitants. Even though *P*_{ET} may change as our knowledge of both desert and nondesert green algae improves, it is unlikely that this conservative conclusion will be overturned by new data. The implication is that there are green algae endemic to deserts, and these represent an important part of a complex community that has been underappreciated, despite being right under our noses, so to speak.

The posterior mean of *P*_{EI} was 0.4092, which indicates that less than one half of the inclusive phylodiversity can be attributed to substitutions accrued exclusively in desert lineages. The fact that *P*_{EI} is much less than 1 means that desert green algal lineages have arisen numerous times, and in fact the tree shows that 14 transitions to desert environments from freshwater aquatic ancestors are required to explain the 23 desert isolates. Although *P*_{EI}, like *P*_{ET}, will change with increased taxon sampling of nondesert green algae and increased isolation of desert lineages, the conservative conclusion that can be drawn (and that will not change with the addition of future data) is that desert green algae represent many independent transitions to land from aquatic green algal ancestors. This is significant because heretofore only one transition to land has been widely recognized, namely the one leading to the embryophytes and including the green plants most familiar to us (e.g., mosses, ferns, conifers, flowering plants). For those seeking to understand the evolutionary, developmental, and physiological changes that necessarily underlie the transition from aquatic to terrestrial existence in green plants, it is clear that more than one lineage should be examined.
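Two toy trees make the link between *P*_{EI} < 1 and multiple origins concrete. This chunk defines *E* only as the total length of exclusively desert segments. The sketch therefore assumes a rooted tree and treats *I* as the total length of edges with at least one desert tip below them; that definition of *I* is our assumption, as are the `Node`, `tip`, `clade`, and `phylodiversity` helpers and the unit branch lengths. Under those assumptions, a single desert clade attached at the root gives *P*_{EI} = 1. When two desert tips each arise within an aquatic clade, *P*_{EI} falls to 0.5.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    length: float                      # length of the edge above this node
    desert: bool = False               # environment label; used only at tips
    children: list = field(default_factory=list)

def tip(length, desert):
    return Node(length, desert)

def clade(length, *children):
    return Node(length, children=list(children))

def phylodiversity(root):
    """Return (E, I, T) for a rooted tree; the root has no edge above it.

    Assumed definitions: T sums every edge, I sums edges with at least one
    desert tip below them, E sums edges with only desert tips below them.
    """
    E = I = T = 0.0

    def visit(node, is_root):
        nonlocal E, I, T
        if node.children:
            below = [visit(child, False) for child in node.children]
            any_desert = any(a for a, _ in below)
            all_desert = all(b for _, b in below)
        else:
            any_desert = all_desert = node.desert
        if not is_root:
            T += node.length
            if any_desert:
                I += node.length
            if all_desert:
                E += node.length
        return any_desert, all_desert

    visit(root, True)
    return E, I, T

# One desert origin: the two desert tips form a clade.
one_origin = clade(0.0, clade(1.0, tip(1.0, True), tip(1.0, True)),
                        clade(1.0, tip(1.0, False), tip(1.0, False)))
# Two origins: each desert tip is sister to an aquatic tip.
two_origins = clade(0.0, clade(1.0, tip(1.0, True), tip(1.0, False)),
                         clade(1.0, tip(1.0, True), tip(1.0, False)))

for name, tree in (("one origin", one_origin), ("two origins", two_origins)):
    e, i, t = phylodiversity(tree)
    print(f"{name}: E={e}, I={i}, T={t}, P_EI={e / i:.2f}")
```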

#### Choice of branch length prior distribution

The Bayesian approach taken here requires specifying a prior distribution for branch lengths; for this study we used an exponential distribution with mean 1.0. Because the choice of prior affects the branch lengths sampled during the MCMC analysis, and hence the phylodiversity measures obtained from those samples, branch length priors deserve careful consideration. MrBayes 3.0b4 allows only exponential or uniform prior distributions to be applied to branch lengths. Using a truncated uniform distribution, i.e., Uniform(0, *c*), where *c* is an arbitrarily large upper bound, has been shown to create serious artifacts (Felsenstein, 2004), for example yielding credible intervals for a parameter that exclude the maximum likelihood estimate. We chose an exponential distribution with mean 1.0 rather than one with a smaller mean in order to increase the variance (and thus decrease the influence) of the prior. Because the standard deviation of an exponential distribution equals its mean, however, increasing the variance also increases the prior mean, here to a value larger than what many would consider typical for branch lengths. Rather than the pure Bayesian approach taken here, one could use an empirical Bayes approach (basing the mean of the branch length prior on the maximum likelihood estimate of tree length) or a hierarchical model (e.g., letting the mean of the branch length prior be determined by a hyperprior distribution), as advocated by Suchard et al. (2001). As software for Bayesian phylogenetics evolves, more flexible branch length priors will become available.
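The trade-off between prior spread and prior mean can be made concrete by comparing how much prior mass each choice puts on long branches. The sketch contrasts the mean-1.0 exponential prior used here with a hypothetical empirical-Bayes alternative. In that alternative, the prior mean is set to the MLE of tree length (Table 4) divided by the 297 branches of a resolved 150-taxon tree. That particular recipe and the 0.1 cutoff are illustrative assumptions, not settings from this study or from MrBayes.

```python
import math

def exponential_tail(x, mean):
    """P(branch length > x) under an exponential prior with the given mean.

    For an exponential distribution the standard deviation equals the mean,
    so widening the prior necessarily also raises its mean.
    """
    return math.exp(-x / mean)

prior_mean = 1.0                        # the prior used in this study
ml_tree_length = 4.3705                 # MLE of T (Table 4)
n_branches = 2 * 150 - 3                # resolved unrooted tree with 150 taxa
eb_mean = ml_tree_length / n_branches   # hypothetical empirical-Bayes mean, about 0.0147

for mean in (prior_mean, eb_mean):
    print(f"prior mean = sd = {mean:.4f}; P(branch > 0.1) = {exponential_tail(0.1, mean):.4f}")
```

Under the mean-1.0 prior, most of the prior mass lies above 0.1 substitutions per site, far longer than the typical branches in Table 4. This is consistent with the upward scaling of branch lengths noted earlier.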

#### Caveats to the use of phylodiversity summary measures

Any single statistic summarizing phylodiversity can be misleading, so these statistics should be viewed as tools rather than complete descriptions of phylodiversity. For example, single isolates with unusually high rates of substitution represent outliers that could unduly influence measures of molecular phylodiversity. On the other hand, unusual levels of phylodiversity may point out interesting features, for example elevated levels of substitution associated with life in certain environments but not others. Phylodiversity measures should not be used in isolation, and there is a need for new measures designed to identify specific types of outliers; however, at the very least, simple graphical inspection of the tree with branch lengths drawn proportional to the expected number of substitutions (e.g., Fig. 2) allows identification of many potential problems.

#### Relationship to previous approaches to quantifying phylodiversity

Previous approaches used phylodiversity measures to assess the conservation value of a particular area (Faith, 1992) or to compare biodiversity in competing environments (Hughes et al., 2001; Martin, 2002). Our motivation for supplementing Faith's original PD measure arose from a type of question not previously addressed, namely the importance of a group of taxa defined on the basis of environment in the context of the containing clade of all green plants. Although our intention is to supplement rather than replace existing measures of phylodiversity, we feel that a comparison of existing measures to our quantities is appropriate.

#### Comparison to Faith's uniqueness measure

The exclusive phylodiversity *E* is related to the uniqueness measure *U* discussed by Faith (1994). Whereas *U* represents the *probability* of at least one substitution (per site) unique to the focal lineages, *E* measures the *expected number* of unique substitutions (per site) in these segments. Faith's (1994) *U* is related to *T* and *E* as follows:

$$U = \left(1 - e^{-E}\right) e^{-(T - E)}$$

*U* and *E* capture the uniqueness of the focal lineages in different ways, reflecting the different contexts in which they are deemed useful. Faith's motivation was to measure the unique genetic contribution of a candidate taxon for conservation, for purposes of comparison with other candidate taxa. The winning candidate taxon contributes relatively more unique substitutions than other candidates against a reference subtree comprising already-conserved taxa. In our case, the reference subtree would necessarily include all nondesert taxa included in the study. As the size of the nonfocal group grows, the uniqueness according to *U* decreases, because substitutions must occur on focal lineages and *not* occur on other lineages in order to be counted by *U*. Thus, for questions like those addressed in this study, keeping *E* separate from *T* is important.
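To see why *U* shrinks as the nonfocal part of the tree grows, take the Poisson form *U* = (1 − e^{−E}) e^{−(T − E)}. This form is our reading of the verbal definition: at least one substitution on the focal segments and none elsewhere. The sketch holds *E* at its posterior mean from Table 4 and increases *T* up to its posterior mean. *U* collapses toward zero while *E* is unchanged. The intermediate *T* values are arbitrary.

```python
import math

def faith_u(E, T):
    """Probability of >= 1 substitution on the focal segments (total length E)
    and none on the remaining T - E, assuming Poisson-distributed substitutions."""
    return (1.0 - math.exp(-E)) * math.exp(-(T - E))

E = 0.7229  # posterior mean of E (Table 4)
for T in (E, 2.0, 5.0, 10.9283):          # last value: posterior mean of T
    print(f"T = {T:.4f}, E = {E:.4f}, U = {faith_u(E, T):.3g}")
```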

#### Comparison to Martin's measures

The measures described here are correlated with the quantities proposed by Martin (2002), and Figure 3 presents a comparison of our phylodiversity measures, *P*_{EI} and *P*_{IT}, to the *F*_{ST} and *P*-test approaches of Martin (2002). Figure 3 is essentially the same as Martin's (2002) figure 4, the only difference being that we approximated the relative branch lengths because they were not provided in Martin's paper.

Unlike Martin (2002), we calculated *F*_{ST} assuming that distances between taxa are additive; that is, the length of the shortest path from one taxon to another through the tree exactly equals their pairwise distance. Such additivity is unlikely in real data but is justified for our (purely illustrative) purposes. Martin's *P*-test distinguishes the two cases on the left (Fig. 3a, Fig. 3c), which are characterized by few transitions between environments, from the two on the right (Fig. 3b, Fig. 3d), which are characterized by numerous transitions. Our measure *P*_{EI} also distinguishes left from right, being high (1.00, 0.96) for the few-transitions cases and low (0.63, 0.29) for the cases involving numerous transitions.

Martin's *F*-test (test of the hypothesis *F*_{ST} = 0) distinguishes top from bottom. *F*_{ST} is high (0.50, 0.17) (i.e., significantly greater than 0) for the two cases on the top (Fig. 3a, Fig. 3b), which are characterized by having at least one clade of closely related taxa that also share the same environment. *F*_{ST} is low (0.08) or even negative (−0.08) (i.e., not significantly greater than 0) for the two cases on the bottom (Fig. 3c, Fig. 3d), in which all pairs of taxa are nearly equally distantly related (Fig. 3c) or pairs that are relatively closely related include members from both environments (Fig. 3d).

Our measure *P*_{IT} is related to *F*_{ST}. The highest values of *F*_{ST} occur when there are only two clades each representing a different environment, and these clades are shallow (i.e., pairs of taxa within each clade are closely related). These are the same conditions under which *P*_{IT} is small. On the other hand, *F*_{ST} is lowest when all clades are heterogeneous, each being composed of taxa from different environments, so that paths between pairs of taxa from the same environment often pass through the root of the tree. Such situations are expected to yield large *P*_{IT} values. In fact, the correlation between *F*_{ST} and the quantity 1 − *P*_{IT} is either 0.99 or 0.74, depending on whether the open squares or closed squares, respectively, are considered the focal environment.

Although *P*_{IT} can distinguish one extreme (Fig. 3a) from the other (Fig. 3d), it does not do as well as *F*_{ST} in distinguishing the top from the bottom two cases in Martin's figure 4. *F*_{ST} also has the advantage of symmetry: the choice of focal environment matters when calculating *P*_{IT}, but does not matter for *F*_{ST}. We do not consider this a failing of our phylodiversity measures because they were not designed to compare biodiversity in two (or more) contrasting environments, but rather to compare the diversity in one focal environment to the total phylodiversity.

#### Bayesian credible intervals versus frequentist hypothesis tests

Using a Bayesian approach, we were able to compute Bayesian credible intervals for phylodiversity measures. This approach explicitly accounts for the fact that the phylogeny is not known without error. The interpretation of these 95% credible intervals is straightforward: given the data, the model, and the assumed prior distributions for model parameters, the true value of each phylodiversity measure lies within its credible interval with probability 95%. There are dangers associated with this approach, however: using a model that fails to capture an important feature of molecular evolution can substantially change the size and location of the credible intervals, as can changes in the assumed prior distributions. However, the ability to account for uncertainty in the phylogeny is a powerful motivation for taking the Bayesian approach.
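The 2.5% and 97.5% columns of Table 4 suggest equal-tailed intervals taken from the MCMC samples. The sketch shows one common way to compute such intervals; that this matches the exact procedure used is an assumption. For ratio measures such as *P*_{EI}, it forms the ratio within each sample before summarizing, because the posterior mean of a ratio is not in general the ratio of posterior means. The helper names and the example draws are hypothetical.

```python
import statistics

def credible_interval_95(samples):
    """Equal-tailed 95% interval: the 2.5% and 97.5% quantiles of the samples."""
    cuts = statistics.quantiles(samples, n=40, method="inclusive")  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]

def summarize_ratio(numerator_samples, denominator_samples):
    """Posterior mean and 95% interval of a ratio such as P_EI = E/I.

    The ratio is formed within each MCMC sample (same tree and branch lengths)
    before summarizing, rather than dividing the two posterior means.
    """
    ratios = [num / den for num, den in zip(numerator_samples, denominator_samples)]
    return statistics.fmean(ratios), credible_interval_95(ratios)

# Hypothetical draws of E and I, for illustration only (not data from this study)
E_draws = [0.70, 0.72, 0.75, 0.68, 0.74]
I_draws = [1.72, 1.78, 1.81, 1.70, 1.80]
mean, (low, high) = summarize_ratio(E_draws, I_draws)
print(f"P_EI posterior mean {mean:.4f}, 95% interval ({low:.4f}, {high:.4f})")
```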

Martin (2002) takes a frequentist approach, testing particular null hypotheses about *F*_{ST} or the number of changes in environment. The *F*-test is independent of phylogeny, as it depends only on pairwise comparisons; there is thus no reason to account for phylogenetic uncertainty in this case, although it may be advisable to account for multiple substitutions in some way when calculating *F*_{ST}. The *P*-test, however, is explicitly phylogenetic. The null hypothesis in this case is that the number of transitions between environments is what would be expected if the underlying phylogeny were random. If the number of inferred transitions is small relative to this expectation, then the *P*-test is significant. It is sometimes not entirely clear what the null distribution represents in randomization tests (e.g., Swofford et al., 1996). This is apparent from the ease with which a different but equally reasonable null distribution can be proposed. For instance, one could fix the topology of the tree and randomly shuffle the assignment of environments to the tip nodes. This randomization approach seems just as sensible as randomizing the underlying topology, but counterexamples in which the two ways of randomizing produce different null distributions are easy to find. For example, consider a six-taxon tree in which two closest neighbors are in one environment and the other four taxa belong to a different environment. The significance probability if topologies are randomized is 1/7, because a single change can be inferred in 15 of the 105 possible unrooted topologies. The significance probability obtained when the assignment of environments is shuffled is either 1/5 or 2/15, depending on the shape of the tree, but importantly, neither of these equals the 1/7 obtained when topologies are randomized.
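The six-taxon counterexample can be checked by brute force. The sketch enumerates all 105 unrooted binary topologies by stepwise addition. With two taxa in the minority environment, a single change suffices exactly when those two taxa form a cherry, since only then does the split separating them from the other four exist. The sketch counts the topologies in which a fixed pair forms a cherry. It also counts, for each fixed topology, the fraction of the 15 possible minority pairs that form cherries. The function names are our own.

```python
from fractions import Fraction
from itertools import combinations

def all_unrooted_trees(n_taxa):
    """Every unrooted binary tree on leaves 0..n_taxa-1, built by stepwise addition.

    A tree is a list of edges; leaves are ints, internal nodes are strings.
    """
    trees = [[(0, "i0"), (1, "i0"), (2, "i0")]]   # the single 3-taxon tree
    for leaf in range(3, n_taxa):
        new_node = f"i{leaf - 2}"                  # node that splits the chosen edge
        grown = []
        for edges in trees:
            for k, (u, v) in enumerate(edges):     # attach the new leaf to every edge
                rest = edges[:k] + edges[k + 1:]
                grown.append(rest + [(u, new_node), (new_node, v), (leaf, new_node)])
        trees = grown
    return trees

def forms_cherry(edges, a, b):
    """True if leaves a and b share a parent, i.e. the split {a, b} | rest exists."""
    parent = {}
    for u, v in edges:
        if isinstance(u, int):
            parent[u] = v
        if isinstance(v, int):
            parent[v] = u
    return parent[a] == parent[b]

N = 6
trees = all_unrooted_trees(N)

# (1) Randomize the topology; the minority environment is fixed on leaves 0 and 1.
p_topology = Fraction(sum(forms_cherry(t, 0, 1) for t in trees), len(trees))

# (2) Fix the topology; shuffle which 2 of the 6 leaves carry the minority environment.
pairs = list(combinations(range(N), 2))            # 15 equally likely assignments
p_shuffle = {Fraction(sum(forms_cherry(t, a, b) for a, b in pairs), len(pairs)) for t in trees}

print(len(trees), p_topology, sorted(p_shuffle))
```

The two distinct values in `p_shuffle` correspond to the two six-taxon tree shapes: a caterpillar with two cherries and a symmetric tree with three.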
Maddison and Slatkin (1991: 1195) concluded in their review of null models for tests of this sort that “… there are several possible null models that can be used in evolutionary studies, and [that] they lead to somewhat different distributions of the number of character state changes.” Later (p. 1196), they state “if the tree is known and considered a given in the evolution of the characters, then the appropriate null model is one that randomizes characters.” Such problems with interpretation argue for a Bayesian approach because posterior probabilities provide a direct and unambiguous answer to the question posed. In this case, the question might be “What is the probability that the number of transitions between environments is X or fewer?” Simply noting the proportion of sampled trees from a Bayesian MCMC run in which the number of inferred transitions is X or fewer provides a direct answer to this question in the form of an approximated posterior probability.
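The Bayesian alternative described above reduces to counting. Given the number of transitions inferred on each tree sampled by MCMC, the approximate posterior probability of "*X* or fewer" is the fraction of sampled trees meeting that condition. The counts in the sketch are hypothetical placeholders; in practice they would come from reconstructing the character on each sampled tree.

```python
from fractions import Fraction

def posterior_prob_at_most(transition_counts, x):
    """Approximate P(number of transitions <= x | data) as the fraction of
    MCMC-sampled trees whose inferred number of transitions is at most x."""
    hits = sum(1 for count in transition_counts if count <= x)
    return Fraction(hits, len(transition_counts))

# Hypothetical per-tree counts (illustration only, not data from this study)
sampled_counts = [14, 14, 15, 13, 18, 14, 16, 14]
print(posterior_prob_at_most(sampled_counts, 14))   # 5/8
```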

## Conclusions

Our results on green algae in deserts echo the surprising amount of eukaryotic diversity recently uncovered from such habitats as highly acidic rivers, anoxic mud, and deep sea vents (e.g., Zettler et al., 2002). Together, these studies greatly expand our knowledge of the range of environments in which eukaryotes can exist. Phylogenetic analyses of eukaryotes from extreme environments and their nearest relatives allow exploration of the number and rapidity of transitions to the habitat of interest, and can provide insights into the physiological traits important in these transitions. Our results indicate that desert green algae are not simply close relatives of aquatic taxa, but instead represent levels of divergence that could be interpreted as new species, new genera, or even higher order taxa. The exclusive molecular phylodiversity measure and related measures together provide a useful way to directly characterize both the distribution and extent of variation for a given set of taxa.

## Acknowledgements

We thank S. Olm for technical help, R. Colwell and P. Turchin for comments made on an earlier version of the manuscript, and G. Burleigh and one anonymous reviewer for their helpful and constructive comments. The authors acknowledge support from National Aeronautics and Space Administration (EXB02-0042-0054) and the National Science Foundation (DEB9870201) to LAL, and Alfred P. Sloan Foundation/NSF grant (98-4-5 ME) to POL.

## References

*Dictyochloropsis reticulata* and from members of the genus *Myrmecia* (Chlorophyta, Trebouxiophyceae *cl. nov.*)

*Chlorella* species within the Chlorococcales based upon complete small-subunit ribosomal RNA sequences

*Scenedesmus* (Chlorophyta) from Desert Soil Communities of Western North America

*Neochloris* (Chlorophyta)

BCP^{a} ID | Locality |
---|---|
CNP2 | Canyonlands National Park, Needles District, Chesler Park, San Juan County, UT, USA. Latitude 38°06.204′N, Longitude 109°51.000′W, Elevation 1724 m. Coll: 11 May 1999. |
EM1 | Mojave National Preserve, Cinder cone site, San Bernardino County, CA, USA. Latitude 35°11.671′N, Longitude 115°52.223′W, Elevation 641 m. Coll: 4 June 1998. |
LG2 | Sierra San Pedro Martir of Baja California, Mexico, LaGrulla Meadow area. Latitude 30°54′N, Longitude 115°28′W, Elevation 2100 m. Coll: 15 June 1998. |
LG3 | Sierra San Pedro Martir of Baja California, Mexico, LaGrulla Meadow area. Latitude 30°54′N, Longitude 115°29′W, Elevation 2100 m. Coll: 15 June 1998. |
SRS2 | San Rafael Swell, Emery County, UT, USA. Latitude 39°08.574′N, Longitude 110°46.282′W. Coll: 9 May 1999. |
ZNP2 | Zion National Park, Washington County, UT, USA. Latitude 37°12.932′N, Longitude 112°34.933′W, Elevation 1511 m. Coll: 17 May 1999. |
ZNP3 | Zion National Park, Washington County, UT, USA. Latitude 37°20.556′N, Longitude 113°06.578′W, Elevation 2042 m. Coll: 18 May 1999. |

Biotic Crust Project: http://hydrodictyon.eeb.uconn.edu/bcp/.