DNA barcodes successfully identified Macaronesian Lotus (Leguminosae) species within early diverged lineages of Cape Verde and mainland Africa

Plant DNA barcoding uses short DNA sequences to identify unknown samples to species level. This technique relies on the universality of these gene regions and on the existence of enough variation among species to allow discrimination. Island radiations are one challenging scenario, in which recently diverged groups may not have accumulated enough variation to allow species identification. In this work we tested whether six gene regions are suitable for barcoding such a radiation in the Macaronesian Lotus. We found high levels of species discrimination in lineages that diverged 3.5 Mya or earlier, and discrimination efficiency drops sharply in younger radiations.


Introduction
DNA barcoding is a procedure that uses universal DNA sequences to assign species names to sampled individuals (http://www.barcodeoflife.org/). Plant DNA barcoding is currently performed with the two-locus combination (matK + rbcL) recommended by the CBOL Plant Working Group (2009). This combination discriminated 72 % of the samples used by the CBOL Plant Working Group at the species level, with the remaining samples assigned to groups of congeneric species.
Many of the studies that have tested regions as barcodes in plants have focused on large data sets spanning a wide range of land plants, or at least angiosperms (Kress and Erickson 2007; Fazekas et al. 2008; Lahaye et al. 2008; CBOL Plant Working Group 2009; Ford et al. 2009). Their purpose has been to assess the universal applicability of these regions in species discrimination. However, it has been argued that the success of DNA barcodes in species discrimination will drop in (i) groups with complex biology and (ii) closely related species within the same genus (or recently evolved groups) (Pillon et al. 2013).
To date, the level of species discrimination within a single genus has been tested in a number of cases (Sass et al. 2007; Newmaster et al. 2008; Newmaster and Ragupathy 2009; Song et al. 2009; Starr et al. 2009; Clerc-Balin et al. 2010; Liu et al. 2010; Wang et al. 2010), and at least some groups of closely related species are problematic for barcoding (Sass et al. 2007; Miller et al. 2009; Seberg and Petersen 2009). When individual genera are sampled more extensively, the percentage of species discrimination tends to decrease, even when several regions are combined (Kondo et al. 2007; Sass et al. 2007; Edwards et al. 2008; Seberg and Petersen 2009).
From mainland Africa, the Macaronesian Lotus has colonized and radiated into the five volcanic archipelagos of this region (Azores, Madeira, Salvage Islands, Canary Islands and Cape Verde) (Allan et al. 2004; Ojeda et al. 2012). The group comprises 41 described species, divided into two sections, Pedrosia and Rhyncholotus (or the 'rhyncholotus group') (Degtjareva et al. 2006). Section Pedrosia comprises 37 recognized species, whereas section Rhyncholotus comprises only four (Allan et al. 2004). The two groups are distinguished by contrasting floral morphology associated with different pollination syndromes, but within each group vegetative features are more useful for species recognition and identification (Sandral et al. 2006). Based on these vegetative and reproductive structures, Sandral et al. (2006) further subdivided the group into nine informal taxonomic groups (Table 1), which reflect the lineages recovered in the most recent phylogenetic analysis (Ojeda et al. 2012). Many of these species are restricted to specific habitats, such as pine forest and lowland scrub, and 70 % of them are endemic to single islands. The group is therefore highly susceptible to habitat destruction, and at least 10 species are listed under some category of conservation threat, ranging from rare to critically endangered (VV. AA. 2000; Martín et al. 2008; Bañares et al. 2011; Table 2).
The applicability of the recommended barcode regions for species recognition within very recently evolved groups, such as those resulting from island radiations, has not been extensively tested (Mort et al. 2010; Pillon et al. 2013). It is unclear whether the levels of DNA variation in the two-locus barcode recommended by the CBOL Plant Working Group will allow species discrimination in groups that radiated on islands. Here we assess five plastid regions proposed as barcodes in previous studies (matK, rpoC1, rpoB, trnH-psbA and rbcL) and the nuclear ribosomal internal transcribed spacer (ITS) as barcodes within the Macaronesian Lotus. We also related the species-identification performance of these six barcodes to the age estimates of each lineage and to the time to the most recent common ancestor (MRCA) of each lineage within each archipelago and mainland Africa.
In this study, we address the following questions: (i) Are these six DNA regions (matK, rpoC1, rpoB, trnH-psbA, rbcL and ITS) variable enough to allow species discrimination within lineages of different ages in the Macaronesian Lotus group? (ii) Are these six barcodes variable enough to discriminate the endangered species of this group? (iii) Can these regions reliably identify the assemblages of species (informal taxonomic groups) recognized on morphological grounds?

Taxon sampling
Our sampling included 78 accessions representing all the species currently described within sections Pedrosia and Rhyncholotus, except for three species (Lotus loweanus, L. chazalei and L. tibesticus) that were not available for this analysis. We included more than one accession for 27 species, whereas only a single sample could be included for 10 species. Our analysis also included accessions from populations that, based on previous molecular and morphological analyses, may represent four new undescribed species within section Pedrosia (Oliva-Tejera et al. 2005; Sandral et al. 2006; A. Santos-Guerra, Unidad de Botanica-ICIA, pers. comm.). For comparison, we also included five accessions from section Lotus [see Supporting Information].

Table 1. Dates of divergence from the MRCA in the eight informal taxonomic groups within Pedrosia and the Rhyncholotus group. *Taxonomic groups following morphological features according to Sandral et al. (2006); the remaining groups were classified in this study. † Species not sampled in this analysis.

Dating the phylogeny of Macaronesian Lotus
The Macaronesian Lotus seems to have colonized this region from mainland Africa more than once (Allan et al. 2004), and the ornithophilous traits present in the four species of the Rhyncholotus group evolved recently, within the last 2 Mya (Ojeda et al. 2012), from a group of entomophilous ancestors (Ojeda et al. 2013). The group colonized the five volcanic archipelagos of this region at different times and has undergone recent species radiations in some of them (e.g. the Canary Islands) (Ojeda et al. 2012). The most recent phylogenetic analysis of the group recovered four major clades using a combined analysis of four nuclear (ITS and three CYCLOIDEA homologues) and two chloroplast (trnH-psbA and matK) regions (Ojeda et al. 2012).
Despite the nearly complete sampling of the group and the number of gene regions used, the most recently diverged clades had moderate-to-low bootstrap support. To estimate the divergence times of the nine informal taxonomic groups identified within the Macaronesian Lotus, we used a combined matrix of 52 samples and four gene regions (ITS, matK, trnH-psbA and CYB6) totalling 2092 bp. Divergence times were estimated with BEAST v1.5.4 (Drummond and Rambaut 2007) under a constant-rate Yule (speciation process) tree prior, with all other priors and operators at their default settings. Four independent runs of 50 000 000 generations each were performed under the uncorrelated lognormal relaxed-clock model (Drummond et al. 2006). Trees and parameters were sampled every 5000 generations, yielding 10 000 trees per run, and the first 5 000 000 generations were discarded as burn-in. All analyses used the HKY + gamma substitution model. The BEAST input file was created with BEAUti v1.5.4, which is distributed with BEAST, and the performance of each run was examined with Tracer. Mean parameter estimates and 95 % highest posterior densities (HPD) were obtained by summarizing the BEAST tree files with TreeAnnotator v1.5.4 (Drummond and Rambaut 2007). Trees were visualized and edited with FigTree v1.3.1. The analysis was constrained to the best-supported topology for this group obtained from maximum parsimony (MP) and maximum likelihood (ML) analyses (Ojeda et al. 2012).
The topology was calibrated at three points. Two calibrations used endemic taxa from two different islands: Lotus sessilifolius subsp. villossisimus (El Hierro, 1.12 Mya) and L. sessilifolius subsp. sessilifolius (La Palma, 1.77 Mya) (Ancochea et al. 1994; Carracedo 1994). The third calibration point, 20.6 Mya, was based on the age of the oldest island, Fuerteventura, as an upper limit for the colonization of the Canary Islands (Carracedo 1994) and therefore as an upper limit for the age of the MRCA of the species of this archipelago.
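The chain settings above imply a simple piece of arithmetic, and the 95 % HPD reported for each node can be illustrated with the usual shortest-interval definition. The sketch below is a minimal illustration under stated assumptions: it ignores the extra sample BEAST logs at state 0, treats pooling of the four runs as hypothetical, and uses made-up node ages; TreeAnnotator's own HPD implementation may differ in detail.

```python
import random
from typing import Sequence, Tuple

# Settings reported in the text for each independent BEAST run.
N_GENERATIONS = 50_000_000
SAMPLE_EVERY = 5_000
BURN_IN_GENERATIONS = 5_000_000
N_RUNS = 4


def retained_samples(n_gen: int, every: int, burn_in: int) -> int:
    """Samples kept per run after discarding the burn-in.

    Ignores the extra sample that BEAST also logs at state 0.
    """
    total = n_gen // every        # 10 000 samples per run
    discarded = burn_in // every  # 1 000 of them fall within the burn-in
    return total - discarded


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


if __name__ == "__main__":
    per_run = retained_samples(N_GENERATIONS, SAMPLE_EVERY, BURN_IN_GENERATIONS)
    print(per_run, per_run * N_RUNS)  # 9000 36000 (the latter only if runs are pooled)
    # Hypothetical node ages (Mya) standing in for a post-burn-in posterior.
    rng = random.Random(1)
    ages = [rng.lognormvariate(1.0, 0.3) for _ in range(per_run)]
    low, high = hpd_interval(ages)
    print(f"mean = {sum(ages) / len(ages):.2f} Mya, 95 % HPD = [{low:.2f}, {high:.2f}]")
```

Because the HPD is the narrowest window rather than the central 95 % of samples, it is shifted towards the mode for skewed posteriors such as node ages.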

Barcode regions selected
We sequenced six regions: the recommended two-locus cpDNA barcode (matK + rbcL; CBOL Plant Working Group 2009), three other cpDNA regions (trnH-psbA, rpoB and rpoC1) and the nuclear ITS region, which has been assessed in some plant groups as a barcode (Chase et al. 2005;

Molecular analysis
Genomic DNA was extracted from fresh leaves, silica-gel-dried leaf material or voucher specimens following standard procedures (Doyle and Doyle 1987). All plastid regions were amplified under the following PCR conditions: 94 °C for 3 min; 30 cycles of 94 °C for 3 min, 45 °C for 1 min and 72 °C for 2 min; and a final extension at 72 °C for 5 min. The nuclear ribosomal internal transcribed spacer (ITS) was amplified under the following conditions: 94 °C for 3 min; 30 cycles of 94 °C for 1 min, 55 °C for 1 min and 72 °C for 1.5 min; and a final extension at 72 °C for 5 min. Each locus was sequenced, and the raw sequence data were imported into Sequencher 4.1 for editing and contig assembly. Consensus sequences were imported into Se-Al v1.0 (Rambaut 1996). To minimize sequencing errors, sequence quality was carefully assessed: polymorphic sites were checked on both strands and ambiguous cases were re-sequenced. Each region was analysed separately, and accessions with failed amplifications were removed.
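The two thermocycling programs above can be written down as data, which makes it easy to compare them and to compute their programmed duration. This is a minimal sketch: the `Step` and `PcrProgram` classes are hypothetical names introduced for illustration, and the computed time counts hold times only, ignoring ramping between temperatures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


@dataclass
class PcrProgram:
    initial: Step
    cycle: List[Step]
    n_cycles: int
    final: Step

    def hold_time_min(self) -> float:
        """Total programmed hold time in minutes (ramp times not included)."""
        per_cycle = sum(step.minutes for step in self.cycle)
        return self.initial.minutes + self.n_cycles * per_cycle + self.final.minutes


# Program used for all plastid regions, as given in the text.
PLASTID = PcrProgram(
    initial=Step("initial denaturation", 94, 3),
    cycle=[Step("denaturation", 94, 3), Step("annealing", 45, 1), Step("extension", 72, 2)],
    n_cycles=30,
    final=Step("final extension", 72, 5),
)

# Program used for the nuclear ribosomal ITS region, as given in the text.
ITS = PcrProgram(
    initial=Step("initial denaturation", 94, 3),
    cycle=[Step("denaturation", 94, 1), Step("annealing", 55, 1), Step("extension", 72, 1.5)],
    n_cycles=30,
    final=Step("final extension", 72, 5),
)

print(PLASTID.hold_time_min(), ITS.hold_time_min())  # 188 113.0
```

Laid out this way, the main differences are visible at a glance: the plastid program uses a lower annealing temperature (45 °C versus 55 °C) and longer denaturation and extension steps.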

Assessment of the barcode regions
Three requirements have been suggested for official barcodes: universality, sequence quality and coverage, and discrimination (CBOL Plant Working Group 2009). We evaluated these three criteria for the six regions tested within this group. Universality: we used the percentage of amplification success on the first trial, under the same PCR conditions, as an indicator of universality. Sequence quality and coverage: for each region, we estimated the percentage of bidirectional sequences with few or no ambiguous bases.
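The universality and sequence-quality criteria above reduce to simple per-region percentages. The sketch below assumes a hypothetical per-accession record (`Accession`), an assumed cut-off of at most two ambiguous bases for "few", and all attempted accessions as the denominator; the text does not specify these details.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Accession:
    """Outcome for one accession at one region (hypothetical record layout)."""
    amplified_first_trial: bool
    bidirectional: bool
    ambiguous_bases: int


def region_summary(records: List[Accession], max_ambiguous: int = 2) -> Dict[str, float]:
    """Universality and sequence-quality percentages for one region.

    `max_ambiguous` is an assumed cut-off for "few" ambiguous bases; both
    percentages use all attempted accessions as the denominator.
    """
    n = len(records)
    if n == 0:
        raise ValueError("no accessions recorded for this region")
    amplified = sum(r.amplified_first_trial for r in records)
    high_quality = sum(
        r.amplified_first_trial and r.bidirectional and r.ambiguous_bases <= max_ambiguous
        for r in records
    )
    return {
        "universality_pct": 100.0 * amplified / n,
        "quality_pct": 100.0 * high_quality / n,
    }


if __name__ == "__main__":
    toy = [
        Accession(True, True, 0),
        Accession(True, False, 0),   # amplified, but read on one strand only
        Accession(False, False, 0),  # failed on the first trial
        Accession(True, True, 5),    # too many ambiguous bases
    ]
    print(region_summary(toy))  # {'universality_pct': 75.0, 'quality_pct': 25.0}
```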
Discrimination: we evaluated discrimination at two levels: species, and informal taxonomic groups defined in a previous taxonomic analysis based on morphological features (Sandral et al. 2006). Nine informal infrageneric taxonomic groups have been suggested within the Macaronesian assemblage (Table 1). We considered discrimination at this level useful when at least 50 % of the species were assigned within the same group. For species discrimination we used a distance-based assignment method. Each region was analysed separately and in various combinations with neighbour-joining (NJ) on Kimura two-parameter (K2P) distances, the standard in barcoding applications. We also analysed the data with the unweighted pair group method with arithmetic mean (UPGMA) and with parsimony (Lahaye et al. 2008), as implemented in PAUP* 4.0b10 (Swofford 2001); these methods did not produce any major differences in species discrimination. We also tested whether including or excluding accessions with missing sequences affected species discrimination in two-locus combinations and when all five plastid regions were combined with ITS.
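The K2P distance that underlies the NJ analyses can be computed directly from an alignment as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The sketch below pairs it with a "barcode gap" check, a common distance-based proxy for species discrimination. That check is a swapped-in technique and is not the NJ-based assessment used here. The function names and toy sequences are hypothetical, and gaps and ambiguity codes are dropped pairwise.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Only sites where both sequences carry A, C, G or T are compared, so gaps
    and ambiguity codes are dropped pairwise.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: the K2P correction is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def barcode_gap(samples: List[Tuple[str, str, str]]) -> Dict[str, bool]:
    """Flag species whose largest within-species K2P distance is smaller than
    their smallest distance to any other species.

    `samples` holds (accession_id, species, aligned_sequence) tuples.
    Single-accession species pass only if no other species is at distance 0.
    """
    max_intra: Dict[str, float] = {}
    min_inter: Dict[str, float] = {}
    for (_, sp1, s1), (_, sp2, s2) in combinations(samples, 2):
        d = k2p(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            min_inter[sp1] = min(min_inter.get(sp1, math.inf), d)
            min_inter[sp2] = min(min_inter.get(sp2, math.inf), d)
    species = {sp for _, sp, _ in samples}
    return {sp: max_intra.get(sp, 0.0) < min_inter.get(sp, math.inf) for sp in species}


if __name__ == "__main__":
    toy = [
        ("a1", "sp1", "AAAAAAAAAA"),
        ("a2", "sp1", "AAAAAAAAAA"),
        ("b1", "sp2", "GGAAAAAAAA"),
        ("c1", "sp3", "GGAAAAAAAA"),  # identical to sp2: not separable
    ]
    print(sorted(barcode_gap(toy).items()))
    # [('sp1', True), ('sp2', False), ('sp3', False)]
```

For a multi-locus combination, one simple option (an assumption, not a documented step of this study) is to concatenate each accession's aligned regions into a single string before computing distances. The toy case with two identical species mirrors the Rhyncholotus and L. sessilifolius groups, which share identical ITS sequences: no distance-based criterion can separate them on that region.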

Results
We found that the barcode regions tested successfully identified early diverged species from Cape Verde, mainland Africa and Europe, but success was reduced for more recent speciation events. The lineages of the Macaronesian Lotus diverged and radiated within this region at different times. The earliest diverging lineages include two African groups (Lotus arenarius and L. jolyi) and the lineage that colonized Cape Verde (Lotus purpureus). All species in these three groups were discriminated (100 %) when all five regions were combined, and even when individual regions were analysed alone (Table 3 and Fig. 1). Species discrimination was greatly reduced in the lineages that diverged at the end of the Pliocene and the beginning of the Pleistocene (3.5 to 2 Mya) within the Canary Islands, Madeira and the Salvages.
All regions had >95 % sequencing success, except matK (83 %), owing to amplification failure or to T or A repeats that caused sequencing failure. This region also had the lowest level of bidirectional sequence quality (Table 4).
The trnH-psbA and rpoB regions showed, respectively, the highest and the lowest levels of variation and species discrimination among the plastid regions evaluated (Table 3). Among two-locus combinations, trnH-psbA + matK showed the highest discriminatory power at the species level (34 %). Three two-locus combinations (trnH-psbA + matK, matK + rpoC1 and rpoC1 + trnH-psbA) showed slightly better discriminatory power than the barcode recommended by CBOL (matK + rbcL) (Fig. 2 and Table 3).
When all five plastid regions were combined, we identified 14 (36 %) of the 38 species in our sample (Table 3). Even with five regions, only 3 of the 10 species (30 %) of conservation concern were identified at the species level.
We were able to identify only four informal taxonomic groups with the combination matK + trnH-psbA, and no improvement was observed when all regions were combined or with any region analysed alone (Table 3). The intergenic spacer trnH-psbA was the only plastid region in which we observed intraspecific variation, due to two indels and a small inversion.
The ITS region showed the highest variability of all regions tested in this study when analysed alone, with a species identification rate of 26 %. Overall species discrimination increased substantially when this region was combined with a plastid region, the best two combinations being ITS + trnH-psbA and ITS + matK (Table 3). Adding ITS increased the discriminatory power across the sampled species, with 52 % species discrimination when all six regions were combined (Fig. 3). However, even with six regions we were able to identify only 30 % of the species of conservation concern.
AoB PLANTS www.aobplants.oxfordjournals.org © The Authors 2014
Discussion
Plant DNA barcoding of phylogenetically diverse assemblages has proven successful, with high levels of species discrimination, e.g. 98 % species identification in Panamanian trees (Kress et al. 2009) and >90 % in Mesoamerican orchids (Lahaye et al. 2008), but discrimination success tends to decrease as the number of species sampled within families or genera increases (Gonzalez et al. 2009; Xiang et al. 2011; Yesson et al. 2011; Zhang et al. 2011; Arca et al. 2012; Maia et al. 2012; Saarela et al. 2013). Previous studies have reported relatively low (55 % using trnH-psbA in Aspalathus) to moderately high (e.g. 92 % in Crocus) percentages of species discrimination in several congeneric plant groups (Sass et al. 2007; Edwards et al. 2008; Seberg and Petersen 2009), and it has been suggested that barcodes will have some limitations in closely related species (Chase and Fay 2009), especially in island radiations (Pillon et al. 2013).

Table 3. Performance of the five plastid regions tested separately and in some combinations with the nuclear ITS gene region. *Informal sections according to Sandral et al. (2006). A, including all accessions; B, excluding accessions with missing sequences in two-locus combinations.
In the particular case of the Macaronesian Lotus, we identified 18 % of the samples at the species level with the two-locus combination suggested by CBOL (matK + rbcL) (Fig. 3 and Table 3), and only 52 % when all six regions were combined. This contrasts with the general discrimination success of matK + rbcL, which is 70 % in the database used by CBOL (CBOL Plant Working Group 2009). The nrITS region showed the highest species discrimination of all regions (26 %), and trnH-psbA the highest among the plastid barcodes tested (18 %). However, these two regions also show low levels of variation in the recent lineages of Lotus: the L. sessilifolius and Rhyncholotus groups have identical ITS sequences, despite the differences in vegetative and floral traits between the two groups (Ojeda et al. 2012).
Although the percentage of species discrimination within the Macaronesian Lotus is low overall, the discriminatory power of these barcode regions is not uniformly distributed across the lineages we analysed. Early divergent groups within Africa and Europe (Lotus jolyi and L. arenarius groups in Fig. 1) show 100 % species discrimination. The same applies to the lineage that presumably colonized Cape Verde early (L. purpureus group, Fig. 1).
The successful application of barcodes in recent radiations will depend on several factors, including the time of colonization and the time of the most recent radiations within each group. Within Macaronesia, barcodes have to date only been tested in species of Tolpis (Mort et al. 2010) and, at the floristic level, in 64 endemic taxa from 23 angiosperm families in the Garajonay National Park on La Gomera (Jaén-Molina et al. 2010). Using two combinations of four plastid regions (matK, rpoC1, rpoB and trnH-psbA), Mort et al. (2010) found high levels of species discrimination in most of the morphologically distinct species of Tolpis (Asteraceae), even within the relatively recently radiated clades of the Canary Islands. Using the recommended two-locus barcode (matK + rbcL), Jaén-Molina et al. (2010) found levels of species discrimination similar to those in other geographical regions where these two regions have been applied (CBOL Plant Working Group 2009), suggesting that these two regions are of practical use for species discrimination on this particular island.
The limited success of species discrimination we found in the Macaronesian Lotus might therefore be associated with its recent colonization and diversification. In Hawaii, the recently radiated genera Cyrtandra and Clermontia also show lower levels of species discrimination with the rbcL, trnH-psbA and matK regions (Pillon et al. 2013), suggesting that recent island radiations might pose a difficulty for barcoding. Although the current number of barcoding studies of island radiations is too limited to draw definitive conclusions, there seems to be an age threshold below which species are too young to be barcoded with the current markers. For instance, it has been suggested that this threshold lies at 3–4.7 Mya for Hawaiian Cyrtandra and at 2–3 Mya for Hawaiian Clermontia (Pillon et al. 2013). In the particular case of the Macaronesian Lotus, we found that lineages <3 Mya old have reduced species discrimination. Kim et al. (2008) identified three windows of colonization in other Macaronesian plant groups: some groups colonized Macaronesia in the late Miocene (Aeonium, 15.2 Mya) or in the late Pliocene (Sideritis, 3.3 Mya), but most of them radiated during the Pleistocene (<3 Mya). Groups that colonized Macaronesia relatively recently, as well as those that radiated within the last 3 Mya, might be particularly challenging for barcodes. Additional studies in other plant groups will be needed to estimate the overall success of barcodes in this geographical region.
Besides the age of colonization and radiation of this group, the current taxonomy and species delimitation within the Macaronesian Lotus group could also explain the overall low levels of species discrimination. In this particular case we used a relatively narrow species concept, but one that follows usual taxonomic practice in the Macaronesian flora (Sandral et al. 2006). Further re-assessment of the species boundaries within this group is necessary in order to fully evaluate the effectiveness of barcodes in this group.

Conclusions
In this study, we found that species discrimination in recent radiations on oceanic islands is lower than in their continental counterparts. In the particular case of the Macaronesian Lotus, we have shown that the discriminatory power of the barcodes is not homogeneous across lineages, and that radiations that occurred recently (≤3.5 Mya) have the lowest levels of species discrimination. Species discrimination was successful in continental African species and in lineages that radiated earlier than 4 Mya in this group, but additional approaches will be required for the most recent radiations.
With the falling cost of high-throughput next-generation sequencing, ultra-barcoding, i.e. the use of next-generation sequencing to produce whole organellar genomes and substantial nuclear ribosomal sequence (Kane et al. 2012), is potentially a suitable approach to overcome the limited variation found in rapid radiations on oceanic islands. This approach has been successfully applied to identify subspecies, varieties and individual genotypes in Theobroma cacao, and it may help to identify more variable genomic regions in other plant groups that evolved through recent island radiations.

Accession Numbers
All sequences obtained in this study from fresh, herbarium and silica-gel-preserved material have been deposited in the GenBank database under accession numbers KM372590–KM373057.

Sources of Funding
Funding for this work was provided by Consejo Nacional de Ciencia y Tecnología (CONACyT) from Mexico to D.I.O., by a grant from the NSERC Discovery Grant program from Canada to Q.C. and by the project No. RF2006-00030 from the Spanish Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria to A.S.G.

Conflicts of Interest Statement
None declared.