A three-marker DNA barcoding approach for ecological studies of xerothermic plants and herbivorous insects from central Europe

The DNA barcoding technique developed for species identiﬁcation has recently been adapted for ecological studies (e.g. host plant identiﬁcation). Comprehensive barcode databases, covering most species inhabiting areas, habitats or communities of interest are essential for reliable and efficient identiﬁcation of plants. Here we present a three-barcode (plastid rbcL and matK genes and the trnL intron) database for xerothermic plant species from central Europe. About 85% of the xerothermic plant species (126 out of c . 150) known to be associated with xerothermic habitats were collected and barcoded. The database contains barcodes for 117 ( rbcL and trnL ) and 96 ( matK ) species. Interspeciﬁc nucleotide distances were in the ranges 0–17.9% (0–3.2% within genera) for rbcL , 0–44.4% (0–3.1%) for trnL and 0–52.5% (0–10.9%) for matK . Blast-searching of each sequence in the database against the entire database showed that species-level identiﬁcation is possible for 89.6% ( rbcL ), 98.4% ( trnL ) and 96.4% ( matK ) of examined plant species. The utility of the presented database for identiﬁcation of host plants was demonstrated using two insect species associated with xerothermic habitats: the oligophagous leaf-beetle Cheilotoma musciformis (for which two host plants in Fabaceae were identiﬁed) and the polyphagous weevil Polydrusus inustus (which was found to feed on 14 host plants, mostly Rosaceae, Asteraceae and Fabaceae). The developed database will be useful in various applications, including biodiversity, phyloge-ography, conservation and ecology. © 2015 The Linnean Society of London, Botanical Journal of the Linnean Society , 2015, 177 , 576–592.


INTRODUCTION
Xerothermic (calcareous) grasslands are one of the most diverse habitats in the temperate zone and are considered to be extrazonal analogues of continental Eurasiatic steppes (Niemelä & Baur, 1998;Ewald, 2003;Dengler et al., 2014). This plant formation is highly threatened in Europe (Janišová et al., 2011). It is limited by current climatic conditions that favour forests and restrict dry grasslands to local steep, dry and warm slopes on calcareous soils in central and western Europe. Xerothermic grasslands in central Europe sustain highly diverse plant communities, mainly belonging to the Festuco-Brometea association (Matuszkiewicz, 2005;Schubert, Hillbig & Klotz, 2001;Chytrý, 2007;Illyés et al., 2007;Dúbravková et al., 2010). Approximately 150 plant species can be found in this type of vegetation north of the Carpathians. This association is protected by the European Habitats Directive 92/43/ EEC, which classifies Festuco-Brometea grasslands, occurring mainly on calcareous substrates, under Habitat number 6210. Most xerothermic species are restricted to Festuco-Brometea grasslands; only a few can inhabit other types of habitats (such as sandy turfs). Xerothermic grasslands sustain populations of many rare and relic species with endemic taxa: Galium cracoviense Ehrend (only in the Kraków-Częstochowa Uplands), Erysimum pieninicum (Zapał.) Pawł. (only in Pieniny Mountains), Carlina onopordifolia Besser ex DC. (only in the Polish and Ukrainian Uplands) and several other species annexed in the Habitat Directive of the European Union.
Xerothermic grasslands have been highly fragmented and degraded due to man-made land transformations, which reduced their area as a result of afforestation and agricultural development (Pärtel, Mandla & Zobel, 1999;Dutoit et al., 2003;Poschlod et al., 2005;Johansson et al., 2008). This kind of plant formation is often vulnerable to plant succession (it can become overgrown by herbs, bushes and trees) and in many areas was sustained by traditional land use, mainly extensive grazing by roaming flocks of sheep in spring and autumn combined with summer haymaking (Michalik & Zarzycki, 1995;WallisDeVries, Poschlod & Willems, 2002). Xerothermic grasslands are also characterized by a rich entomofauna, particularly diverse assemblages of Orthoptera, butterflies (Lepidoptera) and beetles (Coleoptera) (Liana, 1987;Mazur, 2001;Rákosy & Varga, 2006;. Ecological studies on xerothermic plants and their insect assemblages require the development of techniques that allow for reliable and rapid species identification (both plants and insects). The DNA barcoding approach should facilitate not only identification of particular plant and insect species, but also understanding of ecological interactions and associations between host plants and insects feeding on these plants. Such knowledge would also be of practical importance for conservation of particular species and whole assemblages and for management planning for xerothermic grasslands.
DNA barcoding was developed primarily as an auxiliary technique for species identification. It was first used in animals and was based on a mitochondrial gene, cytochrome oxidase unit I (COI; Hebert, Ratnasingham & deWaard, 2003). Later, this technique was also adapted for studies on fungi with the final choice of the internal transcribed spacers (ITS) of nuclear ribosomal DNA (Seifert, 2009). When considering plants, a long-term debate ensued about the barcode of choice: several DNA markers were proposed for land plants, either individually or in combinations (Chase et al., 2007;Kress & Erickson, 2007;Fazekas et al., 2008;Hollingsworth et al., 2009). Finally, a two-locus barcode was proposed and widely accepted consisting of the plastid genes ribulose-bisphosphate carboxylase (rbcL) and maturase K (matK) (CBOL Plant Working Group, 2009). Additionally, the trnH-psbA intergenic spacer region of plastid DNA was proposed as a plant barcode (Shaw et al., 2005;Fazekas, Steeves & Newmaster, 2010;Pang, Luo & Sun, 2012). However, this raised concerns due to its extensive length variation (Chase et al., 2007;Kress & Erickson, 2007), the presence of intraspecific microinversions associated with palindromes (Whitlock, Hale & Groff, 2010;Jeanson, Labat & Little, 2011) and sequencing problems related to mononucleotide repeats (Fazekas et al., 2008;Devey, Chase & Clarkson, 2009; but see Fazekas et al., 2010). In some situations, however, these standard plant barcodes cannot be used. For example, the identification of host plant species from animal gut contents is a difficult task due to DNA degradation (e.g. Wallinger et al., 2013). Moreover, primers for matK rarely cover a wide spectrum of plant taxonomic units and therefore have limited utility for host plant identification from polyphagous animal guts, as several primer pairs should be used to increase the probability of amplification for all or most host plants present in samples. As an alternative, a plastid intron, located in the tRNALeu UAA gene (trnL; Taberlet et al., 1991), has successfully been used for diet analyses Taberlet et al., 2007). This intron has some limitations similar to those of trnH-psbA (e.g. length variation) and therefore its utility for plant species identification could be questionable. Nevertheless, it proved to be the barcode of choice for host plant barcoding in insects, particularly beetles (Jurado-Rivera et al., 2009;Pinzón-Navarro et al., 2010;Kubisz et al., 2012;Kitson et al., 2013). The trnL intron has also been successfully used for identification of below-ground plant richness (from roots) (Hiiesalu et al., 2012).
Recently, the DNA barcoding approach has been used for other types of ecological studies, particularly for identification of plant species and evaluation of species richness from selected areas, habitats and/or plant communities. These studies focused on tropical biodiversity hotspots such as forests of South and Central America and South Asia (Kembel & Hubbell, 2006;Dick & Kress, 2009;Gonzalez et al., 2009;Kress et al., 2009Kress et al., , 2010Pei et al., 2011). So far, there have been several examples of studies using plant barcodes for ecological studies in other areas and plant communities, e.g. boreal forests in Canada (Fazekas et al., 2008). However, there are hardly any analogous studies concerning plant species identification and evaluation of species richness for open land habitats such as grasslands, with the exception of a single study on the mountain dry grasslands of Italy (De Mattia et al., 2012). One may ask why one would develop barcodes if plants can be identified on the basis of traditional morphological examination. Indeed, there is no need for barcoding in many botanical studies (e.g. in standard vegetation inventories), but barcode databases could potentially be useful if In the present study, we evaluated the performance of different DNA barcode markers (matK, rbcL and trnL) for identification of xerothermic plant species and evaluation of species richness using xerothermic grasslands from Poland as an example. Xerothermic grasslands in Poland were selected as the subject of this research as this plant association has been intensively studied by botanists and habitat specialists from the end of the 19th century (Preuss, 1912;Kozłowska, 1931;Ceynowa, 1968;Medwecka-Kornaś & Kornaś, 1977). In Poland, all major types of dry grasslands known from central Europe can be found and most central European plant species associated with this vegetation are also present there (Zając & Zają c, 2001;Mirek et al., 2002;Matuszkiewicz, 2005). Moreover, Polish dry grasslands are located in two areas which differ with respect to the history of formation and persistence of xerothermic grasslands. Southern Poland was glaciated only once (Sanian glaciation, c. 730 000-430 000 years ago), whereas northern Poland was glaciated several times during the Pleistocene (including the Vistulian glaciation, which ended 10 000-12 000 or 17 000-18 000 years ago in the Kujawy basin) (Marks, 2002;Lindner et al., 2006;Wysota, Molewski & Sokolowski, 2009). Moreover, southern Poland was, and partially still is, connected with the Pontic and Pannonic steppe areas, whereas northern Poland could have been settled by xerothermic species in the Holocene and only via some specific routes (such as along the Vistula River valley). Lastly, xerothermic grasslands in Poland are highly threatened as they are extrazonal, highly fragmented and sensitive to human land transformations. This plant association shelters also diverse communities of invertebrates, including numerous species of Coleoptera. As the diet of some of xerothermic beetles has been intensively studied based on field observations or feeding experiments (e.g. Szymczakowski, 1960;Warchałowski, 1991;Mazur, 2001), they can be used as excellent objects to test performance of plant barcodes for host plant identification. Among xerothermic beetles, well known regarding their feeding preferences are, the oligophagous leaf-beetle Cheilotoma musciformis and the polyphagous weevil Polydrusus inustus.
Evaluation of the performance of barcodes for identification of xerothermic plant species and evaluation of species richness was performed in four steps: (1) amplification efficiency; (2) sequencing success; (3) accuracy of plant species identification; and (4) application for host plant identification. The main goal of this study was to develop a database of xerothermic plant barcodes for further ecological and conservation studies. Additionally, the database was used for evaluation of the utility of these barcodes for identification of insect host plants on the basis of gut content. To this end we examined two beetles: C. musciformis and P. inustus

SAMPLING AREA
The study was performed on xerothermic (calcareous) grasslands of the Festuco-Brometea association located in two areas. The majority of plants were collected in the Polish Uplands located in southern Poland (between the cities of Kraków and Kielce; coordinates of the centre of this area 50.374°N, 20.407°E). The remaining plants, especially species absent or difficult to find or rare in southern Poland, were collected in northern Poland in the Kujawy Basin (between the cities of Toruń and Bydgoszcz; coordinates of the centre of this area 52.942°N, 18.572°E). Xerothermic communities in the first sampling area consisted mainly of xerothermic grasslands on steep slopes of chalk and gypsum hills. In the second area, mainly xerothermic grasslands on steep scarps along river valleys on clay soils were sampled.

PLANT SAMPLING
Prior to field surveys, a list of all plant species native to Poland and associated exclusively or mainly with xerothermic grasslands (Zają c & Zają c, 2001;Mirek et al., 2002;Matuszkiewicz, 2005) was compiled. After floristic reconnaissance, we also added species commonly found in xeric grasslands, but strongly associated with other syntaxonomic groups (mostly species associated with Molinio-Arrhenatheretea meadows and Rhamno-Prunetea shrubland). The final list comprised 152 plant species. Field survey was executed in two seasons in 2011 and 2012 (from April to August). Xerothermic plant species and other species characteristic for open dry habitats were collected. Voucher specimens (dried) were collected and are deposited in the Jagiellonian University Herbarium (collector: W. Heise) (voucher specimen numbers presented in Table 1). For the purposes of molecular analyses several green leaves from a single individual of each species were collected and preserved in plastic bags with silica gel. All samples were stored in a refrigerator at 4°C until DNA isolation. Plant species were identified in the field. Parts of specimens important for taxonomic identification were collected and preserved.

BEETLE SAMPLING
To evaluate the utility of plant barcodes for host plant identification from insect gut two species were 578 W. HEISE ET AL.   selected: the leaf-beetle C. musciformis (Chrysomelidae) and the weevil P. inustus (Curculionidae). Both species are characteristic of dry grasslands and scrublands of central and eastern Europe (Warchałowski, 1971;Borowiec, 1984;Mazur, 1994;Korotyaev & Meleshko, 1995;Korotyaev, 1996;. The population genetics of both these species have recently been studied in detail (see Kajtoch, LachowskaCierlik & Mazur, 2009;Kajtoch, Korotyaev & LachowskaCierlik, 2012;Kajtoch et al., 2013). Beetles were collected using sweep-nets from herb, shrub and bush layers on xerothermic turfs in 2011 and 2012 (May-June). To avoid over-representation of specimens feeding on the same plants (collected in the same place and the same time), 24 specimens of P. inustus were randomly selected, each from a different xerothermic patch. Similarly, single individuals of C. musciformis were randomly selected from distinct xerothermic patches; only ten specimens were used in analyses, as this species is highly threatened in Poland (Ś cibior, 2004;Kajtoch et al., 2013). Beetles were only collected in good weather conditions to avoid collection of starving specimens (as efficiency of plant DNA isolation and amplification from such individuals is decreased; Kajtoch & Mazur, in press) and preserved immediately in ethanol (96%) in the field to reduce DNA degradation. Samples were kept frozen until DNA isolation.

LABORATORY PROCEDURE
Plant tissues (leaves) were frozen in liquid nitrogen prior to DNA isolation. Frozen samples were crushed (homogenized, pulverized) in an agate mortar, and DNA was isolated using the Nucleospine Plant Tissue Kit (Macherey-Nagel). Beetles were digested with proteinase K, and DNA was isolated using the Nucleospine Tissue Kit and protocol for animal tissue isolation. The DNA concentration and purity of all isolates were assessed using Nanodrop, and the quality of DNA isolates from beetles was checked by amplification of the COI mitochondrial gene using standard barcode primers (Folmer et al., 1994). Next, DNA isolates were used for amplification of three plastid barcodes, matK, rbcL and trnL, using the following primers: matK472F and matK1248R for matK (Yu, Xue & Zhou, 2011), 1F and 724R for rbcL (Fay, Swensen & Chase, 1997), and A49325 and B49863 for trnL (Taberlet et al., 1991). We did not use primers developed to amplify short barcodes [minibarcodes; e.g. Hofreiter et al. (2000) for rbcL; Taberlet et al. (2007) for trnL] as these short markers do not have sufficient discriminatory power and rarely allow for plant species identification (see also Little, 2014). Amplicons of the trnL intron were of variable length (c. 350-640 bp), whereas amplicons of the plastid genes showed a smaller range of length variation: rbcL, 650-680 bp; matK, 690-720 bp. The PCRs of samples that did not amplify any fragment were repeated using less stringent conditions: reduction of up to 5°C in the annealing temperature and a higher concentration of MgCl 2. For species for which this procedure failed to amplify any barcode, the PCRs were repeated on other DNA isolates. The same primers were used for amplification of plant DNA from plant tissues (leaves) and from insect guts. All PCR products were visualized on agarose gels. PCR products from plant leaves and C. musciformis samples were then purified using an ExoProStar kit (GE Chemicals). Purified DNA products were then Sanger sequenced using forward primers and a BigDye Terminator v.3.1. Cycle Sequencing Kit (Applied Biosystems) and run on an ABI 3100 Automated Capillary DNA Sequencer. In cases of unreadable sequences, the sequencing procedure was repeated with reverse primers. For P. inustus, another procedure of host plant identification was used: only rbcL and trnL barcodes were amplified separately for each individual (to avoid problems and errors caused by unequal concentration of plant DNA in isolates from weevil bodies). This procedure was followed because the matK database of xerothermic plants was too incomplete for reliable species assignment (see Results). All amplicons (small volumes of both rbcL and trnL) were first checked on agarose gel and then pooled approximately equimolarly (all PCRs of rbcL separately from PCRs of trnL) and purified using a Nucleospine DNA Extraction Kit. The sequencing library was prepared using a NexteraXT library preparation kit (Illumina). The library was sequenced as a part of a MiSeq paired-end 2× 150-bp run.

DATA ANALYSIS
Sanger sequences were checked visually using BioEdit v.7.0.5.2 (Hall, 1999). Only sequences of good-quality fragments, longer than 400 bp (trnL) or 650 bp (rbcL and matK), were used for further analysis. Sequences of all three plant barcodes used in this study and obtained directly from plant tissues were stored as FASTA files. All sequences of the particular barcode were aligned using MAFFT v.7 (Katoh & Standley, 2013). Because the generated database of xerothermic plants does not cover all species known from the study area (see Results), the NCBI GenBank database was additionally searched for rbcL, trnL and matK sequences of xerothermic plant species missing in the xerothermic database (see Table 1). Although the CBOL Plant Working Group has initiated a plant DNA barcoding database based on rbcL 582 W. HEISE ET AL. and matK (see http://www.boldsystems.org), it currently contains an insufficient number of records, especially for taxa from poorly known environments and areas such as xerothermic grasslands of central Europe and therefore this database was not sufficient for the purposes of this study. Moreover, this database contains only rbcL and matK sequences; therefore, the trnL barcode cannot be used for species identification using BOLD. For these reasons, instead of using BOLD we decided to use the resources available in NCBI GenBank. MEGABLAST (Basic Local Alignment Search Tool, Altschul et al., 1990) was used to search for most similar sequences of three barcodes (independently) in the NCBI GenBank sequence library. Results of identification were provided as a list of best hits of the nearest matches (maximum identity) according to BOLD-IDS guidelines (http:// www.boldsystems.org/views/idrequest_plants.php).
Due to the limitation of NCBI GenBank resources, it was not possible in some cases to identify plant species that were barcoded (as many xerothermic plants were absent in NCBI GenBank before this study); therefore, other species (usually of the same genus) were retrieved and reported as the nearest matches. This was done only for quick verification of barcode amplification and sequencing efficiency and accuracy. The performance of each barcode was evaluated by use of a local Blast search in BioEdit v.7.2.2 (Hall, 1999) of the developed barcode database against this database to find how many plant species could not be discriminated. Only hits with 100% identity and > 95% sequence coverage were retrieved. In the local Blast search we used 128 sequences for trnL and rbcL barcodes and 107 sequences for the matK barcode (including plant species for which sequences were downloaded from NCBI GenBank). Moreover, according to the guidelines provided by CBOL (http:// www.barcoding.si.edu/protocols.html), the evaluation of comparative levels of variation and discrimination for the three markers were undertaken using MEGA 5.10 (Tamura et al., 2011) to generate Kimura twoparameter (K2P) distance matrices for each locus. These distances were calculated for the whole sets of barcodes (for all species) and also separately for plant genera that were represented by more than one species in the developed barcode databases. Next, we performed the identification of Sanger sequences (of three barcodes) obtained from C. musciformis guts via comparison with prepared databases of xerothermic plant barcodes. Again, the MEGAB-LAST search tool was used ('align two or more sequences' option). FASTA alignments of each plant barcode were used as references for searching nearest matches for sequences obtained from C. musciformis. Only sequences of a query coverage larger than 95%, Expect (E) value = 0 and a maximum identity at least 99% were retrieved. These thresholds were set somewhat arbitrarily to maximize stringency of identification of host plant species. Query coverage of at least 95% was required so that entire reads would show high similarity to the query species, excluding, for example, chimaeric sequences that may have been generated during PCR. An identity of at least 99% was chosen to allow for sequencing errors and intraspecific genetic variation.
Finally, Illumina sequences obtained from the P. inustus mixed sample were used for host plant species identification. In this particular paired-end Illumina run, the quality of the second reads was much lower; only the first read from each pair was used in Blast analyses, but both reads were used for mapping (see below). Identification of plants was performed by the comparison of the sequencing reads with sequences in our database of plant barcodes. We used two complementary methods. The first method was based on MEGABLAST searches. For each read of at least 120 bp (ungapped), a MEGABLAST search with cutoff E value of 1×10 -20 was performed. Only reads with at least 98% identity to at least one plant species in the database were retained. This threshold was used as 98% identity was used in other studies that performed host plant identification with use of plant barcodes and next-generation sequencing technologies (e.g. Soininen et al., 2009;Valentini et al., 2009;Hajibabaei et al., 2011). A read was considered to have a unique hit if only a single hit was reported or when the bitscore of the second-best hit was not better than 0.95× the bitscore of the best hit. Plant species were identified only on the basis of these reads. When this condition was not met, then all hits (species) with bitscores > 0.95× the bitscore of the best hit were considered as matching the read equally well. This group of reads, together with reads that could be assigned to particular plant species (previous category), was used jointly for estimation of host plant frequencies at the plant family level.
The second method employed mapping read pairs to the references from the plant database. Mapping was performed with Bowtie2 (Langmead et al., 2009). End-to-end alignment with the minimum insert size of 100 bp was used, and only reads pairs mapping concordantly (using the default Bowtie2 definition of concordance) were reported. Only the best alignment was reported for each read, and reads with mapping quality < 10 (which corresponds to a P < 0.9 that the read mapped uniquely) were excluded. The number of read pairs mapped to each reference was calculated with SAMtools (Li et al., 2009).
For both methods, we reported only those plant species with at least 1.0% of assigned reads.

TAXONOMIC OVERVIEW OF XEROTHERMIC PLANTS
The majority of studied plant species belonged to Dicotyledoneae. The rest belonged to Monocotyledoneae and represented 29 orders, 33 families and 79 genera (including nine genera for which PCR failed to amplify any barcode). The most species-rich families of xerothermic plants from Poland are Fabaceae (21 species), Asteraceae (14 species), Rosaceae (11 species), Apiaceae (eight species), Caryophyllaceae (seven species), Scrophulariaceae (six species) and Poaceae (six species) ( Table 1, Supplementary Table S1).

BARCODING OF XEROTHERMIC PLANTS
In total, 126 plant species characteristic for xerothermic grasslands or associated generally with dry and warm habitats were collected and used for DNA isolation and amplification (83% of 152 xerothermic species known from Poland; Tables 1, S1). For 92.1% of the collected species rbcL and trnL barcodes produced PCR bands; almost all of them were successfully sequenced (both 94%). On the other hand, 90.6% of the plant species were successfully amplified for matK, but only 80.0% of them could be successfully sequenced ( Table 2). All sequences of plant barcodes generated in this study are available as Files S1-3 (in FASTA format) or on request from the corresponding author. The quality-trimmed fragments (excluding short initial and final fragments that could not be determined for all species and several sequences for which only short fragments were generated) have been submitted to the NCBI GenBank database (https:// www.ncbi.nlm.nih.gov/NCBI GenBank/; accession numbers in Table 1).
Nineteen taxa generated low-quality or unreadable matK sequences due to the presence of internal short tandem repeats of single nucleotides, which most probably led to polymerase errors (replication slippage).

IDENTIFICATION ACCURACY
The accuracy of plant identification (based on MEGABLAST search of the NCBI GenBank data-base) varied for each of the three examined barcodes ( Table 2). The trnL intron allowed for correct species identification in 32.5% of cases, genus identification in 55.5% of cases and family identification in 12.0% of cases. These assignments for rbcL were 26.5, 64.1 and 9.4% and for matK 45.8, 50.0 and 4.2%. In total, 66 out of 117 species showed correct plant identification in at least one barcode (38 in trnL, 33 in rbcL and 45 in matK) ( Table 2).
Evaluation of the efficiency of the generated barcodes in identification of plant species showed that with use of the trnL intron only one pair of species (Peucedanum oreoselinum Moench and P. cervaria Cusson ex Lapeyr.) could not be distinguished (1.6% of all examined species). The matK gene showed slightly lower power to distinguish species: two pairs of species K2P distances calculated for sequences of each barcode were in the range 0-17.9% for rbcL, 0-44.4% for trnL and 0-52.5% for matK. The distributions of K2P distances among all pairs of species are presented in Figure 1. K2P distances calculated for plant species belonging to the same genera showed that for several pairs of species these distances are equal to zero (11 pairs for trnL, six for rbcL and four for matK) (Table S2).

Cheilotoma musciformis
Amplification was successful for all barcodes in all analysed specimens of C. musciformis; each amplicon produced a single sequence (ten sequences were generated for each barcode). All barcodes enabled unambiguous identification of the host species (100% query coverage, E-value = 0 and identity = 100% for all MEGABLAST searches). Eight out of ten individuals were found to feed on Onobrychis viciifolia Scop. and the remaining two were found to feed on Oxytropis pilosa DC. (both Fabaceae) (Fig. 2).

Polydrusus inustus
In total, 18 795 read pairs mapped to the reference barcode sequences; of these, 9293 mapped uniquely (6030 pairs mapped to rbcL and 3263 to trnL) and thus could be used for plant identification to the species level. Only first reads from each pair were useful for blast searches (due to the low quality of second reads of Illumina sequencing, see details above); 6307 reads of at least 120 produced blast hits (3381 rbcL and 2926 trnL). Illumina sequencing of plant barcodes amplified from the P. inustus weevil gut revealed that the majority of host plants (with highest relative share in both barcodes) were assigned to three members of Rosaceae: Prunus spinosa L., Crataegus monogyna Jacq. and Rosa canina L. (Table S3). Additionally, a substantial (but much lower) share was found for Fragaria viridis Weston (Rosaceae), Sarothamnus scoparius L. (Fabaceae), Artemisia campestris L. and  (Fig. 2). Fourteen plant species were identified as host plants for this weevil using the blast algorithm and eight using the mapping method. A larger number of species identified by the blast algorithm was observed for rbcL. In total, rbcL allowed for the identification of 11 species and trnL for the identification of seven species. Some species were identified based only on trnL (one species) or rbcL (five) ( Table S3). In general, P. inustus was found to be a feeder of mostly Rosaceae, Asteraceae and Fabaceae (Fig. 2).

XEROTHERMIC PLANT BARCODES
Here we present one of the first multi-marker plant barcode databases from Europe prepared by extensive sampling of a selected type of vegetation. This database will be likely to facilitate and improve future ecological studies. It is worth emphasizing that this is one of few databases that includes not only two standard plant barcodes (rbcL and matK genes), but also the trnL intron, which proved to be more useful for identification of host plants from animal DNA sources (e.g. guts or faeces) (Jurado-Rivera et al., 2009;Valentini et al., 2009;Pinzón-Navarro et al., 2010;Taberlet et al., 2007;Kubisz et al., 2012;Kitson et al., 2013).
This database covers c. 80% of plant species associated with xerothermic grasslands in Poland and central Europe. It should be further noted that only for two barcodes (rbcL and trnL) were most plant species successfully sequenced. High amplification and sequencing success in the case of rbcL and trnL and problems with amplification and sequencing of matK are consistent with previous reports about the utility and characterization of these barcodes Hollingsworth et al., 2009). Indeed, the matK gene could be the preferred barcode due to its relatively high structural conservation and simultaneously high discrimination power (it allows for correct species identification for 46% of studied plants). However, universal primers developed by Yu et al. (2011) failed to amplify a significant fraction of xerothermic plant species. Moreover, mononucleotide tandem repeats in this barcode are present in some species, which due to possible polymerase replication errors (replication slippage) makes sequencing difficult and more costly due to the necessity of sequencing from both directions. Even this procedure failed in some species, as mononucleotide repeats are present in more than one part of this gene. It should be possible to use a set of matK primers for particular plant families known from xerothermic grasslands and use them for preparing a complete barcode database. However, this procedure would be much more expensive and time consuming, and therefore not useful for host plant barcoding of polyphagous species of unknown diet. Moreover, it would be extremely hard to use such sets of primers for ecological studies (e.g. diet analyses) as it would require the use of many pairs of primers for all samples. On the other hand, the rbcL gene is the least variable among all examined barcodes, and it has low discriminatory power (especially members of the same genus). Moreover, the low polymorphism of this barcode does not often allow for species or even genus identification when using short fragments (minibarcodes), which is often necessary with degraded templates (e.g. from animal faeces or museum plant collections). According to Little (2014), the best set of primers for rbcL minibarcodes allow for discrimination of only 38% of species. Based on obtained data and considering previous studies on various plants and animal diets (Jurado-Rivera et al., 2009;Valentini et al., 2009;Pinzón-Navarro et al., 2010;Taberlet et al., 2007;Kubisz et al., 2012;Kitson et al., 2013), the trnL intron should be the barcode of choice for ecological studies, especially for applications requiring high amplification and sequencing success, coverage of distantly related plant species and high discriminatory power. In this study we demonstrated that trnL allowed for amplification and sequencing of > 90% of xerothermic plants and that is it a highly informative barcode as only one pair of species could not be distinguished in the blast search. Moreover, this barcode enables identification of 70% of host plants based on short reads. However, this barcode also has some drawbacks partially shared with the trnH-psbA intergenic spacer region (Shaw et al., 2005;Fazekas et al., 2010;Pang et al., 2012). Both barcodes have high length variation due to the presence of large indels (Chase et al., 2007;Kress & Erickson, 2007), but trnL has probably fewer long mononucleotide repeats, which are common in trnH-psbA (Fazekas et al., 2008;Devey et al., 2009; but see Fazekas et al., 2010;Whitlock et al., 2010;Jeanson et al., 2011). Both these non-coding plastid fragments were used successfully for identification of beetle host plants (for trnL: Jurado-Rivera et al., 2009;Pinzón-Navarro et al., 2010;Kubisz et al., 2012;Garcia-Robledo et al., 2013;for trnH-psbA). The choice between trnL and trnH-psbA barcodes should also depend on availability of reference databases, as NCBI GenBank includes > 170 000 sequences of trnL and c. 70 000 of trnH-psbA sequences (April 2014). However, the most important criterion for barcode selection should be its efficiency of amplification for plants present in the studied sample (area, habitat, community, etc.) and in this study we demonstrated that trnL has the greatest discrimination power for xerothermic plant species from Poland. However, it is also important to emphasize that our analyses do not include assessment of intraspecific variation; if intraspecific variation is high, discrimination of some other, closely related taxa may be problematic. Generally, the approach of using two or three barcodes simultaneously provides better resolution and discriminatory power for plant species identification, especially if some of the barcodes failed to amplify or produced unreadable or low-quality sequences. These advantages should overcome the slightly higher cost and additional time needed to develop and use a multi-barcode database. A multi-barcode approach should also decrease the probability of false positive species identifications, as the simultaneous use of two or more barcodes allows for self-testing of identification reliability and detection of errors caused by problems with polymerase replication, sequencing or identification algorithms. The barcode database developed for xerothermic plants in the current study allowed for discrimination of nearly all plant species with the use of two or three barcodes, as only one pair of species (Peucedanum oreoselinum and P. cervaria) could not be distinguished with the use of all three barcodes. Lower sequence divergence between these two congeners could be explained by recent speciation, incomplete lineage sorting or hybridization, which are common phenomena among plants. It should be emphasized that the multi-barcode approach would not allow for detecting and eliminating errors caused by species misidentification during collection or contamination.

EVALUATION OF THE UTILITY OF THE DATABASE FOR ECOLOGICAL STUDIES
To verify how a barcode database of xerothermic plants works for identification of host plants of phytophagous animals, the experiment was implemented using two beetle species: a polyphagous weevil and an oligophagous leaf-beetle. These two species were chosen because their feeding preferences are relatively well known (but only on the basis of field observations).
The first of the investigated beetles (Cheilotoma musciformis) was observed to feed in Poland on Onobrychis Mill. (Szymczakowski, 1960;Warchałowski, 1991) and on Rumex L. and Anthyllis vulneraria L. in the southern regions of Europe (Gruev & Tomov, 1984;Warchałowski, 1991). Recent studies also confirm that in Slovakia it can feed on Lotus L. and Dorycnium Mill. (Kajtoch et al., 2013). The three-barcode database of xerothermic plants confirmed that this species in Poland mostly feeds on O. viciifolia, but some individuals also utilize another species of Fabaceae: Oxytropis pilosa, which is new host plant for this beetle. It is possible that this species is generally associated with Fabaceae, but the low number of individuals used in this study (due to the rarity and threatened status of the species) prevented identifying more host plants.
Our results clearly show that the second species (Polydrusus inustus) is indeed polyphagous. It is known to feed on Rosaceae and on Medicago sativa L., Cirsium arvense (L.) Scop. and Melilotus alba Medik. (Mazur, 2001). Plant barcodes confirmed its association with Rosaceae and Fabaceae, but none of the investigated individuals fed on Cirsium or Melilotus. Moreover, plant barcodes added new species as host plants: two Asteraceae (Artemisia campestris, Inula ensifolia), one Fabaceae (Sarothamnus scoparius) and one Campanulaceae (Campanula glomerata).
Generally, all plant barcodes were shown to be useful in host plant species identification for oligophagous beetles and also for monophagous species such as the leaf-beetle Crioceris quatuordecimpunctata (associated only with Asparagus L.; Kubisz et al., 2012). However, in cases of polyphagous species, rbcL and matK genes failed if the studied individual fed on more than one plant species due to similar length of PCR products. A similar pattern was observed for another polyphagous weevil, Centricnemus leucogrammus (Kajtoch 2014;Kajtoch & Mazur, in press). This sequence length uniformity did not allow for gel extraction of distinct amplicons and their direct Sanger sequencing unless single strand conformation polymorphism (SSCP) was implemented (Kishimoto-Yamada et al., 2013); even then, it is not possible to identify host plants for all samples. This problem could be circumvented by a cloning step, but it is too costly and time-consuming to use this technique on larger numbers of samples. On the other hand, the trnL intron, which showed a wide range of sequence length, often enables the identification of two or three host plants for a particular individual, but this approach does not allow for the identification of all host plants without the cloning step. Recently, this problem was overcome by the use of high-throughput sequencing technologies to study host plants of polyphagous beetles at the population level (e.g. the xerothermic weevil Centricnemus leucogrammus; Kajtoch 2014). Results obtained here for P. inustus confirm the utility of plant barcodes combined with highthroughput platforms such as Illumina.

FUTURE APPLICATIONS
A wide coverage of xerothermic species from central Europe and the availability of three barcodes (rbcL, matK and trnL) should be helpful in various ecological studies on xerothermic associations and assemblages. This database could be used in various ways. It should allow for more efficient and rapid evaluation of plant species richness in xerothermic patches of central Europe. Moreover, this database could help in verification assignment of plant tissues from museum collections to particular species. It could be also used for identification of rare, threatened and protected plants illegally collected, traded and/or cultivated. All these activities pose a serious threat for xerothermic plants in central Europe. Plant barcodes, especially the highly polymorphic trnL intron, could be used simultaneously with microsatellite and/or amplified frag-ment length polymorphism (AFLP) markers to identify evolutionary lineages within species. This could be important for many conservation programmes (including translocation of individuals, reintroduction of threatened populations and restitution of extinct populations). This database is already being used for studies on evolutionary and ecological interactions among xerothermic plants and beetles (in preparation). Lastly, the developed plant barcode database can also be used for diet analyses of other flagship or rare and threatened dry-grasslands herbivore species from central Europe, such as skippers (Spialia sertorius), blues (Pseudophilotes baton) and fritillary (Melitaea cinxia) butterflies, ground squirrels (Spermophilus citellus and S. suslicus) and hamsters (Cricetus cricetus). Similar plant barcode databases should be assembled and characterized, and their utility verified for other types of habitats and areas in Europe to develop comprehensive genetic information that allows for reliable plant species identification for systematic, ecological and conservation purposes. Table S2. Kimura-2-parameter (K2P) distances calculated for plant genera with at least two species present in DNA barcode database. N, number of species available for a particular barcode (trnL, rbcL, matK). In brackets are species for which K2P distances equal 0.0. Table S3. Composition of host plants assigned for P. inustus weevil with use of Illumina sequencing and two methods of species identification: mapping and blast search against the reference database of xerothermic plant species from Poland. Only species with relative share of at least 1.0% are presented.