Insight into infrageneric circumscription through complete chloroplast genome sequences of two Trillium species

To clarify the significant genomic events of the tribe Parideae, we analysed the complete chloroplast genome sequences of two Trillium species representing two subgenera: Trillium and Phyllantherum. The results showed that the cpDNAs of Parideae are highly conserved across genome structure, gene order and contents. However, the chloroplast genome of T. maculatum contained a 3.4kb inverted sequence between ndhC - rbcL in the LSC region. In addition, we found three different types of trnI-CAU duplication. These genomic features provide informative molecular markers for identifying the infrageneric taxa of Trillium and improve our understanding of the evolution patterns of Parideae.


Introduction
The chloroplast that characterizes all green plants (Viridiplantae) originated from an endosymbiotic event between independent living cyanobacteria and a non-photosynthetic host (Dyall et al. 2004). Chloroplast genomes of flowering plants are typically circular doublestranded DNA molecules, and usually contain two inverted repeat (IR) regions (IRA and IRB) separated by a large single-copy and a small single-copy (LSC and SSC, respectively) regions (Ravi et al. 2008). The plastid genome is mostly stable in structure, gene content and gene order across land plant lineages . Due to this stability, it demonstrated great utility for developing phylogenetic hypotheses across the plant tree of life Zhang et al. 2011;Li et al. 2013). Within seed plants, plastid genomes usually contain 101 -118 unique genes with the majority of those 66 -82 coding for proteins involved in photosynthesis and gene expression, 29 -32 of these genes code for transfer RNAs and 4 code for the ribosomal RNA genes (Jansen and Ruhlman 2012). The advance of nextgeneration sequencing has facilitated rapid growth of complete chloroplast genomes due to time-saving and low-cost advantages (Shendure and Ji 2008). To date, 500 complete chloroplast DNA genome sequence data have been released in GenBank's Organelle Genome Resources (http://www.ncbi.nlm.nih.gov/genome).
Melanthiaceae, a member of Liliales, comprises 17 genera and 178 species of perennial herbs that are mostly distributed in the temperate regions of the Northern Hemisphere (Zomlefer et al. 2001). Species of this family are characterized by their extrorse anthers and carpels bearing three distinct styles (Rudall et al. 2000). The family has been divided into five tribes: Heloniadeae, Chionographideae, Xerophylleae, Melanthieae and Parideae (The Angiosperm Phylogeny Group 2009). Prior to any molecular systematic analyses, Melanthiaceae were divided into several taxonomically independent families by Takhtajan (1997) due to their unique autapomorphies. Trilliaceae, which is now recognized as tribe Parideae (Trillieae), are unique in having solitary flowers, berries, membranous nectary and large chromosomes with five chromosomes as the base number. The phylogeny of species within the Trilliaceae (now Parideae) was highly debated by many researchers using molecular and morphological data (Kato et al. 1995;Farmer and Schilling 2002;Farmer 2006). Tribe Parideae includes three genera: Paris, Trillium and Pseudotrillium. Paris has 4 -15 leaves in a whorl, flowers 4-merous or more and inner perianth segments that are much narrower than outer ones, while Trillium has only 3 leaves in a whorl, flowers 3-merous and inner perianth segments that are a little narrower than the outer ones. Pseudotrillium has thick, tough, heartshaped leaves, spotted petals and flower stalks that extend until the ripe fruit touches the ground. Trillium has been divided into two subgenera differing in the presence of pedicel: subgenus Trillium (with pedicels) and Phyllantherum (without pedicels) (Freeman 1969(Freeman , 1975. The monophyly of subgenus Phyllantherum was strongly supported in many previous studies Farmer and Schilling 2002;Farmer 2006). On the other hand, subgenus Trillium is rendered a paraphyletic group by the inclusion of Phyllantherum.
Currently, complete chloroplast genomes of the Melanthiaceae have been reported from Paris verticillata (KJ433485; Do et al. 2014), Veratrum patulum (KF437397; Do et al. 2013) and Chionographis japonica (KF951065; Bodin et al. 2013), which represent three tribes of Parideae, Melanthieae and Chionographideae, respectively. In this study, we analysed complete chloroplast genome sequences of subgenera Trillium and Phyllanthrum of Trillium to better understand the evolution of the chloroplast genomes in tribe Parideae and across the Melanthiaceae. We analysed the sequence variation between two subgenera and proposed novel molecular markers for phylogenetic studies by comparing the two newly generated genome sequences. In addition, we characterized the trnI-CAU duplication event in Parideae, detected in P. verticillata chloroplast genome (KJ433485), to determine the origin of the repeating unit. Consequently, these results provide additional knowledge about the patterns of the chloroplast genome evolution within tribe Parideae.

DNA extraction, sequencing and annotation
We collected Trillium tschonoskii from Ulleung Island, South Korea. The voucher specimen and plant materials were deposited at the herbarium (GCU) and Medicinal Plant Resources Bank (MPRB) of Gachon University. Trillium maculatum was obtained from the Abraham Baldwin Agricultural College, USA (voucher No. Susan Farmer 19990006). We used silica gel-dried leaves from each species to extract total genomic DNA using the DNeasy Plant Mini Kit (Qiagen, Seoul, South Korea). The Hiseq 2000 system was employed to sequence chloroplast genomes of T. tschonoskii and T. maculatum. Raw data were assembled using Geneious ver. 7. 1 (Biomatters Ltd, New Zealand) with default settings. After trimming the sequences, we mapped pair-end reads to the reference sequence of P. verticillata (KJ433485). Aligned contigs were ordered according to the reference genome and the gaps were filled via direct sequencing of polymerase chain reaction (PCR) products with newly designed primers. In addition, the ambiguous sequences including low assembly coverage regions and the borders of the four junctions between LSC, SSC and IR regions were confirmed using the Sanger method.
Complete chloroplast genomes of both species were annotated by Geneious ver. 7. 1 (Biomatters Ltd), with manual corrections for putative start and stop codons. The exon positions of protein-coding genes and intron were determined using released Liliales chloroplast genome sequences as references. All tRNA sequences were confirmed utilizing the web-based online tool of tRNAScan-SE (Schattner et al. 2005) with default settings to corroborate tRNA boundaries identified by Geneious. The genome maps were generated using OGDraw (Orga-nellarGenomeDRAW; Lohse et al. 2007) followed by manual modification.

Comparison of the chloroplast genome sequences of two subgenera
The simple sequence repeats (SSRs) were analysed using Phobos Version 3.3.12 (Kraemer et al. 2009), with thresholds of eight repeat units for mononucleotide SSRs, four repeat units for dinucleotide, trinucleotide SSRs and three repeat units for tetranucleotide, pentanucleotide and hexanucleotide SSRs. All the detected repeats were manually verified, and the redundant results were removed. We aligned the plastid genome sequences of two Trillium using MAFFT (Katoh et al. 2002). The identified insertion/deletion mutations (indels) from the results were confirmed by reassembling the whole reads generated by HiSeq 2000. The single nucleotide polymorphisms (SNPs) were analysed using Geneious 7.1 (Kearse et al. 2012), and each indel and SNP were separated based on the position excluding one of IR regions. Since we are comparing only two genomes, we quantified the sequence divergence as the ratio of aligned nucleotide sites within specifically different regions ( p-distance). Sanger sequences and assembled genomes were calculated using mean p-distance in MEGA 6.0 (Tamura et al. 2013).
Twenty-nine species, representing the two subgenera of Trillium in Parideae, were selected for comparative sequencing of inversion. The PCR amplification primers were designed based on the sequence comparisons among three chloroplast genome sequences of two Trillium species (in this study), and P. verticillata (KJ433485). Presence and absence PCR amplifications were carried out using various combinations of the three primers (I1F: 5 ′ -CCC TAG GTT TTT TTC TTC AAG-3 ′ , I1R: 5 ′ -TTA TGT AGC TTA TCC TTT AGA CC-3 ′ and I2R: 5 ′ -AGA AGG TCT ACG GTT CGA G-3 ′ ).

trnI-CAU duplication pattern in the tribe Parideae
To clarify the trnI-CAU duplication pattern in the tribe Parideae, we designed two primers (Primer 1: 5 ′ -GAA GAG TTC GAC CCA ATG CT-3 ′ , Primer 2: 5 ′ -TTA TGA AAC TCT TTG ACC CC-3 ′ ) for amplifying the intergenic spacer (IGS) region of rpl23-ycf2 based on the identical sequence among the three species (P. verticillata, T. maculatum and T. tschonoskii). The PCR condition for IGS region of rpl23-ycf2 was at initial denaturation at 94 8C for 5 min, followed by 30 cycles of denaturation at 94 8C for 1 min, annealing at 50 8C for 1 min and extension at 72 8C for 2 min, with a final extension at 72 8C for 5 min. We obtained variously sized PCR products ranging from 500 to 1200 bp, and compared the sequences of this region from 33 species covering the infrageneric classification of the tribe. Sequence editing and assembly were performed using Sequencher (ver. 5.1). The sequence alignment was initially performed using MAFFT (Katoh et al. 2002) and was adjusted manually.
The gene content and order were slightly different between both species because of the rpl22 position in the IR-LSC boundary and trnI-CAU duplication in IR. While the rpl22 gene remained in the LSC region of the T. maculatum plastid genome, this gene was present in the IR region of T. tschonoskii plastid genome (Fig. 2). In total, 116 genes of T. maculatum were identified and consisted of 78 coding genes, 4 rRNA genes, 31 tRNA genes and 3 pseudogenes, while those of T. tschonoskii were 115 genes without tRNA gene duplication [see Supporting Information- Table S1]. In addition, T. tschonoskii has 7 coding genes, 4 rRNA genes, 9 tRNA genes, 2 pseudogenes, whereas T. maculatum has 8 coding genes, 4 rRNA genes, 8 tRNA genes, 2 pseudogenes, duplicated in the IR region, making a total of 138 genes and 137 genes presented in the T. tschonoskii and T. maculatum chloroplast genome, respectively. Among these genes, 22 intron-containing genes were found including  Furthermore, the cemA gene located in the LSC of both genomes was also pseudogenized.

Characterization of single inversion in subgenus Phyllantherum
Based on comparison of T. maculatum, T. tschonoskii and P. verticillata, a single inversion of 3.4 kb is characterized in the chloroplast genome of T. maculatum. This inversion is located between the ndhC and rbcL genes. We designed three different primers including I1F (5 ′ -CCC TAG GTT TTT TTC TTC AAG-3 ′ ), I1R (5 ′ -TTA TGT AGC TTA TCC TTT AGA CC-3 ′ ) and I2R (5 ′ -AGA AGG TCT ACG GTT CGA G-3 ′ ) to confirm and clarify the distribution of this inversion throughout the genus Trillium. Specifically, the primer pairs of I1F and I1R worked only in the normal type, while I2R and I1R primer pairs were utilized for the recognized inversion type among examined species. The results showed that the inversion occurred in all examined species of the subgenera Phyllantherum (Fig. 3A and B).

Indels, SNPs and SSR between two subgenera of Trillium
A total of 402 indels were detected between T. maculatum and T. tschonoskii, and most indels were located in the IGS regions (78.2 %). 66.2 % of the total number of indels were found in the LSC, while 22.1 and 11.7 % were present in the SSC and IR regions, respectively [ Table 1, see Supporting Information- Table S2]. The average length of indels was 74.8 bp, and the largest indel was located in ycf1 and ycf2. The frequency of 1 bp indels was 10.6 %, while 79.3 % of all indels were over 20 bp in length. In rRNA sequences, one indel of 3 bp and four indels of 5 bp were found in 16S rRNA and 23S rRNA. In addition, indel events were identified in 20 coding genes of both species (accD, atpB, ccsA, cemA, clpP, infA, matK, ndhF, rpl2, rpl20, rpl22, rpl32, rpoC1, rpoC2, rps11, rps15, rps18, rps19, ycf1 and ycf2).
A total of 2861 SNPs were detected between T. maculatum and T. tschonoskii (Table 2), and 1620 SNPs were transversions. In total, 1707 (59.7 %) SNPs were located in the coding regions, and 1154 (40.3 %) were within IGS regions or within introns.
In our result of SNPs, p-distance values in coding regions range from 0.002 to 0.23 and the average value was 0.02. On the other hand, the average p-distance value in non-coding regions was 0.034. Figure 4 shows the average p-distance for five classes of genomic regions: protein-coding genes, tRNAs, rRNAs, IGSs and introns. The IGS divergence is almost double that of the next highest class (genes). Introns hold the lowest sequence divergence, at an average of 0.011%.
We detected SSRs longer than 8 bp in T. maculatum, T. tschonoskii and P. verticillata chloroplast genomes by the method of Qian et al. (2013). According to Qian et al., the threshold was set because 8 bp or longer SSRs are prone to slipstrand mispairing, which is thought to be the primary mutational mechanism causing their high level of polymorphism. In this analysis, the total number of SSRs was 204 in P. verticillata, 205 in T. maculatulatum and 213 in T. tschonoskii (Table 3). The most abundant type of SSR in Parideae was a mononucleotide, with 138 in P. verticillata, 121 in T. maculatulatum and 133 in T. tschonoskii. In addition to mononucleotide SSRs, there are 52 dinucleotide SSRs in P. verticillata, 57 in T. maculatulatum and 53 in T. tschonoskii. Trinucleotide SSRs were less frequent with 6, 14 and 7 in P. verticillata, T. maculatum and T. tschonoskii, respectively. The hexanucleotide SSRs were found only in Trillium species. The majority of mononucleotide repeats were A-T rich ( Table 3).

Type of trnI-CAU of Parideae
We compared the sequences of the IGS region between rpl23 and ycf2 using 33 species including Xerophyllum to understand the evolutionary implication of trnI-CAU duplication, which was reported from the Paris chloroplast genome (Do et al. 2014). Based on the results, we found that this region is of highly variable length among the species, and we distinguished three major types based on the number of copies of trnI-CAU (Fig. 5). Type A was composed of a single trnI-CAU and was found in Xerophyllum, Pseudotrillium rivale and T. undulatum. It was also identified in several Trillium and Paris species, but with variable lengths: in subgenus Remarkably, section Kinugasa (Paris japonica) has the longest length of IGS between rpl23 and ycf2 among the tribe Parideae. Type C, possessing three copies of trnI-CAU genes in the sequenced region, was detected in T. govanianum and section Paris of subgenus Paris. They included three fully repeated units including trnI-CAU, and the lengths were 155 and 139 bp, respectively.

Discussion
Comparison of complete plastid genomes of subgenera Trillium and Phyllantherum The plastid genome structure of the two Trillium species, T. maculatum and T. tschonoskii, have a typical form found in most angiosperms (Zhang et al. 2011;Kim and Kim 2013;Li et al. 2013;Qian et al. 2013). The T. tschonoskii chloroplast genome was 507 bp shorter than T. maculatum, and we confirmed that the length variation among Parideae chloroplast genomes including Paris verticillata occurred by gene deletion and duplication as well as its IR expansion. Although chloroplast genomes are considered highly conserved among land plants, sequence polymorphisms were often observed among closely related species. From the T. tschonoskii and T. maculatum chloroplast genome sequences, we confirmed that 402 indels and 2861 SNPs were present between the two species.
In addition, we found that SSRs (i.e. microsatellites), composed of 1 -6 bp in length per unit, are distributed throughout both genomes. The SSRs have been accepted as one of the major molecular markers for genome variation between species or within populations due to their high polymorphism within the species and have been widely practiced for analysing plant population structure, diversity, differentiation and maternity analysis . Simple sequence repeats have successfully been applied to the study of Poaceae, Brassicaceae and Solanaceae (Provan et al. 1997(Provan et al. , 1999Bryan et al. 1999;Flannery et al. 2006). Simple sequence repeats detected in the present study will provide basic information for the further analysis of genetic diversity in Parideae.
Based on our results, the IR/LSC boundary and the IR/ SSC boundary differed between the two subgenera of Trillium. Inverted repeat/large single-copy junction was expanded to a part of rps3 in T. tschonoskii, whereas that of T. maculatum was found at rps19. The ycf1 was completely located in SSC of T. tschonoskii, but a part of the ycf1 gene was duplicated in IR of T. maculatum. Within the Parideae, the IR boundary pattern of T. tschonoskii was more similar to P. verticillata than T. maculatum (Fig. 2).

Inversion events in Melanthiaceae
Inversions caused by the recombination between repeated sequences are considered to be a main mechanism for changes in gene order among plastid genomes (Jansen and Ruhlman 2012). Most of the reported inversions in plastid genomes are in the LSC region (Kim et al. 2005). In subtribe Phaseolinae of Fabaceae, there is a 78 kb inversion between trnH/rpl14 and rps19/rps8 in the chloroplast genome (Tangphatsornruang et al. 2010). Additionally, Kim et al. (2005) reported that the inversion occurred in the spacer between tRNA Gly and tRNA Ser genes of Lactuca sativa. Also, they defined two inversions that characterize Asteraceae. The two inversions were identical across all members of Asteraceae, suggesting that the inversion events are likely to occur simultaneously or within a short period of time following the origin of the family. In Campanulaceae, .50 large inversions occurred during diversification of the family, in which at least 20 occurred in Cyphia, and a minimum of 53 are now known in Lobelia (Knox 2014). Fabaceae are known to exhibit a number of unusual phenomena in their chloroplast genome: Trifolium subterraneum has undergone extensive genomic reconfiguration, including the loss of six genes and two introns and numerous gene order changes, attributable to 14 -18 inversions (Cai et al. 2008).
Our results confirmed a single inversion in Melanthiaceae. It was remarkable that a single inversion of 3492 bp embedded four genes between ndhC and rbcL genes, which specifically occurred in the monophyletic subgenus Phyllantherum (Fig. 3). This event is thought to have occurred after the evolutionary divergence between subgenus Phyllantherum and subgenus Trillium. This new finding may be an effective molecular marker for classifying subgenera of the genus Trillium.

Diverse patterns of trnI-CAU duplication in Parideae
Gene duplication is an important process in organellar genome evolution. Most duplicated genes occur within the IR regions due to the mechanisms underlying IR expansion and contraction (Xiong et al. 2009). Gene duplication in plastid genome has been reported in tRNA genes (Hipkins et al. 1995;Vijverberg and Bachmann 1999;Schmickl et al. 2009) and in some protein-coding genes. Most of the duplications can be detected only in rearranged chloroplast genomes, as in grasses, legumes and conifers. Hipkins et al. (1995) compared the number of direct repeats between partially duplicated trnY-GUA and the complete trnY-GUA gene in Pseudotsuga. They found that the length-variable region in Pseudotsuga comprised imperfect tandem direct repeats based on the trnY gene sequence. Schmickl et al. (2009) used the 5 ′ -trnL-UAA_trnF-GAA region for phylogeographic reconstructions, gene diversity calculations and phylogenetic analyses among the genera Arabidopsis and Boechera. The Cruciferous taxa are characterized by these pseudogenes in at least four independent phylogenetic lineages. In addition, the tRNA gene as well as the coding gene could be confirmed by duplication events in Jasminum and Menodora, which have the duplicated rbcL_psaI region. Most chloroplast gene duplications outside of the IR involve tRNAs, as in the case of Oleaceae .
A total of 30 -32 tRNA genes are present within the chloroplast genome of land plants (Tsudzuki et al. 1994;Vijverberg and Bachmann 1999), and they may be involved in chloroplast genome rearrangements through their secondary structure (Howe et al. 1988). These genes are dispersed throughout the genome, but five to eight   1993). We found that three major types of trnI-CAU gene duplication are located between rpl23 and ycf2 at the IR of tribe Parideae (Fig. 5). Traditionally, Parideae included two genera, Paris and Trillium; however, Trillium was separated into two genera Trillium and Pesudotrillium in recent classifications (Farmer and Schilling 2002). Using the various duplication patterns of trnI-CAU in the IR region, the infrageneric circumscription of Parideae member was strongly supported. The type of trnI-CAU that had been discovered in Xerophyllum, Pesudotrillium and T. undulatum with one trnI-CAU between rpl23 and ycf2 was seen to be similar to the ancestor of Parideae (Type A, Fig. 5). This type was found also in most chloroplast genomes of Liliales (Liu et al. 2012;Bodin et al. 2013;Do et al. 2013;Kim and Kim 2013). It was modified in subgenus Trillium of Trillium and subgenus Daiswa of Paris to be extended by the tandem repeat between trnI-CAU and ycf2. Type B was found in subgenus Phyllantherum of Trillium and section Kinugasa of subgenus Paris although section Kinugasa possessed the additional tandem repeat between trnI-CAU units. Type C, which was found in T. govanianum and section Paris of subgenus Paris, has three copies of the trnI-CAU gene. From the results, we suggested that duplicate events of trnI-CAU have occurred independently in the tribe Parideae of Melanthiaceae, and it provided useful information for determining the infrageneric circumscription. However, T. govanianum, which was classified into another genus Trillidium by Farmer and Schilling (2002) based on morphological characters and geographical distribution, was more similar to Paris than Trillium. Also, this result showed that the trnI-CAU gene duplication pattern of T. govanianum was more similar to Paris than Trillium. unpublished data). Further studies are necessary to clarify the relationship between both species.

Conclusions
We analysed the complete chloroplast genomes of two species of T. tschonoskii (subgenus Trillium) and T. maculatum (subgenus Phyllantherum) to verify the specific feature in the genome level. As a result, we found a 3.4 kb inverted sequence between ndhC and rbcL in the LSC region in the chloroplast genome of T. maculatum, which was unique to subgenus Phyllantherum. In addition, three different gene duplication patterns of trnI-CAU gene were found and they were the informative molecular markers for identifying the infrageneric taxa of Trillium.

Conflict of Interest Statement
None declared.

Supporting Information
The following additional information is available in the online version of this article - Table S1. Genes found in Trillium tschonoskii and T. maculatum chloroplast genomes. Table S2. The detailed list of insertion-deletion mutations between the chloroplast genomes of T. tschonoskii and T. maculatum in Parideae. Table S3. Sequences of rpl23_ycf2 IGS among Parideae species.