Development of 5123 Intron-Length Polymorphic Markers for Large-Scale Genotyping Applications in Foxtail Millet

Generating genomic resources in terms of molecular markers is imperative in molecular breeding for crop improvement. Though development and application of microsatellite markers in large-scale was reported in the model crop foxtail millet, no such large-scale study was conducted for intron-length polymorphic (ILP) markers. Considering this, we developed 5123 ILP markers, of which 4049 were physically mapped onto 9 chromosomes of foxtail millet. BLAST analysis of 5123 expressed sequence tags (ESTs) suggested the function for ∼71.5% ESTs and grouped them into 5 different functional categories. About 440 selected primer pairs representing the foxtail millet genome and the different functional groups showed high-level of cross-genera amplification at an average of ∼85% in eight millets and five non-millet species. The efficacy of the ILP markers for distinguishing the foxtail millet is demonstrated by observed heterozygosity (0.20) and Nei's average gene diversity (0.22). In silico comparative mapping of physically mapped ILP markers demonstrated substantial percentage of sequence-based orthology and syntenic relationship between foxtail millet chromosomes and sorghum (∼50%), maize (∼46%), rice (∼21%) and Brachypodium (∼21%) chromosomes. Hence, for the first time, we developed large-scale ILP markers in foxtail millet and demonstrated their utility in germplasm characterization, transferability, phylogenetics and comparative mapping studies in millets and bioenergy grass species.


Introduction
Foxtail millet [Setaria italica (L.) P. Beauv.] has recently been regarded as a tractable model crop due to its small genome (∼515 Mb; 2n = 2x = 18), low amount of repetitive DNA, inbreeding nature and short life cycle. 1,2 Moreover, its close relatedness to several bioenergy crops with complex genomes, such as switchgrass (Panicum virgatum), napiergrass (Pennisetum purpureum) and pearl millet (Pennisetum glaucum), and its potential abiotic stress tolerance add to the merits of foxtail millet as an experimental model crop for exploring various plant architectural traits, evolutionary genomics and physiological attributes of the C4 Panicoid grass crops. 1-3 Since foxtail millet has one of the largest sets of both cultivated and wild-type germplasm rich in phenotypic variation, it appears promising for association mapping and allele mining of elite and novel variants to be integrated into crop improvement programmes. 2,4,5 Hence, considering the importance of foxtail millet, the Joint Genome Institute (JGI) of the Department of Energy, USA, and BGI (formerly Beijing Genomics Institute), China, have recently sequenced its genome. 6,7 The availability of the genomic sequence has motivated the scientific community to generate genomic resources, which has ultimately resulted in the development of large-scale sequence-based genomic 8 and genic expressed sequence tag (EST)-derived microsatellite markers. 9 Notably, the recently developed 'Foxtail millet Marker Database (FmMDb)' (http://www.nipgr.res.in/foxtail.html) 10 encompasses all these genomic resources for the benefit of the research community aiming at the genetic improvement of the target millet and its related bioenergy crop species, thus bridging the gap between researchers and breeders.
DNA markers such as restriction fragment-length polymorphism, 11 random amplified polymorphic DNA, 12 amplified fragment-length polymorphism, 13 simple sequence repeat polymorphism (SSR), 14 single-nucleotide polymorphism 15 and intron-length polymorphism (ILP) 16 exploit the variations, or polymorphisms, in DNA sequences and are used in various genotyping applications in crop plants. Of these, ILP markers are unique since they are gene-specific, co-dominant, hypervariable, neutral, convenient and reliable. 17 ILP markers utilize the variation in intron sequences and are the most easily recognizable type, as they can be detected by PCR with primers designed on the exons flanking the target intron. 18 Thus, these markers, in spite of being derived from gene sequences, show higher intra-specific polymorphism in plant species than other kinds of markers. In addition to being sequence-tagged site markers, 19 ILP markers have high transferability rates among related plant species. 18,20 In order to facilitate straightforward mining of ILP markers, Yang et al. 17 have developed a web-based database platform named PIP (potential intron polymorphism) to provide detailed information on PIP markers and the homologous relationships among PIP markers from different species.
Regardless of these advantages, very few reports are available on the development of ILP markers in plant species when compared with reports on other DNA markers. Notably, only one study has so far been carried out on ILP markers in foxtail millet, 20 in which about 98 markers were developed and characterized. Hence, in view of the importance of ILPs and the small number of ILP markers available in foxtail millet, the present study was conducted with the aims of: (i) developing ILP markers at large scale from the entire set of publicly available foxtail millet ESTs, (ii) demonstrating the applicability of the ILP markers in examining genetic diversity and cross-species transferability and (iii) developing a physical map for studying in silico ILP marker-based comparative mapping between foxtail millet and other grass species such as sorghum, maize, rice and Brachypodium.

Plant material and DNA isolation
The details of the plant materials used in the study are summarized in Supplementary Table S1. The seeds of all the investigated species were surface-sterilized in 3% sodium hypochlorite for 20 min, rinsed with sterile distilled water and germinated in a greenhouse. The genomic DNA was isolated from fresh young leaves using the CTAB method as described elsewhere. 21 The DNA was purified and then quantified on agarose gel by comparison with 50 ng/µl of standard lambda (λ) DNA marker (NEB).

Development of putative ILP markers
The publicly available EST sequences of S. italica were searched and retrieved from NCBI dbEST (ftp://ftp.ncbi.nih.gov/blast/db/). Approximately 66 027 ESTs were used for the unigene definition using the CD-HIT (Cluster Database at High Identity with Tolerance) software tool (http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi) for redundancy minimization and assembling of sequences. The non-redundant ESTs were used for the development of specific intron-based markers. Using the rice genomic sequence as reference, the PIP database (http://ibi.zju.edu.cn/pgl/pip/) 17 was used to predict intron positions in the EST sequences and to design a pair of primers flanking each intron position. A query EST was considered to be homologous to a subject-coding sequence only if there was at least 100 bp of overlap and 80% similarity between them. The corresponding position and length of the identified introns from the subject species were obtained from the PIP database. 17 To cross-check the primer-designing potential of ILP markers and validate the results of the PIP database, the forward and reverse primers designed for the ILP markers were BLAST-searched against the latest released foxtail millet pseudomolecules of nine chromosomes (http://www.phytozome.net).
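The homology criterion described above (at least 100 bp of overlap and 80% similarity) can be written as a simple filter. The following is a minimal sketch only: the `Hit` record, its field names and the example values are hypothetical and are not taken from the PIP database, and the thresholds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One EST-to-coding-sequence alignment (hypothetical record layout)."""
    est_id: str
    subject_id: str
    overlap_bp: int       # length of the aligned region in bp
    identity_pct: float   # percent similarity over the aligned region


MIN_OVERLAP_BP = 100      # threshold stated in the text
MIN_IDENTITY_PCT = 80.0   # threshold stated in the text


def is_homologous(hit: Hit) -> bool:
    """Return True if the hit meets both thresholds (assumed inclusive)."""
    return hit.overlap_bp >= MIN_OVERLAP_BP and hit.identity_pct >= MIN_IDENTITY_PCT


def filter_hits(hits):
    """Keep only the hits that satisfy the homology criterion."""
    return [h for h in hits if is_homologous(h)]


# Made-up alignments for illustration only.
hits = [
    Hit("EST_A", "cds_1", 250, 91.2),  # passes both thresholds
    Hit("EST_B", "cds_2", 80, 95.0),   # overlap too short
    Hit("EST_C", "cds_3", 300, 72.5),  # similarity too low
]
kept = filter_hits(hits)
print([h.est_id for h in kept])
```

Only the first made-up alignment satisfies both conditions; the other two fail on overlap length and similarity, respectively.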

Functional annotation and physical mapping of ILP markers
The putative functions of the ILP markers were assigned by executing a BLASTX search of the respective marker-encompassing EST sequences against the non-redundant database at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with default search parameters. The ILP markers were BLAST-searched against the whole-genome sequences of foxtail millet available at Phytozome (http://www.phytozome.net) and plotted individually on each of the nine foxtail millet chromosomes according to their ascending order of physical position (bp), from the short-arm telomere to the long-arm telomere, and finally visualized in the MapChart software. 22 To further validate the BLAST results of in silico physical mapping, we re-analysed the forward and reverse primers of ILP markers against the foxtail millet chromosome pseudomolecules using the ePCR program (www.ncbi.nlm.nih.gov/sutils/e-pcr). 23
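Arranging markers in ascending order of physical position on each chromosome, as described above, amounts to a grouped sort. The sketch below assumes each BLAST result has already been reduced to a `(marker, chromosome, start_bp)` tuple; the marker names and coordinates are invented for illustration, and the actual input format expected by MapChart is not reproduced here.

```python
from collections import defaultdict


def order_markers(hits):
    """Group (marker, chromosome, start_bp) tuples by chromosome and
    return the marker names sorted by ascending physical position."""
    by_chrom = defaultdict(list)
    for marker, chrom, start_bp in hits:
        by_chrom[chrom].append((start_bp, marker))
    return {
        chrom: [marker for _, marker in sorted(entries)]
        for chrom, entries in by_chrom.items()
    }


# Hypothetical marker positions (not real data).
hits = [
    ("marker_c", "chr1", 900_000),
    ("marker_a", "chr1", 120_000),
    ("marker_b", "chr2", 450_000),
    ("marker_d", "chr1", 300_000),
]
ordered = order_markers(hits)
print(ordered)
```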

Validation of ILP markers
The ILPs were amplified in a 25 µl total volume containing 1 unit of Taq DNA polymerase (Sigma) and 50 ng of genomic DNA in an iCycler thermal controller (Bio-Rad). The PCR profile was an initial denaturation of 3 min at 94°C, followed by 35 cycles of 60 s at 94°C, 60 s at 50-55°C and 2 min at 72°C, and a final extension of 10 min at 72°C. The amplicons were resolved on 2% agarose gel (Cambrex, USA) in Tris-borate EDTA buffer (pH 8.0), stained with ethidium bromide and analysed using the GelDoc-It™ imaging system (UVP). The fragment size for each locus was determined by 100 bp standard size markers (NEB). Results were confirmed by three replicate assays. The amplified products (alleles) from millet and non-millet species were eluted and cloned into the pGEM®-T Easy vector (Promega) following the manufacturer's instructions. The recombinant plasmids were purified using the AccuPrep Plasmid MiniPrep DNA Extraction Kit (Bioneer) following the manufacturer's protocol. The plasmids were sequenced in an automated sequencer (3730xl DNA Analyzer, Applied Biosystems) using M13 forward and reverse primers. The sequence information was used to construct a multiple sequence alignment along with a reference S. italica sequence using the ClustalW2 program (http://www.ebi.ac.uk/Tools/clustalw2/index.html).

Genetic diversity
The ILP marker profiles amplified among 96 foxtail millet accessions were scored manually; each allele was scored as present (1) or absent (0) for each of the ILP loci. Polymorphic information content (PIC) values were calculated according to Roldán-Ruiz et al. 24 as $\mathrm{PIC}_i = 2 f_i (1 - f_i)$, where $f_i$ is the frequency of the amplified allele (band present), and $(1 - f_i)$ is the frequency of the null allele (band absent) of marker $i$. Using a pairwise similarity matrix of Jaccard's coefficient, 25 the level of genetic diversity among foxtail millet accessions was calculated and a phylogenetic tree was constructed by the unweighted pair-group method of arithmetic average, neighbour-joining (NJoin) module of the NTSYS-pc software v2.02. 26 The genetic relationships among millets and non-millet grass species based on cross-transferability of ILP markers were determined based on the Nei (1983) 27 diversity coefficient, and a phylogenetic tree was constructed using the neighbour-joining (NJ) tree interface of the PowerMarker software ver2.5. 28 The observed heterozygosity (H_O), Nei's average gene diversity 27 and fixation index (F_IS) were also computed using the PowerMarker software ver2.5. 28 Correlation between PIC and the number of alleles was examined using the GraphPad InStat software v3.10 (www.graphpad.com).
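The two scoring-based statistics used here can be illustrated on a small presence/absence matrix. The sketch below assumes the Roldán-Ruiz et al. definition PIC_i = 2 f_i (1 − f_i) and the standard Jaccard coefficient (shared bands divided by bands present in either profile). The function names and example profiles are hypothetical, this is not the NTSYS-pc implementation, and returning 0.0 when neither profile has a band is a convention chosen here.

```python
def pic(band_scores):
    """PIC for one marker scored as 1 (band present) or 0 (band absent):
    PIC_i = 2 * f_i * (1 - f_i), with f_i the frequency of 'present'."""
    f = sum(band_scores) / len(band_scores)
    return 2 * f * (1 - f)


def jaccard(profile_a, profile_b):
    """Jaccard similarity between two 0/1 band profiles:
    bands shared by both / bands present in at least one."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 or y == 1)
    return both / either if either else 0.0  # assumed convention for empty profiles


# Hypothetical scores: one marker across four accessions.
marker_scores = [1, 1, 0, 0]
print(pic(marker_scores))  # f = 0.5, so PIC = 2 * 0.5 * 0.5 = 0.5

# Hypothetical profiles: two accessions across four alleles.
acc1 = [1, 1, 0, 1]
acc2 = [1, 0, 0, 1]
print(jaccard(acc1, acc2))  # 2 shared bands out of 3 present in either
```

Under this definition PIC cannot exceed 0.5, which it reaches when the band is present in exactly half of the accessions.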

In silico comparative genome mapping
The ILP markers that were physically mapped on the nine chromosomes of foxtail millet were BLASTN-searched against the genome sequences of sorghum, maize, rice and Brachypodium (www.phytozome.net) to develop marker-based syntenic relationships among the chromosomes of foxtail millet and the four other grass species. A cut-off bit score of 54.7 and an E-value of <1e-05 were considered optimum for BLASTN analysis. The marker-based syntenic relationships among foxtail millet, sorghum, maize, rice and Brachypodium were finally visualized with visualization blocks in the Circos software v0.55 (http://circos.ca). 29
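The BLASTN cut-offs stated above can be applied as a post-filter on the search results. The sketch below assumes the hits are available in BLAST's standard 12-column tabular layout (E-value in column 11, bit score in column 12); the text does not state which output format was used, the example rows are invented, and treating the bit-score cut-off as inclusive is an assumption.

```python
MIN_BITSCORE = 54.7   # cut-off stated in the text (assumed inclusive)
MAX_EVALUE = 1e-5     # E-value must be below this, as stated in the text


def passes_cutoff(line):
    """Check one tab-separated hit line in the standard 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    fields = line.rstrip("\n").split("\t")
    evalue = float(fields[10])
    bitscore = float(fields[11])
    return bitscore >= MIN_BITSCORE and evalue < MAX_EVALUE


# Invented example rows (not real alignments).
example = [
    "marker_x\tchr01\t92.5\t120\t9\t0\t1\t120\t5001\t5120\t2e-30\t150.0",
    "marker_y\tchr02\t85.0\t40\t6\t0\t1\t40\t700\t739\t0.002\t38.2",
]
kept = [row.split("\t")[0] for row in example if passes_cutoff(row)]
print(kept)
```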

Development of ILP markers and physical mapping in foxtail millet genome
A set of 66 027 EST sequences of S. italica produced 24 828 non-redundant ESTs, which were used for generating ILP markers by using rice as the reference genome in the PIP database. 17 A total of 5123 ILP markers (20.6%) were generated out of 24 828 EST sequences with an average frequency of 12.6 ILP markers per megabase of genomic sequence (Supplementary Table S2). BLAST analysis of the 5123 ILP markers against the foxtail millet genome showed the presence of all the markers in the genome, and the determination of the genomic distribution of these 5123 ILP markers revealed physical localization of 4049 markers on the nine chromosomes of foxtail millet with an average marker density of 9.8 markers/Mb (Fig. 1; Table 1). The average marker density was maximum (14.1/Mb) in chromosome 9, followed by chromosome 5 (11.8/Mb), and minimum in chromosome 8 (5.7/Mb). An extensive analysis of the chromosome-wise distribution and frequency of these physically mapped ILP markers showed the highest number of markers mapped on chromosome 9 (831 markers, 20.5%) and the lowest on chromosome 8 (230, 5.7%) (Fig. 1; Table 1).
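The proportions quoted above follow directly from the reported counts, and marker density is simply the marker count divided by the sequence length in megabases. The short sketch below recomputes the percentages from the numbers in the text; the helper names are hypothetical, and the chromosome length used in the density example is a made-up round value, since chromosome lengths are not given in this section.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


def density_per_mb(n_markers, length_bp):
    """Markers per megabase for a sequence of `length_bp` base pairs."""
    return n_markers / (length_bp / 1_000_000)


# Counts reported in the text.
print(round(percent(5123, 24828), 1))  # ILP markers out of non-redundant ESTs
print(round(percent(831, 4049), 1))    # mapped markers on chromosome 9
print(round(percent(230, 4049), 1))    # mapped markers on chromosome 8

# Density example with a hypothetical 50 Mb sequence carrying 500 markers.
print(density_per_mb(500, 50_000_000))
```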

Functional annotation of ILP
BLASTX analyses of the 5123 EST sequences suggested a nearly defined function for 71.5% of ILP markers, and 28.5% had no similarities to previously sequenced genes. Based on the function, the ILP markers with defined function (71.5%) were grouped into five major categories (Fig. 2). The largest category (47.4%) contained EST sequences with hypothetical/uncharacterized/putative functions. The second largest category (26.2%) comprised stress-related transcripts. The housekeeping proteins (9.5%) ranked third, followed by protein kinases (8.6%) and transcription factors (8.3%) (Fig. 2). Foxtail millet being a potentially

3.3. Marker validation, cross-genera transferability and genetic basis of sequence length variation
To amplify introns by PCR, the exon-primed intron-crossing PCR (EPIC-PCR) method was used, 30 where primers were designed in flanking exons using the PIP database. The advantages of EPIC-PCR are that it is fast, reliable, reproducible and convenient, providing ready-to-use and clearly intelligible results. Hence, from the 4049 physically mapped ILP markers, 440 primer pairs flanking the exons were chosen for further analyses based on two criteria, viz. representing the whole genome of foxtail millet and the function of the EST. All the ILP markers produced clear, successful and reproducible amplification in S. italica cv. Prasad with 100% amplification potential (Supplementary Table S3). This demonstrates the significance of the developed 4049 ILP markers in expediting foxtail millet genomics and molecular breeding. About 391 (∼90%) of the 440 ILP markers amplified a unique single allele, while 49 markers amplified more than one allele. Thus, a total of 495 alleles were amplified by 440 ILP markers in S. italica cv. Prasad (Supplementary Table S3). All the 440 ILP markers have the ability to distinguish the investigated millet and non-millet species into two distinct groups (Fig. 3).
From this validated set of 440 ILP markers, about 100 markers representing the whole foxtail millet genome were chosen to evaluate their polymorphism and molecular diversity potential in a set of eight accessions of Setaria including five cultivated and three wild species. About 45 (45%) SiILP markers showed polymorphism, and a total of 163 alleles ranging from 1 to 5 alleles were amplified by the candidate SiILP markers with an average of 2.15 alleles per marker locus (Supplementary Table S4). The polymorphic potential (∼45%) of ILP markers estimated among foxtail millet cultivated and wild accessions was higher than that reported using the ILP markers derived from foxtail millet genomic sequences. 20 The higher polymorphic potential of ILP markers is expected because they target hypervariable introns, which are under less selective pressure.
In order to investigate the utility of the ILP markers in cross-genera transferability, the set of 100 validated ILP markers was used to amplify the genomic DNA of millet (barnyard millet, finger millet, kodo millet, little millet, pearl millet, proso millet, guinea grass) and non-millet species (switchgrass, sorghum, maize, rice, Brachypodium) (Table 2; Fig. 4). Of the 100 SiILP markers assayed, the highest transferability percentage (98%) was observed in proso millet and the lowest (59.4%) in wheat, with an average transferability of 85% (Table 2). Markers which showed a consistent amplification profile in other species were scored as being cross-transferable, thus confirming the utility of the developed ILP markers for revealing high cross-genera transferability. To gain further insight into the molecular basis of cross-transferability of ILP markers, the paralogous relationships of the 4049 markers orthologous between foxtail millet and rice genes were determined. Three hundred and forty-seven (8.6%) of the 4049 orthologous ILP markers were present in paralogous foxtail millet genes, while 80 (2%) were found in paralogous rice genes. It was thus inferred that 3622 ILP markers designed in this study are unique in both the foxtail millet and the rice genome. Interestingly, the 100 ILP markers showing orthologous relationships between foxtail millet and rice genes selected for the cross-transferability study did not show any paralogous relationships within either the rice or the foxtail millet genome. This confirms that the higher cross-transferability potential of ILP markers is due to orthologous relationships among species rather than paralogy within species.
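Percent transferability, as used above, is the share of assayed markers that give a consistent amplification in a given species. The sketch below computes it from a 0/1 amplification table and also rechecks the paralogy arithmetic from the counts in the text. The species labels and scores in the table are invented for illustration and do not reproduce Table 2.

```python
def transferability(scores):
    """Percent of markers amplified (1) among all markers assayed (0 or 1)."""
    return 100.0 * sum(scores) / len(scores)


# Hypothetical amplification scores for five markers in two species.
amplification = {
    "species_1": [1, 1, 1, 1, 0],
    "species_2": [1, 0, 1, 0, 0],
}
per_species = {name: transferability(s) for name, s in amplification.items()}
average = sum(per_species.values()) / len(per_species)
print(per_species, average)

# Paralogy counts reported in the text.
orthologous = 4049
in_paralogous_foxtail_genes = 347
in_paralogous_rice_genes = 80
unique = orthologous - in_paralogous_foxtail_genes - in_paralogous_rice_genes
print(unique)  # markers without paralogous relationships
```

The subtraction reproduces the 3622 markers reported as unique, and 347 and 80 correspond to about 8.6% and 2% of 4049, respectively.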
To examine whether the PCR products amplified in millet and non-millet species were really homologous to the target genes, we randomly picked a primer pair, SiILP4686 (Zea mays PHD transcription factor), which amplified variant alleles from 214 to 455 bp (Fig. 4). As expected, the sequences of cloned PCR products of the investigated species revealed indels and several point mutations, such as single-base insertions, deletions or translocations; in addition, polymorphism in intron length was observed (data not shown). Overall, multiple sequence alignment has shown that they were homologous to each other and comprised conserved exon regions at the two end positions and a non-conserved or variable intron region in the middle. Similar observations were also reported in rice, 18 Hypericum perforatum 31 and foxtail millet. 20 Further, the higher level of transferability of ILPs compared with previously identified markers reflects the conserved nature of exon positions in genes and the variability in the non-coding sequences. 18 Hence, in our study, the high levels of cross-species amplification indicate that the foxtail millet ILP markers could be useful for comparative mapping in millet and non-millet species. Similarly, it was shown that EST-SSR markers have higher cross-species transferability than genomic SSR markers. 10,32,33 In addition, all the 100 ILP markers possess the ability to distinguish the investigated millet and non-millet species belonging to different genera.
Figure 3. Genetic relationships among millet and non-millet grass species based on 43 foxtail millet ILP markers, using NJoin clustering. Nine millet species including foxtail millet were clearly differentiated from the five non-millet grass species, and the expected genetic relationships among the species under study were also apparent.
The variations in the number of alleles per ILP marker locus in different species studied are possibly dependent upon ploidy level, nature and number of genotype sets in each species used for analysis.

Genetic diversity
Besides, the validated physically mapped markers could discriminate all the 96 cultivated and wild foxtail millet accessions from each other, with genetic diversity levels ranging from 0 to 65% (Fig. 5). A core set of 89 cultivated S. italica accessions and 7 related wild species were used to decipher the polymorphic potential of 20 ILP markers representing the whole genome of foxtail millet. In total, 59 alleles were identified with an average of about 3 alleles per locus, varying from 2 to 5 (Table 3). This was comparable with a recent study in foxtail millet using ILP markers, 20 where the average number of alleles per locus reported was 2.6. The polymorphic information content (PIC) values ranged from 0.03 to 0.47 with a mean of 0.20. The observed heterozygosity (H_O) for individual loci ranged from 0.00 to 0.32, with a mean of 0.13 (Table 3). The PIC value was calculated in order to examine the extent of information on diversity that these markers can provide and to compare these results with previously published studies. The average PIC value (0.20) reported in this study was lower than that reported for rice (0.45 and 0.44). 18,34 The difference might be attributed to differences in the number of genotypes, their genetic background and the number of markers used. Nei's average gene diversity (Nei) ranged from 0.03 to 0.52, with a mean of 0.22. Among all the loci analysed for fixation index (F_IS), 15 loci were positive, representing an excess of observed homozygotes, whereas 5 loci were negative, indicating an excess of heterozygotes, with a mean of 0.42 per locus (Table 3). There was no significant correlation observed between PIC and allele number for the 20 markers investigated (data not shown). The phylogenetic tree constructed in this study using ILP markers differentiated the 89 cultivated S. italica accessions and 7 related wild species from each other and clustered them according to their taxonomic classification (Fig. 5).
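The relationship between observed heterozygosity, gene diversity and the fixation index can be made concrete with a small example. The sketch below uses the textbook definitions for a single co-dominant locus: H_O is the fraction of heterozygous individuals, gene diversity is 1 − Σ p_i² over allele frequencies, and F_IS = 1 − H_O / gene diversity. It omits any sample-size correction that PowerMarker may apply, so it is a simplified illustration rather than a reproduction of that software, and the genotypes are invented.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of diploid genotypes (allele pairs) that are heterozygous."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def gene_diversity(genotypes):
    """Expected heterozygosity, 1 - sum(p_i^2), without sample-size correction."""
    alleles = Counter(allele for pair in genotypes for allele in pair)
    total = sum(alleles.values())
    return 1.0 - sum((n / total) ** 2 for n in alleles.values())


def fixation_index(genotypes):
    """F_IS = 1 - H_O / H_E; positive values indicate a heterozygote deficit."""
    return 1.0 - observed_heterozygosity(genotypes) / gene_diversity(genotypes)


# Hypothetical genotypes of four accessions at one locus.
genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "A")]
h_o = observed_heterozygosity(genotypes)   # 1 of 4 heterozygous
h_e = gene_diversity(genotypes)            # p(A) = 5/8, p(B) = 3/8
f_is = fixation_index(genotypes)
print(h_o, h_e, f_is)
```

A locus with more homozygotes than expected under random mating gives a positive F_IS, as in this example, which is the pattern expected for a predominantly inbreeding species.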
The dendrogram grouped the 96 Setaria accessions into two distinct clusters: cluster I, with 89 accessions, comprised the cultivated species (foxtail millet, S. italica), and cluster II included the wild Setaria species (Fig. 5). Therefore, the ILP markers, with their high amplification and polymorphic potential and their distribution over the nine chromosomes of the foxtail millet genome, could be useful for many large-scale genotyping applications in foxtail millet.

3.5. In silico comparative genome mapping between foxtail millet and other grass species
In order to substantiate that the ILP marker-based physical map constructed in this study for the foxtail millet genome could be useful in comparative genome mapping, the physically mapped 4049 SiILP markers were compared with their physical location on the chromosomes of other related grass genomes, including sorghum, maize, rice and Brachypodium (Fig. 6; Table 4). The comparative genome mapping showed a considerable proportion of sequence-based orthology and syntenic relationship of SiILP markers distributed over the nine foxtail millet chromosomes with sorghum (∼50%, 2038), maize (∼46%, 1867), rice (∼21%, 868) and Brachypodium (∼21%, 845) chromosomes (Fig. 6; Supplementary Tables S5-S8).
3.5.1. Foxtail millet-sorghum synteny
About 50% syntenic relationship of ILP marker loci between foxtail millet and sorghum chromosomes was observed on average, with maximum synteny between foxtail millet chromosome 9 and sorghum chromosome 1 (∼91%), followed by that between foxtail millet chromosome 5 and sorghum chromosome 3 (∼89%) (Fig. 6a; Supplementary Table S5).
3.5.2. Foxtail millet-maize synteny
About 1867 ILP marker loci distributed over the nine chromosomes of foxtail millet showed significant matches with 1867 genomic regions of the 10 maize chromosomes (Fig. 6b; Supplementary Table S6). Interestingly, each foxtail millet chromosome showed a syntenic relationship with two maize chromosomes, thus highlighting the recent whole-genome duplication in maize. All the nine foxtail millet chromosomes showed a considerable average frequency (∼46%) of ILP marker-based syntenic relationship with specific maize chromosomes. The physically mapped ILP markers on foxtail millet chromosome 9 showed maximum synteny (∼57%) with maize chromosome 1, followed by that between foxtail millet chromosome 5 and maize chromosome 3 (∼51%) (Fig. 6b; Supplementary Table S6).
Though there are many reports showing the mapping of microsatellite markers either genetically or physically on orthologous or syntenic chromosomes of different related plant genomes, 7-9,35-38 this is the first report of comparative mapping using ILP markers. The syntenic relationships showed a higher degree of synteny between the foxtail millet and sorghum genomes, followed by maize, rice and Brachypodium, which is possibly due to their taxonomic relationship: foxtail millet, sorghum and maize belong to the same subfamily Panicoideae, while rice belongs to Ehrhartoideae and Brachypodium belongs to Pooideae. The comparative mapping thus demonstrates the decrease of colinearity with increasing phylogenetic distance among plant species. These results are in accordance with the interpretations reported using genomic SSR markers 8 and genic EST-derived SSR markers 9 in foxtail millet. This shows the applicability of ILP marker-based comparative genome mapping between foxtail millet and other grass species such as sorghum, maize, rice and Brachypodium in translating the sequence information/candidate genes from this diploid crop to other polyploid biofuel grasses. Further, the ILP marker-based comparative mapping between foxtail millet and other grass species could enable transfer of gene-based marker information among these target species and thus would expedite map-based isolation of genes of agronomic importance in foxtail millet.
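Chromosome-level synteny percentages of the kind reported above can be obtained by tallying, for each foxtail millet chromosome, where its matched markers land in the target genome. The sketch below assumes the filtered BLASTN hits have been reduced to `(foxtail millet chromosome, target chromosome)` pairs and uses the matched markers of each foxtail millet chromosome as the denominator; the text does not state the exact denominator used, so this is an assumption, and the pairs are invented.

```python
from collections import Counter, defaultdict


def synteny_table(pairs):
    """For each source chromosome, return the percentage of its matched
    markers that fall on each target chromosome."""
    counts = defaultdict(Counter)
    for source_chrom, target_chrom in pairs:
        counts[source_chrom][target_chrom] += 1
    table = {}
    for source_chrom, tally in counts.items():
        total = sum(tally.values())
        table[source_chrom] = {t: 100.0 * n / total for t, n in tally.items()}
    return table


# Invented marker matches (not real data).
pairs = (
    [("Si9", "Sb1")] * 9 + [("Si9", "Sb2")]
    + [("Si5", "Sb3")] * 4 + [("Si5", "Sb9")]
)
table = synteny_table(pairs)

# Best-matching target chromosome for each source chromosome.
best = {src: max(shares, key=shares.get) for src, shares in table.items()}
print(table)
print(best)
```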

Conclusions
To the best of our knowledge, the ILP markers developed in this study are the first large-scale set of ILP markers in foxtail millet, in addition to the 98 reported earlier by Gupta et al. 20 The present study identified 5123 ILP markers from 24 828 non-redundant ESTs, of which 4049 markers were physically mapped onto the 9 chromosomes of the foxtail millet genome. The validation, cross-genera transferability and genetic diversity studies demonstrated the utility of these ILP markers in germplasm characterization, in studying genome relationships in millet and non-millet species and in comparative mapping. The newly developed large-scale SiILP markers will be made available to the research community through the FmMDb (http://www.nipgr.res.in/foxtail.html), 10 and this is expected to expedite molecular breeding in foxtail millet and other millets and forage grass species.