Dominant Allele Phylogeny and Constitutive Subgenome Haplotype Inference in Bananas Using Mitochondrial and Nuclear Markers

Abstract Cultivated bananas (Musa spp.) have undergone domestication involving crosses of wild progenitors followed by long periods of clonal propagation. The majority of cultivated bananas are polyploids with different constitutive subgenomes, and knowledge of their phylogenetic relationships to their progenitors at the species and subspecies levels is essential. Here, mitochondrial (NAD1) and nuclear (CENH3) markers were used to phylogenetically position cultivated banana genotypes relative to diploid progenitors. The CENH3 nuclear marker was used to identify a minimum representative haplotype number in polyploid and diploid bananas based on single nucleotide polymorphisms. The mitochondrial marker NAD1 was observed to be ideal for differentiating bananas of different genomic constitutions based on the size of amplicons as well as their sequence. The genotypes segregated phylogenetically based on the dominant genome; AAB genotypes grouped with AA and AAA, and ABB genotypes grouped with BB. Both markers differentiated banana sections, but could not differentiate subspecies within the A genomic group. On the basis of the CENH3 marker, a total of 13 haplotypes (five in both diploids and triploids, three in diploids only, and the rest unique to triploids) were identified from the genotypes tested. The presence of haplotypes common to diploids and triploids suggests the possibility of a shared ancestry among the genotypes involved in this study. Furthermore, the presence of multiple haplotypes in some diploid bananas indicates that they are heterozygous. The haplotypes identified in this study are of importance because they can be used to check the level of homozygosity in breeding lines as well as to track segregation in progenies.


Introduction
Banana, including plantain, is among the world's most important staple food crops, with an annual production of ~145 million tonnes, of which <17% is for export trade (FAOSTAT 2014). Africa contributes one-third of the global production, with East Africa being the largest banana-growing region, accounting for ~40% of the total production in Africa. Banana production is hampered by pests and diseases including weevils and nematodes, fusarium wilt caused by the fungus Fusarium oxysporum f. sp. cubense, banana Xanthomonas wilt caused by the bacterium Xanthomonas campestris pv. musacearum, black Sigatoka caused by Mycosphaerella fijiensis, and viral diseases caused by Banana streak virus (BSV) and Banana bunchy top virus (BBTV; Jones 2000; Surridge et al. 2003; Tripathi et al. 2009). Large-scale producers can afford chemical control for the pests and diseases but the genotype, which is characterized by low genetic diversity due to clonal propagation of nearly-sterile dominant cultivars (Baurens et al. 2010).
Cultivated banana and plantain commonly exist as autopolyploids and allopolyploids, the latter representing different combinations of A and B and sometimes S and T genomes. Mutations may exist in heterozygous allele sequences within individual subgenomes as well as in homoeologous sequences among the polyploid's constitutive subgenomes (Kaur et al. 2012). Identifying the number of gene alleles present in polyploids is very important, as variability in allele number has been observed to produce either additive or nonadditive expression patterns. Furthermore, gene redundancy in polyploid bananas can compensate for loss-of-function mutations, or even rescue lethal mutations, arising in the homeologs, and this masks evolutionary changes because they are not morphologically visible. Moreover, allele dosage can help in the identification of haplotypes, which can in turn help identify progenitors contributing to the polyploid genome (Baurens et al. 2010; Boonruangrod et al. 2008; Li et al. 2013). The variations due to subgenomic alleles may sometimes be difficult to identify without mapping of the polymorphic loci, clone-based sequencing, or the latest haplotyping approaches such as genotyping by sequencing (GBS). Moreover, some mapping approaches, such as those based on Simple Sequence Repeats (SSRs), may not clearly identify biases such as the segregation bias common in mapping populations. Furthermore, most methods that discriminate subgenomic alleles are expensive and out of reach for the majority of laboratories, especially those in developing countries. Approaches used for haplotype identification in banana have also included manually comparing and scoring amplified bands on agarose gels (Boonruangrod et al. 2008), which is laborious and could result in bias. Despite advances in haplotype identification techniques such as next generation sequencing, prices remain prohibitive for some laboratories.
Low cost approaches that simplify the identification of progenitors from cultivated bananas would contribute more to banana breeding efforts.
Amplified sequences have previously been used to identify homologs, homeologs, SNP-alleles as well as haplotypes in several crops like maize (Ching et al. 2002;Ding and Cantor 2003). The use of amplified sequences to identify haplotypes has led to development of algorithms that enable phasing (Neigenfind et al. 2008;Browning and Browning 2011;Efros and Halperin 2012;Guo et al. 2012). As DNA in polyploids contains multiple nucleotide calls at regions in which the constitutive genomes differ, these can be used to identify haplotypes of a specific gene within the polyploid genomes. Haplotypes especially from low-copy genes like CENH3 can be used to identify diploid progenitors in polyploids as well as for screening introgression lines (IL) in breeding. The use of amplified sequence drastically reduces haplotyping cost and especially for direct sequencing of PCR products compared with the long procedure of haplotyping based on cloning.
This study used the partial coding sequence of the markers centromere specific histone 3 (CENH3) and nicotinamide adenine dinucleotide dehydrogenase subunit 1 (NAD1) to establish the relationships between cultivated diploid and triploid as well as wild-type diploid progenitor bananas. On the basis of the nuclear gene CENH3, we further inferred haplotype(s) from amplified sequence of triploid and diploid bananas. The mitochondrial gene (NAD1) coding for NADH dehydrogenase subunit 1, an enzyme in metabolism of fatty acids, sugar and amino acids is generally conserved but variable enough to be used in differentiating taxonomic levels in plant molecular phylogeny (Bremer et al. 2002). The CENH3 gene is functionally conserved but is highly variable within the N-terminal tail region and exists in single or multiple copies in plants (Dunemann et al. 2014;Hirsch et al. 2009;Masonbrink et al. 2014;Wang et al. 2011). In this study, we obtained comparative phylogeny as well as representative haplotypes in bananas using the mitochondrial and nuclear genes NAD1 and CENH3 respectively.

Plant Material
In this study, a total of 40 banana genotypes/cultivars with variable genomic compositions were used (table 1). The samples included 10 genotypes of AA, 6 of AAA, 7 of AAB, 2 of AB, 6 of ABB, 1 of AS, and 5 of BB genomic composition and one tetraploid (ABBT cultivar) and the species Musa ornata and M. textilis from the sections Rhodochlamys and Australimusa, respectively. The cultivar "Ngombe" was obtained from Kenyan Agricultural and Livestock Research Organization (KALRO) and initiated as the in vitro culture at IITA-Kenya. The remaining samples were obtained from Bioversity International Musa International Transit Center (ITC) hosted in KU Leuven, Belgium (Ruas et al. 2017).

Genomic DNA Extraction
Genomic DNA was extracted from all the cultivars using CTAB (Porebski et al. 1997) except for the cultivar "Ngombe" that was extracted using Qiagen DNA extraction kit (Qiagen, Hilden, Germany).

Primer Design
Primers were designed to amplify partial coding sequence of the CENH3 and NAD1 genes. Primers for CENH3 were designed based on the coding sequence of "Doubled Haploid Pahang" (DH Pahang) using PrimerSelect-DNASTAR software (Madison, USA). The CENH3 primers were designed to target a 322 bp amplicon from position 6211 to 6533, which corresponds to exon 1 and part of intron 1 obtained from in silico analysis of the 7 kb genomic sequence from M. acuminata genotype "DH Pahang". Sequences representing 27 species for NAD1 within the monocots group were downloaded from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov) GenBank release 186 and further assembled in Sequencher version 4.1 (Ann Arbor, MI) (supplementary table 1, Supplementary Material online). The positions of variable and conserved regions were noted within the consensus sequence, which was then exported for primer design. For NAD1, positions 215-231 for the forward and 1598-1615 for the reverse primers out of the 1,726 bp long sequence were targeted. The best primer pairs were selected for each target region, synthesized by Inqaba Biotech (Pretoria, SA), and listed in table 2.

PCR Analysis, Sequencing and Sequence Assembly
Six primer sets (three for each target region) were first optimized with eight genotypes selected from the 40 samples under study. On the basis of quality of PCR products and variability in the sequences, the best primer pair for each marker was selected and used to amplify the remaining samples. The sequences of the two primer pairs were: CENH3_gene_F2: CTGCTGTGATGGCGAGAAC (6530-6548) and CENH3_gene_R2: CTGGTGGCCGTGGTTC (6212-6227) for amplification of the CENH3 gene, and NAD1_c5F: GTCCCCGGCCAGAACCAC and NAD1_c5R: GCAGTCCGGGGCACAAG for amplification of the NAD1 gene. PCR reactions were performed in a final volume of 20 µl using the Bioneer PCR premix (Bioneer, Daejeon, South Korea) containing 1 U of Top DNA polymerase, 250 µM dNTP, 10 mM Tris-HCl (pH 9.0), 30 mM KCl, and 1.5 mM MgCl2. The amplifications were done in an Applied Biosystems (ABI) GeneAmp PCR System 9700 using the following profile for the CENH3 gene: initial denaturation at 94 °C for 5 min, 40 cycles of 94 °C for 30 s, annealing at 59 °C for 1 min, and extension at 72 °C for 40 s, and a final extension at 72 °C for 10 min. NAD1 was amplified with an initial denaturation at 94 °C for 5 min, followed by 40 cycles of 94 °C for 30 s, annealing at 64 °C for 30 s, and extension at 72 °C for 1.5 min. The PCR products were separated on a 1.5% agarose gel stained with GelRed (Biotium, CA) to confirm successful amplification. Successfully amplified products were cleaned using the Bioneer PCR purification kit (Bioneer, Daejeon, South Korea) according to the manufacturer's protocol.
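As a quick sanity check on the primers listed above, the sketch below computes length, GC fraction, and reverse complement for each oligo. It is only an illustration: the helper names (`reverse_complement`, `gc_fraction`) are hypothetical and are not part of the software used in this study.

```python
# Primer sequences copied from the text; helper names are hypothetical.
PRIMERS = {
    "CENH3_gene_F2": "CTGCTGTGATGGCGAGAAC",
    "CENH3_gene_R2": "CTGGTGGCCGTGGTTC",
    "NAD1_c5F": "GTCCCCGGCCAGAACCAC",
    "NAD1_c5R": "GCAGTCCGGGGCACAAG",
}

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```

Running the script prints one summary line per primer, which can be compared against the coordinates and annealing temperatures reported in the text.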
Sequencing of PCR products was performed from both ends (Forward and reverse) using automated sequencers ABI 3130 and 3730 prism as per set protocols. Sequences were assembled and edited in Sequencher version 4.1. All ends that had low confidence base calls were trimmed to retain high confidence sequences. Sequence assembly parameters were set at 85 and 25 and assembly done using the "assemble automatically parameter". The criteria for calling heterozygous secondary bases were set at 30% of the primary chromatogram. Finally, IUPAC codes were assigned to all the positions that were meeting the criteria for multiple calls and sequences exported in FASTA concatenated format for further analyses.
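The secondary-base criterion described above can be sketched as follows. This is a minimal illustration, not the Sequencher implementation: the function name `call_base`, the representation of a chromatogram position as a dictionary of peak heights, and the choice to treat the 30% threshold as inclusive are all assumptions.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}


def call_base(peaks: dict, threshold: float = 0.30) -> str:
    """Assign an IUPAC code to one chromatogram position.

    `peaks` maps each base (A, C, G, T) to its peak height. A base is
    called when its peak is at least `threshold` times the primary
    (tallest) peak; the inclusive comparison is an assumption.
    """
    primary = max(peaks.values(), default=0)
    if primary <= 0:
        return "N"  # no signal at this position
    called = frozenset(
        base for base, height in peaks.items() if height >= threshold * primary
    )
    return CODE_FOR[called]
```

For example, a position with peak heights A=1000 and G=400 would be coded R, whereas a G peak of 200 falls below the threshold and the position is called A.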

Sequence Alignment and Phylogenetic Reconstruction
FASTA concatenated sequences were aligned in ClustalW (Thompson et al. 1994) as implemented in MEGA5 (Tamura et al. 2011). Alignment was done with the default Gap Opening Penalty (GOP), Gap Extension Penalty (GEP), and DNA weight matrix for both markers. Phylogeny reconstruction was performed by maximum likelihood (ML) for both markers. The two markers had different reconstruction parameters, except for the bootstrap (BP) statistical support of individual clades, for which the number of BP replicates was set at 1,000 for both. Model selection was performed in phyML (Guindon et al. 2010). During phylogenetic analysis for CENH3 sequences, the rates among sites were gamma distributed with a gamma parameter of 8.0, gaps with missing sequence sections were treated by

Haplotype Inference
Single nucleotide polymorphisms (SNPs) for CENH3 to which IUPAC codes had been assigned were exported from MEGA5 to Excel and saved as a .csv file. The SNP positions were grouped by the genotype's ploidy (triploids and diploids). For a triploid genotype that had only two nucleotide calls (a primary and a secondary) at the target SNP site (missing a tertiary chromatogram peak), the IUPAC code N (A or T or G or C) was added in place of the third, indicating that the third base call may have been any of the four at that position (including the homozygous type). SNPs within the diploid genotypes that had only one base call likewise had an N added to complete the two base calls associated with diploidy (supplementary table 2, Supplementary Material online). The species M. ornata and M. textilis, plus any other genotype that did not have multiple calls, were assumed to be homozygous and therefore omitted from the analysis. The software SATlotyper (Neigenfind et al. 2008) was used for haplotype inference. Both triploid and diploid SNP positions were run with the SAT solver set to MiniSat_v1.14_cygwin. All the SNP positions having multiple calls were used for analysis with a BP value of 100. Haplotypes identified in diploids were manually checked for duplication in triploids. Haplotypes that were duplicated in diploids and triploids were treated as a single incidence. Duplicated haplotypes, together with the unique ones (occurring only once in either diploids or triploids), were considered specific to both diploids and triploids. All haplotypes and the genotypes from which they were inferred were tabulated to indicate relatedness.
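The N-padding step described above can be illustrated with a short sketch. It assumes each SNP site is stored as a single IUPAC code and that the ploidy is known; the function name `expand_call` is hypothetical, and padding a single-base call in a triploid with two Ns is an assumption that extends the rule stated in the text. The output is not meant to reproduce the SATlotyper input format.

```python
# IUPAC codes that represent one, two, or three bases.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def expand_call(code: str, ploidy: int) -> list:
    """Expand one IUPAC call into `ploidy` allele slots.

    Observed bases fill the first slots; remaining slots are padded
    with N, meaning the unobserved allele could be any of the four
    bases (including a repeat of an observed base).
    """
    bases = sorted(IUPAC_BASES[code.upper()])
    if len(bases) > ploidy:
        raise ValueError(f"{code} implies more alleles than ploidy {ploidy}")
    return bases + ["N"] * (ploidy - len(bases))
```

Under these assumptions, a triploid site coded R expands to A, G, N, and a diploid site coded A expands to A, N.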

PCR Amplification of CENH3 and NAD1 Genes in Banana Genotypes
Amplification was performed on banana genotypes with different genomic compositions, and amplicons of different sizes were obtained using the NAD1 marker (supplementary fig. 1, Supplementary Material online). However, a single amplicon of ~322 bp was obtained with the marker CENH3. On the basis of NAD1 specific primers, bananas of genomic groups AAA, AA, AAB, and AB were observed to have a band size of ~1,000 bp, whereas ABB and BB genomic groups had a band size of 1,500 bp. The species M. textilis (T-genome) was observed to have the smallest band size of ~900 bp. None of the genotypes used in this study showed multiple bands on the gel. PCR products were sequenced in order to identify the dominant allele and also to check the reasons behind the differences in size of NAD1 amplicons. To ensure sequence authenticity, sequencing was done bidirectionally using the same primers used for amplification (table 2). The differences in band sizes among amplicons from different genomic groups observed in PCR were identified to be due to insertions and deletions (indels; fig. 1).

Sectional and Genomic Group Congruence
Bidirectional sequencing and clean-up resulted in a total of 59 sequences, 36 for NAD1 and 23 for CENH3. The data for the genotypes Pelipita, Calcutta 4, Tani, Red Yade, and Yawa 2 were treated as missing data for the NAD1 marker due to consistently poor quality of sequence. The sequences obtained were deposited in the NCBI under accession numbers KP751256-KP751292 for NAD1 and KP751293-KP751328 for CENH3. Because CENH3 was used in haplotype inference, it was paramount to map it in the context of the available banana whole genome (fig. 2). The CENH3 gene was observed to be within the contig NW_008990373.1 of the M. acuminata subsp. malaccensis chromosome 8 genomic scaffold (Ma08_p09050.1; banana genome, annotation version 2), specifically within scaffold ASM31385v2_scafold_4 (D'Hont et al. 2012) of the NCBI annotation release 101 (fig. 2).
Multiple alignments of clean sequences of NAD1 and CENH3 for all the 40 genotypes resulted in an aligned length of 1,448 bp (including indels in some cultivars) and 278 bp for the two markers, respectively. Among all sequences, 196 and four SNPs were observed to be variable for NAD1 and CENH3, respectively. In order to ensure that the observed SNP variations were genuine and not due to PCR errors, they were [...] (fig. 3). The NAD1 marker was unable to differentiate diploids within the AA genome represented by the subspecies banksii, zebrina, malaccensis, and siamea. However, the diploid AA in the subspecies truncata segregated within clade A, albeit as a subclade.
Clade B in the NAD1 phylogeny had representation from the BB and ABB genomic groups and was strongly supported at a BP support of 100. This clade had nine subclades represented by individual genotypes of both BB and ABB genomic constitution. The genotypes with ABB genome within this clade were represented by Saba, Namwa Khom, Kluai Tiparot, Monthan, and Dole (fig. 3). The genotypes with BB genomes within this clade were represented by Pisang Klutuk Wulung, Pisang Batu, Lal Velchi, and Honduras.
The third clade, C, had the species M. textilis as the only representative of the section Australimusa. The species M. ornata of the section Rhodochlamys did not cluster with any of the three clades (A, B, or C).

Genotype Congruence Based on the Marker CENH3
The phylogeny based on the marker CENH3 had two major clades and associated subclades (fig. 4 and supplementary data 2, Supplementary Material online). The first clade, the "A genome genotypes", had representative genotypes with A genome (AA, AB, AAA, and AAB) except for the genotype Wompa, which is an interspecific hybrid between M. acuminata and M. schizocarpa (AS genomic composition), and the species M. ornata. This marker was able to differentiate subspecies and genome groups within the A genome clade and these appeared in the phylogenetic tree as subclades, each having one or more genotype(s). Two subclades with more than one genotype were observed to segregate from the A genome clade. These two multigenotype subclades included the one with the genotypes Zebrina (AA), Mbwazirume (AAA), and Ngombe (AAA), which were supported at a 57 BP.

[Table 3. Variable SNPs from the marker NAD1 that differentiated A and B banana genomes. Columns: Genotype, Genomic Group, SNP Number 1-20.]

The second clade based on the CENH3 marker, the "B genome clade", had genotypes from the BB, ABB, and ABBT genomic groups (fig. 4). This clade had a total of eight genotypes, out of which four were from BB, three from ABB, and one with ABBT genomic composition. The BB genotypes in this clade included Pisang Batu, Lal Velchi, Pisang Klutuk Wulung, and Tani. The ABB genotypes in this B genome clade included Saba, Dole, and Kluai Tiparot. The only tetraploid within this clade was represented by Yawa 2 (ABBT). The genotypes Honduras, Monthan, Pisang Bakar, and M. textilis were treated as missing data resulting from failure to PCR amplify.

Haplotypes in Diploids and Triploids
Haplotypes were obtained by analyzing SNP positions with multiple base calls within the CENH3 gene. Seven SNP positions in 34 genotypes were used to infer haplotypes (table 4).
A total of 13 haplotypes were observed based on the seven SNPs; of these, five were present in both diploids and triploids, three were observed in diploids only, and the rest were unique to triploids. Haplotypes 1, 2, 3, and 5 had the highest genotype representation (table 4).
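The breakdown above implies that 13 - 5 - 3 = 5 haplotypes were unique to triploids. The partition into shared, diploid-only, and triploid-only haplotypes can be sketched with set operations. The function name `partition_haplotypes` and the example haplotype strings are hypothetical placeholders, not the haplotypes reported in table 4.

```python
def partition_haplotypes(diploid, triploid):
    """Split haplotypes into shared, diploid-only, and triploid-only sets.

    A haplotype found in both ploidy groups is counted once, matching
    the rule that duplicated haplotypes are treated as a single incidence.
    """
    d, t = set(diploid), set(triploid)
    return {
        "shared": d & t,
        "diploid_only": d - t,
        "triploid_only": t - d,
    }


if __name__ == "__main__":
    # Hypothetical seven-SNP haplotype strings, for illustration only.
    diploid_haps = ["ACGTACG", "ACGTACT", "TCGTACG"]
    triploid_haps = ["ACGTACG", "GCGTACG", "ACGTACG"]
    groups = partition_haplotypes(diploid_haps, triploid_haps)
    total = sum(len(members) for members in groups.values())
    print({name: sorted(members) for name, members in groups.items()}, total)
```

The three sets are disjoint by construction, so their sizes sum to the total number of distinct haplotypes.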

Discussion
Several studies have been conducted using molecular approaches in banana to identify their origins and domestication patterns especially cultivated polyploids from their diploid progenitors (De Langhe et al. 2010;Hippolyte et al. 2012;Lescot et al. 2008). In this study, a mitochondrial and a nuclear marker were used to establish the phylogeny and specific haplotypes of cultivated triploids and a tetraploid banana to the potential diploid progenitors. The phylogenies using the dominant allele with both markers resulted in clear segregation of genotypes based on the respective genomic constitution. The major genome was observed to dominate with AAB genotypes segregating together with AA and AAA genotypes and ABB with BB genotypes. Moreover, the A genome was observed to dominate in the genotypes with AB and AS genomic composition as they were observed to segregate with the AA and AAA. It is therefore important to consider the subgenomic alleles through haplotyping.
The sizes of NAD1 amplicons were dependent on the genomic group, which was attributable to insertions and deletions (indels) within the sequences. The genotypes with BB and ABB genomes were observed to have a larger amplicon in comparison with genotypes with AA, AAB, and AAA genomic composition. This marker was able to differentiate the B cytoplasm in ABB genotypes and A cytoplasm in AAB heterogenomic triploids, as clearly reflected through the band sizes of amplicons (supplementary fig. 1, Supplementary Material online).
Three different sections (Eumusa, Rhodochlamys, and Australimusa) represented in this work were differentiated using the two markers. The result that M. ornata, representative of the section Rhodochlamys, clustered with the A genome genotypes of the section Eumusa using the CENH3 marker is consistent with other studies (Christelová et al. 2011; Hřibová et al. 2011; Wong 2002). This observation corroborates Wong's (2002) and Christelová et al.'s (2011) findings and suggestions that these two sections should be merged, as there are no great genetic differences that warrant their positioning into different sections.
The only representative of the AS diploid genotype, "Wompa", belonging to M. schizocarpa, was observed to cluster with the M. acuminata clade within the section Eumusa, hence supporting the point made by Hřibová et al. (2011) that M. schizocarpa is closely related to M. acuminata.
Alternatively, this clustering could also be due to the A-allele in this AS genotype being the dominant allele and therefore easily amplifiable and detectable.
It was observed that the B genome clade and its genotypes are distant from M. acuminata despite being in the same section (figs. 3 and 4). Although the B genome genotypes formed a distinct clade, the monophyly of this clade cannot be assertively concluded due to the limited number of markers used in this study. However, the species M. ornata of the section Rhodochlamys was observed to be closer to M. acuminata than it was to M. balbisiana. This observation is consistent with the observation by Christelová et al. (2011) on Musa species, where the B genome is shown to have diverged 27.9 Ma while M. ornata of the section Rhodochlamys diverged only 8.8 Ma. Furthermore, the similarity between M. ornata of the section Rhodochlamys and M. acuminata of Eumusa is in line with Simmonds' (1962) work, which showed that hybridizations between species of the sections Eumusa and Rhodochlamys are successful and attributed this to a lack of reproductive barriers between the two sections.
Surprisingly in this work, the clustering of the ABB genotypes Pelipita, Monthan, Dole, and Saba to the B genome clade is contrary to what was observed by Boonruangrod et al. (2008). Indeed, clustering of the genotypes with ABB genome was not limited to the four genotypes but to all the ABBs; genotypes Kluai Tiparot and Namwa Khom were also observed to cluster to the B genome clade (figs. 3 and 4). The segregation of AAB genotypes is in accordance with the results obtained by Carreel et al. (2002), where all the AAB apart from Pisang Rajah and Pisang Kelat subgroups were found to have A genome cytoplasmic constitution. The segregation of this genomic group with NAD1 was not only observed in the sequences but also in the sizes of the amplicons. This suggests that the mtDNA region used in this study may not solely and conclusively identify the mitochondrial types hence more markers may be needed.
The segregation based on the marker CENH3 was similar in many ways to that of NAD1. The two major clades representing the A and B genomes observed in the NAD1 phylogeny were also observed with CENH3, which shows consistency. The identification of A and B was derived from the observation that no diploid BB and AA genotypes clustered together; they fell into one or the other of the two main clades for both markers (fig. 4). The major deviation from the NAD1 phylogeny was mainly in the A genome clade, where this marker was unable to differentiate the subspecies (fig. 4). The marker CENH3 is a nucleus based marker and hence the clustering of the A genome subspecies was expected.
The segregation of the A genome in subspecies malaccensis and truncata in this study compares favorably with the one observed by Wong (2002) where the two subspecies clustered separately. However, this observation deviates from the placement of the subspecies truncata as a synonym of the subspecies malaccensis by Hotta (1989). This study clearly differentiated the two subspecies, with malaccensis being supported as a separate clade from truncata ( fig. 4). Despite the limited number of markers used in this study, our study confirms the closeness of the subspecies burmannica (Long Tavoy) and burmannicoides (Calcutta 4) similar to observations made by Ude et al. (2002) and later confirmed and proposed to be joined as one (Sardos et al. 2016; fig. 4).
The segregation within the B genome clade was consistent in both markers used in this study and is similar in some aspect to observations made by Ude et al. (2002), which observed that ABB genotypes were closer to BB diploids. In this study using both markers, the AAB and ABB genotypes were observed to segregate into either A or B genome clades. Similarly, Ude et al. (2002) observed a close clustering between the triploid AAB and ABB with the diploid AA and BB, respectively. However, AAB and ABB genomic groups were observed to have A and B as well as B and A haplotypes respectively (table 3).
Different studies have indicated that only M. acuminata subspecies that originated from the islands of Southeast Asia (ISEA) have had genomic contribution to cultivated bananas (Li et al. 2013;Perrier et al. 2011). In this study, both phylogeny and haplotype inference indicated that ISEA subspecies contributed to triploid cultivated bananas (figs. 3 and 4). Phylogeny using both NAD1 and CENH3 was able to establish one major clade (Clade A) that had ISEA subspecies zebrina, errans, banksii, truncata, and malaccensis clustering. Moreover, the EAHB have been shown to have had genomic contribution from ISEA subspecies zebrina and banksii (Perrier et al. 2011). In this study, based on CENH3 marker, EAHB genotypes Ngombe and Mbwazirume were observed to cluster within clade A to major ISEA subspecies (figs. 3 and 4). These observations further attest to the ISEA subspecies contributing to EAHB. Observation of the diploid genotype Tomolo clustering with the wild type genotype M. acuminata subsp. malaccensis ( fig. 4) was contrary to other observations made that have linked this cooking diploid cultivar to wild type subspecies banksii (Carreel et al. 2002;De Langhe et al. 2010;Li et al. 2013). However, this deviation was observed only with the phylogeny based on CENH3 marker. This deviation can be explained by the fact that CENH3 marker is a nuclear gene, while NAD1 marker is of mitochondrial nature and the latter has uniparental inheritance.
The maternal and paternal nature of inheritance of chloroplast (cpDNA) and mitochondrial (mtDNA) genetic material in bananas (Fauré et al. 1993;Heslop-Harrison and Schwarzacher 2007) can facilitate the identification of the wild genotypes that contributed to banana cultivars. To infer haplotypes from amplified sequences, only the marker CENH3 was used, due to the fact that extranuclear DNA (NAD1) in bananas is monoparentally inherited and no crossing over in the case of NAD1 may have taken place during breeding. Furthermore, no multiple peaks were observed in the sequences obtained from triploid and diploid genotypes based on NAD1.
A total of 18 haplotypes (denoted by the numbers 1-18) were observed in this study. Haplotypes represented in a cultivar have been used as an indicator of the species that contributed to the cultivar, thus indicating the domestication patterns (Boonruangrod et al. 2008; Li et al. 2013). In our study, Haplotype 1 mainly consisted of B genome derived haplotypes (table 4). The two haplotypes for the genotype Kunnan (AB) were found to closely associate with Haplotype 1, suggesting that B genome diploids indeed contributed to this genotype. Our results do not support those obtained by Li et al. (2013), which suggested that this genotype should be renamed to AA. This observation, however, raises the question of why the A genome in the AB genotype was not identified in the analysis. The triploid genotypes Lady Finger (AAB), Saba (ABB), and Dole (ABB) were observed to have a haplotype similar to Haplotype 1, which is a B genome haplotype (table 4). This indicates that B genome diploids contributed to the genomes of triploids and is in line with observations made in other studies (Li et al. 2013; Perrier et al. 2011). However, this study could not identify which BB diploid specifically contributed to the B genome in triploids. The clustering of two Lady Finger haplotypes with Haplotype 2, which is essentially an A genome haplotype group, indicates that both A haplotypes in the AAB genotype Lady Finger may have originated from the same M. acuminata subspecies. The two haplotypes of the genotype Wompa were related to Haplotypes 3 and 6. Haplotype 3 included other genotypes of A genomic composition, but no diploid wild type genotypes. Two of the four genotypes that were together with Wompa in this haplotype were the EAHB genotypes Ngombe and Mbwazirume. The other genotype with a haplotype represented in this group was the diploid genotype Tomolo (table 4). This indicates that Tomolo may have contributed its genome to EAHB genotypes.
The approach used here provides a quick and easy approach to identify haplotypes and constitutive subgenomic polymorphisms.
In this study, no single genotype (either homogenomic or heterogenomic triploid) was observed to have mtDNA haplotypes that were obtained with the marker NAD1. This observation can be explained by the uniparental nature of mitochondrial inheritance in banana, where both mitochondrial and chloroplast DNA have previously been used to identify the origins of genomes in diploid, homo- and heterogenomic banana genotypes (Boonruangrod et al. 2008; Carreel et al. 2002). In the present study, direct sequencing of PCR products made it easy to differentiate the maternal mtDNA types. We observed that multiple CENH3 alleles may exist in bananas, similar to observations made in other crops.