CAP Analysis of the Distribution of the Introduced Bemisia tabaci (Hemiptera: Aleyrodidae) Species Complex in Xinjiang, China and the Southerly Expansion of the Mediterranean Species

Abstract Bemisia tabaci (Gennadius) cryptic complex has invaded Xinjiang, China, since 1998. The distribution of Mediterranean (MED) and Middle East-Asia Minor 1 (MEAM1) B. tabaci substrains has been gradually identified due to the development of molecular technology. In this study, the distribution of MED and MEAM1 in Xinjiang was determined by cleaved amplified polymorphic sequence (CAPs). Results showed that MED dominated in northern Xinjiang (84%), whereas MEAM1 was dominant in southern Xinjiang (72%). Five pairs of simple sequence repeat (SSR) primers were used to analyze the genetic diversity of B. tabaci among 36 geographic populations. The genetic diversity of MED and MEAM1was low and varied little among populations in Xinjiang (0.09 ± 0.14 and 0.09 ± 0.13, respectively). Based on ∆K statistic, 13 populations of MEAM1 could be classified into two subgroups at K = 2, whereas the 23 populations of MED could be classified into four subgroups at K = 4. However, Mantel t-test demonstrated no correlation between geographical and genetic distances among B. tabaci complex (R = 0.42, P = 1.00). Neighbor-joining and principal coordinate analysis showed that geographical isolation and interspecific differences were the main causes of the genetic variation. Gene flow predicted that MEAM1 was most likely introduced from Urumqi to the southern Xinjiang. Meanwhile, a large proportion of MED in Kashi region came from Changji and Yining. To block ongoing dispersal, strict detection and flower quarantine regulations need to be enforced.

I) barcoding, Cao et al. (2011) found out that MED had invaded Xinjiang.
Recently, molecular markers such as COI barcoding and simple sequence repeat (SSR) have been widely used to identify species of whiteflies. These tools help in understanding genetic diversity, gene flow, and even the replacement of populations (David 1998, Götz andWinter 2016). Boykin et al. (2007) used the whole COI sequence to analyze the genetic diversity of the B. tabaci, and he unveiled 12 members of the species complex. With the discovery of MED B. tabaci's invasion, it was then known that Xinjiang harbored at least two cryptic species: MED and MEAM1 (Duan et al. 2011).
SSR is widely used in identification and in investigations related to the genetic evolution, genetic mapping, phylogenetic, and genetic diversity of animal and plant species. For example, Colorado potato beetles, Leptinotarsa decemlineata (Say) (Coleoptera: chrysomelidae), was a notoriously invasive species. Zhang et al. (2013) analyzed its genetic diversity among 10 geographical populations in Xinjiang and the mode of the beetle diffusion from the border to inner Xinjiang. Previous report on B. tabaci showed that SSR may not distinguish cryptic species in the complex unless there was an appropriate method to study population structure and the mechanism of invasion and diffusion (Chu et al. 2012). Thus, cleaved amplified polymorphic sequence (CAPs) was used to detect cryptic species within the B. tabaci species complex. Previous COI barcoding studies showed that only MEAM1 and MED of B. tabaci were found in Xingjiang (Ashfaq et al. 2015).
With the expansion of greenhouses and the increasing frequency of flower trade, whiteflies have invaded most areas in Xinjiang (Jia et al. 2017). As the largest cotton-producing area in China, Xinjiang is facing great threats of the whitefly-transmitted cotton leaf curl Multan virus. Despite of this, the distribution of MED and MEAM1 B. tabaci remains unclear. Meanwhile, the ecological conditions of the unique desert oasis agriculture in Xinjiang may impose certain limitations on the spread of whitefly. In this study, we aimed to uncover the distribution patterns of MEAM1 and MED from 2015 to 2017 to provide insights into the invasion and diffusion of the B. tabaci complex in Xinjiang.

Samples
Bemisia tabaci samples were collected from 54 sites in Xinjiang from June 2015 to December 2017 (Table 1). At least 200 adults per site were collected, placed in tubes containing 95% ethanol, and stored at −20°C. Thirty to 50 samples were randomly selected for identification. In total, 1,768 adults were identified as MEAM1 or MED cryptic species. We chose samples for SSR by initially, randomly sampling the sites, and then by the individuals coming from the chosen sites (Table 1).

Cleaved Amplified Polymorphic Sequence
Total genomic DNA was extracted from each adult according to Frohlich et al. (2010). Briefly, each adult was laid on clean parafilm and homogenized in DNase-free Eppendorf tube (Thermo Fisher Scientific) containing 5-µl lysis buffer (5 mM Tris [pH 8.0], 0.5 mM EDTA, 0.5% NP40, and 1 mg/ml protein K + ; Beijing Solarbio Science & Technology Co., Ltd.). The homogenate was then transferred to tubes with 30-µl lysis buffer and incubated at 65°C for 15 min and then at 95°C for 10 min. The products were stored at −20°C.

Simple Sequence Repeat
Five microsatellite markers were selected according to previous studies of De Barro et al. (2003) and Valle et al. (2012; Table 2). The 25-µl PCR system contained 2-µl DNA, 12.5 µl 2× Taq PCR MaserMix (Tiangen Biotech [Beijing] Co., Ltd.), and 1 µl of each primer (20 µM). The PCR program was set to the following conditions: initiation at 95°C for 5 min, followed by 45 cycles of denaturation at 95°C for 30 s, annealing for 30 s, extension at 72°C for 30 s, then final extension at 72°C for 5 min. Annealing temperatures for each primer are provided in Table 2. The PCR products were analyzed by capillary electrophoresis following the directions of DNF-905 kiton Fragment Analyzer TM (Advanced Analytic Service). Fragment detailed data were extracted using Agilent PROSize 2.0.

Genetic Diversity
PopGene32 was used to analyze genetic diversity parameters such as the number of alleles (N a ), number of effective alleles (N e ), average diversity index (H), and the Shannon diversity index (I). The inbreeding coefficients (F IS ) were also calculated to reflect the inbreeding status of the populations. Genetic differences among populations were evaluated by pairwise F IS values, with 10 000 permutations to assess significance (Weir and Cockerham 1984;FSTAT v.2.9.3.2, Goudet 2002).

Population Relationships
Group and population clusters were generated using STRUCTURE v.2.3.3 based on Bayesian methods (STRUCTURE v.2.3.3, Pritchard et al. 2000). ΔK method was used to estimate the optimum number of genetic groups (Evanno et al. 2005). We chose the admixture model with correlated allele frequencies and set the K from 1 to 10 for MED group and from 1 to 20 for MEAM1 group. We ran three replicates of each run and set a burn-in period of 50 000 Markov chain Monte Carlo generations followed by 5 × 10 5 iterations. To confirm the existence of population structure, analysis of molecular variance (AMOVA) was performed to assess the genetic variance partitioned into four levels (among groups, among populations, within groups, within populations, and within individuals) based on structure output, with 10 000 permutations to test for significance (Arlequin v.3.11, Excoffier et al. 2005). Correlation between geographical and genetic distances (Slatkin's linear F ST , F ST /(1 − F ST ), and D′) of all samples from Xinjiang as well as between and within groups were defined by Structure (Mantel procedure, GenAlEx   6.2; Peakall and Smouse 2006). We used a neighbor-joining cluster method in MEGA 3.01 to generate a circle phylogenetic tree based on D′ distance which was calculated using Popgene32. To assess the genetic associations of individuals among various groups representing cryptic species and population, a pairwise similarity matrix was generated using simple matching coefficient (Sokal and Michener 1958), with 10 000 permutations. Random, this was converted to Euclidean distance matrix as the square root of 1 minus element-wise similarity for a principal coordinate analysis (PCoA) using NTSYS-pc 2.01 (Rohlf 1987). Top 2 components were plotted for MED, MEAM1, MED, and MEAM1 complex. Direction of gene flow was inferred by a partial Bayesian method from GENECLASS 2.0, and assignment probabilities of individuals were tested from geographical populations with a threshold probability value of 0.01. Assigned probability data of MED to MED, MED to MEAM1, MEAM1 to MED, MEAM1 to MEAM1 groups were arcsin-square root transformed to normal distribution. Significant levels (P < 0.05) were tested using Tukey's multiple comparisons test of ANOVA in Graphpad Prism 6.0 (Rannala and Mountain 1997) and GeneClass v.2.0 , Piry et al. 2004).

Results
The Distribution of B. tabaci Cryptic Species The distribution of B. tabaci cryptic species were identified in 54 sampling sites during 2015 and 2017 by CAPs. The randomly chosen 631 samples for SSR were also sequenced to verify these results (data not shown). Results showed that there was a large percentage difference between northern and southern Xinjiang (Fig. 1)  to 0.08), and 0.15 ± 0.20 and 0.15 ± 0.20 (range 0.07-0.13), respectively. These indicates low genetic diversity and little variation among populations (Table 2). F is values were all negative in 36 SSR analyzed sites with 12 and 20 populations reaching P < 0.05 and P < 0.001 significant departure levels. These results indicate that the genetic diversity of B. tabaci in Xinjiang is low and the genetic differentiation is high.

∆K Statistic
Based on ∆K statistic, 13 populations of MEAM1 could be divided into two subgroups at K = 2, whereas the 23 populations of MED could be divided into four subgroups at K = 4 (Table 3). In MEAM1, one group Kashi-1, Kashi-2, Kashi-3 included those from Tulupan; the third group included those from Changji and Yining; and finally, the fourth group included those from Alashankou, Shihezi, and Hami.

Analysis of Molecular Variance
Basing on the subdivision by ΔK, AMOVA demonstrated that most of the variation came from within individuals (100.89%, 115.31%; Table 4), with K = 2 in MEAM1 group and K = 4 in MED group. Fixation indices of within populations (F is ) were negative (−0.23, −0.23). It demonstrated that there was a strong disassortative mating, which was consistent with the F is in each population (Table 5). Variation among population and groups in MEAM1 and MED population was low but significant (df = 11, F = 0.06, P = 0.000; df = 19, F = 0.04, P = 0.00; df = 1, F = 0.12, P = 0.00, df = 3, F = 0.03, P = 0.00; Table 4), which supported the groupings.

Mantel t-Test
Mantel test analysis of the matrix (Ln Km) based on ΔK subdivision showed that there was no correlation between geographical and genetic distances among B. tabaci complex (R = 0.42, P = 1.00), or in MED (R = 0.49, P = 1.00) and MEAM1 (R = 0.44, P = 1.00) group or subgroup (Table 6). MEAM1-group2 and MED-group3's data were not available as there were only two populations in these groups.

Neighbor-Joining Tree
The neighbor-joining tree (Fig. 3A) showed that most of the MED and MEAM1 population were clustered separately, except for the Kashi, Kelamayi, and Urumqi population. The Kashi-5-MED and Urumqi-MEAM1 population were clustered with Kashi-8-MEAM1 and Urumqi-MED. The rest of the Kashi-MED population (Kashi-1, Kashi-2, Kashi-3, Kashi-4, Kashi-6, Kashi-7, and Kashi-13), Kelamayi-MED in northern-Xinjiang, and Hetian-MEAM1 in Southern Xinjiang converged into one branch. Nei's genetic clustering results showed that there was no significant correlation between genetic relationship and the regional distribution of B. tabaci in Xinjiang.

Principal Coordinate Analysis
The phylogenetic tree has shown that most of MED and MEAM1 populations could be clearly separated according to geographic clustering. Thus, we verified the relationships between MED and MEAM1 samples with a PCoA plot. Based on 99 alleles of five SSR primers, the top three principle components accounted for 55.9, 67.9, and 70.2% of the SSR variation in MED, MEAM1, MED, and MEAM1 complex. There were no clear boundaries between MED and MEAM1 samples. However, the MED samples from Southern df, degree of freedom; K, the number of inferred population groups based on STRUCTURE analysis; F, fixation index; PV, percent variation; P-value, probability of the significance tests.  Xinjiang were mainly located at the right side of the coordinate axis, whereas the MEAM1 samples from Northern Xinjiang were mainly located at the bottom (Fig. 2B). We further tested whether MED and MEAM1 samples could be separated by geographic isolation. Results show that MED samples that grouped by sampling sites were highly highly dispersed but can be partially separated into Northern and Southern Xinjiang, with the large Tianshan Mountain potentially serving as a geographical barrier for the two groups (Fig. 2C). MEAM1 samples were clearly separated, especially those from Southern Xinjiang. Interestingly, MEAM1 in Northern Xinjiang occupied the center which separated the samples of Bayingol from those of Hotan and Kashgar. MEAM1 B. tabaci in northern Xinjiang showed clear separation from those in Hotan, Bazhou, and Kashgar area. MED B. tabaci from northern Xinjiang was intersected by those from Bazhou and Hotan areas. The results showed further that MED B. tabaci individuals and MED B. tabaci from southern Xinjiang were clearly clustered, whereas those from northern Xinjiang were more dispersed. The range of MED from northern Xinjiang overlaps with that of MED B. tabaci from southern Xinjiang.

Discussion
MED B. tabaci has spread and gradually replaced the ecological niche of MEAM1 as a dominant cryptic species in many provinces since it was first found in Yunnan in 2004 . After 22 yr of B. tabaci invasion in Xinjiang, CAPs were used to identify MED and MEAM1 cryptic species in this study, and the randomly chosen SSR samples were also used for sequencing COI gene (data not shown) to verify CAP-based identification. Based on the results of CAPs, we found that MED B. tabaci was mainly distributed in northern Xinjiang and was in agreement with Cao et al. (2013). We also found that in southern Xinjiang, a higher proportion of MEAM1 existed, there was a wide percentage difference between northern (MED: 84%) and southern (MED: 28%) Xinjiang. The investigation revealed that the most serious damage occurred in Turpan area and the MED was the dominant crop pest. This was attributed to local climatic conditions, planting patterns, and field management. The samples from Shihezi, Alashankou, Changji, and Aler were collected in flowers and were identified and confirmed as MED-byrelated studies. The spread of MED B. tabaci is related to the introduction of flowers . MEAM1 was the main cryptic species in southern Xinjiang including western part of Bazhou, parts of Hotan and Kashgar. This phenomenon could be attributed to the adjacent planting of local vegetable greenhouses, cotton fields, and with frequent flower transport and trade.
We were interested in the invasion pathways of MEAM1 and MED B. tabaci and the effect of substitutions on the internal genetic diversity of B. tabaci. We used five microsatellite primers to analyze the genetic relationship between MED and MEAM1 populations. There was low genetic diversity in B. tabaci in Xinjiang and low genetic variation between populations. The main sources of genetic variation between MED and MEAM1 B. tabaci were interspecific genetic differences and geographical isolation. There was no correlation between the geographical and genetic distances in B. tabaci. The reason is that the man-made transmission factor of B. tabaci is larger than that of the natural transmission. To block ongoing dispersal, strict detection and flower quarantine regulations need to be enforced. Based on the AMOVA and the F ST value, B. tabaci populations in Xinjiang had highly significant differentiation. The correlation between the geographical distance (LN Km) and the genetic distance of the whitefly populations was analyzed using Mantel test. Mantel's t-test demonstrated that there was no relationship between geographical distance and Fig. 5. Box diagram of assigned probabilities of migration from MED to MED, MED to MEAM1, MEAM1 to MED, and MEAM to MEAM1. Probabilities above 95% and below 5% were depicted as dots and 5% and 95% threshold as bars. Horizontal lines and in box indicate quantiles and medians. '+' expresses means. Capital letters mean significant levels at P < 0.001. genetic distance (Table 6). ΔK statistic divides MEAM1 into two subgroups and into four subgroups for MED. As the provincial capital, Urumqi may be the main source of MED whitefly transmission, while MEAM1 invasion pathways may be more varied. NJ-tree and PCoA could neither distinguish MED and MEAM1 nor separate the northern and southern Xinjiang into two clusters well (Fig. 3). This may be explained by the gene flow between a few MED and MEAM1 populations, such as Hetian-13-MED, Urumqi-2-MEAM1, and Changji-MED (Fig. 4). The unique characteristics of irrigation agriculture and oasis agriculture in Xinjiang, at the same time, the large desert areas surrounding cities, provide geographical barrier for whitefly transmission. Thus, human-based transmission becomes the main mode of transmission for B. tabaci. MED populations had a stronger gene flow toward MEAM1 populations. The probabilities in MEAM1 were mainly self-assigned, indicating that the MEAM1 had a reproductive disadvantage to MED. This can result from pesticide resistance (Roditakis et al. 2009 or asymmetric mating interaction (Liu et al. 2007).