Uncovering the phylogeography of Schinus terebinthifolia in South Africa to guide biological control

Abstract Schinus terebinthifolia is a problematic invasive alien plant (IAP) in South Africa that is a high priority target for biological control. Biological control has been implemented in the states of Florida and Hawaii (USA), where S. terebinthifolia is also an IAP. Phylogeographic work determined that there have been multiple introductions of two lineages (haplotype A and B) into the USA. Haplotype A was introduced to western Florida and Hawaii, while haplotype B was introduced to eastern Florida. Haplotypes A and B have subsequently hybridized in Florida, resulting in novel plant genotypes. Biological control agents in the USA are known to vary in efficacies on the two different haplotypes and hybrids. This study used molecular techniques to uncover the source populations of S. terebinthifolia in South Africa using chloroplast DNA and microsatellites. Populations from the introduced ranges in Florida (east, west and hybrids) and Hawaii were included (n = 95). All South Africa populations (n = 51) were found to be haplotype A. Microsatellite analysis determined shared alleles with western Florida and Hawaiian populations. The likely source of South African S. terebinthifolia was determined to be western Florida through the horticultural trade. These results will help guide a biological control programme to source agents that perform well on these populations in the USA. Furthermore, the presence of only one haplotype in South Africa highlights the need to ensure no further introductions of other haplotypes of the plant are made, in order to avoid similar hybridization events like those recorded in Florida.


Introduction
Management of biological invasions can be assisted by knowing the introductory history of a species (Gaskin et al. 2011;Hopper et al. 2018). Genetic analyses are often the only way to elucidate this phylogeography, as records on the timing and location of introductions are usually lacking or misleading (Estoup and Guillemaud 2010). Phylogeographic studies can also give important insights into the number of independent introductions and their subsequent expansion and patterns of gene flow (Shirk et al. 2014). This information can play a crucial role in helping to develop management strategies for invasive alien plants (IAPs), especially for the development of weed biological control (hereafter referred to as biocontrol) programmes (Goolsby et al. 2006;Paterson et al. 2009;Rollins et al. 2009). Such genetic studies have therefore become standard practice in biocontrol programmes (Goolsby et al. 2006;Madeira et al. 2007;Paterson et al. 2009;Paterson and Zachariades 2013;Canavan et al. 2017;Kwong et al. 2017).
Schinus terebinthifolia (Anacardiaceae) (Brazilian peppertree) is a tree native to subtropical South America that has been widely distributed worldwide primarily for its ornamental value (Wheeler et al. 2016). The tree has subsequently become invasive in over 20 countries and is considered one of the world's worst invasive trees (Lowe et al. 2000). In the USA, S. terebinthifolia has invaded Florida and Hawaii and is one of the most widespread and destructive IAPs in these regions (Schmitz et al. 1997;Rodgers et al. 2014). A biocontrol programme was initiated in the USA and the pre-introductory evaluations involved genetic work to determine the introductory history of S. terebinthifolia (Williams et al. 2005). Using nuclear and chloroplast loci it was determined that in Florida there have been multiple introductions of two lineages known as haplotype A and B that originate from different source areas in Brazil (Williams et al. 2005(Williams et al. , 2007. Haplotype A is predominantly found in western Florida while haplotype B is found in eastern Florida. Intraspecific hybridization between the two haplotypes has subsequently occurred and resulted in novel hybrid genotypes (Williams et al. 2005). This has increased genetic variability in the invaded range, and facilitated rapid adaptations to new niches (Williams et al. 2007;Geiger et al. 2011;Mukherjee et al. 2012).
Phylogeographic studies can also assist a biocontrol programme in elucidating how plant genotype may influence biocontrol agent performance. Determination of the evolutionary history of a plant in its invaded range can help lead researchers to the area of origin in the native range that share the same haplotype. Biocontrol agents often have greater performance on target plants that they share an evolutionary history with (Lambert and Casagrande 2007;Bhattarai 2014;Cronin et al. 2016). Goolsby et al. (2006) demonstrated this sensitivity with populations of an eriophyid mite collected for biocontrol of Lygodium microphyllum (Lygodiaceae). Mites collected from plants from the source of the invasive population performed better on the invasive genotype and had reduced fitness and efficacy on plant genotypes from other parts of the native distribution (Goolsby et al. 2006). Similarly, the biocontrol programme on S. terebinthifolia in the USA has determined that biocontrol agents vary in their efficacies according to plant haplotype . The three agents (three established in Hawaii and two in mainland USA) were found to have 'fine tuned' adaptation to specific populations and genotypes of their hosts plants (Forno 1992;Cuda et al. 2012). The occurrence of two different plant genotypes from different source populations and intraspecific hybrids between these genotypes is therefore expected to result in varying levels of control across populations (Manrique et al. 2008Geiger et al. 2011).
Schinus terebinthifolia is an IAP in South Africa and is listed as a category 1b species under the National Environmental Management: Biodiversity Act (NEM:BA) (Act 10 of 2004) in the Eastern Cape, Limpopo, Mpumalanga and KwaZulu-Natal provinces and category 3 elsewhere (DEA 2014). In South Africa, S. terebinthifolia has invaded across a range of biomes, mostly along the coastal areas of the country and is particularly problematic within the threatened coastal thornveld in the KwaZulu-Natal Province (van der Linden et al. 2008;Martin et al. 2020).
As biocontrol has already been implemented in the USA, it is considered one of the options for management of S. terebinthifolia in South Africa (Byrne et al. 2021;Canavan et al. 2021). The history and pathways of introduction of S. terebinthifolia in South Africa are largely unknown, and since the origin of the invasive populations has proven important for biocontrol in the USA, it is assumed that understanding the origin of the invasive populations in South Africa may be important in selecting effective biocontrol agents. This aim of this study was to determine the introductory history of the invasive populations of S. terebinthifolia in South Africa using chloroplast DNA (cpDNA) sequencing and microsatellites.

Study sites
Fifty-one trees were sampled from across the plant's distribution in South Africa ( Fig. 1; see Supporting Information-Table S1). A subset (n = 95) of the 592 Florida plants analysed in Williams et al. (2007) and 45 samples from Hawaii were included [see Supporting Information- Table S2]. The 95 samples from Florida were chosen to represent the three distinct populations that now occur in that region from the two original introductions of haplotype A into western Florida and haplotype B into eastern Florida (i.e. east, west and hybrid). Samples were selected from STRUCTURE analysis that had the highest shared ancestry with each group (q-value) (eastern vs. western q > 0.85) as well as hybrids (average q in eastern/western introductions = 0.48/0.52) (Williams et al. 2007).

DNA extraction
Fresh leaves were dried in silica gel for all South African samples according to the protocol of Chase and Hills (1991). DNA was extracted using the Qiagen DNeasy® Plant Mini Kit (Valencia, CA, USA) following the manufacturer's protocol. The protocol was adjusted in that leaf tissue was crushed under liquid nitrogen prior to the extraction. The samples included from Williams et al. (2007) were not included in this step as they were received as DNA extracts.

Sequences
This method was derived from Hamilton (1999) and Williams et al. (2005). Chloroplast DNA intraspecific variation was assessed by amplifying the trnS-trnG intergenic spacer region [see Supporting Information- Table S3]. The polymerase chain reaction (PCR) reaction contained 0.4 μL of each of the primers at a concentration of 0.01 μM, 10 μl of Promega Master Mix (Madison, WI, USA), 0.8 μL of Promega MgCl, 3 μL of template DNA per reaction. A further 5.4 μL of Promega nuclease-free water was added to reach a final volume of 20 μL. Amplifications were performed in a T100™ thermal cycler (Bio-Rad, South Africa). The PCR protocol had a 5-min denaturing step at 96 °C followed by 40 cycles of 96 °C for 45 s, annealing temperature of 52 °C for 1 min and 72 °C extension for 30 s. Polymerase chain reaction products were sequenced at Macrogen Corp., Korea using a DNA Analyser 3730x (Applied Biosystems™, Foster City, CA, USA). Samples included from the introduced range (Florida and Hawaii) had already had their haplotypes determined and reported in Williams et al. (2005). The cpDNA analysis was therefore only conducted on South African samples.
Chloroplast DNA chromatograms were examined, and contiguous sequences were assembled and manually edited in ChromasLite™ version 2.6.4. An alignment of sequences was done in GeneStudio™ version 2.2.0.0 and included all cpDNA haplotypes for S. terebinthifolia that were downloaded from GenBank (accession numbers: AY928398-AY928407 (Williams et al. 2005)) with haplotypes confirmed by visual inspection to have all base pairs matched.

Microsatellites
Microsatellite analysis was performed on the South African samples (n = 51) and two samples from Williams et al. (2005) [see Supporting Information- Table S1]. The remaining samples included from Florida and Hawaii were later included with the data set by standardizing the genotypes using the two samples for alignment. Six S. terebinthifolia microsatellite loci were grouped into two multiplex reactions according to the protocols of Williams et al. (2002Williams et al. ( , 2005 [see Supporting Information- Table S4]. Applied Biosystems Inc., UK added fluorescent labels to the primers; stAAG13 with 6-FAM dye, stAAG14 with VIC dye, stGGT39F with NED dye, stCTCCTT01 with PET dye, stAAT1 with 6-FAM dye and stAAT16 with NED dye (DS-33 Matrix Standard (G5 dye set)). The PCR contained ~10 ng of DNA, 12.5 μL of Q5 High-Fidelity 2× Master Mix and 2.5 μL of each primer at concentrations of 10 μM to make a total volume of 25 μL. Reactions were cycled in a Mastercycler® nexus (Eppendorf, Germany), 230 V/50-60 Hz using the simulated tube function. Cycling parameters were one cycle at 98 °C for 30 s, followed by 35 cycles of 10 s at 98 °C, 20 s at 50 °C (multiplex A) or 55 °C (multiplex B), 20 s at 72 °C and a final cycle at 72 °C for 2 min. Capillary electrophoresis was run on an ABI 3500XL (Applied Biosystems™, Foster City, CA, USA) genetic analyser at Inqaba Biotec™.
Chromatogram alignment was first constructed using Geneious version 11.1.5 (Kearse et al. 2012). Peak size markers were all aligned to ensure amplified peaks could be aligned by fragment size. The samples from Florida and Hawaii from Williams et al. (2007) were then included with the 51 South African samples. The two samples included in the study were used to standardize allelic calls from the results in Williams et al. (2005). The data set was entered into a co-dominant matrix and analysed using GenAIEx version 6.5 (Peakall and Smouse 2012).
Observed and expected heterozygosity and F IS (inbreeding coefficient) were calculated. We used HP-RARE (Kalinowski 2005) to calculate allelic richness to standardize comparisons across sample sizes. The Hardy-Weinberg equilibrium (HWE) was calculated for each sampling using 'pegas' package (version 0.13) in R (Paradis 2010). Genotypic linkage disequilibrium was determined using the genepop package (version 1.1.7) in R (Raymond and Rousset 1995) and the significance threshold was adjusted using Bonferroni correction to account for multiple comparisons (Sokal and Rohlf 1995).
Discriminant analysis of principal components (DAPC) and Bayesian-based STRUCTURE analysis were used to investigate the pattern of population structure. The nuclear ancestry of all 191 samples was tested based on the six microsatellite loci using a Bayesian genetic clustering algorithm implemented in STRUCTURE version 2.3.4. (Pritchard et al. 2000). An admixture model was used that assumed correlated allele frequencies, with no location prior with 10 iterations for each run. Each run consisted of 1 000 000 MCMC (Monte Carlo Markov chain) steps and a burn-in period of 50 000. STRUCTURE can give misleading results both for the number of populations and individual ancestry if there is uneven sampling across clusters (K; Puechmaille 2016; Wang 2017). We used the recommendations of Wang (2017) and set the prior for admixture to allow α to vary between clusters and we decreased the initial α from 1.0 to 0.2. We ran 10 independent runs for K = 1-5. The most likely K was identified using the method of Evanno et al. (2005). We then used CLUMPP 1.1.1 (Jakobsson and Rosenberg 2007) to average across the 10 runs for the most likely K. Membership assignment of each population was estimated as (q), the ancestry coefficient, which varies on a scale from 0 to 1.0, with 1.0 indicating full ancestry with a certain population.
A DAPC analysis was run to complement the STRUCTURE analysis as it is a non-parametric approach and some of the microsatellite loci and populations significantly deviated from the HWE [see Supporting Information- Fig. S1] (Jombart et al. 2010). Discriminant analysis of principal components was used to identify and describe clusters of genetically related individuals, and implemented in the R package adegenet 4.0.2 (Jombart and Ahmed 2011). Discriminant analysis of principal components uses PCA to transform the data to perform a discriminant analysis on the principal components (PCs). This allowed for graphical assessment of the population structure and also calculated the proportion of successful reassignment of individuals to indicate how genetically distinct each population is. A prior step to the DAPC was to determine the optimal number of PCs to retain. We used two methods to calculate this value by predicting the a-score and performing a cross-validation method. The optim.a.score function (50 replicates a-scores were calculated) was used to calculate the maximum a-score according to Jombart et al. (2010). This measured the discriminating power and stability of DAPC, and was obtained by random permutation of group memberships and comparison of the estimated probabilities of group assignment for the permuted data. The cross-validation approach was performed using the function xvalDapc to fit a DAPC to a subset of the data (a 'training' set) and assessing how accurately the model predicts population membership for the samples that were not in the training set. The method that indicated the lowest number of PCs that are sufficient for accurate assignment was used for the subsequent DAPC analysis.

Chloroplast DNA
All S. terebinthifolia sampled in South Africa (n = 51) were determined to be cpDNA haplotype A (GenBank accession number for haplotype A: AY928398). Forty-four samples from Florida were haplotype A and all samples from Hawaii were haplotype A [see Supporting Information- Table S2]. The remaining samples from Florida were haplotype B, which were mostly found in the east of the State and from hybrid populations. The A and B haplotypes were distinguished by variance at five nucleotide positions out of 716 base pairs (Williams et al. 2005).

Microsatellites
Based on their microsatellite phenotypes, STRUCTURE HARVESTER analysis determined that all the samples clustered into two populations (k = 2) ( STRUCTURE analysis the South African, Hawaii and western Florida populations were grouped together (Fig. 3). The South African samples grouped more closely with the western Florida populations compared to the Hawaiian group. The samples from eastern Florida were least genetically similar, grouping away from all other regions. The hybrid populations were found to group between the two main clusters.
South Africa had a total of 15 alleles across all loci and all of these were found in Florida (31 alleles) and in the individuals from western Florida (21 alleles). South African populations were also closely related to the Hawaiian introduction with 11 shared alleles that are also in western Florida.
South  S1]. There was no evidence of significant linkage disequilibrium after Bonferroni correction for multiple comparisons (P < 0.003) for any of the locus pairs [see Supporting Information- Table S5].
A separate STRUCTURE analysis of only South African samples grouped all the samples into three populations (k = 3) [see Supporting Information-Figs S5 and S6]. However, the low Delta K values (y-axis 0-4) indicate that there is no distinct clustering of populations in South Africa. Further, there was no geographic structuring according to allelic phenotypes. Some samples (n = 9) shared the same allelic profiles, despite having considerable geographic distances, for example sample 46 (Wilshire, Eastern Cape Province) and 18 (Nyalaza, KwaZulu-Natal Province).

Discussion
All S. terebinthifolia populations surveyed in South Africa were found to be haplotype A and therefore, unlike the invasive populations in Florida, there is no evidence of multiple introductions of genetically distinct lineages. To date, 10 haplotypes of S. terebinthifolia have been recorded in the native range (Williams et al. 2005). Haplotype A which has been recorded from the native range in Balneário Camboriú, Santa Catarina in southeast Brazil has been determined to be the most common haplotype in the introduced ranges in Hawaii, Florida, Texas, U.S. Virgin Islands (Williams et al. 2005) and now South Africa. Williams et al. (2005) found populations of S. terebinthifolia in the native range to be more strongly structured according to geographic distance and this was attributed to more limited seed dispersal compared to the exotic range in Florida. Like in Florida, South African populations were not found to be geographically structured and further there was no clear clustering across all samples. In Florida, this has been attributed to two factors: (i) human-driven movement of plants for the ornamental trade and (ii) the long-distance dispersal of seeds by migrating American robins (Turdus migratorius) (Williams et al. 2005). These same vectors of seed dispersal are likely to have occurred in South Africa. Schinus terebinthifolia has also been intentionally planted throughout the country for ornamental, hedging and shade/shelter purposes (Panetta and McKee 1997;Kuruneri-Chitepo and Shackleton 2011). In addition, four common native frugivorous birds were determined to readily consume the fruits of S. terebinthifolia which reduces germination time and enhances dispersal of seeds (Dlamini et al. 2018). South African populations were also found to have higher genetic diversity compared to Hawaii (higher levels of heterozygosity, HWE) indicating that they are not constrained by a genetic bottleneck which will further enhance their invasive ability.
In South Africa, Hawaii and western Florida, haplotype A is the dominant cpDNA haplotype. Previous studies have determined that populations in Florida have a subset of microsatellite alleles from source regions in South America and Hawaii (Williams et al. 2005). However, South Africa, Texas, the U.S. Virgin Islands and other Caribbean islands have a subset of monomorphic alleles from Florida (Williams et al. 2005;unpubl.). It therefore seems probable that Florida was the 'beachhead' for introductions into South Africa and the other adventive ranges rather than a separate introduction directly from the native range. The USA is the world's foremost producer of and market for nursery and floriculture plants (Niemiera and Von Holle 2009), particularly in the post-World War II economic boom which increased demand for commercial shipping of ornamental plants (Reichard and White 2001). Schinus terebinthifolia was first recorded in 1919 in South Africa (Potts 1919) and this is around the same time of introductions of the plant into Hawaii (de Freitas et al. 2020). These introductions to both Hawaii and South Africa would have occurred before plants in Florida became admixed through hybridization of the two haplotypes (Williams et al. 2007;Mukherjee et al. 2012).
Schinus terebinthifolia is also considered invasive in areas that have not been sampled including Spain, Portugal, Australia, New Zealand and some islands in the Pacific and Indian Ocean (CABI 2021). These areas could conceivably also be sources for the South African introduction although it is not clear if these populations existed when S. terebinthifolia was introduced into South Africa. Nevertheless, even if one of these unsampled areas was the source for South African S. terebinthifolia, this study suggests that they probably would have been genetically very similar to the Florida introduction and so our general conclusions do not change.
The presence of only one source population for S. terebinthifolia in South Africa has important implications for management of the plant. Without the presence of additional genotypes in the country there is no potential for admixture or hybridization. Given that hybrids have the potential to have novel phenotypes that can result in higher survival, growth rates and biomass (Geiger et al. 2011), it is imperative that legislation is maintained so that no further introductions of S. terebinthifolia are allowed into the country that may be the source of new genotypes.
Schinus terebinthifolia has not yet had a biocontrol programme initiated in South Africa. A native herbivore, Megastigmus transvaalensis (Hymenoptera: Torymidae), uses the plant as a host but has not been found to be suitably damaging to the seeds to impact populations and therefore additional agents are required (Magengelele and Martin 2021). A recent study to determine which IAPs to target for biocontrol in South Africa ranked S. terebinthifolia within the top 20 species (out of 299) that are high priority targets (Canavan et al. 2021). Brazilian peppertree would represent a transfer project for South Africa given that biocontrol has been implemented elsewhere and agents are available to be directly sourced for host specificity testing (Byrne et al. 2021). The outcome of this genetic study will help guide the selection of the most appropriate biocontrol agent.
The absence of any hybridization of haplotypes and comparatively lower genetic diversity than in Florida and the native range enhances the likelihood of achieving successful biocontrol. Plants that have been changed through hybridization and artificial selection are often more difficult to control using biocontrol as they can differ from the plants in the native range where natural enemies have evolved (Paterson et al. 2009;Urban et al. 2011). This has been the case for biocontrol efforts in Florida whereby the attributes of hybrid populations have resulted in varying levels of control across populations depending on genotype. For example, the introduced seed parasitoid, M. transvaalensis, was found to perform poorly on hybrids in Florida (Geiger et al. 2011). Furthermore, variation between haplotypes in S. terebinthifolia has been found to influence biocontrol agent efficacy. Populations of the recently released, Pseudophilothrips ichini (Thysanoptera: Phlaeothripidae) from two geographic locations in Brazil-Ouro Preto, Minas Gerais state (Ouro Preto thrips) and Salvador, Bahia State (Salvador thrips)differed in their impact on the two haplotypes in Florida .

Conclusions
This study determined that populations from western Florida are the likely source of S. terebinthifolia in South Africa. Uncovering this pathway of introduction has important implications for management of the plant. Firstly, the presence of only one genotype in the country ensures that hybridization cannot occur and stringent regulation of the plant should continue to ensure new genotypes are not brought in through the horticultural trade.  (Bhattarai et al. 2017). These agents should be considered in South Africa and the choice of which agent to use should be guided by climatic matching (Martin et al. 2020) and success of establishment on the correct haplotype in Florida.

Supporting Information
The following additional information is available in the online version of this article- Table S1. Sampling sites for the collection of Schinus terebinthifolia genetic material. Table S2. Haplotype information and microsatellite genotype data for all samples including the subset (n = 95) of the 592 Florida plants analysed in Williams et al. (2007) and 45 samples from Hawaii. Table S3. Primer sequences used to obtain Schinus terebinthifolia haplotypes. Table S4. Primer sequences used in this study and their allelic size range for six microsatellite loci of Schinus terebinthifolia. Table S5. P-values for linkage disequilibrium of each pair of loci across all populations (Fisher's method). Using the Bonferroni correction (P < 0.003) there were no significant deviations. Figure S1. Heat map showing P-values of deviations from the Hardy-Weinberg equilibrium (HWE) across populations and marker (P <= 0.05). FL East = East Florida, FL West = West Florida, FL Admixed = hybrid populations in Florida, HW = Hawaii, SA = South Africa. Figure S2. Delta K values showing the ideal number of populations as k = 2 based on 191 samples of Schinus terebinthifolia from South Africa, Hawaii and Florida using six microsatellite primer pairs and the Evanno method implemented in STRUCTURE HARVESTER program according to Earl and von Holdt (2012). Figure S3. The a-score optimization showing the optimum number of retained principal components (PCs) is 10. Figure S4. The discriminant analysis of principal components (DAPC) cross-validation showing the optimum number of retained principal components (PCs) is 25. Figure S5. Genetic population structure of 51 individuals of Schinus terebinthifolia from populations in South Africa, based on Bayesian clustering analysis of microsatellite loci with STRUCTURE (Pritchard et al. 2000). According to the Evanno method, three populations were inferred [see Supporting Information- Fig. S6]. Sample ordered according to Q-value. The red cluster corresponds to population 1, the green cluster corresponds to population 2 and blue corresponds to population 3. The values in ordinate the shared ancestry according to percentage membership into each population. Figure S6. Delta K values showing the ideal number of populations as k = 3 based on 51 samples of Schinus terebinthifolia from South Africa using six microsatellite primer pairs and the Evanno method implemented in STRUCTURE HARVESTER program according to Earl and von Holdt (2012).