Geographic patterns of genomic diversity and structure in the C4 grass Panicum hallii across its natural distribution

Abstract Geographic patterns of within-species genomic diversity are shaped by evolutionary processes, life history and historical and contemporary factors. New genomic approaches can be used to infer the influence of such factors on the current distribution of infraspecific lineages. In this study, we evaluated the genomic and morphological diversity as well as the genetic structure of the C4 grass Panicum hallii across its complex natural distribution in North America. We sampled extensively across the natural range of P. hallii in Mexico and the USA to generate double-digestion restriction-associated DNA (ddRAD) sequence data for 423 individuals from 118 localities. We used these individuals to study the divergence between the two varieties of P. hallii, P. hallii var. filipes and P. hallii var. hallii as well as the genetic diversity and structure within these groups. We also examined the possibility of admixture in the geographically sympatric zone shared by both varieties, and assessed distribution shifts related with past climatic fluctuations. There is strong genetic and morphological divergence between the varieties and consistent genetic structure defining seven genetic clusters that follow major ecoregions across the range. South Texas constitutes a hotspot of genetic diversity with the co-occurrence of all genetic clusters and admixture between the two varieties. It is likely a recolonization and convergence point of populations that previously diverged in isolation during fragmentation events following glaciation periods.


Species occurrence points
We compiled occurrence localities from direct field surveys, georeferenced herbarium specimens and public databases for biological records. All secondary occurrence points were first verified for probable errors. A total of 965 localities covering the full extant geographic range of Panicum hallii, of which 65 come from P. hallii var. filipes and 904 from P. hallii var. hallii. We deduplicated the presence data keeping one occurrence per raster cell grid using the function gridSample of the dismo R package (Hijmans, Phillips, Leathwick, & Elith, 2017). To reduce potential bias due to uneven sampling effort (Boria, Olson, Goodman, & Anderson, 2014;Kramer-Schadt et al., 2013) we performed a spatial filtering of occurrence points by randomly removing localities that were within 0.4 degrees of one another. We employed the R package spThin v.0.2.0 (Aiello-Lammens, Boria, Radosavljevic, Vilela, & Anderson, 2015) to retain the maximum number of presences possible under a minimum separation distance. We also included a set of true absence locations that consistently showed no records of P. hallii in field surveys between authum 2014 and summer 2017. Using the true absence set points, we created a buffer of 0.4 degrees of the absence points using the function st_buffer of the package sf (Pebesma, 2018) and then removed all presence points falling inside the buffer using the function erase.point of the package spatialEco v.1.3-1 (Evans, 2020) in R. The final set of presence points included 36 records for P. hallii var. filipes and 439 for P. hallii var. hallii.

Environmental variables
We obtained the explanatory variables for the Habitat Suitability Modeling (HSM) for the present day  from the WorldClim climate archive (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005) at 30 arc-second resolution (ca. 1 km at the Equator). This fine-scale resolution improves predictions for fixed such as plants (Guisan & Thuiller, 2005). In order to avoid multicollinearity-related noise, we performed variable selection using the function corselect of the R package fuzzySim (Barbosa, 2015) by calculating pairwise correlations among the variables and, among each pair of variables with Pearson correlation above 0.75, we keep the variable with the highest bivariate (individual) relationship with species presence/absence response in a binomial Generalized Linear model. All final explanatory data sets include eight bioclimatic variables for each fitted model, in all cases the variation inflation factors were below 10. We cropped the bioclimatic layers from latitude 13°to 44°N and longitude 88°to 120°W with raster R package (Hijmans, 2015), this geographic extent include the full extant range of P. hallii.

Pseudo-absences sampling
Because the HSM techniques that we implemented required both presence and absence data, we selected pseudoabsences as random localities throughout the full extant range of the species. For each independent modeling run for each variety, we set five samples of 10000 random localities outside a buffer of 0.4 degrees of the full set of presence/absence points. We generated the spatial buffer with st_buffer function and the transformed to a raster layer using the fasterize package in R (Ross, 2020).
• Occurence sampling Figure S2. Sampling points for presence, absence, and pseudo-absences locations for A. P. hallii var. filipes and B. P. hallii var. hallii. The plot shows a subset of 1000 out of 50000 pseudo-absence points sampled for the modeling processes. Background colors (green to white) represent Annual Precipitation values from 35 to 5375 mm.

Habitat Suitability Modelling and ensemble
We implemented an ensemble-HSM framework (Araujo & New, 2007) that use models obtained by several algorithms to account for the uncertainty associated with the HSM techniques. The analyses were implemented with biomod2 package (Thuiller, Georges, & Engler, 2014;Thuiller, Lafourcade, Engler, & Araújo, 2009). Individual HSMs were generated using seven modelling techniques: (1) , 2006). We generated independent HSM for P. hallii var. hallii, P. hallii var. hallii -West genetic cluster, P. hallii var. hallii -Tex-Mex genetic cluster and P. hallii var. filipes. We kept prevalence equal to 0.5. For each HSM, we run five cross-validation replicated where presence localities were divided in sets of 75% for training models and 25% for testing. For each HSM we performed five runs for each of the pseudo-absence sets, therefore we completed a total of 175 HSM for each P. hallii variety/cluster. To assess predictive performance of the HSM, we measured the threshold independent statistics area under the receiver operating characteristic curve (AUC; Phillips et al., 2006) and the True Skill Statistic (TSS; Allouche, Tsoar, & Kadmon, 2006). From the full set of obtained HSMs for each P. hallii variety/cluster we calculated an ensemble-HSM (i.e. consensus model) using a mean of suitability scores weighted by the TSS evaluation score of each individual HSM. To calculate de consensus, we only included HSMs with TSS > 0.75.

Species distribution hindcast based on habitat suitability models projections
We projected the fitted HSM to infer the distribution of P. hallii during three past conditions: Mid-Holocene (midHol; 6 Kya), Last Glacial Maximum (LGM;~22 Kya) and Last Inter-Glacial (LIG;~120-140 Kya) periods. To take into account the uncertainty related to different approaches to simulate past climate scenarios, for midHol and LGM period we projected the HSMs to three different global circulation models (GCMs): MIROC (Model for Interdisciplinary Research on Climate; Hasumi & Emori, 2004), CCSM4 (The Community Climate System Model Version 4; Gent et al., 2011), and MPI-ESM-P (Max-Planck-Institute Earth System Model; Giorgetta et al., 2013). For past climate projections, we approximated a consensus distribution map ranges to the areas that were predicted by one, two, or three binary-transformed ensemble HSMs for both different circulation model hindcasts. Map for HSM projection visualization were produced with the packages rasterVis and viridis packages in R (Garnier, 2018;Perpiñán & Hijmans, 2019). * Ensemble habitat suitability model projections Figure S8. Projections of the ensemble habitat suitability model of P. hallii var. filipes to Last Inter-Glacial, Last Glacial Maximum, mid-Holocene and Current climatic conditions. Color categorical scale in the paleo-climatic projections represents the climatic suitable area predicted by either one, two or three GCMs for the specific period above the TSS threshold (see Methods). Color continuous scale in the current projection represents the suitability values projected from the HSM above the TSS threshold. Background gray scale in the current projection represents the Annual Precipitation gradient of the studied area from 35 mm (dark) to 5375 mm (ligth). The red circle represents the P. hallii diversity hotspot as a radius of 200 km around San Antonio, Texas.

Panicum hallii var. hallii
• Individual model evaluations " Figure S9. Boxplots for the individual model evaluation statistics Area under the receiver operating characteristic curve (AUC) and the True Skill Statistic (TSS) for the habitat suitability modeling for P. hallii var. hallii.
• Bioclimatic variable selection Figure S10. Correlation matrix of bioclimatic selected variables for the habitat suitability modeling for P. hallii var. hallii.
• Individual variable importance Figure S11. Variable importance for the habitat suitability modeling for P. hallii var. hallii.
• Ensemble habitat suitability model projections Figure S12. Projections of the ensemble habitat suitability model of P. hallii var. hallii to Last Inter-Glacial, Last Glacial Maximum, mid-Holocene and Current climatic conditions. Color categorical scale in the paleo-climatic projections represents the climatic suitable area predicted by either one, two or three GCMs for the specific period above the TSS threshold (see Methods). Color continuous scale in the current projection represents the suitability values projected from the HSM above the TSS threshold. Background gray scale in the current projection represents the Annual Precipitation gradient of the studied area from 35 mm (dark) to 5375 mm (ligth). The red circle represents the P. hallii diversity hotspot as a radius of 200 km around San Antonio, Texas.