Recent range expansion in Australian hummock grasses (Triodia) inferred using genotyping-by-sequencing

Abstract The Australian arid zone (AAZ) has undergone aridification and the formation of vast sandy deserts since the mid-Miocene. Studies on AAZ organisms, particularly animals, have shown patterns of mesic ancestry, persistence in rocky refugia and range expansions in arid lineages. There has been limited molecular investigation of plants in the AAZ, particularly of taxa that arrived in Australia after the onset of aridification. Here we investigate populations of the widespread AAZ grass Triodia basedowii to determine whether there is evidence for a recent range expansion, and if so, its source and direction. We also undertake a dating analysis for the species complex to which T. basedowii belongs, in order to place its diversification in relation to changes in AAZ climate and landscapes. We analyse a genomic single nucleotide polymorphism data set from 17 populations of T. basedowii in a recently developed approach for detecting the signal and likely origin of a range expansion. We also use alignments from existing and newly sequenced plastomes from across Poaceae for analysis in BEAST to construct fossil-calibrated phylogenies. Across a range of sampling parameters and outgroups, we detected a consistent signal of westward expansion for T. basedowii, originating in central or eastern Australia. Divergence time estimation indicates that Triodia began to diversify in the late Miocene (crown 7.0–8.8 million years (Ma)), and the T. basedowii complex began to radiate during the Pleistocene (crown 1.4–2.0 Ma). This evidence for range expansion in an arid-adapted plant is consistent with similar patterns in AAZ animals and likely reflects a general response to the opening of new habitat during aridification. Radiation of the T. basedowii complex through the Pleistocene has been associated with preferences for different substrates, providing an explanation why only one lineage is widespread across sandy deserts.


Table of Contents
Additional details for divergence dating analyses

Chloroplast genome sequencing and assembly
For divergence time estimation across Poaceae, we obtained chloroplast genomes for members of all subfamilies, 26 downloaded from GenBank and 13 newly sequenced and assembled (Table S2). We aimed to include two or three genomes from each subfamily, four for Oryzoideae and 14 for Chloridoideae, but were only able to obtain one representative from each of Anomochloideae, Pharoideae and Puelioideae. Sample classification and subfamilial taxonomy follows Soreng et al. (2015). For a second analysis with denser sampling of Chloridoideae and Triodia, we included an additional ten Triodia and five other Chloridoideae genomes newly sequenced and assembled.
Silica-dried leaf material was ground in liquid nitrogen and genomic DNA extracted The remaining eight genomes were assembled using a beta-test version of the de novo assembler NOVOPlasty (available at: https://github.com/ndierckx/NOVOPlasty, verified August 2017) (Dierckxsens et al. 2017), using various conserved seed sequences where necessary. Genomes were validated by mapping raw reads onto the assembly in Geneious v.
6.1.8 (http://www.geneious.com, Kearse et al. 2012) to check for ambiguously aligned regions. The NOVOPlasty algorithm was applied to three of the genomes assembled using the first method above, to demonstrate its utility in Triodia. The three genomes assembled using the two methods differed in, at most, a few heterozygous/homozygous calls and/or the number of units in repeat regions.

Phylogenetic analysis
We created two datasets for phylogenetic and divergence dating analyses: (1)  To determine the optimum partitioning and model scheme, the loci for each dataset were run through PartitionFinder v. 2.0.0-pre13 (Lanfear et al. 2012), using models available for MrBayes, the Bayesian information criterion to select the optimal model, and a greedy search. The optimal partitioning scheme (see Table S4) was used in phylogenetic inference with RAxML v. 8.1.21 (Stamatakis 2014), which implements the GTRGAMMA model for all partitions. RAxML was run with a 100 replicate rapid bootstrap (Stamatakis et al. 2008) followed by a search for the maximum likelihood tree ('-f a' option). We also ran the partitions and best models under Bayesian inference in MrBayes v. 3.2.6 (Ronquist et al. 2012).
MrBayes was executed for three runs, each using four chains and two million generations, sampling every 500, with 10% of samples discarded as burn-in. Trees from each run were combined into a single file with a custom script, and a maximum clade credibility tree constructed using TreeAnnotator v. 2.4.0 (Bouckaert et al. 2014) and visualized in FigTree v.

Divergence dating
We included five fossils and a root prior (  (Bouckaert et al. 2014). The placement of the Prasad et al. (2005Prasad et al. ( , 2011 phytoliths is somewhat controversial (see Christin et al. 2014), but has been set at crown Oryzoideae in a recent dating analysis of grasses (Burke et al. 2016), consistent with recently discovered fossils of a spikelet from c. 100 Ma (Poinar et al. 2015). We report our results with the phytoliths at stem Oryzeae (common ancestor of Oryza and Microlaena in our tree), but we also ran analyses with them placed conservatively at stem BOP+PACMAD (Bambusoideae, Oryzoideae, Pooideae + Panicoideae, Aristidoideae, Chloridoideae, Micrairoideae, Arundinoideae, and Danthonioideae). In setting the prior probability distributions, maximum ages were arbitrarily chosen as 1.5 times the minimum age except for stem Chloridoideae, which was extended (making a less informative prior) to account for the evidence of C 4 grasses from the Oligocene (Urban et al. 2010). Internal node priors were implemented using log-normal distributions with a mean of 1, an offset to the minimum age, and standard deviations set such that 95% of the probability distribution was contained within the chosen intervals. The prior on the root of the tree was set as a uniform distribution from 0 to a hard maximum age of 125 Ma, under the assumption that lineages of Poaceae were not present prior to the earliest eudicot fossil (Doyle 1992). Anomochloa was fixed as the outgroup (see Bouchenak-Khelladi et al. 2008).
For dataset 1, BEAST was run for 120 million generations, sampling every 10,000, using the Yule model for branching under a relaxed uncorrelated log-normal clock (UCLN; Drummond et al. 2006), and using the optimal partitioning scheme from PartitionFinder.
BEAST had difficulty starting the analysis from a random tree because of the prior for placement of the 65 Ma phytoliths at stem Oryzeae, so a tree in Newick format was provided with appropriate branch lengths for that node to initialize the analysis. Rather than fixing models for the optimum partitions, we allowed the substitution models to be sampled as part For dataset 2, we calibrated the node age for crown Chloridoideae using the 95% highest posterior density (HPD) interval from the analysis of dataset 1. The prior distribution was set (see Table S4) with a gamma distribution and an offset, with the shape and scale set iteratively to closely approximate the 95% HPD from the first analysis. A recent molecular dating analysis of Triodia (Toon et al. 2015) found that a random local clocks model (RLC; Drummond and Suchard 2010) provided a better fit to account for rate changes in Triodia than the UCLN clock (see also Crisp et al. 2014), which produced younger ages in Triodia. We ran BEAST with the same settings as in the analysis of dataset 1, except using an RLC clock model, again using the optimal partition scheme obtained from PartitionFinder. Examining log files in Tracer showed some instability in the MCMC chain for some parameters, with stationarity reached around 60 million generations. To obtain higher ESS of parameters, we ran a second analysis with the same settings then combined the two runs and constructed a maximum clade credibility tree after discarding appropriate burn-ins (50% and 25%, respectively).
We ran two additional analyses to assess the impact of clock model and alignment size. First, we ran the analysis of dataset 2 again with the same settings, but this time using a UCLN clock model. Examining the log file in Tracer showed stationarity after approximately 30% burn-in, so this was used to construct a maximum clade credibility tree. Second, we assessed whether differences in the dates we obtained compared to those of Toon et al. (2015) might be due to the length of our alignment. We extracted the matK region from our alignment of dataset 2 and ran an analysis with just that locus, using the same settings as in the original analysis of dataset 2 (with an RLC clock model). Examining the run in Tracer showed wide variation in the prior and posterior (with low ESS) as well as a few anomalous samples well outside the range of the chain. We combined the results with those of a second run, after discarding burn-in and removing anomalous samples with a custom script, but still had low ESS for the posterior and prior, possibly reflecting low information content in the matK locus. Results from the matK analyses should therefore be interpreted cautiously.

RESULTS
The 28 newly-assembled grass chloroplast genomes were on average c. 135,000 bp (132,643-137,308 bp) in length. There were no major discrepancies in gene content or order.
After filtering the aligned genomic regions, dataset 1 comprised 79 loci (coding regions) with a combined length of 62,005 bp, while dataset 2 comprised 163 loci (coding regions, introns or spacers) with a combined length of 101,510 bp. Phylogenetic inference for both datasets resulted in highly resolved trees (Figs. S1-S4), with high (100%/1.00 posterior probability) support for all grass subfamilies except Pharoideae and Anomochloideae (single samples), and high support for Triodia as a genus and for the T. basedowii species complex.
Phylogenetic relationships of grass subfamilies are in agreement with the current understanding of evolution in Poaceae (Grass Phylogeny Working Group II 2012; Soreng et al. 2015). Topologies are identical between RAxML and MrBayes trees, aside from the lack of resolution at the base of the trees for dataset 1.
Divergence dates for important nodes across the grasses (Fig. S5, Table 1) are similar to previous studies with comparable placement of the controversial phytoliths (Prasad et al. 2011;Christin et al. 2014;Burke et al. 2016). Our estimate for the node age of crown Chloridoideae (41.7 Ma; used for a secondary calibration in analysis of dataset 2) is slightly older than previous estimates but is within the 95% highest posterior density (HPD) interval of those studies (Table 1). Alternative placement of the controversial phytoliths at stem BOP+PACMAD led to estimates for crown Chloridoideae c. 10 Ma younger (32.1 Ma; Table   S6).
The analysis focussing on Chloridoideae and Triodia chloroplast genomes allowed incorporation of non-coding regions into the alignment and produced a well-supported (posterior probabilities >0.99 for most nodes) chronogram for evolution in the group (Fig. 3).
Estimated node ages ( 1.4-2.0 Ma 95% HPD; Table S6), but still indicated a radiation beginning in the Pleistocene. As with a previous study on diversification timing for Triodia (Toon et al. 2015), the UCLN clock model produced younger estimates for node ages compared to the preferred RLC model (see Table 2). Restricting the alignment to matK resulted in older crown age estimates (Table 2) for Triodia (10 Ma) and the T. basedowii complex (3.8 Ma) compared to the full alignment; however, the estimates were still younger than those in Toon et al. (2015).
Our estimate for the timing of diversification should be interpreted cautiously, given our limited sampling of Triodia and the lack of a fossil calibration for our ingroup. Indeed, our estimate for the age of crown Triodia (7.0-8.8 Ma) is significantly younger than the 11.4-18.3 Ma estimate obtained in a recent study (Toon et al. 2015), although our estimate for stem Triodia is similar (see Table 2). This is surprising, especially given Toon et al. (2015) used a younger secondary calibration (c. 32 Ma) for crown Chloridoideae. There are several potential reasons for our dates being younger. First, we used a larger dataset (chloroplast genomic alignment vs. ITS + matK), which we expect to be more informative for inferring rates of evolution. When we used the matK region alone, we obtained an older estimate for crown Triodia (5.5-17 Ma), although still not as old as that of Toon et al. (2015). Second, our sampling of Triodia and outgroup Chloridoideae is sparser, which may have produced a node density effect (see Heath et al. 2008;Simon Ho, pers. comm.), limiting the number of inferred substitutions. While undersampling across a tree has been shown to result in younger ages, undersampling of a specific clade was not observed to have the same effect on the age of the subtending node (Linder et al. 2005), suggesting that undersampling in Triodia may not fully explain the discrepancy. Third, our sampling of Triodia lacks species from northern tropical regions, which were included in Toon et al. (2015) and which may have higher rates of evolution and could bias lower nodes to be older (see Beaulieu et al. 2015). Broader sampling of Triodia chloroplast genomes in the future may help to resolve whether our younger dates are primarily a result of limited sampling or a better estimate based on greater sequence data. Table S4.
Partitions and models for datasets 1 and 2. The number of loci refers to genes, introns or intergenic spacers and is the number of possible partitions that PartitionFinder assigned to that partition. Table S5.
Molecular dating calibrations and their parameterization in BEAST. In the case of the secondary calibration, the age range is the 95% highest posterior density (HPD) interval, with the gamma implementation that closely approximates that HPD. The secondary calibration (S) is obtained from the analysis of dataset 1. Footnotes: References for fossils are: (1) (Prasad et al. 2005(Prasad et al. , 2011, (2)   Node ages (Ma) from analyses of datasets 1 (clear) and 2 (greyed) with the phytoliths placed alternatively at stem BOP+PACMAD or at stem Oryzeae (preferred). HPD is the highest posterior density interval.

Figure S5.
Chronogram from the BEAST analysis of dataset 1, comprising chloroplast alignments from members of all grass subfamilies. Fossil calibration points are shown with black triangles and numbered as in Table S5.