Recent secondary contacts, background selection and variable recombination rates shape genomic diversity in the model species Anolis carolinensis

Gaining a better understanding on how selection and neutral processes affect genomic diversity is essential to gain better insights into the mechanisms driving adaptation and speciation. However, the evolutionary processes affecting variation at a genomic scale have not been investigated in most vertebrate lineages. Previous studies have been limited to a small number of model species, mostly mammals, and no studies have investigated genomic variation in non avian reptiles. Here we present the first population genomics survey using whole genome re sequencing in the green anole (Anolis carolinensis). This species has emerged as a model for the study of genomic evolution in squamates. We quantified how demography, recombination and selection have led to the current genetic diversity of the green anole by using whole-genome resequencing of five genetic clusters covering the entire species range. The differentiation of green anole’s populations is consistent with a northward expansion from South Florida followed by genetic isolation and subsequent gene flow among adjacent genetic clusters. Dispersal out-of-Florida was accompanied by a drastic population bottleneck followed by a rapid population expansion. This event was accompanied by male-biased dispersal and/or selective sweeps on the X chromosome. We show that the combined effect of background selection and recombination rates is the main contributor to the genomic landscape of differentiation in the anole genome. We further demonstrate that recombination rates are positively correlated with GC content at third codon position (GC3) and confirm the importance of biased gene conversion in shaping genome wide patterns of diversity in reptiles.

Introduction scenario with extensive gene flow, with high uncertainties for the time spent in isolation for 2 2 9 secondary contact models (SC). Models with the highest likelihood and AIC incorporated 2 3 0 genomic barriers to gene flow in GA, with approximately 20-30% of loci resisting introgression 2 3 1 from Florida and less than 10% resisting gene flow from GA. Secondary contacts are often associated with the emergence of genomic islands resisting gene 2 3 4 flow, that display higher differentiation than regions that have been homogenized. The diversity 2 3 5 of such islands is also higher, as they diverged and accumulated mutations before gene flow resumed. On the other hand, purifying selection at linked sites can also generate genomic islands, widespread impact of background selection in Florida, reducing diversity at linked sites over 2 4 0 ~60% of the genome. We therefore tested the role of low recombination in shaping the genomic 2 4 1 landscape of diversity and differentiation in green anoles in a context of secondary contact.

4 2
Recombination rates estimated by LDHat in the EF cluster were highly heterogeneous along Rozas's ZZ statistic, suggesting stronger linkage disequilibrium in the middle of chromosomes 2 4 6 arms.

4 7
We observed higher relative differentiation (measured by F ST ) in regions of low recombination  We assessed whether biased gene conversion had an impact on nucleotide composition in the 2 5 5 green anole by testing for correlation between recombination and GC content in coding DNA 2 5 6 sequences (CDS). We did observe a significant correlation between GC content and 2 5 7 recombination rates at all three codon positions, the strongest effect being observed for the 2 5 8 1 0 correlation between GC3 content and recombination rates (Spearman's rank correlation test, all 2 5 9 P-values < 2.2x10 -16 ; Figure 7). Since this codon position is less impacted by purifying selection, 2 6 0 our results are consistent with a joint role of purifying selection and biased gene conversion in 2 6 1 shaping nucleotide variation in the green anole genome. A dynamic demographic history has shaped the genomic landscape of differentiation. Green anole populations are strongly structured and it was hypothesized that successive splits have provided the opportunity for expansion northwards (Soltis et al. 2006 several events of isolation followed by secondary contact in Florida. Third, we found clear time when lowering sea levels would have facilitated colonization (Lane, 1994;Petuch, 2004).

7 7
Despite an old history of divergence, we found clear evidence for gene flow between taxa having 2 7 8 diverged in the last two million years. We argue that this makes the green anole a valuable model to study speciation (and its reversal) in the presence of gene flow, as well as identifying genomic  Here, we found evidence of locally reduced diversity due to background selection within Florida 2 8 2 in our ∂ a∂i models. We note however that models with the highest likelihoods for the Gulf Atlantic-Eastern Florida comparison included heterogeneous migration rates along the genome,  The role of background selection is further supported by the correlations we observed between 2 8 8 diversity, differentiation, and recombination (see below), although we acknowledge that some 2 8 9 regions of high divergence and low diversity may have been the targets of positive selection right 2 9 0 after population splits (Cruickshank and Hahn 2014). This does not preclude the existence of 2 9 1 heterogeneous gene flow along the genome, since we could not properly test the likelihood of 2 9 2 models incorporating both of these aspects at once. Instead, this highlights the important role of Hahn 2014), even in a context of secondary contact where genomic islands resisting gene flow 2 9 5 may be more expected. Recent years have seen a growing interest for the so-called "genomic islands of speciation", 2 9 7 regions that harbor higher differentiation than the genomic background (Feder and Nosil 2010; with other vertebrates. The green anole is a valuable system to understand local adaptation in    We detected a significant deviation from a balanced effective sex-ratio in the two populations 3 1 0 that recently expanded and colonized North America, with strongly reduced nucleotide diversity 3 1 1 on the X chromosome in Gulf Atlantic when compared to autosomal diversity (Sup. Fig. 2). This suggests that the number of females that contributed to the present diversity on the X have therefore led to a biased sex-ratio in the founding populations and smaller effective 3 1 8 population sizes on the X chromosome compared to unbiased expectations.

1 9
In Anolis roquet, male-biased dispersal is associated with competition, since males disperse more 3 2 0 when density increases and competition for females is stronger (Johansson et al. 2008). In Anolis 3 2 1 sagrei, smaller males tend to disperse more while females are more likely to stay in high quality   in an artificially biased sex-ratio. Sexual or natural selection may be responsible for this pattern,  Selection and recombination shape nucleotide composition and diversity at linked sites.

4 0
We observed strong heterogeneity in recombination rates along the green anole genome. Our claiming that this assumption does not hold upon closer scrutiny. association that we observed between recombination rate and GC3 content confirms the Anolis carolinensis is an important model organism for biomedical and physiological studies, and benefits from a complete genome sequence that can be used to bridge multiple mechanisms  However, studying the genetic bases of adaptation in anoles cannot be properly addressed  Moreover, the study of intraspecific genetic variation holds promise to address questions at 3 6 7 larger evolutionary scale, such as the role of demography, selection and incompatibilities in the differentiation. This study provides a valuable background for precise quantification of the reptiles. Whole genome sequencing libraries were generated from Anolis carolinensis liver tissue samples Core (http://nyuad.nyu.edu/en/research/infrastructure-and-support/core-technology- average of 1,519,339,234 read pairs, and after quality trimming 93.3% were retained as paired 3 9 1 reads and 6.3% were retained as single reads. Sequencing data from this study have been  SNP detection with the assistance of the NYUAD Bioinformatics Core, using NYUAD variant 3 9 8 calling pipeline. Briefly, the quality-trimmed FastQ reads of each sample were aligned to the 3 9 9 AnoCar2.0 genome using the BWA-mem short read alignment approach (Li and Durbin, 2009) 4 0 0 and resulting SAM files were converted into BAM format, sorted and indexed using SAMtools  (Table S1). Each re-sequenced genome was then processed coverage, no more than 40% missing data across all 29 samples, a minimum quality score of 20 4 1 2 per site, and a minimum genotype quality score of 20. To assess genetic structure, we conducted a clustering analysis using discriminant analysis of describing variance in SNP datasets, then performs a discriminant analysis on these PC axes to 4 1 8 identify genetic groupings. We retained two principal components and two of the linear 4 1 9 discriminants. We also described relationships between individuals with the same dataset using rapid bootstraps to assess support for the phylogeny with the highest likelihood.

2 5
We further quantified patterns of diversity and the shape of the allele frequency spectrum in each  An advantage of this algorithm is that it is phase-insensitive, limiting the propagation of phasing  Atlantic, and Carolinas genetic clusters. We sexed individuals by taking advantage of the expected relationship between depths of 4 6 5 coverage at autosomal and sex-linked loci in males and females. Since females are XX and males 4 6 6 XY, the latter are expected to display a two-times lower coverage at X-linked sites compared to 4 6 7 autosomal loci (Sup. Fig 1). We then adjusted allele frequencies for all X-linked scaffolds, S i = 1 -2 | p i -0.5 |. scenarios with both asymmetric migration rates and heterogeneous population sizes but were 4 9 7 unable to reach convergence. Overall, we compared 34 scenarios combining these features, using harmonic mean of the SMC++ estimates before splitting time for all pairs of populations.

1 8
Estimating recombination rates 5 1 9 We used the LDHat software (McVean et al. 2002) to estimate effective recombination rates VCFtools (option -ldhat). Since LDHat assumes that samples are drawn from a panmictic module. The chain was run for 1,000,000 iterations and sampled every 5000 iterations with a 5 3 0 large block penalty of 20 to avoid overfitting and minimize random noise. The first 100,000 generations were discarded as burn-in. Convergence under these parameters was confirmed by

3 5
To assess whether selection and low recombination had an effect on diversity and differentiation, 5 3 6 we computed two measures of divergence (F ST and d XY ) over non-overlapping 100kb windows 5 3 7 for the three divergent Floridian lineages. Comparison between those two statistics for a given We extracted CDS sequences for all green anole genes from the ENSEMBL database (available recombination rates were estimated in R. data. PLoS Genet. 14:1-32. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Trends Ecol. Evol. 29:51-63.   between the five genetic clusters previously inferred. We assume a mutation rate of 2.1x10 -10 /bp/generation and a generation time of one year. B: Average branch lengths obtained from autosomal data and effective sex-ratios (ξ) inferred from KIMTREE. A set of 5,000 autosomal and 5,000 sex-linked markers were randomly sampled to create 50 pseudo-replicated datasets on which the analysis was run. The analysis was run on the three most closely related populations.
Pie charts indicate the proportion of replicates for which we observed significant support (Si<0.01) in favor of a biased sex-ratio. comparisons. Both models fit the observed datasets as indicated by the similar spectra between observation and simulation. The "2N" suffix means that background selection was added to the base model by modelling heterogeneous effective population sizes across loci. The "2M2P" suffix means that heterogeneity in gene flow was incorporated into the model. The "ex" suffix means that exponential population size change was introduced in the base model.    Supplementary Table   Table S1. Samples origin, sequencing depth and quality statistics. Figure