Shared evolutionary processes shape landscapes of genomic variation in the great apes

Abstract

For at least the past five decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions are driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach that can shed light on the complex balance of forces that have shaped genetic variation.

Landscapes of divergence can be correlated by definition, because they can share part of their histories. In most of our analyses (except for Figure S2), we do not show the correlations for such cases, but below we describe how this sharing would affect correlations, using a simplified theory. For example, in Figure 3, $d_{VX}$ and $d_{XY}$ share the branch X; depending on how the length of the branch X compares to the total tree length, these two landscapes are bound to be correlated. Assuming that mutations follow a Poisson process and that coalescences happen instantaneously, we derive the following. There are three non-overlapping parts in the tree between these: the branch from the XY ancestor to X, with length $E[\tau_X] = T_{XY}$; the branch from the XY ancestor to Y, with length $E[\tau_Y] = T_{XY}$; and the branch from V to the XY ancestor, with length $E[\tau_V] = 2T_{VWXY} - T_{XY}$. If we just consider the genealogical definition of divergence and assume $d_{VX} = \tau_V + \tau_X$ and $d_{XY} = \tau_X + \tau_Y$ (i.e., ignoring the contributions of ancestral diversity to divergence), then the overlap can be summarized by two quantities: $p_{d_{VX}} = \frac{T_X}{T_X + T_V}$, the proportion of $d_{VX}$ that is shared with $d_{XY}$, and $p_{d_{XY}} = \frac{T_X}{T_X + T_Y}$, the proportion of $d_{XY}$ that is shared with $d_{VX}$.
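
As a rough numerical illustration of the shared-branch argument, the sketch below assumes that mutation counts on the three branches are independent Poisson draws whose means are the expected branch lengths. Under that assumption (ours, for illustration only), the correlation between the two divergences works out to the square root of the product of the two shared proportions. The branch lengths in the example and all function names are hypothetical and are not taken from our analyses.

```python
import math
import random


def shared_proportions(t_v, t_x, t_y):
    """Proportion of d_VX and of d_XY that lies on the shared branch X."""
    return t_x / (t_x + t_v), t_x / (t_x + t_y)


def expected_correlation(t_v, t_x, t_y):
    """Correlation implied by independent Poisson counts per branch.

    Cov(d_VX, d_XY) = Var(tau_X) and Var(tau_i) is proportional to T_i,
    so the correlation is sqrt(p_dVX * p_dXY). This is an assumption of
    the sketch, not a formula quoted from the text.
    """
    p_vx, p_xy = shared_proportions(t_v, t_x, t_y)
    return math.sqrt(p_vx * p_xy)


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulated_correlation(t_v, t_x, t_y, n_windows=20000, seed=1):
    """Simulate per-window mutation counts and correlate d_VX with d_XY."""
    rng = random.Random(seed)
    d_vx, d_xy = [], []
    for _ in range(n_windows):
        m_v, m_x, m_y = (poisson(rng, t) for t in (t_v, t_x, t_y))
        d_vx.append(m_v + m_x)  # d_VX = tau_V + tau_X
        d_xy.append(m_x + m_y)  # d_XY = tau_X + tau_Y
    return pearson(d_vx, d_xy)


if __name__ == "__main__":
    # Hypothetical lengths: T_XY = 5 and T_VWXY = 15, so T_V = 2 * 15 - 5 = 25.
    print(expected_correlation(25, 5, 5))
    print(simulated_correlation(25, 5, 5))
```

The simulated value should approach the analytical one as the number of windows grows; real landscapes will deviate because ancestral diversity and variation in local mutation rate are ignored here.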

Great apes dataset
Figure S2: Correlations between landscapes of diversity and divergence for comparisons with branch overlap. For example, diversity in humans and divergence between humans and bonobos share part of their history. Each point on the plots corresponds to the (Spearman) correlation between two landscapes of diversity/divergence, computed on 1 Mb windows across the entire genome. Correlations were split by the type of landscapes compared ($\pi$-$d_{XY}$, $d_{XY}$-$d_{XY}$). The x-axis is a metric of expected branch overlap between the landscapes. See subsection 4.1 for more information. Note that species with low $N_e$ (bonobos, eastern gorillas and western chimps) have a different point shape. The colors reflect the number of species involved in the comparison. For example, the comparison between human-western gorilla and eastern chimp-Sumatran orangutan divergences includes four different species. On the other hand, the comparison between human-western gorilla and human-Sumatran orangutan divergences includes just three species.
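
For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Spearman correlation between two windowed landscapes. It assumes each landscape is a list with one value per window, aligned by position, and that windows with missing values (`None`) in either landscape are dropped; the missing-data handling and the function names are illustrative assumptions, not a description of our pipeline.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(landscape_a, landscape_b):
    """Spearman correlation over windows present in both landscapes."""
    pairs = [(a, b) for a, b in zip(landscape_a, landscape_b)
             if a is not None and b is not None]
    a_vals = [a for a, _ in pairs]
    b_vals = [b for _, b in pairs]
    return pearson(average_ranks(a_vals), average_ranks(b_vals))


if __name__ == "__main__":
    # Toy values standing in for per-window diversity and divergence.
    pi_x = [0.0010, 0.0012, None, 0.0008, 0.0015]
    d_xy = [0.0120, 0.0131, 0.0125, 0.0110, 0.0140]
    print(spearman(pi_x, d_xy))
```

Rank-based correlation is used here because it is insensitive to the very different scales of diversity and divergence.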

Figure S3: Landscapes of diversity and divergence in selected simulations with natural selection. The selection parameters $\mu_n$ and $\mu_p$ are the rates of mutations in exons with negative and positive fitness effects, respectively. The mean fitness effect was s = 0.03 for deleterious mutations and s = 0.01 for beneficial mutations (see subsection 2.2 for more details). Other details are as in Figure 2.

Figure S4: Landscapes of divergence partitioned by allele state in the ancestor. Ancestral states were assumed to be the same as seen in rhesus macaques (RheMac2), and sites not called in macaques were not used. $d_{XY}$ for W sites is simply the mean number of pairwise differences between samples in species X and Y per ancestral W (A/T) site. Similar reasoning applies to $d_{XY}$ for ancestral S sites, considering only G/C sites. Points were colored by the most recent common ancestor of the two species compared in each divergence. Lines were fitted using local linear regression. Note that for ancestrally weak mutations (A) there is an increase in divergence at the ends of the chromosomes, which is not seen for ancestrally strong mutations (B).
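
The partitioning described in this caption can be sketched as follows. The example assumes haploid sequences stored as equal-length strings and an outgroup sequence that supplies the ancestral state at each site, with any character other than A, C, G or T treated as not called. The data layout and function name are hypothetical and chosen only for illustration.

```python
WEAK = {"A", "T"}    # ancestral W sites
STRONG = {"G", "C"}  # ancestral S sites


def dxy_by_ancestral_state(ancestral, x_samples, y_samples):
    """Mean pairwise X-Y differences per ancestral W site and per S site.

    ancestral  -- outgroup sequence used as the ancestral state
    x_samples  -- list of sequences sampled from species X
    y_samples  -- list of sequences sampled from species Y
    """
    totals = {"W": 0.0, "S": 0.0}
    n_sites = {"W": 0, "S": 0}
    n_pairs = len(x_samples) * len(y_samples)
    for pos, anc in enumerate(ancestral):
        if anc in WEAK:
            state = "W"
        elif anc in STRONG:
            state = "S"
        else:
            continue  # site not called in the outgroup: skip it
        # Count differing alleles over all X-by-Y sample pairs at this site.
        diffs = sum(x[pos] != y[pos] for x in x_samples for y in y_samples)
        totals[state] += diffs / n_pairs
        n_sites[state] += 1
    return {state: (totals[state] / n_sites[state] if n_sites[state]
                    else float("nan"))
            for state in totals}


if __name__ == "__main__":
    outgroup = "ATGCN"
    species_x = ["ATGCA", "ATGCA"]
    species_y = ["TTGGA", "ATGGA"]
    print(dxy_by_ancestral_state(outgroup, species_x, species_y))
```

In practice the same calculation would be done per window, giving one W landscape and one S landscape per species pair.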

Figure S5: PCA visualization of data and simulations at 500 kb. The colors differentiate the empirical data from simulations with different parameters: Neutral refers to the simulation without any selection, Deleterious refers to simulations with deleterious mutations, Positive refers to simulations with beneficial mutations, and Both refers to simulations with both beneficial and deleterious mutations. The shape of the points differentiates simulations with a constant mutation rate along the genome from those with variable local mutation rates. Principal component analysis (PCA) was applied to a matrix with all pairwise correlations between landscapes across the great apes (including $\pi$-$\pi$, $\pi$-$d_{XY}$ and $d_{XY}$-$d_{XY}$ comparisons) for the great apes dataset and simulations (with selection and with mutation rate variation). We excluded simulations with $\mu_p \geq 1 \times 10^{-10}$ from the PCA because PC2 was capturing negative correlations caused by strong positive selection, as seen in Figure 7F.

Figure S7: Correlations and covariances between landscapes of diversity and divergence and annotation features in the real great apes data. Only windows in the middle half of chromosome 12 were included. Compare to Figure 10.

Figure S10: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 1. See Figure 2 for more details.

Figure S11: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 2. See Figure 2 for more details.

Figure S12: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 3. See Figure 2 for more details.

Figure S13: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 4. See Figure 2 for more details.

Figure S14: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 5. See Figure 2 for more details.

Figure S15: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 6. See Figure 2 for more details.

Figure S16: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 7. See Figure 2 for more details.

Figure S17: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 8. See Figure 2 for more details.

Figure S18: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 9. See Figure 2 for more details.

Figure S19: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 10. See Figure 2 for more details.

Figure S20: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 11. See Figure 2 for more details.

Figure S21: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 12. See Figure 2 for more details.

Figure S22: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 13. See Figure 2 for more details.

Figure S23: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 14. See Figure 2 for more details.

Figure S24: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 15. See Figure 2 for more details.

Figure S25: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 16. See Figure 2 for more details.

Figure S26: Landscapes of diversity, divergence, exon density and recombination rate across chromosome 17. See Figure 2 for more details.

Figure S1: Effect of exon density and recombination rate on the accumulation of genetic divergence with phylogenetic distance in chromosome 12. Within-species genetic diversities are shown at dT = 0. Mean diversity and divergence were computed for four groups, depending on whether or not they fell in the top 90% percentile of recombination rate and exon density.
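
The four-group split in Figure S1 can be sketched as below. This example assumes that windows are classified by whether their recombination rate and exon density each exceed the 90th percentile of that feature across windows; the exact cutoff convention, the per-window list layout and the function names are assumptions made for illustration.

```python
import statistics


def percentile_90(values):
    """90th percentile: the last of the nine decile cut points."""
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


def mean_by_group(stat, rec_rate, exon_density):
    """Mean of a per-window statistic in each (high rec, high exon) group.

    Keys are (rec_is_high, exon_is_high) booleans; groups that contain
    no windows are simply absent from the result.
    """
    rec_cut = percentile_90(rec_rate)
    exon_cut = percentile_90(exon_density)
    groups = {}
    for value, rec, exon in zip(stat, rec_rate, exon_density):
        key = (rec > rec_cut, exon > exon_cut)
        groups.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    # Toy windows: one statistic (diversity or divergence) per window.
    rec = list(range(1, 12))
    exon = list(range(11, 0, -1))
    diversity = [1.0] + [2.0] * 9 + [3.0]
    print(mean_by_group(diversity, rec, exon))
```

Running the same function on diversity and on each divergence gives one mean per group at each phylogenetic distance.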