Abstract

A new approach to modeling the effects of Hill-Robertson interference on levels of adaptation and patterns of variability in a nonrecombining genome or genomic region is described. The model assumes a set of L diallelic sites subject to reversible mutations between beneficial and deleterious alleles, with the same selection coefficient at each site. The assumption of reversibility allows the system to reach a stable statistical equilibrium with respect to the frequencies of deleterious mutations, in contrast to many previous models that assume irreversible mutations to deleterious alleles. The model is therefore appropriate for understanding the long-term properties of nonrecombining genomes such as Y chromosomes, and is applicable to haploid genomes or to diploid genomes when there is intermediate dominance with respect to the effects of mutations on fitness. Approximations are derived for the equilibrium frequencies of deleterious mutations, the effective population size that controls the fixation probabilities of mutations at sites under selection, the nucleotide site diversity at neutral sites located within the nonrecombining region, and the site frequency spectrum for segregating neutral variants. The approximations take into account the effects of linkage disequilibrium on the genetic variance at sites under selection. Comparisons with published and new computer simulation results show that the approximations are sufficiently accurate to be useful, and can provide insights into a wider range of parameter sets than is accessible by simulation. The relevance of the findings to data on nonrecombining genome regions is discussed.

Introduction

Population genomics data show that genomic regions with little or no genetic recombination usually show reduced levels of genetic diversity, as well as signs of reduced levels of adaptation such as reduced frequencies of optimal codons, reduced rates of fixation of selectively favorable nonsynonymous mutations, and increased frequencies of segregating nonsynonymous mutations (reviewed by Charlesworth and Campos 2014; Charlesworth and Jensen 2021). These patterns are consistent with the hypothesis that selection within genomes with low frequencies of recombination interferes with evolution at closely linked sites, a process commonly referred to as Hill-Robertson interference (HRI) (Hill and Robertson 1966; Felsenstein 1974). Several types of HRI have been modeled analytically, including the effects of the spread of beneficial mutations (selective sweeps) on closely linked selected sites (e.g. Kim 2004; Weissman and Barton 2012), and the elimination of strongly selected deleterious mutations (Charlesworth et al. 1993; Charlesworth 1994; Peck 1994; Johnson and Barton 2002). The study of other types of HRI, especially the case of multiple linked sites subject to weak purifying selection, when both selection and genetic drift play important roles, has relied heavily on computer simulations (Li 1987; McVean and Charlesworth 2000; Tachida 2000; Comeron and Kreitman 2002; Kaiser and Charlesworth 2009; Charlesworth et al. 2010; Seger et al. 2010; Hough et al. 2017; Devi et al. 2023; Daigle and Johri 2025). Given that genomic regions with little or no recombination, and the whole genomes of asexual and highly inbreeding species, are likely to include many functionally significant nucleotide sites (both coding and noncoding) that are all closely linked to each other, HRI involving purifying selection seems likely to have major effects on variability and adaptation. Analytical results should help to provide a better understanding of these effects.

A recent paper has suggested a new approach to the problem of modeling HRI due to purifying selection in large nonrecombining genomic regions, providing formulae that seem to match the results of simulations over much of parameter space (Devi et al. 2023). In contrast to several previous analytical approaches to this problem (e.g. Santiago and Caballero 1998; Good et al. 2014; Santiago and Caballero 2016; Cvijovic et al. 2018; Melissa et al. 2022; Buffalo and Kern 2024; Strütt et al. 2025), reversible mutations were allowed between selectively favorable and deleterious mutations, permitting the establishment of a statistical equilibrium between mutation, selection, and drift. This is important, because the features of long-established low or zero recombination rate genomic regions should reflect their equilibrium properties, especially with respect to the effects of single base substitutions (Charlesworth et al. 2010). The formulae of Devi et al. (2023) were, however, derived by a heuristic argument based on clonal interference between beneficial mutations. Buffalo and Kern (2024) have developed an alternative approach to modeling the effects of HRI on diversity at linked neutral sites, using the theoretical machinery developed by Santiago and Caballero (1998, 2016). However, their approach assumes unidirectional mutation from wild-type to deleterious alleles.

This paper re-examines the effects of HRI on multiple sites subject to purifying selection in the absence of recombination, by modifying the approach of Santiago and Caballero (1998, 2016) to allow for reversible mutations at selected sites, so that the system can reach a true stationary distribution of allele frequencies, allowing predictions of the mean frequencies of favored alleles at sites under selection. This provides a widely used measure of the overall level of adaptation (Li 1987; McVean and Charlesworth 2000; Comeron and Kreitman 2002; Charlesworth et al. 2010; Devi et al. 2023), which is described below.

In addition to developing an approximate formula for the level of adaptation at the sites under selection, we examine the properties of the gene tree for a block of linked loci under selection, allowing the pairwise diversity and site frequency spectrum (SFS) for neutral sites located among the selected sites to be calculated. We also show how the effect of HRI on the genetic variance at sites subject to selection, caused by the generation of linkage disequilibrium (LD), can be taken into account, to a good level of approximation.

For simplicity, a haploid genome is modeled; the results also apply to diploids, with minor changes in parameters, if the effects of mutations on fitness show intermediate dominance. Complications are, however, likely to arise in diploid populations where deleterious mutations are mostly partially recessive (Crow 1993), due to the possible effects of associative overdominance (Zhao and Charlesworth 2016; Becher et al. 2020; Gilbert et al. 2020; Charlesworth B 2022; Abu-Awad and Waller 2023), so that caution is needed in extending the conclusions to diploids. They should nevertheless be applicable to evolving Y or W chromosomes, which (subject to some caveats: Englestädter 2008) behave effectively as haploid for many purposes (Charlesworth and Charlesworth 2000; Olito et al. 2004), as well as U and V chromosomes in dioicous species with a haploid sexual stage in their life-cycle (Bull 1978; Charlesworth D 2022), some types of supernumerary chromosome (e.g. Torgasheva et al. 2019), clonally reproducing organisms (Price and Arkin 2015; Bast et al. 2018), and highly homozygous self-fertilizing populations, where recombination is effectively nearly absent (Charlesworth and Wright 2001; Burgarella et al. 2024; Daigle and Johri 2025).

Methods

The basic population genetics model

The basic model is identical to that used in theoretical studies of selection on codon usage, which assume a large number of nucleotide sites, each with 2 alleles (A1 and A2), with A1 denoting the selectively favorable allele at a site and A2 its deleterious alternative (Li 1987; Bulmer 1991; McVean and Charlesworth 1999; Tachida 2000). Carriers of A2 have a fitness 1—s relative to a fitness of 1 for A1 and fitnesses are multiplicative across sites. The mutation rates for A1 to A2 and A2 to A1 are κu and u, respectively, where κ is the mutational bias parameter. It is usually assumed that κ > 1, since preferred codons often end with GC (Hershberg and Petrov 2008; Sharp et al. 2010) and there is a nearly universal bias in favor of GC to AT base substitutions (Fu et al. 2011; Graur 2016, Chap.10). This model applies to any situation where selectively favored and deleterious alleles can be identified at individual nucleotide sites. Most analytical treatments assume complete evolutionary independence among sites and use a single-locus population genetics model (Li 1987; Bulmer 1991; McVean and Charlesworth 1999).

The assumption of independence becomes increasingly implausible as linkage between sites becomes tighter (Li 1987; McVean and Charlesworth 2000; Comeron and Kreitman 2002). Here, we assume a genomic region of L completely linked sites, with a total mutation rate to deleterious alleles of U = Lu. Assuming a Wright-Fisher population, in the absence of HRI the effects of genetic drift at each site are determined by a variance effective population size of N0 (Crow and Denniston 1988). With allele frequencies x and 1−x at a site, the new variance in allele frequency generated by random sampling from one generation to the next is x(1−x)/N0. HRI can be viewed as reducing the effective population size (Ne) below this value (Hill and Robertson 1996; Santiago and Caballero 1998, 2016). While use of a modified effective population size does not capture all aspects of HRI, such as its effects on allele frequency spectra (Charlesworth et al. 1995; Santiago and Caballero 1998; O'Fallon, et al. 2010; Nicolaisen and Desai 2013; Good et al. 2014; Cvijovic et al. 2018), it is useful for understanding its effects on the fixation probabilities of new mutations and on pairwise diversity measures.

It may seem unnecessary to model reversible mutations when population genomic studies of normally recombining regions of genomes in organisms such as humans and Drosophila indicate that a majority of deleterious mutations are so strongly selected that they are kept at such low frequencies that reversion to wild-type can largely be ignored (e.g. Eyre-Walker and Keightley 2009; Kim et al. 2017; Johri et al. 2020). However, as shown by the results described below, the HRI effects of large numbers of completely linked sites can be so large that deleterious alleles reach intermediate frequencies or fixation, so that reversals cannot be ignored, as has also been found in previous studies (e.g. Charlesworth et al 2010; Devi et al. 2023).

Following McVean and Charlesworth (2000), Comeron and Kreitman (2002), and Devi et al. (2023), the effect of HRI on the level of adaptation can be quantified by the mean frequency of preferred alleles (p¯) across the genomic region in question. At statistical equilibrium under drift, mutation and selection, p¯ depends on the scaled selection coefficient γ = 2Nes and the mutational bias κ, and is determined to a good approximation by the Li-Bulmer equation (Li 1987; Bulmer 1991; McVean and Charlesworth 1999):

(1a)

Strictly speaking, p¯ is the proportion of sites that are fixed for the favorable allele A1 among all fixed sites, and Equation (1a) is obtained by equating the rate of substitution of A2 mutations arising at sites fixed for A1 to the rate of substitution of A1 variants arising at sites fixed for A2 (Bulmer 1991), using the standard diffusion equation formula for the probability of fixation of a mutation in a finite population (Kimura 1964). However, this formula also accurately predicts the proportion of sites that are A1 in state in a random sequence sampled from the population, provided that the proportion of sites that are segregating as opposed to being fixed is sufficiently small (McVean and Charlesworth 1999), so that the assumptions of the infinite sites model (Kimura 1971) are met.

Given an observed value of p¯, γ can be estimated as:

(1b)

This relation is often used to estimate the scaled strength of selection on codon usage from genomic data on the frequencies of preferred synonymous codons (e.g. Sharp et al. 2005, 2010).

If the value of the scaled selection coefficient in the absence of HRI, γ0 = 2N0s, has been specified, the ratio B = Ne/N0 when HRI operates can be equated to γ^/γ0 (McVean and Charlesworth 2000). Devi et al. (2023) used the symbol ϕ instead of the more commonly employed B for this quantity, and they termed the corresponding value of Ne the “fixation effective population size,” because Equation (1a) is derived in terms of fixation probabilities, as described above. The usefulness of Equation (1b) for estimating γ breaks down when p¯ is close to 1, so that −ln(1 – p¯) approaches infinity; this situation corresponds to  0 >> 1, so that allele frequencies are close to deterministic mutation-selection balance at all sites. It is then impossible to obtain meaningful estimates of B from p¯estimates. This is not a serious limitation in practice, since the interest in studying HRI comes from its effect in reducing Ne below N0.

The effect of HRI on the level of adaptation

The basic model

To use this modeling approach, we must be able to predict the fixation Ne. Santiago and Caballero (1998, 2016) showed that the effects of HRI under the situation considered here depend on the additive genetic variance in fitness (Va). With haploidy, Va is equal to the total genetic variance in the absence of epistasis, as is assumed here, but the term additive genetic variance will be retained in order to make comparisons with diploid models. It is useful to start with the equation for the additive variance at a single selected site subject to reversible mutation and genetic drift, assuming that it can be described by the equation for free recombination, but, importantly, with a value of Ne reflecting the effects of HRI. This differs from the approach of Santiago and Caballero (2016), Devi et al. (2023), and Buffalo and Kern (2024).

Using Equation (15) of McVean and Charlesworth (1999), modified for a haploid population, the expected nucleotide site diversity at such a site, under the infinite sites assumption that only a single mutation segregates at each variable site (Kimura 1971), is given by:

(2a)

The corresponding variance in fitness at a site is equal to 12πsels2; the expected genic variance obtained by summing additive variances over all L sites, giving a total mutation rate U = Lu, is thus equal to:

(2b)

It is important to note that Equation (2a) was derived by considering the expected contributions to diversity contributed by mutations traveling to fixation or loss, either A1 to A2 mutations or vice-versa, weighting their contributions by their respective rates of input into the population and probabilities of loss or fixation, respectively, as well as assuming that the majority of sites present at any one time are fixed for either A1 or A2 (McVean and Charlesworth 1999). The results derived below tacitly assume that the state of the population at any one time can be described by this expectation, which must of course be violated in practice, due to the highly stochastic nature of the processes involved. They are likely, therefore, to be only approximate at best, and it is important to check them against simulations.

The effect of LD on the additive genetic variance

In the following derivations, we assume that (1) the Ne that appears implicitly in these equations can be used to predict p¯ from Equation (1a) and (2) the contributions of LD to Va can be ignored, so that Va = Vg. However, for a haploid genome Va=Vg+2i,jCij, where Cij is the covariance between sites i and j resulting from LD, and the sum is taken over all pairs of sites (Bulmer 1980, p. 158). Section 1 of the  Appendix presents a heuristic argument, which shows that LD increases the diversity at selected sites, and hence Vg, by approximately the same amount that it contributes to a reduction in Va. It follows that the relations derived below will cause the value of πsel to be underestimated, whereas the effect of HRI on p¯ should be approximately correct. The problem of correcting for the effect of LD on the net additive genetic variance does not seem to have been considered previously in this context.

Finally, the multilocus simulations used by various investigators to evaluate the effects of HRI, including those used here, generally assume multiplicative fitness interactions among different sites. This raises the question of how to extrapolate to the whole genome from the single site equations described above. To do this, we note that the quantity used in the predictions of HRI effects is the additive genetic variance relative to the squared population mean fitness, w¯2 (e.g. Santiago and Caballero 2016). With multiplicative fitnesses and small effects of each site on fitness, we can use the well-known approximation for the total variance in fitness:

(3a)

With multiplicative fitnesses, the fitness of an individual is the exponential of ln(w), where w is the sum of 1−s over each site whose state differs from wild-type. If each site has only a small effect, so that ln(w) ≈−s, we then have

(3b)

where Va(w) is the variance in fitness on the additive scale, obtained taking the variance of the sum of the effects of each site. This is the quantity that will be used here. As shown below, predictions based on this expression and the other approximations match the simulation results quite well.

Calculating the effects of HRI

Using Equation (3) of Santiago and Caballero (2016) for the case of no recombination, the fixation Ne is given implicitly by:

(4)

where Vm is the increase in fitness variance per generation due to new mutations and VaVg is given by Equation (2b). In addition:

(5)

With γ >> 1 and no HRI between selected sites, the selected sites will equilibrate at deterministic mutation-selection balance, such that the frequency of deleterious mutations at each site is q* = u/s << 1 and VaUs, provided that u << s, as is usually the case. This case corresponds to the limiting value of Equation (2b) for large γ. Equations (4) and (5) then imply that:

(6)

As expected, this is the classical background selection (BGS) formula for the reduction in Ne caused by deleterious mutations in the absence of recombination (Charlesworth et al. 1993); it fails when HRI causes the frequencies of deleterious mutations to depart from their mutation-selection equilibrium values (Charlesworth et al. 1993; Santiago and Caballero 2016).

The challenge is thus to find an expression for B when γ is sufficiently small that HRI and genetic drift cause substantial departures from mutation-selection balance. This can be done by assuming that the Ne involved in γ = 20 in Equations (2) is determined by Equations (4) and (5), yielding the following expression:

(7)

In general, this equation can only be solved for B by numerical iteration. However, as well as the solution for the case of large γ described above, a solution for small γ and B close to 1 can be found by assuming that ln(B) ≈−δ, where δ is sufficiently small that second-order terms in δ can be neglected. Writing γ0 = 2N0s and α0 = 2N0U for the selection and mutation parameters scaled by twice the effective population size in the absence of HRI, the following approximation, which is valid only for small δ, is then obtained:

(8a)

where

(8b)

Given a value of B from Equations (6)-(8), p¯ can be obtained from Equation (1) and compared with the results of computer simulations of nonrecombining populations, providing a test of the validity of this approach. Devi et al. (2023) described the results of simulations of haploid genomes across a wide range of parameters, which yielded numerous estimates of p¯. They also derived an approximate formula for B that is valid for Nes << 1, large L and u/s << 1. In the notation used here, it is equivalent to:

(9a)

where

(9b)

Devi et al. (2023) stated that Bmax is the upper limit to B when all sites are at mutation-selection balance (it is set equal to 1 if the numerator is < 1). As noted above, this is not biologically meaningful, since in this case the allele frequencies depend only on u and s, so that Bmax has only a notional meaning, but is still useful for approximating the effects of HRI.

Levels of neutral diversity under HRI

It is also interesting to determine the effect of purifying selection on variability at neutral sites embedded in a block of nonrecombining sites, and this has been the subject of many previous theoretical studies (e.g. Charlesworth et al. 1993, 1995; Santiago and Caballero 1998, 2016; Comeron and Kreitman 2002; Gordo et al. 2002; Kaiser and Charlesworth 2009; O’Fallon et al. 2010; Seger et al. 2010; Zeng and Charlesworth 2011; Nicolaisen and Desai 2013; Good et al. 2014; Cvijovic et al. 2018; Melissa et al. 2022; Devi et al. 2023). Previous studies have shown that, when there are departures from mutation-selection balance, the value of B for measuring the ratio of neutral diversity to its value in the absence of selection, denoted here by B′, tends to be larger than the B determining the fixation probabilities of new mutations (Comeron and Kreitman 2002; Santiago and Caballero 2016; Devi et al. 2023). The corresponding value of Ne for a haploid population is equivalent to the mean time to coalescence of a pair of alleles at a neutral site, so this quantity can usefully be referred to as the “coalescence effective population size” (Devi et al. 2023), denoted here by Ne′; Santiago and Caballero (2016) called it the “heterozygosity effective size.”

B′ can be found as follows, using the equation at the top of p. 1270 of Santiago and Caballero (2016), which gives the rate of coalescence at generation τ back from the current generation as the reciprocal of an effective population size Ne(τ), which is defined by:

(10a)

where:

(10b)

Qτ is a factor that represents the amplification of the cumulative effects of selection over many generations on the genetic variance in fitness (Robertson 1961; Santiago and Caballero 2016).

As pointed out by Santiago and Caballero (2016), Equation (4) is derived from the asymptotic value of Ne(τ) as time tends to infinity, reflecting the fact that the rate of fixations of mutations under selection is affected by the cumulative effects of the fitness variance on the rate of drift over a long period of time, whereas the effects on diversity are on a much shorter time-scale.

If the population size is large, time can be treated as a continuous variable. It is convenient to rescale time by dividing time in generations by Ne = BN0, where B is determined by Equation (7). The probability of no coalescence by time Ton this time-scale is:

(11a)

where X = τ/Ne and:

(11b)
(11c)

The mean coalescence time relative to the fixation Ne is given by:

(12)

If the value of Ne has been found from Equation (7), numerical evaluation of Equation (12) allows the effect of HRI on neutral diversity to be determined.

The expected neutral SFS

To determine the expected SFS at neutral sites in a sample of k alleles, it is necessary in general to obtain an expression for the expected size of each successive interval between coalescent events in the gene tree for the nonrecombining block, as the numbers of nodes in the gene tree decrease from k to 1 (Griffiths and Tavaré 1998). It is important to note that the gene tree applies to both selected and neutral sites, but the genetic diversity statistics considered here apply only to neutral mutations that obey the infinite sites model. The representation of a coalescent process with a temporally varying effective population size can be used for this purpose (e.g. Griffiths and Tavaré 1998, Polanski et al. 2002; Polanski and Kimmel 2003; O’Fallon et al. 2010; Walczak et al. 2012; Nicolaisen and Desai 2013; Strütt et al. 2025).

Here we use Equations (10), together with the results in section 2 of the Appendix, to calculate the trajectory of the effective population size at a time τ back in the past, Ne′ (τ), and then substitute the results into Equation (52) of Polanski et al. (2002) for the expected SFS (see also Polanksi and Kimmel 2003). An alternative approach is to use the results of the coalescent process directly, as described in sections 1 and 2 of Supplementary File 1. This was, however, found to give substantially less accurate results than the alternative method (see Table 2). Both procedures assume that members of a set of ancestral alleles present at a given time in the past are exchangeable (O’Fallon et al. 2010); this assumption is violated if selection is acting on a block of linked sites, even if only neutral mutations are being studied, so that the results are likely to be only approximate.

A useful property of the SFS is that deviations from neutral expectation can be determined solely from the properties of segregating sites in a sample. As shown in section 2 of the Appendix, these properties can be used to obtain the measure of distortion Δθw proposed by Becher et al. (2020). This involves the ratio of the mean pairwise nucleotide diversity, π, to Watterson's θw, which is given by the mean number of segregating sites in a sample of k alleles (Sk) divided by the sum ak of the harmonic series 1/i from i = 1 to k—1 (Watterson 1975). It is defined as:

(13)

The number of segregating sites used to calculate θw is insensitive to variant frequencies, in contrast to π. Since the expected values of π and θw for neutral variants are equal at mutation-drift equilibrium with a constant Ne (Watterson 1975, their ratio is expected to be <1 if there is an excess of rare variants compared with this null expectation (Tajima 1989). If the SFS shows a higher number of low frequency variants than expected, as expected with HRI, the maximum value of Δθw (Δθwm) can easily be found (Equation A.2c). This suggests that we can normalize Δθw by Δθwm to obtain a measure of distortion of the SFS toward low frequency variants that is less sensitive to the sample size than Δθw itself. The resulting statistic is identical to the normalized measure of Tajima's D statistic proposed by Schaeffer (2002), except for a difference in sign. We have:

(14)

Computer simulations

We ran forwards-in-time simulations using SLiM 4.1 (Haller and Messer 2023) in Wright-Fisher mode. SLiM is designed for diploids. To model haploid individuals, we removed any mutation from each individual's second genome in each generation. Our deleterious mutation model, detailed above, allows back mutations. To allow for this in the simulations, we created 2 mutation types: a “bookkeeping” variant to track mutation events at deleterious sites, and another representing the actual deleterious variant. Whenever a mutation happened at a deleterious site, a bookkeeping variant was placed on that site. If there was no deleterious variant present at that site and chromosome, the bookkeeping variant was replaced by a deleterious one. If there was already a deleterious variant at the site, both the deleterious and the bookkeeping variants were removed. This implies that κ = 1. We included a third mutation type to represent neutral variants, for which we did not implement back mutations. We set the global mutation rate to 2 times u, implying an equal chance that a mutation resulted in a bookkeeping or a neutral variant. This allowed us to analyze deleterious and neutral genotype matrices separately with a mutation rate of u for each. We did not divide the genome into segments; deleterious and neutral mutations happened in an interspersed fashion, with neutral variants allowed to stack onto deleterious ones. We ran all simulations for 200,000 generations, since tracking variant frequencies over time indicated that simulations had essentially equilibrated by that time. At the end of a simulation run, we exported genotype matrices and variant site positions separately for neutral and deleterious variants. We ran simulations for 6 parameter sets: genome lengths L ∈ {2500, 10,000, 40,000}, s values ∈ {10−3, 10−2}, κ = 1, u = 10−5, N = 1000, with 200 replicate runs for each set.

Semianalytical SFS statistics

To obtain the expected SFS for neutral sites under HRI with a given set of parameters, we used the mathematical machinery developed by Polanski and Kimmel (2003) and Polanksi et al. (2003), as described above. We numerically solved Equation (7) to obtain B. We then used Equations (10) to obtain the apparent population size trajectory due to HRI and BGS—a monotonically decreasing function of the number of generations back in time, t, which asymptotes as t goes to infinity. For computational reasons, we discretized this function into 10 equal steps of population size decrease before using it as the input for computing the expected conditional SFS—see directory “PKsfs” in the Zenodo repository associated with this article.

Results

Levels of adaptation and neutral diversity under HRI

Fits to previous simulation results

Figures 1 and 2 compare p¯ values from simulations, taken from Supplementary Fig. 3 of Devi et al. (2023), with the predictions of Equations (7) and (9), for the case of κ = 1 and κ = 10, respectively. The values of p¯ from Equation (1) in the absence of HRI are also shown, providing a sense of the effect of HRI on the level of adaptation. The detailed results on which this section is based are shown in Supplementary File 2. In addition, Section 3 of Supplementary File 1 presents some analytical results on the effects on B of changes in the net deleterious mutation rate U, the selection coefficient s and the baseline effective population size N0, which are in broad agreement with the observed patterns.

Predicted vs simulated values of p¯, the equilibrium mean frequency of the selectively favorable allele, for the case of no mutational bias (k = 1). The x axes show values of obtained from the simulation results shown in Supplementary Fig. 3 of Devi et al. (2023). The y axes correspond to the predicted values of the values of N0s used in the simulations. The upper panels are for L = 104 and the lower panels are for L = 106. The circles (blue) are the values of in the absence of HRI, obtained from Equation (1a) (labeled as No HRI); the triangles are obtained from Equations (1) and (7) (labeled as HRI—B) the inverted triangles (green) are the predicted values of in the presence of HRI obtained from Equations (9) (labeled as HRI-D). The dashed straight lines correspond to y = x; if the predicted values of p-bar with HRI were identical to the simulated values, they would fall on these lines.
Fig. 1.

Predicted vs simulated values of p¯, the equilibrium mean frequency of the selectively favorable allele, for the case of no mutational bias (k = 1). The x axes show values of obtained from the simulation results shown in Supplementary Fig. 3 of Devi et al. (2023). The y axes correspond to the predicted values of the values of N0s used in the simulations. The upper panels are for L = 104 and the lower panels are for L = 106. The circles (blue) are the values of in the absence of HRI, obtained from Equation (1a) (labeled as No HRI); the triangles are obtained from Equations (1) and (7) (labeled as HRI—B) the inverted triangles (green) are the predicted values of in the presence of HRI obtained from Equations (9) (labeled as HRI-D). The dashed straight lines correspond to y = x; if the predicted values of p-bar with HRI were identical to the simulated values, they would fall on these lines.

Predicted vs simulated values of p¯, the equilibrium mean frequency of the selectively favorable allele, for the case of no strong bias (κ = 10). The x axes show values of obtained from the simulation results shown in Supplementary Fig. 3 of Devi et al. (2023). The y axes correspond to the predicted values of p¯, for the values of N0s used in the simulations. The upper panels are for L = 104 and the lower panels are for L = 106. The circles (blue) are the values of p¯, in the absence of HRI, obtained from Equation (1a) (labeled as No HRI); the triangles are obtained from Equations (1) and (7) (labeled as HRI—B) the inverted triangles are the predicted values of in the presence of HRI obtained from Equations (9) (labeled as HRI-D). The dashed straight lines correspond to y = x; if the predicted values of p¯, with HRI were identical to the simulated values, they would fall on these lines.
Fig. 2.

Predicted vs simulated values of p¯, the equilibrium mean frequency of the selectively favorable allele, for the case of no strong bias (κ = 10). The x axes show values of obtained from the simulation results shown in Supplementary Fig. 3 of Devi et al. (2023). The y axes correspond to the predicted values of p¯, for the values of N0s used in the simulations. The upper panels are for L = 104 and the lower panels are for L = 106. The circles (blue) are the values of p¯, in the absence of HRI, obtained from Equation (1a) (labeled as No HRI); the triangles are obtained from Equations (1) and (7) (labeled as HRI—B) the inverted triangles are the predicted values of in the presence of HRI obtained from Equations (9) (labeled as HRI-D). The dashed straight lines correspond to y = x; if the predicted values of p¯, with HRI were identical to the simulated values, they would fall on these lines.

The 2 approximate formulae both provide fairly good fits to the simulation results when there is no mutational bias, although both predictions deviate somewhat from the dashed line that corresponds to a perfect fit (Fig. 1). In this case, Equations (9) perform worse than Equations (7) with L = 104, weak selection and a low mutation rate (upper left panel), underestimating the effects of HRI. With a strong mutational bias and L = 104 (Fig. 2), Equations (9) perform much worse, for both sets of selection and mutation parameters. With L = 106, both approximations give similar results, although Equations (9) fit somewhat better in the absence of mutational bias. Further comparisons, using the simulation results of McVean and Charlesworth (2000) and Comeron and Kreitman (2002) are shown in sections 4 and 5 of File S1.

We also investigated the extent to which the reduction in neutral diversity at sites embedded within a nonrecombining block of selected sites can be predicted by our approach. Table 1 compares the predictions of B and B′ from the above equations with the simulation results of Devi et al. (2023), and Supplementary Table 1 in section 4 of Supplementary File S1 gives similar comparisons with the simulations of Comeron and Kreitman (2002). Overall, despite the approximations involved, agreement is satisfactory over a wide range of numbers of selected sites, strengths of selection, and mutational bias levels.

Table 1.

Comparisons of simulation results for B and B′ in Supplementary Fig. 4 of Devi et al. (2023) with the predictions of Equations (7) and (12).

LN0sSimulation BPredicted BSimulation B′Predicted B′
1030.21.0001.0001.0000.978
1040.21.0000.9961.0000.976
1050.20.9000.9661.0000.960
1060.20.7000.8150.8000.873
103200.4100.4830.6200.646
104200.2500.2850.4000.438
105200.1500.1580.2300.270
106200.0800.0810.1400.155
1032000.7410.7500.741a
1042000.0300.0500.0600.070
1052000.0100.0090.0200.021
1062000.0000.0040.0090.011
LN0sSimulation BPredicted BSimulation B′Predicted B′
1030.21.0001.0001.0000.978
1040.21.0000.9961.0000.976
1050.20.9000.9661.0000.960
1060.20.7000.8150.8000.873
103200.4100.4830.6200.646
104200.2500.2850.4000.438
105200.1500.1580.2300.270
106200.0800.0810.1400.155
1032000.7410.7500.741a
1042000.0300.0500.0600.070
1052000.0100.0090.0200.021
1062000.0000.0040.0090.011

The simulations assumed N0 = 106, u = 3 × 10–8, κ = 3.

aThis value was obtained from Equation (6), which is more accurate than numerical integration of Equation (11a) in this region of parameter space.

Table 1.

Comparisons of simulation results for B and B′ in Supplementary Fig. 4 of Devi et al. (2023) with the predictions of Equations (7) and (12).

LN0sSimulation BPredicted BSimulation B′Predicted B′
1030.21.0001.0001.0000.978
1040.21.0000.9961.0000.976
1050.20.9000.9661.0000.960
1060.20.7000.8150.8000.873
103200.4100.4830.6200.646
104200.2500.2850.4000.438
105200.1500.1580.2300.270
106200.0800.0810.1400.155
1032000.7410.7500.741a
1042000.0300.0500.0600.070
1052000.0100.0090.0200.021
1062000.0000.0040.0090.011
LN0sSimulation BPredicted BSimulation B′Predicted B′
1030.21.0001.0001.0000.978
1040.21.0000.9961.0000.976
1050.20.9000.9661.0000.960
1060.20.7000.8150.8000.873
103200.4100.4830.6200.646
104200.2500.2850.4000.438
105200.1500.1580.2300.270
106200.0800.0810.1400.155
1032000.7410.7500.741a
1042000.0300.0500.0600.070
1052000.0100.0090.0200.021
1062000.0000.0040.0090.011

The simulations assumed N0 = 106, u = 3 × 10–8, κ = 3.

aThis value was obtained from Equation (6), which is more accurate than numerical integration of Equation (11a) in this region of parameter space.

Fits to new simulation results

We used computer simulations to further investigate the effects of HRI on levels of adaptation and neutral diversity, with a special emphasis on the extent to which HRI affects the SFS at neutral sites. We also used them to determine whether the simulated effects of LD can be used to correct the diversity at selected sites and genetic variance at selected sites that is predicted by Equations (2)—see the subsection above on The effect of HRI on the level of adaptation, and section 1 of the Appendix. The detailed simulation results are shown in Supplementary File S3.

A summary of the main simulation results and predicted values for the statistics p¯, B, and B′ is shown in Fig. 3. Table 2 shows the results for the neutral diversities πneut, the simulated and uncorrected predicted values of the diversities at selected sites (πsel), their predicted diversities corrected for the effects of LD using the LD estimates from the simulations (πsel1), the πsel/πneut ratios obtained from the simulations and the corresponding ratios obtained from the uncorrected and corrected formulae for πsel, and Δθw.

The mean simulated values (solid bars) and theoretical values (striped bars) of the equilibrium mean frequency of the favored allele (p¯), the ratio of the fixation ne to N0 (B), and the ratio of the coalescence ne to N0 (B′), for the cases of no HRI and 3 different numbers of selected sites (L). The bars representing p¯, B, and B′ are the left-hand, middle, and right-hand bars for each case. The left-hand panel is for γ0 = 2 and the right-hand panel is for γ0 = 20. N0 = 1000, u = 10–5, and κ = 1 (i.e, no mutational bias).
Fig. 3.

The mean simulated values (solid bars) and theoretical values (striped bars) of the equilibrium mean frequency of the favored allele (p¯), the ratio of the fixation ne to N0 (B), and the ratio of the coalescence ne to N0 (B′), for the cases of no HRI and 3 different numbers of selected sites (L). The bars representing p¯, B, and B′ are the left-hand, middle, and right-hand bars for each case. The left-hand panel is for γ0 = 2 and the right-hand panel is for γ0 = 20. N0 = 1000, u = 10–5, and κ = 1 (i.e, no mutational bias).

Table 2.

Comparisons of theoretical predictions of population statistics with the new simulation results.

Parametersπneut
(sim)
πneut
(pred)
πsel
(sim)
πsel
(pred.u)
πsel
(pred.c)
πsel/πneu
(sim)
πsel/πneu
(pred.u)
πsel/πneu
(pred.c)
Δθw
(sim)
Δθw
(PK)
Δθw
(Coal)
No HRI, γ0 = 20.020.020.01530.01530.01530.3820.3820.382000
No HRI
γ0 = 20
0.020.020.00200.00200.00200.0500.0500.050000
L = 2500
γ0 = 2
0.01450.01200.01180.00690.01010.8220.5750.8410.173 ± 0.0380.1630.122
L = 2500
γ0 = 20
0.00640.00550.00240.00180.00240.3810.3270.4360.391 ± 0.0300.3440.182
L = 10,000
γ0 = 2
0.01040.00940.00910.00480.00750.8730.5110.7980.290 ± 0.0330.2340.128
L = 10,000
γ0 = 20
0.00380.00350.00240.00130.00220.6120.3720.6290.510 ± 0.0280.4550.233
L = 40,000
γ0 = 2
0.00800.00670.00720.00330.00600.9000.4920.8960.357 ± 0.0290.3080.186
L = 40,000
γ0 = 20
0.00240.00240.00190.00090.00170.8000.3750.7080.573 ± 0.0210.5330.651
Parametersπneut
(sim)
πneut
(pred)
πsel
(sim)
πsel
(pred.u)
πsel
(pred.c)
πsel/πneu
(sim)
πsel/πneu
(pred.u)
πsel/πneu
(pred.c)
Δθw
(sim)
Δθw
(PK)
Δθw
(Coal)
No HRI, γ0 = 20.020.020.01530.01530.01530.3820.3820.382000
No HRI
γ0 = 20
0.020.020.00200.00200.00200.0500.0500.050000
L = 2500
γ0 = 2
0.01450.01200.01180.00690.01010.8220.5750.8410.173 ± 0.0380.1630.122
L = 2500
γ0 = 20
0.00640.00550.00240.00180.00240.3810.3270.4360.391 ± 0.0300.3440.182
L = 10,000
γ0 = 2
0.01040.00940.00910.00480.00750.8730.5110.7980.290 ± 0.0330.2340.128
L = 10,000
γ0 = 20
0.00380.00350.00240.00130.00220.6120.3720.6290.510 ± 0.0280.4550.233
L = 40,000
γ0 = 2
0.00800.00670.00720.00330.00600.9000.4920.8960.357 ± 0.0290.3080.186
L = 40,000
γ0 = 20
0.00240.00240.00190.00090.00170.8000.3750.7080.573 ± 0.0210.5330.651

The terms (sim) and (pred) refer to the simulated and theoretical values of the statistics in question; (pred.u) and (pred.c) refer to the uncorrected theoretical values of πsel and the theoretical values of πsel corrected for LD by the method described in the main text, respectively. The Δθw column gives the mean values of the ratio of Δθw for neutral sites relative to its maximal value for a sample size of 20. PF and Coal refer to the Polanski et al. and coalescent SFS predictions, respectively. 95% confidence intervals of the mean are shown for Δθw; the other statistics have negligible CIs. A population of size 1,000 with a mutation rate of 10–5 and no mutational bias was simulated; the values with no HRI were obtained from the relevant theory.

Table 2.

Comparisons of theoretical predictions of population statistics with the new simulation results.

Parametersπneut
(sim)
πneut
(pred)
πsel
(sim)
πsel
(pred.u)
πsel
(pred.c)
πsel/πneu
(sim)
πsel/πneu
(pred.u)
πsel/πneu
(pred.c)
Δθw
(sim)
Δθw
(PK)
Δθw
(Coal)
No HRI, γ0 = 20.020.020.01530.01530.01530.3820.3820.382000
No HRI
γ0 = 20
0.020.020.00200.00200.00200.0500.0500.050000
L = 2500
γ0 = 2
0.01450.01200.01180.00690.01010.8220.5750.8410.173 ± 0.0380.1630.122
L = 2500
γ0 = 20
0.00640.00550.00240.00180.00240.3810.3270.4360.391 ± 0.0300.3440.182
L = 10,000
γ0 = 2
0.01040.00940.00910.00480.00750.8730.5110.7980.290 ± 0.0330.2340.128
L = 10,000
γ0 = 20
0.00380.00350.00240.00130.00220.6120.3720.6290.510 ± 0.0280.4550.233
L = 40,000
γ0 = 2
0.00800.00670.00720.00330.00600.9000.4920.8960.357 ± 0.0290.3080.186
L = 40,000
γ0 = 20
0.00240.00240.00190.00090.00170.8000.3750.7080.573 ± 0.0210.5330.651
Parametersπneut
(sim)
πneut
(pred)
πsel
(sim)
πsel
(pred.u)
πsel
(pred.c)
πsel/πneu
(sim)
πsel/πneu
(pred.u)
πsel/πneu
(pred.c)
Δθw
(sim)
Δθw
(PK)
Δθw
(Coal)
No HRI, γ0 = 20.020.020.01530.01530.01530.3820.3820.382000
No HRI
γ0 = 20
0.020.020.00200.00200.00200.0500.0500.050000
L = 2500
γ0 = 2
0.01450.01200.01180.00690.01010.8220.5750.8410.173 ± 0.0380.1630.122
L = 2500
γ0 = 20
0.00640.00550.00240.00180.00240.3810.3270.4360.391 ± 0.0300.3440.182
L = 10,000
γ0 = 2
0.01040.00940.00910.00480.00750.8730.5110.7980.290 ± 0.0330.2340.128
L = 10,000
γ0 = 20
0.00380.00350.00240.00130.00220.6120.3720.6290.510 ± 0.0280.4550.233
L = 40,000
γ0 = 2
0.00800.00670.00720.00330.00600.9000.4920.8960.357 ± 0.0290.3080.186
L = 40,000
γ0 = 20
0.00240.00240.00190.00090.00170.8000.3750.7080.573 ± 0.0210.5330.651

The terms (sim) and (pred) refer to the simulated and theoretical values of the statistics in question; (pred.u) and (pred.c) refer to the uncorrected theoretical values of πsel and the theoretical values of πsel corrected for LD by the method described in the main text, respectively. The Δθw column gives the mean values of the ratio of Δθw for neutral sites relative to its maximal value for a sample size of 20. PF and Coal refer to the Polanski et al. and coalescent SFS predictions, respectively. 95% confidence intervals of the mean are shown for Δθw; the other statistics have negligible CIs. A population of size 1,000 with a mutation rate of 10–5 and no mutational bias was simulated; the values with no HRI were obtained from the relevant theory.

The corrections for the effects of LD on diversity were obtained from Equation (A1a) of the Appendix, using the fact that all sites experience the same evolutionary forces. The sum of the D values over all L (L−1)/2 pairs of sites, ijDij, was obtained for each replicate simulation, where D = 0 for any case in which one member of a pair is not segregating. The correction for the mean diversity at a selected site, C, was obtained from the mean of the Dij over all pairs of selected sites (D¯) as follows:

(15)

The factor of L−1 arises because a given site potentially experiences the effects of LD at L−1 other sites. C is then added to πsel in Equation (2a) to obtain the corrected value of the diversity at selected sites.

The simulations show that D¯ is always negative, as expected from previous work on HRI with this type of model (e.g. McVean and Charlesworth 2000), so that the correction is invariably positive. Table 2 shows that this correction brings the predicted diversities at selected sites close to the simulated values, validating the approximations used to derive Equations (1). Figure 3 also confirms that p¯, B, B′ are reasonably well predicted by the formulae derived here, as was previously seen in Figs. 1 and 2 and Table 1.

The behavior of Δθw, which measures the distortion of the gene genealogy toward longer external branches (and hence the distortion of the SFS toward an excess of rare derived variants) is perhaps of more interest, as the values of this statistic have not previously been predicted. Table 2 shows that, as expected, the value of Δθw for a sample of 20 genomes increases as the effect of HRI on B′ increases. The value of Δθw using the predictions based on the trajectory of Ne backwards over time (Equations 10), combined with the equations of Polanski and Kimmel (2003) that describe the corresponding SFS (see the subsection Semianalytical SFS statistics), provides an excellent fit to the simulation results. In contrast, the predictions based on the coalescent process formulae (Equations A8 and A9) substantially underestimate HRI effects. Figure 4 shows the simulated and predicted mean SFSs for a sample size of 20, showing the accuracy of the Polanski-Kimmel predictions; comparisons with the SFS expected in the absence of HRI illustrate the considerable distortion toward low frequency variants at neutral sites caused by HRI for most of the frequency range.

Site frequency spectra for a sample of 20 neutral alleles in a haploid, nonrecombining genome, with no mutational bias, for the sets of parameters used in Fig. 3. For a given number of variants (1, 2, etc.) the bars from left to right are as follows: the SFS for the pooled simulation results (filled bar), the Polanski et al. (2023) prediction (empty bar), and the SFS in the absence of HRI (stippled bar). The top panel is for γ  0= 2 and the bottom panel is for γ  0 = 20. The left-hand, middle, and right-hand panels are for L = 2500, 10,000, and 40,000, respectively.
Fig. 4.

Site frequency spectra for a sample of 20 neutral alleles in a haploid, nonrecombining genome, with no mutational bias, for the sets of parameters used in Fig. 3. For a given number of variants (1, 2, etc.) the bars from left to right are as follows: the SFS for the pooled simulation results (filled bar), the Polanski et al. (2023) prediction (empty bar), and the SFS in the absence of HRI (stippled bar). The top panel is for γ  0= 2 and the bottom panel is for γ  0 = 20. The left-hand, middle, and right-hand panels are for L = 2500, 10,000, and 40,000, respectively.

Supplementary Fig. 1 in section 6 of Supplementary File 1 summarizes the distributions of the summary statistics used here across the 200 replicates simulations of each parameter set. A dataset on a single nonrecombining genomic region should correspond to a single outcome of the evolutionary process, so that the boxes and whiskers in these plots provide a picture of the probable spread of possible observations for a given parameter set. Δθw is the noisest of the statistics, followed by B′ when selection is weak. In other cases, the mean values of the statistics provide a fairly good idea of what to expect for a single observation.

Discussion

Effects of HRI on the level of adaptation

The results presented here confirm the conclusion from several earlier studies that HRI induced by blocks of tightly linked sites under purifying selection can have major effects on the level of adaptation as measured by the mean frequency (p¯) at which selected sites are occupied by favorable alleles in a genome (Li 1987; Comeron et al. 1999; McVean and Charlesworth 2000; Tachida 2000; Comeron and Kreitman 2002; Charlesworth et al. 2010; Devi et al. 2023). As described in the section Results: levels of adaptation and neutral diversity under HRI, Equations (1), (7), and (8) provide a simple and rapid method for obtaining an approximate expression for the magnitude of this effect. In the present paper, the variable B measures the factor by which the baseline effective population size in the absence of HRI (N0) must be multiplied in order to obtain the Ne for use in the standard diffusion equation formula for the fixation probability of a new mutation (Kimura 1964; Santiago and Caballero 2016; Buffalo and Kern 2024)—the “fixation Ne” of Devi et al. (2023). These effects do not require a small population size; rather, a combination of the number of selected sites, the mutation rate and Nos determines B in a highly nonlinear fashion.

The p¯is related to B via the Li-Bulmer equation (Equation 1a), where γ = 2BN0s = 0 in the presence of HRI, and provides a measure of codon usage bias under the simple model where A1 at a site represents the preferred codon and A2 represents an unpreferred alternative (Li 1987; Bulmer 1991; McVean and Charlesworth 1999). A large body of evidence suggests that γ for synonymous site variants is usually of the order of one in normally recombining genomic regions of species where there is evidence for selection on codon usage, but with considerable variation between genes, associated with factors such as gene expression levels (Sharp et al. 2005, 2010; Hershberg and Petrov 2008). As found in the simulation studies cited above, a sufficiently large number of such weakly selected sites can produce a strong reduction in B and hence in p¯, in the absence of any other types of selection, such as purifying selection at nonsynonymous sites. This is illustrated in Fig. 5 for the panels with γ  0 = 1 (for the detailed results, see Supplementary File 4).

Plots of the approximate values of population statistics obtained from equations (2a), (7), and (12) (Y axes) against the number of selected sites (X axes; log2 scale) for mutational biases of κ = 1 (upper panels) and 2 (lower panels), with γ0 = 1 (left-hand panels), γ0 = 10 (middle panels), and γ0 = 100 (right-hand panels). The ratio of πsel with HRI to its value in the absence of HRI is denoted by “Relative πsel.”
Fig. 5.

Plots of the approximate values of population statistics obtained from equations (2a), (7), and (12) (Y axes) against the number of selected sites (X axes; log2 scale) for mutational biases of κ = 1 (upper panels) and 2 (lower panels), with γ0 = 1 (left-hand panels), γ0 = 10 (middle panels), and γ0 = 100 (right-hand panels). The ratio of πsel with HRI to its value in the absence of HRI is denoted by “Relative πsel.

This pattern is consistent with the evidence for reductions in B and B′ in genomes and genomic regions where recombination is rare or absent, as reflected in levels of codon usage and silent site diversity, respectively (e.g. Qiu et al. 2011; Szövényi et al. 2011; Charlesworth and Campos 2014; Hough et al. 2014). This is especially true for neo-Y chromosomes in Drosophila, such as that of D. miranda, where a whole chromosome arm has stopped recombining due to the lack of recombination in males (e.g. Mahajan et al. 2018). For the D. miranda neo-Y chromosome, silent site diversity is approximately 1% of the genome-wide average (Bartolomé and Charlesworth 2006). Smaller nonrecombining regions in Drosophila, such as the centromeric regions, have silent site diversities of the order of 10% of the genome-wide average (Becher et al. 2020). Of course, other factors, such as selective sweeps of favorable mutations and HRI effects of strongly selected nonsynonymous mutations on weakly selected synonymous mutations are also likely to be involved in causing these patterns.

With γ0 = 1 in the absence of HRI and a mutational bias of κ = 2 toward deleterious alleles, the ratio of p¯ to its value of 0.576 in the absence of HRI declines almost linearly with the log2 of L, reaching ∼0.65 with 1 million selected sites, corresponding to 2,000 genes with 500 synonymous sites. This number of genes is comparable with those in a typical Drosophila chromosome arm (Misra et al. 2002), and so would apply to a Drosophila neo-Y chromosome. The value of B itself declines much faster, with a value of ∼0.1 with 1 million sites. The fixation probability of a deleterious mutation relative to that of a neutral mutation is equal to γ/[exp(γ)−1], and that of a favorable mutation is γ/[1−exp(−γ)] (Kimura 1964). These quantities take values of 0.58 and 1.58 with no HRI, respectively, becoming 0.95 and 1.05, respectively, with 1 million selected sites. This implies a sizeable increase in the rate of substitution of deleterious mutations, and a decrease for favorable ones, in a large nonrecombining genome. This effect can explain features such as the much faster rate of substitutions of mutations toward unpreferred codons relative to preferred codons in genes on the newly nonrecombining neo-Y chromosome of Drosophila miranda (Bartolomé and Charlesworth 2006), even in the absence of any other source of HRI effects.

With stronger selection and hence a larger value of γ0, the probability of fixation of a favorable allele when B is small can still be relatively high, and the probability of fixation of its deleterious alternative correspondingly low, simply because γ = 0 remains large. The lower middle panel of Fig. 5 shows that B with κ = 2 and γ0 = 10 is ∼0.05 with 1 million selected sites, whereas the ratio of p¯ to its value in the absence of HRI is ∼0.42. The fixation probabilities of deleterious and favorable mutations, relative to neutrality, change from values of 0.00045 and 10, respectively, in the absence of HRI to 0.98 and 1.03 with 1 million selected sites.

With γ0 = 100, there is a similar pattern, except that p¯ shows a threshold effect, with little or no decline with L until there are ∼50,000 selected sites, corresponding to B reaching a value of ∼0.05, so that γ = 5. This reflects the well-known quasi-threshold behavior of p¯ in Equation (1a) as a function of γ. With strongly selected variants, such as nonsynonymous mutations, it thus may be hard to detect reduced adaptation in nonrecombining genomic regions simply from the abundance of putatively deleterious mutations, unless the region includes hundreds of thousands of selected sites. Nonetheless, increased ratios of nonsynonymous to synonymous substitutions have been detected in nonrecombining genomic regions such as the D. miranda neo-Y chromosome (Bartolomé and Charlesworth 2006), the Y chromosome of Rumex hastulus (Hough et al. 2014) and the pericentromeric and telomeric regions of D. melanogaster and its relatives (Campos et al. 2012, 2014).

It is important to remember that, in the absence of recombination, B can depart massively from the classical BGS prediction of Equation (6); the same applies to the measure B′, which we have used to predict the effects of HRI on linked neutral sites (Equation 12). The classical formula assumes that all selected sites are at deterministic mutation-selection equilibrium and has been used either to predict neutral diversity at linked sites (Charlesworth et al. 1993), and the fixation probabilities of mutations that are under much weaker selection than the selected ones (Charlesworth 1994; Peck 1994; Orr and Kim 1998; Johnson and Barton 2002). Departures from the predictions of Equation (6) can occur in the absence of reverse mutations, even with γ0 >> 1, provided that U/s is so large that the expected number of mutation-free individuals in the population, N0 exp(−U/s), is small enough for them to be vulnerable to stochastic loss, so that Muller's ratchet operates in the absence of reverse mutations from mutant to wild-type (Charlesworth et al. 1993, 1995; Gordo et al. 2002; Nicolaisen and Desai 2013; Good et al. 2014; Cvijovic et al. 2018, Melissa et al. 2022). If γ is of order 1 or less, deleterious allele frequencies may depart considerably from their deterministic equilibria, which again means that Equation (6) is inappropriate. Equations (7), (8), and (12) thus provide a more robust means than Equation (6) of predicting the consequences of selection against deleterious mutations in a low recombination environment.

Effects of HRI on the ratio of nucleotide diversities at selected sites vs neutral sites

Another method for detecting a weakening of the efficacy of selection caused by HRI has used the ratio of mean nucleotide site diversity at selected sites (πsel) to mean diversity at putatively neutral sites in the same genomic region (πneut). In analyzing data, this approach commonly uses the ratio of the mean nonsynonymous site diversity (πN) to that for synonymous sites (πS), and in related measures, such as the diversity ratio at 0-fold coding sites (π0) to that for 4-fold degenerate sites (π4) (e.g. Charlesworth and Campos 2014; Bast et al. 2018; Castellano et al. 2018). Our results show that πneut for a set of neutral sites embedded in a nonrecombining section of the genome is well approximated by Equations (7) and (12), which determine B′, the ratio of πneut to its value in the absence of HRI. πneut is given by the limit of Equation (2a) as γ approaches zero, 4N0u/(1+κ), assuming that neutral sites are at statistical equilibrium with respect to their base composition and 4N0u << 1.

A complication, however, is that the simple use of 2BN0s for γ in the equation for the diversity at selected sites (Equation 2a) underestimates πsel, since the effect of LD in weakening selection against deleterious alleles is not fully taken into account simply by using 2BN0s in this equation (see section 1 of the  Appendix). While this effect can be approximately corrected for if the sum of the D values across all pairs of sites is known (see section 1 of the  Appendix and Table 2), this sum is not currently predictable analytically, except when selection is sufficiently strong that mutations are kept close to their frequencies under mutation-selection balance (Roze 2021).

The extent to which πsel/πneut is underestimated can be assessed using the results in Table 2. With no HRI, πsel/πneut is 0.380 for γ0 = 2 and 0.050 with γ0 = 20. With L = 2500, the simulated and uncorrected values of πsel/πneut are 0.875 and 0.511 for γ = 2; for γ0 = 20, the corresponding values are 0.375 and 0.327. With L = 10000, the simulated and uncorrected values of πsel/πneut are 0.875 and 0.510 for γ0 = 2; for γ0 = 20, the corresponding values are 0.632 and 0.371. The extent of underestimation of πsel/πneut without correcting for LD clearly gets worse with stronger selection and larger numbers of selected sites within the nonrecombining region. Nevertheless, for the parameters used in Table 2, Equations (2a) and (7) still predict a considerable increase in πsel/πneut due to HRI.

Predictions based on these equations are shown in Fig. 5 for a much wider range of values of L and selection parameters (the detailed results are presented in section 4 of Supplementary File 1). Somewhat unexpectedly, πsel/πneut decreases with L when selection is very weak (γ0 = 1), decreasing almost linearly from a value of 0.92 and 1.09 with κ = 1 and 2, respectively, to ∼0.42 and 0.58 at 1 million sites. This reflects the much slower decrease in B′ with L compared with B, so that neutral diversity declines more slowly that the effective strength of selection against deleterious mutations. However, with stronger selection, B′ and B are much closer, and πsel/πneut increases substantially as L increases, either reaching a plateau (κ = 2) or declining slightly (κ = 1). Given the inaccuracies in the formula for πsel, the exact values of πsel/πneut should be treated with caution, but the general patterns are probably approximately correct. The overall magnitude of the effect of HRI in causing increased values of πsel/πneut is underestimated by these results, so they can be regarded as conservative predictions of this measure of the effect of reduced efficacy of selection.

It might be thought that using B′ instead of B in Equation (2a) would yield better predictions of πsel/πneut, since B′ values are more closely related to the behavior of segregating variants. This is not, however, borne out by the results in Fig. 3. Applying the values of B′ and mean πneut obtained from the simulations to Equation (2a), the predicted values of πsel/πneut are 0.853 and 0.311 for L = 2,500 with γ0 = 2 and 20, respectively, and 0.904 and 0.504 with L = 1,000 with γ0 = 2 and 20, respectively. Comparing these with the simulation values for πsel/πneut presented above shows that πsel/πneut is over-predicted when γ0 = 2 and underpredicted when γ0 = 20, with stronger underprediction than overprediction.

This result implies that it may be unsafe to use estimates of Ne from levels of putatively neutral diversity to make quantitative inferences concerning the effects of HRI on sites under selection. As an example of such an inference, Castellano et al. (2018) used the relation between ln(πN/πS) and ln(πS) across different regions of the genome of D. melanogaster that differ in the rate of recombination to infer the shape parameter of the distribution of mutational effects (DFE) of deleterious nonsynonymous mutations, on the assumption that this is a gamma distribution, and that the slope of this relationship is equal to the negative of the shape parameter of the gamma distribution (Welch et al. 2008). Their estimate of this parameter (∼0.5) was much larger than the value of ∼0.3 commonly found for D. melanogaster from population genomics methods (e.g. Kousathanas and Keightley 2013). The failure of B′ to accurately predict diversity at selected sites in the presence of HRI could be a source of this discrepancy.

Another caveat concerning the use of an increase in πsel/πneut as evidence for a reduced efficacy of selection on nonsynonymous sites is that a fraction of such sites could be selectively neutral, and the rest are subject to such strong purifying selection that they are insensitive to HRI. In such a case, πsel/πneut would automatically increase with a reduction in πneut (Campos et al. 2014). Other information is thus required to conclude definitively that an increase in πsel/πneut in low recombination genomic regions is caused by HRI, as discussed by Campos et al. (2014). If a sufficient number of genes is available for comparing regions with differing recombination rates with good statistical accuracy, a useful criterion for the operation of HRI would be a lower value of mean πsel in regions with low recombination rates compared with recombining regions (see Figs. 3 and 5), provided that the 2 regions do not consistently differ in other properties affecting πsel.

In contrast, the hypothesis that a fraction Ps of selected sites is under strong selection and immune to HRI, and a fraction (1—Ps) behaves neutrally, predicts an increase in πsel as Ne decreases, due to HRI. With this model, πsel=Psπstrong+(1Ps)Bπneut0, where πstrong is the (constant) diversity at strongly selected sites and πneut0 is the neutral diversity in the absence of HRI. The ratio of πsel to its value with B′= 1 is thus given by:

(16)

This ratio is an increasing function of B′, in contrast to what is expected when diversity at selected sites is affected by HRI, as described above. This test was applied by Campos et al. (2014), whose Tables 1 and 2 showed that the mean value of πN over 225 genes in 3 autosomal regions of D. melanogaster genomes with no crossing over in a sample from a Rwandan population was 0.00069, compared with a mean value of 0.00143 over 7,099 genes in autosomal regions with a “normal” rate of crossing over, a ratio of 0.48, with a highly significant difference. In contrast, the corresponding mean values of πS for the 2 regions were 0.00154 and 0.0141, a ratio of 0.109; the values of πN/πS for the 2 regions were 0.254 and 0.101, respectively. These observations are thus consistent with the hypothesis that the DFE for nonsynonymous mutations is such that HRI results in decreased πN, rather than the increase predicted by Equation (16).

Of course, such a test is not conclusive, since other factors might influence the properties of the genes in the regions. Other criteria, including reduced frequencies of optimal codons, GC content, and an increase in dN/dS, and in the lengths of introns in low recombination regions in this example (Campos et al. 2012) all point to effects of HRI in undermining the efficacy of purifying selection in this case.

Effects of HRI on diversity at neutral sites

As has been known since the earliest study of the effects of BGS on neutral diversity (Charlesworth et al. 1993), HRI causes departures from the deterministic equilibrium frequencies of deleterious mutations on which Equation (6), leading to a considerable weakening of its effect on diversity at linked neutral sites, as measured here by B′. Nevertheless, increased levels of HRI are expected to lead to reduced values of B′ (e.g. Gordo et al. 2002; Kaiser and Charlesworth 2009; Seger et al. 2010; Good et al. 2014; Santiago and Caballero 2016; Hough et al. 2017). This pattern is illustrated in Fig. 5, where B′ is plotted against the number of sites under selection (L), for several strengths of selection and 2 levels of mutational bias. There is a strong tendency for the rate of decrease in B′ to decline as L increases (note that L is measured on a log2 scale), although it does not seem that a true asymptote is reached, as was initially proposed by Kaiser and Charlesworth (2009) on the basis of simulations. With a million selected sites and a scaled selection coefficient of 100, neutral diversity is reduced to a very low level, ∼2% of its value in the absence of HRI. This is consistent with the evidence for extreme reductions in levels of neutral or nearly neutral diversity in large nonrecombining genomic regions (reviewed by Charlesworth and Jensen 2021). Note, however, that Fig. 3 shows that B′tends to be somewhat underestimated by the approximations used here, especially when L is small.

Distortions of the SFS at neutral sites

As has been shown in many previous studies (Charlesworth et al. 1995; Santiago and Caballero 1998; O'Fallon et al. 2010; Nicolaisen and Desai 2013; Good et al. 2014; Cvijvocic et al. 2018), HRI is also expected to cause a distortion of the SFS at linked neutral sites toward more than expected low frequency derived variants in sequences in nonrecombining genome regions—see Table 2 and Fig. 4.

Many empirical studies have detected such an effect in comparisons of low and high recombination genomic regions (reviewed by Charlesworth and Jensen 2021). Tajima's D statistic (Tajima 1989) is a commonly used measure of the extent of the distortion of the SFSs, compared with the equilibrium neutral expectation—see Equation (S10) in section 1 of Supplementary File 1. This statistic has several undesirable properties, notably a dependence on the number of segregating sites in the sample, which in turn depends on the size of genome region analyzed, as well as on the sample size. Other proposed measures include Schaeffer's Dmin statistic, which compares the estimated Tajima's D with its maximum absolute value for a given sample size and number of segregating sites (Schaeffer 2002), the Δπ statistic of Langley et al. (2014) and the Δθw statistic of Becher et al. (2020)—see Equation (13). Here, we used a modification of Δθw, Δθw, which is the ratio of Δθw to its maximum possible value when there is an excess of rare variants (Equation 14); this is equivalent to—Dmin. This statistic is not completely free of dependence on the sample size, but it is far less sensitive to sample size than Tajima's D, as can be seen from Supplementary Table 4 in section 7 of Supplementary File S1, which is based on the simulations used to generate Table 2. Moreover, section 2 of the Appendix shows that Δθw can be estimated from the SFS alone, without reference to the number of segregating sites, and is therefore useful for analyzing population genomic data. It thus seems to have considerable utility as a measure of the skew of the SFS away from equilibrium neutral expectation.

The present approach does not, however, fully describe the processes involved in determining the SFS. Fixations of either A1 or A2 occur at rates governed by the products of the proportions of sites fixed for A2 or A1 (p¯ and 1p¯, respectively) and their expected rates of substitution. With L sites under selection, the overall expected rate of fixation of new mutations per generation is given by:

(17)

where γ = 0 (Charlesworth and Charlesworth 2000, p.275, Equation 6.11).

This quantity can become large when the number of sites is large, allowing “sweeps” of new mutations of either type to become frequent. For example, for the example in Table 2 with N0 = 1000, γ0 = 20, u = 10–5, κ = 1, and L = 40000, the simulation results gave B = 0.066, so that γ = 0 = 1.32, giving λ = 0.40, i.e. one sweep every 2.5 generations in the simulations (note, however, that if the parameters are rescaled to a population size of 106, this corresponds to a sweep every 2,500 generations). There is, therefore, a chance that a derived neutral variant can be caught up in a sweeping haplotype and be present at a high frequency at the time the population is sampled, leading to a small fraction of sites with derived alleles at high frequencies (e.g. Fay and Wu 2000). This possibility is not captured by the approximations used here, which consider only the effect on the SFS of the apparent population expansion created by hitchhiking effects (Equations 10). Devi et al. (2023) used an argument based on models of clonal interference between sweeping mutations to obtain approximate expressions for the effects of HRI, such as Equations (9), but did not consider the SFS.

Cvijovic et al. (2018) analyzed the case of irreversible deleterious mutations for a nonrecombining genome with Lu/s >> 1, and derived approximations for the SFS that predict an uptick in the probabilities of high frequency derived variants at the extreme end of the distribution of frequencies. Careful examination of Fig. 4 shows that this effect is visible when the frequencies of the classes with 19 and 18 derived variants in a sample of 20 are compared with each other, with class 19 being consistently more frequent than 18, even with γ = 2 (Lu/s = 2). The effect is more noticeable with larger sample sizes (see Supplementary Table 5 in Supplementary File 1, and the frequency spectra in Supplementary File 3). This deficiency in the application of the Polanski et al. (2002) formula to the SFS under HRI probably explains why Fig. 4 shows that it consistently slightly underestimates the probabilities of relatively high frequency derived variants. The overall effect is, however, small, and the Δθw summary statistic is accurately predicted by this formula, at least for the parameter sets displayed in Table 2.

Limitations of the study and future prospects

While our results provide some useful insights into the effects of HRI on patterns of genetic diversity and levels of adaptation when recombination is absent, they have some obvious limitations. First, only a single selection coefficient is assigned to each site under selection; in reality, a wide distribution of selection coefficients is likely to apply to deleterious mutations (e.g. Eyre-Walker and Keightley 2009; Kim et al. 2017). It is not clear how to deal with this problem, as previous simple examples involving 2 strengths of selection suggest that integrating Equations (10) over a distribution of selection coefficients (Santiago and Caballero 2016; Buffalo and Kern 2024) may not be adequate in low recombination regimes, a topic that needs to be explored in future simulation studies. Our concern is that sites under strong selection are likely to have larger effects on weakly selected sites than vice-versa (Gordo and Charlesworth 2001; Kaiser and Charlesworth 2010). Devi et al. (2023) have examined the properties of p¯ with up to 3 classes of selected sites for the reversible mutation model by a heuristic argument, but a definitive treatment of this problem seems to be hard to obtain.

Second, there is evidence that gene conversion events occur with significant frequencies in noncrossover regions of genomes, at least in Drosophila (Comeron et al. 2012) and humans (Palsson et al. 2025), and that low frequency recombination events can happen in apparently asexual lineages such as the Bdelloid rotifers (Vakhrusheva et al. 2020; Laine et al. 2022). It is therefore of interest to examine the robustness of our conclusions to low frequency recombination events, although Kaiser and Charlesworth (2009) found only small effects of gene conversion in their simulations. Finally, we do not consider diploidy, where the recessivity or partial recessivity of deleterious mutations can result in effects of associative overdominance or pseudo-overdominance that act in the opposite direction to the HRI effects discussed here (Zhao and Charlesworth 2016; Becher et al. 2020; Gilbert et al. 2020; Abu-Awad and Waller 2023) Further developments of analytical approximations and comparisons with simulation results are needed to investigate these questions further.

Data availability

No new data or reagents were generated for this work. The codes for the computer programs used to generate the results described above are available on https://doi.org/10.5281/zenodo.14024806. The numerical results used to produce the figures and tables are presented in Supplementary Files S1–S4.

Supplemental material available at GENETICS online.

Acknowledgments

We thank Deborah Charlesworth, Denis Roze, Enrique Santiago and an anonymous reviewer for their comments on an earlier version of this paper. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk/).

Funding

This work was not funded.

Literature cited

Abu-Awad
 
D
,
Waller
 
DM
.
2023
.
Conditions for maintaining and eroding pseudo-overdominance and its contribution to inbreeding depression
.
Peer Comm. J
.
3
:
e8
. .

Bartolomé
 
C
,
Charlesworth
 
B
.
2006
.
Evolution of amino-acid sequences and codon usage on the Drosophila miranda neo-sex chromosomes
.
Genetics
.
174
(
4
):
2033
2044
. .

Bast
 
J
,
Parker
 
DJ
,
Dumas
 
Z
,
Jalvingh
 
KM
,
Tran Van
 
P
,
Jaron
 
KS
,
Figuet
 
E
,
Brandt
 
A
,
Galtier
 
N
,
Schwander
 
T
.
2018
.
Consequences of asexuality in natural populations: insights from stick insects
.
Mol Biol Evol.
 
35
(
7
):
1668
1677
. .

Becher
 
H
,
Jackson
 
B
,
Charlesworth
 
B
.
2020
.
Patterns of genetic variability in genomic regions with low rates of recombination
.
Curr Biol.
 
30
(
1
):
94
100
. .

Buffalo
 
V
,
Kern
 
AD
.
2024
.
A quantitative genetic model of background selection in humans
.
PloS Genet
.
20
(
3
):
e1011144
. .

Bull
 
JJ
.
1978
.
Sex chromosomes in haploid dioecy
.
Am Nat.
 
112
(
983
):
245
250
. .

Bulmer
 
MG
.
1980
.
The Mathematical Theory of Quantitative Genetics
.
Oxford
:
Oxford University Press
.

Bulmer
 
MG
.
1991
.
The selection-mutation-drift theory of synonomous codon usage
.
Genetics
.
129
(
3
):
897
907
. .

Burgarella
 
C
,
Brémaud
 
M-F
,
Von Hirschheydt
 
G
,
Viader
 
V
,
Ardisson
 
M
,
Santoni
 
S
,
Ranwez
 
V
,
de Navascués
 
M
,
David
 
J
,
Glémin
 
S
.
2024
.
Mating systems and recombination landscape strongly shape genetic diversity and selection in wheat relatives
.
Evol Lett.
 
8
(
6
):
866
880
. .

Campos
 
JL
,
Charlesworth
 
B
,
Haddrill
 
PR
.
2012
.
Molecular evolution in nonrecombining regions of the Drosophila genome
.
Genome Biol Evol.
 
4
(
3
):
278
288
. .

Campos
 
JL
,
Halligan
 
DL
,
Haddrill
 
PR
,
Charlesworth
 
B
.
2014
.
The relation between recombination rate and patterns of molecular evolution and variation in Drosophila melanogaster
.
Mol Biol Evol.
 
31
(
4
):
1010
1028
. .

Castellano
 
D
,
James
 
J
,
Eyre-Walker
 
A
.
2018
.
Nearly neutral evolution across the Drosophila melanogaster genome
.
Mol Biol Evol.
 
35
(
11
):
2685
2694
. .

Charlesworth
 
B
.
1994
.
The effect of background selection against deleterious alleles on weakly selected, linked variants
.
Genet Res.
 
63
(
3
):
213
228
. .

Charlesworth
 
B
.
2022
.
The effects of weak selection on neutral diversity at linked sites
.
Genetics
.
221
(
1
):iyac027. .

Charlesworth
 
D
.
2022
.
The mysterious sex chromosomes of haploid plants
.
Heredity (Edinb).
 
129
(
1
):
17
21
. .

Charlesworth
 
B
,
Betancourt
 
AJ
,
Kaiser
 
VB
,
Gordo
 
I
.
2010
.
Genetic recombination and molecular evolution
.
Cold Spring Harb Symp Quant Biol.
 
74
(
0
):
177
186
. .

Charlesworth
 
B
,
Campos
 
JL
.
2014
.
The relations between recombination rate and patterns of molecular variation and evolution in Drosophila
.
Annual Review of Genetics
.
48
(
1
):
383
403
. .

Charlesworth
 
B
,
Charlesworth
 
D
.
2000
.
The degeneration of Y chromosomes
.
Philos Trans R Soc Lond B Biol Sci.
 
355
(
1403
):
1563
2572
. .

Charlesworth
 
D
,
Charlesworth
 
B
,
Morgan
 
MT
.
1995
.
The pattern of neutral molecular variation under the background selection model
.
Genetics
.
141
(
4
):
1619
1632
. .

Charlesworth
 
B
,
Jensen
 
JD
.
2021
.
The effects of selection at linked sites on patterns of genetic variability
.
Ann Rev Ecol Evol Syst
.
52
(
1
):
177
197
. .

Charlesworth
 
B
,
Morgan
 
MT
,
Charlesworth
 
D
.
1993
.
The effect of deleterious mutations on neutral molecular variation
.
Genetics
.
134
(
4
):
1289
1303
. .

Charlesworth
 
D
,
Wright
 
SI
.
2001
.
Breeding systems and genome evolution
.
Curr Opn Genet Dev
.
11
(
6
):
685
690
. .

Comeron
 
JM
,
Kreitman
 
M
.
2002
.
Population, evolutionary and genomic consequences of interference selection
.
Genetics
.
161
(
1
):
389
410
. .

Comeron
 
JM
,
Kreitman
 
M
,
Aguadé
 
M
.
1999
.
Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila
.
Genetics
.
151
(
1
):
239
249
. .

Comeron
 
J
,
Ratnappan
 
R
,
Bailin
 
S
.
2012
.
The many landscapes of recombination in Drosophila melanogaster
.
PLoS Genet.
 
8
(
10
):
e1002905
. .

Crow
 
JF
.
1993
. Mutation, mean fitness, and genetic load. In:
Futuyma
 
D
,
Antonovics
 
J
, editors.
Oxford surveys in evolutionary biology
. Oxford, UK: Oxford Univ Press. p.
3
42
.

Crow
 
JF
,
Denniston
 
C
.
1988
.
Inbreeding and variance effective numbers
.
Evolution
.
43
(
3
):
482
495
. .

Cvijvocic
 
I
,
Good
 
BH
,
Desai
 
MM
.
2018
.
The effect of strong purifying selection on genetic diversity
.
Genetics
.
209
(
4
):
1235
1278
. .

Daigle
 
A
,
Johri
 
P
.
2025
.
Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species
.
Evolution
.
79
(
3
):
342
363
. .

Devi
 
A
,
Speyer
 
G
,
Lynch
 
M
.
2023
.
The divergence of mean phenotypes under directional selection
.
Genetics
.
224
(
3
):
iyad091
. .

Engelstädter
 
J
.
2008
.
Muller's ratchet and the degeneration of Y dhromosomes: a simulation study
.
Genetics
.
180
(
2
):
957
967
. .

Eyre-Walker
 
A
,
Keightley
 
PD
.
2009
.
Estimating the rate of adaptive mutations in the presence of slightly deleterious mutations and population size change
.
Mol Biol Evol.
 
26
(
9
):
2097
2108
. .

Fay
 
JC
,
Wu
 
CI
.
2000
.
Hitchhiking under positive Darwinian selection
.
Genetics
.
155
(
3
):
1405
1413
. .

Felsenstein
 
J
.
1974
.
The evolutionary advantage of recombination
.
Genetics
.
78
(
2
):
737
756
. .

Fu
 
L-Y
,
Wang
 
G-Z
,
Ma
 
B-G
,
Zhang
 
H-Y
.
2011
.
Exploring the common molecular basis for the universal DNA mutation bias
.
Biophys Biochem Res Commun
.
409
(
3
):
367
371
. .

Gilbert
 
KJ
,
Pouyet
 
F
,
Excoffier
 
L
,
Peischl
 
S
.
2020
.
Transition from background selection to associative overdominance promotes diversity in regions of low recombination
.
Curr Biol.
 
30
(
1
):
101
107
. .

Good
 
BH
,
Walczak
 
AM
,
Neher
 
RA
,
Desai
 
MM
.
2014
.
Genetic diversity in the interference selection limit
.
PLoS Genet.
 
10
(
3
):
e1004222
. .

Gordo
 
I
,
Charlesworth
 
B
.
2001
.
The speed of Muller's ratchet with background selection, and the degeneration of Y chromosomes
.
Genet Res.
 
78
(
2
):
149
162
. .

Gordo
 
I
,
Navarro
 
A
,
Charlesworth
 
B
.
2002
.
Muller's ratchet and the pattern of variation at a neutral locus
.
Genetics
.
161
(
2
):
835
848
. .

Graur
 
D
.
2016
.
Molecular and Genome Evolution
.
Sunderland, MA
:
Sinauer Associates
.

Griffiths
 
RC
,
Tavaré
 
S
.
1998
.
The age of a mutation in a general coalescent tree
.
Commun Stat Stochastic Models
.
14
(
1–2
):
273
295
. .

Haller
 
BC
,
Messer
 
PW
.
2023
.
SLiM 4: multispecies eco-evolutionary modeling
.
Am Nat.
 
201
(
5
):
E127
E139
. .

Hershberg
 
R
,
Petrov
 
DA
.
2008
.
Selection on codon bias
.
Annu Rev Genet.
 
42
(
1
):
287
299
. .

Hill
 
WG
,
Robertson
 
A
.
1966
.
The effect of linkage on limits to artificial selection
.
Genet Res.
 
8
(
3
):
269
294
. .

Hough
 
J
,
Hollister
 
JD
,
Wang
 
W
,
Barrett
 
SCH
,
Wright
 
SI
.
2014
.
Genetic degeneration of old and young Y chromosomes in the flowering plant Rumex hastatulus
.
Proc Natl Acad Sci U S A.
 
111
(
21
):
77137718
. .

Hough
 
J
,
Wang
 
W
,
Barrett
 
SCH
,
Wright
 
SI
.
2017
.
Hill-Robertson interference reduces genetic diversity on a young plant Y-chromosome
.
Genetics
.
207
(
2
):
685
695
. .

Johnson
 
T
,
Barton
 
NH
.
2002
.
The effect of deleterious alleles on adaptation in asexual populations
.
Genetics
.
162
(
1
):
395
411
. .

Johri
 
P
,
Charlesworth
 
B
,
Jensen
 
JD
.
2020
.
Toward an evolutionarily appropriate null model: Jointly inferring demography and purifying selection
.
Genetics
.
215
(
1
):
173
192
. .

Kaiser
 
VB
,
Charlesworth
 
B
.
2009
.
The effects of deleterious mutations on evolution in non-recombining genomes
.
Trends Genet
.
25
(
1
):
9
12
. .

Kaiser
 
VB
,
Charlesworth
 
B
.
2010
.
Muller's ratchet and the degeneration of the Drosophila miranda neo-Y chromosome
.
Genetics
.
185
(
1
):
339
348
. .

Kim
 
B
,
Huber
 
CD
,
Lohmueller
 
KE
.
2017
.
Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples
.
Genetics
.
206
(
1
):
345
361
. .

Kim
 
Y
.
2004
.
Effect of strong directional selection on weakly selected mutations at linked sites: Implication for synonymous codon usage
.
Mol Biol and Evol
.
21
(
2
):
286
294
. .

Kimura
 
M
.
1964
.
Diffusion models in population genetics
.
J App Prob
.
1
(
2
):
177
223
. .

Kimura
 
M
.
1971
.
Theoretical foundations of population genetics at the molecular level
.
Theor Popul Biol
.
2
(
2
):
174
208
. .

Kousathanas
 
A
,
Keightley
 
PD
.
2013
.
A comparison of models to infer the distribution of fitness effects of new mutations
.
Genetics
.
193
(
4
):
1197
1208
. .

Lain
 
VN
,
Sackton
 
TB
,
Meselson
 
M
.
2022
.
Genomic signature of sexual reproduction in the bdelloid rotifer Macrotrachella quadricornifera
.
Genetics
.
220
(
2
):
iyab221
. .

Langley
 
SA
,
Karpen
 
GH
,
Langley
 
CH
.
2014
.
Nucleosomes shape DNA polymorphism and divergence
.
PLoS Genet.
 
10
(
7
):
e1004457
. .

Li
 
W-H
.
1987
.
Models of nearly neutral mutations with particular implications for non-random usage of synonymous codons
.
J Mol Evol.
 
24
(
4
):
337
345
. .

Mahajan
 
S
,
Wei
 
KH-C
,
Nalley
 
MJ
,
Gibilisco
 
L
,
Bachtrog
 
D
,
Tyler-Smith
 
C
.
2018
.
De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture
.
PLOS Biol
.
16
(
7
):
e2006348
. .

McVean
 
GAT
,
Charlesworth
 
B
.
1999
.
A population genetic model for the evolution of synonymous codon usage: patterns and predictions
.
Genet Res.
 
74
(
2
):
145
158
. .

McVean
 
GAT
,
Charlesworth
 
B
.
2000
.
The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation
.
Genetics
.
155
(
2
):
929
944
. .

Melissa
 
MJ
,
Good
 
BH
,
Fisher
 
DS
,
Desai
 
MM
.
2022
.
Populationn genetics of polymorphism and divergence in rapidly evolving populations
.
Genetics
.
221
(
4
):iyac053. .

Misra
 
S
,
Crosby
 
MA
,
Mungall
 
CJ
,
Matthews
 
BB
,
Campbell
 
KS
,
Hradecky
 
P
,
Huang
 
Y
,
Kaminker
 
JS
,
Millburn
 
GH
,
Prochnik
 
SE
, et al.  
2002
.
Annotation of the Drosophila melanogaster euchromatic genome: a systematic review
.
Gen Biol
.
3
(
12
):Research0083. .

Nei
 
M
.
1987
.
Molecular Evolutionary Genetics
.
New York
:
Columbia University Press
.

Nicolaisen
 
LE
,
Desai
 
M
.
2013
.
Distortions in genealogies due to purifying selection and recombination
.
Genetics
.
195
(
1
):
221
230
. .

O'Fallon
 
BD
,
Seger
 
J
,
Adler
 
FR
.
2010
.
A continuous-state coalescent and the impact of weak selection on the structure of gene genealogies
.
Mol Biol Evol.
 
27
(
5
):
1162
1172
. .

Olito
 
C
,
Ponnikas
 
S
,
Hansson
 
B
,
Abbott
 
JK
.
2024
.
Consequences of partially recessive deleterious genetic variation for the evolution of inversions suppressing recombination between sex chromosomes
.
Evolution
.
79
(
8
):
1499
1510
. .

Orr
 
HA
,
Kim
 
Y
.
1998
.
An adaptive hypothesis for the evolution of the Y chromosome
.
Genetics
.
150
(
4
):
1693
1698
. .

Palsson
 
G
,
Hardarson
 
MT
,
Jonsson
 
H
,
Steinthorsdottir
 
V
,
Stefansson
 
OA
,
Eggertsson
 
HP
,
Gudjonsson
 
SA
,
Olason
 
PI
,
Gylfason
 
A
,
Masson
 
G
, et al.  
2025
.
Complete human recombination maps
.
Nature
.
639
:
700
707
. .

Peck
 
J
.
1994
.
A ruby in the rubbish: beneficial mutations, deleterious mutations, and the evolution of sex
.
Genetics
.
137
(
2
):
597
606
. .

Polanski
 
A
,
Bobrowski
 
A
,
Kimmel
 
M
.
2002
.
A note on distributions of times to coalescence, under time-dependent population size
.
Theor Popul Biol
.
63
(
1
):
33
40
. .

Polanski
 
A
,
Kimmel
 
M
.
2003
.
New explicit expressions for relative frequencies of single nucleotide polymorphisms with application to statistical inference on population growth
.
Genetics
.
165
(
1
):
427
436
. .

Price
 
MN
,
Arkin
 
AP
.
2015
.
Weakly deleterious mutations and low rates of recombination limit the impact of natural selection on bacterial genomes
.
mBio
.
6
(
6
):
e01302
e01315
. .

Qiu
 
S
,
Zeng
 
K
,
Slotte
 
T
,
Wright
 
SI
,
Charlesworth
 
D
.
2011
.
Reduced efficacy of natural selection on codon usage bias in selfing Arabidopis and Capsella species
.
Genome Biol Evol.
 
3
:
868
880
. .

Robertson
 
A
.
1961
.
Inbreeding in artificial selection programmes
.
Genet Res.
 
2
(
2
):
189
194
. .

Roze
 
D
.
2021
.
A simple expression for the strength of selection on recombination generated by interference among mutations
.
Proc Natl Acad Sci U S A.
 
118
(
19
):e2022805118. .

Santiago
 
E
,
Caballero
 
A
.
1998
.
Effective size and polymorphism of linked neutral loci in populations under selection
.
Genetics
.
149
(
4
):
2105
2117
. .

Santiago
 
E
,
Caballero
 
A
.
2016
.
Joint prediction of the effective population size and the rate of fixation of deleterious mutations
.
Genetics
.
204
(
3
):
1267
1278
. .

Schaeffer
 
SW
.
2002
.
Molecular population genetics of sequence length diversity in the adh region of Drosophila pseudoobscura
.
Genet Res.
 
80
(
3
):
163
175
. .

Seger
 
J
,
Smith
 
WA
,
Perry
 
JJ
,
Hunn
 
J
,
Kaliszewska
 
ZA
,
Sala
 
LL
,
Pozzi
 
L
,
Rowntree
 
VJ
,
Adler
 
FR
.
2010
.
Gene genealogies distorted by weakly interfering mutations in constant environments
.
Genetics
.
184
(
2
):
529
545
. .

Sharp
 
PM
,
Bailes
 
E
,
Grocock
 
RJ
,
Peden
 
JF
,
Sockett
 
RE
.
2005
.
Variation in the strength of selected codon usage bias among bacteria
.
Nucleic Acids Res.
 
33
(
4
):
1141
1153
. .

Sharp
 
PM
,
Emery
 
LR
,
Zeng
 
K
.
2010
.
Forces that influence the evolution of codon bias
.
Philos Trans R Soc Lond B Biol Sci.
 
365
(
1544
):
1203
1212
. .

Strütt
 
S
,
Excoffier
 
L
,
Peischl
 
S
.
2025
.
A generalized structured coalescent for purifying selection without recombination
.
Genetics
.
iyaf013
. .

Szövényi
 
P
,
Ullrich
 
KK
,
Rensing
 
SA
,
Lang
 
D
,
Van Gessel
 
N
,
Stenøien
 
HK
,
Conti
 
E
,
Reski
 
R
.
2011
.
Selfing in haploid plants and efficacy of selection: codon usage bias in the model mossPhyscomitrella patens
.
Genome Biol Evol.
 
9
(
6
):
1528
1546
. .

Tachida
 
H
.
2000
.
DNA evolution under weak selection
.
Gene
.
261
(
1
):
3
9
. .

Tajima
 
F
.
1989
.
Statistical method for testing the neutral mutation hypothesis
.
Genetics
.
123
(
3
):
585
595
. .

Torgasheva
 
AA
,
Malinovskaya
 
LP
,
Zadesenets
 
KS
,
Karamysheva
 
TV
,
Kizilova
 
EA
,
Akberdina
 
EA
,
Pristyazhnyuk
 
IE
,
Shnaider
 
EP
,
Volodkina
 
VA
,
Saifitdinova
 
AF
, et al.  
2019
.
Germline-restricted chromosome (GRC) is widespread among songbirds
.
Proc Natl Acad Sci U S A.
 
116
(
24
):
11845
11850
. .

Vakhrusheva
 
OA
,
Mnatsakanova
 
EA
,
Galimov
 
YR
,
Neretina
 
TV
,
Gerasimov
 
ES
,
Naumenko
 
SA
,
Ozerova
 
SG
,
Zalevsky
 
AO
,
Yushenova
 
IA
,
Rodriguez
 
F
, et al.  
2020
.
Genomic signatures of recombination in a natural population of the bdelloid rotifer Adineta vaga
.
Nat Commun.
 
11
(
1
):
6421
. .

Walczak
 
AM
,
Nicolaisen
 
LE
,
Plotkin
 
JB
,
Desai
 
M
.
2012
.
The structure of geneaologies in the presence of purifying selection: a fitness-class coalescent
.
Genetics
.
190
(
2
):
753
779
. .

Watterson
 
GA
.
1975
.
On the number of segregating sites in genetical models without recombination
.
Theor Popul Biol
.
7
(
2
):
256
276
. .

Weissman
 
DB
,
Barton
 
NH
.
2012
.
Limits to the rate of adaptive substitution in sexual populations
.
PLoS Genet.
 
8
(
6
):
e1002740
. .

Welch
 
JJ
,
Eyre-Walker
 
A
,
Waxman
 
D
.
2008
.
Divergence and polymorphism under the nearly neutral theory of molecular evolution
.
J Mol Evol.
 
67
(
4
):
418
426
. .

Zeng
 
K
,
Charlesworth
 
B
.
2011
.
The joint effects of background selection and genetic recombination on local gene genealogies
.
Genetics
.
189
(
1
):
251
266
. .

Zhao
 
L
,
Charlesworth
 
B
.
2016
.
Resolving the conflict between associative overdominance and background selection
.
Genetics
.
203
(
3
):
1315
1334
. .

Appendix

1. The effects of LD

For a haploid population, the total additive variance in fitness, Va, is equal to Vg plus twice the sum of terms involving the covariances in fitness between all pairs of sites (Bulmer 1980). For a pair of sites i and j, the covariance is Cij=Dijs2, where Dij is the coefficient of LD for this pair of sites, giving a contribution to Va of 2 Dijs2. In addition, the existence of this LD means that each selected site experiences an additional selective force caused by the other loci under selection. For site i, selection at site j induces a change in allele frequency of δqij=sDij (Zhao and Charlesworth 2016). This corresponds to an additional selection coefficient at site i, which will affect the expected diversity at site, with L−1 contributions of this type from sites other than i. At first sight, this suggests that it is virtually impossible to obtain an expression for the expectation of Va, since the Dij cannot easily be found analytically except for limiting cases such as complete neutrality. But the following argument shows that insights can be obtained without exact knowledge of the Dij.

Simulations of blocks of nonrecombining, haploid loci show that HRI involves negative LD between deleterious mutations (e.g. McVean and Charlesworth 2000; Comeron and Kreitman 2002), implying that the effective selection coefficient against a deleterious mutation at a given site is reduced in magnitude compared with the selection coefficient in the absence of LD. This reduction provides an alternative perspective on HRI; the reduction in Nes caused by HRI is equivalent to this reduction in the effective selection coefficient (Zhao and Charlesworth 2016). However, inspection of Equation (2a) shows that the expected diversity at a site i under selection (πsel ij) is the product of 2u/s and a function of Nes; only the former term need be considered further, since the effect of HRI on the latter is absorbed into Nes.

A small alteration δs in s results in an approximate change of (δs)πsel ij/s in πsel i, where s is the selection coefficient in the absence of HRI effects. From the selection equation Δq ≈—spq, the change in s at site i due to selection at site j is δssel ij = – sDij/(piqi), where qi is the frequency of the deleterious allele at site i and pi = 1­ –qi since the corresponding change in allele frequency at site i is—sDij, as stated in the previous paragraph. The expected change in diversity at site i due to selection at another site j is thus:

(A1a )

The corresponding expected change in the genic variance at site i is given by the product of s2 and one-half of this quantity:

(A1b )

The net change in the genic variance at site i is given by summing over all ji.

There is an identical expected change to the genic variance at site j due to selection at site i, E{δVgji}. The corresponding contribution to the net expected additive genetic variance arising from LD between sites i and j is thus 2E{Dij}s2=E{δVgij}E{δVgji}. If second-order terms in δs are neglected, the effect of LD associated with HRI on the sum of the genic variances across all sites approximately cancels the effect of the LD-induced covariance terms on the total additive genetic variance, given by the sum of 2 Dijs2 over all pairs of sites i and j.

2. The neutral SFS

Let f(i) be the probability that a derived neutral variant at a site is present in i copies in a sample of size k. Knowledge of f(i) provides an alternative method of determining Δθw, without using any information about overall diversity statistics. Using the correction for bias in estimating pairwise diversity (Nei 1987, p. 178), the conditional pairwise diversity at segregating sites is given by:

(A2a )

If Ps is the probability that a site is segregating, the theoretical values of θw and unconditional pairwise diversity are given by θw=Ps/ak and π=Psπc. It follows that:

(A2b )

If k = 2, only i = 1 contributes to Equation (A2a), so that πc = 1; in this case, ak = 1 and so Δθw = 0. If the sample contains only singletons (i = 1), for k > 2 the maximum value of Δθw is given by:

(A2c )

The ratio Δθw/Δθwm gives a standardized measure of the degree of distortion of the SFS toward rare variants, which we denote by Δθw.

Author notes

Conflicts of interest: The authors declare no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Editor: D Roze
D Roze
Editor
Search for other works by this author on:

Supplementary data