Evading evolution of resistance to gene drives

Gene drives offer the possibility of altering and even suppressing wild populations of countless plant and animal species, and CRISPR technology now provides the technical feasibility of engineering them. However, population-suppression gene drives are prone to select resistance, should it arise. Here we develop mathematical and computational models to identify conditions under which suppression drives will evade resistance, even if resistance is present initially. We show that linkage between the resistance and drive loci is critical to the evolution of resistance and that evolution of resistance requires (negative) linkage disequilibrium between the two loci. When the two loci are unlinked or only partially so, a suppression drive that causes limited inviability can evolve to fixation while causing only a minor increase in resistance frequency. Once fixed, the drive allele no longer selects resistance. Our analyses suggest that among gene drives that cause moderate suppression, toxin-antidote systems are less apt to select for resistance than homing drives. Single drives of this type would achieve only partial population suppression, but multiple drives (perhaps delivered sequentially) would allow arbitrary levels of suppression. The most favorable case for evolution of resistance appears to be with suppression homing drives in which resistance is dominant and fully suppresses transmission distortion; partial suppression by resistance heterozygotes or recessive resistance work against resistance evolution. Given that it is now possible to engineer CRISPR-based gene drives capable of circumventing allelic resistance, this design may allow for the engineering of suppression gene drives that are effectively resistance-proof.


Introduction
The ability to engineer gene drives has recently become feasible for countless species, especially insects, many of which carry disease or eat crops (Gantz and Bier 2015; Champer et al. 2016; [...]

Gene drive engineering has been profoundly facilitated by CRISPR technology (Gantz and Bier 2015; Champer et al. 2016; Oberhofer et al. 2019). CRISPR uses an RNA-directed nuclease to cut specific sites in DNA, enabling a variety of gene drive designs, from homing endonucleases that operate by segregation distortion to killer-rescue and toxin-antidote systems that destroy sensitive alleles. The most obvious form of resistance to CRISPR-based drives is mutation in the target sequence at which the nuclease cuts (Burt 2003; Unckless et al. 2015, 2017; Champer et al. 2017). However, contrary to early suspicions that all possible target sites would be prone to such mutations (e.g., Drury et al. 2017), at least one essential genomic site has been identified in mosquitoes that is intolerant of mutations and thus appears to escape this problem (Kyrou et al. 2018). Furthermore, CRISPR-based gene drives can be designed to cut at multiple sites in the same gene (any of which suffice to allow the drive to [...]

[...] distortion operates at the A/a locus, resistance to segregation distortion operates at the R/r locus (Table 1). Fitness effects are expressed in the gamete stage, with only the gene drive allele having an effect (Table 2).

Diploid genotype    Gametes produced
Aa rr               Ar
Aa Rr               ar, aR, Ar, AR
Aa RR               AR, aR

Table 1 Segregation ratios for the three possible genotypes that are heterozygous at the drive locus. Only the genotype of the top row experiences segregation distortion, as any genome with allele R suppresses distortion. All of the gamete types listed for a diploid are produced in equal abundance; the double heterozygote produces all 4 in equal abundance because of the absence of linkage between the two loci. Segregation is Mendelian for the 6 other possible diploid genotypes not shown here, those involving AA and aa homozygotes at the drive locus.

Table 2 Gamete type and fitness (table body not recoverable from the source).

The non-drive allele is depressed by segregation distortion but bolstered by its fitness advantage
In the absence of resistance alleles, the net effect of each life cycle is to further depress the a allele frequency. But this net effect is the outcome of the opposing effects of segregation distortion and gametic fitness. Assuming transmission occurs prior to selection against the drive allele A, the frequency of allele a after transmission but before selection ($p_a^*$) as a function of its frequency at the start of the generation ($p_a$) is

$$p_a^* = p_a - \frac{H_{rr}}{2} \qquad (1)$$

where $H_{rr}$ is the frequency of diploid genotypes that are resistance-free and heterozygous for the drive (Aarr). The term $H_{rr}/2$ describes the depression of allele a from 100% segregation distortion. Note that resistance will do nothing to blunt this negative effect, other than perhaps to reduce $H_{rr}$.

Selection against the drive subsequently counters this. The frequency of a is increased from $p_a^*$ by the factor $1/\bar{w}$,

$$p_a' = \frac{p_a^*}{\bar{w}} = p_a^* + \left( \frac{p_a^* (1 - p_a^*)\, s_A}{\bar{w}} \right) \qquad (2)$$

where $p_a'$ denotes the frequency of a at the start of the next generation and $\bar{w} = 1 - (1 - p_a^*) s_A$ is mean fitness, necessarily less than 1 when the drive allele is present. The term in parentheses in (2) describes the gain of allele a from its fitness advantage over A. The loss in (1) is always larger than the gain in (2) when the drive allele is spreading. However, it is the separation of these effects that is critical to understanding the evolution of non-allelic resistance.

Resistance has no advantage in the transmission stage
This point is perhaps the most surprising, but is easy to understand. Considering the resistance allele R before and after the segregation distortion stage, there is no change in its frequency:

$$p_R^* = p_R \qquad (3)$$

(Appendix, eq. 14 with $\delta_{Rr} = 0$). This result may be understood [...] Given that neither R nor r has an effect on gametic fitness, there is no intrinsic advantage that one allele has over the other during the transmission stage. However, the relative benefit of resistance changes when we consider associations among loci, as next.

Synergy between resistance and fitness of the non-drive allele
The foregoing two sections establish that the non-drive allele a gains in survival over A each generation but loses potentially more from segregation distortion. By itself, the resistance allele has no fitness advantage and also does not benefit by suppressing distortion at the other locus. Both outcomes change when we consider resistance combined with the fitness advantage of the non-drive allele. The two alleles synergize when they occur in the same genotype. When associated with R, the a allele is emancipated from its loss in segregation distortion because segregation distortion is abolished; now its only deviation from neutrality is the gain due to its intrinsic fitness advantage in gametes. In turn, allele R gains by segregating with a, experiencing the fitness advantage a has in gametes. Thus, both R and a benefit from their co-occurrence, each allele providing a benefit to the other.

For this association to provide a net benefit across the population, the alleles of the different loci must be statistically associated. In population genetics terms, there must be (negative) linkage disequilibrium, which can be defined as

$$D = p_a p_R - x_{aR} \qquad (4)$$

where $x_{aR}$ is the frequency of gametes that contain both a and R; $D < 0$ indicates that a and R occur together more frequently than at random. Note that D is bounded below by the larger of $-(1 - p_a) p_R$ and $-p_a (1 - p_R)$ (e.g., Crow and Kimura 1970). This lower bound to the strength of the association between a and R approaches zero in particular as the frequency of the drive allele A approaches 1.
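As a concrete illustration of the definition and its bound, the following minimal Python sketch computes D from the four gamete frequencies, assuming the Appendix form $D = x_{ar} x_{AR} - x_{aR} x_{Ar}$ (algebraically the same as $p_a p_R - x_{aR}$ when the frequencies sum to 1). The function and variable names are illustrative and are not taken from the paper's supplemental code.

```python
def linkage_disequilibrium(x_ar, x_aR, x_Ar, x_AR):
    """Return (p_a, p_R, D, D_min) for a two-locus gamete pool.

    Arguments are the frequencies of the four gamete types; they must sum to 1.
    D < 0 means that a and R co-occur more often than expected at random.
    """
    total = x_ar + x_aR + x_Ar + x_AR
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p_a = x_ar + x_aR                      # frequency of the non-drive allele
    p_R = x_aR + x_AR                      # frequency of the resistance allele
    D = x_ar * x_AR - x_aR * x_Ar          # Appendix definition of D
    # Lower bound: D cannot push x_AR or x_ar below zero.
    D_min = max(-(1 - p_a) * p_R, -p_a * (1 - p_R))
    return p_a, p_R, D, D_min


if __name__ == "__main__":
    # Hypothetical pool in which R and A never share a gamete,
    # so D sits exactly on its lower bound.
    print(linkage_disequilibrium(0.90, 0.05, 0.05, 0.0))
```

In this example the bound is set by $-(1 - p_a) p_R$, which shrinks toward zero as the drive allele nears fixation ($p_a \to 0$), in line with the remark above.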

During transmission, free recombination halves the magnitude of D but, separately, segregation distortion forces the association between a and R closer. Specifically, the disequilibrium after transmission, $D^*$, becomes

$$D^* = \frac{D}{2} - \frac{H_{rr}}{2}\, p_R \qquad (5)$$

The subtracted term shows that the amount of association created by gene drive increases with the frequencies of resistance-free drive heterozygotes ($H_{rr}$) and the resistance allele itself. If D is initially positive, recombination and segregation distortion work in concert to reduce it. Once D is negative, the two forces work in opposition with segregation distortion exaggerating and recombination eroding the association between a and R.

The statistical association between a and R has consequences for R when a is selected. After selection its frequency becomes

$$p_R' = p_R - \frac{s_A}{\bar{w}}\, D^* \qquad (6)$$

This result shows that negative linkage disequilibrium ($D^* < 0$) is required for R to rise in frequency, that is, a and R must exist together in the same gamete more often than random. Thus, non-allelic resistance increases by genetic hitchhiking (Maynard Smith and Haigh 1974).
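The hitchhiking result can be recovered in a few lines of algebra. The sketch below is a reconstruction under stated assumptions rather than the Appendix derivation itself: A-bearing gametes have relative fitness $1 - s_A$, mean fitness is $\bar{w} = 1 - s_A p_A^*$ (equivalently $1 - (1 - p_a^*) s_A$), transmission leaves the resistance frequency unchanged ($p_R^* = p_R$), and $D^* = x_{AR}^* - p_A^* p_R^*$.

```latex
\begin{align}
p_R' &= \frac{x_{aR}^* + (1 - s_A)\, x_{AR}^*}{\bar{w}}
      = \frac{p_R^* - s_A\, x_{AR}^*}{\bar{w}}, \\
p_R' - p_R^* &= \frac{p_R^* - s_A\, x_{AR}^* - p_R^* \left(1 - s_A\, p_A^*\right)}{\bar{w}}
      = -\frac{s_A \left( x_{AR}^* - p_A^*\, p_R^* \right)}{\bar{w}}
      = -\frac{s_A}{\bar{w}}\, D^* .
\end{align}
```

Because $s_A > 0$ and $\bar{w} > 0$, the sign of the change in resistance frequency is opposite to the sign of $D^*$.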

To summarize, this simple model highlights three main features: (i) segregation distortion has no direct effect on resistance (unless allelic to the drive), (ii) segregation distortion directly promotes a tighter statistical association between a and R, and (iii) alleles a and R must co-occur in gametes more often than at random (i.e., negative linkage disequilibrium is required) for resistance to rise in frequency via hitchhiking.
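The three features can be seen numerically in a minimal Python sketch of one generation of the simple model. It assumes random union of gametes (so that $H_{rr} = 2 x_{ar} x_{Ar}$), unlinked loci, 100% distortion in Aa rr only, and gametic selection with cost $s_A$ on A-bearing gametes. The function name, dictionary keys, and starting frequencies are illustrative choices, not the authors' code (which is in Supplemental File S1).

```python
def one_generation(x, s_A):
    """Advance gamete frequencies by one generation of the simple model.

    x   : dict with keys 'ar', 'aR', 'Ar', 'AR' (gamete frequencies, sum to 1)
    s_A : gametic fitness cost of the drive allele A
    Returns (new_x, info), where info holds intermediate quantities.
    """
    p_R = x['aR'] + x['AR']
    D = x['ar'] * x['AR'] - x['aR'] * x['Ar']
    H_rr = 2 * x['ar'] * x['Ar']      # Aa rr frequency under random union

    # Transmission: free recombination (c = 1/2) shifts D/2 between the
    # haplotype classes; complete distortion in Aa rr converts H_rr/2 of
    # the 'ar' output into 'Ar'.
    t = {
        'ar': x['ar'] - D / 2 - H_rr / 2,
        'aR': x['aR'] + D / 2,
        'Ar': x['Ar'] + D / 2 + H_rr / 2,
        'AR': x['AR'] - D / 2,
    }
    D_star = t['ar'] * t['AR'] - t['aR'] * t['Ar']

    # Gametic selection: A-bearing gametes have fitness 1 - s_A.
    w_bar = 1 - s_A * (t['Ar'] + t['AR'])
    fitness = {'ar': 1.0, 'aR': 1.0, 'Ar': 1 - s_A, 'AR': 1 - s_A}
    new_x = {g: t[g] * fitness[g] / w_bar for g in t}

    info = {
        'p_R': p_R, 'D': D, 'H_rr': H_rr, 'D_star': D_star, 'w_bar': w_bar,
        'p_R_after_transmission': t['aR'] + t['AR'],
    }
    return new_x, info


if __name__ == "__main__":
    # Illustrative start: drive at 0.005, resistance at 0.015, on separate gametes.
    x = {'ar': 0.98, 'aR': 0.015, 'Ar': 0.005, 'AR': 0.0}
    for gen in range(5):
        x, info = one_generation(x, s_A=0.2)
        print(gen, round(x['aR'] + x['AR'], 6), round(info['D_star'], 6))
```

Within each generation, the resistance frequency is unchanged by transmission and then moves by $-s_A D^* / \bar{w}$ during selection, so it rises only while $D^*$ is negative.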

Below and in the Appendix, we consider more complex scenarios with incomplete segregation distortion, partial dominance, linkage between drive and resistance loci, costly resistance, sex-limited and toxin-antidote gene drives, and diploid selection, including lethal recessive drives. These models show that the scope for resistance evolution increases with more effective segregation distortion, closer linkage, and less costly [...]

Evading resistance evolution is possible and practical
The preceding results facilitate developing protocols for suppression drives that diminish the evolution of resistance. Prior to exploring whether such protocols are theoretically feasible, we offer several points to serve as intuitive guiding principles.

[...]

The analytical results above assume segregation distortion is 100%, resistance is dominant and complete, and that the drive and resistance loci are unlinked. Relaxing these assumptions has the following consequences. (See Appendix for all derivations.) First, following transmission, the wild type drive allele frequency is

$$p_a^* = p_a - \bar{\delta} H \qquad (7)$$

where H is the total frequency of drive heterozygotes Aa and $\bar{\delta}$ is the average distortion across all resistance locus genotypes ($0 \le \bar{\delta} \le \tfrac{1}{2}$). Equation (7) [...]

Second, transmission can directly change the frequency of the resistance allele if heterozygous resistance is incomplete. Assuming random mating, this frequency is

$$p_R^* = p_R + 2 (1 - 2c)\, \delta_{Rr}\, D \qquad (8)$$

where c is the recombination rate between the drive and resistance loci and $\delta_{Rr}$ is the drive segregation distortion allowed by resistance heterozygotes ($0 \le \delta_{Rr} \le \tfrac{1}{2}$). Equation (8) shows that transmission will have no effect on the frequency of resistance (i.e., $p_R^* = p_R$; cf. eq. 3) unless three conditions are simultaneously fulfilled: the loci must be linked ($c < \tfrac{1}{2}$), in linkage disequilibrium ($D \neq 0$), and resistance heterozygotes allow some segregation distortion ($\delta_{Rr} > 0$). Note too from (8) that $p_R^*$ is not influenced by segregation distortion in rr or RR resistance genotypes; this is a completely general result (see Appendix). Since D can be positive or negative, (8) shows that transmission might increase or decrease the frequency of R.
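The dependence of the resistance frequency on linkage, disequilibrium, and heterozygote distortion can be checked by brute-force gamete accounting. The Python sketch below is an illustration under explicit assumptions: diploids form by random union of gametes; a drive heterozygote transmits A-bearing gametes with probability $\tfrac{1}{2} + \delta$, where $\delta$ depends on the number of R alleles; and recombinants arise at rate c. It then compares the result with the closed form $p_R^* = p_R + 2(1 - 2c)\,\delta_{Rr} D$. All names and parameter values are hypothetical.

```python
from itertools import product


def transmit(x, c, delta):
    """Gamete frequencies after transmission, by explicit enumeration.

    x     : dict keyed by (drive_allele, resistance_allele), e.g. ('a', 'R');
            all four haplotypes must be present as keys
    c     : recombination rate between the two loci (0 <= c <= 0.5)
    delta : dict mapping number of R alleles (0, 1, 2) to the distortion
            allowed in drive heterozygotes (0 <= delta <= 0.5)
    """
    out = {hap: 0.0 for hap in x}
    # Ordered pairs of gametes give diploid genotypes under random union.
    for (h1, f1), (h2, f2) in product(x.items(), repeat=2):
        n_R = (h1[1] == 'R') + (h2[1] == 'R')
        # Distortion acts only in drive heterozygotes.
        d = delta[n_R] if h1[0] != h2[0] else 0.0
        gametes = (
            (h1, 1 - c), (h2, 1 - c),                    # non-recombinants
            ((h1[0], h2[1]), c), ((h2[0], h1[1]), c),    # recombinants
        )
        for hap, weight in gametes:
            share = 0.5 + d if hap[0] == 'A' else 0.5 - d
            out[hap] += f1 * f2 * weight * share
    return out


def freq_R(x):
    return x[('a', 'R')] + x[('A', 'R')]


def diseq(x):
    return x[('a', 'r')] * x[('A', 'R')] - x[('a', 'R')] * x[('A', 'r')]


if __name__ == "__main__":
    x = {('a', 'r'): 0.6, ('a', 'R'): 0.1, ('A', 'r'): 0.25, ('A', 'R'): 0.05}
    delta = {0: 0.5, 1: 0.3, 2: 0.0}   # partial resistance in Rr heterozygotes
    c = 0.1
    out = transmit(x, c, delta)
    print(freq_R(out), freq_R(x) + 2 * (1 - 2 * c) * delta[1] * diseq(x))
```

Setting c = 0.5, making D zero, or setting `delta[1]` to zero each leaves the resistance frequency unchanged by transmission, matching the three conditions listed above.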

As above, transmission can change D itself. Assuming random mating, the disequilibrium after transmission becomes [...] in the absence of resistance ($\delta_{rr} = \tfrac{1}{2}$).

Equation (9) shows that partial resistance can affect disequilibrium [...]

Numerical studies
We offer numerical analyses of models that fulfil many of the preceding points (code available in Supplemental File S1). These [...]

Frequency dynamics are shown in Figure 2 for a single trial. Resistance experiences a modest gain during gene drive evolution (red curve), reflecting the temporary evolution of negative linkage disequilibrium (both gray curves, separate for males and females). The increase is not enough to halt drive fixation but would be permanent in the absence of a cost to resistance. Costless resistance would therefore 'ratchet' itself up during the process, even when it does not ultimately block fixation of the drive, but any cost would continually bring resistance down between gene drive introductions.

The parameter space allowing resistance-free evolution of a homing male-drive is narrowed somewhat if fitness of the Rr heterozygote is intermediate between that of the two homozygotes (Fig. 3). The zone in which cost-free resistance does not evolve is necessarily not affected; the difference between the two cases is most pronounced with partial recombination (gray curve). In the region where cost-free resistance does evolve, resistance evolves more readily with semi-dominant fitness effects (here) than with dominant ones (Fig. 1).

2-sex homing drive, dominant resistance. 2-sex drives are now easily implemented, so it is valuable to analyze that case for comparison to male-only drive (Fig. 4). The qualitative patterns are surprisingly similar up to the drive's fitness effect of $s_{AA}$ = [...] However, the higher $s_{AA}$ values more strongly select resistance, so a 2-sex drive does not obviously provide a more practical solution to resistance-free evolution than does a 1-sex drive.

Figure 1 Evolution of (dominant) resistance to a male-only drive: numerical results. The drive allele can evolve to fixation despite the presence of resistance alleles in the population, depending on the magnitude of the suppression (fitness of drive homozygotes) and on the cost of resistance. Gene drives evolve to fixation above and to the left of the curves, in the 'resistance-free' zone, at which point they are no longer subject to suppression. The baseline trial (blue) assumes drive is 100% efficient, the drive allele starts at frequency 0.005, resistance at 0.015. The drive allele impairs fitness of homozygotes of both sexes, their fitness being $1 - s_{AA}$. Resistance to the drive's segregation distortion is unlinked (except for the dashed gray curve), dominant, complete, and impairs fitness of Rr and RR genotypes as $1 - s_R$. For the baseline case, even cost-free resistance does not evolve until $s_{AA} > 0.3$, and at higher values of $s_{AA}$, resistance evolves and blocks fixation of the drive allele only if cost of resistance ($s_R$) is sufficiently low. Other curves deviate from the baseline case in single respects: (red) the fitness effect of drive homozygotes $s_{AA}$ is experienced only in females; (thick black) the initial frequency of resistance is increased to 0.05; (thin black) the initial frequency of resistance is increased to 0.15; (gray) the drive and resistance loci are linked with a recombination rate of 0.25. Although not shown, the line is flat out to $s_{AA} = 0.54$ if the fitness effect $s_{AA}$ is assigned only to males instead of to females. All trials were run at least 700 generations; values of $s_{AA}$ were incremented by 0.03, $s_R$ by 0.025. Trials initiated R and A in separate individuals (negative disequilibrium), but limited simulations suggested the curves are not sensitive to this aspect of initial conditions.

Figure 3 The evolution of dominant resistance is less hindered when the fitness effect of the resistance allele is additive (Rr has fitness $1 - s_R/2$, RR has fitness $1 - s_R$). Otherwise as in Fig. 1: the drive operates in males only.

[...] male-drive case of Figure 1 was studied for the case of recessive resistance: only the RR genotype suppressed drive, but the fitness effect of $1 - s_R$ applied to heterozygotes and homozygotes (Fig. 5). There is a profound effect of dominance versus [...]

Figure 5 Evolution of recessive resistance to a male-only drive. Evolution of resistance is profoundly suppressed compared to the dominant-resistance case. Otherwise as in Fig. 1, except that only one (extreme) boost in initial frequency for R is considered, and even it has only a modest effect.

[...] of the gene drive from the lethality.

A simple version of a toxin-antidote system is embodied by a recently invented design known as 'Cleave-Rescue' (ClvR). Using CRISPR technology, the ClvR element is engineered as a two-component system: (i) a Cas-9 nuclease targeted to destroy all native copies of an essential gene in the genome, and (ii) a recoded version of the essential gene that is protected against cleavage. The ClvR element, containing both components, may be inserted anywhere in the genome; it is easily engineered to be located at a site remote from and unlinked to the target gene, but 'same-site' designs are also possible (e.g., Champer et al. 2020a).

The ClvR element (denoted here as C, its absence being denoted c) spreads in an analogy to bootstrapping: it creates and disseminates a genetic 'poison' for which it provides the only antidote. This poison is merely the destruction of the target gene (the essential wild-type allele T is converted to a null allele t). As the null t alleles are introduced and spread in the population, they eventually form tt homozygotes which die unless the genome also carries at least one C allele (the antidote). Different variations of this theme can be engineered, but we will consider the case in which all genotypes have normal fitness except cctt, which dies.

While C is rare, tt is also very rare (under random mating), so [...]

The evolution of resistance is no longer intuitive when C affects the fitness of its carriers and when resistance is non-allelic. First note that resistance to a ClvR system is merely a block to the 'toxic' property of ClvR, that is, a block to the conversion of T to t. If C fully evades resistance evolution, then the population evolves to the complete loss of T. Once t is fixed, there is no further cleave activity on which selection for resistance may act. At this endpoint, cc becomes a universally lethal genotype, and if Cc has higher fitness than CC, the C locus will evolve to a stable polymorphism.

In the spirit of previous sections, we let CC genotypes have fitness $1 - s_{CC}$, and Cc genotypes have fitness 1, regardless of genotype at the target locus. cc genotypes have fitness 1 except in tt genotypes, when they are inviable. Resistance allele R is dominant in its blocking effect and in its fitness effect ($1 - s_R$), paralleling the cases in Figs. 1 and 4. Fig. 6 shows the parameter space of $s_R$ and $s_{CC}$ that evades resistance evolution (for which t goes to fixation). For fitness effects above 0.2, the ClvR system is far less prone to evolve resistance than are the homing drive systems.

Figure 6 Evolution of dominant resistance to a toxin-antidote drive (ClvR) with no sex differences: resistance evolves only with a large fitness effect of the drive and low cost to resistance. Initial frequencies of the killer ($K_0$) and resistance alleles ($R_0$) are given in the key. The curves show the boundary of parameter values for which the wild-type target allele is fully converted by ClvR after 1,000 generations. $s_{CC}$ refers to the fitness suppression of drive homozygotes (genotype CC), $s_R$ to the fitness suppression of genotypes with one or two copies of the resistance allele.
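The fitness scheme just described can be written as a small lookup, sketched here in Python. The text does not state how the drive cost and the resistance cost combine in an individual carrying both, so the multiplicative combination below is an assumption, as are the function name and the genotype encoding.

```python
def clvr_fitness(drive, target, has_R, s_CC, s_R):
    """Fitness of a diploid under the ClvR scheme described in the text.

    drive  : 'CC', 'Cc', or 'cc' (ClvR element genotype)
    target : 'TT', 'Tt', or 'tt' (essential gene; t is the cleaved null allele)
    has_R  : True if the genotype carries one or two copies of resistance allele R
    s_CC   : fitness suppression of drive homozygotes
    s_R    : fitness suppression of resistance carriers (dominant cost)
    """
    # No antidote and no functional target gene: inviable.
    if drive == 'cc' and target == 'tt':
        return 0.0
    # Only drive homozygotes pay the drive cost; Cc and cc have fitness 1.
    w = 1.0 - s_CC if drive == 'CC' else 1.0
    # ASSUMPTION: the resistance cost multiplies the drive cost.
    if has_R:
        w *= 1.0 - s_R
    return w


if __name__ == "__main__":
    for drive in ('CC', 'Cc', 'cc'):
        for target in ('TT', 'Tt', 'tt'):
            print(drive, target, clvr_fitness(drive, target, False, 0.2, 0.1))
```

Once t is fixed, every cc individual falls into the inviable class, which is why the C locus is then maintained as a polymorphism when Cc is fitter than CC.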

In theory, gene drives offer an ability to quickly alter the genetics of populations, with two important uses: genetic modification or suppression of population numbers (Sandler and Novitski 1957; Hamilton 1967; Lyttle 1977; Burt 2003; Gould et al. 2006, 2008; Gould 2008 [...] Maxwell 2018). These various mechanisms will not typically be allelic to the drive and may even be fully unlinked (Lyttle 1981; Champer et al. 2019). Some may also entail a (large) fitness cost (Lyttle 1981).

As shown here, the evolution of non-allelic resistance can be avoided with drives that impose only a modest fitness cost on the organism, provided resistance is not initially common in the population. Drives that push population fitness to zero, which are tempting because of their assured population extinction (e.g., Lyttle 1977; Kyrou et al. 2018), will select resistance, but only if resistance arises (Lyttle 1979, 1981). Non-allelic resistance evolves by genetic hitchhiking from (negative) linkage disequilibrium with the drive locus. If the drive allele can evolve quickly to fixation then D will decay rapidly to zero, precluding further increases in resistance. This finding appears to be general, transcending 1-sex and 2-sex homing drives and toxin-antidote systems, with various assumptions about dominance and fitness effects. There are obviously many other combinations of effects to analyze, but a gene drive evading non-allelic resistance is clearly a possibility for many systems.

One ramification of these findings is that modification drives with unintended (mild) fitness consequences will not likely fail [...]

The most assured design for a single partial-suppression drive is to target it to destroy a gene that is haplo-sufficient but [...] introduce additional drives. Implementations of resistance-free drives will thus need to be monitored for long term effects.

As with most studies of gene drive resistance, our models [...]

[...] linkage disequilibrium are, respectively, $p_A = 1 - p_a = x_{10} + x_{11}$, $p_R = 1 - p_r = x_{01} + x_{11}$, and $D = x_{00} x_{11} - x_{01} x_{10}$. Note that we use a different notation for the haploid frequencies in the main text that is easier to grasp but less convenient for derivations; the equivalencies are as follows: $x_{00} \leftrightarrow x_{ar}$, $x_{01} \leftrightarrow x_{aR}$, $x_{10} \leftrightarrow x_{Ar}$, and $x_{11} \leftrightarrow x_{AR}$.

After transmission, the frequencies of gametes produced by the diploids are [...] where $Z_{11}^{cs}$ and $Z_{11}^{tr}$ are the frequencies of 'cis' (AR/ar) and 'trans' (Ar/aR) conformation double heterozygotes, respectively. Using these equations, it can be shown that [...] where $H = Z_{10} + Z_{11} + Z_{12}$ and $\bar{\delta} = (\delta_{rr} Z_{10} + \delta_{Rr} Z_{11} + \delta_{RR} Z_{12})/H$, and [...] If we now assume the population was formed by random union of gametes, then [...] with $Z_{11}^{cs} = 2 x_{00} x_{11}$ and $Z_{11}^{tr} = 2 x_{01} x_{10}$. In this case, [...] (14) and the gamete recursions are equivalent to [...]

$$x_{10}^* = x_{10} + cD + 2 \delta_{rr} x_{00} x_{10} + 2 \delta_{Rr} (x_{01} x_{10} + cD) \qquad (15c)$$

Equations (15) can also be used to show that disequilibrium after transmission is [...] where [...]

Model of costly resistance: gametic selection
Consider the model described in the previous subsection but assume multiplicative fitnesses at both the drive and resistance loci such that each copy of the drive allele A reduces an individual's fitness by the multiplicative factor $1 - s_A$ and each copy of the resistance allele R reduces fitness by the factor $1 - s_R$. This is called gametic selection (e.g., Lewontin 1970). The evolutionary dynamics of independent loci with gametic selection are mathematically equivalent to those resulting from diploid multiplicative fitnesses (e.g., Crow and Kimura 1970).

Using (14), (15), (16), and (17), selection after transmission results in the following haplotype frequencies at the start of the next generation: [...] These expressions can be used to show that the corresponding frequencies of the drive and resistance alleles are [...] If resistance has no cost ($s_R = 0$) and is complete and dominant (i.e., $\delta_{RR} = \delta_{Rr} = 0$), equations (19) are equivalent to equations (2) and (6). However, selection against resistance, $s_R > 0$, could indirectly promote the drive allele A if the loci have a negative association (second term in eq. 19a). For $s_A > \tfrac{1}{2}$, it is possible to prove that this model has an unstable equilibrium with no resistance ($p_R = 0$) and polymorphism for the drive [...]

Model of a lethal gene drive with linked resistance
In this subsection, we describe a model similar to the above except that the gene drive is recessive lethal. Assuming death occurs after diploids are formed by random mating as in (13) but before transmission, the post-death frequencies (indicated by the symbol #) are $Z_{20}^{\#} = Z_{21}^{\#} = Z_{22}^{\#} = 0$ and, for $i = 0, 1$, [...] where [...] the fraction surviving. From this it can be shown that [...] Equation (24) shows that when $D < 0$ the benefit to resistance from lethality increases with the frequency of the drive allele.

Note from (25) that [...] and [...] where $Z_{10}^{\#} = Z_{10}/\bar{w} = 2 x_{00} x_{10}/\bar{w}$ is the frequency of resistance-free heterozygotes among survivors. Equations (27) and (24) show that even with a recessive lethal gene drive, the evolutionary spread of perfect, dominant resistance is completely determined by genetic hitchhiking with the wild-type drive allele during selection.

[...] toxin-antidote CRISPR gene drive system for regional popula[...]

Vella, M. R., C. E. Gunning, A. L. Lloyd, and F. Gould, 2017 Evaluating strategies for reversing CRISPR-Cas9 gene drives.