cifB-transcript levels largely explain cytoplasmic incompatibility variation across divergent Wolbachia

Abstract Divergent hosts often associate with intracellular microbes that influence their fitness. Maternally transmitted Wolbachia bacteria are the most common of these endosymbionts, due largely to cytoplasmic incompatibility (CI) that kills uninfected embryos fertilized by Wolbachia-infected males. Closely related infections in females rescue CI, providing a relative fitness advantage that drives Wolbachia to high frequencies. One prophage-associated gene (cifA) governs rescue, and two contribute to CI (cifA and cifB), but CI strength ranges from very strong to very weak for unknown reasons. Here, we investigate CI-strength variation and its mechanistic underpinnings in a phylogenetic context across 20 million years (MY) of Wolbachia evolution in Drosophila hosts diverged up to 50 MY. These Wolbachia encode diverse Cif proteins (100% to 7.4% pairwise similarity), and AlphaFold structural analyses suggest that CifB sequence similarities do not predict structural similarities. We demonstrate that cifB-transcript levels in testes explain CI strength across all but two focal systems. Despite phylogenetic discordance among cifs and the bulk of the Wolbachia genome, closely related Wolbachia tend to cause similar CI strengths and transcribe cifB at similar levels. This indicates that other non-cif regions of the Wolbachia genome modulate cif-transcript levels. CI strength also increases with the length of the host’s larval life stage, presumably due to prolonged cif action. Our findings reveal that cifB-transcript levels largely explain CI strength, while highlighting other covariates. Elucidating CI’s mechanism contributes to our understanding of Wolbachia spread in natural systems and to improving the efficacy of CI-based biocontrol of arboviruses and agricultural pests globally.


Introduction
Endosymbioses are intimate associations where microbes live inside the cells of other organisms (1). This type of interaction led to the evolution of mitochondria and chloroplasts, and thus all eukaryotic life on Earth (2). Maternally transmitted Wolbachia are the most common endosymbionts, infecting over half of insect species (3). While some Wolbachia are required for host survival (4,5), many manipulate host reproduction to spread to high frequencies (6)(7)(8)(9)(10)(11).
CI strength is highly variable within and among Wolbachia strains, ranging from complete to statistically insignificant  (37). The chronogram was estimated using 156 full-length and single-copy genes (130,359 bp) of equal length and based on prior calibration of rates of Wolbachia divergence (38). All nodes have posterior probabilities >0.95. wMel-like and wRi-like clades are highlighted. (C) Phylogram of Drosophila host species used in this study, based on 20 conserved and single-copy genes (38). All nodes have posterior probabilities >0.95. Dashed lines pair Wolbachia strains with their Drosophila host species and indicate topological discordance. Female D. teissieri is displayed to the bottom right. (D) Six Wolbachia induced strong CI, two yielded weak CI, and two caused nonsignificant reductions in egg hatch. "Compatible crosses" include the three compatible crosses in Fig. 1A. The wAur compatible cross includes only the infected female x infected male cross since uninfected wAur males were largely inviable. CI strength was calculated as 1-(CI-cross hatch rate/mean hatch of compatible crosses) (10) and displayed as a percentage. 0% CI represents no deviation from average compatibility, and 100% CI represents no eggs hatching. BCa confidence intervals of CI strength are displayed to the right of the plots. Significant differences are based on Mann-Whitney U tests between compatible and CI crosses for each strain. Names of strains are displayed in orange text if they do not cause significant CI, blue text if they cause weak CI, and black text if they cause strong CI. Significant differences are * P < 0.05, * * P < 0.01, * * * P < 0.001, and * * * * P < 0.0001. Nonsignificant results are denoted with ns. Exact P-values are reported in Supplementary Table S1, and Supplementary Fig. S1 displays all cross types individually. embryonic lethality for mostly unknown reasons. Within systems, CI strength varies by male age (25,26), temperature (26), mating history (27), rearing density (28), host genotype (6,(29)(30)(31), and other factors. Numerous hypotheses may explain molecular mechanisms of CI-strength variation. Most proximally, high cifA-and cifB-transcript levels should cause strong CI since more sperm will contain Cif proteins, and more Cifs will reach the fertilized embryo (14)(15)(16)32). However, higher Wolbachia density and slower development time should also covary with stronger CI since more Wolbachia should produce more Cifs (13), and slower development gives Cifs more time to act or localize (33). Cif protein sequence variation also contributes to CI strength (20,34,35). Indeed, theory predicts that CI-causing genes are under weak purifying selection since they are expressed in males that do not transmit Wolbachia (36). Weak selection on these genes likely results in high divergence rates and varied enzymatic efficiencies (34,36).
Despite CI's importance in Wolbachia's widespread prevalence and use in biocontrol, the evolutionary and mechanistic underpinnings of CI-strength variation remain unresolved. Here, we leverage 20 million years (MY) of Wolbachia evolution in Drosophila hosts diverged up to 50 MY to assess the extent of CI-strength variation among strains, and its mechanisms, under consistent experimental conditions. We also investigate the sequence and structural variation of proteins that contribute to CI and test whether Wolbachia density, cif-transcript levels, and/or host developmental time can explain CI-strength variation. While several factors may contribute to CI strength, we report for the first time that cifBtranscript levels in testes are the most substantial contributor.
We annotated functional domains using HHpred (42). CifA had no hits above 80% probability-our threshold for accepting annotations ( Fig. 2B; Supplementary Table S2). The lack of confident annotations for CifA indicates that CifA either has a novel function not currently represented in the database or, more likely, that it is too diverged from database proteins to be confidently annotated. However, CifB had six confident domain predictions (Supplementary Table S2): PD-(D/E)XK nuclease, Ulp1 protease or peptidase, Avirulence protein, OTU-like cysteine protease, Ankyrin/DNA-binding, and Latrotoxin (Fig. 2B). The last three domains were unique to CifB wTri [T5] , and the avirulence protein domain was unique to CifB wBai [T1-T2] . The Ulp1 protease domain, known to be a functional deubiquitinase (14), was present in all CifB [T1] proteins and absent in other CifB Types. Deubiquitinase activity does not appear to be essential for CI induction (32). Despite considerable sequence variation ( Supplementary  Fig. S1C), a pair of PD-(D/E)XK-nuclease domains was present in all CifB proteins-CifB [T3 and T4] are functional nucleases (18,19), and CifB [T1] is a DNase with unknown catalytic residues (43). We conclude that Cifs are divergent in both sequence and domains but with several shared features. Indeed, the representation of the PD-(D/E)XK pair in all CifB Types suggests it is functionally important (13,43) for CI and/or another unknown phenotype.

CifB sequence and structural similarities do not covary
Theory predicts that CI-causing genes like cifB are under weak purifying selection since they are expressed in males that do not transmit the infection, while rescue genes like cifA should be preserved (36). Indeed, genomic and phenotypic analyses of putatively pseudogenized Cifs suggest that CifB has a higher rate of putative pseudogenization than CifA (37,41). For example, non-CI-causing wMau has a putatively pseudogenized CifB, intact CifA, and can rescue CI induced by closely related strains (37,44). Consistent with these results, our CifA sequences (95% CI of the median = 57.5% to 63.9% identity) were significantly more similar than were CifB sequences (95% CI = 41.9% to 55.0%) according to a Wilcoxon matched-pairs signed-rank test (P < 0.0001). However, this result may also be explained by relaxed selection due to structural flexibility in the protein.
To test this hypothesis, we generated AlphaFold (45) CifA structures, and CifB structures surrounding the shared PD-(D/E)XK pair (CifB nuc ), for representative Cif proteins from Types 1, 2, 4, and 5 (Fig. 2C). Type 3 was excluded because Type 3 wTsa was putatively pseudogenized (see below), and only the PD-(D/E)XKnuclease pair of CifB (CifB nuc ) was analyzed as it was shared across all Types. AlphaFold confidence, measured as a Local Distance Difference Test (pLDDT) score, suggests confident structural predictions (pLDDT = between 74.9 and 86.9; Supplementary Fig. S2; Supplementary Table S3). However, the C-terminus of all CifA protein predictions were low confidence and structurally disordered in the wMel, wRi, and wTei proteins (  (45). Indeed, empiricallydetermined crystal structures of CifA-CifB complexes could not resolve CifA's C-terminus, indicating it is disordered (46). Notably, disordered regions are often capable of forming structures under particular binding associations and may exhibit numerous context-dependent structures that add function to the protein. The CifA proteins formed a concave structure composed primarily of α-helices. In contrast, CifB proteins were composed of α-helices and β-strands organized into three regions: N-terminal PD-(D/E)XK, C-terminal PD-(D/E)XK, and a linker region (Fig. 2C). The αβββαβ structural motif typical of PD-(D/E)XK domains (47) was present in both domains in all of the CifB nuc structures.
We generated pairwise alignments of protein structures and calculated template modeling (TM) scores representing structural similarity (48). CifA TM scores positively covaried with percent sequence identity according to a Pearson correlation (R 2 = 0.94; P = 0.005; N = 6 comparisons). However, CifB nuc TM scores did not covary with sequence identity (R 2 = 0.17; P = 0.74; N = 6). These results suggest that CifA protein sequence similarity is a good indicator of structural similarity, whereas CifB nuc sequence similarity is not a good predictor of structural similarity. This is consistent with findings that PD-(D/E)XK domains are structurally conserved but represented by highly divergent sequences (47). These results indicate that CifB's relatively rapid sequence evolu-tion is likely due to the sequence flexibility underlying CifB's structure. Since theory predicts that selection will not maintain the CIcausing genes, it is unknown why CifB's nuclease is structurally maintained. However, it is plausible that the selective maintenance of these structures is dependent on their involvement in other cellular phenotypes such as autophagy (49).

Cif pseudogenization does not explain non-CI-inducing strains
Two of our Wolbachia (wTei and wTsa) failed to cause statistically significant embryonic mortality in the CI crosses. We tested the hypothesis that non-CI inducers had pseudogenized Cif proteins (37,41) by searching for premature stop codons that truncate the proteins (Fig. 2B). While wTei appeared to have normal Cif proteins, wRi, wHa, wBoc, and wTsa encoded at least one Cif copy that was putatively pseudogenized. , which has a transposon inserted after the 348th nucleotide. We conclude that statistically nonsignificant wTei and wTsa CI cannot be explained by pseudogenization alone since wTei's Cifs appear normal and wTsa has four putatively intact Cif [T1] pairs. Prior genomic and functional analyses of wYak-clade Wolbachia support our wTei result (6,34). However, sequence variation may still contribute to inefficiencies in protein function. Indeed, transgenic expression and biochemical assays revealed that a single Valine to Leucine substitution in CifB wYak[T1] , a very close homolog of CifB wTei[T1] , significantly inhibits enzymatic activity relative to sister wMel (34). Finally, wBoc was the only strain that lacked an intact Cif pair. Two hypotheses could explain why wBoc induced weak CI despite pseudogenized Cifs. First, expression of the upstream coding sequence may be sufficient to yield a phenotype. Second, since complete circularized genomes were only available for wMel, wRi, and wHa, it is plausible that the wBoc genome contains additional unknown cif copies that our analyses did not detect.

Wolbachia densities in testes do not fully explain CI strength
The bacterial density hypothesis predicts that CI strength covaries with Wolbachia densities in testes (13), though there are exceptions where strong-CI-inducing strains have unusually low densities (50). We tested the bacterial density hypothesis by measuring Wolbachia densities in whole-testes extracts via qPCR (Fig. 3A) and performing categorical and qualitative correlation analyses with CI strength.
First, we predicted that strong-, weak-, and non-CI-inducing Wolbachia would have higher, similar, and lower Wolbachia densities in testes than model wMel that caused weak CI. However, only one strong-CI-inducing strain, wRi, occurred at statistically higher densities in testes than wMel (Fig. 3A). Three of the other strong-CI-inducing strains (wAur, wTri, and wHa) had similar densities, and two had lower densities (wBai and wAno) compared to wMel-infected testes. Contrary to our prediction, the other weak-CI line, wBoc, also occurred at lower densities in testes than wMel. Finally, non-CI-causing wTei and wTsa had similar and lower densities than wMel, respectively (Fig. 3A). Intriguingly, despite diverging from wRi on the order of thousands of years ago (Fig. 1B), strong-CI-inducing wAno and wAur occur at lower densities in their hosts, relative to wRi in D. simulans. In summary, only wRi and wTsa fit our predictions for categorical assignment (Table 1), , and cif wTri [T5] samples. Names of strains are displayed in orange text if they do not cause significant CI, blue text if they cause weak CI and black text if they cause strong CI. Significant differences are based on FDR-adjusted pairwise t tests relative to wMel abundance in A and cif wMel [T1] transcript levels in B and C. * P < 0.05, * * P < 0.01, * * * P < 0.001, and * * * * P < 0.0001. Exact P-values are reported in Supplementary Table S1. and low-density wAno, wBoc, and wBai caused stronger CI than predicted by their testes Wolbachia densities.
We next tested the bacterial density hypothesis using a phylogenetic generalized least squares (PGLS) regression that accounts for phylogenetic signal and dependencies within the residuals of our comparisons. Wolbachia densities in testes had a statistically nonsignificant relationship with CI strength among our focal Wolbachia (R 2 = 0.03; P = 0.62; Supplementary Table S4). However, we also subset our data by removing non-CI inducers (wTei and wTsa) and/or low-density strains (wAno, wBoc, wBai, and wTsa), since they seemingly yielded stronger CI than expected from their densities. Removal of low-density Wolbachia revealed a strong positive relationship between Wolbachia densities and CI strength (R 2 = 0.99; P = 2.56E−05; Supplementary Table S4). Similarly, removal of non-CI strains recovered a strong correlation between densities and CI strength (R 2 = 0.99; P = 9.74E−07; Supplementary Table S4). We conclude that Wolbachia densities in the testes cannot fully explain CI-strength variation among our ten systems, but it trends with CI strength among a subset of strains. Notably, while Wolbachia densities measured via qPCR from whole-testes extracts is a poor predictor of CI strength across systems, it remains plausible that Wolbachia density in specific stages of spermatogenesis is a strong predictor (51).

cifB-transcript levels in testes largely explain CI strength
The Wolbachia-density hypothesis is based on the presumption that CI-gene expression positively covaries with Wolbachia abundance in host tissues (13). Thus, we tested if cifA-and/or cifB-transcript levels in whole-testes extracts covaried with CI strength using RT-qPCR. As with Wolbachia densities, we tested if cif-transcript levels and CI strength covaried via categorical and/or qualitative analyses. Relative to weak-CI-causing wMel, we predicted that strong-, weak-, and non-CI strains would have higher, similar, or lower cif-transcript levels, respectively.
Regarding cifB-transcript levels, seven of nine strains matched our predicted transcript levels in testes relative to cifB wMel[T1] : Five strong-CI strains (wRi, wAno, wAur, wTri, and wHa) had copies with only higher transcript levels, the weak-CI wBoc had similar transcript levels, and the non-CI-inducing wTsa had significantly lower transcript levels (Fig. 3C). Of the remaining two strains, strong-CI wBai only had copies with significantly lower transcript levels, and wTei that did not cause statistically significant CI had a copy with lower transcript levels (cifB wTei [T1] ) and one with higher transcript levels (cifB wTei [T4] ; Fig. 3C). Finally, PGLS regression revealed a significant positive relationship between cifB [T1] -transcript levels and CI strength when low-density Wolbachia (R 2 = 0.33; P = 0.04) and non-CI-inducing strains were removed from the analysis (R 2 = 0.89; P = 4.66E−03; Supplementary  Table S4). A single cifB copy (cifB [T1] ) was selected for analysis from each strain since Type 1 is the only cifB Type maintained across all strains.
Conversely, only five strains matched our predicted transcript levels pattern relative to cifA wMel [T1] in testes: Three strong-CI strains (wRi, wAno, and wTri) had at least one cifA copy with higher transcript levels, the weak-CI strain wBoc had similar transcript levels, and the strong-CI strain wHa had copies with similar and lower transcript levels that likely amount to higher combined transcript levels (Fig. 3B). The remaining five strains had higher (wTei and wTsa) and lower (wAno and wBai) transcript levels than predicted based on their CI strength (Fig. 3B). A PGLS regression revealed no significant relationship between cifA [T1] -transcript levels and CI strength (Supplementary Table S4). In summary, categorical analysis of cifA-transcript levels relative to wMel suggests that CI strength can be explained by cifA-transcript levels in the testes of wRi, wAno, wTri, wBoc, and wHa but not wTei, wAno, wBai, and wTsa ( Table 1). The quantitative analysis supports the disconnect between cifA [T1] -transcript levels and CI strength but Note: High, medium, and low density/cif transcription is defined as higher, similar, and lower than the weak-CI-causing wMel. We predicted that Wolbachia density/cif transcription would be higher, similar, and lower than wMel for strong, weak, and non-CI inducers. Measures consistent with these predictions are bolded. * Co-expression of wHa's two cifA copies may yield higher total expression than cifA wMel [T1] , but neither copy alone had higher transcript levels.
does not account for the combinatory impact of expressing multiple cifA copies. We also tested for a relationship between cifcopy number and CI strength and found no significant relationship (R 2 = 0.11; P = 0.76).
In sum, our analyses reveal for the first time that cifB-transcript levels in testes largely explains CI-strength variation. However, two strains are interesting exceptions: wTei did not cause statistically significant CI, yet cifB wTei[T4] -transcript levels were relatively high, and wBai caused very strong CI, but cifB wBai[T1-1] -and cifB wBai[T1-2] -transcript levels were much lower than expected. We propose four non-exclusive hypotheses for these patterns. First, we measured transcript levels as a proxy for translation, but the two can be decoupled (52). Thus, wTei may produce fewer proteins per transcript than other strains, and wBai may produce more proteins per transcript. Second, we measured Wolbachia densities and cif-transcript levels from full-testes extracts, but Wolbachia localization within spermatogenesis is linked to CI-strength variation (51). It is plausible that wTei is poorly localized for CI induction while wBai is optimally localized. Third, theory and empirical studies agree that hosts develop resistance to CI (6,36,40). Since most of the strains investigated were from different species, it is plausible-indeed, perhaps likely-that host factors modulate CI-strength variation. Future work aimed at determining the contributions of host factors to CI strength and the relationship between cifB-transcript levels and CI strength will be particularly useful. Finally, cif-genetic diversity may contribute to variation in CI efficiencies. This hypothesis seems particularly likely for wTei since a single amino acid substitution between CifB wMel[T1] and a CifB wTei[T1] homolog in wYak inhibits transgenic enzymatic activity and CI penetrance (34). CifB wBai[T1 -2] uniquely encodes a domain with structural homology to the cysteine protease avirulence protein AvrPphB of Pseudomonas (Fig. 2B). AvrP-phB cleaves host kinases and inhibits Arabidopsis defenses against Pseudomonas colonization (53). Thus, the CifB wBai[T1 -2] avirulence domain may interfere with host CI-resistance mechanisms (54). However, the molecular function of the AvrPphB-like residues retained in CifB wBai [T1-2] and the means of CI suppression are unknown. Future work will be necessary to determine the relative contributions of translation, localization, and genetic variation to CI caused by these extraordinary strains.

Strong CI and cifB [T1] -transcript levels show evidence of Wolbachia phylogenetic signal
We tested whether closely related Wolbachia exhibit similar CI strengths and/or Wolbachia densities/cif-transcript levels in testes. Conversely, causing CI, having low-Wolbachia densities, or expressing high levels of a cifA [T1] copy was randomly distributed across the phylogeny (Supplementary Fig. S3C-E). However, Geiger simulations suggest that the inclusion of more Wolbachia-infected systems may help differentiate our D values for low-Wolbachia density, cifA [T1] -transcript levels, and CI induction from randomness ( Supplementary Fig. S3), although the sample sizes required are notably unreasonable. For non-proportional data (i.e. CI-strength measures), we also calculated the maximum-likelihood value of Pagel's lambda (λ) (56), using the continuous measures of our traits (λ = 0 indicates no phylogenetic signal). This approach supported the finding that cifB [T1] -transcript levels have phylogenetic signal (λ = 0.88; P = 0.03), and that cifA [T1] -transcript levels (λ = 0.68; P = 0.73), and Wolbachia density (λ = 0.84; P = 0.12) do not. In summary, these results suggest that Wolbachia that cause strong CI and have high cifB [T1] transcription tend to be closely related. We hypothesize that non-cif regions of the Wolbachia genome modulate cifB-transcript levels. Future work is required to determine if this is via active modulation of cifB expression or the indirect effect of loci that regulate Wolbachia abundance in host tissues.

Larval, but not pupal, development time positively covaries with CI strength
Cardinium-induced CI strength (33) positively covaries with the length of Encarsia wasp pupation time. Conversely, longer development time covaries with weaker CI in wMel-infected D. melanogaster (28). Here, we test for the first time whether Wolbachia-induced CI strength across numerous Wolbachia-Drosophila systems covaries with larval ( Supplementary Fig. S4A), pupal ( Supplementary Fig. S4B), and/or egg-to-adult development times (Supplementary Fig. S4C) Supplementary Fig. S4D-F). Removing low-density Wolbachia strains resulted in an even stronger relationship between larval development and CI strength (R 2 = 0.86, P = 0.02) and revealed a weak yet statistically significant relationship with egg-to-adult development time (R 2 = 0.78, P = 0.048; Supplementary Table S5; Supplementary Fig. S4G-I), likely driven by the larval pattern. In summary, we find that larval development time is a predictor of Wolbachia-induced CI across systems. We present two hypotheses for this correlation. First, as has been proposed for Cardinium (57,58), longer development may enable CI products to accumulate or provide more time to act on their host targets in specific life stages. Second, development time may covary with other contributors of CI-strength variation, including the accessibility of host targets. Future investigations of the developmental restrictions of CI will be necessary to determine the mechanistic basis of the relationship between host development time and CI strength. Moreover, while these data support a relationship between development time and CI strength across systems, it remains to be determined if slow and fast developing males from the same Wolbachia-Drosophila system cause different CI strengths.

Conclusions
By leveraging CI diversity across 20 MY of Wolbachia and 50 MY of host evolution, we discovered that cifB-transcript levels largely explain CI-strength variation. This result confirms long-held predictions and enables future in-depth mechanistic investigation of the dynamic relationships between transcript levels, density, and localization on CI strength. Our analyses reveal five additional impactful findings. First, our focal Wolbachia encode considerable Cif-sequence variation, with some strains having as little as 7.4% pairwise similarity. Second, CifB sequence evolves faster than CifA, but the similarity in the protein structures of CifB nuclease pairs does not covary with CifB sequence similarities. Third, Wolbachia densities and cifA-transcript levels in testes fail to explain CI strength broadly. Fourth, CI tends to be stronger when larval development time is longer. Finally, cifB and strong-CI expression are similar among closely related Wolbachia, suggesting that other regions of the Wolbachia genome modulate cif-transcript levels, either actively or indirectly via effects on Wolbachia abundance in host tissues. This latter finding is particularly notable given that cifs are associated with prophage-WO regions that tend to be phylogenetically discordant from the bulk of the Wolbachia genome. Taken together, these results inform the mechanism and evolution of CI, bringing us closer to developing a complete understanding of CI-strength variation and how it governs Wolbachia's standing as the world's most common endosymbiont.

Wolbachia chronogram
Genomes used are listed in Supplementary Table S6 (6,37,38,(59)(60)(61). We used Prokka v. 1.11 to annotate genomes, then extracted all single-copy genes that were the same length in each genome for phylogenetic analyses. 156 genes met these criteria. The genes were aligned with MAFFT v. 7 (62), then concatenated. RevBayes v. 1.1.1 was used for all phylogenetic analyses (63). We generated a Wolbachia absolute chronogram using the same GTR+ model as (38), except that we used a relaxed clock with a branch rate prior of (7,7) for each branch, normalized to a mean of 1 across all branches. Briefly, we assumed a constant-rate sampled-birth-death process, which has four parameters: speciation rate, extinction rate, sampling probability, and the age of the root. We used previously reported speciation and extinction rate estimates based on empirical information (38). Specifically, we previously specified empirical lognormal hyperpriors to determine the means of these distributions so that the prior expected number of species under the birth-death process was equal to the known number of species in the group. The sampling fraction, ρ, was set to 0.1. To estimate absolute node ages, we used prior estimates of substitutions per site per year (64). We refer readers to (38) for a detailed description of these methods.
We used these genes to estimate a phylogram using RevBayes v. 1.1.1, following (38). We used a GTR+ model with four rate categories, partitioning by gene and codon position. Each partition had an independent rate multiplier with prior (1,1) [i.e. Exp(1)], as well as stationary frequencies and exchangeability rates drawn from flat, symmetrical Dirichlet distributions [i.e. Dirichlet(1,1,1. . .)]. The model used a uniform prior over all possible topologies. Branch lengths were drawn from a flat, symmetrical Dirichlet distribution and thus summed to 1. Since the expected number of substitutions along a branch equals the branch length times the rate multiplier, the expected number of substitutions across the entire tree for a partition is equal to the partition's rate multiplier. Four independent runs were performed, and all converged to the same topology. For additional details on the priors and their justifications, consult (38).

Fly lines, care, and maintenance
Fly lines used are listed in Supplementary Table S6. Uninfected flies were derived via tetracycline treatment in prior studies (15,68). Tetracycline cleared lines were used in experiments over a year after treatment, avoiding the effects of antibiotic treatment on mitochondria metabolism and density (69). Infection status was confirmed with PCR of the Wolbachia surface protein gene. An arthropod-specific 28S rDNA was amplified as a control for DNA quality (6,37). DNA was extracted for symbiont checks using a squish buffer protocol, described in (25). Flies were reared in vials with 10 mL of food made of cornmeal, dry corn syrup, malt extract, inactive yeast, soy flour, and agar (25). Fly stocks were maintained at 23 • C before and during experiments. Flies were anesthetized using CO 2 for virgin collections and dissections. During hatch-rate assays, flies were mouth aspirated between vials.

Hatch-rate and egg-lay assays
Four crosses were performed for each strain (Fig. 1A). The one exception was wAur-infected D. Auraria, where only crosses with infected males were performed as uninfected males were sickly and failed to mate in pairs. Virgin 6-8-d-old females were added to vials containing a spoon filled with fly food-one fly was added per vial. Food for these assays was the same as standard rearing media but with blue food coloring and 1% extra agar. Fresh yeast paste (3:2 water to yeast) was smeared on the food. After 4-5 h of acclimation, a single 3-d-old virgin male was added to each vial and then incubated overnight. In the morning, flies were aspirated into new vials with a fresh spoon. Vials were incubated for another 24 h before flies were removed via aspirating. Eggs were counted on spoons immediately after flies were removed, and the remaining unhatched eggs were counted after 48 h. The hatch rate was calculated as the proportion of eggs hatched per spoon in 48 h. CI strength (s h ) was measured for each sample as 1-(CI-cross hatch rate/mean hatch of compatible crosses) (10). 95% BCa confidence intervals and means of CI strength were calculated using boot in R (70). We used Mann-Whitney U tests in R to determine if hatching differed between CI and compatible crosses for each strain. A Kruskal-Wallis followed by Dunn's test was performed to determine if any cross types differed. Samples with fewer than ten eggs laid were excluded from hatch-rate analyses.
CI strength is amenable to variation in male age (25,40), rearing density (28), paternal grandmother age (71), and temperature (72). To enable comparison among different Wolbachia-host associations, we controlled these factors. Virgin males were aged for 3 d before pairing with females since younger D. tscacasi and D. bocki males failed to mate and lay. Flies were given only ∼24 h to lay in vials that would yield fathers to prevent overcrowding. All flies were maintained at 23 • C before the experiment, as virgins, and during the experiment. Finally, while we did not control paternal grandmother age, all males in these experiments were derived from females that were not maintained as virgins. Since the paternal grandmother age effect is significantly inhibited after mating, we predict that variation in non-virgin grandmother age has no impact on these studies.

Cif sequence and structure analyses
cif sequences were retrieved from the genomes of all strains used in this study with BlastP. Representative cifA and cifB genes from each of the five phylogenetic Types of cif genes were used as query sequences, based on previous assignments (41) , and cifB wTri [T5] . cifs were assigned a Type designation based on their closest hit from the query sequences and the Type of their nearby cifA or cifB partner. Glimmer 3 was used to identify the coding region surrounding each blast hit in Geneious Prime and then translated (73).
Three methods were used to characterize Cif proteins further. First, MUSCLE alignments for CifA and CifB were generated in Geneious Prime (73) and were used to create heatmaps of pairwise protein similarity in R using the corrplot package (74,75). Second, the HHpred webserver (42) was used to identify protein domains. Databases used were SCOPe70_2.07, Pfam-A_v33.1, COG_KOG_v1.0, and SMART_v6.0. Annotations with P > 80 were recorded. If multiple hits were retrieved for a single region, only the highest probability annotation was taken and used for annotation.
Finally, ColabFold (76), an AlphaFold2 (45) Google Collaboratory notebook, was used to generate tertiary structures for CifA and CifB's PD-(D/E)XK domains. Full CifB proteins were not generated due to restrictions on protein length and computational power. Tertiary structure similarity was determined using the Zhanglab TM-score webserver (48). Each pairwise comparison was conducted twice-switching the order of template and experimental structures-and the average was taken. The relationship between Cif sequence and structural similarity was assessed using a Wilcoxon matched-pairs signed-rank test in Graphpad Prism 8.

Tissue collection and nucleotide purification
Siblings from hatch-rate assays were collected for DNA and RNA extractions for Wolbachia-density and cif-transcript-level assays. All tissue was collected the day after crossing for the hatch-rate experiment. Virgin males were CO 2 anesthetized, and testes were dissected in chilled phosphate-buffered saline. Five pairs of testes were placed into a single tube for DNA extractions and stored at −80 • C until processing. The tissue was homogenized, and DNA was extracted with the DNeasy Blood and Tissue kit (Qiagen). For RNA extractions, 15 pairs of testes were placed into a single tube containing 200 μL of Trizol and four 3 mm glass beads. Samples were then homogenized using a TissueLyser II (Qiagen) at 25 Hz for 2m, centrifuged, and stored at −80 • C until processing.
RNA samples were thawed, 200 μL of additional Trizol was added, and tissue was further homogenized at 25 Hz for 2 m. RNA was extracted using the Direct-Zol RNA Miniprep kit (Zymo Research) following the manufacturer's recommendations, but with an added wash step. On-column DNase treatment was not performed. The "rigorous" treatment protocol from the DNA-free kit (Ambion) was used to degrade DNA in RNA samples. Samples were confirmed DNA-free using PCR and gel electrophoresis for an arthropod-specific 28S rDNA (6,37). The Qubit RNA HS Assay Kit (Invitrogen) was used to measure RNA concentration. Samples were diluted to 20 ng/μL, 2 μL of TATAA Universal RNA Spike I (tataabiocenter) was added to each sample, and 16 μL of the sample was converted to cDNA using SuperScript IV VILO Master Mix (Invitrogen).

Relative-abundance and gene-transcript-level assays
Abundance and transcript-level assays were performed on testes nucleotide extracts. Wolbachia density was measured via qPCR as the relative abundance of the Wolbachia gene ftsZ and the singlecopy gene mid1, which is conserved across the Drosophila genus (25,77). cifA-and cifB-transcript levels were measured via RT-qPCR as the relative abundance of cif target to a spike-in control (described below). Since cifs are highly diverse, eight cifA primer sets and ten cifB primer sets were designed to capture the transcript levels of all variants encoded by our 10 focal Wolbachia. cif primers were designed using Primer3 in Geneious Prime (73) (Supplementary Table S7). All primers were efficient for relevant hosts and Wolbachia, except Type 3 wTsa primers, which were not adequately tested due to insufficient transcript levels.
In RT-qPCR, Cq values are commonly normalized to an endogenously expressed gene to control for variation in RNA quality, reverse transcript levels, amplification, and pipetting. Endogenous controls must be consistently expressed across treatment groups to be valid. A single gene is unlikely to meet these criteria since we analyze transcript levels across nine Drosophila species. As such, we opted to normalize our qPCR results against an RNA spikein control added to each sample after RNA dilution and prior to reverse transcript levels (described above). The spike-in control accounts for variation in reverse transcript levels, amplification, and pipetting that occurs after the spike-in is added to the sample and is considered equivalent or better than endogenous controls in some cases (78)(79)(80).
Samples were tested in triplicate using Powerup SYBR Green Master Mix (Applied Biosystems). Abundance FC was calculated as 2 -Ct of ftsZ and mid1 relative to a random wMel sample.
Transcript levels FC was calculated as 2 -Ct of the cif target relative to the spike-in control compared to the average transcript levels of four reference samples with four unique cif copies: cif wMel [T1] , cif wRi [T2] , cif wTei [T4] , and cif wTri [T5] . FDR-corrected pairwise t tests were used to determine if Wolbachia density differed between systems. Analyses excluded samples with a Cq standard deviation exceeding 0.4 between triplicate measures.

Development assays
We measured the developmental timing of fly species used in our study, in parallel. Stock bottles were cleared of flies, and all flies were collected after three days and held in a vial with fresh food for 24 h. Ten non-virgin Wolbachia-infected females were moved to a vial with a spoon with food, as described above for hatch-rate assays. Vials were monitored to determine the number of eggs laid and hatched every 3 h between 8 AM and 8 PM. After 20 to 30 eggs were laid, females were removed from vials, and the spoon was transferred to a vial containing 10 mL of fresh food. After eggs hatched, pupation was recorded every 6 h and adult emergence were recorded once daily. A vial was monitored until all adults emerged or there was no emergence for three days. Larval, pupal, and egg-to-adult development times were calculated as the time between the first egg hatch and the last larva pupation, the first pupation and the last adult emergence, and the first egg lay and last adult emergence per vial, respectively.

Correlation analyses
We tested whether Wolbachia density, cif-transcript levels, or development time correlated with CI strength. Each Wolbachia-host pair represented a single sample in each analysis. PGLS regressions were used to test for relationships between BCa means of CI strength and the means of the traits above (density and transcript levels data were log2 transformed before taking the average) using caper in R (81). Analyses were performed with and without maximum-likelihood correction of λ and Akaike information criterion values were generated for each model. In all cases, correcting for λ yielded a preferred model, which we have reported.

Phylogenetic signal
We investigated whether any of these traits exhibit phylogenetic signals using our Wolbachia phylogeny and data for CI strength, Wolbachia density, and cif-transcript levels. For cif-transcript levels, only Type 1 cifs were included in analyses, and only a single allele was selected from strains with multiple Type 1 copies (cif wHa[T1 -1] and cif wBai [T1-1] ). Two methods were used to investigate phylogenetic signals.
First, we calculated Fritz and Purvis' D statistic using the caper R package, which estimates phylogenetic signal based on binary traits (55,81). Binary traits were assigned as follows. Wolbachia density was low for wAno, wBai, wBoc, and wTsa and high for other strains. cifA-transcript levels were high for wAno, wRi, and wTri and low for other strains. cifB-transcript levels were high for wAno, wAur, wHa, wRi, wTei, and wTri and low for other strains. CI strength was treated as binary by either comparing strong CI (wAno, wAur, wBai, wHa, wRi, wTri) vs. all other strains or CI (strong-CI strains plus wMel and wBoc) vs. non-CI strains. The D statistic compares the observed D value to alternative D values generated with simulated data based on a random phylogenetic pattern and Brownian motion using 1,000 permutations each. To investigate if larger phylogenies improve D statistic estimation, we used the Geiger package in R (82) to simulate 100 permutations of the D statistic for trees with 10, 25, 50, and 100 taxa (λ = 1 or 0) given the prevalence of our binary statistics.
Next, we calculated Pagel's lambda (λ), using the phytools package in R, which estimates phylogenetic signal based on continuous traits (56,83). A likelihood ratio test was used to compare our fitted value of λ to a model assuming no phylogenetic signal. Pagel's λ was not calculated for CI since CI data is proportional. We used the mean of log2-transformed data for relative abundance and transcript levels.