Gene Protein Sequence Evolution Can Predict the Rapid Divergence of Ovariole Numbers in the Drosophila melanogaster Subgroup

Abstract Ovaries play key roles in fitness and evolution: they are essential female reproductive structures that develop and house the eggs in sexually reproducing animals. In Drosophila, the mature ovary contains multiple tubular egg-producing structures known as ovarioles. Ovarioles arise from somatic cellular structures in the larval ovary called terminal filaments (TFs), formed by TF cells and subsequently enclosed by sheath (SH) cells. As in many other insects, ovariole number per female varies extensively in Drosophila. At present, however, there is a striking gap of information on genetic mechanisms and evolutionary forces that shape the well-documented rapid interspecies divergence of ovariole numbers. To address this gap, here we studied genes associated with Drosophila melanogaster ovariole number or functions based on recent experimental and transcriptional datasets from larval ovaries, including TFs and SH cells, and assessed their rates and patterns of molecular evolution in five closely related species of the melanogaster subgroup that exhibit species-specific differences in ovariole numbers. From comprehensive analyses of protein sequence evolution (dN/dS), branch-site positive selection, expression specificity (tau), and phylogenetic regressions (phylogenetic generalized least squares), we report evidence of 42 genes that showed signs of playing roles in the genetic basis of interspecies evolutionary change of Drosophila ovariole number. These included the signaling genes upd2 and Ilp5 and extracellular matrix genes vkg and Col4a1, whose dN/dS predicted ovariole numbers among species. Together, we propose a model whereby a set of ovariole-involved gene proteins have an enhanced evolvability, including adaptive evolution, facilitating rapid shifts in ovariole number among Drosophila species.


Figure S3
. Clustering of the average standardized expression of all genes in the D. melanogaster genome (with non-zero expression) in the various cell types obtained from the study of LL3 larval ovaries in Slaidina, et al. (2020).The expression was clustered using hierarchical clustering and average linkage in the Morpheus program (see Materials and Methods).CC=cap cells, FSCP=follicle stem cell precursors, GC=germ cells, IC=intermingled cells, SHa= anterior sheath cells, SHm= migrating sheath cells, SW=swarm cells, TFa= anterior terminal filament cells, TFp= posterior terminal filament cells.

Figure S6.
The percent of studied D. melanogaster sc-RNA seq genes upregulated and rapidly evolving in D. melanogaster that had three-species orthologs in Hawaiian Drosophila.N values in D. melanogaster are in Fig. 4. To be defined as having an orthologous gene set, the D. melanogaster gene had to have a match to the annotated genome D. grimshawi, and the three Hawaiian species had to all have high confidence orthologs to each other.
Table S1.The 30 developmental stages and 29 tissue types used for expression breadth analysis, or tau.The RPKM data is from ModEncode (Li, et al. 2014) and is available at FlyBase (Gramates, et al. 2022).Downloaded in April 2022.

Developmental stages Tissue types
Table S2.The 28 genes (27 with five-species orthologs for further follow-up study) that were identified herein as having a high dN/dS value (≥1.5 fold higher than genome-wide median) based on assessment of the gene sets that affected ovariole numbers and/or egg laying in D. melanogaster based on RNAi in a hpo loss of function genetic background from Kumar, et al. (2020).The observed phenotypes were named as hpo [RNAi] Ovariole number, hpo [RNAi] Egg laying and Egg laying [wt], or as "connectors".Results are shown for dN/dS obtained for D. melanogaster-D.simulans (Dmel-Dsim) (genome-wide median=0.086),the melanogaster subgroup (median=0.089),and the melanogaster group (0.062) using data available from (Stanley and Kulathinal 2016) and the M0 model of PAML that provides a single dN/dS per alignment (Yang 2007).For completeness, all details for genes that belong to more than one of the four phenotype gene sets described above, and those with dN/dS in more than one of the three sets of Drosophila species are shown.Genes are listed in descending order with respect to the M0 dN/dS of the highest value per gene.Occasional genes near the cut-offs were retained for analysis.These genes were subjected to follow-up molecular evolutionary analyses.Table S3.The 27 rapidly evolving signalling and connector genes under study.For each gene, the associated pathways, tau and key functional terms at FlyBase (Gramates, et al. 2017)  Table S4.The 27 rapidly evolving SIGNALC genes identified from Kumar, et al. (2020) and their expression status in the soma and germ cells in the larval ovary (each pooled across stages), and among somatic cells at the early, mid and late stages (DeSeq2 P<0.01 (Tarikere, et al. 2022)).Note that if a gene is designated as upregulated in the germ cells, this automatically indicates it is downregulated in soma.A total of 25 of 27 genes showed differential expression using at least one of these comparisons.

Fbgn ID Gene symbol
Upregulation observed (Tarikere, et al. 2022 Tefu Germ -Notes: the gene Paris (FBgn0031610), that was rapidly evolving in Dmel-Dsim but lacked high confidence orthologs in all five species was reported as upregulated in the germ cells relative to the soma cells in (Tarikere, et al. 2022).
Table S5.Genes that were highly upregulated in germ cells of the D. melanogaster larval ovary (relative to somatic cells) pooled across three larval stages (Tarikere, et al. 2022) and that exhibited rapid divergence in the melanogaster subgroup (M0 dN/dS>0.20).Freeratio dN/dS per branch, branch-site positive selection and tau values are shown for each gene.The genes with the top 10 log2 fold change values matching these criteria are shown.Upregulation in germ cells implies downregulation in the somatic cells.S6.The genes that were highly upregulated in the somatic cells of the D. melanogaster larval ovary in one of three stages of development (early, mid or late) as defined by Tarikere, et al. (2022) and that had elevated M0 dN/dS values in the melanogaster subgroup (>0.20).All differentially expressed genes with M0 dN/dS>0.20 were ranked according to log2 fold change and the 30 genes with the greatest degree of upregulation at one stage of the soma are shown.Also shown is the free ratio branch dN/dS for each species, the presence of branch-site positive selection (indicated by "yes" (P<0.05)), and the tau value."Stage" indicates the developmental stage in which the gene displayed upregulation.

Fbgn
Fbgn  S7.Rapidly evolving genes (M0 dN/dS>0.20 in the melanogaster subgroup) that were statistically significantly upregulated in one cell type of the LL3 ovary relative to all other cell types from the Slaidina et al. (2020) sc-RNA seq data (P<0.05using Seurat program v.2 (where log=loge in v2 (Satija, et al. 2015)).The genes listed include all those with M0 dN/dS>0.20 and with upregulation in the cell type provided.Genes could be upregulated in more than one cell type and thus may be listed more than once (Slaidina, et al. 2020).The subset of genes that were exclusively upregulated in the terminal filament cells (TF) or sheath cells (SH) cells are listed under Notes.The results are shown for the SHa= anterior sheath cells, SHm= migrating sheath cells, TFa= anterior terminal filament cells, TFp= posterior terminal filament cells.Notes: Genes that were exclusively upregulated in the SH cells: FBgn0051363, FBgn0033438, FBgn0014133, FBgn0033724, FBgn0000719, FBgn0003507, FBgn0027932, FBgn0051955.Genes that were exclusively upregulated in the TFs: FBgn0001253, FBgn0003435, FBgn0015872, FBgn0020503, FBgn0026620, FBgn0027598, FBgn0030174, FBgn0030237, FBgn0033134, FBgn0033683, FBgn0034198, FBgn0034200, FBgn0034638, FBgn0037350, FBgn0038071, FBgn0038476, FBgn0038498, FBgn0038682, FBgn0039914, FBgn0040343, FBgn0045842, FBgn0051361, FBgn0085407,FBgn0085432, FBgn0086686, FBgn0262563.Branch-site positive selection in D. simulans, D. sechellia and D. melanogaster using the subset of genes exclusively upregulated in SH cells was 25.0, 25.0 and 0% of studied genes respectively and for TFs was 11.5, 23.1, and 7.7% respectively."-" indicates the dN and dS were each <0.001 and thus too low divergence to determine dN/dS.Notes: No Orth-HD=no high confidence reciprocal BLASTX orthologs identified among the three Hawaiian Drosophila (HD).NA indicates the branch did not meet the criteria that dN or dS>0.001 for dN/dS analysis.Ten genes had dN/dS>2 fold higher (value>0.330)than the genome-wide medians in at least one species branch (the branch with the highest dN/dS value is in bold).Three additional genes, aux, Sirt1, and tefu had an elevated value when using a criterion of dN/dS >1.5 fold higher (value>0.246)than the genome-wide medians.

Fbgn
For the BULKSG transcription dataset (Tarikere, et al. 2022), we identified and extracted all genes that were statistically significantly upregulated (using DeSeq2 P<0.01 criterion in that assessment (Love, et al. 2014)) in the D. melanogaster pooled larval ovary somatic cells and those upregulated in the pooled germ cells.Upregulated in germ cells automatically indicates downregulation in soma cells, and thus both up-and downregulated genes in soma were studied.
We also extracted for study those genes that were upregulated at a single one of the larval stages studied (early, mid, or late) for the pooled ovary somatic cells; these stages correspond to different stages of TF formation (Tarikere, et al. 2022).For these gene sets, we then identified those genes with M0 dN/dS>0.20 in the melanogaster subgroup, which represent a value ≥2.2 fold higher than the genome-wide median.The cut-off was higher for the BULKSG than SIGNALC dataset, given that we examined all genes in the genome that were differentially expressed in the former dataset, which are apt to be less conserved as a group than the SIGNALC genes.The genes matching these criteria (dN/dS>0.20,and up-or downregulated in soma cells), were ranked based on degree of differential expression, and subjected to follow-up analysis in the melanogaster subgroup.
With respect to the SINGLEC dataset (Slaidina, et al. 2020), genes were extracted for further study that were statistically significantly differentially transcribed in a specific cell type of the LL3 larval ovary relative to all other cell types based on sc-RNA seq (using average standardized expression and P-values from Seurat v.2; some genes were upregulated in more than one cell type based on these criteria (Slaidina, et al. 2020)).The nine cell types studied (shown in fig. 1) included the germ cells (GC) and eight somatic cell types, namely the cap cells (CC), follicle stem cell precursors (FSCP), intermingled cells (IC), anterior sheath cells (SHa), migrating sheath cells (SHm), swarm cells (SW), anterior terminal filaments (TFa), and posterior terminal filaments (TFp), with a particular focus on the two types of SH cells and the two types of TF cells, which are thought to largely shape species-specific ovariole numbers and function (King, et al. 1968;Sarikaya, et al. 2012;Sarikaya and Extavour 2015;Slaidina, et al. 2020).Similar to the BULKSG assessment, among the genes that were upregulated per cell type, we identified the genes that had signs of rapid protein sequence divergence (M0 dN/dS >0.20 in genes with five-species orthologs) in the melanogaster subgroup (Stanley and Kulathinal 2016;Gramates, et al. 2017), for further analyses.
and D. murphyi scaffolds for evidence-based gene predictions.We also obtained ab initio gene predictions in Augustus (Hoff and Stanke 2013) which recommends D. melanogaster gene models as a reference for predictions in Dipteran species (Stanke and Waack 2003).In the assessments, we excluded any CDS that lacked a start or a stop codon, or had an internal stop codon, such that only complete CDS were used for study.Using these criteria, we identified 13,395 CDS for D.
murphyi and 12,670 CDS for D. sproati .Thus, these comprise high quality full-length CDS.To assess the completeness of our final D. sproati and D. murphyi CDS lists we applied BUSCO v.
5.5.2 (Seppey, et al. 2019;Manni, et al. 2021), a program that evaluates the completeness of a given transcriptome by comparing it to a subset of "universal" single-copy genes.For this, we used as the control the Dipteran gene set, named diptera_odb10 (N genes=3,250).The results showed that for the 13,395 full-length CDS for D. murphyi, 93.4% had a complete (unfragmented) match in the BUSCO dataset, indicating we obtained a very complete CDS list for this species (note that this value would be elevated using all 14,108 CDS identified for D. murphyi, but we focused solely on the obtained full-length CDS).In turn, for D. sproati , we found 93.2% of the full-length CDS were represented in the Dipteran BUSCO dataset.Taken together, the results show that our approach yielded a high confidence and highly complete CDS list for D. sproati and D. murphyi.
We next used reciprocal BLASTX (BLAST v. 2.13.0+ (Altschul, et al. 1997)) to identify orthologs between all three Hawaiian species using all CDS per genome in each contrast, for each combination of paired species, namely D. grimshawi-D.murphyi, D. grimshawi-D.sproati and D. murphyi-D.sproati.For each pair, in order to be identified as orthologs, two CDS had to be the best match (lowest e-score, any matched with a tied e-score the best match was the one with the highest bit value, e<10 -6 ) in both the forward and reverse contrast and have an e-value <10 -6 .Using sproati.
To identify Hawaiian species orthologs to the genes linked to ovariole/egg number evolution obtained from analysis within the melanogaster subgroup (SIGNALC in table 1; SIGNALC in fig.4), and for genome-wide orthologs, we compared the CDS of the most well annotated melanogaster species D. melanogaster (13,986 coding genes with longest isoform per gene) to the annotated Hawaiian species D. grimshawi CDS lists using reciprocal BLASTX (using same criteria described for Hawaiian Drosophila above).We report that 11,011 high confidence orthologs were identified between D. melanogaster and D. grimshawi (78.7% of the Dmel CDS list).A total of 10,276 of those genes had high confidence three-species orthologs in Hawaiian clade and were used for study of dN/dS and ortholog detection rates.
Each alignment was filtered in GBlocks v. 0.91b set at default parameters (Castresana 2000;Talavera and Castresana 2007) to remove gaps and any highly divergent segments.For each aligned CDS in D. grimshawi, D. murphyi and D. sproati(gaps removed ), we determined dN/dS per species branch using the free-ratios model of PAML (M1) and conducted branch-site analyses (Yang 2007), as outlined for the melanogaster subgroup in the main text.Branches were unsaturated in substitutions, and the 90 th percentile of dS values across all genes was <0.090 and of dN was <0.032 for each of the three Hawaiian species branches.The median gene-wide dN/dS value across all genes for each species branch were 0.152, 0.164, and 0.160 for D. murphyi, D.
of ovariole numbers in larval ovaries (Sarikaya and Extavour 2015) is reported as maximally transcribed (Li, et al. 2014).The 4-6 hour old embryos comprise an embryonic stage during which the somatic gonad precursors (SGPs) are being established (Richardson and Lehmann 2010), and thus speculatively these two genes (upd2 and hpo) could potentially influence ovariole numbers by affecting SGP number or behaviour (King, et al. 1968;Sarikaya, et al. 2012;Sarikaya and Extavour 2015).In sum, upd2 protein sequence changes may contribute to the divergence in ovariole numbers in the melanogaster subgroup, particularly the marked decline in the D. sechellia branch.

Zyxin (Zyx)
For the SIGNALC ovariole-related gene Zyx, which modulates activity of the Hippo pathway (table S3; Rauskolb et al. 2011;Gaspar et al. 2014;Ma et al. 2016), each of the five species of the melanogaster subgroup had a gene-wide dN/dS value (ranging from 0.267 up to >1, table 1) that was more than threefold higher than the genome-wide median per respective species branch (the genome-wide median dN/dS values for each of the five branches are shown in fig.S1).
The patterns suggest potential functional divergence of Zyx throughout the clade.Zyx exhibited gene-wide dN/dS>1 in D. sechellia, suggesting a history of positive selection in that lineage, that co-occurred with its marked decline in ovariole numbers (fig.2), suggesting the potential for a causative relationship.Upon close examination, we found that Zyx in the D. sechellia branch was a case when dS approached 0 (the other species branches had dS between 0.006 to 0.010), however the dN value was 0.004 (above the cut-off of 0.001), which was double the dN for D. simulans and D. melanogaster (even though dS was higher for the latter two species, their dN was lower), together indicating that the elevated dN/dS in D. sechellia was due to elevated dN (not low dS, and is conservatively denoted as dN/dS>1).Overall, the patterns suggest positive selection of Zyx in D. sechellia.
There are plausible cell biological mechanisms whereby adaptive changes in the Zyx protein may potentially influence ovariole number divergence.The gene Zyx encodes an actin cytoskeleton regulator, with roles in Hippo pathway regulation (table S3, (Rauskolb, et al. 2011)), and evidence suggests it can regulate cell behaviors and movements during development (Amsellem, et al. 2005;Matsui and Lai 2013).Given that actin cytoskeleton activity is a prime candidate for coordinating the cell movements required for the formation of the TFs (Chen, et al. 2001), it may be speculated that Zyx may contribute to this process, and thereby changes in its amino acid sequence may affect TF formation (table 1), and in turn, alter the ovariole numbers (King, et al. 1968;Sarikaya, et al. 2012;Sarikaya and Extavour 2015).This provides a possible mechanism whereby functional changes in the Zyx protein may affect interspecies transitions in ovariole numbers.Further, Zyx positively regulates Yki (Rauskolb, et al. 2011), a transcriptional coactivator involved in many processes including cell proliferation (Huang, et al. 2005), and that accumulates in the nucleus in hpo RNAi larval ovary somatic cells (Sarikaya and Extavour 2015).
We previously showed that elevating Yki levels in the somatic ovary results in an increased number of TFs and ovarioles (Sarikaya and Extavour 2015).Moreover, loss of Zyx reduces Yki activity (Rauskolb, et al. 2011).Thus, taken together, it may be hypothesized that the low ovariole numbers in D. sechellia as compared to all other melanogaster subgroup species (fig.2) may have arisen from a reduction of Yki activity, that was caused by lowered function/activity of Zyx protein due to its amino acid sequence changes (table 1).Given that Zyx exhibited signs of positive selection in the D. sechellia branch (gene-wide dN/dS, table 1), this would in theory imply that the reduction in ovariole numbers (rather than an increase) in D. sechellia may comprise an adaptive reproductive change.

Phosphatase and tensin homolog (Pten)
The SIGNALC ovariole-related gene Pten (Kumar, et al. 2020) is a regulator of the mTOR and insulin signalling pathways (table S3).S3), and is involved in cell size, cell growth and tumor suppression (Goberdhan, et al. 1999).Mutations in Pten impair cytoskeletal activity in Drosophila ovaries, and induce various female reproductive phenotypes including impaired localization of the germ plasm components oskar mRNA and Vasa protein in laid eggs and an absence of pole cells (von Stein, et al. 2005).Further, Pten RNAi in the D. melanogaster larval ovaries increases ovariole number (the hpo[RNAi] Ovariole Number phenotype (Kumar, et al. 2020), see table 1).
Among all of the 59 tissues/stages used for tau analyses (which was a relatively low value of 0.663 in D melanogaster for this gene), Pten had maximal expression, x̂=1, for the adult virgin ovaries (table S3), which is also consistent with an essential female ovarian role.In sum, given its functions in female reproductive system, and the rapid evolution of this signalling gene in four of five of the Drosophila lineages (0.128-0.594), and explicit evidence of positive selection in D. erecta (table 1), we hypothesize that Pten is a putative factor involved in ovariole number transitions in the melanogaster subgroup, and that may shape ovariole numbers by adaptive evolution, particularly in the D. erecta branch (table 1).

vein (vn)
The ovariole-related EGF pathway ligand gene vn (table S3) exhibited a dN/dS value that was between 1.6 to 7.2 fold higher than the genome-wide median per respective species branch in three of the Drosophila species branches studied (D.  (Slaidina, et al. 2021;Tu, et al. 2021).Further, this gene is expressed in the D. melanogaster TFs and SH cells (Slaidina, et al. 2020), which regulate ovariole formation and number (King, et al. 1968;Sarikaya, et al. 2012;Sarikaya and Extavour 2015;Slaidina, et al. 2020), consistent with the finding that vn RNAi reduces ovariole number in D. melanogaster (the hpo[RNAi] Ovariole Number phenotype, table 1 (Kumar, et al. 2020)).The highly elevated dN/dS value for vn in both D. simulans and D. sechellia terminal branches in particular suggest that vn protein sequence evolution may contribute to the rapid and two-fold level change in ovariole numbers between these closely related species (fig.2).
In sum, vn has signs of evolvability in its protein sequence, with frequent protein sequence changes and adaptive evolution (in D. yakuba) and yet also exhibited relatively slow evolution in the D.
melanogaster branch (dN/dS=0.0841),perhaps due to high pleiotropy (tau=0.741,table S3; note tau was measured in D. melanogaster as the reference), suggesting the gene has been subjected to strong purifying selection in that lineage.Thus, vn appears to have a dynamic evolutionary pattern, suggestive of potential roles in ovariole number divergence in the D. simulans and D. sechellia branches, and in D. yakuba.

BULKSG genes (table 2)
Insulin-like peptide 5 (Ipl5) The  S1).It may be the case that gene redundancy has contributed to the divergence patterns for this gene.The D. melanogaster Ilp family has eight genes that have evolved a range of functions and tissue type specializations, including ovarian functions for Ilp5 (Gronke, et al. 2010).In gene families, one gene member (or another) may pass through various stages of partial redundancy of function while still retaining the original gene function (Ohno 1970;Kirschner and Gerhart 1998), creating a window for relaxed constraint, and evolvability, of protein-coding regions and potentially giving rise to adaptive new functions or subfunctions, including gene-dosage functions (Kirschner and Gerhart 1998;Presgraves 2005;Conant and Wolfe 2008;Kuzmin, et al. 2022).
Otherwise, without a function, the genes may accumulate mutations to form a pseudogene or become lost from the genome (Ohno 1970;Presgraves 2005;Conant and Wolfe 2008;Kuzmin, et al. 2022).Given the gene Ilp5 was retained in the genomes of all five species here (table 2), it may be theorized that its rapid sequence divergence in four of five terminal branches may be associated with the evolution of altered functions that may affect ovariole formation.In fact, llp5 has been linked to female re-mating frequency in D. sechellia (Wigby, et al. 2011), and thus sexual selection pressures could potentially contribute to its very high dN/dS in that branch (0.5843, table 2), and to the evolved decrease in ovariole numbers in this lineage (fig.2).With respect to the comparatively slow evolution of this gene in D. melanogaster, a study of Ilp5 knockouts found evidence for compensatory expression of Ilp3 (Gronke et al. 2010).Ilp5 loss of function conditions cause a decrease in egg laying but do not completely eliminate it (Kumar, et al. 2020), suggesting its redundancy with other Ilp family members (and gene-dose compensation) may limit the extent of the negative effects on egg production.Nonetheless, the overall effect on egg yield and thus likely on fitness (Gronke, et al. 2010) may explain the strong purifying selection and low dN/dS value that we observed for the D. melanogaster branch, particularly if a weaker effect of Ilp5 mutations occurred in the other species.In sum, Ilp5 presents a dynamic history in the melanogaster subgroup and may substantively contribute to ovariole number variation in this taxon.

aquarius (aqrs)
The BULKSG ovariole-related gene aquarius (aqrs) (table 2) had the highest gene-wide dN/dS in the D. yakuba branch (0.3029), followed by D. erecta (0.1972), suggesting particularly rapid changes in the outgroup species (fig.2).Aqrs protein has been associated with the functionality of the male seminal fluid proteins (SFPs) and the sex peptide (SP) in D. melanogaster (Findlay, et al. 2014), products that are transferred to the female reproductive tract during copulation and act to increase egg production and decrease female receptivity to courtship (Findlay, et al. 2014).The observed high levels of aqrs transcripts in larval somatic ovary cells (table 2) suggests that the gene also plays some role in these female reproductive cells.In this regard, aqrs may be a strong target for male-female sexual conflict, which is a factor known to cause rapid evolution of female (and male) proteins and of reproductive traits such as egg production, and may ultimately promote speciation (Rice 1996;Arnqvist, et al. 2000;Swanson and Vacquier 2002;Clark, et al. 2009).Thus, the rapid sequence divergence observed here for aqrs may putatively contribute to interspecies ovariole number variation.

SINGLEC Genes (table S7; fig. 4)
Ecdysone-inducible gene E1 (ImpE1), sm, and CLIP-190 In terms of the subset of rapidly evolving genes that we identified that were exclusively upregulated in TFs (TFa and/or TFp, and no other cell types, table S7 190), each of which exhibited positive selection in the D. sechellia branch (table S7).ImpE1 has been linked to cell rearrangements during morphogenesis (Natzle, et al. 1988), sm is a nuclear ribonucleoprotein involved in diverse roles including muscle functions (Draper, et al. 2009), neuronal and chemosensory processes (Layalle, et al. 2005), and CLIP-190 proteins coordinate binding between actins and microtubules (Lantz and Miller 1998), and may affect intracellular transport (Sanghavi, et al. 2012).Taken together, the rapidly evolving genes upregulated in TF cell types are involved in multiple cellular processes likely required for TF morphogenesis, and thus adaptive changes observed in these genes (fig.4) may contribute to the proximate mechanisms underlying the rapid evolutionary divergence in ovariole numbers in Drosophila.

Genes downregulated in the BULKSG soma cells versus germ cells (table S7)
In terms of genes upregulated in the germ cells (and thus downregulated in the soma), the top ten genes with the highest degree of upregulation are shown in table S5.The genes exhibited elevated tau values ranging from 0.880 to 0.975, indicating a tendency of narrow transcription breadth (and several very narrow, with tau values above 0.950) for these genes, potentially reflecting high specialization to germ cell functions.Further, 8 of the 10 genes were largely uncharacterized genes (annotations as "CG number" only in table S5) based on available annotation from DAVID (Huang da, et al. 2009) and FlyBase (Gramates, et al. 2022).The two relatively well characterized genes no long nerve cord (nolo) and orientation disruptor (ord) play roles in the nerve cord, extracellular matrix, and meiosis (Huang da, et al. 2009).As the evidence to date suggests that somatic cells, rather than germ cells, are the principal regulators of ovariole number, we did not focus further on this particular genes of very highly and positively upregulated germ cell genes, and instead focused in the main text on the genes upregulated in somatic cells and with variable expression across larval ovary stages in tables 3 and 4.

Figure S1 .
Figure S1.Box plots of the distribution of the free-ratio dN/dS values in each species terminal branch for all genes per genome with five-species orthologs in the melanogaster subgroup (Dsim = Drosophila simulans; Dsec = D. sechellia, Dmel = D. melanogaster, Dyak = D. yakuba, Dere = D. erecta).The median dN/dS value is shown within each box.Outliers are excluded for Dsim and Dsec as they were outside the range of the Y axis.

Figure S2 .
Figure S2.Box plots of the tau values across all studied genes in D. melanogaster using expression data across 59 tissues and developmental stages (combined) and solely using the data from 30 developmental stages.Values are for all genes with orthologs among the five studied species in the melanogaster subgroup.Median values are shown within bars.Tissues and stages are described in tableS1.
the three sets of paired orthologs, we then compared the orthologs identified in the newly annotated D. murphyi-D.sproati pair to the D. grimshawi.-D.murphyi and D. grimshawi.-D.murphyi pairs, to ensure that the orthologs per gene identified between D. murphyi and D. sproati each independently matched the same single CDS in D. grimshawi.Thus, this comprises very stringent criteria for identification of three-way reciprocal orthologs among D. grimshawi.-D.murphyi-D.

Fbgn ID Gene Symbol Pathways tau 59-All x̂ Ovary virgin Key functional terms at FlyBase
Kumar, et al. (2020)lation of tau, the x̂ for adult virgin ovaries is shown as an example of this parameter (values of 1 underlined).Genes are listed in descending order from the highest M0 dN/dS values from PAML (TableS2).Pathways are fromKumar, et al. (2020)

ID Log2 Fold Change Gene name Gene Symbol M0 dN/dS Free-ratio dN/dS Branch-site positive selection P<0.05 tau
" indicates the dN and dS were each <0.001 and thus too low divergence to determine dN/dS.

ID Log2 fold change Stage Gene name Symbol M0 dN/dS dN/dS per branch Branch-site positive selection tau
" indicates the dN and dS were each <0.001 and thus too low divergence to determine dN/dS."yes" indicates P<0.05 in branch-site selection test.

Table S8 .
(Huang da, et al. 2009classifications of rapidly evolving genes that were upregulated in TFa and TFp cell types.Classifications were determined using DAVID(Huang da, et al. 2009) for all genes per category.P-values indicate the statistical significance of preferential classification of the gene into each GO group.Genes could match more than one GO group.

Table S9 .
(Stanley and Kulathinal 2016;Murga-Moreno, et al. 2019) positive selection for the 42 genes identified in Tables1, 2, and 3.Only those genes with P<0.05 are shown.The D. melanogaster populations examined are the Raleigh North Carolina (RAL) and the Zambia population.The P-values are shown as MK-P.Alpha (α) indicates the proportion of substitutions apt to have experienced positive selection.Analysis was conducted using the integrative McDonald Kreitman test(Murga-Moreno, et al. 2019) that uses D. simulans as the comparison species (Dmel-Dsim).Note that iMKT may not necessarily have the exact same alignment per gene as FlyDivas(Stanley and Kulathinal 2016), which was used for all our core analyses, but the alignments from both analyses are highly similar and representative per gene for this supplementary assessment.Data are available publicly for each dataset(Stanley and Kulathinal 2016;Murga-Moreno, et al. 2019).MK-P indicates the P-value.

Table S10 .
(Kumar, et al. 2020) branch-site analysis Hawaiian Drosophila(D.murphyi,D.sproati and D. grimshawi)for orthologs to the SIGNALC genes identified in table 1 that exhibited signs of involvement in ovariole number evolution (from analysis of the melanogaster subgroup, and(Kumar, et al. 2020)).Genes having a dN/dS value of >1 at the gene-wide level and those with values above 0.330 in at least one species branch are each shown (that is double the median of the genome with the highest median dN/dS, D. sproati).Species that exhibited positive selection using branch-site (BR-S) analysis in PAML (P<0.05) are indicated by the abbreviated species name.

Genes with all branches dN/dS≤0.330
ovariole-related gene Ilp5 in table 2 was particularly notable in that it exhibited strong purifying selection in D. melanogaster (dN/dS=0.001)and had high dN/dS values in all other branches (values of 0.2932-0.5843 in D. simulans, D. sechellia, D. yakuba, D. erecta, each markedly higher than the genome-wide branch median values, which ranged from 0.066 to 0.125, fig.