Decoupling gene knockout effects from gene functions by evolutionary analyses

Genic functions have long been confounded by pleiotropic mutational effects. To understand such genetic effects, we examine HAP4, a well-studied transcription factor in Saccharomyces cerevisiae that functions by forming a tetramer with HAP2, HAP3, and HAP5. Deletion of HAP4 results in highly pleiotropic gene expression responses, some of which are clustered in related cellular processes (clustered effects) while most are distributed randomly across diverse cellular processes (distributed effects). Strikingly, the distributed effects that account for much of HAP4 pleiotropy tend to be non-heritable in a population, suggesting they have little evolutionary consequence. Indeed, these effects are poorly conserved in closely related yeasts. We further show substantial overlaps of clustered effects, but not distributed effects, among the four genes encoding the HAP2/3/4/5 tetramer. This pattern holds for other biochemically characterized yeast protein complexes and metabolic pathways. Examination of a set of cell morphological traits of the deletion lines yields consistent results. Hence, only some gene deletion effects support the related biochemical understanding, with the rest being pleiotropic and evolutionarily decoupled from the gene’s normal functions.

Introduction

genetics analysis. The common practice is to knock out or knock down a gene and find the traits significantly altered(14), which represents a purely mechanistic framework. In the above TF versus ATCGATC motif example, when the TF is deleted, the expression of those genes with the motif at the promoter region could all be affected. The resulting pleiotropic effects, which are either ad hoc or evolutionarily selected according to the nature of the focal motifs, together would lead to a very complex picture of the functionality of the TF.

The necessity of adopting an evolutionary view in reverse genetic analysis lies also in the effect size of the knockout or knockdown mutations experimentally introduced, which is much larger than that of typical segregating alleles in natural populations(15).

cerevisiae strain BY4741, and checked the expression trait of the other yeast genes by sequencing the transcriptome of the strains that grow in the rich medium YPD at 30 °C (Table S1). We found 195 responsive genes each with a significant expression change under a stringent statistical cutoff (Table S2). Gene ontology (GO) analysis of the 195 genes revealed that one third of them (65/195) clustered more than expected by chance in dozens of GO terms. These GO terms are related to each other and reflect well the functional annotations of HAP4 as a regulator of mitochondria activities(19) (Fig. 1A). The remaining two thirds (130/195) distribute rather randomly across diverse biological processes, underscoring the strong pleiotropy of HAP4. The two sets of genes are all functionally characterized with clear GO annotations (Table S2)

Because evolution happens in a population rather than in an individual, it is important to test the population-level heritability of the deletion effects. We crossed two S. cerevisiae strains to obtain a population of yeast segregants. Specifically, a wild-type strain BY4742 (MATalpha), which is identical to BY4741 except at the mating locus, was crossed with the HAP4 deletion line of DBVPG1373 (MATa) (Fig. 1C). This way, the comparison between wild-type and null alleles at the HAP4 locus would match the comparison conducted in the isogenic BY4741 background. We dissected six tetrads of the hybrid and obtained 12 HAP4 wild-type and 12 HAP4 null segregants. For each of the 195 deletion effects we computed its heritability (h^2_HAP4) in the segregant population (Methods). The h^2_HAP4 measures the fraction of variance of an expression trait that is attributed to the HAP4 locus. We note that the heritability analysis resembles a forward genetic assay with a candidate genetic locus. An h^2_HAP4 close to zero suggests HAP4 is not a QTL (quantitative trait locus) of the expression trait. We found that the 65 clustered effects in general have much higher h^2_HAP4 than the 130 distributed effects (P = 7.8×10^-4, Mann-Whitney U-test; Fig. 1D

Hence, the heritability analysis based on a rather arbitrary and small population appeared to represent well the situation in nature. We also examined intra-species conservation of the HAP4 deletion effects by looking at two other S. cerevisiae strains, DBVPG1373 and GIL104, the former of which is 0.35% diverged from the strain BY4741 and the latter 0.07% diverged at the genomic level (25

bioRxiv preprint doi: https://doi.org/10.1101/688358; this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

of the finding that was based on the HAP2/3/4/5 tetramer, we examined other biochemically characterized protein complexes by using publicly available expression data. To avoid bias we considered a single dataset comprising microarray-based expression profiles of over one thousand yeast gene deletion lines(26). There are 54 protein complexes annotated by a previous study suitable for our analysis(35). In 24 cases the overlaps of clustered effects are significantly more than what would be expected from distributed effects at a 99% confidence level, and the enrichments range from 2.7-fold to over 100-fold with a median of 5.3-fold (Fig. 3A and Table S4). The
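As an illustration of the locus heritability h^2_HAP4 described above (the fraction of an expression trait's variance attributed to the HAP4 locus), the following is a minimal sketch. The exact estimator is given in the Methods, which are not part of this excerpt, so the between-genotype share of the total sum of squares used here is an assumption, and the function name and input values are hypothetical.

```python
def locus_heritability(wild_type, null):
    """Fraction of total trait variance explained by the genotype at one locus.

    Assumption: estimated as between-genotype sum of squares divided by the
    total sum of squares (not necessarily the estimator used in the paper).
    """
    values = list(wild_type) + list(null)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant trait has no variance to explain
    ss_between = 0.0
    for group in (wild_type, null):
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    return ss_between / ss_total


# Hypothetical expression values for a few segregants of each genotype.
print(locus_heritability([0.0, 2.0], [2.0, 4.0]))  # 0.5
```

Under this sketch, a trait whose wild-type and null segregants have the same mean gives a value of zero, matching the reading that HAP4 is then not a QTL of the trait.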

We also checked genes in the same KEGG pathways. There are 41 pathways that are related to metabolism, genetic information processing, cellular processes, and so on, suitable for our analysis (Table S5). The rate of overlaps of clustered effects is significantly higher than that of distributed effects in nine cases, and the enrichments range from 5.6-fold to over 100-fold with a median of 46.9-fold (Fig. 3D and Table S5).
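The fold enrichment of overlaps reported above can be pictured with a small sketch. It assumes that an effect counts as overlapping when the same responsive gene is also affected by the deletion of every other member of the complex or pathway; the excerpt does not spell out this definition, and the function names and gene labels below are hypothetical.

```python
def overlap_rate(focal_effects, partner_effect_sets):
    """Share of the focal gene's effects also seen for every partner gene."""
    focal = set(focal_effects)
    if not focal:
        return 0.0
    shared = focal.intersection(*partner_effect_sets)
    return len(shared) / len(focal)


def fold_enrichment(clustered, distributed, partner_effect_sets):
    """Overlap rate of clustered effects relative to distributed effects."""
    rate_clustered = overlap_rate(clustered, partner_effect_sets)
    rate_distributed = overlap_rate(distributed, partner_effect_sets)
    if rate_distributed == 0:
        return float("inf")  # no distributed overlap to compare against
    return rate_clustered / rate_distributed


# Hypothetical responsive genes of a focal deletion and of two partner deletions.
clustered = {"g1", "g2", "g3", "g4"}
distributed = {"g5", "g6", "g7", "g8"}
partners = [{"g1", "g2", "g3", "g5"}, {"g1", "g2", "g5", "g9"}]
print(fold_enrichment(clustered, distributed, partners))  # 2.0
```

The significance assessment at the 99% confidence level is a separate step that this sketch does not attempt.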

Consistently, the overlapped clustered effects of each pathway represent distinct

These traits are typically about area, distance, and angle calculated based on dozens of coordinate points, lines and angles that describe the shape of mother cell and bud, and the shape and localization of the nuclei in mother cell and bud (Fig. 4A). This large set of yeast traits has served as a valuable resource for studying genotype-phenotype relationships(9, 40, 41). Deletion of HAP4 in S. cerevisiae significantly altered 78 morphological traits, among which 24 are also significantly affected in S. paradoxus by HAP4 deletion (Table S6). To test if the evolutionarily conserved effects of HAP4 are shared with HAP2, HAP3 and HAP5 more than the non-conserved effects, we also measured the morphological traits affected by each of the other three genes, respectively, in S. cerevisiae. We found that 58.3% (14/24) of the conserved effects are shared with all the other three genes, which is significantly higher than the proportion (18/54 = 33.3%) for the non-conserved effects of HAP4 (P = 0.035, Fisher's exact test; Fig. 4B). The estimations are not explained by correlated traits (Fig. S8), and the difference remains largely unchanged when only traits with small measuring noise are considered (Fig. S9).
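The comparison of 14/24 conserved effects against 18/54 non-conserved effects can be checked with a short sketch of Fisher's exact test on the corresponding 2×2 table. The one-tailed alternative is an assumption here, chosen because the noise-filtered version of this analysis is stated to use a one-tailed test; the function name is hypothetical, and only the Python standard library is used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing a count of at least `a`
    in the top-left cell, given the fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, k_max + 1)
    )
    return tail / comb(total, row1)


# Rows: conserved (24) and non-conserved (54) effects of HAP4 deletion.
# Columns: shared with all of HAP2, HAP3 and HAP5, or not shared.
p_value = fisher_exact_greater(14, 10, 18, 36)
print(round(p_value, 3))
```

With these counts the one-tailed probability comes out at roughly 0.035, in line with the value reported in the text.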

Hence, the cell morphology data also support the role of evolution in separating genetic effects.

Thanks to the mature framework of measuring the selective constraints on DNA sequence(42), the evolution-free functionality of DNA elements defined in ENCODE was challenged immediately after its emergence (11,12). Notably, the gene-trait interactions defined in reverse genetic analyses are also based on an evolution-free framework. However, this century-old problem has been largely ignored, despite exceptions(43, 44), primarily due to the lack of a readily available measure of the underlying evolutionary constraint. In this study we performed, for the first time to the best of our knowledge, a rigorous test of the evolutionary nature of a set of gene deletion effects by examining their within-population heritability and intra-/inter-species conservation. We found that only some of them are subject to effective selection, with the rest likely being ad hoc and non-evolutionary. That being said, we caution that some effects might be under very weak selection that was beyond the detection power of our analyses. This concern would be alleviated by a reasonable assumption that effects under very weak selection are not distinct from those under no selection in the functional properties examined in the study. Similar to the ad hoc "functional" DNA elements defined in ENCODE(10), the ad hoc genetic effects are presumably explained by mutation equilibrium or spurious functions arising from the gene deletion(16).

Importantly, since such ad hoc effects have not yet been shaped by evolution, they are unlikely to be compatible with the roles the focal gene has long played in evolution (9).

This may explain in great part the origin of gene pleiotropy.

Hence, we could, as in this study, only perform enrichment analysis for a group of genetic effects. Nevertheless, the current limitation in operationality does not challenge the validity of the concept. A surprising finding of this study is that GO clustering can serve as a useful and readily operational proxy for selection when

were obtained for further examination, which effectively controlled the potential effects of secondary mutations introduced during the gene replacement.

Because haploid yeast cells tend to flocculate, which is not suitable for cell

Nearly 90 years ago Hermann J. Muller coined the terms amorph, hypermorph, hypomorph, antimorph, and neomorph to classify mutations based on their loss- or gain-of-function nature (see ref. 1). The basic idea of the classification has been fundamental to genetic analysis. In particular, amorph refers to a null mutation of a gene, and the resulting phenotype is believed to represent the gene's native function. In contrast, neomorph refers to a gain-of-function mutation of a gene, and the resulting phenotype does not represent the gene's native function. These concepts, although intuitively valid, lack rigorous tests. The diagram on the left illustrates how confusion could arise.
Suppose there is a living system with three genes (A, B and C) and two traits (T1 and T2). The function of proteins A and B is to form a dimer to regulate T1, and the function of protein C is to regulate T2. To understand the system we may apply genetic analysis. Deletion of B will break the A-B dimer, altering T1. This phenotype change represents the native function of B. However, when B is absent, A may find C to form a new, although less intimate, dimer A-C, which would alter T2. This is plausible since proteins with a structurally similar domain are prevalent in a eukaryotic genome. The change of T2 does not represent the native function of B; instead, it is explained by the non-native A-C dimer, a spurious function that arises from the deletion of B.
Notably, the spurious function arising from the deletion of B is by nature the same as the new function caused by a gain-of-function mutation of A. It is well accepted that phenotype changes resulting from gain-of-function mutations do not represent native functions. From an evolutionary perspective only the A-B dimer is selected; the A-C dimer in both mutated systems is ad hoc.

We then removed traits one by one from those with the highest absolute R until no two traits have R^2 greater than a threshold, which is set to be 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, and 0.5, respectively. The numbers of remaining traits are 346, 312, 277, 247, 223, 204, 185, 161, 153, and 127, respectively. Error bars represent SE.

Fig. 4B is robust against trait measuring noise. To address the potential technical bias that traits with large measuring noise tend to be both non-conserved and non-overlapping, only traits with measuring CV < 0.1 across the replicates in wild-type BY4741 are considered. This results in 58 traits that are significantly affected by HAP4 deletion in S. cerevisiae, among which 19 are conserved effects and 39 are non-conserved effects. The rate of overlaps in the conserved set remains significantly higher than in the non-conserved set (P = 0.029, one-tailed Fisher's exact test). Overlaps refer to traits significantly affected by all four gene deletions in S. cerevisiae.
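The correlated-trait control described above (removing traits, starting from those with the highest absolute R, until no two traits have R^2 above a threshold) can be sketched as a greedy loop. The text does not say which trait of a highly correlated pair is dropped, so dropping the later-listed one is an assumption. The function names and toy trait values are hypothetical, and traits are assumed to be non-constant so that R is defined.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_correlated_traits(traits, r2_threshold):
    """Greedily drop traits until no pair has R^2 above the threshold.

    traits: dict mapping trait name to its values across samples.
    Assumption: of the most correlated pair, the later-listed trait is removed.
    """
    kept = dict(traits)
    while len(kept) > 1:
        # Find the remaining pair with the highest absolute correlation.
        pair, r = max(
            ((p, pearson_r(kept[p[0]], kept[p[1]])) for p in combinations(kept, 2)),
            key=lambda item: abs(item[1]),
        )
        if r * r <= r2_threshold:
            break
        del kept[pair[1]]
    return list(kept)


# Toy data: "area_scaled" is nearly a multiple of "area"; "angle" is unrelated.
toy_traits = {
    "area": [1, 2, 3, 4, 5],
    "area_scaled": [2, 4, 6, 8, 10.5],
    "angle": [3, 1, 4, 1, 5],
}
print(prune_correlated_traits(toy_traits, 0.9))  # ['area', 'angle']
```

Running the function once per threshold (0.95 down to 0.5) would give a series of progressively smaller trait sets, analogous to the counts listed in the legend.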