Evolutionary decomposition of genetic effects

The view of evolution-free biochemical functionalities defined in the Encyclopedia of DNA Elements (ENCODE) project was challenged immediately after it emerged. Ironically, genetic effects defined in reverse genetics are also evolution-free, but this century-old fact has received little attention because no measure of the underlying evolutionary constraint has been readily available. In this study we conducted systematic tests for selection on a set of pleiotropic genetic effects by assessing their within-population heritability and their inter- and intra-species conservation. We found that only some of the effects are subject to effective selection, and the rest represent evolutionarily ad hoc gene-trait interactions. Interestingly, the selected portion is well supported by the related biochemical understanding, while the ad hoc portion accounts for cryptic pleiotropy. We argue that, given that the ENCODE consortium has already abandoned the evolution-free view of biochemical functionalities, it is time for geneticists to develop an evolution-inclusive framework for genetic effects.

# These authors contributed equally to this work.

Mutation analysis has long been used to understand the normal function of a gene 1. It now appears clear that a gene can often affect various seemingly unrelated traits 2, a phenomenon termed pleiotropy 3. For instance, a large-scale gene knockdown assay in the nematode worm Caenorhabditis elegans found that, on average, a gene affects ~10% of 44 assessed traits 4. Attempts to understand such pleiotropic mutational effects were … evolution 19.
We hypothesized that clustered genetic effects should support this biochemical understanding better than distributed genetic effects, because the latter are non-evolutionary. To test the hypothesis, we separately deleted each of the other three genes that encode the tetramer in S. cerevisiae BY4741 and measured the expression profiles of the deletion lines. We defined clustered effects and distributed effects for each line using the same method as for the HAP4 deletion line. We obtained … (Table S3). Consistent with the hypothesis, we found 20 clustered effects shared by all four gene deletion lines, a figure 14.5 times higher than that for the distributed effects (P < 0.001, simulation test, Fig. 2d). Notably, the 20 overlapping clustered effects are not the strongest effects in BY4741(Δhap4) (Fig. S4). To avoid the potential bias that expression responses to the tetramer may have been used in the GO annotations of the responsive genes, we excluded all expression-related evidence from the GO annotations and defined new clustered and distributed effects. We obtained essentially the same result (Fig. S5).
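The details of the simulation test for overlapping effects are not given in this section. A minimal sketch of one plausible version follows, assuming a null model in which each deletion line's effect set is replaced by an independent random draw of the same size from a common gene pool; the function names, the null model and the toy data are illustrative assumptions, not the authors' procedure.

```python
import random
from functools import reduce

def overlap_count(gene_sets):
    """Number of genes present in every set (size of the common intersection)."""
    return len(reduce(lambda a, b: a & b, (set(s) for s in gene_sets)))

def simulated_overlap_pvalue(gene_sets, pool, n_iter=9999, seed=0):
    """Monte Carlo P value for the observed overlap across deletion lines.

    Assumed null model (hypothetical): each set is replaced by an independent
    random draw of the same size from `pool`, and the overlap is recomputed.
    """
    rng = random.Random(seed)
    pool = list(pool)
    observed = overlap_count(gene_sets)
    sizes = [len(s) for s in gene_sets]
    hits = sum(
        overlap_count([rng.sample(pool, k) for k in sizes]) >= observed
        for _ in range(n_iter)
    )
    # Add-one correction keeps the estimated P value strictly positive
    return observed, (hits + 1) / (n_iter + 1)

# Hypothetical toy data: four deletion lines that share five responsive genes
genes = [f"gene{i}" for i in range(100)]
shared = genes[:5]
lines = [shared + genes[10:15], shared + genes[20:25],
         shared + genes[30:35], shared + genes[40:45]]
observed, p = simulated_overlap_pvalue(lines, genes, n_iter=999)
print(observed, p)
```

Under this assumed null, comparing clustered with distributed effects would mean running the test for each effect type against its own pool and comparing the observed overlaps.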

Because publicly available microarray data exist for the HAP2, HAP3, HAP4 and HAP5 deletion lines 24, we also repeated the analysis using the public expression data and observed a similar pattern (Fig. S6).

In addition to the protein complex formed by HAP4, we could also consider protein-DNA interactions, since HAP4 is a transcription factor. Data from a chromatin immunoprecipitation (ChIP) assay of the promoters bound by HAP4 show that, among the 195 responsive genes observed in BY4741(Δhap4), 13 are direct targets of HAP4 (Fig. 2e) 25,26. Interestingly, direct targets are 24-fold enriched in the clustered effects relative to the distributed effects (P = 8.6×10⁻⁶, Fisher's exact test); among the 20 overlapping clustered effects, 50% (10/20) are direct targets of HAP4, whereas the genomic background is 0.64% (33/5146) (P = 4.6×10⁻¹⁸, hypergeometric test; Fig. 2f). Hence, the ChIP data strongly support the distinction between the two effect types. Collectively, these results are consistent with a recently proposed model 23 … amino acids (Fig. 3c), a functional insight that has not been well recognized 36.

We also checked genes on the same KEGG pathways. Forty-one pathways, related to metabolism, genetic information processing, cellular processes and other functions, were suitable for our analysis (Table S5). The rate of overlap of clustered effects is significantly higher than that of distributed effects in nine cases, with enrichments ranging from 5.6-fold to over 100-fold (median 46.9-fold; Fig. 3d and Table S5).
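The genomic-background comparison of direct HAP4 targets (10 of the 20 overlapping clustered effects versus 33 of 5146 genes genome-wide) is an upper-tail hypergeometric test. The sketch below recomputes that tail probability and the corresponding fold enrichment from the counts stated in the text, using only the Python standard library; the function name is a hypothetical choice for illustration.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    denom = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / denom

# Counts from the text: 10 of 20 overlapping clustered effects are direct
# HAP4 targets; 33 of 5146 genes are direct targets genome-wide.
p = hypergeom_upper_tail(10, 5146, 33, 20)
fold = (10 / 20) / (33 / 5146)
print(f"P = {p:.2e}, fold enrichment = {fold:.1f}")
```

With these counts the tail probability is on the order of 10⁻¹⁸, in line with the reported value, and the enrichment over the genomic background is roughly 78-fold.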

Consistently, the overlapping clustered effects of each pathway represent distinct …

These traits typically describe areas, distances and angles calculated from dozens of coordinate points, lines and angles that capture the shape of the mother cell and bud and the shape and localization of the nuclei in the mother cell and bud (Fig. 4a). … all the other three genes, which is significantly higher than the proportion (18/54 = 33.3%) for the non-conserved effects of HAP4 (P = 0.035, Fisher's exact test; Fig. 4b). These estimates are not explained by correlated traits (Fig. S8), and the difference remains largely unchanged when only traits with small measurement noise are considered (Fig. S9).
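Comparisons such as conserved versus non-conserved effects (overlapping or not across the four deletion lines) reduce to a 2×2 contingency table. A minimal one-tailed Fisher's exact test can be written directly from the hypergeometric distribution. Because the counts for the conserved set are not fully preserved in this section, the example uses the classic "lady tasting tea" table rather than the study's data; the table layout and function name are assumptions.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-tailed (greater) Fisher's exact test for the 2x2 table
        [[a, b],
         [c, d]]
    Hypothetical layout: rows = conserved / non-conserved effects,
    columns = overlapping / non-overlapping across deletion lines.
    Returns P(X >= a) with all row and column margins fixed.
    """
    n_row1 = a + b        # first-row total (number of draws)
    n_col1 = a + c        # first-column total (number of "successes")
    n_total = a + b + c + d
    denom = comb(n_total, n_row1)
    upper = min(n_row1, n_col1)
    return sum(comb(n_col1, i) * comb(n_total - n_col1, n_row1 - i)
               for i in range(a, upper + 1)) / denom

# Classic example: 3 of 4 cups correctly identified, P = 17/70
p = fisher_exact_greater(3, 1, 1, 3)
print(round(p, 4))  # 0.2429
```

Fixing both margins is what makes the test exact: the only free quantity is the top-left cell, whose null distribution is hypergeometric.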

Hence, the cell morphology data also support the role of evolution in separating genetic effects.

In summary, by re-assessing genetic causality in the light of evolution, this study suggests an expanded framework for reverse genetics (Fig. 4c).

Nearly 90 years ago Hermann J. Muller coined the terms amorph, hypermorph, hypomorph, antimorph and neomorph to classify mutations based on their loss- or gain-of-function nature (see ref. 1). The basic idea of this classification has been fundamental to genetic analysis. In particular, an amorph is a null mutation in a gene, and the resulting phenotype is believed to represent the gene's native function. In contrast, a neomorph is a gain-of-function mutation in a gene, and the resulting phenotype does not represent the gene's native function. These concepts, although intuitively valid, have not been rigorously tested. The diagram on the left illustrates how confusion could arise.
Suppose there is a living system with three genes (A, B and C) and two traits (T1 and T2). The function of proteins A and B is to form a dimer that regulates T1, and the function of protein C is to regulate T2. To understand the system, we may apply genetic analysis. Deletion of B will break the A-B dimer, altering T1; this phenotypic change represents the native function of B. However, when B is absent, A may instead bind C to form a new, although less intimate, A-C dimer, which would alter T2. This is plausible because proteins sharing structurally similar domains are prevalent in eukaryotic genomes. The change in T2 does not represent the native function of B; instead, it is explained by the non-native A-C dimer, a spurious function that arises from the deletion of B.
Notably, the spurious function arising from the deletion of B is by nature the same as the new function caused by a gain-of-function mutation in A. It is well accepted that phenotypic changes resulting from gain-of-function mutations do not represent native functions. From an evolutionary perspective, only the A-B dimer is selected; the A-C dimer in both mutated systems is ad hoc.

Fig. 4B is robust against trait measurement noise. To address the potential technical bias that traits with large measurement noise tend to be both non-conserved and non-overlapping, only traits with a measurement coefficient of variation (CV) < 0.1 across replicates of wild-type BY4741 are considered. This yields 58 traits significantly affected by HAP4 deletion in S. cerevisiae, of which 19 are conserved effects and 39 are non-conserved effects. The rate of overlap in the conserved set remains significantly higher than in the non-conserved set (P = 0.029, one-tailed Fisher's exact test). Overlaps refer to traits significantly affected by all four gene deletions in S. cerevisiae.
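The noise filter in this legend keeps only traits whose coefficient of variation (CV, standard deviation divided by mean) across wild-type replicates is below 0.1. A minimal sketch follows. Whether the sample or population standard deviation was used is not stated, so the sample version here is an assumption, as are the trait names and values in the example.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (assumes a nonzero mean)."""
    return stdev(values) / abs(mean(values))

def low_noise_traits(replicates, max_cv=0.1):
    """Return trait names whose CV across replicates is strictly below max_cv."""
    return [trait for trait, values in replicates.items()
            if coefficient_of_variation(values) < max_cv]

# Hypothetical wild-type replicate measurements for two morphology traits
wild_type = {
    "bud_area": [10.0, 10.5, 9.8],
    "nucleus_angle": [30.0, 45.0, 20.0],
}
kept = low_noise_traits(wild_type)
print(kept)  # ['bud_area']
```

Using the absolute mean keeps the CV non-negative for traits that could take negative values; for strictly positive morphology measurements it makes no difference.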