Recent Positive Selection Has Acted on Genes Encoding Proteins with More Interactions within the Whole Human Interactome

Genes vary in their likelihood to undergo adaptive evolution. The genomic factors that determine adaptability, however, remain poorly understood. Genes function in the context of molecular networks, with some occupying more important positions than others and thus being likely to be under stronger selective pressures. However, how positive selection distributes across the different parts of molecular networks is still not fully understood. Here, we inferred positive selection using comparative genomics and population genetics approaches through the comparison of 10 mammalian and 270 human genomes, respectively. In agreement with previous results, we found that genes with lower network centralities are more likely to evolve under positive selection (as inferred from divergence data). Surprisingly, polymorphism data yield results in the opposite direction than divergence data: Genes with higher centralities are more likely to have been targeted by recent positive selection during recent human evolution. Our results indicate that the relationship between centrality and the impact of adaptive evolution highly depends on the mode of positive selection and/or the evolutionary time-scale.


Network-Level Analysis Using Different Alignment Sets to Detect Positive Selection from Divergence Data
In order to confirm the relationship observed between the position of proteins in the protein-protein interaction network (PIN) and interspecific positive selection, we used two alternative sets of alignments to which we applied the M7 vs. M8 model (Nielsen & Yang 1998). We first considered the set of 8,697 human genes represented in the PIN with orthologs in 3 to 9 non-human genomes. Alignments were obtained and filtered as described in Material and Methods. We observed that log-likelihood increments (2Δℓ scores) from the positive selection test exhibit a significant negative correlation with proteins' degrees (Spearman's rank correlation coefficient, ρ = −0.0557; P < 0.0001; Supplementary Table 2).
In addition, 2Δℓ values are significantly different in the different groups according to the degree quartiles (non-parametric ANOVA; P < 0.0001; Supplementary Table 2; Supplementary Figure 2), and these differences among groups are due to a clear trend towards lower 2Δℓ scores in groups corresponding to higher degrees (linear trend test on ranks; P < 0.0001; Supplementary Table 2; Supplementary Figure 2). Second, we used the alignemnts corresponding to the 5,916 human genes with 1:1 orthologs in all 9 non-human genomes, to which we did not apply any alignment filtering. We also observed a negative correlation between 2Δℓ and proteins' degrees (Spearman's rank correlation coefficient, ρ = −0.0772; P < 0.0001; Supplementary Table 2). 2Δℓ values are also significantly different in the different groups defined according to the degree quartiles (non-parametric ANOVA; P < 0.0001; Supplementary Table 2; Supplementary Figure 2), and these differences among groups are due to a clear trend towards lower 2Δℓ scores in groups corresponding to higher degrees (linear trend test on ranks; P < 0.0001; Supplementary Table 2; Supplementary Figure 2). In summary, using these two alternative sets of alignments resulted in qualitatively equivalent results.

Network-Level Analysis Using Different Methods to Detect Positive Selection from Polymorphism Data
In order to confirm the relationship observed between the position of proteins in the PIN and intraspecific positive selection, we broadened the analysis by using separately the three original tests to detect positive selection iHS (Voight et al. 2006); XP-CLR (Chen et al. 2010) and DH (Zeng et al. 2007) used to compute the Fisher's combination test score (ZF). We also used the Composite of Multiple Signals (CMS) method (Grossman et al. 2010) calculated in YRI, CEU and CHB+JPT populations using Pilot1 genotype data from the 1000 Genomes Project (Grossman et al. 2013). For each gene we used the average score (for iHS, XP-CLR and CMS) or the −log10(P-value) (for DH) as summary scores. We then applied a Spearman's correlation analysis between gene degree (number of interactions), as an estimator of network centrality, and these scores. Moreover, genes were classified into four groups delimited by their first, second and third degree quartiles. The median summary scores of the four groups were compared using a non-parametric ANOVA test. We also applied a linear trend test to contrast whether the putative differences among groups were due to a trend towards higher summary scores in groups corresponding to higher degrees (Supplementary Table 3 Table 3). When using the XP-CLR score, we did not observe such a clear relationship between degree and the impact of positive selection. Indeed, although the Spearman's correlation coefficient is positive in the three populations and significantly different from 0 in YRI and CHB (ρ equal to 0.0385 and 0.0282, respectively; P equal to 0.0007 and 0.0127, respectively; Supplementary Table 3), the non-parametric ANOVA reaches significance only in YRI (P = 0.0211), and so does the linear trend test on ranks in YRI and CEU (P = 0.0028 and P = 0.0236, respectively; Supplementary  Figure 3). Moreover, we also observe an association between degree and the impact of selection as measured by DH. Indeed, the Spearman's correlation coefficient is positive and significantly different from 0 in the three populations (ρ ≥ 0.0343; P ≤ 0.0015;  Figure 3).

Supplementary
In summary, the observed general tendency of central genes to evolve under recent positive selection remains significant when positive selection is inferred separately from different statistics.

Subset of Unlinked Genes
To study the impact of positive selection on genes we only used the SNVs located within the genomic region corresponding to the longest transcript. However, it is well known that the regions affected by a selective sweep are large, spanning hundreds of kilobases or even megabases and containing many potential variants driving the signal. Thus, several adjacent genes may be affected by a unique event of selection targeting one particular variant. Therefore, some of the genes showing signals of positive selection in our study may be false positives, even though we do not expect that this bias can affect our network-level analysis, since there is no reason why false positives should tend to concentrate in specific parts of the PIN. To confirm that our study does not suffer from this caveat, we first validated our results using the Composite of Multiple Signals (CMS) method (Grossman et al. 2010) calculated in the YRI, CEU and CHB+JPT populations using the Pilot1 genotype data from the 1000 Genomes Project (Grossman et al. 2010). Although this study used the less accurate Pilot1 data, the implemented method presents the strong advantage of more accurately pinpointing a small number of variants within a large genomic region (Grossman et al. 2010). Thus, using this test we expect to reduce to a great extent the number of falsely detected genes due to genetic hitch-hiking. Our network-level analyses have been confirmed by the use of CMS and, in fact, the association between the impact of selection and network centrality appears to be stronger (see Supplementary Table 3 To further confirm that hitch-hiking does not affect the association between the impact of positive selection and degree, we built a subset of unlinked genes, i.e. not in linkage disequilibrium, for the three populations (YRI, CEU and CHB). For that purpose, in each population, we used the population-specific recombination rates estimated genome-wide (recombination map provided by the 1000 Genomes Project Pilot 1 (The 1000 Genomes Project Consortium 2010)) and defined as a recombination hotspot a region where the observed recombination rates was greater than 10 times the genome average, i.e. greater than 18.36 cM/Mb, 18.55 cM/Mb and 17.61 cM/Mb in YRI, CEU and CHB, respectively. Then, we randomly sampled one PIN gene located between two recombination hotspots. We obtained three subsets of most likely unlinked genes involved in the PIN containing 2792, 3106 and 3107 genes in YRI, CEU and CHB, respectively.
For each gene we used the ZF score as the likelihood of having been targeted by positive selection in the human populations, and observed that it is significantly positively correlated with degree in all three populations (Spearman's correlation analysis; ρ ≥ 0.0450; P ≤ 0.0248; Supplementary Table 4). Moreover, when genes were classified into four groups delimited by the first, second and third degree quartiles, we observed significant differences of summary scores among groups in CEU and CHB (non-parametric ANOVA test, P equal to 0.0036 and 0.0075, respectively; Supplementary Table 4; Supplementary Figure 4). Through a linear trend test on ranks, we concluded that these differences among groups were due to a trend towards higher summary scores in groups corresponding to higher degrees in these two populations (P ≤ 0.0053; Supplementary Table 4; Supplementary Figure 4). For YRI, although the non-parametric ANOVA was not significant (P = 0.2661), the linear trend test on ranks was marginally significant (P = 0.0482).

Network-Level Analysis Correcting for Putative Confounding Factors
Factors such as gene expression level and breadth (tissues in which a gene is expressed), and the length of the encoded proteins, correlate with both network centralities and the likelihood of detecting natural selection, and hence could potentially have an effect on the observed relationship between the impact of natural selection and the gene centrality in the network (Anisimova et al. 2002;Duret & Mouchiroud 2000;Kim et al. 2007;Kosiol et al. 2008;Pál et al. 2006;Subramanian & Kumar 2004). In order to evaluate the effect of these factors, we applied a linear regression between the scores used as the likelihood of having been targeted by positive selection during human and mammalian evolution (ZF and 2Δℓ, respectively), as well as the scores that estimate the strength of purifying selection (DAF, NI and ω) and the mentioned putative confounding factors. The linear regression residuals were then used to perform the correlation analysis in each case. The relationship between positive selection inferred using polymorphism data and degree remains significant in all three populations with a Spearman's correlation coefficient, ρ, ranging between 0.0326 (P = 0.0059) in CEU and 0.0451 (P = 0.0001) in YRI (Main text Table 1). Moreover, the nonparametric ANOVA and trend tests on ranks provide similar results when using the linear regression residual instead of the ZF score, although P-values tend to be higher (Main text  Figure 5). Indeed, the non-parametric ANOVA test is significant in YRI (P = 0.0423) and CEU (P = 0.0240), while it does not reach significance in CHB (P = 0.1006). Moreover, we observe a trend towards higher residuals in groups corresponding to higher degree (Main text Table 1; Supplementary Figure 5). Indeed, in YRI and CHB the linear trend test on ranks reaches significance (P = 0.0053 and P = 0.0239, respectively).
Taken together, these observations indicate that the association observed between the ZF score and degree within the PIN cannot be attributed to the three putative confounding factors.
Similarly, the association observed between the impact of positive selection inferred using divergence data (as estimated by 2Δℓ) and degree remains significant when we correct for the three putative confounding factors. Indeed, we observed a significant negative Spearman's partial correlation coefficient (ρ = −0.0340; P = 0.0107; Main text Table 1). Moreover, although the non-parametric ANOVA test is marginally significant (P = 0.0548), the trend test remains significant (P = 0.0122) with lower residuals in groups corresponding to higher degree (Main text Table 1; Supplementary Figure 5). Finally, the relationship between purifying selection and degree also remains significant when using the residuals of the linear regression of either DAF or ω with protein length, expression level and expression breadth.
Indeed, the correlation tests remain significant (ρ = −0.0668 and −0.1698, respectively; P < 0.0001 in both cases), as well as the non-parametric ANOVA (P ≤ 0.0003) and the linear trend tests (P < 0.0001). However, the relationship between degree and purifying selection when using the residuals of the linear regression of NI with protein length, expression level and expression breadth, does not remain significant. Indeed, neither the correlation tests (ρ = 0.0314; P = 0.0742), nor the non-parametric ANOVA (P = 0.0713) or the linear trend tests (P = 0.0852) reach significance (Main text Table 1; Supplementary Figure 5).

Network-Level Analysis Using Different Protein−Protein Interaction Networks
To validate the association between network position and the impact of positive selection, analyses were repeated using two additional high-quality networks: a high-quality (HQ) subnetwork from BioGRID (Stark et al. 2011), in which we retained only interactions discovered by low-throughput techniques, plus those reported in at least two independent high-throughput analyses, and the network from the Human Protein Reference Database When studying the association between positive selection as inferred from divergence data (estimated by 2Δℓ) and degree in the HQ and HPRD networks, we observed a negative Spearman's correlation coefficient (ρ = −0.0620 and ρ = −0.0577, respectively; P < 0.0001; Supplementary Table 5). We also observed differences in the 2Δℓ scores among degree groups (non-parametric ANOVA; P ≤ 0.0011; Supplementary Table 5; Supplementary Figures 6 and   7), with higher 2Δℓ scores in groups corresponding to lower degrees. Indeed, the linear trend test on ranks reaches significance in both networks (P ≤ 0.0001). Finally, when studying the association between purifying selection and degree in both the HQ and HPRD networks, we observed a significantly negative Spearman's correlation coefficient with DAF and ω (ρ ≤ −0.0488; P < 0.0001; Supplementary Table 5) and a significantly positive Spearman's correlation coefficient with NI (ρ ≥ 0.0714; P ≤ 0.0002; Supplementary Table 5). Moreover, we observed clear differences in DAF, NI and ω among degree groups (non-parametric ANOVA, P ≤ 0.0008), due to a clear tendency towards lower DAF and ω values and higher NI values in groups corresponding to higher degrees (linear trend test on ranks; P ≤ 0.0001; Supplementary Table 5; Supplementary Figures 6 and 7).

Network-Level Analysis Using Different Centrality Measures
We explored whether the association found between the impact of natural selection and network centrality, as estimated by degree (the number of interactions in which a protein is involved), was also significant when using other centrality measures. For that purpose, we calculated two other centrality measures: betweenness (the number of shortest paths between other proteins passing through a given protein), and closeness (the inverse of the average distance to all other proteins in the network). The association between the impact of natural selection and these network centrality measures remains similar, regardless of the centrality measure considered. Indeed, the Spearman's correlation coefficient between either betweenness or closeness and ZF, the score used as the likelihood of having been targeted by positive selection in recent human evolution, is significantly positive in all three populations (ρ ≥ 0.0295; P ≤ 0.0096; Supplementary Table 6). Moreover, we observed ZF differences among betweenness groups in the three populations performing a non-parametric ANOVA, which reaches significance in the three populations (P ≤ 0.0332; Supplementary Table 6 When studying the association between positive selection during mammalian evolution (as estimated by 2Δℓ) and network centrality, using both betweenness and closeness, we observed a significantly negative Spearman's correlation coefficient (ρ equal to −0.0645 and −0.0726, respectively; P < 0.0001; Supplementary Table 6). We observed differences in the 2Δℓ scores among betweenness groups (non-parametric ANOVA, P < 0.0001; Supplementary Table 6 Supplementary Figures 8 and 9).

Correcting for the Action of Purifying Selection
The action of purifying selection on a genomic region can leave some patterns that are similar to the ones expected under recent positive selection (e.g. an excess of rare variants). Therefore, we wanted to confirm that the association found between positive selection estimated from polymorphism data and degree is not a by-product of the already described association between purifying selection and network centrality. For that purpose, we applied a linear regression between ZF, the score used as the likelihood of having been targeted by the impact of recent positive selection in human populations, and ω, which estimates the strength of purifying selection during mammalian evolution. The linear regression residuals were then used to perform the analysis. The relationship between positive selection and degree remains positive in all three populations, with a Spearman's correlation coefficient, ρ, greater than or equal to 0.0195 (Supplementary Table 7). It is significantly different from 0 in YRI and CHB (P equal to 0.0020 and 0.0032, respectively). Moreover, the residuals are marginally different among degree groups (non-parametric ANOVA; P ranging from 0.0371 to 0.0578; Supplementary Table 7; Supplementary Figure 10). These differences are due to a trend towards higher residuals in groups corresponding to higher degrees in YRI and CHB (linear trend test on ranks; P equal to 0.0081 and 0.0100, respectively; Supplementary Table 7; Supplementary Figure 10). In summary, the observed association between ZF scores and protein degree cannot be attributed to the association between network centrality and the action of purifying selection.
We further confirmed that background selection (BGS), a process that removes neutral variation linked to deleterious mutations, thus reducing levels of polymorphism in regions of high functional density and low recombination (Charlesworth et al. 1993), does not confound the association observed between network centrality and positive selection estimated from polymorphism data. We estimated the level of BGS acting on each gene using two correlates of BGS: GC content and recombination rate. Note that we did not take into account the level of functional constraint because the present study focuses on protein-coding genes. For each gene, we calculated the average of GC content from the 5-mer table downloaded from the UCSC browser (Karolchik et al. 2009) (table "gc5Base" downloaded on the 10 th of July, 2013), as well as the average recombination rate from the population-specific recombination rates estimated genome-wide (recombination map provided by the 1000 Genomes Project Pilot 1 (The 1000 Genomes Project Consortium 2010)). We then applied a linear regression between ZF, the score used as the likelihood of having been targeted by positive selection, and both recombination rate average and GC content average. The linear regression residuals were then used to perform the analysis. The relationship between positive selection and degree remains positive in all three populations, with a Spearman's correlation coefficient, ρ, greater than 0.0369 (Supplementary Table 8) and significantly different from 0 (P ≤ 0.0013).
Moreover, the residuals are different among degree groups in all three populations (nonparametric ANOVA; P ranging from 0.0017 to 0.0089; Supplementary Table 8; Supplementary Figure 11), and these differences are due to a trend towards higher residuals in groups corresponding to higher degree (linear trend test on ranks; P ranging from 0.0007 to 0.0016; Supplementary Table 8; Supplementary Figure 11). Therefore, the association observed between ZF scores and degree cannot be attributed to the association between network centrality and the action of BGS.

Network-Level Analysis for Short-Term Positive Selection Through Soft Sweeps
There exist a broad range of methods to detect molecular signals of adaptation, each one taking advantage of specific patterns expected to be observed at a locus that has evolved under positive selection. The methods we used in the present study present the advantage of using only genetic data, and, thus, we could take advantage of the extraordinary SNP density made available by the 1000 Genomes Project (The 1000 Genomes Project Consortium 2012).
However, these methods, based on genetic differentiation, linkage disequilibrium and site frequency spectrum, rely on the assumptions that the new beneficial allele is driven in very few generations to high frequencies. Therefore, the signal detected under such assumptions are expected to be the consequence of a selective sweep targeting a mutations with a consequent selective coefficient, most likely due to its important effect size. In order to explore the association between network topology and the impact of positive selection acting through other modes of adaptation (namely soft sweeps, selection of standing variants and polygenic adaptation; for a review see Pritchard et al. 2010), we used the results of a study that scanned the genome for selection signals by identifying the SNPs with the strongest correlations between allele frequencies and climate across 61 worldwide populations (Hancock et al. 2011). This method allows to detect SNPs that went through a similar subtle shift in allele frequency (Hancock et al. 2010) in distant populations living under similar environmental conditions. The positive selection events thus detected are expected to have acted on variants that make smaller contributions to the adaptive trait (Hancock et al. 2010).
We downloaded the results of this study from the DBcline database (http://genapps2.uchicago.edu/dbcline; Hancock et al. 2011) for the absolute latitude and eight environmental variables (see Supplementary Table 6 for the list) and assigned to each gene and each individual environmental variable, the log10 transformation of the minimum Pvalue reported for the association between the environmental variable and the allele frequencies at SNPs located within the gene. We also computed an extra summary statistic, referred to as "8 Climate Variables" which is the log10 transformation of the lower P-value reported for the association between any of the eight environmental variables and allele frequencies at SNPs located within the gene. We then tested for the correlation between these local adaptation scores and degree within the PIN. We found no significant correlation (Spearman's correlation test; P ranging from 0.0999 to 0.9992 with -0.0155 < ρ < 0.0195; Supplementary Table 6). Moreover, we observed six positive correlations and four negative ones. Similar results were obtained when using as a gene-level summary statistic the mean of the score or the proportion of P-values below 0.05 (data not shown).
Therefore, there is no evidence for any association between the impact of local adaptation through soft sweep and the network topology, and therefore it seems that local adaptation events through subtle shifts in allele frequencies are uniformly distributed across the PIN. Permutation test (P-value) d 0.0254* 0.0108* 0.2862 0.0929 0.0067** a Positive selection is invoked when the P-value associated to ZF score is below 5%. b Positive is invoked at global level when the P-value associated with the ZF score is below 5% in at least one of the three studied populations (YRI, CEU or CHB). a Positive selection is invoked when the P-value associated to 2Δℓ score is below 5%. d P-values were calculated using permutations. In each permutation a set of genes is randomly drawn, with the sampling size corresponding to the number of genes with signals of positive selection. Then, the average of their degree is compared to the one obtained for the genes with signals of positive selection. P-values are computed as the proportion of permutations with an average degree higher or equal to the observed one. *: P < 0.05; **: P < 0.01; ***: P < 0.001. P-value 1.60×10 −06 *** 4.66×10 −08 *** a Spearman correlation between degree and the positive selection score (2Δℓ) calculated using filtered alignment for 4 to 10 mammal species or unfiltered alignments for 10 species.

Supplementary
High scores indicate a higher probability of having evolved under positive selection.

Supplementary
High scores indicate a higher probability to have evolved under positive selection. b Non-parametric ANOVA and trend tests on ranks performed to contrast whether the medians of the positive selection scores are equal across the degree groups. *: P < 0.05; **: P < 0.01; ***: P < 0.001. We obtained a subset of most likely unlinked genes represented in the network containing 2,792, 3,106 and 3,107 genes in YRI, CEU and CHB, respectively, by randomly sampling one network gene located between two recombination hotspots (defined as a region where the observed recombination rates is greater than 10 times the genome recombination rate average).

Supplementary
a Spearman correlation between degree and ZF in YRI, CEU and CHB.
High ZF scores indicate a higher probability of having evolved under positive selection.
b Non-parametric ANOVA and trend tests on ranks performed to contrast whether the medians of the ZF score are equal across the degree groups (calculated on the whole set of genes). *: P < 0.05; **: P < 0.01. P-value 0.0018** 0.0093** 0.0012** 2.85×10 −07 *** 1.77×10 −11 *** 0.0061** 6.59×10 −35 *** a Spearman correlation between degree and natural selection scores (ZF for positive selection in the YRI, CEU and CHB populations; 2Δℓ for positive selection in mammals; DAF for purifying selection during recent human evolution; NI for purifying selection in the human lineage; and ω for purifying selection in mammals). High ZF and 2Δℓ scores indicate a higher probability of having evolved under positive selection as inferred from polymorphsim and divergence data, respectively. Low DAF and ω scores indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while high NI scores indicate higher evolutionary constraint estimated from both polymorphism and divergence data. b Non-parametric ANOVA and trend tests on ranks performed to contrast whether the medians of the natural selection scores are equal across the connectivity measure groups. For Betweenness, the 1 st and 2 nd quartiles were merged due to the uneven distribution of values. *: P-value < 0.05; **: P-value < 0.01; ***: P-value < 0.001. P-value 0.0081** 0.1541 0.0100** In order to test for an association between degree and the impact of positive selection in humans controlling for ω, we used the ZF as the likelihood of having been targeted by positive selection. We then applied a linear regression between this score and ω. High ZF values indicate a higher probability of having evolved under positive selection. Low ω scores indicate higher selective constraint. The linear regression residuals were then used to perform the Spearman's correlation analysis, the non-parametric ANOVA and the linear trend test on rank. a Spearman correlation between degree and the residuals. b Non-parametruc ANOVA and trend tests performed to contrast whether the medians of the residuals across the degree groups. *: P < 0.05; **: P < 0.01; ***: P < 0.001.  Trend test on ranks b F 11.53 9.947 11.48 P-value 0.0007*** 0.0016** 0.0007*** In order to test for an association between degree and the impact of positive selection in humans controlling for background selection, we used ZF as the likelihood of having been targeted by positive selection. We then applied a linear regression between this score and both populationspecific recombination rate average across the gene and GC content average across the gene.

Supplementary
High ZF values indicate a higher probability of having evolved under positive selection.
The linear regression residuals were then used to perform the Spearman's correlation analysis, the ANOVA and the linear trend test. a Spearman correlation between degree and the residuals. b Non-parametric ANOVA and trend tests performed to contrast whether the medians of the residuals across the degree groups. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Enrichment analysis performed using the PANTHER statistical over-representation test (Mi et al. 2013) for GO Cellular Compartment for the 1,521 genes with putative signatures of positive selection in any of the three human populations (P < 0.05 for the Fisher's combination test) and

Supplementary
using as a reference set the 7,603 PIN genes for which the ZF score could be computed. Genes were classified into four groups according to the degree quartiles calculated in the network.
The median of the positive selection score ± one median absolute deviation within each group is represented in the y-axis. A non-parametric ANOVA analysis was performed to contrast whether the medians of the positive selection scores were equal across the groups. A trend test on ranks was also been carried out to test for a linear relationship between the four quartiles (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. We obtained a subset of most likely unlinked genes involved in the network containing 2,793, 3,107 and 3,108 genes in YRI, CEU and CHB, respectively, by randomly sampling one network gene located between two recombination hotspots (defined as a region where the observed recombination rates is greater than 10 times the genome recombination rate average). Genes were classified into four groups according to the degree quartiles. The median of the ZF scores ± one median absolute deviation within each group is represented in the y-axis. A non-parametric ANOVA analysis was performed to contrast whether the median scores are equal across the groups. A trend test on ranks was also carried out to test for a linear relationship between the four groups (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. ZF and 2Δℓ were used to estimate the impact of positive selection in human populations and in mammals, respectively. DAF, NI and ω were used to estimate the impact of purifying selection in recent human populations, in the human lineage and in mammals, respectively. Lower DAF and ω values indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while higher NI values indicate higher evolutionary constraint estimated from both polymorphism and divergence data. In order to test for an association between degree and positive selection scores controlling simultaneously for protein length, expression level and expression breadth, we applied a linear regression between positive selection scores and these factors. The linear regression residuals were then used to perform the Spearman's correlation analysis, the nonparametric ANOVA and the linear trend test on ranks. Genes were classified into four groups according to the degree quartiles. The median of the residuals ± one median absolute deviation within each group are represented in the y-axis. A non-parametric ANOVA analysis was performed to contrast whether the medians of the scores are equal across the groups. A trend test was carried out to test for a linear relationship between the four groups (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Figure 6. Impact of natural selection among groups of genes classified according to their degree in the BioGRID high quality network.

Supplementary
Genes were classified into four groups according to the degree quartiles calculated in the network HQ. The median of the positive selection score used as likelihood of having been targeted by natural selection ± one median absolute deviation within each group is represented across the yaxis. ZF and 2Δℓ were used to infer the impact of positive selection in human populations and in mammals, respectively. DAF, NI and ω were used to estimate the impact of purifying selection in recent human populations, in the human lineage and in mammals, respectively. Lower DAF and ω values indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while higher NI values indicate higher evolutionary constraint estimated from both polymorphism and divergence data. A non-parametric ANOVA analysis was been performed to contrast whether the medians of the scores were equal across the groups. A trend test on ranks was also carried out to test for a linear relationship between the four groups (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Genes were classified into four groups according to the degree quartiles calculated in the HPRD network. The median of the positive selection scores ± one median absolute deviation within each group is represented across the y-axis. ZF and 2Δℓ were used to estimate the likelihood of having been targeted by positive selection in human populations and in mammals, respectively. DAF, NI

Supplementary
and ω were used to estimate the impact of purifying selection in recent human populations, in the human lineage and in mammals, respectively. Lower DAF and ω values indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while higher NI values indicate higher evolutionary constraint estimated from both polymorphism and divergence data. A non-parametric ANOVA analysis was performed to contrast whether the medians of the positive selection scores are equal across the groups. A trend test on ranks was also carried out to test for a linear relationship between the four groups (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Genes were classified into four groups according to the betweenness quartiles calculated in the interactome. The 1 st and 2 nd groups were merged due to the uneven distribution of values. The median of the positive selection scores ± one median absolute deviation within each group is represented across the y-axis. ZF and 2Δℓ were used to estimate the likelihood of having been targeted by positive selection in human populations and in mammals, respectively. DAF, NI and ω were used to estimate the impact of purifying selection in recent human populations, in the human lineage and in mammals, respectively. Lower DAF and ω values indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while higher NI values indicate higher evolutionary constraint estimated from both polymorphism and divergence data. A non-parametric ANOVA analysis was performed to contrast whether the medians of the scores are equal across the groups. A trend test on ranks was also carried out to test for a linear relationship between the four quartiles (encoded from 1 to 3) and natural selection scores. A Tukey's honestly significant difference test has been further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Figure 9. Impact of natural selection among groups of genes classified according to their closeness in the BioGRID network.

Supplementary
Genes were classified into four groups according to the closeness quartiles. The median of the positive selection scores ± one median absolute deviation within each group is represented across the y-axis. ZF and 2Δℓ were used to estimate the likelihood of having been targeted by positive selection in human populations and in mammals, respectively. DAF, NI and ω were used to estimate the impact of purifying selection in recent human populations, in the human lineage and in mammals, respectively. Lower DAF and ω values indicate higher evolutionary constraint estimated from polymorphism and divergence data, respectively, while higher NI values indicate higher evolutionary constraint estimated from both polymorphism and divergence data. A non-parametric ANOVA analysis was performed to contrast whether the medians of the scores are equal across the groups. A trend test on ranks was also carried out to test for a linear relationship between the four quartiles (encoded from 1 to 4) and natural selection scores. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001. In order to test for an association between degree and the ZF score controlling for purifying selection, we applied a linear regression between ZF and ω. The linear regression residuals were then used in a Spearman's correlation analysis, a non-parametric ANOVA and a linear trend test on ranks. Genes were classified into four groups according to the degree quartiles. The median of the residuals ± one median absolute deviation within each group is represented across the y-axis. A Tukey's honestly significant difference test was further applied to test for all pairwise differences.

Supplementary
Significantly different pairs are marked with an asterisk. *: P < 0.05; **: P < 0.01; ***: P < 0.001. Figure 11. Impact of positive selection during human evolution among groups of genes divided according to their degree controlling for covariates of background selection.

Supplementary
In order to test for an association between degree and the ZF scores controlling for background selection, we applied a linear regression between ZF and both the GC content and the average population-specific recombination rate across the gene. The linear regression residuals were then used in a Spearman's correlation analysis, a non-parametric ANOVA and a linear trend test on ranks.
Genes were classified into four groups according to their degree. The median of the residuals ± one median absolute deviation within each group is represented across the y-axis. A Tukey's honestly significant difference test was further applied to test for all pairwise differences. Significantly different pairs are marked with asterisks. *: P < 0.05; **: P < 0.01; ***: P < 0.001.