Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans

Abstract Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution.


Introduction
An important challenge in the study of protein evolution is the detection of substitutions fixed by positive selection on a background of genetic drift and purifying selection. The detection of such positive selection signal has progressed thanks to better codon models and statistical tests (Delport et al. 2009). Yet these tests suffer from low power (Anisimova and Yang 2007;Gharib and Robinson-Rechavi 2013), especially when they are applied to closely related species, where phylogenetic trees only have relatively short branches with few descending sequences. The situation is worse when positive selection is weaker, and thus harder to detect, e.g., in species with small population sizes. If cumulated, these effects make it notably difficult to reliably detect positive selection in recent primate evolution, such as on the phylogenetic branches directly leading to humans.
Despite these inherent limitations, there is much interest in detecting positive selection in humans, their primate relatives, and their direct ancestors (Lachance and Tishkoff 2013). For example, as soon as the chimpanzee genome was available, genome-wide scans using codon models were performed, resulting in the detection of fast evolving genes related to immunity, host defense, or reproduction (Chimpanzee Sequencing and Analysis Consortium 2005; Nielsen et al. 2005). However, these genome scans often lacked power to distinguish positive selection from relaxed purifying selection. With more species (i.e., more data) positive selection was eventually detected. However, very few genes remained significant after correcting for multiple tests (e.g., Bakewell et al. 2007;Gibbs et al. 2007), and thus only those genes with many positively selected mutations were identified. Although these limited results are interesting, recent studies have tried to uncover a more comprehensive picture of positive selection in the human lineage and its ancestors. Using patterns of incomplete lineage sorting (ILS), Munch et al. (2016) identified selective sweeps in ancestors of the Great Apes. Cagan et al. (2016) combined several neutrality tests to infer natural selection in the great apes, and found that population size has been a major determinant of the effectiveness of selective forces.
Here, we propose to combine potentially weak to moderate signals from several genes to gain statistical power, using biologically meaningful groupings of genes, such as known regulatory and metabolic pathways. Indeed, several genes with small effect mutations can altogether have a large impact on a biological pathway, even though these genes would have little chance to be identified by conventional genome scans. An increasing number of studies has thus shifted the focus from single gene approaches to the detection of polygenic selection (e.g., Serra et al. 2011;Daub et al. 2013;Fraser 2013;Berg and Coop 2014;Carneiro et al. 2014;Wellenreuther and Hansson 2016), taking advantage of existing databases of gene sets and pathways. For example, we have used a gene set enrichment test to detect gene sets with polygenic adaptive signals in human populations (Daub et al. 2013), showing that most significant pathways for positive selection were involved in defense against pathogens. This procedure has also been successfully applied to find signals of convergent adaptation in humans living at high altitudes (Foll et al. 2014) or in tropical forests (Amorim et al. 2015) and to detect positive selection in ant genomes using the results of the branch-site test (Roux et al. 2014). More recently, we have extended the classical McDonald-Kreitman test (McDonald and Kreitman 1991) to detect more ancient selection signals, i.e., outlier pathways affected by different modes of selection after the split of humans with chimpanzee .
In the present study, we investigate whether one can find traces of positive selection in older periods of human evolution. We use the branch-site likelihood-based test, contrasting codon models with and without positive selection (Zhang et al. 2005). This testing procedure can detect episodic positive selection, while avoiding false positives due to relaxation of selective constraints, and it has been widely used to find signals of ancient positive selection in various species, including primates and other mammals (Zhang et al. 2005;Bakewell et al. 2007;Kosiol et al. 2008;Studer et al. 2008). We propose here to combine the gene-specific likelihoods of branch-site tests over all members of a gene set, to infer which biological systems have been under positive selection in the primate ancestors of humans.
Using this approach, we detect signals of adaptation in pathways involved in immune response, sensory perception, metabolism, and energy production. Furthermore, we find that in many candidate pathways different genes are responsible for the selective signals in different periods of primate evolution.

Results
We have performed a gene set enrichment analysis to detect positive selection at the pathway level in the inner branches of a phylogenetic tree leading to African great apes (Homininae), great apes (Hominidae), apes (Hominoidae), and Old World monkeys and apes (Catarrhini) ( fig. 1). We first ran separate branch-site tests of positive selection on inner branches of 15,738 protein coding gene trees of the Primates clade. The output of the procedure is a loglikelihood ratio test (DlnL) statistic, comparing the likelihood of a model without positive selection to that of a model with one additional parameter for positive selection. A branch in a gene tree with a high DlnL statistic reveals that a subset of codons is likely to have been positively selected (i.e., dN/ dS > 1) in that gene over that period.
We tested over 1,400 pathways from the Biosystems database (Geer et al. 2010) for episodic positive selection (table 1). For each pathway and each branch, we calculated the sum of the DlnL4 values of the genes in this set (where DlnL4 is the fourth root of DlnL, see Materials and Methods), and we inferred the significance of this "SUMSTAT" score (Tintle et al. 2009) against a null distribution of random gene sets of the same size. As shown in a previous study on the human specific branch , we find that genes in gene sets tend to be more conserved in internal branches of the Primates tree than genes that are not included in any gene set. This pattern is most pronounced in the two youngest branches leading to Homininae and Hominidae (supplementary fig. S1, Supplementary Material online). To account for FIG. 1. The Primates clade with the species used in the branch-site test. The four tested branches (Homoninae, Hominidae, Hominoidae, and Catarrhini) are numbered (used to identify branch specific lists of genes or gene sets, e.g., G1, G2, G3, and G4) and marked in red. (Modified from the Ensembl mammalian species tree: https://github.com/Ensembl/ensembl-compara/blob/release/70/scripts/pipeline/species_tree_blength. nh. For more information about the construction of phylogenetic trees in Ensembl and the calculation of branch lengths, see http://dec2013. archive.ensembl.org/info/genome/compara/index.html) Daub et al. . doi:10.1093/molbev/msx083 MBE the fact that some genes in sets are more conserved, we created null distributions that explicitly reflect the expected genomic background by randomly sampling from the group of genes that are part of at least one gene set.
As we are essentially interested in finding gene sets that show a global shift in selection scores and not those gene sets that include a single or a few extremely significant genes, we repeated the procedure after excluding in each set the highest scoring gene. Only pathways that scored a q value below 0.2 (thus allowing a 20% FDR) both before and after exclusion of the top significant gene were considered candidates for polygenic selection. Note that the observed patterns could also be the result of consecutive single locus selective sweeps, instead of simultaneous shifts in allele frequencies at multiple loci. Therefore, one should have these two scenarios in mind when we mention polygenic selection in the rest of the paper. These two processes can be described as positive selection acting on multiple genes ("polygenic") in a gene set within a specific evolutionary time frame (i.e., each tested branch in the Primates tree).
We found 6, 32, 42, and 93 such significant pathways in the Homininae, Hominidae, Hominoidae, and Catarrhini branches, respectively (table 1 and supplementary table S1, Supplementary Material online). The fact that we find more candidate pathways when we go further back in evolutionary history is not necessarily caused by a change of selective pressures, but could be due to an increased power to detect selection in the longer ancient branches (see Discussion). In all four tested branches, we found clusters of candidate pathways that share a considerable proportion of their genes and often have similar biological functions, partly due to the nature of our data source, which is an aggregation of multiple pathway databases (supplementary figs. S2-S5, Supplementary Material online). We removed this redundancy with a "pruning" method, by iteratively removing the genes of the top scoring set from the remaining sets and rerunning the testing procedure on these remaining sets. The pruning procedure considerably reduced the number of significant sets and yielded 2, 7, 6, and 9 independent candidate sets in the Homininae, Hominidae, Hominoidae, and Catarrhini branches, respectively (table 2). Even though all high scoring pathways could be worth further investigation, those gene sets that remain significant after pruning are the pathways with the best evidence for direct action of positive selection.
The strongest candidate for positive selection in the Homininae branch is the GPCR downstream signaling pathway, which is also a top scoring candidate in the Hominoidae branch. G protein-coupled receptors (GPCRs) are membrane proteins that regulate the cellular response to external signals such as neurotransmitters and hormones, and they play an important role in vision, taste and smell (Rosenbaum et al. 2009). We found that 28 out of the 43 high scoring genes in this pathway (having a DlnL4 > 1) are genes coding for olfactory receptors, suggesting that its role in taste and smell has been a major driver for selection.
The second candidate is the Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell pathway, which contains receptors and cell adhesion molecules that are important in immune response regulation of lymphocytes. This pathway is also significant after pruning in the Hominidae and Hominoidae branches, and it scores high before pruning in the Catarrhini branch. Interestingly, the genes in this pathway that contribute most to the polygenic selection signal differ among branches ( fig. 2), which suggests that while the same pathway has been under selection over a long period, different genes underwent adaptive changes at different times. This is a pattern that is shared by many of the significant pathways (supplementary fig. S6, Supplementary Material online).
The highest scoring candidate in the Hominidae branch is the Olfactory Signaling Pathway, i.e., smell perception. The pathway Metabolism of xenobiotics by cytochrome P450, which ranked third in Hominidae and fifth in Hominoidae, encodes detoxifying proteins which play a role in the metabolism of xenobiotics such as drugs and toxins; they include cytochrome P450 enzymes or glutathione S-transferases. Other candidate pathways are also involved in metabolism, such as the Synthesis of bile acids and bile salts via 7alphahydroxycholesterol and Fatty acid metabolism. Note that the latter pathway is also a candidate for positive selection in the Catarrhini branch. The remaining two top scoring pathways in the Hominidae branch are related to immune response (Intestinal immune network for IgA production) and electron transport (Oxidative phosphorylation).
In addition to the Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell pathway, two other immune response pathways remain significant after pruning in the Hominoidae branch. The highest scoring candidate, NOTE.-For each branch the number of significant sets (q < 0.2) in the SUMSTAT gene set enrichment test is reported, both before and after removing overlapping genes ("pruning"), as well as the number of significant sets before pruning that remain significant after removal of their highest scoring gene. Branches where the pathway scores significant after pruning are marked with a "*." The genes are grouped by hierarchical clustering to visualize blocks with similar signals within and among branches. Genes for which DlnL4 scores were not available (NA) in a certain branch are depicted in grey. Genes are merged (horizontally) with their paralog(s) into an "ancestral gene" in the branches preceding a duplication and their scores were included only once in the calculation of the SUMSTAT score for these branches. Genes with (vertically) merged branches represent cases where the sequence of one or more species is missing or excluded, resulting in a single "average" DlnL4 score over multiple branches. We use this score when testing each branch separately. The DlnL4 score is computed as the fourth root of loglikelihood ratio in the branch-site test for positive selection. NOTE.-For each branch, only the pathways that score significant (q < 0.2) both before and after pruning (removal of overlapping genes) are listed. The SUMSTAT scores and gene set sizes that changed after pruning are shown in parentheses. Pathways which score significant on more than one branch are highlighted by the symbol ' §' Daub et al. . doi:10.1093/molbev/msx083 MBE Staphylococcus aureus infection, includes genes coding for proteins used by the bacterium S. aureus to infect a host cell (Foster 2009). It is also a top candidate in the Catarrhini branch. The Complement cascade pathway plays an important role in immune surveillance and homeostasis (Ricklin et al. 2010). Finally, the last candidate in the Hominoidae branch, Electron Transport Chain, plays a role in energy production. Five out of the nine pathways that remain significant after pruning in the Catarrhini branch have a function in immune response. Apart from the above-mentioned Staphylococcus aureus infection, these are Cytokine-cytokine receptor interaction, Hematopoietic cell lineage (giving rise to various types of blood cells including leukocytes), the host defense peptides Defensins, and NF-kB activation through FADD/RIP-1 pathway mediated by caspase-8 and -10. The fourth candidate is non-Alchoholic fatty liver disease (NAFLD), a disease linked with metabolic syndrome (Dowman et al. 2010). Note that before pruning, this latter pathway was part of a cluster of gene sets related to electron transport, a process which-when disturbed-can lead to oxidative stress, which is a key feature of NAFLD pathology. The remaining candidate pathways have functions in the metabolism of toxins (Chemical carcinogenesis) and nutrients (Pancreatic secretion).

Processes with Evidence of Positive Selection
In this study, we have combined the strength of two approaches to detect positive selection in the ancestral lineages of humans. The branch-site test identifies episodes of selection at specific evolutionary times and sites in a protein, and gene set enrichment combines the signal of multiple genes to find selection at the pathway level. We identified several significant pathways related to immune response, sensory perception, metabolism, and electron transport in different branches of the primate tree (supplementary figs. S2-S5, Supplementary Material online). These pathways were often organized in clusters that share many genes and have similar biological functions. After removal of the overlap between these pathways with a pruning procedure, two to nine pathways remained significant per branch. These pathways are our prime candidates for having been shaped by positive selection in primate evolution, and correspond to four broad biological processes.
First, in all four tested branches, immune response related pathways were among the top candidates, both before and after pruning. This strong and enduring signal suggests that the ongoing challenge of adaptation to changing pathogens has been one of the major selective pressures in primate evolution. This is in line with similar findings in previous reports of selection or fast evolution in ancient (Chimpanzee Sequencing and Analysis Consortium 2005; Nielsen et al. 2005) and recent human evolution (Daub et al. 2013).
Second, several significant pathways are involved in sensory perception, with GPCR downstream signaling and the Olfactory Signaling Pathway remaining after pruning in Homininae and Hominidae, respectively. Sensory perception pathways have many functions, from sensing environmental signals to internal signals such as hormones and neurotransmitters. Earlier studies have reported genes involved in sensory perception to be evolving rapidly in humans and other primates, with some support for positive selection (Chimpanzee Sequencing and Analysis Consortium 2005; Nielsen et al. 2005;Arbiza et al. 2006;Kosiol et al. 2008). These genes could have been affected by positive selection because of changes in environment, behavior or diet.
Third, clusters of significant pathways involved in the metabolism of lipids and other nutrients were detected in Hominidae, Hominoidae, and Catarrhini. This selective signal could be explained by changes in diet. Of note, the selective signal could also be due in part to the involvement of some of these pathways (Metabolism of xenobiotics by cytochrome P450 and Chemical carcinogenesis) in the metabolism of potentially toxic xenobiotics. These results are specifically interesting, since adaptations in metabolism are expected with the changes in lifestyle that have marked hominid evolution, yet they have been rarely detected (but see Fumagalli et al. 2015;Mathieson et al. 2015). Our gene set approach thus allows us here to capture a subtle but biologically important signal of adaptation.
Fourth, there are several significant pathways involved in energy production in the Hominidae, Hominoidae, and Catarrhini branches, with Oxidative phosphorylation, Electron Transport Chain, and non-Alchoholic fatty liver disease (NAFLD) remaining significant after pruning in these branches. Earlier candidate gene studies have also reported genes involved in electron transport to be under positive selection in anthropoid primates (reviewed in Grossman et al. 2004). It has been suggested that this could be related to anthropoid specific evolutionary changes, such as extended lifespan, prolonged fetal development, and enlarged neocortex, as these traits require increased aerobic energy production (Grossman et al. 2004).

Consistency of Selection Over Evolutionary Time
Several pathways reported here were also detected in an earlier study, in which we applied a gene set enrichment analysis on a more recent period of human evolutionary history (Daub et al. 2013). The Cytokine-cytokine receptor interaction pathway for example was detected as a potential candidate for recent polygenic human adaptation. Another candidate in that study, Fatty Acid Beta oxidation, scored significant here before pruning in Catarrhini, as did the Malaria pathway in all branches except Homininae. These findings suggest that certain functions have been under selection in multiple or ongoing episodes of positive selection, and might still be under selection now.
The finding of the Olfactory Signaling Pathway as candidate for positive selection is apparently in contradiction with a previous study  in which we found olfactory pathways to be under relaxed selection in the human branch. That earlier finding was in line with the fact that many olfactory genes have become pseudogenes in primates-and particularly in humans-, possibly because we have become more dependent on vision than smell and taste Detection of Pathways Affected by Positive Selection . doi:10.1093/molbev/msx083 MBE (Hughes et al. 2014). Our current results suggest that in more ancient Ape history, some of the olfactory system still played an important adaptive role, and evolved rapidly not just because of relaxed selection but because of adaptation. Two other pathways in the aforementioned study showed signs of positive selection, namely Meiotic Recombination and Betacatenin phosphorylation cascade, but they did not score high in the current study. Again, this could be explained by positive selection acting differently over different time periods in Ape evolution. However, we cannot rule out that some differences in results are caused by the fact that we used different methods to test for selection in each study.
The fact that some pathways are significant over successive evolutionary periods (neighboring branches) appears to indicate that similar selective pressures occurred over a long timeframe in primates. Interestingly, we found that in such cases the highest scoring genes often differed among branches ( fig. 2 and supplementary fig. S6, Supplementary Material online), which suggests that biological pathways under long-term selective pressures have adapted by means of changes in different genes over time, allowing the fine tuning of the pathway function without altering previous adaptations. However, we cannot exclude that we might lack power to detect more continuity in selective pressure in the same gene over long evolutionary periods. For example, deviations of model assumptions-e.g., having one or more positively selected background branches-can decrease the power of the branch-site test (Anisimova and Yang 2007;Kosakovsky Pond et al. 2011) and might explain in some cases the observed differences in gene scores among branches.
We have also identified several candidate pathways that are branch-specific. Whereas these isolated adaptive signals may be due to unique changes in selective pressure, they might result from alternative causes. We therefore investigated if nonadaptive factors might have affected our results. We indeed find small but significant correlations between DlnL4 scores and gene tree size (number of branches; Pearson's r ¼ 0.07, P < 2.2e-16) or sequence length (r ¼ À0.05, P < 2.2e-16). The latter negative correlation contrasts with earlier studies in which the branch-site test had more power to detect selection in longer sequences (Studer et al. 2008; Yang and dos Reis 2011). At the gene set level, we find an even more pronounced negative correlation between the average sequence length and enrichment score (Àlog[P value]; Pearson's r ranging from À0.24 to À0.11 depending on the branch, P < 6e-5). We are cautious though to conclude that shorter genes have been more affected by positive selection, as we cannot rule out other nonadaptive factors that could explain this pattern.
We also found a noticeable correlation between DlnL4 scores and gene specific branch length (estimated number of nucleotide substitutions per codon; r ¼ 0.23, P < 2.2e-16), although the effect is reduced when we consider the synonymous branch length only (dS; r ¼ 0.14, P < 2.2e-16). Finding a correlation between branch length and gene score makes sense since branch lengths are inferred from the number of mutations, and we have more power to detect positive selection with increasing numbers of mutations (Fletcher and Yang 2010), but positive selection can also lead to more mutations being fixed. So, our candidate pathways could be enriched in fast evolving genes (thus genes less affected by purifying selection), partly because positively selected genes evolve fast, and partly because there is more power to detect selection in fast evolving genes. Therefore, our finding that more significant pathways are observed in more ancient lineages might be partly due to an increased power to detect selection in the older branches that are on average longer than younger branches.

Impact of GC Bias and Duplications
Earlier studies in primates have shown that GC-biased gene conversion can be confounded with positive selection as it leads to accelerated evolutionary rates and biased fixation of GC alleles in regions of high recombination (Berglund et al. 2009;Galtier et al. 2009;Ratnakumar et al. 2010). Although a simulation study by Gharib and Robinson-Rechavi (2013) indicated that the branch-site test is robust against GC variation, we have investigated a possible GC bias in our data by calculating correlations between GC content and selection score (DlnL4). We related selection scores with the GC content per branch, but also with the mean GC content per gene tree and with GC content variation (by calculating the GC content variance in a tree and the difference between the maximum and minimum GC content within a tree). We did not find any noticeable correlation between these statistics and selection scores (supplementary table S2A, Supplementary Material online). Although one would not expect pathways to be GC-biased (but see Berna et al. 2012), we also measured correlations at the gene set level. We averaged the GC statistics per gene set and related them with the score (Àlog[P value]) in the gene set enrichment test (supplementary table S2B, Supplementary Material online). Again, we did not find any noticeable correlation with GC content, but we did detect a small correlation between GC variance (per gene tree) and gene set score (r between 0.15 and 0.19, depending on the tested branch, P values all <1e-08). This can be attributed to the fact that fast evolving genes (gene trees with a relatively high number of mutations) will have on average a higher GC variance than slow evolving genes. That could explain why we see a higher correlation in gene sets than genes: in gene sets this effect is amplified, especially because genes in gene sets tend to have similar evolutionary regimes .
It has been hypothesized that duplicated genes can be a source of evolutionary novelties, through processes such as neo-or subfunctionalization or dosage effects (Innan and Kondrashov 2010). Indeed, Lorente-Galdos and colleagues showed recently that fast evolving exons in primates were enriched in duplicated regions (Lorente-Galdos et al. 2013). In another study, Qian and Zhang (2014) reported that simultaneously deleting a duplicate gene pair in budding yeast reduced fitness significantly more than deleting their singleton counterpart in fission yeast, again suggesting adaptation after duplication. We thus investigated the effect of duplications on our results. We find that branches with at least one duplication in their ancestral branches in the primate tree Daub et al. . doi:10.1093/molbev/msx083 MBE tend to have higher selection scores (i.e., 37% of these branches score DlnL > 0, compared with 25% in branches without duplications in ancestral branches), suggesting an increase of selection intensity in some of the duplicated genes. Conversely, we applied the enrichment test on a reduced data set containing only gene trees without duplications. With this new data set we find less gene sets that are significant, which can be explained by a decrease in power as we have less genes and gene sets to test. Nevertheless, many of the significantly scoring gene sets from the original data set still score significant or high with the new test, including the olfactory pathways that are known to contain many duplications (supplementary table S3, Supplementary Material online). These new test results suggest that our results are not particularly confounded by duplications.

Methodological Limitations and Strengths
A potential limitation of our study is that some genes are not included in the Selectome database, as only gene trees with at least six terminal leaves and passing stringent alignment quality filters were included . Therefore, about 21% of genes were ignored in our enrichment test, which might lead to a deficit of trees with fast evolving genes that are difficult to align, as well as a potential excess of gene trees with duplications. The latter category contains several paralogs resulting in more leaves, and thus passes the criterion of six leaves even when a few sequences are missing from genomes or eliminated by alignment quality filters. To estimate the effect of this exclusion on our results, we tested for each gene set whether they had a significant excess of excluded genes (q value < 0.2 with Fisher Exact test). We found 62 gene sets with such an excess (supplementary fig. S7, Supplementary Material online). They are mainly involved in protein metabolism (15 sets) and the cell cycle (37 sets). Interestingly, we also find a few gene sets that contain many excluded genes, but which still score high in our likelihoodbased test, namely the Olfactory signaling, Olfactory transduction and GPCR downstream signaling pathways. Some of the pathways with an excess of excluded genes were candidates for being under selection in studies of recent human evolution, such as Pathogenic Escherichia coli infection (Daub et al. 2013), and we could thus not fully check if these pathways have also been under selection in more ancient primate evolution.
We want to emphasize that our gene set enrichment analysis differs from classical Gene Ontology (GO) enrichment tests, as we include all genes in our analysis instead of only considering top scoring genes after setting an arbitrary significance threshold. Although GO enrichment tests ask whether these top scoring genes are enriched for a biological function or process (see e.g., Cagan et al. 2016), we aim to find pathways with an overall shift in the distribution of gene scores. For comparison, we also performed a GO enrichment test on the significant genes in each branch, but this procedure did not result in any significantly enriched GO terms. This negative result underlines the power of a gene set enrichment test as performed in this study, as we can detect pathways that contain many genes with small to moderate effect mutations.

Conclusions
In conclusion, by combining the specificity of the branch-site test and the power of the gene set approach, we have been able to uncover for the first time strong signals of polygenic positive selection in several biological processes during longterm primate evolution. In addition to immune response, we find evidence for adaptive evolution on sensory perception, as well as on metabolism and energy production. The fact that different genes are involved in pathways showing signals of positive selection in several branches, suggests that the fine tuning of biological functions can change over time during primate evolution. Our results allow us to bridge the gap between studies of selection in deep mammalian evolution, and recent adaptation in the human lineage.

Data Collection
In this study, we aim at detecting biological pathways affected by episodic positive selection in primate evolutionary history, specifically in the four inner branches of the Primates tree that lead to modern humans ( fig. 1).

Branch-Site Likelihood Test
The data used in this study was produced as part of release 6 of Selectome (http://selectome.unil.ch/, Proux et al. 2009;Moretti et al. 2014), which is a database that provides results of the branch-site likelihood test for positive selection (Zhang et al. 2005) on internal branches of several clades. The branchsite test can detect codon sites on specific phylogenetic branches that are affected by episodic positive selection. In short, it estimates the rate of nonsynonymous (dN) and synonymous (dS) nucleotide substitutions to assess differences in selective pressure (dN/dS ratio) among branches and over sites. Usually, tested branches that have a class of sites with a dN/dS ratio x 2 > 1 are candidates for positive selection. Although the strength of positive selection can in principle be estimated by the ratio x 2 or by the proportion p 2 of sites in this class, we have found the likelihood ratio to be a good estimator of the evidence for positive selection (Studer et al. 2008;Roux et al. 2014). In more detail, for each branch, the maximum likelihood of the data is estimated under two models: one that allows for positive selection (H1), and one that only allows negative selection and neutral evolution (H0), and a log-likelihood ratio statistic DlnL ¼ 2(lnLH1 À lnLH0) is computed. To determine their significance, the DlnL values are usually compared with a chi-square (v 2 ) distribution with one degree of freedom; here we use all DlnL values without applying any a priori significance cut-off.
To avoid false positives due to poor sequence alignments, the Selectome pipeline includes many filtering and realignment steps to remove unreliable regions before running the branch-site test , and see http://selectome. unil.ch/cgi-bin/methods.cgi). Supplementary figure S8, Supplementary Material online, shows an example of a codon alignment before and after filtering, where filtering results in a significant change of the DlnL value. Furthermore, only the internal branches of gene families with at least six sequences Detection of Pathways Affected by Positive Selection . doi:10.1093/molbev/msx083 MBE (leaves in the tree) were computed in Selectome as there is more potential for errors on terminal branches, due to sensitivity to sequencing and annotation errors in one species; moreover, the test has low power and accuracy when only a few sequences are used (Anisimova et al. 2002).
For our gene set analysis, we thus used the DlnL values obtained by testing 15,738 gene trees from the Primates clade as defined in version 70 of Ensembl Compara (Vilella et al. 2009). We only kept test scores for the four branches mentioned above (and shown in fig. 1) if they led to at least one human gene. We could thus create for each branch i ( fig. 1), a list G i with human genes and their corresponding DlnL scores. Because of duplication events and the lack of resolution of the Homininae label, these initial lists often contained several rows per gene, whereas we need at most one DlnL score per gene per branch for our enrichment test. On the other hand, missing or excluded sequences can result in branches lacking DlnL scores in a number of genes. We describe below how we dealt with these situations and how we handled computational issues with likelihood estimation.

Dealing with Multiple Homininae Branches
In Ensembl gene trees, both the branch that leads to the common ancestor of human, chimp, and gorilla (Homininae) as well as the branch to the common ancestor of human and chimp (Hominini) are labeled as Homininae. As a result, many gene trees that have human, chimp and gorilla sequences present, have multiple branches annotated as Homininae. For example, the DMXL1 gene family (supplementary fig. S9, Supplementary Material online) has a Homininae branch leading to the ancestor of human, chimp. and gorilla followed by a branch leading to the ancestor of human and chimp which is labeled "Homininae", although it is properly an Hominini branch. In about 42% of the gene trees, we found such multiple Homininae labeled branches. In these cases we took the test scores of the oldest branch, which in 95% of the cases is also the longest branch.

Dealing with Missing Branches
For several genes, the sequence of one or more species is absent or excluded due to low quality . The corresponding branch in the gene tree is then merged with its downstream branch, and the DlnL score is assigned to this lower branch, while it actually represents an "average" score over both branches. We therefore use this DlnL score for both branches as input in the gene set enrichment test. Supplementary fig. S10, Supplementary Material online, shows an example of the C3 gene tree with a missing macaque sequence. Supplementary table S4, Supplementary Material online, shows results of the gene set enrichment test where in the case of a missing sequence, we assign the DlnL score only to the lower branch.

Dealing with Gene Duplications
Gene duplications will result in some species having paralogous genes. For our gene set enrichment analysis, we removed the branches in the Primates tree that led to a duplication event (about 3% of all branches). Our procedure is further detailed in supplementary fig. S11, Supplementary Material online. Briefly, for branches predating the duplication, the values in the gene list corresponding to the duplicated genes, which are redundant (same ancestral branches reported for each paralog), were merged and replaced by one value, while for the branches after the duplication both values were kept, as each represents a different paralog.

Nonconverging Likelihoods
The numerical optimization (BFGS algorithm, see PAML documentation) does not always converge to the maximum likelihood estimates of H0 or H1. This can result in negative DlnL values or in false positives having extreme high DlnLs. In order to reduce the number of events of nonconvergence, we ran the branch-site test two more times on the whole Primates tree, thus yielding a total of three likelihood scores for each of H0 and H1. We then selected for each branch the highest log-likelihoods for H0 and H1 among the three runs (lnLH0 g and lnLH1 g ) and constructed the log-likelihood ratio score for a gene (DlnL g ) as follows: However, three runs could still allow for some nonconvergence. If we obtained a negative DlnL g score (indication of nonconvergence of all lnLH1 scores) we set DlnL g to zero (about 8% of the cases, with less than 0.06% having a DlnL g <À0.1).
The first branch-site test was run with codeml version 4.6 from PAML (Yang 2007). Most jobs of the first run were performed on the Swiss multi-scientific computing grid (SMSCG, http://lsds.hesge.ch/smscg/), whereas the longer jobs were submitted to the Vital-IT computer cluster (http://www.vital-it.ch) and to the Ubelix computer cluster of the university of Berne (http://www.ubelix.unibe.ch/). The second run was run with SlimCodeml (Schabauer et al. 2012), and the third with FastCodeML ) starting with M1-estimated (j and x 0 ) starting parameters, both on the Vital-IT computer cluster.

Ensembl Gene ID to Entrez Gene ID Conversion
We use gene sets from NCBI Biosystems (Geer et al. 2010) (see next section). Since these sets are annotated with Entrez gene IDs, whereas Selectome uses Ensembl gene IDs, we created a one-to-one Ensembl-Entrez conversion table, to map the Ensembl gene IDs in the branch specific gene tables (G i ) to Entrez gene IDs. First, we started with a gene list (G entrez ) containing 20,016 protein coding human genes located either on the autosomal, X or Y chromosomes, downloaded from the NCBI Entrez Gene (Maglott et al. 2011) website (http:// www.ncbi.nlm.nih.gov/gene, accessed on July 16, 2014. We further collected conversion tables (often containing one-tomany or many-to-many mappings) from HGNC (http:// www.genenames.org/biomart/, accessed on July 16, 2014), NCBI (ftp://ftp.ncbi.nih.gov/gene/DATA, accessed on July 16, 2014), and Ensembl (version 70, http://jan2013.archive. Daub et al. . doi:10.1093/molbev/msx083 MBE ensembl.org/biomart/martview). We only kept the rows in these conversion tables that contained genes from G i and from G Entrez . Next, we merged the three tables to one, and only uniquely mapped genes (Ensembl ID-Entrez ID) were used further. For each Ensembl ID, we kept the Ensembl-Entrez mapping with the highest count (in case of multiple candidates we chose randomly one) and we then repeated this procedure for each Entrez ID. With this final list of unique one-to-one Ensembl-Entrez ID mappings, we translated the Ensembl genes in the G i tables to Entrez IDs, and unmapped genes were removed. The resulting 14,574, 15,026, 15,375, and 15,450 genes in the tables G 1, G 2 , G 3 , and G 4 , respectively, were used for further analyses.
For each primate branch of interest (Homininae, Hominidae, Hominoidae, and Catarrhini) ( fig. 1), we excluded genes that could not be mapped to the corresponding gene list G i (see previous section), then removed gene sets with less than ten genes, because the gene set enrichment test has low power to detect selection in small sets. We merged groups of nearly identical gene sets (i.e., sets that share 95% or more of their genes) into single gene sets, i.e., the union of all gene sets in these groups. In the text, the name if these union sets is followed by an asterisk ("*"). To distinguish gene sets with identical names, their source database is added to their name. After the filtering process, we obtained S 1 ¼ 1,415, S 2 ¼ 1,424, S 3 ¼ 1,441, and S 4 ¼ 1,441 gene sets for the four branches to be tested for selection with the gene set enrichment analysis (table 1; numbering according to fig. 1). Note that for each branch we use a different gene list (G i ), leading to a different number of gene sets, as we condition on a minimum of ten genes per set for each branch.

Test for Polygenic Selection
We used a gene set enrichment approach to test for polygenic signals of positive selection on the four primate branches Catarrhini, Hominoidae, Hominidae, and Homininae. We first calculated for each gene set its SUMSTAT score, which is the sum of selection scores of genes in the set of interest (Tintle et al. 2009;Daub et al. 2013). As selection score we took the fourth-root of the DlnL g values (called DlnL4 hereafter) to ensure that the distribution of nonzero DlnLs is approximately Normal (Hawkins and Wixley 1986;Roux et al. 2014). This procedure also prevents extreme scoring genes from getting too much weight in the test, which would otherwise result in significant pathways mostly due to a few outlier genes. The SUMSTAT score of a gene set s is then simply calculated as: The R code to run the gene set enrichment pipeline together with detailed examples and data sets is freely available from https://github.com/CMPG/polysel.

Empirical Null Distribution
The significance of the SUSMTAT score of a pathway was inferred by creating a null distribution of random gene sets of identical size and calculating SUMSTAT on these random sets. The null distribution was built by sampling genes at random from all genes in G i that belonged to at least one gene set. To improve computation time, we created the null distribution with a sequential random sampling method (Ahrens and Dieter 1985), which avoids the burden of high precision P value estimation for low scoring, and thus for us uninteresting, gene sets. For this, we first tested all sets against a small null distribution with 10,000 random sets and estimated their P value. For those sets with a P value < 0.5 we expanded the null distribution with another 10,000 randomizations. This process was continued with decreasing P value thresholds (P < 1 2i at the ith iteration) until we reached a maximum of 1,000,000 randomizations.
We have considered using a parametric distribution, as we know from theory and simulation studies (Zhang et al. 2005) that under the null hypothesis the log-likelihood ratios in the branch-site test should be distributed as a mixture of 50% zeros and 50% a v 2 distribution with one degree of freedom. Assuming such a distribution, we can infer the expected null distribution of a log-likelihood ratio score at the gene set level as well. However, in our real data the proportion of zeros is much larger, around 70-75% for the four tested branches and we observed that a parametric null distribution produces a skewed distribution of P values, leading to an overestimation of P values (see also supplementary text S1, Supplementary Material online) and thus an under-estimation of the signal for positive selection. Therefore, we have favored the use of an empirical distribution. Note that a simulation study by Yang and dos Reis (2011) produced similar high proportions of zeroes, but only for short sequence lengths (<50 codons). As only a few genes (9) in our study have such a short length, we conclude that sequence length cannot explain the high number of zero log-likelihood ratios. It would be worth investigating if other model assumptions do not fully match reality and could therefore be responsible for the deviation from the theoretical distribution of log-likelihood ratios, but this lies beyond the scope of the present study.

Removing Sets with Outlier Genes
Gene sets can potentially have a high SUMSTAT score due solely to one gene with an extremely high DlnL value. However, we are interested here in gene sets affected by polygenic selection, where multiple genes have moderately Detection of Pathways Affected by Positive Selection . doi:10.1093/molbev/msx083 MBE high selection scores. Therefore, we also tested the gene sets after removing their highest scoring gene, and contrasted their SUMSTAT score against random sets that also had their top scoring gene removed. Those gene sets that were not scoring significant anymore after this test were not included in further analyses.
Removing Redundancy in Overlapping Gene Sets ("pruning") The gene set enrichment test can result in partially redundant gene sets being called significant, because they share high scoring genes, and BioSystems includes overlapping or redundant sets. We therefore removed the overlap between gene sets with a "pruning" method similar to one we described in a previous study (Daub et al. 2013). In short, we removed for each branch the genes of the most significant pathway from all the other pathways, and ran the enrichment test on these updated gene sets. We repeated this pruning procedure until no sets were left to be tested.
We estimated the False Discovery Rate (FDR) in our results empirically, since the tests in the pruning procedure are not independent and the results are biased toward low P values (only the high scoring sets will remain after pruning). To estimate the FDR, we repeatedly (N ¼ 300) permuted DlnL4 scores among genes that are part of a set, and tested the gene sets by applying the above described pruning method. For each observed P value P* in our original results, we can estimate the FDR (if we would reject all hypotheses with a P value P*) with: where p 0 is the proportion of true null hypotheses,VðP Ã Þ is the estimated number of rejected true null hypotheses if all hypotheses are true nulls and R(P*) is the total number of rejected hypotheses. We conservatively set p 0 ¼ 1; and esti-matedVðP Ã Þ from the mean proportion of gene sets in the randomized data sets with P value P*. The q value was finally determined by taking the lowest estimated FDR among all observed P values ! P*. We reported the gene sets that scored significant (q value < 0.2) both before and after pruning.

Classical GO Enrichment Test
To contrast our results with a classical Gene Ontology (GO) enrichment test, we defined for each branch a list of top scoring genes, namely those genes with a DlnL value resulting in a q value < 0.2 when compared against a null distribution consisting of 50% zeros and 50% v 2 with one degree of freedom (see paragraph "Empirical null distribution" and Zhang et al. 2005). Note that this is a much less conservative score compared with the branches reported on the Selectome website, as the latter are based on q < 0.10 and were computed over all branches and trees using a more conservative null distribution (v 2 with one degree of freedom). The top genes thus defined were used as input for the online tool Fatigo (http://babelomics.bioinfo.cipf.es, (Al-Shahrour et al. 2004)). We tested for enrichment in (level 3-9) GO biological processes and molecular functions using as background the gene lists G i . The resulting P values were corrected for multiple testing by calculating the q value per branch and GO category (biological process and molecular function), where q values < 0.2 were considered significant.

Test for Bias in Genes Filtered from Selectome
Since all gene trees with less than six leaves after removing unreliably aligned sequences were excluded in Selectome, we tested which categories of gene sets where enriched with these excluded genes, and were thus unlikely to show significant results in our gene set enrichment test. For this, we performed for each gene set a Fisher's exact test on a contingency table with the counts of included and excluded genes in the set contrasted against the same counts for the rest of the genes in G entrez , the list of genes downloaded from the NCBI Entrez website. The resulting P values were corrected for multiple tests, and gene sets with a q value < 0.2 were reported.

Investigating Bias in Branch Length and GC Content
To study whether branch length or GC content might have biased our results, we calculated the correlation between the selection score DlnL4 and each of these factors. The (gene specific) synonymous branch length, dS, was obtained by running the M1 model in codeml with branch length optimization. Due to time constraints, the largest gene tree (ENSGT00550000074383, subtree 1) was excluded from the calculations. GC content was computed at two different levels. At the branch level, GC content was estimated using the tool nhphyml (Boussau and Gouy 2006), with settings: branch length optimization, no tree topology optimization, an infinite number of GC categories, four gamma rate categories (alpha estimated) and nucleotide mode. At the gene tree level, the average, minimum and maximum GC content together with the variance in GC content were calculated. At both levels the unmasked sequences from the terminal branches in each gene tree served as input for the computations.

Correcting for Multiple Testing
For all tests (except when inferring significance after pruning), we calculated the q value (Storey and Tibshirani 2003;Storey et al. 2004) as a measure of the false discovery rate using the R package qvalue (with parameter pi0.method set to "bootstrap") and reported those gene sets with a q value < 0.2.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online. Daub et al. . doi:10.1093/molbev/msx083 MBE