Modern humans have occupied almost all possible environments globally since exiting Africa about 100,000 years ago. Both behavioral and biological adaptations have contributed to their success in surviving the rigors of climatic extremes, including cold, strong ultraviolet radiation, and high altitude. Among these environmental stresses, high-altitude hypoxia is the only condition in which traditional technology is incapable of mediating its effects. Inhabiting at >3,000-m high plateau, the Tibetan population provides a widely studied example of high-altitude adaptation. Yet, the genetic mechanisms underpinning long-term survival in this environmental extreme remain unknown. We performed an analysis of genome-wide sequence variations in Tibetans. In combination with the reported data, we identified strong signals of selective sweep in two hypoxia-related genes, EPAS1 and EGLN1. For these two genes, Tibetans show unusually high divergence from the non-Tibetan lowlanders (Han Chinese and Japanese) and possess high frequencies of many linked sequence variations as reflected by the Tibetan-specific haplotypes. Further analysis in seven Tibetan populations (1,334 individuals) indicates the prevalence of selective sweep across the Himalayan region. The observed indicators of natural selection on EPAS1 and EGLN1 suggest that during the long-term occupation of high-altitude areas, the functional sequence variations for acquiring biological adaptation to high-altitude hypoxia have been enriched in Tibetan populations.
The indigenous people living at the Himalayas (mostly at 3,500–4,500 m) are a well known example of successful adaptation to high altitude (Wu 2001), as revealed by their long occupation (>25,000 years) and establishment of a major civilization in the Himalayan region (Aldenderfer 2003; Yan et al. 2007; Shi et al. 2008; Zhao et al. 2009). A major stressor of high-altitude living is reduced oxygen level or hypoxia. At 4,000-m elevation, each breath contains only about 60% of the oxygen inhaled at sea level (Beall 2007). Among the positive effects of high-altitude adaptation is improved aerobic performance compared with lowlanders, as reflected in elevated resting ventilation, low hypoxic pulmonary vasoconstrictor response, better levels of blood oxygen saturation, and low hemoglobin (Hb) level compared with acclimatized lowlanders (Wu and Kayser 2006; Beall 2007). All these adaptive features very likely have a genetic basis (Moore 2001; Wu and Kayser 2006; Beall 2007; Grocott and Montgomery 2008).
It has been shown that Tibetan women with a high likelihood of possessing high oxygen saturation genotypes have more surviving children. The mean offspring mortality for the high oxygen saturation genotype was significantly lower, at 0.48 deaths compared with 2.53 for the low oxygen saturation homozygote, suggesting that the selective pressure by high-altitude hypoxia is even stronger than the classic case of an environment with endemic falciparum malaria (Beall et al. 2004). It has been proposed that genetic changes in genes regulating oxygen delivery under hypoxic conditions contribute to the functional outcome of successful high-altitude adaptation (Beall 2007).
Previous studies have suggested a few candidate genes for high-altitude adaptation in Tibetans. It was reported that a Tibetan population (Sherpa) possesses higher frequencies of two alleles of endothelial nitric oxide synthase (eNOS) than lowlanders, which might explain the improved blood flow and oxygen diffusion in the lung (Wilkins et al. 2002). Another reported candidate gene is hypoxia-inducible factor-1α (HIF-1α), and a dinucleotide repeat of HIF-1α was found in 20 Tibetans (Sherpas) but not in 30 Japanese lowlander controls (Suzuki et al. 2003). However, the reported genetic associations with high-altitude adaptation were rather tentative, and the responsible gene(s) are yet to be identified.
Genome-wide scan of sequence variations and population-based analysis provide powerful tools in detecting molecular signatures of natural selection in recent human population history (Nielsen et al. 2007). Recently, several genome-wide studies of Tibetan highlanders have been reported, and a series of candidate genes have been suggested responsible for the genetic adaptation of high-altitude hypoxia in Tibetans (Beall et al. 2010; Simonson et al. 2010; Yi et al. 2010). However, the major candidate genes suggested in these studies were rather different, likely due to the different analytical methods used and the limited sample sizes. Simonson et al. (2010) reported that the positively selected haplotypes of EGLN1 and PPARA were significantly associated with the decreased Hb phenotype (as compared with acclimatized lowlanders) that is unique to Tibetans. But in the other two studies (Beall et al. 2010; Yi et al. 2010), EPAS1 (endothelial PAS domain protein 1; also known as hypoxia-inducible factor-2α [HIF-2α]) was considered the candidate gene that has undergone positive selection for high-altitude adaptation. By applying the genomic approach and sampling multiple Tibetan populations of different geographic origins, we conducted an independent analysis and we demonstrate that EPAS1 and EGLN1 are the two candidate genes likely carrying the adaptive mutations in Tibetan populations.
Materials and Methods
A total of 1,334 native Tibetan individuals were sampled, representing seven different populations from three provincial areas of the Tibetan Plateau in China (Tibet, Qinghai, and Yunnan). The sample size, geographic location, and elevation of each population are indicated in figure 1. Blood samples were collected from each individual with informed consents. The protocol of this study was reviewed and approved by the Internal Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences.
Genome-Wide Single Nucleotide Polymorphism Analysis, Sequencing and Haplotype Analysis
Our approach employed the Affymetrix Genome-wide Human SNP Array 6.0, which includes 906,600 single nucleotide polymorphisms (SNPs), with an average inter-SNP distance of 1.5 kb. The HapMap data of northern Han Chinese in Beijing, China (CHB), was included as the reference population. We found that there are 843,131 overlapped SNPs between Tibetan and CHB. We initially scanned 50 Tibetan individuals randomly selected from the 1,334 samples.
For sequencing, we designed a series of primers to cover the entire 94-kb genomic region of EPAS1. The same 50 Tibetan individuals screened by the Affymetrix DNA chips were sequenced. For haplotype analysis, to avoid the influence of recombination, we selected the 9-kb fragment of intron 2 in EPAS1 containing 28 SNPs and constructed the haplotype network of Tibetan, CHB, CEU (Utah residents with Northern and Western European ancestry from the CEPH collection), and YRI (Yoruban in Ibadan, Nigeria), following the method by Bandelt et al. (1999).
Detection of Selective Sweep, Estimation of FST and Neutrality Test
We undertook a XPCLR test to scan for highly differentiated genomic regions as targets of selective sweeps (Chen et al. 2010). The XPCLR method simultaneously models multi-locus allele frequency differentiation between two populations as a function of selection intensity and the genetic distances between the causal allele and neutral markers surrounding the target region. CHB was chosen as the reference population, and Tibetan samples were taken to be the object population. We set uniform grid points with spacing of 2 kb along the chromosomes as putative causal mutant positions. The sliding window sizes were 0.2 CM, and weights with a cutoff of 0.99 were assigned to each SNP according to their pair-wise correlation coefficients with other SNPs in the reference population within the same window. Pair-wise correlation coefficients for SNPs were estimated using the expectation-maximization algorithm.
We used the moments estimate of FST described by Weir and Cockerham (1984) to measure between-population genetic divergence. We calculated FST for each SNP between Tibetan and CHB. The empirical P value from genome-wide SNPs was reported. In essence, low P values indicate that the locus under investigation is an outlier compared with the rest of the genome. The empirically derived P value is robust to uncertainties surrounding population history. Tests of neutrality followed the method developed by Fay and Wu (2000) using Africans (YRI) as outgroup.
Sequencing Analysis of EPAS1 and Genotyping of the Candidate Gene tag SNPs
We tested three tag SNPs for each of the three candidate genes (EPAS1: rs13419896, rs4953354, and rs1868092; EGLN1: rs2275279, rs2790859, and rs961154; and PPARA: rs6520015, rs7292407, and rs9627403). The tag SNPs for EGLN1 and PPARA were from the published data (Simonson et al. 2010), and the tag SNPs for EPAS1 were selected based on our own data. Genotyping was conducted by the SNaPshots method on an ABI 3130 sequencer (Applied Biosystems, Forster City, CA), and we designed a series of primers for covering the relevant genomic regions. The result of every typing assay was confirmed by resequencing of 20 randomly selected individuals.
We performed sequencing of the entire 94-kb genomic region of EPAS1 in the 50 Tibetan individuals initially screened by the Affymetrix DNA chips. The sequence data were used to estimate the magnitude and time of selection (Voight et al. 2006; Tishkoff et al. 2007) (supplementary methods, Supplementary Material online).
Genome-Wide Analysis of Selective Sweep in Tibetans
Using genome-wide genotyping of >900,000 SNPs (Affymetrix Genome-wide Human SNP Array 6.0), we initially tested 50 randomly selected individuals (out of the 1,334 native Tibetan individuals from seven different populations, fig. 1). In order to detect genomic regions that have undergone a selective sweep, we compared the Tibetan sample with other major populations available in the HapMap public database: CHB, JPT (Japanese in Tokyo, Japan), CEU, and YRI.
As expected a priori, the Tibetan sample was on average closely related to CHB/JPT but highly divergent from CEU and YRI (supplementary fig. S1, Supplementary Material online). Thus, any specific genomic region showing unusually deep divergence between Tibetan and CHB/JPT would suggest that the genetic loci might have been under selection for high-altitude adaptation. By applying the XPCLR test (Chen et al. 2010) to scan for highly differentiated genomic regions as targets of selective sweeps, we identified a total of 131 regions with positive signals (P < 0.01) containing 179 genes (supplementary fig. S2 and supplementary table S1, Supplementary Material online). Further analysis indicated that there is a significant overrepresentation of hypoxic genes undergone selective sweeps, and a few of them have been under very strong natural selection in Tibetans (FST > 0, Kolmogorov–Smirnov test: P = 0.0048, FST > 0.0252 [average FST between CHB and Tibetan], Kolmogorov–Smirnov Test: P < 0.0001) (fig. 2).
In order to identify the major candidate genes for high-altitude adaptation, we combined our data with those reported recently (Beall et al. 2010; Simonson et al. 2010; Yi et al. 2010). Among the top list of candidate genes, EPAS1 and EGLN1 are the two genes showing a relatively strong signature of selection which were supported by data from different studies (table 1). Although it is likely that other genes besides of EPAS1 and EGLN1 may also contribute to high-altitude adaptation, the combined genome-wide data from multiple studies implied that these two genes are likely the key players, which is supported by the known direct involvement of these two genes in hypoxia (Patel and Simon 2008). And the signature of positive selection on EPAS1 and EGLN1 were further supported by the results of different analytical methods in detecting selection (Beall et al. 2010; Yi et al. 2010). Hence, the following analyses are focused on these two candidate genes.
|Gene||Study 1||Study 2||Study 3||Present|
|HBB and HBG2||√||—|
|Gene||Study 1||Study 2||Study 3||Present|
|HBB and HBG2||√||—|
NOTE.—The candidate genes were selected based on the re-analysis of the published genome-wide data as well as data from this study (study 1 from Simonson et al. 2010; study 2 from Yi et al. 2010; and study 3 from Beall et al. 2010). The candidate genes of present study were selected from the results of the XPCLR test (empirical P value < 0.001). The “em dash” refers to no data available.
Complete Sequence of EPAS1 Gene Region in Tibetans
As EPAS1 is the top-one candidate gene in the genome-wide analyses (table 1), we sequenced the entire 94-kb EPAS1 gene region in the 50 Tibetan individuals so as to uncover all of the sequence variations within our sample. A total of 273 SNPs and in-dels (insertions and deletions) were identified, among which 182 SNPs having been reported (125 SNPs in the HapMap database, http://www.hapmap.org) and 91 are novel (88 SNPs and 3 in-dels). Importantly, more than half of the 125 HapMap SNPs (55.2%, 69/125) show significant divergence between Tibetan and CHB/JPT samples (supplementary table S2, Supplementary Material online). Our linkage disequilibrium (LD) analysis indicated that the downstream EPAS1 SNPs are tightly linked with each other, consistent with the proposed natural selection (supplementary fig. S3, Supplementary Material online). Haplotype analysis based on the highly linked 9-kb region of EPAS1 intron 2 (containing 28 SNPs) demonstrated that there is a dominant haplotype in Tibetan (72%), which is rare in CHB (2.2%) and absent in the other world populations. Notably, this Tibetan-specific haplotype is highly diverged from the other haplotypes, again suggesting genetic hitchhiking (fig. 3). The results of neutrality tests also indicated a suggestive signal of Darwinian positive selection in the Tibetan sample (H = −35.86, P < 0.01; Fay and Wu test) (Fay and Wu 2000).
We believe that high-altitude adaptation offers the most plausible explanation for the observed selection on EPAS1 is within the context of overall genome-wide similarity of the Tibetan and CHB/JPT populations. Similar results have been reported for EGLN1 by Simonson et al. (2010) in which the selected core haplotype is prevalent in Tibetans but relatively rare in Han Chinese.
Based on the sequence data of EPAS1, we accessed the intensity and age of the proposed selection. Because we did not know which SNP is the selected mutant in the EPAS1 gene region, we picked six SNPs with the highest FST between CHB and Tibetan (supplementary fig. S3, Supplementary Material online). These SNPs are in similar allele frequency and gave similar level of selection intensity and allele age, which is due to the fact that they are in high LD and on the same core haplotype. The estimated selection intensity around 0.0145 indicates the adaptation to high altitude is among the top signals of strong selection (Tishkoff et al. 2001; Bersaglieri et al. 2004; Tishkoff et al. 2007). Also, the estimated age of the selected alleles is relatively ancient (18,250 years on average) (supplementary table S3, Supplementary Material online).
Population Analysis of EPAS1 and EGLN1 tag SNPs
To quantify the prevalence of selection on EPAS1 and EGLN1 across the Himalayas, we further genotyped the tag SNPs (rs13419896, rs4953354, and rs1868092 for EPAS1; rs2275279, rs2790859, and rs961154 for EGNL1; Simonson et al. 2010) in seven Tibetan populations (1,334 individuals) from three different provincial areas of China (fig. 1 and table 2). For comparison, we also genotyped the three tag SNPs of PPARA, which was reported as one of the top candidate genes for high-altitude adaptation in Tibetans (Simonson et al. 2010) but not supported by our combined genome-wide data analysis (table 1).
NOTE.—BLT, Bailang; CEU, European; CHB, northern Han Chinese; DWT, Dawu; DXT, Dangxiong; JPT, Japanese; LWT, Leiwuqi; MKT, Mangkang; NA, data not available; QST, Qusong; SGL, Shangri-la; YRI-African.
For EPAS1, we observed the same pattern as seen in our initial genome-wide analysis, namely, all Tibetan populations possess unusually high frequencies of the potentially selected SNPs (i.e., SNPs showing deep divergence between Tibetan and CHB/JPT; supplementary table S2, Supplementary Material online). The signal of the sample from Shangri-la (SGL) (3,200 m) is relatively weak (table 2) because Tibetans living there are known having recent admixture with local Han Chinese and other ethnic groups.
The population data of the EGLN1 tag SNPs also confirmed the signature of selective sweep in Tibetans. However, among the three tag SNPs genotyped, only rs2275279 showed high differentiation between Tibetan and non-Tibetan populations, whereas the other two SNPs did not show significant frequency differences (table 2). Therefore, the population data suggest that the selection on EGLN1 is relatively weak as compared with EPAS1, which is also reflected in the results of genome-wide analysis (table 1). Detailed sequence data from EGLN1 is needed to further confirm this pattern.
For PPARA, none of the three tag SNPs showed significant allele frequency divergence between Tibetan and non-Tibetan populations (table 2), implying that this candidate gene might be a false positive, although a significant association between the “selected” PPARA haplotype and Hb levels was reported (Simonson et al. 2010).
Compared with people living at the sea level, there are several key adaptive features in Tibetans for coping with the hypoxic conditions at high altitude. The first feature is the decreased Hb levels in Tibetans when compared with acclimatized lowlanders, offsetting complications related with the sustained high Hb levels (e.g., hyperviscosity) seen in non-Tibetan lowlanders exposed to high-altitude conditions (Wu and Kayser 2006; Simonson et al. 2010). The second feature is the unusually small degree of hypoxic pulmonary vasoconstriction (HPV) compared with lowlanders. HPV can cause hypoxic pulmonary hypertension, a deleterious effect associated with the development of high-altitude pulmonary edema (Wu and Kayser 2006). The other key feature is the increased rate of flow of oxygen-carrying blood to tissues and oxygen diffusion from the bloodstream into cells, which was explained resulting from the increased levels of the vasodilator nitric oxide (NO) that can enlarge the diameter of blood vessels (Beall 2007). All the above key features in Tibetans are considered having a genetic basis (Beall 2007). The question is whether EPAS1 and EGLN1, the two major candidate genes, can explain the key adaptive features in Tibetans.
There are more than 200 genes involved in the hypoxic pathways (Simonson et al. 2010) in which both EPAS1 and EGLN1 are involved. EPAS1 is the key regulator during chronic hypoxia, and it directly regulates genes like erythropoietin (EPO), vascular endothelial growth factor (VEGF), and eNOS in coping with hypoxic conditions (Hu et al. 2003; Patel and Simon 2008). EPAS1 is preferentially expressed in lung and placenta, the two key organs involved in oxygen exchange (Patel and Simon 2008). EGLN1 is a negative regulator of EPAS1, and it targets the two HIFα proteins (EPAS1 and HIF-1α) for degradation under normoxic conditions. Under hypoxic conditions, the degradation effect of EGLN1 is weakened so that the increase of EPAS1 protein could initiate expression of the downstream genes, that is, EPO, VEGF, and eNOS etc. (Hu et al. 2003; Patel and Simon 2008)
The reported data of association analysis indicated that the putatively advantageous haplotype of EGLN1 show significant negative correlations with Hb levels in Tibetans (Simonson et al. 2010). The association with Hb levels was also reported for EPAS1 (Yi et al. 2010). Hence, the presumably advantageous sequence variations in EPAS1 and EGLN1 could explain the decreased Hb levels in Tibetans. But how these two genes are involved in Hb level regulation is yet to be dissected.
It has been shown that Tibetans exhibit an elevated NO level in the blood, contributing to the small degree of HPV and higher blood flow. eNOS is the key enzyme producing NO, and it contains hypoxia-response element (HRE) proven to be specifically recognized by EPAS1 (Coulet et al. 2003). It has been shown that eNOS is a hypoxia-inducible gene, whose transcription is stimulated through EPAS1 interaction with its two contiguous HRE sites (Coulet et al. 2003). Our analysis indicated that all the potentially selected SNPs of EPAS1 in Tibetans are located within the noncoding regions, implying that the molecular adaptation of EPAS1 likely occurs at the transcriptional level, that is, one or more of the SNPs may alter the expression and/or splicing pattern of EPAS1. This speculation is supported by the published exome sequencing data, in which the EPAS1 SNP with the greatest Tibetan–Han frequency difference was intronic (with a derived allele at 9% frequency in the Han and 87% in the Tibetan sample), whereas no amino acid–changing variant had a population frequency difference of greater than 6% (Yi et al. 2010). Therefore, it is likely that the putatively advantageous sequence variations of EPAS1 may enhance the expression of eNOS, and eventually lead to the increase of NO level in the bloodstream in Tibetans. Functional assays will help identifying the causal SNP(s) within EPAS1 and EGLN1 for revealing the molecular mechanism involved.
Our sequence analysis of EPAS1 suggested that the selection intensity is strong (0.0145 on average), and the selection have been acting on Tibetans for a long time, which is consistent with the proposed paleolithic peopling at the Himalayas (Aldenderfer 2003; Yan et al. 2007; Shi et al. 2008; Zhao et al. 2009), although there are still debates (Brantingham et al. 2010). It should be noted that our estimation of selection intensity and allele age only provides a rough picture of the gene. A detailed analysis will be done in further work with more data resource and computationally intensive approaches.
Besides of EPAS1 and EGLN1, there might be other genes (including genes currently not listed in the hypoxic pathways) involved for acquiring molecular adaptation to high altitude in Tibetans as revealed by the data we presented (supplementary table S1, Supplementary Material online), although the roles of the other genes might be relatively minor as compared with EPAS1 and EGLN1.
We are grateful to all the voluntary donors of DNA samples in this study. We thank Yan-jiao Li for their technical help. We also thank Dr Darren Curnoe for his critical reading of the manuscript. This study was supported by grants from the National 973 project of China (2007CB947701, 2007CB815705), the Chinese Academy of Sciences (KSCX1-YW-R-34, Westlight Doctoral Program), the National Natural Science Foundation of China (30700445, 30630013 and 30771181), the State Key Laboratory of Genetic Resources and Evolution (GREKF09-03 and GREKF10-02), and the Natural Science Foundation of Yunnan Province of China (2007C100M, 2009CD107, and 2010CI44).