Abstract

Numerous genome-wide association studies (GWASs) have been conducted for the identification of genetic variants involved with human height. The vast majority of these studies, however, have been conducted in populations of European ancestry. Here, we report the first GWAS of adult height in the Taiwan Biobank using a discovery sample of 14 571 individuals and an independent replication sample of 20 506 individuals. From our analysis, we generalize to the Taiwanese population genome-wide significant associations with height and 18 previously identified genes in European and non-Taiwanese East Asian populations. We also identify and replicate, at the genome-wide significance level, associated variants for height in four novel genes at two loci that have not previously been reported: RASA2 on chromosome 3 and NABP2, RNF41 and SLC39A5 at 12q13.3 on chromosome 12. RASA2 and RNF41 are strong candidates for having a role in height with copy number and loss of function variants in RASA2 previously found to be associated with short stature disorders, and decreased expression of the RNF41 gene resulting in insulin resistance in skeletal muscle. The results from our analysis of the Taiwan Biobank underscore the potential for the identification of novel genetic discoveries in underrepresented worldwide populations, even for traits, such as height, that have been extensively investigated in large-scale studies of European ancestry populations.

Introduction

Height is a classical polygenic trait that has been long recognized, since the 18th century from the seminal work of Francis Galton (1), as being highly heritable. The genetic basis and architecture of human height continues to be of great interest to the genetics community, and numerous genome-wide association studies (GWASs) have been conducted over the past 15 years for the identification of genetic variants contributing to height. Advancements in high-throughput genotyping technologies have facilitated GWASs with hundreds of thousands of individuals, resulting in the identification of a large number of genetic loci associated with height in European ancestry populations (2–4). For example, in 2010, Allen et al. (2) reported a GWAS meta-analysis of height in a sample of 183 727 European ancestry individuals where 180 loci across the genome reached genome-wide significance. In 2014, Wood et al. (3) identified 697 genome-wide significant variants clustered in 423 loci that were independently associated with height in a sample of 253 288 individuals of European ancestry. A recent study of ~700 000 individuals from European ancestry populations (4) identified 712 loci, including 512 novel loci, for height at the genome-wide significance level.

Most large-scale GWASs for height have been conducted in populations of European ancestry. There is great potential, however, for genetic studies in diverse populations to provide new insights into the genetic architecture of complex traits (5). Indeed, He et al. (6) conducted a GWAS meta-analysis for height in a sample of 93 926 individuals from seven countries/regions of East Asia populations and identified 17 novel loci for height that were not previously reported, as well as generalized 81 previously identified loci in European ancestry populations to East Asians.

Here, we performed the first GWAS of human height in the Taiwan Biobank with a discovery sample of 14 571 Taiwanese individuals. Using a linear mixed model, we tested for associations between height and 601 768 directly genotyped SNPs and 5 682 219 imputed SNPs across the genome. From our analysis, we identify 410 SNPs in 18 previously reported genes for height from European and/or East Asian GWASs that reached a genome-wide significance level of 5 × 10−8. We also detect novel genetic associations with height in our discovery sample that have not previously been reported: RASA2 on chromosome 3 and NABP2, RNF41 and SLC39A5 on chromosome 12. While a previous study identified a variant near RASA2 with a suggestive association with height (7), our study is the first to identify genome-wide significant variants within RASA2. In addition, the associations with height and variants in these four novel genes were replicated at the genome-wide significance level in an independent sample of 20 506 individuals from the Taiwan Biobank.

Results

Taiwan Biobank study cohort

Table 1 provides demographic characteristics of the Taiwan Biobank study cohort, which included 14 571 subjects in the discovery sample and 20 506 subjects in the replication sample. Approximately, 48% of the study participants are male and 52% are female in both the discovery sample and the replication sample. The mean height was 163.0 (with standard deviation was 8.4 cm) in the discovery sample and 163.3 cm (standard deviation of 8.5 cm) in the replication sample.

Table 1

Demographic characteristics of study subjects

CharacteristicDiscoveryReplicationPa
No. of subjects, n14 57120 506
Mean age ± SD, years49.2 ± 11.149.8 ± 11.1<0.001
Male, %48.1147.820.576
Height, cm163.0 ± 8.4163.3 ± 8.50.002
CharacteristicDiscoveryReplicationPa
No. of subjects, n14 57120 506
Mean age ± SD, years49.2 ± 11.149.8 ± 11.1<0.001
Male, %48.1147.820.576
Height, cm163.0 ± 8.4163.3 ± 8.50.002

SD = standard deviation. Data are presented as mean ± standard deviation.

aChi-square test for the categorical data; Kruskal–Wallis test for continuous variables.

Table 1

Demographic characteristics of study subjects

CharacteristicDiscoveryReplicationPa
No. of subjects, n14 57120 506
Mean age ± SD, years49.2 ± 11.149.8 ± 11.1<0.001
Male, %48.1147.820.576
Height, cm163.0 ± 8.4163.3 ± 8.50.002
CharacteristicDiscoveryReplicationPa
No. of subjects, n14 57120 506
Mean age ± SD, years49.2 ± 11.149.8 ± 11.1<0.001
Male, %48.1147.820.576
Height, cm163.0 ± 8.4163.3 ± 8.50.002

SD = standard deviation. Data are presented as mean ± standard deviation.

aChi-square test for the categorical data; Kruskal–Wallis test for continuous variables.

We estimated relatedness in the study cohort using the PC-Relate method (8). Supplementary Material, Fig. S1 shows the relatedness estimation results where PC-Relate identifies monozygotic twins, first-degree, second-degree and third-degree relatives in the Taiwan Biobank sample. Population structure was inferred using the PC-AiR method (9), which obtains robust inference of population structure in the presence of relatedness. Supplementary Material, Figs. S2S4 show graphical displays of the population structure from PC-AiR. The first two principal components from PC-AiR are shown in Supplementary Material, Fig. S2 and illustrate population structure in the Taiwan Biobank with distinct clustering patterns suggesting three clinal groups with some admixture. The first few principal components (PCs) capture most of the population structure in the data, while the plots for principal components 3–6 do not show much of a clustering pattern (Supplementary Material, Figs. S3 and S4). The proportion of variance explained by principal component 1–6 is 0.1283, 0.0597, 0.0460, 0.0433, 0.0429 and 0.0383, respectively.

GWAS of height in the Taiwanese population

We performed a GWAS of height for the Taiwan Biobank sample using the linear mixed model implementation in the GENESIS software (10). The test statistics were properly calibrated genome-wide, as demonstrated by the quantile-quantile (Q-Q) plot of the association results, given in Supplementary Material, Fig. S5, and a genomic control inflation factor of 1.025. Figure 1 displays the Manhattan plot of the association P-values for the genotyped SNPs, and Supplementary Material, Tables S1 and S2 show the results for 416 SNPs (including 39 genotyped SNPs and 377 imputed SNPs, respectively) that reach genome-wide significance (P < 5 × 10−8) in the discovery sample. Moreover, the heritability estimate for height was 56.3% (95% confidence intervals: 51.8–60.7%) from all genotyped SNPs in the Taiwan Biobank.

The Manhattan plot of genome-wide association of single nucleotide polymorphisms (SNPs) with height for the genotyped SNPs. The Manhattan plot was generated by using the P-values of SNPs, which was constructed via a linear mixed model after adjustment for covariates including age, gender, ancestry representative principle components and a polygenic effect due to genetic relatedness. A single SNP is shown by a point, where a higher point (higher negative log10P-value) suggests a more significant association. The red horizontal line is presented as the genome-wide significance level (P = 5 × 10−8). Hence, points above the red horizontal line represent SNPs with a P-value of less than 5 × 10−8. Y-axis: −log10 (P-value) of each SNP; X-axis: chromosomes demonstrated with blue and orange colors.
Figure 1

The Manhattan plot of genome-wide association of single nucleotide polymorphisms (SNPs) with height for the genotyped SNPs. The Manhattan plot was generated by using the P-values of SNPs, which was constructed via a linear mixed model after adjustment for covariates including age, gender, ancestry representative principle components and a polygenic effect due to genetic relatedness. A single SNP is shown by a point, where a higher point (higher negative log10P-value) suggests a more significant association. The red horizontal line is presented as the genome-wide significance level (P = 5 × 10−8). Hence, points above the red horizontal line represent SNPs with a P-value of less than 5 × 10−8. Y-axis: −log10 (P-value) of each SNP; X-axis: chromosomes demonstrated with blue and orange colors.

Generalization of previous loci for height to the Taiwanese population

From our analysis, we identified 410 genome-wide significant variants in 18 genes that were previously identified in GWASs for height in European and/or East Asian populations: ANKRD52, CEP250, CS, CNPY2, DCAF16, DIS3L2, EFEMP1, FAM169B-IGF1R, FAM184B-DCAF16, GDF5, HHIP, IGF1, IGF1R, LCORL, NCAPG, PAN2, UQCC1 and ZBTB38 (Supplementary Material, Tables S1 and S2).

We also selected 1188 SNPs in/near 693 genes from previous GWASs for height that are reported in the NHGRI-EBI GWAS Catalog to reach genome-wide significance (P < 5 × 10−8) and compared the association results for these SNPs in our discovery sample. Supplementary Material, Table S3 provides the effect sizes and P values from the previous studies and the discovery sample from the Taiwan Biobank. As shown in Supplementary Material, Table S3, 31 of these SNPs in 14 genes reach the genome-wide significance level (P < 5 × 10−8) in the discovery sample. There were also 37 SNPs in 21 genes that showed suggestive significant associations (5 × 10−8 < P < 1 × 10−5) with height in the discovery sample (Supplementary Material, Table S3).

Novel genes for height identified and replicated in the Taiwan Biobank

Intriguing results from our GWAS of height in the Taiwan Biobank are the identification of novel variants and genes for height that have not been previously reported. Directly genotyped SNP rs295321 (P = 3.57 × 10−8) and imputed SNP rs79147413 (P = 2.98 × 10−9) in the intron region of the RASA2 gene are novel for height and reach the genome-wide significance level in the discovery sample (Table 2). In an independent replication sample of 20 506 individuals from the Taiwan Biobank, both rs295321 and rs79147413 are significant at the genome-wide significance level (P = 6.39 × 10−9 and P = 2.23 × 10−9, respectively) (Table 2 and Supplementary Material, Table S4). While an intergenic SNP near RASA2 (RASA2-LINC02618) showed a suggestive association with height in a previous study of European ancestry individuals (7), our study is the first to identify SNPs within RASA2 that reach genome-wide significance, and these associations were replicated in an independent sample at the genome-wide significance level. Figure 2 shows locus zoom plots illustrating the association results and linkage disequilibrium (LD) patterns for the RASA2 gene in the discovery sample. Supplementary Material, Figs. S6S8 show heat map plots to illustrate cross-population differences in LD in the RASA2 gene using the present Taiwanese population as well as East Asian and European ancestry subjects from the 1000 Genomes Project, respectively. Based on the heat map plots, we find that the LD patterns vary among these three populations. We further observed from the heat map distinct LD patterns in some areas between the Taiwanese and East Asian populations (Supplementary Material, Figs S6 and S7).

Table 2

Linear mixed models of associations between height and four genes, which have an evidence of genome-wide significance (P < 5 × 10−8) in the discovery and replication samples

GeneChrSNPA1A2RegionMAF1BETA1SE1P1P2
NABP212rs77394186CTIntron0.108−0.640.104.58 × 10−108.72 × 10−12
rs7311505ATIntron0.150−0.530.092.10 × 10−98.69 × 10−11
RASA23rs295321CTIntron0.2500.450.083.57 × 10−86.39 × 10−9
rs79147413CTIntron0.2810.470.082.98 × 10−92.23 × 10−9
RNF4112rs76280383TCIntron0.125−0.530.104.60 × 10−81.15 × 10−11
SLC39A512rs11171781ATIntron0.122−0.580.101.77 × 10−99.53 × 10−14
GeneChrSNPA1A2RegionMAF1BETA1SE1P1P2
NABP212rs77394186CTIntron0.108−0.640.104.58 × 10−108.72 × 10−12
rs7311505ATIntron0.150−0.530.092.10 × 10−98.69 × 10−11
RASA23rs295321CTIntron0.2500.450.083.57 × 10−86.39 × 10−9
rs79147413CTIntron0.2810.470.082.98 × 10−92.23 × 10−9
RNF4112rs76280383TCIntron0.125−0.530.104.60 × 10−81.15 × 10−11
SLC39A512rs11171781ATIntron0.122−0.580.101.77 × 10−99.53 × 10−14

A1 = minor allele in the present Taiwanese population, A2 = major allele in the present Taiwanese population, Chr = chromosome, BETA = beta coefficients, MAF = minor allele frequency, SE = standard error. Analysis was obtained after adjustment for covariates including age, gender, ancestry representative principle components and a polygenic effect due to genetic relatedness. P1: P-value for the discovery sample; P2: P-value for the replication sample; BETA1: Beta coefficients for the discovery sample; MAF1: minor allele frequency for the discovery sample; SE1: standard error for the discovery sample.

Table 2

Linear mixed models of associations between height and four genes, which have an evidence of genome-wide significance (P < 5 × 10−8) in the discovery and replication samples

GeneChrSNPA1A2RegionMAF1BETA1SE1P1P2
NABP212rs77394186CTIntron0.108−0.640.104.58 × 10−108.72 × 10−12
rs7311505ATIntron0.150−0.530.092.10 × 10−98.69 × 10−11
RASA23rs295321CTIntron0.2500.450.083.57 × 10−86.39 × 10−9
rs79147413CTIntron0.2810.470.082.98 × 10−92.23 × 10−9
RNF4112rs76280383TCIntron0.125−0.530.104.60 × 10−81.15 × 10−11
SLC39A512rs11171781ATIntron0.122−0.580.101.77 × 10−99.53 × 10−14
GeneChrSNPA1A2RegionMAF1BETA1SE1P1P2
NABP212rs77394186CTIntron0.108−0.640.104.58 × 10−108.72 × 10−12
rs7311505ATIntron0.150−0.530.092.10 × 10−98.69 × 10−11
RASA23rs295321CTIntron0.2500.450.083.57 × 10−86.39 × 10−9
rs79147413CTIntron0.2810.470.082.98 × 10−92.23 × 10−9
RNF4112rs76280383TCIntron0.125−0.530.104.60 × 10−81.15 × 10−11
SLC39A512rs11171781ATIntron0.122−0.580.101.77 × 10−99.53 × 10−14

A1 = minor allele in the present Taiwanese population, A2 = major allele in the present Taiwanese population, Chr = chromosome, BETA = beta coefficients, MAF = minor allele frequency, SE = standard error. Analysis was obtained after adjustment for covariates including age, gender, ancestry representative principle components and a polygenic effect due to genetic relatedness. P1: P-value for the discovery sample; P2: P-value for the replication sample; BETA1: Beta coefficients for the discovery sample; MAF1: minor allele frequency for the discovery sample; SE1: standard error for the discovery sample.

Locus zoom plot for the newly identified RASA2 locus for height using the discovery sample in the Taiwan Biobank. SNPs are shown by their position on the chromosome against their association (−log10P) with height in the discovery sample in the Taiwan Biobank. SNPs are colored to reflect their linkage disequilibrium with the top SNP (rs79147413). Estimated recombination rates are plotted in cyan using Asian subjects from the 1000 Genomes Project. This plot was generated using LocusZoom.
Figure 2

Locus zoom plot for the newly identified RASA2 locus for height using the discovery sample in the Taiwan Biobank. SNPs are shown by their position on the chromosome against their association (−log10P) with height in the discovery sample in the Taiwan Biobank. SNPs are colored to reflect their linkage disequilibrium with the top SNP (rs79147413). Estimated recombination rates are plotted in cyan using Asian subjects from the 1000 Genomes Project. This plot was generated using LocusZoom.

As shown in Supplementary Material, Table S2, there are four additional novel SNPs that reach genome-wide significance for height: rs76280383 in RNF41 (P = 4.60 × 10−8), rs77394186 in NABP2 (P = 4.58 × 10−10), rs7311505 in NABP2 (P = 2.10 × 10−9) and rs11171781 in SLC39A5 (P = 1.77 × 10−9). All of these novel associations identified in the discovery samples were replicated at the genome-wide significance level in an independent sample of 20 506 individuals from the Taiwan Biobank, as shown in Table 2 and Supplementary Material, Table S4: rs76280383 in RNF41 (P = 1.15 × 10−11), rs77394186 in NABP2 (P = 8.72 × 10−12), rs7311505 in NABP2 (P = 8.69 × 10−11) and rs11171781 in SLC39A5 (P = 9.53 × 10−14). Interestingly, RNF41, NABP2 and SLC39A5 are all located at 12q13.3 and the four genome-wide significant SNPs are in moderate or high LD with each other in the Taiwanese population (Supplementary Material, Table S4). Figure 3 shows locus zoom plots illustrating the association results in the 12q13.3 region for the discovery sample. As shown in Fig. 3, these four genome-wide significant SNPs on chromosome 12 fall within a 35 kb window. We performed a conditional analysis that showed that these associations on chromosome 12 are not independent and are likely reflecting the same signal in this region (Supplementary Material, Table S6). All SNPs satisfied our strict quality control criteria for inclusion in the analysis (see Methods; Supplementary Material, Table S7).

Locus zoom plot for the newly identified RNF41, NABP2 and SLC39A5 loci for height using the discovery sample in the Taiwan Biobank. SNPs are shown by their position on the chromosome against their association (−log10P) with height in the discovery sample in the Taiwan Biobank. SNPs are colored to reflect their linkage disequilibrium with the top SNP (rs77394186). Estimated recombination rates are plotted in cyan using Asian subjects from the 1000 Genomes Project. This plot was generated using LocusZoom.
Figure 3

Locus zoom plot for the newly identified RNF41, NABP2 and SLC39A5 loci for height using the discovery sample in the Taiwan Biobank. SNPs are shown by their position on the chromosome against their association (−log10P) with height in the discovery sample in the Taiwan Biobank. SNPs are colored to reflect their linkage disequilibrium with the top SNP (rs77394186). Estimated recombination rates are plotted in cyan using Asian subjects from the 1000 Genomes Project. This plot was generated using LocusZoom.

As shown in Supplementary Material, Table S2, there are three additional novel SNPs near RASA2 (ZBTB38-RASA2) that reach genome-wide significance for height: rs1863867 (P = 7.55 × 10−10), rs76152047 (P = 1.64 × 10−8) and rs2312193 (P = 2.39 × 10−8). Interestingly, a whole-genome sequencing study in European ancestry populations reported SNP rs295301 near RASA2 (RASA2-LINC02618) to be associated with height at a suggestive level (P = 8.97 × 10−7) (7). In our Taiwan Biobank discovery sample, rs295301 also showed a suggestive association with height (P = 5.48 × 10−5) and was in moderate LD with our genome-wide significant SNPs, rs295321 and rs79147413, in RASA2 (Supplementary Material, Table S8).

To our knowledge, this is the first study to identify and replicate variants in the RASA2, RNF41, NABP2 and SLC39A5 genes for height at the genome-wide significance level. At the time of submission of this manuscript, these SNPs were novel based on all variants reported for height in the NHGRI-EBI GWAS Catalog (11).

Discussion

Recent GWASs with hundreds of thousands of individuals have successfully identified hundreds of loci across the genome contributing to human height. These uncovered variants, however, do not completely explain height heritability, thus indicating there are additional uncovered variants contributing to height variability that remain elusive. Most GWASs for height, have been conducted in European ancestry populations. Genetic studies in diverse populations have the potential to identify novel variants contributing to phenotypic variability across populations (5). In this paper, we report results from the first GWAS of human height for the Taiwan Biobank, using a discovery sample of 14 571 individuals and a replication sample of 20 506 individuals.

We find that many previously identified loci for height identified in European and East Asian population GWASs generalize to the Taiwanese population. In particular, there were 410 SNPs in 18 previously reported genes for height that reached genome-wide significance in our Taiwan Biobank discovery sample. We also identified four novel genes for height with variants that reach genome-wide significance in our discovery sample: RASA2, RNF41, NABP2 and SLC39A5. To our knowledge, this is the first GWAS reporting genome-wide significant associations between variants in these four genes and height (as verified by the NHGRI-EBI GWAS Catalog (11) at the time of the submission of this paper). All of the novel associations identified in our discovery sample were replicated at the genome-wide significance level in an independent sample from the Taiwan Biobank.

The novel RASA2 gene we identified for height is located on chromosome 3 at 3q23 and has been previously found to be associated with body mass index in Japanese (12), European (13,14) and multiethnic (15,16) GWASs, as well as in genome-wide meta-analysis studies (17,18). RASA2 encodes a member of the GAP1 family of GTPase-activating proteins, which has been implicated in regulating cellular proliferation, survival and differentiation (19). RASA2 is a strong candidate for height as this gene has been previously implicated in disorders with short stature. For example, Chen et al. (20) found that loss-of-function variants in RASA2 are likely to cause Noonan syndrome, a disorder that is characterized by unusual facies, short stature and other abnormalities. Hu et al. (21) identified copy number variations in RASA2 to be associated with idiopathic short stature in Chinese children. Tachmazidou et al. (7) reported that the SNP rs295301 near RASA2 (RASA2-LINC02618) had a suggestive significant association with height (P = 8.97 × 10−7) in a previous whole-genome sequencing study in European ancestry populations. We also found this SNP to be suggestively significant for height in the Taiwan Biobank. This result provides additional support for variants in or near RASA2 being associated with height across diverse populations.

The three novel genes for height identified on chromosome 12, RNF41, SLC39A5 and NABP2, are located at 12q13.3. A conditional analysis indicated that there likely is a single signal in this region on chromosome 12, and additional studies will be needed for fine mapping this region. The RNF41 gene encodes a protein that is responsible for mediating cytokine receptor degradation, insulin response and insulin sensitivity of skeletal muscle (22). The SLC39A5 gene encodes a zinc transporter protein, which has been speculated to regulate intracellular zinc levels where zinc is an essential nutrient for biological processes such as growth and development in humans (23). The NABP2 gene encodes a single-stranded DNA-binding protein, which has been indicated as an important factor in DNA metabolic processes such as the repair of damaged DNA (24).

Our GWAS in Taiwanese populations identified four additional novel genes for height from what was reported in He et al. (6), where 17 novel loci for height were detected via a GWAS meta-analysis of 93 926 individuals from heterogeneous East Asian populations (such as subjects in China, Korea, Japan, Singapore and the Philippines). The genetic architecture of complex traits may vary across East Asian populations, as there are distinct population structure patterns. By combining samples from different countries/regions of East Asia in the analysis, this study likely identified variants that are shared across these East Asian populations. In our discovery sample, we replicated 10 previously identified genes by He et al. (6) at the genome-wide significance level, including CEP250, DCAF16, DIS3L2, EFEMP1, GDF5, LCORL, NCAPG, PAN2, UQCC1 and ZBTB38 (Supplementary Material, Table S3). It is important to note that the East Asian meta-analysis GWAS for height did not include individuals from Taiwan. Also, the variants tested for association were restricted to genotyped or imputed variants that were present in HapMap Phase 2, and the six novel genome-wide significant variants that we identified were not included in their study. All of these factors help explain why the novel associations for height and RASA2, RNF41, NABP2 and SLC39A5 were not been previously identified in the much larger GWAS meta-analysis of East Asians.

From our analysis, we identified interesting population structure patterns in the Taiwan Biobank that is likely due to contributions from three ancestral subpopulations (Supplementary Material, Fig. S2). In accordance with our analysis, a recent study of the population structure in the Taiwan Biobank by Chen et al. (25) observed a third cline group (14.5%) in the Taiwanese population, in addition to northern Han Chinese ancestry (5%) and southern Han Chinese ancestry (79.9%). Interestingly, they also identified the distinct genetic pattern of the third cline group, such as the haplotype structure in the major histocompatibility complex region of chromosome 6, implicating that individuals in the third group may have been involved in unique evolutionary development that is independent from other Han Chinese (25). In addition, the Taiwanese sample in the Taiwan Biobank was found to be distinct from the Japanese sample using hierarchical cluster tree analysis (25). Therefore, the Taiwanese individuals in the Taiwan Biobank are distinct from other East Asian individuals based on a population genetics point of view.

In conclusion, our study underscores the importance of genetic studies in diverse populations. There is a great potential for the identification of novel genetic discoveries in underrepresented worldwide populations, even for common traits, such as height, that have been broadly and extensively investigated in large-scale GWAS studies of European ancestry. The Taiwan Biobank is, to date, the largest Taiwanese cohort for genetic studies. As the Taiwan Biobank continues to grow in size, we expect future studies using this useful resource to yield additional novel genetic discoveries for complex traits in the Taiwanese population.

Materials and Methods

Taiwan Biobank

The study cohort consisted of 14 571 Taiwanese subjects in the discovery sample and 20 506 Taiwanese subjects in the replication sample from the Taiwan Biobank (25–31). This biobank recruited subjects from the general Taiwanese population across Taiwan and collected specimens and relevant data via recruitment centers during 2013–2015 (25–31). The Taiwan Biobank is primarily supported by the Taiwanese government and targets to facilitate scientific research on a wide array of public health issues and to provide researchers with collaboration opportunities concerning local common chronic diseases (26). Inclusion criteria were participants aged 30–70 years, self-reported as being of Taiwanese Han Chinese ancestry, and could perform activities of daily living at the time of recruitment (25). Exclusion criteria included individuals with a history of cancer or who were nonresidents of Taiwan (25). Ethical approval for the study was granted by the Institutional Review Board of the Taiwan Biobank before conducting the study (approval number: 201506095RINC). Each subject signed the approved informed consent form. All experiments were performed in accordance with relevant guidelines and regulations.

Genotyping and imputation

DNA was isolated from blood samples by using a QIAamp DNA blood kit and by following the manufacturer’s instructions (Qiagen, Valencia, CA, USA). We evaluated the quality of the isolated genomic DNA by using agarose gel electrophoresis and determined the quantity by using spectrophotometry (32,33). SNP genotyping was completed on the Axiom Genome-Wide Array Plate System (Affymetrix, Santa Clara, CA, USA) by employing the custom Taiwan Biobank chips. In order to precisely gain maximal genetic data from Taiwanese Han Chinese samples, the custom Taiwan Biobank chip was established by utilizing SNPs in exons with minor allele frequencies (MAFs) > 10% on the Human Exome BeadChip chip (Illumina, Inc., San Diego, CA, USA), by employing SNPs on the Axiom Genome-Wide CHB 1 Array chip (Affymetrix, Inc., Santa Clara, CA, USA) with MAFs ≥5%, and by taking advantage of SNPs formerly identified in cancer, pharmacogenomics and genetic ancestry studies (25,34). In order to gauge the performance of the custom Taiwan Biobank chip, 70 unrelated Taiwanese subjects were genotyped by utilizing both the Axiom Genome-Wide CHB 1 Array chip and the custom Taiwan Biobank chip. In these 70 individuals, a high average concordance rate of 99.6% was achieved for the SNPs (25).

In addition, imputation was carried out using Michigan Imputation Server (35), with East Asian samples from the Phase 3 (Version 5) of the 1000 Genomes Project. In this study, we implemented quality control procedures to exclude problematic SNPs from further analysis (28,36) that were not in Hardy–Weinberg equilibrium (HWE) (with a P-value less than 1 × 10−6) or had a genotyping call rate less than 90% or MAF < 5%. After the quality control procedure, a total of 601 768 directly genotyped SNPs and 5 682 219 imputed SNPs were included for analyses.

Statistical analysis

In GWASs, spurious associations may occur if both relatedness and population stratification have not been accordingly considered and adjusted (37). Thus, it is essential to identify sources of dependence such as both pedigree and population structure among the sample individuals using statistical estimation methods. For inference on population structure in the Taiwan Biobank, we used the PC-AiR method (8), which provides robust principal components for ancestry in the presence of known or cryptic relatedness. For inference and estimation of relatedness among individuals in the Taiwan Biobank, we used the PC-relate method (7), which utilizes principal components generated by the PC-AiR method for the estimation of recent kinship in samples with population stratification.

In this GWAS for height, genetic associations with each variant were assessed using a linear mixed model (LMM) that included age, gender and two ancestry representative PCs as fixed effect covariates and a random effect with a genetic relatedness matrix (GRM) to account for background polygenic effects acting on height (7,8,38). The GENESIS software (9) was used to conduct the LMM analysis, and we utilized PCs from the PC-AiR method as covariates for ancestry in the LMM. The PC-relate method was used to estimate genetic relatedness and to construct the GRM for the random effect in the LMM. The genome-wide significance threshold of P < 5 × 10−8 that is widely used for GWAS was implemented for this study. In addition, a conditional analysis was conducted using the LMM with covariates including rs77394186 in NABP2, age, gender, two ancestry representative PCs and a polygenic effect due to genetic relatedness.

For quality control, we excluded SNPs with a genotyping call rate less than 90%. We also assessed HWE of SNPs based on genotype frequencies by utilizing a chi-squared goodness-of-fit test with one degree of freedom, and removed SNPs with a HWE P-value less than 1 × 10−6. We also used the top two PCs from PC-AiR to identify any outlier individuals in the Taiwan Biobank sample. No individuals were identified to be outliers based on inferred genetic ancestry from the PCs (25).

The Manhattan and Q-Q plots were created using the R package ‘qqman.’ We also estimated the LD patterns of SNPs with PLINK (36). In addition, the LMM employed in the aforementioned GENESIS software (9) was used to estimate height heritability that is explained by the entire set of genotyped SNPs with adjustment for the covariates age, gender and two ancestry representative PCs as fixed effects. Furthermore, we generated locus zoom plots in candidate genes using the LocusZoom tool (39). The Kruskal–Wallis test was conducted to measure the difference in the means of two continuous variables. The chi-square test was performed for categorical data. The heat map plots were created using the ‘LDheatmap’ package written in R (40).

The NHGRI-EBI GWAS Catalog (11) (https://www.ebi.ac.uk/gwas/) was used to identify SNPS from previous GWASs of height and directly compare the associations in the discovery sample from the Taiwan Biobank. We obtained a list of SNPs from the NHGRI-EBI GWAS Catalog for the ‘body height’ phenotype and an association P-value < 5 × 10−8. We then matched the SNP rsIDs from this list to our Taiwan Biobank GWAS, resulting in a total of 1188 SNPs that were included in an analysis to assess the transferability of association results from previous GWASs of height to the Taiwanese population.

Conflict of Interest statement. The authors declare no competing interests.

Funding

Ministry of Science and Technology of Taiwan (MOST 109-2634-F-075-001); Taipei Veterans General Hospital (V108D44-001-MY3-1).

References

1.

Galton
,
F.
(
1886
)
Regression towards mediocrity in hereditary stature
.
J. Anthropol. Inst. Great Brit. Ireland
,
15
,
246
263
.

2.

Allen
,
H.L.
,
Estrada
,
K.
,
Lettre
,
G.
,
Berndt
,
S.I.
,
Weedon
,
M.N.
,
Rivadeneira
,
F.
,
Willer
,
C.J.
,
Jackson
,
A.U.
,
Vedantam
,
S.
and
Raychaudhuri
,
S.
(
2010
)
Hundreds of variants clustered in genomic loci and biological pathways affect human height
.
Nature
,
467
,
832
838
.

3.

Wood
,
A.R.
,
Esko
,
T.
,
Yang
,
J.
,
Vedantam
,
S.
,
Pers
,
T.H.
,
Gustafsson
,
S.
,
Chu
,
A.Y.
,
Estrada
,
K.
,
Luan
,
J.A.
and
Kutalik
,
Z.
(
2014
)
Defining the role of common variation in the genomic and biological architecture of adult human height
.
Nat. Genet.
,
46
,
1173
1186
.

4.

Yengo
,
L.
,
Sidorenko
,
J.
,
Kemper
,
K.E.
,
Zheng
,
Z.
,
Wood
,
A.R.
,
Weedon
,
M.N.
,
Frayling
,
T.M.
,
Hirschhorn
,
J.
,
Yang
,
J.
and
Visscher
,
P.M.
(
2018
)
Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry
.
Hum. Mol. Genet.
,
27
,
3641
3649
.

5.

Wojcik
,
G.L.
,
Graff
,
M.
,
Nishimura
,
K.K.
,
Tao
,
R.
,
Haessler
,
J.
,
Gignoux
,
C.R.
,
Highland
,
H.M.
,
Patel
,
Y.M.
,
Sorokin
,
E.P.
and
Avery
,
C.L.
(
2019
)
Genetic analyses of diverse populations improves discovery for complex traits
.
Nature
,
570
,
514
518
.

6.

He
,
M.
,
Xu
,
M.
,
Zhang
,
B.
,
Liang
,
J.
,
Chen
,
P.
,
Lee
,
J.-Y.
,
Johnson
,
T.A.
,
Li
,
H.
,
Yang
,
X.
and
Dai
,
J.
(
2014
)
Meta-analysis of genome-wide association studies of adult height in East Asians identifies 17 novel loci
.
Hum. Mol. Genet.
,
24
,
1791
1800
.

7.

Tachmazidou
,
I.
,
Süveges
,
D.
,
Min
,
J.L.
,
Ritchie
,
G.R.
,
Steinberg
,
J.
,
Walter
,
K.
,
Iotchkova
,
V.
,
Schwartzentruber
,
J.
,
Huang
,
J.
and
Memari
,
Y.
(
2017
)
Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits
.
Am. J. Hum. Genet.
,
100
,
865
884
.

8.

Conomos
,
M.P.
,
Reiner
,
A.P.
,
Weir
,
B.S.
and
Thornton
,
T.A.
(
2016
)
Model-free estimation of recent genetic relatedness
.
Am. J. Hum. Genet.
,
98
,
127
148
.

9.

Conomos
,
M.P.
,
Miller
,
M.B.
and
Thornton
,
T.A.
(
2015
)
Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness
.
Genet. Epidemiol.
,
39
,
276
293
.

10.

Gogarten
,
S.M.
,
Sofer
,
T.
,
Chen
,
H.
,
Yu
,
C
,
Brody
,
J.A.
,
Thornton
,
T.
,
Rice
,
K.M.
and
Conomos
,
M.P.
(
2019
) Genetic association testing using the GENESIS R/Bioconductor package.
Bioinformatics.
,
35
,
5346
5348
.

11.

Buniello
,
A.
,
MacArthur
,
J.A.L.
,
Cerezo
,
M.
,
Harris
,
L.W.
,
Hayhurst
,
J.
,
Malangone
,
C.
,
McMahon
,
A.
,
Morales
,
J.
,
Mountjoy
,
E.
and
Sollis
,
E.
(
2018
)
The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019
.
Nucleic Acids Res.
,
47
,
D1005
D1012
.

12.

Akiyama
,
M.
,
Okada
,
Y.
,
Kanai
,
M.
,
Takahashi
,
A.
,
Momozawa
,
Y.
,
Ikeda
,
M.
,
Iwata
,
N.
,
Ikegawa
,
S.
,
Hirata
,
M.
and
Matsuda
,
K.
(
2017
)
Genome-wide association study identifies 112 new loci for body mass index in the Japanese population
.
Nat. Genet.
,
49
,
1458
1467
.

13.

Kichaev
,
G.
,
Bhatia
,
G.
,
Loh
,
P.-R.
,
Gazal
,
S.
,
Burch
,
K.
,
Freund
,
M.K.
,
Schoech
,
A.
,
Pasaniuc
,
B.
and
Price
,
A.L.
(
2019
)
Leveraging polygenic functional enrichment to improve GWAS power
.
Am. J. Hum. Genet.
,
104
,
65
75
.

14.

Winkler
,
T.W.
,
Justice
,
A.E.
,
Graff
,
M.
,
Barata
,
L.
,
Feitosa
,
M.F.
,
Chu
,
S.
,
Czajkowski
,
J.
,
Esko
,
T.
,
Fall
,
T.
and
Kilpeläinen
,
T.O.
(
2015
)
The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study
.
PLoS Genet.
,
11
,
e1005378
.

15.

Hoffmann
,
T.J.
,
Choquet
,
H.
,
Yin
,
J.
,
Banda
,
Y.
,
Kvale
,
M.N.
,
Glymour
,
M.
,
Schaefer
,
C.
,
Risch
,
N.
and
Jorgenson
,
E.
(
2018
)
A large multiethnic genome-wide association study of adult body mass index identifies novel loci
.
Genetics
,
210
,
499
515
.

16.

Locke
,
A.E.
,
Kahali
,
B.
,
Berndt
,
S.I.
,
Justice
,
A.E.
,
Pers
,
T.H.
,
Day
,
F.R.
,
Powell
,
C.
,
Vedantam
,
S.
,
Buchkovich
,
M.L.
and
Yang
,
J.
(
2015
)
Genetic studies of body mass index yield new insights for obesity biology
.
Nature
,
518
,
197
206
.

17.

Graff
,
M.
,
Scott
,
R.A.
,
Justice
,
A.E.
,
Young
,
K.L.
,
Feitosa
,
M.F.
,
Barata
,
L.
,
Winkler
,
T.W.
,
Chu
,
A.Y.
,
Mahajan
,
A.
and
Hadley
,
D.
(
2017
)
Genome-wide physical activity interactions in adiposity—a meta-analysis of 200,452 adults
.
PLoS Genet.
,
13
,
e1006528
.

18.

Justice
,
A.E.
,
Winkler
,
T.W.
,
Feitosa
,
M.F.
,
Graff
,
M.
,
Fisher
,
V.A.
,
Young
,
K.
,
Barata
,
L.
,
Deng
,
X.
,
Czajkowski
,
J.
and
Hadley
,
D.
(
2017
)
Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits
.
Nat. Commun.
,
8
,
14977
.

19.

Aoki
,
Y.
,
Niihori
,
T.
,
Banjo
,
T.
,
Okamoto
,
N.
,
Mizuno
,
S.
,
Kurosawa
,
K.
,
Ogata
,
T.
,
Takada
,
F.
,
Yano
,
M.
and
Ando
,
T.
(
2013
)
Gain-of-function mutations in RIT1 cause Noonan syndrome, a RAS/MAPK pathway syndrome
.
Am. J. Hum. Genet.
,
93
,
173
180
.

20.

Chen
,
P.-C.
,
Yin
,
J.
,
Yu
,
H.-W.
,
Yuan
,
T.
,
Fernandez
,
M.
,
Yung
,
C.K.
,
Trinh
,
Q.M.
,
Peltekova
,
V.D.
,
Reid
,
J.G.
and
Tworog-Dube
,
E.
(
2014
)
Next-generation sequencing identifies rare variants associated with Noonan syndrome
.
Proc. Natl. Acad. Sci. USA
,
111
,
11473
11478
.

21.

Hu
,
G.
,
Fan
,
Y.
,
Wang
,
L.
,
Yao
,
R.-E.
,
Huang
,
X.
,
Shen
,
Y.
,
Yu
,
Y.
and
Gu
,
X.
(
2016
)
Copy number variations in 119 Chinese children with idiopathic short stature identified by the custom genome-wide microarray
.
Mol. Cytogenet.
,
9
,
16
.

22.

Breuker
,
C.
,
Amouzou
,
C.
,
Fabre
,
O.
,
Lambert
,
K.
,
Seyer
,
P.
,
Bourret
,
A.
,
Salehzada
,
T.
,
Mercier
,
J.
,
Sultan
,
A.
and
Bisbal
,
C.
(
2018
)
Decreased RNF41 expression leads to insulin resistance in skeletal muscle of obese women
.
Metabolism
,
83
,
81
91
.

23.

Hennigar
,
S.R.
,
Kelley
,
A.M.
and
McClung
,
J.P.
(
2016
)
Metallothionein and zinc transporter expression in circulating human blood cells as biomarkers of zinc status: a systematic review
.
Adv. Nutr.
,
7
,
735
746
.

24.

Paquet
,
N.
,
Adams
,
M.N.
,
Ashton
,
N.W.
,
Touma
,
C.
,
Gamsjaeger
,
R.
,
Cubeddu
,
L.
,
Leong
,
V.
,
Beard
,
S.
,
Bolderson
,
E.
and
Botting
,
C.H.
(
2016
)
hSSB1 (NABP2/OBFC2B) is regulated by oxidative stress
.
Sci. Rep.
,
6
,
27446
.

25.

Chen
,
C.H.
,
Yang
,
J.H.
,
Chiang
,
C.W.K.
,
Hsiung
,
C.N.
,
Wu
,
P.E.
,
Chang
,
L.C.
,
Chu
,
H.W.
,
Chang
,
J.
,
Song
,
I.W.
,
Yang
,
S.L.
et al. (
2016
)
Population structure of Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan Biobank project
.
Hum. Mol. Genet.
,
25
,
5321
5331
.

26.

Fan
,
C.T.
,
Lin
,
J.C.
and
Lee
,
C.H.
(
2008
)
Taiwan Biobank: a project aiming to aid Taiwan's transition into a biomedical island
.
Pharmacogenomics
,
9
,
235
246
.

27.

Hou
,
S.-J.
,
Tsai
,
S.-J.
,
Kuo
,
P.-H.
,
Liu
,
Y.-L.
,
Yang
,
A.C.
,
Lin
,
E.
and
Lan
,
T.-H.
(
2020
)
An association study in the Taiwan Biobank reveals RORA as a novel locus for sleep duration in the Taiwanese population
.
Sleep Med.
,
73
,
70
75
.

28.

Lin
,
E.
,
Kuo
,
P.H.
,
Liu
,
Y.L.
,
Yang
,
A.C.
,
Kao
,
C.F.
and
Tsai
,
S.J.
(
2016
)
Association and interaction of APOA5, BUD13, CETP, LIPA and health-related behavior with metabolic syndrome in a Taiwanese population
.
Sci. Rep.
,
6
,
36830
.

29.

Lin
,
E.
,
Kuo
,
P.H.
,
Liu
,
Y.L.
,
Yang
,
A.C.
,
Kao
,
C.F.
and
Tsai
,
S.J.
(
2017
)
Effects of circadian clock genes and health-related behavior on metabolic syndrome in a Taiwanese population: evidence from association and interaction analysis
.
PLoS One
,
12
,
e0173861
.

30.

Lin
,
E.
,
Tsai
,
S.J.
,
Kuo
,
P.H.
,
Liu
,
Y.L.
,
Yang
,
A.C.
,
Kao
,
C.F.
and
Yang
,
C.H.
(
2017
)
The ADAMTS9 gene is associated with cognitive aging in the elderly in a Taiwanese population
.
PLoS One
,
12
,
e0172440
.

31.

Lin
,
E.
,
Tsai
,
S.J.
,
Kuo
,
P.H.
,
Liu
,
Y.L.
,
Yang
,
A.C.
,
Kao
,
C.F.
and
Yang
,
C.H.
(
2017
)
The rs1277306 variant of the REST gene confers susceptibility to cognitive aging in an elderly Taiwanese population
.
Dement. Geriatr. Cogn. Disord.
,
43
,
119
127
.

32.

Lin
,
E.
,
Kuo
,
P.-H.
,
Liu
,
Y.-L.
,
Yang
,
A.
and
Tsai
,
S.-J.
(
2019
)
Association and interaction effects of interleukin-12 related genes and physical activity on cognitive aging in old adults in the Taiwanese population
.
Front. Neurol.
,
10
,
1065
.

33.

Lin
,
E.
,
Kuo
,
P.-H.
,
Liu
,
Y.-L.
,
Yang
,
A.C.
and
Tsai
,
S.-J.
(
2019
)
Polymorphisms of the DNA repair gene EXO1 modulate cognitive aging in old adults in a Taiwanese population
.
DNA Repair
,
78
,
1
6
.

34.

Lin
,
E.
,
Kuo
,
P.-H.
,
Liu
,
Y.-L.
,
Yang
,
A.C.
and
Tsai
,
S.-J.
(
2017
)
Transforming growth factor-β signaling pathway-associated genes SMAD2 and TGFBR2 are implicated in metabolic syndrome in a Taiwanese population
.
Sci. Rep.
,
7
,
1
8
.

35.

Das
,
S.
,
Forer
,
L.
,
Schönherr
,
S.
,
Sidore
,
C.
,
Locke
,
A.E.
,
Kwong
,
A.
,
Vrieze
,
S.I.
,
Chew
,
E.Y.
,
Levy
,
S.
and
McGue
,
M.
(
2016
)
Next-generation genotype imputation service and methods
.
Nat. Genet.
,
48
,
1284
1287
.

36.

Purcell
,
S.
,
Neale
,
B.
,
Todd-Brown
,
K.
,
Thomas
,
L.
,
Ferreira
,
M.A.
,
Bender
,
D.
,
Maller
,
J.
,
Sklar
,
P.
,
de
Bakker
,
P.I.
,
Daly
,
M.J.
et al. (
2007
)
PLINK: a tool set for whole-genome association and population-based linkage analyses
.
Am. J. Hum. Genet.
,
81
,
559
575
.

37.

Thornton
,
T.
,
Tang
,
H.
,
Hoffmann
,
T.J.
,
Ochs-Balcom
,
H.M.
,
Caan
,
B.J.
and
Risch
,
N.
(
2012
)
Estimating kinship in admixed populations
.
Am. J. Hum. Genet.
,
91
,
122
138
.

38.

Breslow
,
N.E.
and
Clayton
,
D.G.
(
1993
)
Approximate inference in generalized linear mixed models
.
J. Am. Stat. Assoc.
,
88
,
9
25
.

39.

Pruim
,
R.J.
,
Welch
,
R.P.
,
Sanna
,
S.
,
Teslovich
,
T.M.
,
Chines
,
P.S.
,
Gliedt
,
T.P.
,
Boehnke
,
M.
,
Abecasis
,
G.R.
and
Willer
,
C.J.
(
2010
)
LocusZoom: regional visualization of genome-wide association scan results
.
Bioinformatics
,
26
,
2336
2337
.

40.

Shin
,
J.-H.
,
Blay
,
S.
,
McNeney
,
B.
and
Graham
,
J.
(
2006
)
LDheatmap: an R function for graphical display of pairwise linkage disequilibria between single nucleotide polymorphisms
.
J. Stat. Softw.
,
16
,
1
10
.

Author notes

Eugene Lin and Shih-Jen Tsai authors contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data