Abstract

Previously identified common variants explain only a small fraction of the heritability of blood lipid levels, and at most loci the underlying causal genes and their functional variants remain unknown. To identify low-frequency and rare coding variants that influence lipid levels, we conducted a meta-analysis of exome-wide association studies in 14,473 Chinese subjects, followed by a joint analysis with 1000 Genomes imputed data from 6,534 samples. We replicated 24 previously reported lipid loci at exome-wide significance (P < 3.3 × 10−7), including fourteen coding variants at ten confirmed lipid loci (P values ranging from 1.44 × 10−7 to 1.64 × 10−45). Of these, six coding variants showed population-specific associations and were independent of previously identified associations in European populations, including four low-frequency (PCSK9 p.Arg93Cys, HMGCR p.Tyr311Ser, APOA5 p.Gly185Cys and CETP p.Asp399Gly) and two common (APOB p.Arg532Trp and APOA4 p.Ser147Asn) variants. Furthermore, we detected three new lead non-coding variants at LPA, LIPC and LDLR in Chinese populations. The independent variants at PCSK9, HMGCR, LPA, APOA5 and LDLR were also associated with risk of coronary artery disease in the expected direction. In gene-based tests, the burden of rare or low-frequency variants in PCSK9, HMGCR and CETP exhibited strong associations with blood lipid levels (P < 2.8 × 10−6). Our findings identify additional population-specific variants that may be causal. Our data demonstrate that inter-ethnic differences in the allele frequencies of coding variants may lead to different association signals across ethnic groups, highlighting the importance of including diverse populations to uncover genetic variation associated with lipid levels.

Introduction

Plasma concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) are among the most important risk factors for cardiovascular diseases and are targets for therapeutic intervention (1,2). Genome-wide association studies (GWAS) have successfully identified multiple genetic loci associated with blood lipids (3–7). However, GWAS provide limited coverage of low-frequency and rare variants, which may partly explain why the common variants identified by GWAS to date explain only a small fraction of the heritability of lipid levels. More importantly, coding variants may result in loss of function and are often population-specific, and can therefore help to pinpoint potentially causal variants.

Several studies using the exome chip have been conducted in Europeans, and rare genetic variants have been found to show strong associations with lipid levels (8,9). However, few such genetic studies have been conducted in Asian populations. Asians appear to have distinct lifestyle factors, including diet, body mass index and exercise, which could play a role in the risk of hyperlipidaemia. In the present study, we performed a meta-analysis of exome-wide association studies (EWAS) of blood lipid levels involving a total of 14,473 subjects of Chinese Han ancestry. Our specific aim was not only to investigate whether the known lipid loci harboured coding variants, but also to identify novel or population-specific variants for blood lipid levels. We subsequently investigated whether the identified variants contribute to coronary artery disease (CAD) in 6,179 cases and 9,552 controls.

Results

Overview of single-variant associations

A detailed description of the clinical characteristics and phenotype measurements for the 14,473 subjects genotyped using the whole-exome array is given in Supplementary Material, Table S1 and Supplementary Material, Methods. A total of 151,289 polymorphic variants were analysed after quality control (Supplementary Material, Table S2). Approximately 70.79% (n = 107,099) of the polymorphic variants are annotated as missense or loss-of-function (splice, stop-gain and stop-loss) variants (Supplementary Material, Table S3). The meta-analysis evaluated associations separately for TC, LDL-C, HDL-C and TG. Quantile–quantile plots for lipid levels are presented in Supplementary Material, Fig. S1. The genomic control inflation factor (λ) for the meta-analysis was low (λ = 1.01–1.02), indicating that population stratification effects were negligible in our study samples.
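The genomic control inflation factor reported above is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null. The following is a minimal sketch of that conventional definition, not the authors' pipeline; the function name `genomic_inflation` and the synthetic z-scores are assumptions made for illustration.

```python
from statistics import NormalDist, median

def genomic_inflation(z_scores):
    """Genomic control lambda: median observed chi-square (1 df) over its null median."""
    chi2 = [z * z for z in z_scores]
    # The null median of a 1-df chi-square is the squared 75th percentile of N(0, 1).
    expected_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median

# Hypothetical input: z-scores placed exactly at null quantiles,
# so the resulting lambda should be very close to 1.
n = 9999
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
lam = genomic_inflation(null_z)
print(round(lam, 3))
```

Values of λ near 1, such as the 1.01–1.02 reported here, indicate that the bulk of the test statistics follow the null distribution.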

As shown in the Manhattan plots of the −log10 P values for lipid levels in Supplementary Material, Fig. S2, the meta-analysis confirmed twenty previously reported loci at chip-wide significance (defined as P < 3.3 × 10−7). The most significant variants in each locus are shown in Supplementary Material, Table S4. No novel loci for lipid levels were identified in our samples.
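The chip-wide threshold is numerically consistent with a Bonferroni correction of α = 0.05 over the 151,289 polymorphic variants analysed. The text does not state how the threshold was derived, so the sketch below rests on that assumption; the function name is hypothetical.

```python
# Assumption: the threshold is a Bonferroni correction (alpha / number of tests).
# The variant count comes from the text; alpha = 0.05 is assumed.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold when n_tests tests share a family-wise alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

n_variants = 151_289  # polymorphic variants after quality control
threshold = bonferroni_threshold(0.05, n_variants)
print(f"{threshold:.2e}")
```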

Associations of coding variants at previously reported loci

Of these 20 significant loci, nine harboured 13 coding variants associated with one or more lipid traits at exome-wide significance (Table 1 and Supplementary Material, Fig. S3). The coding variants in GCKR, MLXIPL, LPL and APOE have been reported to be significantly associated with lipid levels in previous GWAS and EWAS (3,9). At the GCKR-C2orf16-GPN1 locus, the index variant rs1260326 (p.Leu446Pro) previously identified by GWAS was also the most significant signal in our data (P = 4.43 × 10−21). The other two significant associations of coding variants at this locus, rs1919128 and rs3749147, can be explained by the lead variant rs1260326 (Pconditional = 0.58 and 0.13, respectively). In MLXIPL, rs35332062 encoding p.Ala358Val (P = 1.27 × 10−9) and rs3812316 encoding p.Gln241His (P = 5.89 × 10−9) are in strong linkage disequilibrium (LD) with the lead GWAS index variant rs17145738 (P = 8.99 × 10−10) (pairwise r2 from 0.84 to 1). In addition, the associations of two coding variants identified by exome-wide association studies in Europeans (9), rs328 creating the stop codon p.Ser474* at LPL and rs7412 encoding p.Arg176Cys at APOE, were confirmed in our samples.

Table 1.

Association results for coding variants with blood lipids in Chinese population at exome-wide significance (P < 3.3 × 10−7)

| Gene | Variant | CHR | Position | A1 | A2 | Amino-acid change | Trait | N | MAF | BETA | SE | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Population-specific coding variants | | | | | | | | | | | | |
| PCSK9 | rs151193009 | 1 | 55509585 | T | C | p.Arg93Cys | LDL-C | 14354 | 0.014 | −0.542 | 0.050 | 3.43 × 10−27 |
| | | | | | | | TC | 14333 | 0.014 | −0.483 | 0.050 | 5.43 × 10−22 |
| APOB | rs13306194 | 2 | 21252534 | A | G | p.Arg532Trp | TC | 14333 | 0.115 | −0.111 | 0.019 | 2.72 × 10−9 |
| HMGCR | rs191835914 | 5 | 74646765 | C | A | p.Tyr311Ser | LDL-C | 14354 | 0.020 | −0.251 | 0.042 | 1.95 × 10−9 |
| APOA5 | rs2075291 | 11 | 116661392 | A | C | p.Gly185Cys | TG | 14459 | 0.055 | 0.365 | 0.026 | 1.64 × 10−45 |
| | | | | | | | HDL-C | 14456 | 0.055 | −0.260 | 0.026 | 6.59 × 10−24 |
| APOA4 | rs5104 | 11 | 116692334 | C | T | p.Ser147Asn | TG | 14459 | 0.329 | 0.137 | 0.013 | 4.81 × 10−27 |
| CETP | rs2303790 | 16 | 57017292 | G | A | p.Asp399Gly | HDL-C | 14456 | 0.022 | 0.347 | 0.040 | 5.95 × 10−18 |
| Previously reported coding variants in genome- or exome-wide association studies | | | | | | | | | | | | |
| GCKR | rs1260326 | 2 | 27730940 | C | T | p.Leu446Pro | TG | 14459 | 0.478 | −0.112 | 0.012 | 4.43 × 10−21 |
| C2orf16 | rs1919128 | 2 | 27801759 | G | A | p.Ile774Val | TG | 14459 | 0.488 | 0.081 | 0.012 | 6.72 × 10−12 |
| GPN1 | rs3749147 | 2 | 27851918 | A | G | p.Arg12Lys | TG | 14459 | 0.285 | 0.069 | 0.013 | 1.44 × 10−7 |
| MLXIPL | rs35332062 | 7 | 73012042 | A | G | p.Ala358Val | TG | 14459 | 0.111 | −0.115 | 0.019 | 1.27 × 10−9 |
| MLXIPL | rs3812316 | 7 | 73020337 | G | C | p.Gln241His | TG | 13287 | 0.109 | −0.116 | 0.020 | 5.89 × 10−9 |
| LPL | rs328 | 8 | 19819724 | G | C | p.Ser474* | HDL-C | 14456 | 0.093 | 0.197 | 0.021 | 9.78 × 10−22 |
| | | | | | | | TG | 14459 | 0.093 | −0.148 | 0.021 | 6.05 × 10−13 |
| APOE | rs7412 | 19 | 45412079 | T | C | p.Arg176Cys | LDL-C | 4787 | 0.084 | −0.368 | 0.037 | 9.66 × 10−24 |
| | | | | | | | TC | 4787 | 0.084 | −0.280 | 0.036 | 1.62 × 10−14 |

CHR, chromosome; A1, minor allele; A2, major allele; MAF, minor allele frequency.

Genomic positions are on human genome build hg19.
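The BETA, SE and P columns of Table 1 can be cross-checked against one another if the P values are assumed to come from a two-sided Wald-type test with z = BETA/SE. That assumption is not stated in the text, and the tabulated BETA and SE are rounded, so the sketch below (function name hypothetical) is only expected to agree with the table to within an order of magnitude.

```python
import math

def wald_p_value(beta: float, se: float) -> float:
    """Two-sided P value for z = beta / se under a standard normal null."""
    z = beta / se
    # 2 * (1 - Phi(|z|)) written with erfc, which stays accurate for very small tails.
    return math.erfc(abs(z) / math.sqrt(2.0))

# PCSK9 p.Arg93Cys with LDL-C, using the rounded BETA and SE from Table 1.
p_pcsk9 = wald_p_value(-0.542, 0.050)
print(f"{p_pcsk9:.1e}")
```

For this row the table reports P = 3.43 × 10−27, and the value recomputed from the rounded inputs falls in the same range.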

We also identified six missense variants, at PCSK9, APOB, HMGCR, APOA5, APOA4 and CETP, that showed large inter-ethnic differences in allele frequency. Except for rs5104 at APOA4, none of these missense variants was polymorphic in the three non-Asian 1000 Genomes populations (EUR, AFR and AMR) (Supplementary Material, Table S5), and the four missense variants at PCSK9, HMGCR, APOA5 and CETP had low minor allele frequencies (MAF from 1.4% to 5.5%) in our samples. The PCSK9 p.Arg93Cys, APOB p.Arg532Trp, APOA5 p.Gly185Cys and CETP p.Asp399Gly variants were predicted by both PolyPhen and SIFT to be likely to damage protein structure or function (Supplementary Material, Table S6).

Population-specific missense variants

We next explored the relationship between the coding variants identified in Chinese populations and the variants previously reported at the corresponding loci in genome- and exome-wide association studies of European ancestry, using conditional analysis to identify independent coding variants. All six missense variants (at PCSK9, APOB, HMGCR, APOA5, APOA4 and CETP) identified in our samples were independent association signals at their respective loci (Table 2).
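Conditional analysis asks whether a variant's effect survives adjustment for another variant at the same locus. The sketch below illustrates the idea with ordinary least squares on simulated genotypes; it is not the software used in the study, and all names, sample sizes, allele frequencies and effect sizes are hypothetical.

```python
import random

def simple_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    slope, intercept = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def conditional_beta(test_geno, cond_geno, pheno):
    """Effect of test_geno on pheno adjusted for cond_geno.

    Residualising both on cond_geno yields the same coefficient as the
    two-predictor regression (Frisch-Waugh-Lovell theorem).
    """
    pheno_res = residuals(cond_geno, pheno)
    test_res = residuals(cond_geno, test_geno)
    return simple_fit(test_res, pheno_res)[0]

# Simulated, hypothetical data: two unlinked variants, only the first is causal.
rng = random.Random(1)
n = 20000

def draw_genotypes(maf):
    """Minor-allele counts (0, 1 or 2) under Hardy-Weinberg proportions."""
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]

coding_variant = draw_genotypes(0.2)
gwas_index = draw_genotypes(0.3)
phenotype = [-0.5 * g + rng.gauss(0.0, 1.0) for g in coding_variant]

marginal = simple_fit(coding_variant, phenotype)[0]
conditional = conditional_beta(coding_variant, gwas_index, phenotype)
print(round(marginal, 3), round(conditional, 3))
```

When the two variants are not in LD, as simulated here, the adjusted and unadjusted effects are nearly identical, which is the pattern interpreted as an independent signal.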

Table 2.

Association results for independent coding variants in Chinese at the previously reported loci

Columns Trait to P conditional describe the independent coding variants in Chinese; the remaining columns give the associations of the variants reported by genome- and exome-wide association studies.

| Trait | Gene | Variant | P initial | P conditional | Reported variant | Chr:Position | Allele | Amino-acid change | MAF | BETA (SE) | P | r2 | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LDL-C | PCSK9 | rs151193009 | 3.43 × 10−27 | 3.53 × 10−27 | rs2479409 | 1:55504650 | A/G | noncoding | 31.73% | 0.005 (0.013) | 0.70 | 0.01 | 3 |
| | | | | 3.28 × 10−27 | rs11591147 | 1:55505647 | T/G | p.Arg46Leu | 0.003% | −0.812 (1.000) | 0.42 | 0 | 8 |
| | | | | | rs67608943 | 1:55512222 | G/C | p.Tyr142* | 0 | | | | 8 |
| | | | | | rs28362286 | 1:55529215 | A/C | p.Cys679* | 0 | | | | 8 |
| TC | APOB | rs13306194 | 2.72 × 10−9 | 2.33 × 10−8 | rs1367117 | 2:21263900 | A/G | p.Thr98Ile | 12.87% | 0.072 (0.018) | 4.27 × 10−5 | 0.01 | 3 |
| | | | | 2.49 × 10−9 | rs5742904 | 2:21229160 | T/C | p.Arg3527Gln | 0.01% | 0.446 (0.577) | 0.44 | 0 | 8 |
| LDL-C | HMGCR | rs191835914 | 1.95 × 10−9 | 2.83 × 10−7 | rs12654264 | 5:74648603 | A/T | noncoding | 48.10% | −0.076 (0.012) | 1.28 × 10−10 | 0.02 | 3 |
| TG | APOA5 | rs2075291 | 1.64 × 10−45 | 8.02 × 10−59 | rs964184 | 11:116648917 | G/C | noncoding | 21.51% | 0.225 (0.015) | 6.76 × 10−53 | 0.02 | 3 |
| | | | | 1.52 × 10−45 | rs3135506 | 11:116662407 | C/G | p.Ser19Trp | 0.08% | 0.054 (0.204) | 0.79 | 0 | 9 |
| | | | | 1.45 × 10−45 | rs142953140 | 11:117089205 | T/C | p.Arg504His | 0.007% | 0.643 (0.705) | 0.36 | 0 | 8 |
| TG | APOA4 | rs5104 | 4.81 × 10−27 | 1.44 × 10−8 | rs964184 | | | | | | | 0.14 | |
| | | | | 3.25 × 10−27 | rs3135506 | | | | | | | 0 | |
| | | | | 4.94 × 10−27 | rs142953140 | | | | | | | 0 | |
| HDL-C | CETP | rs2303790 | 5.95 × 10−18 | 2.45 × 10−9 | rs3764261 | 16:56993324 | A/C | noncoding | 16.24% | 0.214 (0.016) | 3.88 × 10−41 | 0.04 | 3 |
| | | | | | rs34119551 | 16:56995908 | A/T | p.Val6Asp | 0 | | | | 8 |
| | | | | 6.29 × 10−18 | rs5880 | 16:57015091 | C/G | p.Ala330Pro | 0.51% | −0.226 (0.083) | 6.22 × 10−3 | 0 | 9 |
| | | | | 2.63 × 10−21 | rs5882 | 16:57016092 | G/A | p.Val362Ile | 46.80% | 0.057 (0.012) | 1.48 × 10−6 | 0.02 | 9 |

CHR, chromosome; MAF, minor allele frequency.

Chr: Position is reported in human genome build hg19.

Alleles are listed as minor/major on the forward strand of the reference genome.

At PCSK9, three low-frequency coding variants (rs11591147, rs67608943 and rs28362286) were previously found to be associated with LDL-C in European ancestry (EA) or African ancestry (AA) individuals (8,10). However, these variants were monomorphic or extremely rare in our Chinese samples. Moreover, rs2479409 in PCSK9, a well-known GWAS index variant, showed no evidence of association in our study (LDL-C, P = 0.70). As expected, none of these reported variants affected the association of the lead variant rs151193009 (Pconditional ≤ 3.53 × 10−27).

At APOB, a common missense variant (rs1367117, encoding p.Thr98Ile) was previously identified as the GWAS index variant (4). Although rs1367117 was suggestively associated with TC in our samples (P = 4.27 × 10−5), the association of the lead variant rs13306194, encoding p.Arg532Trp, was not attenuated by conditioning on rs1367117 (Pconditional = 2.33 × 10−8, r2 = 0.01). Another reported missense variant, rs5742904, was very rare (MAF = 0.01%) and showed no association with TC in our study.

At HMGCR, the GWAS variant rs12916 was not present on the exome array, but the top variant rs12654264 (P = 1.98 × 10−10) in our data showed strong linkage disequilibrium (LD) with rs12916 (r2 =0.99). Conditional analyses revealed the missense variant rs191835914 encoding p.Tyr311Ser was independent of rs12654264 (Pconditional = 2.83 × 10−7, r2 =0.02).

At APOA5-APOA4, rs2075291 encoding p.Gly185Cys and rs5104 encoding p.Ser147Asn were about 30 kb apart and not in LD (r2 = 0). Adjusting for one variant in a conditional analysis only slightly weakened the association of the other (Pinitial = 1.64 × 10−45, Pconditional = 1.78 × 10−44 for rs2075291; Pinitial = 4.81 × 10−27, Pconditional = 5.87 × 10−26 for rs5104), suggesting that the two coding variants were independent of each other. They were also independent of the GWAS index variant rs964184 (Pconditional = 8.02 × 10−59 for rs2075291; Pconditional = 1.44 × 10−8 for rs5104). The two previously reported coding variants (8,9) at APOA5 had frequencies of 0.08% (rs3135506) and 0.007% (rs142953140) and showed no evidence of association with TG in our samples.
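The r2 values quoted for pairs of variants measure LD. One common approximation, assumed here because the text does not say how r2 was computed, is the squared Pearson correlation between minor-allele counts; the toy genotype vectors are hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two vectors of minor-allele counts (0/1/2)."""
    n = len(geno_a)
    mean_a, mean_b = sum(geno_a) / n, sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    return cov * cov / (var_a * var_b)

# Toy genotypes for six hypothetical individuals.
variant_1 = [0, 0, 1, 1, 2, 2]
variant_2 = [0, 1, 0, 1, 0, 1]  # varies independently of variant_1 in this sample

print(ld_r2(variant_1, variant_1))  # a variant is in complete LD with itself
print(ld_r2(variant_1, variant_2))  # uncorrelated genotypes give r2 = 0
```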

The GWAS index variant rs3764261 at CETP was also the lead variant for HDL-C in our samples (P = 3.88 × 10−41). Conditional analysis showed that the association of the missense variant rs2303790, encoding p.Asp399Gly, was attenuated but remained exome-wide significant, so the variant is likely to be independent of rs3764261 (the P value changed from 5.95 × 10−18 to 2.45 × 10−9, r2 = 0.04). The coding variants identified in European or African ancestry individuals (8,9) either showed only suggestive significance (rs5880 p.Ala330Pro, P = 6.22 × 10−3; rs5882 p.Val362Ile, P = 1.48 × 10−6) or were not polymorphic (rs34119551, p.Val6Asp) in our samples.

New lead non-coding variants at LPA, LIPC and LDLR

Besides the independent coding variants, we also identified significant associations for three new lead non-coding variants that were independent of previously identified associations (Table 3). These loci were LPA for TC (rs7770628, P = 1.05 × 10−7), LIPC for HDL-C (rs1800588, P = 2.11 × 10−12) and LDLR for LDL-C (rs11557092, P = 9.02 × 10−8). These variants were not in LD with the previously reported variants (r2 ≤ 0.002). The previously identified lead variants at LPA (rs1564348) and LDLR (rs6511720) (3) were rare in our Chinese samples and showed no association with lipid levels. For LIPC, the GWAS index variant (rs1532085) (3) displayed a significant association with HDL-C in Chinese populations (P = 3.00 × 10−10). However, the association of our lead variant rs1800588 at this locus was not attenuated by conditioning on rs1532085 (Pconditional = 1.05 × 10−11), indicating two independent exome-wide significant signals in Chinese populations.

Table 3.

Association results for independent non-coding variants in Chinese at the previously reported loci

Columns Lead variant to P conditional describe our lead variants; columns Reported variant to Reported P conditional describe the previously reported lead variants.

| Trait | Gene | Lead variant | Chr:Position | Allele | N | MAF | Effect (S.E.) | P | P conditional | Reported variant | Reported allele | Reported MAF | Reported effect (S.E.) | Reported P | Reported P conditional | r2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TC | LPA | rs7770628 | 6:161018174 | C/T | 14333 | 0.115 | 0.098 (0.018) | 1.05 × 10−7 | 1.11 × 10−7 | rs1564348 | C/T | 0.003 | 0.069 (0.100) | 0.49 | 0.57 | 0.002 |
| HDL-C | LIPC | rs1800588 | 15:58723675 | T/C | 14456 | 0.386 | 0.085 (0.012) | 2.11 × 10−12 | 1.05 × 10−11 | rs1532085 | A/G | 0.461 | 0.074 (0.012) | 3.00 × 10−10 | 1.36 × 10−9 | 0.002 |
| LDL-C | LDLR | rs11557092 | 19:11257018 | T/C | 14354 | 0.248 | −0.073 (0.014) | 9.02 × 10−8 | 5.15 × 10−8 | rs6511720 | T/G | 0.007 | −0.153 (0.071) | 0.031 | 0.016 | 0.001 |

CHR, chromosome; MAF, minor allele frequency.

Chr: Position is reported in human genome build hg19.

Alleles are listed as minor/major on the forward strand of the reference genome.

The meta-analysis of the exome and genome-wide association data

To improve power and identify additional lipid loci, we combined the exome-wide association data with 1000 Genomes imputed genome-wide association data, although only 34.11% (n = 51,610) of the variants on the exome array were available in the imputed data with good imputation quality. Besides the 20 significant loci identified by the exome-wide association analysis, four additional previously reported lipid loci (RGS12, FADS1, HPR and LIPG) attained exome-wide significance (Supplementary Material, Table S7). Of note, at LIPG we detected two independent association signals (the missense variant rs2000813 encoding p.Thr111Ile, P = 1.26 × 10−7, and rs4939883, P = 4.25 × 10−10; r2 = 0.005). The previously reported lead GWAS variant at LIPG (rs7241918) (3) showed only nominal significance with HDL-C (P = 4.45 × 10−4) in our Chinese samples, while the missense variant rs77960347 identified in Europeans (MAF = 1.6%) (9) was extremely rare (MAF = 0.01%) and not associated with HDL-C in our samples (P = 0.85).

Gene-based association tests

To test rare and low-frequency variants for association, we then performed gene-based analyses incorporating variants annotated as missense, stop-gain, stop-loss or splice-site changes with a MAF < 5%. Three well-known genes (PCSK9, HMGCR and CETP) exhibited strong associations with blood lipid levels in gene-level tests (P < 2.8 × 10−6) (Table 4). The association with the burden of rare variants was more significant than that observed in the single-variant tests. We observed secondary nominally significant variants: rs5908 encoding p.Ile638Val at HMGCR (MAF = 0.02%; TC, P = 0.046; LDL-C, P = 0.022), and rs201790757 creating the stop codon p.Tyr74* (MAF = 0.04%; HDL-C, P = 3.02 × 10−4) and rs5880 encoding p.Ala330Pro (MAF = 0.51%; HDL-C, P = 6.22 × 10−3) at CETP (Supplementary Material, Table S8). No new genetic associations were identified by the gene-based analyses.
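The idea behind a gene-based test can be illustrated with the simplest collapsing approach: sum each person's minor alleles across the qualifying variants in a gene and regress the trait on that burden score. This is a plain burden regression on simulated data, not SKAT (the variance-component test listed as the best test in Table 4) and not the authors' implementation; every name, frequency and effect size below is hypothetical.

```python
import math
import random

def burden_scores(genotype_rows):
    """Per-individual count of minor alleles summed over a gene's rare variants."""
    return [sum(row) for row in genotype_rows]

def slope_and_se(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - mean_y - beta * (a - mean_x)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

# Simulated gene with three rare variants, each lowering the trait (hypothetical values).
rng = random.Random(7)
n_people = 20000
mafs = [0.01, 0.01, 0.01]
per_allele_effect = -0.4

genotypes = [
    [int(rng.random() < maf) + int(rng.random() < maf) for maf in mafs]
    for _ in range(n_people)
]
scores = burden_scores(genotypes)
phenotype = [per_allele_effect * s + rng.gauss(0.0, 1.0) for s in scores]

beta, se = slope_and_se(scores, phenotype)
print(round(beta, 2), round(beta / se, 1))
```

Pooling several rare alleles raises the effective carrier frequency, which is why a gene-level signal can be stronger than that of any single rare variant.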

Table 4.

Genes with a burden of rare or low-frequency variants significantly associated with lipid levels (P < 2.8 × 10−6)

| Gene | Trait | P | Best test | No. of variants | cMAF | Beta | SE | Coding variants |
|---|---|---|---|---|---|---|---|---|
| PCSK9 | LDL-C | 1.42 × 10−29 | SKAT | 7 | 0.02 | −0.38 | 0.04 | p.Arg93Cys, p.Gly176Arg, p.Gly236Ser, p.Ala443Thr, p.Arg469Trp, p.Pro576Leu, p.Val644Ile |
| | TC | 7.74 × 10−24 | SKAT | 7 | 0.02 | −0.33 | 0.04 | |
| HMGCR | LDL-C | 1.85 × 10−9 | SKAT | 2 | 0.02 | −0.24 | 0.04 | p.Tyr311Ser, p.Ile638Val |
| | TC | 4.00 × 10−7 | SKAT | 2 | 0.02 | −0.20 | 0.04 | |
| CETP | HDL-C | 1.22 × 10−21 | SKAT | 7 | 0.03 | 0.20 | 0.03 | p.Tyr74*, p.Arg299Cys, p.Gly271Ser, p.Asn298Ser, p.Ala330Pro, p.Glu360Lys, p.Asp399Gly |

Variants included in the analysis were restricted to those with MAF < 0.05 and annotated as nonsynonymous, splice-site, or stop loss/gain variants.

*

cMAF, combined minor allele frequency of all variants included in the analysis.

Gene-based association tests were performed in the exome array data of 14,473 subjects.

Association with CAD

To further assess whether the identified independent variants are also related to CAD, we tested their associations with CAD in the Hubei CAD study (HuCAD) and two GWAS samples (the Beijing Atherosclerosis Study (BAS) and the China Atherosclerosis Study (CAS)) involving 6,179 cases and 9,552 controls. Three low-frequency coding variants and two non-coding variants showed significant associations with CAD (P values from 0.02 to 1.68 × 10−4) in the expected direction (Table 5). The T allele of rs151193009 at PCSK9, the C allele of rs191835914 at HMGCR and the T allele of rs11557092 at LDLR, all of which lead to lower LDL-C, were associated with reduced risk of CAD (OR = 0.76, 95% CI = 0.52–0.99, P = 0.02; OR = 0.76, 95% CI = 0.57–0.96, P = 8.42 × 10−3; OR = 0.88, 95% CI = 0.84–0.94, P = 1.68 × 10−4, respectively), while the A allele of rs2075291 at APOA5, leading to higher TG, and the C allele of rs7770628 at LPA, leading to higher TC, were associated with increased risk of CAD (OR = 1.14, 95% CI = 1.04–1.24, P = 9.60 × 10−3; OR = 1.09, 95% CI = 1.01–1.19, P = 0.02, respectively). None of the remaining variants at APOB, APOA4, LIPC or CETP was associated with risk of CAD.

Table 5.

Association results for independent variants with coronary artery disease in 6,179 cases and 9,552 controls

| Gene | Variant | CHR | Position | A1 | A2 | MAF | OR (95% CI) | P |
|---|---|---|---|---|---|---|---|---|
| PCSK9 | rs151193009 | 1 | 55509585 | T | C | 0.015 | 0.76 (0.52–0.99) | 0.02 |
| APOB | rs13306194 | 2 | 21252534 | A | G | 0.118 | 0.95 (0.98–1.03) | 0.22 |
| HMGCR | rs191835914 | 5 | 74646765 | C | A | 0.022 | 0.76 (0.57–0.96) | 8.42 × 10−3 |
| LPA | rs7770628 | 6 | 161018174 | C | T | 0.120 | 1.09 (1.01–1.19) | 0.02 |
| APOA5 | rs2075291 | 11 | 116661392 | A | C | 0.058 | 1.14 (1.04–1.24) | 9.60 × 10−3 |
| APOA4 | rs5104 | 11 | 116692334 | C | T | 0.334 | 0.99 (0.94–1.04) | 0.67 |
| LIPC | rs1800588 | 15 | 58723675 | T | C | 0.381 | 1.03 (0.98–1.08) | 0.22 |
| CETP | rs2303790 | 16 | 57017292 | G | A | 0.022 | 0.98 (0.79–1.17) | 0.85 |
| LDLR | rs11557092 | 19 | 11257018 | T | C | 0.258 | 0.88 (0.84–0.94) | 1.68 × 10−4 |

Genomic positions are on human genome build hg19.

CHR, chromosome; A1, minor allele; A2, major allele; MAF, minor allele frequency.

Discussion

The present study systematically investigated the contribution of coding variants to lipid levels in Chinese populations. Our study validated 24 loci previously identified in European populations. We also identified six coding variants (at PCSK9, APOB, HMGCR, APOA5, APOA4 and CETP) and three non-coding variants (at LPA, LIPC and LDLR) that were independent of previously identified associations, four of which were low-frequency coding variants with large effect sizes. Our data demonstrate that inter-population differences in allele frequencies lead to association signals at different coding variants in different populations.

GWAS have identified ∼160 loci significantly associated with lipids in individuals of European ancestry, which together explain 12–15% of the variance in lipid levels (3,4). We previously performed a large-scale GWAS in a total of 23,083 Chinese individuals (11), which showed that 17 previously reported loci could be generalized to Chinese populations and identified three Chinese-specific variants for lipid levels, suggesting that both shared and population-specific susceptibility variants exist. However, almost all of the variants identified by GWAS are common, and low-frequency coding variants are poorly captured by GWAS genotyping arrays. The current exome-wide study not only confirmed the 24 reported loci but also allowed us to identify 14 coding variants at 10 loci with exome-wide significance. Protein-coding variants are more likely to have low frequencies in the human population and to have arisen recently in human evolutionary history, and most of them are population-specific (12,13). For example, the GWAS association signals at PCSK9 differ between populations, and the GWAS index variant rs2479409, initially identified in European populations, could not be replicated in non-European populations (14). We also detected a different GWAS signal at PCSK9 (rs7525649) in Chinese individuals (11), which showed moderate LD with rs2479409 (r2 = 0.179, 1000 Genomes CHB + JPT). It has been shown that the common GWAS signal at PCSK9 in European populations can be explained by the low-frequency functional variant p.Arg46Leu identified by fine-mapping studies (3,15,16), suggesting that the GWAS variant itself has no relevant functional consequence (17). Allelic heterogeneity for rare coding variants at PCSK9 has also been demonstrated in samples of different ancestries. Two loss-of-function variants at PCSK9, p.Tyr142* and p.Cys679*, exhibited substantially stronger evidence of association with LDL-C in African Americans than p.Arg46Leu. The PCSK9 p.Arg46Leu variant, which is associated with lower LDL-C levels, has a MAF of 3.2% in European populations but only 0.6% in African Americans, whereas p.Tyr142* and p.Cys679* (MAF 2.6% in African Americans) are found at 0.06% in Europeans (10). Notably, all three coding variants are monomorphic or extremely rare in Chinese samples. The current study further demonstrates inter-ethnic differences in the rare coding variants identified across populations. All four low-frequency coding variants that we identified in Chinese (PCSK9 p.Arg93Cys, HMGCR p.Tyr311Ser, APOA5 p.Gly185Cys and CETP p.Asp399Gly) are monomorphic in 1000 Genomes individuals from three non-Asian populations (EUR, AFR and AMR). Similarly, almost all of the low-frequency or rare coding variants (MAF range 0.05% to 3.4%) identified by EWAS in individuals of European and African ancestry have frequencies at least one order of magnitude lower (MAF range 0.0035% to 0.12%) or are monomorphic in Chinese populations (Supplementary Material, Table S9). These substantial inter-ethnic differences in MAF result in the identification of different coding variants in individuals of different ancestries, and in insufficient power to replicate the initially observed variants in other ancestries. This allelic heterogeneity emphasizes the importance of including diverse populations in genetic association studies of complex traits such as lipid levels.

More importantly, the identification of protein-coding variants with potentially deleterious function may allow us to quickly prioritize both the functional genes and the causal variants. The functional genes pointed to by coding variants are either well-known genes that cause Mendelian dyslipidemias (PCSK9, APOB, APOA5, CETP, LPL, LIPG and APOE) (Supplementary Material, Table S10) or genes with well-established roles in lipid metabolism (HMGCR, MLXIPL and GCKR). The identified coding variants are also more likely to be causal, especially the rare variants with large effect sizes. The associations of the coding variants at PCSK9 (p.Arg93Cys) and APOB (p.Arg532Trp) with LDL-C were the strongest at each locus, while the coding variant at HMGCR showed almost perfect LD with the top signal. These results indicate that the coding variants at these three loci can substantially account for the association signals. The coding variants at APOA5-APOA4, CETP and LIPG did not show the most significant associations. However, conditional analyses showed that these coding variants were independent of the corresponding top variants at each locus. It is worth noting that the four low-frequency coding variants showed large effects on lipid levels (effect size > 0.2 s.d.), substantially greater than those of the corresponding common variants in previous GWAS. In the 4,187 population-based HuCAD control samples, carrying one copy of PCSK9 p.Arg93Cys or HMGCR p.Tyr311Ser was associated with plasma LDL-C levels lower by 21.59 mg/dl and 6.28 mg/dl, respectively, while one copy of APOA5 p.Gly185Cys was associated with TG levels higher by 27.32 mg/dl (Fig. 1). All three of these coding variants were also associated with the risk of CAD in the expected direction, which supports their causal roles. CETP p.Asp399Gly was associated with an increase of 4.40 mg/dl in HDL-C but not with CAD, consistent with the view that HDL-C may not be a causal risk factor for clinical CAD.

Figure 1.

Lipid levels in 4,187 HuCAD samples according to the number of four low-frequency coding variants. The box plots give the median levels (middle horizontal line in each box) and the interquartile ranges (delineated by the top and bottom of each box).

Several limitations of our study need to be acknowledged. First, the exome array we used did not provide complete coverage of all functional variants. The array captured a large fraction of the common (68.11%, MAF > 5%) and low-frequency (75.00%, MAF = 0.5–5%) coding variants observed in the 1000 Genomes East Asian samples. Of the remaining coding variants observed in two or more copies, only 51.58% were captured by the array. We were also unable to evaluate some rare ethnic-specific variants using the current exome array. Exome or whole-genome sequencing is therefore needed to fully characterize and capture the population-specific variants. Second, we had limited power to evaluate the associations of some low-frequency and rare variants with lipid levels. Power calculations based on our sample size suggest that we had 80% power to detect variants with MAF = 1% and an effect size of 0.35 s.d. at α = 3.3 × 10−7, but considerably less power to detect the same effect size at MAF < 1% (Supplementary Material, Fig. S4). The statistical power for the CAD analyses is even lower, although more than 15,000 cases and controls were included. This means that some signals, especially those with a low minor allele frequency and/or weaker genetic effects, may have been missed. Subsequent studies with larger sample sizes will be required to identify associations of rare coding variants with lipid levels and CAD.

In conclusion, we have identified 6 coding variants and 3 non-coding variants that were independent of previously identified associations in European populations. Our study highlights the value of searching for low frequency, ethnic-specific possible causal variants in non-European ancestry samples. Large-scale sequencing efforts in multiple diverse populations may lead to better understanding of the genetic architecture of lipid levels.

Materials and Methods

Study population

The study was a meta-analysis of 14,473 Han Chinese individuals who underwent standardized collection of blood lipid measurements in four independent exome-wide association studies. These studies were the HuCAD (18), the Nutrition and Health of Aging Population in China (NHAPC) (19), the Guizhou-Bijie Type 2 Diabetes Study (GBTDS) (20), and the Guangxi Fangchenggang Area Male Health and Examination Survey (FAMHES) (21). Two GWAS samples comprising 6,534 individuals (the Beijing Atherosclerosis Study (BAS) and the China Atherosclerosis Study (CAS)) were further combined in the meta-analysis to detect additional significant lipid associations and to evaluate the evidence of association with CAD (22,23). Both samples had previously been genotyped on genome-wide arrays and imputed using the 1000 Genomes Project phase 1 v3 training set (24).

Blood specimens were obtained after participants had fasted overnight (≥8 h). The plasma TC, TG, LDL-C and HDL-C levels were measured in all the subjects. Values of TG were natural log transformed to approximate normality in each study. Each study obtained approval from the institutional review board of local research institutions. All participants in each study gave written informed consent.

Exome array genotyping and quality control

All study participants were genotyped using the Asian human exome BeadChip, which captured a total of 302,218 variants, including functional coding variants (>80%) and disease-associated tag variants from published GWAS. Custom content of ∼60K variants, specially designed for Asian populations, was added to the standard Illumina HumanExome BeadChip to improve the coverage of low-frequency variants in Asian populations. The custom panel of coding variants was selected based on three independent Asian sequencing data sets. Top variants selected from GWAS for follow-up were also included on the chip. Details of the Asian exome chip design have been described elsewhere (25). Genotype calling was carried out using Illumina’s GenTrain version 2.0 clustering algorithm in GenomeStudio. Several quality control filters were applied to samples and variants before analysis; detailed descriptions of the filters applied to the six studies are provided in Supplementary Material, Table S2. Individuals with a low genotyping call rate, evidence of gender discrepancy, excess heterozygosity or cryptic relatedness were removed. Variants that did not meet the 95% genotyping threshold or showed a deviation from Hardy-Weinberg equilibrium (P < 1 × 10−5) were removed. In addition, variants that were monomorphic in our study population were removed from the analysis.
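The variant-level filters described above can be illustrated with a minimal sketch. It assumes a chi-square goodness-of-fit test with 1 d.f. for Hardy-Weinberg equilibrium; the test actually used in each study is not stated here and may have been an exact test. The function names `hwe_chi2_pvalue` and `passes_variant_qc` are hypothetical and introduced only for illustration.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a 1-d.f. chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(n_aa, n_ab, n_bb, n_missing,
                      min_call_rate=0.95, hwe_p_min=1e-5):
    """Apply the call-rate, monomorphism and HWE filters to one variant."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    if called / (called + n_missing) < min_call_rate:
        return False  # fails the 95% genotyping threshold
    if min(2 * n_aa + n_ab, 2 * n_bb + n_ab) == 0:
        return False  # monomorphic in the study population
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min


# Hypothetical genotype counts, for illustration only.
keep_balanced = passes_variant_qc(25, 50, 25, n_missing=0)
keep_no_hets = passes_variant_qc(50, 0, 50, n_missing=0)
```

With these made-up counts, the first variant sits exactly at Hardy-Weinberg proportions and is kept, whereas the second has no heterozygotes at an allele frequency of 0.5 and is removed.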

Statistical analyses

Within each cohort, TC, LDL-C, HDL-C and TG measures were transformed using the inverse normal distribution after adjustment of each trait for age, age squared and study-specific covariates, including principal components to account for population structure (Supplementary Material, Table S2). We performed both single-variant and gene-level association tests. For each cohort, single-variant score statistics and their covariance matrix, which summarizes LD information and relatedness among sampled individuals, were generated using rvtests or RareMetalWorker (26). Using summary association statistics collected from each study, we performed meta-analysis of single-variant and gene-level association tests for TC, LDL-C, HDL-C and TG using the R package RAREMETALS (27). Single-variant statistics and their variance-covariance matrices were combined across studies to obtain an overall score statistic for each variant and a combined covariance matrix, which were used for the single-variant analyses. For gene-based tests, summary association statistics from the variants were used to construct gene-level test statistics that aggregate the effects of multiple rare or low-frequency variants across a gene. These analyses were restricted to variants predicted to alter the coding sequence of the gene product (defined as missense, stop-gain, stop-loss or splice-site variants) in order to enhance the likelihood of identifying causal variants and to reduce the multiple testing burden. For each trait, we ran two gene-based tests with a MAF cutoff of <5%: a simple burden test and a sequence kernel association test (SKAT) (28,29). The simple burden test can be defined as a weighted sum of allele counts for variants satisfying these criteria. Its main limitation is that all tested variants are assumed to influence the phenotype in the same direction. Thus, the simple burden test can be underpowered when variants with opposite phenotypic effects reside in the same gene. SKAT is more powerful than the simple burden test when the same gene contains both protective and deleterious variants with effects of different magnitudes. For single-variant analyses, we considered a significance threshold of P < 3.3 × 10−7, corresponding to a Bonferroni correction for 151,289 tests. For gene-level analyses, we used a significance threshold of P < 2.8 × 10−6, corresponding to a Bonferroni correction for 17,612 gene-level tests.
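The two significance thresholds follow from dividing a family-wise error rate by the number of tests. The short sketch below assumes a family-wise α of 0.05, which is not stated explicitly in the text but reproduces the quoted values; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Numbers of tests as reported in the text; alpha = 0.05 is an assumption.
single_variant_threshold = bonferroni_threshold(151_289)
gene_level_threshold = bonferroni_threshold(17_612)

print(f"single-variant: {single_variant_threshold:.1e}")
print(f"gene-level:     {gene_level_threshold:.1e}")
```

Rounded to two significant figures, these values match the thresholds of 3.3 × 10−7 and 2.8 × 10−6 used for the single-variant and gene-level analyses.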

We conducted the CAD association analysis using individuals from HuCAD, BAS and CAS. Logistic regression was carried out to evaluate whether the lipid-associated variants influence CAD risk. The effect estimates and their s.e. were meta-analysed by the fixed-effect inverse-variance method using METAL (30).
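The fixed-effect inverse-variance method combines per-study effect estimates by weighting each with the reciprocal of its squared standard error. The sketch below is a generic illustration of that formula, not a reproduction of METAL itself; the function name `inverse_variance_meta` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   the corresponding standard errors
    Returns the pooled estimate, its standard error and a two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, p_value


# Made-up log odds ratios and standard errors for three studies.
beta, se, p = inverse_variance_meta([-0.30, -0.22, -0.27], [0.20, 0.15, 0.18])
odds_ratio = math.exp(beta)
ci_95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

Exponentiating the pooled log odds ratio and its confidence limits gives an OR and 95% CI in the form reported in Table 5.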

To test whether the population-specific association signals were independent of variants previously shown to be associated with the phenotype at each locus, conditional analyses were performed to control for the effects of the reported lipid variants. Power calculations were performed using the QUANTO software program.
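As a rough cross-check of the power figures quoted in the Discussion, the following sketch uses a normal approximation for an additive single-variant test on a standardized quantitative trait. It is not the QUANTO calculation and makes simplifying assumptions (Hardy-Weinberg proportions, unit trait variance, and the 14,473 exome array samples as the sample size); the function name `power_quantitative_trait` is hypothetical.

```python
import math
from statistics import NormalDist


def power_quantitative_trait(n, maf, effect_sd, alpha):
    """Approximate two-sided power for an additive single-variant test."""
    norm = NormalDist()
    # Fraction of trait variance explained by the variant (trait variance = 1).
    r2 = 2.0 * maf * (1.0 - maf) * effect_sd ** 2
    # Non-centrality parameter of the 1-d.f. chi-square association test.
    ncp = n * r2 / (1.0 - r2)
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Sample size, effect size and alpha as quoted in the text.
power_maf_1pct = power_quantitative_trait(14_473, 0.01, 0.35, 3.3e-7)
power_maf_half_pct = power_quantitative_trait(14_473, 0.005, 0.35, 3.3e-7)
```

Under these assumptions the power at MAF = 1% comes out close to 0.8, and it drops sharply when the MAF is halved, in line with the qualitative statement that power is considerably lower for MAF < 1%.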

Annotation and functional prediction of variants

Variants were annotated using ANNOVAR (31). Variant identifiers and chromosomal positions are listed with respect to the hg19 genome build. The functional coding variants on the array were defined as missense, stop-gain, stop-loss, or splice site changes. The identified coding variants were further analysed for potential damaging effects with prediction software tools SIFT (32) and PolyPhen (33).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

The authors would like to acknowledge the contribution of the staff and participants of all the studies, including the Hubei Coronary Artery Disease study (HuCAD), the Nutrition and Health of Aging Population in China (NHAPC), the Guizhou-Bijie Type 2 Diabetes Study (GBTDS), the Guangxi Fangchenggang Area Male Health and Examination Survey (FAMHES), the Beijing Atherosclerosis Study (BAS) and the China Atherosclerosis Study (CAS).

Conflict of Interest statement. None declared.

Funding

The HuCAD was supported by National Natural Science Foundation of China Grant 81230069 and National Basic Research Program Grant 2011CB503806. The NHAPC and the GBTDS cohorts are supported by the National High Technology Research and Development Program of China (863 Program 2009AA022704), the National Basic Research Program of China (973 Program 2012CB524900), the National Natural Science Foundation of China (81321062, 81170734 and 81471013), and the Chinese Academy of Sciences (KJZD-EW-L14). The CAS was funded by the National Basic Research Program of China (973 Plan) (2011CB503901) and the High-Tech Research and Development Program of China (863 Plan) (2012AA02A516) from the Ministry of Science and Technology of China and the National Science Foundation of China (91439202, 81422043, 81370002). The BAS was supported by Beijing Natural Science Foundation (7142138). The FAMHES was supported by Guangxi Natural Science Fund for Innovation Research Team (2013GXNSFFA019002) and Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine (201319).

References

1. Castelli, W.P. (1988) Cholesterol and lipids in the risk of coronary artery disease–the Framingham Heart Study. Can. J. Cardiol., 4, Suppl A, 5A–10A.

2. Go, A.S., Mozaffarian, D., Roger, V.L., Benjamin, E.J., Berry, J.D., Blaha, M.J., Dai, S., Ford, E.S., Fox, C.S., Franco, S., et al. (2014) Heart disease and stroke statistics–2014 update: a report from the American Heart Association. Circulation, 129, e28–e292.

3. Willer, C.J., Schmidt, E.M., Sengupta, S., Peloso, G.M., Gustafsson, S., Kanoni, S., Ganna, A., Chen, J., Buchkovich, M.L., Mora, S., et al. (2013) Discovery and refinement of loci associated with lipid levels. Nat. Genet., 45, 1274–1283.

4. Teslovich, T.M., Musunuru, K., Smith, A.V., Edmondson, A.C., Stylianou, I.M., Koseki, M., Pirruccello, J.P., Ripatti, S., Chasman, D.I., Willer, C.J., et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 466, 707–713.

5. Willer, C.J., Sanna, S., Jackson, A.U., Scuteri, A., Bonnycastle, L.L., Clarke, R., Heath, S.C., Timpson, N.J., Najjar, S.S., Stringham, H.M., et al. (2008) Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet., 40, 161–169.

6. Kathiresan, S., Melander, O., Guiducci, C., Surti, A., Burtt, N.P., Rieder, M.J., Cooper, G.M., Roos, C., Voight, B.F., Havulinna, A.S., et al. (2008) Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat. Genet., 40, 189–197.

7. Kathiresan, S., Willer, C.J., Peloso, G.M., Demissie, S., Musunuru, K., Schadt, E.E., Kaplan, L., Bennett, D., Li, Y., Tanaka, T., et al. (2009) Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet., 41, 56–65.

8. Peloso, G.M., Auer, P.L., Bis, J.C., Voorman, A., Morrison, A.C., Stitziel, N.O., Brody, J.A., Khetarpal, S.A., Crosby, J.R., Fornage, M., et al. (2014) Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am. J. Hum. Genet., 94, 223–232.

9. Holmen, O.L., Zhang, H., Fan, Y., Hovelson, D.H., Schmidt, E.M., Zhou, W., Guo, Y., Zhang, J., Langhammer, A., Lochen, M.L., et al. (2014) Systematic evaluation of coding variation identifies a candidate causal variant in TM6SF2 influencing total cholesterol and myocardial infarction risk. Nat. Genet., 46, 345–351.

10. Cohen, J.C., Boerwinkle, E., Mosley, T.H., Jr., Hobbs, H.H. (2006) Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N. Engl. J. Med., 354, 1264–1272.

11. Lu, X., Huang, J., Mo, Z., He, J., Wang, L., Yang, X., Tan, A., Chen, S., Chen, J., Gu, C.C., et al. (2016) Genetic Susceptibility to Lipid Levels and Lipid Change Over Time and Risk of Incident Hyperlipidemia in Chinese Populations. Circ. Cardiovasc. Genet., 9, 37–44.

12. Tennessen, J.A., Bigham, A.W., O'Connor, T.D., Fu, W., Kenny, E.E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337, 64–69.

13. Fu, W., O'Connor, T.D., Jun, G., Kang, H.M., Abecasis, G., Leal, S.M., Gabriel, S., Rieder, M.J., Altshuler, D., Shendure, J., NHLBI Exome Sequencing Project, et al. (2013) Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 493, 216–220.

14. Wu, Y., Waite, L.L., Jackson, A.U., Sheu, W.H., Buyske, S., Absher, D., Arnett, D.K., Boerwinkle, E., Bonnycastle, L.L., Carty, C.L., et al. (2013) Trans-ethnic fine-mapping of lipid loci identifies population-specific signals and allelic heterogeneity that increases the trait variance explained. PLoS Genet., 9, e1003379.

15. Surakka, I., Horikoshi, M., Magi, R., Sarin, A.P., Mahajan, A., Lagou, V., Marullo, L., Ferreira, T., Miraglio, B., Timonen, S., et al. (2015) The impact of low-frequency and rare variants on lipid levels. Nat. Genet., 47, 589–597.

16. Sanna, S., Li, B., Mulas, A., Sidore, C., Kang, H.M., Jackson, A.U., Piras, M.G., Usala, G., Maninchedda, G., Sassu, A., et al. (2011) Fine mapping of five loci associated with low-density lipoprotein cholesterol detects variants that double the explained heritability. PLoS Genet., 7, e1002198.

17. Chernogubova, E., Strawbridge, R., Mahdessian, H., Malarstig, A., Krapivner, S., Gigante, B., Hellenius, M.L., de Faire, U., Franco-Cereceda, A., Syvanen, A.C., et al. (2012) Common and low-frequency genetic variants in the PCSK9 locus influence circulating PCSK9 levels. Arterioscler. Thromb. Vasc. Biol., 32, 1526–1534.

18. Zhou, L., Zhang, X., He, M., Cheng, L., Chen, Y., Hu, F.B., Wu, T. (2008) Associations between single nucleotide polymorphisms on chromosome 9p21 and risk of coronary heart disease in Chinese Han population. Arterioscler. Thromb. Vasc. Biol., 28, 2085–2089.

19. Ye, X., Yu, Z., Li, H., Franco, O.H., Liu, Y., Lin, X. (2007) Distributions of C-reactive protein and its association with metabolic syndrome in middle-aged and older Chinese people. J. Am. Coll. Cardiol., 49, 1798–1805.

20. Li, H., Gan, W., Lu, L., Dong, X., Han, X., Hu, C., Yang, Z., Sun, L., Bao, W., Li, P., et al. (2013) A genome-wide association study identifies GRK5 and RASGRP1 as type 2 diabetes loci in Chinese Hans. Diabetes, 62, 291–298.

21. Tan, A., Sun, J., Xia, N., Qin, X., Hu, Y., Zhang, S., Tao, S., Gao, Y., Yang, X., Zhang, H., et al. (2012) A genome-wide association and gene-environment interaction study for serum triglycerides levels in a healthy Chinese male population. Hum. Mol. Genet., 21, 1658–1664.

22. Hou, L., Chen, S., Yu, H., Lu, X., Chen, J., Wang, L., Huang, J., Fan, Z., Gu, D. (2009) Associations of PLA2G7 gene polymorphisms with plasma lipoprotein-associated phospholipase A2 activity and coronary heart disease in a Chinese Han population: the Beijing atherosclerosis study. Hum. Genet., 125, 11–20.

23. Lu, X., Wang, L., Chen, S., He, L., Yang, X., Shi, Y., Cheng, J., Zhang, L., Gu, C.C., Huang, J., et al. (2012) Genome-wide association study in Han Chinese identifies four new susceptibility loci for coronary artery disease. Nat. Genet., 44, 890–894.

24. Nikpay, M., Goel, A., Won, H.H., Hall, L.M., Willenborg, C., Kanoni, S., Saleheen, D., Kyriakou, T., Nelson, C.P., Hopewell, J.C., et al. (2015) A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet., 47, 1121–1130.

25. Zhang, Y., Long, J., Lu, W., Shu, X.O., Cai, Q., Zheng, Y., Li, C., Li, B., Gao, Y.T., Zheng, W. (2014) Rare coding variants and breast cancer risk: evaluation of susceptibility Loci identified in genome-wide association studies. Cancer Epidemiol. Biomarkers. Prev., 23, 622–628.

26. Liu, D.J., Peloso, G.M., Zhan, X., Holmen, O.L., Zawistowski, M., Feng, S., Nikpay, M., Auer, P.L., Goel, A., Zhang, H., et al. (2014) Meta-analysis of gene-level tests for rare variant association. Nat. Genet., 46, 200–204.

27. Feng, S., Liu, D., Zhan, X., Wing, M.K., Abecasis, G.R. (2014) RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics, 30, 2828–2829.

28. Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., Lin, X. (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet., 89, 82–93.

29. Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., Christiani, D.C., Wurfel, M.M., Lin, X. (2012) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet., 91, 224–237.

30. Willer, C.J., Li, Y., Abecasis, G.R. (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics, 26, 2190–2191.

31. Wang, K., Li, M., Hakonarson, H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic. Acids. Res., 38, e164.

32. Ng, P.C., Henikoff, S. (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic. Acids. Res., 31, 3812–3814.

33. Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., Sunyaev, S.R. (2010) A method and server for predicting damaging missense mutations. Nat. Methods, 7, 248–249.

Author notes

The authors contributed equally to this work.
