Abstract

To evaluate the contribution of non-synonymous-coding variants of known familial and genome-wide association studies (GWAS)-linked genes for Parkinson's disease (PD) to PD risk in the East Asian population, we sequenced all the coding exons of 39 PD-related disease genes and evaluated the accumulation of rare non-synonymous-coding variants in 375 early-onset PD cases and 399 controls. We also genotyped 782 non-synonymous-coding variants of these genes in 710 late-onset PD cases and 9046 population controls. Significant enrichment of LRRK2 variants was observed in both early- and late-onset PD (odds ratio = 1.58; 95% confidence interval = 1.29–1.93; P = 8.05 × 10−6). Moderate enrichment was also observed in FGF20, MCCC1, GBA and ITGA8. Half of the rare variants anticipated to cause loss of function of these genes were present in healthy controls. Overall, non-synonymous-coding variants of known familial and GWAS-linked genes appear to make a limited contribution to PD risk, suggesting that clinical sequencing of these genes will provide limited information for risk prediction and molecular diagnosis.

INTRODUCTION

All the protein-coding sequences of the genome, referred to as the ‘exome’, make up only 1–2% of the human genome, but are hypothesized to be enriched for genetic variants that have a direct impact on the functions of encoded proteins and therefore to have larger effects on disease susceptibility compared with the common, non-coding variants identified by genome-wide association studies (GWAS). There is emerging interest in using next-generation sequencing for the identification and evaluation of genes with non-synonymous variants that may provide informative risk prediction and molecular diagnosis in a clinical setting (1), such as BRCA1 in breast and ovarian cancer risk and APOE in Alzheimer's disease.

Recent sequencing studies have demonstrated the influence of rare non-synonymous-coding variants on complex, late-onset disease risk in the general population (2,3). In addition to single-variant association tests on common and low-frequency variants, gene-based ‘mutation load’ or burden tests are often useful in identifying genes enriched for rare coding variants or functionally impaired variants in disease cases compared with controls (2). These genes may represent excellent biologic candidates for functional follow-up. An evaluation of ‘mutation load’ of rare non-synonymous-coding variants identified by resequencing of six candidate genes for Alzheimer's disease led to the confirmation of TREM2 as a risk gene, with the association mostly driven by the rare variant Arg42His (3).

Genetic studies conducted over the years have led to the discovery of numerous pathogenic genes with high-penetrance mutations by family-based linkage studies and susceptibility genes with low-penetrance risk variants by population-based GWAS in Parkinson's disease [PD (OMIM:168600)] (4–15). GWAS-linked single nucleotide polymorphisms (SNPs) are widely believed to tag non-coding regulatory variants (16,17). However, a systematic fine-mapping study has not been done to exclude the potential contribution of coding variants to the association signals detected by GWAS that may occur through direct tagging or synthetic associations with common or low-frequency non-synonymous-coding variants. Moreover, we do not know if non-synonymous-coding variants of genes detected by GWAS may also play additional independent pathophysiologic roles in PD beyond the contribution of non-coding regulatory risk variants of these genes, especially the rare ones that are poorly tagged on genome-wide SNP arrays. Except for a few well-established pathogenic mutations, the overall contribution of non-synonymous-coding variants in familial and GWAS-linked genes to PD risk has yet to be evaluated and reported.

To address these gaps in knowledge, we conducted the first study to test the above hypotheses in PD by comprehensively examining non-synonymous-coding variants (including splice-site, non-sense and frameshift variants) in 39 familial and GWAS-linked PD genes in over 10 000 East Asian subjects.

RESULTS

We first sequenced all the coding exons of 39 PD-related disease genes in 375 subjects with early-onset PD (<55 years) and 399 ethnicity-matched controls. A total of 814 coding variants were identified, of which 479 were non-synonymous ones (including 7 splice-site variants and 26 indels). We conducted single-variant association tests on the 36 common non-synonymous variants with minor allele frequencies (MAF) >5%. None of these showed evidence of association with PD (Supplementary Material, Table S1), although we note that the current sample size was powered to detect common variants (MAF 5–50%) of effect sizes with odds ratios (ORs) much larger than 1.5 (18). We then evaluated the accumulation of 443 rare (MAF <1%) or low-frequency (MAF 1–5%) non-synonymous-coding variants in each gene in the cases and controls. The analysis of 35 individual genes revealed the enrichment of rare non-synonymous-coding variants in LRRK2 (NM_198578) [OR = 1.88; 95% confidence interval (CI) = 1.31–2.71; P = 5.94 × 10−4], which passed the Bonferroni-corrected threshold for statistical significance (α = 0.05/35 = 0.0014) (Supplementary Material, Table S2a). Ten cases and five controls carried two mutations in LRRK2; all carried at least one copy of either Gly2385Arg or Arg1628Pro. Even after excluding the low-frequency, well-studied Asian risk variants Arg1628Pro and Gly2385Arg that were enriched in patients (OR = 1.99; 95% CI = 1.20–3.31; P = 6.95 × 10−3), LRRK2 still showed significant enrichment of other rare novel non-synonymous-coding variants (OR = 1.69; 95% CI = 1.07–2.66; P = 0.022) (Table 1).

Table 1.

Rare non-synonymous-coding variants identified in LRRK2 (NM_198578) in the early-onset PD samples (375 cases and 399 controls) that were included in the gene-based tests

LRRK2 variant N cases N controls Freq cases (%) Freq controls (%) 
Gln116Arg 0.00 0.25 
Ser192Ala 0.27 0.00 
Asn238Ile 0.53 0.25 
Lys258Asn 0.27 0.00 
Val291Ala 0.27 0.00 
Leu308Met 0.00 0.25 
Ala312Val 0.27 0.00 
Leu378Phe 0.53 0.00 
Ala419Val 14 3.73 1.50 
Met579Val 0.27 0.00 
Lys616Arg 0.27 0.00 
Ala654Val 0.00 0.25 
Ser661Phe 0.27 0.00 
Phe676Ser 0.00 0.25 
Ile723Val 0.80 0.75 
Pro755Leu 1.87 1.50 
Ile777Met 0.00 0.25 
Ile786Phe 0.53 0.00 
Ile844Asn 0.27 0.00 
Cys925Tyr 0.00 0.25 
Arg1320Ser 0.53 0.50 
Arg1325Gln 0.00 0.25 
Val1340Met 0.27 0.00 
Ile1371Val 0.27 0.00 
Leu1388Ile 0.00 0.50 
Arg1441Cys 0.27 0.25 
Pro1446Leu 0.53 0.75 
Val1450Ile 0.80 0.00 
Pro1542Ser 0.00 0.25 
Glu1616Lys 0.00 0.25 
Ile1626Val 0.27 0.00 
Arg1628Pro 19 12 5.07 3.01 
Asp1756Tyr 0.27 0.00 
Asp1756Gly 0.27 0.00 
Cys1774Tyr 0.27 0.00 
Gln1961Arg 0.27 0.00 
Asn2081Asp 0.00 0.25 
Asn 2251Thr 0.27 0.00 
Gly2385Arg 29 16 7.73 4.01 
His2391Gln 0.27 0.75 
Thr2524Ala 0.27 0.25 
LRRK2 variant N cases N controls Freq cases (%) Freq controls (%) 
Gln116Arg 0.00 0.25 
Ser192Ala 0.27 0.00 
Asn238Ile 0.53 0.25 
Lys258Asn 0.27 0.00 
Val291Ala 0.27 0.00 
Leu308Met 0.00 0.25 
Ala312Val 0.27 0.00 
Leu378Phe 0.53 0.00 
Ala419Val 14 3.73 1.50 
Met579Val 0.27 0.00 
Lys616Arg 0.27 0.00 
Ala654Val 0.00 0.25 
Ser661Phe 0.27 0.00 
Phe676Ser 0.00 0.25 
Ile723Val 0.80 0.75 
Pro755Leu 1.87 1.50 
Ile777Met 0.00 0.25 
Ile786Phe 0.53 0.00 
Ile844Asn 0.27 0.00 
Cys925Tyr 0.00 0.25 
Arg1320Ser 0.53 0.50 
Arg1325Gln 0.00 0.25 
Val1340Met 0.27 0.00 
Ile1371Val 0.27 0.00 
Leu1388Ile 0.00 0.50 
Arg1441Cys 0.27 0.25 
Pro1446Leu 0.53 0.75 
Val1450Ile 0.80 0.00 
Pro1542Ser 0.00 0.25 
Glu1616Lys 0.00 0.25 
Ile1626Val 0.27 0.00 
Arg1628Pro 19 12 5.07 3.01 
Asp1756Tyr 0.27 0.00 
Asp1756Gly 0.27 0.00 
Cys1774Tyr 0.27 0.00 
Gln1961Arg 0.27 0.00 
Asn2081Asp 0.00 0.25 
Asn 2251Thr 0.27 0.00 
Gly2385Arg 29 16 7.73 4.01 
His2391Gln 0.27 0.75 
Thr2524Ala 0.27 0.25 

Moderate enrichment was also observed for GBA (NM_000157) (P = 0.013; OR = 2.64; 95% CI = 1.19–5.84), MCCC1 (NM_020166) (P = 0.037; OR = 3.66, 95% CI = 1.00–13.43) and FGF20 (NM_019851) (P = 0.031; OR = 2.09; 95% CI = 1.05–4.16), but this was not statistically significant after correction for multiple testing (αcorrected = 0.0013) (Supplementary Material, Table S2a). No significant differences were observed between Chinese and Koreans across all the genes analyzed (Breslow–Day test, Phet > 0.05).

Modifying the stringency of selecting deleterious variants to be included in the gene-based load analysis did not identify genes with strong evidence of enrichment of such variants. When the analysis was restricted to non-synonymous variants predicted to be damaging by in silico functional prediction tools (missense with Condel score > 0.605, splice-site, non-sense and frameshifting indels; see Materials and Methods) (19), only LRRK2 (P = 0.028; OR = 1.80; 95% CI = 1.06–3.08), GBA (P = 0.0040; OR = 3.61; 95% CI = 1.43–9.10) and FGF20 (P = 0.031; OR = 2.09; 95% CI = 1.05–4.16) but this did not survive correction for multiple testing. Similarly, when evaluating only the 420 rare non-synonymous-coding variants with MAF <1% in the controls, only LRRK2 (P = 0.041; OR = 1.90; 95% CI = 1.01–3.57) and GBA (P = 0.0033; OR = 3.42; 95% CI = 1.43–8.12) showed moderate evidence of enrichment. We also performed the sequence kernel association test, optimized (SKAT-O) test in each gene by using all non-synonymous-coding variants (without minor allele frequency filtering) (20). We obtained similar P-values to the Fisher's exact test on rare and low-frequency variants in both Chinese and Koreans (Supplementary Material, Table S2b).

We then evaluated the enrichment of rare or low-frequency (MAF <5%) non-synonymous-coding variants of all the familial PD genes together as a single group in the early-onset PD patients and controls. Early-onset PD patients were significantly more likely to carry at least one (OR = 1.39; 95% CI = 1.04–1.86; P = 0.028) or at least two (OR = 2.65; 95% CI = 1.45–4.83; P = 9.43 × 10−4) variants in the familial PD genes compared with the controls. The majority of these carriers carried at least one LRRK2 mutation, and the enrichment was no longer statistically significant after excluding LRRK2 from the analysis (Supplementary Material, Table S3a). In the grouped analysis of rare non-synonymous-coding variants of the 33 GWAS-linked genes together, we also observed an excess of early-onset PD patients carrying one or more variants (OR = 1.58; 95% CI = 1.16–2.14; P = 3.31 × 10−3), two or more variants (OR = 1.38; 95% CI = 1.02–1.87; P = 0.039) or three or more variants (OR = 2.17 ; 95% CI = 1.37–3.44; P = 8.20 × 10−4) compared with controls. The enrichment remained marginally significant after excluding LRRK2 from the analysis (Supplementary Material, Table S3a), suggesting that a subset of these variants among this large group of genes may also influence PD risk. However, the larger selection of GWAS-linked genes and the larger number of variants or carriers analyzed may have influenced the statistics, and may also increase the background signal through the inclusion of variants from genes with no effect on PD. Given this potential bias and the borderline significant P-values, the results must be interpreted with caution and studies in much larger samples will be required to validate this observation. Finally, similar observations were made when analyzing only rare non-synonymous-coding variants with MAF <1% in the controls or those predicted to be damaging by in silico functional prediction tools in these groups of genes (Supplementary Material, Table S3b).

In addition to the sequencing analysis of the 39 PD disease genes in the early-onset PD samples, we also analyzed the non-synonymous-coding variants of the same 39 genes in the late-onset samples by genotyping a total of 782 non-synonymous-coding variants in 710 late-onset (>55 years) PD patients and 9046 population controls. These include 45.4% of all the missense variants and 85% of the common and low-frequency variants identified by the sequencing analysis in the early-onset samples. Two hundred and ninety-four variants (of 782 genotyped) were polymorphic and passed quality control filtering, of which 35 were common (MAF >5%) or low-frequency (MAF 1–5%) and 259 were rare (MAF <1%). We first tested each single variant with MAF >1% for association with PD (Supplementary Material, Table S4). None of the 35 variants reached statistical significance after correcting for multiple testing (α = 0.0014), although moderate enrichment was observed at LRRK2 Gly2385Arg (OR = 1.48; 95% CI = 1.08–2.01; P = 0.016), consistent with previous reports (4). In the gene-based analysis comparing the number of carriers of rare or low-frequency (MAF <5%) non-synonymous-coding variants among cases and controls within each gene, evidence of enrichment was observed only at LRRK2 (OR = 1.44, P = 0.0041) (Supplementary Material, Table S5), although this did not remain significant after Bonferroni correction for multiple testing (αcorrected = 0.001). Furthermore, we did not see any evidence for the overall enrichment of non-synonymous-coding variants for either familial or GWAS-linked genes together as a single group in the late-onset samples. Similar results were obtained using the SKAT-O test (Supplementary Material, Table S5) (20).

Next, we evaluated the enrichment of non-synonymous-coding variants in the combined early-onset and late-onset PD samples. As expected, LRRK2 showed significant enrichment (OR = 1.58; 95% CI = 1.29–1.93; P = 8.05 × 10−6; Phet = 0.44) (Table 2), confirming the role of LRRK2 coding variants in influencing both early- and late-onset PD risk. In addition, enrichment was also observed for FGF20 (P = 6.79 × 10−3; OR = 2.21; 95% CI = 1.22–3.99), MCCC1 (P = 0.013; OR = 2.58; 95% CI = 1.23–5.41), GBA (P = 0.018; OR = 2.51; 95% CI = 1.15–5.45) and ITGA8 (NM_003638) (P = 0.024; OR = 1.43; 95% CI = 1.05–1.94), but these did not survive the Bonferroni correction for multiple testing.

Table 2.

Gene-based mutation load test results in a combined analysis of the early-onset (375 cases and 399 controls) and late-onset (710 cases and 9046 controls) PD samples

RefSeq ID Gene Discovery CMH early- and late-onset PD
 
BD 
PCMH Odds ratio L95 U95 
NM_198578 LRRK2 Familial + GWAS 8.05E − 06 1.58 1.29 1.93 0.44 
NM_019851 FGF20 GWAS 6.79E − 03 2.21 1.22 3.99 0.82 
NM_020166 MCCC1 GWAS 0.013 2.58 1.23 5.41 0.72 
NM_000157 GBA Familial + GWAS 0.018 2.51 1.15 5.45 0.52 
NM_003638 ITGA8 GWAS 0.024 1.43 1.05 1.94 0.99 
NM_152491 PM20D1 GWAS 0.032 0.51 0.28 0.94 0.59 
NM_013233 STK39 GWAS 0.060 0.16 0.02 1.39 0.68 
NM_000345 SNCA Familial + GWAS 0.065 1.91 0.97 3.77 0.22 
NM_003959 HIP1R GWAS 0.13 0.75 0.52 1.09 0.99 
NM_004176 SREBF1 GWAS 0.17 1.26 0.91 1.73 0.78 
NM_032409 PINK1 Familial 0.19 0.70 0.41 1.21 0.86 
NM_138326 ACMSD GWAS 0.25 1.22 0.87 1.71 0.50 
NM_018206 VPS35 Familial (novel) 0.31 1.72 0.64 4.59 0.25 
NM_030665 RAI1 GWAS 0.38 0.84 0.57 1.25 0.14 
NM_003929 RAB7L1 GWAS 0.41 0.37 0.04 3.83 0.54 
NM_003943 STBD1 GWAS 0.48 0.71 0.28 1.82 0.74 
NM_152280 SYT11 GWAS 0.54 2.06 0.19 22.48 0.37 
NM_004562 PARK2 Familial 0.62 1.13 0.70 1.84 0.60 
NM_007262 PARK7 Familial 0.66 0.80 0.28 2.32 0.36 
NM_173854 SLC41A1 GWAS 0.69 1.14 0.60 2.18 0.98 
NM_004953 EIF4G1 Familial (novel) 0.71 0.92 0.59 1.43 0.21 
NM_001347 DGKQ GWAS 0.71 0.92 0.61 1.39 0.05 
NM_020387 RAB25 GWAS 0.73 0.77 0.17 3.38 0.57 
NM_014798 PLEKHM1 GWAS 0.75 0.86 0.37 2.03 0.26 
NM_002510 GPNMB GWAS 0.78 0.95 0.64 1.39 0.62 
NM_004334 BST1 GWAS 0.80 0.97 0.74 1.26 0.75 
NM_005506 SCARB2 GWAS 0.84 0.96 0.66 1.41 0.73 
NM_014398 LAMP3 GWAS 0.85 1.05 0.65 1.69 0.52 
NM_022089 ATP13A2 Familial 0.86 0.97 0.65 1.43 0.47 
NM_005910 MAPT Familial + GWAS 0.93 0.99 0.72 1.35 0.83 
NM_201435 CCDC62 GWAS 0.94 1.02 0.56 1.88 0.99 
NM_005255 GAK GWAS 0.97 1.00 0.78 1.27 0.97 
NM_015443 KANSL1 GWAS 0.97 1.00 0.77 1.32 0.19 
NM_033102 SLC45A3 GWAS 0.98 1.02 0.32 3.19 0.15 
NM_052885 SLC2A13 GWAS 0.98 0.99 0.54 1.82 0.63 
NM_002930 RIT2 GWAS 0.78 
RefSeq ID Gene Discovery CMH early- and late-onset PD
 
BD 
PCMH Odds ratio L95 U95 
NM_198578 LRRK2 Familial + GWAS 8.05E − 06 1.58 1.29 1.93 0.44 
NM_019851 FGF20 GWAS 6.79E − 03 2.21 1.22 3.99 0.82 
NM_020166 MCCC1 GWAS 0.013 2.58 1.23 5.41 0.72 
NM_000157 GBA Familial + GWAS 0.018 2.51 1.15 5.45 0.52 
NM_003638 ITGA8 GWAS 0.024 1.43 1.05 1.94 0.99 
NM_152491 PM20D1 GWAS 0.032 0.51 0.28 0.94 0.59 
NM_013233 STK39 GWAS 0.060 0.16 0.02 1.39 0.68 
NM_000345 SNCA Familial + GWAS 0.065 1.91 0.97 3.77 0.22 
NM_003959 HIP1R GWAS 0.13 0.75 0.52 1.09 0.99 
NM_004176 SREBF1 GWAS 0.17 1.26 0.91 1.73 0.78 
NM_032409 PINK1 Familial 0.19 0.70 0.41 1.21 0.86 
NM_138326 ACMSD GWAS 0.25 1.22 0.87 1.71 0.50 
NM_018206 VPS35 Familial (novel) 0.31 1.72 0.64 4.59 0.25 
NM_030665 RAI1 GWAS 0.38 0.84 0.57 1.25 0.14 
NM_003929 RAB7L1 GWAS 0.41 0.37 0.04 3.83 0.54 
NM_003943 STBD1 GWAS 0.48 0.71 0.28 1.82 0.74 
NM_152280 SYT11 GWAS 0.54 2.06 0.19 22.48 0.37 
NM_004562 PARK2 Familial 0.62 1.13 0.70 1.84 0.60 
NM_007262 PARK7 Familial 0.66 0.80 0.28 2.32 0.36 
NM_173854 SLC41A1 GWAS 0.69 1.14 0.60 2.18 0.98 
NM_004953 EIF4G1 Familial (novel) 0.71 0.92 0.59 1.43 0.21 
NM_001347 DGKQ GWAS 0.71 0.92 0.61 1.39 0.05 
NM_020387 RAB25 GWAS 0.73 0.77 0.17 3.38 0.57 
NM_014798 PLEKHM1 GWAS 0.75 0.86 0.37 2.03 0.26 
NM_002510 GPNMB GWAS 0.78 0.95 0.64 1.39 0.62 
NM_004334 BST1 GWAS 0.80 0.97 0.74 1.26 0.75 
NM_005506 SCARB2 GWAS 0.84 0.96 0.66 1.41 0.73 
NM_014398 LAMP3 GWAS 0.85 1.05 0.65 1.69 0.52 
NM_022089 ATP13A2 Familial 0.86 0.97 0.65 1.43 0.47 
NM_005910 MAPT Familial + GWAS 0.93 0.99 0.72 1.35 0.83 
NM_201435 CCDC62 GWAS 0.94 1.02 0.56 1.88 0.99 
NM_005255 GAK GWAS 0.97 1.00 0.78 1.27 0.97 
NM_015443 KANSL1 GWAS 0.97 1.00 0.77 1.32 0.19 
NM_033102 SLC45A3 GWAS 0.98 1.02 0.32 3.19 0.15 
NM_052885 SLC2A13 GWAS 0.98 0.99 0.54 1.82 0.63 
NM_002930 RIT2 GWAS 0.78 

CMH: Cochran–Mantel–Haenszel; BD: Breslow–Day test.

P-value across early- and late-onset PD, conducted independently of the CMH test. Not shown are genes with no rare non-synonymous variants identified (NUCKS1 [NM_022731], NSF [NM_006178] and STX1B [NM_052874]).

Finally, we evaluated the distribution of coding variants most likely to cause loss of function of the encoded protein, namely non-sense, predicted splice-site (2 bp flanking exon–intron junctions) and frameshift-causing insertions and deletions, in the early- and late-onset sample collections and healthy controls. We identified a total of 36 loss-of-function mutations in 22 genes within our dataset, including those affecting autosomal dominant PD genes such as LRRK2 and GBA, autosomal recessive PD genes such as PINK1, PARK7 and ATP13A2, and GWAS-linked genes such as GPNMB (Supplementary Material, Table S6). All were rare or present at low frequencies (one with MAF ∼2%, others <1%) and half (18 of 36) were present in healthy controls. None of the cases or controls carried two or more of such variants within the familial genes as a group, while only two early-onset cases (one SLC2A13 homozygote and one LRRK2/ACMSD transheterozygote), one late-onset case (GAK homozygote) and eight controls (ACMSD/GAK/LRRK2 homozygotes or transheterozygotes) carried two or more of such variants within the GWAS-linked genes (Supplementary Material, Table S6). However, most of the variants found in homozygotes or transheterozygotes are exonic variants located close to the exon–intron junctions (within 1–2 bp), and their effects on splicing will have to be further verified.

DISCUSSION

This is the first study where the coding sequences of all the 39 known PD-related disease genes were comprehensively analyzed in a large number of PD samples using next-generation sequencing and genotyping analyses. We have demonstrated that rare non-synonymous variants of LRRK2 showed significant enrichment in both early- and late-onset PD patients, confirming their influence on the risk of PD. However, no significant enrichment was found for the other PD disease genes identified so far by linkage and GWAS analyses, although moderate enrichment was observed at a few other genes, such as GBA, MCCC1 and FGF20. Overall, no significant differences were observed between Chinese and Koreans across all the gene-based and grouped analyses we performed (Breslow–Day test, Phet > 0.05). Non-synonymous-coding variants of known PD disease genes appear to make limited or moderate contribution to PD risk. Consistent with our findings, Hunt et al. (21) reported that rare coding variants at 25 genes within known loci for autoimmune diseases had negligible impact on common autoimmune disease susceptibility in a large-scale sequencing and genotyping analysis of 41 911 subjects. Despite a larger sample size, there was still no evidence that non-synonymous variants within known genes could account for the observed GWAS signals, or that rare non-synonymous variants within genes linked to these loci influenced disease risk. Our study also suggested that common (MAF >5%) or low-frequency (1–5%) non-synonymous-coding variants of susceptibility genes within GWAS loci are unlikely to account for the observed association signals within those loci (through direct tagging or synthetic associations), and these association signals are more likely due to non-coding mutations of regulatory elements (16,17), including untranslated regions and intronic or intergenic transcription factor-binding sites, which are yet to be revealed through fine-mapping analysis in larger clinical samples.

With our current sample size, we estimate we had 80% power to identify a gene with OR = 1.3 if rare risk variants are present cumulatively in 10% of the population, OR = 1.9 if rare variants are present in 1% and OR = 3.75 if rare variants are present in 0.1% (18). Our study has mostly confirmed that none of these linkage or GWAS genes had effects and frequency of variants comparable with that of LRRK2, but we have limited power to detect those with smaller effects and/or lower cumulative frequency of rare risk variants in the population. Based on our data, we estimate that the fraction of variation in risk explained by rare coding variation in LRRK2 is ∼1% in early-onset PD and 0.6% in late-onset PD and is expected to be much lower in other individual genes. We also estimated in a stepwise regression model (with PD risk encoded as 0 and 1) entering all rare coding variation in these 35 genes that, after LRRK2 variants G2385R and R1628P are forced into the model, the remaining rare variants explain not more than another ∼1% of PD risk in our dataset (total 2% in early-onset PD and 1.4% in late-onset PD).

It is, however, interesting to note that our study revealed evidence for early-onset patients to carry two or more non-synonymous-coding variants of the familial and GWAS PD disease genes as a group, but such enrichment was not observed in the late-onset patients. Caution is needed when comparing the early- and late-onset samples, since there are additional rare variants in the late-onset samples that will be missed because they were genotyped at specific positions rather than sequenced. Nevertheless, our existing data may suggest that early-onset PD patients suffer a higher ‘mutation load’ than late-onset PD patients, which is consistent with the reported stronger genetic risk for early-onset compared with late-onset PD (4). It is also interesting to note that non-synonymous mutations, particularly loss-of-function variants of known pathogenic autosomal dominant disease genes, such as LRRK2 and SNCA, were identified in healthy controls, and some healthy controls were found to carry more than one mutation in these genes (Supplementary Material, Tables S2 and S3). These variants are likely to cause disease in compound heterozygote or homozygous form, or when inherited with risk variants in other genes. It is likely that compensatory or modifier mutations (within these genes and others yet to be identified) modulate the penetrance of deleterious variants in these known pathogenic PD genes. This will confound risk prediction based solely on currently known disease genes.

In summary, we have shown that, apart from LRRK2, non-synonymous variants in other known PD-related disease genes make only a small contribution to PD risk and thus provide limited information for risk prediction and molecular diagnosis, at least in the East Asian population. Nonetheless, our study identified a few genes that show modest association with deleterious variants and their biologic relevance should be further investigated. We anticipate that the investigation of coding variants, even the rare ones by burden tests, can still implicate novel disease susceptibility genes and thus provide new biologic insights into the pathogenesis of PD. To discover novel genes and coding variants influencing PD risk by whole-exome or genome sequencing, large numbers of cases and controls will be needed.

The limited contribution of rare non-synonymous variants revealed by our study could also be due to the fact that only a fraction of the non-synonymous-coding variants studied here are pathogenic. We cannot be certain which mutations are truly pathogenic without biochemical and clinical data. Without a sensitive and specific filter, the inclusion of non-pathogenic variants in the association test may mask the effects of the true pathogenic ones. New and accurate prediction algorithms and/or biochemical assays will therefore be needed to distinguish deleterious from neutral non-synonymous-coding variants (22). Nonetheless, we have examined mutations that are predicted to affect splicing or cause premature termination of the protein and thus have high likelihood of loss of function by affecting expression of the full length protein. These deleterious mutations do not show obvious enrichment in cases compared with controls. We observed overall enrichment of rare non-synonymous variants in LRRK2 in the cases compared with the controls, even after removal of known risk variants; we therefore expect that this approach, whether applied genome-wide or to well-selected candidates, may help to identify additional potential pathogenic genes with such enrichment. However, this may depend on other variables such as the total frequency of non-synonymous-coding variants and the fraction of these variants that are pathogenic. Intriguingly, rare non-synonymous-coding variants of the familial and GWAS genes appear to have an additive risk effect, particularly for early-onset PD, where significantly more patients carried two or more variants in the combined set of familial or GWAS genes compared with controls. An integrated analysis of known and newly-discovered genes based on fully annotated genomic data will be needed for medical sequencing to be truly informative in a clinical setting. Eventually, this will also further our understanding of complex diseases and help better stratify at-risk subjects for preventive drug trials.

MATERIALS AND METHODS

Selection of genes

We identified a total of 39 genes (Table 2) that have previously been identified by linkage analysis and GWAS to influence PD risk, based on literature searches, the Online Mendelian Inheritance in Man (OMIM) database and National Human Genome Research Institute (NHGRI)'s Catalog of Published GWAS. The genes included 8 functionally validated familial genes and 33 reported susceptibility genes within the 21 GWAS loci that reached genome-wide significance (P < 5 × 10−8) in at least one of eight major GWAS and meta-GWAS (5–12). Four genes (SNCA, MAPT, LRRK2 and GBA) overlap between the two groups. We also examined two recently reported familial genes EIF4G1 and VPS35 whose pathogenic roles have not been fully functionally validated (13–15).

Study samples

We analyzed 10 506 East Asian subjects from Singapore, Malaysia, Hong Kong and Korea. The ‘early-onset samples’ comprised of 375 early-onset (<55 years) PD patients (195 Chinese: age at sample collection: 54.1 ± 7.4 years, age of onset 46.0 ± 6.2 years; 65.9% males, 180 Koreans: age at sample collection: 52.1 ± 8.4 years, age of onset 41.4 ± 4.7 years; 47.8% males) and 399 elderly controls that have been verified to be free of neurologic diseases (219 Chinese: age at sample collection: 70.6 ± 7.8 years, 62.6% males, 180 Koreans: age at sample collection: 75.6 ± 3.5 years, 64.4% males) that matched the patients in term of gender distribution and geographic origin. The ‘late-onset’ samples comprised an independent collection of 710 late-onset (>55 years) ethnic Chinese PD cases (age at sample collection: 69.4 ± 9.2 years, age of onset 65.3 ± 9.3 years; 56.1% males) and 9046 population controls (age at sample collection: 50.5 ± 13.2 years, 55.9% males) from Singapore, verified to be genetically matched to the cases by principal components analysis (Supplementary Material, Fig. S1). This study was approved by the ethics committees or institutional review boards of the respective institutions.

Sample sequencing

To sequence all the exons of 39 PD-related disease genes, we performed targeted enrichment and sequencing of the 774 samples (375 early-onset PD patients and 399 controls) by sequence capture using the Agilent SureSelect All-Exon v1 kit (18 cases and 17 controls), Nimblegen SeqCap EZ Exome v2 (81 cases and 82 controls) or Nimblegen SeqCap EZ Choice <7 Mb (276 cases and 300 controls). Captured DNA libraries were sequenced using the Illumina GA II or HiSeq 2000 instrument using 2 × 76 or 2 × 101 bp read lengths. To control for batch effects caused by using different platforms, we took the following measures in the downstream analysis: (i) ensuring equal resequencing of cases and controls in each batch to guard against ascertainment bias; (ii) excluding variant calls at positions with <5× coverage across >90% of all samples; (iii) only analyzing genes that are well represented by probes and sequence reads in all the different arrays and batches of samples; (iv) using a standardized method for read alignment to the Hg19 reference genome, SNP/indel calling, filtering and annotation across all samples. Across all the sequenced samples, a high depth of coverage was obtained at the coding exons of the 39 genes analyzed in this study, with mean depth of coverage of 75× and an average 94.0% of the target bases (87.3 kb) covered by 10 or more reads in each sample. The coverage is similar between the cases and controls (e.g. average 94.2 versus 93.6%, respectively, covered by 10 or more reads in each sample), hence we do not expect coverage differences to introduce major biases in the analysis.

Reads were aligned to the reference genome (Hg19/Build37) using BWA (23). After removal of PCR duplicates (Picard), we performed realignment and recalibration followed by multi-sample SNP and indel calling and filtering with the Genome Analysis ToolKit (GATK) Unified Genotyper in accordance with ‘Best Practices for variant calling v3′’ recommended by GATK (Broad Institute) (24). Genotypes at positions covered by less than five reads or with quality scores <20 were set to missing; we only analyzed positions with genotype calls in >90% of the samples. Samples with excess number of variant calls indicating contamination were excluded from the analysis. Selected variants were visually inspected for proper read alignment in their respective samples using the Integrative Genomics Viewer (IGV) tool (25).

Sample genotyping

Genotyping of the 9756 samples of the late-onset cohort was performed using the Illumina Infinium HumanExome BeadChip (v1.0) Exome_Asian array, with customized add-on content enriched for rare, recurrent (found in >1 sample) variants detected by whole-exome sequencing of a subset of the early-onset PD samples (99 cases and 99 controls from Singapore). We performed standard QC filtering (99% call rate, Hardy–Weinberg P> 10−3) across all genotyped variants. To ensure that there was no bias resulting from differences in the sequencing and genotyping platforms, we checked genotype concordance of 117 variants in 103 samples with early-onset PD that were also genotyped on the exome array together with the samples from the late-onset PD dataset. About 100% concordance was observed between sample genotypes obtained by sequencing and genotyping at all but two of the rare variant positions (each with only one discordant genotype).

Statistical analyses

Single-variant analyses were performed using two-tailed Fisher's exact tests or logistic regression tests (PLINK) (26) and common non-synonymous variants with MAF >1% in the late-onset samples (710 cases and 9046 controls) and >5% in the early-onset samples (375 cases and 399 controls), since statistical power to detect variants with OR of 2 below these frequencies with the current sample size was limited (<60% at α = 0.001) (18). For gene-based tests, we focused our analysis on rare non-synonymous-coding variants (missense, splice, non-sense and frameshift) present in <5% of 1000 genomes and HapMap Europeans and Asians and <5% in our 399 controls of the early-onset cohort. Despite good sequencing coverage, no rare variants were found in four genes. In each of the remaining 35 genes, we compared the number of carriers of such variants between cases and controls using two-tailed Fisher's exact tests (within each sample collection) or Cochran–Mantel–Haenszel (CMH) tests for joint analysis across Chinese and Korean samples, and also across the early- and late-onset sample collections. For the CMH tests, we analyzed each dataset separately using all available variants that passed quality control and combined the statistics across independent datasets. We focused our analysis on the identification of genes where rare variants had a combined risk effect (i.e. enriched in cases compared with controls). We also assessed heterogeneity of effect sizes across the sample collections using the Breslow–Day test. In the gene-based tests, we estimate that we had 56 and 100% power (α = 0.001) to detect genes with rare variants in 5% of samples and minimum overall OR = 2 in the sequenced and genotyped samples, respectively, and 72 and 100% power to detect genes with rare variants in 2% of samples and minimum overall OR = 3 in the sequenced and genotyped samples, respectively (18). SKAT-O tests were performed using R scripts from Lee et al. (20). Statistical analyses were performed using IBM SPSS Statistics 20. We used the Condel tool (19) for functional prediction of missense variants, combining scores from PolyPhen (27), SIFT (28), PhyloP (29) and MutationTaster (30) to calculate a weighted score, and classifying variants as deleterious if they had a score of >0.605 (cut-off determined based on the training dataset provided) (19).

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

This work was supported by the Agency for Science, Technology and Research (A*STAR), Singapore, the National Medical Research Council, Duke-NUS Graduate Medical School and the Singapore Millennium Foundation.

ACKNOWLEDGEMENTS

The collection and handling of samples from Malaysian patients were supported by the Malaysian Ministry of Higher Education grant for High-Impact Research (HIR) E000033.

Conflict of Interest statement. The authors declare no conflicts of interest.

REFERENCES

1
Roberts
N.J.
Vogelstein
J.T.
Parmigiani
G.
Kinzler
K.W.
Vogelstein
B.
Velculescu
V.E.
The predictive capacity of personal genome sequencing
Sci. Transl. Med.
 , 
2012
, vol. 
4
 pg. 
133ra58
 
2
Seddon
J.M.
Yu
Y.
Miller
E.C.
Reynolds
R.
Tan
P.L.
Gowrisankar
S.
Goldstein
J.I.
Triebwasser
M.
Anderson
H.E.
Zerbib
J.
, et al.  . 
Rare variants in CFI, C3 and C9 are associated with high risk of advanced age-related macular degeneration
Nat. Genet.
 , 
2013
, vol. 
45
 (pg. 
1366
-
1370
)
3
Guerreiro
R.
Wojtas
A.
Bras
J.
Carrasquillo
M.
Rogaeva
E.
Majounie
E.
Cruchaga
C.
Sassi
C.
Kauwe
J.S.
Younkin
S.
, et al.  . 
TREM2 variants in Alzheimer's disease
N. Engl. J. Med.
 , 
2012
, vol. 
368
 (pg. 
117
-
127
)
4
Shulman
J.M.
De Jager
P.L.
Feany
M.B.
Parkinson's disease: genetics and pathogenesis
Annu. Rev. Pathol.
 , 
2010
, vol. 
6
 (pg. 
193
-
222
)
5
Satake
W.
Nakabayashi
Y.
Mizuta
I.
Hirota
Y.
Ito
C.
Kubo
M.
Kawaguchi
T.
Tsunoda
T.
Watanabe
M.
Takeda
A.
, et al.  . 
Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease
Nat. Genet.
 , 
2009
, vol. 
41
 (pg. 
1303
-
1307
)
6
Simon-Sanchez
J.
Schulte
C.
Bras
J.M.
Sharma
M.
Gibbs
J.R.
Berg
D.
Paisan-Ruiz
C.
Lichtner
P.
Scholz
S.W.
Hernandez
D.G.
, et al.  . 
Genome-wide association study reveals genetic risk underlying Parkinson's disease
Nat. Genet.
 , 
2009
, vol. 
41
 (pg. 
1308
-
1312
)
7
Do
C.B.
Tung
J.Y.
Dorfman
E.
Kiefer
A.K.
Drabant
E.M.
Francke
U.
Mountain
J.L.
Goldman
S.M.
Tanner
C.M.
Langston
J.W.
, et al.  . 
Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease
PLoS Genet.
 , 
2011
, vol. 
7
 pg. 
e1002141
 
8
Hamza
T.H.
Zabetian
C.P.
Tenesa
A.
Laederach
A.
Montimurro
J.
Yearout
D.
Kay
D.M.
Doheny
K.F.
Paschall
J.
Pugh
E.
, et al.  . 
Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson's disease
Nat. Genet.
 , 
2011
, vol. 
42
 (pg. 
781
-
785
)
9
Nalls
M.A.
Plagnol
V.
Hernandez
D.G.
Sharma
M.
Sheerin
U.M.
Saad
M.
Simon-Sanchez
J.
Schulte
C.
Lesage
S.
Sveinbjornsdottir
S.
, et al.  . 
Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies
Lancet
 , 
2011
, vol. 
377
 (pg. 
641
-
649
)
10
Pankratz
N.
Beecham
G.W.
DeStefano
A.L.
Dawson
T.M.
Doheny
K.F.
Factor
S.A.
Hamza
T.H.
Hung
A.Y.
Hyman
B.T.
Ivinson
A.J.
, et al.  . 
Meta-analysis of Parkinson's disease: identification of a novel locus, RIT2
Ann. Neurol.
 , 
2012
, vol. 
71
 (pg. 
370
-
384
)
11
International Parkinson's Disease Genomics Consortium (IPDGC); Wellcome Trust Case Control Consortium 2 (WTCCC2)
A two-stage meta-analysis identifies several new loci for Parkinson's disease
PLoS Genet.
 , 
2011
, vol. 
7
 pg. 
e1002142
 
12
Lill
C.M.
Roehr
J.T.
McQueen
M.B.
Kavvoura
F.K.
Bagade
S.
Schjeide
B.M.
Schjeide
L.M.
Meissner
E.
Zauft
U.
Allen
N.C.
, et al.  . 
Comprehensive research synopsis and systematic meta-analyses in Parkinson's disease genetics: The PDGene database
PLoS Genet.
 , 
2012
, vol. 
8
 pg. 
e1002548
 
13
Chartier-Harlin
M.C.
Dachsel
J.C.
Vilarino-Guell
C.
Lincoln
S.J.
Lepretre
F.
Hulihan
M.M.
Kachergus
J.
Milnerwood
A.J.
Tapia
L.
Song
M.S.
, et al.  . 
Translation initiator EIF4G1 mutations in familial Parkinson disease
Am. J. Hum. Genet.
 , 
2011
, vol. 
89
 (pg. 
398
-
406
)
14
Zimprich
A.
Benet-Pages
A.
Struhal
W.
Graf
E.
Eck
S.H.
Offman
M.N.
Haubenberger
D.
Spielberger
S.
Schulte
E.C.
Lichtner
P.
, et al.  . 
A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease
Am. J. Hum. Genet.
 , 
2011
, vol. 
89
 (pg. 
168
-
175
)
15
Vilarino-Guell
C.
Wider
C.
Ross
O.A.
Dachsel
J.C.
Kachergus
J.M.
Lincoln
S.J.
Soto-Ortolaza
A.I.
Cobb
S.A.
Wilhoite
G.J.
Bacon
J.A.
, et al.  . 
VPS35 mutations in Parkinson disease
Am. J. Hum. Genet.
 , 
2011
, vol. 
89
 (pg. 
162
-
167
)
16
Schaub
M.A.
Boyle
A.P.
Kundaje
A.
Batzoglou
S.
Snyder
M.
Linking disease associations with regulatory information in the human genome
Genome Res.
 , 
2012
, vol. 
22
 (pg. 
1748
-
1759
)
17
Maurano
M.T.
Humbert
R.
Rynes
E.
Thurman
R.E.
Haugen
E.
Wang
H.
Reynolds
A.P.
Sandstrom
R.
Qu
H.
Brody
J.
, et al.  . 
Systematic localization of common disease-associated variation in regulatory DNA
Science
 , 
2012
, vol. 
337
 (pg. 
1190
-
1195
)
18
Purcell
S.
Cherny
S.S.
Sham
P.C.
Genetic power calculator: design of linkage and association genetic mapping studies of complex traits
Bioinformatics
 , 
2003
, vol. 
19
 (pg. 
149
-
150
)
19
Gonzalez-Perez
A.
Lopez-Bigas
N.
Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel
Am. J. Hum. Genet.
 , 
2011
, vol. 
88
 (pg. 
440
-
449
)
20
Lee
S.
Emond
M.J.
Bamshad
M.J.
Barnes
K.C.
Rieder
M.J.
Nickerson
D.A.
Christiani
D.C.
Wurfel
M.M.
Lin
X.
NHLBI GO Exome Sequencing Project—ESP Lung Project Team
Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies
Am. J. Hum. Genet.
 , 
2012
, vol. 
91
 (pg. 
224
-
237
)
21
Hunt
K.A.
Mistry
V.
Bockett
N.A.
Ahmad
T.
Ban
M.
Barker
J.N.
Barrett
J.C.
Blackburn
H.
Brand
O.
Burren
O.
, et al.  . 
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability
Nature
 , 
2013
, vol. 
498
 (pg. 
232
-
235
)
22
Witt
H.
Beer
S.
Rosendahl
J.
Chen
J.M.
Chandak
G.R.
Masamune
A.
Bence
M.
Szmola
R.
Oracz
G.
Macek
M.
Jr.
, et al.  . 
Variants in CPA1 are strongly associated with early onset chronic pancreatitis
Nat. Genet.
 , 
2013
, vol. 
45
 (pg. 
1216
-
1220
)
23
Li
H.
Durbin
R.
Fast and accurate short read alignment with Burrows-Wheeler transform
Bioinformatics
 , 
2009
, vol. 
25
 (pg. 
1754
-
1760
)
24
McKenna
A.
Hanna
M.
Banks
E.
Sivachenko
A.
Cibulskis
K.
Kernytsky
A.
Garimella
K.
Altshuler
D.
Gabriel
S.
Daly
M.
, et al.  . 
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Genome Res.
 , 
2010
, vol. 
20
 (pg. 
1297
-
1303
)
25
Robinson
J.T.
Thorvaldsdottir
H.
Winckler
W.
Guttman
M.
Lander
E.S.
Getz
G.
Mesirov
J.P.
Integrative genomics viewer
Nat. Biotechnol.
 , 
2011
, vol. 
29
 (pg. 
24
-
26
)
26
Purcell
S.
Neale
B.
Todd-Brown
K.
Thomas
L.
Ferreira
M.A.
Bender
D.
Maller
J.
Sklar
P.
de Bakker
P.I.
Daly
M.J.
, et al.  . 
PLINK: a tool set for whole-genome association and population-based linkage analyses
Am. J. Hum. Genet.
 , 
2007
, vol. 
81
 (pg. 
559
-
575
)
27
Adzhubei
I.A.
Schmidt
S.
Peshkin
L.
Ramensky
V.E.
Gerasimova
A.
Bork
P.
Kondrashov
A.S.
Sunyaev
S.R.
A method and server for predicting damaging missense mutations
Nat. Methods
 , 
2010
, vol. 
7
 (pg. 
248
-
249
)
28
Ng
P.C.
Henikoff
S.
SIFT: predicting amino acid changes that affect protein function
Nucleic Acids Res.
 , 
2003
, vol. 
31
 (pg. 
3812
-
3814
)
29
Cooper
G.M.
Stone
E.A.
Asimenos
G.
Green
E.D.
Batzoglou
S.
Sidow
A.
Distribution and intensity of constraint in mammalian genomic sequence
Genome Res.
 , 
2005
, vol. 
15
 (pg. 
901
-
913
)
30
Schwarz
J.M.
Rodelsperger
C.
Schuelke
M.
Seelow
D.
MutationTaster evaluates disease-causing potential of sequence alterations
Nat. Methods
 , 
2010
, vol. 
7
 (pg. 
575
-
576
)

Author notes

These senior authors contributed equally to this work.

Supplementary data