Cervical cancer is caused by infection with human papillomavirus (HPV). A genome-wide association study (GWAS) has identified several susceptibility loci for cervical cancer, but they explain only a small fraction of cervical cancer heritability. Other variants with weaker effect may be missed due to the stringent significance threshold. To identify important pathways in cervical carcinogenesis, we performed a two-stage pathway analysis in two independent GWASs in the Swedish population, using the single-nucleotide polymorphism (SNP) ratio test. The 565 predefined pathways from Kyoto Encyclopedia of Genes and Genomes and BioCarta databases were systematically evaluated in the discovery stage (1034 cases and 3948 controls with 632 668 SNPs) and the suggestive pathways were further validated in the replication stage (616 cases and 506 controls with 341 358 SNPs). We found 12 pathways that were significant in both stages, and these were further validated using set-based analysis. For 10 of these pathways, the effect was mainly due to genetic variation within the major histocompatibility complex (MHC) region. In addition, we identified a set of novel candidate genes outside the MHC region in the pathways denoted ‘Staphylococcus aureus infection’ and ‘herpes simplex infection’ that influenced susceptibility to cervical cancer (empirical P = 4.99 × 10−5 and 4.99 × 10−5 in the discovery study; empirical P = 8.98 × 10−5 and 0.009 in the replication study, respectively). Staphylococcus aureus infection may evoke an inflammatory response that inadvertently enhances malignant progression caused by HPV infection, and Herpes simplex virus-2 infection may act in conjunction with HPV infection to increase the risk of cervical carcinoma development. These findings provide new insights into the etiology of cervical cancer.

INTRODUCTION

Worldwide, cervical cancer is the third most common cancer and the second most frequent cause of cancer deaths among women (1). Infection with oncogenic types of human papillomavirus (HPV) is a necessary, but not sufficient cause of cervical cancer and its precursor lesions, cervical intraepithelial neoplasia (CIN) where CIN III is considered the same as carcinoma in situ (CIS) or Stage 0 cervical cancer (2). Host genetic factors play an important role for both the persistence of infection and progression to cancer (3). We have recently performed the first genome-wide association study (GWAS) of cervical cancer in a Swedish population and identified three independent susceptibility loci within the major histocompatibility complex (MHC) region at 6p21.3, in addition to the known associations with classic human leukocyte antigen (HLA) alleles/haplotypes including HLA-B*0702, DRB1*1501-DQB1*0602 and DRB1*1301-DQA1*0103-DQB1*0603 (4). However, these loci together only explain 3.4% of phenotype variance of cervical cancer.

Certain limitations should be noted in the single-single-nucleotide polymorphism (SNP) analysis applied in the GWAS. First, causal variants with moderate effects are likely to be missed given the stringent multiple testing correction (5). Secondly, the clinical manifestations of complex diseases usually arise from the interplay of multiple genetic and environmental risk factors. Identifying a small number of significant genes in GWAS may therefore be insufficient to delineate the pathogenesis of complex diseases (6). It is increasingly recognized that the pathway-based analysis, which jointly considers multiple variants in interacting or functionally related genes, may complement single-SNP analysis for interpreting GWA data of complex diseases and is likely to yield new insights into disease etiology (79). Pathway analysis groups genes that are biologically related and tests whether these gene groups are associated with the outcome. Although the effects of variation at many genes may be too small to detect in single-SNP analysis, they are likely to be detected from the joint effect of many weaker signals at genes with a shared biological function. Pathway analyses have been applied to the GWASs of several complex diseases, and resulted in the identification of some novel disease-susceptibility pathways (713).

To gain a better understanding of the genetic mechanisms involved in the development of cervical cancer and to identify key pathways in cervical carcinogenesis, we performed a two-stage pathway analysis in two independent GWASs of cervical cancer in the Swedish population using the SNP ratio test (SRT) (14). These two GWASs have been reported previously (4,13). The SRT uses both significant and non-significant SNPs within a pathway to construct a ratio and compares this ratio with a distribution of ratios based on GWAS results using randomized phenotypes. In Stage 1, 565 predefined pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCarta databases were screened in the discovery GWAS including 1034 cervical cancer patients and 3948 cancer-free control subjects with 632 668 SNPs. In Stage 2, the pathways with adjusted empirical P-values of ≤0.05 after correcting for multiple testing were validated in the replication GWAS including 616 cervical cancer patients and 506 cancer-free control subjects with 341 358 SNPs. The pathways associated with cervical cancer that were identified by the SRT were further validated by the set-based analysis in PLINK (15).

RESULTS

Genetic information and prior biological information used in pathway analysis

Of the total 632 668 genotyped SNPs that passed quality control (QC) steps in the discovery GWAS, 91 703 SNPs were mapped to 5489 genes within the 268 KEGG pathways and 70 846 SNPs were mapped to 1042 genes within the 297 BioCarta pathways. The association tests for the original dataset resulted in 3736 nominally significant SNPs with P ≤ 0.005 and 32 148 nominally significant SNPs with P ≤ 0.05 after adjusting for the genomic inflation factor of 1.03.

Results of pathway analysis using SRT method

The Psnp-value used here refers to the threshold denoting whether the original single SNP is statistically significant. For the analysis in this study, to identify significant pathways in different disease models, Psnp was set to 0.005 and 0.05, respectively. Choosing a stringent threshold (Psnp ≤ 0.005) to define ‘associated’ SNPs, tests a hypothesis that only highly associated SNPs are enriched in a pathway. Alternatively, if a pathway becomes significant using a less stringent cutoff (Psnp ≤ 0.05), it reflects a role for more variants with smaller effect. If a pathway is significant under both thresholds, it suggests that this pathway may be enriched for variants of small and large effect (14).

As summarized in Table 1, under the SNP significance threshold of Psnp ≤ 0.005, a total of 32 pathways were statistically significant in the discovery GWAS after adjustment for multiple testing (adjusted empirical P ≤ 0.05). Among them, 11 pathways were successfully replicated in the replication GWAS, including hsa04145 (phagosome, empirical P = 0.022), hsa04612 (antigen processing and presentation, empirical P = 9.98 × 10−4), hsa04672 (intestinal immune network for IgA production, empirical P = 0.020), hsa04940 (type 1 diabetes mellitus, empirical P = 9.98 × 10−4), hsa05150 (Staphylococcus aureus infection, empirical P = 8.98 × 10−4), hsa05310 (asthma, empirical P = 8.62 × 10−4), hsa05320 (autoimmune thyroid disease, empirical P = 0.025), hsa05330 (allograft rejection, empirical P = 0.015), hsa05332 (graft-versus-host disease, empirical P = 0.016), mhcPathway (antigen processing and presentation, empirical P = 0.036) and eosinophilsPathway (the role of eosinophils in the chemokine network of allergy, empirical P = 0.046). Notably, both the pathways representing the biological process ‘antigen processing and presentation’ from the KEGG and BioCarta databases, i.e. hsa04612 and mhcPathway showed statistically significant associations with risk of cervical cancer, although the gene contents of the two pathways are not exactly the same, indicating the important role of antigen processing and presentation in cervical cancer pathogenesis.

Table 1.

Summary of significant pathways in the discovery and replication studies with threshold of Psnp ≤ 0.005 for denoting significant SNPs

Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs P ≤ 0.005a Pempb BHc Gene count SNP count No. of SNPs P ≤ 0.005a Pempb 
hsa04145 Phagosome 129 1556 59 4.99 × 10−5 0.002 99 689 13 0.022 
hsa04612 Antigen processing and presentation 58 449 55 4.99 × 10−5 0.002 38 133 9.98 × 10−4 
hsa04672 Intestinal immune network for IgA production 35 372 39 4.99 × 10−5 0.002 24 85 0.020 
hsa04940 Type 1 diabetes mellitus 33 601 45 4.99 × 10−5 0.002 26 207 11 9.98 × 10−4 
hsa05140 Leishmaniasis 55 664 41 4.99 × 10−5 0.002 41 282 0.127 
hsa05145 Toxoplasmosis 101 1494 46 4.99 × 10−5 0.002 81 737 0.431 
hsa05150 S. aureus infection 38 396 56 4.99 × 10−5 0.002 23 99 8.98 × 10−4 
hsa05164 Influenza A 146 1741 70 4.99 × 10−5 0.002 112 751 0.182 
hsa05168 Herpes simplex infection 154 1421 59 4.99 × 10−5 0.002 108 587 0.057 
hsa05169 Epstein–Barr virus infection 176 1980 64 4.99 × 10−5 0.002 125 878 0.548 
hsa05310 Asthma 20 197 39 4.99 × 10−5 0.002 14 45 8.62 × 10−4 
hsa05320 Autoimmune thyroid disease 37 390 43 4.99 × 10−5 0.002 25 105 0.025 
hsa05323 Rheumatoid arthritis 68 860 41 4.99 × 10−5 0.002 51 327 0.097 
hsa05330 Allograft rejection 26 279 43 4.99 × 10−5 0.002 20 68 0.015 
hsa05332 Graft-versus-host disease 31 281 43 4.99 × 10−5 0.002 21 68 0.016 
blymphocytePathway B lymphocyte cell-surface molecules 119 19 9.98 × 10−5 0.003 26 0.102 
hsa05152 Tuberculosis 149 1582 48 9.98 × 10−5 0.003 111 744 0.341 
hsa05416 Viral myocarditis 64 1725 50 9.98 × 10−5 0.003 51 734 0.132 
hsa05322 Systemic lupus erythematosus 88 844 43 1.50 × 10−4 0.004 43 372 0.180 
hsa05166 HTLV-I infection 219 2673 52 2.00 × 10−4 0.006 161 1168 0.593 
mhcPathway Antigen processing and presentation 40 18 2.49 × 10−4 0.006 0.036 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 48 18 2.49 × 10−4 0.006 12 0.046 
asbcellPathway Antigen-dependent B-cell activation 77 18 2.99 × 10−4 0.006 24 0.082 
bbcellPathway Bystander B-cell activation 53 18 2.99 × 10−4 0.006 19 0.062 
il5Pathway IL 5 signaling pathway 80 18 2.99 × 10−4 0.006 21 0.092 
inflamPathway Cytokines and inflammatory response 95 18 2.99 × 10−4 0.006 32 0.120 
tcraPathway Lck and Fyn tyrosine kinases in initiation of TCR activation 138 18 3.49 × 10−4 0.007 64 0.177 
CSKPathway Activation of Csk by cAMP-dependent protein kinase inhibits signaling through the T-cell receptor 17 224 18 4.99 × 10−4 0.009 12 67 0.247 
ctla4Pathway The co-stimulatory signal during T-cell activation 16 242 18 4.99 × 10−4 0.009 14 81 0.268 
th1th2Pathway Th1/Th2 differentiation 18 239 18 4.99 × 10−4 0.009 16 84 0.247 
hsa03040 Spliceosome 106 572 24 5.99 × 10−4 0.010 73 248 1.00 
hsa04514 Cell adhesion molecules 120 4297 59 0.002 0.044 98 2267 18 0.185 
Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs P ≤ 0.005a Pempb BHc Gene count SNP count No. of SNPs P ≤ 0.005a Pempb 
hsa04145 Phagosome 129 1556 59 4.99 × 10−5 0.002 99 689 13 0.022 
hsa04612 Antigen processing and presentation 58 449 55 4.99 × 10−5 0.002 38 133 9.98 × 10−4 
hsa04672 Intestinal immune network for IgA production 35 372 39 4.99 × 10−5 0.002 24 85 0.020 
hsa04940 Type 1 diabetes mellitus 33 601 45 4.99 × 10−5 0.002 26 207 11 9.98 × 10−4 
hsa05140 Leishmaniasis 55 664 41 4.99 × 10−5 0.002 41 282 0.127 
hsa05145 Toxoplasmosis 101 1494 46 4.99 × 10−5 0.002 81 737 0.431 
hsa05150 S. aureus infection 38 396 56 4.99 × 10−5 0.002 23 99 8.98 × 10−4 
hsa05164 Influenza A 146 1741 70 4.99 × 10−5 0.002 112 751 0.182 
hsa05168 Herpes simplex infection 154 1421 59 4.99 × 10−5 0.002 108 587 0.057 
hsa05169 Epstein–Barr virus infection 176 1980 64 4.99 × 10−5 0.002 125 878 0.548 
hsa05310 Asthma 20 197 39 4.99 × 10−5 0.002 14 45 8.62 × 10−4 
hsa05320 Autoimmune thyroid disease 37 390 43 4.99 × 10−5 0.002 25 105 0.025 
hsa05323 Rheumatoid arthritis 68 860 41 4.99 × 10−5 0.002 51 327 0.097 
hsa05330 Allograft rejection 26 279 43 4.99 × 10−5 0.002 20 68 0.015 
hsa05332 Graft-versus-host disease 31 281 43 4.99 × 10−5 0.002 21 68 0.016 
blymphocytePathway B lymphocyte cell-surface molecules 119 19 9.98 × 10−5 0.003 26 0.102 
hsa05152 Tuberculosis 149 1582 48 9.98 × 10−5 0.003 111 744 0.341 
hsa05416 Viral myocarditis 64 1725 50 9.98 × 10−5 0.003 51 734 0.132 
hsa05322 Systemic lupus erythematosus 88 844 43 1.50 × 10−4 0.004 43 372 0.180 
hsa05166 HTLV-I infection 219 2673 52 2.00 × 10−4 0.006 161 1168 0.593 
mhcPathway Antigen processing and presentation 40 18 2.49 × 10−4 0.006 0.036 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 48 18 2.49 × 10−4 0.006 12 0.046 
asbcellPathway Antigen-dependent B-cell activation 77 18 2.99 × 10−4 0.006 24 0.082 
bbcellPathway Bystander B-cell activation 53 18 2.99 × 10−4 0.006 19 0.062 
il5Pathway IL 5 signaling pathway 80 18 2.99 × 10−4 0.006 21 0.092 
inflamPathway Cytokines and inflammatory response 95 18 2.99 × 10−4 0.006 32 0.120 
tcraPathway Lck and Fyn tyrosine kinases in initiation of TCR activation 138 18 3.49 × 10−4 0.007 64 0.177 
CSKPathway Activation of Csk by cAMP-dependent protein kinase inhibits signaling through the T-cell receptor 17 224 18 4.99 × 10−4 0.009 12 67 0.247 
ctla4Pathway The co-stimulatory signal during T-cell activation 16 242 18 4.99 × 10−4 0.009 14 81 0.268 
th1th2Pathway Th1/Th2 differentiation 18 239 18 4.99 × 10−4 0.009 16 84 0.247 
hsa03040 Spliceosome 106 572 24 5.99 × 10−4 0.010 73 248 1.00 
hsa04514 Cell adhesion molecules 120 4297 59 0.002 0.044 98 2267 18 0.185 

aDerived from unconditional logistic regression assuming a log-additive model adjusted by PCs significantly associated with case–control status (P < 0.05) and further corrected for inflation factor when necessary.

bEmpirical P-value determined by the SRT based on 20 048 alternative phenotype simulations with cutoff of Psnp ≤ 0.005 for denoting significant SNPs.

cAdjusted empirical P-values were calculated using the Benjamini and Hochberg (55) step-up FDR controlling procedure (independent and positive regression-dependent test statistics) for multiple testing correction. The bold values correspond to the pathways that were statistically significant in both the discovery and replication GWASs.

Using a less stringent SNP significance threshold (Psnp ≤ 0.05), hsa05168 (herpes simplex infection) was found to be significant in both the discovery and replication studies (empirical P = 4.99 × 10−5 and 0.009, respectively), suggesting that more variants of smaller effect are enriched in this pathway. It is worth noting that pathways of hsa04612, hsa04940, hsa05150, hsa04145, mhcPathway and hsa05310 were consistently significant in both stages under the SNP significance threshold of both Psnp ≤ 0.005 and 0.05 (Table 2), suggesting that these pathways are enriched for variants of small and large effect. Alternatively, a small number of variants of larger effect are enriched in the pathways of hsa04672, hsa05320, hsa05330, hsa05332 and eosinophilsPathway, which were only significant under the SNP significance threshold of Psnp ≤ 0.005. In total, 12 significant pathways were identified under the SNP significance threshold of Psnp ≤ 0.005 and 0.05. All the 12 pathways are shown in Supplementary Material, Figs S1–S12. The representative SNPs associated with risk of cervical cancer at P ≤ 0.005 for each gene in the discovery and replication studies are shown in Supplementary Material, Tables S1 and S2, respectively.

Table 2.

Summary of significant pathways in the discovery and replication studies with threshold of Psnp ≤ 0.05 for denoting significant SNPs

Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs P ≤ 0.05 a Pempb BHc Gene count SNP count No. of SNPs P ≤ 0.05 a Pempb 
hsa04612 Antigen processing and presentation 58 449 104 4.99 × 10−5 0.007 38 133 15 0.035 
hsa05168 Herpes simplex infection 154 1421 171 4.99 × 10−5 0.007 108 587 58 0.009 
hsa05330 Allograft rejection 31 279 64 4.99 × 10−5 0.007 20 68 0.055 
hsa05332 Graft-versus-host disease 64 281 64 4.99 × 10−5 0.007 21 68 0.056 
hsa04940 Type 1 diabetes mellitus 33 601 88 9.98 × 10−5 0.008 26 207 29 0.005 
hsa05150 S. aureus infection 38 396 74 9.98 × 10−5 0.008 23 99 15 0.011 
hsa05169 Epstein–Barr virus infection 176 1980 176 9.98 × 10−5 0.008 125 878 44 0.505 
bbcellPathway Bystander B-cell activation 53 25 1.50 × 10−4 0.011 19 0.122 
hsa04145 Phagosome 129 1556 150 3.49 × 10−4 0.022 99 689 54 0.058 
mhcPathway Antigen processing and presentation 40 21 4.49 × 10−4 0.023 0.042 
asbcellPathway Antigen-dependent B-cell activation 77 26 4.99 × 10−4 0.023 24 0.162 
hsa05320 Autoimmune thyroid disease 37 390 65 4.99 × 10−4 0.023 25 105 10 0.113 
hsa04672 Intestinal immune network for IgA production 35 372 60 7.98 × 10−4 0.032 24 85 0.139 
hsa05310 Asthma 20 197 49 7.98 × 10−4 0.032 14 45 10 0.011 
hsa05152 Tuberculosis 149 1582 139 0.001 0.039 111 744 33 0.653 
Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs P ≤ 0.05 a Pempb BHc Gene count SNP count No. of SNPs P ≤ 0.05 a Pempb 
hsa04612 Antigen processing and presentation 58 449 104 4.99 × 10−5 0.007 38 133 15 0.035 
hsa05168 Herpes simplex infection 154 1421 171 4.99 × 10−5 0.007 108 587 58 0.009 
hsa05330 Allograft rejection 31 279 64 4.99 × 10−5 0.007 20 68 0.055 
hsa05332 Graft-versus-host disease 64 281 64 4.99 × 10−5 0.007 21 68 0.056 
hsa04940 Type 1 diabetes mellitus 33 601 88 9.98 × 10−5 0.008 26 207 29 0.005 
hsa05150 S. aureus infection 38 396 74 9.98 × 10−5 0.008 23 99 15 0.011 
hsa05169 Epstein–Barr virus infection 176 1980 176 9.98 × 10−5 0.008 125 878 44 0.505 
bbcellPathway Bystander B-cell activation 53 25 1.50 × 10−4 0.011 19 0.122 
hsa04145 Phagosome 129 1556 150 3.49 × 10−4 0.022 99 689 54 0.058 
mhcPathway Antigen processing and presentation 40 21 4.49 × 10−4 0.023 0.042 
asbcellPathway Antigen-dependent B-cell activation 77 26 4.99 × 10−4 0.023 24 0.162 
hsa05320 Autoimmune thyroid disease 37 390 65 4.99 × 10−4 0.023 25 105 10 0.113 
hsa04672 Intestinal immune network for IgA production 35 372 60 7.98 × 10−4 0.032 24 85 0.139 
hsa05310 Asthma 20 197 49 7.98 × 10−4 0.032 14 45 10 0.011 
hsa05152 Tuberculosis 149 1582 139 0.001 0.039 111 744 33 0.653 

aDerived from unconditional logistic regression assuming a log-additive model adjusted by PCs significantly associated with case–control status (P < 0.05) and further corrected for inflation factor when necessary.

bEmpirical P-value determined by the SRT based on 20 048 alternative phenotype simulations with cutoff of Psnp ≤ 0.05 for denoting significant SNPs.

cAdjusted empirical P-values were calculated using the Benjamini and Hochberg (55) step-up FDR controlling procedure (independent and positive regression-dependent test statistics) for multiple testing correction. The bold values correspond to the pathways that were statistically significant in both the discovery and replication GWASs.

The discovery and replication GWASs were genotyped using two different SNP genotyping platforms, i.e. Illumina Omni Express for the discovery GWAS and Affymetrix Genome-wide Human SNP arrays 5.0 for the replication GWAS, respectively, and there are only 88 133 overlapping SNPs between these two studies. SNPs that were genotyped only in one study but not in the other were imputed using the 1000 Genomes Project data as a scaffold (see Materials and Methods). As the SRT can only use the raw genotype data instead of SNP P-values, the two GWASs containing imputed SNP data that passed QC were merged to perform a joint analysis of the 12 identified pathways instead of meta-analysis. In the combined dataset of the discovery and replication GWASs, most of the 12 identified pathways were statistically significant except for eosinophilsPathway (the role of eosinophils in the chemokine network of allergy) which was borderline significant (empirical P = 0.059) under the SNP significance threshold of Psnp ≤ 0.005, as well as hsa04940 (type 1 diabetes mellitus, empirical P = 0.175) and mhcPathway (antigen processing and presentation, empirical P = 0.212) under the SNP significance threshold of Psnp ≤ 0.05 (Supplementary Material, Table S3). However, hsa04940 (type 1 diabetes mellitus) and mhcPathway (antigen processing and presentation) were statistically significant under the SNP significance threshold of Psnp ≤ 0.005 (empirical P = 0.045 and 0.025, respectively).

Results of pathway analysis using set-based analysis

The 12 pathways associated with risk of cervical cancer that were identified by the SRT were further validated using the set-based analysis in PLINK in both the discovery and replication GWASs (15). As summarized in Table 3, comparable results were obtained using the set-based analysis, except that the pathways of hsa04940 (type 1 diabetes mellitus) and hsa05150 (S. aureus infection) were borderline significant in the replication study under the SNP significance threshold of Psnp ≤ 0.005 (P = 0.058 and 0.059, respectively), suggesting that SRT has better sensitivity in identifying pathways associated with cervical cancer where a small number of variants of larger effect are enriched. Nevertheless, these two pathways were statistically significant in both the discovery and replication studies under the SNP significance threshold of Psnp ≤ 0.05, indicating that more variants with smaller effect in these pathways are involved in the pathogenesis of cervical cancer.

Table 3.

Validation of significant pathways identified by the SRT using set-based analysis

Pathway Description Discovery
 
Replication
 
Gene count SNP count NSIG PPsnpa ISIG PPsnpb Pempc BHd Gene count SNP count NSIG PPsnpa ISIG PPsnpb Pempc 
 SNP significance threshold Psnp ≤ 0.005            
hsa04145 Phagosome 129 1556 59 30 0.015 0.016 99 689 13 11 0.050 
hsa04612 Antigen processing and presentation 58 449 55 26 0.016 0.016 38 133 0.013 
hsa04672 Intestinal immune network for IgA production 35 372 39 14 0.006 0.012 24 85 0.028 
hsa04940 Type 1 diabetes mellitus 33 601 45 21 0.012 0.016 26 207 11 0.058 
hsa05150 S. aureus infection 38 396 56 22 0.016 0.016 23 99 0.059 
hsa05310 Asthma 20 197 39 14 0.004 0.012 14 45 0.016 
hsa05320 Autoimmune thyroid disease 37 390 43 17 0.007 0.012 25 105 0.036 
hsa05330 Allograft rejection 26 279 43 17 0.007 0.012 20 68 0.025 
hsa05332 Graft-versus-host disease 31 281 43 17 0.008 0.012 21 68 0.024 
mhcPathway Antigen processing and presentation 40 18 1.50 × 10−4 8.23 × 10−4 7.50 × 10−4 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 48 18 4.99 × 10−5 5.49 × 10−4 12 0.001 
 SNP significance threshold Psnp ≤ 0.05            
hsa04612 Antigen processing and presentation 58 449 104 56 4.99 × 10−5 8.73 × 10−5 38 133 15 14 0.002 
hsa05168 Herpes simplex infection 154 1421 171 95 4.99 × 10−5 8.73 × 10−5 108 587 58 33 0.003 
hsa04940 Type 1 diabetes mellitus 33 601 88 48 9.98 × 10−5 1.40 × 10−4 26 207 29 15 0.021 
hsa05150 S. aureus infection 38 396 74 38 4.99 × 10−5 8.73 × 10−5 23 99 15 12 0.022 
hsa04145 Phagosome 129 1556 150 89 4.99 × 10−5 8.73 × 10−5 99 689 54 37 0.001 
mhcPathway Antigen processing and presentation 40 21 2.49 × 10−4 2.91 × 10−4 0.014 
hsa05310 Asthma 20 197 49 23 4.49 × 10−4 4.49 × 10−4 14 45 10 0.020 
Pathway Description Discovery
 
Replication
 
Gene count SNP count NSIG PPsnpa ISIG PPsnpb Pempc BHd Gene count SNP count NSIG PPsnpa ISIG PPsnpb Pempc 
 SNP significance threshold Psnp ≤ 0.005            
hsa04145 Phagosome 129 1556 59 30 0.015 0.016 99 689 13 11 0.050 
hsa04612 Antigen processing and presentation 58 449 55 26 0.016 0.016 38 133 0.013 
hsa04672 Intestinal immune network for IgA production 35 372 39 14 0.006 0.012 24 85 0.028 
hsa04940 Type 1 diabetes mellitus 33 601 45 21 0.012 0.016 26 207 11 0.058 
hsa05150 S. aureus infection 38 396 56 22 0.016 0.016 23 99 0.059 
hsa05310 Asthma 20 197 39 14 0.004 0.012 14 45 0.016 
hsa05320 Autoimmune thyroid disease 37 390 43 17 0.007 0.012 25 105 0.036 
hsa05330 Allograft rejection 26 279 43 17 0.007 0.012 20 68 0.025 
hsa05332 Graft-versus-host disease 31 281 43 17 0.008 0.012 21 68 0.024 
mhcPathway Antigen processing and presentation 40 18 1.50 × 10−4 8.23 × 10−4 7.50 × 10−4 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 48 18 4.99 × 10−5 5.49 × 10−4 12 0.001 
 SNP significance threshold Psnp ≤ 0.05            
hsa04612 Antigen processing and presentation 58 449 104 56 4.99 × 10−5 8.73 × 10−5 38 133 15 14 0.002 
hsa05168 Herpes simplex infection 154 1421 171 95 4.99 × 10−5 8.73 × 10−5 108 587 58 33 0.003 
hsa04940 Type 1 diabetes mellitus 33 601 88 48 9.98 × 10−5 1.40 × 10−4 26 207 29 15 0.021 
hsa05150 S. aureus infection 38 396 74 38 4.99 × 10−5 8.73 × 10−5 23 99 15 12 0.022 
hsa04145 Phagosome 129 1556 150 89 4.99 × 10−5 8.73 × 10−5 99 689 54 37 0.001 
mhcPathway Antigen processing and presentation 40 21 2.49 × 10−4 2.91 × 10−4 0.014 
hsa05310 Asthma 20 197 49 23 4.49 × 10−4 4.49 × 10−4 14 45 10 0.020 

aNumber of significant SNPs with below P, which is the threshold for denoting significant SNPs. Association results were derived from unconditional logistic regression assuming a log-additive model adjusted by PCs significantly associated with case–control status (P < 0.05).

bNumber of independently significant SNPs based on an r2 threshold of 0.6.

cEmpirical P-value determined by the set-based test in PLINK based on 20 048 alternative phenotype simulations with N independent SNPs at an r2 threshold of 0.6 and cutoff of Psnp for denoting significant SNPs.

dAdjusted empirical P-values were calculated using the Benjamini and Hochberg (5,5) step-up FDR controlling procedure (independent and positive regression-dependent test statistics) for multiple testing correction. The bold values correspond to the pathways that were statistically significant in both the discovery and replication GWASs.

Sensitivity analysis by removing the extended MHC region

All the 12 significant pathways identified in the present study are related to the immune system, which harbor HLA genes along with a number of other genes located within and outside the extended MHC region on chromosome 6 (Supplementary Material, Tables S1 and S2). Ten genes with significant signals contribute to more than two pathways, all of which are from the extended MHC region, including HLA-DRA, HLA-DRB1, HLA-DQA2, HLA-DQB1, HLA-DOB, HLA-DPA1, HLA-DPB1, HLA-A, HLA-B and transporter 2, ATP-binding cassette, sub-family B (TAP2) (Supplementary Material, Table S4). Given the complex linkage disequilibrium (LD) pattern that extends across multiple HLA and non-HLA genes in the extended MHC region, the effects of these genes are not independent.

In order to evaluate the impact of the genetic variation in the extended MHC region on pathway association and to identify genetic factors outside it, SNPs in the extended MHC region (from 25 640 149 to 33 368 333 bp on chromosome 6) were removed from the pathway analysis for sensitivity analysis. This includes the extended Class I, Class I, Class III, Class II and extended Class II region from the telomeric to centromeric end of 6p21. The analysis using the SRT performed on the non-MHC data showed that the pathways hsa05150 (S. aureus infection) and hsa05168 (herpes simplex infection) remained statistically significant in both studies (empirical P = 6.48 × 10−4 and 0.036 in the discovery study; empirical P = 0.013 and 0.032 in the replication study, respectively) under the SNP significance threshold of Psnp ≤ 0.005 and 0.05, respectively (Table 4). For the pathway ‘S. aureus infection’, the association signals are derived from a small number of variants with large effect in four genes in the discovery study; complement factor I (CFI), complement component 1s subcomponent (C1S), formyl peptide receptor 3 (FPR3) and integrin beta 2 (ITGB2), as well as two genes in the replication study; mannan-binding lectin serine protease 1 (MASP1) and selectin P ligand (SELPLG) (Table 5). On the other hand, the association with the pathway ‘herpes simplex infection’ is attributable to more variants of small effect in 31 genes in the discovery study and 15 genes in the replication study, with serine/arginine-rich splicing factor 6 (SRSF6), cullin 1 (CUL1), eukaryotic translation initiation factor 2-alpha kinase 2 (EIF2AK2), eukaryotic translation initiation factor 2-alpha kinase 4 (EIF2AK4), mitogen-activated protein kinase 10 (MAPK10), POU class 2 homeobox 3 (POU2F3), 2′-5′-oligoadenylate synthetase 3 (OAS3) and TATA box-binding protein-associated factor 3 (TAF3) showing the strongest effects (Table 6). No statistically significant associations were observed for the other 10 pathways after removing SNPs in the extended MHC region (Table 4), suggesting that the associations with these pathways are exclusively driven by genetic variation in genes within the extended MHC region.

Table 4.

Summary of significant pathways in the discovery and replication studies after removing SNPs in the extended MHC region

Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs PPsnpa Pempb Gene count SNP count No. of SNPs P < Psnpa Pempb 
 SNP significance threshold Psnp ≤ 0.005         
hsa04145 Phagosome 108 1209 11 0.125 84 638 0.186 
hsa04612 Antigen processing and presentation 36 163 0.098 24 85 0.265 
hsa04672 Intestinal immune network for IgA production 24 221 1.00 15 48 1.0 
hsa04940 Type 1 diabetes mellitus 16 402 0.283 14 166 0.011 
hsa05150 S. aureus infection 26 229 12 6.48 × 10−4 13 60 0.013 
hsa05310 Asthma 46 1.00 1.0 
hsa05320 Autoimmune thyroid disease 20 201 1.00 14 65 1.0 
hsa05330 Allograft rejection 93 1.00 28 1.0 
hsa05332 Graft-versus-host disease 14 95 1.00 11 28 1.0 
mhcPathway Antigen processing and presentation 1.00 1.0 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 15 1.00 1.0 
 SNP significance threshold Psnp ≤ 0.05         
hsa04612 Antigen processing and presentation 36 163 11 0.257 24 85 0.408 
hsa05168 Herpes simplex infection 130 1088 78 0.036 94 539 48 0.032 
hsa04940 Type 1 diabetes mellitus 16 402 27 0.163 14 166 22 0.016 
hsa05150 S. aureus infection 26 229 21 0.052 13 60 0.064 
hsa04145 Phagosome 1209 66 0.340 84 638 44 0.144 
mhcPathway Antigen processing and presentation 0.165 1.00 
hsa05310 Asthma 46 0.788 0.045 
Pathway Description Discovery
 
Replication
 
Gene count SNP count No. of SNPs PPsnpa Pempb Gene count SNP count No. of SNPs P < Psnpa Pempb 
 SNP significance threshold Psnp ≤ 0.005         
hsa04145 Phagosome 108 1209 11 0.125 84 638 0.186 
hsa04612 Antigen processing and presentation 36 163 0.098 24 85 0.265 
hsa04672 Intestinal immune network for IgA production 24 221 1.00 15 48 1.0 
hsa04940 Type 1 diabetes mellitus 16 402 0.283 14 166 0.011 
hsa05150 S. aureus infection 26 229 12 6.48 × 10−4 13 60 0.013 
hsa05310 Asthma 46 1.00 1.0 
hsa05320 Autoimmune thyroid disease 20 201 1.00 14 65 1.0 
hsa05330 Allograft rejection 93 1.00 28 1.0 
hsa05332 Graft-versus-host disease 14 95 1.00 11 28 1.0 
mhcPathway Antigen processing and presentation 1.00 1.0 
eosinophilsPathway The role of eosinophils in the chemokine network of allergy 15 1.00 1.0 
 SNP significance threshold Psnp ≤ 0.05         
hsa04612 Antigen processing and presentation 36 163 11 0.257 24 85 0.408 
hsa05168 Herpes simplex infection 130 1088 78 0.036 94 539 48 0.032 
hsa04940 Type 1 diabetes mellitus 16 402 27 0.163 14 166 22 0.016 
hsa05150 S. aureus infection 26 229 21 0.052 13 60 0.064 
hsa04145 Phagosome 1209 66 0.340 84 638 44 0.144 
mhcPathway Antigen processing and presentation 0.165 1.00 
hsa05310 Asthma 46 0.788 0.045 

aPsnp is the threshold for denoting significant SNPs. Association results were derived from unconditional logistic regression assuming a log-additive model adjusted by PCs significantly associated with case–control status (P < 0.05) and further corrected for inflation factor when necessary.

bEmpirical P-value determined by the SRT based on 20 048 alternative phenotype simulations with cutoff of Psnp for denoting significant SNPs. The bold values correspond to the pathways that were statistically significant in both the discovery and replication GWASs.

Table 5.

Association results of representative SNPs at P ≤ 0.005 for genes outside the extended MHC region in hsa05150 (S. aureus infection) pathway in the discovery and replication studies

Gene SNP Chr. Positiona Location A1b A2c OR 95% CI P 
Discovery studyd 
CFI rs2298749 110 681 505 Exonic 1.21 1.09–1.36 6.53 × 10−4 
CFI rs3822272 110 683 550 Intronic 1.21 1.09–1.36 6.68 × 10−4 
CFI rs4626205 110 708 517 Intronic 1.21 1.08–1.35 7.02 × 10−4 
CFI rs17610314 110 682 953 Intronic 1.21 1.08–1.36 8.05 × 10−4 
CFI rs7439356 110 684 744 Intronic 1.20 1.08–1.35 0.001 
CFI rs7438961 110 668 651 Intronic 1.17 1.05–1.29 0.003 
CFI rs4541508 110 682 487 Intronic 1.16 1.05–1.28 0.004 
CFI rs7675460 110 720 588 Intronic 1.17 1.06–1.29 0.002 
C1S rs1143664 12 7 175 047 Exonic 1.29 1.09–1.53 0.003 
C1S rs7965055 12 7 171 620 Exonic 1.28 1.08–1.52 0.004 
FPR3 rs12459754 19 52 306 424 Intronic 1.37 1.12–1.68 0.002 
ITGB2 rs170963 21 46 318 146 Intronic 0.86 0.77–0.96 0.005 
Replication studye 
MASP1 rs698104 187 002 517 Intronic 1.55 1.17–2.06 0.002 
MASP1 rs698097 186 987 735 Intronic 1.51 1.16–1.97 0.003 
MASP1 rs879537 186 996 095 Intronic 1.49 1.13–1.95 0.004 
SELPLG rs8179137 12 109 019 691 Intronic 1.40 1.11–1.77 0.004 
Gene SNP Chr. Positiona Location A1b A2c OR 95% CI P 
Discovery studyd 
CFI rs2298749 110 681 505 Exonic 1.21 1.09–1.36 6.53 × 10−4 
CFI rs3822272 110 683 550 Intronic 1.21 1.09–1.36 6.68 × 10−4 
CFI rs4626205 110 708 517 Intronic 1.21 1.08–1.35 7.02 × 10−4 
CFI rs17610314 110 682 953 Intronic 1.21 1.08–1.36 8.05 × 10−4 
CFI rs7439356 110 684 744 Intronic 1.20 1.08–1.35 0.001 
CFI rs7438961 110 668 651 Intronic 1.17 1.05–1.29 0.003 
CFI rs4541508 110 682 487 Intronic 1.16 1.05–1.28 0.004 
CFI rs7675460 110 720 588 Intronic 1.17 1.06–1.29 0.002 
C1S rs1143664 12 7 175 047 Exonic 1.29 1.09–1.53 0.003 
C1S rs7965055 12 7 171 620 Exonic 1.28 1.08–1.52 0.004 
FPR3 rs12459754 19 52 306 424 Intronic 1.37 1.12–1.68 0.002 
ITGB2 rs170963 21 46 318 146 Intronic 0.86 0.77–0.96 0.005 
Replication studye 
MASP1 rs698104 187 002 517 Intronic 1.55 1.17–2.06 0.002 
MASP1 rs698097 186 987 735 Intronic 1.51 1.16–1.97 0.003 
MASP1 rs879537 186 996 095 Intronic 1.49 1.13–1.95 0.004 
SELPLG rs8179137 12 109 019 691 Intronic 1.40 1.11–1.77 0.004 

Chr., chromosome; OR, odds ratio; CI, confidence interval; P, two-sided P-value corresponding to the OR.

aBuild 37.2 (GRCh37/hg19) Assembly.

bMinor allele.

cMajor allele.

dAssociation results were derived from unconditional logistic regression for minor allele assuming a log-additive model corrected for the observed inflation factor (λ = 1.03). None of the PCs generated from PCA was significantly associated with case–control status (all P > 0.05) hence not included in the final model.

eAssociation results were derived from unconditional logistic regression for minor allele assuming a log-additive model with adjustment of three PCs generated from PCA which were significantly associated with case–control status (P < 0.05) with λ = 1.00

Table 6.

Association results of representative SNPs at P ≤ 0.05 for genes outside the extended MHC region in hsa05168 (herpes simplex infection) pathway in the discovery and replication studies

Gene SNP Chr. Positiona Location A1b A2c OR 95% CI P 
Discovery studyd 
SRSF6 rs6103330 20 42 087 454 intronic 1.33 1.11–1.59 0.002 
SRSF6 rs2235611 20 42 089 511 Exonic 1.26 1.09–1.46 0.002 
CUL1 rs11760399 148 422 581 Intronic 1.37 1.11–1.69 0.003 
CUL1 rs10271133 148 480 990 Intronic 0.85 0.75–0.96 0.008 
CUL1 rs11764570 148 444 355 Intronic 1.29 1.01–1.65 0.042 
EIF2AK2 rs4648202 37 363 107 Intronic 0.76 0.63–0.91 0.003 
EIF2AK2 rs2307472 37 376 247 5′UTR 0.73 0.58–0.93 0.011 
EIF2AK2 rs11892108 37 385 451 Upstream 0.74 0.58–0.93 0.012 
EIF2AK2 rs3770768 37 346 151 Intronic 0.85 0.73–0.98 0.026 
EIF2AK2 rs4648186 37 367 107 Intronic 0.85 0.74–0.98 0.030 
EIF2AK2 rs4233920 37 336 720 Intronic 0.80 0.65–0.99 0.036 
EIF2AK4 rs2291623 15 40 299 016 Intronic 1.54 1.17–2.02 0.002 
EIF2AK4 rs12901892 15 40 286 276 Intronic 1.53 1.16–2.01 0.002 
EIF2AK4 rs4432245 15 40 324 481 Intronic 1.45 1.14–1.85 0.003 
EIF2AK4 rs16970202 15 40 317 809 Intronic 1.48 1.14–1.92 0.003 
EIF2AK4 rs4594236 15 40 327 290 3′UTR 1.44 1.13–1.83 0.003 
MAPK10 rs5006575 87 037 379 Intronic 1.23 1.07–1.42 0.003 
MAPK10 rs12650052 87 138 738 Intronic 1.22 1.06–1.40 0.006 
MAPK10 rs2904090 87 132 771 Intronic 1.22 1.06–1.40 0.006 
MAPK10 rs12640395 87 023 394 Intronic 1.19 1.04–1.36 0.009 
MAPK10 rs7678480 87 087 320 Intronic 1.20 1.05–1.38 0.010 
MAPK10 rs17011655 87 197 433 Intronic 1.23 1.05–1.44 0.010 
MAPK10 rs10010011 87 139 714 Intronic 1.18 1.04–1.35 0.011 
MAPK10 rs2035058 87 185 006 Intronic 1.22 1.04–1.43 0.013 
MAPK10 rs1460761 87 096 662 Intronic 1.18 1.03–1.35 0.015 
MAPK10 rs4403040 87 038 481 Intronic 1.12 1.01–1.23 0.032 
MAPK10 rs6822478 87 076 360 Intronic 1.17 1.01–1.34 0.034 
MAPK10 rs1599313 87 135 124 Intronic 1.13 1.01–1.27 0.041 
MAPK10 rs1903360 87 095 840 Intronic 1.13 1.01–1.28 0.045 
POU2F3 rs2847502 11 120 118 498 Intronic 0.86 0.78–0.96 0.009 
POU2F3 rs2096884 11 120 173 354 Intronic 0.86 0.74–0.98 0.029 
POU2F3 rs1941411 11 120 130 512 Intronic 0.89 0.80–0.99 0.045 
C1QBP rs8072363 17 5 335 752 Intronic 1.13 1.01–1.27 0.043 
CD74 rs2748249 149 792 600 Upstream 0.84 0.74–0.96 0.011 
CD74 rs7709772 149 792 109 Intronic 1.25 1.01–1.56 0.047 
CLOCK rs1801260 56 301 369 3′UTR 1.12 1.01–1.25 0.039 
CLOCK rs17722979 56 346 411 Intronic 1.12 1.01–1.25 0.045 
CSNK2A2 rs1393203 16 58 207 203 Intronic 1.22 1.01–1.48 0.039 
CSNK2A2 rs1875522 16 58 202 107 Intronic 1.22 1.01–1.48 0.044 
EP300 rs9611502 22 41 531 236 Intronic 0.76 0.58–0.99 0.044 
GLTSCR2 rs7258254 19 48 248 529 Upstream 1.62 1.11–2.36 0.012 
IFIT1 rs215493 10 91 164 170 3′UTR 0.89 0.81–0.98 0.018 
IFIT1 rs304478 10 91 150 922 Upstream 1.11 1.01–1.22 0.032 
IFIT1 rs303217 10 91 153 699 Intronic 1.11 1.01–1.22 0.036 
IFIT1B rs304504 10 91 137 826 5′UTR 1.11 1.00–1.22 0.040 
IFNAR1 rs12483293 21 34 716 789 Intronic 0.89 0.80–0.99 0.045 
IFNAR2 rs2248412 21 34 605 531 Intronic 0.85 0.74–0.98 0.023 
IFNGR2 rs2012075 21 34 796 762 intronic 1.15 1.01–1.32 0.046 
JAK1 rs2780900 65 318 514 Intronic 0.82 0.70–0.97 0.019 
JAK1 rs310244 65 305 877 Intronic 0.87 0.76–0.98 0.021 
JAK1 rs310216 65 316 675 Intronic 0.88 0.79–0.99 0.040 
JAK1 rs310219 65 315 298 Intronic 0.89 0.79–0.99 0.041 
JAK1 rs2780816 65 302 413 Intronic 0.89 0.79–0.99 0.042 
NFKB1 rs3774934 103 427 476 Intronic 0.82 0.69–0.79 0.024 
NFKBIB rs3136642 19 39 398 416 Intronic 1.11 1.01–1.23 0.042 
PML rs11072468 15 74 312 794 Intronic 1.13 1.01–1.26 0.029 
PML rs9479 15 74 328 576 3′UTR 1.11 0.97–1.26 0.032 
RELA rs11227248 11 65 428 178 Intronic 1.12 1.01–1.24 0.025 
RELA rs10896027 11 65 420 760 Downstream 1.11 1.01–1.23 0.038 
SP100 rs6436943 231 357 373 Intronic 0.87 0.75–0.99 0.048 
SRSF4 rs2230679 29 475 648 Exonic 1.21 1.05–1.39 0.010 
SRSF4 rs547976 29 503 214 Intronic 0.86 0.77–0.97 0.011 
SRSF4 rs2230678 29 475 341 Exonic 1.14 1.02–1.28 0.023 
SRSF4 rs2230677 29 475 394 Exonic 1.14 1.02–1.28 0.023 
SRSF8 rs12627 11 94 802 620 3′UTR 1.13 1.02–1.24 0.016 
SRSF8 rs1056986 11 94 802 318 3′UTR 1.11 1.01–1.23 0.034 
TAB1 rs4821893 22 39 797 779 Intronic 1.13 1.01–1.25 0.030 
TAB1 rs5750820 22 39 825 322 Intronic 1.12 1.01–1.24 0.040 
TAB1 rs5750824 22 39 830 123 Intronic 0.89 0.79–0.99 0.045 
TAF3 rs17143115 10 7 964 341 Intronic 1.31 1.01–1.70 0.039 
TAF4 rs1046137 20 60 550 082 3′UTR 0.74 0.56–0.99 0.042 
TAF5L rs3753886 229 738 170 Exonic 0.91 0.82–0.99 0.046 
TBPL2 rs8010013 14 55 884 828 Intronic 1.13 1.03–1.25 0.012 
TBPL2 rs8014621 14 55 906 849 Intronic 1.13 1.03–1.25 0.012 
TBPL2 rs1953740 14 55 886 717 Intronic 0.89 0.80–0.98 0.024 
TBPL2 rs8019209 14 55 891 329 Intronic 0.89 0.80–0.99 0.030 
TBPL2 rs7149317 14 55 902 542 Intronic 0.89 0.80–0.99 0.031 
TNFSF14 rs11878563 19 6 670 910 Upstream 1.12 1.01–1.23 0.026 
Replication studye 
EIF2AK4 rs12442713 15 38 068 058 Intronic 0.76 0.63–0.91 0.003 
EIF2AK4 rs2291626 15 38 077 998 Intronic 0.78 0.64–0.94 0.011 
EIF2AK4 rs8041785 15 38 039 033 Intronic 1.24 1.03–1.48 0.021 
EIF2AK4 rs566792 15 38 013 787 Exonic 0.75 0.59–0.96 0.024 
EIF2AK4 rs489508 15 38 033 071 Intronic 1.22 1.02–1.46 0.032 
OAS3 rs4766678 12 111 882 074 Intronic 0.77 0.64–0.93 0.008 
OAS3 rs2072136 12 111 883 302 Exonic 0.77 0.64–0.94 0.009 
TAF3 rs4749362 10 8 008 403 Intronic 1.32 1.10–1.59 0.003 
TAF3 rs12219539 10 7 990 905 Intronic 0.75 0.60–0.93 0.008 
TAF3 rs4749339 10 7 989 272 Intronic 0.75 0.61–0.93 0.010 
TAF3 rs11255418 10 7 953 263 Intronic 0.76 0.61–0.94 0.010 
TAF3 rs11255440 10 7 996 236 Intronic 0.76 0.62–0.94 0.011 
TAF3 rs4749317 10 7 948 903 Intronic 0.76 0.61–0.94 0.012 
TAF3 rs10795579 10 8 024 995 Intronic 1.26 1.05–1.51 0.014 
TAF3 rs4749477 10 8 085 428 Intronic 0.75 0.59–0.94 0.014 
TAF3 rs10752123 10 8 086 795 Intronic 1.24 1.03–1.48 0.023 
TAF3 rs11255456 10 8 032 624 intronic 1.24 1.03–1.49 0.024 
TAF3 rs1887582 10 8 043 339 Intronic 0.78 0.63–0.98 0.031 
TAF3 rs10752124 10 8 086 926 Intronic 1.22 1.02–1.47 0.032 
CLOCK rs6824955 56 011 893 Intronic 0.82 0.68–0.99 0.040 
CLOCK rs6812042 56 090 836 Intronic 0.82 0.68–0.99 0.041 
CLOCK rs4580704 56 021 464 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs3805154 56 058 684 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs9312662 56 081 128 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs4865010 56 109 719 Upstream 0.82 0.68–0.99 0.044 
CSNK2A1 rs4815707 20 450 937 Intronic 1.24 1.03–1.49 0.025 
CSNK2A1 rs4815701 20 441 205 Intronic 1.22 1.01–1.46 0.037 
CUL1 rs243481 148 083 355 Intronic 1.24 1.03–1.48 0.022 
CUL1 rs2373414 148 101 252 Intronic 1.22 1.02–1.45 0.031 
CUL1 rs11773084 148 089 577 Intronic 1.22 1.02–1.45 0.032 
EIF2AK3 rs6739095 88 661 179 Intronic 0.82 0.68–0.99 0.041 
EIF2AK3 rs1913671 88 680 998 Intronic 0.83 0.69–1.00 0.050 
GTF2IRD1 rs37620 73 527 746 Intronic 0.77 0.62–0.95 0.017 
GTF2IRD1 rs37621 73 527 627 Intronic 0.79 0.64–0.99 0.043 
IKBKB rs4282553 42 269 519 Intronic 1.65 1.06–2.58 0.028 
IKBKB rs6474387 42 305 409 Intronic 1.42 1.01–2.00 0.042 
IKBKB rs6474388 42 306 311 Intronic 1.42 1.01–2.00 0.042 
POLR2A rs2071504 17 7 346 661 Exonic 0.79 0.64–0.99 0.042 
POLR2A rs11658168 17 7 346 858 Intronic 0.83 0.70–1.00 0.050 
POU2F3 rs17245504 11 119 683 963 Intronic 1.34 1.02–1.76 0.033 
PVRL2 rs8104483 19 50 064 194 Intronic 0.81 0.66–0.99 0.038 
SP100 rs890669 231 093 475 Intronic 0.83 0.69–0.99 0.043 
TRAF1 rs2239658 122 711 658 Intronic 0.80 0.67–0.96 0.016 
TRAF1 rs1930780 122 726 040 Intronic 0.80 0.67–0.96 0.016 
TRAF1 rs2239657 122 711 341 Exonic 0.82 0.68–0.98 0.030 
TRAF5 rs2175633 209 572 368 Intronic 1.74 1.07–2.84 0.027 
TRAF5 rs6698113 209 572 225 Intronic 1.67 1.02–2.74 0.040 
TRAF5 rs17267554 209 606 767 Intronic 0.75 0.57–0.99 0.042 
Gene SNP Chr. Positiona Location A1b A2c OR 95% CI P 
Discovery studyd 
SRSF6 rs6103330 20 42 087 454 intronic 1.33 1.11–1.59 0.002 
SRSF6 rs2235611 20 42 089 511 Exonic 1.26 1.09–1.46 0.002 
CUL1 rs11760399 148 422 581 Intronic 1.37 1.11–1.69 0.003 
CUL1 rs10271133 148 480 990 Intronic 0.85 0.75–0.96 0.008 
CUL1 rs11764570 148 444 355 Intronic 1.29 1.01–1.65 0.042 
EIF2AK2 rs4648202 37 363 107 Intronic 0.76 0.63–0.91 0.003 
EIF2AK2 rs2307472 37 376 247 5′UTR 0.73 0.58–0.93 0.011 
EIF2AK2 rs11892108 37 385 451 Upstream 0.74 0.58–0.93 0.012 
EIF2AK2 rs3770768 37 346 151 Intronic 0.85 0.73–0.98 0.026 
EIF2AK2 rs4648186 37 367 107 Intronic 0.85 0.74–0.98 0.030 
EIF2AK2 rs4233920 37 336 720 Intronic 0.80 0.65–0.99 0.036 
EIF2AK4 rs2291623 15 40 299 016 Intronic 1.54 1.17–2.02 0.002 
EIF2AK4 rs12901892 15 40 286 276 Intronic 1.53 1.16–2.01 0.002 
EIF2AK4 rs4432245 15 40 324 481 Intronic 1.45 1.14–1.85 0.003 
EIF2AK4 rs16970202 15 40 317 809 Intronic 1.48 1.14–1.92 0.003 
EIF2AK4 rs4594236 15 40 327 290 3′UTR 1.44 1.13–1.83 0.003 
MAPK10 rs5006575 87 037 379 Intronic 1.23 1.07–1.42 0.003 
MAPK10 rs12650052 87 138 738 Intronic 1.22 1.06–1.40 0.006 
MAPK10 rs2904090 87 132 771 Intronic 1.22 1.06–1.40 0.006 
MAPK10 rs12640395 87 023 394 Intronic 1.19 1.04–1.36 0.009 
MAPK10 rs7678480 87 087 320 Intronic 1.20 1.05–1.38 0.010 
MAPK10 rs17011655 87 197 433 Intronic 1.23 1.05–1.44 0.010 
MAPK10 rs10010011 87 139 714 Intronic 1.18 1.04–1.35 0.011 
MAPK10 rs2035058 87 185 006 Intronic 1.22 1.04–1.43 0.013 
MAPK10 rs1460761 87 096 662 Intronic 1.18 1.03–1.35 0.015 
MAPK10 rs4403040 87 038 481 Intronic 1.12 1.01–1.23 0.032 
MAPK10 rs6822478 87 076 360 Intronic 1.17 1.01–1.34 0.034 
MAPK10 rs1599313 87 135 124 Intronic 1.13 1.01–1.27 0.041 
MAPK10 rs1903360 87 095 840 Intronic 1.13 1.01–1.28 0.045 
POU2F3 rs2847502 11 120 118 498 Intronic 0.86 0.78–0.96 0.009 
POU2F3 rs2096884 11 120 173 354 Intronic 0.86 0.74–0.98 0.029 
POU2F3 rs1941411 11 120 130 512 Intronic 0.89 0.80–0.99 0.045 
C1QBP rs8072363 17 5 335 752 Intronic 1.13 1.01–1.27 0.043 
CD74 rs2748249 149 792 600 Upstream 0.84 0.74–0.96 0.011 
CD74 rs7709772 149 792 109 Intronic 1.25 1.01–1.56 0.047 
CLOCK rs1801260 56 301 369 3′UTR 1.12 1.01–1.25 0.039 
CLOCK rs17722979 56 346 411 Intronic 1.12 1.01–1.25 0.045 
CSNK2A2 rs1393203 16 58 207 203 Intronic 1.22 1.01–1.48 0.039 
CSNK2A2 rs1875522 16 58 202 107 Intronic 1.22 1.01–1.48 0.044 
EP300 rs9611502 22 41 531 236 Intronic 0.76 0.58–0.99 0.044 
GLTSCR2 rs7258254 19 48 248 529 Upstream 1.62 1.11–2.36 0.012 
IFIT1 rs215493 10 91 164 170 3′UTR 0.89 0.81–0.98 0.018 
IFIT1 rs304478 10 91 150 922 Upstream 1.11 1.01–1.22 0.032 
IFIT1 rs303217 10 91 153 699 Intronic 1.11 1.01–1.22 0.036 
IFIT1B rs304504 10 91 137 826 5′UTR 1.11 1.00–1.22 0.040 
IFNAR1 rs12483293 21 34 716 789 Intronic 0.89 0.80–0.99 0.045 
IFNAR2 rs2248412 21 34 605 531 Intronic 0.85 0.74–0.98 0.023 
IFNGR2 rs2012075 21 34 796 762 intronic 1.15 1.01–1.32 0.046 
JAK1 rs2780900 65 318 514 Intronic 0.82 0.70–0.97 0.019 
JAK1 rs310244 65 305 877 Intronic 0.87 0.76–0.98 0.021 
JAK1 rs310216 65 316 675 Intronic 0.88 0.79–0.99 0.040 
JAK1 rs310219 65 315 298 Intronic 0.89 0.79–0.99 0.041 
JAK1 rs2780816 65 302 413 Intronic 0.89 0.79–0.99 0.042 
NFKB1 rs3774934 103 427 476 Intronic 0.82 0.69–0.79 0.024 
NFKBIB rs3136642 19 39 398 416 Intronic 1.11 1.01–1.23 0.042 
PML rs11072468 15 74 312 794 Intronic 1.13 1.01–1.26 0.029 
PML rs9479 15 74 328 576 3′UTR 1.11 0.97–1.26 0.032 
RELA rs11227248 11 65 428 178 Intronic 1.12 1.01–1.24 0.025 
RELA rs10896027 11 65 420 760 Downstream 1.11 1.01–1.23 0.038 
SP100 rs6436943 231 357 373 Intronic 0.87 0.75–0.99 0.048 
SRSF4 rs2230679 29 475 648 Exonic 1.21 1.05–1.39 0.010 
SRSF4 rs547976 29 503 214 Intronic 0.86 0.77–0.97 0.011 
SRSF4 rs2230678 29 475 341 Exonic 1.14 1.02–1.28 0.023 
SRSF4 rs2230677 29 475 394 Exonic 1.14 1.02–1.28 0.023 
SRSF8 rs12627 11 94 802 620 3′UTR 1.13 1.02–1.24 0.016 
SRSF8 rs1056986 11 94 802 318 3′UTR 1.11 1.01–1.23 0.034 
TAB1 rs4821893 22 39 797 779 Intronic 1.13 1.01–1.25 0.030 
TAB1 rs5750820 22 39 825 322 Intronic 1.12 1.01–1.24 0.040 
TAB1 rs5750824 22 39 830 123 Intronic 0.89 0.79–0.99 0.045 
TAF3 rs17143115 10 7 964 341 Intronic 1.31 1.01–1.70 0.039 
TAF4 rs1046137 20 60 550 082 3′UTR 0.74 0.56–0.99 0.042 
TAF5L rs3753886 229 738 170 Exonic 0.91 0.82–0.99 0.046 
TBPL2 rs8010013 14 55 884 828 Intronic 1.13 1.03–1.25 0.012 
TBPL2 rs8014621 14 55 906 849 Intronic 1.13 1.03–1.25 0.012 
TBPL2 rs1953740 14 55 886 717 Intronic 0.89 0.80–0.98 0.024 
TBPL2 rs8019209 14 55 891 329 Intronic 0.89 0.80–0.99 0.030 
TBPL2 rs7149317 14 55 902 542 Intronic 0.89 0.80–0.99 0.031 
TNFSF14 rs11878563 19 6 670 910 Upstream 1.12 1.01–1.23 0.026 
Replication studye 
EIF2AK4 rs12442713 15 38 068 058 Intronic 0.76 0.63–0.91 0.003 
EIF2AK4 rs2291626 15 38 077 998 Intronic 0.78 0.64–0.94 0.011 
EIF2AK4 rs8041785 15 38 039 033 Intronic 1.24 1.03–1.48 0.021 
EIF2AK4 rs566792 15 38 013 787 Exonic 0.75 0.59–0.96 0.024 
EIF2AK4 rs489508 15 38 033 071 Intronic 1.22 1.02–1.46 0.032 
OAS3 rs4766678 12 111 882 074 Intronic 0.77 0.64–0.93 0.008 
OAS3 rs2072136 12 111 883 302 Exonic 0.77 0.64–0.94 0.009 
TAF3 rs4749362 10 8 008 403 Intronic 1.32 1.10–1.59 0.003 
TAF3 rs12219539 10 7 990 905 Intronic 0.75 0.60–0.93 0.008 
TAF3 rs4749339 10 7 989 272 Intronic 0.75 0.61–0.93 0.010 
TAF3 rs11255418 10 7 953 263 Intronic 0.76 0.61–0.94 0.010 
TAF3 rs11255440 10 7 996 236 Intronic 0.76 0.62–0.94 0.011 
TAF3 rs4749317 10 7 948 903 Intronic 0.76 0.61–0.94 0.012 
TAF3 rs10795579 10 8 024 995 Intronic 1.26 1.05–1.51 0.014 
TAF3 rs4749477 10 8 085 428 Intronic 0.75 0.59–0.94 0.014 
TAF3 rs10752123 10 8 086 795 Intronic 1.24 1.03–1.48 0.023 
TAF3 rs11255456 10 8 032 624 intronic 1.24 1.03–1.49 0.024 
TAF3 rs1887582 10 8 043 339 Intronic 0.78 0.63–0.98 0.031 
TAF3 rs10752124 10 8 086 926 Intronic 1.22 1.02–1.47 0.032 
CLOCK rs6824955 56 011 893 Intronic 0.82 0.68–0.99 0.040 
CLOCK rs6812042 56 090 836 Intronic 0.82 0.68–0.99 0.041 
CLOCK rs4580704 56 021 464 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs3805154 56 058 684 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs9312662 56 081 128 Intronic 0.82 0.68–0.99 0.043 
CLOCK rs4865010 56 109 719 Upstream 0.82 0.68–0.99 0.044 
CSNK2A1 rs4815707 20 450 937 Intronic 1.24 1.03–1.49 0.025 
CSNK2A1 rs4815701 20 441 205 Intronic 1.22 1.01–1.46 0.037 
CUL1 rs243481 148 083 355 Intronic 1.24 1.03–1.48 0.022 
CUL1 rs2373414 148 101 252 Intronic 1.22 1.02–1.45 0.031 
CUL1 rs11773084 148 089 577 Intronic 1.22 1.02–1.45 0.032 
EIF2AK3 rs6739095 88 661 179 Intronic 0.82 0.68–0.99 0.041 
EIF2AK3 rs1913671 88 680 998 Intronic 0.83 0.69–1.00 0.050 
GTF2IRD1 rs37620 73 527 746 Intronic 0.77 0.62–0.95 0.017 
GTF2IRD1 rs37621 73 527 627 Intronic 0.79 0.64–0.99 0.043 
IKBKB rs4282553 42 269 519 Intronic 1.65 1.06–2.58 0.028 
IKBKB rs6474387 42 305 409 Intronic 1.42 1.01–2.00 0.042 
IKBKB rs6474388 42 306 311 Intronic 1.42 1.01–2.00 0.042 
POLR2A rs2071504 17 7 346 661 Exonic 0.79 0.64–0.99 0.042 
POLR2A rs11658168 17 7 346 858 Intronic 0.83 0.70–1.00 0.050 
POU2F3 rs17245504 11 119 683 963 Intronic 1.34 1.02–1.76 0.033 
PVRL2 rs8104483 19 50 064 194 Intronic 0.81 0.66–0.99 0.038 
SP100 rs890669 231 093 475 Intronic 0.83 0.69–0.99 0.043 
TRAF1 rs2239658 122 711 658 Intronic 0.80 0.67–0.96 0.016 
TRAF1 rs1930780 122 726 040 Intronic 0.80 0.67–0.96 0.016 
TRAF1 rs2239657 122 711 341 Exonic 0.82 0.68–0.98 0.030 
TRAF5 rs2175633 209 572 368 Intronic 1.74 1.07–2.84 0.027 
TRAF5 rs6698113 209 572 225 Intronic 1.67 1.02–2.74 0.040 
TRAF5 rs17267554 209 606 767 Intronic 0.75 0.57–0.99 0.042 

Chr., chromosome; OR, odds ratio; CI, confidence interval; P, two-sided P-value corresponding to the OR.

aBuild 37.2 (GRCh37/hg19) Assembly.

bMinor allele.

cMajor allele.

dAssociation results were derived from unconditional logistic regression for minor allele assuming a log-additive model corrected for the observed inflation factor (λ= 1.03). None of the PCs generated from PCA was significantly associated with case–control status (all P > 0.05) hence not included in the final model.

eAssociation results were derived from unconditional logistic regression for minor allele assuming a log-additive model with adjustment of three PCs generated from PCA which were significantly associated with case–control status (P < 0.05) with λ = 1.00.

There are 11 and 9 overlapping genes in the discovery and replication studies, respectively, between S. aureus and herpes simplex infection pathways, all of which are HLA genes (HLA-DRB1, HLA-DRA, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DPA1, HLA-DPB1, HLA-DMA, HLA-DMB, HLA-DOA and HLA-DOB). The S. aureus infection pathway includes 27 and 14 other genes in the discovery and replication studies, respectively, whereas the herpes simplex infection pathway includes 143 and 99 other genes in the discovery and replication studies, respectively. After removing these overlapping HLA genes, both pathways were statistically significant in both the discovery and replication studies (Supplementary Material, Table S5), suggesting that the associations with these two pathways are not only driven by the shared HLA genes.

When testing all the pathways without HLA genes (HLA-A, HLA-B, HLA-C, HLA-DMA, HLA-DMB, HLA-DOA, HLA-DOB, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DRA, HLA-DRB1, HLA-E, HLA-F and HLA-G), none of the pathways remained statistically significant after correction for multiple testing in the discovery study. Nevertheless, the pathways of ‘S. aureus infection’ and ‘Herpes simplex infection’ showed statistically significant association with risk of cervical cancer in the replication study (P = 0.012 and 0.021, respectively) (Supplementary Material, Table S6), suggesting that candidate genes other than HLA genes in these two pathways may be involved in the pathogenesis of cervical cancer.

DISCUSSION

GWASs using single-SNP analysis have been very successful in identifying common genetic variation underlying complex diseases in recent years. However, this approach may miss the causal genes with moderate genetic effects due to the stringent significance threshold of GWASs. Compared with single-SNP analysis, pathway analysis combines the association evidence of multiple functionally related genes within a pathway, and potentially has greater power to reveal the biological mechanism(s) underlying complex diseases (5). Other benefits of this approach are the substantial reduction of the multiple testing burden once genes are grouped into pathways for association testing and the incorporation of biological knowledge into the analysis, which is not accounted for in single-SNP analysis. The SRT has several major advantages compared with other methods. First, the magnitude of any one-association statistic is not key, but rather the number of significant SNPs above what would be expected by chance is key, making the SRT more robust to false positives at an SNP level. Second, the SRT uses phenotype permutation to correct naturally for biases introduced by LD and gene size (14,16). It keeps all the information constant and only permutes the case–control status among individuals. The same individuals are used, maintaining the same LD structure. This minimizes false-positive associations arising from LD because, even if LD was leading to an excess of significance for a pathway, this LD block would be identical across all datasets using randomized phenotypes. The same is true for gene size, since a larger gene has more potential SNP sites and therefore more potential false positives, but this bias will be the same for the original and the simulated datasets. Last but not least, covariates such as population stratification in GWAS can be adjusted in the SRT. The SRT has been successfully applied to the GWAS of several complex diseases (1113).

In the present study, we used the SRT and identified 12 pathways that are associated with the development of cervical cancer in the Swedish population. These findings were further replicated using another method and another independent study. Among them, ‘antigen processing and presentation’, ‘type 1 diabetes mellitus’, ‘S. aureus infection’, ‘phagosome’, mhcPathway and ‘asthma’ are enriched for variants of small and large effect. In addition, a small number of variants of relatively large effect are enriched in the pathways of ‘intestinal immune network for IgA production’, ‘autoimmune thyroid disease’, ‘allograft rejection’, ‘graft-versus-host disease’ and eosinophilsPathway, while more variants of small effect are enriched in the pathway ‘herpes simplex infection’. These findings suggest that a number of different genetic models underlie cervical cancer, as is the case for many complex diseases. All the identified pathways are related to the immune response and 10 of them are mainly driven by genetic variation in genes within the extended MHC region, supporting the importance of the MHC region in cervical cancer pathogenesis, in particular, HLA molecules that present antigens to the immune system. On the other hand, in addition to genes within the MHC region, the associations with the pathways ‘S. aureus infection’ and ‘herpes simplex infection’ are also attributable to genes outside the extended MHC region, the majority of which have not been studied in the context of cervical cancer.

In a previous pathway-based analysis based on 216 KEGG pathways, the pathways ‘asthma’ (empirical P = 0.03), ‘folate biosynthesis’ (empirical P = 0.04) and ‘graft-versus-host disease’ (empirical P = 0.05) were found to be significantly associated with risk of cervical cancer (13). The associations with ‘asthma’ and ‘graft-versus-host disease’ but not ‘folate biosynthesis’ were also observed in our study. Given the lack of adjustment for multiple testing and replication in an independent cohort in the previous study (13), the role of ‘folate biosynthesis’ in the development of cervical cancer warrants further evaluation. A recent GWAS of cervical cancer in the Han Chinese population has identified three susceptibility loci. The first is between HLA-DPB1 and HLA-DPB2; the second is in exocyst complex component 1 gene (EXOC1) at 4q12; and the third is 9.5 kb downstream of gasdermin B (GSDMB) (17) at 17q12. In our study, we found that variation in HLA-DPB1 contributed to 10 pathways that were significantly associated with cervical cancer (Supplementary Material, Table S4), suggesting an important role of HLA-DPB1 in the pathogenesis of cervical cancer. However, neither EXOC1 nor GSDMB is present in the pathways from KEGG and BioCarta databases analyzed in our study. Further studies are warranted to validate the findings for EXOC1 and GSDMB in non-Asian populations.

In addition to the variants in the extended MHC region, the association with pathway ‘S. aureus infection’ seems to be mainly determined by a small number of variants with relatively large effect in the genes C1S, CFI, FPR3, ITGB2, MASP1 and SELPLG. While no studies have investigated the association between S. aureus infection and cervical cancer, a recent finding may suggest a role of S. aureus infection in epithelial carcinoma caused by HPV infection. In a transgenic model of epithelial carcinogenesis induced by HPV 16 oncogenes, infiltrating CD4+ T cells were detected in both premalignant and malignant lesions, which functionally promoted epithelial carcinogenesis. Remarkably, CD4+ T-cell reactivity was predominantly directed toward antigens expressed by the Staphylococcal bacteria infecting the dysplastic skin lesions, not the HPV16 oncogenes, suggesting that S. aureus infection may evoke an inflammatory response that inadvertently enhances malignant progression (18). CFI, C1S and MASP1 encode serine proteases of the complement system which is involved in host defense against infectious agents, more specifically in the removal of apoptotic cells and immune complexes, and in the modulation of the adaptive immune system (19). As shown in Supplementary Material, Fig. S11, the complement system can be activated via three major routes: the classical, alternative and lectin pathways. CFI is a crucial inhibitor controlling all complement pathways and complete CFI deficiency is strongly associated with severe infections (20). C1S is a subunit of the first component of the complement system (C1), which can recognize different types of activators and initiate the activation of the classical pathway (21). MASP1 functions as a component of the lectin pathway of complement activation which may play a role as an amplifier of complement activation by cleaving complement C2 or by activating another complement serine protease, MASP-2 (22,23). A splice variant of MASP1 which lacks the serine protease domain, functions as an inhibitor of the complement pathway (24). FPR3, also known as FPRL2, is expressed on dendritic cells (DCs) throughout their maturation and the interaction of FPR3 and its endogenous ligand (s) may be involved in regulating DC trafficking during antigen uptake and processing in the periphery as well as the T-cell-stimulating phase of the immune response (25). ITGB2 encodes the beta-2 integrin subunit of the leukocyte cell adhesion molecule that participates in cell adhesion and cell-surface mediated signaling (26). Defects in this gene cause leukocyte adhesion deficiency type I (LAD1), which is characterized by recurrent bacterial infections (27), and genetic variation in ITGB2 has been significantly associated with the development of papillary thyroid cancer (28). SELPLG encodes a glycoprotein that functions as a high-affinity counter-receptor for the cell adhesion molecules P-, E- and L-selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins (29). Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response (30,31). As cervical cancer is caused by persistent HPV infection, these genes that are involved in the immune response against infections may also play a role in the pathogenesis of cervical cancer.

The association of cervical cancer with pathway ‘herpes simplex infection’ is driven by a number of variants with small effect in 31 genes in the discovery study and 15 genes in the replication study after removing variants in the extended MHC region, with SRSF6, CUL1, EIF2AK2, EIF2AK4, MAPK10, POU2F3, OAS3 and TAF3 showing the strongest effects. Herpes simplex virus (HSV) includes two types: HSV-1 and HSV-2. HSV-1 causes primarily oral infections, whereas HSV-2, like HPV, can infect cervical squamous epithelial tissue in the squamocolumnar junction, where invasive cervical cancer arises (32). Synergisms between HPV and HSV-2 in induction of cervical cancer have been shown in epidemiological studies and confirmed by laboratory experiments (3338). These studies indicate that simultaneous infection with HPV and HSV-2 increases the risk of cervical cancer when compared with infection with either HPV-16 or HPV-18 alone. It has been assumed that HSV-2 initiates the process and infection with HPV occurs later (39). Indeed, herpetic lesions can facilitate the access of HPV to basal cells and inflammatory responses induced by HSV-2 infection may decrease T helper cells' capacity to produce efficient immune response (40). Alternatively, HSV-2 infection can increase the risk of cervical carcinogenesis by inducing the production of nitric oxide that may result in cellular DNA damage in HPV-infected cells (41). Therefore, it is biologically plausible that genes involved in the HSV-2 infection pathway may also play a role in the development of cervical cancer. The splicing factor SRSF6 is involved in the mRNA alternative splicing and is found to be an oncoprotein that regulates the proliferation and survival of lung and colon cancer cells (42). CUL1 is member of cullin protein family required for ubiquitin-dependent protein degradation and CUL1-containing ubiquitin ligase plays a role in the ubiquitination and proteolysis of the E7 HPV oncoprotein (43). EIF2AK2 and EIF2AK4 are two members of the serine–threonine kinases family that phosphorylate the α-subunit of eukaryotic initiation factor 2 (eIF2α), in response to stress signals. eIF2α phosphorylation by EIF2AK2 in response to viral infection results in arrest of viral mRNA translation and apoptosis, an efficient way to inhibit virus replication (44). Pronounced EIF2AK4 activation was observed in human tumors while compromising EIF2AK4 signaling severely hampered tumor growth in vivo, suggesting the EIF2AK4 pathway is critical for maintaining metabolic homeostasis in tumor cells (45). MAPK10 is a member of the MAP kinase family, which has proapoptotic functions and can function as a tumor-suppressor gene (46). POU2F3 is a keratinocyte-specific POU transcription factor and is also a candidate tumor-suppressor protein, and aberrant promoter methylation of this gene may play a role in cervical cancer (47). OAS3 encodes an enzyme which plays a significant role in the inhibition of cellular protein synthesis and viral infection resistance (48). TAF3 is a subunit of the RNA polymerase II transcription factor which contributes to promoter recognition and selectivity, and acts as anti-apoptotic factor (49). These genes may thus be involved in the persistence of HPV infection or cervical cancer development.

As shown in the Materials and Methods, genetic variants that exist on either Illumina HumanOmniExpress BeadChip or Affymetrix Genome-wide Human SNP arrays 5.0 were imputed in the combined dataset and QC was applied to exclude genetic variants that were poorly imputed or showed statistically significant difference in minor allele frequency (MAF) between the discovery and replication GWASs. In the combined dataset, the eosinophilsPathway was borderline significant under the SNP significance threshold of Psnp ≤ 0.005, and hsa04940 (type 1 diabetes mellitus) and mhcPathway (antigen processing and presentation) were not statistically significant under the SNP significance threshold of Psnp ≤ 0.05 (Supplementary Material, Table S3). However, these pathways were found to be statistically significant in both the discovery and replication studies using SRT or set-based analysis. It is worth noting that the SNP density of these three pathways in the combined dataset is much lower than that of the discovery GWAS (16 SNPs compared with 48 SNPs, 528 SNPs compared with 601 SNPs, 6 SNPs compared with 40 SNPs, respectively), which could reduce the study power. This is due to the fact that some SNPs in these pathways that were genotyped in the discovery GWAS were poorly imputed in the replication GWAS, especially those in the MHC region given its complex LD structure, and therefore were removed from the combined dataset. It is also important to validate the imputation results for the SNPs in these pathways before drawing any firm conclusion.

The results of pathway-based analyses can be validated in independent materials using either additional GWAS data or targeted genetic analyses of the genes in pathways indicated in the original study. The former has the advantage that a genome-wide selection of SNP is available for validation, but may be difficult to perform due to lack of such data. The latter requires genotyping of selected SNPs in genes in pathways of interest using custom SNP arrays. This also requires access to additional cohorts, but if such samples are available custom genotyping may be less costly and therefore could involve a larger set of samples. Genotyping of variation in key genes in the pathways indicated can also be expanded to include variants in addition to those present on the original SNP array. In the present investigation, we do not have access to further clinical materials that could be genotyped. There has only been one other cervical cancer GWAS published, in a Chinese material. A future step might therefore be to validate the top pathways in the Chinese GWAS data. However, there were substantial differences in the primary results reported by the Swedish and Chinese GWASs, indicating that there might be differences in the etiology of cervical cancer, and the pathways involved between these populations. A more optimal approach would be to interrogate further cohorts of European descent.

This study has several strengths. First, correction for multiple testing was performed in order to control the type I error. Second, we performed a two-stage pathway analysis in two independent cohorts from the Swedish population, which may reduce the false-positive findings and improve the credibility of the results. Third, the 12 significant pathways identified by SRT were also validated using another well-established method, set-based analysis, which can also correct for biases introduced by gene size and LD. Nevertheless, several limitations in the present study need to be acknowledged. First, incomplete annotation of the human genome may reduce the study power for the pathway analysis, since many genes are uncharacterized and SNPs in these genes or intergenic SNPs physically far away from genes were not included. Hence, the pathway-based analysis can complement but not replace the single-SNP approach. Second, as with other methods, the SRT is less equipped to identify significant pathways when there is a paucity of SNPs arising within pathways (PGSNP); if there are no significant PGSNP, the ratio is always 0 for the original data. Thus, all simulations will at least equal this ratio, resulting in an empirical P-value of 1. These limits reflect both the limits of pathway annotation and the power of the GWAS dataset used. Thus, adequate SNP coverage of a pathway is essential for that pathway to be effectively tested. Finally, the amount and diversity of SNPs measured per pathway is highly variable between the genotyping platforms used for the discovery and replication GWAS datasets. Despite this fact, 12 pathways were replicated with a different technical platform and method, providing credible evidence that these pathways influence susceptibility to cervical cancer. It is worth noting that the SNP density of the replication GWAS is much lower than that of the discovery GWAS, which reduces the study power for replicating significant pathways. Therefore, pathways identified in the discovery phase but failed in the replication study warrant further validation in another independent study with higher SNP coverage.

In summary, we have conducted a two-stage pathway analysis in GWASs of cervical cancer in the Swedish population. The successful identification and replication of 12 cervical cancer susceptibility pathways demonstrates the power and applicability of the pathway-based approach in the analysis of GWA data. The associations of 10 pathways are mainly determined by genes in the MHC region, supporting the importance of the MHC region in cervical cancer pathogenesis. In addition, we have identified a number of novel candidate genes outside the MHC region in the pathways denoted ‘S. aureus infection’ and ‘herpes simplex infection’ that influence the susceptibility to cervical cancer, providing new insights into etiology of cervical cancer. Our results demonstrate that a pathway-based analysis that incorporates information from markers with moderate significance levels, can greatly facilitate the interpretation of GWA data and leads to the identification of novel disease-susceptibility genes and mechanisms. The identified causal pathways may help formulate new hypotheses or substantiate existing hypotheses regarding disease etiology, and genes that rank high in these pathways can serve as candidates for genetic and functional follow-up studies.

MATERIALS AND METHODS

Study population

Discovery stage

Subjects included in the discovery stage were from a GWAS of cervical cancer generated on the Illumina Omni Express BeadChip (Illumina, San Diego, CA, USA) (731 422 SNPs) in a Swedish population. The details of population and QC have been described elsewhere (4). Briefly, after stringent QC, data from 1034 cervical cancer patients (971 CIS and 63 invasive carcinoma) and 3948 control subjects were available for 632 668 SNPs with an overall call rate of 99.92%. The potential for population stratification was investigated by principal components analysis (PCA) undertaken with the EIGENSTRAT package (50). None of the principal components (PCs) derived from PCA was statistically significantly associated with case–control status (all P > 0.05).

Replication stage

The replication GWA dataset was generated on the Affymetrix Genome-wide Human SNP arrays 5.0 with 440 794 SNP markers, which represents mostly different markers compared with the Illumina arrays used in the discovery GWAS. In brief, 617 unrelated cervical cancer patients (598 CIS and 19 invasive carcinoma) were selected from families with at least two affected women from Sweden and genotype data of 519 controls in Sweden was downloaded from the Nordic Control Allele Frequency and Genotype Database created by the Nordic Center of Excellence in Disease Genetics program (51). Systematic QC common to both datasets was conducted separately for the case and control datasets before merging. For each dataset, all individuals with overall genotyping rate <95% and individuals with heterozygous haploid genotypes were excluded. Markers with MAF <2% or genotyping call rate <97% were removed. Genotype data from the two datasets were merged and further QC was applied. The exclusion criteria for SNPs were: (i) genotyping call rate <97%, (ii) heterozygous haploid genotypes, (iii) multi-mapping sites, (iv) genotype frequency that deviated from Hardy–Weinberg equilibrium (HWE) among control subjects (P < 1 × 10−7) and (v) a statistically significant difference in missing rate between case subjects and control subjects (P < 1 × 10−7). Samples were excluded if heterozygosity rates for autosomal chromosomes were >6 SDs from the mean. We also conducted further exclusions for those with abnormal heterozygosity rate on the X chromosome. Unexpected duplicates and the first- and second-degree relatives were removed based on identity-by-state (IBS) estimates calculated in PLINK (15). To maximize the ethnic homogeneity of the case and control series and to capture other potential underlying structures in the data, we performed a PCA undertaken with the EIGENSTRAT (50) package using a subseries of 19 959 SNPs that are evenly distributed across the genome in low LD (pairwise r2 < 0.03). This resulted in further exclusion of four outliers. After systematic QC steps, data from 616 case subjects (597 CIS and 19 invasive carcinoma) and 506 control subjects were available for 341 358 SNPs with an overall call rate of 99.79% (Supplementary Material, Table S7). Three PCs derived from PCA were statistically significantly associated with case–control status (P < 0.05).

Informed consent was obtained from all subjects, and each study was approved by the regional ethical review board.

Imputation and merging two GWASs

The replication GWA dataset was changed to hg19 from hg18 using the USCS LiftOver tool (http://genome.sph.umich.edu/wiki/LiftOver). Both the discovery and replication GWAS datasets were separately phased using the shapeit-tool (52). They were then separately split into chunks of ∼5 Mb containing at least 200 genotyped SNPs. No chunks spanned the centromeres. Genetic variants across each chunk not genotyped on the Ilumina HumanOmniExpress BeadChip or Affymetrix Genome-wide Human SNP arrays 5.0 were imputed for the discovery and replication GWASs, respectively, using the program IMPUTE2 (53) with phased haplotypes in subjects from the December 2013 release of the 1000 Genomes Project data (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference) (54) as a scaffold. Imputed variants with info-score < 0.3 in either dataset were removed. The two datasets were then merged and markers with <0.9 in posterior genotypic probability in >5% of the samples or monomorphic were removed. Finally, genotypes were called from posterior probabilities using 0.9 as threshold using gtool (http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html). Genetic variants that exist on either Ilumina HumanOmniExpress BeadChip or Affymetrix Genome-wide Human SNP arrays 5.0 were included for further analysis. Further QC was applied to the merged dataset. The exclusion criteria for genetic variants were: (i) genotype frequency that deviated from HWE among control subjects (P < 1 × 10−7) and (ii) a statistically significant difference in MAF between the discovery and replication GWASs (P < 1 × 10−7). Unexpected duplicates (11 subjects) and first- and second-degree relatives (9 subjects) were removed based on IBS estimates calculated in PLINK (15). Utilizing a set of 15 872 SNPs genotyped in both GWASs and evenly distributed across the genome in low LD (pairwise r2 < 0.02), a PCA using the EIGENSTRAT software (50) excluded eight additional samples detected as outliers. After stringent QC, data from 1634 cases and 4442 controls were available for 747 046 SNPs with an overall call rate of 99.32%. The potential for population stratification was investigated by PCA undertaken with the EIGENSTRAT package (50). One PC derived from PCA was statistically significantly associated with case–control status (P < 0.05). The association results of the 747 046 SNPs included in the joint analysis are available in ‘http://www.genpat.uu.se/research/genetics_genomics/gyllensten/data/’, and are derived from unconditional logistic regression assuming a log-additive genetic model adjusted by one PC statistically significantly associated with case–control status.

Pathway analysis

Pathways were collected from two resources: KEGG (http://www.genome.jp/kegg/pathway.html) and BioCarta databases (http://www.biocarta.com/genes/index.asp) (as of July 2013). Two hundred and sixty-eight pathways from KEGG database and 297 pathways from BioCarta database were analyzed in the present study. SNP data were obtained from dbSNP (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/b135_SNPContigLocusId_37_3.bcp). Genes annotated in this file, which includes SNPs in upstream (2 kb) and downstream (0.5 kb) regions, were merged with those in the pathways to create a file linking pathways and SNP information using the Human Genome Organization gene nomenclature.

The association between each SNP and cervical cancer risk was estimated using unconditional logistic regression assuming a log-additive genetic model in PLINK (15). Adjustment for population stratification was performed by including PCs derived from PCA that were statistically significantly associated with case–control status (P < 0.05) as covariates in the logistic regression. In the discovery GWAS, none of the PCs was statistically significantly associated with case–control status (all P > 0.05) hence not included in the final model, while in the replication and combined GWASs, three and one PCs which were significantly associated with case–control status (P < 0.05) were included as covariates in the final model, respectively. Genomic control was subsequently applied in SRT, and all the P-values were corrected for the observed inflation when necessary (λ = 1.03, 1.00 and 1.11 for the discovery, replication and combined GWASs, respectively).

SNP ratio test

The SRT computes an ‘SNP ratio’ for each pathway, i.e. it divides the number of significant markers (P-value below a certain threshold Psnp) by the number of non-significant markers within a pathway. To estimate the significance of each pathway, we randomized the case–control status of individuals in the original GWAS data to generate 20 048 simulated GWAS datasets while preserving the overall ratio of cases to controls and then calculated an empirical P-value based on the simulated datasets. The number of significant SNPs in the simulated GWAS datasets is defined as the top n SNPs with the lowest P-values, where n is the number of SNPs with an association below Psnp in the original GWAS. The Psnp-value used here refers to the threshold denoting whether the original single SNP is significant. Adjusted empirical P-value for each pathway was calculated using the Benjamini and Hochberg (55) step-up FDR controlling procedure (independent and positive regression-dependent test statistics) for multiple testing correction (55). Pathways with adjusted empirical P-values of ≤0.05 were further validated in the replication GWAS.

Set-based analysis

The pathways associated with risk of cervical cancer that were identified by the SRT were further validated using set-based analysis in PLINK in both the discovery and replication GWASs (15). SNPs in a certain pathway were grouped into a set. This analysis works as follows: (i) for each set, for each SNP, it determines which other SNPs are in LD above the threshold of r2 = 0.6; (ii) the association between each SNP and cervical cancer risk was estimated using unconditional logistic regression assuming a log-additive genetic model with adjustment for population stratification; (iii) for each set, select up to N ‘independent SNPs’ as defined in Step 1 with P-value below a certain threshold Psnp which also refers to the threshold denoting whether the original single SNP is significant; (iv) from these subsets of SNPs, the statistics for each set is calculated as the mean of these single-SNP statistics; (v) permute the phenotype of the GWAS dataset 20 048 times, keeping LD between SNPs constant; (vi) for each permuted dataset, repeat Steps (ii)–(iv) and (vii), empirical P-value is calculated as the number of times the permuted set-statistic exceeds the original one for that set. It is worth noting that the set-based analysis does not correct for the observed inflation, which may results in different results when compared with those generated by the SRT.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

This work was supported by grants from the Swedish Cancer Society (U.G.), the Medical Faculty of Uppsala University (D.C.), the Swedish Research Council (U.G.) and the Knut and Alice Wallenberg Foundation (U.G.).

ACKNOWLEDGEMENTS

The authors thank all of the participants in this research and support staff who made this study possible.

Conflict of Interest statement. None declared.

REFERENCES

1
Arbyn
M.
Castellsagué
X.
de Sanjosé
S.
Bruni
L.
Saraiya
M.
Bray
F.
Ferlay
J.
Worldwide burden of cervical cancer in 2008
Ann. Oncol.
 
2011
22
2675
2686
2
Bosch
F.X.
Lorincz
A.
Muñoz
N.
Meijer
C.J.
Shah
K.V.
The causal relation between human papillomavirus and cervical cancer
J. Clin. Pathol.
 
2002
55
244
265
3
Magnusson
P.K.
Sparén
P.
Gyllensten
U.B.
Genetic link to cervical tumours
Nature
 
1999
400
29
30
4
Chen
D.
Juko-Pecirep
I.
Hammer
J.
Ivansson
E.
Enroth
S.
Gustavsson
I.
Feuk
L.
Magnusson
P.K.
McKay
J.D.
Wilander
E.
et al.  
Genome-wide association study of susceptibility loci for cervical cancer
J. Natl.Cancer Inst.
 
2013
105
624
633
5
Wang
K.
Li
M.
Hakonarson
H.
Analysing biological pathways in genome-wide association studies
Nat. Rev. Genet.
 
2010
11
843
854
6
Barabasi
A.L.
Network medicine–from obesity to the “diseasome”
N. Engl. J. Med.
 
2007
357
404
407
7
Askland
K.
Read
C.
Moore
J.
Pathways-based analyses of whole-genome association study data in bipolar disorder reveal genes mediating ion channel activity and synaptic neurotransmission
Hum. Genet.
 
2009
125
63
79
8
Lesnick
T.G.
Papapetropoulos
S.
Mash
D.C.
Ffrench-Mullen
J.
Shehadeh
L.
de Andrade
M.
Henley
J.R.
Rocca
W.A.
Ahlskog
J.E.
Maraganore
D.M.
A genomic pathway approach to a complex disease: axon guidance and Parkinson disease
PLoS Genet.
 
2007
3
e98
9
Wang
K.
Li
M.
Bucan
M.
Pathway-based approaches for analysis of genomewide association studies
Am. J. Hum. Genet.
 
2007
81
1278
1283
10
Menashe
I.
Maeder
D.
Garcia-Closas
M.
Figueroa
J.D.
Bhattacharjee
S.
Rotunno
M.
Kraft
P.
Hunter
D.J.
Chanock
S.J.
Rosenberg
P.S.
et al.  
Pathway analysis of breast cancer genome-wide association study highlights three pathways and one canonical signaling cascade
Cancer Res.
 
2010
70
4453
4459
11
O'Dushlaine
C.
Kenny
E.
Heron
E.
Donohoe
G.
Gill
M.
Morris
D.
Corvin
A.
International Schizophrenia Consortium
Molecular pathways involved in neuronal cell adhesion and membrane scaffolding contribute to schizophrenia and bipolar disorder susceptibility
Mol. Psychiatry
 
2011
16
286
292
12
Deelen
J.
Uh
H.W.
Monajemi
R.
van Heemst
D.
Thijssen
P.E.
Böhringer
S.
van den Akker
E.B.
de Craen
A.J.
Rivadeneira
F.
Uitterlinden
A.G.
et al.  
Gene set analysis of GWAS data for human longevity highlights the relevance of the insulin/IGF-1 signaling and telomere maintenance pathways
Age (Dordr)
 
2013
35
235
249
13
Ivansson
E.L.
Juko-Pecirep
I.
Erlich
H.A.
Gyllensten
U.B.
Pathway-based analysis of genetic susceptibility to cervical cancer in situ: HLA-DPB1 affects risk in Swedish women
Genes Immunol.
 
2011
12
605
614
14
O'Dushlaine
C.
Kenny
E.
Heron
E.A.
Segurado
R.
Gill
M.
Morris
D.W.
Corvin
A.
The SNP ratio test: pathway analysis of genome-wide association datasets
Bioinformatics
 
2009
25
2762
2763
15
Purcell
S.
Neale
B.
Todd-Brown
K.
Thomas
L.
Ferreira
M.A.
Bender
D.
Maller
J.
Sklar
P.
de Bakker
P.I.
Daly
M.J.
et al.  
PLINK: a tool set for whole-genome association and population based linkage analyses
Am. J. Hum. Genet.
 
2007
81
559
575
16
Ramanan
V.K.
Shen
L.
Moore
J.H.
Saykin
A.J.
Pathway analysis of genomic data: concepts, methods, and prospects for future development
Trends Genet.
 
2012
28
323
332
17
Shi
Y.
Li
L.
Hu
Z.
Li
S.
Wang
S.
Liu
J.
Wu
C.
He
L.
Zhou
J.
Li
Z.
et al.  
A genome-wide association study identifies two new cervical cancer susceptibility loci at 4q12 and 17q12
Nat. Genet.
 
2013
45
918
922
18
Daniel
D.
Meyer-Morse
N.
Bergsland
E.K.
Dehne
K.
Coussens
L.M.
Hanahan
D.
Immune enhancement of skin carcinogenesis by CD4+ T cells
J. Exp. Med.
 
2003
197
1017
1028
19
Botto
M.
Kirschfink
M.
Macor
P.
Pickering
M.C.
Würzner
R.
Tedesco
F.
Complement in human diseases: lessons from complement deficiencies
Mol. Immunol.
 
2009
46
2774
2783
20
Nilsson
S.C.
Sim
R.B.
Lea
S.M.
Fremeaux-Bacchi
V.
Blom
A.M.
Complement factor I in health and disease
Mol. Immunol.
 
2011
48
1611
1120
21
Schumaker
V.N.
Zavodszky
P.
Poon
P.H.
Activation of the first component of complement
Annu. Rev. Immunol.
 
1987
5
21
42
22
Megyeri
M.
Harmat
V.
Major
B.
Végh
Á.
Balczer
J.
Héja
D.
Szilágyi
K.
Datz
D.
Pál
G.
Závodszky
P.
et al.  
Quantitative characterization of the activation steps of mannan-binding lectin (MBL)-associated serine proteases (MASPs) points to the central role of MASP-1 in the initiation of the complement lectin pathway
J. Biol. Chem.
 
2013
288
8922
8934
23
Takahashi
M.
Mori
S.
Shigeta
S.
Fujita
T.
Role of MBL-associated serine protease (MASP) on activation of the lectin complement pathway
Adv. Exp. Med. Biol.
 
2007
598
93
104
24
Skjoedt
M.O.
Hummelshoj
T.
Palarasah
Y.
Honore
C.
Honore
C.
Koch
C.
Skjodt
K.
Garred
P.
A novel mannose-binding lectin/ficolin-associated protein is highly expressed in heart and skeletal muscle tissues and inhibits complement activation
J. Biol. Chem.
 
2010
285
8234
8243
25
Yang
D.
Chen
Q.
Gertz
B.
He
R.
Phulsuksombati
M.
Ye
R.D.
Oppenheim
J.J.
Human dendritic cells express functional formyl peptide receptor-like-2 (FPRL2) throughout maturation
J. Leukoc. Biol.
 
2002
72
598
607
26
Hynes
R.O.
Integrins: versatility, modulation, and signaling in cell adhesion
Cell
 
1992
69
11
25
27
Hogg
N.
Stewart
M.P.
Scarth
S.L.
Newton
R.
Shaw
J.M.
Law
S.K.
Klein
N.
A novel leukocyte adhesion deficiency caused by expressed but nonfunctional beta2 integrins Mac-1 and LFA-1
J. Clin. Invest.
 
1999
103
97
106
28
Eun
Y.G.
Kim
S.K.
Chung
J.H.
Kwon
K.H.
Association study of integrins beta 1 and beta 2 gene polymorphism and papillary thyroid cancer
Am. J. Surg.
 
2013
205
631
635
29
Carlow
D.A.
Gossens
K.
Naus
S.
Veerman
K.M.
Seo
W.
Ziltener
H.J.
PSGL-1 function in immunity and steady state homeostasis
Immunol. Rev.
 
2009
230
75
96
30
Bahbouhi
B.
Berthelot
L.
Pettré
S.
Michel
L.
Wiertlewski
S.
Weksler
B.
Romero
I.A.
Miller
F.
Couraud
P.O.
Brouard
S.
et al.  
Peripheral blood CD4+ T lymphocytes from multiple sclerosis patients are characterized by higher PSGL-1 expression and transmigration capacity across a human blood-brain barrier-derived endothelial cell line
J. Leukoc. Biol.
 
2009
86
1049
1063
31
Scalabrini
D.
Galimberti
D.
Fenoglio
C.
Comi
C.
De Riz
M.
Venturelli
E.
Castelli
L.
Piccio
L.
Ronzoni
M.
Lovati
C.
et al.  
P-selectin glycoprotein ligand-1 variable number of tandem repeats (VNTR) polymorphism in patients with multiple sclerosis
Neurosci. Lett.
 
2005
388
149
152
32
Gupta
R.
Warren
T.
Wald
A.
Genital herpes
Lancet
 
2007
370
2127
2137
33
Lulitanond
V.
Chantratita
W.
Thammaprasert
K.
Nimmanahaeminda
K.
Matangkasombut
P.
Yoosook
C.
Detection of herpes simplex virus type 2 Bgl II N fragment in paraffin-embedded cervical tissue sections using nested polymerase chain reaction
Mol. Cell Probes
 
1994
8
441
447
34
Yamakawa
Y.
Forslund
O.
Chua
K.L.
Dillner
L.
Boon
M.E.
Hansson
B.G.
Detection of the BC 24 transforming fragment of the herpes simplex virus type 2 (HSV-2) DNA in cervical carcinoma tissue by polymerase chain reaction (PCR)
APMIS
 
1994
102
401
406
35
Smith
J.S.
Herrero
R.
Bosetti
C.
Muñoz
N.
Bosch
F.X.
Eluf-Neto
J.
Castellsagué
X.
Meijer
C.J.
Van den Brule
A.J.
Franceschi
S.
et al.  
Herpes simplex virus-2 as a human papillomavirus cofactor in the etiology of invasive cervical cancer
J. Natl. Cancer Inst.
 
2002
94
1604
1613
36
Zhao
Y.
Cao
X.
Zheng
Y.
Tang
J.
Cai
W.
Wang
H.
Gao
Y.
Wang
Y.
Relationship between cervical disease and infection with human papillomavirus types 16 and 18, and herpes simplex virus 1 and 2
J. Med. Virol.
 
2012
84
1920
1927
37
Alberts
C.J.
Schim van der Loeff
M.F.
Papenfuss
M.R.
da Silva
R.J.
Villa
L.L.
Lazcano-Ponce
E.
Nyitray
A.G.
Giuliano
A.R.
Association of chlamydia trachomatis infection and herpes simplex virus type 2 serostatus with genital human papillomavirus infection in men: the HPV in men study
Sex Transm. Dis.
 
2013
40
508
515
38
DiPaolo
J.A.
Woodworth
C.D.
Coutlée
F.
Zimonic
D.B.
Bryant
J.
Kessous
A.
Relationship of stable integration of herpes simplex virus-2 Bg/II N subfragment Xho2 to malignant transformation of human papillomavirus-immortalized cervical keratinocytes
Int. J. Cancer
 
1998
76
865
871
39
Partridge
J.M.
Koutsky
L.A.
Genital human papillomavirus infection in men
Lancet Infect. Dis.
 
2006
6
21
31
40
York
I.A.
Roop
C.
Andrews
D.W.
Riddell
S.R.
Graham
F.L.
Johnson
D.C.
A cytosolic herpes simplex virus protein inhibits antigen presentation to CD8+ T lymphocytes
Cell
 
1994
77
525
535
41
Paludan
S.R.
Malmgaard
L.
Ellermann-Eriksen
S.
Bosca
L.
Mogensen
S.C.
Interferon (IFN)-gamma and herpes simplex virus/tumor necrosis factor-alpha synergistically induce nitric oxide synthase 2 in macrophages through cooperative action of nuclear factor-kappa B and IFN regulatory factor-1
Eur. Cytokine Netw.
 
2001
12
297
308
42
Cohen-Eliav
M.
Golan-Gerstl
R.
Siegfried
Z.
Andersen
C.L.
Thorsen
K.
Ørntoft
T.F.
The splicing factor SRSF6 is amplified and is an oncoprotein in lung and colon cancers
J. Pathol.
 
2013
229
630
639
43
Oh
K.J.
Kalinina
A.
Wang
J.
Nakayama
K.
Nakayama
K.I.
Bagchi
S.
The papillomavirus E7 oncoprotein is ubiquitinated by UbcH7 and Cullin 1- and Skp2-containing E3 ligase
J. Virol.
 
2004
78
5338
5346
44
Donnelly
N.
Gorman
A.M.
Gupta
S.
Samali
A.
The eIF2α kinases: their structures and functions
Cell Mol. Life Sci.
 
2013
70
3493
3511
45
Ye
J.
Kumanova
M.
Hart
L.S.
Sloane
K.
Zhang
H.
De Panis
D.N.
Bobrovnikova-Marjon
E.
Diehl
J.A.
Ron
D.
Koumenis
C.
The GCN2-ATF4 pathway is critical for tumour cell survival and proliferation in response to nutrient deprivation
EMBO J.
 
2010
29
2082
2096
46
Ying
J.
Li
H.
Cui
Y.
Wong
A.H.
Langford
C.
Tao
Q.
Epigenetic disruption of two proapoptotic genes MAPK10/JNK3 and PTPN13/FAP-1 in multiple lymphomas and carcinomas through hypermethylation of a common bidirectional promoter
Leukemia
 
2006
20
1173
1175
47
Zhang
Z.
Huettner
P.C.
Nguyen
L.
Bidder
M.
Funk
M.C.
Li
J.
Li
J.
Rader
J.S.
Aberrant promoter methylation and silencing of the POU2F3 gene in cervical cancer
Oncogene
 
2006
25
5436
5445
48
Bréhin
A.C.
Casadémont
I.
Frenkiel
M.P.
Julier
C.
Sakuntabhai
A.
Desprès
P.
The large form of human 2′,5′-oligoadenylate synthetase (OAS3) exerts antiviral effect against chikungunya virus
Virology
 
2009
384
216
222
49
Gangloff
Y.G.
Pointud
J.C.
Thuault
S.
Carré
L.
Romier
C.
Muratoglu
S.
Brand
M.
Tora
L.
Couderc
J.L.
Davidson
I.
The TFIID components human TAF(II)140 and Drosophila BIP2 (TAF(II)155) are novel metazoan homologues of yeast TAF(II)47 containing a histone fold and a PHD finger
Mol. Cell Biol.
 
2001
21
5109
5121
50
Price
A.L.
Patterson
N.J.
Plenge
R.M.
Weinblatt
M.E.
Shadick
N.A.
Reich
D.
Principal components analysis corrects for stratification in genome-wide association studies
Nat. Genet.
 
2006
380
904
909
51
Speliotes
E.K.
Willer
C.J.
Berndt
S.I.
Monda
K.L.
Thorleifsson
G.
Jackson
A.U.
Lango Allen
H.
Lindgren
C.M.
Luan
J.
Mägi
R.
et al.  
Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index
Nat. Genet.
 
2010
42
937
948
52
Delaneau
O.
Zagury
J.F.
Marchini
J.
Improved whole chromosome phasing for disease and population genetic studies
Nat. Methods.
 
2013
10
5
6
53
Howie
B.N.
Donnelly
P.
Marchini
J.
A flexible and accurate genotype imputation method for the next generation of genome-wide association studies
PLoS Genet.
 
2009
5
e1000529
54
Durbin
R.M.
Abecasis
G.R.
Altshuler
D.L.
Auton
A.
Brooks
L.D.
Durbin
R.M.
Gibbs
R.A.
Hurles
M.E.
McVean
G.A.
1000 Genomes Project Consortium
A map of human genome variation from population-scale sequencing
Nature
 
2010
467
1061
1073
55
Benjamini
Y.
Hochberg
Y.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
J. R. Statist. Soc. B
 
1995
57
289
300