Common cancers have been demarcated into ‘hereditary’ or ‘sporadic’ (‘non-hereditary’) types historically. Such distinctions initially arose from work identifying rare, highly penetrant germline mutations causing ‘hereditary’ cancer. While rare mutations are important in particular families, most cases in the general population are ‘sporadic’. Twin studies have suggested that many ‘sporadic’ cancers show little or no heritability. To quantify the role of germline mutations in cancer susceptibility, we applied a method for estimating the importance of common genetic variants (array heritability, h2g) to twelve cancer types. The following cancers showed a significant (P < 0.05) array heritability: melanoma USA set h2g = 0.19 (95% CI = 0.01–0.37) and Australian set h2g = 0.30 (0.10–0.50); pancreatic h2g = 0.18 (0.06–0.30); prostate h2g = 0.81 (0.32–1); kidney h2g = 0.18 (0.04–0.32); ovarian h2g = 0.30 (0.18–0.42); esophageal adenocarcinoma h2g = 0.24 (0.14–0.34); esophageal squamous cell carcinoma h2g = 0.19 (0.07–0.31); endometrial UK set h2g = 0.23 (0.01–0.45) and Australian set h2g = 0.39 (0.02–0.76). Three cancers showed a positive but non-significant effect: breast h2g = 0.13 (0–0.56); gastric h2g = 0.11 (0–0.27); lung h2g = 0.10 (0–0.24). One cancer showed a small effect: bladder h2g = 0.01 (0–0.11). Among these cancers, previous twin studies were only able to show heritability for prostate and breast cancer, but we can now make much stronger statements for several common cancers which emphasize the important role of genetic variants in cancer susceptibility. We have demonstrated that several ‘sporadic’ cancers have a significant inherited component. Larger genome-wide association studies in these cancers will continue to find more loci, which explain part of the remaining polygenic component.

INTRODUCTION

Common cancers are frequently demarcated into ‘hereditary’ (‘familial’) or ‘sporadic’ (‘non-hereditary’) types (1). Such distinctions initially arose from work identifying rare, highly penetrant germline mutations causing ‘hereditary’ cancer (such as CDKN2A mutations in melanoma and BRCA1/2 mutations in breast cancer). These rare mutations are important in particular families but do not explain many cases in the general population. The more common ‘sporadic’ cancers were thought to have little or no germline genetic component, as large twin studies failed to detect significant heritability for many cancers, with the exceptions of breast, colorectal and prostate cancers (2,3). Studies examining affected relative pairs have shown a familial component to some cancers (4,5), but unlike twin studies, these cannot reliably distinguish between the effects of genes and the effects of common environment on heritability estimates. Recently, genome-wide association studies (GWAS) have identified novel cancer susceptibility loci (∼1–120, depending on the cancer type), but these typically explain only a small proportion of the heritability. Some studies have considered the contribution of all genetic variants simultaneously instead of just a few known loci, using approaches such as polygenic risk prediction (6), but showed only a small heritability for prostate cancer (7,8) and not for breast cancer (8). Thus, it remains unclear whether common germline genetic variants, on the whole, make a significant contribution to ‘sporadic’ cancers.

In the current study, we apply a recently developed method to several cancer GWAS datasets. This method links case–control status to relatedness derived from dense genetic marker data (9), allowing the estimation of the magnitude of the germline genetic component. Since most of the genotyping arrays used to date only include single-nucleotide polymorphisms (SNPs), which represent the common variation (frequency >1–5%), the resultant estimate of ‘array heritability’ represents a lower bound for the overall heritability. Importantly, unlike studies that determine the contribution of genes based on estimates of risk among family members (e.g. sibling relative risk), our method excludes closely related individuals, yielding estimates less likely to be biased by shared environment.

RESULTS

For twelve cancer types, we were able to obtain genome-wide array data for >1000 cases and controls (Table 1). Eight of the twelve cancers studied demonstrated a significant polygenic component underlying cancer risk (Table 2). As with previous twin studies, we found a substantial genetic component for the most common cancer, prostate cancer. Our estimate (h2g = 0.81, 95% CI = 0.32–1), albeit with a wide confidence interval, was markedly higher than those reported in twin studies: 0.42 in Lichtenstein et al. (2), later updated to 0.58 using the Nordic Twin Registry of Cancer (NorTwinCan) (10) (Table 2). For most other cancers, previous twin studies had insufficient sample size to draw any conclusions (2), although this was improved in the analysis of NorTwinCan (10). Using GWAS data, we showed that endometrial cancer, esophageal adenocarcinoma, esophageal squamous cell carcinoma (ESCC), melanoma, pancreatic cancer, renal cell carcinoma (RCC) and ovarian cancer all had significant array heritability (range from ∼0.10 to ∼0.40). Our data showed positive but non-significant estimates for breast, gastric and lung cancers; thus, we could not make a clear statement on these three cancers. Although there are published genome-wide significant loci for bladder cancer, these account for <1% of the variance, and overall this cancer had only a small polygenic component (estimate 0.01, with upper 95% confidence limit of 0.11). Significant X chromosome effects were seen in esophageal adenocarcinoma, ESCC and gastric cancer (GC), under all three dosage compensation models (Supplementary Material, Table S1).

Table 1.

Sample size (post-QC), number of SNPs (post-QC) and genotyping arrays for each cancer

Cancer type | Cases | Controls | SNPs | Arrays
Bladder | 2848 | 3159 | 505729 | HumanHapmap550
Breast | 1081 | 1085 | 489247 | HumanHapmap550
Australian endometrial cancer | 564 | 574 | 540995 | Human610-Quad, Human1M-Duo
UK endometrial cancer | 670 | 2558 | 495905 | Human610-Quad, Human1M-Duo
Esophageal adenocarcinoma | 1397 | 1947 | 795160 | Omni1-Quad
ESCC | 1388 | 1614 | 442551 | Human660W-Quad
GC | 1342 | 1624 | 451186 | Human660W-Quad
Kidney | 1080 | 2489 | 498235 | HumanHapmap550, Human610-Quad, Human660W-Quad
Lung | 2557 | 2895 | 493060 | HumanHapmap550, HumanHapmap240, HumanHapmap300, Human610-Quad, Human1M-Duo
Melanoma QLD | 2009 | 1963 | 298477 | Omni1-Quad, HumanHapmap610
Melanoma USA | 1849 | 965 | 809287 | Omni1-Quad
Ovary | 1710 | 2562 | 471552 | Human610-Quad, Human1M-Duo
Pancreas | 1926 | 1971 | 499792 | HumanHapmap550, HumanHapmap610
Prostate | 1093 | 992 | 493308 | HumanHapmap300, HumanHapmap240
Table 2.

Array heritability explained by all autosomes (h2g) compared with the total heritability estimates from the largest twin studies

Cancer type | Twin studies, Lichtenstein (2000), % | Twin studies, Mucci (2013), % | Array heritability, h2g (95% CI), % | Array heritability, h2g (95% CI) after removing known loci, %
Bladder | 31 (0–45) | n.a. | 1 (0–11) | 0 (0–10)
Breast | 27 (4–41) | 28 (12–52) | 13 (0–56) | 5 (0–46)
Australian endometrial cancer | 0 (0–42) | 24 (14–87) | 39 (2–76) | 39 (2–76)
UK endometrial cancer | | | 23 (1–45) | 23 (1–45)
Esophageal adenocarcinoma | n.a. | n.a. | 24 (14–34) | 24 (14–34)
ESCC | n.a. | n.a. | 19 (7–31) | 19 (7–31)
GC | n.a. | n.a. | 11 (0–27) | 8 (0–22)
Kidney | n.a. | 23 (11–42) | 18 (4–32) | 15 (1–31)
Lung | 26 (0–49) | 25 (12–44) | 10 (0–24) | 8 (0–22)
Melanoma QLD | n.a. | 39 (8–81) | 30 (10–50) | 21 (1–41)
Melanoma USA | | | 19 (1–37) | 8 (0–28)
Ovary | 22 (0–41) | 28 (15–47) | 30 (18–42) | 29 (17–41)
Pancreas | 36 (0–53) | n.a. | 18 (6–30) | 16 (4–28)
Prostate | 42 (29–50) | 58 (52–63) | 81 (32–100) | 59 (12–100)

Ones in bold are significantly different from zero (P < 0.05).

When the effects of genome-wide significant SNPs were removed, the variance explained for prostate cancer and melanoma decreased substantially, although only in the smaller of the two melanoma datasets was the remaining polygenic variance not statistically significant (Table 2). In most other cases, the genome-wide significant SNPs accounted for very little of the estimated polygenic variance.

Following the removal of genetically related individuals and adjustment for the first 20 principal components, we found that the remaining population structure contributed only a negligible proportion of genetic variance in each dataset. In most cases, the parameters indicating cryptic relatedness (b0) and population stratification (b1) were not significantly different from zero (Supplementary Material, Table S2). Hence, we conclude that it is unlikely that our estimates of h2g were spuriously inflated by unaccounted for population structure.

Furthermore, there was no significant array heritability in the first two control–control contrast studies after applying the standard analysis protocol (Supplementary Material, Table S3). We did, however, find false array heritability in the comparison of two subsets of Wellcome Trust Case-Control Consortium (WTCCC) controls (Supplementary Material, Table S3). This was removed by applying stringent quality control (QC) in addition to the standard protocol, which underlines the necessity of applying stringent QC to analyses using the WTCCC as controls. Overall, these results demonstrated that our method is robust.

DISCUSSION

The goal of the current study was to quantify the genetic contribution to the lifetime risk of cancer. In eight of the twelve cancers studied, there was a significant polygenic component, that is, a large number of genes of weak effect are involved. Given a polygenic basis to disease, it has been shown that most people who develop cancer would be ‘sporadic cases’ (no affected first-, second- or third-degree relatives) (11). This may go some way toward explaining the misconception that most ‘sporadic’ cancers have little or no germline genetic component. The polygenic components estimated for these datasets were not largely attributable to the genome-wide significant variants identified by recent GWASs. Even in conditions such as prostate cancer, where the polygenic component decreased substantially when genome-wide significant loci were removed, a significant polygenic component remained. One corollary of this is that conducting larger GWASs in most cancers will continue to find ever larger numbers of loci, which will explain more of the polygenic component. It has been suggested that an enhanced predictive power, gained from incorporating a larger number of genetic loci, will help fulfill the promise of personalized risk prediction (12).

In common with methods for estimating heritability from twin studies, we estimated the variance attributable to common SNPs (array heritability, h2g) on an assumed underlying quantitative liability scale where individuals are modeled as becoming affected once they exceed a particular threshold. However, there are two ways in which estimates of heritability derived from twin studies differ from array heritability. First, when using twin/family data, expected sharing of the genome among close relatives is used, meaning that all of the genome contributes to estimates of heritability. In contrast, estimating the genetic relationship between unrelated individuals uses information on only the portion of the genome tagged by SNPs on the microarray (i.e. common variants). Rare genetic variants (such as rare BRCA1/2 mutations in breast cancer) are generally not tagged on commercial microarrays, and hence, they do not contribute to our array heritability estimates. Second, twin studies sample from the general population and hence provide heritability estimates for the cancer up to the age at which the twins were ascertained (13) (twin studies typically ascertain individuals who are <85 years of age, as assumed for our h2g calculations), which means that some twin study-derived estimates of heritability are artificially lowered for late onset cancers such as prostate. For example, when adding 10 years of follow-up (plus expanding the original twin cohorts from the Nordic countries), the heritability of prostate cancer was found to be significantly higher than the previous estimate (an increase of approximately one-third), which may be due to differences in age at ascertainment (Table 2).

Results from our study should be interpreted in the context of its limitations. Although we ensured >1000 cases and controls for all 12 cancer types, the statistical power for detecting a significant polygenic component varies dramatically with disease prevalence and true heritability. All but the bladder, breast, gastric and lung cancer sets had reasonable statistical power (≥0.7); hence, the estimates of array heritability we obtained here should be relatively robust. However, for those four cancers, we had limited power (<0.5) to draw clear conclusions on their genetic variance.

It is worth noting that, since the information for the estimation of array heritability derives from the differences between cases and controls, when cases form a more extreme sample from the population [i.e. cancers with lower lifetime risk (K)], the information content of a sample is increased. Mathematically, for K < 0.5, the variance of the array heritability reduces as K reduces [it is proportional to (K(1 − K))^4] (9). Hence, for the two most common cancer types, prostate and breast cancer, a sample size of 1000 cases and controls is less informative than for rarer cancer types; thus, the estimates had wide confidence intervals. The prostate cancer dataset is from the Prostate, Lung, Colon and Ovarian (PLCO) Cancer Screening Trial. It contains a larger proportion of aggressive disease than observed in the population (aggressive disease was defined as cases with a Gleason Score of ≥7 or Stage of ≥III at the time of diagnosis; 60% in the current sample set as compared with ∼20–30% in prostate cancer patients from the general population). To our knowledge, there is no existing heritability estimate for prostate cancer stratified by aggressive and nonaggressive disease. We estimated the array heritability separately for these two subtypes and found that the aggressive disease (h2g = 0.80, 95% CI = 0.15–1, P = 0.009, based on 626 aggressive cases and 992 controls) had a higher heritability than the nonaggressive disease (h2g = 0.55, 95% CI = 0–1, P = 0.1, based on 467 nonaggressive cases and the same controls). Due to the limited sample size, we cannot determine whether the heritabilities of aggressive and nonaggressive disease were significantly different. This warrants further investigation in larger prostate cancer datasets. If the heritability of aggressive disease is confirmed to be higher than that of nonaggressive disease, the large proportion of aggressive disease in our dataset might partly explain why our estimate is higher than those in the literature.

A further consideration with prostate cancer is that very few familial genes have been identified. This is in contrast with, for example, breast and ovarian cancers, where high- to moderate-penetrance alleles, including BRCA1/2 mutations, explain >20% of the familial relative risk (14). Since the contribution of rare high-penetrance genes to prostate cancer appears to be small, our estimate of the array heritability explained by common variants might be high and very close to the total heritability.

The heritability of breast cancer from the twin studies was consistently ∼0.3 (2,10). The estimate was higher in family studies (0.4–0.6) (15), but this may be due to shared environmental factors. Our estimate of array heritability accounts for approximately half of the estimates from twin studies, which is not surprising given familial genes such as BRCA1/2 with high penetrance account for a sizable proportion of the total heritability.

For these common cancers, the Genome-wide Complex Trait Analysis (GCTA) approach would ideally be applied to large datasets from the same or very similar populations, assembled through consortium efforts such as the Collaborative Oncological Gene-environment Study (COGS) (16). At present, the genotyping array (iCOGS) does not have sufficient coverage, as it contains mainly fine-mapping loci without tagging the whole genome. If GCTA were applied to these data, the estimate of array heritability would be much smaller due to insufficient array coverage. However, such an analysis might become feasible when the second OncoArray, with reasonable tagging of the whole genome, becomes available.

Our estimate of array heritability for bladder cancer was 0.01 (95% CI = 0–0.11). Although there are known risk variants, these explain only a very small proportion of heritability. The current sample set was underpowered to detect a low heritability: with the sample size of 2848 cases and 3159 controls, we had >80% power to detect a true heritability of 0.1, but only 30% power if the true heritability is 0.05. Although we did not conclusively show a significant heritability, the result did suggest that the heritability of bladder cancer is likely to be much lower than that of other cancer types.

We have used publicly available data for a number of cancer types, including bladder, breast, esophageal squamous cell carcinoma, gastric, kidney, lung, melanoma (USA), pancreas and prostate. Some of these datasets consisted of multiple small cohorts, which were genotyped on different arrays. Using different arrays for cases and controls may inflate estimates of array heritability. However, our robustness checks indicated that such inflation was unlikely to be large.

For some cancers, ongoing research is providing a molecular taxonomy for cancer, which is currently in a state of evolution, making it unclear how to best categorize cancer cases so they form a homogeneous group. In breast cancer and ovarian cancer, there is evidence for different subtypes having distinct genetic bases (17,18). Ideally for the polygenic analysis, subtypes would be tested separately but we lacked sufficient sample size to, for example, differentiate between different ovarian cancer histologies. Similarly, there are multiple ways of dealing with the smoking and non-smoking groups in lung cancer, but further investigation is limited by the sample size. There are differences in prevalence by sex in some cancers. We modeled this by fitting sex as a covariate. In addition to modeling an overall difference in prevalence between sexes, array heritability can be estimated in each sex separately, although for most of the cancers presented here, we had insufficient data to make a clear statement on sex-specific polygenic effects. We kept Australian and United States melanoma samples as separate groupings because the lifetime risk, a key part of the array heritability estimate, varies markedly between countries. The UK and Australian endometrial cancer sets were also analyzed separately because different QC was applied.

In summary, we show that common genetic variants contribute significantly to susceptibility to most of the cancer types that we examined. For most cancers examined here, the descriptor ‘sporadic’ or ‘non-hereditary’ should be replaced by ‘polygenic’.

MATERIALS AND METHODS

Individual study descriptions are listed as follows.

Melanoma (Australian)

Data consist of a subset of samples from an Australian melanoma study (19). Melanoma cases in individuals of European descent (n = 2168) were selected from the Queensland study of Melanoma: Environment and Genetic Associations (Q-MEGA) (20) and the Australian Melanoma Family Study (21). Samples from Australian individuals of European descent from three different sources were used as controls (n = 2146) (20–22).

Melanoma (United States)

Data were drawn from an extensive resource of melanoma cases and hospital-based controls collected over several years at the U.T. M.D. Anderson Cancer Centre. This dbGaP study (dbGaP accession number phs000187.v1.p1) contains samples from 1965 European ancestry cases and 1038 European ancestry controls genotyped using the Illumina OMNI1-Quad SNP chip (23,24).

Breast cancer

Data consist of a subset from the Nurses' Health Study (25,26). Controls were matched to cases based on age, blood collection variables (time, date and year of blood collection, as well as recent (<3 months) use of postmenopausal hormones), ethnicity (all cases and controls are self-reported Caucasians) and menopausal status (all cases and controls were menopausal at blood draw). The dbGaP accession number is phs000147.v1.p1.

Prostate cancer

Data consist of a subset from the Cancer Genetic Markers of Susceptibility prostate cancer GWAS (27) with patients of European ancestry drawn from the PLCO Cancer Screening Trial. The dbGaP accession number is phs000207.v1.p1. The aggressive subtype was defined as cases with a Gleason Score of ≥7 or Stage of ≥III at the time of diagnosis, whereas the nonaggressive subtype was defined as cases with a Gleason Score of <7 and Stage of <III.

Pancreatic cancer

Participants were drawn from 12 cohort studies and 8 case–control studies (28,29). Cases were defined as those individuals having primary adenocarcinoma of the exocrine pancreas. Those with non-exocrine pancreatic tumors were excluded from the study. The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000206.v3.p2.

Lung cancer

Data consist of a subset from the National Cancer Institute (NCI) GWAS, specifically the consent group ‘Cancer in all age groups, other diseases in adults only, and methods’, containing a total of 7622 subjects (30). Subjects were drawn from one population-based case–control study and three cohort studies. Lung cancer cases were included based on a lung cancer diagnosis established on clinical criteria and confirmed by pathology reports from surgery, biopsy or cytology samples in ∼95% of cases and on clinical history and imaging for the remaining 5%. The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000336.v1.p1.

Kidney cancer (renal cell carcinoma)

Data consist of a subset of the NCI GWAS of RCC (31). The GWAS study included 1453 RCC cases and 3531 controls of European background from 4 studies (3 prospective cohorts and 1 case–control study). All cases were diagnosed with pathologically confirmed renal cancer (International Classification of Diseases for Oncology, Second Edition, Topography C64). The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000351.v1.p1.

Bladder cancer

Data consist of a subset of a bladder cancer GWAS (32). Cases and controls used in this study were all of European ancestry (3532 cases and 5119 controls) (32). Cases were defined as individuals having histologically confirmed primary carcinoma of the urinary bladder. Scan data were obtained from two case–control studies carried out in Spain and the United States and three prospective cohort studies in Finland and the United States. Samples were genotyped on an Illumina SNP array (Human_1M, HumanHap550, HumanHap250, HumanHap300 or Human610-Quad). In this study, we only used individuals typed on the 550 chip. The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000346.v1.p1.

Ovarian cancer

All cases were from the UK and were confirmed as invasive epithelial ovarian cancer (33). Genotyping for the cases was conducted using the Illumina Infinium 610 K array at Illumina Corporation. Existing data from the WTCCC, genotyped using the Illumina Human1M-Duo array, were used as controls.

Esophageal adenocarcinoma

The data used were collected by investigators in the Barrett's and Esophageal Adenocarcinoma Consortium (BEACON). A subset of individuals with European ancestry from 17 epidemiologic studies from three cohorts (Australia, Europe and North America) was selected for this study (1509 cases and 3202 controls). Cases were clinically diagnosed and histologically confirmed with esophageal adenocarcinoma.

Gastric adenocarcinoma and esophageal squamous cell carcinoma

Data used for these two cancers were collected from the Asian UGI GWAS (34,35). Esophageal squamous cell carcinomas include incident, primary esophageal cancers with squamous histology. Gastric cancers include incident, primary gastric adenocarcinomas. Controls were all cancer-free at the time of enrolment (and blood draw). The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000361.v1.p1.

Endometrial cancer

GWAS was conducted using cases with endometrioid histology endometrial cancer from Australia and the UK (36). Genotyping was conducted using the Human 610 K array on the Illumina Infinium platform. We extracted control data for SNPs included on the 610-K platform from existing Illumina 1.2 M genome-wide scan data for controls of European ancestry from two UK population-based studies genotyped by the WTCCC (37). We also matched Australian cases on the basis of ancestry with 1244 individuals from the Hunter Community Study (38).

Methods

We applied a standardized QC pipeline to all datasets: individuals with >5% missing genotypes were excluded, as were SNPs with minor allele frequency of <0.01, call rate of <0.99 or P-value from testing Hardy–Weinberg equilibrium of <0.0001. For two datasets, UK endometrial cancer and ovarian cancer, both of which used the WTCCC as controls, we applied more stringent QC regarding differential missingness between cases and controls (P < 0.001) and two-locus QC (P < 0.02), as described previously (39). The numbers of SNPs that passed QC are summarized in Table 1.
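The SNP-level thresholds above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names are hypothetical, the Hardy–Weinberg test shown is a simple 1-df chi-square test chosen for brevity (the text does not state which test was used), and the sample-level filter (>5% missing genotypes) and the differential-missingness and two-locus checks are not covered.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test stands in for whatever HWE test the
    original pipeline applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(genotypes, min_maf=0.01, min_call_rate=0.99, min_hwe_p=1e-4):
    """genotypes: allele counts (0, 1, 2) per individual, None if missing."""
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1.0 - freq)
    if maf == 0.0 or maf < min_maf:
        return False  # monomorphic or too rare
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= min_hwe_p
```

For example, `snp_passes_qc([0] * 25 + [1] * 50 + [2] * 25)` keeps a SNP whose genotype counts match Hardy–Weinberg proportions, whereas a SNP typed as heterozygous in every individual is rejected by the HWE filter.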

We used GCTA (9,40) to estimate the genetic relatedness between pairs of individuals that were derived from all autosomal SNPs post-QC. By relating observed case/control status to the estimated genetic relationship matrix, the variance explained by all autosomal SNPs was estimated using restricted maximum likelihood in GCTA (9). We define array heritability (h2g) as variance explained by autosomal SNPs divided by the total variance. Intuitively, h2g is high if pairs of individuals with high genetic relatedness (estimated from SNPs) have more similar phenotypes than those with lower genetic relatedness.
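To make the intuition concrete, the sketch below computes a genetic relationship matrix from allele counts and then relates phenotypic similarity to relatedness. Two simplifications are assumed and should not be read as the method used here: the same formula is applied to diagonal and off-diagonal elements, and the variance component is obtained by Haseman–Elston regression (a moment-based estimator) rather than by restricted maximum likelihood as in GCTA. All function names are hypothetical, and the estimate is on the observed case/control scale.

```python
def genetic_relationship_matrix(genotypes):
    """genotypes[j][i]: allele count (0, 1, 2) of individual j at SNP i.

    Each element is the average over SNPs of
    (x_ij - 2 p_i)(x_ik - 2 p_i) / (2 p_i (1 - p_i)).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(m):
        column = [genotypes[j][i] for j in range(n)]
        p = sum(column) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # a monomorphic SNP carries no information
        used += 1
        scale = 2 * p * (1 - p)
        centred = [x - 2 * p for x in column]
        for j in range(n):
            for k in range(j, n):
                grm[j][k] += centred[j] * centred[k] / scale
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    for j in range(n):
        for k in range(j, n):
            grm[j][k] /= used
            grm[k][j] = grm[j][k]
    return grm


def haseman_elston_h2(grm, status):
    """Slope of the regression of z_j * z_k on relatedness, over pairs j < k.

    status: 1 for cases, 0 for controls; z is the standardized status.
    Assumption: a simple stand-in for the REML estimate described in the text.
    """
    n = len(status)
    mean = sum(status) / n
    sd = (sum((s - mean) ** 2 for s in status) / n) ** 0.5
    z = [(s - mean) / sd for s in status]
    xs, ys = [], []
    for j in range(n):
        for k in range(j + 1, n):
            xs.append(grm[j][k])
            ys.append(z[j] * z[k])
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx
```

A positive slope indicates that pairs with higher SNP-derived relatedness tend to share case/control status more often, which is the pattern that yields a high h2g.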

All datasets were restricted to individuals with European ancestry, except for the GC and ESCC, which were restricted to individuals with Asian ancestry. To limit the impact of population substructure (cryptic relationship and population stratification) on the estimation of array heritability, we included only strictly unrelated samples (genetic relatedness < 0.025, approximately equivalent to second cousins; the numbers of cases and controls after removing related individuals are summarized in Table 1) and adjusted for the first 20 principal components. We also applied a simple approach based on genome partitioning (41) to quantify remaining population substructure. This approach involves modeling the differences between by-chromosome h2g estimates obtained from fitting the GRM of one chromosome at a time (h2c_sep) and from jointly fitting the GRMs of all 22 autosomes as random effects (h2c_joint), against the chromosome length, i.e. h2c_sep − h2c_joint = b0 + b1 × Lc + e, with b0 indicating inflation due to cryptic relationship (irrespective of chromosomal length Lc) and b1 indicating inflation due to population stratification (correlated with Lc). The genetic variance attributable to remaining population structure can thus be calculated from b0 × 22/21 + b1 × sum(Lc)/21 (41).
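The genome-partitioning check can be sketched as an ordinary least-squares fit of the difference between separately and jointly fitted by-chromosome estimates on chromosome length, followed by the b0 × 22/21 + b1 × sum(Lc)/21 calculation given in the text. This is an illustrative sketch only: the function names are hypothetical, and the per-chromosome estimates are assumed to have been obtained beforehand (e.g. from GCTA runs).

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def structure_variance(h2_sep, h2_joint, lengths):
    """Variance attributable to remaining population structure.

    h2_sep[c]:   estimate for chromosome c when its GRM is fitted alone
    h2_joint[c]: estimate for chromosome c when all 22 GRMs are fitted jointly
    lengths[c]:  chromosome length Lc (any consistent unit)
    Returns (b0, b1, variance), using the formula quoted in the text.
    """
    if not (len(h2_sep) == len(h2_joint) == len(lengths) == 22):
        raise ValueError("expected one value per autosome (22)")
    diffs = [s - j for s, j in zip(h2_sep, h2_joint)]
    b0, b1 = fit_line(lengths, diffs)
    variance = b0 * 22 / 21 + b1 * sum(lengths) / 21
    return b0, b1, variance
```

Values of b0 and b1 close to zero, as reported in Supplementary Material, Table S2 for most datasets, give a structure variance close to zero.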

To model differing prevalence between sexes, sex was fitted as a fixed effect where appropriate. We assume causal variants have a similar frequency spectrum to genotyped SNPs (42). Variance explained by X chromosome SNPs, defined as h2g_x, was estimated separately, under a full dosage compensation model (that is, each allele in females has only half of the effect it has in males, h2g_x for females = ½ × h2g_x for males) (41). We also estimated h2g_x under alternative models: no dosage compensation (that is, each allele has a similar effect on the trait; thus, the genetic variance in females is twice that in males, h2g_x for females = 2 × h2g_x for males) and equal variance between males and females (that is, h2g_x for females = h2g_x for males) (41).

We initially estimated variance explained on the observed scale (cancer case or control). For such binary traits, heritability is usually parameterized on an unobserved continuous ‘liability’ scale (typically, a Normal distribution, where individuals above a threshold are ‘affected’). This is desirable as it allows the heritability to describe the relative importance of genetic and non-genetic factors independent of disease prevalence. Hence, to allow us to compute a lower bound for heritability (on the continuous scale), we transform the estimate obtained on the observed scale using the estimated lifetime risk from population studies as the ‘prevalence’ (Supplementary Material, Table S4).
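The text does not spell out the transformation, so the sketch below assumes the widely used conversion for ascertained case–control samples, h2_liability = h2_observed × [K(1 − K)/z²] × [K(1 − K)/(P(1 − P))], where K is the lifetime risk, P is the proportion of cases in the sample and z is the height of the standard normal density at the liability threshold. The function name is hypothetical, and the lifetime risk in the usage example is a placeholder rather than a value from Supplementary Material, Table S4.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, lifetime_risk, case_fraction):
    """Convert an observed-scale estimate to the liability scale.

    Assumption: the standard transformation for ascertained case-control
    data; K = lifetime_risk, P = case_fraction.
    """
    k, p = lifetime_risk, case_fraction
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)  # liability cut-off for "affected"
    z = std_normal.pdf(threshold)            # normal density at the threshold
    return h2_observed * (k * (1 - k) / z ** 2) * (k * (1 - k) / (p * (1 - p)))


# Usage with the prostate sample sizes from Table 1 and a placeholder
# lifetime risk (hypothetical, for illustration only).
cases, controls = 1093, 992
example = observed_to_liability(0.3, lifetime_risk=0.1,
                                case_fraction=cases / (cases + controls))
```

Because the same scaling factor multiplies both the estimate and its standard error, the transformation changes the magnitude of h2g but not its statistical significance.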

To assess how much array heritability has already been explained by the specific SNPs unambiguously identified by GWAS, for each cancer, we analyzed the data with the known GWAS loci removed. To do this, for each SNP in the GWAS catalog (http://www.genome.gov/gwastudies/ accessed 21 August 2013), we removed the associated SNPs and all SNPs within 1 megabase. Since linkage disequilibrium very rarely extends beyond this distance, removing all SNPs in such regions will result in all signals at the locus being removed.
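A minimal sketch of this filtering step is given below. The data layout (tuples of SNP identifier, chromosome and base-pair position) and the function name are assumptions made for illustration; SNPs lying exactly 1 megabase from a reported locus are treated as within the window.

```python
def remove_known_loci(snps, gwas_hits, window=1_000_000):
    """Drop SNPs within `window` base pairs of any reported GWAS locus.

    snps:      iterable of (snp_id, chromosome, position)
    gwas_hits: iterable of (chromosome, position)
    """
    hits_by_chrom = {}
    for chrom, pos in gwas_hits:
        hits_by_chrom.setdefault(chrom, []).append(pos)

    kept = []
    for snp_id, chrom, pos in snps:
        near_hit = any(abs(pos - hit) <= window
                       for hit in hits_by_chrom.get(chrom, []))
        if not near_hit:
            kept.append((snp_id, chrom, pos))
    return kept


# Hypothetical identifiers and positions, for illustration only.
snps = [("snp_a", 1, 500_000), ("snp_b", 1, 2_600_000), ("snp_c", 2, 500_000)]
hits = [(1, 1_000_000)]
remaining = remove_known_loci(snps, hits)
```

Here `snp_a` is removed because it lies within 1 megabase of the hit on chromosome 1, while `snp_b` (1.6 megabases away) and `snp_c` (on a different chromosome) are retained for the re-estimation of h2g.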

The methods for estimating the ‘polygenic’ component are potentially sensitive to QC issues; hence, we tested the robustness of this method by estimating array heritability in control sets from different studies: first, we contrasted controls from the breast and prostate cancer studies; second, we combined controls from the breast and the prostate cancer studies and compared this combined set with controls from the pancreatic cancer study; third, we split the WTCCC controls into two sets, the 1958 British Birth cohort and the UK Blood Service control group. The analysis protocol was identical to the one used in the case–control studies. We arbitrarily defined one set of controls as ‘cases’ and tested whether the array heritability was significantly different from zero. Since ‘lifetime risk’ cannot be defined in this situation, we did the calculation on the observed case/control scale; note that the significance of the genetic component is not affected by scaling for ‘lifetime risk’.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

S.M., N.K.H., A.B.S. and G.C.T. are supported by Australian National Health and Medical Research Council fellowships. S.M. and D.W. are supported by an Australian Research Council fellowship.

ACKNOWLEDGEMENTS

We thank Jian Yang and Sang Hong Lee for helpful discussions. Additional acknowledgements for each study are listed in Supplementary Material.

Conflict of Interest statement. None declared.

REFERENCES

1. Roukos, D.H., Murray, S. and Briasoulis, E. (2007) Molecular genetic tools shape a roadmap towards a more accurate prognostic prediction and personalized management of cancer. Can. Biol. Ther., 6, 308–312.
2. Lichtenstein, P., Holm, N.V., Verkasalo, P.K., Iliadou, A., Kaprio, J., Koskenvuo, M., Pukkala, E., Skytthe, A. and Hemminki, K. (2000) Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med., 343, 78–85.
3. Mack, T.M., Hamilton, A.S., Press, M.F., Diep, A. and Rappaport, E.B. (2002) Heritable breast cancer in twins. Br. J. Can., 87, 294–300.
4. Schildkraut, J.M., Risch, N. and Thompson, W.D. (1989) Evaluating genetic association among ovarian, breast, and endometrial cancer: evidence for a breast/ovarian cancer relationship. Am. J. Hum. Genet., 45, 521–529.
5. Hemminki, K., Vaittinen, P., Dong, C. and Easton, D. (2001) Sibling risks in cancer: clues to recessive or X-linked genes? Br. J. Can., 84, 388–391.
6. International Schizophrenia Consortium, Purcell, S.M., Wray, N.R., Stone, J.L., Visscher, P.M., O'Donovan, M.C., Sullivan, P.F. and Sklar, P. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460, 748–752.
7. Chatterjee, N., Wheeler, B., Sampson, J., Hartge, P., Chanock, S.J. and Park, J.H. (2013) Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet., 45, 400–405, 405e401–403.
8. Witte, J.S. and Hoffmann, T.J. (2011) Polygenic modeling of genome-wide association studies: an application to prostate and breast cancer. Omics: J. Integrat. Biol., 15, 393–398.
9. Lee, S.H., Wray, N.R., Goddard, M.E. and Visscher, P.M. (2011) Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet., 88, 294–305.
10. Mucci, L.A., Kaprio, J., Harris, J., Czene, K., Kraft, P., Scheike, T., Graff, R., Brandt, I., Holmes, N., Havelick, D. et al. (2013) In American Society of Human Genetics, Boston.
11. Yang, J., Visscher, P.M. and Wray, N.R. (2010) Sporadic cases are the norm for complex disease. Eur. J. Hum. Genet., 18, 1039–1043.
12. Fletcher, O. and Houlston, R.S. (2010) Architecture of inherited susceptibility to common cancer. Nat. Rev. Can., 10, 353–361.
13. So, H.C., Gui, A.H., Cherny, S.S. and Sham, P.C. (2011) Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol., 35, 310–317.
14. Bahcall, O.G. (2013) Common variation and heritability estimates for breast, ovarian and prostate cancers. Nat. Genet.
15. Risch, N. (2001) The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Can. Epidemiol. Biomark. Prev., 10, 733–741.
16. Bahcall, O.G. (2013) iCOGS collection provides a collaborative model. Foreword. Nat. Genet., 45, 343.
17. Kraft, P. and Haiman, C.A. (2010) GWAS identifies a common breast cancer risk allele among BRCA1 carriers. Nat. Genet., 42, 819–820.
18. Bolton, K.L., Tyrer, J., Song, H., Ramus, S.J., Notaridou, M., Jones, C., Sher, T., Gentry-Maharaj, A., Wozniak, E., Tsai, Y.Y. et al. (2010) Common variants at 19p13 are associated with susceptibility to ovarian cancer. Nat. Genet., 42, 880–884.
19. Macgregor, S., Montgomery, G.W., Liu, J.Z., Zhao, Z.Z., Henders, A.K., Stark, M., Schmid, H., Holland, E.A., Duffy, D.L., Zhang, M. et al. (2011) Genome-wide association study identifies a new melanoma susceptibility locus at 1q21.3. Nat. Genet., 43, 1114–1118.
20. Baxter, A.J., Hughes, M.C., Kvaskoff, M., Siskind, V., Shekar, S., Aitken, J.F., Green, A.C., Duffy, D.L., Hayward, N.K., Martin, N.G. et al. (2008) The Queensland Study of Melanoma: environmental and genetic associations (Q-MEGA); study design, baseline characteristics, and repeatability of phenotype and sun exposure measures. Twin. Res. Hum. Genet., 11, 183–196.
21. Cust, A.E., Schmid, H., Maskiell, J.A., Jetann, J., Ferguson, M., Holland, E.A., Agha-Hamilton, C., Jenkins, M.A., Kelly, J., Kefford, R.F. et al. (2009) Population-based, case-control-family design to investigate genetic and environmental influences on melanoma risk: Australian Melanoma Family Study. Am. J. Epidemiol., 170, 1541–1554.
22. Painter, J.N., Anderson, C.A., Nyholt, D.R., Macgregor, S., Lin, J., Lee, S.H., Lambert, A., Zhao, Z.Z., Roseman, F., Guo, Q. et al. (2011) Genome-wide association study identifies a locus at 7p15.2 associated with endometriosis. Nat. Genet., 43, 51–54.
23. Li, C., Liu, Z., Wang, L.E., Gershenwald, J.E., Lee, J.E., Prieto, V.G., Duvic, M., Grimm, E.A. and Wei, Q. (2008) Haplotype and genotypes of the VDR gene and cutaneous melanoma risk in non-Hispanic whites in Texas: a case-control study. Int. J. Can., 122, 2077–2084.
24. Li, C., Zhao, H., Hu, Z., Liu, Z., Wang, L.E., Gershenwald, J.E., Prieto, V.G., Lee, J.E., Duvic, M., Grimm, E.A. and Wei, Q. (2008) Genetic variants and haplotypes of the caspase-8 and caspase-10 genes contribute to susceptibility to cutaneous melanoma. Hum. Mutat., 29, 1443–1451.
25. Haiman, C.A., Chen, G.K., Vachon, C.M., Canzian, F., Dunning, A., Millikan, R.C., Wang, X., Ademuyiwa, F., Ahmed, S., Ambrosone, C.B. et al. (2011) A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat. Genet., 43, 1210–1214.
26. Hunter, D.J., Kraft, P., Jacobs, K.B., Cox, D.G., Yeager, M., Hankinson, S.E., Wacholder, S., Wang, Z., Welch, R., Hutchinson, A. et al. (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat. Genet., 39, 870–874.
27. Yeager, M., Orr, N., Hayes, R.B., Jacobs, K.B., Kraft, P., Wacholder, S., Minichiello, M.J., Fearnhead, P., Yu, K., Chatterjee, N. et al. (2007) Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat. Genet., 39, 645–649.
28. Amundadottir, L., Kraft, P., Stolzenberg-Solomon, R.Z., Fuchs, C.S., Petersen, G.M., Arslan, A.A., Bueno-de-Mesquita, H.B., Gross, M., Helzlsouer, K., Jacobs, E.J. et al. (2009) Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat. Genet., 41, 986–990.
29. Petersen, G.M., Amundadottir, L., Fuchs, C.S., Kraft, P., Stolzenberg-Solomon, R.Z., Jacobs, K.B., Arslan, A.A., Bueno-de-Mesquita, H.B., Gallinger, S., Gross, M. et al. (2010) A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat. Genet., 42, 224–228.
30. Landi, M.T., Chatterjee, N., Yu, K., Goldin, L.R., Goldstein, A.M., Rotunno, M., Mirabello, L., Jacobs, K., Wheeler, W., Yeager, M. et al. (2009) A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am. J. Hum. Genet., 85, 679–691.
31. Purdue, M.P., Johansson, M., Zelenika, D., Toro, J.R., Scelo, G., Moore, L.E., Prokhortchouk, E., Wu, X., Kiemeney, L.A., Gaborieau, V. et al. (2011) Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. Nat. Genet., 43, 60–65.
32. Rothman, N., Garcia-Closas, M., Chatterjee, N., Malats, N., Wu, X., Figueroa, J.D., Real, F.X., Van Den Berg, D., Matullo, G., Baris, D. et al. (2010) A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat. Genet., 42, 978–984.
33. Song, H., Ramus, S.J., Tyrer, J., Bolton, K.L., Gentry-Maharaj, A., Wozniak, E., Anton-Culver, H., Chang-Claude, J., Cramer, D.W., DiCioccio, R. et al. (2009) A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nat. Genet., 41, 996–1000.
34. Wang, L.D., Zhou, F., Li, X.M., Sun, L.D., Song, X., Jin, Y., Li, J.M., Kong, G.Q., Qi, H., Cui, J. et al. (2010) Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet., 42, 759–763.
35. Abnet, C.C., Freedman, N., Hu, N., Wang, Z., Yu, K., Shu, X.O., Yuan, J.M., Zheng, W., Dawsey, S.M., Dong, L.M. et al. (2010) A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet., 42, 764–767.
36. Spurdle, A.B., Thompson, D.J., Ahmed, S., Ferguson, K., Healey, C.S., O'Mara, T., Walker, L.C., Montgomery, S.B., Dermitzakis, E.T., Fahey, P. et al. (2011) Genome-wide association study identifies a common variant associated with risk of endometrial cancer. Nat. Genet., 43, 451–454.
37. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.
38. McEvoy, M., Smith, W., D'Este, C., Duke, J., Peel, R., Schofield, P., Scott, R., Byles, J., Henry, D., Ewald, B. et al. (2010) Cohort profile: The Hunter Community Study. Int. J. Epidemiol., 39, 1452–1463.
39. Lee, S.H., Nyholt, D.R., Macgregor, S., Henders, A.K., Zondervan, K.T., Montgomery, G.W. and Visscher, P.M. (2010) A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies. Genet. Epidemiol., 34, 854–862.
40. Yang, J., Lee, S.H., Goddard, M.E. and Visscher, P.M. (2011) GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 88, 76–82.
41. Yang, J., Manolio, T.A., Pasquale, L.R., Boerwinkle, E., Caporaso, N., Cunningham, J.M., de Andrade, M., Feenstra, B., Feingold, E., Hayes, M.G. et al. (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet., 43, 519–525.
42. Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W. et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat. Genet., 42, 565–569.

Author notes

These authors contributed equally.
Q-MEGA and AMFS Investigators, ANECS-SEARCH members, UKOPS-SEARCH members and BEACON Consortium members are listed in Supplementary Material.