Abstract

Late-Onset Alzheimer’s Disease (LOAD) is a heterogeneous neurodegenerative disorder with complex etiology and high heritability. Its multifactorial risk profile and large portions of unexplained heritability suggest the involvement of yet unidentified genetic risk factors. Here we describe the “whole person” genetic risk landscape of polygenic risk scores for 2218 traits in 2044 elderly individuals and test if novel eigen-PRSs derived from clustered subnetworks of single-trait PRSs can improve the prediction of LOAD diagnosis, rates of cognitive decline, and canonical LOAD neuropathology. Network analyses revealed distinct clusters of PRSs with clinical and biological interpretability. Novel eigen-PRSs (ePRS) from these clusters significantly improved LOAD-related phenotypes prediction over current state-of-the-art LOAD PRS models. Notably, an ePRS representing clusters of traits related to cholesterol levels was able to improve variance explained in a model of the brain-wide beta-amyloid burden by 1.7% (likelihood ratio test P = 9.02 × 10−7). All associations of ePRS with LOAD phenotypes were eliminated by the removal of APOE-proximal loci. However, our association analysis identified modules characterized by PRSs of high cholesterol and LOAD. We believe this is due to the influence of the APOE region from both PRSs. We found significantly higher mean SNP effects for LOAD in the intersecting APOE region SNPs. Combining genetic risk factors for vascular traits and dementia could improve current single-trait PRS models of LOAD, enhancing the use of PRS in risk stratification. Our results are catalogued for the scientific community, to aid in generating new hypotheses based on our maps of clustered PRSs and associations with LOAD-related phenotypes.

Introduction

Late-Onset Alzheimer’s disease (LOAD) is the most common cause of dementia in older adults [1]. It is a highly polygenic and multifactorial disease [2] with heritability estimated to be between 58% to 79% [3]. Despite the identification of its canonical pathologies decades ago, no effective disease-modifying treatments are available. This is likely a result of the etiological complexity of the illness and challenges associated with studying causality for a cascade of events that precede the emergence of symptoms by decades. High-throughput genetic approaches and the ongoing era of genome-wide association study (GWAS) discovery have provided clues into the complex, potentially causal mechanisms of LOAD [4] and highlighted disturbances across multiple processes and systems. The most recent genetic evidence points toward inflammatory, neurotrophic, metabolic, and cellular senescence mechanisms [5–9], and genetic correlations have been observed between LOAD and a multitude of physical, psychiatric, and neurological illnesses [8, 10].

From an epidemiological perspective, the risk factors for LOAD are varied and exert their effects at different points in the lifespan. A recent review of over 570 studies identified over 15 factors associated with the emergence and progression of LOAD [11], grouped as potentially modifiable, non-modifiable, and clinical factors; age at clinical onset, family history of dementia, biological sex, the apolipoprotein E (APOE) ɛ4 variant and over 80 other genetic variants, altered gene regulation, baseline cognitive level, neuropsychiatric symptoms, and extrapyramidal signs are non-modifiable and clinical [11]. Some examples of modifiable risk factors include preventable comorbidities, malnutrition, educational attainment, social supports, sleep behaviour and circadian rhythms, anxiety and depression, inflammation, and oxidative stress [11]. Many other factors, such as diabetes [12], cardiovascular disease [13], and smoking [14] are also associated with a higher risk of dementia. Additionally, the age at which some risk factors develop can affect an individual’s risk profile. For example, midlife obesity [15], hypertension [16], high cholesterol levels [17, 18] and statin use [19] are associated with an increased risk of dementia later in life. This calls for a more sophisticated multivariate, multi-trait analyses of the genetic component of risk for predicting LOAD, incorporating the whole-person scope of potential phenotypic contributions.

To our knowledge, there have only been a small handful of studies attempting to unravel the phenome-wide genetic risk landscape of LOAD using polygenic approaches. One group used results from the UK Biobank PheWAS [20] to identify patterns of correlated phenotypes and individual genetic variants across thousands of human traits, including LOAD, using singular value decomposition applied to summary statistics [21]. Similar work, focusing specifically on LOAD, proposed another summary statistics-focused method [22], identifying 48 PRSs each with significant association with LOAD genetic risk. Other analyses from the Alzheimer’s Disease Genomics Consortium (ADGC) have examined multifactorial risk for LOAD by calculating PRSs for 22 candidate modifiable risk factors, finding only an educational attainment PRS significantly associated with LOAD [23]. Here, we build on previous efforts in four important ways: 1) by performing dimension reduction on phenome-wide PRSs calculated on a specific target population of elderly, rather than operating on summary statistics alone as in previous studies [21, 22], which often make assumptions about allele frequencies in the general population which may not hold across samples, 2) by examining how broad patterns of phenotypic overlap across PRSs are affected by the adjustment of SNP inclusion thresholds (including sensitivity analyses with the inclusion and exclusion of APOE and the MHC genetic regions), 3) by deriving novel composite eigen-PRSs (ePRSs) through the application of the weighted gene correlation network analysis (WGCNA) to phenome-wide sets of PRSs, and 4) by testing these novel ePRSs individually and against the standard LOAD-specific PRS [24] for association with longitudinal cognitive scores and post-mortem neuropathological phenotypes.

We hypothesize that a set of phenome-wide (Table S1 of the Supplementary Methods) PRSs will yield discrete intra-correlated clusters of related traits and that patterns of correlation and clustering will change as a function of SNP inclusion criteria, possibly reflecting between-trait differences in polygenicity. We also hypothesize that the proposed ePRSs, derived from the WGCNA algorithm, will capture additional overlapping etiological risk factors for LOAD thus improve the prediction of LOAD and LOAD-related phenotypes over the more typical single-PRS approach. We have summarized an abstract illustration of our study in Fig. 1.

Flowchart showing the combination of GWAS summary statistics ind individual-level samples used in the analyses. Arrows point through the process of calculating sets of polygenic risk scores, performing clustering of scores using WGCNA, and performing regression analysis with resulting eigen-polygenic risk scores on outcome phenotypes.
Figure 1

A summary of the study workflow.

Results

Clustering of PRS matrices using WGCNA

Table 1 summarizes demographic information for the individuals included in our study. Following descriptive PCA, outlined in the Supplementary Methods (Figs S1S9), and the assessment of phenome-wide variance in PRSs at different α-value thresholds, we sought to group PRSs into discrete clusters based on their network-based similarity using WGCNA. Unlike PCA, WGCNA provides discrete, mutually exclusive groupings of PRS subnetworks for further analysis. As expected, biologically and clinically related traits clustered together. The number of modules identified, differed as a function of α-value threshold. Table S2 outlines the number of modules for each matrix, though module identities remained largely cohesive. Hub analysis was used to reveal the PRSs that are most interconnected within each module; hub PRSs at different thresholds included those for weight in the red module (ΠALL,5 × 10−8, consisting of 22 PRSs), rheumatoid arthritis (RA) in the blue module (ΠALL,5 × 10−5, consisting of 209 PRSs), and malignant melanoma in the black module (ΠALL,1 × 10−5, consisting of 25 PRSs), and cancer of bronchus lung in the brown module (ΠALL,0.01,consisting of 103 PRSs). Note that module color labels are arbitrarily assigned by WGCNA and not linked to particular sets or sizes of PRS (i.e. they are not preserved between runs of WGCNA). Some modules, such as the weight module, were largely preserved across nearly all α-value (except for 1 × 10−6 and 5 × 10−7) thresholds (Figs S10S13); this module was clustered with PRSs related to body mass, bone density, and coronary and heart diseases. Conversely, chronic ischemic heart disease in the yellow module (ΠALL,5 × 10−5, consisting of 21 PRSs) clustered along with other heart or coronary complications such as heart attack (myocardial infarction) and ischemic heart disease. Notably, this module was conserved across α-value thresholds of 0.01, 0.005, 5 × 10−5 and 5 × 10−7, and was correlated with modules that included PRSLOAD and PRSs for family history of AD and all-cause dementia. Of note, PRSLOAD consistently clustered within the rheumatoid arthritis (RA) module in ΠALL,0.001, ΠALL,0.005, ΠALL,0.0001, ΠAPOE-,0.001, ΠAPOE-,0.001, ΠAPOE-,0.0005, ΠAPOE-,0.0001, ΠAPOE-,5 × 10−5. Figure S14 outlines the correlational structure of two selected modules (weight and high cholesterol).

Table 1

Demographic summary of the participants in the religious orders study (ROS) and memory aging project (MAP) for which PRS were calculated.

Autopsied (n = 1443)Not deceased or not autopsied (n = 601)
CN (n = 496)MCI (n = 365)AD (n = 559)Other (n = 23)NCI (n = 442)MCI (n = 111)AD (n = 47)Other (n = 1)
Sex (M/F)181/315132/233167/3928/1598/34427/847/400/1
Age at study entry (SD)78.95 (6.91)80.79 (6.56)81.79 (6.75)80.98 (7.10)73.73 (6.92)76.69 (7.33)78.17 (8.48)87.24
Age at death (SD)87.63 (6.82)89.22 (6.32)91.00 (6.11)88.46 (5.69)NANANANA
Education (SD)16.35 (3.66)16.19 (3.49)16.23 (3.59)15.52 (5.17)16.70 (3.39)15.68 (3.32)16.23 (3.10)16.00
Mean LOAD PRS (SD)−0.30 (0.70)−0.07 (0.73)0.40 (0.85)0.11 (0.86)−0.15 (0.66)0.00 (0.75)0.30 (0.91)−0.05
Autopsied (n = 1443)Not deceased or not autopsied (n = 601)
CN (n = 496)MCI (n = 365)AD (n = 559)Other (n = 23)NCI (n = 442)MCI (n = 111)AD (n = 47)Other (n = 1)
Sex (M/F)181/315132/233167/3928/1598/34427/847/400/1
Age at study entry (SD)78.95 (6.91)80.79 (6.56)81.79 (6.75)80.98 (7.10)73.73 (6.92)76.69 (7.33)78.17 (8.48)87.24
Age at death (SD)87.63 (6.82)89.22 (6.32)91.00 (6.11)88.46 (5.69)NANANANA
Education (SD)16.35 (3.66)16.19 (3.49)16.23 (3.59)15.52 (5.17)16.70 (3.39)15.68 (3.32)16.23 (3.10)16.00
Mean LOAD PRS (SD)−0.30 (0.70)−0.07 (0.73)0.40 (0.85)0.11 (0.86)−0.15 (0.66)0.00 (0.75)0.30 (0.91)−0.05

Note: Other = cause of dementia not likely Alzheimer’s disease (e.g. vascular dementia); NCI = no cognitive impairment; MCI = mild cognitive impairment; AD = dementia likely due to Alzheimer’s disease. Additionally, the PRSs are calculated on the whole genome, and averaged over all 15 α-value thresholds.

Table 1

Demographic summary of the participants in the religious orders study (ROS) and memory aging project (MAP) for which PRS were calculated.

Autopsied (n = 1443)Not deceased or not autopsied (n = 601)
CN (n = 496)MCI (n = 365)AD (n = 559)Other (n = 23)NCI (n = 442)MCI (n = 111)AD (n = 47)Other (n = 1)
Sex (M/F)181/315132/233167/3928/1598/34427/847/400/1
Age at study entry (SD)78.95 (6.91)80.79 (6.56)81.79 (6.75)80.98 (7.10)73.73 (6.92)76.69 (7.33)78.17 (8.48)87.24
Age at death (SD)87.63 (6.82)89.22 (6.32)91.00 (6.11)88.46 (5.69)NANANANA
Education (SD)16.35 (3.66)16.19 (3.49)16.23 (3.59)15.52 (5.17)16.70 (3.39)15.68 (3.32)16.23 (3.10)16.00
Mean LOAD PRS (SD)−0.30 (0.70)−0.07 (0.73)0.40 (0.85)0.11 (0.86)−0.15 (0.66)0.00 (0.75)0.30 (0.91)−0.05
Autopsied (n = 1443)Not deceased or not autopsied (n = 601)
CN (n = 496)MCI (n = 365)AD (n = 559)Other (n = 23)NCI (n = 442)MCI (n = 111)AD (n = 47)Other (n = 1)
Sex (M/F)181/315132/233167/3928/1598/34427/847/400/1
Age at study entry (SD)78.95 (6.91)80.79 (6.56)81.79 (6.75)80.98 (7.10)73.73 (6.92)76.69 (7.33)78.17 (8.48)87.24
Age at death (SD)87.63 (6.82)89.22 (6.32)91.00 (6.11)88.46 (5.69)NANANANA
Education (SD)16.35 (3.66)16.19 (3.49)16.23 (3.59)15.52 (5.17)16.70 (3.39)15.68 (3.32)16.23 (3.10)16.00
Mean LOAD PRS (SD)−0.30 (0.70)−0.07 (0.73)0.40 (0.85)0.11 (0.86)−0.15 (0.66)0.00 (0.75)0.30 (0.91)−0.05

Note: Other = cause of dementia not likely Alzheimer’s disease (e.g. vascular dementia); NCI = no cognitive impairment; MCI = mild cognitive impairment; AD = dementia likely due to Alzheimer’s disease. Additionally, the PRSs are calculated on the whole genome, and averaged over all 15 α-value thresholds.

Focusing specifically on the PRSLOAD, we found it clustered within the pink module from ΠALL,5 × 10−8 (represented by high cholesterol, consisting of 19 PRSs), red module from ΠALL,1 × 10−7 (represented by high cholesterol, consisting of 23 PRSs) and ΠALL,5 × 10−5 (consisting of 22 PRSs), brown module from ΠALL,1 × 10−6 (represented by presence of cardiac and vascular implants and grafts, consisting of 68 PRSs) and ΠALL,5 × 10−6 (represented by presence of cardiac and vascular implants and grafts, consisting of 85 PRSs) and ΠALL,1 × 10−5 (represented by presence of cardiac and vascular implants and grafts, consisting of 77 PRSs) along with PRSs for all-cause dementia, family history of AD or AD-dementia, delirium, Apolipoprotein levels, cholesterol levels, presence of cardiac, and vascular implants and grafts—reinforcing existing evidence for genetic correlation between these traits. History of taking prescriptions to treat vascular and abnormal cholesterol levels, such as rosuvastatin, ezerimibe, and HMG-CoA reductase inhibitor statins, also clustered with PRSLOAD and other dementias at lower α-value thresholds. Interestingly, the PRSLOAD did not consistently cluster with the UKB-derived family history of AD dementia (ICD code: G30.9), family history of AD (ICD code: G30.9) or dementia (ICD code: F03), PRSs—demonstrating the divergence of PRS derived from different cohorts. However, this divergence was only present at higher α-value thresholds (ΠALL,1, ΠALL,0.01, ΠALL,0.05, ΠALL,0.001) reinforcing the importance of SNP inclusion when using clumping and thresholding.

The removal of MHC and/or APOE loci resulted in several changes in module number and membership across α-value thresholds. Most notably, when the APOE region was excluded (ΠAPOE-,α), the PRSLOAD clustered either in the grey module (this module contains a set of PRSs which have not been clustered in any module), or the turquoise module (represented by socio economic factors, and retinal disorders), along with other dementia-related phenotypes (and many others). This was true across all α-value thresholds except for the blue module (ΠAPOE-,0.01 represented by weight, ΠAPOE-,0.005 represented by BMI, ΠAPOE-,1 × 10−7 represented by family history of diabetes, ΠAPOE-,1 × 10−5 represented by prednisolone, ΠAPOE-,5 × 10−5 represented by Rheumatoid Arthritis (RA)), yellow module (ΠAPOE-,0.001 represented by RA), and the brown module (ΠAPOE-,0.0005 represented by RA) where the PRSLOAD was clustered with other PRSs of which none were PRS of dementia or family history of AD. When the MHC region was also excluded, there were some modules where PRSLOAD was clustered with PRS of dementia or family history of AD or delirium. However, this membership was not consistent and varied for different α-value thresholds. Furthermore, PRSLOAD appeared in the blue module (ΠMHC/APOE-,5 × 10−6 represented by presence of cardiac and vascular implants and grafts, ΠMHC/APOE-,0.01 represented by weight).

Overall, although the total variations explained by ePRSs vary across modules, the ePRS of some modules can explain up to 85% of the total variation. For example, the ePRSs of the greenyellow module (ΠAPOE-,10−7 represented by calculus of bile duct) and the pink module (ΠMHC-,10−7 represented by diabetes), along with seven other ePRSs from ΠALL,α, ΠMHC-,α, ΠAPOE-,α, and ΠMHC/APOE-,α were able to explain more than 75% of the total variation of their corresponding module. Figure S15 outlines the total variations explained by each ePRS for all PRS matrices. We mitigated trait redundancy concerns by conducting quality control (QC) steps, including the pairwise removal of PRSs with a Pearson correlation of 0.95 or higher, and sensitivity analysis using a smaller PRS subset. This involved removing phenotypes with hierarchical classifications, following the same QC as in [25]. The sensitivity analysis results (Figs S16S20) strongly supported the preservation of all modules of interest. Although the WGCNA has shown to produce robust clusters in gene expression datasets [26], we sought to identify the stability of a few example PRS matrices across ΠALL,α (refer to Table S3). These measurements suggest that the WGCNA clustering method shows robust clustering stability structure within these PRSs.

Associations of ePRS with clinical and neuropathological traits

Following the network-based identification of correlated PRSs across the phenome, we evaluated the associations of derived ePRSs with late-life cognitive and neuropathological phenotypes. Figure 2 summarizes these results across all α-value thresholds (and all PRS matrices), highlighting several modules strongly associated with our LOAD-related phenotypes at a false discovery rate (FDR) adjusted q-value ≤ 0.05. Notably, all significant ePRS associations were observed for the ΠALL,α and ΠMHC-,α matrices, whereas none of the ePRSs calculated from ΠAPOE-,α and ΠMHC/APOE-,α passed FDR correction. Full sets of ePRS association results are summarized in Figs S21S23.

The top of the figure shows a branched dendrogram of the relationships between polygenic scores, including colors below the dendrogram indicating cluster membership. The bottom part of the figure is a paneled dot plot showing the sizes of polygenic score clusters (modules) and their association strength with different phenotypic outcomes.
Figure 2

Association analysis between the aging phenotypes and the ePRSs. Dendrogram of dissimilarity matrix (based on the topological property of the PRS matrix) of ΠAll,5 × 10−8 produced by the WGCNA pipeline with soft thresholding of 1. b) Summarizing the association of the aging phenotypes from the ROS/MAP study cohort and the ePRSs from ΠAll. We outline the mostly connected PRS within the modules highly associated with the aging phenotypes (q-value ≤ 0.05). The red dotted line represents 5% significance level. The module represented by high cholesterol at α-value threshold of 5 × 10−7 had the highest association with all five phenotypes. The modules, represented by high cholesterol levels, indicated the strongest associativity with all five aging phenotypes at α-value threshold of 1 × 10−6, 1 × 10−7, and 5 × 10−8. These modules were selected to test if they can improve the predictability ability of stateof-the-art PRSLOAD. Refer to Figs S21S23 for the summary results from the other PRS matrices ΠMHC-APOE-MHC/APOE-. The significance of the APOE region in explaining ageing-related phenotypes is highlighted.

Among the top results were the pink ePRS from ΠALL,5 × 10−8, the red module from ΠALL,1 × 10−7, and the red module from ΠALL,5 × 10−5, ΠALL,1 × 10−6, which were each largely composed of PRSs for cholesterol levels, and strongly associated with all five LOAD phenotypes. EPRSs with hub PRSs of maximum carotid intima-medial thickness (IMT; based on ultrasound) were strongly associated with β-amyloid levels. With the removal of the MHC region, the top associated ePRSs with all five phenotypes were consistent in terms of overall PRSs in these modules. Though, more ePRSs represented by high cholesterol indicated association with all five phenotypes. Overall, ePRSs represented by hubs of high cholesterol levels, LDL, and maximum carotid IMT were found to be highly associated with the LOAD-related traits and neuropathologies. Furthermore, we outline the network structure of the selected modules in Figs S24S27. Interesting to note that from the network analysis, in comparison to other members of the module, the PRSLOAD had varying correlations to other PRSs in the same module. Though, this depended on the α-value threshold and whether the module was from ΠALL,α or ΠMHC-,α.

Assessment of improvements in predictive performance of ePRS over established single-trait LOAD PRS

Having identified several ePRSs associated with LOAD-related traits and neuropathologies, we sought to test whether these ePRSs could significantly augment single-PRS models of only PRSLOAD and basic demographics. In some cases, the ePRSs were able to improve the variance explained by the base models by up to 1.7%. Modelling results for selected top ePRSs are shown in Table 2. The brown hub of ΠMHC-,1 × 10−5, represented by the hub PRS for cholesterol levels, was able to improve predictive models over the single-trait PRSLOAD for both β-amyloid counts (likelihood ratio test P = 5.29 × 10−5) and global random slope of cognitive decline (likelihood ratio test P = 3.16 × 10−3). Similarly, the red module, represented by high cholesterol levels, from ΠALL,1 × 10−7, was able to improve the predictability power of the PRSLOAD in association with β-amyloid count (likelihood ratio test P = 9.25 × 10−6). Figure 3 (expanded in Fig. S28) summarizes the overall performance comparison between base models and the ePRS-inclusive full models (defined as model 2 and model 3). Only ePRSs with FDR-adjusted q-value ≤ 0.05 for the likelihood ratio test between base and full model are shown.

Table 2

Effect of ePRSs in predicting ageing pathologies and cognitive decline.

SNP inclusion thresholdY (outcomea)ePRS (module)βePRSModel performanceb
(95% CI)
ΔModel performancec
PMLE
nd
ΠALL,5 × 10−5Total AβRed
(High cholesterol)
5.1050.084
(0.063,0.123)
0.008
4.69 × 10−4**
1238
ΠALL,1 × 10−6Total AβSalmon
(Mean Carotid)
3.6340.085
(0.059,0.114)
0.002
8.63 × 10−3**
1238
ΠALL,1 × 10−7Global SlopeRed
(High Cholesterol)
−0.3340.030
(0.017,0.048)
0.003
1.11 × 10−3**
1915
Total Aβ6.7700.093
(0.067,0.125)
0.011
9.25 × 10−6***
1238
ΠALL,5 × 10−8Global SlopePink
(High Cholesterol)
−0.3250.029
(0.013,0.047)
0.002
1.46 × 10−3**
1915
Total Aβ6.5450.094
(0.071,0.126)
0.013
1.57 × 10−5***
1238
ΠMHC-,0.0001Total AβGreen
(High Cholesterol)
0.1790.091
(0.073,0.129)
0.017
9.02 × 10−7***
1238
ΠMHC-,5 × 10−5Global SlopeYellow
(High Cholesterol)
−0.0170.093
(0.071,0.121)
0.013
1.10 × 10−5***
1915
Total Aβ0.1810.091
(0.059,0.121)
0.010
2.22 × 10−5***
1238
ΠMHC-,1 × 10−5Global SlopeBrown
(High Cholesterol)
−0.0150.085
(0.061,0.119)
0.005
3.16 × 10−3**
1915
Total Aβ0.1630.085
(0.064,0.122)
0.012
5.29 × 10−5***
1238
ΠMHC-,1 × 10−6Global SlopeBrown
(High Cholesterol)
−0.0130.030
(0.015,0.048)
0.004
9.71 × 10−4***
1915
Total Aβ0.1800.079
(0.054,0.118)
0.008
3.39 × 10−4***
1238
ΠMHC-,5 × 10−7Global SlopeRed
(High Cholesterol)
−0.0130.036
(0.024,0.053)
0.004
2.02 × 10−3**
1915
Total Aβ0.1790.028
(0.013,0.048)
0.002
1.88 × 10−3**
1238
ΠMHC-,1 × 10−7Total AβBlack
(LDL)
0.2050.084
(0.058,0.131)
0.003
4.16 × 10−3**
1238
ΠMHC-,5 × 10−8Total AβGreen
(LDL)
0.2090.037
(0.023,0.052)
0.001
1.37 × 10−2*
1238
SNP inclusion thresholdY (outcomea)ePRS (module)βePRSModel performanceb
(95% CI)
ΔModel performancec
PMLE
nd
ΠALL,5 × 10−5Total AβRed
(High cholesterol)
5.1050.084
(0.063,0.123)
0.008
4.69 × 10−4**
1238
ΠALL,1 × 10−6Total AβSalmon
(Mean Carotid)
3.6340.085
(0.059,0.114)
0.002
8.63 × 10−3**
1238
ΠALL,1 × 10−7Global SlopeRed
(High Cholesterol)
−0.3340.030
(0.017,0.048)
0.003
1.11 × 10−3**
1915
Total Aβ6.7700.093
(0.067,0.125)
0.011
9.25 × 10−6***
1238
ΠALL,5 × 10−8Global SlopePink
(High Cholesterol)
−0.3250.029
(0.013,0.047)
0.002
1.46 × 10−3**
1915
Total Aβ6.5450.094
(0.071,0.126)
0.013
1.57 × 10−5***
1238
ΠMHC-,0.0001Total AβGreen
(High Cholesterol)
0.1790.091
(0.073,0.129)
0.017
9.02 × 10−7***
1238
ΠMHC-,5 × 10−5Global SlopeYellow
(High Cholesterol)
−0.0170.093
(0.071,0.121)
0.013
1.10 × 10−5***
1915
Total Aβ0.1810.091
(0.059,0.121)
0.010
2.22 × 10−5***
1238
ΠMHC-,1 × 10−5Global SlopeBrown
(High Cholesterol)
−0.0150.085
(0.061,0.119)
0.005
3.16 × 10−3**
1915
Total Aβ0.1630.085
(0.064,0.122)
0.012
5.29 × 10−5***
1238
ΠMHC-,1 × 10−6Global SlopeBrown
(High Cholesterol)
−0.0130.030
(0.015,0.048)
0.004
9.71 × 10−4***
1915
Total Aβ0.1800.079
(0.054,0.118)
0.008
3.39 × 10−4***
1238
ΠMHC-,5 × 10−7Global SlopeRed
(High Cholesterol)
−0.0130.036
(0.024,0.053)
0.004
2.02 × 10−3**
1915
Total Aβ0.1790.028
(0.013,0.048)
0.002
1.88 × 10−3**
1238
ΠMHC-,1 × 10−7Total AβBlack
(LDL)
0.2050.084
(0.058,0.131)
0.003
4.16 × 10−3**
1238
ΠMHC-,5 × 10−8Total AβGreen
(LDL)
0.2090.037
(0.023,0.052)
0.001
1.37 × 10−2*
1238

aRefer to models 1,2,3 in Statistical Analysis section.

bR2model3 was used as model performance (0.632 bootstrapped) for all outcome variables.

cR2model3 - R2model2 PMLE is the P-value of the maximum likelihood ratio test between R2model3 and R2model2.

dNumber of observations.

The summary statistics of the multivariate regressions are shown for two selected ePRSs. For continuous response variables (Beta Amyloid (Aβ) protein, cognitive global random slope) we report the 0.632 bootstrapped variation explained (R2) by the covariates for the base and full model. Additionally, we show the p-value of the likelihood ratio test between the full and base models.

*P ≤ 0.05.

**P ≤ 0.01.

***P = 0.001.

All p’s are reported at an uncorrected level.

Table 2

Effect of ePRSs in predicting ageing pathologies and cognitive decline.

SNP inclusion thresholdY (outcomea)ePRS (module)βePRSModel performanceb
(95% CI)
ΔModel performancec
PMLE
nd
ΠALL,5 × 10−5Total AβRed
(High cholesterol)
5.1050.084
(0.063,0.123)
0.008
4.69 × 10−4**
1238
ΠALL,1 × 10−6Total AβSalmon
(Mean Carotid)
3.6340.085
(0.059,0.114)
0.002
8.63 × 10−3**
1238
ΠALL,1 × 10−7Global SlopeRed
(High Cholesterol)
−0.3340.030
(0.017,0.048)
0.003
1.11 × 10−3**
1915
Total Aβ6.7700.093
(0.067,0.125)
0.011
9.25 × 10−6***
1238
ΠALL,5 × 10−8Global SlopePink
(High Cholesterol)
−0.3250.029
(0.013,0.047)
0.002
1.46 × 10−3**
1915
Total Aβ6.5450.094
(0.071,0.126)
0.013
1.57 × 10−5***
1238
ΠMHC-,0.0001Total AβGreen
(High Cholesterol)
0.1790.091
(0.073,0.129)
0.017
9.02 × 10−7***
1238
ΠMHC-,5 × 10−5Global SlopeYellow
(High Cholesterol)
−0.0170.093
(0.071,0.121)
0.013
1.10 × 10−5***
1915
Total Aβ0.1810.091
(0.059,0.121)
0.010
2.22 × 10−5***
1238
ΠMHC-,1 × 10−5Global SlopeBrown
(High Cholesterol)
−0.0150.085
(0.061,0.119)
0.005
3.16 × 10−3**
1915
Total Aβ0.1630.085
(0.064,0.122)
0.012
5.29 × 10−5***
1238
ΠMHC-,1 × 10−6Global SlopeBrown
(High Cholesterol)
−0.0130.030
(0.015,0.048)
0.004
9.71 × 10−4***
1915
Total Aβ0.1800.079
(0.054,0.118)
0.008
3.39 × 10−4***
1238
ΠMHC-,5 × 10−7Global SlopeRed
(High Cholesterol)
−0.0130.036
(0.024,0.053)
0.004
2.02 × 10−3**
1915
Total Aβ0.1790.028
(0.013,0.048)
0.002
1.88 × 10−3**
1238
ΠMHC-,1 × 10−7Total AβBlack
(LDL)
0.2050.084
(0.058,0.131)
0.003
4.16 × 10−3**
1238
ΠMHC-,5 × 10−8Total AβGreen
(LDL)
0.2090.037
(0.023,0.052)
0.001
1.37 × 10−2*
1238
SNP inclusion thresholdY (outcomea)ePRS (module)βePRSModel performanceb
(95% CI)
ΔModel performancec
PMLE
nd
ΠALL,5 × 10−5Total AβRed
(High cholesterol)
5.1050.084
(0.063,0.123)
0.008
4.69 × 10−4**
1238
ΠALL,1 × 10−6Total AβSalmon
(Mean Carotid)
3.6340.085
(0.059,0.114)
0.002
8.63 × 10−3**
1238
ΠALL,1 × 10−7Global SlopeRed
(High Cholesterol)
−0.3340.030
(0.017,0.048)
0.003
1.11 × 10−3**
1915
Total Aβ6.7700.093
(0.067,0.125)
0.011
9.25 × 10−6***
1238
ΠALL,5 × 10−8Global SlopePink
(High Cholesterol)
−0.3250.029
(0.013,0.047)
0.002
1.46 × 10−3**
1915
Total Aβ6.5450.094
(0.071,0.126)
0.013
1.57 × 10−5***
1238
ΠMHC-,0.0001Total AβGreen
(High Cholesterol)
0.1790.091
(0.073,0.129)
0.017
9.02 × 10−7***
1238
ΠMHC-,5 × 10−5Global SlopeYellow
(High Cholesterol)
−0.0170.093
(0.071,0.121)
0.013
1.10 × 10−5***
1915
Total Aβ0.1810.091
(0.059,0.121)
0.010
2.22 × 10−5***
1238
ΠMHC-,1 × 10−5Global SlopeBrown
(High Cholesterol)
−0.0150.085
(0.061,0.119)
0.005
3.16 × 10−3**
1915
Total Aβ0.1630.085
(0.064,0.122)
0.012
5.29 × 10−5***
1238
ΠMHC-,1 × 10−6Global SlopeBrown
(High Cholesterol)
−0.0130.030
(0.015,0.048)
0.004
9.71 × 10−4***
1915
Total Aβ0.1800.079
(0.054,0.118)
0.008
3.39 × 10−4***
1238
ΠMHC-,5 × 10−7Global SlopeRed
(High Cholesterol)
−0.0130.036
(0.024,0.053)
0.004
2.02 × 10−3**
1915
Total Aβ0.1790.028
(0.013,0.048)
0.002
1.88 × 10−3**
1238
ΠMHC-,1 × 10−7Total AβBlack
(LDL)
0.2050.084
(0.058,0.131)
0.003
4.16 × 10−3**
1238
ΠMHC-,5 × 10−8Total AβGreen
(LDL)
0.2090.037
(0.023,0.052)
0.001
1.37 × 10−2*
1238

aRefer to models 1,2,3 in Statistical Analysis section.

bR2model3 was used as model performance (0.632 bootstrapped) for all outcome variables.

cR2model3 - R2model2 PMLE is the P-value of the maximum likelihood ratio test between R2model3 and R2model2.

dNumber of observations.

The summary statistics of the multivariate regressions are shown for two selected ePRSs. For continuous response variables (Beta Amyloid (Aβ) protein, cognitive global random slope) we report the 0.632 bootstrapped variation explained (R2) by the covariates for the base and full model. Additionally, we show the p-value of the likelihood ratio test between the full and base models.

*P ≤ 0.05.

**P ≤ 0.01.

***P = 0.001.

All p’s are reported at an uncorrected level.

Bar charts of different heights showing the variance explained in different phenotypic outcomes for SNP-based polygenic risk score prediction models that either include or exclude the eigen-polygenic scores, demonstrating the benefit of including the eigen-polygenic scores over the traditional polygenic risk score for Alzheimer’s disease.
Figure 3

Summary of model improvements for ePRS-inclusive models over single-trait LOAD PRS (PRSLOAD) models of LOAD-related phenotypes. Barplots showing the results of the comparison of full models including ePRSs and single-trait PRS models of PRSLOAD, calculated from ΠALL,α. FDR correction (with q-value ≤ 0.05) was used across likelihood ratio tests between the full and base models. For continuous response variables (Beta amyloid (Aβ) protein, PHFtau, cognitive global random slope, cognitive global random slope at last visit) we report the 0.632 bootstrapped variation explained (R2) by the covariates for the base (model 2) and full model (model 3). Results for ePRS derived from matrices excluding the MHC region are shown in Fig. S28. Additionally, we indicate the p-value of the likelihood ratio test between the full and base models. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001. All P-values are reported at an uncorrected level. note that the confidence intervals were measured out of the bootstrapping loop.

PRS-PCA approach

In order to provide a simplified version of the in-depth analyses performed we applied the method presented in [27], wherein we performed over 8000 PCAs for each PRS matrix (ΠALL,α, ΠMHC-,α, ΠAPOE-,α, ΠMHC/APOE-,α) per phenotype across all tested α-value thresholds, extracting the first PC for each. This produced four ultimate matrices with one single PRS (PRS-PCA per [27]) per phenotype—representing the first PC across all 15 α-values. We then repeated the WGCNA pipeline and association analysis using these new PRS-PCAs. None of the resulting modules, generated by WGCNA, show associations with any of the LOAD-related phenotypes. These results are outlined in Fig. S29. Moreover, we have illustrated the structure of relationships between PRSLOAD and the PRS-PCA of LOAD in Fig. S30.

Discussion

We generated over 120 000 PRSs from 2218 phenotypes in a sample of over 2000 elderly individuals harnessing the latest GWAS summary statistics from the Pan-UK Biobank Consortium. We used a network-based approach to derive clusters of PRSs and identified meaningful patterns of biologically and clinically related phenotypes with relevance to cognitive decline and core LOAD-related neuropathologies. By including summary statistics for thousands of diagnoses, traits, biomarkers, behaviours, and health-related measures, rather than selecting major risk factors a priori, we build a richer understating of the relationship between the genetic structure of the elderly population and risk for LOAD.

While PCA provided useful insight into the primary axes of variability in our PRS matrices, the utility of the resulting PCs for genetic prediction are not clear; for modelling there are no reliable guidelines for which PCs to include at each α-value threshold, and the variation explained by the top PCs was less than 1% for the majority of the PCs. Thus, we applied a clustering method which is traditionally used on gene co-expression data, WGCNA. In most cases, ePRSs derived from the WGCNA pipeline that were significantly associated with LOAD phenotypes were composed of a combination of PRSs for dementia-related phenotypes (e.g. family history of LOAD, LOAD diagnosis, and all-cause dementia), cholesterol levels, prescriptions for statins and other cardiovascular health medications, diseases, physiological measurements of carotid arteries, and Cirrhosis. This reinforces the known links between cardiovascular health and liver disease with risk of LOAD [13, 28], but also points toward the specific aspects of cardiovascular health that may be more genetically important for determining this risk than others. The observed associations between ePRRs and the LOAD traits could be attributed to genetic correlations of polygenic traits; see Rheenen et al. [29] for a review. In fact, the proposed ePRS approach leverages such polygenic genetic correlation by identifying ePRSs of phenotypes that could have (polygenic) genetic correlations with LOAD-related traits and neuropathologies. It is worth mentioning that only 9 ePRSs explained more than 80% of the total variation of the given module (Fig. S15). Though this is dependent on the module size, it can impact our association analysis, for example, PRSs that are not highly profound in the given ePRS might not get a chance to pass our association analysis pipeline, and this might mitigate studying potential interesting phenotypes. Given the moderate module variance explained by the majority of the first PRS principal component (i.e. the ePRS), it may be worth exploring multivariate methods in the future; for example, include multiple PRS principal components from each module and then utilize regularized regression.

Incorporating ePRSs into other studies could be done by using the same methodology as described in this paper. We suggest that combining genetic risk profiles for vascular and dementia-related phenotypes using latent methods may offer improvements over existing single-trait PRS models of LOAD, an important step toward achieving the unrealized potential of PRS for risk stratification in populations and clinics. To further explore the potential of ePRSs for other polygenic traits with less strong effects, replication of the methodology used in this paper is needed. This could be done by using the same network-based approach to derive clusters of PRSs and identify meaningful patterns of biologically and clinically related phenotypes. Additionally, other researchers could explore the potential of combining genetic risk profiles for vascular and dementia-related phenotypes using latent methods to improve existing single-trait PRS models for other polygenic traits. However, it is important to note that the transferability of existing PRS across independent datasets and diverse ancestries is limited, which hinders the practical utility and exacerbates health disparities [30]. Therefore, further research is needed to improve the validation and transferability of existing PRS across diverse populations.

When removing the APOE and/or MHC regions from our scores, we noted major impacts on the associations of derived ePRS on LOAD phenotypes. In fact, we found no associations (q-value < 0.05) between ePRSs produced from ΠAPOE-,α and ΠMHC/APOE-,α and any LOAD phenotypes. Even ePRS that included PRSLOAD were not associated with LOAD-related phenotypes, though interestingly PRSLOAD did cluster with PRSs for rheumatoid arthritis, which has been shown to be genetically related to microglial activation that is implicated in LOAD pathogenesis [31]. In our association analysis, we observed the emergence of modules characterized by high cholesterol and LOAD. We postulate that this occurrence can be attributed to the influence of the APOE region derived from both PRSs. However, this may also partially be explained by the heterogeneous phenotype definitions used in the recent AD GWAS meta-analysis [32]. After careful examination of the overlapping SNPs in the APOE region, drawn from the two sets of GWAS summary statistics for LOAD and high cholesterol, it was observed that the average SNP effects for LOAD within the APOE region were 0.03 higher than those observed in both conditions and triple the amount seen in the high cholesterol only GWAS. For a detailed view of these results, please refer to Fig. S31 which provides a comprehensive representation of our sensitivity analysis.

This study has several limitations. First, our conclusions related to the relationship between heritable risk for our phenotypes of interest are based on summary statistics derived from the UK Biobank (indeed, those of any PRS-based study). The UK Biobank has known healthy volunteer bias [33], and our target ROS/MAP sample also has a known late-life resilience bias as a result of survival to study entry without dementia and lifestyle factors associated with part of its underlying religious population [34]. Therefore, the transfer of genetic risk profiles from one population to the other may be subject to challenges with cross-sample portability –most apparent when considering ancestral differences but extend to age, sex, and measurement differences between groups—that may hide true genetic effects [35]. Second, by clustering matrices across traits and within SNP inclusion thresholds, we have assumed that meaningful correlations between PRSs may not occur between different traits at different thresholds. Others have attempted to resolve this, for example by calculating the first principal component of a set of PRSs derived from the same set of summary statistics across multiple thresholds [36]. Thus, we tried this method, and although the resulting modules did not show strong associations with LOAD phenotypes, we were able to identify similar groupings of carotid measurements, or heart-related diseases in the top associated modules. This method comprises two stages of information loss. The first stage is the PRS-PCAs calculation from the initial PC analysis, and the second stage is the eigen-PRS calculation from the PRS-PCAs within the WGCNA pipeline. Finally, we selected only sex non-specific phenotypes to generate their corresponding PRSs using commonly used methods. Thus, our analysis omits information available for sex-specific traits, such as those related to menopause or testicular cancers, which are known to impact aging in important ways. Additionally, evidence suggests sex-specific effects of LOAD genetic risk factors in the clinical progression of LOAD and the accumulation of LOAD-related neuropathologies [37]; these effects were not explored here and are the focus of ongoing work. We acknowledge that there are other resources, such as the PGS Catalog [38], available for precomputed PRS weights for a wide array of traits not just from the UKB. In future studies, researchers could explore the potential of incorporating these resources into their analysis.

In sum, our study provides a whole-person view of the genetic risk landscape for LOAD in an elderly population. We highlight the impacts of 1) varying SNP inclusion α-value thresholds on this landscape using the most popular PRS calculation method, 2) the links between several blood-based and ultrasound imaging-derived cardiovascular traits on LOAD risk, as well as 3) the important contribution of the APOE region. Our results suggest that combining genetic risk profiles for vascular and dementia-related phenotypes using latent methods may offer improvements over existing single-trait PRS models of LOAD, an important step toward achieving the unrealized potential of PRS for risk stratification in populations and clinics.

Materials and methods

The religious orders study (ROS) and memory aging project (MAP) participants

ROS and MAP are two longitudinal studies based out of the Rush Alzheimer’s Disease Centre (RADC) in Chicago (IL) [34] designed with harmonized protocols to be analyzed as a single cohort. Participants are older, recruited free of known dementia and agree to annual detailed clinical evaluation, cognitive testing, and organ donation at the time of death. A total 2044 were determined to be of European ancestry and included in our analyses (see Supplementary Methods). All phenotypic data are available via controlled access through the RADC Resource Sharing Hub (www.radc.rush.edu) and genomics data is available via Synapse (https://www.synapse.org/#!Synapse:syn3219045). An Institutional Review Board of Rush University Medical Center approved each study, and all subjects provided informed, written consent and signed an Anatomical Gift Act. All participants signed a repository consent for resource sharing.

Clinical diagnosis of LOAD

For all subjects, available clinical data were reviewed at the time of death by a neurologist with expertise in dementia. Summary diagnoses were made blinded to all post-mortem data and case conferences with one or more neurologists and a neuropsychologist were used to achieve consensus when required. Diagnoses fell into one of six possible categories: no cognitive impairment (NCI), mild cognitive impairment (MCI) with or without secondary cause of impairment, probable late onset Alzheimer’s disease (LOAD; NINCDS criteria [39]), possible LOAD (with additional cause of impairment), and other primary cause of dementia [40]. For the purposes of our case–control modelling of consensus diagnosis at time of death, we considered only probable LOAD and NCI participants; all other analyses of neuropathological and cognitive outcomes included all participants with available data.

Assessment of longitudinal cognitive decline

At each annual evaluation, all subjects were administered 21 cognitive tests of which 19 were in common between ROS and MAP [41]. The raw scores from these 19 tests were standardized to z-scores across the cohort and averaged to yield the global cognitive function summary score at each study visit. Slopes of change in global cognitive performance over time (cognitive decline) were calculated per subject using linear mixed modelling, with a random effect for study visit, controlling for age at baseline, as described [42]. The range of the number of longitudinal measures is from 2 to 26 across the participants.

Assessment of postmortem beta-amyloid and tau neuropathology

Board-certified neuropathologists examined all brains without prior knowledge of clinical or demographic information. Brain removal and fixing procedures have been detailed extensively [43]. Data from neuropathological examinations were available for 1512 subjects at time of study. Immunohistochemistry and automated image processing were used to quantify total beta-amyloid and paired helical filament tau (PHFtau) across 8 brain regions (hippocampus, entorhinal cortex, midfrontal cortex, inferior temporal, angular gyrus, calcarine cortex, cingulate cortex, frontal cortex), as described previously [44, 45]. The square root of brain-wide averages of beta-amyloid and tau pathologies were used in our analyses. Figure S32 illustrates the distribution of the pathological measurements alongside global cognitive decline of the ROS/MAP participants.

Genotyping and imputation

Genotype data were available for 2067 ROS/MAP participants, between two batches: nbatch1 = 1686 genotyped using the Affymetrix GeneChip 6.0 and nbatch2 = 381 genotyped using the Illumina OmniQuad Express platform; batch was included as a covariate in our analyses. Details of raw genotype quality control (QC) have been previously described [46]. Briefly, prior to submission for imputation, genotypes were preprocessed using the TOPMed Imputation Server-recommended data preparation pipeline (https://topmedimpute.readthedocs.io/en/latest/prepare-your-data.html). Each batch was imputed separately using the TOPMed Imputation Server (TOPMed reference r2) [47], including Eagle (v2.4) for allelic phasing and Minimac4 (v1.5.7) for imputation. Imputed output data from the TOPMed server for each batch were filtered for imputation quality (removing SNPs with r2 < 0.8) before merging and mapping to rsIDs (dbSNP build 155). This resulted in a set of 9 329 439 high-quality, bi-allelic autosomal SNPs. Additionally, we verified the European ancestral background of the 2067 individuals by mapping the multidimensional scaling (MDS) plots of the ROS/MAP SNP genotypes against the 1000 Genome project reference populations (phase 3) [48] (details in Supplementary Methods and Fig. S33). Prior to our PRS calculation, additional sample and variant quality controls were performed, removing SNPs and subjects according to the following criteria:

  • SNPs with minor allele frequency (MAF) < 0.01 (1 549 210 variants)

  • SNPs with Hardy–Weinberg Equilibrium (HWE) P-value < 1×10−6 (132 variants)

  • SNPs with missingness rate > 0.01 (978 variants)

  • Individuals with higher-than-expected rates of SNP heterozygosity within 3 standard deviations of the mean (15 individuals)

  • Individuals with non-white European ancestry were removed (1 individual)

  • Individuals with mismatching IDs were removed (7 individuals)

Following this rigorous QC, a total of 2044 individuals and 7 723 915 variants remained for analysis.

Selection of GWAS summary statistics from the UK biobank

For phenome-wide PRS calculations, we used GWAS summary statistics derived from the UK Biobank, a large-scale population-based study including approximately 500 000 individuals [49]. The Pan-UK Biobank (https://pan.ukbb.broadinstitute.org/) consortium has recently conducted GWAS for all phenotypes with sufficient statistical power. Out of a total of 7221 phenotypes with GWAS performed in the European ancestry UKB sample (which matches our sample of genetically verified European-ancestry ROS/MAP participants), we selected 2218 phenotypes that had heritability estimates greater than or equal to 5% and were sex non-specific. The heritability scale of each phenotype was provided by Pan-UKBB consortium (calculated using SAIGE [50]), and selected sex non-specific phenotypes using their provided meta-data (“pheno_sex” column indicating “both”). Additional manual selection was performed to remove phenotypes with no cases for either sex (indicated by columns “n_cases_full_cohort_females” or “n_cases_full_cohort_males”), indicating geographical or administrative information, or duplicate indications or codings of the same underlying measurement. The full list of selected phenotypes for our study are provided in Table S1, and further details of selection criteria are outlined in Supplementary Methods. For calculating a benchmark PRS for LOAD, we also downloaded summary statistics from the latest LOAD GWAS, published by Bellenguez et al. [51]. Table S4 contains a summary of the variant count in this GWAS.

Calculation and compilation of whole-person PRS matrices

PRSice (v2.3.3) [52] was used to generate PRSs across all traits at 15 different SNP inclusion P-value thresholds (α-values) including 1 (including all SNPs), 0.1, 0.05, 0.01, 5 × 10−3, 1 × 10−3, 5 × 10−4, 1 × 10−4, 5 × 10−5, 1 × 10−5, 5 × 10−6, 1 × 10−6, 5 × 10−7, 1 × 10−7, and 5 × 10−8 (including only genome-wide significant SNPs). For each α-value threshold, we obtained a set of PRSs for up to 2218 traits (fewer traits are included at more stringent thresholds, as they may not have identified any SNPs reaching that threshold). For PRS calculations, we used the default “average” method of PRSice [53]. To account for patterns of linkage disequilibrium (LD), we applied a 500Kb sliding window and r2 threshold of 0.2 during variant clumping, as in previous work [54]. To control for fine population structure [36], the first ten principal components of genotype data and a variable representing genotype batch were regressed out of each score independently. We then conducted further QC, removing any PRS with fewer than five contributing SNPs. We also identified any pairs of PRSs with extremely highly correlation (Pearson correlation > 0.95) and retained only one (phenotypes with ICD10 code were preferred otherwise selected randomly). Finally, each PRS was standardized (mean subtracted and then divided by standard deviation) to obtain z-scores.

This resulted in 15 sets (for the 15 different α-value thresholds) of PRS “matrices” for downstream analysis. Each PRS “matrix” is a n × k matrix of phenome-wide PRSs calculated across traits, where n = 2044 is the number of participants after QC, and k ≤ 2218 is the number of traits. The value of k varies across the 15 sets, because at a given α-value threshold, some PRSs may be constructed from fewer than five SNPs thus removed as part of the QC.

Notably, previous work has observed a profound influence of variants in the APOE region of chromosome 19 on LOAD phenotypes in PRS analyses [55]. Additionally, many psychiatric and biomarker profiles are known to be heavily influenced by strong effects in the major histocompatibility complex (MHC) region of chromosome 6, which is also notoriously challenging to impute due to complex LD [56]. To assess the influence of these regions on the clustering of PRSs and resulting associations with LOAD-related phenotypes, we additionally calculated all the 15 sets of PRSs excluding a) the MHC region (GRCh37: chr6:25477797-36 448 354) [21] (55 204 variants), and b) the APOE region (GRCh37: chr19:45300000-45500000) [57] (622 variants).

This resulted in a total of 15 × 4 = 60 sets of phenome-wide PRS matrices, for the 15 different α-value thresholds and four different APOE and/or MHC inclusion and exclusion criteria: 1) including all genotyped and imputed variants genome-wide (primary analysis; denoted “ALL”), 2) removing only APOE SNPs (denoted “APOE-”), 3) removing only MHC SNPs (denoted “MHC-”), and 4) removing both APOE and MHC SNPs (denoted “MHC/APOE-”). For clarity, we denote these PRS matrices as Πsubscript, where the subscript indicates whether the matrix was calculated on the whole genome or with indicated regions excluded (i.e. ALL, MHC-, APOE-, MHC/APOE-) and the α-value threshold used for SNP inclusion (e.g. 1 × 10−4, 5 × 10−8). For example, ΠMHC -,1 × 10−8 is for the matrix when a PRS was calculated without SNPs in the MHC region and at an α-value threshold of 1 × 10−8. Table S5 outlines the dimension of each PRS matrix after QC.

Principal component analysis (PCA) of whole-person PRS matrices

PCA is a multivariate dimension-deduction technique that can describe the covariance among a set of variables (in our case, PRSs) [58]. All PCA analyses were performed using R (4.3.0) (https://www.r-project.org/). Exploratory PCA was first applied to each of the 60 PRS matrices separately to identify the major variance components of phenome-wide genetic risk and describe the contributors to each top principal component. This exploratory PCA analysis served three purposes: 1) as a cursory test of accuracy for the large sets of PRS calculations and post-processing (do PCs load onto known epidemiologically and biologically related traits?), 2) to inform expectations for cohesion in downstream clustering analyses, and 3) to provide confirmation of observations made previously based on GWAS summary statistics that the majority of phenome-wide variance in genetic risk for human traits is explained by genetic risk for metabolic and cardiovascular health-related measures [21].

Clustering of PRS matrices and calculation of eigen-PRSs (ePRS) using WGCNA

To identify meaningful, discrete clusters of correlated PRSs across the phenome for ePRS construction, we used the weighted gene co-expression network analysis (WGCNA) framework [59], which leverages the network properties of biological phenomena to cluster input measures. Although WGCNA was originally developed for use in gene expression experiments [59] to identify biologically coherent co-expressed gene modules, the principles on which it operates are not unique to such measures; WGCNA has been used to build epigenetic networks [60] and to subtype patient populations [61]. Notably, the original authors of WGCNA have also adapted the method for network-based analyses of GWAS summary statistics (the so-called weighted SNP correlation network analysis, or WSCNA [62]), which aims to identify clusters of SNPs with similar effects on a given phenotype after controlling for LD. In contrast, we use the WGCNA framework for clustering PRSs across a large number of phenotypes, rather than clustering SNP effects across the genome for a single phenotype. Studies have shown WGCNA’s superior accuracy and robustness over other clustering methods, especially for transcriptomic data. Validation against known gene classifications revealed that graph-based techniques outperform conventional clustering approaches [63].

Extensive details on the WGCNA/WSCNA (R package, v1.71) pipeline have been published previously [59, 62]. In our analysis, for each PRS matrix, pairwise Pearson correlations were calculated among all PRSs. For the construction of a weighted network, these PRS-PRS correlations were raised to a soft-thresholding power (β), determined empirically using criteria that encourage approximate scale-free topology of the resulting network (via the WGCNA package function pickSoftThreshold). We chose to calculate an unsigned network, meaning that adjacency was based off of the absolute value of correlation, rather than discarding negative correlations. This was important, as we calculated PRS across thousands of phenotypes, among which both negative and positive correlations would be of interest.

To cluster this network, a topological overlap matrix (TOM) was calculated and used to define dissimilarity for hierarchical clustering. The resulting dendrogram was then clustered using dynamic tree cutting [64] which applies a variable-height tree cut algorithm to maximize the capture of meaningful branch structures (i.e. PRS modules). WGCNA was run using the following additional parameters for detection of PRS modules: module detection cut-height of 0.999, minimum module size of 10, and high branch split sensitivity (deepSplit = 4).

Our soft-thresholding power selection varied across PRS matrices (refer to Fig. S34), depending mostly on the SNP inclusion α-value threshold, and therefore the number of PRSs included in a given matrix. For each matrix, the lowest value of β was chosen at which the scale-free topology model fit (R2) reached above 0.8 (resulting β values ranged from 1 to 6) to satisfy a scale-free topological network. We also chose a low minKMEtoStay parameter (0.01), which governs the behaviour of the dendrogram clustering algorithm and results in the inclusion of nearly all input PRS into some module (rather than excluding the many lowly correlated PRSs from membership in any module). We did this to allow the ePRS pipeline to capitalize on even small genetic influences of weakly-correlated PRSs within clusters, rather than eliminating these subtle contributions entirely.

Following the PRS matrix clustering and the identification of PRS modules, PCA was applied to each module separately, with the first principal component representing that module’s ePRS; multiple principal components can be used for each module, however by default, WGCNA selects the first component of the PRS matrices from each module. Hence, we used it as the module’s corresponding ePRS to simplify the downstream analysis. These ePRSs were then 1) examined for insights into the whole-person genetic risk landscape across input PRS matrices, and 2) used for downstream modelling of LOAD phenotypes.

Association analysis of ePRSs with LOAD phenotypes

For each iteration of our PRS analysis (i.e. for each matrix calculated using a different SNP inclusion α-value threshold and including/excluding APOE and/or MHC), we used multivariate regression models to identify the association of derived ePRSs with the following LOAD-related variables: consensus clinical diagnosis of LOAD at time of death, global cognitive decline slope prior to death, and brain-wide post-mortem levels of β-amyloid (Aβ) and PHFtau neuropathologies. Among the four outcomes, LOAD diagnosis is a binary outcome and was analyzed using logistic regression, with the remaining three continuous traits analyzed using linear models.

For LOAD diagnosis and cognitive decline, regression covariates included biological sex, age at death, and educational attainment. For neuropathological outcomes, educational attainment was replaced with postmortem interval (PMI). False discovery rate (FDR) correction (i.e. q-value [65]) was used to mitigate multiple testing concerns among all ePRS derived from a given matrix; this matrix-specific FDR correction can be thought as a stratified FDR control [66]. For direct comparison with the benchmark PRS from the latest and most powerful GWAS [51] of LOAD, ePRSs demonstrating significant association with at least one of the LOAD phenotypes (q-value ≤ 0.05) were carried forward for further analysis.

Evaluating performance of ePRSs against LOAD PRS benchmark

First, we optimized a single-trait PRS for LOAD (PRSLOAD) in the ROS/MAP sample using GWAS summary statistics taken from [51]. This was achieved the same PRSice-based method described previously, calculating PRSLOAD at the 15 different SNP inclusion α-value thresholds, and selecting the highly associated (q-value ≤ 0.05) scores with the AD-related phenotypes. We note that this model selection approach would result in a degree of overfitting for the benchmark PRSLOAD, making our subsequent estimates of ePRS performance gains naturally conservative.

To compare the performance of selected LOAD phenotype-associated ePRSs with the benchmark PRSLOAD [67], we first built two sets of baseline regression models (linear for continuous and logistic for binary outcomes) to establish the performance of covariates alone (Model 1) and covariates + PRSLOAD (Model 2) on our clinical and neuropathological outcomes (Y):

Model 1. Covariates only:

Model 2. Covariates and PRSLOAD: 

Then, for each ePRS significantly associated with at least one LOAD-related outcome in our preceding analyses, we built a third regression model (Model 3) including covariates, PRSLOAD, the selected ePRS:

Model 3. Covariates, PRSLOAD, and ePRS: 

The likelihood ratio test was then used, comparing model 2 with model 3 to assess whether the ePRS significantly improved model fit over PRSLOAD for a given outcome. The difference in model performance was reported as a ΔAUC for logistic and ΔR2 for linear models, again comparing models 2 and 3. As each significant ePRS was selected based on association with at least one LOAD-related outcome, we mitigated concerns of overfitting by performing 0.632+ bootstrapping [68] to derive model performance estimates. Finally, we assessed the predictability of PRSLOAD across all α-value thresholds, in association with the selected LOAD phenotypes. We have outlined their performance in Fig. S35.

Conflict of interest statement: The authors declare no competing interests.

Funding

DF is supported by the Michael and Sonja Koerner Family New Scientist Award, the Krembil Family Foundation, the Canadian Institutes of Health Research (CIHR), and the Centre for Addiction and Mental Health (CAMH) Discovery Fund. L.S. is supported by The Natural Sciences and Engineering Research Council of Canada (NSERC), RGPIN-04934 and RGPAS-522594. ROS/MAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917. U01AG46152, U01AG61356.

Data availability

All phenotypes and omics data of the Religious Orders Study (ROS) and Memory Aging Project (MAP) Cohorts are available through the RADC Resource Sharing Hub at www.radc.rush.edu. ROS/MAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917. U01AG46152, U01AG61356. ROS/MAP resources can be requested at https://www.radc.rush.edu. The UK Biobank GWAS summary statistics are available from Neale Lab, http://www.nealelab.is/ukbiobank. All scripts used for this manuscript are open-resource and available at https://github.com/aminekhasteh/ePRS.

References

1.

2021 Alzheimer’s disease facts and figures
.
Alzheimers Dement
.
2021
;
17
:
327
406
.

2.

Sims
R
,
Hill
M
,
Williams
J
.
The multiplex model of the genetics of Alzheimer’s disease
.
Nat Neurosci
.
2020
;
23
:
311
322
.

3.

Gatz
M
,
Reynolds
CA
,
Fratiglioni
L
, et al.  
Role of genes and environments for explaining Alzheimer disease
.
Arch Gen Psychiatry
.
2006
;
63
:
168
174
.

4.

Andrews
SJ
,
Fulton-Howard
B
,
Goate
A
.
Interpretation of risk loci from genome-wide association studies of Alzheimer’s disease
.
Lancet Neurol
.
2020
;
19
:
326
335
.

5.

Saez-Atienzar
S
,
Masliah
E
.
Cellular senescence and Alzheimer disease: the egg and the chicken scenario
.
Nat Rev Neurosci
.
2020
;
21
:
433
444
.

6.

Jones
L
,
Harold
D
,
Williams
J
.
Genetic evidence for the involvement of lipid metabolism in Alzheimer’s disease
.
Biochim Biophys Acta
.
2010
;
1801
:
754
761
.

7.

Jones
L
,
Holmans
PA
,
Hamshere
ML
, et al.  
Genetic evidence implicates the immune system and cholesterol metabolism in the aetiology of Alzheimer’s disease
.
PLoS One
.
2010
;
5
:
e13950
.

8.

Voineskos
AN
,
Lerch
JP
,
Felsky
D
, et al.  
The brain-derived neurotrophic factor Val66Met polymorphism and prediction of neural risk for Alzheimer disease
.
Arch Gen Psychiatry
.
2011
;
68
:
198
206
.

9.

Yokoyama
JS
,
Wang
Y
,
Schork
AJ
, et al.  
Association between genetic traits for immune-mediated diseases and Alzheimer disease
.
JAMA Neurol
.
2016
;
73
:
691
697
.

10.

Ferrucci
L
,
Fabbri
E
.
Inflammageing: chronic inflammation in ageing, cardiovascular disease, and frailty
.
Nat Rev Cardiol
.
2018
;
15
:
505
522
.

11.

Loeffler
DA
.
Modifiable, non-modifiable, and clinical factors associated with progression of Alzheimer’s disease
.
J Alzheimers Dis
.
2021
;
80
:
1
27
.

12.

Gudala
K
,
Bansal
D
,
Schifano
F
, et al.  
Diabetes mellitus and risk of dementia: a meta-analysis of prospective observational studies
.
J Diabetes Investig
.
2013
;
4
:
640
650
.

13.

Samieri
C
,
Perier
M-C
,
Gaye
B
, et al.  
Association of Cardiovascular Health Level in older age with cognitive decline and incident dementia
.
JAMA
.
2018
;
320
:
657
664
.

14.

Choi
D
,
Choi
S
,
Park
SM
.
Effect of smoking cessation on the risk of dementia: a longitudinal study
.
Ann Clin Transl Neurol
.
2018
;
5
:
1192
1199
.

15.

Kivimäki
M
,
Luukkonen
R
,
Batty
GD
, et al.  
Body mass index and risk of dementia: analysis of individual-level data from 1.3 million individuals
.
Alzheimers Dement
.
2018
;
14
:
601
609
.

16.

Lennon
MJ
,
Koncz
R
,
Sachdev
PS
.
Hypertension and Alzheimer’s disease: is the picture any clearer?
 
Curr Opin Psychiatry
.
2021
;
34
:
142
148
.

17.

Feringa
FM
,
van der Kant
R
.
Cholesterol and Alzheimer’s disease; from risk genes to pathological effects
.
Front Aging Neurosci
.
2021
;
13
:
690372
.

18.

Puglielli
L
,
Tanzi
RE
,
Kovacs
DM
.
Alzheimer’s disease: the cholesterol connection
.
Nat Neurosci
.
2003
;
6
:
345
351
.

19.

Olmastroni
E
,
Molari
G
,
De Beni
N
, et al.  
Statin use and risk of dementia or Alzheimer’s disease: a systematic review and meta-analysis of observational studies
.
Eur J Prev Cardiol
.
2022
;
29
:
804
814
.

20.

UK Biobank
. http://www.nealelab.is/uk-biobank  
(accessed Jan 13, 2022)
.

21.

Tanigawa
Y
,
Li
J
,
Justesen
JM
, et al.  
Components of genetic associations across 2,138 phenotypes in the UK biobank highlight adipocyte biology
.
Nat Commun
.
2019
;
10
:
4064
.

22.

Yan
D
,
Hu
B
,
Darst
BF
, et al.  
Biobank-wide association scan identifies risk factors for late-onset Alzheimer’s disease and endophenotypes
.
2018
;
468306
. https://elifesciences.org/reviewed-preprints/91360v1.

23.

Andrews
SJ
,
Fulton-Howard
B
,
O’Reilly
P
, et al.  
Causal associations between modifiable risk factors and the Alzheimer’s phenome
.
Ann Neurol
.
2021
;
89
:
54
65
.

24.

Towards clinical utility of polygenic risk scores | Human Molecular Genetics | Oxford Academic
. https://academic.oup.com/hmg/article/28/R2/R133/5540980?login=true  
(accessed Jan 13, 2022)
.

25.

Stroganov
O
,
Fedarovich
A
,
Wong
E
, et al.  
Mapping of UK biobank clinical codes: challenges and possible solutions
.
PLoS One
.
2022
;
17
:
e0275816
.

26.

Saelens
W
,
Cannoodt
R
,
Saeys
Y
.
A comprehensive evaluation of module detection methods for gene expression data
.
Nat Commun
.
2018
;
9
:
1090
.

27.

Coombes
BJ
,
Ploner
A
,
Bergen
SE
, et al.  
A principal component approach to improve association testing with polygenic risk scores
.
Genet Epidemiol
.
2020
;
44
:
676
686
.

28.

Giannisis
A
,
Patra
K
,
Edlund
AK
, et al.  
Brain integrity is altered by hepatic APOE ε4 in humanized-liver mice
.
Mol Psychiatry
.
2022
;
27
:
3533
3543
.

29.

van Rheenen
W
,
Peyrot
WJ
,
Schork
AJ
, et al.  
Genetic correlations of polygenic disease traits: from theory to practice
.
Nat Rev Genet
.
2019
;
20
:
567
581
.

30.

Truong
B
,
Hull
LE
,
Ruan
Y
, et al.  
Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases
.
medRxiv
.
2023
;
2023.02.21.23286110
. https://pubmed.ncbi.nlm.nih.gov/38508198/.

31.

Felsky
D
,
Roostaei
T
,
Nho
K
, et al.  
Neuropathological correlates and genetic architecture of microglial activation in elderly human brain
.
Nat Commun
.
2019
;
10
:
409
.

32.

Escott-Price
V
,
Hardy
J
.
Genome-wide association studies for Alzheimer’s disease: bigger is not always better
.
Brain Commun
.
2022
;
4
:
fcac125
.

33.

Fry
A
,
Littlejohns
TJ
,
Sudlow
C
, et al.  
Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population
.
Am J Epidemiol
.
2017
;
186
:
1026
1034
.

34.

Bennett
DA
,
Buchman
AS
,
Boyle
PA
, et al.  
Religious orders study and rush memory and aging project
.
J Alzheimers Dis
.
2018
;
64
:
S161
S189
.

35.

Privé
F
,
Aschard
H
,
Carmi
S
, et al.  
Portability of 245 polygenic scores when derived from the UK biobank and applied to 9 ancestry groups from the same cohort
.
Am J Hum Genet
.
2022
;
109
:
12
23
.

36.

Zhao
H
,
Mitra
N
,
Kanetsky
PA
, et al.  
A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)
.
Stat Appl Genet Mol Biol
.
2018
;
17
. https://doi.org/10.1515/sagmb-2017-0054.

37.

Fan
CC
,
Banks
SJ
,
Thompson
WK
, et al.  
Sex-dependent autosomal effects on clinical progression of Alzheimer’s disease
.
Brain
.
2020
;
143
:
2272
2280
.

38.

Lambert
SA
,
Gil
L
,
Jupp
S
, et al.  
The polygenic score Catalog as an open database for reproducibility and systematic evaluation
.
Nat Genet
.
2021
;
53
:
420
425
.

39.

McKhann
GM
,
Knopman
DS
,
Chertkow
H
, et al.  
The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease
.
Alzheimers Dement
.
2011
;
7
:
263
269
.

40.

Schneider
JA
,
Arvanitakis
Z
,
Bang
W
, et al.  
Mixed brain pathologies account for most dementia cases in community-dwelling older persons
.
Neurology
.
2007
;
69
:
2197
2204
.

41.

Wilson
RS
,
Boyle
PA
,
Yu
L
, et al.  
Temporal course and pathologic basis of unawareness of memory loss in dementia
.
Neurology
.
2015
;
85
:
984
991
.

42.

Wilson
RS
,
Yu
L
,
Trojanowski
JQ
, et al.  
TDP-43 pathology, cognitive decline, and dementia in old age
.
JAMA Neurol
.
2013
;
70
:
1418
1424
.

44.

De Jager
PL
,
Shulman
JM
,
Chibnik
LB
, et al.  
A genome-wide scan for common variants affecting the rate of age-related cognitive decline
.
Neurobiol Aging
.
2012
;
33
:
1017.e1
1017.e15
.

45.

Wilson
RS
,
Arnold
SE
,
Schneider
JA
, et al.  
The relationship between cerebral Alzheimer’s disease pathology and odour identification in old age
.
J Neurol Neurosurg Psychiatry
.
2007
;
78
:
30
35
.

46.

De Jager
PL
,
Ma
Y
,
McCabe
C
, et al.  
A multi-omic atlas of the human frontal cortex for aging and Alzheimer’s disease research
.
Sci Data
.
2018
;
5
:
180142
.

47.

Taliun
D
,
Harris
DN
,
Kessler
MD
, et al.  
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program
.
Nature
.
2021
;
590
:
290
299
.

48.

Marees
AT
,
de Kluiver
H
,
Stringer
S
, et al.  
A tutorial on conducting genome-wide association studies: quality control and statistical analysis
.
Int J Methods Psychiatr Res
.
2018
;
27
:
e1608
.

49.

Allen
NE
,
Sudlow
C
,
Peakman
T
, et al.  
UK biobank data: come and get it
.
Sci Transl Med
.
2014
;
6
:
224ed4
.

50.

Zhou
W
,
Nielsen
JB
,
Fritsche
LG
, et al.  
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies
.
Nat Genet
.
2018
;
50
:
1335
1341
.

51.

Bellenguez
C
,
Küçükali
F
,
Jansen
IE
, et al.  
New insights into the genetic etiology of Alzheimer’s disease and related dementias
.
Nat Genet
.
2022
;
54
:
412
436
.

52.

Choi
SW
,
O’Reilly
PF
.
PRSice-2: polygenic risk score software for biobank-scale data
.
Gigascience
.
2019
;
8
:
giz082
.

53.

Privé
F
,
Vilhjálmsson
BJ
,
Aschard
H
, et al.  
Making the most of clumping and thresholding for polygenic scores
.
Am J Hum Genet
.
2019
;
105
:
1213
1221
.

54.

Association of Accelerometer-Derived Sleep Measures with Lifetime Psychiatric Diagnoses: A Cross-Sectional Study of 89,205 Participants from the UK Biobank
.
PLOS Medicine
. https://doi.org/10.1371/journal.pmed.1003782.
(accessed Jun 22, 2022)
.

55.

Belloy
ME
,
Napolioni
V
,
Greicius
MD
.
A quarter century of APOE and Alzheimer’s disease: progress to date and the path forward
.
Neuron
.
2019
;
101
:
820
838
.

56.

Candore
G
,
Balistreri
CR
,
Colonna-Romano
G
, et al.  
Major histocompatibility complex and sporadic Alzheimer’s disease: a critical reappraisal
.
Exp Gerontol
.
2004
;
39
:
645
652
.

57.

Zhou
X
,
Chen
Y
,
Mok
KY
, et al.  
Non-coding variability at the APOE locus contributes to the Alzheimer’s risk
.
Nat Commun
.
2019
;
10
:
3310
.

58.

Abdi
H
,
Williams
LJ
.
Principal component analysis
.
WIREs Comput Stat
.
2010
;
2
:
433
459
.

59.

Langfelder
P
,
Horvath
S
.
WGCNA: an R package for weighted correlation network analysis
.
BMC Bioinformatics
.
2008
;
9
:
559
.

60.

Bustamante
AC
,
Armstrong
DL
,
Uddin
M
.
Epigenetic profiles associated with major depression in the human brain
.
Psychiatry Res
.
2018
;
260
:
439
442
.

61.

Jia
R
,
Zhao
H
,
Jia
M
.
Identification of co-expression modules and potential biomarkers of breast cancer by WGCNA
.
Gene
.
2020
;
750
:
144757
.

62.

Levine
ME
,
Langfelder
P
,
Horvath
S
.
A weighted SNP correlation network method for estimating polygenic risk scores
.
Methods Mol Biol
.
2017
;
1613
:
277
290
.

63.

Jay
JJ
,
Eblen
JD
,
Zhang
Y
, et al.  
A systematic comparison of genome-scale clustering algorithms
.
BMC Bioinformatics
.
2012
;
13
:
S7
.

64.

Langfelder
P
,
Zhang
B
,
Horvath
S
.
Defining clusters from a hierarchical cluster tree: the dynamic tree cut package for R
.
Bioinformatics
.
2008
;
24
:
719
720
.

65.

Storey
JD
,
Tibshirani
R
.
Statistical significance for genomewide studies
.
Proc Natl Acad Sci
.
2003
;
100
:
9440
9445
.

66.

Sun
L
,
Craiu
RV
,
Paterson
AD
, et al.  
Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies
.
Genet Epidemiol
.
2006
;
30
:
519
530
.

67.

Darst
BF
,
Koscik
RL
,
Racine
AM
, et al.  
Pathway-specific polygenic risk scores as predictors of β-amyloid deposition and cognitive function in a sample at increased risk for Alzheimer’s disease
.
J Alzheimers Dis
.
2017
;
55
:
473
484
.

68.

Efron
B
,
Tibshirani
R
.
Improvements on cross-validation: the .632+ bootstrap method
.
J Am Stat Assoc
.
1997
;
92
:
548
560
.

Author notes

Lei Sun as either co-corresponding author.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)