Abstract

Molecular subtyping of brain tissue provides insights into the heterogeneity of common neurodegenerative conditions, such as Alzheimer’s disease. However, existing subtyping studies have mostly focused on single data modalities and only those individuals with severe cognitive impairment. To address these gaps, we applied similarity network fusion, a method capable of integrating multiple high-dimensional multi-omic data modalities simultaneously, to an elderly sample spanning the full spectrum of cognitive ageing trajectories. We analyzed human frontal cortex brain samples characterized by five omic modalities: bulk RNA sequencing (18 629 genes), DNA methylation (53 932 CpG sites), histone acetylation (26 384 peaks), proteomics (7737 proteins) and metabolomics (654 metabolites). Similarity network fusion followed by spectral clustering was used for subtype detection, and subtype numbers were determined by Eigen-gap and rotation cost statistics. Normalized mutual information determined the relative contribution of each modality to the fused network. Subtypes were characterized by associations with 13 age-related neuropathologies and cognitive decline. Fusion of all five data modalities (n = 111) yielded two subtypes (nS1 = 53, nS2 = 58), which were nominally associated with diffuse amyloid plaques; however, this effect was not significant after correction for multiple testing. Histone acetylation (normalized mutual information = 0.38), DNA methylation (normalized mutual information = 0.18) and RNA abundance (normalized mutual information = 0.15) contributed most strongly to this network. Secondary analysis integrating only these three modalities in a larger subsample (n = 513) indicated support for both three- and five-subtype solutions, which had significant overlap, but showed varying degrees of internal stability and external validity. One subtype showed marked cognitive decline, which remained significant even after correcting for tests across both three- and five-subtype solutions (pBonf = 5.9 × 10−3). Comparison to single-modality subtypes demonstrated that the three-modal subtypes were able to uniquely capture cognitive variability. Comprehensive sensitivity analyses explored influences of sample size and cluster number parameters. We identified highly integrative molecular subtypes of ageing derived from multiple high dimensional, multi-omic data modalities simultaneously. Fusing RNA abundance, DNA methylation, and histone acetylation measures generated subtypes that were associated with cognitive decline. This work highlights the potential value and challenges of multi-omic integration in unsupervised subtyping of post-mortem brain.

Introduction

Ageing is often accompanied by progressive cognitive decline. The severity of this decline ranges from normal age-related changes to clinically important mild cognitive impairment and ultimately dementia.1,2 Alzheimer’s disease is the most common cause of late-life dementia, which is typically characterized by impairments in memory and loss of daily functioning.2 This poses a major public health concern, as by 2050, the estimated number of individuals diagnosed with dementia globally is expected to reach 152.8 million.3 As a neuropathological process, Alzheimer’s disease is defined by the abnormal accumulation of neurofibrillary tangles (hyperphosphorylated tau protein), the formation of extracellular dense core plaque deposits (beta-amyloid) and chronic neuroinflammation in the brain.4 However, there is great inter-individual heterogeneity in these pathological hallmarks, and the relationship between neuropathology and cognitive impairment is not deterministic.5 As such, there likely remain unobserved molecular signatures of age-related cognitive decline that could help explain the heterogeneity observed within populations and shed light on mechanisms of illness.

Molecular subtyping most often refers to classifying individuals within a population into subgroups using molecular data types and unsupervised clustering methods.6,7 The approach has seen success in fields with abundant and readily assayed tissue samples from diseased populations, such as in oncology, where biopsied tumours yield molecular information leading to precision interventions.7 Similarly, the heterogeneity of cognitive ageing may be partly explained by using high-dimensional molecular measures from post-mortem brain tissue of elderly donors to group similar individuals. For example, molecular subtypes of Alzheimer’s disease derived from RNA sequencing (RNAseq) data have been associated with Alzheimer’s disease-relevant pathologies,8–11 including amyloid and tau neuropathological burden, and APOE genotype.8,9 Subtypes derived from common genetic variation, specifically single nucleotide polymorphisms, identified multiple Alzheimer’s disease-related molecular mechanisms.12 A major limitation of most existing subtyping studies in this field is that they rely on information from single data modalities, e.g. gene expression data, which greatly constrains the information used to parse biological systems and pathological processes.13,14

Importantly, it has been shown that several multi-omic data types, including histone acetylation,15 metabolomics 16–20 and proteomics,21 are not only associated with Alzheimer’s disease neuropathologies, but also contributed information to associations that is missed with RNAseq alone.8,21 As such, integrating data modalities into subtyping pipelines has been an active area of research,22,23 and large-scale cohort studies of ageing that include brain donation and multi-omic characterization, such as those from the Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) consortium, now offer opportunities for developing highly integrative models of cognitive decline.24 Methods development in high-dimensional feature integration have also facilitated these analyses,25,26 though not yet in pathological ageing or Alzheimer’s disease. Similarity network fusion (SNF) is a network-based method specifically developed to integrate several multi-omic data modalities simultaneously.27

Here, we performed a highly integrative analysis on up to five post-mortem multi-omic data modalities simultaneously, measured in the same individuals, to identify molecular subtypes of ageing using the SNF method. We then characterized these subtypes by associating subtype membership with 13 age-related neuropathologies, antemortem cognitive performance, and rates of longitudinal cognitive decline. The most important features contributing to the fully fused similarity network were identified and subsequent analyses focused on the most informative data modalities. Lastly, we performed comprehensive sensitivity testing to explore the effects of parameter selection in unsupervised multi-omic subtyping, which are often chosen arbitrarily.

Material and methods

Study participants

Data were analyzed from two longitudinal cohort studies of ageing and dementia: the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP), with more than 3500 predominantly White elderly (mean = 78.44, SD = 7.79) participants of mostly European descent without known dementia at the time of enrolment.28 Participants in ROS (1994 to present) are older Catholic priests, nuns and brothers across the USA, whereas MAP (1997 to ongoing) recruits primarily from retirement communities and via social service agencies and Church groups throughout northeastern Illinois.28,29 Combined data analysis for these two cohorts are enabled by harmonized protocols for participant recruitment, clinical assessment and neuropathological examination at autopsy (autopsy rate exceeding 86%) with a large common core of identical item level data. A Rush University Medical Center Institutional Review Board approved each study. All participants signed an Anatomic Gift Act as well as informed and repository consents. Annual visits include tests of cognition function and a broad range of other demographic, social, lifestyle and clinical assessments with an averaged follow-up rate of 97%.29 Further details about the ROS and MAP cohorts can be found in previous publications30 and through the Rush Alzheimer’s Disease Center Research Resource Sharing Hub, where participant-level clinical and demographic data are available via restricted access (https://www.radc.rush.edu/home.htm).

Multi-omic data used for subtyping

We used five multi-omic data modalities to identify molecular subtypes: bulk RNAseq (18 629 genes, nRNAseq = 1092), DNA methylation (53 932 CpG sites, nDNA = 740), histone H3K9 acetylation (26 384 peaks, nhistone = 669), metabolomics (654 metabolites, nmetabolomics = 514) and tandem mass tag proteomics (7737 proteins, nproteomics = 368). All data types were acquired from the same brain region post-mortem: dorsolateral prefrontal cortex (DLPFC). All omic datasets used in our analyses were generated by members of the AMP—Alzheimer’s disease consortium and are available via restricted access through the AMP-Alzheimer’s disease knowledge portal, on Synapse (https://adknowledgeportal.synapse.org/). Further details can be found in Acknowledgements.

RNAseq

Full details on gene-level expression data from bulk DLPFC tissue have been published.31 Approximately 100 mg of DLPFC tissue were dissected from autopsied brains. Samples were processed in batches of 12–24 samples for RNA extraction using the Qiagen MiRNeasy Mini (cat no. 217004) protocol, including the optional DNAse digestion step. RNA Samples were submitted to the Broad Institute’s Genomics Platform for transcriptome library construction following sequencing in three batches using the Illumina HiSeq (batch #1: 50 M 101 bp paired end reads) and NovaSeq6000 (batch #2: 30 M 100 bp paired end; batch#3: 40–50 M 150 bp paired end 121 reads).32 A cut-off point of 5 for RNA Integrity Number score was used for constructing the cDNA library.33 The average sequencing depth was 50 million paired reads per sample. To achieve higher quality of alignment results, a paralleled and automatic RNAseq pipeline was implemented based on several Picard metrics (http://broadinstitute.github.io/picard/). Eighteen thousand six hundred twenty-nine features—full-length gene transcripts—from 1092 samples remained after data preprocessing and quality control (QC).

DNA methylation

Tissues were dissected similar to gene-expression data, full details on DNA methylation data have been published.33 DNA was extracted by the Qiagen QIAamp mini protocol (Part number 51306). Probes with P-value >0.01 were removed at probe level QC if predicted to cross-hybridize with sex chromosomes and having overlaps with known SNP with MAF ≥0.01 (±10 bp) based on the 1000 Genomes database. Subject level QC methods, including principal component analysis and bisulphite conversion efficiency, have also been previously described in depth.33 β-values reported by the Illumina platform were used as the measurement of methylation level for each CpG probe tagged on the chip, where missing values were imputed by the k-nearest neighbour algorithm (k = 100). The primary data analysis was adjusted by age, sex and experiment batch.33 Due to the large number of features present for this data type and to limit computational time, we only included the top 53 932 methylation peaks showing the greatest variability (Supplementary Fig. 1A). To verify that this selection process did not impact our subtyping efforts, we performed sensitivity analysis for five-modal integration using all CpG sites—resulting subtype memberships were nearly identical (Supplementary Fig. 1B).

Histone H3K9 acetylation

For the acetylation of the ninth lysine of histone 3 (H3K9ac), which is a marker of open chromatin, the Millipore anti-H3K9ac mAb (catalog #06-942, lot: 31636) was identified as a robust monoclonal antibody for the chromatin immunoprecipitation experiment. Similar to RNAseq and DNA methylation, 50 mg of grey matter was dissected on ice from biopsies of the DLPFC of each participant of ROS/MAP. Chromatin labelled with the H3K9ac mark and bound to the antibody was purified with protein A Sepharose beads.15 To quantify histone acetylation, single-end reads were aligned to the GRCh37 reference genome by the BWA algorithm after sequencing. Picard tools were used to flag duplicate reads. A combination of five ChIP-seq quality measures were employed to detect low quality samples: samples that did not reach (i) ≥15 × 106 uniquely mapped unique reads, (ii) non-redundant fraction ≥0.3, (iii) cross correlation ≥0.03, (iv) fraction of reads in peaks ≥0.05 and (v) ≥6000 peaks were removed.15 Samples passing QC were used to define a common set of peaks termed H3K9ac domains. H3K9ac domains of less than 100 bp width were removed resulting in a total of 26 384 H3K9ac domains with a median width of 2829 bp available for 669 subjects. Full details on H3K9ac data can be found on Synapse (https://www.synapse.org/#!Synapse:syn4896408).

Metabolomics

Metabolomics data were generated by the Alzheimer’s Disease Metabolomics Consortium (ADMC members list https://sites.duke.edu/adnimetab/team/), led by Dr. Rima Kaddurah-Daouk.18–20 Metabolomic profiling of post-mortem brain was conducted at Metabolon (Durham, NC) with the Discovery HD4 platform consisting of four independent ultra-high-performance liquid chromatography–tandem mass spectrometry instruments.16,17 For the purpose of QC and better understanding of the underlying biological mechanisms, missing rates less than 20% on known metabolites and 40% on individuals were imposed. As SNF cannot handle missing data, random forest imputation34 was then applied, resulting in 654 metabolites and 514 individuals. Full details on metabolomic assays and data processing can be found here (https://www.synapse.org/#!Synapse:syn26007830). The full metabolomics dataset and metadata can be accessed via the AMP-Alzheimer’s disease Knowledge Portal.

Proteomics

Prior to tandem mass tag labelling, samples were randomized by co-variates [age, sex, post-mortem interval (PMI), diagnosis, etc.], into 50 total batches (8 samples per batch).35 Peptides from each individual (n = 400) and the GIS pooled standard (n = 100) were labelled using the tandem mass tag 10-plex kit (ThermoFisher 90406). Peptide eluents were separated on a self-packed C18 (1.9 μm, Dr. Maisch) fused silica column (25 cm × 75 μM internal diameter) by a Dionex UltiMate 3000 RSLCnano liquid chromatography system (Thermo Fisher Scientific).35,36 Peptides were monitored on an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). The mass spectrometer was set to acquire data in positive ion mode using data-dependent acquisition. Dynamic exclusion was set to exclude previously sequenced peaks for 20 s within a 10-ppm isolation window.35,36 In this study, we only include peptides and participants with a missing rate less than 20% followed by random forest imputation,37 resulting in 7737 proteins and 386 individuals. Full details on proteomics data acquisition and processing can be found on synapse (https://www.synapse.org/#!Synapse:syn17015098).

Uniform multi-omic feature post-processing

Due to differences in data feature preprocessing among the five selected omic data modalities, we performed additional post-processing QC to determine whether technical and demographic covariates may be influencing global patterns of variability for each modality. To achieve this, we tested associations between age of death, sex, PMI and study cohort (ROS versus MAP) with each of the top 20 components from principal component analysis for each omic modality separately, as in previous omic work in this cohort.31 The proportion of variance explained by each principal component from each of the five data modalities, and the corresponding associations of each principal component with potential covariates are shown in Supplementary Figs 2–6. Based on this assessment, we determined that four out of five data modalities showed significant associations of all four covariates within the first 10 principal components (RNAseq data had been post-processed already and residualized for each of these covariates in addition to modality-specific confounders). We therefore proceeded by residualizing all features from each modality according to a linear model including all four covariates. This conservative approach ensured that contributions of each modality to latent subgroups were not unbalanced by different representations of covariate-specific effects. We also performed iterations of the analysis without correction, finding very similar but not identical subgroup memberships for five-modal integration (Supplementary Fig. 7).

Neuropathological assessment

All selected post-mortem neuropathological variables analyzed in this study have been previously published in detail.29,38 In addition to the outcome of NIA-Reagan neuropathological diagnosis of Alzheimer’s disease,5,39 we examined 12 other individual pathologies: brain wide amyloid-beta, diffuse and neuritic plaque counts, paired helical filament tau, neurofibrillary tangle count, Transactive response DNA binding protein of 43 kDa (TDP-43) proteinopathy stage (4 levels), large vessel cerebral atherosclerosis rating (4 levels), arteriolosclerosis, semiquantitative summary of cerebral amyloid angiopathy pathology (4 levels), pathologic stage of Lewy body disease (4 stages), gross chronic cerebral infarcts (coded as binary; presence/absence of infarcts) and cerebral microinfarcts (coded as binary; presence/absence of infarcts).

Cognitive performance and residual cognition (resilience)

Scores from five cognitive domains (episodic memory, semantic memory, working memory, perceptual speed and perceptual orientation) were recorded at last study visit and summarized by z-scoring for a composite measure of global cognition, as described.40 In our study, we defined the last available global cognitive measure as cognitive performance proximal to death. Cognitive slopes were also derived from the same set of z-scores over time to measure the longitudinal aspect of cognitive decline.41 To assess the resilience component of an individual’s cognitive capacity, we used the residual cognition approach.42,43 Residual cognition was defined as the residuals of a linear model of global cognitive performance at last visit regressed on observed neuropathologies (beta-amyloid, neurofibrillary tangles, neuritic plaques, diffuse plaques, Lewy bodies, macroscopic infarcts, microscopic infarcts, atherosclerosis, arteriolosclerosis, TDP-43 and cerebral amyloid angiopathy).

Statistical analysis

Subtype identification with SNF

The SNF method was used to integrate multi-omic data modalities.27 SNF first constructs sample-by-sample similarity matrices for each data modality separately and then iteratively updates and integrates these matrices via nonlinear combination until convergence is reached, generating a fused similarity network.44 SNF does not require any prior feature selection, but fully imputed (non-missing) data is required. According to best practices,35,37 random forest imputation was applied on both metabolomics and proteomics data to impute missing values. The ‘SNFtool’ R package (v2.2.0) was used for the network fusion pipeline, with recommended parameters K = 40, alpha = 0.5 and T = 50 (where K is the number of neighbours used to construct the similarity matrices; alpha is a hyper-parameter used in the scaling of edge weights; T is the total number of algorithmic iterations). Spectral clustering, an unsupervised soft clustering method rooted in graph theory,27,45 is the default clustering method for ‘SNFtool’; it was applied to the full fused affinity matrix to cluster study participants into subtypes. Optimal cluster numbers were identified (2 to 8 clusters) by the rotation cost46 and eigen-gap45 methods. Data modalities contributing the most information to fused similarity matrices were computed by normalized mutual information (NMI). NMI is a method able to detect the agreement among two input matrices, which helps to identify the data types that contribute most strongly to the fused similarity network estimated by SNF. NMI values range from 0 to 1 and are calculated by comparing joint probability distributions of assigned cluster memberships of each data modality versus the fused network while standardizing by the entropy (uncertainty) of each data modality. Larger NMI values indicate higher similarity between two clustering solutions, although there is no standard or common threshold of significance for its value. In our analysis, NMI refers to the measure of compatibility between the single data modality similarity matrix and SNF fused similarity matrix.27

Assessment of internal subtype validity

Due to the high dimensionality and heterogeneity of multi-omic data, assessments of cluster validity are critical to tackling potential biases of clustering algorithms towards particular cluster properties and to evaluate the probability that clusters do in fact exist.47,48 Upon subtype identification, we conducted internal cluster stability analysis using the R package ‘clValid’, which measures cluster validity and stability through several metrics derived from resampling and cross-validation. Metrics included in our studies are the average proportion of non-overlap (APN) and the average distance between means (ADM), which work especially well if the data are highly correlated, which is often the case in high-throughput genomic data.48–50 For resampling, we pulled 80% of participants for a total of 300 random draws, in accordance with previously published work using the SNF pipeline51 as well as other Alzheimer’s disease molecular subtyping efforts.8 The adjusted Rand index (ARI) was used to measure the agreement between subtype membership solutions (ranging from 0 to 1, where ARI = 1 indicating perfect agreement).52 Chi-square statistics were also used to compare the independence between different subtyping solutions.53

Identifying top individual features defining molecular subtypes

In order to identify molecular features that differed most between subtypes after spectral clustering, we performed one-way ANOVA tests between each normalized feature from each multi-omic data modality and subtype groupings. P-values from F-tests were used as the measure of significance to rank features from each modality. Gene annotations for DNA methylation data were mapped using the UCSC genome browser,54 and histone acetylation peaks were annotated by Klein et al.15

Association of subtypes with neuropathology, cognition and residual cognition

For each clustering solution, subtype membership was initially characterized by associations with 13 neuropathologies and three cognitive measures described above using linear or logistic regression. Subtype membership for each participant was represented with dummy variables for inclusion in each model (nsubtypes-1). For models of neuropathology, co-variates included age at death, biological sex, educational attainment (years), PMI, study cohort and APOE 4 status.

When fitting regression models for cognitive outcomes, the model (B) was also adjusted for the measurement latency, which is equal to the time difference (in years) between the last study visit where cognitive performance was assessed and age of death.

Omnibus F-tests of the hypothesis of equal outcome means (or probabilities for logistic models) across all subtypes were used to test the significance of subtype membership effects. P-values were Bonferroni adjusted for 16 tested outcomes, except where otherwise indicated. For subtypes with significant effects on global cognition (either at last visit or longitudinal slope), secondary analyses were performed (according to model B) for each subdomain of cognition separately.

Sensitivity analyses for external validity across data modalities, sample sizes and cluster numbers

To better understand the added value of data integration in the context of molecular subtyping, we performed a set of sensitivity analyses to measure differences in neuropathological and cognitive relevance (external validity) of subtypes derived from different combinations of multi-omic data modalities. Given that each iteration of these integrative analyses was limited to the sample size in which all data types were non-missing, we also assessed the effects of performing clustering in artificially limited sample subsets (i.e. where included non-missing data modalities permit a larger sample size). To achieve this, we defined a full search space of analytical pipeline configuration and parameter combinations for exhaustive modelling: (i) data modalities included (d; 31 possible combinations), (ii) sample size (n; ranging from 111 to 1092 participants, including 31 possible sample sizes each corresponding to a different data modality combination) and (iii) cluster number (c; ranging from 2–5, the extremes of values observed in our subtyping analyses). This resulted in a total of 844 unique combinations of d, n, and c. To evaluate external validity, we performed omnibus tests of the association of subtype membership for each analytical iteration with the set of neuropathologies and cognitive measures, as previously. To provide some generalized insight into the effects of manipulating design parameters on our association strengths, second-level analyses were performed by relating each pipeline parameter to observed omnibus model significance for each neuropathology and cognitive outcome (j). For these analyses, the effects of c (and cf, the same parameter but treated as a categorical variable), n, and a new parameter, m, representing the number of data modalities being fused, were tested independently, according to the following formulae:

Ethics approval and consent to participate

For the ROS/MAP, all study participants provided informed consent and both studies were approved by a Rush University Institutional Review Board. Further, all participants signed an Anatomic Gift Act for organ donation and signed a repository consent for resource sharing.

Results

We analyzed data from a total of 1314 unique participants from the ROS/MAP with at least one available multi-omic data modality and non-missing clinical and neuropathological data. Sample demographics are summarized in Table 1. The number of participants with different degrees of overlapping multi-omic characteristics ranged from n = 111 (all five data types) to 1092 (RNAseq only); all overlaps are shown in Fig. 1A.

Molecular subtypes derived from five multi-omic data modalities via similarity network fusion. (A) Overlapping sample sizes across all combinations between five data modalities were examined using upset plot. (B) Unimodal subtypes were identified from affinity matrices using spectral clustering accordingly from 111 overlapping samples (RNAseq: three subtypes, DNA methylation: two subtypes, histone acetylation: two subtypes, proteomics: two subtypes, metabolomics: three subtypes). Fully integrated subtypes were illustrated in the affinity matrix as well.
Figure 1

Molecular subtypes derived from five multi-omic data modalities via similarity network fusion. (A) Overlapping sample sizes across all combinations between five data modalities were examined using upset plot. (B) Unimodal subtypes were identified from affinity matrices using spectral clustering accordingly from 111 overlapping samples (RNAseq: three subtypes, DNA methylation: two subtypes, histone acetylation: two subtypes, proteomics: two subtypes, metabolomics: three subtypes). Fully integrated subtypes were illustrated in the affinity matrix as well.

Table 1

Table summarizing demographic data and the availability of multi-omic data modalities stratified by NIA-Reagan diagnosis criteria in ROS/MAP

Total n = 1313 individuals with post-mortem measurement
Non-Alzheimer’s disease (n = 475)Alzheimer’s disease (n = 838)Total
Age at baseline79.62 (7.25)81.46 (6.62)80.81 (6.91)
Age of death87.55 (7.12)90.27 (6.11)89.28 (6.62)
Biological sex (0: female, 1: male)185 males, 290 females245 males, 593 females430 males, 883 females
Post-mortem interval8.32 (6.31)8.32 (5.92)8.32 (6.06)
APOE E4 (0: without E4, 1: with E4)13.05%32.36%25.74%
Year of education16.34 (3.55)16.10 (3.55)16.19 (3.55)
Proportion of participants with non-missing data for each data type
RNAseq84.84%84.22%83.17%
DNA methylation61.26%53.46%56.28%
Histone acetylation53.68%49.28%50.88%
Metabolomics36.00%40.93%39.15%
Proteomics31.79%25.89%28.03%
Total n = 1313 individuals with post-mortem measurement
Non-Alzheimer’s disease (n = 475)Alzheimer’s disease (n = 838)Total
Age at baseline79.62 (7.25)81.46 (6.62)80.81 (6.91)
Age of death87.55 (7.12)90.27 (6.11)89.28 (6.62)
Biological sex (0: female, 1: male)185 males, 290 females245 males, 593 females430 males, 883 females
Post-mortem interval8.32 (6.31)8.32 (5.92)8.32 (6.06)
APOE E4 (0: without E4, 1: with E4)13.05%32.36%25.74%
Year of education16.34 (3.55)16.10 (3.55)16.19 (3.55)
Proportion of participants with non-missing data for each data type
RNAseq84.84%84.22%83.17%
DNA methylation61.26%53.46%56.28%
Histone acetylation53.68%49.28%50.88%
Metabolomics36.00%40.93%39.15%
Proteomics31.79%25.89%28.03%

All participants in the sample space have at least one omic data modality and phenotype data available; mean and standard deviation is recorded.

Table 1

Table summarizing demographic data and the availability of multi-omic data modalities stratified by NIA-Reagan diagnosis criteria in ROS/MAP

Total n = 1313 individuals with post-mortem measurement
Non-Alzheimer’s disease (n = 475)Alzheimer’s disease (n = 838)Total
Age at baseline79.62 (7.25)81.46 (6.62)80.81 (6.91)
Age of death87.55 (7.12)90.27 (6.11)89.28 (6.62)
Biological sex (0: female, 1: male)185 males, 290 females245 males, 593 females430 males, 883 females
Post-mortem interval8.32 (6.31)8.32 (5.92)8.32 (6.06)
APOE E4 (0: without E4, 1: with E4)13.05%32.36%25.74%
Year of education16.34 (3.55)16.10 (3.55)16.19 (3.55)
Proportion of participants with non-missing data for each data type
RNAseq84.84%84.22%83.17%
DNA methylation61.26%53.46%56.28%
Histone acetylation53.68%49.28%50.88%
Metabolomics36.00%40.93%39.15%
Proteomics31.79%25.89%28.03%
Total n = 1313 individuals with post-mortem measurement
Non-Alzheimer’s disease (n = 475)Alzheimer’s disease (n = 838)Total
Age at baseline79.62 (7.25)81.46 (6.62)80.81 (6.91)
Age of death87.55 (7.12)90.27 (6.11)89.28 (6.62)
Biological sex (0: female, 1: male)185 males, 290 females245 males, 593 females430 males, 883 females
Post-mortem interval8.32 (6.31)8.32 (5.92)8.32 (6.06)
APOE E4 (0: without E4, 1: with E4)13.05%32.36%25.74%
Year of education16.34 (3.55)16.10 (3.55)16.19 (3.55)
Proportion of participants with non-missing data for each data type
RNAseq84.84%84.22%83.17%
DNA methylation61.26%53.46%56.28%
Histone acetylation53.68%49.28%50.88%
Metabolomics36.00%40.93%39.15%
Proteomics31.79%25.89%28.03%

All participants in the sample space have at least one omic data modality and phenotype data available; mean and standard deviation is recorded.

Fully integrated five-modal network identifies two molecular subtypes

First, we aimed to determine whether molecular subtypes derived from all five multi-omic data modalities were informative of post-mortem neuropathology and antemortem cognitive decline. SNF yielded an optimal solution of two molecular subtypes (Fig. 1B) in 111 individuals with all five omic modalities (nS1 = 53, nS2 = 58). Both the rotation cost and Eigen-gap methods elected two as the optimal number of clusters. These subtypes were weakly associated with diffuse plaque counts (praw = 0.03), though these associations did not survive correction for multiple testing. In addition, no significant associations were observed for cognitive performance at last visit, rate of cognitive decline or residual cognition (Fig. 2).

Molecular subtypes derived from five multi-omic data modalities are not significantly associated with age-related neuropathologies or cognitive measurements. Associations of fully integrated subtype memberships and 16 age-related neuropathologies and cognitive measurements were examined by omnibus F-tests for linear regression models for a subset of 111 individuals. Y-axis shows significance of association (−log10 transformed raw P-values). The two horizontal lines indicate P-value thresholds of 0.05 (lower line) and Bonferroni-adjusted P-value thresholds for 16 tests (praw = 3.1 × 10−3) (higher line).
Figure 2

Molecular subtypes derived from five multi-omic data modalities are not significantly associated with age-related neuropathologies or cognitive measurements. Associations of fully integrated subtype memberships and 16 age-related neuropathologies and cognitive measurements were examined by omnibus F-tests for linear regression models for a subset of 111 individuals. Y-axis shows significance of association (−log10 transformed raw P-values). The two horizontal lines indicate P-value thresholds of 0.05 (lower line) and Bonferroni-adjusted P-value thresholds for 16 tests (praw = 3.1 × 10−3) (higher line).

Despite the lack of significant associations of molecular subtypes with pathology and cognition, the fully fused network demonstrated substantial internal stability (APN = 8.7%, ADM = 0.02; Supplementary Fig. 8). We therefore proceeded to identify the data modalities contributing most to the fused network by NMI (Supplementary Table 1). We found that histone acetylation (NMI = 0.38), DNA methylation (NMI = 0.18) and RNAseq (NMI = 0.15) were the top contributors to the fused network [to a substantially greater degree than proteomic (NMI = 0.04) and metabolomic modalities (NMI = 0.05)]. The top 10 individual features contributing to the fused network from the top contributors are summarized in Supplementary Table 2. Based on the importance of the top three data modalities, secondary analysis was conducted integrating only histone acetylation, DNA methylation and RNAseq, which permitted subtyping of a much larger sample size with non-missing overlapping data (n = 513).

Subtypes derived from three-modal integration are associated with longitudinal cognitive decline

In secondary analyses with three data modalities, the Eigen-gap method elected three molecular subtypes as the optimal clustering solution, while rotation cost elected five. We therefore evaluated both solutions by comparing membership overlap, differences in internal validity metrics and associations with neuropathology and cognition. A strong overlap was identified between subtype memberships in the three- and five-subtype solutions (chi-square P = 2.2 × 10−16, ARI = 0.76; Fig. 3A and D), whereby the large subtype 3 (n = 377) from the three-subtype solution contained 81.2% of the participants assigned to subtypes 3, 4, and 5 from the five-subtype solution. Internal cluster stability was compared between three-subtype and five-subtype solutions (Fig. 3B and C); both APN and ADM measures were better for the three-subtype solution (APN = 9.6%, ADM = 0.01), though the five-subtype solution also demonstrated cluster stability well above random chance (APN = 23.1%, ADM = 0.02) (Fig. 4).

Overlap between two subtyping solutions derived from histone acetylation, DNA methylation and RNAseq data types. (A) Alluvial plot showing the overlap of three-subtype and five-subtype solutions derived from three-modal integrated networks. (B–D) Fused affinity matrices for three- and five-subtype solutions and their cross-tabulated memberships.
Figure 3

Overlap between two subtyping solutions derived from histone acetylation, DNA methylation and RNAseq data types. (A) Alluvial plot showing the overlap of three-subtype and five-subtype solutions derived from three-modal integrated networks. (BD) Fused affinity matrices for three- and five-subtype solutions and their cross-tabulated memberships.

Cluster stability metrics for two subtyping solutions derived from histone acetylation, DNA methylation and RNAseq data types. (A) Histograms for the distribution of the APN generated from 300 random sub-samples for both three-subtype and five-subtype solutions. (B) Histograms for the distribution of the ADM generated from 300 random sub-samples for both three-subtype and five-subtype solutions.
Figure 4

Cluster stability metrics for two subtyping solutions derived from histone acetylation, DNA methylation and RNAseq data types. (A) Histograms for the distribution of the APN generated from 300 random sub-samples for both three-subtype and five-subtype solutions. (B) Histograms for the distribution of the ADM generated from 300 random sub-samples for both three-subtype and five-subtype solutions.

In tests of external validity, and tests of association with neuropathological and cognitive measures, subtype membership was significantly associated with global cognition at last visit (pBonf = 0.022) and rate of cognitive decline (pBonf = 4.2 × 10−4) for the five-subtype solution after multiple testing correction (Fig. 5). In contrast, the three-subtype solution was preferred by internal cluster stability metrics, and significant associations with neuropathology or cognition were not observed (Fig. 5). We therefore probed further into the five-subtype solution.

Association of subtyping solutions derived from histone acetylation, DNA methylation, and RNAseq data types with age-related neuropathologies and cognitive measurements. Associations of three-modal integrated memberships and 16 age-related neuropathological and cognitive traits were examined by omnibus F-tests for linear regression models for a subset of 513 individuals. The y-axis shows significance of association (−log10 transformed raw P-values). The lowest horizontal line indicatesan unadjusted P-value threshold at 0.05, whereas the second and third horizontal lines demonstrate Bonferroni-adjusted P-value thresholds for 16 and 32 tests (praw = 3.1 × 10−3 and praw = 1.6 × 10−3), respectively. Two subtyping solutions for molecular subtyping are differentiated by shading.
Figure 5

Association of subtyping solutions derived from histone acetylation, DNA methylation, and RNAseq data types with age-related neuropathologies and cognitive measurements. Associations of three-modal integrated memberships and 16 age-related neuropathological and cognitive traits were examined by omnibus F-tests for linear regression models for a subset of 513 individuals. The y-axis shows significance of association (−log10 transformed raw P-values). The lowest horizontal line indicatesan unadjusted P-value threshold at 0.05, whereas the second and third horizontal lines demonstrate Bonferroni-adjusted P-value thresholds for 16 and 32 tests (praw = 3.1 × 10−3 and praw = 1.6 × 10−3), respectively. Two subtyping solutions for molecular subtyping are differentiated by shading.

In three-modal subtyping, histone acetylation provided the most information to fused cluster memberships, with NMI = 0.52 (RNAseq NMI = 0.05 and DNA methylation NMI = 0.03). Cross-tabulation of three-modal and five-modal subtype memberships was carried out for only the 111 individuals included in the full five-modal analysis above, finding substantial overlap (chi-square P = 8.1 × 10−9, ARI = 0.60; Fig. 6A). These results demonstrated that the SNF procedure was consistent across sample size in terms of defining core cluster memberships when the most influential data types were combined.

Molecular subtypes derived from histone acetylation, DNA methylation, and RNAseq data are associated with cognitive decline. (A) Subtypes derived from three-modal integrated networks overlapped with the fully integrated subtypes. (B) Associations of three-modal integrated subtypes and rate of cognitive decline for a subset of 513 individuals. Y-axis shows standardized beta coefficients estimated from linear regression, where subtype 1 was used as the baseline category (error bars show standard deviation from standardized linear regression models). (C) Difference in mean value of rate of cognitive decline between subtypes by Tukey’s honestly significant difference test.
Figure 6

Molecular subtypes derived from histone acetylation, DNA methylation, and RNAseq data are associated with cognitive decline. (A) Subtypes derived from three-modal integrated networks overlapped with the fully integrated subtypes. (B) Associations of three-modal integrated subtypes and rate of cognitive decline for a subset of 513 individuals. Y-axis shows standardized beta coefficients estimated from linear regression, where subtype 1 was used as the baseline category (error bars show standard deviation from standardized linear regression models). (C) Difference in mean value of rate of cognitive decline between subtypes by Tukey’s honestly significant difference test.

In assessments of the mean differences in global cognition and the ratio of cognitive decline across five subtypes identified, subtype 5 had the worst global cognitive performance at last visit and the fastest rate of cognitive decline (Fig. 6B). This difference was significant in post hoc pairwise tests against all other subtypes, except for subtype 2 (Fig. 6C). Subtype 4 exhibited the best average cognitive performance and slowest decline (Fig. 6B and C). Notably, the association observed with cognitive decline (pBonf = 5.9 × 10−3) was strong enough to survive correction for multiple testing across combined five-subtype and three-subtype association test sets (32 tests) (Fig. 5). Given the significant association of subtypes with global cognition at last visit and rate of global cognitive decline, we performed follow-up analysis on five cognitive subdomains. For rate of cognitive decline, subtypes were most strongly associated with perceptual orientation (pBonf = 8.0 × 10−5), perceptual speed (pBonf = 0.004), and semantic memory (pBonf = 0.007) (Supplementary Fig. 9A). Specifically, the best and worst cognitive performance values were observed on average in subtypes 4 and 5, respectively (Supplementary Fig. 9B–F). A similar pattern was also identified from cognition measured at last visit (Supplementary Fig. 9G–L).

Molecular features defining three-modal subtypes

To describe the molecular signals most strongly associated with our observed subtypes, we first identified the top features contributing to the fused network from each data modality by ANOVA (Supplementary Table 3). The top five histone acetylation features exhibited the strongest within-subtype homogeneity and between-subtype variability (consistent with the observation that histone acetylation had the largest NMI of each modality Supplementary Table 1). The most extreme values for acetylation were observed in subtypes 1 (lowest levels) and 2 (highest levels) at peaks annotated to ZNF219, TMEM153, LSM14A, PSMD11, CDK5R1, MYD1D, ALDH3A2, APBB2 and others (Fig. 7). Subtype 5, which was characterized by the fastest rate of cognitive decline, had intermediate acetylation of these peaks (along with subtype 4, which are largely represented by subtype 3 in the three-subtype solution). For DNA methylation, CpG sites showed differential methylation at sites annotated to RB1, LPAR6, and RP11-83B20.10, as well as intergenic regions on chromosome 5 and 7, though no consistent pattern related to the cognition-associated subtype 5 was observed (Fig. 8). In contrast, the top subtype-associated RNAseq features revealed lower levels of PCYOX1L and NECTIN1, as well as higher levels of SLC5A3, PPP4R2, and PPP1CC in subtype 5 specifically compared to all other subtypes (Fig. 9).

Top five contributed features from histone acetylation. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from histone acetylation.
Figure 7

Top five contributed features from histone acetylation. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from histone acetylation.

Top five contributed features from RNAseq. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from RNAseq.
Figure 8

Top five contributed features from RNAseq. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from RNAseq.

Top five contributed features from DNA methylation. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from DNA methylation.
Figure 9

Top five contributed features from DNA methylation. Boxplots showing the z-normalized values of the top five features contributing to the three-modal fused network from DNA methylation.

Comparison with single modality subtypes and sensitivity analysis

Finally, we compared clinical and neuropathological associations of these three-modal subtypes with those for subtypes derived from each of the modalities analyzed individually. We found that these integrated subtypes had unique associations with cognitive performance and decline. For example, subtypes derived from RNAseq alone (n = 1092) were significantly associated with amyloid-beta (pBonf = 0.018) and neuritic plaque burden (pBonf = 2.3 × 10−3), but not with global cognition at last visit (pBonf = 0.28) or rate of cognitive decline (pBonf = 1.0). In fact, none of the unimodal subtypes showed more significant associations than three-modal, five-cluster subtypes on global cognitive performance (Fig. 10).

Unimodal molecular subtypes derived from histone acetylation, DNA methylation and RNAseq were tested against age-related neuropathologies and cognitive measurements. Associations of three-modal integrated and unimodal subtype memberships with neuropathological and cognitive traits were examined by omnibus F-tests for linear regression models for a subset of 513 individuals. Y-axis shows significance of association (-log10 transformed raw P-values). The lowest horizontal line illustrates an unadjusted P-value threshold at 0.05, whereas the middle and highest horizontal lines demonstrate Bonferroni-adjusted P-value thresholds for 16 and 32 tests (praw = 3.1 × 10−3 and praw = 1.6 × 10−3), respectively. Data modalities used for molecular subtyping are differentiated by shading.
Figure 10

Unimodal molecular subtypes derived from histone acetylation, DNA methylation and RNAseq were tested against age-related neuropathologies and cognitive measurements. Associations of three-modal integrated and unimodal subtype memberships with neuropathological and cognitive traits were examined by omnibus F-tests for linear regression models for a subset of 513 individuals. Y-axis shows significance of association (-log10 transformed raw P-values). The lowest horizontal line illustrates an unadjusted P-value threshold at 0.05, whereas the middle and highest horizontal lines demonstrate Bonferroni-adjusted P-value thresholds for 16 and 32 tests (praw = 3.1 × 10−3 and praw = 1.6 × 10−3), respectively. Data modalities used for molecular subtyping are differentiated by shading.

In sensitivity analyses, substantial variability in external validity was observed across different selections of sample size, data modalities and cluster number. Supplementary Fig. 10 illustrates the full set of results for selected amyloid and cognitive outcomes, which were the outcomes demonstrating the most significant associations with subtype membership in our analyses above (full summary statistics from these analyses are available in Supplementary Table 4). Supplementary Fig. 11A shows the meta-regression results for the influence of sample size (n), number of data modalities (m) and cluster number (c) on statistical associations with all 16 tested phenotypes. Generally, less significant associations were captured as more data modalities were integrated and sample size decreased (see example of beta-amyloid in Supplementary Fig. 11B), though exceptions were noted, such as for Lewy bodies (where additional modalities on average increased external validity; meta praw = 2.5 × 10−7; Supplementary Fig. 11C). Comparatively, cluster number selection had less of an impact overall on external validity.

Discussion

We used up to five omic data modalities acquired from the human post-mortem prefrontal cortex simultaneously to detect molecular subtypes of ageing using a high-dimensional, unsupervised approach. We identified several subtypes that were significantly associated with individuals’ rates of cognitive decline and levels of beta-amyloid neuropathology. In particular, molecular subtypes derived from a three-modal integrated network combining gene expression (RNAseq), H3K9ac and DNA methylation peaks yielded subtypes of participants with significantly faster decline in global cognition, specifically in domains of perceptual orientation, perceptual speed and semantic memory. To the best of our knowledge, associations between multi-omic subtypes and cognitive performance have not previously been identified, and most subtyping studies have focused only on individuals with confirmed, late-stage Alzheimer’s disease.55 Our findings also empirically quantify the relative information provided by different omic modalities to participant similarity networks.

In fully integrated analyses, combining all five available modalities, we identified two molecular subtypes which exhibited non-significant external validity with respect to neuropathology and cognition. We did not explore this result much further for three reasons: (i) both internal cluster validity metrics (Eigen gap and rotation cost) elected the same two-subtype solution, (ii) the sample size for full five-modal integration analysis was small (n = 111) and (iii) NMI calculations showed substantial heterogeneity in the amount of information contained within each modality when considering patient similarity networks in this sample subset. The small sample size was likely a key limitation; this was confirmed by sensitivity analyses showing that even for single data modalities, when the sample was restricted to the n = 111 group, and there were virtually no observed associations with any cognitive or neuropathological measures.

By comparing both integrated molecular subtypes and unimodal subtypes from spectral clustering, we found that subtypes from RNAseq alone were significantly associated with neurofibrillary tangles and amyloid-beta. Such associations align with findings from previous subtyping work in only individuals suffering from dementia,8 and demonstrate the reliability of the method we used for subtyping. In our analysis, we consistently found that histone acetylation provided the most information to fused subtypes. Statistically, it is difficult to speculate on why the H3K9ac data type contributed most to our fused network and proteomics, the least, as inputs were quite uniformly post-processed and scaled, and input proteomic data had only 3% missing observations prior to imputation. From a neurobiological perspective, the role of histone acetylation, and indeed of H3K9ac specifically, has been recognized as an important multi-omic regulator of Alzheimer’s disease-relevant pathological processes.56 Recently, work by Nativio et al.57 identified H3K9ac as a key upstream regulator of Alzheimer’s disease pathogenesis using multi-omic data, identifying its crucial role using both differential expression from RNAseq (from temporal cortex) and a follow-up proteomic screen of histone modifiers. In addition, previous work in DLPFC tissue (from the same ROS/MAP cohort analyzed in our study) has demonstrated strong associations of H3K9ac with age-related cognitive impairment.15 Our analysis therefore emphasizes the importance of integrating epigenetic data with gene expression studies seeking to identify key molecular drivers of Alzheimer’s disease.58 Variability in gene expression alone cannot determine the current status of diseases;59,60 even so, genetic and epigenetic studies still tend to be conducted separately.58 This study serves as evidence that integrating multiple epigenetic data types with gene expression data can lead to the discovery of novel molecular subtypes associated with cognition.

In describing the top molecular features that distinguish our subtypes from one another, we identified epigenetic marks and RNA transcripts which map to genomic loci previously associated with Alzheimer’s disease and cognitive ageing. Of particular interest were those loci that differentiated cognition-associated subtype 5 from all other subtypes. In this subtype, we found lower levels of Prenylcysteine Oxidase 1 Like (PCYOX1L), a gene which has been previously associated with Alzheimer’s disease,61–64 and has been identified as an Alzheimer’s disease target gene by the Agora platform (https://agora.adknowledgeportal.org/) with strong evidence for RNA down-regulation across eight brain regions and proteomic down-regulation across four regions. Nectin cell adhesion molecule 1 (NECTIN1)65 was similarly downregulated in subtype 5, and also showed RNA and protein-level dysregulation in the Agora database, confirming that the multi-modal SNF pipeline was capable of extracting some known signals with neuropathological significance.

Among the top genes with higher average levels in subtype 5 were SLC5A3,66PPP4R2 and PPP1CC. PPP4R2 and PPP1CC code for enzymes in the serine/threonine-protein phosphatase family and are well-known contributors to canonical Alzheimer’s disease pathological cascades.67 Interestingly, PPP4R2 has also been identified as a top hypomethylated gene of interest in a methylome-wide association study of Parkinson’s disease,68 an illness which is also often accompanied by cognitive decline.69 Other top contributors to the three-modal subtypes, such as PSMD11,70APBB271 and TMEM253,72 are also known to be involved in the development of Alzheimer’s disease pathology. TMEM253 is also linked with mild cognitive impairment via predicted gene expression based on genetic variation (transcriptome-wide association study [TWAS]).72 However, some top genes (e.g. ZNF219, a Kruppel-like zinc finger gene, has been associated with a-synucleinopathy73 and has binding sites in the MAPT gene).74 In contrast, these genes have not yet been associated with Alzheimer’s disease or cognitive ageing, and our method provides a full resource of ranked importance for all omic features studied, which provides novel targets for future study.

There are several limitations to consider when interpreting our results. First, a common challenge in unsupervised clustering endeavors, we did not achieve consensus on optimal clustering solutions in our three-modal subtyping analysis. In our case, we not only examined the optimal cluster number from two established methods especially suited to the SNF pipeline, but also tested cluster validity by multiple resampling measures, as there is no ground truth to compare to, and important information may be missed by heuristic methods alone.75,76 In our analysis, the disagreement between optimal cluster number as elected by internal stability measures versus external cognitive and neuropathological information also demonstrates the importance of transparency in the presentation of clustering analyses; in our case, both the three- and five-subtype solutions had significant overlaps in identity, though only the fifth cluster revealed a significant cognitive deficit. We again emphasize that these effects on cognition would survive correction for multiple testing in a full pool of tests combining both three- and five-subtype solutions.

Second, differences in data preprocessing methods for our five omic data modalities may have impacted downstream clustering, despite our efforts to control for technical and biological confounders at both the individual feature level and at the overall sample level in models testing external validity. Third, ROS/MAP is intrinsically limited by its inclusion of predominantly individuals of European-Caucasian ancestry, with an overrepresentation of biologically female participants.28–30 Finally, ROS/MAP is known to be a resilient cohort of elderly individuals including some members of the religious communities of Illinois. Even though we modelled study as a covariate in all analyses to mitigate variability due to large lifestyle differences, results derived from such a study population might not be applicable to the entire population. Future studies will be required using populations with increased diversity with respect to ancestry and socio-demographics. This will be the means to achieve a better understanding of the degree to which our findings can be applied more broadly beyond European-Caucasians.

Conclusions

We performed unbiased integrative subtyping analysis of cognitive ageing using five distinct omic data types simultaneously. Using the unsupervised SNF approach, including comprehensive internal validity and sensitivity analyses, we identified multi-omic subtypes related to cognitive decline with both known and novel molecular drivers. We also empirically demonstrated differential contribution of omic modalities to participant similarity and dissect the effects of important subtyping parameters on external validity of derived subtypes in this population. Our results prioritize molecular signals for future study of mid-late life populations seeking biomarkers for future cognitive decline and help shed light on the heterogeneity of post-mortem brain subtyping studies using epigenetic, proteomic, metabolomic and gene expression data. Specifically, our data support RNAseq and H3K9ac data modalities as demonstrating unsupervised patterns of variability most aligned with cognitive decline, though we do not believe that this excludes the roles of other omic data types with established roles in Alzheimer’s disease neuropathology, especially using different technologies and assays for quantification, or supervised and semi-supervised analytical methods.

Supplementary material

Supplementary material is available at Brain Communications online.

Acknowledgements

The authors acknowledge all of the patients and their families for graciously donating brain tissue. ROS/MAP is supported by Y.W., P.L.D.J. and D.A.B. D.F. is supported by the Michael and Sonja Koerner New Scientist Award, the Krembil Family Foundation, the Canadian Institutes of Health Research, and the CAMH Discovery Fund. ROS/MAP is supported by NIH grants P30AG72971, P30AG72975, R01AG15819, R01AG17917 and U01jAG61356. ROS/MAP resources can be requested at www.radc.rush.edu. Results published here are based on metabolomics data generated by the Alzheimer's Disease Metabolomics Consortium (ADMC; ADMC members list https://sites.duke.edu/adnimetab/team/), led by Dr. Rima Kaddurah-Daouk at Duke University, using biospecimens provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Support for the biospecimen processing and data generation conducted by the ADMC was provided by the following National Institute on Aging grants R01AG046171, RF1AG051550, RF1AG057452, RF1AG059093, RF1AG058942, U01AG061359 and the Foundation for the NIH (FNIH) grant: #DAOU16AMPA.

Funding

Funding support for D.F. was provided by The Koerner Family Foundation New Scientist Program, The Krembil Foundation, the Canadian Institutes of Health Research and the Centre for Addiction and Mental Health Discovery Fund. ROS/MAP was supported by National Institutes of Health grants P30AG10161, P30AG72975, R01AG15819, R01AG17917, U01AG46152 and U01AG61356.

Competing interests

The authors report no competing interests.

Data availability

All multi-omic datasets supporting the conclusions of this article are available via approved access at the Synapse AMP-Alzheimer’s disease Knowledge Portal (https://adknowledgeportal.synapse.org/, https://doi.org/10.7303/syn2580853). All analyses were performed using open-source software. No custom algorithms or software were used that are central to the research or not yet described in published literature. ROS/MAP resources can be requested at https://www.radc.rush.edu.

References

1

Formánek
T
,
Csajbók
Z
,
Wolfová
K
, et al.
Trajectories of depressive symptoms and associated patterns of cognitive decline
.
Sci Rep
.
2020
;
10
(
1
):
20888
.

2

Boyle
PA
,
Wilson
RS
,
Yu
L
, et al.
Much of late life cognitive decline is not due to common neurodegenerative pathologies
.
Ann Neurol
.
2013
;
74
(
3
):
478
489
.

3

Nichols
E
,
Steinmetz
JD
,
Vollset
SE
, et al.
Estimation of the global prevalence of dementia in 2019 and forecasted prevalence in 2050: An analysis for the Global Burden of Disease study 2019
.
Lancet Public Health
.
2022
;
7
(
2
):
e105
e125
.

4

Breijyeh
Z
,
Karaman
R
.
Comprehensive review on Alzheimer’s disease: Causes and treatment
.
Molecules
.
2020
;
25
(
24
):
5789
.

5

Bennett
DA
,
Schneider
JA
,
Arvanitakis
Z
, et al.
Neuropathology of older persons without cognitive impairment from two community-based studies
.
Neurology
.
2006
;
66
(
12
):
1837
1844
.

6

Jiang
YZ
,
Liu
Y
,
Xiao
Y
, et al.
Molecular subtyping and genomic profiling expand precision medicine in refractory metastatic triple-negative breast cancer: The FUTURE trial
.
Cell Res
.
2021
;
31
(
2
):
178
186
.

7

Zhao
L
,
Lee
VHF
,
Ng
MK
,
Yan
H
,
Bijlsma
MF
.
Molecular subtyping of cancer: Current status and moving toward clinical applications
.
Brief Bioinform
.
2019
;
20
(
2
):
572
584
.

8

Neff
RA
,
Wang
M
,
Vatansever
S
, et al.
Molecular subtyping of Alzheimer’s disease using RNA sequencing data reveals novel mechanisms and targets
.
Sci Adv
.
2021
;
7
(
2
):
eabb5398
.

9

Zheng
C
,
Xu
R
.
Molecular subtyping of Alzheimer’s disease with consensus non-negative matrix factorization
.
PloS One
.
2021
;
16
(
5
):
e0250278
.

10

Olah
M
,
Menon
V
,
Habib
N
, et al.
Single cell RNA sequencing of human microglia uncovers a subset associated with Alzheimer’s disease
.
Nat Commun
.
2020
;
11
(
1
):
6129
.

11

Ma
M
,
Liao
Y
,
Huang
X
, et al.
Identification of Alzheimer’s disease molecular subtypes based on parallel large-scale sequencing
.
Front Aging Neurosci
.
2022
;
14
:
770136
.

12

Emon
MA
,
Heinson
A
,
Wu
P
, et al.
Clustering of Alzheimer’s and Parkinson’s disease based on genetic burden of shared molecular mechanisms
.
Sci Rep
.
2020
;
10
(
1
):
19097
.

13

Picard
M
,
Scott-Boyer
MP
,
Bodein
A
,
Périn
O
,
Droit
A
.
Integration strategies of multi-omics data for machine learning analysis
.
Comput Struct Biotechnol J
.
2021
;
19
:
3735
3746
.

14

Misra
BB
,
Langefeld
C
,
Olivier
M
,
Cox
LA
.
Integrated omics: Tools, advances and future approaches
.
J Mol Endocrinol
.
2019
;
62
(
1
):
R21
R45
.

15

Klein
HU
,
McCabe
C
,
Gjoneska
E
, et al.
Epigenome-wide study uncovers large-scale changes in histone acetylation driven by tau pathology in aging and Alzheimer’s human brains
.
Nat Neurosci
.
2019
;
22
(
1
):
37
46
.

16

Huo
Z
,
Yu
L
,
Yang
J
,
Zhu
Y
,
Bennett
DA
,
Zhao
J
.
Brain and blood metabolome for Alzheimer’s dementia: Findings from a targeted metabolomics analysis
.
Neurobiol Aging
.
2020
;
86
:
123
133
.

17

Wang
G
,
Zhou
Y
,
Huang
FJ
, et al.
Plasma metabolite profiles of Alzheimer’s disease and mild cognitive impairment
.
J Proteome Res
.
2014
;
13
(
5
):
2649
2658
.

18

Toledo
JB
,
Arnold
M
,
Kastenmüller
G
, et al.
Metabolic network failures in Alzheimer’s disease: A biochemical road map
.
Alzheimers Dement J Alzheimers Assoc
.
2017
;
13
(
9
):
965
984
.

19

Arnold
M
,
Nho
K
,
Kueider-Paisley
A
, et al.
Sex and APOE ε4 genotype modify the Alzheimer’s disease serum metabolome
.
Nat Commun
.
2020
;
11
(
1
):
1148
.

20

John-Williams L
S
,
Blach
C
,
Toledo
JB
, et al.
Targeted metabolomics and medication classification data from participants in the ADNI1 cohort
.
Sci Data
.
2017
;
4
:
170140
.

21

Johnson
ECB
,
Carter
EK
,
Dammer
EB
, et al.
Large-scale deep multi-layer analysis of Alzheimer’s disease brain reveals strong proteomic disease-related changes not observed at the RNA level
.
Nat Neurosci
.
2022
;
25
(
2
):
213
225
.

22

Badhwar
A
,
McFall
GP
,
Sapkota
S
, et al.
A multiomics approach to heterogeneity in Alzheimer’s disease: Focused review and roadmap
.
Brain
.
2020
;
143
(
5
):
1315
1331
.

23

Singh
A
,
Shannon
CP
,
Gautier
B
, et al.
DIABLO: An integrative approach for identifying key molecular drivers from multi-omics assays
.
Bioinformatics
.
2019
;
35
(
17
):
3055
3062
.

24

Ma
Y
,
Klein
H
,
De Jager
PL
.
Considerations for integrative multi-omic approaches to explore Alzheimer’s disease mechanisms
.
Brain Pathol
.
2020
;
30
(
5
):
984
981
.

25

Rappoport
N
,
Shamir
R
.
Multi-omic and multi-view clustering algorithms: Review and cancer benchmark
.
Nucleic Acids Res
.
2018
;
46
(
20
):
10546
10562
.

26

Chauvel
C
,
Novoloaca
A
,
Veyre
P
,
Reynier
F
,
Becker
J
.
Evaluation of integrative clustering methods for the analysis of multi-omics data
.
Brief Bioinform
.
2020
;
21
(
2
):
541
552
.

27

Wang
B
,
Mezlini
AM
,
Demir
F
, et al.
Similarity network fusion for aggregating data types on a genomic scale
.
Nat Methods
.
2014
;
11
(
3
):
333
337
.

28

Bennett
DA
,
Buchman
AS
,
Boyle
PA
,
Barnes
LL
,
Wilson
RS
,
Schneider
JA
.
Religious orders study and rush memory and aging project
.
J Alzheimers Dis JAD
.
2018
;
64
(
s1
):
S161
S189
.

29

De Jager
PL
,
Ma
Y
,
McCabe
C
, et al.
A multi-omic atlas of the human frontal cortex for aging and Alzheimer’s disease research
.
Sci Data
.
2018
;
5
(
1
):
180142
.

30

Bennett
DA
,
Schneider
JA
,
Buchman
AS
,
Barnes
LL
,
Boyle
PA
,
Wilson
RS
.
Overview and findings from the rush memory and aging project
.
Curr Alzheimer Res
.
2012
;
9
(
6
):
646
663
.

31

Felsky
D
,
Klein
H
,
Menon
V
, et al.
Human peripheral monocytes capture elements of the state of microglial activation in the brain
.
Res Sq.
2022
.
preprint
. doi:10.21203/rs.3.rs-1226021/v1

32

Rybnicek
J
,
Chen
Y
,
Millic
M
, et al.
CHRNA5 links chandelier cells to protection against amyloid pathology in human aging and Alzheimer’s Disease
.
BioRxiv
.
2022
.
preprint
. doi:

33

De Jager
PL
,
Srivastava
G
,
Lunnon
K
, et al.
Alzheimer’s disease: Early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci
.
Nat Neurosci
.
2014
;
17
(
9
):
1156
1163
.

34

Wei
R
,
Wang
J
,
Su
M
, et al.
Missing value imputation approach for mass spectrometry-based metabolomics data
.
Sci Rep
.
2018
;
8
(
1
):
663
.

35

Wingo
TS
,
Duong
DM
,
Zhou
M
, et al.
Integrating next-generation genomic sequencing and mass spectrometry to estimate allele-specific protein abundance in human brain
.
J Proteome Res
.
2017
;
16
(
9
):
3336
3347
.

36

Mertins
P
,
Tang
LC
,
Krug
K
, et al.
Reproducible workflow for multiplexed deep-scale proteome and phosphoproteome analysis of tumor tissues by liquid chromatography–mass spectrometry
.
Nat Protoc
.
2018
;
13
(
7
):
1632
1661
.

37

Kokla
M
,
Virtanen
J
,
Kolehmainen
M
,
Paananen
J
,
Hanhineva
K
.
Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: A comparative study
.
BMC Bioinformatics
.
2019
;
20
(
1
):
492
.

38

Mostafavi
S
,
Gaiteri
C
,
Sullivan
SE
, et al.
A molecular network of the aging human brain provides insights into the pathology and cognitive decline of Alzheimer’s disease
.
Nat Neurosci
.
2018
;
21
(
6
):
811
819
.

39

Consensus recommendations for the postmortem diagnosis of Alzheimer’s disease. The National Institute on Aging, and Reagan Institute Working Group on diagnostic criteria for the neuropathological assessment of Alzheimer’s disease
.
Neurobiol Aging
.
1997
;
18
(
Suppl 4
):
S1
S2
.

40

Wilson
RS
,
Boyle
PA
,
Yu
L
, et al.
Temporal course and pathologic basis of unawareness of memory loss in dementia
.
Neurology
.
2015
;
85
(
11
):
984
991
.

41

De Jager
PL
,
Shulman
JM
,
Chibnik
LB
, et al.
A genome-wide scan for common variants affecting the rate of age-related cognitive decline
.
Neurobiol Aging
.
2012
;
33
(
5
):
1017.e1
1017.e15
.

42

Bocancea
DI
,
van Loenhoud
AC
,
Groot
C
,
Barkhof
F
,
van der Flier
WM
,
Ossenkoppele
R
.
Measuring resilience and resistance in aging and Alzheimer disease using residual methods: A systematic review and meta-analysis
.
Neurology
.
2021
;
97
(
10
):
474
488
.

43

Consens
ME
,
Chen
Y
,
Menon
V
, et al.
Bulk and single-nucleus transcriptomics highlight intra-telencephalic and somatostatin neurons in Alzheimer’s disease
.
Front Mol Neurosci
.
2022
;
15
:
903175
.

44

Stefanik
L
,
Erdman
L
,
Ameis
SH
, et al.
Brain-behavior participant similarity networks among youth and emerging adults with schizophrenia spectrum, autism spectrum, or bipolar disorder and matched controls
.
Neuropsychopharmacology
.
2018
;
43
(
5
):
1180
1188
.

45

Park
S
,
Zhao
H
.
Spectral clustering based on learning similarity matrix
.
Bioinformatics
.
2018
;
34
(
12
):
2069
2076
.

46

Huang
J
,
Nie
F
,
Huang
H
.
Spectral rotation versus K-means in spectral clustering
.
Proc AAAI Conf Artif Intell
.
2013
;
27
(
1
):
431
437
.

47

Handl
J
,
Knowles
J
,
Kell
DB
.
Computational cluster validation in post-genomic data analysis
.
Bioinformatics
.
2005
;
21
(
15
):
3201
3212
.

48

Brock
G
,
Pihur
V
,
Datta
S
,
Datta
S
.
Clvalid: An R package for cluster validation
.
J Stat Softw
.
2008
;
25
(
4
).

49

Sekula
M
,
Datta
S
,
Datta
S
.
Optcluster: An R package for determining the optimal clustering algorithm
.
Bioinformation
.
2017
;
13
(
3
):
101
103
.

50

Yan
L
,
Yang
M
,
Guo
H
, et al.
Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells
.
Nat Struct Mol Biol
.
2013
;
20
(
9
):
1131
1139
.

51

Jacobs
GR
,
Voineskos
AN
,
Hawco
C
, et al.
Integration of brain and behavior measures for identification of data-driven groups cutting across children with ASD, ADHD, or OCD
.
Neuropsychopharmacology
.
2021
;
46
(
3
):
643
653
.

52

Chacón
JE
,
Rastrojo
AI
.
Minimum adjusted Rand index for two clusterings of a given size
.
Adv Data Anal Classif
.
2023
;
17
(
1
):
125
133
.

53

Cohen
JE
.
The distribution of the chi-squared statistic under clustered sampling from contingency tables
.
J Am Stat Assoc
.
1976
;
71
(
355
):
665
670
.

54

Kent
WJ
,
Sugnet
CW
,
Furey
TS
, et al.
The human genome browser at UCSC
.
Genome Res
.
2002
;
12
(
6
):
996
1006
.

55

Alashwal
H
,
El Halaby
M
,
Crouse
JJ
,
Abdalla
A
,
Moustafa
AA
.
The application of unsupervised clustering methods to Alzheimer’s disease
.
Front Comput Neurosci
.
2019
;
13
:
31
.

56

Lu
X
,
Wang
L
,
Yu
C
,
Yu
D
,
Yu
G
.
Histone acetylation modifiers in the pathogenesis of Alzheimer’s disease
.
Front Cell Neurosci
.
2015
;
9
:
226
.

57

Nativio
R
,
Lan
Y
,
Donahue
G
, et al.
An integrated multi-omics approach identifies epigenetic alterations associated with Alzheimer’s disease
.
Nat Genet
.
2020
;
52
(
10
):
1024
1035
.

58

Hamamoto
R
,
Komatsu
M
,
Takasawa
K
,
Asada
K
,
Kaneko
S
.
Epigenetics analysis and integrated analysis of multiomics data, including epigenetic data, using artificial intelligence in the era of precision medicine
.
Biomolecules
.
2019
;
10
(
1
):
62
.

59

Elliott
GO
,
Johnson
IT
,
Scarll
J
, et al.
Quantitative profiling of CpG island methylation in human stool for colorectal cancer detection
.
Int J Colorectal Dis
.
2013
;
28
(
1
):
35
42
.

60

Leygo
C
,
Williams
M
,
Jin
HC
, et al.
DNA methylation as a noninvasive epigenetic biomarker for the detection of cancer
.
Dis Markers
.
2017
;
2017
:
1
13
.

61

Scheubert
L
,
Luštrek
M
,
Schmidt
R
,
Repsilber
D
,
Fuellen
G
.
Tissue-based Alzheimer gene expression markers–comparison of multiple machine learning approaches and investigation of redundancy in small biomarker sets
.
BMC Bioinformatics
.
2012
;
13
(
1
):
266
.

62

Li
QS
,
De Muynck
L
.
Differentially expressed genes in Alzheimer’s disease highlighting the roles of microglia genes including OLR1 and astrocyte gene CDK2AP1
.
Brain Behav Immun Health
.
2021
;
13
:
100227
.

63

Liu
D
,
Dai
SX
,
He
K
, et al.
Identification of hub ubiquitin ligase genes affecting Alzheimer’s disease by analyzing transcriptome data from multiple brain regions
.
Sci Prog
.
2021
;
104
(
1
):
368504211001146
.

64

Vastrad
B
,
Vastrad
C
.
Bioinformatics analyses of significant genes, related pathways and candidate prognostic biomarkers in Alzheimer’s disease
.
BioRxiv
.
2021
.
preprint
. doi:

65

Kim
DY
,
Ingano
LAM
,
Kovacs
DM
.
Nectin-1α, an immunoglobulin-like receptor involved in the formation of synapses, is a substrate for presenilin/γ-secretase-like cleavage
.
J Biol Chem
.
2002
;
277
(
51
):
49976
49981
.

66

De Paepe
B
,
Merckx
C
,
Jarošová
J
,
Cannizzaro
M
,
De Bleecker
JL
.
Myo-inositol transporter SLC5A3 associates with degenerative changes and inflammation in sporadic inclusion body myositis
.
Biomolecules
.
2020
;
10
(
4
):
521
.

67

Braithwaite
SP
,
Stock
JB
,
Lombroso
PJ
,
Nairn
AC
.
Protein phosphatases and Alzheimer’s disease
.
Prog Mol Biol Transl Sci
.
2012
;
106
:
343
379
.

68

Kaut
O
,
Schmitt
I
,
Wüllner
U
.
Genome-scale methylation analysis of Parkinson’s disease patients’ brains reveals DNA hypomethylation and increased mRNA expression of cytochrome P450 2E1
.
Neurogenetics
.
2012
;
13
(
1
):
87
91
.

69

Aarsland
D
,
Creese
B
,
Politis
M
, et al.
Cognitive decline in Parkinson disease
.
Nat Rev Neurol
.
2017
;
13
(
4
):
217
231
.

70

Lokireddy
S
,
Kukushkin
NV
,
Goldberg
AL
.
cAMP-induced phosphorylation of 26S proteasomes on Rpn6/PSMD11 enhances their activity and the degradation of misfolded proteins
.
Proc Natl Acad Sci U S A
.
2015
;
112
(
52
):
E7176
85
.

71

Giri
M
,
Shah
A
,
Upreti
B
,
Rai
JC
.
Unraveling the genes implicated in Alzheimer’s disease
.
Biomed Rep
.
2017
;
7
(
2
):
105
114
.

72

Yuan
SX
,
Li
HT
,
Gu
Y
,
Sun
X
.
Brain-specific gene expression and quantitative traits association analysis for mild cognitive impairment
.
Biomedicines
.
2021
;
9
(
6
):
658
.

73

Clough
RL
,
Dermentzaki
G
,
Stefanis
L
.
Functional dissection of the α-synuclein promoter: transcriptional regulation by ZSCAN21 and ZNF219
.
J Neurochem
.
2009
;
110
(
5
):
1479
1490
.

74

Barrachina
M
,
Ferrer
I
.
DNA methylation of Alzheimer disease and tauopathy-related genes in postmortem brain
.
J Neuropathol Exp Neurol
.
2009
;
68
(
8
):
880
891
.

75

Fang
Y
,
Wang
J
.
Selection of the number of clusters via the bootstrap method
.
Comput Stat Data Anal
.
2012
;
56
(
3
):
468
477
.

76

Horne
E
,
Tibble
H
,
Sheikh
A
,
Tsanas
A
.
Challenges of clustering multimodal clinical data: Review of applications in asthma subtyping
.
JMIR Med Inform
.
2020
;
8
(
5
):
e16452
.

Abbreviations

     
  • ADM =

    average distance between means

  •  
  • AMP-AD =

    Accelerating Medicines Partnership for Alzheimer’s Disease

  •  
  • APN =

    average proportion of non-overlap

  •  
  • ARI =

    adjusted Rand index

  •  
  • DLPFC =

    dorsolateral prefrontal cortex

  •  
  • H3K9ac =

    acetylation at the 9th lysine residue of the histone H3 protein

  •  
  • MAP =

    rush memory and ageing project

  •  
  • NMI =

    normalized mutual information

  •  
  • PMI =

    post-mortem interval

  •  
  • QC =

    quality control

  •  
  • RNAseq =

    RNA sequencing

  •  
  • ROS =

    religious orders study

  •  
  • SNF =

    similarity network fusion

  •  
  • TDP43 =

    Transactive response DNA binding protein of 43 kDa

  •  
  • TWAS =

    transcriptome-wide association study

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data