Abstract

PET is used to measure tau protein accumulation in Alzheimer’s disease. Multiple biomarkers have been proposed to track disease progression, most notably the standardized uptake value ratio of PET tracer uptake in a target region of interest relative to a reference region, but literature suggests these region choices are nontrivial. This study presents and evaluates a novel framework, BioDisCVR, designed to facilitate the discovery of useful biomarkers, demonstrated on [18F]AV-1451 tau PET data in multiple cohorts. BioDisCVR enhances signal-to-noise by conducting a data-driven search through the space of possible combinations of regional tau PET signals into a ratio of two composite regions, driven by a user-defined fitness function. This study compares ratio-based biomarkers discovered by the framework with state-of-the-art standardized uptake value ratio biomarkers. Data used is tau PET regional measurements from 198 individuals from the Alzheimer’s Disease Neuroimaging Initiative database, used for discovery, and 42 from the Mayo Clinic Alzheimer’s Disease Research Center and Mayo Clinic Study of Aging (MCSA), used for external validation. Biomarkers are evaluated by calculating clinical trial sample size estimates for 80% power and 20% effect size. Secondary metrics are a measure of longitudinal consistency (standard deviation of linear mixed-effects model residuals), and separation between cognitive groups (t-statistic of the change over time due to being cognitively impaired). When applied to preclinical (secondary prevention with CU individuals) and clinical (treatment aimed at cognitively impaired individuals) trials on Alzheimer’s disease, our data-driven framework BioDisCVR discovered ratio-based tau PET biomarkers vastly superior to previous work, both reducing measurement error and sample size estimates for hypothetical clinical trials. Our analysis suggests remarkable potential for patient benefit (reduced exposure to health risks associated with experimental drugs) and substantial cost savings, through accelerated trials and reduced sample sizes. Our study supports the leveraging of data-driven methods like BioDisCVR for clinical benefit, with the potential to positively impact drug development in Alzheimer’s disease and beyond.

Introduction

Biomarkers have revolutionized medical research for everything from in vivo diagnosis (e.g. in Alzheimer’s disease) to detecting drug target engagement.1 This is particularly exciting in age-related neurodegenerative disease research, where an increasing number of disease-modifying treatments are emerging from clinical trials in Alzheimer’s disease.2-4 As these drugs reach healthcare settings, identifying optimal biomarkers will likely be key for supporting and informing patient management and treatment deployment decisions. However, no biomarker is perfect, which has ramifications for medical research and healthcare applications.

PET is a key technology in modern Alzheimer’s disease clinical trials for confirming clearance of pathology (amyloid and tau protein aggregates)2-4 and for trial eligibility.4 Both scenarios require a biomarker to be defined for the quantification of pathology. This has been the topic of numerous investigations in multiple PET modalities, using various statistical and biological approaches.

Ratio-based biomarkers (e.g. brain relative volumes or peptide ratios in blood or cerebrospinal fluid) have demonstrated great promise in diagnostic and screening applications.5,6 This is due in large part to the property of ratios cancelling out confounding factors—including biological factors, e.g. different blood flow rate, total brain volume or body mass, and methodological factors, e.g. instrumentation bias and repeatability, or batch effects. Indeed, the standard for positron emission tomography brain imaging in neurodegenerative disease is to quantify pathology using the standardized uptake value ratio (SUVR), which is the standardized uptake value (SUV) of a target region of interest with respect to the SUV of a reference region.

Previous data-driven investigations of optimal tau PET biomarkers for Alzheimer’s disease are mostly SUVR-based (note that the Centiloid7/CenTauR8 scales depend on SUVR). One prior study explored multiple image analysis pipelines and target/reference region choices in a search to optimize diagnostic separation and longitudinal precision of tau PET SUVR.9 Others sought to find a tau PET biomarker that would reduce clinical trial sample size estimates by combining known disease-affected regions in a weighted average,10 or proposing an individualized biomarker.11 Similar earlier work defined so-called ‘statistical’ regions of interest.12-16 All previous analyses are limited by the traditional SUVR mindset of using a reference region that is putatively both free of pathology and stable over time. However, the choice of a reference for SUVR in Alzheimer’s disease remains subjective and non-trivial.17,18 If common reference regions are interchangeable, their SUV values should be correlated. The fact that they are not18 hints at a fundamental problem with the current single-reference-region SUVR concept, which motivates further investigation such as the present study.

We turn the spotlight to a new question: is there a ratio of two composite regions, not restricted by the current SUVR paradigm, which is more useful in clinical applications? To be more specific, here we aim to find a data-driven tau PET biomarker for Alzheimer’s disease progression analyses. To achieve this, we developed a machine learning framework called BioDisCVR for discovering data-driven ratio-based biomarkers driven by a clinical objective.19 The framework is general (e.g. could be used on fluid biomarkers, or other features such as regional volumes, cortical thickness, cognitive scores) but here, we apply it to tau PET imaging data for clinical trials on Alzheimer’s disease.

Materials and methods

Applied to regional tau PET SUV data, our framework finds a data-driven ratio-based biomarker with numerator and denominator each consisting of a data-driven composite region (combination of multiple sub-regions). We call the biomarker the composite value ratio (CVR) and name the biomarker discovery framework BioDisCVR (‘bio discover’). Our experiments are designed to discover CVRs that minimize the sample size estimate in modern randomized controlled clinical trials on Alzheimer’s disease. We compare the performance of CVR biomarkers against relevant benchmarks from the literature, demonstrating across multiple datasets.

Study design

Figure 1 is a conceptual representation of the BioDisCVR framework. Our algorithm is described in BioDisCVR Framework and Benchmarks. Input features are regional brain volumes and regional PET SUV values (regions included are listed in Data). The algorithm searches through the combinatorial space (both numerator and denominator simultaneously) to find a ratio of composite regions driven by an application-specific fitness function. Here, we demonstrate BioDisCVR using two scenarios relevant to modern clinical trials on Alzheimer’s disease:

  • Experiment 1: 54-month secondary prevention trial in cognitively unimpaired (CU) individuals at risk of Alzheimer’s disease (cf., the A4 Study20).

  • Experiment 2: 18-month treatment trial in cognitively impaired (CI) individuals –cf., recent prominent trials of monoclonal antibodies Aducanumab (ENGAGE/EMERGE2), Lecanemab (Clarity-AD3) and Donanemab (TRAILBLAZER-ALZ 24).

  • Experiment 3: The same as Experiment 2, but with a subset of regions common to multiple datasets, to facilitate external validation. Here, we use the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset for discovery and two datasets from the Mayo Clinic for validation (ADRC and MCSA). The datasets are described below in Data.

Conceptual representation of the BioDisCVR framework. A search algorithm is deployed to find a well-performing biomarker, defined as the ratio of signals f(triangles)/f(circles) in two composite brain regions (f(triangles) and f(circles)). The algorithm is driven by minimizing the sample size estimate needed for a hypothetical clinical trial, while maximizing the biomarker trajectory separation between cognitive groups (curved red arrow).
Figure 1

Conceptual representation of the BioDisCVR framework. A search algorithm is deployed to find a well-performing biomarker, defined as the ratio of signals f(triangles)/f(circles) in two composite brain regions (f(triangles) and f(circles)). The algorithm is driven by minimizing the sample size estimate needed for a hypothetical clinical trial, while maximizing the biomarker trajectory separation between cognitive groups (curved red arrow).

Our fitness function drives the algorithm to simultaneously reduce the sample size estimate (SSE) for a given clinical trial design, while maximizing cognitive group (biomarker) gradient separation (see Statistical Analysis). We report the SSE for 20% effect size and 80% power. Sample size estimate is an inverse square function of the effect size.

Extensive supplementary experiments are performed with multiple random initializations, alternative evaluation metrics, restricted search spaces (Supplementary Table 1) and regional ablation (Supplementary Fig. 1) to aid interpretability and comparability with the literature, while also demonstrating robustness of the BioDisCVR framework.

Data

This study uses a discovery dataset to find and evaluate biomarkers, and an additional two cohorts for validation.

Discovery dataset: the ADNI. Data analysed are regional brain tau PET signal and MRI volumes from the ADNI database (downloaded on 23 February 2024 from adni.loni.usc.edu), using participants with available processed [18F]AV-1451 tau PET scans. Our inclusion criteria consists of being amyloid-positive at any visit (composite amyloid-PET SUVR above or equal to 0.78 for Florbetapir F18 and 0.74 for Florbetaben F18)21 and having at least two visits with concurrent tau-PET and T1w MRI scans. Although the linear mixed model we use needs three time-points per individual to return a measure of the residuals, we can (and do) include data from individuals having only two scans, as these data contribute to the intercept and gradient. Diagnosis is defined by the individual’s maximum level of cognitive impairment across available visits. Of the 391 participants having two or more eligible visits, 198 were amyloid-positive (48.7% females). Among these, 76 were CU, 122 CI, which is the aggregation of individuals with either mild cognitive impairment or dementia due to probable Alzheimer’s disease), of which 71 were labelled to have dementia due to probable Alzheimer’s disease. The PET scans were acquired across 48 different sites, with voxel resolutions of 6 and 8 mm. The majority of participants (about 94%) identify as belonging to a white ethnicity (see Discussion).

Validation datasets: Mayo Clinic Alzheimer's Disease Research Center (ADRC) and MCSA. Individuals with dementia due to aetiologies other than Alzheimer’s disease were discarded for this study (N = 5). The same criteria as with ADNI data were applied (three visits with AV-1451 tau PET, structural MRI scan and a measure of amyloid), with the exception of the amyloid cut-off: since the composite reference region is not available in these datasets, we used the available regions from the cortical summary region (that is made up of frontal, anterior/posterior cingulate, lateral parietal, lateral temporal regions: full list in Supplementary file Results.xlsx, tab ‘amyloid.target’)21 and ‘inferior cerebellum grey matter’ as the reference. Then, we fit a 2-component Gaussian mixture model22 (see Supplementary Fig. 2) and derived the cut-off of 1.53 for the ¹¹C-Pittsburgh Compound B PET, to determine amyloid positivity. This reduced the available individuals to 43 (1 CU, 13 MCI and 29 Ad). Because of the small sample size of the CU group, we only used the available 42 CI individuals (33.3% females).

Time between visits is shown for all experiments in Table 1. The majority of intervals are from 0.7 to 1.5 years, but some span 3 years. We did not use partial-volume-corrected SUVR data, as it has been shown to introduce errors.9,23

Table 1

Proportion of individuals per interval time between visits

Time between visits (years)ADNIMayo
CUCICI
t ≤ 1.561%72%76%
1.5 > t ≤ 2.522%17%18%
2.5 > t ≤ 3.56%6%6%
t > 3.511%5%0%
Time between visits (years)ADNIMayo
CUCICI
t ≤ 1.561%72%76%
1.5 > t ≤ 2.522%17%18%
2.5 > t ≤ 3.56%6%6%
t > 3.511%5%0%

Most individuals had visits separated by 0.7 to 1.5 years. Data from the ADNI and from the Mayo Clinic ADRC and MCSA, the last two grouped in the Mayo column.

Table 1

Proportion of individuals per interval time between visits

Time between visits (years)ADNIMayo
CUCICI
t ≤ 1.561%72%76%
1.5 > t ≤ 2.522%17%18%
2.5 > t ≤ 3.56%6%6%
t > 3.511%5%0%
Time between visits (years)ADNIMayo
CUCICI
t ≤ 1.561%72%76%
1.5 > t ≤ 2.522%17%18%
2.5 > t ≤ 3.56%6%6%
t > 3.511%5%0%

Most individuals had visits separated by 0.7 to 1.5 years. Data from the ADNI and from the Mayo Clinic ADRC and MCSA, the last two grouped in the Mayo column.

BioDisCVR input features, as with classical machine learning models, can be the aggregation of any numerical variables. In our case, they consist of regional PET SUV and volumes for brain regions. In the Discovery dataset (ADNI), regions are defined by the Desikan-Killiany atlas,24 with the data provided on the Image & Data Archive run by the University of Southern California Laboratory of Neuro Imaging (LONI) for download in CSV format. For details of the processing pipelines, see ADNI’s manuscript25; this pipeline depends on a contemporaneous native space MRI that is segmented and parcellated with Freesurfer v7.1.1 (surfer.nmr.mgh.harvard.edu).26 For the Validation datasets, volume measures were obtained by the Mayo Clinic using the second version of the SUIT atlas.27 Off-target binding regions were excluded to ensure accurate quantification of tau pathology by minimizing interference from non-specific binding.25 The complete list of brain regions made available to our model is provided in the Supplementary File Results.xlsx (‘regions.available’ tab).

BioDisCVR framework and benchmarks

In this study, we use our framework BioDisCVR to discover biomarkers tailored for a given clinical trial design. Here, we consider both preclinical and clinical trials (in CU/CI individuals, respectively) of fixed duration (see Study design).

BioDisCVR application to tau PET for randomized clinical trials

Given a set of regions, we encode the ratio of two composite regions in a vector of 0, 1 and 2 s, where 0 denotes numerator, 2 denotes denominator and 1 indicates regions not included in the ratio. To overcome the challenge of searching the intractably large space of permutations (over 1040, using the Stirling series approximation), we employ an objective-driven genetic algorithm28 consisting of tournament selection with 3 random picks, blend crossover, random adaptive mutation and a population of 32. The resulting ratio-based biomarker is called CVR and can consist of combinations of either SUV or SUVR because the reference region cancels out in the ratio:

(1)

where compositei is the signal from any number of regions, combined in any way (e.g. volume-weighted, averaged).

To drive CVR towards minimizing the SSE of a hypothetical clinical trial while maintaining relevance to disease progression, we use a fitness/objective function equal to a measure of the trajectory separation between cognitive groups (see Statistical analysis below) divided by the square of the SSE. Here, the SSE is squared to slightly prioritize SSE over cognitive group separation, but this is a design choice for our experiments and not a requirement of the BioDisCVR framework. Adding group separation also forces the biomarker to increase in signal for the CI group with respect to the CU, facilitating interpretation and comparison with other biomarkers. We reduce the search space (again by choice, not a requirement) by using two biologically motivated priors: (i) no Braak1–3 regions29 in the denominator and (ii) no commonly used SUVR reference regions in the numerator.

Pipelines

Our experiments include several pipelines to investigate laterality (right–left hemisphere differences) and region size effects. Laterality is explored through averaged bilateral regions (indicated by the suffix-B in the biomarker nomenclature) versus left/right independent (indicated by the suffix-L in the biomarker nomenclature). Region size effects are explored through volume-weighted SUV versus simple mean SUV (mSUV). Supplementary Table 1 contains additional experiments where either the numerator or denominator is defined a priori.

Baselines

We compare the performance of CVR to SUVR biomarkers from the literature: meta-temporal14 (entorhinal cortex, parahippocampal gyrus, amygdala, inferior temporal gyrus, fusiform gyrus, middle temporal gyrus) divided by a composite reference region (eroded subcortical white matter, whole cerebellum and brainstem)13; highest tau-PET-positive data-driven stage (DDS,11 an adaptive individualized approach) divided by the inferior cerebellum grey matter. Additionally, we consider all combinations of composite target and reference regions we found in the literature: Braak stages,29 meta-temporal,14 mesial-temporal,30 temporoparietal,30 ‘rest’30 as target/numerator regions; and a set of popular reference regions (whole cerebellum, inferior cerebellar grey matter, composite reference region, eroded subcortical white matter, cerebellum cortex, brainstem, eroded subcortical white matter + inferior cerebellar grey matter, eroded subcortical white matter + whole cerebellum, whole cerebellum + brainstem, inferior cerebellar grey matter + brainstem, eroded subcortical white matter + inferior cerebellar grey matter + brainstem). Including composites calculated using both mean SUV and volume-weighted SUV, we compared our results with a total of 220 literature-inspired baseline biomarkers.

We report the best result found after 300 generations of the genetic algorithm for each of the constraints or scenarios. The top 5 biomarkers found, as well as the effect of different random initializations are shown in Supplementary File Results.xlsx, ‘Random.seeds_top.n’ tab. Our aim is not to show convergence or a focus on the search algorithm; rather, the evaluation of the biomarkers found by the algorithm. Thus, the 300 generations is arbitrary, although in 20 random initialization experiments with 500 generations each, 90% of the final performance was achieved within a mean of 140 generations (and below a standard deviation of 100), for both clinical and preclinical cases. Furthermore, the generations alone are not an indicator of the total number of combinations tested, as the genetic algorithm population size should also be taken into account.

Statistical analysis

All analyses are performed in R31 (version 4.3.2). Our analysis involves calculating SSE, longitudinal group separation and longitudinal precision (defined below) for clinical trials on Alzheimer’s disease in either CU or CI participants. The SSE and group separation measures drive the genetic algorithm. Longitudinal group separation (t-statistic) and longitudinal precision (standard deviation of model residuals) of tau PET biomarkers are quantified using a linear mixed effects model32 fit to log(SUVR). Covariates sex, APOE4 and education were not included because they did not significantly impact the intercept or the slope of the biomarker, replicating the analysis of Schwarz et al.9 Our linear mixed effects model includes correlated random slopes and intercepts, since this makes it invariant to time-shifts,32 and we use a relative time point (the average of all visits per individual). Furthermore, in the discovery cohort data (ADNI), a χ2 test indicated that the correlated random effects model fits the data better than the uncorrelated model, both for the CU (χ2=13.38, Df = 1, P < 0.001) and CI (χ2=13.39, Df = 1, P < 0.001). This is the formula used:

(2)

where:

  • i indexes the observation,

  • j indicates the individual,

  • DX is a binary variable indicating whether the individual j belongs to the CI group (1) or not (0),

  • β0∼3 are fixed effects,

  • b0j and b1j are random effects (intercepts and slopes varying by individual),

  • ɛi is the residual error.

Cognitive group separation (CU versus CI) is quantified by the t-statistic between fixed effect gradients (β3 in Eq. 2). The measure of longitudinal repeatability is quantified using the standard deviation of the model residuals (ɛ in Eq. 2), which is a measure of the relative error with respect to the biomarker values. Finally, we use the longpower package33 to calculate the SSE for each of two hypothetical clinical trials designed for 80% power to reduce tau PET accumulation by 20% versus placebo: CU individuals, 54 months (Experiment 1, c.f. the A4 Study20); and CI individuals, 18 months (Experiment 2, c.f. ENGAGE/EMERGE,2 CLARITY-AD,3 TRAILBLAZER-ALZ 24).

All metrics are provided with 95% confidence intervals, calculated using the model-based (semi-) parametric bootstrap for mixed models function from the lme4 package,32 version 1.1.35.1.

Results

Computation

Using a 12th Gen Intel(R) Core(TM) i7-12700H, with 2.70 GHz and 20 cores, we ran the search algorithm for <3 mins per experiment (Supplementary File Results.xlsx, tab Random.seeds_top.n, shows different random initializations of BioDisCVR).

Experiment 1 (hypothetical 54-month preclinical AD trial)

Figure 2 is a plot of SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 1, truncated at SSE ≤ 2000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots). Error bars show 95% confidence intervals from bootstrapping (see Statistical analysis).

Biomarker performance for Experiment 1. The plot shows SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 1, truncated at SSE ≤ 2000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots), reducing both the SSE and the error. Error bars show 95% confidence intervals from bootstrapping. The SSE shown is the minimum number of individuals that would be needed per branch for a hypothetical preclinical trial (CU individuals), designed for 80% power and 20% effect size, for a duration of 4.5 years.
Figure 2

Biomarker performance for Experiment 1. The plot shows SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 1, truncated at SSE ≤ 2000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots), reducing both the SSE and the error. Error bars show 95% confidence intervals from bootstrapping. The SSE shown is the minimum number of individuals that would be needed per branch for a hypothetical preclinical trial (CU individuals), designed for 80% power and 20% effect size, for a duration of 4.5 years.

Table 2 highlights selected biomarkers for Experiment 1 (all results are included in Supplementary material File Results.xlsx as a spreadsheet). Namely, specific literature biomarkers, plus the best-performing baseline (biomarker inspired by the literature) and our CVR biomarkers found by the algorithm. Overall, Table 2 shows that one of the CVR biomarkers always outperforms state-of-the-art biomarkers from the literature—for all metrics and, in particular, our primary goal of smaller SSE.

Table 2

Results of Experiment 1: biomarker performance for a 54-month clinical trial in an amyloid-positive, CU cohort

BiomarkerSSESeparationRepeatability
meta-temp/composite626 (353, 1401)4.37 (2.46, 6.49)1.06 (0.82, 1.31)
DDS/inferior_cerebellum_gm438 (259, 895)3.48 (1.57, 5.7)1.66 (1.32, 2.06)
(mSUV) mesial_temporal/
(ewm + cerebellum)
206 (143, 325)0.17 (−1.84, 2.09)1.03 (0.82, 1.3)
CVR-SUV-L195 (131, 321)4.2 (2.24, 6.25)1.52 (1.19, 1.68)
CVR-mSUV-L80 (61, 108)4.32 (2.31, 6.37)0.78 (0.67, 0.88)
CVR-SUV-B205 (138, 338)4.43 (2.61, 6.56)1.17 (0.91, 1.47)
CVR-mSUV-B115 (84, 168)3.3 (1.33, 5.43)0.86 (0.67, 0.96)
BiomarkerSSESeparationRepeatability
meta-temp/composite626 (353, 1401)4.37 (2.46, 6.49)1.06 (0.82, 1.31)
DDS/inferior_cerebellum_gm438 (259, 895)3.48 (1.57, 5.7)1.66 (1.32, 2.06)
(mSUV) mesial_temporal/
(ewm + cerebellum)
206 (143, 325)0.17 (−1.84, 2.09)1.03 (0.82, 1.3)
CVR-SUV-L195 (131, 321)4.2 (2.24, 6.25)1.52 (1.19, 1.68)
CVR-mSUV-L80 (61, 108)4.32 (2.31, 6.37)0.78 (0.67, 0.88)
CVR-SUV-B205 (138, 338)4.43 (2.61, 6.56)1.17 (0.91, 1.47)
CVR-mSUV-B115 (84, 168)3.3 (1.33, 5.43)0.86 (0.67, 0.96)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 54 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. ‘ewm + cerebellum’ indicate the average of the eroded subcortical white matter and the whole cerebellum. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

Table 2

Results of Experiment 1: biomarker performance for a 54-month clinical trial in an amyloid-positive, CU cohort

BiomarkerSSESeparationRepeatability
meta-temp/composite626 (353, 1401)4.37 (2.46, 6.49)1.06 (0.82, 1.31)
DDS/inferior_cerebellum_gm438 (259, 895)3.48 (1.57, 5.7)1.66 (1.32, 2.06)
(mSUV) mesial_temporal/
(ewm + cerebellum)
206 (143, 325)0.17 (−1.84, 2.09)1.03 (0.82, 1.3)
CVR-SUV-L195 (131, 321)4.2 (2.24, 6.25)1.52 (1.19, 1.68)
CVR-mSUV-L80 (61, 108)4.32 (2.31, 6.37)0.78 (0.67, 0.88)
CVR-SUV-B205 (138, 338)4.43 (2.61, 6.56)1.17 (0.91, 1.47)
CVR-mSUV-B115 (84, 168)3.3 (1.33, 5.43)0.86 (0.67, 0.96)
BiomarkerSSESeparationRepeatability
meta-temp/composite626 (353, 1401)4.37 (2.46, 6.49)1.06 (0.82, 1.31)
DDS/inferior_cerebellum_gm438 (259, 895)3.48 (1.57, 5.7)1.66 (1.32, 2.06)
(mSUV) mesial_temporal/
(ewm + cerebellum)
206 (143, 325)0.17 (−1.84, 2.09)1.03 (0.82, 1.3)
CVR-SUV-L195 (131, 321)4.2 (2.24, 6.25)1.52 (1.19, 1.68)
CVR-mSUV-L80 (61, 108)4.32 (2.31, 6.37)0.78 (0.67, 0.88)
CVR-SUV-B205 (138, 338)4.43 (2.61, 6.56)1.17 (0.91, 1.47)
CVR-mSUV-B115 (84, 168)3.3 (1.33, 5.43)0.86 (0.67, 0.96)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 54 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. ‘ewm + cerebellum’ indicate the average of the eroded subcortical white matter and the whole cerebellum. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

The best results for SSE were obtained with CVR-mSUV-L which is a ratio of simple-mean SUV combinations and laterality (separate hemispheric regions). The numerator included grey-matter regions from both hemispheres (superior frontal gyrus, amygdala), from the left hemisphere (caudal middle frontal gyrus, isthmus of cingulate gyrus, lingual gyrus, paracentral lobule, parahippocampal gyrus, postcentral gyrus, superior temporal gyrus), from the right hemisphere (lateral orbitofrontal gyrus). The denominator included grey- and white-matter regions from both hemispheres (precentral gyrus, brainstem, corpus callosum anterior, corpus callosum central, corpus callosum posterior), from the left hemisphere (insula, superior parietal lobule, ventral diencephalon), from the right hemisphere (lingual gyrus, paracentral lobule, postcentral gyrus, rostral anterior cingulate cortex).

The second-best SSE in Experiment 1 (about +45% SSE-worse) was obtained by CVR-mSUV-B, which combines bilateral mean SUV. The numerator included grey-matter regions from both hemispheres (entorhinal cortex, amygdala, inferior temporal gyrus, insula, medial orbitofrontal cortex, paracentral lobule, precuneus, superior frontal gyrus, temporal pole, transverse temporal gyrus). The denominator included grey- and white-matter regions from both hemispheres (brainstem, eroded subcortical white matter, corpus callosum anterior, corpus callosum mid_anterior, caudal anterior cingulate cortex, postcentral gyrus, rostral anterior cingulate cortex, ventral diencephalon). To aid with visualization, we show a representation of the brain,34 with the regions in the numerator in blue, and the ones in the denominator in red, in Supplementary Fig. 3.

Overall, performance of baselines in Experiment 1 went largely as expected for this preclinical trial scenario. For example, SUVRs with target/numerator regions involving later Braak stages performed poorly. The next worse targets were Braak2 (expected because of off-target binding) and temporoparietal, with mostly SSE above 1000. The best results of literature-inspired biomarkers in terms of SSE were obtained with the mesial temporal as a target (simple average of its regions’ SUV), with an SSE of 206, when using a reference region consisting of the simple average of the eroded subcortical white matter and the cerebellum regions. The next best targets were Braak stages 1 and 3 (SSE of 253 and 256, respectively, using the average SUV), when using a reference region consisting of the average of the eroded subcortical white matter and inferior cerebellum grey matter.

Experiment 2 (hypothetical 18-month clinical AD trial)

Figure 3 is a plot of SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 2, truncated at SSE ≤ 4000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots). Error bars show 95% confidence intervals from bootstrapping (see Statistical analysis).

Biomarker performance for Experiment 2. The plot shows SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 2, truncated at SSE ≤ 4000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots), reducing both the SSE and the error. Error bars show 95% confidence intervals from bootstrapping. The SSE shown is the minimum number of individuals that would be needed per branch for a hypothetical clinical trial (CI individuals), designed for 80% power and 20% effect size, for a duration of 1.5 years.
Figure 3

Biomarker performance for Experiment 2. The plot shows SSE against percentage error (longitudinal precision) for all biomarkers in Experiment 2, truncated at SSE ≤ 4000. CVR biomarkers (red crosses) showed superior performance to baseline comparators (blue dots), reducing both the SSE and the error. Error bars show 95% confidence intervals from bootstrapping. The SSE shown is the minimum number of individuals that would be needed per branch for a hypothetical clinical trial (CI individuals), designed for 80% power and 20% effect size, for a duration of 1.5 years.

Table 3 highlights selected biomarkers for Experiment 2 (all results are included in Supplementary material File Results.xlsx as a spreadsheet). Namely, specific literature biomarkers, plus the best-performing baseline (biomarker inspired by the literature) and our CVR biomarkers found by the algorithm. Overall, Table 3 shows that, like with Experiment 1 above, one of the CVR biomarkers always outperforms state-of-the-art biomarkers from the literature—for all metrics and, in particular, our primary goal of smaller SSE.

Table 3

Results of Experiment 2: biomarker performance for an 18-month clinical trial in an amyloid-positive, CI cohort

BiomarkerSSESeparationRepeatability
Meta-temp/composite1539 (1091, 2333)4.37 (2.46, 6.49)4.67 (3.97, 5.06)
DDS/inferior_cerebellum_gm1031 (739, 1539)3.48 (1.57, 5.7)3.12 (2.63, 3.69)
SUV Braak5/(ewm + cerebellum)852 (569, 1415)4.14 (2.3, 6.35)1.22 (1.02, 1.46)
CVR-SUV-L348 (275, 454)5.09 (3.17, 7.3)1.02 (0.86, 1.19)
CVR-mSUV-L212 (174, 265)6.3 (4.38, 8.42)0.66 (0.55, 0.76)
CVR-SUV-B365 (288, 479)5.15 (3.13, 7.32)1.01 (0.85, 1.19)
CVR-mSUV-B238 (195, 297)5.6 (3.58, 7.6)0.93 (0.79, 1.1)
BiomarkerSSESeparationRepeatability
Meta-temp/composite1539 (1091, 2333)4.37 (2.46, 6.49)4.67 (3.97, 5.06)
DDS/inferior_cerebellum_gm1031 (739, 1539)3.48 (1.57, 5.7)3.12 (2.63, 3.69)
SUV Braak5/(ewm + cerebellum)852 (569, 1415)4.14 (2.3, 6.35)1.22 (1.02, 1.46)
CVR-SUV-L348 (275, 454)5.09 (3.17, 7.3)1.02 (0.86, 1.19)
CVR-mSUV-L212 (174, 265)6.3 (4.38, 8.42)0.66 (0.55, 0.76)
CVR-SUV-B365 (288, 479)5.15 (3.13, 7.32)1.01 (0.85, 1.19)
CVR-mSUV-B238 (195, 297)5.6 (3.58, 7.6)0.93 (0.79, 1.1)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 18 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

Table 3

Results of Experiment 2: biomarker performance for an 18-month clinical trial in an amyloid-positive, CI cohort

BiomarkerSSESeparationRepeatability
Meta-temp/composite1539 (1091, 2333)4.37 (2.46, 6.49)4.67 (3.97, 5.06)
DDS/inferior_cerebellum_gm1031 (739, 1539)3.48 (1.57, 5.7)3.12 (2.63, 3.69)
SUV Braak5/(ewm + cerebellum)852 (569, 1415)4.14 (2.3, 6.35)1.22 (1.02, 1.46)
CVR-SUV-L348 (275, 454)5.09 (3.17, 7.3)1.02 (0.86, 1.19)
CVR-mSUV-L212 (174, 265)6.3 (4.38, 8.42)0.66 (0.55, 0.76)
CVR-SUV-B365 (288, 479)5.15 (3.13, 7.32)1.01 (0.85, 1.19)
CVR-mSUV-B238 (195, 297)5.6 (3.58, 7.6)0.93 (0.79, 1.1)
BiomarkerSSESeparationRepeatability
Meta-temp/composite1539 (1091, 2333)4.37 (2.46, 6.49)4.67 (3.97, 5.06)
DDS/inferior_cerebellum_gm1031 (739, 1539)3.48 (1.57, 5.7)3.12 (2.63, 3.69)
SUV Braak5/(ewm + cerebellum)852 (569, 1415)4.14 (2.3, 6.35)1.22 (1.02, 1.46)
CVR-SUV-L348 (275, 454)5.09 (3.17, 7.3)1.02 (0.86, 1.19)
CVR-mSUV-L212 (174, 265)6.3 (4.38, 8.42)0.66 (0.55, 0.76)
CVR-SUV-B365 (288, 479)5.15 (3.13, 7.32)1.01 (0.85, 1.19)
CVR-mSUV-B238 (195, 297)5.6 (3.58, 7.6)0.93 (0.79, 1.1)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 18 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

The best results were obtained with CVR-mSUV-L, which—like the results of Experiment 1—is a ratio of simple-mean SUV combinations with laterality. The numerator included grey-matter regions from both hemispheres (inferior temporal gyrus, insula, medial orbitofrontal cortex, superior frontal gyrus, temporal pole), from the left hemisphere (lingual gyrus, pars opercularis, superior parietal lobule), from the right hemisphere (fusiform, pars orbitalis, pars triangularis, pericalcarine cortex, precuneus, rostral middle frontal gyrus, superior temporal gyrus). The denominator included white- and grey-matter regions from both hemispheres (caudal anterior cingulate cortex, rostral anterior cingulate cortex, brainstem, whole cerebellum), from the left hemisphere (lateral orbitofrontal gyrus, pars orbitalis), from the right hemisphere (lateral occipital sulcus, transverse temporal gyrus).

The second-best SSE in Experiment 2 (about 12% bigger SSE-worse) was obtained by CVR-mSUV-B, as with Experiment 1, which involves bilateral mean SUV. The numerator included the following grey-matter regions from both hemispheres (inferior temporal gyrus, insula, medial orbitofrontal cortex, precentral gyrus, superior frontal gyrus, temporal pole). The denominator included the following white- and grey-matter regions from both hemispheres (brainstem, whole_cerebellum, caudal anterior cingulate cortex, lateral orbitofrontal gyrus, rostral anterior cingulate cortex). To aid with visualization, we show a representation of the brain,34 with the regions in the numerator in blue, and the ones in the denominator in red, in Supplementary Fig. 3.

In this experiment as well, performance of baselines went largely as expected for this clinical trial scenario. The earlier Braak stages (1–3) and the subset of their regions that compose the mesial temporal had the worst performance. The best results of literature-inspired biomarkers in terms of SSE were obtained with the Braak5 composite as a target (volume-weighted SUV), with an SSE of 852, when using a reference region consisting of the average of the eroded subcortical white matter and the cerebellum regions. Following this, the next best SSE was 862, achieved by the same Braak5 target regions (mean SUV) and the composite reference region. The next best target was the temporoparietal, using the composite reference region, with an SSE of 962.

Experiment 3 (hypothetical 18-month clinical AD trial): validation

Table 4 shows the salient results from the discovery set, and their performance in the validation dataset. As mentioned in 2.2 Data, we had to use a subset of intersecting regions of all datasets, hence the differences with Experiment 2. The best result in the validation set was obtained by the CVR-mSUV-B, which was the second-best BioDisCVR configuration in both Experiments 1 and 2. The CVR-mSUV-B numerator consisted in the following grey-matter regions from both hemispheres (inferior temporal gyrus, insula, medial orbitofrontal cortex, precentral gyrus, superior frontal gyrus, temporal pole), while the denominator consisted of the following white- and grey matter regions (brainstem, whole cerebellum, caudal anterior cingulate cortex, lateral orbitofrontal gyrus, rostral anterior cingulate cortex). To aid with visualization, we show a representation of the brain,34 with the regions in the numerator in blue, and the ones in the denominator in red, in Supplementary Fig. 3. The second-best result in the validation set was obtained by the adaptive DDS biomarker, although it did not perform as well in the discovery dataset.

Table 4

Results of Experiment 3: biomarker performance for an 18-month clinical trial in an amyloid-positive, CI cohort

BiomarkerSSE (discovery)SSE (validation)Repeatability (discovery)Repeatability (validation)
Meta-temp/whole_cerebellum1722 (1190, 2710)690 (394, 1509)4.68 (4.01, 5.07)2.16 (1.60, 2.84)
DDS/inferior_cerebellum_gm1031 (739, 1539)417 (271, 723)3.12 (2.63, 3.69)2.08 (1.52, 2.77)
mSUV temporoparietal/
cerebellum
1136 (719, 2056)755 (419, 1748)1.84 (1.53, 2.17)2.20 (1.68, 2.87)
CVR-SUV-L473 (362, 644)601 (351, 1261)1.17 (0.98, 1.38)0.97 (0.72, 1.26)
CVR-mSUV-L326 (257, 427)513 (307, 1027)0.60 (0.51, 0.70)0.63 (0.47, 0.82)
CVR-SUV-B513 (384, 723)630 (361, 1369)1.31 (1.1, 1.54)1.17 (0.88, 1.52)
CVR-mSUV-B362 (284, 478)334 (217, 579)0.85 (0.71, 1.01)0.70 (0.53, 0.91)
BiomarkerSSE (discovery)SSE (validation)Repeatability (discovery)Repeatability (validation)
Meta-temp/whole_cerebellum1722 (1190, 2710)690 (394, 1509)4.68 (4.01, 5.07)2.16 (1.60, 2.84)
DDS/inferior_cerebellum_gm1031 (739, 1539)417 (271, 723)3.12 (2.63, 3.69)2.08 (1.52, 2.77)
mSUV temporoparietal/
cerebellum
1136 (719, 2056)755 (419, 1748)1.84 (1.53, 2.17)2.20 (1.68, 2.87)
CVR-SUV-L473 (362, 644)601 (351, 1261)1.17 (0.98, 1.38)0.97 (0.72, 1.26)
CVR-mSUV-L326 (257, 427)513 (307, 1027)0.60 (0.51, 0.70)0.63 (0.47, 0.82)
CVR-SUV-B513 (384, 723)630 (361, 1369)1.31 (1.1, 1.54)1.17 (0.88, 1.52)
CVR-mSUV-B362 (284, 478)334 (217, 579)0.85 (0.71, 1.01)0.70 (0.53, 0.91)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 18 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

Table 4

Results of Experiment 3: biomarker performance for an 18-month clinical trial in an amyloid-positive, CI cohort

BiomarkerSSE (discovery)SSE (validation)Repeatability (discovery)Repeatability (validation)
Meta-temp/whole_cerebellum1722 (1190, 2710)690 (394, 1509)4.68 (4.01, 5.07)2.16 (1.60, 2.84)
DDS/inferior_cerebellum_gm1031 (739, 1539)417 (271, 723)3.12 (2.63, 3.69)2.08 (1.52, 2.77)
mSUV temporoparietal/
cerebellum
1136 (719, 2056)755 (419, 1748)1.84 (1.53, 2.17)2.20 (1.68, 2.87)
CVR-SUV-L473 (362, 644)601 (351, 1261)1.17 (0.98, 1.38)0.97 (0.72, 1.26)
CVR-mSUV-L326 (257, 427)513 (307, 1027)0.60 (0.51, 0.70)0.63 (0.47, 0.82)
CVR-SUV-B513 (384, 723)630 (361, 1369)1.31 (1.1, 1.54)1.17 (0.88, 1.52)
CVR-mSUV-B362 (284, 478)334 (217, 579)0.85 (0.71, 1.01)0.70 (0.53, 0.91)
BiomarkerSSE (discovery)SSE (validation)Repeatability (discovery)Repeatability (validation)
Meta-temp/whole_cerebellum1722 (1190, 2710)690 (394, 1509)4.68 (4.01, 5.07)2.16 (1.60, 2.84)
DDS/inferior_cerebellum_gm1031 (739, 1539)417 (271, 723)3.12 (2.63, 3.69)2.08 (1.52, 2.77)
mSUV temporoparietal/
cerebellum
1136 (719, 2056)755 (419, 1748)1.84 (1.53, 2.17)2.20 (1.68, 2.87)
CVR-SUV-L473 (362, 644)601 (351, 1261)1.17 (0.98, 1.38)0.97 (0.72, 1.26)
CVR-mSUV-L326 (257, 427)513 (307, 1027)0.60 (0.51, 0.70)0.63 (0.47, 0.82)
CVR-SUV-B513 (384, 723)630 (361, 1369)1.31 (1.1, 1.54)1.17 (0.88, 1.52)
CVR-mSUV-B362 (284, 478)334 (217, 579)0.85 (0.71, 1.01)0.70 (0.53, 0.91)

Numbers in parentheses correspond to the 95% confidence interval. SSE is assessed with 80% power to detect a 20% effect size with respect to placebo, with two visits separated by 18 months. In bold, the best result per metric. Composite = average PET SUV of whole cerebellum, brainstem and eroded subcortical white matter. Inferior_cerebellum_gm = inferior cerebellum grey matter. Our biomarkers have the prefix ‘CVR’, short for composite value ratio, with mSUV for mean SUV or SUV for volume-weighted SUV, and a suffix L or B indicating a lateral or bilateral regional analysis.

Additional experiments

Supplementary Table 1 shows BioDisCVR results when fixing either the numerator (meta-temporal) or the denominator (inferior cerebellum grey matter). It can be observed that allowing BioDisCVR to find a data-driven denominator has a considerable impact in improving both the SSE (by a factor of 3 and 4 for Experiments 1 and 2, respectively, while fixing the reference as the inferior cerebellum grey matter achieves comparable results with literature biomarkers. Full information on regions selected is given in Supplementary File Results.xlsx. Regional ablation analysis is shown in Supplementary Fig. 1, where we re-evaluate a biomarker after removing one region, shows remarkably consistent performance suggesting that no single region dominates the CVR biomarkers, with the exception of the biomarker tested in Experiment 3—validation set, where removing the inferior temporal gyrus reduces the performance considerably (from 362 to 854 SSE).

Interestingly, the error and SSE of CVR_mSUV-B designed for Experiment 1, but applied to Experiment 2 is 0.90 and 652, respectively. The same configuration (CVR_mSUV-B), but trained for Experiment 2, has an error and SSE of 0.85 and 389, respectively, in Experiment 1.

Discussion

When applied to preclinical (secondary prevention) and clinical (treatment) trial scenarios on Alzheimer’s disease, our data-driven framework BioDisCVR discovered ratio-based tau PET biomarkers that are vastly superior to previous work. As discussed below, this can result in remarkable benefits including time and cost savings when running clinical trials on Alzheimer’s disease.

In a 54-month preclinical trial (Experiment 1), the best BioDisCVR output (mSUV-L) reduced SSE by 82%, decreased repeatability error by 53% and increased group separation by 24% compared to the best result from the literature: the adaptive DDS biomarker.11 Compared with the best combination of targets and references inspired by literature, the improvement is still vast, at 62% reduced SSE, 25% less error and about 25 times higher group separation, as the benchmark performed poorly in group separation.

We found similar results in Experiment 2 (18-month clinical trial), where our best biomarker reduced SSE by 79%, decreased repeatability error by 79% and improved group separation by 80%, with respect to the DDS.11 As the SSE is proportional to the inverse squared effect size, the CVR biomarkers will remain superior no matter which effect size is chosen.

In Experiment 3 (18-month clinical trial), results showed that the variant of CVR that had the best generalization in the validation dataset was the mSUV-B (mean SUV, bilateral analysis), with a 20% reduction of SSE and 65% reduction of error with respect to the best biomarker from the literature (DDS11). A key difference between DDS and CVR is that in our configuration, CVR is a ratio of fixed regions, whereas DDS is adaptive, and different individuals will have different target regions. Notably, this CVR vastly outperformed DDS in the discovery set (68% smaller SSE, 80% smaller error).

The mSUV-B CVR for Experiments 1 and 2 had similarities in the numerator (entorhinal cortex, amygdala, precuneus, superior frontal gyrus, transverse temporal gyrus) and the denominator (brainstem, eroded subcortical white matter, caudal anterior cingulate, rostral anterior cingulate), but specialized in different regions to adapt to the stage of the disease: cortical regions that were stable to use in the denominator for earlier stages (e.g. cuneus) disappeared from the denominator and appeared in the numerator in Experiment 2, as they show changes later in the disease process.29 We advise caution in drawing conclusions comparing the few CVR biomarkers among each other, though, since multiple configurations show similar performance (see ablation analysis in Supplementary Fig. 1).

Our framework demonstrates remarkable flexibility, which we exploited in multiple ways. For example, in supplementary experiments, we incorporated priors and constraints to enhance interpretability and bolster confidence in the main findings. Allowing for hemispheric laterality generally yielded CVR biomarkers with slightly improved SSE (5–30% smaller), which seems to concur with a previous finding of laterality of tau pathology accumulation in Alzheimer’s disease.35 In addition to the volume-weighted composite SUV that is used in previous SUVR studies on statistical ROIs,12-16 we considered a regional mean SUV, remarkably finding that the regional mean produced CVR biomarkers with better repeatability error (7–48% smaller) and SSE (35–60% smaller). The effect of composition was not significant when performing a Wilcoxon rank-sum test in the literature-inspired biomarkers (W = 19944, P-value = 0.7079, filtering out biomarkers that had above 3000 SSE). Braak stage SUVR biomarkers (using the composite reference region) produced results comparable to the best biomarkers from the literature, but still worse than CVR (see Supplementary File Results.xlsx, tab Exp1_and_2). As could be expected, the earliest Braak Stage 1 was favoured for Experiment 1 (preclinical trial), and the later Braak Stage 5 was favoured for Experiment 2 (clinical trial). Fixing the numerator as the meta-temporal region (and finding a data-driven denominator) improved results compared to using the meta-temp/composite biomarker.9 Likewise, fixing the denominator as the inferior cerebellum grey matter yielded improved results compared to the literature biomarkers, providing further support for our approach of moving beyond the traditional reference regions used in SUVR analyses. Exploring alternative optimizers as a search algorithm (box-constrained BFGS or Nelder-Mead) did not achieve results on par with the ones obtained by the genetic algorithm (results not shown), but other optimization approaches could be explored in the future. Different random initializations generated similarly performing biomarkers, with a low coefficient of variation (0.043) in fitness for Experiment 2 (clinical), but seemed to stop at a local minima for one of the random seeds in Experiment 1 (preclinical), with a coefficient of variation of 0.17. As observed in the regional ablation analysis (Supplementary Fig. 1), where different biomarkers exhibited similar performances, at the final computed generation of the genetic algorithm, the top 5 biomarkers exhibited comparable performance metrics, with a fitness coefficient of variation below 10% for Experiment 1 (preclinical) and below 5% for Experiment 2 (clinical), (Supplementary File Results.xlsx). The framework’s adaptability extends beyond the two experimental scenarios (54- and 18-month clinical trials) considered in our experiments—it could have easily been employed to identify a biomarker that performs well in both scenarios, thereby providing a unified solution.

Our experiments produced biomarker ratios that included known regions commonly associated with Alzheimer’s disease. The numerator often included regions commonly used as targets in SUVR studies of Alzheimer’s disease progression, e.g. entorhinal cortex, fusiform gyrus, inferior temporal gyrus, middle temporal gyrus. The denominator often included regions used as SUVR references, e.g. eroded subcortical white matter, whole cerebellum or cerebellum cortex, brainstem. The fact that our CVR (mean SUV, bilateral) designed for a trial works on the other (better than the specific two biomarkers from the literature) validates, in unseen data, that they are targeting disease-specific regions. Furthermore, both the 95% confidence intervals and the ablation study on influential regions (Supplementary Fig. 1) suggest a robust fit in the available data, free of questionable influential regions. Additional metrics or priors to guide the algorithm could be used to select a smaller set of good-performing biomarkers. Results indicate that using a curated biomarker per clinical trial target group can better characterize relevant disease progression, in line with the recent Revised Criteria for Diagnosis and Staging of Alzheimer’s Disease,6 reflecting that different regions are affected at different stages of the disease.

The fast execution of the framework BioDisCVR, together with the usage of readily processed scans (thus avoiding the need to re-run preprocessing methods) makes this approach computationally cheap, positively addressing an increasingly concerning environmental impact of machine learning.36

We anticipate that data-driven methods such as BioDisCVR could revolutionize the drug development pipeline. The dramatic reduction in SSE produced here has multiple benefits. First, a smaller trial reduces the number of individuals exposed to increased risk of side effects such as ARIA.37-40 Second, trials can take considerably less time, e.g. the A4 Study screening time could have been reduced from 44 months41 to under 12 months, considering a linear relationship between recruitment time and sample size.42 Third, the cost savings could be immense, e.g. for a cost per individual of $40k,43 the possible savings per trial using CVR-mSUV-B (as opposed to the best published biomarker) could be [2 arms × (1031–238)×40 000] about $65 million for CI/18-months, or [2 arms × (438–115)×40 000] about $25 million for CU/54-months. In the event that other factors influence the minimum sample size needed and cannot be reduced, using CVR would allow a more precise detection threshold of disease progression. The superior longitudinal precision of our data-driven biomarkers makes them candidates for secondary outcomes or endpoints.

We highlight opportunities for future work. First, it is important to explore further validation of tau PET ratio-based biomarkers across cohorts, ethnicities, etc. (work in progress in our team). Second, our framework can be applied across other PET tracers and beyond PET, e.g. to discover new and improved ratio-based blood biomarkers.5,6 Indeed, the small run-time needed for computation facilitates extensive exploration in any data modality and even across modalities, including fluid biomarkers and MRI (less expensive than PET and non-invasive). Not only this, but data acquisition cost could be included in the fitness function to discover a cost-effective ratio-based multimodal biomarker—potentially benefiting low- and middle-income countries, in particular. Third, future work could consider deriving a theoretical guarantee that a given search algorithm will find the global optimum biomarker, but in practice we found that multiple runs with random seeds produced very similar results (see ablation analysis in Supplementary Fig. 1 and top-n biomarkers in Supplementary File Results.xlsx).

Conclusion

Clinical trials of anti-amyloid and anti-tau therapies in Alzheimer’s disease need a precise and accurate biomarker for assessing disease modification (pathology clearance). We have introduced and deployed a data-driven framework for discovering such biomarkers, with experimental results showing state-of-the-art longitudinal precision, and drastically reduced sample size estimates required for modern clinical trials in symptomatic and preclinical Alzheimer’s disease. Our results suggest that considerable time, money and participant suffering could be saved by incorporating our biomarker framework into the design of future preclinical and clinical trials.

Supplementary material

Supplementary material is available at Brain Communications online.

Acknowledgements

The authors acknowledge members of the University College London’s Progression of Neurodegenerative Disease (POND) group (pond.cs.ucl.ac.uk) for comments received during group discussions. Also, Dr Alexandra Young and Dr Èlia Bosch Rué for their comments on an early draft. ADRC data contained in this analysis were obtained under research grant P30 AG062677 (PI: Ronald C. Petersen, MD, PhD) from the National Institutes of Health to the Mayo Alzheimer’s Disease Research Center. MCSA data contained in this analysis were obtained under research grant U01 AG006786 (PI: Ronald C. Petersen, MD, PhD) from the National Institutes of Health to the Mayo Clinic Study of Aging.

Funding

I.L.-S. and N.P.O. acknowledge funding from a UK Research and Innovation (UKRI) Future Leaders Fellowship (MR/S03546X/1). This work is supported by the Engineering and Physical Sciences Research Council (EPSRC)-funded University College London Centre for Doctoral Training in Intelligent, Integrated Imaging in Healthcare (i4health) (EP/S021930/1) and the Department of Health’s National Institute for Health and Care Research (NIHR)-funded Biomedical Research Centre at University College London Hospitals (UCLH).

Competing interests

I.L.-S.: None. N.P.O. is a consultant for Queen Square Analytics Limited (UK).

Data availability

Data used in this work (discovery dataset) can be found in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The cohorts used for validation (ADRD and MCSA from Mayo Clinic) can be requested at mayo.edu website. The framework code for the discovery of biomarkers presented in this work is provided in R and is available as a Supplementary file (see the online version of this paper at Brain Communications).

References

1

Frank
 
R
,
Hargreaves
 
R
.
Clinical biomarkers in drug discovery and development
.
Nat Rev Drug Discov
.
2003
;
2
(
7
):
566
580
.

2

Budd Haeberlein
 
S
,
Aisen
 
PS
,
Barkhof
 
F
, et al.  
Two randomized phase 3 studies of aducanumab in early Alzheimer’s disease
.
J Prev Alzheimers Dis
.
2022
;
9
(
2
):
197
210
.

3

Van Dyck
 
CH
,
Swanson
 
CJ
,
Aisen
 
P
, et al.  
Lecanemab in early Alzheimer’s disease
.
N Engl J Med
.
2023
;
388
(
1
):
9
21
.

4

Sims
 
JR
,
Zimmer
 
JA
,
Evans
 
CD
, et al.  
Donanemab in early symptomatic Alzheimer disease: The TRAILBLAZER-ALZ 2 randomized clinical trial
.
JAMA
.
2023
;
330
(
6
):
512
527
.

5

Li
 
Y
,
Schindler
 
SE
,
Bollinger
 
JG
, et al.  
Validation of plasma amyloid-β 42/40 for detecting Alzheimer disease amyloid plaques
.
Neurology
.
2022
;
98
(
7
):
e688
e699
.

6

Jack
 
CR
 Jr,
Andrews
 
JS
,
Beach
 
TG
, et al.  
Revised criteria for diagnosis and staging of Alzheimer’s disease: Alzheimer’s association workgroup
.
Alzheimers Dement
.
2024
;
20
(
8
):
5143
5169
.

7

Klunk
 
WE
,
Koeppe
 
RA
,
Price
 
JC
, et al.  
The centiloid project: Standardizing quantitative amyloid plaque estimation by PET
.
Alzheimers Dement
.
2015
;
11
(
1
):
1
15.e1–4
.

8

Villemagne
 
VL
,
Leuzy
 
A
,
Bohorquez
 
SS
, et al.  
CenTauR: Toward a universal scale and masks for standardizing tau imaging studies
.
Alzheimers Dement
.
2023
;
15
(
3
):
e12454
.

9

Schwarz
 
CG
,
Therneau
 
TM
,
Weigand
 
SD
, et al.  
Selecting software pipelines for change in flortaucipir SUVR: Balancing repeatability and group separation
.
NeuroImage
.
2021
;
238
:
118259
.

10

Insel
 
PS
,
Young
 
CB
,
Aisen
 
PS
, et al.  
Tau positron emission tomography in preclinical Alzheimer’s disease
.
Brain
.
2022
;
146
(
2
):
700
711
.

11

Leuzy
 
A
,
Binette
 
AP
,
Vogel
 
JW
, et al.  
Comparison of group-level and individualized brain regions for measuring change in longitudinal tau positron emission tomography in Alzheimer disease
.
JAMA Neurol
.
2023
;
80
(
6
):
614
623
.

12

Chen
 
K
,
Langbaum
 
JBS
,
Fleisher
 
AS
, et al.  
Twelve-month metabolic declines in probable Alzheimer’s disease and amnestic mild cognitive impairment assessed using an empirically pre-defined statistical region-of-interest: Findings from the Alzheimer’s disease neuroimaging initiative
.
NeuroImage
.
2010
;
51
(
2
):
654
664
.

13

Landau
 
SM
,
Fero
 
A
,
Baker
 
SL
, et al.  
Measurement of longitudinal β-amyloid change with 18F-florbetapir PET and standardized uptake value ratios
.
J Nucl Med.
 
2015
;
56
(
4
):
567
574
.

14

Jack
 
CR
,
Wiste
 
HJ
,
Weigand
 
SD
, et al.  
Age-specific and sex-specific prevalence of cerebral β-amyloidosis, tauopathy, and neurodegeneration in cognitively unimpaired individuals aged 50–95 years: A cross-sectional study
.
Lancet Neurol.
 
2017
;
16
(
6
):
435
444
.

15

Pontecorvo
 
MJ
,
Devous
 
MD
,
Kennedy
 
I
, et al.  
A multicentre longitudinal study of flortaucipir (18F) in normal ageing, mild cognitive impairment and Alzheimer’s disease dementia
.
Brain
.
2019
;
142
(
6
):
1723
1735
.

16

Leuzy
 
A
,
Pascoal
 
TA
,
Strandberg
 
O
, et al.  
A multicenter comparison of [18F]flortaucipir, [18F]RO948, and [18F]MK6240 tau PET tracers to detect a common target ROI for differential diagnosis
.
Eur J Nucl Med Mol Imaging
.
2021
;
48
(
7
):
2295
2305
.

17

Young
 
CB
,
Landau
 
SM
,
Harrison
 
TM
,
Poston
 
KL
,
Mormino
 
EC
.
Influence of common reference regions on regional tau patterns in cross-sectional and longitudinal [18F]-AV-1451 PET data
.
NeuroImage
.
2021
;
243
:
118553
.

18

Llorente
 
I
,
Oxtoby
 
N
. Correction of AV-1451 tau-PET signal. Presented at: New horizons in Alzheimer’s disease (NHAD 2021), Leuven, Belgium. (2021). 25 October 2021. Accessed 25 August 2024. https://www.vibconferences.be/sites/default/files/2021-10/NHAD2021%20-%20Poster%20Isaac%20Llorente%20Saguer.pdf

19

Saguer
 
IL
,
Busche
 
MA
,
Oxtoby
 
NP
.
Composite SUVR: A new method for boosting Alzheimer’s disease monitoring and diagnostic performance, applied to tau PET
.
Alzheimers Dement
.
2022
;
18
(
S5
):
e063177
.

20

Sperling
 
RA
,
Donohue
 
MC
,
Raman
 
R
, et al.  
Trial of solanezumab in preclinical Alzheimer’s disease
.
N Engl J Med
.
2023
;
389
(
12
):
1096
1107
.

21

Lee
 
J
,
Murphy
 
A
,
Ward
 
T
,
Harrison
 
T
,
Landau
 
S
,
Jagust
 
W
. ADNI amyloid PET processing methods, 2023. https://adni.loni.usc.edu

22

Royse
 
SK
,
Minhas
 
DS
,
Lopresti
 
BJ
, et al.  
Validation of amyloid PET positivity thresholds in centiloids: A multisite PET study approach
.
Alzheimers Res Ther
.
2021
;
13
(
1
):
99
.

23

Comparison of multiple tau-PET measures as biomarkers in aging and Alzheimer’s disease—ScienceDirect. Accessed 19 July 2024. https://www.sciencedirect.com/science/article/abs/pii/S1053811917304585

24

Desikan
 
RS
,
Ségonne
 
F
,
Fischl
 
B
, et al.  
An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest
.
NeuroImage
.
2006
;
31
(
3
):
968
980
.

25

Lee
 
J
,
Ward
 
T
,
Harrison
 
T
,
Landau
 
S
,
Jagust
 
W
. ADNI tau PET processing methods, 2023. https://adni.loni.usc.edu

26

Reuter
 
M
,
Schmansky
 
NJ
,
Rosas
 
HD
,
Fischl
 
B
.
Within-subject template estimation for unbiased longitudinal image analysis
.
NeuroImage
.
2012
;
61
(
4
):
1402
1418
.

27

Rolls
 
ET
,
Joliot
 
M
,
Tzourio-Mazoyer
 
N
.
Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas
.
NeuroImage
.
2015
;
122
:
1
5
.

28

Scrucca
 
L
.
On some extensions to GA package: Hybrid optimisation, parallelisation and islands evolution
.
R J
.
2017
;
9
(
1
):
187
206
.

29

Braak
 
H
,
Braak
 
E
.
Neuropathological stageing of Alzheimer-related changes
.
Acta Neuropathol
.
1991
;
82
(
4
):
239
259
.

30

Krishnadas
 
N
,
Doré
 
V
,
Robertson
 
JS
, et al.  
Rates of regional tau accumulation in ageing and across the Alzheimer’s disease continuum: An AIBL 18F-MK6240 PET study
.
eBioMedicine
.
2023
;
88
:
104450
.

31

R Core Team
. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna Austria. 2020. Accessed 25 August 2024. https://cir.nii.ac.jp/crid/1370298755636824325

32

Bates
 
D
,
Mächler
 
M
,
Bolker
 
B
,
Walker
 
S
.
Fitting linear mixed-effects models using lme4
.
J Stat Softw.
 
2015
;
67
(
1
):
1
48
.

33

Iddi
 
S
,
Donohue
 
MC
.
Power and sample size for longitudinal models in R—The longpower package and Shiny app
.
R J
.
2022
;
14
(
1
):
264
282
.

34

Marinescu
 
RV
,
Eshaghi
 
A
,
Alexander
 
DC
, et al.  Brainpainter: A software for the visualisation of brain structures, biomarkers and associated pathological processes. In:
Zhu
 
D
,
Yan
 
J
,
Huang
 
H
, eds.
Multimodal brain image analysis and mathematical foundations of computational anatomy
.
Springer International Publishing
;
2019
:
112
120
.

35

Vogel
 
JW
,
Young
 
AL
,
Oxtoby
 
NP
, et al.  
Four distinct trajectories of tau deposition identified in Alzheimer’s disease
.
Nat Med
.
2021
;
27
(
5
):
871
881
.

36

Allen
 
M
.
The huge carbon footprint of large-scale computing
.
Phys World
.
2022
;
35
(
3
):
46
50
.

37

Sperling
 
R
,
Salloway
 
S
,
Brooks
 
DJ
, et al.  
Amyloid-related imaging abnormalities in patients with Alzheimer’s disease treated with bapineuzumab: A retrospective analysis
.
Lancet Neurol.
 
2012
;
11
(
3
):
241
249
.

38

Cogswell
 
PM
,
Barakos
 
JA
,
Barkhof
 
F
, et al.  
Amyloid-related imaging abnormalities with emerging Alzheimer disease therapeutics: Detection and reporting recommendations for clinical practice
.
AJNR Am J Neuroradiol.
 
2022
;
43
(
9
):
E19
E35
.

39

Salloway
 
S
,
Chalkias
 
S
,
Barkhof
 
F
, et al.  
Amyloid-related imaging abnormalities in 2 phase 3 studies evaluating aducanumab in patients with early Alzheimer disease
.
JAMA Neurol.
 
2022
;
79
(
1
):
13
21
.

40

Hampel
 
H
,
Elhage
 
A
,
Cho
 
M
,
Apostolova
 
LG
,
Nicoll
 
JAR
,
Atri
 
A
.
Amyloid-related imaging abnormalities (ARIA): Radiological, biological and clinical characteristics
.
Brain
.
2023
;
146
(
11
):
4414
4424
.

41

Sperling
 
RA
,
Donohue
 
MC
,
Raman
 
R
, et al.  
Association of factors with elevated amyloid burden in clinically normal older individuals
.
JAMA Neurol.
 
2020
;
77
(
6
):
735
745
.

42

Ritchie
 
M
,
Gillen
 
DL
,
Grill
 
JD
.
Recruitment across two decades of NIH-funded Alzheimer’s disease clinical trials
.
Alz Res Therapy
.
2023
;
15
(
1
):
28
.

43

Moore
 
TJ
,
Heyward
 
J
,
Anderson
 
G
,
Alexander
 
GC
.
Variation in the estimated costs of pivotal clinical benefit trials supporting the US approval of new therapeutic agents, 2015–2017: A cross-sectional study
.
BMJ Open
.
2020
;
10
(
6
):
e038863
.

Author notes

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data