Machine learning identifies novel markers predicting functional decline in older adults

Abstract
The ability to carry out instrumental activities of daily living, such as paying bills, remembering appointments and shopping alone, decreases with age, yet older adults differ markedly in their rate of decline. Understanding the variables associated with decline in instrumental activities of daily living is critical to providing appropriate intervention to prolong independence. Prior research suggests that cognitive measures, neuroimaging and fluid-based biomarkers predict functional decline. However, selecting variables a priori can lead to the over-valuation of some variables and the exclusion of others that may be predictive. In this study, we used machine learning techniques to select, from a wide range of baseline variables, those that best predicted functional decline over two years in individuals from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. The sample included 398 individuals characterized as cognitively normal or as having mild cognitive impairment. Support vector machine classification algorithms were used to identify the most predictive of five data modality types (demographics, structural MRI, fluorodeoxyglucose-PET, neurocognitive measures and genetic/fluid-based biomarkers). In addition, variable selection identified the individual variables, across all modalities, that best predicted functional decline in a testing sample. Of the five modalities examined, neurocognitive measures predicted functional decline with the highest accuracy (accuracy = 74.2%; area under the curve = 0.77), followed by fluorodeoxyglucose-PET (accuracy = 70.8%; area under the curve = 0.66). The individual variables with the greatest discriminatory ability for predicting functional decline were the partner-reported language domain of the Everyday Cognition questionnaire, the 13-item Alzheimer’s Disease Assessment Scale-Cognitive subscale (ADAS13), and activity in the left angular gyrus on fluorodeoxyglucose-PET. These three variables collectively explained 32% of the total variance in functional decline. Taken together, the machine learning model identified novel biomarkers that may be involved in the processing, retrieval and conceptual integration of semantic information and that predict functional decline two years after assessment. These findings may be used to explore the clinical utility of the Everyday Cognition questionnaire as a non-invasive, cost- and time-effective tool for predicting future functional decline.
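The two performance metrics reported above can be illustrated with a minimal sketch. It assumes binary labels (1 = functional decline, 0 = no decline) and a continuous score per participant, such as an SVM decision value; the function names, the 0.5 decision threshold and the toy values are hypothetical and are not taken from the study. Accuracy is the fraction of correctly labeled participants, and AUC is computed here in its rank (Mann-Whitney) form: the probability that a randomly chosen decliner receives a higher score than a randomly chosen non-decliner.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of participants whose predicted label matches the observed label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random decliner > score of a random non-decliner).

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one participant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = decline, 0 = no decline.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold
print(accuracy(labels, predicted))  # 0.666...
print(roc_auc(labels, scores))      # 0.888...
```

With these toy values, accuracy is 4/6 and AUC is 8/9. Unlike accuracy, AUC does not depend on the chosen threshold, which is why the two metrics can rank modalities differently.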

To ensure that high-quality MRI data were collected across sites, all MR images underwent quality control at the central MRI core laboratory at Mayo Clinic, Rochester, Minnesota. For ADNI-GO visits, a phantom scan was acquired on each day that participants were scanned. These scans were checked to confirm proper scanner calibration and to detect changes that could indicate an underlying scanner problem. They were then used to correct for changes in geometric scaling over time and to support scanner qualification, scanner recalibration and ongoing quality monitoring. 16 A more detailed description of the ADNI-GO MRI procedures is available at http://www.adni-info.org. For ADNI2 visits, the phantom scan was used to certify and update scanners but was no longer collected daily. Improved vendor products had been shown to address most of the artifacts that originally warranted a daily phantom scan.
Further, analyses following ADNI 1 showed that consistent results could be achieved across different scanners. 17
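The phantom-based correction of geometric scaling can be illustrated with a simple least-squares sketch. The actual ADNI phantom analysis is not described here; this example assumes a phantom with markers at known (nominal) positions and models scanner drift as an independent linear scale factor along each axis plus a translation. The function names, marker coordinates and simulated distortion are hypothetical.

```python
import numpy as np


def estimate_axis_scaling(nominal_mm: np.ndarray, measured_mm: np.ndarray) -> np.ndarray:
    """Least-squares scale factor per axis (x, y, z); 1.0 means no distortion.

    nominal_mm, measured_mm: (n_markers, 3) arrays of marker positions.
    Both sets are centred first so a translation does not bias the estimate.
    """
    nom = nominal_mm - nominal_mm.mean(axis=0)
    meas = measured_mm - measured_mm.mean(axis=0)
    # Minimise sum((meas - s * nom) ** 2) separately for each axis.
    return (nom * meas).sum(axis=0) / (nom * nom).sum(axis=0)


def correct_scaling(points_mm: np.ndarray, scale: np.ndarray, centre_mm: np.ndarray) -> np.ndarray:
    """Undo the estimated per-axis scaling about a given centre point."""
    return centre_mm + (points_mm - centre_mm) / scale


# Hypothetical phantom markers (mm) and a simulated scanner distortion.
nominal = np.array([[0, 0, 0], [100, 0, 0], [-100, 0, 0], [0, 100, 0],
                    [0, -100, 0], [0, 0, 100], [0, 0, -100], [50, 50, 50]], dtype=float)
true_scale = np.array([1.01, 0.99, 1.00])
measured = nominal * true_scale + np.array([2.0, -1.0, 0.5])

scale = estimate_axis_scaling(nominal, measured)  # recovers about [1.01, 0.99, 1.00]
corrected = correct_scaling(measured, scale, measured.mean(axis=0))
```

Because both point sets are centred before fitting, a shift of the phantom within the scanner does not affect the estimated scale factors; repeating the estimate on each phantom scan would track scaling changes over time.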

Fluorodeoxyglucose-PET (FDG-PET)
A description of FDG-PET acquisition is available at http://www.adni-info.org. Briefly, participants fasted for at least four hours before being injected with approximately 185 MBq (5 mCi) of tracer and remained in a dimly lit room for 20-30 minutes after injection. A dynamic 3D scan consisting of six 5-minute frames was acquired between 30 and 60 minutes after injection. All PET images underwent quality control at the University of Michigan. The frames were co-registered in native space, averaged, aligned and smoothed to a uniform 8 mm resolution, which corresponded to the lowest resolution of any scanner used for data collection.
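The frame timing and the smoothing to a common 8 mm resolution can be sketched as follows. This is not the ADNI pipeline: it assumes that each scanner's point-spread function is approximately Gaussian, so that resolutions combine in quadrature and a scanner with native resolution r needs an extra kernel of FWHM sqrt(8² - r²). The function names, the 6 mm native resolution and the 2 mm voxel size in the example are hypothetical.

```python
import math

import numpy as np

TARGET_FWHM_MM = 8.0  # uniform resolution stated in the text
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.3548 for a Gaussian


def frame_windows(start_min: int = 30, n_frames: int = 6, frame_len_min: int = 5):
    """(start, end) in minutes post-injection for each frame: 30-35, ..., 55-60."""
    return [(start_min + i * frame_len_min, start_min + (i + 1) * frame_len_min)
            for i in range(n_frames)]


def average_frames(frames: np.ndarray) -> np.ndarray:
    """Average co-registered frames stacked along axis 0 into one static image."""
    return frames.mean(axis=0)


def extra_smoothing_fwhm(native_fwhm_mm: float, target_fwhm_mm: float = TARGET_FWHM_MM) -> float:
    """FWHM of the additional Gaussian kernel needed to reach the target.

    Gaussian blurs combine in quadrature: target^2 = native^2 + kernel^2.
    A scanner already at (or coarser than) the target needs no extra kernel.
    """
    if native_fwhm_mm >= target_fwhm_mm:
        return 0.0
    return math.sqrt(target_fwhm_mm ** 2 - native_fwhm_mm ** 2)


def fwhm_to_sigma_voxels(fwhm_mm: float, voxel_size_mm: float) -> float:
    """Convert a kernel FWHM in mm to a Gaussian sigma in voxel units."""
    return fwhm_mm / FWHM_PER_SIGMA / voxel_size_mm


# Hypothetical scanner with 6 mm native resolution and 2 mm isotropic voxels.
kernel_fwhm = extra_smoothing_fwhm(6.0)             # sqrt(64 - 36), about 5.29 mm
sigma_vox = fwhm_to_sigma_voxels(kernel_fwhm, 2.0)  # about 1.12 voxels
frames = np.stack([np.full((4, 4, 4), float(i)) for i in range(6)])  # toy frames
static = average_frames(frames)                     # every voxel equals 2.5
```

The resulting sigma could then be passed to a Gaussian filter such as scipy.ndimage.gaussian_filter(image, sigma). Smoothing every scanner to the coarsest resolution, rather than sharpening the coarser ones, keeps images from different sites comparable without amplifying noise.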

Biological samples
To calculate a polygenic hazard score (PHS) for Alzheimer's disease, single nucleotide polymorphisms (SNPs) associated with Alzheimer's disease at p < 10⁻⁵ were identified from a genome-wide association study by the International Genomics of Alzheimer's Project. Thirty-one SNPs and two APOE variants were combined into a single hazard score, which predicts both progression to an Alzheimer's disease diagnosis and age of onset. For a complete description of the PHS and the methods used to calculate it, see Desikan and colleagues. 18

Analyses of cerebrospinal fluid (CSF) were performed at the UPenn/ADNI Biomarker Laboratory using Roche Elecsys immunoassays according to the Roche study protocol. The CSF Aβ immunoassay has a lower technical limit of 200 pg/mL and an upper technical limit of 1700 pg/mL; performance outside this range has not been established. Aβ values above the upper technical limit were therefore truncated to 1700 pg/mL. No participant fell below the lower technical limit for Aβ, and none fell outside the technical limits for tau (80-1300 pg/mL) or p-tau (8-120 pg/mL).
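Two computations in this section can be sketched in a few lines: a polygenic score as a weighted sum of allele counts, and handling of CSF values relative to the assay's technical limits. The sketch assumes, as in a Cox proportional hazards model, that the PHS is a linear combination of genotype dosages and log hazard ratios; the variant names and weights are hypothetical placeholders, not the published PHS weights of Desikan and colleagues. The technical limits are those stated in the text.

```python
import math
from typing import Dict, List

# Hypothetical log hazard ratios; the real PHS uses 31 SNPs plus two APOE
# variants with published weights, which are NOT reproduced here.
EXAMPLE_BETAS: Dict[str, float] = {"snp_a": 0.10, "snp_b": -0.05, "apoe_e4": 1.00}

# Technical limits (pg/mL) stated in the text: (lower, upper).
TECHNICAL_LIMITS = {"abeta": (200.0, 1700.0), "tau": (80.0, 1300.0), "ptau": (8.0, 120.0)}


def polygenic_hazard_score(dosages: Dict[str, float], betas: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies) times log hazard ratios."""
    missing = set(betas) - set(dosages)
    if missing:
        raise KeyError(f"missing genotypes: {sorted(missing)}")
    return sum(betas[v] * dosages[v] for v in betas)


def relative_hazard(score: float) -> float:
    """Hazard relative to a score of zero, under the Cox model assumption."""
    return math.exp(score)


def truncate_abeta(values: List[float], upper: float = TECHNICAL_LIMITS["abeta"][1]) -> List[float]:
    """Cap Abeta values above the upper technical limit at that limit."""
    return [min(v, upper) for v in values]


def outside_limits(values: List[float], analyte: str) -> List[float]:
    """Return the values that fall outside an analyte's technical range."""
    low, high = TECHNICAL_LIMITS[analyte]
    return [v for v in values if v < low or v > high]


# Hypothetical participant genotype and assay values.
score = polygenic_hazard_score({"snp_a": 2, "snp_b": 1, "apoe_e4": 1}, EXAMPLE_BETAS)  # about 1.15
capped = truncate_abeta([850.0, 1700.0, 2100.0])          # [850.0, 1700.0, 1700.0]
flagged = outside_limits([70.0, 300.0, 1400.0], "tau")    # [70.0, 1400.0]
```

Raising an error for missing genotypes, rather than silently treating them as zero copies, avoids biasing the score toward the reference allele.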

SMOTE
The Synthetic Minority Oversampling Technique (SMOTE) is a method designed to address an imbalance between the number of cases and controls in a dataset. Rather than duplicating existing minority cases, it adds new, synthetic minority cases, so that the added data are plausible variations rather than redundant copies. For a given minority case, SMOTE identifies the most similar minority cases (its k nearest neighbors), selects one of these neighbors, and generates a new case at a random point on the line segment between the original case and that neighbor. This process is repeated until the dataset reaches the specified class balance. 19 For these analyses, a 50:50 ratio of cases to controls was selected.
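A minimal NumPy sketch of SMOTE-style oversampling to a 50:50 ratio is shown below. It is an illustrative implementation, not the code used in the study: base cases are drawn at random (a common variant; the original formulation cycles through each minority case), distances are Euclidean on the raw feature values, and the function names and default k = 5 are assumptions.

```python
import numpy as np


def smote(X_minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_synthetic SMOTE samples from the minority-class rows.

    For each new sample: pick a random minority case, pick one of its k nearest
    minority neighbours at random, and place the new case at a random point on
    the line segment between them.
    """
    rng = np.random.default_rng(seed)
    n = X_minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority cases")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority cases only.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a case is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)
        nn = neighbours[base, rng.integers(k)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nn] - X_minority[base])
    return synthetic


def balance_50_50(X: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0):
    """Oversample the minority class until both classes have equal counts."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    n_needed = int(counts.max() - counts.min())
    if n_needed == 0:
        return X, y
    new_X = smote(X[y == minority], n_needed, k=k, seed=seed)
    new_y = np.full(n_needed, minority, dtype=y.dtype)
    return np.vstack([X, new_X]), np.concatenate([y, new_y])


# Toy data (hypothetical): 20 controls (label 0) and 5 decliners (label 1).
rng = np.random.default_rng(1)
X_controls = rng.normal(0.0, 1.0, size=(20, 2))
X_cases = rng.normal(3.0, 1.0, size=(5, 2))
X = np.vstack([X_controls, X_cases])
y = np.array([0] * 20 + [1] * 5)
X_bal, y_bal = balance_50_50(X, y, k=3)  # 40 rows, 20 per class
```

Because every synthetic case is an interpolation between two real minority cases, new cases stay within the region already occupied by the minority class. Established implementations also exist; for example, the imbalanced-learn package provides a SMOTE class with a fit_resample method.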