Neurofilament light levels predict clinical progression and death in multiple system atrophy

Abstract
Disease-modifying treatments are currently being trialled in multiple system atrophy. Approaches based solely on clinical measures are challenged by heterogeneity of phenotype and pathogenic complexity. Neurofilament light chain protein has been explored as a reliable biomarker in several neurodegenerative disorders, but data on multiple system atrophy have been limited. Therefore, neurofilament light chain is not yet routinely used as an outcome measure in multiple system atrophy. We aimed to comprehensively investigate the role and dynamics of neurofilament light chain in multiple system atrophy, in combination with cross-sectional and longitudinal clinical and imaging scales, and its use for selecting trial participants. In this cohort study, we recruited cross-sectional and longitudinal cases in a multicentre European set-up. Plasma and CSF neurofilament light chain concentrations were measured at baseline in 212 multiple system atrophy cases and annually for a mean period of 2 years in 44 multiple system atrophy patients, in conjunction with clinical, neuropsychological and MRI brain assessments. Baseline neurofilament light chain characteristics were compared between groups. Cox regression was used to assess survival, and receiver operating characteristic analysis was used to assess the ability of neurofilament light chain to distinguish between multiple system atrophy patients and healthy controls. Multivariate linear mixed-effects models were used to analyse longitudinal neurofilament light chain changes and their correlation with clinical and imaging parameters. Polynomial models were used to determine the differential trajectories of neurofilament light chain in multiple system atrophy. We estimated sample sizes for trials aiming to decrease neurofilament light chain levels.
We show that in multiple system atrophy, baseline plasma neurofilament light chain levels were better predictors of clinical progression, survival and degree of brain atrophy than the neurofilament light chain rate of change. Comparative analysis of multiple system atrophy progression over the course of disease, using plasma neurofilament light chain and clinical rating scales, indicated that neurofilament light chain levels rise as the motor symptoms progress, followed by deceleration in advanced stages. Sample size prediction suggested that significantly lower trial participant numbers would be needed to demonstrate treatment effects when incorporating plasma neurofilament light chain values into multiple system atrophy clinical trials in comparison to clinical measures alone. In conclusion, neurofilament light chain correlates with clinical disease severity, progression and prognosis in multiple system atrophy. Combined with clinical and imaging analysis, neurofilament light chain can inform patient stratification and serve as a reliable biomarker of treatment response in future multiple system atrophy trials of putative disease-modifying agents.

score in the four UMSARS items (feeding by nasogastric tube or gastrostomy because of severe dysphagia, falls at least once a day, unintelligible speech, and the inability to walk). The following clinical disease milestones were defined: milestone 1 = minimal motor symptoms, score of ≤1 on UMSARS I item 7 (walking independently or mildly impaired, no walking aid required); milestone 2 = score of ≤2 on UMSARS I item 7 (walking moderately impaired, assistance and/or walking aid needed occasionally); milestone 3 = score of ≥3 on UMSARS I item 7 (walking severely impaired, assistance and/or walking aid needed frequently, or wheelchair-bound); milestone 4 = score of ≥3 on UMSARS I item 2 (marked impairment of swallowing, frequent food aspiration, nasogastric tube or gastrostomy feeding); and milestone 5 = UMSARS IV, score 5 (dependent in all daily activities).

MRI
In 55 patients, volumetric T1-weighted 3T MR images were acquired within 6 months of blood and CSF collection (mean CSF-MRI interval ±1 month). Of these, 14 patients had a follow-up scan at 12 months (mean interval between scans 13·5±7·0 months).
Similarly, baseline MRI was acquired for the 42 controls matched for age and sex. No follow-up MRI was obtained for controls.
MR scans were first bias field corrected and whole-brain parcellated using the geodesic information flow (GIF) algorithm, 3 which is based on atlas propagation and label fusion. We extracted the volumes of the following regions: whole brain, striatum (putamen and caudate), pons, and cerebellum (grey and white matter). Left and right volumes were summed, and total intracranial volume (TIV) was computed with SPM12. Bias-corrected MR scans were rigidly registered to the MNI space. We used the ITK-SNAP program (version 3.8.0-beta) to load the image and selected the sagittal view that best exposed the middle cerebellar peduncle (MCP), adapting previously described criteria. 5 We drew a straight line

Group effects
NfL concentrations in plasma and CSF from MSA patients were non-normally distributed; hence, a nonparametric analysis was conducted. We used Wilcoxon's rank sum test to compare groups and Spearman's rank correlation to assess associations between the variables. To adjust these associations for further covariates (age, sex, disease duration), we employed nonparametric kernel regression with 1000 bootstrap replications. 6
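The rank-based statistics named above can be illustrated with a minimal sketch in plain Python. It computes average ranks with ties, Spearman's correlation as the Pearson correlation of those ranks, and the Mann-Whitney U statistic that underlies the rank sum test. The function names and all data values are hypothetical; this is not the analysis code used in the study, and it returns no p-values.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a from the rank sum over the pooled sample."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2


# Hypothetical plasma NfL values (pg/mL) and clinical scores, for illustration only.
nfl = [18.0, 25.5, 31.0, 40.2, 52.7]
score = [20, 28, 27, 41, 55]
print(spearman(nfl, score))
print(mann_whitney_u([9.0, 11.5, 12.0], nfl))
```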

Associations between NfL and disease severity, disease progression and survival
To analyse changes in plasma NfL and UMSARS, we employed a linear mixed-effects model (LMEM), in which follow-up visits after the baseline visit took place at 0·5, 1, 2, 3, 3·5, 4 and over 5 years. A one-level random intercept was included in the model. We used Cox proportional hazards modelling to calculate hazard ratios (HRs) with 95% CIs for the association between baseline NfL concentration and subsequent survival. NfL as a predictor of survival (after biofluid collection) in patients was analysed using the log-rank test and Kaplan-Meier curves comparing NfL tertiles, and using Cox regression corrected for age, with NfL entered both as tertiles and as a continuous variable.
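As an illustration of the descriptive part of the survival analysis only, the sketch below splits patients into NfL tertiles and computes a Kaplan-Meier curve for one group. It does not implement the log-rank test or Cox regression, and the function names and data are hypothetical rather than taken from the study.

```python
def tertile_groups(values):
    """Assign each value to tertile 0, 1 or 2 according to its rank in the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for position, index in enumerate(order):
        groups[index] = position * 3 // len(values)
    return groups


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events[i] is True if the patient died at times[i], False if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical baseline NfL (pg/mL), follow-up (years) and death indicators.
nfl = [22.0, 48.0, 31.0, 60.0, 27.0, 39.0]
years = [5.0, 1.5, 4.0, 1.0, 6.0, 3.0]
died = [False, True, True, True, False, True]

groups = tertile_groups(nfl)
top = [i for i, g in enumerate(groups) if g == 2]
print(kaplan_meier([years[i] for i in top], [died[i] for i in top]))
```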

Diagnostic accuracy of NfL and cut off values
The diagnostic accuracy of NfL for differentiating MSA cases from healthy controls (HC) was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. First, we carried out ROC analyses for NfL in a cohort where matched plasma-CSF samples were available in MSA and HC to find the optimal cut-off values. The cut-off value of NfL was calculated as the concentration that gave the maximum Youden's index (J = sensitivity + specificity − 1). We then applied these cut-off values to the cohort of MSA cases for whom only unpaired plasma samples were available and determined the specificity and sensitivity for the entire MSA cohort at these cut-offs.
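The cut-off selection by Youden's index can be sketched as follows. The sketch assumes that higher NfL indicates MSA and treats a sample as positive when its concentration is at or above the candidate cut-off; the function name and the concentrations are hypothetical.

```python
def youden_cutoff(cases, controls):
    """Return (cutoff, J, sensitivity, specificity) for the cut-off maximising J.

    A sample is classified as positive when its value is >= cutoff.
    Ties in J are resolved in favour of the lowest cut-off.
    """
    best = None
    for cutoff in sorted(set(cases) | set(controls)):
        sensitivity = sum(v >= cutoff for v in cases) / len(cases)
        specificity = sum(v < cutoff for v in controls) / len(controls)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical plasma NfL concentrations (pg/mL), for illustration only.
msa = [22.0, 40.0, 50.0, 60.0]
hc = [10.0, 15.0, 20.0, 25.0, 45.0]
cutoff, j, sens, spec = youden_cutoff(msa, hc)
print(cutoff, round(j, 3), sens, spec)
```

The cut-off found in the derivation cohort would then be applied unchanged to a second cohort to report sensitivity and specificity there, rather than being re-optimised.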

Modelling MSA disease trajectory
We compared disease duration with NfL concentration to generate MSA-specific curves across disease progression. We constructed MSA models of disease duration and plasma NfL concentration using a second-degree polynomial regression model with main effects and interactions of disease duration and plasma NfL. The model included linear and squared terms for the variables and the interactions between them; the higher-order terms were included to ensure that any plateau arises from the data rather than from the model assumptions. Using the baseline cross-sectional values from 212 MSA cases, we modelled longitudinal plasma NfL trajectories across disease duration at sample collection (range 0-15 years), adjusted for UMSARS, age at sample collection and sex, to show the time points at which plasma NfL is predicted to rise, plateau and fall.
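To make the idea of a rise followed by a plateau concrete, the sketch below fits a simplified single-predictor quadratic, NfL = b0 + b1·t + b2·t², by least squares and reports the disease duration at which the fitted curve stops rising. This is a deliberate simplification: the model described above also contains interaction terms and adjusts for UMSARS, age and sex. The function names and data are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Build X^T X (3x3) and X^T y (length 3) for the design columns 1, x, x^2.
    s = [sum(xi ** p for xi in x) for p in range(5)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def turning_point(b1, b2):
    """Duration at which the fitted quadratic peaks (meaningful only if b2 < 0)."""
    return -b1 / (2 * b2)


# Hypothetical disease durations (years) and plasma NfL values (pg/mL).
duration = [0, 1, 2, 3, 4, 5, 6, 7, 8]
nfl = [10 + 6 * t - 0.5 * t ** 2 for t in duration]
b0, b1, b2 = fit_quadratic(duration, nfl)
print(turning_point(b1, b2))
```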

Association between absolute NfL levels with neuroimaging and clinical parameters
In cross-sectional (n=55) and longitudinal samples (n=14, total number of scans = 33), absolute NfL levels were evaluated for association with each of the regional brain volumes, MCP width, UMSARS and MoCA through multiple linear regression, adjusting for age, sex and disease duration and, in brain volume analyses, for MRI scanner type. We assessed multicollinearity by calculating the variance inflation factor (VIF).
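The VIF of a predictor is 1/(1 − R²), where R² comes from regressing that predictor on the remaining ones. A minimal sketch for the special case of two predictors, where R² reduces to the squared Pearson correlation between them, is given below. The function names and values are hypothetical, and the general multi-predictor case used in the analysis is not covered.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical covariates: age (years) and disease duration (years).
age = [55, 60, 62, 68, 71, 74]
disease_duration = [2, 5, 3, 6, 4, 8]
print(vif_two_predictors(age, disease_duration))
```

A VIF of 1 indicates uncorrelated predictors; larger values indicate that the coefficient variance is inflated by correlation among covariates.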

Calculating the rate of change in biomarkers
Rates of change were extracted from LMEM, using time (in years) as a fixed effect.
Random slopes and intercepts of time per participant were included to produce individual regression coefficients per participant as previously described. 7 The rate of NfL change per individual was extracted from the model estimates for subsequent analyses. This model was also used for generating the rate of change for regional brain volumes and MCP width, UMSARS and MoCA.
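The notion of an individual rate of change can be illustrated with a simpler stand-in: an ordinary least-squares slope fitted separately to each participant's visits. This is not the LMEM described above, which pools information across participants and shrinks individual slopes towards the group mean; the sketch only shows what "rate of change per year" means for one participant. The function name and records are hypothetical.

```python
from collections import defaultdict


def slopes_per_participant(records):
    """Return {participant_id: OLS slope of value against time in years}.

    records is an iterable of (participant_id, years_from_baseline, value).
    Participants with fewer than two distinct time points are skipped.
    """
    by_id = defaultdict(list)
    for pid, t, v in records:
        by_id[pid].append((t, v))
    slopes = {}
    for pid, obs in by_id.items():
        n = len(obs)
        mean_t = sum(t for t, _ in obs) / n
        mean_v = sum(v for _, v in obs) / n
        stt = sum((t - mean_t) ** 2 for t, _ in obs)
        if stt == 0:
            continue  # a single visit (or repeated time) gives no slope
        slopes[pid] = sum((t - mean_t) * (v - mean_v) for t, v in obs) / stt
    return slopes


# Hypothetical plasma NfL measurements (pg/mL) at annual visits.
visits = [
    ("A", 0, 20.0), ("A", 1, 25.0), ("A", 2, 30.0),
    ("B", 0, 40.0), ("B", 1, 38.0),
    ("C", 0, 15.0),
]
print(slopes_per_participant(visits))
```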

Relating NfL rate of change to imaging and UMSARS rate of change
We determined the longitudinal relationship between the rate of change in plasma NfL and the concurrent rate of change in the UMSARS, brain volumes of interest and MCP width using LMEMs. The dependent term for each model was an imaging biomarker or the UMSARS, with fixed-effect terms for baseline age, sex, time from baseline, extracted rate of change in plasma NfL, and the interaction between time from baseline and rate of change in plasma NfL. Models contained random slope and intercept terms for participants. The primary term of interest was the interaction between the rate of change in plasma NfL and time from baseline. Models were fitted using lme4 in R.

Prospective prediction of brain volumes and UMSARS by plasma NfL rate of change
To investigate whether baseline plasma NfL could predict subsequent yearly changes in regional brain volume, cognition, and UMSARS scores, we conducted a prospective study in which, following longitudinal plasma collection for NfL, we collected additional imaging and UMSARS data on 14 MSA cases.
To determine the rate of change in brain volumes and UMSARS between the imaging and clinical assessment concurrent with the participant's last blood draw and the follow-up session, we fitted an LMEM for each participant, where the dependent variable was the imaging or UMSARS variable of interest at the last plasma visit and the follow-up visit, and the independent variable was the time between visits. We then used this rate of change for brain regions or UMSARS as the dependent variable in an LMEM, which included fixed effects for age, sex, and the rate of change in plasma NfL, and a random intercept term for participant. The term of interest was the plasma NfL rate of change. Models were fitted using lme4 in R.

Sample size estimation for intervention trials
We derived the sample size per arm required to inform the design of therapeutic trials aiming to lower NfL by a range of desired therapeutic effect sizes. Given that the NfL changes in our cohort showed non-linear trajectories over time, the estimation assumed a non-linear progression. The estimation further assumed equal numbers of subjects in both study arms (allocation ratio 1:1), α=0.05, β=0.01, two-tailed independent t-tests, and the use of plasma NfL levels. Based on these assumptions, we derived the sample size per arm required to detect a given control-adjusted percent reduction in the treatment arm with 80% or 90% power and a two-sided 5% type I error.
To investigate whether baseline plasma NfL could predict yearly changes in brain region volumes and UMSARS scores, linear mixed-effects models were fitted with fixed-effect terms for baseline age (age at onset), sex, time, baseline plasma NfL, and an interaction between time and baseline plasma NfL. Random slopes and intercepts of time per participant were included to produce individual regression coefficients per participant, as previously described. 7
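For orientation, the per-arm sample size for a two-arm comparison of means can be approximated as n = 2·((z₁₋α/₂ + z_power)·σ/δ)², where δ is the absolute difference to be detected. The sketch below applies this standard normal approximation to the two-sample t-test with a 1:1 allocation, using the 80% and 90% power levels and the two-sided 5% type I error mentioned in this section. It does not model the non-linear progression assumed in the study, and the control-arm mean and standard deviation are hypothetical placeholders, not cohort estimates.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(mean, sd, reduction, power=0.8, alpha=0.05):
    """Approximate per-arm n to detect a relative reduction in the mean.

    mean, sd   : assumed control-arm mean and standard deviation
    reduction  : relative treatment effect, e.g. 0.2 for a 20% reduction
    power      : desired power (1 - beta)
    alpha      : two-sided type I error
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    delta = reduction * mean  # absolute difference between arms
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical control-arm plasma NfL: mean 30 pg/mL, SD 10 pg/mL.
for effect in (0.2, 0.3, 0.5):
    n80 = sample_size_per_arm(30.0, 10.0, effect, power=0.8)
    n90 = sample_size_per_arm(30.0, 10.0, effect, power=0.9)
    print(f"{effect:.0%} reduction: {n80} per arm (80% power), {n90} per arm (90% power)")
```

Because n scales with (σ/δ)², an outcome measure with a smaller coefficient of variation or a larger expected treatment effect requires fewer participants per arm.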