-
PDF
- Split View
-
Views
-
Cite
Cite
Vilija G Jokubaitis, Maria Pia Campagna, Omar Ibrahim, Jim Stankovich, Pavlina Kleinova, Fuencisla Matesanz, Daniel Hui, Sara Eichau, Mark Slee, Jeannette Lechner-Scott, Rodney Lea, Trevor J Kilpatrick, Tomas Kalincik, Philip L De Jager, Ashley Beecham, Jacob L McCauley, Bruce V Taylor, Steve Vucic, Louise Laverick, Karolina Vodehnalova, Maria-Isabel García-Sanchéz, Antonio Alcina, Anneke van der Walt, Eva Kubala Havrdova, Guillermo Izquierdo, Nikolaos Patsopoulos, Dana Horakova, Helmut Butzkueven, Not all roads lead to the immune system: the genetic basis of multiple sclerosis severity, Brain, Volume 146, Issue 6, June 2023, Pages 2316–2331, https://doi.org/10.1093/brain/awac449
- Share Icon Share
Abstract
Multiple sclerosis is a leading cause of neurological disability in adults. Heterogeneity in multiple sclerosis clinical presentation has posed a major challenge for identifying genetic variants associated with disease outcomes.
To overcome this challenge, we used prospectively ascertained clinical outcomes data from the largest international multiple sclerosis registry, MSBase. We assembled a cohort of deeply phenotyped individuals of European ancestry with relapse-onset multiple sclerosis. We used unbiased genome-wide association study and machine learning approaches to assess the genetic contribution to longitudinally defined multiple sclerosis severity phenotypes in 1813 individuals.
Our primary analyses did not identify any genetic variants of moderate to large effect sizes that met genome-wide significance thresholds. The strongest signal was associated with rs7289446 (β = −0.4882, P = 2.73 × 10−7), intronic to SEZ6L on chromosome 22. However, we demonstrate that clinical outcomes in relapse-onset multiple sclerosis are associated with multiple genetic loci of small effect sizes. Using a machine learning approach incorporating over 62 000 variants together with clinical and demographic variables available at multiple sclerosis disease onset, we could predict severity with an area under the receiver operator curve of 0.84 (95% CI 0.79–0.88). Our machine learning algorithm achieved positive predictive value for outcome assignation of 80% and negative predictive value of 88%. This outperformed our machine learning algorithm that contained clinical and demographic variables alone (area under the receiver operator curve 0.54, 95% CI 0.48–0.60).
Secondary, sex-stratified analyses identified two genetic loci that met genome-wide significance thresholds. One in females (rs10967273; βfemale = 0.8289, P = 3.52 × 10−8), the other in males (rs698805; βmale = −1.5395, P = 4.35 × 10−8), providing some evidence for sex dimorphism in multiple sclerosis severity. Tissue enrichment and pathway analyses identified an overrepresentation of genes expressed in CNS compartments generally, and specifically in the cerebellum (P = 0.023). These involved mitochondrial function, synaptic plasticity, oligodendroglial biology, cellular senescence, calcium and G-protein receptor signalling pathways. We further identified six variants with strong evidence for regulating clinical outcomes, the strongest signal again intronic to SEZ6L (adjusted hazard ratio 0.72, P = 4.85 × 10−4).
Here we report a milestone in our progress towards understanding the clinical heterogeneity of multiple sclerosis outcomes, implicating functionally distinct mechanisms to multiple sclerosis risk. Importantly, we demonstrate that machine learning using common single nucleotide variant clusters, together with clinical variables readily available at diagnosis can improve prognostic capabilities at diagnosis, and with further validation has the potential to translate to meaningful clinical practice change.
Introduction
Multiple sclerosis, a complex trait disease, is a leading cause of non-traumatic neurological disability in adults. It affects ∼2.8 million people worldwide, predominantly females.1 Rates of disability progression and long-term outcomes are highly heterogeneous among people with relapse-onset multiple sclerosis,2 and the underlying pathophysiological mechanisms that mediate this heterogeneity remain poorly understood. At present, the ability to predict a person’s probable long-term disease outcome at onset is very limited, but highly desirable, in order to stratify groups for clinical trial participation, or individuals for initiation with the most appropriate disease-modifying therapy.
So far, over 230 common genetic variants have been linked to multiple sclerosis risk.3 The only replicated genetic modifier of multiple sclerosis phenotype is carriage of the principal risk allele, the human leukocyte antigen (HLA) DRB1*15:01. In European populations, carriage of the HLA-DRB1*15:01 allele confers younger age of onset.4 However, large studies have shown that this allele has no effect on multiple sclerosis progression after onset.5,6 Further, there is strong evidence to suggest that currently known risk variants, aside from HLA-DRB1*15:01, play no major role in determining multiple sclerosis severity.7–9
A genetic influence on multiple sclerosis outcome is, however, plausible, in particular relating to the severity of secondary inflammation (e.g. development of slowly expanding or chronic rim-active lesions), resilience to neuroaxonal injury or remyelination capacity. Indeed, preliminary genome-wide association study (GWAS) evidence suggests that susceptibility and severity likely involve distinct biological processes and pathways.10–12
The best evidence so far for a genetic contribution to disease outcomes comes from a small number of cross-sectional GWAS dedicated to a search for severity signals associated with the Multiple Sclerosis Severity Scale (MSSS) score,13 or age at onset (AAO).9,10,14–17 However, these signals failed to reach significance at the genome-wide level, possibly due to inclusion of populations with both relapse-onset and progressive-onset clinical courses. As the genetic architecture underlying worsening in relapse-onset multiple sclerosis and progressive-onset multiple sclerosis is possibly distinct,18 it could be important to study these populations separately. Further, use of limited cross-sectional phenotypic MSSS data to assess disease severity limits accurate severity phenotyping due to both major ceiling effects and instability in relapse-onset multiple sclerosis,13 and therefore, may have limited severity signal discovery even in larger cohorts.17 With the exception of the 2011 IMSGC study17 that included 9772 cases, past studies of multiple sclerosis severity were limited by small sample sizes ranging from 163 cases14 to 1405 cases.9 Much of the discovery in the multiple sclerosis genetic risk space has been facilitated by studies that were orders of magnitude larger in size.3 However, case-control ascertainment is much simpler, and data more readily available, than detailed phenotypic ascertainment.
The heterogeneity in multiple sclerosis severity, both between individuals and within individuals over time, is large. Therefore, analysis of longitudinally acquired clinical trajectories over many years is likely to be more reliable for accurate severity assignation. Given that preliminary evidence suggests that genetic variation influences severity outcomes, we used both unbiased genome-wide association and machine learning (ML) approaches to examine (i) whether prospectively ascertained, longitudinally defined relapse-onset multiple sclerosis phenotypes could reveal novel genetic variants associated with disease severity, and thereby reveal novel insights into multiple sclerosis pathogenesis; and (ii) whether a ML model with multi-single nucleotide variant (SNV) inclusion has sufficient positive predictive value (PPV) to potentially be used at the time of multiple sclerosis diagnosis to guide clinical and treatment decisions. Our secondary analyses further interrogated SNV signals derived from our primary analyses, and also aimed to replicate previously reported suggestive markers of multiple sclerosis severity using a targeted approach.
Materials and methods
Study population
Participants were recruited from eight tertiary-referral multiple sclerosis-specialist centres, from three countries (Australia, Spain and the Czech Republic), participating in the MSBase Registry. MSBase is an international, prospective, observational, multiple sclerosis clinical outcomes registry, registered with the World Health Organization International Clinical Trials Registry Platform, ID ACTRN12605000455662.19 Data are entered by neurologists in, or near real-time including: participant demographics, disease phenotype, Expanded Disability Status Scale (EDSS) scores, relapse information and disease-modifying therapy use. Clinical assessments occur on average every 6 months. Data were extracted on 4 September 2019.
Ethics approvals
This study was approved by the Melbourne Health Human Research Ethics Committee, and by institutional review boards at all participating centres. All participants gave written informed consent for participation in the MSBase Registry, together with additional informed consent to participate in genetic research (HREC/13/MH/189 and per local approvals elsewhere).
Study inclusion criteria
People with multiple sclerosis of European ancestry with clinically definite, relapse-onset multiple sclerosis, based on McDonald criteria.20 Further, minimum inclusion criteria comprised: sex, birthdate, AAO, ≥5 years of symptom duration; ≥5 years prospective follow-up in MSBase; ≥3 EDSS scores recorded in the absence of a relapse (i.e. EDSS scores recorded within 30 days of relapse-onset date were excluded from analysis); availability of relapse and treatment history. All eligible patients attending clinic between January 2015 and June 2019 were offered study enrolment.
Phenotyping, severity assignation and recruitment
EDSS scores were used to calculate an age-related multiple sclerosis severity (ARMSS)21 score and MSSS score. Longitudinal MSSS scores may be less noisy in individual prognostics than cross-sectional measures.22 Therefore, for each participant, we calculated the median relapse-independent longitudinal ARMSS (l-ARMSS) and median longitudinal MSSS (l-MSSS) using each available ARMSS or MSSS score. Median l-ARMSS and l-MSSS scores were divided into quintiles to stratify the cohort for severity. The top and bottom quintiles were defined as outcome extremes, based on all individuals meeting minimum inclusion criteria (n = 5851; Fig. 1). Participant recruitment was enriched for these patients. Participants in the top and bottom decile of severity were reviewed to ensure accurate diagnosis. Individuals were excluded in the case of diagnostic uncertainty, or re-classification. The AAO phenotype was defined as age at first symptom onset.

Summary of patients screened for inclusion and analysis. CIS = clinically isolated syndrome; NMO = neuromyelitis optica; PPMS = primary progressive multiple sclerosis.
Symptom duration was defined as the number of years between first symptom onset and the most recently recorded visit in MSBase. Percentage of time exposed to disease-modifying therapy was defined as the total time exposed to any approved multiple sclerosis disease-modifying therapy as a percentage of symptom duration, as recorded in MSBase.
Statistical analyses
Data processing and statistical analyses were performed in Stata v.17 (Stata Corp, College Station, TX, USA) or R (http://R-project.org). Monash high performance computing infrastructure through MASSIVE was used for computationally extensive analyses.23 Continuous variables were assessed for normality using the Shapiro–Wilk normality test, and described as mean and standard deviation (SD) or median with interquartile range (IQR), as appropriate. Categorical variables were described using numbers and frequencies. Standardized differences were assessed using the Cohen’s d statistic. Correlations between l-ARMSS and AAO and the weighted genetic risk score (wGRS) were assessed using Pearson’s correlation coefficients. All analyses were two-tailed.
Genotyping, imputation and quality control
Whole blood gDNA was genotyped using the Illumina MegaEx BeadChip array. This array was customized with an additional 3000 SNVs of interest including: known multiple sclerosis risk SNVs, a suite of tag SNVs to classical HLA alleles,11 previously published putative severity SNVs.10,14,15,17 Samples were genotyped in two tranches at the John. P. Hussman Institute for Human Genomics (University of Miami, FL, USA); each containing DNA from Australia, the Czech Republic and Spain. Genome-wide data were therefore organized into six datasets. Samples underwent rigorous per-dataset quality control3 (Supplementary Table 1 and Supplementary material). Principal components (PC) analysis was implemented in Eigenstrat, and individuals that were outside ± 6 SD of the each of the first 10 PCs were excluded. PCs were projected to HapMapIII data to assess population stratification effects (Supplementary Fig. 1A–F), and we manually excluded population outliers falling outside the main European ancestry cluster. Following quality control, we imputed all samples using the Haplotype Reference Consortium panel (r1.1)24 on the Michigan imputation server. After per-SNV quality control, this resulted in 5 985 626 (final) SNVs with a minor allele frequency (MAF) of at least 5%.
Association testing
PLINK (v.1.9 and v.2.0)25 were used to conduct association testing and meta-analyses. For continuous traits, we used linear regression to analyse each of the six datasets, adjusted for the first five PCs, wGRS,26 disease-modifying therapy use, together with variables identified to have a standardized difference >15% between severity extremes. Combined dataset results of all six groups were then analysed using fixed-effects meta-analyses to identify statistically independent SNVs.
For binary traits, all six groups were analysed jointly due to lower sample numbers. We used logistic regression, adjusting for group ID, the first five PCs together with covariates with a standardized difference >15%.
The de novo GWAS P-value threshold was set to P < 5 × 10−8. SNVs meeting 1 × 10−8< P < 5 × 10−5 were considered to have nominal evidence of association with the trait of interest. For replication analyses, the Bonferroni-deflated P-value to meet replication threshold was set to P ≤ 4.31 × 10−4 (0.05/116 replication SNVs). We calculated wGRS26 based on directly genotyped SNVs described by the International Multiple Sclerosis Genetics Consortium.3
Calculations estimated a sample size of 915 individuals per group was required to achieve 80% power to detect an SNV with MAF 0.2 and an odds ratio of 1.3, on the basis of binary severity outcomes.
Survival analyses
We assessed time to reach the hard disability milestones of irreversible EDSS 3 and 6 for those individuals who had not yet reached these at first MSBase-recorded visit. Where disability milestones were not met during study observation, data were censored at the most recent visit. Survival analyses were based on carriage of the minor allele for SNVs at the top 10 nearest genes identified in our l-ARMSS and l-MSSS de novo association analyses; and top 5 nearest genes identified in our sex-stratified l-ARMSS and l-MSSS analyses. Cox proportional hazards regression (implemented in Stata v.17) was used to calculate hazard ratios (HRs) with 95% confidence intervals (CI). Multivariable models were adjusted for AAO, multiple sclerosis susceptibility wGRS, percentage of follow-up exposed to disease-modifying therapy and sex. The Schoenfeld residuals global test was used to detect a violation of the Cox proportional hazards assumption. Where the proportionality assumption was violated, a Weibull regression approach was applied. Survival data were visualized using Kaplan–Meier curves.
Heritability analysis
Narrow-sense heritability (hsnp) was estimated from individual-level GWAS data using a genome-based restricted maximum likelihood approach implemented using GCTA software,27 BOLT-LMM28 and a summary statistic approach using a linkage disequilibrium score regression implemented in LDSC software.29
Enrichment analyses
We used FUMA30 to assign SNVs (P < 1 × 10−5) to genes, tissues and functions following the online instructions. The Panther database31 was used to further confirm gene pathways/ontologies that were over-represented and enriched in the variants with top association hits (P < 1 × 10−5).
Machine learning
We implemented non-linear ML models for severity classification as these outperform linear ML models in the context of common variant-based disorders.32 All SNVs with P ≤ 0.01 in the de novo meta-analyses, together with demographic and clinical variables, were used to generate datasets compatible with gradient boosting algorithms (xgboost).33 A training set of 70% of people with multiple sclerosis was randomly selected, ensuring a balanced representation of severe and mild multiple sclerosis outcomes. After training with internal bootstrapping of 0.7 for 10 000 iterations, the model was tested on the 30% of the remaining cohort, a conservative approach to avoid model overfitting.34 Further, a slow learning rate (eta = 0.01) was implemented to avoid overfitting.35,36
Using the prediction values generated on the test set for each model, as well as the true membership values of each sample, a confusion matrix was generated along with accuracy, sensitivity, specificity and kappa statistics using the confusion matrix function of Caret package. Furthermore, to evaluate the prediction accuracy and performance of the models, the receiver operator characteristic curve was plotted, and the area under the curve for each model was calculated.
Data availability
Clinical data from the MSBase Registry: to protect participant confidentiality, de-identified patient-level data sharing may be possible in principle, but will require permissions/consent from each contributing data controller.
Genetic data are under controlled access while we continue to explore these data in further analyses. Access requests with scientifically sound proposals can be made in writing to Dr Vilija Jokubaitis ([email protected]) or Professor Helmut Butzkueven ([email protected]). Requests will be assessed and responded to within 4 weeks. If data access is approved, authorship should include the MSBase Genetics Consortium, with individual authors listed as appropriate.
Participant phenotypic data together with extended tabulated results from primary and secondary GWAS analyses, and extended survival model results are available on Dryad (https://doi.org/10.5061/dryad.hx3ffbghk).
Results
Cohort characteristics
The cohort comprised 5851 people of European descent with relapse-onset multiple sclerosis (Supplementary Fig. 1). Those who met study minimum inclusion criteria (Fig. 1), represented 63 072 patient-years of follow-up. Of these, 1813 (31%), representing 22 884 patient-years of follow-up, were genotyped and included in downstream analyses (Supplementary Table 1). The clinical and demographic characteristics of the cohort based on longitudinal age-related MSSS21 (l-ARMSS) scores (Table 1), and longitudinal MSSS (l-MSSS; Supplementary Table 2) are shown. Per-country cohort characteristics are provided in Supplementary Table 3. The correlation between l-ARMSS and l-MSSS was strong (r = 0.90, P < 0.0001). Individual disease trajectories based on l-ARMSS or l-MSSS assignation are shown in Fig. 2.

Cohort disease trajectories. Spaghetti plots depict each individual phenotyped. Thick lines represent median EDSS scores for each trajectory along the time course, shaded regions represent the 95th centiles of all EDSS scores contributing to the trajectory. (A) EDSS-age trajectories for included participants classified into mild (n = 1180), intermediate (n = 3559) or severe (n = 1112) groups based on median longitudinal ARMSS scores. (B) EDSS-symptom duration trajectories for included participants classified into mild (n = 1232), intermediate (n = 3512) or severe (n = 1107) groups based on median longitudinal MSSS scores.
. | . | Assessed . | Genotyped-passed QC . | Cohen’s d . | ||||
---|---|---|---|---|---|---|---|---|
Characteristics . | . | All . | Mild . | Severe . | All . | Mild . | Severe . | Mild versus severe . |
n = 5851 . | n = 1180 . | n = 1167 . | n = 1813 . | n = 447 . | n = 464 . | |||
Population, n (%) | Australian | 1993 (34.1) | 625 (53.0) | 375 (32.1) | 676 (37.3) | 209 (46.8) | 161 (34.7) | |
Czech | 2664 (45.5) | 346 (29.3) | 497 (42.6) | 716 (39.5) | 156 (34.9) | 164 (35.3) | ||
Spanish | 1194 (20.4) | 209 (17.7) | 295 (25.3) | 421 (23.2) | 82 (18.3) | 139 (30.0) | ||
l-ARMSS score | Median (IQR) | 4.53 (2.79, 6.55) | 1.49 (0.93, 1.97) | 8.24 (7.63, 8.91) | 4.13 (2.43, 7.21) | 1.49 (0.96, 2.03) | 8.55 (7.92, 9.12) | 10.40 |
Range | 0.08–9.99 | 0.08–2.39 | 7.13–9.99 | 0.14–9.98 | 0.14–2.38 | 7.13–9.98 | ||
Disease course, n (%) | RRMS | 5075 (86.7) | 1150 (97.5) | 743 (63.7) | 1471 (81.1) | 438 (98.0) | 226 (48.7) | |
SPMS | 776 (13.2) | 30 (2.5) | 424 (36.3) | 342 (18.9) | 9 (2.0) | 238 (51.3) | ||
Sex, n (%) | Female | 4261 (72.8) | 889 (75.3) | 809 (69.3) | 1313 (72.4) | 337 (75.4) | 316 (68.1) | |
Male | 1590 (27.2) | 291 (24.7) | 358 (30.7) | 500 (27.6) | 110 (24.6) | 148 (31.9) | ||
Date of onset (month, year) | Median (IQR) | Nov 2001 (Jan 1995–May 2007) | Mar 2004 (Jul 1997–Jun 2008) | Jul 1997 (Jan 1991–Jan 2004) | Jan 2000 (Jul 1993–Jan 2005) | Dec 2001 (Oct 1995–May 2006) | Aug 1995 (Jul 1989–Jan 2000) | |
AAO | Median (IQR) | 29.2 (23.4, 36.5) | 34.3 (27.7, 41.9) | 26.0 (20.9, 32.1) | 28.3 (22.7, 35.7) | 32.3 (26.4, 39.6) | 27.2 (22.1, 33.1) | 0.58 |
Age at most recent visit | Median (IQR) | 47.2 (39.7, 56.4) | 50.5 (42.8, 58.5) | 47.4 (40.0, 56.1) | 48.1 (40.9, 57.2) | 51.2 (43.5, 58.6) | 51.1 (43.6, 58.5) | 0.003 |
Range | 17.4–87.6 | 20.2–87.6 | 17.4–80.1 | 17.4–84.5 | 25.2–83.1 | 17.4–80.1 | ||
Follow-up in MSBase, years | Median (IQR) | 10.1 (7.5, 13.3) | 9.2 (7.0, 12.1) | 10.9 (8.3, 15.0) | 11.7 (9.7, 15.2) | 11.2 (9.4, 14.1) | 12.5 (10.2, 16.2) | 0.32 |
Range | 5.0–54.8 | 5.0–32.2 | 5.0–47.0 | 5.1–32.2 | 5.1–32.2 | 5.2–29.1 | ||
Number of EDSS assessed | Median (IQR) | 18 (11, 31) | 13 (9, 20) | 18 (11, 31) | 21 (14, 34) | 17 (12, 26) | 19 (12, 30) | 0.10 |
Range | 3–91 | 3–91 | 3–85 | 3–91 | 3–91 | 3–85 | ||
Symptom duration (years) | Median (IQR) | 16.0 (10.7, 23.1) | 13.8 (9.7, 20.2) | 20.6 (14.3, 26.9) | 18.1 (13.2, 24.4) | 16.2 (11.9, 22.0) | 22.2 (17.5, 28.0) | 0.61 |
Range | 5.1–66.6 | 5.1–58.5 | 5.3–57.1 | 5.6–55.4 | 5.9–55.4 | 6.7–50.4 | ||
% time exposed to DMT | Median (IQR) | 82.9 (36.1, 97.6) | 82.04 (14.6, 97.3) | 67.4 (18.3, 93.3) | 79.7 (37.6, 96.6) | 79.9 (29.8, 97.8) | 60.6 (17.3, 91.2) | 0.22 |
Range | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | ||
ARR | Median (IQR) | 0.16 (0, 0.38) | 0.10 (0, 0.19) | 0.17 (0, 0.44) | 0.17 (0.06, 0.36) | 0.10 (0, 0.22) | 0.17 (0.06, 0.39) | 0.43 |
wGRS | Mean (SD) | - | - | - | 2.85 (0.79) | 2.71 (0.80) | 2.90 (0.78) | 0.23 |
. | . | Assessed . | Genotyped-passed QC . | Cohen’s d . | ||||
---|---|---|---|---|---|---|---|---|
Characteristics . | . | All . | Mild . | Severe . | All . | Mild . | Severe . | Mild versus severe . |
n = 5851 . | n = 1180 . | n = 1167 . | n = 1813 . | n = 447 . | n = 464 . | |||
Population, n (%) | Australian | 1993 (34.1) | 625 (53.0) | 375 (32.1) | 676 (37.3) | 209 (46.8) | 161 (34.7) | |
Czech | 2664 (45.5) | 346 (29.3) | 497 (42.6) | 716 (39.5) | 156 (34.9) | 164 (35.3) | ||
Spanish | 1194 (20.4) | 209 (17.7) | 295 (25.3) | 421 (23.2) | 82 (18.3) | 139 (30.0) | ||
l-ARMSS score | Median (IQR) | 4.53 (2.79, 6.55) | 1.49 (0.93, 1.97) | 8.24 (7.63, 8.91) | 4.13 (2.43, 7.21) | 1.49 (0.96, 2.03) | 8.55 (7.92, 9.12) | 10.40 |
Range | 0.08–9.99 | 0.08–2.39 | 7.13–9.99 | 0.14–9.98 | 0.14–2.38 | 7.13–9.98 | ||
Disease course, n (%) | RRMS | 5075 (86.7) | 1150 (97.5) | 743 (63.7) | 1471 (81.1) | 438 (98.0) | 226 (48.7) | |
SPMS | 776 (13.2) | 30 (2.5) | 424 (36.3) | 342 (18.9) | 9 (2.0) | 238 (51.3) | ||
Sex, n (%) | Female | 4261 (72.8) | 889 (75.3) | 809 (69.3) | 1313 (72.4) | 337 (75.4) | 316 (68.1) | |
Male | 1590 (27.2) | 291 (24.7) | 358 (30.7) | 500 (27.6) | 110 (24.6) | 148 (31.9) | ||
Date of onset (month, year) | Median (IQR) | Nov 2001 (Jan 1995–May 2007) | Mar 2004 (Jul 1997–Jun 2008) | Jul 1997 (Jan 1991–Jan 2004) | Jan 2000 (Jul 1993–Jan 2005) | Dec 2001 (Oct 1995–May 2006) | Aug 1995 (Jul 1989–Jan 2000) | |
AAO | Median (IQR) | 29.2 (23.4, 36.5) | 34.3 (27.7, 41.9) | 26.0 (20.9, 32.1) | 28.3 (22.7, 35.7) | 32.3 (26.4, 39.6) | 27.2 (22.1, 33.1) | 0.58 |
Age at most recent visit | Median (IQR) | 47.2 (39.7, 56.4) | 50.5 (42.8, 58.5) | 47.4 (40.0, 56.1) | 48.1 (40.9, 57.2) | 51.2 (43.5, 58.6) | 51.1 (43.6, 58.5) | 0.003 |
Range | 17.4–87.6 | 20.2–87.6 | 17.4–80.1 | 17.4–84.5 | 25.2–83.1 | 17.4–80.1 | ||
Follow-up in MSBase, years | Median (IQR) | 10.1 (7.5, 13.3) | 9.2 (7.0, 12.1) | 10.9 (8.3, 15.0) | 11.7 (9.7, 15.2) | 11.2 (9.4, 14.1) | 12.5 (10.2, 16.2) | 0.32 |
Range | 5.0–54.8 | 5.0–32.2 | 5.0–47.0 | 5.1–32.2 | 5.1–32.2 | 5.2–29.1 | ||
Number of EDSS assessed | Median (IQR) | 18 (11, 31) | 13 (9, 20) | 18 (11, 31) | 21 (14, 34) | 17 (12, 26) | 19 (12, 30) | 0.10 |
Range | 3–91 | 3–91 | 3–85 | 3–91 | 3–91 | 3–85 | ||
Symptom duration (years) | Median (IQR) | 16.0 (10.7, 23.1) | 13.8 (9.7, 20.2) | 20.6 (14.3, 26.9) | 18.1 (13.2, 24.4) | 16.2 (11.9, 22.0) | 22.2 (17.5, 28.0) | 0.61 |
Range | 5.1–66.6 | 5.1–58.5 | 5.3–57.1 | 5.6–55.4 | 5.9–55.4 | 6.7–50.4 | ||
% time exposed to DMT | Median (IQR) | 82.9 (36.1, 97.6) | 82.04 (14.6, 97.3) | 67.4 (18.3, 93.3) | 79.7 (37.6, 96.6) | 79.9 (29.8, 97.8) | 60.6 (17.3, 91.2) | 0.22 |
Range | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | ||
ARR | Median (IQR) | 0.16 (0, 0.38) | 0.10 (0, 0.19) | 0.17 (0, 0.44) | 0.17 (0.06, 0.36) | 0.10 (0, 0.22) | 0.17 (0.06, 0.39) | 0.43 |
wGRS | Mean (SD) | - | - | - | 2.85 (0.79) | 2.71 (0.80) | 2.90 (0.78) | 0.23 |
ARR = annualized relapse rate.
. | . | Assessed . | Genotyped-passed QC . | Cohen’s d . | ||||
---|---|---|---|---|---|---|---|---|
Characteristics . | . | All . | Mild . | Severe . | All . | Mild . | Severe . | Mild versus severe . |
n = 5851 . | n = 1180 . | n = 1167 . | n = 1813 . | n = 447 . | n = 464 . | |||
Population, n (%) | Australian | 1993 (34.1) | 625 (53.0) | 375 (32.1) | 676 (37.3) | 209 (46.8) | 161 (34.7) | |
Czech | 2664 (45.5) | 346 (29.3) | 497 (42.6) | 716 (39.5) | 156 (34.9) | 164 (35.3) | ||
Spanish | 1194 (20.4) | 209 (17.7) | 295 (25.3) | 421 (23.2) | 82 (18.3) | 139 (30.0) | ||
l-ARMSS score | Median (IQR) | 4.53 (2.79, 6.55) | 1.49 (0.93, 1.97) | 8.24 (7.63, 8.91) | 4.13 (2.43, 7.21) | 1.49 (0.96, 2.03) | 8.55 (7.92, 9.12) | 10.40 |
Range | 0.08–9.99 | 0.08–2.39 | 7.13–9.99 | 0.14–9.98 | 0.14–2.38 | 7.13–9.98 | ||
Disease course, n (%) | RRMS | 5075 (86.7) | 1150 (97.5) | 743 (63.7) | 1471 (81.1) | 438 (98.0) | 226 (48.7) | |
SPMS | 776 (13.2) | 30 (2.5) | 424 (36.3) | 342 (18.9) | 9 (2.0) | 238 (51.3) | ||
Sex, n (%) | Female | 4261 (72.8) | 889 (75.3) | 809 (69.3) | 1313 (72.4) | 337 (75.4) | 316 (68.1) | |
Male | 1590 (27.2) | 291 (24.7) | 358 (30.7) | 500 (27.6) | 110 (24.6) | 148 (31.9) | ||
Date of onset (month, year) | Median (IQR) | Nov 2001 (Jan 1995–May 2007) | Mar 2004 (Jul 1997–Jun 2008) | Jul 1997 (Jan 1991–Jan 2004) | Jan 2000 (Jul 1993–Jan 2005) | Dec 2001 (Oct 1995–May 2006) | Aug 1995 (Jul 1989–Jan 2000) | |
AAO | Median (IQR) | 29.2 (23.4, 36.5) | 34.3 (27.7, 41.9) | 26.0 (20.9, 32.1) | 28.3 (22.7, 35.7) | 32.3 (26.4, 39.6) | 27.2 (22.1, 33.1) | 0.58 |
Age at most recent visit | Median (IQR) | 47.2 (39.7, 56.4) | 50.5 (42.8, 58.5) | 47.4 (40.0, 56.1) | 48.1 (40.9, 57.2) | 51.2 (43.5, 58.6) | 51.1 (43.6, 58.5) | 0.003 |
Range | 17.4–87.6 | 20.2–87.6 | 17.4–80.1 | 17.4–84.5 | 25.2–83.1 | 17.4–80.1 | ||
Follow-up in MSBase, years | Median (IQR) | 10.1 (7.5, 13.3) | 9.2 (7.0, 12.1) | 10.9 (8.3, 15.0) | 11.7 (9.7, 15.2) | 11.2 (9.4, 14.1) | 12.5 (10.2, 16.2) | 0.32 |
Range | 5.0–54.8 | 5.0–32.2 | 5.0–47.0 | 5.1–32.2 | 5.1–32.2 | 5.2–29.1 | ||
Number of EDSS assessed | Median (IQR) | 18 (11, 31) | 13 (9, 20) | 18 (11, 31) | 21 (14, 34) | 17 (12, 26) | 19 (12, 30) | 0.10 |
Range | 3–91 | 3–91 | 3–85 | 3–91 | 3–91 | 3–85 | ||
Symptom duration (years) | Median (IQR) | 16.0 (10.7, 23.1) | 13.8 (9.7, 20.2) | 20.6 (14.3, 26.9) | 18.1 (13.2, 24.4) | 16.2 (11.9, 22.0) | 22.2 (17.5, 28.0) | 0.61 |
Range | 5.1–66.6 | 5.1–58.5 | 5.3–57.1 | 5.6–55.4 | 5.9–55.4 | 6.7–50.4 | ||
% time exposed to DMT | Median (IQR) | 82.9 (36.1, 97.6) | 82.04 (14.6, 97.3) | 67.4 (18.3, 93.3) | 79.7 (37.6, 96.6) | 79.9 (29.8, 97.8) | 60.6 (17.3, 91.2) | 0.22 |
Range | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | ||
ARR | Median (IQR) | 0.16 (0, 0.38) | 0.10 (0, 0.19) | 0.17 (0, 0.44) | 0.17 (0.06, 0.36) | 0.10 (0, 0.22) | 0.17 (0.06, 0.39) | 0.43 |
wGRS | Mean (SD) | - | - | - | 2.85 (0.79) | 2.71 (0.80) | 2.90 (0.78) | 0.23 |
. | . | Assessed . | Genotyped-passed QC . | Cohen’s d . | ||||
---|---|---|---|---|---|---|---|---|
Characteristics . | . | All . | Mild . | Severe . | All . | Mild . | Severe . | Mild versus severe . |
n = 5851 . | n = 1180 . | n = 1167 . | n = 1813 . | n = 447 . | n = 464 . | |||
Population, n (%) | Australian | 1993 (34.1) | 625 (53.0) | 375 (32.1) | 676 (37.3) | 209 (46.8) | 161 (34.7) | |
Czech | 2664 (45.5) | 346 (29.3) | 497 (42.6) | 716 (39.5) | 156 (34.9) | 164 (35.3) | ||
Spanish | 1194 (20.4) | 209 (17.7) | 295 (25.3) | 421 (23.2) | 82 (18.3) | 139 (30.0) | ||
l-ARMSS score | Median (IQR) | 4.53 (2.79, 6.55) | 1.49 (0.93, 1.97) | 8.24 (7.63, 8.91) | 4.13 (2.43, 7.21) | 1.49 (0.96, 2.03) | 8.55 (7.92, 9.12) | 10.40 |
Range | 0.08–9.99 | 0.08–2.39 | 7.13–9.99 | 0.14–9.98 | 0.14–2.38 | 7.13–9.98 | ||
Disease course, n (%) | RRMS | 5075 (86.7) | 1150 (97.5) | 743 (63.7) | 1471 (81.1) | 438 (98.0) | 226 (48.7) | |
SPMS | 776 (13.2) | 30 (2.5) | 424 (36.3) | 342 (18.9) | 9 (2.0) | 238 (51.3) | ||
Sex, n (%) | Female | 4261 (72.8) | 889 (75.3) | 809 (69.3) | 1313 (72.4) | 337 (75.4) | 316 (68.1) | |
Male | 1590 (27.2) | 291 (24.7) | 358 (30.7) | 500 (27.6) | 110 (24.6) | 148 (31.9) | ||
Date of onset (month, year) | Median (IQR) | Nov 2001 (Jan 1995–May 2007) | Mar 2004 (Jul 1997–Jun 2008) | Jul 1997 (Jan 1991–Jan 2004) | Jan 2000 (Jul 1993–Jan 2005) | Dec 2001 (Oct 1995–May 2006) | Aug 1995 (Jul 1989–Jan 2000) | |
AAO | Median (IQR) | 29.2 (23.4, 36.5) | 34.3 (27.7, 41.9) | 26.0 (20.9, 32.1) | 28.3 (22.7, 35.7) | 32.3 (26.4, 39.6) | 27.2 (22.1, 33.1) | 0.58 |
Age at most recent visit | Median (IQR) | 47.2 (39.7, 56.4) | 50.5 (42.8, 58.5) | 47.4 (40.0, 56.1) | 48.1 (40.9, 57.2) | 51.2 (43.5, 58.6) | 51.1 (43.6, 58.5) | 0.003 |
Range | 17.4–87.6 | 20.2–87.6 | 17.4–80.1 | 17.4–84.5 | 25.2–83.1 | 17.4–80.1 | ||
Follow-up in MSBase, years | Median (IQR) | 10.1 (7.5, 13.3) | 9.2 (7.0, 12.1) | 10.9 (8.3, 15.0) | 11.7 (9.7, 15.2) | 11.2 (9.4, 14.1) | 12.5 (10.2, 16.2) | 0.32 |
Range | 5.0–54.8 | 5.0–32.2 | 5.0–47.0 | 5.1–32.2 | 5.1–32.2 | 5.2–29.1 | ||
Number of EDSS assessed | Median (IQR) | 18 (11, 31) | 13 (9, 20) | 18 (11, 31) | 21 (14, 34) | 17 (12, 26) | 19 (12, 30) | 0.10 |
Range | 3–91 | 3–91 | 3–85 | 3–91 | 3–91 | 3–85 | ||
Symptom duration (years) | Median (IQR) | 16.0 (10.7, 23.1) | 13.8 (9.7, 20.2) | 20.6 (14.3, 26.9) | 18.1 (13.2, 24.4) | 16.2 (11.9, 22.0) | 22.2 (17.5, 28.0) | 0.61 |
Range | 5.1–66.6 | 5.1–58.5 | 5.3–57.1 | 5.6–55.4 | 5.9–55.4 | 6.7–50.4 | ||
% time exposed to DMT | Median (IQR) | 82.9 (36.1, 97.6) | 82.04 (14.6, 97.3) | 67.4 (18.3, 93.3) | 79.7 (37.6, 96.6) | 79.9 (29.8, 97.8) | 60.6 (17.3, 91.2) | 0.22 |
Range | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | 0–100 | ||
ARR | Median (IQR) | 0.16 (0, 0.38) | 0.10 (0, 0.19) | 0.17 (0, 0.44) | 0.17 (0.06, 0.36) | 0.10 (0, 0.22) | 0.17 (0.06, 0.39) | 0.43 |
wGRS | Mean (SD) | - | - | - | 2.85 (0.79) | 2.71 (0.80) | 2.90 (0.78) | 0.23 |
ARR = annualized relapse rate.
Primary analyses
Genome-wide association search for SNVs associated with longitudinal severity measures
We first performed a genome-wide association analysis to identify novel variants associated with l-ARMSS continuous phenotypes. Regression analyses of l-ARMSS phenotypes were a priori adjusted for the first five PCs, the number of HLA-DRB1*15:01 alleles carried, percentage of time exposed to disease-modifying therapy and further, for imbalanced variables (Cohen’s d > 0.15, Table 1). Fixed-effects meta-analysis did not identify any SNVs that surpassed genome-wide significance (P < 5 × 10−8; Table 2, Fig. 3, Supplementary Table 4 and Supplementary Fig. 2). Similarly, adjusted l-MSSS analyses did not identify any significant associations, related to continuous phenotypes (Table 2, Supplementary Table 5 and Supplementary Fig. 3). Assessment of the genomic locations of SNVs with P < 1 × 10−5 for the l-ARMSS phenotype demonstrated that 55.1% (P = 6.63 × 10−3 for enrichment) of the signals were in intergenic regions and 34.7% were intronic (Supplementary Fig. 4A). This was numerically different to the l-MSSS end-point analysis, where 47.9% of SNVs were intronic (P = 5.41 × 10−3 for enrichment) and 36.8% were intergenic (P = 0.0193; Supplementary Fig. 4B).

Primary l-ARMSS genome-wide association results. (A) Manhattan plot of results from the l-ARMSS GWAS. Variants with suggestive significance (P < 1 × 10−5) are indicated above the dashed blue line. The strongest signal suggestive of association is seen on chromosome 22. (B) Locus zoom plot of the lead signal (rs7289466) on chromosome 22 demonstrates a series of SNVs in high linkage disequilibrium with the lead SNV. Variants are intronic to the SEZ6L gene.
Results of fixed-effects meta-analyses: top SNVs for the 10 nearest genes for l-ARMSS and l-MSSS phenotypes and top SNVs for the five nearest genes in sex-stratified analyses
rsID . | Chr . | Ranked order SNV . | Ranked order nearest gene . | Nearest gene . | Minor allele . | MAF . | adj P(fe)a,b . | β . |
---|---|---|---|---|---|---|---|---|
l-ARMSSa, n = 1813 | ||||||||
rs7289446 | 22 | 1 | 1 | SEZ6L | G | 0.27 | 2.73 × 10−7 | −0.488 |
rs1207401 | 22 | 2 | 1 | SEZ6L | A | 0.27 | 2.90 × 10−7 | −0.490 |
rs7758683 | 6 | 6 | 2 | EPHA7 | T | 0.23 | 1.88 × 10−6 | −0.477 |
rs56089601 | 4 | 7 | 3 | STK32B | C | 0.10 | 2.69 × 10−6 | 0.655 |
rs12953974 | 18 | 10 | 4 | CTIF | A | 0.12 | 3.64 × 10−6 | −0.607 |
rs56194930 | 9 | 11 | 5 | ACO1 | G | 0.11 | 3.69 × 10−6 | 0.648 |
rs73091975 | 7 | 14 | 6 | CCDC129 | G | 0.09 | 3.84 × 10−6 | −0.680 |
rs2274333 | 1 | 16 | 7 | CA6 | G | 0.29 | 4.60 × 10−6 | −0.429 |
rs295254 | 9 | 20 | 8 | RCL1 | G | 0.38 | 5.64 × 10−6 | 0.388 |
rs11057374 | 12 | 21 | 9 | DNAH10 | G | 0.35 | 5.79 × 10−6 | −0.390 |
rs111399831 | 13 | 29 | 10 | SUCLA2 | A | 0.21 | 7.34 × 10−6 | −0.468 |
l-MSSSb, n = 1813 | ||||||||
rs1207401 | 22 | 1 | 1 | SEZ6L | A | 0.27 | 2.88 × 10−7 | −0.484 |
rs7289446 | 22 | 4 | 1 | SEZ6L | G | 0.27 | 3.35 × 10−7 | −0.479 |
rs9643199 | 8 | 6 | 2 | MTSS1 | A | 0.26 | 1.90 × 10−6 | 0.465 |
rs2725556 | 15 | 22 | 3 | UNC13C | A | 0.08 | 2.39 × 10−6 | 0.709 |
rs61578937 | 8 | 23 | 4 | NCOA2 | G | 0.13 | 2.86 × 10−6 | −0.562 |
rs7758683 | 6 | 27 | 5 | EPHA7 | T | 0.23 | 3.61 × 10−6 | −0.456 |
rs56089601 | 4 | 32 | 6 | STK32B | C | 0.10 | 4.65 × 10−6 | 0.631 |
rs4495680 | 1 | 33 | 7 | RGS13 | G | 0.41 | 5.65 × 10−6 | −0.390 |
rs56363129 | 10 | 42 | 8 | SEPHS1 | G | 0.15 | 6.56 × 10−6 | 0.516 |
rs111399831 | 13 | 47 | 9 | SUCLA2 | A | 0.15 | 7.07 × 10−6 | −0.534 |
l-ARMSS Females only, n = 1313 | ||||||||
rs1219732 | 10 | 1 | 1 | CPXM2 | T | 0.35 | 6.48 × 10−8 | 0.569 |
rs10967273 | 9 | 3 | 2 | LOC100506422 | T | 0.13 | 1.16 × 10−7 | 0.799 |
rs4572384 | 16 | 15 | 3 | CRISPLD2 | A | 0.40 | 2.77 × 10−6 | 0.487 |
rs2776741 | 1 | 16 | 4 | RCAN3AS | A | 0.31 | 2.98 × 10−6 | −0.516 |
l-ARMSS Males only, n = 500 | ||||||||
rs7315384 | 12 | 1 | 1 | STAB2 | C | 0.24 | 1.29 × 10−7 | 1.041 |
rs7070182 | 10 | 2 | 2 | TCF7L2 | C | 0.18 | 3.65 × 10−7 | −1.112 |
rs11845242 | 14 | 4 | 3 | LINC00520 | G | 0.41 | 5.63 × 10−7 | −0.800 |
rs3885012 | 12 | 5 | 4 | PHLDA1 | G | 0.06 | 1.19 × 10−6 | −1.499 |
l-MSSS Females only, n = 1313 | ||||||||
rs10967273 | 9 | 1 | 1 | LOC100506422 | T | 0.13 | 3.52 × 10−8 | 0.830 |
rs9643199 | 8 | 2 | 2 | MTSS1 | A | 0.26 | 6.54 × 10−8 | 0.631 |
rs17169210 | 5 | 4 | 3 | SLC25A48 | C | 0.23 | 1.25 × 10−7 | −0.621 |
rs1219732 | 10 | 7 | 4 | CPXM2 | T | 0.35 | 1.70 × 10−7 | 0.550 |
l-MSSS Males only, n = 500 | ||||||||
rs698805 | 2 | 1 | 1 | CAMKMT | G | 0.07 | 4.35 × 10−8 | −1.540 |
rs7315384 | 12 | 2 | 2 | STAB2 | C | 0.24 | 8.00 × 10−8 | 1.020 |
rs3885012 | 12 | 14 | 4c | PHLDA1 | G | 0.06 | 3.69 × 10−7 | −1.312 |
rs7070182 | 10 | 35 | 5 | TCF7L2 | C | 0.18 | 3.86 × 10−7 | −1.054 |
rsID . | Chr . | Ranked order SNV . | Ranked order nearest gene . | Nearest gene . | Minor allele . | MAF . | adj P(fe)a,b . | β . |
---|---|---|---|---|---|---|---|---|
l-ARMSSa, n = 1813 | ||||||||
rs7289446 | 22 | 1 | 1 | SEZ6L | G | 0.27 | 2.73 × 10−7 | −0.488 |
rs1207401 | 22 | 2 | 1 | SEZ6L | A | 0.27 | 2.90 × 10−7 | −0.490 |
rs7758683 | 6 | 6 | 2 | EPHA7 | T | 0.23 | 1.88 × 10−6 | −0.477 |
rs56089601 | 4 | 7 | 3 | STK32B | C | 0.10 | 2.69 × 10−6 | 0.655 |
rs12953974 | 18 | 10 | 4 | CTIF | A | 0.12 | 3.64 × 10−6 | −0.607 |
rs56194930 | 9 | 11 | 5 | ACO1 | G | 0.11 | 3.69 × 10−6 | 0.648 |
rs73091975 | 7 | 14 | 6 | CCDC129 | G | 0.09 | 3.84 × 10−6 | −0.680 |
rs2274333 | 1 | 16 | 7 | CA6 | G | 0.29 | 4.60 × 10−6 | −0.429 |
rs295254 | 9 | 20 | 8 | RCL1 | G | 0.38 | 5.64 × 10−6 | 0.388 |
rs11057374 | 12 | 21 | 9 | DNAH10 | G | 0.35 | 5.79 × 10−6 | −0.390 |
rs111399831 | 13 | 29 | 10 | SUCLA2 | A | 0.21 | 7.34 × 10−6 | −0.468 |
l-MSSSb, n = 1813 | ||||||||
rs1207401 | 22 | 1 | 1 | SEZ6L | A | 0.27 | 2.88 × 10−7 | −0.484 |
rs7289446 | 22 | 4 | 1 | SEZ6L | G | 0.27 | 3.35 × 10−7 | −0.479 |
rs9643199 | 8 | 6 | 2 | MTSS1 | A | 0.26 | 1.90 × 10−6 | 0.465 |
rs2725556 | 15 | 22 | 3 | UNC13C | A | 0.08 | 2.39 × 10−6 | 0.709 |
rs61578937 | 8 | 23 | 4 | NCOA2 | G | 0.13 | 2.86 × 10−6 | −0.562 |
rs7758683 | 6 | 27 | 5 | EPHA7 | T | 0.23 | 3.61 × 10−6 | −0.456 |
rs56089601 | 4 | 32 | 6 | STK32B | C | 0.10 | 4.65 × 10−6 | 0.631 |
rs4495680 | 1 | 33 | 7 | RGS13 | G | 0.41 | 5.65 × 10−6 | −0.390 |
rs56363129 | 10 | 42 | 8 | SEPHS1 | G | 0.15 | 6.56 × 10−6 | 0.516 |
rs111399831 | 13 | 47 | 9 | SUCLA2 | A | 0.15 | 7.07 × 10−6 | −0.534 |
l-ARMSS Females only, n = 1313 | ||||||||
rs1219732 | 10 | 1 | 1 | CPXM2 | T | 0.35 | 6.48 × 10−8 | 0.569 |
rs10967273 | 9 | 3 | 2 | LOC100506422 | T | 0.13 | 1.16 × 10−7 | 0.799 |
rs4572384 | 16 | 15 | 3 | CRISPLD2 | A | 0.40 | 2.77 × 10−6 | 0.487 |
rs2776741 | 1 | 16 | 4 | RCAN3AS | A | 0.31 | 2.98 × 10−6 | −0.516 |
l-ARMSS Males only, n = 500 | ||||||||
rs7315384 | 12 | 1 | 1 | STAB2 | C | 0.24 | 1.29 × 10−7 | 1.041 |
rs7070182 | 10 | 2 | 2 | TCF7L2 | C | 0.18 | 3.65 × 10−7 | −1.112 |
rs11845242 | 14 | 4 | 3 | LINC00520 | G | 0.41 | 5.63 × 10−7 | −0.800 |
rs3885012 | 12 | 5 | 4 | PHLDA1 | G | 0.06 | 1.19 × 10−6 | −1.499 |
l-MSSS Females only, n = 1313 | ||||||||
rs10967273 | 9 | 1 | 1 | LOC100506422 | T | 0.13 | 3.52 × 10−8 | 0.830 |
rs9643199 | 8 | 2 | 2 | MTSS1 | A | 0.26 | 6.54 × 10−8 | 0.631 |
rs17169210 | 5 | 4 | 3 | SLC25A48 | C | 0.23 | 1.25 × 10−7 | −0.621 |
rs1219732 | 10 | 7 | 4 | CPXM2 | T | 0.35 | 1.70 × 10−7 | 0.550 |
l-MSSS Males only, n = 500 | ||||||||
rs698805 | 2 | 1 | 1 | CAMKMT | G | 0.07 | 4.35 × 10−8 | −1.540 |
rs7315384 | 12 | 2 | 2 | STAB2 | C | 0.24 | 8.00 × 10−8 | 1.020 |
rs3885012 | 12 | 14 | 4c | PHLDA1 | G | 0.06 | 3.69 × 10−7 | −1.312 |
rs7070182 | 10 | 35 | 5 | TCF7L2 | C | 0.18 | 3.86 × 10−7 | −1.054 |
fe = fixed-effects meta-analyses (n = 6) adjusted for the first five PCs. Bold P-values surpass genome-wide significance thresholds.
l-ARMSS analyses adjusted for: % time on therapy since disease onset, wGRS, number of DRB1*1501 alleles and imbalanced variables: follow-up time in MSBase (years), symptom duration (years), annualized relapse rate (ARR).
l-MSSS analyses additionally adjusted for % time on therapy since disease onset, wGRS, number of DRB1*1501 alleles, and imbalanced variables including: age at most recent visit, number of EDSS assessments, symptom duration (years).
Third closest gene hit had >80% heterogeneity.
Results of fixed-effects meta-analyses: top SNVs for the 10 nearest genes for l-ARMSS and l-MSSS phenotypes and top SNVs for the five nearest genes in sex-stratified analyses
rsID . | Chr . | Ranked order SNV . | Ranked order nearest gene . | Nearest gene . | Minor allele . | MAF . | adj P(fe)a,b . | β . |
---|---|---|---|---|---|---|---|---|
l-ARMSSa, n = 1813 | ||||||||
rs7289446 | 22 | 1 | 1 | SEZ6L | G | 0.27 | 2.73 × 10−7 | −0.488 |
rs1207401 | 22 | 2 | 1 | SEZ6L | A | 0.27 | 2.90 × 10−7 | −0.490 |
rs7758683 | 6 | 6 | 2 | EPHA7 | T | 0.23 | 1.88 × 10−6 | −0.477 |
rs56089601 | 4 | 7 | 3 | STK32B | C | 0.10 | 2.69 × 10−6 | 0.655 |
rs12953974 | 18 | 10 | 4 | CTIF | A | 0.12 | 3.64 × 10−6 | −0.607 |
rs56194930 | 9 | 11 | 5 | ACO1 | G | 0.11 | 3.69 × 10−6 | 0.648 |
rs73091975 | 7 | 14 | 6 | CCDC129 | G | 0.09 | 3.84 × 10−6 | −0.680 |
rs2274333 | 1 | 16 | 7 | CA6 | G | 0.29 | 4.60 × 10−6 | −0.429 |
rs295254 | 9 | 20 | 8 | RCL1 | G | 0.38 | 5.64 × 10−6 | 0.388 |
rs11057374 | 12 | 21 | 9 | DNAH10 | G | 0.35 | 5.79 × 10−6 | −0.390 |
rs111399831 | 13 | 29 | 10 | SUCLA2 | A | 0.21 | 7.34 × 10−6 | −0.468 |
l-MSSSb, n = 1813 | ||||||||
rs1207401 | 22 | 1 | 1 | SEZ6L | A | 0.27 | 2.88 × 10−7 | −0.484 |
rs7289446 | 22 | 4 | 1 | SEZ6L | G | 0.27 | 3.35 × 10−7 | −0.479 |
rs9643199 | 8 | 6 | 2 | MTSS1 | A | 0.26 | 1.90 × 10−6 | 0.465 |
rs2725556 | 15 | 22 | 3 | UNC13C | A | 0.08 | 2.39 × 10−6 | 0.709 |
rs61578937 | 8 | 23 | 4 | NCOA2 | G | 0.13 | 2.86 × 10−6 | −0.562 |
rs7758683 | 6 | 27 | 5 | EPHA7 | T | 0.23 | 3.61 × 10−6 | −0.456 |
rs56089601 | 4 | 32 | 6 | STK32B | C | 0.10 | 4.65 × 10−6 | 0.631 |
rs4495680 | 1 | 33 | 7 | RGS13 | G | 0.41 | 5.65 × 10−6 | −0.390 |
rs56363129 | 10 | 42 | 8 | SEPHS1 | G | 0.15 | 6.56 × 10−6 | 0.516 |
rs111399831 | 13 | 47 | 9 | SUCLA2 | A | 0.15 | 7.07 × 10−6 | −0.534 |
l-ARMSS Females only, n = 1313 | ||||||||
rs1219732 | 10 | 1 | 1 | CPXM2 | T | 0.35 | 6.48 × 10−8 | 0.569 |
rs10967273 | 9 | 3 | 2 | LOC100506422 | T | 0.13 | 1.16 × 10−7 | 0.799 |
rs4572384 | 16 | 15 | 3 | CRISPLD2 | A | 0.40 | 2.77 × 10−6 | 0.487 |
rs2776741 | 1 | 16 | 4 | RCAN3AS | A | 0.31 | 2.98 × 10−6 | −0.516 |
l-ARMSS Males only, n = 500 | ||||||||
rs7315384 | 12 | 1 | 1 | STAB2 | C | 0.24 | 1.29 × 10−7 | 1.041 |
rs7070182 | 10 | 2 | 2 | TCF7L2 | C | 0.18 | 3.65 × 10−7 | −1.112 |
rs11845242 | 14 | 4 | 3 | LINC00520 | G | 0.41 | 5.63 × 10−7 | −0.800 |
rs3885012 | 12 | 5 | 4 | PHLDA1 | G | 0.06 | 1.19 × 10−6 | −1.499 |
l-MSSS Females only, n = 1313 | ||||||||
rs10967273 | 9 | 1 | 1 | LOC100506422 | T | 0.13 | 3.52 × 10−8 | 0.830 |
rs9643199 | 8 | 2 | 2 | MTSS1 | A | 0.26 | 6.54 × 10−8 | 0.631 |
rs17169210 | 5 | 4 | 3 | SLC25A48 | C | 0.23 | 1.25 × 10−7 | −0.621 |
rs1219732 | 10 | 7 | 4 | CPXM2 | T | 0.35 | 1.70 × 10−7 | 0.550 |
l-MSSS Males only, n = 500 | ||||||||
rs698805 | 2 | 1 | 1 | CAMKMT | G | 0.07 | 4.35 × 10−8 | −1.540 |
rs7315384 | 12 | 2 | 2 | STAB2 | C | 0.24 | 8.00 × 10−8 | 1.020 |
rs3885012 | 12 | 14 | 4c | PHLDA1 | G | 0.06 | 3.69 × 10−7 | −1.312 |
rs7070182 | 10 | 35 | 5 | TCF7L2 | C | 0.18 | 3.86 × 10−7 | −1.054 |
rsID . | Chr . | Ranked order SNV . | Ranked order nearest gene . | Nearest gene . | Minor allele . | MAF . | adj P(fe)a,b . | β . |
---|---|---|---|---|---|---|---|---|
l-ARMSSa, n = 1813 | ||||||||
rs7289446 | 22 | 1 | 1 | SEZ6L | G | 0.27 | 2.73 × 10−7 | −0.488 |
rs1207401 | 22 | 2 | 1 | SEZ6L | A | 0.27 | 2.90 × 10−7 | −0.490 |
rs7758683 | 6 | 6 | 2 | EPHA7 | T | 0.23 | 1.88 × 10−6 | −0.477 |
rs56089601 | 4 | 7 | 3 | STK32B | C | 0.10 | 2.69 × 10−6 | 0.655 |
rs12953974 | 18 | 10 | 4 | CTIF | A | 0.12 | 3.64 × 10−6 | −0.607 |
rs56194930 | 9 | 11 | 5 | ACO1 | G | 0.11 | 3.69 × 10−6 | 0.648 |
rs73091975 | 7 | 14 | 6 | CCDC129 | G | 0.09 | 3.84 × 10−6 | −0.680 |
rs2274333 | 1 | 16 | 7 | CA6 | G | 0.29 | 4.60 × 10−6 | −0.429 |
rs295254 | 9 | 20 | 8 | RCL1 | G | 0.38 | 5.64 × 10−6 | 0.388 |
rs11057374 | 12 | 21 | 9 | DNAH10 | G | 0.35 | 5.79 × 10−6 | −0.390 |
rs111399831 | 13 | 29 | 10 | SUCLA2 | A | 0.21 | 7.34 × 10−6 | −0.468 |
l-MSSSb, n = 1813 | ||||||||
rs1207401 | 22 | 1 | 1 | SEZ6L | A | 0.27 | 2.88 × 10−7 | −0.484 |
rs7289446 | 22 | 4 | 1 | SEZ6L | G | 0.27 | 3.35 × 10−7 | −0.479 |
rs9643199 | 8 | 6 | 2 | MTSS1 | A | 0.26 | 1.90 × 10−6 | 0.465 |
rs2725556 | 15 | 22 | 3 | UNC13C | A | 0.08 | 2.39 × 10−6 | 0.709 |
rs61578937 | 8 | 23 | 4 | NCOA2 | G | 0.13 | 2.86 × 10−6 | −0.562 |
rs7758683 | 6 | 27 | 5 | EPHA7 | T | 0.23 | 3.61 × 10−6 | −0.456 |
rs56089601 | 4 | 32 | 6 | STK32B | C | 0.10 | 4.65 × 10−6 | 0.631 |
rs4495680 | 1 | 33 | 7 | RGS13 | G | 0.41 | 5.65 × 10−6 | −0.390 |
rs56363129 | 10 | 42 | 8 | SEPHS1 | G | 0.15 | 6.56 × 10−6 | 0.516 |
rs111399831 | 13 | 47 | 9 | SUCLA2 | A | 0.15 | 7.07 × 10−6 | −0.534 |
l-ARMSS Females only, n = 1313 | ||||||||
rs1219732 | 10 | 1 | 1 | CPXM2 | T | 0.35 | 6.48 × 10−8 | 0.569 |
rs10967273 | 9 | 3 | 2 | LOC100506422 | T | 0.13 | 1.16 × 10−7 | 0.799 |
rs4572384 | 16 | 15 | 3 | CRISPLD2 | A | 0.40 | 2.77 × 10−6 | 0.487 |
rs2776741 | 1 | 16 | 4 | RCAN3AS | A | 0.31 | 2.98 × 10−6 | −0.516 |
l-ARMSS Males only, n = 500 | ||||||||
rs7315384 | 12 | 1 | 1 | STAB2 | C | 0.24 | 1.29 × 10−7 | 1.041 |
rs7070182 | 10 | 2 | 2 | TCF7L2 | C | 0.18 | 3.65 × 10−7 | −1.112 |
rs11845242 | 14 | 4 | 3 | LINC00520 | G | 0.41 | 5.63 × 10−7 | −0.800 |
rs3885012 | 12 | 5 | 4 | PHLDA1 | G | 0.06 | 1.19 × 10−6 | −1.499 |
l-MSSS Females only, n = 1313 | ||||||||
rs10967273 | 9 | 1 | 1 | LOC100506422 | T | 0.13 | 3.52 × 10−8 | 0.830 |
rs9643199 | 8 | 2 | 2 | MTSS1 | A | 0.26 | 6.54 × 10−8 | 0.631 |
rs17169210 | 5 | 4 | 3 | SLC25A48 | C | 0.23 | 1.25 × 10−7 | −0.621 |
rs1219732 | 10 | 7 | 4 | CPXM2 | T | 0.35 | 1.70 × 10−7 | 0.550 |
l-MSSS Males only, n = 500 | ||||||||
rs698805 | 2 | 1 | 1 | CAMKMT | G | 0.07 | 4.35 × 10−8 | −1.540 |
rs7315384 | 12 | 2 | 2 | STAB2 | C | 0.24 | 8.00 × 10−8 | 1.020 |
rs3885012 | 12 | 14 | 4c | PHLDA1 | G | 0.06 | 3.69 × 10−7 | −1.312 |
rs7070182 | 10 | 35 | 5 | TCF7L2 | C | 0.18 | 3.86 × 10−7 | −1.054 |
fe = fixed-effects meta-analyses (n = 6) adjusted for the first five PCs. Bold P-values surpass genome-wide significance thresholds.
l-ARMSS analyses adjusted for: % time on therapy since disease onset, wGRS, number of DRB1*1501 alleles and imbalanced variables: follow-up time in MSBase (years), symptom duration (years), annualized relapse rate (ARR).
l-MSSS analyses additionally adjusted for % time on therapy since disease onset, wGRS, number of DRB1*1501 alleles, and imbalanced variables including: age at most recent visit, number of EDSS assessments, symptom duration (years).
Third closest gene hit had >80% heterogeneity.
A summary of the top variants associated with the continuous l-ARMSS and l-MSSS analyses are shown in Table 2. The top signal in the continuous l-ARMSS analysis was rs7289446 (β = −0.4882, P = 2.73 × 10−7), intronic to SEZ6L, a gene associated with dendritic spine density and arborization.37 The top signal in the continuous l-MSSS analysis also implicated a variant intronic to SEZ6L, rs1207401 (β = −0.4841, P = 2.88 × 10−7). These two SEZ6L associated SNVs are in perfect linkage disequilibrium (R2 = 1, D′ = 1; Fig. 3 and Supplementary Table 6).
Heritability analyses
To estimate the extent to which the variability of l-ARMSS or l-MSSS-defined severity could be explained by genetic architecture, we calculated narrow-sense heritability estimates (h2g) for our cohort (n = 1813). The overall heritability estimate for the l-ARMSS phenotype was h2g 0.19 (SE 0.15, P = 0.02) using the genome-based restricted maximum likelihood by GCTA method. Similar estimates were achieved using alternate heritability estimate tools (Supplementary Table 7). The overall l-MSSS h2g heritability estimate was slightly greater than for l-ARMSS (h2g 0.29; SE 0.14, P = 0.001). However, alternate heritability estimates for l-MSSS proved highly inconsistent (Supplementary Table 7).
Machine learning
Given that, as expected, our GWAS approach did not identify any SNVs that surpassed significance thresholds, we implemented an xgboost33 ML algorithm to determine whether a non-linear ML model could find genetic associations with severity that traditional GWAS could not. We input all SNVs with an l-ARMSS GWAS P ≤ 0.01, accounting for 62 351 variants. However, no single variant was given a weight of >0.005, confirming that no genetic variant contributed appreciably to multiple sclerosis severity.
Prediction of clinical course using machine learning
Recognizing that our ML model did not further illuminate the underlying genetic architecture of multiple sclerosis severity, we sought to determine whether ML could be used to predict severity, based on l-ARMSS outcome extremes (n = 447 mild, n = 464 severe). Our ML algorithm trained on 70% (n = 638 l-ARMSS) of the cohort, then tested the remaining 30% (n = 273 l-ARMSS). We included SNVs with an l-ARMSS GWAS P ≤ 0.01 as before, together with AAO and the wGRS. Our classification algorithm had high predictive accuracy, with an area under the receiver operating characteristic curve (AUROC) 0.835 (95% CI 0.79–0.88). The addition of sex and location of first symptoms to the above did not materially affect the ML outcomes (AUROC 0.836, 95% CI 0.79–0.88; Fig. 4A and B), with 89% sensitivity, and 79% specificity; a PPV of 80% and negative predictive value (NPV) of 88%. The l-ARMSS severity classification based solely on clinical and demographic covariates available at baseline (AAO, sex, location of first symptoms), performed no better than chance with an AUROC of 0.54 (95% CI 0.48–0.60), 50% sensitivity, and 58% specificity; with a PPV of 53% and NPV of 55% (Fig. 4C and D). Restricting the algorithm to just those SNVs with P < 1 × 10−5 in the l-ARMSS GWAS (n = 336) marginally decreased predictive accuracy to AUROC = 0.82 (95% CI 0.77–0.87) confirming the polygenic nature of the genetic architecture underlying multiple sclerosis severity.

ML algorithm classification of l-ARMSS mild and severe cases. (A) l-ARMSS genetic, demographic and clinical ML model (SNV P < 0.01, wGRS, AAO, sex, first symptoms) has a high predictive value: AUROC 0.84. (B) Feature importance for l-ARMSS genetic, demographic and clinical ML model demonstrates that AAO is the strongest predictor of outcome with a weight of <0.03. No SNV carries a severity prediction weight of >0.005. (C) l-ARMSS demographic and clinical ML model (AAO, sex, location of first symptoms) performs no better than chance: AUROC 0.54. (D) Feature importance for l-ARMSS + AAO + sex + location of first symptoms ML model shows that in the absence of genetic data, AAO carries a weight of 0.8 in predicting disease outcomes.
Secondary analyses
Sex-stratified genome-wide association search for SNVs associated with longitudinal severity measures
Given our primary analyses did not identify signals of genome-wide significance, we performed sex-stratified analyses to determine whether any variant effects were potentially sex-associated. Table 2 summarizes the SNVs nearest the top four gene regions for each sex. We found rs10967273, an intergenic variant, was associated with l-MSSS-defined severity in females (βfemale = 0.8289, P = 3.52 × 10−8, Supplementary Fig. 5). This variant did not surpass significance thresholds (βfemale = 0.7994, 1.17 × 10−7) in the l-ARMSS analysis.
In males, we identified rs698805 intronic to CAMKMT (βmale = −1.5395, P = 4.35 × 10−8) as associated with l-MSSS severity (Table 2 and Supplementary Fig. 6). This variant did not surpass GWAS thresholds in the l-ARMSS analysis (βmale = −1.4199, P = 5.64 × 10−6). The variants identified in our sex-stratified analyses were not associated with severity in the opposite sex (Supplementary Table 8). We did not identify any novel genetic associations with AAO (Supplementary Table 9).
Pathway analyses
Tissue enrichment implemented in FUMA revealed an over representation of cerebellar cortex-expressed genes for both l-ARMSS (cerebellar hemisphere P = 0.071; cerebellum P = 0.077; Supplementary Fig. 7A) and l-MSSS (cerebellar hemisphere P = 0.017; cerebellum P = 0.023); Supplementary Fig. 7B). In contrast, whole blood-associated genes were not enriched in our analyses with either l-ARMSS (P = 0.75) or l-MSSS (P = 0.82) outcomes. Gene set enrichment analyses of the l-ARMSS phenotype implicated endothelial cell development (β = 0.43; P = 8.25 × 10−5), pseudopodium assembly (β = 0.84; P = 2.37 × 10−4), response to progesterone (β = 0.35; P = 2.74 × 10−4) and NMDA receptor activity (β = 1.10; P = 2.79 × 10−4). The l-MSSS phenotype was additionally enriched for Wnt signalling pathways (β = 0.24; P = 2.02 × 10−4; Supplementary Table 10). We also examined gene set enrichment using Panther. Here we corroborated an overrepresentation of heteromeric G-protein signalling pathways associated with l-ARMSS (P = 4.98 × 10−5, FDR = 8.23 × 10−3) and l-MSSS (P = 1.00 × 10−4, FDR = 1.67 × 10−2) phenotypes. The AAO phenotype was associated with endothelin (P = 2.51 × 10−4, FDR = 4.18 × 10−2), and cadherin signalling pathways (P = 2.90 × 10−4, FDR = 2.42 × 10−2).
Survival analyses
We assessed whether 30 of the top signals (Table 2) might play a role in severity modulation using an alternative definition of severity, making use of our longitudinal dataset. We assessed the time to reach the hard disability milestones of irreversible EDSS 3 (irEDSS3) and irreversible EDSS 6 (irEDSS6) in both univariable and adjusted analyses. We identified four SNVs that were associated with time to reach irreversible EDSS 3 and 6 in both unadjusted and adjusted analyses. These SNVs were then combined in multivariable analyses to determine whether they could independently predict time to these disability milestones (Supplementary Table 11). Three SNVs remained independently predictive of both time to irreversible EDSS 3 and 6 (Fig. 5), including rs7289446 [intronic to SEZ6L: irEDSS3 adjusted HR (aHR) 0.77, P = 0.008, Fig. 5A; irEDSS6 aHR 0.72, P = 4.85 × 10−4, Fig. 5B], rs295254 (intronic to RCL1: irEDSS3 aHR 1.33, P = 9.10 × 10−4,Fig. 5C; irEDSS6 aHR 1.32, P = 7.37 × 10−4, Fig. 5D) and rs111399831 (nearest to SUCLA2; irEDSS3 aHR 0.77, P = 0.036, Fig. 5E; irEDSS6 aHR 0.62, P = 2.82 × 10−4, Fig. 5F).

Kaplan–Meier survival curves showing time to irreversible EDSS milestones based on the presence (1, 2) or absence (0) of the minor allele at each locus. (A) rs7289446 intronic to SEZ6L time to irreversible EDSS 3 (aHR 0.77, P = 0.008). (B) rs7289446 intronic to SEZ6L time to irreversible EDSS 6 (aHR 0.72, P = 4.85 × 10−4). (C) rs295254 intronic to RCL1 time to irreversible EDSS 3 (aHR 1.33, P = 9.10 × 10−4). (D) rs295254 intronic to RCL1 time to irreversible EDSS 6 (aHR 1.32, P = 7.37 × 10−4). (E) rs11399831 nearest to SUCLA2 time to irreversible EDSS 3 (aHR 0.77, P = 0.036). (F) rs11399831 nearest to SUCLA2 time to irreversible EDSS 6 (aHR 0.74, P = 2.82 × 10−4).
In sex-stratified analyses, we identified two SNVs as consistently associated with time to irreversible EDSS 3 and 6 in females (Fig. 6), but not males. The independent hazards of time to reach irreversible EDSS 3 and 6 for these variants were: rs9643199 (intronic to MTSS1: irEDSS3 aHR 1.35, P = 0.006, Fig. 6A; irEDSS6 aHR 1.46, P = 7.09 × 10−4, Fig. 6B) and rs2776741 (nearest to RCAN3AS; irEDSS3 aHR 0.77, P = 0.010, Fig. 6C; irEDSS6 aHR 0.74, P = 0.005, Fig. 6D and Supplementary Table 11). rs7070182 intronic to TCF7L2 (Fig. 6) was the only variant consistently associated with time to irreversible EDSS 3 (aHR 0.59, P = 0.013, Fig. 6E) and 6 (aHR 0.56, P = 0.005, Fig. 6F) in males, with no effect in females.

Kaplan–Meier survival curves showing time to irreversible EDSS milestones based on the presence (1, 2) or absence (0) of the minor allele at each locus by sex. (A) rs9643199 intronic to MTSS1 time to irreversible EDSS 3 in females (aHR 1.35, P = 0.006). (B) rs9643199 intronic to MTSS1 time to irreversible EDSS 6 in females (aHR 1.46, P = 7.09 × 10−4). (C) rs2776741 nearest to RCAN3AS time to irreversible EDSS 3 in females (aHR 0.77, P = 0.010). (D) rs2776741 nearest to RCAN3AS time to irreversible EDSS 6 in females (aHR 0.74, P = 0.005). (E) rs7070182 intronic to TCF7L2 time to irreversible EDSS 3 in males (aHR 0.59, P = 0.013). (F) rs7070182 intronic to TCF7L2 time to irreversible EDSS 6 in males (aHR 0.56, P = 0.005).
Validation assessment of previously published putative severity SNVs
We confirmed that homozygosity at the main European HLA-DRB1*15:01 tagging SNV, rs3135388 was associated with an earlier AAO (rs3135388, 29.2 versus 30.4 years, P = 0.005). In addition to DRB1*15:01, we tested 116 putative non-HLA SNV associations with cross-sectional MSSS measures, disease severity and AAO. We were able to replicate the association between rs868824, intronic to IMMP2L on chromosome 7, with AAO (β = −1.0935 years; P = 4.31 × 10−4), however, no other putative severity variant met or surpassed the Bonferroni-corrected replication threshold (P = 4.31 × 10−4, Supplementary Table 12).
Multiple sclerosis susceptibility allele association with severity phenotypes
The HLA-DRB1*15:01 allele was not associated with disease severity as per l-ARMSS, nor l-MSSS, nor was any other SNV-genotyped HLA allele (HLA-DRB1*03:01, HLA-DRB1*130:01, HLA-DRB1*08:01, HLA-DQB1*03:02, HLA-A*02:01, HLA-DQA1*01:01, HLA-B*44:02 and HLA-B*55:01).
We also sought to determine whether there was an association between known multiple sclerosis susceptibility variants (wGRS), and our severity phenotypes of interest. We found weak positive correlations between the multiple sclerosis susceptibility wGRS and l-ARMSS (r = 0.07, P = 0.003); l-MSSS (r = 0.03, P = 0.19) and a weak negative correlation with AAO (r = −0.08, P = 0.0005). We did not find an association between l-AMRSS or l-MSSS and the known non-HLA autosomal risk variants3 that were directly genotyped (198/200), P > 1 × 10−3.
Discussion
Two of the fundamental, unanswered questions with respect to relapsing-remitting multiple sclerosis are: What is the source of the marked clinical disease heterogeneity? Can we use genetic and other information to predict multiple sclerosis outcomes?
Through a series of analyses that took advantage of a unique, multicentre, prospectively ascertained, longitudinal, and deeply phenotyped clinical dataset,19 we can shed some light on the genetic architecture that underpins multiple sclerosis clinical heterogeneity. Our primary, unbiased GWAS analyses demonstrate that there are no common variants with moderate to large effect sizes that contribute to multiple sclerosis severity. With time, and very large cohorts, we will probably confirm that multiple sclerosis severity is at least partially determined by polygenic mechanisms of small effect size. Alternatively, we may find that variants that influence severity may be time-variable, rather than having a constant effect.38 Importantly, our results indicate that disease outcomes are not under strong genetic control. Our results demonstrated that common genetic variants explained only 20% of severity heritability, with wide error margins. Therefore, suggesting that, as clinical experience shows, outcomes are, to an extent, modifiable with appropriate and early DMT intervention.39–41 This is highlighted in our study, where half of the severe cohort were diagnosed with multiple sclerosis before the introduction of multiple sclerosis DMT, compared to a quarter of those in the mild cohort. This is further underscored in epidemiological studies in the modern era where, with the introduction of DMT, rates of disability accumulation have slowed, and fewer disabling cases are being seen, relative to historical cohorts.42–45 Future pharmacogenomic studies46 may prove to be invaluable to guide precise prescription practices to further slow progression or modify disease outcomes.
Our study was specifically designed to identify genetic associations related to relapse-independent disease progression, a recently accepted multiple sclerosis phenomena,47,48 in a cohort of people with relapse-onset multiple sclerosis. The top signals that emerged from both l-ARMSS and l-MSSS analyses were variants intronic to SEZ6L, a gene with CNS function. The integration of the top 30 SNVs identified in our GWAS analyses into hard EDSS disability milestone survival analyses, determined that six of these variants had additional evidence for severity modulation, and therefore are probably true associations. Five of these variants were intronic to, or near genes implicated in CNS function. These again included variants intronic to SEZ6L, together with: rs295254 intronic to RCL1, rs9643199 intronic to MTSS1, rs2776741 nearest to RCAN3AS and rs7070182 intronic to TCF7L2. The latter three having sex-specific effects. We also identified variants implicated in mitochondrial function: rs9643199 intronic to MTSS1 (having pleiotropic roles) and additionally rs111399831, nearest to SUCLA2. The HRs associated with reaching irreversible EDSS 3 or 6, conferred by carriage of the minor allele at each SNV, were consistent with the effect sizes of these variants in our GWAS analyses: that is, effect sizes were small, but significant in survival analyses. The variants we described in our analyses are intronic to, or near to genes, and are likely to be tagging rather than causal. However, importantly, these variants identify highly biologically and clinically plausible leads for potential replication of clinical heterogeneity in similar or larger cohorts.
The SEZ6L gene is implicated in dendritic spine density variation, and arborization in the hippocampus and somatosensory cortex.37 Disruptions in SEZ6L cause neurodevelopmental, psychiatric and neurodegenerative conditions, as well as having a role in motor function.37,49,50 While MTSS1 has been reported to associate with mitochondrial function, it has primarily been described in the context of B-cell mediated immunity,51 and various CNS pathologies.52,53 Most relevant perhaps to multiple sclerosis outcomes, is the association between MTSS1, cortical volume and Purkinje cell dendritic arborization.54,55 Progressive synaptic loss, is a hallmark of multiple sclerosis pathology56,57; evident in both acutely active demyelinating lesions,58 as well as chronic inactive lesions.59 It has been shown that loss of synaptic density is associated with network dysfunction,57 implicating a failure of synaptic plasticity to compensate for immune-mediated neural damage. It is therefore plausible that the variants identified in this study implicate a genetic susceptibility to impaired compensatory mechanisms, or impaired neural survival in those with severe multiple sclerosis. This of course requires independent validation but raises an intriguing new line of enquiry, and provides a rationale for targeted drug discovery in this space.
Interestingly, we identified a variant intronic to TCF7L2 as associated with severity in males, although it is possible this sex-specific association reflects lack of study power, rather than a genuine sex-specific effect. TCF7L2 is a transcription factor involved in Wnt signalling pathways, and associated with neurodevelopmental disorders.60 Critically, in the context of our study, TCF7L2 has been shown to maintain oligodendrocyte progenitor cells in the progenitor state, acting as a molecular switch that can inhibit Wnt signalling to promote oligodendroglial differentiation.61 Interestingly, gene set enrichment analyses revealed that both Wnt signalling pathway components together with progesterone response pathways were enriched in our analyses. Progesterone is a known regulator of myelin development, as well as having neuroprotective effects,62 lending support to the notion that genetic susceptibility to impaired remyelination predisposes to more severe multiple sclerosis.
Our tissue enrichment analyses specifically pointed towards genes enriched in cerebellar function. The cerebellum plays a key role in motor coordination as well as cognition.63 It has long been held39 and recently confirmed,64 that cerebellar signs and symptoms are a predictor of poor prognosis in multiple sclerosis.
In our secondary analyses, we replicated the association between the main multiple sclerosis risk allele HLA-DRB1*15:01 and AAO.4,6 Further, ours is the first study to replicate rs868824, intronic to IMMP2L,10 as being associated with AAO in a targeted analysis. IMMP2L, an inner mitochondrial membrane protease, has been associated with cellular senescence,65 oxidative stress66 and neurological disorders.67–70 Recent evidence points to accelerated cellular senescence and biological ageing in people with multiple sclerosis,71–73 and suggests that these factors may reduce remyelination capacity.71 The validation of the association of rs868824 with AAO, is a first step towards understanding the potential biological mechanisms underlying accelerated cellular senescence in multiple sclerosis. In addition to IMMP2L, we described two further variants associated with mitochondrial function, where mitochondrial dysfunction is a recognized pathophysiological hallmark of CNS injury in multiple sclerosis.74,75
The results of our analyses therefore point to highly relevant and biologically plausible genetic explanations for clinically observed disease heterogeneity. Although of course, the complex interplay between genes and the environment probably additionally plays a significant role in outcome modulation. It has been shown that disability accumulation may be modified by additional lifestyle factors such as pregnancy40 and smoking cessation.76 Epigenetic studies may therefore shed further light on relevant, modifiable mechanisms that regulate multiple sclerosis outcomes.
Regarding the utility of genetic information to predict future outcomes, our ML algorithm was unable to provide additional biological insights into the underlying genetic architecture of multiple sclerosis severity, instead reinforcing that common SNVs independently contribute miniscule weights towards determining multiple sclerosis severity. Regardless, ML was able to predict non-linear effects and large SNV clusters that can accurately classify multiple sclerosis outcomes, and may prove to be of prognostic utility. While we retained the multiple sclerosis susceptibility wGRS in our algorithm, it was not a major contributor to our predictive model, again consistent with our findings here, and past reports demonstrating that multiple sclerosis risk variants have little influence on severity outcomes.7 The classification accuracy of our ML algorithm was not materially impacted with the addition of clinical variables available at onset, nor sex. However, AAO was one of the strongest predictors of outcome in our ML models, consistent with past reports.40 Our ML algorithm achieved PPVs for outcome assignation of 80% and NPVs of 88%, vastly outperforming our algorithm containing only demographic and clinical variables available at onset, which performed no better than chance. To our knowledge, this is a world first for multiple sclerosis genetic studies. While a previous ML study successfully predicted multiple sclerosis severity,77 this was predicated on health records, and data that take years to decades to obtain, e.g. change in clinical parameters between years ‘x’ and ‘y’ to predict ‘z’. The variables included in our classification algorithm are readily available at disease onset. With the rapid decrease in the cost of beadchip genotyping, and high PPV and NPV we achieved, our ML algorithm could readily translate into clinical practice on validation in an independent cohort.
Our multicentre study was conducted using rigorously defined and prospectively collected longitudinal clinical and treatment data from the MSBase Registry, making our cohort globally unique. Our study participants were recruited from outpatient settings at tertiary-referral centres, with all patients meeting study inclusion criteria offered enrolment. Our severity assignation was based on a larger cohort of almost 6000 patient records. Those classified as having mild multiple sclerosis had a median l-ARMSS of 1.5, and those with severe disease had a median l-ARMSS of 8.6 representing genuine outcome extremes. However, we acknowledge that many of those in the severe group were diagnosed with multiple sclerosis on average 5.5 years earlier (1995) than those with mild disease (2001), prior to the introduction and wide availability of DMT. Therefore, the severe group, in part, represents a cohort that was largely untreated or undertreated during the critical period immediately after diagnosis where DMT are known to be most effective.39,78 However, over 25% of the severe cohort were DMT-treated for over 90% of their disease course, representing those with genuinely severe disease. On the other hand, the mild cohort could represent a cohort of treatment responders. This differential exposure to DMT may have therefore influenced our findings, and affected our longitudinal severity scores. However, all phenotypic outcome studies in the modern era cannot escape DMT effects. We tried to overcome this limitation by adjusting for DMT use in both our GWAS and survival analyses.
Owing to the unique nature of our cohort, we were unable to validate our results in an equivalent dataset, therefore, the data presented herein require future independent validation. We tried to overcome this limitation by testing top SNVs using alternate definitions of disease severity, namely analyses of time to irreversible disability milestones. However, we recognize that these different disability measures are somewhat inter-related, and so this limitation cannot be fully overcome without an independent replication cohort. Similarly, our ML analyses were performed using a conservative 70/30 training/testing split relative to the traditional 80/20 split, accompanied with internal bootstrapping to reduce model overfitting. We acknowledge, however, that we were unable to include MRI features at diagnosis in our ML algorithm that undoubtedly add clinical utility for prognostication. Further, we had insufficient numbers to add oligoclonal band status to our algorithm as this is not routinely assessed at diagnosis in Australia. Our efforts to expand our cohort for future analyses are ongoing, together with expansion to include ethnically diverse groups.
Here we report an important milestone in our progress towards understanding the clinical heterogeneity of multiple sclerosis outcomes, implicating functionally distinct mechanisms to multiple sclerosis risk. We demonstrate that common genetic variants of moderate to large effect sizes do not contribute to multiple sclerosis severity. In secondary sex-stratified analyses, we identified two genetic loci that surpassed GWAS significance thresholds, providing evidence of sex dimorphism in multiple sclerosis severity. We identified a further six variants with strong evidence for regulating clinical outcomes. We observed an overrepresentation of genes expressed in CNS compartments generally, and specifically in the cerebellum. These involved mitochondrial function, synaptic plasticity, oligodendroglial biology, cellular senescence, calcium and G-protein receptor signalling pathways, confirming a functional dichotomy of signals and pathways that regulate multiple sclerosis severity versus risk, further informing our understanding of multiple sclerosis pathogenesis. Importantly, we demonstrate that ML using common SNV clusters, together with clinical variables readily available at diagnosis can improve prognostic capabilities at diagnosis, and with further validation has the potential to translate to meaningful clinical practice change.
Acknowledgements
We thank all the people with multiple sclerosis who participated in this research without whom this work would not be possible. We would also like to acknowledge the patients and the Biobank Nodo Hospital Virgen Macarena (Biobanco del Sistema Sanitario Público de Andalucía) integrated in the Spanish National biobanks Network (PT20/00069) supported by ISCIII and FEDER funds, for their collaboration in this work. The authors would like to acknowledge research nurses Ms Jo Baker, Ms Jodi Haartsen, Ms Sandra Williams, Ms Lisa Taylor for assisting with blood collection for this study and Ms Malgorzata Krupa for research assistance. Further, the authors acknowledge the Center for Genome Technology within the University of Miami John P. Hussman Institute for Human Genomics for generating all the MegaEx array genotype data for this project and specifically Anna Konidari for overseeing the genotyping efforts and assistance with the Illumina custom design process. E.H., D.H. and P.K. are supported by the Czech Ministry of Education, project PROGRES Q27/LF1. F.M. and A.A. receive support from the Agencia Española de Investigación (AEI)-Fondos Europeos de Desarrollo Regional (FEDER) (PID2019–110487R-C21) and Junta de Andalucía (P18-RT-2623).
Funding
This work was supported by a Research Fellowship awarded to V.G.J. from Multiple Sclerosis Research Australia (16–0206), and research grant support from the Royal Melbourne Hospital (MH2013–055), Charity Works for MS (2012 Project grant), the MSBase Foundation and Monash University.
Competing interests
The authors report no competing interests.
Supplementary material
Supplementary material is available at Brain online.
References
Author notes
Dana Horakova and Helmut Butzkueven contributed equally to this work.