Neuromuscular disease genetics in under-represented populations: increasing data diversity

Abstract
Neuromuscular diseases (NMDs) affect ∼15 million people globally. In high-income settings, DNA-based diagnosis has transformed care pathways and led to gene-specific therapies. However, most affected families live in low-to-middle income countries (LMICs) with limited access to DNA-based diagnosis. Most (86%) published genetic data are derived from populations of European ancestry. This marked genetic data inequality hampers understanding of genetic diversity and hinders accurate genetic diagnosis in all income settings. We developed a cloud-based transcontinental partnership to build diverse, deeply-phenotyped and genetically characterized cohorts to improve knowledge of genetic architecture, and potentially advance diagnosis and clinical management. We connected 18 centres in Brazil, India, South Africa, Turkey, Zambia, the Netherlands and the UK. We co-developed a cloud-based data solution and trained 17 international neurology fellows in clinical genomic data interpretation. Single gene and whole exome data were analysed via a bespoke bioinformatics pipeline and reviewed alongside clinical and phenotypic data in global webinars to inform genetic outcome decisions. We recruited 6001 participants in the first 43 months. Initial genetic analyses ‘solved’ or ‘possibly solved’ ∼56% of probands overall. In-depth genetic data review of the four commonest clinical categories (limb girdle muscular dystrophy, inherited peripheral neuropathies, congenital myopathy/muscular dystrophies and Duchenne/Becker muscular dystrophy) delivered a ∼59% ‘solved’ and ∼13% ‘possibly solved’ outcome. Almost 29% of disease-causing variants were novel, increasing knowledge of diverse pathogenic variants. Unsolved participants represent a new discovery cohort. The dataset provides a large resource from under-represented populations for genetic and translational research. In conclusion, we established a remote transcontinental partnership to assess the genetic architecture of NMDs across diverse populations. 
It supported DNA-based diagnosis, potentially enabling genetic counselling, care pathways and eligibility for gene-specific trials. Similar virtual partnerships could be adopted by other areas of global genomic neurological practice to reduce genetic data inequality and benefit patients globally.


Introduction
Neuromuscular diseases (NMD) affect an estimated 15 million children and adults globally.1 They cause shortened life expectancy or chronic lifelong disability, with personal and economic impact. Although individually rare, collectively they account for approximately 20% of all non-infectious neurological diseases. In low-to-middle income countries (LMICs), NMD prevalence and incidence are under-reported, as diagnosis is limited to a few specialist centres that may be geographically distant from much of the eligible population.
In high-income settings, improved diagnostics, especially genetic analysis, have delivered important advances in patient care pathways. Many interventions enabled by an accurate DNA-based diagnosis are relatively inexpensive, including genetic counselling, tailored application of widely-available medicines, screening for known complications (e.g. cardiac, respiratory, gastroenterological, metabolic) and physiotherapy. Genetic advances have also led to new advanced therapies, for example RNA-targeting approaches for spinal muscular atrophy (SMA), and AAV-mediated gene therapies or trials for SMA and Duchenne muscular dystrophy (DMD).2,3 A key challenge to realizing the benefits of DNA-based diagnosis for patients in LMICs is that ∼86% of published genomic studies are derived from populations of primarily European ancestry, and non-European populations are under-represented in control databases.4 Knowledge of genetic diversity outside European populations is limited.[5][6][7] Improved understanding of NMD distribution and the associated phenotypic and genetic variability requires a large, diverse, accurately-phenotyped cohort of patients and families with linked genetic data. Additional benefits of such a cohort include generating allele frequency data for under-represented populations to aid variant classification, and the potential to connect participants to clinical trials and novel therapies.
Here we describe our approach and genetic results in setting up an NMD genomic medicine partnership across 18 centres spanning seven countries.

Structure of the International Centre for Genomic Medicine in Neuromuscular Diseases
The International Centre for Genomic Medicine in Neuromuscular Diseases (ICGNMD) was launched in June 2019 and is ongoing. Partner sites (Supplementary Table 1) established aligned, locally-approved studies to recruit NMD patients and relatives to an international cohort and to share materials and data in full compliance with all ethics and legislation (for further details about ethics and data storage and sharing, see the Supplementary material). The inclusion criterion for participants was a suspected inherited neuromuscular disease clinically diagnosed by a trained clinical neurologist, or being a close relative of such a patient. Participants with a local genetic diagnosis could be included, and unsolved local whole exome sequencing (WES) data could be reanalysed. UK partner sites may also recruit participants living in the UK (typically with existing genetic test data); however, this paper considers only participant data from low-to-middle income partner sites.
The international regulatory and ethics landscape is complex, and securing all the regulatory and ethical approvals needed to balance data and material accessibility with patient rights was highly challenging and required the nuanced input of experienced local teams.
Building future genomic medicine capacity was crucial for all partners to realise the benefits of precision medicine. Therefore, 17 fellows were appointed to support recruitment, data entry and results interpretation. Twelve were based in LMICs, and all are pursuing their careers locally. Each fellow was assigned one LMIC Principal Investigator (PI) and one UK PI for mentorship and capacity building; this supervision ran alongside regular remote training and data interpretation and was considered highly effective by both fellows and PIs.
In-person study induction training of all neurology fellows focused on standardized data entry into the ICGNMD REDCap database,8 including Human Phenotype Ontology (HPO) terms,9 standard clinical assessment scales and summary genetic data (for the database instrument, see the Supplementary material), followed by regular online refresher training. As not all sites could access all investigations (e.g. MRI was often unavailable), the only mandatory data entries were: proband status or relationship to proband, diagnostic category (provisional clinical diagnosis), sex, age at recruitment, and positive and negative HPO terms. Repeat measurements (e.g. blood creatine kinase) and progression indicators could be recorded, but the study was primarily cross-sectional, reflecting the challenges of re-evaluating participants who may struggle to travel to clinics.
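The mandatory-field rule above lends itself to a simple record check before data are accepted. The sketch below is illustrative only: the `ParticipantRecord` fields and the `validate` function are hypothetical names, not the ICGNMD REDCap instrument, and requiring at least one positive and one negative HPO term is our reading of the text. The HPO identifier format ("HP:" plus seven digits) is the standard HPO convention.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# HPO term identifiers have the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


@dataclass
class ParticipantRecord:
    # Hypothetical field names for illustration; not the ICGNMD instrument.
    is_proband: Optional[bool] = None
    relationship_to_proband: Optional[str] = None  # required for relatives
    diagnostic_category: Optional[str] = None      # provisional clinical diagnosis
    sex: Optional[str] = None
    age_at_recruitment: Optional[float] = None
    hpo_present: list[str] = field(default_factory=list)  # positive HPO terms
    hpo_absent: list[str] = field(default_factory=list)   # negative HPO terms
    # Optional data (e.g. MRI findings, repeat creatine kinase values) would be
    # further fields that validation deliberately does not require.


def validate(rec: ParticipantRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if rec.is_proband is None:
        problems.append("missing: proband status")
    elif not rec.is_proband and not rec.relationship_to_proband:
        problems.append("missing: relationship to proband")
    for name in ("diagnostic_category", "sex", "age_at_recruitment"):
        if getattr(rec, name) in (None, ""):
            problems.append(f"missing: {name}")
    # Assumption: at least one positive and one negative HPO term is required.
    if not rec.hpo_present:
        problems.append("missing: positive HPO terms")
    if not rec.hpo_absent:
        problems.append("missing: negative HPO terms")
    malformed = [t for t in rec.hpo_present + rec.hpo_absent if not HPO_ID.match(t)]
    if malformed:
        problems.append(f"malformed HPO IDs: {malformed}")
    return problems
```

Keeping optional investigations out of the check mirrors the design choice in the text: sites without MRI or repeat measurements can still submit complete records.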

ICGNMD genetic analysis and data report generation
After consent and data collection, international remote expert 'genetic analysis decision' meetings discussed the most appropriate initial genetic analysis. These meetings, attended by all PIs, also served as a training forum for Fellows. Clinical phenotype and any investigational data underpinned the decision to apply specific single gene tests (e.g. MLPA for SMA) or WES (Supplementary material), with optional genotyping arrays to detect large structural or copy number variation and/or for linkage analysis. Testing was typically proband-first, extended to relatives if needed and/or available (Fig. 1). Partners in Brazil, South Africa and Zambia sent DNA to the UK, whereas partners in India and Turkey generated pseudonymized raw data to agreed standards and shared these data for centralized analysis.
Single gene test (SGT) results were reviewed by trained staff at ICGNMD partner sites. Raw WES data were analysed via a common bioinformatics pipeline (Supplementary material), with ICGNMD Fellows supported to interpret and present genetic results. WES candidate variant prioritization followed a modified version of a protocol developed by the 100 000 Genomes Project.10 Initial analysis focused on variants with a minor allele frequency of <0.01 (autosomal recessive inheritance) or <0.001 (autosomal dominant inheritance) within a subset of genes present in expertly reviewed gene panels (https://panelapp.genomicsengland.co.uk/) aligned to the participant's HPO terms and phenotype. If no significant genetic variants were reported, extended pipeline analyses (structural and copy number, mitochondrial, repeat expansion, de novo) were applied. To maximize reproducibility, open-source, well-maintained software tools and databases were used.11,12 Because large-scale population data resources such as gnomAD13 contain relatively limited non-European ancestry data and lack subpopulation resolution (e.g. for specific Indian and African ethnicities), we supplemented them with allele frequency data generated in local populations[14][15][16][17][18] and with the growing ICGNMD in-house dataset.
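The frequency and panel filter described above can be sketched as follows. This is a minimal illustration, not the ICGNMD pipeline: the `Variant` class, the `prioritise` function and the choice to take the highest allele frequency across gnomAD and local sources are our assumptions. Only the two thresholds (<0.01 recessive, <0.001 dominant) and the restriction to phenotype-matched panel genes come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text: MAF < 0.01 (recessive) and < 0.001 (dominant).
MAF_THRESHOLD = {"AR": 0.01, "AD": 0.001}


@dataclass
class Variant:
    gene: str
    variant_id: str                 # hypothetical identifier
    af_by_source: dict[str, float]  # e.g. {"gnomAD": 2e-4, "local": 3e-3}


def max_population_af(v: Variant) -> float:
    # Assumption: use the highest frequency seen in any source, so a variant
    # that is common in a local population is filtered even if it looks rare
    # in gnomAD. A variant absent from every source is treated as AF = 0.
    return max(v.af_by_source.values(), default=0.0)


def prioritise(variants: list[Variant], panel: dict[str, set[str]]):
    """Keep variants in phenotype-matched panel genes whose frequency passes
    the threshold for at least one inheritance mode listed for that gene.

    panel maps gene symbol -> inheritance modes, e.g. {"CAPN3": {"AR", "AD"}}.
    Returns (variant, passing_modes) pairs.
    """
    kept = []
    for v in variants:
        modes = panel.get(v.gene, set())  # genes off the panel get no modes
        af = max_population_af(v)
        passing = sorted(m for m in modes
                         if m in MAF_THRESHOLD and af < MAF_THRESHOLD[m])
        if passing:
            kept.append((v, passing))
    return kept
```

For example, a variant at AF 0.0005 in gnomAD but 0.003 in a local dataset would pass a dominant filter on gnomAD alone, yet is removed here. This illustrates why supplementing gnomAD with local frequencies can reduce false candidates.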
Prioritized variants were reviewed, and potentially causative variants were classified using American College of Medical Genetics (ACMG) criteria.19 The outcome was classed as 'solved' where pathogenic/likely pathogenic variant(s) were identified that fitted the phenotype (two variants, or a homozygous variant, in recessive disorders). The outcome was classed as 'possibly solved' if there was a strong candidate variant (two variants, or a homozygous variant, in recessive disorders) based on population frequency (<0.01% frequency), bioinformatic predictions and clinical phenotype, but at least one variant was classified as a variant of uncertain significance (VUS) according to ACMG criteria. Where further manual curation was performed for subgroups of disease categories, rare variants considered relevant for each proband were checked against gnomAD, ClinVar, DECIPHER, VarSome, PubMed and Google to ascertain whether they had been previously reported. Variants were classified as 'novel' if absent from all these sources.
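The outcome rules above can be written down as a small decision function. This is a sketch under stated assumptions: `Candidate`, `gene_outcome`, `case_outcome` and `is_novel` are hypothetical names; candidates are assumed to have already passed the frequency and bioinformatic filters; one inheritance hypothesis is applied per case; and, unlike real curation, the phase of two heterozygous variants is not checked (that typically requires testing relatives).

```python
from dataclasses import dataclass, field

PATHOGENIC = {"P", "LP"}
# Sources listed in the text for the novelty check.
SOURCES = ("gnomAD", "ClinVar", "DECIPHER", "VarSome", "PubMed", "Google")
RANK = {"solved": 0, "possibly solved": 1, "unsolved": 2}


@dataclass
class Candidate:
    gene: str
    acmg: str             # "P", "LP", "VUS", "LB" or "B"
    zygosity: str         # "het", "hom" or "hemi"
    fits_phenotype: bool
    reported_in: dict[str, bool] = field(default_factory=dict)  # source -> seen?


def _pathogenic_first(c: Candidate) -> int:
    return 0 if c.acmg in PATHOGENIC else 1


def gene_outcome(variants: list[Candidate], inheritance: str) -> str:
    """Outcome from the candidate variants in one gene."""
    if inheritance == "AR":
        hom = sorted((c for c in variants if c.zygosity == "hom"), key=_pathogenic_first)
        het = sorted((c for c in variants if c.zygosity == "het"), key=_pathogenic_first)
        genotypes = [[hom[0], hom[0]]] if hom else []
        if len(het) >= 2:
            genotypes.append(het[:2])  # phase (cis/trans) is NOT checked here
    else:  # dominant or X-linked: a single variant can explain the phenotype
        genotypes = [[c] for c in sorted(variants, key=_pathogenic_first)[:1]]
    best = "unsolved"
    for alleles in genotypes:
        if all(a.acmg in PATHOGENIC for a in alleles):
            return "solved"
        best = "possibly solved"  # a complete genotype with at least one VUS
    return best


def case_outcome(candidates: list[Candidate], inheritance: str) -> str:
    usable = [c for c in candidates
              if c.fits_phenotype and (c.acmg in PATHOGENIC or c.acmg == "VUS")]
    genes = {c.gene for c in usable}
    results = [gene_outcome([c for c in usable if c.gene == g], inheritance)
               for g in genes]
    return min(results, key=RANK.__getitem__, default="unsolved")


def is_novel(c: Candidate) -> bool:
    # Novel only if every source was checked and none reported the variant.
    return all(c.reported_in.get(s) is False for s in SOURCES)
```

Treating an unchecked source as "not novel" is a conservative choice. A variant is called novel only after all six lookups have come back negative.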

Results
Despite the SARS-CoV-2 pandemic, we established a phenotyping, genetic analysis and data sharing platform connecting centres across Brazil, India, the Netherlands, South Africa, Turkey, the UK and Zambia. We developed remote training and global webinars to discuss participants, and supported decision-making for genetic analysis and result interpretation. As of January 2023, 6001 participants (including 3631 probands) had consented and provided DNA (Supplementary Table 2). The majority were in India (3578 participants, 60% of the total), followed by Brazil (979 participants, 16%), South Africa (737 participants, 12%), Turkey (578 participants, 10%) and Zambia (129 participants, 2%). The cohort included 337 probands (9%) who were 'locally genetically solved' or had existing genetic data to review at study start. The majority of probands (3294, 91%) had no previous genetic test.
Sixty-five per cent of participants were male and 35% female. The median age at proband recruitment was 26 years, with 35% of the cohort aged 18 or under (Supplementary Fig. 1). Using 1000 Genomes populations as a background for ancestry estimation, 82% of individuals tested by WES were of non-European ancestry (Supplementary Fig. 2). Based on current recruitment, we estimate that the cohort will exceed 10 000 participants by the end of Year 5 (June 2024).
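The text does not state which ancestry method was used against the 1000 Genomes background (see Supplementary Fig. 2), and many pipelines project samples onto reference principal components. As a simpler, self-contained illustration, the sketch below swaps in a likelihood-based assignment test instead. Each sample is assigned to the reference population whose allele frequencies make its genotypes most likely. All function names are hypothetical. The method assumes Hardy-Weinberg equilibrium and independent (LD-pruned) SNPs, and it should not be read as the authors' method.

```python
import math


def genotype_loglik(genotypes: list[int], freqs: list[float], eps: float = 1e-3) -> float:
    """Log-likelihood of alternate-allele counts (0, 1 or 2) given a population's
    alternate-allele frequencies, assuming Hardy-Weinberg equilibrium and
    independent (e.g. LD-pruned) SNPs."""
    ll = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) for fixed alleles
        ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


def assign_ancestry(genotypes: list[int], reference: dict[str, list[float]]) -> str:
    """Return the reference population with the highest likelihood.
    reference maps a population label (e.g. the 1000 Genomes super-populations
    "AFR", "AMR", "EAS", "EUR", "SAS") to per-SNP alternate-allele frequencies."""
    scores = {pop: genotype_loglik(genotypes, f) for pop, f in reference.items()}
    return max(scores, key=scores.get)


def non_european_fraction(samples: list[list[int]], reference: dict[str, list[float]],
                          european_label: str = "EUR") -> float:
    """Fraction of samples not assigned to the European reference population."""
    calls = [assign_ancestry(g, reference) for g in samples]
    return sum(call != european_label for call in calls) / len(calls)
```

Hard assignment to a single super-population ignores admixture, which is common in some of the recruited populations. PCA- or admixture-based approaches describe such individuals better.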

Phenotypic spectrum
We recruited people with a broad range of NMDs (Fig. 2 and Supplementary Table 3). Total recruitment to mid-January 2023 by initial clinical diagnosis included 18.1% limb girdle muscular dystrophy (LGMD), 15.5% genetic peripheral neuropathies (PN), 9.4% congenital myopathy or congenital muscular dystrophy (CM/CMD) and 8.6% Duchenne or Becker muscular dystrophy (DMD/BMD). Other categories each contributed less than 7%. The four most common NMD categories were in line with those reported by centres worldwide.1

Genetic data results
We report 978 new genetic analyses (including 547 proband WES and 274 proband SGTs) by January 2023, following the process in Fig. 1. Fifty per cent of probands receiving an SGT were solved (Supplementary Table 4). The first review of proband WES data identified variants that 'solved' the NMD in 223 (41%) probands and 'possibly solved' it in 83 (15%) (combined WES solved/possibly solved rate 56%). Single gene tests and WES combined yielded 43.8% 'solved' outcomes. Below are summary genetic data following in-depth review of proband single gene test and WES results for the four clinical diagnostic categories with the highest recruitment (PN, LGMD, CM/CMD and DMD/BMD), demonstrating project value at the Year 4 stage. These categories combined represent 1875 of 3631 study probands (51.6% of the cohort) and 340 of 547 (62%) exomes available in January 2023, plus 182 single gene tests.
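The headline percentages above use different numerators and denominators. The combined 43.8% is not an average of the SGT and WES rates: it pools 'solved' outcomes over all 821 proband tests and excludes 'possibly solved'. The short calculation below reproduces the figures from the reported counts. The only assumption is that 137 SGTs were solved, which we infer from "fifty per cent of 274" and which is consistent with the reported 43.8%.

```python
def pct(k: int, n: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * k / n, 1)

# Counts from the text. The SGT 'solved' count is not given directly;
# 137 is our inference from "fifty per cent of 274".
wes_n, wes_solved, wes_possibly = 547, 223, 83
sgt_n, sgt_solved = 274, 137

wes_solved_pct = pct(wes_solved, wes_n)                          # WES solved only
wes_any_pct = pct(wes_solved + wes_possibly, wes_n)              # solved or possibly solved
pooled_solved_pct = pct(wes_solved + sgt_solved, wes_n + sgt_n)  # pooled over 821 tests
print(wes_solved_pct, wes_any_pct, pooled_solved_pct)            # 40.8 55.9 43.8
```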

Discussion
Recent years have seen dramatic advances in gene discovery and genetic diagnosis in NMD. Resulting patient benefits include accurate diagnosis, genetic counselling, improved care pathways including complication screening and prevention, and potential access to clinical trials and disease-modifying genetic therapies. However, these benefits have so far been limited or non-existent in lower-income contexts, despite most NMD patients living in LMICs. Here we explored a principally remote method of connecting academic partners and building international capacity for cohort development and genetic analysis of NMD patients in LMICs.
We harnessed key features of genetic analysis, i.e. that samples can be collected remotely and shipped for DNA extraction at relatively low cost, enabling inclusion of geographically dispersed patient populations and economies of scale. We took advantage of recent computational tools and high-performance clusters to enable efficient, remote processing of sequence data. We used low-cost cloud-based databases to support rapid and secure sharing of phenotypic data between distant sites. Specifically, we (i) tested the feasibility of a distributed transcontinental genomic medicine partnership to build diverse, deeply-phenotyped and genetically characterized cohorts; and (ii) evaluated deploying this partnership and cohort to understand genetic architecture and advance diagnosis. We report 3631 probands and 6001 participants who together represent the first recruits to a new global cohort including previously under-represented populations. The network of trained fellows, working with local PIs and PIs in the UK, assembled a deeply phenotyped cohort of children and adults with NMDs. Detailed clinical phenotypes and medical histories of over 3600 probands, and of over 2300 affected and unaffected (mainly first-degree) family members, were recorded in the REDCap database within 43 months (June 2019 to January 2023). The male:female proband ratio (1.86) is higher than reported by other NMD registries.20 This may be influenced by referral patterns to some recruitment sites and/or socio-economic circumstances differentially impacting the ability to attend appointments.21 The data indicate that patients with a wide spectrum of neuromuscular diseases joined the ICGNMD study, with a frequency broadly similar to reports from European centres.1 The most common NMD clinical diagnoses were genetic peripheral neuropathies, LGMD, CM/CMD and Duchenne or Becker muscular dystrophy, together comprising over half the cohort.
The solved rate of 46% for genetic peripheral neuropathies included single gene tests for common genetic causes.[24][25] There were more probands with GJB1 variants (29 probands, 16%) than with the PMP22 duplication (14 probands, 8%) in our cohort, whereas other cohorts report a higher rate of PMP22 duplication, usually greater than 50% (e.g. a US/UK/European study reported 61% PMP22 duplication and 10% GJB1 variants).25 A possible explanation is that 114/181 probands in the current study were from Brazil, where PMP22 duplication had already been excluded in many enrolled participants.
Congenital myopathies and muscular dystrophies yielded a comparable 'solved' rate of 53%, spanning 16 genes, including known variants in STAC3, RYR1 and COL6A2/3, in addition to 21 novel variants across 14 genes. There is a notable lack of solved patients with COL6A1- and TTN-related CM/CMD (two of the most common forms),[26][27][28][29] and further WES is underway.
The dystrophin gene diagnostic rate was high (98%), with MLPA and WES contributing 86% and 14% of diagnoses, respectively. Deletions and duplications were the most common variants (87% of solved participants), in keeping with previous studies.30,31 Comparison of the Indian and South African cohorts identified differences in both the types of genetic variants (deletions, duplications, nonsense, splice variants) and their distribution (including intronic breakpoints) within the DMD gene. The South African cohort had more duplications and nonsense variants, and a higher proportion of intronic breakpoints in the proximal 5′ end of the gene. This could be due to differences in cohort size and in patient recruitment (e.g. locally solved via MLPA versus unsolved patients); however, a comparatively low proportion of large deletions has been reported in South African populations.32 It will be important to investigate the reasons for this observation, as there may be implications for the applicability of exon-skipping therapies.
Overall, our data indicate that, depending on the NMD diagnostic category, 44-98% of patients in LMIC settings may receive an accurate genetic diagnosis with single gene tests and/or WES, creating potential for benefit. We increased the reported genetic diversity associated with NMD, since about 1 in 3.5 causative variants were novel. These data also indicate that an additional 15% of probands, with a strong VUS candidate and a 'possibly solved' classification, require further evaluation or functional studies to confirm pathogenicity, with corollary benefits for discovery research and pharmaceutical insights. The 28% of probands for whom no convincing variants were identified represent an important new diverse discovery cohort for further analysis, including whole genome and long-read approaches.
The ICGNMD team and results depended on international collaborations established at a smaller scale over the preceding decade, which built trust and mutual understanding of local populations, facilities and perspectives. These collaborations benefited from local computing and data and material storage infrastructure, small-scale pump-priming, initiatives promoting clinical and genetic data interoperability, and gene-matching platforms. The partnership has potential for additional bidirectional benefit, including enabling deeper understanding of VUS, which is relevant to NMD patients in all countries, and it provides a foundation on which to build expanded testing capacity, tailored care guidelines and clinical trial readiness.
In conclusion, we demonstrated that it is feasible to set up a virtual transcontinental partnership using a cloud-based platform, and to harness big data to describe phenotypes and causative variants present in a diverse cohort recruited from many LMIC settings. Over half of tested participants obtained a research-based genetic diagnosis, opening up the potential benefits of an accurate DNA diagnosis to patients and demonstrating the feasibility of including more diverse populations in clinical trials. We recognize there are limitations to this study. We did not seek to collect epidemiological data, and our study does not allow conclusions about the incidence or prevalence of NMD or of individual genes. On the other hand, this study is in a 'real-world' setting, reflecting current practice in each LMIC centre, and shows that despite limitations, a cross-NMD solved rate of 44-59% can be achieved in this previously genetically untested population. This work indicates that geographical inequalities in access to an accurate DNA-based diagnosis can potentially be addressed through such virtual partnerships. These have genuine bidirectional value to all partners and the wider research community. Increasing the knowledge of genetic diversity can improve reliable variant interpretation and

Funding
Funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This work was supported by a Medical Research Council strategic award to

Figure 1 ICGNMD workflow with key nodes for international discussion at genetic analysis decision and results review. ICGNMD Fellows' training spans this pathway. DMD = Duchenne muscular dystrophy; FSHD = facioscapulohumeral muscular dystrophy; MLPA = multiplex ligation-dependent probe amplification; SMA = spinal muscular atrophy; WES = whole exome sequencing; WGS = whole genome sequencing.

Figure 5 Distribution of Duchenne muscular dystrophy (DMD) variants. (A) Exon distribution of causative DMD genetic variants for the whole cohort. (B) Exon distribution of causative DMD genetic variants in (i) Indian and (ii) South African cohorts. Blue = deletion; grey = duplication; orange = splice; red = nonsense.

Table 1 Novel genetic variants identified in the genetic peripheral neuropathy (neuropathy) cohort
ACMG = American College of Medical Genetics; LP = likely pathogenic; VUS = variant of uncertain significance.

[…] out-of-frame deletion. The prevalence of different types of DMD variant varied significantly between the Indian and South African cohorts (Fisher's exact test, P < 0.001). Of 64 solved Indian participants, 60 (94%) carried a deletion, three (5%) a nonsense variant and one (1%) a splice variant; no Indian patients had a duplication. Of the 40 solved South African participants, 24 (60%) carried a deletion, eight (20%) a nonsense variant, seven (17%) a duplication and one […]
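The text reports P < 0.001 for the difference in DMD variant types between the two cohorts. The full comparison is a 2×4 table (deletion, duplication, nonsense, splice), which calls for a generalised (Freeman-Halton) Fisher test or a permutation test. We do not know exactly which variant the authors ran. As a simpler illustration only, the sketch below collapses the counts above to deletion versus any other variant and applies a two-sided Fisher's exact test written from the hypergeometric distribution. The function name is hypothetical; library implementations such as `scipy.stats.fisher_exact` also cover the 2×2 case.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Deletion vs any other variant type among solved participants (counts above).
india = (60, 4)          # 60 deletions; 3 nonsense + 1 splice
south_africa = (24, 16)  # 24 deletions; 8 nonsense, 7 duplications, 1 further variant
p = fisher_exact_2x2(*india, *south_africa)
print(f"two-sided P = {p:.2g}")
```

Collapsing categories discards information (for example, the Indian cohort's complete absence of duplications), so this 2×2 version is a sanity check rather than a replacement for the full-table test.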

Table 2 Novel genetic variants identified in the congenital myopathy/congenital muscular dystrophy (CM/CMD) cohorts
ACMG = American College of Medical Genetics; LP = likely pathogenic; VUS = variant of uncertain significance.
Of the 85 patients with deletions, 44 (52%; 33 Indian and 11 South African) are potentially amenable to antisense oligonucleotide (ASO) exon-skipping therapies. Twenty-nine patients (22 Indian, seven South African) carry deletions amenable to licensed exon-skipping ASO therapies targeting exons 45 (n = 9), 51 (n = 14) and 53 (n = 5); a single patient carried an exon 52 deletion amenable to either exon 51 or exon 53 skipping. Of the remaining 15 patients with deletions amenable to exon skipping, six (40%) would be amenable to skipping of exon 44, for which an ASO is currently in clinical trials.
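Exon-skipping amenability is usually judged with the reading-frame rule. A deletion of whole exons disrupts the frame (typically giving Duchenne) when the total deleted length is not a multiple of 3. Skipping an adjacent exon restores the frame if the combined length becomes a multiple of 3. The sketch below applies that rule. The exon lengths are an assumption: they are values for DMD exons 44-53 as we recall them from the reference muscle transcript, to be verified against the reference sequence before any real use. The function names are hypothetical. The rule itself has known clinical exceptions.

```python
# Exon lengths (bp) of DMD exons 44-53, as we recall them from the Dp427m
# muscle transcript. ASSUMPTION: verify against the reference sequence before
# any real use; exons outside this range are not covered by this sketch.
EXON_LEN = {44: 148, 45: 176, 46: 148, 47: 150, 48: 186,
            49: 102, 50: 109, 51: 233, 52: 118, 53: 212}

# Skip targets discussed in the text: 45, 51, 53 (licensed) and 44 (in trials).
SKIP_TARGETS = (44, 45, 51, 53)


def in_frame(exons: list[int]) -> bool:
    """Reading-frame rule: removing whole exons keeps the frame only if the
    total removed length is a multiple of 3."""
    return sum(EXON_LEN[e] for e in exons) % 3 == 0


def amenable_skips(deleted, targets=SKIP_TARGETS) -> list[int]:
    """Skip targets flanking a contiguous deletion whose additional removal
    would restore the reading frame."""
    exons = sorted(deleted)
    if not exons or exons != list(range(exons[0], exons[-1] + 1)):
        raise ValueError("expected a non-empty, contiguous run of exons")
    if any(e not in EXON_LEN for e in exons):
        raise ValueError("exon outside the illustrative length table")
    if in_frame(exons):
        return []  # already in frame: no skipping needed to restore the frame
    flanks = (exons[0] - 1, exons[-1] + 1)
    return [t for t in targets if t in flanks and in_frame(exons + [t])]


print(amenable_skips([52]))  # [51, 53]
```

With these assumed lengths, an exon 52 deletion comes out amenable to either exon 51 or exon 53 skipping, which agrees with the single case described above.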