Association between de novo variants of nuclear-encoded mitochondrial-related genes and undiagnosed developmental disorder and autism

Summary Background Evidence suggests that mitochondrial abnormalities increase the risk of two neurodevelopmental disorders: undiagnosed developmental disorder (UDD) and autism spectrum disorder (ASD). However, which nuclear-encoded mitochondrial-related genes (NEMGs) were associated with UDD–ASD is unclear. Aim To explore the association between de novo variants (DNVs) of NEMGs and UDD–ASD. Design Comprehensive analysis based on DNVs of NEMGs identified in patients (31 058 UDD probands and 10 318 ASD probands) and 4262 controls. Methods By curating NEMGs and cataloging publicly published DNVs in NEMGs, we compared the frequency of DNVs in cases and controls. We also applied a TADA-denovo model to highlight disease-associated NEMGs and characterized them based on gene intolerance, functional networks and expression patterns. Results Compared with levels in 4262 controls, an excess of protein-truncating variants and deleterious missense variants in 1421 cataloged NEMGs from 41 376 patients (31 058 UDD and 10 318 ASD probands) was observed. Overall, 3.23% of de novo deleterious missense variants and 3.20% of de novo protein-truncating variants contributed to 1.1% and 0.39% of UDD–ASD cases, respectively. We prioritized 130 disease-associated NEMGs and showed distinct expression patterns in the developing human brain. Disease-associated NEMGs expression was enriched in both excitatory and inhibitory neuronal lineages from the developing human cortex. Conclusions Rare genetic alterations of disease-associated NEMGs may play a role in UDD–ASD development and lay the groundwork for a better understanding of the biology of UDD–ASD.


Introduction
Undiagnosed developmental disorder (UDD) and autism spectrum disorder (ASD) have considerable overlap and encompass a diverse range of conditions characterized by impaired development of the central nervous system. 1 Their etiology involves pathophysiologic abnormalities in gene-sensitive individuals, with mitochondrial dysfunction possibly affecting a significant subset of children with UDD-ASD. 2,3However, the driving mechanisms of UDD-ASD with mitochondrial dysfunction are still not fully understood.The pathophysiological conditions of mitochondrial dysfunction may be caused by defects in mitochondrial proteins encoded by NEMGs or by changes in mitochondrial import mechanisms.
A previous study has reported that polymorphisms in NEMGs, such as SLC25A12 and PARK2, are associated with ASD. 4 Furthermore, a heterozygous variant in SCN2A (þ/p.R607 � ) was found to reduce synaptic activity, which in turn affects mitochondrial function and results in ATP and energy storage dysfunction. 5he NEMG VARS2 encodes a mitochondrial aminoacyl-transfer RNA synthetase that catalyzes the attachment of valine to transfer RNA for mitochondrial translation.VARS2 mutations are reported to cause combined oxidative phosphorylation (OXPHOS) deficiency and early-onset mitochondrial encephalopathies. 6More recently, Replogle et al. 7 reported that knockdown of transcriptional regulation factor (TEFM) and RNA degradation (PNPT1) lead to significant changes in the mitochondrial genome.These findings suggest that interference with the cellular mitochondrial crosstalk process can lead to impaired mitochondrial ATP production, leading to or contributing to the development of the disease.Therefore, dysfunctional homeostasis of cell mitochondria may involve nucleusmitochondria crosstalk caused by genetic variants.
Although some genetic NEMG variants have been described in patients with neurodevelopmental disorder and mitochondrial dysfunction, correlation studies have been inconclusive, likely due to clinical and genetic heterogeneity between study groups and small sample sizes. 8Advances in whole-exome and wholegenome sequencing of large cohorts have increased the recognition of NEMG mutations in the human genome.Meanwhile, de novo variants (DNVs) with relatively stronger pathogenic effects are well-established causes of ASD and UDD. 9 Accordingly, whether the DNVs in NEMGs are associated with UDD-ASD pathology remains unclear.
In this study, we compared coding DNVs in NEMGs between cases and controls and investigated the relationship between DNVs in NEMGs and genetically complex UDD-ASD.Furthermore, by performing integrative analysis, we prioritized potential diseaseassociated NEMGs that may contribute to elucidating the neurobiology of UDD-ASD (Figure 1).The study findings are expected to assist with patient stratification in clinical research and further aid in the development of more adequate treatment strategies for UDD-ASD.

Data collection and annotation
We manually curated NEMGs (Supplementary Table S1) from two mainstream mitochondrial databases: MitoCarta3.0 10 and MitoMiner. 11MitoCarta 3.0, a comprehensive inventory consisting of 1136 human genes that encode proteins with detailed information on mitochondrial localization, sub-mitochondrial compartments and the assignment of genes to a customized ontology comprising many mitochondrial pathways (such as OXPHOS, mitochondrial apoptosis, mitochondrial autophagy and mitochondrial biosynthesis).We also collected a list of 1329 genes based on the 'Integrated Mitochondrial Protein Index' for known mitochondrial proteins available through MitoMiner (IMPI-2020-Q3pre).MitoMiner serves as a robust platform for investigating mitochondrial localization, providing an exceptional amalgamation of experimental sub-cellular localization datasets, tissue expression profiles, predictions of mitochondrial targeting sequences, comprehensive gene annotation and valuable links to phenotype and disease associations.Mitochondrial genes in these groups were excluded to form 1421 unique NEMGs (Supplementary Table S1).
All DNVs identified in patients with UDD-ASD and controls were downloaded from the Gene4Denovo database (http://www.genemed.tech/gene4denovo/home).The number of samples and DNVs from each cohort and other information on the cases and controls are listed in Supplementary Table S2.Only exonic DNVs in the NEMGs (Supplementary Table S3) were used for subsequent analysis.All variants were annotated with ANNOVAR. 12Proteintruncating variants (PTVs) consisting of frameshift, stop-gain/loss and canonical splice site disruptions and deleterious missense variants (Dmis) were considered 'damaging DNVs' (dDNVs).

Burden and contribution analysis of PTVs and Dmis in NEMGs
The denovolyzeR 13 tool was employed with default parameters to evaluate the enrichment for three variant classes (synonymous variant, missense variant and PTV), aiming to determine whether NEMGs exhibited a higher number of DNVs among patients with UDD-ASD compared to what would be expected by chance.This R package calculates the enrichment value by dividing the number of observed DNVs by the expected number of DNVs.Given that the distribution of DNVs follows a Poisson distribution, the expected number of DNVs for each variant class in the case/control groups was determined by the Poisson distribution and the population size.The P-values are calculated as the tail-probability of the expected distribution, derived either through Poisson approximations or simulations, under the null hypothesis of no association between each variant class and disease status.Compared with NEMGs, burden analysis of de novo PTVs and missense variants in five gene sets (Supplementary Table S4) were also evaluated in UDD-ASD, UDD and ASD.Given that the sample size of the case and control groups differed greatly, we performed 10 000 downsamplings to randomly select cases with the same control sample size; then, we used the permutation test to report the empirical P-values.In addition, we used 'ascertainment differentials', 14 which are defined as the difference in the frequency of a specified variant type between the case and control groups, to estimate the contribution of NEMGs PTVs and Dmis to UDD-ASD patients.

Prioritizing disease-associated NEMGs using a statistical model
Previous exome analyses have shown that even in unrelated individuals, the number of mutations carried in the same gene can provide considerable statistical power for the establishment of associations. 9Therefore, we used the transmitted and de novo association (TADA) 15 test to prioritize genes associated with UDD-ASD.Under the null hypothesis, the TADA-denovo model generates random mutational data based on the background DNV rate of each gene of PTVs and Dmis and then calculates a P-values and a false discovery rate (FDR).The TADA-denovo model requires the following main parameters: the background DNV rate per gene of PTVs/Dmis and the number of disease risk genes of UDD-ASD, as described by He et al. 15 for estimators.For each disease-associated NEMG, we used the loss-of-function observed/expected upper bound fraction (LOEUF) score and the missense constraint scores (mis_Z scores) 16

Hub genes identification
We visualized the PPI network using Cytoscape v3.7.2 (https:// cytoscape.org/)and used the CytoHubba plug-in to identify the top 10 hub genes by 12

Permutation test
High-confidence UDD-ASD risk genes were extracted from Kaplanis et al., 17 Satterstrom et al., 9 Wilfert et al. 18 and the SFARI database (genes with confidence levels of 1 and 2; https://www.sfari.org/),with a total of 627 genes being identified.We used PPI data from STRING to investigate whether the connections among the disease-associated NEMGs were statistically significant beyond random expectation for the 627 previously identified genes and three gene sets.PPI with a confidence score >600 in STRING was considered to be connected.The empirical P-values were calculated using 1 000 000 permutation tests.

Expression pattern analysis of genes in the human brain
We used co-expression module data from Parikshak et al. 19 to analyze the BrainSpan developmental RNA-seq data (www.brainspan.org).A total of 17 co-expression modules previously reported to be associated with the prenatal and postnatal cerebral cortex (excluding the M7 module, which is the gray module housing all non-co-expressed genes) were analyzed to determine whether each module was significantly enriched within each gene set (NEMG ASD , NEMG UDD , NEMG ASD-UDD and NEMG TADA-meta ) using Fisher's exact test and P-values.Enrichment of all gene sets was used in R (R version 4.0.2):fisher:test ðmatrix ðc ðx1; x2; x3; x4Þ; alternative ¼ € two:sided € ÞÞ; where x1 is the number of each gene set expressed in the module, x2 is the number of each gene set not expressed in the module, x3 is the total number of genes expressed in the module minus x1 and x4 is the number of background genes minus x1, x2 and x3.In addition, a Wilcoxon signed-rank test was performed to compare the expression distribution of NEMG ASD , NEMG UDD and NEMG TADA-meta between the prenatal and postnatal periods using the BrainSpan developmental RNA-seq data.Gene ontology (GO) enrichment analysis was subsequently conducted using Metascape 20 to understand the biological significance of the NEMG ASD and NEMG UDD gene sets.Only the terms with a minimum count of three, P adjust value <0.05, and enrichment coefficient >1.5 were significant.
To evaluate the enrichment of specific cell types in the developing human cortex, Satterstrom et al. 9 used data from Nowakowski et al., 21 which were divided into 25 cell-type clusters by t-distributed stochastic neighbor embedding analysis.Six celltype clusters were excluded because they could not be associated with the cell types.To assess cell-type enrichment in the developing human cortex, we collated these re-analyzed results in subsequent analysis.We performed an enrichment analysis to determine whether each cell cluster of the developing human cerebral cortex was significantly enriched within each gene set (NEMG ASD-UDD , NEMG ASD , NEMG UDD and NEMG TADA-meta ).

High DNV burden in NEMGs is associated with UDD-ASD
We curated 1421 NEMGs from two sources and excluded 13 genes encoded by mtDNA (Supplementary Table S1).We analyzed DNVs in the coding region of UDD-ASD cases (31 058 and 10 318 cases with the primary diagnosis of UDD and ASD, respectively) and 4262 controls and reannotated the information with the same pipeline as that used for our previous study. 22y comparing the proportion of individuals carrying DNVs in the UDD-ASD and control groups, we observed a statistically significant enrichment of PTVs and missense variants of NEMGs (PTVþmissense, Observed/Expected No enrichment in the UDD-ASD cohorts was observed regarding synonymous variants (O/E ¼ 0.97, P ¼ 0.78).No significant enrichment of any mutation category was identified in the controls (Table 1).Default parameters were used for denovolyzeR and Dmis was not assessed.A two-tailed Fisher's exact test showed that Dmis were significantly enriched in the UDD-ASD cohort (odds ratio [OR] ¼ 1.88, P ¼ 0.00066).This burden of PTVs and missense variants persisted after dividing into UDD and ASD cohort, but UDD was more significant than ASD (Table 1).Compared to five reasonable sets of genes that may play a role in UDD_ASD, the NEMG variants exhibited the lowest enrichment of de novo PTV and missense variants (Supplementary Table S4).In addition, we performed 10 000 downsampled permutation tests to overcome bias resulting from the larger sample size for UDD-ASD than that for the controls (PTV, 0.0014 � P permutation �0.0090; missense, P permutation �10 −3 ; synonymous, 0.16 � P permutation �0.35; Supplementary Table S5).Further studies based on larger control sample sizes should be conducted to validate our conclusions.Overall, these findings suggest that the increased dDNV burden in NEMGs may contribute to UDD-ASD risk.

NEMGs dDNVs contribute to UDD-ASD risk in �1.49% of cases
Because NEMG dDNVs were significantly more frequent in patients with UDD-ASD than in controls, we estimated the percentage of these variants that were associated with UDD-ASD risk.The rate of PTV was 0.0047 in controls and 0.0086 in probands, yielding an ascertainment differential of 0.0039.Therefore, we estimated that �3.20% ([(0.0086− 0.0047)/ 0.0086 � 1421]/20154) of PTV events in probands contributed to UDD-ASD risk in 0.39% of cases.For Dmis, the rate was 0.013 in controls and 0.024 in probands, with an ascertainment differential of 0.011.We estimated that only �3.23% ([(0.024− 0.013)/ 0.024 � 1421]/20 154) of Dmis events in probands contributed to UDD-ASD risk in 1.1% of cases.In total, NEMGs dDNVs contributed to UDD-ASD risk in �1.49% of cases.
The intolerance gene often has a low LOEUF score and a greater mis_Z score; therefore, PTVs and Dmis occurring within constrained genes are of great interest. 23We found that 25 of the 130 disease-associated NEMGs had LOEUF scores <0.35  (Supplementary Table S7).These results implied that �19% of the 130 identified disease-associated NEMGs were intolerant to dDNVs.In addition, we found that 130 disease-associated NEMGs significantly connected with each of the unshared 627 UDD-ASD risk genes (Supplementary Figure S1) and three gene sets (genes encoded Fragile-X mental retardation protein targets, chromatin genes and genes encoded postsynaptic density proteins) based on the co-expression data from the human brain and PPI data compared with random expectation (Supplementary Figures S2-S4).Furthermore, we identified the top 10 hub genes (PHB, ALDH18A1, MFN2, ATP5F1A, AKT1, ENO1, ECHS1, PDHA1, IDH2 and ATP5F1B) (Supplementary Table S8), which were revealed to be highly interconnected in a PPI network.

Gene sets exhibit distinct expression patterns in the human brain
We evaluated whether gene sets associated with UDD-ASD risk converged on common biological processes.To do so, we used the previously published BrainSpan spatial-temporal transcriptome data of the human brain and identified 22 co-expression modules. 19SD-associated genes were differentially enriched in five coexpression modules during the development process: M2, M3, M13, M16 and M17, as described in detail by Parikshak et al. 19 Genes in M2 and M3 were co-expressed during human cortical development and exhibited significant enrichment of transcriptional and chromatin regulators and genes harboring rare de novo proteintruncating and missense variants.Genes with high connectivity in M13, M16 and M17 were expressed in early cortical development and were enriched for genes encoding synaptic proteins.Notably, ASD-associated genes in M12 were only enriched for M13, M16 and M17.Genes in M16 were more strongly associated with ASD risk genes. 19Subsequently, we assessed whether our identified 130 genes (NEMG ASD , NEMG UDD and NEMG TADA-meta ) were significantly overrepresented within each of the co-expression modules.NEMG TADA-meta genes were most overrepresented in M16 (OR ¼ 2.49, P¼0.028) and M3 (OR ¼ 2.18, P¼0.022) (Supplementary Table S9 and Figure 2A).Separation of the 130 genes according to phenotypic effect (NEMG ASD , NEMG UDD and NEMG ASD-UDD ) revealed a significant enrichment of NEMG UDD and NEMG ASD genes in the M16 module (OR ¼ 3.83, P¼0.0035) (Supplementary Table S9 and Figure 2A) and M12 module (OR ¼ 10.27, P¼0.0046) (Supplementary Table S9 and Figure 2A), respectively.These data provided some evidence to support a slight difference between NEMG UDD and NEMG ASD genes.Significant enrichment was observed NEMG ASD- UDD in M3 module (OR ¼ 11.75, P ¼ 0.022) (Supplementary Table S9 and Figure 2A).NEMG UDD and NEMG ASD genes were enriched in different co-expression networks and showed different temporal expression patterns.An unbiased and systematic GO enrichment analysis further indicated that NEMG UDD and NEMG ASD genes are involved in processes related to the mitochondrial membrane, mitochondrial matrix, small molecule catabolic process and mitochondrion organization, whereas NEMG ASD genes contributed mainly to basic and general functions, such as brain development, monoatomic cation homeostasis, positive regulation of DNA metabolic process and ATP-dependent activity (top-ranked GO terms) (Supplementary Table S10).
Next, we used a Wilcoxon signed-rank test to assess the relative pre-and postnatal expression bias for the gene sets.NEMG TADA-meta ,  S5).

Discussion
A critical challenge in UDD-ASD research includes the identification of disease-causing genes and a better understanding of their underlying molecular mechanisms.NEMGs act on mitochondrial function and affect gene expression and physiological regulation in the nucleus.Therefore, establishing abnormal regulation of NEMG as a source of molecular dysfunction contributing to UDD-ASD and further identifying specific pathogenic targets are key steps.By examining NEMGs harboring DNVs, we found significant enrichment of PTVs and Dmis in patients with UDD-ASD relative to controls, with ascertainment differentials of 0.39% and 1.1% per affected child, respectively.This is a conservative estimate of the role of DNVs because we have not yet ascertained copy number variants or variants in non-coding regions.
Some of the top 20 NEMGs identified in this study exhibit concurrence with previously published findings.DNMT3A encodes a DNA methyltransferase and localizes in mitochondria to maintain CpH methylation in neurons in vivo 24 and support the occurrence of mtDNA methylation.Knockout of DNMT3A disturbs regional mtDNA methylation patterns and alters the expression of mitochondrial genes (MT-ND2 and MT-CO1), and the oxygen respiration process. 25Although DNMT3A is not specifically targeted to the mitochondria, it is identified in this organelle, indicating that it is a likely epigenetic mechanism for mtDNA methylation.SHP2, a non-receptor protein tyrosine phosphatase encoded by the PTPN11 gene in humans, is predominantly localized within the mitochondria intercristae/intermembrane space, while also present in the cytoplasm and nucleus. 26Gain-offunction mutations in PTPN11 have been implicated in neurodevelopmental processes related to human brain function, memory formation and attention regulation. 27A mutation in the X-linked pyruvate dehydrogenase (PDH) alpha subunit gene (PDHA1) results in a deficiency of the majority PDH complex, which affects mitochondrial function and may contribute to brain dysfunction in autism. 28,29A study has implicated DNM1L variants in causing impaired mitochondrial function and a reduced response to stress in patients with severe neurological dysfunction. 30Hypusinated eIF5A promotes the efficient expression of a subset of mitochondrial proteins involved in the TCA cycle and OXPHOS.De novo heterozygous variants in EIF5A result in a disorder characterized by varying combinations of developmental delay, microcephaly, micrognathia and dysmorphism. 31Clinical manifestations of developmental disorders are caused by mutations or disruptions in the POLG gene, which is responsible for encoding the catalytic subunit of mitochondrial DNA polymerase gamma. 32Although the association between the genes in Table 2 and mitochondrial functions in ASD and UDD remains largely unexplored, these few examples highlight the enormous potential of these molecules as messengers in the crosstalk between the nucleus and the mitochondria.Although genes we identified to harbor dDNVs herein showed statistical significance in the TADA-denovo model, these represent only potential candidates for future investigations, as opposed to true disease-causing genes.The two hub NEMGs (ATP5F1A and ATP5F1B) (Supplementary Table S9) both encode a subunit of the mitochondrial ATP synthase, which is responsible for >90% of the cellular ATP synthesis.Among heterozygous ATP5F1A-related mitochondrial disorders is reversible neonatal mitochondrial disorder (OMIM #500009), which is caused by mutations in the mitochondrial gene MT-TE. 33ATP5F1B was previously identified in a study that explored ultra-rare inherited variants implicated in ASD. 18Moreover, we predicted MT-ATP6 to be a highly probable functional partner of ATP5F1B.Further studies are needed to understand the role of the nucleus-mitochondria-crosstalk signaling for UDD-ASD.
Lastly, we performed a detailed analysis of NEMGs that predominantly conferred a risk for UDD-ASD, as biological function analyses revealed a clear difference between the two sets of genes.The small effect of de novo mutations in NEMG ASD relative to NEMG UDD may indicate that individuals carrying dDNVs in NEMGs show a broad phenotype of UDD-ASD, without all core symptoms of ASD.This distinction has important implications for researchers because it suggests that determining the impact of these NEMGs could contribute to the classification of subtypes of UDD-ASD and tailor new methods for intervening in patients with this specific mitochondrial dysfunction.Larger cohorts will be needed in the future to validate our results and explain the differences in functional mechanisms through wet experiments.Single-cell gene expression data from the developing human cortex further showed that endothelial cells, as well as mature neurons from both excitatory and inhibitory lineages, may be implicated in ASD risk.In particular, a significant enrichment of NEMG TADA-meta, NEMG UDD and NEMG ASD-UDD was observed in brain endothelial cells.Segarra et al. 34 reported that communication between endothelial and glial cells is responsible for coordinating neuronal migration and the blood-brain barrier establishment in mice.Moreover, brain endothelial cells can directly regulate the synaptic plasticity of hippocampal neurons by secreting Sema3G and participating in brain cognition. 35owever, although the endothelium plays an essential role in brain function, it remains unclear how signals from NEMG UDD in brain endothelial cells orchestrate communication among vessels, glial cells and neurons, and how specific changes in the molecular characteristics of endothelial cells affect processes, such as central nervous system vascularization, extracellular matrix composition, neuroglial cytoarchitecture and blood-brain barrier development.
There are several limitations of this study.Given the considerably smaller size of the control sample, although we were able to successfully replicate the same outcomes in the burden analysis step using a downsized permutation test, future investigations must employ larger control sample sizes to substantiate our findings.We acknowledge that the number of NEMGs from databases used here may still need to be accurately and comprehensively defined, but they encompass a wealth of gene information related to sub-mitochondrial localization or ontologically linked to each category or pathway.Despite expanding the set of known associations between NEMGs and UDD-ASD, further studies are needed to understand the role of the nucleus-mitochondriacrosstalk signaling for UDD-ASD.Our analysis relied on bioinformatics methodologies to prioritize potential disease-associated genes and cannot definitively establish causality.While we chose to focus on 130 disease-associated NEMGs (FDR<0.2),some of these may not be truly causal, whereas other (FDR>0.2) associations may be causal.
In conclusion, based on dDNVs in NEMGs, we prioritized 130 UDD-ASD-associated NEMGs that showed the potential to shed light on the role of mitochondria in UDD-ASD pathogenesis.NEMGs involved in mitochondrial metabolism are candidates' worthy of further research concerning UDD-ASD.If NEMG mutation in UDD-ASD creates an aberrant neuronal 'wiring diagram' in fetal brain development, this may be difficult to repair.However, if NEMG mutation in a UDD-ASD subtype results in mild mitochondrial inhibition, NEMGs may represent a viable option and promising therapeutic intervention target for metabolic therapies in some patients.

Figure 1 .
Figure 1.Flow chart of this study.
to determine the intolerance to PTVs and Dmis, respectively.Low LOEUF scores at a cut-off value of 0.35 indicate greater intolerance to function in a gene.Mis_Z positive scores indicate more constraints, and viceversa.A high Z-score indicates greater intolerance to missense variants.

Table 2 .
The top 20 significant disease-associated NEMGs arising from the TADA-denovo model of prioritization , number of protein-truncating variants (PTVs) consists of frameshift, stop-gain, stop-loss and canonical splice site disruption in the case; nDmis, number of deleterious variations (Dmis) in the case; FDR, FDR from TADA-denovo analysis. nPTV