Data-driven multimodal fusion: approaches and applications in psychiatric research

Abstract In the era of big data, where vast amounts of information are being generated and collected at an unprecedented rate, there is a pressing demand for innovative data-driven multimodal fusion methods. These methods aim to integrate diverse neuroimaging perspectives to extract meaningful insights and attain a more comprehensive understanding of complex psychiatric disorders. Analyzing each modality separately may reveal only partial insights or miss important correlations between different types of data. This is where data-driven multimodal fusion techniques come into play. By combining information from multiple modalities in a synergistic manner, these methods enable us to uncover hidden patterns and relationships that would otherwise remain unnoticed. In this paper, we present an extensive overview of data-driven multimodal fusion approaches with or without prior information, with specific emphasis on canonical correlation analysis and independent component analysis. The applications of such fusion methods are wide-ranging and allow us to incorporate multiple factors such as genetics, environment, cognition, and treatment outcomes across various brain disorders. After summarizing the diverse neuropsychiatric magnetic resonance imaging fusion applications, we further discuss emerging trends in big-data neuroimaging analysis, such as N-way multimodal fusion, deep learning approaches, and clinical translation. Overall, multimodal fusion emerges as an imperative approach providing valuable insights into the underlying neural basis of mental disorders, which can uncover subtle abnormalities or potential biomarkers that may benefit targeted treatments and personalized medical interventions.


Introduction
The escalating prevalence of psychiatric disorders has imposed a substantial economic burden on society (Ferrari et al. 2022), particularly exacerbated by the impact of the COVID-19 pandemic. Compelling evidence has suggested that the presence of psychiatric disorders is associated with altered brain function and structure, and magnetic resonance imaging (MRI) has emerged as a non-invasive technique with significant promise for investigating brain changes. Currently, collecting multiple types of non-invasive brain imaging data from the same individual has become a common practice, aiming to identify potentially stable task- or disease-related changes and thus to improve the translation of research findings into clinical practice. Each imaging technique provides a unique perspective of brain function or structure, such as functional MRI (fMRI) for the hemodynamic response related to neural activity in the brain, electroencephalography (EEG) for electrical activity with higher temporal but lower spatial resolution than fMRI, structural MRI (sMRI) for brain tissue type, as well as diffusion MRI (dMRI) for tissue microstructure and brain connectivity. Although separate analysis of each data modality can provide important insights into the brain structural or functional integrity associated with physiological or behavioral features, there is increasing evidence that multimodal brain imaging can offer a better understanding of inter-subject variability, from how brain structure shapes brain function, to what degree brain function feeds back to change its structure, and what functional or structural aspects of physiology ultimately drive cognition and behavior (Sui, Adali, et al., 2012; Sui, Huster, et al., 2014). Consequently, a key motivation for jointly analyzing multimodal data is to leverage the cross-information in the existing data, thereby potentially revealing important variations that may only be partially detected by a single modality.
The availability of multimodal brain imaging allows for joint analysis via the application of various data fusion approaches (Calhoun et al., 2016), including (i) visual inspection, which basically infers the multimodal information by separately visualizing results from essentially unimodal analyses; (ii) data integration, which analyzes each data type separately and overlays the results, thereby not allowing for an examination of interactions among data types (Arndt, 1996; Savopol et al., 2002); (iii) asymmetric fusion, using one dataset to constrain another, such as dMRI being constrained by sMRI or fMRI data, which may impose potentially unrealistic assumptions on the constrained data (Goldberg-Zimring et al., 2005; Abramian et al., 2021; Behjat et al., 2021); and (iv) symmetric fusion, which treats multiple image types equally, taking full advantage of the joint information in multiple datasets and providing more views of individual subjects and the co-variation between modalities (Sui, Adali, et al., 2012; Sui, Huster, et al., 2014). Symmetric fusion approaches can be broadly classified as either model-based or data-driven. Model-based approaches, such as multiple linear regression, dynamic causal modeling, and structural equation modeling, examine the goodness-of-fit of the data to prior knowledge about the experimental paradigm and the properties of the data. Despite being widely used in biomedical data analysis, model-based approaches are limited when the dynamics of the experiment become hard to model. In contrast, data-driven methods are suitable for the analysis of such complex paradigms because they minimize assumptions on the underlying properties of the data by decomposing the observed data based on a generative model. Data-driven approaches include, but are not limited to, principal component analysis (PCA), independent component analysis (ICA), and canonical correlation analysis (CCA). These methods belong to the family of blind source separation approaches, as they do not require prior hypotheses about the connection of interest; hence, they are attractive for exploration of the full body of data. We have developed several data fusion approaches based on ICA and CCA (Sui, Adali, et al., 2012; Qi, Calhoun, et al., 2018), and applied them to unravel intricate relationships among genetics, brain imaging, and behavior, aiming to elucidate the complex neural mechanisms underpinning various psychiatric disorders and to facilitate personalized clinical interventions.
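As a minimal illustration of the blind source separation idea behind these methods, the following Python sketch (all data synthetic, all names illustrative) mixes two non-Gaussian sources with an unknown mixing matrix and lets FastICA recover them without any prior hypothesis about the mixing:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
# Two synthetic non-Gaussian sources: a square wave and Laplacian noise.
S = np.column_stack([np.sign(np.sin(3 * t)), rng.laplace(size=t.size)])
A = np.array([[1.0, 0.5], [0.5, 2.0]])   # "unknown" mixing matrix
X = S @ A.T                               # the observed mixtures

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
# Up to sign and permutation, each recovered source should match one
# of the true sources closely.
corr = np.abs(np.corrcoef(S.T, S_hat.T))[:2, 2:]
print(np.round(corr.max(axis=1), 2))
```

The same generative view, X = AS with A a mixing matrix and S latent sources, underlies all the fusion models reviewed below.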
In this paper, we first provide some basic motivation regarding the benefits of data-driven multimodal fusion and introduce some basic terminology for characterizing multimodal data analysis. Next, we provide a summary of multivariate approaches for multimodal data fusion, with an emphasis on ICA- or CCA-based methods. Following this, we review existing studies that have used multimodal fusion approaches to investigate psychiatric disorders, and whenever possible, the behavioral relevance of the assessed physiological features is mentioned. Finally, we discuss some emerging trends and approaches.

Data-driven fusion approaches using multimodal MRI
Different brain imaging data types are intrinsically dissimilar, making it difficult to analyze them together without making several assumptions. Instead of directly analyzing the entire datasets together, an alternative approach is to reduce each modality to a feature, a distilled dataset representing the interesting part of each modality (Calhoun et al., 2009), such as the fractional amplitude of low-frequency fluctuations (fALFF) from fMRI, fractional anisotropy (FA) from dMRI, or segmented gray matter (GM) from sMRI. This provides a natural way to discover multimodal associations and also alleviates the difficulty of fusing data types of different dimensionality and nature, as well as those that have not been recorded simultaneously. The trade-off is that some information may be lost; e.g. GM does not directly measure volume or cortical thickness, and FA does not provide directional information. Nevertheless, there is considerable evidence supporting the usefulness and validity of feature-level analysis (Smith et al., 2009). By contrast, emerging fusion approaches have been developed to directly handle first-level 4D fMRI data, extracting individual variations from high-dimensional raw images by integrating multi-level PCA and subject-level back-reconstruction techniques (Du et al., 2013; Qi et al., 2019).
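As a concrete example of such feature extraction, fALFF for a single voxel can be computed as the ratio of low-frequency amplitude to total amplitude of the fMRI time series. A minimal sketch with synthetic data (the 0.01-0.08 Hz band is the conventional choice; the repetition time `tr` and series length are illustrative):

```python
import numpy as np

def falff(ts, tr=2.0, low=0.01, high=0.08):
    """Fractional ALFF: amplitude in the low-frequency band divided by
    the amplitude over the whole frequency range (DC term excluded)."""
    freqs = np.fft.rfftfreq(len(ts), d=tr)
    amp = np.abs(np.fft.rfft(ts - ts.mean()))
    band = (freqs >= low) & (freqs <= high)
    return amp[band].sum() / amp[1:].sum()

rng = np.random.default_rng(0)
ts = rng.standard_normal(240)      # one voxel, 240 volumes at TR = 2 s
print(round(falff(ts), 3))         # a value between 0 and 1
```

Applying this voxel-wise over a 4D scan yields one feature map per subject, the typical input to the feature-level fusion methods below.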

Review of multivariate, multimodal fusion models
In the following sections, we introduce several blind fusion models and semi-blind models, and discuss their characteristics in combining multimodal neuroimaging.

Joint ICA
jICA is one of the ICA-based data fusion methods that jointly analyze multiple datasets by concatenating them along a certain dimension (Calhoun, Adali, Kiehl, et al., 2006). jICA is based on the assumption that two or more features share the same mixing matrix and order as well as equal contributions, and it maximizes the independence among the joint components. jICA is feasible for many paired combinations of features, such as fALFF, GM, and FA, as well as N-way data fusion (Calhoun, Adali and Liu, 2006; Franco et al., 2008; Xu et al., 2009).
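The core computation can be sketched in a few lines: scale each feature matrix, concatenate along the voxel dimension, and run a single spatial ICA so that both modalities share one mixing matrix. A toy sketch with synthetic data (real jICA implementations add normalization and order-estimation details omitted here):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_subj, v1, v2, k = 50, 300, 200, 4

# Synthetic non-Gaussian joint sources and one shared mixing matrix.
S_true = rng.laplace(size=(k, v1 + v2))
A_true = rng.standard_normal((n_subj, k))
X1, X2 = np.split(A_true @ S_true, [v1], axis=1)   # two "modalities"

# jICA: scale each modality, concatenate features, run one spatial ICA.
X_joint = np.hstack([X1 / X1.std(), X2 / X2.std()])
ica = FastICA(n_components=k, random_state=0, max_iter=1000)
S = ica.fit_transform(X_joint.T).T   # k x (v1+v2) joint independent maps
A = ica.mixing_                      # n_subj x k shared mixing matrix
S1, S2 = S[:, :v1], S[:, v1:]        # split back into per-modality maps
print(A.shape, S1.shape, S2.shape)
```

Rows of the shared mixing matrix A give each subject's loading on every joint component, which is what group comparisons then test.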

Multilink jICA
Instead of reducing fMRI into a single map or a single intrinsic connectivity network like the default mode network, Khalilullah et al. proposed ml-jICA to fuse GM and multiple resting-state fMRI networks, such as intrinsic connectivity networks, using the same core algorithm as jICA (Khalilullah et al., 2023). However, jICA assumes similar distributions among different modalities, and GM maps have a very different distribution from intrinsic connectivity networks derived from ICA, which are already maximally independent (Du et al., 2020). Therefore, they further proposed pml-jICA (Khalilullah et al., 2023), which allows for a shared mixing matrix for both the sMRI and fMRI modalities, while allowing for different mixing matrices linking the sMRI data to the different intrinsic connectivity networks.

Figure 1: Overview of the current popular multimodal fusion approaches. The fusion models with model-driven (green), unsupervised (blue), and semi-supervised (orange) data-driven learning are listed in different categories, in which the models that can deal with 4D fMRI data are highlighted in yellow.

Disjoint subspace analysis using ICA
The assumption of the same mixing matrix across all modalities in jICA can be a very strong constraint, especially with more than two modalities. DS-ICA was introduced to identify and split all modalities into common and distinct subspaces, and to perform separate analyses within those subspaces. The order of the common and distinct subspaces is determined by consecutive steps of PCA and CCA. Given the order, the common subspace across all modalities is decomposed with jICA, whereas separate ICAs are used in the distinct subspaces (Adali et al., 2018; Akhonda et al., 2019).

Joint connectivity matrix ICA
Instead of being based on selected regions of interest, cmICA decomposes a voxel-wise brain connectivity matrix using ICA, generating maximally independent spatial sources and their corresponding whole-brain connectivity maps (Wu et al., 2015). Incorporating the principles of jICA, Wu et al. proposed joint cmICA (Wu et al., 2023), a data-driven parcellation and automated linking of voxel-wise structural connectivity and functional connectivity information from whole-brain fMRI and dMRI without the need for a prior atlas. Joint cmICA can automatically extract connectivity-based cortical sources that are shared between functional connectivity and structural connectivity, providing more flexibility in estimating sources and connectivity maps.

Multimodal CCA
mCCA allows a different mixing matrix for each modality and is used to find the linear combinations of variables in each dataset that maximize the inter-subject covariations across datasets, generating a set of components and their corresponding mixing profiles, called canonical variates (CVs) (Correa et al., 2008). After decomposition, the CVs correlate with each other only at matching indices, and the corresponding correlation values are called canonical correlation coefficients. Compared to jICA, which constrains features to have the same mixing matrix, mCCA is flexible in that it allows common as well as distinct levels of connection between two features, but the associated source maps may not be spatially sparse, especially when the canonical correlation coefficients are not sufficiently distinct (Sui et al., 2010). mCCA is invariant to differences in the range of the data types and can be used to jointly analyze very diverse data types. It can also be extended to multi-set CCA to incorporate more than two modalities (Li et al., 2009). Note that mCCA works on second-level fMRI features, whereas multiset CCA can also work with 4D raw fMRI data (Correa et al., 2010).

mCCA + jICA
Considering previous findings on multiple modalities (Rykhlevskaia et al., 2008; Camara et al., 2010), it is plausible to assume that the components decomposed from each modality have some degree of correlation between their mixing profiles among participants. mCCA + jICA is a blind data-driven model optimized for this situation (Sui et al., 2011; Sui et al., 2013) that achieves excellent performance for both flexible modal association and source separation. It takes advantage of two complementary approaches, mCCA and jICA, allowing for both strong and weak connections as well as joint independent components. The mCCA step enhances the reliability of jICA by providing a closer initial match via correlation, while the jICA step further decomposes the remaining mixtures in the associated maps and relaxes the requirement of sufficient distinction imposed on the canonical correlations. Note that the mCCA + jICA approach does not increase the computational load appreciably and is not limited to two-way fusion, but can potentially be extended to three-way or N-way fusion of multiple data types by replacing mCCA with multi-set CCA (Li et al., 2009). It enables robust identification of correspondence among N diverse data types and facilitates investigations into whether certain disease risk factors are shared or distinct across multiple modalities.

mCCAR + jICA
While maintaining the performance of mCCA + jICA, we hope to optimize specific subject-level correlations with a measure of interest, e.g. a cognitive/behavioral score, disease symptom, or genetic variant. Therefore, a supervised, goal-directed model that uses prior information as a reference to guide multimodal data fusion becomes a natural option. To address this, a fusion-with-reference model called "multi-site CCA with reference + joint independent component analysis" (mCCAR + jICA) was proposed (Qi, Calhoun, et al., 2018), which can identify co-varying multimodal imaging patterns associated with the reference with higher estimation precision. mCCAR + jICA consists of two steps. First, mCCAR is implemented by imposing an additional constraint to maximize not only the covariations among the mixing matrices of each modality, but also the column-wise correlations between the mixing matrices of each modality and the reference signals, resulting in potential target components that are correlated with the reference signals in each modality, as well as being most correlated across participants between modalities. jICA is then applied on the concatenated components to keep the modality linkage of the potential target components and maximize the spatial independence, generating the final independent components as well as their corresponding mixing matrices. By incorporating prior information, mCCAR + jICA enables the identification of joint multimodal components that have robust correlations with the referred measures and among themselves (inter-modality correlations) (Qi, Yang, et al., 2018; Zhi et al., 2020; Qi et al., 2021; Xu et al., 2022), which may not be detected by a blind N-way multimodal fusion approach.

PM-SCCA
PM-SCCA is another CCA-based feature-level fusion approach for fusing a vast number of genetic markers, such as single-nucleotide polymorphisms (SNPs), with multimodal quantitative traits, including imaging and cognitive measures. The method takes prior knowledge, encoded as a preference matrix, into a simplified version of sparse CCA to regularize the magnitude of the elements of the canonical weight vectors (Sha et al., 2023). Therefore, the proposed PM-SCCA model can not only capture multi-SNP-multi-quantitative-trait associations, but can also effectively select relevant genetic and phenotypic features.

Linked ICA
Linked ICA is a probabilistic approach based on a modular Bayesian framework, which is designed for simultaneously modeling and discovering common characteristics across multiple modalities (Groves et al., 2011). The combined modalities can potentially have completely different units, noise levels, spatial smoothness, and intensity distributions. In linked ICA, each modality is modeled using Bayesian tensor ICA (Beckmann et al., 2005), which differs from traditional ICA methods such as FastICA (Hyvärinen et al., 2000) and Infomax (Bell et al., 1997) in that it incorporates dimensionality reduction into the ICA itself through automatic relevance determination. Linked ICA can automatically determine the optimal weighting for each modality, and can also detect single-modality structured components when present.

Big-data linked ICA
Nevertheless, the linked ICA approach encounters computational challenges when dealing with multimodal, high-dimensional, large-sample-size datasets, especially with the release of international large datasets such as the UK Biobank (Sudlow et al., 2015), the ABCD study (Casey et al., 2018), and the HCP (Van Essen et al., 2013). Consequently, BigFLICA was proposed by integrating MELODIC's incremental group PCA to capture modes with even small variations within each modality (Smith et al., 2014) and online dictionary learning (Mairal et al., 2010) to reduce the dimension of the feature (e.g. voxel) space, which can capture both local and distant spatial correlation structures (Gong et al., 2021). BigFLICA can both preserve key information in the original data and reduce the effects of stochastic domain-specific noise, as well as increase the computational efficiency of the linked ICA algorithm for extremely large population datasets.
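The streaming dimension-reduction idea can be sketched with scikit-learn stand-ins: IncrementalPCA plays the role of MELODIC's incremental group PCA (subject maps arrive in batches), and MiniBatchDictionaryLearning the role of online dictionary learning over the feature space. Sizes and data are synthetic, and the real pipeline differs in many details:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA, MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
n_subj, n_vox, batch = 200, 500, 50

# Stand-in for incremental group PCA: stream subject maps in batches
# rather than loading the whole population at once.
ipca = IncrementalPCA(n_components=20, batch_size=batch)
for start in range(0, n_subj, batch):
    maps = rng.standard_normal((batch, n_vox))   # one batch of subject maps
    ipca.partial_fit(maps)
modes = ipca.components_                          # 20 x n_vox group modes

# Stand-in for online dictionary learning: re-express each mode on a
# sparse dictionary, shrinking the representation to the code length.
dico = MiniBatchDictionaryLearning(n_components=10, batch_size=32, random_state=0)
codes = dico.fit_transform(modes)                 # 20 x 10 sparse codes
print(modes.shape, codes.shape)
```

Because neither step ever holds the full subjects-by-voxels matrix in memory, the same pattern scales to biobank-sized inputs, which is the point of the BigFLICA design.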

Semi-supervised BigFLICA
Both linked ICA and BigFLICA are purely unsupervised learning methods that do not use prior information, such as non-imaging-derived phenotypes. Hence, Gong et al. introduced a semi-supervised, multimodal, and multi-task fusion approach for imaging-derived phenotype (IDP) discovery, termed semi-supervised BigFLICA (SuperBigFLICA) (Gong et al., 2022), which uses external phenotype information to guide the identification of relevant multimodal brain networks associated with the phenotype of interest.

Parallel ICA
pICA is another ICA-based feature-level fusion approach that can process multiple modalities simultaneously (Liu et al., 2009), uncovering the independent components of each modality and the relations among them. The pICA algorithm maximizes a cost function based on both an entropy term and a correlation term, implemented by identifying the maximally independent components within each dataset individually. Compared to jICA, with its strong constraints of a common mixing matrix and order across all modalities, pICA provides a flexible framework to combine multiple data types with different ranges and properties, such as different neuroimaging, genetic, and phenotypic data. Two- and three-way pICA have been implemented to identify links among genetics, brain structure, and brain function (Vergara et al., 2014; Pearlson et al., 2015). pICA has demonstrated superior efficacy in investigating imaging-genetic associations, and the findings provide proof of concept that genomic SNP factors can be investigated by using phenotypic imaging findings in a multivariate format (Pearlson et al., 2015).

Parallel ICA with reference
No prior gene information is taken into account in pICA. Nevertheless, incorporating known genes involved in critical biological pathways in disease may help identify a set of genes contributing in a coordinated way to a larger network. Therefore, pICA-R was proposed by imposing an additional constraint on the Infomax framework to minimize the distance between a certain component and the reference (Liu et al., 2012).

Parallel ICA with multiple references
A key factor that affects the performance of pICA-R is the reference accuracy. Degradation is expected in component, loading, and linkage accuracies when the reference accuracy is below 0.2. Especially in SNP analysis, a referential SNP set associated with the same trait of interest is desired to obtain a more reliable reference. Therefore, pICA-MR was designed to directly combine multiple referential SNP sets to constrain the component of interest (Chen et al., 2014). Compared to pICA-R, this extended approach is more flexible in dynamically constraining components with multiple references and allows for some extent of heterogeneity in the references.

Parallel group ICA + ICA
Many existing multimodal fusion approaches for fMRI focus on 3D feature summaries, neglecting the rich temporal information. Thus, the parallel group ICA + ICA fusion method was proposed to directly deal with first-level 4D fMRI data (Qi et al., 2019; Qi, Silva, et al., 2022). This method integrates group ICA into pICA in a unified optimization framework, in which a new variability matrix is defined to capture subject-wise functional variability and used to link the mixing matrices of another modality. This allows two-way fusion of 4D fMRI data with structural MRI features, facilitating the identification of multimodal spatiotemporal links and providing alternative views to investigate brain disorders in a unifying multimodal framework.

Independent vector analysis
IVA extends ICA to multiple datasets (Lee et al., 2008), providing a natural and extendable way to directly link multivariate brain imaging data together. Based on the assumption of independence among sources within each dataset but dependence across datasets, IVA allows for a more flexible way of detecting dependence across datasets by defining a source component vector that collects the corresponding independent components from each dataset, and it has shown power in preserving dataset variabilities when analyzing multiple datasets (Laney et al., 2015; Luo, 2023). IVA can be regarded as not only an extension of ICA, but also a generalization of CCA. By integrating higher-order statistics into the mCCA-based model, tIVA was proposed. This approach constrains statistical independence for the independent components within each dataset but statistical dependence across the datasets to fuse different modalities (Adali et al., 2015). Moreover, multimodal IVA was implemented by using MISA in the IVA model, thereby identifying common independent sources among multiple modalities (Damaraju et al., 2021).

aNy-way ICA
Whereas most fusion approaches require the same number of sources and/or components for all modalities (jICA, mCCA, mCCA(R) + jICA), Duan et al. proposed aNy-way ICA by combining Infomax ICA and Gaussian IVA (IVA-G) via a shared weight-matrix model without orthogonality constraints, which can simultaneously maximize the independence of sources and the correlations across different modalities with the same or different numbers of sources per modality (Duan et al., 2020). When applied to the fusion of sMRI, fMRI, and EEG with different numbers of sources, this approach is able to recover sources and loadings, as well as the true covariance patterns, with improved recovery accuracy compared to mCCA and mCCA + jICA, especially under noisy conditions.

Consecutive independence and correlation transform
Existing fusion methods often require the signal subspace order to be identical for all modalities, and cannot discover one-to-many associations, in which one component from one modality is linked with more than one component from another modality.
To address this, C-ICT was developed by combining ICA and IVA-G for the joint analysis of multimodal data, in four steps: (i) performing ICA on the individual datasets separately; (ii) selecting meaningful ICs and the corresponding subject covariations; (iii) performing IVA-G on the selected subject covariations of the different datasets; and (iv) identifying significantly pair-wise associated source component vectors, tracing back to the ICs in the ICA stage based on the subject covariations with the highest contribution to the correlated source component vectors, and identifying them as associated components across different modalities (Jia et al., 2021). C-ICT is flexible in terms of the number of datasets combined and the signal subspace order of each dataset, and can discover one-to-many associations.
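The first and last of these steps can be sketched on toy data: run a separate spatial ICA per modality, collect each modality's subject covariations, and link components across modalities through those covariations. For simplicity, the IVA-G stage is replaced here by a plain cross-correlation of the subject loadings, so this is an illustration of the pipeline's shape, not of the published algorithm:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_subj, v = 80, 400
k1, k2 = 4, 6                    # signal-subspace orders differ per modality

shared = rng.standard_normal((n_subj, 2))   # latent profile linking both
L1 = np.hstack([shared, rng.standard_normal((n_subj, k1 - 2))])
L2 = np.hstack([shared, rng.standard_normal((n_subj, k2 - 2))])
X1 = L1 @ rng.laplace(size=(k1, v))         # modality 1 (subjects x voxels)
X2 = L2 @ rng.laplace(size=(k2, v))         # modality 2

# Step (i): separate spatial ICA per dataset; ica.mixing_ holds the
# subject covariations for each component.
loadings = []
for X, k in [(X1, k1), (X2, k2)]:
    ica = FastICA(n_components=k, random_state=0, max_iter=1000)
    ica.fit(X.T)                  # voxels as samples -> independent maps
    loadings.append(ica.mixing_)  # n_subj x k subject covariations

# Steps (iii)-(iv), simplified: cross-correlate the subject covariations
# to find associated components (the published method uses IVA-G here).
R = np.corrcoef(loadings[0].T, loadings[1].T)[:k1, k1:]
i, j = np.unravel_index(np.abs(R).argmax(), R.shape)
print(f"strongest link: modality-1 IC{i} <-> modality-2 IC{j}, |r| = {abs(R[i, j]):.2f}")
```

Because each modality keeps its own subspace order (here 4 vs. 6), one component in the smaller modality is free to correlate with several in the larger one, which is the one-to-many behavior described above.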

Multidataset independent subspace analysis
MISA is a unified multidataset, multidiversity, multidimensional framework for subspace modeling (Silva et al., 2020). In this framework, multiple datasets are jointly decomposed, in which sources are combined into dimensional subspaces that can accommodate arbitrary links among groups of sources across different datasets and modalities, and all-order statistics are used to gauge their associations and pursue subspace independence. Compared with independent subspace analysis, which is limited to subspaces within the same dataset, or IVA approaches, which have a rigid subspace structure in which a single component (no more, no less) from each dataset must go together to form a subspace, MISA allows datasets to be not only heterogeneous but also of different dimensionality, combining modalities of different intrinsic dimensionality in a single unified model and providing a robust generalization of many multivariate approaches including ICA, IVA, and independent subspace analysis (Silva et al., 2016; Silva et al., 2020). Collectively, each method provides a unique perspective for interpreting multiple datasets based on its hypotheses. We summarize and compare the methods in Table 1 in terms of their optimization assumptions, purpose of analysis, requirement of priors, number of modalities, required input data types, and dimensionality reduction methods, as well as their advantages and disadvantages, aiding selection of the appropriate fusion method based on the available datasets.

Review of multimodal fusion analysis in psychiatric disorders
We conducted a selective review of research studying associations among modalities in the context of psychiatric disorders with the aforementioned data-driven multimodal fusion methods. Briefly, we searched PubMed for the terms multimodal, multimodal fusion, and multimodal modalities, and then narrowed these to studies that actually used one of the fusion-based approaches mentioned above. All of the multimodal fusion studies of psychiatric disorders reviewed here are summarized in Table 2. Generally speaking, most of the studies we reviewed demonstrate congruent effects across modalities, and multimodal fusion almost always provides more power to differentiate disease than unimodal approaches.

Blind multimodal fusion
Numerous studies have demonstrated that blind multimodal fusion can capture co-occurring abnormalities in brain function and structure in patients with schizophrenia (SZ). Antonucci et al. identified aberrant structural-functional covariation networks using jICA (Antonucci et al., 2022), showing significantly reduced covariation between temporoparietal degree centrality and GM volume (GMV) in the frontal, temporal, and parietal cortex and thalamus in SZ patients, which was also associated with both social and occupational functioning. However, no group difference was found in degree centrality using univariate analysis, demonstrating that leveraging the cross-information among multiple imaging modalities may provide meaningful results. One study combined fMRI, dMRI, and sMRI by mCCA on a dataset of 47 SZ patients and 50 healthy controls (HC) to identify covarying patterns of fALFF, FA, and GMV. One multimodal component was identified as both group discriminating and significantly correlated with the MATRICS Consensus Cognitive Battery composite (Sui et al., 2015). A main finding was that linked functional and structural deficits in the distributed cortico-striatal-thalamic circuit may account for several aspects of cognitive impairment in SZ. In particular, the results suggested that distinct dimensional aspects of the cognitive composite might exhibit dissociable multimodal imaging signatures, as increased fALFF values in the inferior parietal lobule significantly correlated with declined social cognition. Similarly, Sui et al. integrated ALFF, EEG spectra, and GM using mCCA to distinguish SZ from HC with > 90% classification accuracy (Sui, Castro, et al., 2014). In addition, using four types of MRI features in a joint analysis to investigate multiple impairments of SZ in a large population (Fig. 2), researchers not only identified covarying functional and structural regions in the striatum, hippocampus, and frontal-parietal network, but also found high spatial consistency of these altered regions across different scanners using mCCA + jICA (Liu et al., 2019). This suggests that the fusion results of mCCA + jICA are highly robust and replicable, while offering unique perspectives regarding the missing links between modalities.
Using tIVA, a more flexible fusion approach, Adali et al. identified a significant group difference between SZ and HC in the covariation of fMRI and EEG, but not across all of the fMRI, sMRI, and EEG modalities (Adali et al., 2015). Significant group differences were found in the temporal-motor activation in fMRI and the N2 peak in EEG. Moreover, not limited to one-to-one fusion patterns, one study combined FA, GM, and fALFF for SZ and HC with C-ICT, identifying six interpretable triplets of components, each of which consists of three associated components from the three modalities (Jia et al., 2021). For instance, the corticospinal tract and superior longitudinal fasciculus from dMRI were not only associated with the uncus and inferior temporal gyrus from sMRI and the superior frontal gyrus and middle frontal gyrus from fMRI, but were also associated with the precuneus and paracentral lobule from sMRI and the superior temporal gyrus from fMRI. This indicates that C-ICT can reveal multiple associations across three modalities and provide potential biomarkers for SZ, and is a flexible and informative method for the fusion of medical imaging data from different modalities.

Semi-blind multimodal fusion
By introducing prior information, semi-blind multimodal fusion approaches enhance the sensitivity and specificity of identifying meaningful brain imaging covariance patterns that are associated with specific symptoms, cognitive deficits, and gene expression changes observed in psychiatric disorders. As shown in Fig. 3, cognitive global scores were used to guide three-way multimodal MRI fusion in two independent cohorts including both HC and SZ via the supervised learning strategy of mCCAR + jICA. The findings suggested that the salience network in GM, the corpus callosum in FA, and the central executive and default-mode networks in fALFF can serve as modality-specific biomarkers of generalized cognition (Sui et al., 2018). The identified MRI signatures are highly consistent across cohorts and, more importantly, they are predictive of multiple-domain cognitive performance, suggesting that the reference-guided multimodal fusion results may serve as effective predictors of the relevant cognitive measures (Sui et al., 2018).
Similarly, Qi et al. used mCCAR + jICA to investigate the fMRI-sMRI covarying patterns associated with polygenic risk scores (PRS) for SZ in the UK Biobank dataset (Qi, Sui, et al., 2022). Results showed a robust PRS-associated neuroimaging pattern with decreased GMV and fALFF in the frontotemporal cortex, which can distinguish SZ from HC with > 83% accuracy and can significantly predict cognition and symptoms across four independent cohorts (Fig. 4A). More interestingly, the identified frontotemporal alterations were also found to be impaired in patients with schizoaffective disorder (SAD), but not in autism spectrum disorder (ASD), depression, or attention-deficit/hyperactivity disorder (ADHD), suggesting a potential multimodal brain biomarker specific to SZ.
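The core idea behind reference-guided fusion can be illustrated with a simplified sketch: two simulated modalities share a covarying pattern that is locked to a reference score (e.g. cognition or PRS), and a regularized regression direction in each modality recovers subject-wise component loadings that correlate with the reference and, through it, with each other. This is a minimal sketch of the reference-guided principle only, not the mCCAR + jICA implementation used in the cited studies; all data and the `ref_guided_direction` helper are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 50, 40            # subjects, modality-1 and modality-2 features

# Simulate two modalities that share a covarying pattern locked to a reference
ref = rng.standard_normal(n)                      # e.g. a cognitive score
X = np.outer(ref, rng.standard_normal(p)) + rng.standard_normal((n, p))
Y = np.outer(ref, rng.standard_normal(q)) + rng.standard_normal((n, q))

def ref_guided_direction(M, r, alpha=1e-2):
    """Ridge direction in M whose subject-wise scores track the reference r."""
    Mc, rc = M - M.mean(axis=0), r - r.mean()
    w = np.linalg.solve(Mc.T @ Mc + alpha * np.eye(M.shape[1]), Mc.T @ rc)
    return w / np.linalg.norm(w)

sx = X @ ref_guided_direction(X, ref)   # modality-1 subject loadings
sy = Y @ ref_guided_direction(Y, ref)   # modality-2 subject loadings

# The two components covary with the reference and hence with each other
for a, b in [(sx, ref), (sy, ref), (sx, sy)]:
    print(round(np.corrcoef(a, b)[0, 1], 2))
```

In the actual mCCAR objective, the reference correlation is added as a penalty inside the multiset CCA optimization rather than solved per modality; the sketch only conveys why a reference sharpens the recovered covarying pattern.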
Table 1: Summary of assumptions, aims, and suggested uses of multimodal fusion methods (advantages and disadvantages; excerpt).

jICA: Assumes that two or more features share the same mixing matrix and component order, with equal contributions from each modality. Aims to maximize the independence among the joint components.
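The jICA assumption in Table 1, a single mixing matrix shared across modalities, can be sketched directly: feature maps from two modalities are concatenated subject-wise and decomposed with a minimal FastICA, and each recovered joint component then splits into linked modality-specific maps. This is a toy sketch on noise-free simulated data, not the implementation used in the cited studies; all dimensions and the `joint_ica` helper are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, p1, p2, k = 100, 200, 300, 2

# Two non-Gaussian joint sources mixed into both modalities by ONE shared
# subject-wise mixing matrix A -- the core jICA assumption
S_true = rng.laplace(size=(k, p1 + p2))       # joint spatial maps
A = rng.standard_normal((n_subj, k))          # shared subject loadings
X = A @ S_true                                # concatenated [modality 1 | modality 2]

def joint_ica(X, k, n_iter=200):
    """Minimal symmetric FastICA (tanh nonlinearity) after PCA whitening."""
    Xc = X - X.mean(axis=1, keepdims=True)    # demean each subject's map
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = np.sqrt(X.shape[1]) * Vt[:k]          # whitened spatial components
    W = np.linalg.svd(rng.standard_normal((k, k)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        U2, _, V2t = np.linalg.svd(W_new)
        W = U2 @ V2t                          # symmetric decorrelation
    S = W @ Z                                 # joint independent components
    return S, Xc @ np.linalg.pinv(S)          # joint maps, shared loadings

S_est, A_est = joint_ica(X, k)
S1_est, S2_est = S_est[:, :p1], S_est[:, p1:]  # split back per modality
```

Because the mixing matrix is estimated once for the concatenated data, each joint component's two halves are forced to share identical subject loadings, which is both jICA's strength (tight linkage) and its limitation (equal-contribution assumption).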

Blind multimodal fusion
One study used multimodal fusion to investigate mood disorders, jointly analyzing fMRI and sMRI via mCCA + jICA in major depressive disorder (MDD), bipolar disorder (BP), and HC (He et al., 2017). The group-discriminative covarying components were identified with reduced GM in the parietal and occipital cortices and attenuated functional connectivity within sensory and motor networks

Figure 4: (A) Multimodal covarying analysis guided by the PRS for SZ using mCCAR + jICA. The study identified that the SZ-PRS was associated with decreased GMV and fALFF in the frontotemporal cortex, which can distinguish SZ from HCs with more than 83% accuracy and can significantly predict their cognition and symptoms across four independent cohorts. More interestingly, the study found that the identified frontotemporal alterations were specific to SZ. (B) Multimodal covarying analysis guided by autistic symptom score for ASD and its three subtypes using mCCAR + jICA. The study showed that the dorsolateral prefrontal cortex and superior/middle temporal cortex in fALFF and GM are the shared covarying regions among the three subtypes, while the key differences among the three subtypes are negative functional features within subcortical brain areas. Moreover, each subtype-specific brain pattern is correlated with different symptom subdomains, with social interaction as the common subdomain. Reproduced with permission from Qi, Morris, et al. (2020) and Qi, Sui, et al. (2022).
for BP patients compared with HC, while MDD patients showed altered GM in the amygdala and cerebellum. In contrast to unimodal data, the identified multimodal patterns could distinguish MDD and BP from HC with higher classification accuracy. Similarly, Tang et al. used mCCA + jICA to investigate the sMRI-dMRI covarying patterns in BP patients (Tang et al., 2020). One multimodal covarying pattern was identified with decreased GM in the inferior frontal gyrus, right anterior cingulate gyrus, and left superior frontal gyrus, which was associated with reduced WM integrity in the corticospinal tract and superior longitudinal fasciculus.
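The mCCA stage of such pipelines seeks linear projections of each modality whose subject-wise expression is maximally correlated across modalities. For two sets this reduces to classical CCA, which has a closed-form solution via the SVD of the whitened cross-covariance. The sketch below runs it on simulated modalities sharing two correlated modes; the `cca` helper and all dimensions are illustrative, not the cited implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q, k = 300, 40, 50, 2

# Two simulated modalities sharing k correlated modes of subject variation
shared = rng.standard_normal((n, k))
X = shared @ rng.standard_normal((k, p)) + 0.5 * rng.standard_normal((n, p))
Y = shared @ rng.standard_normal((k, q)) + 0.5 * rng.standard_normal((n, q))

def cca(X, Y, k):
    """Classical two-set CCA via SVD of the whitened cross-covariance."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Ux, dx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, dy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    U, rho, Vt = np.linalg.svd(Ux.T @ Uy)        # whitened cross-covariance
    A = (Vxt.T / dx) @ U[:, :k]                  # canonical weights, set 1
    B = (Vyt.T / dy) @ Vt.T[:, :k]               # canonical weights, set 2
    return Xc @ A, Yc @ B, rho[:k]

cvx, cvy, rho = cca(X, Y, k)
print(rho)    # canonical correlations of the two linked modes
```

mCCA generalizes this to more than two sets by maximizing correlations over all pairs; jICA is then applied to the associated maps to sharpen source separation, which is why the two-stage mCCA + jICA is less constrained than jICA alone.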

Semi-blind multimodal fusion
One study investigated how miR-132 dysregulation may affect the covariation of multimodal brain imaging data in 81 unmedicated MDD patients and 123 demographically matched HCs using mCCAR + jICA, as well as in a medication-naive subset of the MDD patients (Qi, Yang, et al., 2018). The findings suggested that higher miR-132 levels in MDD were associated with both lower fALFF and lower GMV in the fronto-limbic network. Moreover, the identified brain regions linked with increased miR-132 levels were also associated with poorer cognitive performance in attention and executive function. From the aspect of electroconvulsive therapy (ECT) treatment response, a depression rating scale was used to guide brain structure-function fusion analysis via mCCAR + jICA in 118 patients with depressive episodes and 60 HCs (Qi, Abbott, et al., 2020). Results demonstrated that higher ECT responsiveness was associated with reduced fALFF in the prefrontal cortex, insula, and hippocampus, linked with increased GMV in the anterior cingulate, medial temporal cortex, insula, thalamus, caudate, and hippocampus. Relative to non-responders, responder-specific ECT-related brain networks occurred in the fronto-limbic network and were associated with successful therapeutic outcomes. Although ECT is recommended as an efficacious therapy for treatment-resistant depression, patients often experience cognitive impairment after ECT treatment (Semkovska et al., 2010). One recent study combined fMRI and sMRI to identify ECT antidepressant-response and cognitive-impairment multimodal brain networks by mCCAR + jICA (Qi, Calhoun, et al., 2022). The findings exhibited decreased fALFF in the superior orbitofrontal cortex and caudate accompanied by increased GMV in the medial temporal cortex in both the antidepressant-response and cognitive-impairment networks. For the modality-specific components, increased GMV in the hippocampus and thalamus and decreased fALFF in the amygdala and hippocampus were specific to antidepressant response, which was validated in two independent datasets. More interestingly, the E-field within these two networks showed an inverse relationship with depressive symptom reduction and cognitive impairment, and an optimal E-field range of 92.7-113.9 V/m was estimated to maximize antidepressant outcomes without compromising cognitive safety, which may improve the benefit-to-risk ratio of ECT. All these studies indicate the superiority of supervised multimodal fusion approaches in identifying potential biomarkers linked to specific symptoms, gene expression, and personalized treatment optimization.

Blind multimodal fusion
Lifetime comorbidity among psychiatric disorders is pervasive, for example between SZ and BP, and multimodal fusion allows us to leverage multimodal data to explore the common and specific mechanisms of multiple psychiatric disorders (Buckley et al., 2009), which may contribute to early diagnosis and treatment for specific disorders. Using jICA, Wang et al. investigated aberrant interactions between structure and function across SZ, SAD, and BP (Wang et al., 2015), showing that the common alterations across psychotic diagnoses were the covariations between ALFF in prefrontal-striatal-thalamic-cerebellar networks and GM in the DMN, which were also correlated with cognitive function, social function, and Schizo-Bipolar Scale scores, whereas the fused alteration in the temporal lobe was unique to SZ and SAD.

Semi-blind multimodal fusion
For multiple psychiatric disorders, one study combined fMRI and sMRI to explore symptom-driven transdiagnostic shared networks between SZ and substance use with drinking, smoking, depression, ASD, and ADHD via multi-group data mining (Qi, Bustillo, et al., 2020). Results demonstrated that substance use was associated with cognitive deficits in SZ through the anterior cingulate cortex and thalamus in GMV; that depression was linked to the negative dimensions of the Positive and Negative Syndrome Scale and to reasoning in SZ through the caudate-thalamus-middle/inferior temporal gyrus in GMV; and that a developmental-disorder pattern was correlated with poor attention, speed of processing, and reasoning in SZ through the inferior temporal gyrus in GMV, indicating that distinct comorbid psychiatric conditions are accompanied by distinct impaired brain networks associated with different symptoms and cognitive impairments. Moreover, Qi et al. combined three fMRI tasks and sMRI to explore the multimodal covarying patterns associated with novelty seeking in the IMAGEN dataset. Results identified a covarying pattern including the prefrontal cortex, striatum, amygdala, and hippocampus, which can longitudinally predict five different risk scales, including alcohol drinking, smoking, hyperactivity, depression, and SZ disorders, and can also classify among ADHD, depression, and SZ with an accuracy of 87.2%, revealing a potential transdiagnostic neuroimaging biomarker to predict disease risks or severity.

Blind multimodal fusion
One study combined fMRI, dMRI, and sMRI to explore ADHD using linked ICA (Wu et al., 2019), suggesting that children with ADHD showed altered white matter microstructure in widespread white matter fiber tracts, increased GMV in bilateral frontal regions and decreased GMV in posterior regions, as well as altered FC in default-mode and frontoparietal networks. Wolfers et al. found that the most predictive multimodal region for adult ADHD was primarily located in the anterior temporal cortex by combining dMRI and sMRI using linked ICA (Wolfers et al., 2017). In addition, one recent study used pml-jICA to fuse sMRI and fMRI in Alzheimer's disease patients (Khalilullah et al., 2023), identifying two joint components with partially overlapping regions that showed opposite effects for Alzheimer's disease versus controls, but could be separated because they were linked to distinct functional and structural patterns.
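The structural idea that distinguishes linked ICA from jICA, namely that modalities share a common subject-loading matrix while retaining their own spatial maps, can be sketched with a shared-loading factor model fit by alternating least squares. This is a deliberately simplified stand-in (the actual FLICA uses Bayesian inference with per-modality weightings and independence priors); all variables are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2, k = 150, 120, 100, 3

# Ground truth: modalities share subject loadings H but have distinct maps
H = rng.standard_normal((n, k))
W1, W2 = rng.standard_normal((k, p1)), rng.standard_normal((k, p2))
X1 = H @ W1 + 0.1 * rng.standard_normal((n, p1))
X2 = H @ W2 + 0.1 * rng.standard_normal((n, p2))

# Alternating least squares on the concatenated data: the single shared
# loading matrix Hhat is exactly what links the modalities
Xcat = np.hstack([X1, X2])
Hhat = rng.standard_normal((n, k))
for _ in range(50):
    Wcat = np.linalg.lstsq(Hhat, Xcat, rcond=None)[0]        # update maps
    Hhat = np.linalg.lstsq(Wcat.T, Xcat.T, rcond=None)[0].T  # update loadings

recon_err = np.linalg.norm(Xcat - Hhat @ Wcat) / np.linalg.norm(Xcat)
print(f"relative reconstruction error: {recon_err:.3f}")
```

Because the loadings are shared but the maps `Wcat[:, :p1]` and `Wcat[:, p1:]` are free, each modality can express a linked component with its own spatial pattern and scale, relaxing jICA's equal-contribution constraint.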

Semi-blind multimodal fusion
There is large heterogeneity in ASD, and one recent study combined GM and fALFF to dissect this heterogeneity by mCCAR + jICA, with the Autism Diagnostic Observation Schedule as a reference to guide multimodal fusion across Asperger's syndrome, pervasive developmental disorder-not otherwise specified (PDD-NOS), and the autistic subtype from the ABIDE I/II datasets (Qi, Morris, et al., 2020). Results showed that the dorsolateral prefrontal cortex and superior/middle temporal cortex were the primary common functional-structural covarying cortical brain areas shared among the three subtypes, while the key differences among the three subtypes were negative functional features within subcortical brain areas (Fig. 4B). Moreover, each subtype-specific brain pattern was correlated with different Autism Diagnostic Observation Schedule subdomains, with social interaction as the common subdomain.
Although this review focuses on the application of multimodal fusion in psychiatric disorders, data-driven fusion approaches have also been successfully applied in other diseases, such as human immunodeficiency virus disease (Sui et al., 2021), epilepsy (Zhi et al., 2020), and substance use disorders (Vergara et al., 2014; Hirjak et al., 2022), to reveal potential multimodal imaging biomarkers.

Emerging trends
Substantial progress has been made in multimodal fusion approaches: from jICA, constrained to the same mixing matrix, component order, and equal contributions across modalities, to mCCA, mCCA(R) + jICA, linked ICA, pICA, and IVA and its variants with more flexible frameworks for combining multiple data types; from identifying one-to-one to one-to-many associations; and from identifying covarying components to discovering covarying and modality-specific components simultaneously among multiple modalities. However, much work remains to be done. With the collection of large-scale datasets, such as the UK Biobank with half a million UK participants, and various data types, such as genetic, environmental, transcriptomic, and behavioral measures, the question of how to effectively fuse various types of data to identify stable and generalizable imaging biomarkers from high-dimensional data for psychiatric disorders remains open. Additionally, deep learning, which can handle nonlinear features and learn high-dimensional representations, has been overwhelmingly successful in computer vision, natural language processing, and video/speech recognition (LeCun et al., 2015). Combining advanced deep learning approaches with the unique characteristics of brain imaging data is also a promising avenue for multimodal brain imaging analysis. Ultimately, quantitative multimodal fusion research needs to pay more attention to clinical translation, including the early diagnosis of high-risk populations and personalized treatment.

N-way multimodal fusion
The release of large-scale datasets, such as the UK Biobank, ABCD, and HCP datasets, presents an unprecedented opportunity to mine complementary information from diverse modalities. Several studies have developed data-driven multimodal fusion approaches to combine multiple modalities on larger datasets. For example, BigFLICA was developed to integrate 47 imaging modalities to predict thousands of phenotypic and behavioral variables in the UK Biobank and HCP datasets, running ∼20 times faster than linked ICA while achieving improved predictive power compared with widely used analysis strategies such as single-modality decompositions (Gong et al., 2021). Furthermore, by incorporating prior information, SuperBigFLICA was proposed to leverage multiple imaging modalities to predict phenotypes, and has been applied to the UK Biobank dataset with ∼40 000 participants and 47 imaging modalities, along with > 7000 non-imaging derived phenotypes. Results showed that SuperBigFLICA improved the prediction accuracy of phenotypes by up to 46% compared to conventional expert-knowledge and unsupervised-learning approaches. Additionally, this approach can also learn generic imaging features that can predict new phenotypes (Gong et al., 2022). In addition, Damaraju et al.
performed multimodal IVA on a large multimodal dataset of > 3000 participants in the UK Biobank study to identify GM-FA-ALFF linked independent sources, capturing age-associated covarying biomarkers with GM in thalamus, caudate, and insular regions, as well as FA in periventricular regions and ALFF in visual and parietal regions (Damaraju et al., 2021). More advanced models, such as those that can handle N-way multimodal fusion, are being introduced and may become one of the leading directions in future neuroimaging research given the predominance of multimodal data acquisition. Additionally, while the integration of large datasets and multimodal analyses offers promising opportunities to advance psychiatric research, challenges related to data quality, harmonization, statistical methods, and interpretation must be carefully addressed.

Deep learning
Deep learning has emerged as a powerful and transformative approach in the field of medical brain imaging research, including convolutional neural networks for capturing spatial patterns, recurrent neural networks for handling time-series data, and graph convolutional networks for extracting topological properties, achieving unprecedented accuracy in disease diagnosis and image segmentation (Bzdok et al., 2018; Yan et al., 2022; Rahaman et al., 2023). Deep learning can automatically extract meaningful features from diverse neuroimaging modalities, such as sMRI, fMRI, and PET scans, to learn intricate variations and subtle abnormalities, thereby improving classification or predictive accuracy for various neurological and psychiatric disorders, as shown in Fig. 5.
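As a schematic of the intermediate-fusion architecture common to such models, the sketch below encodes each modality with its own branch, concatenates the hidden representations, and maps them to class probabilities. The weights are random placeholders rather than trained parameters, and all dimensions (`d_smri`, `d_fmri`, etc.) are hypothetical; real models in the cited work use convolutional or graph-convolutional encoders rather than dense layers.

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical feature dimensions for two MRI-derived modalities
d_smri, d_fmri, d_hid, n_classes = 64, 32, 16, 2

# Randomly initialized weights stand in for trained parameters
W_s = 0.1 * rng.standard_normal((d_smri, d_hid))    # sMRI branch
W_f = 0.1 * rng.standard_normal((d_fmri, d_hid))    # fMRI branch
W_out = 0.1 * rng.standard_normal((2 * d_hid, n_classes))

def fuse_and_classify(x_smri, x_fmri):
    """Intermediate fusion: encode each modality, concatenate, classify."""
    h = np.concatenate([relu(x_smri @ W_s), relu(x_fmri @ W_f)], axis=1)
    logits = h @ W_out
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # softmax probabilities

probs = fuse_and_classify(rng.standard_normal((5, d_smri)),
                          rng.standard_normal((5, d_fmri)))
print(probs.shape)
```

The design choice illustrated here, fusing learned representations rather than raw features or final decisions, is what allows such networks to model nonlinear cross-modal interactions that linear fusion methods cannot capture.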
For instance, an enhanced multi-modal graph convolutional network was constructed by fusing the brain structural and functional graphs to distinguish HC from neuropsychiatric disorders, including SZ, BP, and ADHD, achieving a classification accuracy of 93.71% (Liu, Wang, et al., 2022). A mutual multi-scale triplet graph convolutional network that combined functional and structural connectivity enhanced the accuracy of multiple brain disorder classification (Yao, Sui, et al., 2021). In medical imaging, interpretability in deep learning is not merely a desirable feature but an indispensable necessity (Bi et al., 2023). One recent study developed an interpretable multimodal fusion framework by combining intermediate feature maps with gradient-based weights, which can perform automated diagnosis and result interpretation simultaneously (Hu et al., 2021). These findings suggest that deep learning could provide more accurate and earlier detection of brain abnormalities (Zhang et al., 2011; Li et al., 2020; Zhao et al., 2022), which may not be revealed through separate unimodal analyses as typically performed in most neuroimaging experiments.

Clinical translation
The overarching goal of multimodal brain image fusion analysis for psychiatric disorders is to assist in clinical diagnosis and treatment. An increasing number of studies demonstrate the potential of fusing structural and functional data to improve brain disease classification and prediction (Gao et al., 2018; Lalousis et al., 2021; Wen et al., 2021; Xu et al., 2021; Zhi et al., 2021). For example, Sui et al. combined resting-state fMRI, EEG, and sMRI data to classify 48 SZ patients from 53 HCs and achieved the best performance, 91% accuracy, compared to each single modality, confirming the effectiveness and advantages of multimodal fusion (Sui, Castro, et al., 2014). However, classification or prediction accuracy tends to be relatively low with increased sample size, especially for ADHD, autism, and depression, hindering the translation of research evidence into clinical practice (Woo et al., 2017). One future direction lies in building individualized prediction models based on imaging features derived from multimodal fusion, combined with behavioral, environmental, or genetic variants, to improve the accuracy of psychiatric diagnosis, risk warning, or treatment optimization (Jiang et al., 2018; Jiang et al., 2020; Sui et al., 2020; Qi et al., 2021).
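The reported gain of multimodal over unimodal classification can be illustrated with a toy experiment: two simulated modalities each carry a weak, complementary group difference, and a simple nearest-centroid classifier under cross-validation is more accurate on the concatenated features than on either modality alone. All data and effect sizes here are synthetic and arbitrary; this illustrates the principle, not the cited 91% result.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 30
y = np.repeat([0, 1], n)                 # HC vs patient labels

def simulate(lo, hi, effect=0.8):
    """One modality: patients are shifted only on features [lo, hi)."""
    X = rng.standard_normal((2 * n, d))
    X[y == 1, lo:hi] += effect
    return X

X1 = simulate(0, 5)    # e.g. structural features carry part of the signal
X2 = simulate(5, 10)   # functional features carry a complementary part

def cv_accuracy(X, y, folds=5):
    """Nearest-class-centroid classifier under simple cross-validation."""
    idx = rng.permutation(len(y))
    accs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        pred = (np.linalg.norm(X[test] - c1, axis=1)
                < np.linalg.norm(X[test] - c0, axis=1)).astype(int)
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

acc1, acc2 = cv_accuracy(X1, y), cv_accuracy(X2, y)
acc_fused = cv_accuracy(np.hstack([X1, X2]), y)
print(acc1, acc2, acc_fused)   # fused features classify best
```

The gain arises because the group differences in the two modalities are complementary, which is precisely the situation multimodal fusion is designed to exploit; when the modalities carry redundant signal, the advantage shrinks.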

Conclusions
In summary, this selective review underscores the pivotal role of data-driven multimodal fusion approaches in advancing our understanding of psychiatric disorders and revolutionizing psychiatric research. By leveraging various data types, such as different imaging modalities and phenotypes, these approaches have shown great promise in the identification of individualized signatures associated with clinical symptoms, personalized diagnosis, or intervention parameters. With the ever-expanding collection of rich information encompassing gene expression, environmental exposures, protein expression, imaging features, and behavioral outcomes, the most promising avenue for the future may lie in developing better data-mining models that can complement and harness the intricate relationships between diverse neuroimaging and other forms of data, ultimately translating scientific discoveries into meaningful clinical practice.

Figure 2: The application of blind four-way multiset CCA plus jICA to identify multimodal alterations in SZ. Covarying functional and structural abnormalities were identified in (A) regional homogeneity (ReHo), (B) GM, (C) FA, and (D) functional network connectivity (FNC) in two independent cohorts using different scanners, where the spatial maps of ReHo, GM, and FA were visualized at |Z| > 2.5 with positive Z scores shown in red, and the FNC matrix was transformed into Z scores and thresholded at |Z| > 3. Reproduced with permission from Liu et al. (2019).

Figure 3: Cognition-directed multimodal fusion and prediction analysis using multi-site CCA with reference plus jICA (mCCAR + jICA). Cognition-associated multimodal covarying imaging patterns were identified in three modalities and are highly consistent across cohorts. More importantly, the identified imaging signatures are predictive of multiple-domain cognitive performance. Reproduced with permission from Sui et al. (2018).

Figure 5: Deep learning frameworks that are popularly adopted for diverse MRI features.