The Human Brain Is Best Described as Being on a Female/Male Continuum: Evidence from a Neuroimaging Connectivity Study

Abstract Psychological androgyny has long been associated with greater cognitive flexibility, adaptive behavior, and better mental health, but whether a similar concept can be defined using neural features remains unknown. Using the neuroimaging data from 9620 participants, we found that global functional connectivity was stronger in the male brain before middle age but became weaker after that, when compared with the female brain, after systematic testing of potentially confounding effects. We defined a brain gender continuum by estimating the likelihood of an observed functional connectivity matrix to represent a male brain. We found that participants mapped at the center of this continuum had fewer internalizing symptoms compared with those at the 2 extreme ends. These findings suggest a novel hypothesis proposing that there exists a neuroimaging concept of androgyny using the brain gender continuum, which may be associated with better mental health in a similar way to psychological androgyny.


Introduction
In an ever-changing global environment, new learning for successful adaptation requires that we are able to be attentive to the world around us, cognitively flexible and able to employ a wide range of strategies. The ability to rapidly understand external context and decide on the optimal response, within a specific context, better enables us to take advantage of time-limited opportunities and thus provides us with mastery over the situation, thereby instilling resilience. Therefore, adaptiveness of cognition and behavior confers an advantage for individuals. Those who are limited by restricted approaches, stereotyped responses, and excessive internalizing in a variety of situations, including social, educational, and occupational ones, are less likely to flourish in society. It has been shown that being at the extreme end of the male continuum is disadvantageous both socially and psychologically. For example, these detrimental effects have been well-evidenced by a meta-analysis of 78 studies of about 20 000 participants, showing that conformity to typical masculine norms, for example, self-reliance and exercise of power over women, incurred social costs and psychiatric symptoms, including depression, loneliness, and substance abuse (Wong et al. 2017). In contrast to these extreme stereotyped norms for males and females, "psychological androgyny" (Bem 1974, 1981, 1994) is the term that represents a flexibility and adaptability in sex roles and the behaviors associated with sex roles. An androgynous person possesses both masculine and feminine traits and the circumstances determine which traits (masculine or feminine) are employed (Rice 2006). Therefore, an androgynous person's behavior is not influenced by a gender schema. Many psychological studies have suggested that psychological androgyny, which allows for more flexible behavioral responses, may be beneficial to mental health (Vafaei et al. 2014, Juster et al. 2016, Pauletti et al. 2017).
For example, psychological androgyny was associated with fewer internalizing problems (Pauletti et al. 2017), higher creativity (Norlander et al. 2000), and has been found to be psychoprotective (Prakash et al. 2010). There are many reports of differences between male and female brains in the literature (Ruigrok et al. 2014, Satterthwaite et al. 2014, Choleris et al. 2018, Jiang et al. 2020). However, whether an "androgynous" brain, with a well-balanced combination of both female and male features, offers better mental health compared to a brain with predominantly female or male features, remains an unanswered question.
Most of the previous studies have focused on identifying sex differences in the brain (Choleris et al. 2018), but the identified effect sizes were generally small and lacked significant behavioral association (Hines 2020). At the structural level, females had higher gray matter volume (GMV) in the middle frontal gyrus (Z 2186 = 5.34) and lower GMV in the orbital frontal cortex (Z 2186 = 5.07) (Ruigrok et al. 2014). At the functional level, females had a lower mean network positive-participation coefficient (Z 672 = 2.21) (Satterthwaite et al. 2014). Although the effect size of those sex differences was small, multivariate classifiers have been trained to classify the sex of the brain (Satterthwaite et al. 2014, Weis et al. 2019) and achieved the best accuracy of 75% using independent test samples (for sample sizes ranging from 600 to 1700). These findings suggested that the brain's functional architecture may have both female and male characteristics at the same time (Joel et al. 2015). Therefore, we hypothesized that the brain's functional architecture can be mapped onto a continuum, and we used the biological information in regard to sex (male/female) to define the ends of the continuum. The importance of brain androgyny, akin to psychological androgyny, is that you are neither male nor female, but a combination of both. This gender continuum as a neuroimaging-defined marker of psychological androgyny may enable us to investigate the nonlinear relationships between brain gender and behavior or the variation in behavior within a sex group, which could not be uncovered using the biologically binary sex categories.
In this study, we used the resting-state functional magnetic resonance imaging (fMRI) data obtained from 9620 participants, who were aged between 17 and 78 years and recruited from 4 independent cohort studies. We investigated the sex differences of the brain functional networks in different age groups and systematically tested the potentially confounding effects on the identified differences. We subsequently built a multivariate classifier to estimate the likelihood of a given functional brain network to represent a male brain. Using this likelihood, we defined a brain gender continuum, and validated this definition by both sensitivity analyses and test-retest reliability analysis. Finally, as a demonstration of the behavioral relevance of the brain gender continuum, we tested the hypothesis that participants at the middle of this brain gender continuum, that is, with putatively androgynous brain network organization, had better mental health, specifically fewer internalizing or externalizing symptoms. Figure 1 shows the overall design of this study.
Each cohort study was approved by its corresponding ethics committee. All adult participants provided written informed consent after information on the research procedures had been provided by each cohort study team. For the IMAGEN study, when the children were under 18 years old, the children gave assent and their parents or legal guardian provided written informed consent.

Imaging Acquisition
The UKB participants were scanned on a 3-T Siemens Skyra scanner (Munich, Germany) with a spatial resolution of 2.4-mm isotropic voxels, a repetition time (TR) of 0.735 s, and an echo time (TE) of 39 ms. Scanning was conducted in ∼6 min. The HCP participants were scanned on a 3-T Siemens "customed Connectome Skyra" scanner in 2 sessions with two 15-min runs each; the main scanning parameters were 2.0-mm isotropic voxels, 0.72 s TR, and 33.1 ms TE. The IMAGEN neuroimaging data were collected at 7 centers on 3-T scanners (Siemens used in Munich, Germany; Philips used in Best, The Netherlands; General Electric used in Chalfont St Giles, United Kingdom; Bruker used in Ettlingen, Germany), with a slice thickness of 2.4 mm, planar resolution of 3.4 mm, a TR of 2.2 s, and a TE of 30 ms for about 6 min. The YMU images were collected on a Siemens Trio 3-T scanner at the YMU, with a slice thickness of 3.4 mm, planar resolution of 3.44 mm, a TR of 2.5 s, and a TE of 27 ms for about 8 min.

Preprocessing
We used the Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL, v5.0.10; Jenkinson et al. 2012) to preprocess UKB resting-state fMRI data. The preprocessing procedure included slice-timing correction, motion correction, spatial smoothing with a 6-mm full-width at half-maximum Gaussian kernel, and wavelet despiking. The averaged white matter, cerebrospinal fluid signal, and 24 motion parameters were then regressed out from a voxel-level time series. The functional image was subsequently registered to a T1 structural image and normalized to 3-mm standard MNI space using linear and nonlinear registration with the default parameters. The same preprocessing pipeline was applied to both the IMAGEN data and the YMU data. Details could be found in Supplementary Method S11.
Figure 1. Overall design of the study. (A) Both global and local brain functional connectivity were compared between 2 sex groups in different age groups; (B) based on the brain functional connectivity, a linear SVM to classify the female and the male brains was trained using the UKB sample with 10-fold cross-validation; (C) the effects of the sample size, the age composition and the functional networks on the SVM performance were assessed by sensitivity analyses; (D) the SVM model was tested using an independent sample from the HCP, and the test-retest reliability of the SVM prediction was also assessed among the 4 runs of the HCP data. A brain gender continuum was built by using the continuous output of the SVM classifier, with a value closer to 1 as more likely to be a male brain while a value closer to 0 as more likely to be a female brain; (E) the brain gender continuum was applied to the HCP participants and its associations with the internalizing and externalizing symptoms were assessed.
The resting-state functional images downloaded from the HCP consortium had already undergone HCP's minimal preprocessing pipeline (Glasser et al. 2013). This pipeline mainly included the corrections for gradient-nonlinearity-induced distortion, head motion, and B0 distortion, and the transformation of the corrected fMRI data to a 2 mm Montreal Neurological Institute (MNI) space. All of the transforms for each registration and distortion correction step were concatenated and applied in a single resampling step. Next, the global intensity normalization and a brain mask were applied. Finally, the structured artifacts were removed by an ICA (independent component analysis) + FIX (FMRIB's ICA-based X-noiseifier) denoising procedure that was specifically trained on HCP data (Smith et al. 2013). The head motion parameters were also regressed out of the data (Satterthwaite et al. 2013). The downloaded data were ready for conventional volume-based analyses (Glasser et al. 2013).
These preprocessing steps were carried out by the HCP consortium using FSL (FMRIB Software Library), FreeSurfer and the Connectome Workbench software. More details are provided in the WU-Minn HCP 1200 Subjects Data Release Reference Manual (https://www.humanconnectome.org/storage/app/media/documentation/s1200/HCP_S1200_Release_Reference_Manual.pdf).
Finally, the region-of-interest (ROI)-level time series based on the Automated Anatomical Labelling parcellation, version 2 (AAL2; Rolls et al. 2015) were extracted by averaging the voxel-level time series within each of the AAL2 regions. ROI-level time series based on Power's parcellation (Power et al. 2011) were also extracted for validation tests.
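The ROI averaging step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `roi_time_series`, the toy data, and the convention that label 0 marks voxels outside the atlas are all assumptions made for the example.

```python
from collections import defaultdict


def roi_time_series(voxel_ts, voxel_labels):
    """Average voxel-level time series within each atlas region.

    voxel_ts: list of per-voxel time series (lists of floats, equal length).
    voxel_labels: one integer region label per voxel; label 0 is treated
        as "outside the atlas" and skipped (an assumption of this sketch).
    Returns a dict mapping region label -> mean time series.
    """
    sums = {}
    counts = defaultdict(int)
    for ts, label in zip(voxel_ts, voxel_labels):
        if label == 0:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(ts)
        acc = sums[label]
        for t, value in enumerate(ts):
            acc[t] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Toy data: four voxels, three time points, two regions plus one background voxel.
voxels = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 10.0, 10.0], [9.0, 9.0, 9.0]]
labels = [1, 1, 2, 0]
roi = roi_time_series(voxels, labels)
print(roi[1])  # mean of the two voxels labelled 1
```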

Behavioral Assessment
The age of menopause (Field ID: 3581) reported by the UKB participants (n = 2565) during their imaging visit was used in the current analysis. The Achenbach Adult Self-Report (ASR) (Achenbach 2009) was administered to the participants in HCP. This assessment has been widely used for adults aged 18-59 years and comprises 123 items covering behavioral, emotional, and social problems. Anxious, withdrawn, and somatic complaints comprise the internalizing dimension, whereas aggressive, rule-breaking, and intrusive behaviors comprise the externalizing dimension. Finally, 691 subjects (M/F = 310/381) with complete family information and ASR scores were included in the current study.

Group Comparison
We first studied the sex difference of functional connectivity at multiple levels in different age groups. First, a functional connectivity between each pair of brain regions was calculated. Second, at the network level, the mean connectivity of the functional connectivity among the brain regions within a brain functional network was used. In Power's parcellation (Power et al. 2011), whole brain was divided into 11 functional networks, including sensory/somatomotor network, cingulo-opercular task control network, auditory network, default mode network, memory retrieval network, visual network, frontal-parietal task control network, salience network, subcortical, ventral attention network, and dorsal attention network. Third, we examined the global connectivity by averaging all functional connectivity.
We used Student's t-test to calculate the effect size (i.e., Cohen's d) of the sex difference in cases with and without considering covariates, including total intracranial volume (TIV), signal-to-noise ratio (SNR), and mean framewise displacement (mean FD). The 7 data collection sites were considered as additional covariates in the IMAGEN analyses. Bootstrap with 100 000 repetitions was applied to these comparisons. A correlation analysis was applied to the statistics before and after controlling additional variables to study the influence of covariates. At the network level, the correlation analysis had 64 degrees of freedom (from 66 types of pairwise connections among 11 brain networks, of which 55 were inter-network connections and 11 were intra-network connections). At the regional level, the correlation analysis had 4369 degrees of freedom (from 4371 functional connections among 94 brain regions). A localized spline curve was used to fit the trajectory of global functional connectivity (gFC) with age. The fitted values for the spline curve with and without considering the covariates were calculated for each age in the range of 17-78 years old, with a manually chosen step length of 0.2 years. A correlation analysis with 304 degrees of freedom was applied to those 2 groups of fitted values to assess the effect of covariates on the trajectory of gFC.
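The counts and degrees of freedom quoted above follow from simple combinatorics, and the effect size can be written in a few lines. The sketch below assumes the pooled-standard-deviation form of Cohen's d and a male-minus-female sign convention; neither is spelled out in the text, and the helper names are illustrative only.

```python
import math
import statistics


def n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed form).

    Positive values mean the mean of group_a exceeds that of group_b.
    """
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Regional level: 94 regions give 4371 edges, hence 4371 - 2 = 4369 df.
regional_edges = n_pairs(94)
# Network level: 55 inter-network plus 11 intra-network types = 66, hence 64 df.
network_types = n_pairs(11) + 11
# Age grid from 17 to 78 years in steps of 0.2 years (5 points per year).
age_points = (78 - 17) * 5 + 1

print(regional_edges, network_types, age_points)  # 4371 66 306
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # toy groups, d = 0.5
```

A correlation over k paired values has k − 2 degrees of freedom, which reproduces the 4369, 64, and 304 reported in the text.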

Multivariate Classifier
We used a support vector machine (SVM; Cortes and Vapnik 1995) with a linear kernel to classify the resting-state functional connectivity network into 2 sexes. We used the default box constraint parameters and the SMO (sequential minimal optimization) solver (RongEn et al. 2005) to train the SVM classifier. All of the functional connectivity data were used as the input features without feature selection. We randomly chose 3700 male subjects and 3700 female subjects from the UKB dataset to train our model; their age and the corresponding squared and cubed age terms were regressed out from their functional connectivity. To assess the reliability of our model, we implemented a 10-fold cross-validation strategy on the UKB dataset. Next, all 7400 subjects were used to train a new SVM classifier and the classifier was tested using an independent data set, namely the HCP sample. The SVM classifier can provide the posterior probability computed through a sigmoid function. Using the coefficients derived from model fitting, the maximum likelihood estimate of the posterior probability is $P = \frac{1}{1 + \exp(AX + B)}$, where A and B are coefficients, and X is the training data (Platt 1999). The posterior probability represents the probability that an input resulted in a certain output label, and we call that the "output scores of the SVM." To understand the meaning of the weights of the functional connectivity in the trained SVM, we first tested the correlation between the SVM weights and the effect sizes of the sex-differences estimated by the group comparisons above. Second, we compared the FC at the network level among 3 groups of participants with the lower (<0.35), intermediate (0.35 ≤ score ≤ 0.65) and higher (>0.65) output scores by the SVM (i.e., a number between 0 and 1) to see whether the output scores could be used to indicate the brain functional connectivity.
Here, we compared all 66 types of pairwise connections among 11 brain networks.
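As a minimal sketch of how an SVM output is turned into a continuum score and then binned, the sigmoid of Platt (1999) can be applied to the classifier's decision value, and the three groups follow from the thresholds given above. The coefficient values `a = -2.0` and `b = 0.0` are hypothetical and chosen only so that larger decision values map to scores nearer 1; in practice they would be fitted to the training data.

```python
import math


def platt_posterior(decision_value, a, b):
    """Sigmoid mapping P = 1 / (1 + exp(a * f + b)) of a decision value f."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def continuum_group(score, low=0.35, high=0.65):
    """Assign a score in [0, 1] to the lower, intermediate, or higher group."""
    if score < low:
        return "lower"
    if score > high:
        return "higher"
    return "intermediate"


# Hypothetical coefficients; a < 0 so larger decision values give higher scores.
a, b = -2.0, 0.0
for f in (-1.0, 0.0, 1.0):
    score = platt_posterior(f, a, b)
    print(f, round(score, 3), continuum_group(score))
```

With these illustrative coefficients, a decision value of 0 (a sample on the separating hyperplane) maps to a score of 0.5, the center of the continuum.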

Sensitivity Analysis
We tested the effects of the sample size, percentage of elderly subjects, and brain network features on the performance of the classifier built using the UKB sample. In the sensitivity analysis for sample size, we randomly selected subjects with different sample size from the UKB sample, ranging from 200 to 7000, and then compared the classification accuracies on the independent test data set (i.e., the HCP sample). In the sensitivity analysis for age, we changed the percentage of elderly subjects (i.e., older than 65 years old) in the training data set, whereas the training sample size was always fixed at 3000. For example, if the percentage of elderly subjects was 20%, then we randomly selected 2400 subjects from the younger group and 600 subjects from the older group. The 11 brain networks were defined by the Power's parcellation (Power et al. 2011). In the sensitivity analysis for brain networks, we excluded the functional connectivity of one brain network from the SVM classifier at a time (i.e., excluding all the regions in a given network and all the functional connectivity attached to one of these regions), and then we compared the classification accuracies before and after excluding a network. Since the training samples were randomly chosen from the UKB, each test was repeated 100 times. The model performance was tested using the HCP dataset. The top 5% weights with the highest absolute values of the functional connectivity in the SVM classifier trained from UKB subjects were summarized into each functional brain network, that is, the networks with higher weights had greater difference between the 2 sex groups. To test whether these findings were atlas-dependent, we compared the results given by the AAL2 parcellation and the Power's parcellation.
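The age-composition test above amounts to stratified random sampling with a fixed total. The sketch below illustrates one draw; the function name, the subject identifiers, and the group sizes are hypothetical, and only the total of 3000 and the 20% example come from the text.

```python
import random


def sample_training_set(younger_ids, older_ids, total=3000, elderly_fraction=0.2, seed=0):
    """Draw a training set with a fixed size and a given share of elderly subjects.

    Returns two lists: the sampled younger IDs and the sampled older IDs.
    """
    rng = random.Random(seed)
    n_old = round(total * elderly_fraction)
    n_young = total - n_old
    return rng.sample(younger_ids, n_young), rng.sample(older_ids, n_old)


# Hypothetical ID pools standing in for the younger and older (> 65 years) subjects.
younger_ids = list(range(5000))
older_ids = list(range(5000, 7400))
young, old = sample_training_set(younger_ids, older_ids, elderly_fraction=0.2)
print(len(young), len(old))  # 2400 600
```

Repeating the draw with different seeds, as in the 100 repetitions described above, gives a distribution of test accuracies for each elderly fraction.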

Behavior Association
Using the continuous SVM output between 0 and 1, we mapped each brain functional architecture onto a "gender continuum" with the value 1 as the most likely to be collected from a male brain and the value 0 as the most likely collected from a female brain. We tested the associations of the gender continuum with internalizing symptoms, considering biological sex, age, handedness, and head motion as covariates. The partial Pearson correlation was used and when assessing the quadratic relationship, the linear term was also used as an additional covariate. The results were plotted using BrainNet Viewer (Xia et al. 2013).
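The quadratic association described above can be illustrated as a partial correlation computed from regression residuals: both the squared continuum score and the symptom score are residualized on the covariates plus the linear term, and the residuals are then correlated. This is a sketch under that assumption, using plain least squares on toy data with a single covariate; it does not reproduce the permutation-based inference used in the study, and all names are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residualize(y, covariates):
    """Residuals of y after least squares on an intercept plus covariate columns."""
    rows = [[1.0] + [col[i] for col in covariates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * rj for bj, rj in zip(beta, r)) for r, yi in zip(rows, y)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    den = math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))
    return num / den


def quadratic_partial_r(score, symptom, covariates):
    """Partial correlation of score**2 with symptom, controlling for the
    covariates and the linear score term."""
    squared = [s * s for s in score]
    controls = covariates + [score]
    return pearson(residualize(squared, controls), residualize(symptom, controls))


# Toy data: symptoms are exactly U-shaped in the score, plus a linear age effect.
score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
age = [30.0, 25.0, 41.0, 36.0, 52.0, 28.0, 45.0, 33.0, 39.0]
symptom = [40.0 * (s - 0.5) ** 2 + 0.1 * a for s, a in zip(score, age)]
r = quadratic_partial_r(score, symptom, [age])
print(r)  # close to 1 by construction; positive r indicates a U shape
```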

Significance Test
Considering the family relatedness in the HCP dataset, we applied the multilevel block-permutation analysis for linear models (PALM) (Winkler et al. 2014) to assess the significance (Winkler et al. 2015). The significance level, p.perm, was given by 100 000 random permutations. The false discovery rate (the Benjamini-Hochberg method) was used to correct for multiple comparisons among 2 sex groups, the linear and quadratic terms, and the internalizing and externalizing terms, denoted as p.fdr.
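For reference, the Benjamini-Hochberg adjustment named above can be sketched in a few lines. The function below is a generic illustration of the step-up procedure, not the code used in the study, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for illustration only.
p_perm = [0.01, 0.04, 0.03, 0.005]
p_fdr = benjamini_hochberg(p_perm)
print([round(p, 4) for p in p_fdr])  # [0.02, 0.04, 0.04, 0.02]
```

In the design described above, the family of tests would span the 2 sex groups, the linear and quadratic terms, and the internalizing and externalizing scores.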

Code Availability
The code used in the study can be found at https://github.com/zyfdu/Brain-Gender-Continuum.
At the network level, we found that males had stronger functional connectivity compared with the females with small-to-medium effect sizes within the default mode network (DMN) (d = 0.21), salience attention network (SN; d = 0.30), and frontoparietal task network (FPN; d = 0.24), in the adolescent sample from IMAGEN (Fig. 2C). The effect sizes of these differences increased from small-to-medium to large in the adult sample from HCP as d = 0.58 for DMN, 0.66 for SN, and 0.55 for FPN (Fig. 2D). After middle age, the functional connectivity decreased more steeply in males compared with females, so that the signs of sex differences were flipped, for example, d = −0.38 for DMN, −0.30 for SN, and −0.35 for FPN, in an older sample from UKB (Fig. 2E). The majority of the sex differences identified followed the same pattern: they were stronger in males compared with females in adolescents and adults, but after middle age, they became stronger in females compared with males (see Supplementary Table S1).
In the single-center, lifespan sample from YMU, we divided the sample into the younger group (F/M = 37/30, age < 45 years old) and the older group (F/M = 42/27, age ≥ 45 years old). We found that the effect sizes of the sex-differences in the brain functional network connectivity (see Supplementary Table S2) were significantly correlated between the younger group and both the IMAGEN (r = 0.43, p = 7.19 × 10⁻⁷) and the HCP samples (r = 0.36, p = 4.14 × 10⁻⁵), and also between the older group and the UKB sample (r = 0.36, p = 4.28 × 10⁻⁵). The majority of sex-differences in the brain functional connectivity followed the same lifespan pattern, whereby the functional connectivity was stronger in males in the younger group (e.g.,
At the regional level, averaging all the pair-wise correlations or edge weights for each brain regional node, we found that for adolescents (Fig. 2I and J) and adults (Fig. 2K and L) the weighted degree of the cingulate cortex had the greatest sex-difference (see Supplementary Table S3). For older subjects (Fig. 2M and N), however, the weighted degree of the angular gyrus and the precuneus had the greatest sex-differences (see Supplementary Table S4).

No Significant Confounding Effect Was Identified
At the global level, the trajectory fitted for gFC was almost identical to that fitted for the residual gFC after regressing out additional confounders, including the mean framewise displacement (mean FD), SNR and TIV (r = 1.00, df = 304, Fig. 2A Fig. 2H; df = 4369), the effect sizes of the sex differences were correlated between those analyses with or without controlling for the covariates.

Brain Gender Continuum Built by a Multivariate Model
We trained an SVM classifier that reached a 10-fold cross-validated accuracy of 80.46% within the UKB sample and reached a test accuracy of 77.75% (AUC = 0.84; Fig. 3C) using the independent HCP sample (accuracy of 4 HCP runs ranged from 69.26% to 73.30%, and the test-retest reliability of the SVM scores was high as the correlation of the SVM scores among these 4 runs ranged between 0.56 and 0.62 [df = 717]; see Supplementary Fig. S2).
The feature weights in the trained SVM were consistent with the sex difference detected by group comparison in the previous section. In the training sample (i.e., UKB; n = 7400; F/M = 3700/3700), we found that the SVM weights (Fig. 3A and B) were correlated with the effect sizes estimated by the group comparisons above (r = 0.29, df = 4369; Fig. 3D). As the SVM was a multivariate approach, the SVM weight of each functional connectivity was established while controlling for the contributions of other functional connectivity. Indeed, we found the correlation between the SVM weight and the effect size of the sex difference became significantly stronger after controlling for the gFC in the group comparison for each functional connectivity (r = 0.33, df = 4369; 95% CI of the correlation increase: 0.03-0.05; Fig. 3E).
Figure 2. Brain sex differences. (A) Trajectories of the gFC in the female (red) and the male (blue) brains. Each dot stands for one individual. The lines were fitted by splines. After regressing the covariates, including the mean FD, SNR, and TIV, the trajectories of the residuals were reported as the dotted lines. In total 3 data sets (i.e., IMAGEN, HCP, and UKB) covered 3 age groups. (B) Trajectories of the gFC using the data collected at a single site covering a wider age range (i.e., the Yang-Ming data set). (C-E) Sex differences in 3 datasets at the network level. The mean of intra/inter-network connectivity was reported; (F-H) Sex differences in 3 datasets at the edge level. The brain regions were arranged into the brain functional networks. The sex-difference was reported for the connectivity between each pair of brain regions. In (C-H), the upper right triangle shows the original effect size, and the lower left triangle shows the effect size after controlling for the potential confounding factors (e.g., mean FD, SNR, and the TIV); (I-N) sex differences at the regional level. A negative sex difference meant this connectivity was stronger in the female brains compared with the male brains. The negative differences were averaged in (I), (K), (M) and the positive differences were averaged in (J), (L), (N).
Figure 3. (A) The negative weights in the SVM were averaged for each brain region among all its functional connectivity with the other brain regions. A negative weight meant this functional connectivity was on average stronger in the female brains compared with the male brains; (B) the positive weights in the SVM were averaged for each brain region; (C) performance of the SVM measured by the receiver operating characteristic (ROC) curve. Scatter plot of the SVM weights against the effect sizes of sex difference using the UKB sample before (D) and after (E) regressing out the gFC; (F) scatter plot of gender continuum calculated from Power Atlas (264 regions) and AAL2 Atlas (94 regions); (G) comparison of each of the intra- and inter-network connectivity (in total 66 types of connections) among the participants with the lower, higher and intermediate brain gender continuum scores. Influences on the classification accuracy of the trained SVM in the test sample when (H) changing the sample size; (I) changing the age composition while fixing the sample size at 3000; (J) changing the upper age bound of the participants, of the training data; (K) performance of SVM model without regressing age and its higher order terms, the model fails in male test samples; (L) removing one functional network from the input feature of the SVM.
The 2 extreme ends of the brain gender continuum represent brains with either predominantly female features or predominantly male features, compared with the center of this continuum. We found that the participants who were scored intermediately by the SVM had the intermediate connectivity strength at the network level. For example, in the testing sample (i.e., the HCP cohort of young adults, n = 719), the DMN connectivity was the highest at the right end with predominantly male features (brain gender score > 0.65, F/M = 32/153), lowest at the left end with predominantly female features (brain gender score < 0.35, F/M = 161/14), and intermediate in the middle of this continuum (0.35 ≤ brain gender score ≤ 0.65, F/M = 202/157; Fig. 3G). One-way analysis of variance showed the difference in the DMN connectivity among these 3 groups was significant (F(2, 688) = 13.8; p.fdr = 1.7 × 10⁻⁶ after correcting for multiple comparisons among all within-network and between-networks connectivity). The post-hoc comparisons confirmed that such differences were significant between the middle group and both the left (t(532) = −3.21, p.fdr = 0.002) and the right groups (t(542) = 2.82, p.fdr = 0.006).
Controlling for age, the association between the number of years after menopause and the brain gender continuum score in the UKB female participants was significant but small (r = 0.048; df = 2563; 95% CI: 0.0082-0.0880). This result might suggest that as the sex hormone levels decrease, the brain gender continuum score, in females, moves towards the male end.

Sensitivity Analysis of the Multivariate Model
We trained and tested a new SVM by regressing out the additional covariates (e.g., mean FD, SNR, and TIV) from the training sample and the testing sample. We found that the output score of the new SVM was significantly correlated with the corresponding output scores given by the SVM without controlling for these additional covariates (r = 0.77; df = 717; see Supplementary Fig. S3B). When we controlled for the mean FD and SNR only, as these 2 variables were relevant to the quality of the images, the classification accuracy was 74.55% in the independent test sample (AUC = 0.83; see Supplementary Fig. S3A).
The output score of the SVM (i.e., the gender continuum) based on the AAL2 parcellation was significantly correlated with the SVM score based on the Power264 parcellation (r = 0.63; p = 1.88 × 10⁻⁸⁰; df = 717; Fig. 3F). Compared with the pure random distribution, we found that significantly more functional connectivity with the top 5% SVM weights were intra-network connections within DMN (P = 0.0139), and this finding was consistent between the AAL2 parcellation and the Power264 parcellation (see Supplementary Fig. S1).
Furthermore, by increasing the sample size of the training set (i.e., UKB), the test accuracy using the independent test sample (i.e., HCP) gradually reached a stable value around 75% after the sample size reached 2000 (Fig. 3H). By removing one brain functional network from the SVM, we found that removing DMN significantly decreased the test accuracy (95% CI: [−11.40%, −4.73%] by bootstrap; Fig. 3L). We also found that a greater percentage of the older participants (older than 65 years) in the training sample (sample size remained the same as 3000) was correlated with lower test accuracy (r = −0.99, p = 2.34 × 10⁻⁹, df = 9, Fig. 3I; more details were provided in Supplementary Method S4). If we did not regress out age and its higher order terms, the SVM failed to identify the male brains in the test sample (mean accuracy = 54.27% ± 5.99%) but was systematically biased to label more brains as female (mean accuracy = 89.32% ± 2.92%; Fig. 3K). Therefore, we regressed out the age effect in the following analysis.

Brain Androgyny Associated with Fewer Internalizing Symptoms
In the HCP cohort, we found that the internalizing score, but not the externalizing score, was associated with the second-order term of the brain gender continuum (df = 684; r = 0.08; p.perm = 0.0409, uncorrected; Fig. 4A). This U-shaped relationship was mostly driven by the male participants (corrected among 2 sex groups, the internalizing and the externalizing symptoms, and the first and the second order terms; Fig. 4B). All 3 subscales of the internalizing symptoms followed the same relationship with the brain gender continuum in the male participants (anxious: r = 0.15, p.fdr = 0.0136; withdrawn: r = 0.16, p.fdr = 0.0136; somatic complaints: r = 0.13, p.fdr = 0.0262; Fig. 4C-E).

Discussion
In the present study, we identified an age-dependent pattern of sex differences in the brain functional architecture using the fMRI data of nearly 10 000 participants from teenagers to older adults, and systematically examined the potentially confounding effects on these findings. Based on the identified sex differences, we trained an SVM classifier that achieved a 77.75% accuracy in an independent test sample. Using the continuous output of this SVM, we constructed a brain gender continuum and defined an androgynous brain to be at the middle of this continuum. Indeed, we showed that the patterns of functional connectivity at the 2 extreme ends of this brain gender continuum represented predominantly either more female or more male features as compared with the center of the continuum. Finally, we used this brain gender continuum to uncover a U-shaped relationship between the neuroimaging-defined brain gender and mental health; in particular, the participants with an androgynous brain had fewer internalizing symptoms.
Figure 4. Behavioral association of the brain gender continuum. Scatter plot of the brain gender continuum score and the internalizing symptom score among (A) all participants; (B) male participants, from the HCP data set showed a U-shape relationship. The internalizing score reached its minimum at a gender continuum score of about 0.4922 and 0.6099, respectively. The scatter plots of the subscales of the internalizing symptom also showed a U-shape relationship, including anxious score (C), withdrawn score (D), somatic complaints (E). Residual of the brain gender continuum, as well as internalizing symptoms controlling mean FD, sex, age and handedness are plotted in the figure.
The age-dependency of the sex differences may be associated with a number of factors such as behavior, genetics, and hormones. Research has shown that different environmental contexts, experiences, and behaviors throughout the lifespan may alter the structural and functional architecture of the brain, in addition to modulation by neurotransmitters (Kolb and Gibb 2011). Genetic factors may also have differential expression across the lifespan; for example, Deary et al. (2006) have shown different rates of heritability of intelligence across age. In addition, the sex hormones have nonlinear developmental trajectories (Haimov-Kochman and Berger 2014; McEwen and Milner 2017) which increase during childhood and adolescence (Nottelmann et al. 1987) but decrease during aging (Rosario et al. 2004; Cui et al. 2013). Particularly, testosterone, a sex hormone, has been implicated in the developmental change of the DMN (Nota et al. 2016), and in our study we found that 3 brain regions (i.e., the cingulate cortex, angular cortex, and precuneus) with the most differences in their functional connectivity were all identified within the DMN and these differences were also supported by previous studies using smaller samples at different age groups (Ritchie et al. 2018; Ernst et al. 2019).
Furthermore, in the trained SVM, a multivariate classifier, we also found that the DMN contributed the most to the classification accuracy. Our findings suggest that the patterns of functional connectivity in the brain are unlikely to be entirely determined by sex hormone levels. In the UKB sample, we showed that the greater the number of years since menopause, presumably reflecting decreased estrogen levels, the larger the brain gender continuum score, suggesting a shift towards the male end. However, the effect size of this association was small (r = 0.048). Therefore, while sex hormones influence the brain's functional connectivity, many other factors, including those discussed above, also have an impact.
After systematically testing the potential confounders, we confirmed the findings of sex differences in the brain's functional connectivity. Based on the differences identified, we trained an SVM classifier and mapped each brain onto a brain gender continuum by using the continuous output of the SVM classifier. Some previous studies using cross-validation within the training samples achieved a high classification accuracy (∼90%) (Wang et al. 2012; Luo et al. 2019). However, when such classifiers were applied to independent test samples, only moderate classification accuracies were achieved (∼75%) (Satterthwaite et al. 2014; Weis et al. 2019), which were comparable with the classification accuracy of 77.75% achieved in the current study. A previous study that used a test sample from a different age group than the training sample reported a low classification accuracy of 65.7% (Weis et al. 2019); in comparison, our classifier achieved a better accuracy (77.75%) after age and its higher order terms were regressed out from the functional connectivity matrix. This result supports the finding that the sex difference in brain functional connectivity was age dependent.
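The mapping from a classifier's continuous output to a position on a continuum can be illustrated with a minimal sketch. It assumes a linear decision function and a logistic mapping to a score between 0 (female end) and 1 (male end); the mapping, the `slope` and `offset` parameters, and the weights and feature values are all hypothetical choices made for illustration, not the procedure or data of this study.

```python
import math


def decision_value(weights, bias, features):
    """Continuous output of a linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def continuum_score(decision, slope=1.0, offset=0.0):
    """Map a decision value to (0, 1) with a logistic function.

    Assumption: a logistic (Platt-style) mapping; slope and offset are
    hypothetical calibration parameters, not values from the study.
    """
    return 1.0 / (1.0 + math.exp(-(slope * decision + offset)))


# Hypothetical weights and connectivity features, for illustration only.
weights = [0.8, -0.5, 0.3]
bias = 0.0
female_like = [-1.0, 1.0, -1.0]
androgynous = [0.5, 0.8, 0.0]
male_like = [1.0, -1.0, 1.0]

scores = {
    name: continuum_score(decision_value(weights, bias, feats))
    for name, feats in [
        ("female_like", female_like),
        ("androgynous", androgynous),
        ("male_like", male_like),
    ]
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this sketch, a feature vector lying on the decision boundary receives a score at the center of the continuum, whereas vectors far from the boundary are pushed towards one of the 2 ends.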
The moderate classification accuracy of the multivariate classifier indicated that the brain functional architecture was unlikely to be conceptualized as binary, as is the case with biological sex, but was more likely to be continuously represented on a brain gender spectrum. At the behavioral level, Bem had hypothesized that an androgynous gender role would lead to higher self-esteem and better mental health (Bem 1974), since individuals identifying with androgyny are free to act in both masculine and feminine ways without many constraints of gender appropriateness (Bem 1977). In particular, the androgynous group reported having fewer internalizing symptoms (Pauletti et al. 2017). However, previous studies provided only behavioral observations, so the neural mechanism underlying these observations remained to be understood. Our results demonstrated that the participants whose brain functional connectivity mapped onto the androgynous segment of the brain gender continuum had fewer internalizing problems, which is advantageous for mental health. This U-shaped association was seen for both males and females, although it was most prominent in males. These findings may indicate that being more compassionate and sociable (traditionally female traits) could improve the self-esteem of men, thereby potentially reducing internalizing problems, whereas being more aggressive and confrontational (traditionally male traits) might not boost the self-esteem of women (Pauletti et al. 2017). Future research should include self-report data on male/female behavioral traits within different contexts, for example, work, home, and social settings, which could further elucidate the relationship between psychological androgyny and the concept of brain androgyny.
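One simple way to characterize a U-shaped association is to fit a quadratic curve and locate its minimum. The sketch below assumes an ordinary least-squares quadratic fit and uses synthetic, noise-free data; the function names and data are hypothetical, and the model used in this study may differ.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums needed for the 3x3 normal-equation system.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix; rows correspond to the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def vertex(a, b):
    """Location of the extremum of a*x^2 + b*x + c (a minimum when a > 0)."""
    return -b / (2.0 * a)


# Synthetic continuum scores and symptom scores with a minimum at 0.5.
continuum = [i / 10 for i in range(11)]
symptoms = [4.0 * (x - 0.5) ** 2 + 1.0 for x in continuum]

a, b, c = fit_quadratic(continuum, symptoms)
minimum_at = vertex(a, b)
print(f"a = {a:.3f}, minimum at continuum score {minimum_at:.3f}")
```

A positive quadratic coefficient `a` indicates a U shape, and the vertex gives the continuum score at which the fitted symptom score is lowest.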
However, the current study also has several limitations. First, no single large dataset exists that contains samples covering the entire lifespan, from infancy to old age. In our study, we first analyzed large-scale multicenter samples from different age groups, and then validated the findings using a single-center sample covering a wider age range but with a smaller sample size. Across this age range, many environmental factors will inevitably have changed and may have had some influence. Second, although sex hormones have been implicated in the sexual dimorphism of the brain's functional architecture (Bao and Swaab 2011), lifespan measurements of sex hormones are needed to further investigate the molecular mechanisms underlying the brain gender continuum.

Supplementary Material
Supplementary material can be found at Cerebral Cortex online.
Funding
IMAGEN data are available by application to consortium coordinator Dr Schumann (http://imagen-europe.com) after evaluation according to an established procedure. Use of the Human Connectome Project (https://www.humanconnectome.org/) dataset, the UK Biobank (http://www.ukbiobank.ac.uk/) dataset and the Yang-Ming dataset are acknowledged. UK Biobank is an open resource and is available to researchers by registering and applying to access the Resource via the Resource Access Management System (http://www.ukbiobank.ac.uk/). This research has been conducted using the UK Biobank Resource under application 19542. The HCP S900 data were provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.