Endosperm Evolution by Duplicated and Neofunctionalized Type I MADS-Box Transcription Factors

Abstract
MADS-box transcription factors (TFs) are present in nearly all major eukaryotic groups. They are divided into Type I and Type II, which differ in domain structure, functional roles, and rates of evolution. In flowering plants, major evolutionary innovations like flowers, ovules, and fruits have been closely connected to Type II MADS-box TFs. The role of Type I MADS-box TFs in angiosperm evolution remains to be identified. Here, we show that the formation of angiosperm-specific Type I MADS-box clades of Mγ and Mγ-interacting Mα genes (Mα*) can be traced back to the ancestor of all angiosperms. Angiosperm-specific Mγ and Mα* genes were preferentially expressed in the endosperm, consistent with their proposed function as heterodimers in the angiosperm-specific, embryo-nourishing endosperm tissue. We propose that duplication and diversification of Type I MADS genes underpin the evolution of the endosperm, a developmental innovation closely connected to the origin and success of angiosperms.


Introduction
MADS-box transcription factors (TFs) are an evolutionarily ancient class of TFs and major developmental regulators present in nearly all major eukaryotic groups (Alvarez-Buylla et al. 2000). They have greatly expanded during land plant evolution and play important roles in regulating organ patterning and the timing of reproductive developmental programs (Nam et al. 2003; Gramzow and Theißen 2013). The loosely conserved DNA-binding MADS domain is located at the N-terminus of MADS-box proteins, whereas the C-terminal sequences distinguish two types of MADS-box TFs, Type I and Type II (Schwarz-Sommer et al. 1990; Alvarez-Buylla et al. 2000). The duplication and divergence of Type II MADS-box genes, or MIKC-type genes, have been linked to the evolution of key angiosperm structures, including flowers, ovules, and fruits (Becker and Theissen 2003; Nam et al. 2003; Ruelens et al. 2013, 2017). Compared with Type II, Type I MADS-box genes are underrepresented in gymnosperms and have experienced more frequent lineage-specific duplications in angiosperms, followed by fast pseudogenization and gene loss (Nam et al. 2004; Gramzow and Theißen 2013). Nevertheless, the role of Type I MADS-box TFs in angiosperm evolution remains to be identified. Emerging studies suggest a role for Type I MADS-box genes in the regulation of female gametophyte and endosperm development in Arabidopsis and grasses (Bemer et al. 2008; Colombo et al. 2008; Steffen et al. 2008; Roszak and Köhler 2011; Shirzadi et al. 2011; Hehenberger et al. 2012; Chen et al. 2016; Batista et al. 2019; Paul et al. 2020; Zhang et al. 2020).
The endosperm is a reproductive novelty of angiosperms that develops as the second product of double fertilization alongside the embryo to support its growth. This nourishing behavior of the endosperm starts only after fertilization, in contrast to gymnosperms, where the large female gametophyte stores nutrients independently of its fertilization status (Baroux et al. 2002). The endosperm furthermore establishes reproductive barriers between closely related species, fueling plant speciation (Köhler et al. 2021). Considering the contribution of the endosperm to the evolutionary success of angiosperms, understanding the genetic basis of endosperm evolution is of key importance. In this study, we establish a link between the evolution of Type I MADS-box genes and the origin of the endosperm in flowering plants. We hypothesize that through gene duplication and neofunctionalization, novel subfamilies of Type I MADS-box TFs acquired endosperm-specific function in the common ancestor of all extant angiosperms after its divergence from gymnosperms. This process likely underpinned the evolution of the endosperm in angiosperms.
angiosperms revealed three major clades (fig. 1; supplementary fig. S1, Supplementary Material online), corresponding to the previously defined groups Ma, Mb, and Mc (Parenicová et al. 2003; Arora et al. 2007; Gramzow and Theißen 2013). Specifically, we found Mc type genes in all angiosperms we assayed (supplementary table S2, Supplementary Material online), including Amborella trichopoda, the species sister to all other angiosperms, suggesting the presence of an ancestral Mc MADS-box gene in the most recent common ancestor of all angiosperms. Mb genes in angiosperms are sister to the angiosperm Mc clade, while the most closely related homologs in three major lineages of gymnosperms, Picea abies, Ginkgo biloba, and Gnetum luofuense (previously identified as Gnetum montanum in the genome project; Wan et al. 2018; Hou et al. 2020), form a clade that is the outgroup of the angiosperm Mc/Mb clade, followed successively by Mb-like genes in the fern Salvinia cucullata, the clubmoss Selaginella moellendorffii, and the mosses Physcomitrella patens and Sphagnum fallax. Supporting previous findings (Gramzow et al. 2014

Expression of Mc MADS-Box TF Genes in the Endosperm Is Ubiquitous across the Phylogeny of Angiosperms
We investigated the expression patterns of the duplicated Type I MADS-box genes to pinpoint their regulatory roles in specific tissue types. Based on transcriptome data across different organs and developmental stages in Arabidopsis thaliana (Klepikova et al. 2016), Mc genes were preferentially expressed in seeds and siliques, but rarely in vegetative tissues (fig. 2A). Using available microarray data from dissected seed tissues (Belmonte et al. 2013), we inferred that several Mc genes were mainly expressed in the early developing endosperm, but weakly or not at all in the other compartments of the seed, such as the seed coat or embryo (fig. 2B). These data suggest that Mc MADS-box TFs have endosperm-specific functions in A. thaliana.
Consistent with this notion, the Mc MADS-box TF PHERES1 is a master regulator of a gene regulatory network controlling endosperm development (Batista et al. 2019).
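The notion of "preferential" expression in one seed compartment can be made concrete with a tissue-specificity score. The following is a minimal sketch using the tau index, a commonly used specificity measure; it is not the procedure used in this study, and the function names, tissue labels, and expression values are hypothetical placeholders rather than data from fig. 2.

```python
def tau_index(expression_by_tissue):
    """Tissue-specificity index tau for one gene.

    tau = sum(1 - x_i / x_max) / (n - 1), where x_i is the expression
    in tissue i. tau is 0 for uniform expression and 1 when the gene
    is expressed in a single tissue only.
    """
    values = list(expression_by_tissue.values())
    if len(values) < 2:
        raise ValueError("tau needs at least two tissues")
    if min(values) < 0:
        raise ValueError("expression values must be non-negative")
    peak = max(values)
    if peak == 0:
        return 0.0  # gene not expressed anywhere; no specificity to report
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)


def top_tissue(expression_by_tissue):
    """Return the tissue label with the highest expression value."""
    return max(expression_by_tissue, key=expression_by_tissue.get)


if __name__ == "__main__":
    # Made-up expression values for an imaginary gene (illustration only).
    toy_gene = {"endosperm": 8.0, "embryo": 2.0, "seed_coat": 2.0}
    print(top_tissue(toy_gene), tau_index(toy_gene))
```

A gene would be called endosperm-preferential under this sketch when its top tissue is the endosperm and tau exceeds a cutoff chosen by the analyst; the cutoff is a design choice and is not specified in the text.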
We also investigated the endosperm transcriptomes at early developmental stages of maize, coconut, castor bean, soybean, and tomato and found at least one of the Mc genes to be expressed in the endosperm of each species, consistent with their proposed roles in endosperm development. Mc expression was also detected in whole-seed transcriptomes of rice, avocado, and monkeyflower (fig. 3). Since the orthologous Mc genes were primarily expressed in the endosperm in other species, we infer that the observed Mc expression in whole-seed transcriptomes likely reflects transcription predominantly in the endosperm. Thus, Mc genes are ubiquitously expressed in the early endosperm of various species representing major lineages of angiosperms, including eudicots, monocots, and magnoliids, indicating that endosperm expression of Mc genes is a conserved feature of angiosperms. Among those expressed Mc genes, OsMADS87/89 in rice have been characterized as TFs regulating endosperm development similar to PHERES1 in A. thaliana, suggesting that the expressed Mc genes in diverse angiosperm lineages may function similarly (Chen et al. 2016; Paul et al. 2020).
Qiu and Köhler. doi:10.1093/molbev/msab355 MBE
In contrast, Mb genes in A. thaliana were barely expressed in the endosperm or other seed tissues; only one of them had low expression in the seed coat (fig. 2B). Similarly, in maize transcriptomes, Mb expression was not detected in the endosperm (fig. 3). Where Mb expression was detectable (fig. 3), the expression level of Mb genes was lower than the corresponding Mc expression. Based on whole-seed transcriptomes, Mb genes in avocado were barely expressed, Mb genes in rice were expressed at a low level, whereas some Mb genes in monkeyflower were active at later stages of seed development compared with Mc genes. The sporadic occurrence of Mb gene expression in the endosperm or other seed tissues across the phylogeny of angiosperms suggests that the function of Mb is dispensable in the context of endosperm regulation. In support of this notion, Type I MADS-box genes with known functional roles in the endosperm are either Mc or Ma type genes (Bemer et al. 2008; Colombo et al. 2008; Steffen et al. 2008; Roszak and Köhler 2011; Shirzadi et al. 2011; Hehenberger et al. 2012; Chen et al. 2016; Batista et al. 2019; Paul et al. 2020; Zhang et al. 2020). The absence of Mb genes was previously reported for the orchids Apostasia shenzhenica, Phalaenopsis equestris, and Dendrobium catenatum, and the loss of Mb genes was proposed to be connected to the lack of endosperm in orchids. Nevertheless, some orchid species undergo double fertilization and form a rudimentary endosperm (Pace 1907; Sood and Mohana Rao 1988), suggesting that loss of Mb is not directly related to the loss of endosperm formation in orchids. In agreement with this view, transcripts of Ma and Mc are present in developing seeds of A. shenzhenica and P. equestris, likely derived from the arrested endosperm.
In A. thaliana, expression of some Mb genes could be detected in the female gametophyte (Bemer et al. 2010), raising the hypothesis that their functional role is restricted to maternal tissues, rather than the endosperm. We tested this hypothesis by investigating the transcriptomes of species with perispermic seeds, in which the maternally derived perisperm rather than the endosperm provides nutrients to the embryo. Consistent with the proposed functional role of Mb genes in maternal tissues, we detected Mb transcripts in the transcriptome assembly from perisperm of Coffea arabica. Likewise, in Nymphaea thermarum perispermic seeds, transcript levels of Mb genes were much higher than those of the barely detectable Mc genes (supplementary fig. S3, Supplementary Material online), consistent with the perisperm accounting for the majority of the seed volume in Nymphaea (Povilus et al. 2015). We also investigated transcriptomes of gymnosperm reproductive tissues to infer the functional role of preduplicated Mb/c orthologs (supplementary fig. S3, Supplementary Material online). Mb/c orthologous genes were expressed in female cones of P. abies and ovules of G. luofuense, and expression of some Mb/c orthologous genes could also be detected in developing seeds of G. luofuense, suggesting that these genes perform important roles in the maternal reproductive tissue and possibly regulate the maternal nourishing behavior supporting the development of seeds. In gymnosperms, the large female gametophyte nourishes the embryo after fertilization, whereas in angiosperms, this role has been adopted by the endosperm, which develops alongside the embryo after fertilization (Baroux et al. 2002). Based on our data, we propose that the function of preduplicated Mb/c genes was to control nutrient provisioning in the female gametophyte, a function that is maintained by angiosperm Mb genes acting in the female gametophyte and perisperm, whereas Mc genes neofunctionalized and adopted an endosperm-specific function, likely enabling endosperm development.

Duplication of Ma Genes and Specialization of Interaction with Mc and Mb
MADS-box TFs usually form homo- or heterodimers (Kaufmann et al. 2005). In A. thaliana, an atlas of MADS-box interactions based on yeast two-hybrid data revealed distinct interaction patterns between Type II and Type I TFs (de Folter et al. 2005). [...] (Paul et al. 2020). The two rice Ma genes as well as the two Mc genes are barely expressed in nonendosperm tissues (Sakai et al. 2011; Davidson et al. 2012). We found that the two rice Ma genes are closely related to each other and fall in the same subclade (fig. 1; supplementary fig. S1, Supplementary Material online). Knockout of both MADS78 and MADS79 results in endosperm failure and seed lethality (Paul et al. 2020), revealing that other Ma TFs that putatively interact with Mb TFs cannot complement the Mc-interacting function in the endosperm.
To test whether the functional divergence of Ma genes can be detected in other angiosperm species, we analyzed the expression of Ma genes in the transcriptomes of endosperm or seeds where Mc expression could be detected. We also found Ma genes to be highly expressed specifically in the endosperm or seeds in those species, suggesting that the regulatory divergence between the Ma* genes and other Ma genes took place across the angiosperm phylogeny (fig. 3). We hypothesize that in response to the duplication of Mb and Mc genes, the duplicated Ma genes specialized in protein-protein interactions and subsequently the novel interacting pairs, Ma* and Mc, together occupied the endosperm regulatory niche.
Although the phylogeny of Ma group Type I MADS-box TFs in land plants was difficult to resolve, there is only a single cluster of Ma genes in nonflowering plants (fig. 1). Thus, the Ma-like genes in nonflowering plants have not undergone the diversification observed in angiosperms, so they likely represent the ancestral interacting partners of the preduplicated Mb-like genes (fig. 1). In contrast, several rounds of duplications gave rise to angiosperm-specific Ma TF clades that could diverge to Ma* genes (fig. 1), in concert with the duplication of Mb and Mc clades.
We observed that many angiosperm species have at least two clusters of divergent Ma genes, including the groups representing the successive sister lineages to all other angiosperms, Amborella and Nymphaeales. Furthermore, the Ma gene phylogeny of all major angiosperm groups is largely, although imperfectly, reflected by a two-clade pattern, despite the uncertainty at the basal nodes with quite short branches (fig. 1; supplementary fig. S1, Supplementary Material online). A parsimonious model to describe the evolution of Ma type genes in angiosperms is that ancestral angiosperms most likely already possessed two, if not more, types of Ma genes that arose from angiosperm-specific duplication. These could then have subfunctionalized by forming heterodimeric complexes with either Mb or Mc interacting partners. Another requirement for the specialization of bona fide Ma* TFs was the acquisition of novel expression in the endosperm. We hypothesize that this two-step specialization restrained the occurrence of Ma* precursors and propose that one group of ancestral Ma TFs initiated the subfunctionalization and gave rise to a single cluster of potential Ma* TFs, which were capable of specializing into Ma*, whereas the other Ma TFs did not gain this competence. We observed that in all the eudicot species we surveyed, there are Ma genes closely related to the AGL62 clade of Arabidopsis and expressed in the endosperm or seed transcriptomes; likewise, the expressed Ma genes in maize and coconut are in the same clade as MADS78/79 of rice (supplementary fig. S4, Supplementary Material online). These putative Ma* genes may have the same Ma* origin. Alternatively, it is also possible that several events of Ma* specialization took place convergently in different Ma subclades in angiosperms. Based on approximately unbiased (AU) tests (Shimodaira 2002), it is not possible to differentiate between the two hypotheses (supplementary fig. S5, Supplementary Material online).
Nevertheless, following the specialization of an ancestral Ma*, some descendant genes that duplicated subsequently in the clade may have lost the function and pseudogenized, consistent with previous predictions (Nam et al. 2004). In consequence, the retained functional Ma* genes appear scattered in the phylogeny, obscuring a possible shared origin. In summary, we conclude that duplication of Ma genes and subsequent specialization of Ma* in angiosperms enabled the formation of heteromeric Type I MADS TF complexes required for the regulation of endosperm development.

Conclusion
Angiosperms are the most abundant and diverse group among land plants. The success of angiosperms is closely connected to the developmental innovations of flowers and fruits, as well as the process of double fertilization, coupling fertilization to the formation of the embryo-nourishing endosperm tissue (Baroux and Grossniklaus 2019). Duplication and diversification of Type II MADS-box genes underpin the evolution of flowers and fruits in angiosperms (Irish and Litt 2005; Ruelens et al. 2017), whereas the role of Type I MADS-box genes in angiosperm evolution remained obscure. Based on our data, we propose that the origin of the embryo-nourishing endosperm tissue is linked to the angiosperm-specific duplication of Type I MADS-box genes (fig. 4). In the earliest land plants, ancestral Ma and Mb/c-like TFs likely formed heterodimers with a reproductive function, as suggested by the expression of gymnosperm Ma and Mb/c TFs in female cones and seeds. After the angiosperm lineage diverged from the gymnosperms, true Mc TFs arose by gene duplication, experienced neofunctionalization, and drove the concerted divergence of some Ma TFs formed by angiosperm-specific gene duplication events. These novel Mc-Ma heterodimers adopted a function as master regulators of the endosperm developmental network in flowering plants. This proposed scenario is strongly supported by the specific or preferential expression of Mc and Ma* genes in the endosperm of all sampled angiosperm species as well as functional data in A. thaliana and rice, revealing that Mc and Ma* TFs are required for endosperm development (Chen et al. 2016; Batista et al. 2019; Paul et al. 2020). In contrast to gymnosperms, which have only a few Type I MADS-box genes (Gramzow et al. 2014), the number of these genes strongly amplified in angiosperms, correlating with the evolution of the embryo-nourishing endosperm.
The link between Mc TFs and endosperm evolution was furthermore supported by the negligible expression of Mc genes in perispermic seeds, in which the maternal perisperm instead of the endosperm supports embryo growth (Lu and Magnani 2018). The maternal nourishing function in perispermic seeds correlates with the expression of Mb genes, consistent with the proposed ancestral role of preduplicated Mb/c genes in regulating nutrient transfer from the maternal tissues to the embryo.
Together, our work provides new insights into the role of Type I MADS-box proteins in the origin and evolution of the endosperm, a developmental novelty associated with the rise and diversification of angiosperms.

Phylogenetic Analyses
Amino acid sequences of Type I and Type II MADS-box proteins of A. thaliana obtained from TAIR10 were used as [...] (Lu et al. 2020).
MUSCLE was used with default settings to generate the amino acid alignments of MADS-box domains extracted from the identified genes (Edgar 2004). IQ-TREE 1.6.7 was used to infer maximum likelihood (ML) trees (Nguyen et al. 2015). The implemented ModelFinder determined the LG amino acid replacement matrix (Le and Gascuel 2008) to be the best substitution model for tree inference (Kalyaanamoorthy et al. 2017). One thousand ultrafast bootstrap replicates were used to estimate the support for reconstructed branches (Hoang et al. 2018). The Ma, Mb, and Mc Type I genes were assigned based on their phylogenetic position relative to the defined Arabidopsis MADS-box genes. Specifically, to elucidate the evolutionary trajectory of putative Ma* TFs, we compared the topologies of constrained phylogenetic trees based on different hypotheses by AU tests (Shimodaira 2002).
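The workflow described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual commands: it only assembles command lines, the file names are hypothetical, and the flags assume the MUSCLE v3 and IQ-TREE 1.6 command-line interfaces. Any rate-heterogeneity terms selected alongside LG are not stated in the text, so the plain LG model in the AU-test step is an assumption.

```python
import shlex


def muscle_cmd(in_fasta, out_fasta):
    """Align MADS-domain sequences with default settings (MUSCLE v3 syntax assumed)."""
    return ["muscle", "-in", in_fasta, "-out", out_fasta]


def iqtree_ml_cmd(alignment, bootstraps=1000):
    """ML tree search: -m MFP runs ModelFinder first, -bb sets ultrafast bootstrap replicates."""
    return ["iqtree", "-s", alignment, "-m", "MFP", "-bb", str(bootstraps)]


def iqtree_au_cmd(alignment, candidate_trees, model="LG", rell_replicates=10000):
    """AU test on candidate topologies.

    -z  file holding the candidate (e.g. constrained) trees
    -n 0 evaluates the given trees without a new tree search
    -zb number of RELL replicates; -au adds the approximately unbiased test
    The model default and replicate count are assumptions, not values from the text.
    """
    return ["iqtree", "-s", alignment, "-m", model,
            "-z", candidate_trees, "-n", "0",
            "-zb", str(rell_replicates), "-au"]


if __name__ == "__main__":
    # Hypothetical file names, printed as shell-ready command lines.
    steps = [
        muscle_cmd("mads_domains.fasta", "mads_domains.aln.fasta"),
        iqtree_ml_cmd("mads_domains.aln.fasta"),
        iqtree_au_cmd("mads_domains.aln.fasta", "constrained_trees.nwk"),
    ]
    for step in steps:
        print(" ".join(shlex.quote(arg) for arg in step))
```

Building the commands as argument lists keeps them safe to pass to a process runner without shell quoting problems, and makes the assumed parameters easy to inspect before anything is executed.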

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
We thank Dr. Rebecca Povilus and Dr. William Friedman for sharing the seed transcriptome data of Nymphaea thermarum. We thank Dr. Qin Li for the comments on the data analyses and visualization. This work was supported by a grant from the Swedish Research Council (2017-04119) to C.K., a grant from the Knut and Alice Wallenberg Foundation

Data Availability
All data are incorporated into the article and its online supplementary material.