Gaspar Sánchez-Serna, Jordi Badia-Ramentol, Paula Bujosa, Alfonso Ferrández-Roldán, Nuria P Torres-Águila, Marc Fabregà-Torrus, Johannes N Wibisana, Michael J Mansfield, Charles Plessy, Nicholas M Luscombe, Ricard Albalat, Cristian Cañestro, Less, but More: New Insights From Appendicularians on Chordate Fgf Evolution and the Divergence of Tunicate Lifestyles, Molecular Biology and Evolution, Volume 42, Issue 1, January 2025, msae260, https://doi.org/10.1093/molbev/msae260
Abstract
The impact of gene loss on the diversification of taxa and the emergence of evolutionary innovations remains poorly understood. Here, our investigation on the evolution of the Fibroblast Growth Factors (FGFs) in appendicularian tunicates as a case study reveals a scenario of “less, but more” characterized by massive losses of all Fgf gene subfamilies, except for the Fgf9/16/20 and Fgf11/12/13/14, which in turn underwent two bursts of duplications. Through phylogenetic analysis, synteny conservation, and gene and protein structure, we reconstruct the history of appendicularian Fgf genes, highlighting their paracrine and intracellular functions. An exhaustive analysis of developmental Fgf expression in Oikopleura dioica allows us to identify four associated evolutionary patterns characterizing the “less, but more” conceptual framework: conservation of ancestral functions; function shuffling between paralogs linked to gene losses; innovation of new functions after the duplication bursts; and function extinctions linked to gene losses. Our findings allow us to formulate novel hypotheses about the impact of Fgf losses and duplications on the transition from an ancestral ascidian-like biphasic lifestyle to the fully free-living appendicularians. These hypotheses include massive co-options of Fgfs for the development of the oikoblast and the tail fin; recruitment of Fgf11/12/13/14s into the evolution of a new mouth, and their role modulating neuronal excitability; the evolutionary innovation of an anterior tail FGF signaling source upon the loss of retinoic acid signaling; and the potential link between the loss of Fgf7/10/22 and Fgf8/17/18 and the loss of drastic metamorphosis and tail absorption in appendicularians, in contrast to ascidians.

Introduction
The bloom of sequenced genomes has revealed that gene losses are pervasive throughout the tree of life, leaving little doubt of their great potential as a key evolutionary force that can generate adaptive phenotypic diversity (Krylov et al. 2003; Albalat and Cañestro 2016; Fernández and Gabaldón 2020; Guijarro-Clarke et al. 2020; Helsen et al. 2020; Xu and Guo 2020). There are many paradigmatic examples of gene losses that have been key for the evolution of adaptations in certain species, under what is known as the “less is more” hypothesis (Olson 1999). Examples of adaptive gene losses include positively selected null-mutations in certain receptors that provide resistance to malaria and HIV in humans (Novembre et al. 2005; Hodgson et al. 2014), the loss of gluconeogenic muscle enzyme that allowed the evolution of true hovering flight in hummingbirds (Osipova et al. 2023), and many gene losses that facilitated the reconquest of aquatic and air environments in mammals (Sharma et al. 2018). However, how evolutionary processes can drive the loss of essential genes without carrying an important detrimental load remains enigmatic. The loss of a gene often does not come as an isolated event, but it is accompanied by the co-elimination of other genes that are functionally linked to a distinctive pathway (Albalat and Cañestro 2016). Moreover, within a given gene family, the loss of some members is often accompanied by the duplication of others, which increase the robustness of the genetic system and facilitate the events of gene loss (Force et al. 1999). In genetic systems that have suffered gene losses, we can often observe situations of function shuffling (Cañestro et al. 2009; Martí-Solans et al. 2021), a concept that describes when we observe that the same function is carried out by different paralogs in different species (McClintock et al. 2001). 
How function shuffling occurs remains unclear, and it probably involves different processes, ranging from events of divergent subfunctionalization among different species to the recruitment of regulatory elements that can drive the expression of different paralogs in convergent expression domains, all of it often accompanied by divergent events of gene loss in different lineages (Cañestro et al. 2009). To understand the evolutionary scenarios and impact of the loss of essential genes, such as those governing embryo development, it is necessary to study cases in which events of gene co-elimination and duplication can be related to the loss or survival of ancestral traits still present in sister groups, or even to the origin of evolutionary adaptations.
During recent years, the appendicularian tunicate Oikopleura dioica has become an attractive animal model to study the impact of gene loss on the evolution of developmental mechanisms in our own phylum, the chordates (Ferrández-Roldán et al. 2019). The phylogenetic position of appendicularians is commonly considered to be basal among tunicates, but whether the last common ancestor of tunicates had an ascidian-like biphasic lifestyle or an appendicularian-like free-living lifestyle remains enigmatic (Delsuc et al. 2018; Kocot et al. 2018). This phylogenetic position, however, is not conclusive since there is also some evidence suggesting that appendicularians could be the sister group of some ascidians, implying that their free-living lifestyle derived from a biphasic lifestyle (Stach et al. 2008; Tsagkogeorga et al. 2009; Lemaire and Piette 2015). There is a growing body of evidence from paleontology, embryology, cladistic and phylogenetic inferences, as well as from the identification of relevant gene losses, supporting the latter hypothesis and thus suggesting that the last common ancestor of tunicates had a biphasic ascidian-like lifestyle (Stach et al. 2008; Ferrández-Roldán et al. 2021; Nanglu et al. 2023). For instance, the discovery of how gene loss drove the deconstruction of the cardiopharyngeal gene regulatory network in O. dioica has been key to understanding the adaptive evolution of the heart and regression of atrial muscles, overall facilitating the transition from an ancestral biphasic ascidian-like lifestyle to the complete free-living lifestyle in appendicularians (Ferrández-Roldán et al. 2021). Oikopleura dioica also appears to have suffered a large number of gene losses affecting important signaling pathways, such as the wingless (Wnt) and retinoic acid (RA) signaling pathways, that play fundamental roles in axial patterning and cell differentiation during embryo development in chordates (Martí-Solans et al. 2016, 2021). 
Oikopleura dioica has the minimal Wnt repertoire found among chordates, where only four out of the thirteen Wnt families are present. Oikopleura dioica also stands as the only chordate known to date that can be considered an “evolutionary knockout” model for RA signaling, a term that points to the study of animals that have naturally lost genes as models to understand biological processes or diseases in the absence of a particular gene (Albalat and Cañestro 2016). The antagonistic action of the RA and the fibroblast growth factor (FGF) signaling pathways is well known in vertebrates during axial patterning, where it regulates, for instance, the expression of Hox genes (Diez del Corral and Storey 2004; Wilson et al. 2009). The fact that this antagonistic action has also been described in ascidians has led to the conclusion that the RA-FGF antagonism was already present in the last common ancestor of olfactores (i.e. vertebrates + tunicates) (Pasini et al. 2012). Moreover, its absence in cephalochordates has led to the suggestion that the evolutionary innovation of this RA-FGF antagonism could have facilitated the evolutionary origin and radiation of olfactores (Bertrand et al. 2015). The loss of RA signaling and the drastic reduction of Wnt families in O. dioica make the evolution of FGF signaling in this species particularly intriguing. Its study can contribute to a better understanding of how organisms manage the loss of important genes while maintaining similar morphologies (i.e. the inverse paradox in Evo-Devo) (Cañestro et al. 2007), and to correlating gene loss events with evolutionary adaptations of the lifestyle of certain groups of organisms.
Fibroblast growth factors form a family of signaling proteins that emerged concomitantly with the origin of Eumetazoans and have been extensively conserved during animal evolution, regulating a plethora of important biological processes such as cell proliferation, migration, or differentiation during embryonic development and adult tissue homeostasis (Bertrand et al. 2014; Teven et al. 2014). In general, Fgfs are small proteins characterized by a conserved FGF core homology domain of 120 to 130 amino acids disposed in a β-trefoil topology (Plotnikov et al. 2001). The FGF domain contains essential motifs for binding both heparin and the extracellular region of the Fgf receptors on the cell surface, forming a dimeric ternary complex that can trigger canonical signal transduction cascades inside the cells (Schlessinger et al. 2000). Outside the FGF domain, sequences are not conserved among different subfamilies, and some Fgf genes have independently evolved extended N- or C-terminal regions that are not homologous (Popovici et al. 2005). Consequently, protein alignments between distant Fgf subfamilies are not possible outside the FGF domain. Some of these extended regions of variable lengths often include signal peptides (SP) and nuclear localization signals (NLS) (Coulier et al. 1997). The SP is a short hydrophobic region that includes the first ∼15 to 30 residues of a protein at its N-terminus and is important for the secretion of proteins outside the cell (Owji et al. 2018). In principle, Fgfs that lack an SP remain intracellular, although alternative secretion mechanisms that enable these Fgfs to perform paracrine signaling functions have also been described (Revest et al. 2000; Miyakawa and Imamura 2003; Schäfer et al. 2004; Kirov et al. 2012). Moreover, there is an increasing body of evidence for intracellular functions and interacting partners for nonsecreted Fgfs (Goldfarb 2001, 2005; Schoorlemmer and Goldfarb 2002; Olsnes et al. 2003; Wu et al. 2012; Sluzalska et al. 2021). In addition, the presence of NLS allows some Fgfs to migrate into the nucleus and interact with other transcription factors to modulate the expression of target genes (Antoine et al. 2005; Bryant and Stow 2005; Sheng et al. 2005; Popovici et al. 2006).
The high sequence variability among Fgfs and the short length of their conserved core have hindered the phylogenetic classification of this family (Popovici et al. 2005). Recent evolutionary reconstructions suggested the existence of eight Fgf subfamilies in chordates (namely Fgf1/2, Fgf3, Fgf4/5/6, Fgf7/10/22, Fgf8/17/18/24, Fgf9/16/20, Fgf11/12/13/14, and Fgf19/21/23) (Oulion et al. 2012). It has been proposed that the basally divergent chordate amphioxus might possess the full catalogue of chordate Fgfs, with one single member for each of the eight subfamilies. This contrasts with the large number of Fgfs within each subfamily in vertebrates due to the various rounds of genome duplication that occurred during the evolution of different lineages (e.g. 19 in sarcopterygians, 22 in mammals, 23 in chicken, and 27 in zebrafish) (Dehal and Boore 2005; Bertrand et al. 2011; Oulion et al. 2012). In ascidian tunicates, seven Fgf genes have been found representing at least six of the eight chordate subfamilies, and suggesting that Fgf1/2 and Fgf3 were lost during the evolution of the lineage leading to ascidians, at the same time that a novel Fgf member, Fgf-L, originated probably as a duplicate of Fgf7/10/22 (Dehal and Boore 2005; Popovici et al. 2005; Oulion et al. 2012).
According to their functions, Fgf subfamilies have been classically classified into: (I) canonical Fgfs (i.e. Fgf1/2, Fgf3, Fgf4/5/6, Fgf7/10/22, Fgf8/17/18/24, Fgf9/16/20), which act as paracrine and autocrine signaling factors binding and activating the Fgf-receptor tyrosine kinases; (II) endocrine Fgfs (i.e. Fgf19/21/23), which bind the cofactor Klotho instead of heparin and act as long-distance signaling molecules in vertebrates, and (III) intracellular nonsecreted Fgfs (i.e. Fgf11/12/13/14) with intracrine nonsignaling functions that serve as cofactors to other proteins (reviewed in Ornitz and Itoh 2015, 2022). In the present work, as a case study to better understand the evolutionary impact of gene loss, we address the evolution of the FGF signaling pathway in appendicularian tunicates. The independent genome assemblies at the chromosome level of three different cryptic species of O. dioica from different parts of the globe (i.e. Barcelona, Osaka, and Okinawa) revealed an unprecedented level of genome scrambling that made them genetically different, despite their lack of obvious morphological differences (Plessy et al. 2024). This allowed us to identify for the first time with high confidence the full catalogue of Fgf genes in an appendicularian species, and to describe its complete atlas of expression during embryonic and larval development. Our results reveal that only two out of the eight chordate Fgf subfamilies have survived in O. dioica, and that the presence of 10 Fgfs in this species is the result of several gene duplications and losses that occurred during the evolution of the appendicularian lineage. Our findings, moreover, allow us to discuss how the massive losses and bursts of duplications affecting the Fgf family have impacted on the developmental mechanisms underlying the evolution of the appendicularian free-swimming lifestyle in this unique chordate, which functions as an evolutionary knockout for RA signaling. 
Finally, the results of this case study allow us to propose a “less, but more” scenario as a conceptual framework that facilitates a better understanding of the evolutionary impact of gene loss.
Materials and Methods
Laboratory Culture of Oikopleura dioica
Oikopleura dioica specimens were obtained from animal colonies that have been maintained in our facility at the University of Barcelona for over 5 years. The founder individuals were originally collected from the Mediterranean coast near Barcelona (Catalonia, Spain) and cultured as detailed in Martí-Solans et al. (2015). This project did not raise any ethical concerns since the experimentation conducted on aquatic invertebrate animals does not fall under the regulations pertaining to animal experimentation, as stipulated in Real Decreto 223 14-3-1998 and Catalonia Ley 5/1995, DOGC2073,5172. Nonetheless, all experimental procedures adhered to the European Union (EU) guidelines for animal care and were formally approved by the Ethical Animal Experimentation Committee (CEEA-2009) of the University of Barcelona.
Genome Database Searches, Gene Identification and Phylogenetic Analyses
Fgf genes in O. dioica were first identified in the reference database of O. dioica relative to the Norwegian population (http://oikoarrays.biology.uiowa.edu/Oiko/) (Denoeud et al. 2010; Danks et al. 2013) with BLASTp and tBLASTn, using as queries the Fgf protein sequences from the vertebrate Homo sapiens and the tunicate C. robusta, as well as several other chordates. The corresponding orthologs were then identified in the telomere-to-telomere genomic assemblies of the Barcelona and Okinawa O. dioica species, and in the superscaffolded version of the Osaka O. dioica cryptic species using tBLASTn (Plessy et al. 2024); NCBI BioProject ID PRJEB55052, and assemblies in https://zenodo.org/records/10241527. In each cryptic species, the newly identified Fgf genes were used to search for further paralogs using tBLASTn to obtain the final Fgf catalogue. The final catalogue in Barcelona was confirmed searching for proteins containing the FGF domain (PF00167), retrieved from the Pfam database, with HMMscan against all the transcripts translated in all reading frames from a Barcelona O. dioica transcriptome assembly (Plessy et al. 2024).
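The confirmation step described above scans transcripts translated in all reading frames for the FGF domain. As a minimal illustrative sketch (not the authors' pipeline), the following Python code shows how the six reading frames of a nucleotide sequence can be translated with the standard genetic code before a domain search. The function names and the example sequences are hypothetical, and the domain search itself (HMMscan) is not reproduced here.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to their translated peptides."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Each translated frame would then be passed to the profile search, so that a domain encoded on either strand or in any frame of a transcript can be detected.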
Fgf genes in other appendicularian species were identified using O. dioica and other chordate Fgf proteins as queries for tBLASTn against publicly available genomes (i.e. Oikopleura albicans SCLG01000000, Oikopleura vanhoeffeni SCLH01000000) (Naville et al. 2019). Fgf genes in ascidian species other than C. robusta were identified using the Fgf protein sequences from C. robusta and other chordates as queries in BLASTp and tBLASTn searches against the gene models and whole-genome assemblies of most species available in ANISEED (i.e. Ciona savignyi, Phallusia fumigata, Phallusia mammillata, Halocynthia roretzi, Halocynthia aurantium, Botryllus schlosseri, Botrylloides leachii, Molgula occulta, Molgula oculata, and Molgula occidentalis) (Brozovic et al. 2018).
Protein alignments were generated with MUSCLE and MAFFT implemented in AliView v1.28 (Larsson 2014) and reviewed by hand. Nonhomologous independently extended N- and C-termini of different subfamilies were aligned in a nonoverlapping manner to reduce background noise among Fgf subfamilies in which no similarity was detected outside the FGF core homology domain. Phylogenetic trees were based on Maximum-Likelihood (ML) inferences calculated with PhyML v3.0 (Guindon et al. 2010), as well as IQ-Tree (Nguyen et al. 2015). The Le and Gascuel (LG) model was inferred as the best-fit substitution model for the Fgf data according to the Bayesian information criterion (BIC), with a gamma distribution with 4 categories and a shape parameter alpha of 2.4681 (Kalyaanamoorthy et al. 2017). Tree node support was inferred by fast likelihood-based methods aLRT SH-like, aLRT chi2-based, and aBayes, and by standard or ultrafast bootstraps (n = 100) according to computational capacity. Phylogenetic trees were inferred from both complete and trimmed protein alignments, the latter obtained by removing all extended regions outside the FGF core homology domain, and in both cases produced the same tree topology regarding the Fgf subfamily homology of appendicularian genes. Gene names were assigned according to the previous literature, and those that were described for the first time in this work were assigned according to the topology in the phylogenetic tree. Oikopleura dioica paralogs were named with letters in alphabetical order, while paralogs from other appendicularians were labeled with the last five digits of the genomic scaffold in which they were identified.
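The trimming of alignments to the FGF core homology domain can be illustrated with a short Python sketch. It assumes, purely for illustration, that the core boundaries are known as 1-based residue coordinates on one reference sequence in the alignment; the function names, the toy alignment, and the coordinates are hypothetical and are not taken from the study, where alignments were curated by hand.

```python
def core_columns(aligned_ref: str, core_start: int, core_end: int) -> tuple:
    """Convert 1-based inclusive residue coordinates on the (gapped) reference
    into a 0-based half-open range of alignment columns."""
    residue = 0
    first = last = None
    for col, char in enumerate(aligned_ref):
        if char == "-":
            continue  # gaps do not advance the residue counter
        residue += 1
        if residue == core_start:
            first = col
        if residue == core_end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("core coordinates fall outside the reference sequence")
    return first, last + 1


def trim_alignment(alignment: dict, ref_name: str,
                   core_start: int, core_end: int) -> dict:
    """Keep only the alignment columns spanning the reference core domain."""
    start, stop = core_columns(alignment[ref_name], core_start, core_end)
    return {name: seq[start:stop] for name, seq in alignment.items()}


# Toy alignment: residues 3 to 7 of "ref" stand in for the core domain.
toy = {"ref": "MK--ACDEF-GH", "other": "--LLACDQFWGH"}
trimmed = trim_alignment(toy, "ref", 3, 7)
```

Inferring trees from both the complete and the trimmed alignment, as described above, is a way to check that the subfamily assignments do not depend on the nonhomologous extended regions.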
Protein Structure Analyses
The domain architecture and functional motifs of Fgf proteins were examined individually with InterProScan, a comprehensive software suite that combines the information stored in several protein databases (including InterPro, Pfam, SMART, or PANTHER) to provide in silico functional characterization of queried protein sequences (Jones et al. 2014). Hydropathy plots were generated in ProtScale available in Expasy (Gasteiger et al. 2005), using the Kyte-Doolittle hydrophobic scale and an interval of 9 amino acids with a linear weight variation model in a normalized scale, following a similar previous analysis reported on Fgf9 (Miyakawa et al. 1999). Sequence identity and similarity for every pair of Fgf sequences were obtained from the global pairwise sequence alignment with the Needleman–Wunsch algorithm implemented in EMBOSS needle (Madeira et al. 2024). Three-dimensional structures of O. dioica Fgf proteins were predicted de novo with AlphaFold2 (Jumper et al. 2021). For each Fgf protein, the top-ranked relaxed model was imported into UCSF ChimeraX for its visualization, analysis, and image generation (Pettersen et al. 2021). Nuclear Localization Signals (NLS) were predicted with the NLStradamus software using the 4-state HMM static model and a posterior cutoff of 0.5 (Nguyen Ba et al. 2009), or manually identified in the case of classical NLS based on the consensus stated in Lu et al. (2021). Signal peptide (SP) predictions were conducted using SignalP 6.0 (Teufel et al. 2022) and Phobius (Käll et al. 2004).
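The hydropathy settings described above (Kyte-Doolittle scale, 9-residue window, linear weight variation, normalized scale) can be illustrated with a minimal Python sketch. This is not the ProtScale implementation: the `edge_weight` parameter, the use of a weighted mean within each window, and all function names are assumptions made for illustration, since the relative weight given to the window edges is not stated in the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalized_scale(scale: dict) -> dict:
    """Rescale values linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(scale.values()), max(scale.values())
    return {aa: (value - lo) / (hi - lo) for aa, value in scale.items()}


def linear_weights(window: int, edge_weight: float) -> list:
    """Weights rising linearly from edge_weight at the edges to 1.0 at the center."""
    half = window // 2
    return [
        edge_weight + (1.0 - edge_weight) * (1 - abs(i - half) / half)
        for i in range(window)
    ]


def hydropathy_profile(seq: str, window: int = 9,
                       edge_weight: float = 1.0, normalize: bool = True) -> list:
    """Weighted mean hydropathy per window (assumed convention, see lead-in)."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number of at least 3")
    scale = normalized_scale(KYTE_DOOLITTLE) if normalize else KYTE_DOOLITTLE
    weights = linear_weights(window, edge_weight)
    total = sum(weights)
    values = [scale[aa] for aa in seq.upper()]  # KeyError on nonstandard residues
    return [
        sum(w * v for w, v in zip(weights, values[start:start + window])) / total
        for start in range(len(values) - window + 1)
    ]
```

In a profile of this kind, a stretch of high values near the N-terminus is the sort of feature that would be compared against signal peptide predictions; each value corresponds to the residue at the center of its window.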
Cloning and Expression Analyses
Oikopleura dioica Fgf genes were PCR amplified from cDNA or gDNA obtained from individuals from the Barcelona population as previously described in Martí-Solans et al. (2016). The PCR products were cloned using the Topo TA Cloning Kit (K4530-20, Invitrogen), and the resulting plasmid was digested with the adequate restriction enzyme to synthesize antisense digoxigenin (DIG) riboprobes for whole-mount in situ hybridization (WMISH) (Bassham and Postlethwait 2000; Cañestro and Postlethwait 2007; Martí-Solans et al. 2016). Primers used for the cloning of O. dioica Fgf genes, as well as the DNA used as template and the length of the clone and probe are indicated in supplementary table S4, Supplementary Material online. All Fgf probes were tested in at least two separate experiments of whole-mount in situ hybridization, and often more than four when we observed variability. The described tissue-specific domains were clearly observed in at least five embryos.
Results
Extensive Loss of Fgf Subfamilies in Appendicularian Tunicates
The gold-standard chromosome-arm-level genome assemblies of three cryptic species of the appendicularian O. dioica (BAR, OSA, and OKI) (Plessy et al. 2024) have allowed us to identify the full Fgf catalogue made up of 10 genes in each species (supplementary table S1, Supplementary Material online). Phylogenetic analysis showed a clear one-to-one orthology between each Fgf gene in the three O. dioica cryptic species, providing strong evidence that we had identified the full catalogue of Fgf genes in O. dioica (Fig. 1 and supplementary figs. S1 and S2, Supplementary Material online). Importantly, the use of three independently assembled genomes in our gene survey minimized the possibility of undetected unassembled regions containing additional Fgf genes, providing further evidence that the full Fgf catalogue in O. dioica was made of 10 genes. Phylogenetic analyses provided strong evidence with high node support values indicating that the 10 Fgf genes were paralogs that originated from two independent bursts of appendicularian-specific duplications (red solid circled nodes in Fig. 1). The phylogenetic tree topology also indicated with high support values that all O. dioica Fgf genes belonged to only two subfamilies: Fgf11/12/13/14 and Fgf9/16/20 (black solid circled nodes in Fig. 1). Analyses on gene structure, protein domains and protein sequence motifs further supported this phylogenetic classification (see below). In contrast to O. dioica, our survey of 11 ascidian species' genomes revealed that most of them had 7 Fgf genes orthologous to those previously described in Ciona robusta, which are representatives of all chordate Fgf subfamilies except Fgf1/2 and Fgf3 (Satou et al. 2002; Popovici et al. 2005; Oulion et al. 2012). The exceptions were Ciona savignyi, in which we could not identify an ortholog for Fgf4/5/6, and Halocynthia roretzi, Halocynthia aurantium, Botrylloides leachii and Botryllus spp., in which we could not identify orthologs for Fgf-NA1. 
We also surveyed the genomes of two additional appendicularian species whose genome assemblies were not too fragmented (i.e. Oikopleura albicans and Oikopleura vanhoeffeni) (Naville et al. 2019). All Fgf genes identified in these species belonged to the same two subfamilies found in O. dioica (Fig. 1 and supplementary table S2, Supplementary Material online). These findings across ascidian and appendicularian tunicates suggested that the loss of Fgf1/2 and Fgf3 likely occurred in the last common ancestor of all tunicates, before the divergence of these two groups. Interestingly, while ascidians have not systematically lost any further Fgf subfamily, the appendicularian ancestor lost four additional subfamilies (i.e. Fgf4/5/6, Fgf7/10/22, Fgf8/17/18/24, Fgf19/21/23) before the radiation of this clade (Fig. 1). Regarding the evolutionary origin of Fgf11/12/13/14, which until now has remained unclear (Oulion et al. 2012), our phylogenetic inferences suggested that none of the cephalochordate Fgf genes belonged to this subfamily (supplementary fig. S2, Supplementary Material online). This subfamily appeared to be restricted to tunicates and vertebrates (Fig. 1 and supplementary fig. S2, Supplementary Material online), supporting previous work suggesting that Fgf11/12/13/14 was not present in amphioxus (Bertrand et al. 2011), nor in ambulacrarian genomes, including hemichordates (Oulion et al. 2012; Fan and Su 2015) and echinoderms (Lapraz et al. 2006; Röttinger et al. 2008; Czarkwiani et al. 2021). Our results, therefore, suggested that the Fgf11/12/13/14 subfamily was an innovation of olfactores.
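The reasoning that places subfamily losses on the tunicate tree can be made explicit with a small set-based sketch in Python. It only encodes the presence and absence calls stated in the text for ascidians and appendicularians against the eight chordate subfamilies, and it is simple bookkeeping, not the phylogenetic inference itself; the variable and function names are hypothetical. Note that Fgf11/12/13/14 is interpreted in the text as an innovation of olfactores, so its absence in cephalochordates is not treated as a loss and cephalochordates are left out of the comparison.

```python
# The eight chordate Fgf subfamilies (Oulion et al. 2012), as listed in the text.
CHORDATE_SUBFAMILIES = {
    "Fgf1/2", "Fgf3", "Fgf4/5/6", "Fgf7/10/22",
    "Fgf8/17/18/24", "Fgf9/16/20", "Fgf11/12/13/14", "Fgf19/21/23",
}

# Presence calls taken from the text: ascidians retain all subfamilies except
# Fgf1/2 and Fgf3; appendicularians retain only two subfamilies.
ASCIDIANS = CHORDATE_SUBFAMILIES - {"Fgf1/2", "Fgf3"}
APPENDICULARIANS = {"Fgf9/16/20", "Fgf11/12/13/14"}


def missing(present: set) -> set:
    """Subfamilies of the chordate catalogue absent from a lineage."""
    return CHORDATE_SUBFAMILIES - present


# Absences shared by both groups are most simply explained by a single loss
# in the last common ancestor of tunicates.
shared_losses = missing(ASCIDIANS) & missing(APPENDICULARIANS)

# Absences found only in appendicularians map to the appendicularian stem.
appendicularian_specific = missing(APPENDICULARIANS) - missing(ASCIDIANS)

print(sorted(shared_losses))
print(sorted(appendicularian_specific))
```

Under these assumptions the shared set contains Fgf1/2 and Fgf3, and the appendicularian-specific set contains the four additional subfamilies named above.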

Fig. 1. Evolutionary tree of the Fgf subfamilies in chordates. ML phylogenetic tree of the Fgf family in chordates reveals that the 10 Fgf genes found in Oikopleura dioica, together with all other genes found in other appendicularian species (in red), group in two clusters with high support values (nodes with red solid circles). The tree topology indicates that the two clusters belong with high support (nodes with black solid circles) to two subfamilies: Fgf9/16/20 (red background) and Fgf11/12/13/14 (blue background). The presence of Fgfs from ascidians (in blue), vertebrates (in black), and cephalochordates (in green) allowed us to infer that appendicularians have lost subfamilies Fgf8/17/18, Fgf19/21/23, Fgf7/10/22, and Fgf4/5/6. The absence of Fgf11/12/13/14 in cephalochordates suggests that this subfamily might be a synapomorphy of the olfactores. Well-supported nodes of other Fgf subfamilies (aBayes = 1) with members of more than one subphylum are indicated with gray solid circles. Node support values correspond to likelihood-based method aLRT-SH-like/aBayes/uf-bootstrap. The scale bar indicates amino-acid substitutions. Ascidian Fgf names have been maintained according to previous works (Satou et al. 2002; Oulion et al. 2012), even though some of them show ambiguities with the tree topology (i.e. Fgf7/10/22, Fgf-NA1-19/21/23, and FgfL) and those nodes lack high support. 
Species abbreviations: Vertebrates (in black): Danio rerio (Dre), Gallus gallus (Gga), Homo sapiens (Hsa), Latimeria chalumnae (Lch), Mus musculus (Mmu), Xenopus tropicalis (Xtr); Ascidian tunicates (in blue): Botrylloides leachii (Ble), Botryllus schlosseri (Bsc), Ciona robusta (Cro), Ciona savignyi (Csa), Halocynthia aurantium (Hau), Halocynthia roretzi (Hro), Molgula occidentalis (Mocci), Molgula occulta (Moccu), Molgula oculata (Mocul), Phallusia fumigata (Pfu), Phallusia mammillata (Pma); Appendicularian tunicates (in red): Oikopleura albicans (Oal), Oikopleura dioica (Odi), Oikopleura vanhoeffeni (Ova); Cephalochordates (in green): Branchiostoma belcheri (Bbe), Branchiostoma floridae (Bfl), Branchiostoma lanceolatum (Bla).
Appendicularian Expansion of the Surviving Fgf11/12/13/14 and Fgf9/16/20 Subfamilies
The phylogenetic analysis showed that the massive loss of Fgf subfamilies during the evolution of appendicularians was accompanied by a burst of duplications of the two surviving subfamilies, resulting in four Fgf11/12/13/14a-d paralogs and six Fgf9/16/20a-f paralogs (Fig. 1). The fact that all Fgf genes identified in the other two analyzed appendicularian species (namely O. vanhoeffeni and O. albicans) also appeared as paralogs within these two subfamilies indicated that the expansions might predate the radiation of the clade. Moreover, the tree topology also suggested that further independent lineage-specific gene duplications might have occurred within each Fgf subfamily (Fig. 1 and supplementary figs. S1 and S2, Supplementary Material online). These results revealed, therefore, that all the paralogs generated after the expansion of each subfamily in the appendicularian lineage are co-orthologs to the single gene representing those subfamilies in ascidians.
Analysis of microsynteny revealed a strong conservation of neighboring genes of each Fgf ortholog across all O. dioica cryptic species (i.e. Barcelona, Osaka, and Okinawa), providing strong support to the assigned homologies (Fig. 2a and supplementary fig. S3, Supplementary Material online). Despite this overall microsyntenic conservation, we also observed occasional small inversions and translocations of neighboring genes in most Fgf gene neighborhoods. Additionally, we noted positional changes of the Fgf orthologs within the same chromosomal arm among the three cryptic species, consistent with the characteristic genome scrambling described in O. dioica (Fig. 2b and supplementary table S1, Supplementary Material online) (Plessy et al. 2024). We did not detect synteny conservation among Fgf paralogs within each subfamily, reinforcing the idea that most of the Fgf gene duplications that expanded each subfamily were not recent, but likely ancestral, occurring before the radiation of the appendicularian clade.
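One simple way to quantify the microsynteny conservation described above is to compare the sets of genes flanking each Fgf ortholog in two genomes. The Python sketch below is a hypothetical illustration, not the method used in the study: it assumes that each genome is given as an ordered list of ortholog-group identifiers along a chromosome arm, and it uses a flank of ten genes on each side by default. All names and the toy gene orders are invented for the example.

```python
def neighborhood(gene_order: list, focal: str, flank: int = 10) -> list:
    """Return up to `flank` genes on each side of the focal gene."""
    i = gene_order.index(focal)
    left = gene_order[max(0, i - flank):i]
    right = gene_order[i + 1:i + 1 + flank]
    return left + right


def shared_neighbor_fraction(ref_order: list, other_order: list,
                             focal: str, flank: int = 10) -> float:
    """Fraction of reference neighbors also found near the focal gene
    in the other genome (gene order and orientation are ignored)."""
    ref = set(neighborhood(ref_order, focal, flank))
    other = set(neighborhood(other_order, focal, flank))
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


# Toy example with a flank of 2: three of the four reference neighbors are
# still adjacent to the focal gene in the second genome.
reference = ["g1", "g2", "FGF", "g3", "g4"]
compared = ["g2", "g1", "FGF", "g9", "g3"]
score = shared_neighbor_fraction(reference, compared, "FGF", flank=2)
```

Because this score ignores order and orientation, small local inversions such as those reported above leave it unchanged, whereas translocations of neighbors out of the window lower it.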

Fig. 2. Comparative synteny analysis of Fgf genes among the three Oikopleura dioica cryptic species from Barcelona (BAR), Osaka (OSA), and Okinawa (OKI). a) Comparison of microsynteny conservation between the genomic neighborhoods of Fgf genes (black arrow, and ten adjacent genes on each side). The BAR genome was used as the reference. Here, we show two illustrative examples, in which the Fgf9/16/20e neighborhood represents a case of a high level of microsynteny conservation, and the Fgf9/16/20a neighborhood represents a case of a low level of microsynteny conservation, especially when compared with OKI. The microsynteny comparison of the full O. dioica Fgf catalogue is provided in supplementary fig. S3, Supplementary Material online. b) Macrosynteny analysis comparing the position of Fgf genes at chromosome arm level. Each Fgf gene is labeled with a distinctive color.
Comparative analysis of gene structure among appendicularian Fgfs and those from cephalochordates, ascidians, and vertebrates provided further support to the conclusion that all Fgf genes in O. dioica belonged to only two subfamilies. The observation that all Fgf genes in amphioxus retain two conserved introns in the core FGF domain (i.e. internal core intron 1 and 2: ici1 and ici2) suggested that the ancestral Fgf gene also had these two introns. In the Fgf11/12/13/14 subfamily, all vertebrate members had retained these two ancestral introns, while in appendicularian and ascidian tunicates, ici1 had been lost, and only ici2 had been preserved in some of their members (Fig. 3). The presence of an additional internal core intron (ici3) exclusively in all O. dioica Fgf11/12/13/14 paralogs, and its identification in some Fgf11/12/13/14 genes from other appendicularian species, further supported the notion that they were paralogs resulting from appendicularian-specific duplications and suggested that ici3 could be a synapomorphic feature of the Fgf11/12/13/14 subfamily in this lineage. Moreover, the fact that we identified two core-flanking introns (i.e. cfi1 and cfi2) in all vertebrate and tunicate Fgf11/12/13/14 genes that were absent in all Branchiostoma floridae and ambulacrarian Fgf genes, suggested that these two core-flanking introns could be a conserved synapomorphy of the Fgf11/12/13/14 subfamily innovated in the olfactores clade (Fig. 3).

Fig. 3. Comparative gene structures of O. dioica and other chordate Fgf genes. Exon–intron organization supports phylogenetic classification of Fgf9/16/20 and Fgf11/12/13/14 paralogs of O. dioica. “cfi” denotes conserved core-flanking introns, and “ici” denotes conserved internal core introns. Bfl_Fgfs represent the common structure of cephalochordate Fgf genes, featuring two internal core introns (ici) within the FGF domain coding sequence. Gene-specific introns are depicted as arrowheads and dashed lines at their respective locations. Predicted functional motifs are indicated as described in the legend. Black underlines highlight the presence and location of β-sheets as predicted by AlphaFold2. Orange dashed underlines highlight the presence and location of β-sheets that have been empirically determined, even though the AlphaFold2 software does not predict them (Goetz et al. 2009; Olsen et al. 2003; Plotnikov et al. 2001). For comparative purposes, genes and motifs are not drawn to scale. Dashed lines indicate alternative splicing variants of Fgf11/12/13/14 (Dis, distal; Med, medial; Pro, proximal), and black dashed-line boxes indicate exon length differences between O. dioica cryptic species.
Mapping of RNAseq and EST data in the genomes of the three O. dioica cryptic species revealed at least four alternative first exons (namely, distal, proximal, and middle 1, middle 2, etc.) that could give rise to different isoforms due to alternative splicing and transcription start site usage in most Fgf11/12/13/14 paralogs (Fig. 3). The presence of alternative isoforms differing in their N-terminus in members of the Fgf11/12/13/14 subfamily has been also described in vertebrates (Munoz-Sanjuan et al. 2000; Pablo and Pitt 2016). Our genome database surveys revealed evidence of similar alternative splice variants of the first exon in C. robusta (GeneID:445758 in NW_004190431.2) and other ascidian species, as we have also found in O. dioica, thus suggesting that this feature might be an ancestral characteristic of this Fgf subfamily in olfactores. Although many of these alternative first exons were rich in positively charged residues (i.e. Lysines and Arginines) and displayed similar hydrophobicity profiles among paralogs, sequence conservation was poor, supporting the hypothesis that the duplications that originated them were ancient in the evolution of appendicularians (Fig. 3). The presence of small differences in the alternative splice variants among the three O. dioica cryptic species (e.g. the first intron of the Fgf11/12/13/14d has been incorporated in the open reading frame in BAR, but not in OSA or OKI), together with the presence of cryptic species-specific introns (i.e. Fgf11/12/13/14c showed a unique intron in the FGF domain exclusively in OKI) illustrated the rapid evolution of the Fgf genes in appendicularians, and provided an example of genetic variation among the cryptic species (Fig. 3).
Analysis of gene structure of Fgf9/16/20 paralogs in O. dioica revealed that while Fgf9/16/20a had retained the 2 ancestral introns in the FGF domain (e.g. ici1 and ici2), which were also conserved in human and ascidian Fgf9/16/20 orthologs, all the other O. dioica Fgf9/16/20 paralogs (i.e. Fgf9/16/20b-f) showed an intronless structure (Fig. 3). This intronless structure suggested that an ancestral Fgf9/16/20a-like gene, most similar in sequence to Fgf9/16/20 orthologs in other chordate species, could have been duplicated by the integration of a retrotranscribed form followed by further gene duplications. The absence of introns in most Fgf9/16/20 genes found in other appendicularian species further reinforced the idea that this subfamily was expanded ancestrally in this clade, and that the six Fgf9/16/20 paralogs in O. dioica belonged to the same subfamily. Further evidence supporting that all O. dioica Fgf9/16/20 paralogs belonged to this subfamily was the conservation of two cysteine residues that were present in all tunicate Fgf9/16/20 orthologs and absent in all other Fgf subfamilies (supplementary fig. S4, Supplementary Material online).
Canonical Fgf9/16/20 Paracrine and Fgf11/12/13/14 Intracellular Functions
Analysis of protein sequence similarity among Fgfs revealed that during the expansion of the two surviving subfamilies in appendicularians, one or two of their members (namely, Fgf9/16/20a and Fgf11/12/13/14a-b) conserved high similarity with their co-orthologs in other chordate species, while the other paralogs underwent remarkable sequence divergence, especially within the Fgf9/16/20 subfamily. Thus, for instance, while sequence identity among paralogs of the Fgf9/16/20 subfamily in humans ranges from 62% to 69.6% throughout the entire protein (80.6% to 87.8% throughout the FGF core), in O. dioica sequence similarity among some Fgf9/16/20 paralogs was as low as 18.7% (21.2% in the FGF core) (supplementary table S3, Supplementary Material online). To understand how this sequence divergence might have impacted the function of the paralogs within each subfamily, we examined their conserved protein domains with HMMscan, as well as the presence of potential signal peptides (SP), nuclear localization signals (NLS), or other conserved motifs known to interact with other cofactors and proteins.
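Pairwise identity figures such as those above depend on how aligned columns are counted. As a minimal sketch (not the pipeline used in this study), the following Python function computes percent identity over the columns of a pairwise alignment in which neither sequence has a gap. The function name, the gap convention, and the toy sequences are illustrative assumptions, not data from this work.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of two aligned sequences.

    Assumption: the denominator is the number of columns where neither
    sequence has a gap; other conventions (e.g. full alignment length) exist.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
print(percent_identity("ACDE-F", "ACDQGF"))
```

Restricting the same calculation to the alignment columns spanning the FGF core would give a core-only value comparable in spirit to the figures reported in parentheses.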
In the Fgf9/16/20 subfamily, despite the variability of the HMMscan e-values of the FGF domains, ranging from 1e−35 for Fgf9/16/20a to between 1e−7 and 1e−13 for Fgf9/16/20b-f, the AlphaFold2 software predicted that most O. dioica Fgf9/16/20 paralogs displayed the characteristic twelve β-sheets conforming to the typical β-trefoil fold structure of vertebrate Fgf9/16/20 proteins (supplementary figs. S4 and S5A, Supplementary Material online). Comparison of the 3D-models of the Fgfs of humans and O. dioica revealed a high level of structural overlap, suggesting conserved Fgf functions (supplementary fig. S5B, Supplementary Material online). Moreover, the presence of regions enriched in positively charged residues (e.g. Arginine and Lysine) in all O. dioica Fgf9/16/20 paralogs, especially near the end of the FGF domain where heparin binding sites (HBS) have been identified in vertebrates (Fig. 3 and supplementary fig. S4, Supplementary Material online) (Xu et al. 2012), suggested that these regions could bind heparin or heparan sulfate proteoglycans. The presence of a β-trefoil fold and HBS, therefore, suggested that all O. dioica Fgf9/16/20 paralogs could potentially interact with Fgf receptors on the cell surface to function through the canonical FGF signaling. To investigate the secretion potential of O. dioica Fgf9/16/20 paralogs, we examined the presence of signal peptides (SP) with SignalP and Phobius software. The results predicted the presence of SP (likelihood > 0.5) in the N-terminus of four of the Fgf9/16/20 paralogs (namely, Fgf9/16/20a-d), but not in the other two (Fgf9/16/20e-f). The presence of SPs in four of the Fgf9/16/20 paralogs was comparable to the SP described in the N-terminus of Fgf9/16/20 of C. robusta (Satou et al. 2002), and it could explain the degeneration of the noncanonical signal peptide EFISIA motif within the FGF core, which is required for extracellular secretion via the endoplasmic reticulum in other organisms (Popovici et al. 2004). 
Interestingly, the absence of an SP in Fgf9/16/20e-f correlated with a certain conservation of the EFISIA motif (i.e. TFIQIA), producing a peak of hydrophobicity like those observed in the EFISIA motif of Fgf9/16/20 in other species (Fig. 3 and supplementary fig. S6, Supplementary Material online) (Miyakawa et al. 1999; Popovici et al. 2004). These observations, therefore, further supported the paracrine nature of the Fgf9/16/20 subfamily in appendicularians and suggested that different paralogs might have evolved different secretion mechanisms.
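To illustrate why a motif such as TFIQIA can yield a hydrophobicity peak similar to that of EFISIA, the following Python sketch averages per-residue hydropathy over each hexapeptide. It assumes the Kyte-Doolittle scale; the scale and window settings behind the profiles in the supplementary figure are not stated in this section, so this is only an illustrative calculation and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: this scale is used for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle hydropathy of a peptide (higher = more hydrophobic)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in peptide.upper()) / len(peptide)


# Compare the canonical motif with the divergent O. dioica variant named in the text.
for motif in ("EFISIA", "TFIQIA"):
    print(motif, round(mean_hydropathy(motif), 2))
```

Under this scale the two hexapeptides have nearly identical mean hydropathy, because the E-to-T and S-to-Q substitutions roughly offset each other while the F, I, I and A residues are unchanged.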
In the Fgf11/12/13/14 subfamily, its members have been traditionally associated with intracellular functions through interacting with various proteins, such as regulators of voltage-gated channels (i.e. Navs or Cavs), as regulators of transcription factors (i.e. islet brain-2 or NEMO), or as players of neuronal cytoskeleton architecture and cell morphology (Pablo and Pitt 2016). Like their vertebrate counterparts, all appendicularian Fgf11/12/13/14 paralogs lacked a signal peptide (Fig. 3), providing the first clue of a conserved intracellular function. The conservation in all O. dioica Fgf11/12/13/14 paralogs of a Leucine and an Arginine pair in positions that have been described to be critical for the interaction with Navs and islet brain-2 (Olsen et al. 2003; Pablo and Pitt 2016), and that were conserved in all vertebrate and ascidian members of the Fgf11/12/13/14 subfamily, but not in other Fgf subfamilies, reinforced the idea that this subfamily also played an intracellular function in appendicularians (supplementary fig. S4, Supplementary Material online). The presence of multiple alternative splice variants with different first exons found in most O. dioica Fgf11/12/13/14 paralogs (Fig. 3) was also consistent with the same feature described in vertebrates for Fgf11/12/13/14 genes with intracellular functions related to the modulation of voltage-gated channels regulating neural excitability (Munoz-Sanjuan et al. 2000; Laezza et al. 2009).
Given the growing evidence that Fgf proteins may also have intranuclear functions (Popovici et al. 2006), we conducted NLS predictions with NLStradamus and searched for KRVR motifs known to provide NLS in O. dioica (Clarke et al. 2007). Our analysis revealed that most O. dioica Fgf11/12/13/14 paralogs possess an NLS at the end of the FGF core, similar to vertebrate and ascidian Fgf11/12/13/14 proteins. Interestingly, we also found NLS in some of the alternative first exons that generate different isoforms diverging at the N-terminus, suggesting that these isoforms might differ not only in promoter usage, but also in intracellular localization (Fig. 3). We also found NLS in three out of the six O. dioica Fgf9/16/20 paralogs, including the two that lack an SP, implying that these paralogs may have also evolved nonsecreted functions.
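A literal motif search of the kind mentioned above can be sketched in a few lines of Python. This is not how NLStradamus works (that tool uses a hidden Markov model); it only shows an exact-match scan for KRVR and a simple measure of Lysine/Arginine enrichment. The function names and the toy sequence are hypothetical.

```python
from typing import List


def find_motif(seq: str, motif: str = "KRVR") -> List[int]:
    """Return 0-based start positions of every exact occurrence of motif."""
    seq = seq.upper()
    motif = motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


def basic_fraction(seq: str) -> float:
    """Fraction of residues that are Lysine (K) or Arginine (R)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("K") + seq.count("R")) / len(seq)


# Toy peptide (hypothetical, not a real Fgf sequence).
toy = "MAKRVRLLKRVR"
print(find_motif(toy))
print(basic_fraction(toy))
```

Applying `basic_fraction` to sliding windows along a protein would flag regions rich in positively charged residues, such as the candidate heparin binding sites and alternative first exons discussed in this section.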
Overall, our findings from the structural analysis were consistent with paracrine functions for the Fgf9/16/20 subfamily and intracellular functions for the Fgf11/12/13/14 subfamily in appendicularians. The high sequence divergence, variation in the presence of putative SP and NLS, and the formation of different isoforms due to differential splice variants in the N-terminus raised the possibility that multiple functions might have also evolved among the different paralogs duplicated during the expansion of these two surviving Fgf subfamilies in appendicularians.
Fgf Expression Atlas During the Development of Oikopleura dioica
To better understand the functional consequences of gene loss and gene expansion on the Fgf subfamilies in O. dioica, we performed an exhaustive expression analysis of all the Fgf9/16/20 and Fgf11/12/13/14 paralogs by whole-mount in situ hybridization (WISH) throughout development, from eggs to late-hatchling stages (Fig. 4 a-j). In general, we found that the level of expression signal of most Fgf genes was low, and long periods of staining (e.g. between 1 and 14 d) were required to visualize some of the tissue-specific expression domains. Our description here focuses mostly on tissue-specific Fgf expression domains repeatedly observed in different embryos over background levels, but we cannot rule out that, in addition to those specific domains, some of the Fgf genes also had a generalized basal expression scattered in other parts of the embryo, as has also been described in other animals including C. robusta and amphioxus (Imai et al. 2004; Bertrand et al. 2011).

Fig. 4. Developmental expression atlas of O. dioica Fgf genes. Whole-mount in situ hybridization images of O. dioica at various developmental stages: eggs (a-j1), 8-cell embryos (a-j2), 64-cell embryos (a-j3), incipient tailbud (ITB) embryos (a-j4), early tailbud (ETB) embryos (a-j5), mid tailbud (MTB) embryos (a-j6), late tailbud (LTB) embryos (a-j7), just hatchlings (a-j8), early hatchling larvae (a-j9), mid hatchling larvae (a-j10), and late-hatchling larvae (a-j11). Central images in each panel are left lateral views, oriented anterior to the left and dorsal to the top. Upper-right image insets (‘) are dorsal views of optical cross-sections at the levels indicated by black dashed lines. Black arrowheads label a stained cortical spot in unfertilized eggs; black double arrowheads label the A pair blastomeres in 8-cell embryos; orange arrowheads mark ingressing vegetal blastomeres in the 64-cell embryos; blue arrowheads label neural derivatives (cyan-blue labels the neural plate in 64-cell and ITB embryos, and the nerve cord in later stages; dark-blue labels the caudal ganglion, and pale-blue labels the anterior brain); light green arrowheads label epidermal domains in the trunk and light green double arrowheads label the area of the Langerhans receptors primordia; dark green arrowheads label epidermal domains in the tailbud tip; green dashed lines mark the lateral epithelium of the tail and the fins; purple arrowheads label undetermined endomesodermal domains in the trunk; magenta arrowheads mark the mouth primordium; magenta double arrowheads mark the pharyngeal slits; yellow arrowheads label notochord cells; red arrowheads label muscle precursor cells and muscle cells in the tail.
In oocytes, some Fgf genes (i.e. Fgf9/16/20a,d,e, and Fgf11/12/13/14a,b,d) showed a weak staining signal in the cytoplasm that was difficult to distinguish over the staining background. This suggested that, in general, Fgf transcripts were not a major component of the maternal contribution (Fig. 4 a-j1). In the case of Fgf9/16/20e and Fgf9/16/20f, however, we observed a small but intense staining spot in the cortical area of many unfertilized eggs (n = 9/11 and n = 11/13, for Fgf9/16/20e and Fgf9/16/20f, respectively), but not in all. This suggested that the formation of this spot of Fgf transcripts might be transient and difficult to capture, or perhaps not present in all individuals (Fig. 4 e-f1, black arrowheads). In the early stages of development, from 8- to 64-cell, all Fgf9/16/20 paralogs started showing staining signals, suggesting that their expression onset took place concomitant with the activation of zygotic transcription (Wang et al. 2015) (Fig. 4 a-f2 & a-f3). At the 8-cell stage, many Fgf9/16/20 paralogs (i.e. Fgf9/16/20c, Fgf9/16/20d, Fgf9/16/20e, and Fgf9/16/20f) showed staining signal restricted to the smaller pair of blastomeres in the vegetal pole (Fig. 4 c-f2, black double arrowheads). These cells corresponded to the A/A4.1 blastomere pair in Delsman/Conklin nomenclature, which gives rise to most of the nervous system, the notochord, and other endomesodermal derivatives (Nishida 2008; Stach et al. 2008). At the gastrula stage (32 to 64 cells), we found that all Fgf9/16/20 paralogs were expressed in the precursor blastomeres of either mesodermal or ectodermal derivatives. Fgf9/16/20a, Fgf9/16/20b, and Fgf9/16/20c staining was detected in the endomesodermal blastomeres (Fig. 4 a-c3, orange arrowheads); Fgf9/16/20b, Fgf9/16/20e, and Fgf9/16/20f staining was detected in the neural plate (Fig. 4 b3 & e-f3, cyan arrowheads); Fgf9/16/20d was detected in notochord precursor cells (Fig. 4 d3, yellow arrowheads); and Fgf9/16/20f staining was detected in muscle precursor blastomeres (Fig. 4 f3, red arrowheads). Consistent with these observations in early developmental stages, the majority of tissue-specific Fgf expression domains observed during later embryogenesis were also predominantly associated with ectodermal derivatives (e.g. nervous system and epidermis) or mesodermal derivatives (e.g. notochord and muscle).
Among ectodermal derivatives, the staining signal of Fgf9/16/20 paralogs observed in the neural plate of 64-cell stage embryos persisted in cells of the developing nervous system up to the hatchling stage. These signals were observed in precursors of the brain, the caudal ganglion, and the spinal cord (Fig. 4 a-f, different tones of blue arrowheads). The fact that some of these neural expression domains disappeared at specific developmental stages suggested that some Fgf9/16/20 paralogs might be precisely regulated in specific subsets of neural populations along the anteroposterior axis during the formation of the central nervous system. For instance, Fgf9/16/20f expression domains were clearly observed in the neural plate at 64-cell stage and in the posterior part of the brain in incipient tailbud (ITB) embryos (Fig. 4 f3 to 4, cyan arrowheads), but they disappeared in subsequent stages until the just-hatchling stage, when an expression domain reappeared in the developing brain (Fig. 4 f8, light blue arrowhead). We also observed staining signals for Fgf11/12/13/14 paralogs in neural domains. However, in contrast to Fgf9/16/20 genes, which were predominantly expressed before hatching, Fgf11/12/13/14 genes were mainly expressed during the hatchling stages (Fig. 4 g-j, different tones of blue arrowheads). Neural expression domains of Fgf11/12/13/14 genes were detected in various locations within the nervous system, including specific cells of the brain dorsal to the sensory vesicle, the ventral region of the ciliary funnel (Fig. 4 g10 to 11, light blue arrowheads), groups of cells in the caudal ganglion (Fig. 4 g9 to 11, h10-11, i10-11 & j11, dark blue arrowheads), and isolated cells at different positions along the nerve cord in the tail (Fig. 4 g10 to 11, cyan arrowheads). 
Overall, these results revealed a complex pattern of Fgf expression in the developing neural system, suggesting that despite the extensive loss of Fgf subfamilies, the expansion of the surviving Fgf genes has allowed the preservation of neural functions during appendicularian development, similar to other chordates.
Among ectodermal derivatives, we also observed specific expression domains for various Fgf genes in the epidermis, both in the tail and in the trunk (Fig. 4, different tones of green arrowheads and dashed lines). In the tail, we observed a dynamic expression pattern with different Fgf genes expressed at different levels of the anteroposterior axis, mainly in two areas: the developing fin and the tip of the tail. In the precursor cells of the fin, Fgf11/12/13/14b expression was first detected in a bilateral pair of epidermal cells located in the middle region of the tail (Fig. 4 h7, green dashed lines). This expression later spread to the first third of the tail at the just-hatch stage and eventually extended, along with other Fgf paralogs (namely, Fgf11/12/13/14c, Fgf9/16/20a, Fgf9/16/20b, and Fgf9/16/20c) to the posterior half of the tail in mid- and late-hatchlings (Fig. 4 a10 to 11, b-c9-11, h7-11 & i10-11, green dashed lines). In the tip of the tail, a pair of epidermal cells started showing strong staining for Fgf9/16/20e and Fgf9/16/20f at the ITB stage. While Fgf9/16/20e expression was downregulated by the mid-tailbud stage, Fgf9/16/20f signal persisted until the mid-hatchling stage (Fig. 4 e4 to 5 & f4 to 10, dark green arrowheads). In the trunk, bilateral groups of epidermal cells expressed Fgf9/16/20f at different levels of the anteroposterior axis (Fig. 4 f4 to 10, light green arrowheads). The most posterior group included the primordia of the Langerhans receptors, which also expressed Fgf11/12/13/14c (Fig. 4 f8 to 9 & i9, light green double arrowheads). In the most rostral region of the trunk epidermis, Fgf11/12/13/14a showed an expression domain in a group of subepidermal cells in the area of the mouth at the just-hatchling stage. This expression domain was later expanded to the epidermal surface by the mid-hatched stage, coinciding with the opening of the mouth (Fig. 4 g8 to 10, magenta arrowheads). 
Interestingly, Fgf11/12/13/14a expression was also observed in the pharyngeal slits in the mid- and late-hatchling larvae (Fig. 4 g10 to 11, magenta double arrowheads). The expression of Fgf11/12/13/14a in the primordia of organs in which ciliated sensory cells developed (i.e. the stomodeum, the ciliary funnel, and the ciliary rings), together with the expression of Fgf9/16/20f and Fgf11/12/13/14c in the primordia of the Langerhans receptors, suggested that FGF signaling might be involved in the development of placodal derivatives in appendicularians, as well as in structures in which epithelial perforation and fusions occur, as described in other chordates (Bassham and Postlethwait 2005; Kourakis and Smith 2007; Bassham et al. 2008). In the late-hatchling stages, nearly all Fgf genes were strongly expressed in different parts of the oikoblast, the organ responsible for the architecture and secretion of the house (Fig. 4 a-j11). Some showed generalized patterns, while others were restricted to or excluded from specific regions. For example, Fgf9/16/20d was restricted to cells adjacent to the anterior cells of the field of Fol, the anterior rosette, the field of Martini, the posterior rosette, and its adjacent lateral bands (Fig. 4 d11). In contrast, Fgf9/16/20a-b were expressed throughout the entire oikoblast, but excluded from the ring of the mouth and the Giant cells (Fig. 4 a-b11). This finding suggested that FGF signaling has been recruited for the development of this innovative organ responsible for the formation of the house, as described for many other developmental genes in appendicularians (Mikhaleva et al. 2018).
Among endomesodermal derivatives, the notochord exhibited an Fgf9/16/20a expression domain restricted to the first and third cells from the early tailbud (ETB) to late tailbud (LTB) stages (Fig. 4 a5 to 7, yellow arrowheads). In these stages, Fgf9/16/20a staining was also observed in a few internal cells bilaterally located in the anterior half of the trunk, whose positions were compatible with endomesodermal progenitors of the pharynx, endostyle or buccal glands (Fig. 4 a5 to 8, purple arrowheads). At the LTB stage, a new mesodermal expression domain of Fgf9/16/20a appeared restricted to the first and eighth pairs of muscle cells, and it was maintained until the early hatchling stage (Fig. 4 a7 to 9, red arrowheads). Other Fgf paralogs with broad, ubiquitous expression patterns also showed stronger expression in muscle cells compared to other parts of the embryo (i.e. Fgf9/16/20b, Fgf9/16/20c, Fgf11/12/13/14b, and Fgf11/12/13/14c; Fig. 4, red arrowheads). From ITB to LTB stages, two cells located on the right side of the anterior part of the notochord, which by the early hatchling stage lay rostral to the notochord, showed a very distinct expression of Fgf11/12/13/14c (Fig. 4 i6 to 8, purple arrowheads). We could not determine the identity of these endomesodermal cells but, considering their position, they could be related to the development of endodermal progenitors of the digestive system or the gonad (Olsen et al. 2018).
Discussion
Less, but More: Massive Gene Losses Accompanied by Bursts of Duplications of the Surviving Paralogs
The study of FGF signaling is central for understanding many fundamental functions of cell biology, including proliferation, differentiation, migration, apoptosis, and survival of cells, from embryo development to adult tissue homeostasis, as well as for understanding how its malfunctioning can cause several diseases (reviewed in Dorey and Amaya 2010; Xie et al. 2020). FGF signaling has an ancient evolutionary origin and was already present at least in the ancestral eumetazoan (Bertrand et al. 2014). Some of the conserved core functions of the FGF signaling also have an ancestral origin, such as mesodermal induction, which predates the Cambrian explosion and the origins of Bilateria (Matus et al. 2007). Different taxa, however, have innovated a great variety of other FGF functions, in many cases associated with gene duplications, and often accompanied by gene losses (Popovici et al. 2005; Oulion et al. 2012). During the evolution of chordates, for instance, the expansion of eight single-gene subfamilies up to 27 Fgf genes early in the evolution of vertebrates due to the two rounds of genome duplication has been linked to the innovation and sophistication of many of the characteristic vertebrate traits, including axial patterning, somitogenesis, limb bud formation, visceral and skeletal development (Thisse and Thisse 2005), and even the invention of a “new head” (Bertrand et al. 2011). Studies of FGF signaling in ascidian tunicates have helped reveal that some of the traits that characterize vertebrates were not, in fact, vertebrate innovations, but were already present in the last common ancestor of olfactores, such as the role of Fgf8/17/18 in the organizer activity and the compartmentalization of the central nervous system and in placodal derivatives (Kourakis and Smith 2007; Imai et al. 2009; Wagner and Levine 2012; Stolfi et al. 2015; Horie et al. 2018). Here, our results in O. dioica unveil an unprecedented case among chordates, in which massive gene losses have erased all Fgf subfamilies but two, Fgf9/16/20 and Fgf11/12/13/14. Interestingly, the massive gene losses have been accompanied by two bursts of duplications, one in each subfamily, that gave rise to several Fgf paralogous genes, describing an evolutionary scenario of “less, but more” (Fig. 5a).

Fig. 5. Evolutionary scenario of Fgf subfamilies in chordates. a) Evolutionary tree of chordate subphyla indicating main events of losses (L), gains (G), duplications (D), or expansions by burst of duplications (E), as well as main associated patterns of conservation, innovation, function shuffling and extinction of Fgf expression domains. 2R-WGD: two rounds of whole-genome duplication. b) Comparative schematic representation of the main expression domains of Fgf subfamily members between ascidians and appendicularians (tailbud stage on the left, and hatchling on the right). Distinct colors are assigned to each Fgf as indicated in the figure legend. FgfNA1 expression has not been described in ascidians, and Fgf4/5/6* has been detected maternally and widely throughout development with no obvious tissue-specific domains (Imai et al. 2004). In appendicularians, the expression of Fgf9/16/20 paralogs is the most abundant in tailbud stages, and the expression of Fgf11/12/13/14 paralogs is more obvious at hatchling stages. Asterisk (in br*) denotes that expression was found in a slightly earlier stage than the one represented in the figure. Ascidians show a characteristic PTFS (Posterior-Tail FGF Source) and ATRAS (Anterior-Tail RA Source), while in appendicularians the loss of RA signaling (Cañestro and Postlethwait 2007; Martí-Solans et al. 2016) might be related to the innovation of an ATFS (Anterior-Tail FGF Source), considering that the conserved RA-FGF antagonistic action emerged at the base of olfactores (Pasini et al. 2012; Bertrand et al. 2015). Abbreviations: an, anterior notochord; am, anterior muscle; br, brain; cf, ciliary funnel; cns, central nervous system; gs, gill slit; m, mouth; o, oikoblastic epithelium; pl, placode; pm, posterior muscle; te, tail epidermis; tte, terminal tail epidermis.
Our analysis of gene phylogenies, synteny conservation, gene architecture and structural protein motifs across various appendicularian species provides solid evidence that all Fgf genes in appendicularians belong to only two subfamilies. This indicates that the losses likely occurred at the base of this clade, and that different species might have independently expanded their Fgf catalogue in a very dynamic fashion. This dynamic evolution of Fgf genes is particularly evident when comparing different cryptic species of O. dioica, where divergences are observed in the presence of different isoforms due to alternative splicing, significant sequence divergence outside the FGF domain, presence of novel introns, and microsyntenic rearrangements. The vast conservation of the FGF domain with its typical β-trefoil topology supports the idea that Fgf paralogs in O. dioica, despite their sequence divergence, can function in a similar way to the typical Fgf ligands of other animals. The common presence of secretion motifs in Fgf9/16/20 paralogs suggests that members of this subfamily can act extracellularly through the canonical signaling pathway typical of this subfamily (Itoh and Ornitz 2011; Ornitz and Itoh 2015). On the other hand, the extensive presence of nuclear localization signals in Fgf11/12/13/14 paralogs suggests that members of this subfamily might have intracellular functions, as has been described for members of this Fgf subfamily in other chordates (Smallwood et al. 1996; Pablo and Pitt 2016).
Evolutionary Patterns Associated with the “Less, but More” Scenario of Fgf Evolution in Tunicates
Our comparative Fgf expression analysis between appendicularians, ascidians, and other chordates allows us to identify cases that can be categorized into four different evolutionary patterns associated with the “less, but more” scenario (Fig. 5a):
(1) Conservation of ancestral expression domains
In the first category, we found that the early expression domains of the single ascidian co-ortholog Fgf9/16/20 in the A4.1 derived vegetal blastomeres of C. robusta have been preserved in four of the Fgf9/16/20 paralogs of O. dioica (i.e. Fgf9/16/20c, d, e, and f) in the same equivalent vegetal blastomeres (Imai et al. 2002a; Bertrand et al. 2003; Hudson et al. 2016; Satou 2020). This expression has been related to an ancestral function of the FGF signaling conserved among most bilaterians in initiating mesodermal and endodermal gene regulatory networks (Technau and Scholz 2003). In ascidians, Fgf9/16/20 is sufficient for the first phase of initiation of mesodermal induction, although Fgf8/17/18 is also required for a second phase of fate maintenance (Yasuo and Hudson 2007). This second phase might have been modified in O. dioica upon the loss of the Fgf8/17/18 subfamily. Moreover, the early expression of Fgf9/16/20 paralogs at the eight-cell stage in O. dioica is compatible with recent findings in ascidians in which FGF acts as a timer for zygotic genome activation whose responsiveness sharply starts between the 8- and 16-cell stages (Treen et al. 2023).
A second example of conservation of ancestral expression domains is shown by Fgf9/16/20e-f in the developing central nervous system from the 64-cell to tailbud stages, comparable to the expression of the ascidian Fgf9/16/20 co-ortholog in equivalent positions in the central nervous system dorsally at the level of the anterior tip of the notochord. This similarity suggests conserved Fgf9/16/20 ancestral functions in the anteroposterior patterning of the central nervous system among tunicates (Imai et al. 2002b; Miyazaki et al. 2007) (Fig. 5b).
For the Fgf11/12/13/14 subfamily, our data show that many of the expression domains of its paralogs in O. dioica are associated with the nervous system (i.e. dorsal and anterior ventral part of the brain, subset of cells of the caudal ganglion, as well as in isolated neurons located at different levels) (Fig. 5b). This neural Fgf expression is a shared characteristic with their vertebrate homologs, many of which are expressed in neurons where they perform Fgf receptor-independent intracellular functions. These functions involve interactions with voltage-gated sodium channels, affecting neuronal excitability, and have been implicated in human neuronal diseases (Wang et al. 2015). Supporting this neuronal function, our findings show that most O. dioica Fgf11/12/13/14 paralogs produce different isoforms corresponding to alternative splice variants of the first exons, a feature also shared with vertebrate Fgf11/12/13/14 genes that interact with voltage-gated sodium channels (Laezza et al. 2009). In ascidians, although Fgf11/12/13/14 has no detectable expression during development (Satou et al. 2002; Treen et al. 2014), our finding of alternative splicing in the first exon also suggests a conserved neural function. This is consistent with recent single-cell RNA-seq data in ascidians showing abundant Fgf11/12/13/14 transcripts in specific neurons of the tail, including bipolar tail neurons, which have been proposed to share properties with neural-crest-derived dorsal root ganglia (Stolfi et al. 2015; Horie et al. 2018). Future expression analyses of the ascidian Fgf11/12/13/14 in postmetamorphic animals to uncover more types of neurons that express this intracrine Fgf ligand, as well as functional characterization of its neural role in modulating neuronal excitability, will be of great interest to develop new animal models to better understand the molecular basis of related human neuronal disorders. 
Moreover, considering that our results reinforce the hypothesis that amphioxus and ambulacrarians lack Fgf11/12/13/14 (Lapraz et al. 2006; Röttinger et al. 2008; Bertrand et al. 2011; Fan and Su 2015; Czarkwiani et al. 2021), which suggests that its origin could be an innovative synapomorphy of olfactores (Fig. 5a), further studies of this gene subfamily in appendicularians and other tunicates could shed light on the role of intracrine Fgf ligands in the evolution of the nervous system within this clade following its divergence from cephalochordates.
(2) Function shuffling among surviving paralogs upon the loss of genes
In this second category, our study reveals two potential cases of function shuffling between the ascidian Fgf8/17/18 or Fgf7/10/22 and the O. dioica Fgf9/16/20f, all of which belong to the group of paracrine/autocrine secreted Fgf ligands. First, the Fgf8/17/18 epidermal expression domain in the tip of the tail of ascidians, which acts as a secreted posterior tail FGF source (PTFS) (Pasini et al. 2012; Kim et al. 2020), is comparable to the equivalent domain of Fgf9/16/20f in the posterior tip of the tail in O. dioica (Fig. 5b). Second, Fgf8/17/18 and Fgf7/10/22 in ascidians are also expressed during the development of the atrial siphon, which has been related to the evolution of otic placode homologs (Kourakis and Smith 2007). In O. dioica, in the absence of these two genes, it is the Fgf9/16/20f and Fgf11/12/13/14c paralogs that show equivalent expression domains in the Langerhans receptor primordia, which have been proposed to be homologous placodal structures in appendicularians (Bassham and Postlethwait 2005) (Fig. 5b).
(3) Innovation of novel expression domains in novel paralogs
In the third category, among the novel expression domains innovated by the duplicated paralogs in O. dioica, we find at least three potential cases. First, in early hatchling stages Fgf11/12/13/14a is specifically expressed in the stomodeum, coinciding with the time when the mouth is opening (Fig. 5b). No comparable expression has been observed for the ascidian Fgf11/12/13/14 gene, which shows no detectable expression during development (Satou et al. 2002; Treen et al. 2014). Interestingly, the stomodeum and mouth opening derive from the anterior neuropore in ascidians (Veeman et al. 2010), while in O. dioica these structures develop in the most rostral part of the trunk, directly connecting to the pharynx. In O. dioica, the recruitment of Pax2/5/8a expression in the primordium of the stomodeum has been suggested to be related to cellular functions of perforation, adhesion and fusion of epithelial openings, including the mouth (Bassham et al. 2008). Therefore, although the expression of placodal markers such as Pitx suggests deep genetic homology among mouths and adenohypophysis-like organs (i.e. the ciliary funnels in ascidians and appendicularians, and Hatschek's pit in amphioxus; Bassham and Postlethwait 2005), we cannot rule out the possibility that the mouths of ascidians and appendicularians have independent evolutionary origins that recruited a common cassette of placodal genes, as has been suggested for other placodal-derived structures (Bassham and Postlethwait 2005). The involvement of different Fgf ligands in the late development of the mouth and pharyngeal slits in various animals across diverse taxa suggests that FGF signaling might have been repeatedly recruited during the evolution of perforated structures (Crump et al. 2004; Röttinger et al. 2008; Bertrand et al. 2011; Fan et al. 2018; Rees et al. 2024).
Moreover, the high expression of Fgf11/12/13/14a in the stomodeum, pharyngeal slits, and in the rostro-ventral part of the brain related to the ciliary funnel, regions also expressing Pax2/5/8a, suggests that O. dioica may have innovated the recruitment of Fgf11/12/13/14a in the evolution of mechanisms related to ciliary cells and perforated structures. To our knowledge, this would be the first case in which an intracrine Fgf ligand has been related to the development of the mouth.
The second case is related to Fgf expression in mesodermal derivatives, such as the notochord or muscle cells, in the anterior region of the tail during tailbud and hatchling stages. In ascidians, no Fgf expression has been detected in cells of the notochord or tail muscles in tailbud/hatchling stages, except for Fgf9/16/20 being expressed in the most posterior pair of muscle cells near the tip of the tail (Imai et al. 2004; Pasini et al. 2012). In contrast, O. dioica exhibits strong expression of Fgf9/16/20a in the most posterior muscle cells of the tail and the most anterior pair, as well as in the first and third cells of the notochord. Additionally, Fgf9/16/20b-d expression is observed at above-background levels in the three most anterior and ventral muscle cells at tailbud stages. These results reveal that the anterior part of the tail in appendicularians could act as an anterior tail mesoderm FGF source (Anterior Tail FGF Source, ATFS) of secreted Fgf9/16/20 ligands (Fig. 5b). This may differ drastically from ascidians, in which the Fgf9/16/20 source is restricted to the most posterior part of the tail (Posterior Tail FGF Source, PTFS), often associated with tail elongation and posterior cell identity differentiation or survival, similar to vertebrates (Diez del Corral and Storey 2004; Imai et al. 2004; Olivera-Martinez et al. 2012; Pasini et al. 2012). Interestingly, considering that the anterior region of the tail in ascidians serves as a source of RA signaling (Anterior Tail Retinoic Acid Source, ATRAS, Fig. 5b) from the most anterior Aldh1a1/2/3-positive cells of the tail muscle (Nagatomo and Fujiwara 2003), and given the antagonistic action between FGF and RA signaling conserved in ascidians and vertebrates, where down-regulation of RA signaling often leads to increased FGF production (Diez del Corral and Storey 2004; Olivera-Martinez et al. 2012; Pasini et al. 2012; Paschaki et al. 2013), it is tempting to speculate that the evolutionary innovation of this ATFS in appendicularians might be related to their loss of RA signaling (Cañestro and Postlethwait 2007; Martí-Solans et al. 2016) (Fig. 5). This loss would have reduced selective constraints, allowing some Fgf9/16/20 genes to gain novel expression domains in the anterior part of the tail. This drastic difference could represent a major shift in developmental signaling sources between appendicularians and ascidians, potentially driving the divergent evolution of developmental processes associated with the distinct adult body plans and lifestyles that characterize these two groups of tunicates.
The last case of novel Fgf expression domains in O. dioica is related to the patterning of the epidermis in late-hatchling stages. In the tail, the lateral wings that will develop into the tail fin express three Fgf9/16/20 and two Fgf11/12/13/14 paralogs. In the trunk, the oikoblast, which is the epidermal organ responsible for building the house, has recruited the expression of nearly all Fgf9/16/20 and Fgf11/12/13/14 paralogs, most of them in a broad fashion, although some appear to be restricted to or excluded from certain fields within this complex organ (Fig. 4a11-j11). The fact that no similar Fgf expression domains have been observed in the tail or the trunk of ascidian larvae (Satou et al. 2002; Imai et al. 2004) suggests that the recruitment of FGF for epidermis patterning could be an innovation of the appendicularian lineage, linked to the evolution of the house-building organ and the tail movements that characterize its fully free-living lifestyle.
(4) Extinction of ancestral expression domains linked to gene losses
In ascidians, Fgf7/10/22 (previously also referred to as Fgf3) is strongly expressed throughout the ventral row of cells of the neural tube at tadpole stages, and its signaling function has been described as crucial for the convergent extension of the notochord, which lies just underneath the neural tube (Shi et al. 2009). In O. dioica, the loss of Fgf7/10/22 predicts that the convergent extension of the notochord might have become independent of FGF signaling from the ventral neural tube, a prediction that can be tested in future experiments interfering with FGF signaling. Moreover, in ascidians Fgf7/10/22 knockout leads to the arrest of tail absorption during metamorphosis, which has led to the suggestion that Fgf7/10/22 might play an inductive role in ascidian metamorphosis (Treen et al. 2014). In this context, it is tempting to speculate that the loss of Fgf7/10/22 in appendicularians could be related to the loss of the drastic metamorphosis and tail absorption that have been suggested for the ascidian-like biphasic tunicate ancestor (Fig. 5a). Additionally, ascidian metamorphosis involves mesenchymal tissue, which consists of mesodermal cells that remain in a pluripotent state during development until postmetamorphic differentiation into adult tissues and structures. The ascidian Fgf8/17/18 ortholog plays a role in the early differentiation of mesenchymal cells, with its expression maintained throughout embryonic development (Imai et al. 2004; Satou 2020). Therefore, the loss of Fgf8/17/18 in appendicularians could be related to the loss of mesenchymal tissue and the lack of a drastic metamorphic process (Fig. 5).
Altogether, our work highlights the evolution of the Fgf family in appendicularians as a paradigmatic example of what could be referred to as “less, but more”, where massive gene losses, but also extensive duplications, result in an overall conservation of Fgf expression domains, in many cases due to function shuffling among paralogs, but also in the loss and innovation of expression domains. The “less, but more” scenario builds on the well-established “less is more” concept by highlighting the possible cooccurrence of gene losses and gene duplications in an adaptive evolutionary scenario, and how the interplay of these events may impact the evolvability of the genetic system by generating complex evolutionary patterns that interconnect the roles of lost and expanded sets of genes. There are several examples of gene family evolution characterized by the cooccurrence of losses and duplications, many of which are possibly related to adaptations, such as the evolution of chemosensory receptor gene families linked to environmental adaptations across all animal species (Nei et al. 2008), the globin gene family and the radiation of mammals (Opazo et al. 2008), the conotoxin gene family and the evolution of venoms in marine cone snails (Chang and Duda 2012), or the evolution of the Wnt family in O. dioica (Martí-Solans et al. 2021), among many other examples.
In our work, the recognition of different categories of evolutionary patterns associated with the “less, but more” scenario has allowed us to propose speculative hypotheses that will need to be tested with functional experiments to unveil the function of FGF in each of the expression domains, and to be studied across a larger sampling of ascidian and appendicularian species. Future work testing these hypotheses will be needed to better understand the mechanisms underlying those patterns of evolution, and to rule out, for instance, cases in which apparently innovative appendicularian functions were ancestral but lost in some ascidian species, or to discover cases in which function shuffling could be due to the complementary loss of ancestral redundant functions between paralogs that have been differentially preserved among different species. Interestingly, in our work, many of the innovations associated with the “less, but more” scenario can be related to the transition from an ancestral ascidian-like biphasic lifestyle to the fully free-living lifestyle that characterizes appendicularians. Thus, for instance, future functional analyses will be crucial to investigate several key aspects: the role of Fgf11/12/13/14 genes in the innovation of a new mouth opening and the modulation of neural excitability; the involvement of multiple Fgf genes in the patterning of the oikoblast and the innovation of the house; the role of FGF signaling in notochord convergence and in the development of the fin; the impact of the emergence of an ATFS and its potential connection to the fact that these organisms are evolutionary knockouts of RA signaling; and finally, how the loss of Fgf7/10/22 and Fgf8/17/18 might be related to the loss of tail absorption and the absence of metamorphosis, both of which occurred during the evolutionary innovation of the fully free-living lifestyle of appendicularians.
Conclusion
The “less, but more” evolutionary scenario illustrated in this work through the evolution of the Fgf family in appendicularians expands the “less is more” hypothesis by helping to recognize patterns of evolution associated with the cooccurrence of extensive gene losses and duplications. These findings can be useful to better understand the adaptive impact of gene loss on the evolution and diversification of species across the tree of life.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Acknowledgments
We thank present and past team members of C.C.'s laboratory for assistance and fruitful discussions on FGF signaling, gene loss, and evolution, especially Sebastian Artime Paoletti for running the Oikopleura facility at the University of Barcelona. We thank the Centres Científics i Tecnològics de la UB for sea water supply and sequencing services.
Author Contributions
Conceptualization: C.C., G.S.-S.; formal analysis: G.S.-S.; funding acquisition, project administration, supervision: C.C.; investigation: G.S.-S., P.B., J.B.-R., A.F.-R., M.F.-T., N.P.T.-Á., J.N.W.; methodology: G.S.-S., P.B., J.B.-R., A.F.-R., M.F.-T., N.P.T.-Á., R.A.; resources: J.N.W., M.J.M., C.P., N.M.L.; software: N.P.T.-Á., J.N.W., M.J.M., C.P.; validation, visualization, writing (original draft), writing (review & editing): G.S.-S., C.C.
Funding
C.C. was funded by PID2019-110562GB-I00 and PID2022-141627NB-I00, and R.A. by PID2021-123258NB-I00 from Spanish Ministerio de Ciencia, Innovación y Universidades MICIU/AEI/10.13039/501100011033 and by ERDF/EU; C.C. by ICREA Acadèmia Ac2215698, C.C. and R.A. by 2021-SGR00372, and N.P.T.-Á. by 2021BP00067 AGAUR, Generalitat de Catalunya; N.P.T.-Á. by 2019IRBio001 from IRBio, Universitat de Barcelona; G.S.-S. by FPU18/02414 fellowship from Ministerio de Educación y Cultura; M.F.-T. by a PREDOC2020/58 fellowship from Universitat de Barcelona; A.F.-R. by MS12 Margarita Salas from Ministerio de Universidades (Spain).
Data Availability
The data underlying this article are available in the article and in its online Supplementary material.