Discovery of Paralogous GnRH and Corazonin Signaling Systems in an Invertebrate Chordate

Abstract Gonadotropin-releasing hormone (GnRH) is a key regulator of reproductive function in vertebrates. GnRH is related to the corazonin (CRZ) neuropeptide which influences metabolism and stress responses in insects. Recent evidence suggests that GnRH and CRZ are paralogous and arose by a gene duplication in a common ancestor of bilaterians. Here, we report the identification and complete characterization of the GnRH and CRZ signaling systems in the amphioxus Branchiostoma floridae. We have identified a novel GnRH peptide (YSYSYGFAP-NH2) that specifically activates two GnRH receptors and a CRZ peptide (FTYTHTW-NH2) that activates three CRZ receptors in B. floridae. The latter appear to be promiscuous, as two CRZ receptors can also be activated by GnRH in the physiological range. Hence, there is a potential for cross-talk between these closely related signaling systems. Discovery of both the GnRH and CRZ signaling systems in one of the closest living relatives of vertebrates provides a framework to discover their roles at the transition from invertebrates to vertebrates.


Introduction
Neuropeptides are the most diverse class of neuronal signaling molecules. They are found throughout the animal kingdom, including in animals without a nervous system (Senatore et al. 2017;Nässel and Zandawala 2019;Sachkova et al. 2021;Yañez-Guerra et al. 2022). The origins of most neuropeptide families can be traced to a common ancestor of bilaterian animals (Jekely 2013;Mirabeau and Joly 2013). Consequently, orthologs of several vertebrate neuropeptide signaling systems are found in invertebrates. One such conserved neuropeptide signaling pathway that has been widely studied is the gonadotropinreleasing hormone (GnRH) system which regulates reproductive function in humans and other vertebrates (Tsutsumi and Webster 2009). GnRH is orthologous to the adipokinetic hormone (AKH) in invertebrates such as arthropods and regulates energy homeostasis (Staubli et al. 2002;Bharucha et al. 2008;Zandawala et al. 2018). The GnRH/AKH signaling system is closely related to the corazonin (CRZ) signaling pathway, which influences metabolism and stress responses in insects (Veenstra 2009;Zhao et al. 2010;Kubrak et al. 2016;Zandawala et al. 2021). Recent evidence from the starfish Asterias rubens revealed that the GnRH and CRZ signaling systems are in fact paralogous and arose by gene duplication in a common ancestor of bilaterians (Tian et al. 2016). This study put an end to a long-standing debate on the precise relationship between GnRH and CRZ (Zandawala et al. 2018), exemplifying the power of deuterostomian invertebrates in resolving the evolution of bilaterian neuropeptide signaling systems. Independently, the AKH signaling system also appears to have duplicated in a common ancestor of the arthropods to give rise to the AKH/CRZ-related peptide (ACP) signaling, whose function remains elusive (Hansen et al. 2010).
Occasionally, both the neuropeptide sequence/structure and function tend to be conserved across animals, as evident for insulin which regulates lifespan in most animals (Barbieri et al. 2003). Hence, invertebrates such as insects and nematodes have served as good models to decipher the functions and modes-of-action of some conserved neuropeptides (Van Sinay et al. 2017;Nässel and Wu 2022). However, in most cases, including GnRH and AKH, vertebrate and insect counterparts of the same neuropeptide family regulate different functions. This is perhaps due to the large evolutionary distance separating these species. Therefore, there is a need to establish other model systems that are more closely related to vertebrates than insects to study neuropeptide function. The amphioxus B. floridae is best suited to address this need. Amphioxi are chordates and represent one of the closest living relatives of vertebrates (Putnam et al. 2008). Discovering neuropeptide signaling systems in B. floridae can thus provide a framework to unravel the functions of neuropeptides in this species which occupies a unique position in animal phylogeny.
Previously, one neuropeptide precursor and four receptors belonging to the GnRH/CRZ superfamily were identified in the B. floridae genome (Tello and Sherwood 2009;Roch, Tello, et al. 2014). Phylogenetic analysis revealed that two of these receptors grouped with protostomian CRZ receptors and the other two receptors grouped with GnRH/AKH receptors. A putative mature peptide (pQILCARAFTYTHTW-NH 2 ) encoded by this novel precursor was classified as a GnRH-like peptide despite the fact that it activated one of the two CRZ-like receptors in the high nanomolar range (Roch, Tello, et al. 2014). Regardless of the nomenclature, it is evident that additional and/or more specific peptide ligands for the B. floridae GnRH and CRZ receptors remain to be discovered. Here, we report the discovery of the authentic B. floridae GnRH precursor using a hidden Markov model (HMM)-based search. We show that the putative mature peptide encoded by this precursor can activate both the GnRH receptors. We also discover an additional previously unidentified CRZ receptor and identify the putative endogenous ligands for all three CRZ receptors.

Results and Discussion
Identification of the Authentic B. floridae GnRH Precursor To identify CRZ-like and GnRH-like precursors in B. floridae, we searched the Branchiostoma transcriptomes using a custom-generated HMM for GnRH/CRZ precursors. Our sensitive search uncovered a single novel precursor in B. belcheri and B. floridae. Since the precursor previously discovered by Roch, Tello, et al. (2014) encodes a peptide (supplementary fig. S1, Supplementary Material online) which activates a CRZ receptor, it would be more appropriate to reclassify it as the B. floridae CRZ precursor (Roch, Tello, et al. 2014). With this in mind, we hypothesized that our newly identified precursor encodes the elusive B. floridae GnRH ( fig. 1A and supplementary fig. S2, Supplementary Material online). Sequence analysis revealed that this precursor is predicted to generate a single copy of the peptide YSYSYGFAP-NH 2 . Interestingly, this peptide lacks the N-terminal glutamine (which gets converted to pyroglutamic acid in vivo) that is a conserved feature across vertebrate GnRH and arthropod AKH peptides ( fig. 1B). In silico analyses of the B. floridae CRZ precursor suggests that it also encodes a single copy of the mature peptide FTYTHTW-NH 2 (Tian et al. 2016;Zandawala et al. 2018). This peptide is much shorter than the putative mature peptide (pQILCARAFTYTHTW-NH 2 ) predicted previously (Roch, Tello, et al. 2014;Zandawala et al. 2018). Sequence analysis indicates that the seven amino acid residues at the N-terminal end of the extended peptide are in fact part of the signal peptide which immediately precedes the predicted mature peptide ( fig. 1A). Consequently, B. floridae CRZ also lacks the highly conserved N-terminal glutamine and is the shortest CRZ discovered to date ( fig. 1C).
Since both the putative B. floridae GnRH and CRZ mature peptides appear highly divergent, sequence alignments ( fig. 1B and C) are not entirely suitable to infer homology. Therefore, we compared other sequence features, including the position of the mature peptide within the precursor and conserved introns, since these tend to be conserved in neuropeptide orthologs across diverse animal phyla (Mirabeau and Joly 2013). A comparison of GnRH ( fig. 1D) and CRZ ( fig. 1E) precursors from different animals shows that the mature peptide is flanked by a signal peptide at the N-terminus for both neuropeptide families. Moreover, the phase of the first intron is also 0 in both GnRH and CRZ genes from deuterostomes. However, a feature that distinguishes deuterostomian GnRH from deuterostomian CRZ sequences is the number of introns. GnRH from B. floridae and other deuterostomes have two introns interrupting the protein-coding sequence ( fig. 1D), whereas deuterostomian CRZ sequences, including the one from B. floridae, have a single conserved intron ( fig. 1E). Remarkably, synteny analysis indicates that the genomic regions containing GnRH/AKH and CRZ genes are largely conserved between B. floridae and D. melanogaster ( fig. 1F). Taken together, sequence analyses indicates that the previously identified B. floridae precursor is the CRZ precursor. More importantly, our newly identified B. floridae precursor shares several features with GnRH/AKH precursors from other animals, and likely encodes the bona fide GnRH.

Identification and Functional Characterization of B. floridae GnRH and CRZ Receptors
Previously, two GnRH-like and two CRZ-like receptors were identified in the B. floridae genome (Mirabeau and Joly 2013;Roch, Tello, et al. 2014). Several neuropeptide receptors have undergone masive expansions in B. floridae Wang et al. 2017). Since a better, chromosomelevel assembly for the B. floridae genome is now available (Simakov et al. 2020 Having identified the B. floridae GnRH/CRZ-like peptides and receptors, we next sought to functionally validate these signaling systems by confirming that the putative GnRH and CRZ mature peptides can indeed activate their corresponding receptors. For this purpose, we expressed individual receptors in a heterologous expression system comprising human embryo kidney 293 cells (HEK293-G5a) cells and monitored luminescence responses to three synthetic peptides. We tested the newly identified B. floridae GnRH (YSYSYGFAP-NH 2 ), the CRZ (FTYTHTW-NH 2 ), as well as the N-terminally extended CRZ (CRZ-ext; pQILCARAFTYTHTW-NH 2 ) predicted and tested previously (Roch, Tello, et al. 2014). First, we tested the B. floridae GnRH receptors (GnRHR1 and GnRHR2) which were Signal peptides are highlighted in blue, dibasic cleavage sites are highlighted in green, and the putative mature peptides (without post-translational modifications) are highlighted in purple (GnRH) and red (CRZ). Sequence in pink within the CRZ signal peptide is part of the N′-terminally extended CRZ (CRZ-ext) reported by Roch, Tello, et al. (2014). Multiple sequence alignments of (B) GnRH and AKH mature peptides, and (C) CRZ mature peptides. Comparisons of the exon-intron structure of genes encoding (D) GnRH, AKH, and ACP precursors, and (E) CRZ precursors. Boxes represent exons (drawn to scale) and lines represent introns, with length underneath. Regions encoding signal peptides (blue) and mature peptides (purple and red) have been colored. The intron phase is indicated above the exon-intron boundary. In deuterostomes, GnRH precursors are encoded by three exons and CRZ precursors are encoded by two exons, which distinguishes the two neuropeptides. (F ) Conserved synteny for the genomic regions containing GnRH/AKH and CRZ precursor genes. Same colors are used for orthologs. For simplicity and ease of comparison, neighboring genes are aligned, locations of genes omitted and major gene loses indicated using broken boxes. NPS, neuropeptide-S; CCAP, crustacean cardioactive peptide; VP, vasopressin; OXT, oxytocin; ADRA1, adrenoceptor Alpha 1; OAMB, octopamine receptor in mushroom bodies; Agam,    showing the relationship between GnRH and CRZ receptors. The tree was generated with PHYML 3.0 using the maximum-likelihood method. aLRT-SH-like both specifically activated by GnRH in a dose-dependent manner, with EC 50 values for the response in the low nanomolar range ( fig. 2B and D). Neither of these two receptors were activated by CRZ or CRZ-ext. Together, these results confirm that the precursor identified here encodes the elusive B. floridae GnRH. Next, we tested the three CRZ receptors (CRZR1, CRZR2, and CRZR3). Consistent with Roch, Tello, et al. (2014), CRZR2 (formerly GnRHR3) was activated by CRZ-ext, with a comparable EC 50 value in the high nanomolar range ( fig.  2C and D). Additionally, CRZR1 and CRZR3 were also activated by CRZ-ext; however, the EC 50 values for the response were equally high. In contrast, the shorter CRZ activated all three CRZ receptors with a higher potency and efficacy compared to CRZ-ext ( fig. 2C and D). Thus, CRZ rather than CRZ-ext appears to be the endogenous ligand for the three B. floridae CRZ receptors. Identification of endogenous peptides in B. floridae via mass spectrometry can clarify whether CRZ, CRZ-ext, or both are found in vivo. Surprisingly, GnRH was also able to activate all three CRZ receptors, with EC 50 values for CRZR1 and CRZR2 responses in the mid-to-low nanomolar physiological range. Hence, there is a potential for cross-talk between the B. floridae GnRH and CRZ signaling systems in vivo. Decreased selection pressure on both the CRZ peptide and the ligand-binding pocket of CRZ receptors could explain how this signaling system was lost in vertebrates and other animals ( fig. 3).

Conclusion
In conclusion, we have discovered the bona fide GnRH, CRZ and their cognate receptors in the amphioxus, B. floridae. This suggests that CRZ signaling system has likely been lost in olfactores (vertebrates and urochordates). We acknowledge that the putative B. floridae GnRH and CRZ mature peptides require further validation using mass spectrometry. Discovery of both the signaling systems in one of the closest living relatives of vertebrates provides a framework to discover the physiological roles of GnRH and CRZ at the transition from invertebrates to vertebrates. Lastly, there is a strong need for neuropeptide discovery in diverse species as neuropeptide-encoding genes serve as excellent markers for single-cell transcriptome analyses (Ma et al. 2021). Hence, our methodological approach highlighted here can facilitate these studies since it is ideal for identifying highly divergent neuropeptides which evade typical BLAST searches.

Materials and Methods
Identification of the B. floridae GnRH Precursor Using an HMM Search A GnRH-like precursor from B. floridae was identified using a custom HMM for GnRH/CRZ precursors. The sequences used to generate the HMM (supplementary file 1, Supplementary Material online) were obtained from publicly available databases (NCBI). These sequences were aligned using the automated option in MUSCLE (Edgar 2004) and the resulting alignment trimmed using TrimAL (Capella-Gutiérrez et al. 2009). This trimmed alignment was used to generate the HMM (supplementary file 2, Supplementary Material online) using HMMER3.0 (http:// hmmer.org/) (Eddy 2011). This model was used to search B. floridae and B. belcheri proteomes with an e-value of 1e-1.

Sequence Analyses and Alignments
Potential signal peptide cleavage sites in B. floridae GnRH and CRZ precursors were predicted using SignalP 6 Server (Teufel et al. 2022). GnRH and CRZ mature peptide sequences were aligned using Clustal Omega (Sievers et al. 2011). The alignments were adjusted and shaded manually. Gene structures of GnRH and CRZ precursors were predicted using Webscipio 2.0 (Hatje et al. 2011). Membrane topology of the receptors was predicted in silico using Protter and DeepTMHMM (https://dtu.biolib.com/ DeepTMHMM) (Omasits et al. 2014). Synteny analysis of GnRH/AKH and CRZ was based on the one performed earlier by Roch, Busby, et al. (2014). Drosophila gene loci were obtained from FlyBase (Gramates et al. 2022) and B. floridae CRZ loci was determined via genome BLAST.

Clustering and Phylogenetic Analyses of GnRH and CRZ Receptors in Bilateria
Putative B. floridae GnRH and CRZ receptors were identified using a combination of clustering-based and phylogenetic analyses. Briefly, GnRH and CRZ receptor sequences support values (1,000 replicates) are shown next to the nodes. The clade containing the vasopressin/oxytocin and neuropeptide-S receptors was used to root the tree. Pastel-colored backgrounds represent different taxonomic groups (see key). Receptors functionally characterized in this study are highlighted in black. (supplementary file 3, Supplementary Material online) were aligned using MUSCLE and trimmed using TrimAL, as described above. This alignment was used to generate a custom HMM for GnRH and CRZ receptors (supplementary file 4, Supplementary Material online) with HMMER3.0. We used this model to search for GnRH/CRZ receptors (with an e-value of 1e-25) from select species belonging to major animal phyla. Predicted proteomes of the following species were searched: A. rubens, Bombyx mori, Caenorhabditis elegans, Ciona intestinalis, Clytia hemisphaerica, Danio rerio, Daphnia pulex, Gallus gallus, Homo sapiens, Latimeria chalumnae, Lottia gigantea, Mus musculus, Octopus bimaculoides, Petromyzon marinus, Priapulus caudatus, Rhodnius prolixus, Saccoglossus kowalevskii, Strongylocentrotus purpuratus, Tribolium castaneum, and Xenopus tropicalis. Since predicted proteomes of Alatina alata, B. belcheri, and B. floridae were not available, their transcriptomes were first translated into protein sequences (minimum length of 100 amino acids) using the prediction option in TransDecoder (https://github.com/ TransDecoder/TransDecoder/wiki) before performing the search. The sources for these predicted proteomes and transcriptomes are available in supplementary file 5, Supplementary Material online. Independently, the GnRH and CRZ receptor sequences used for generating the custom HMM above were also used as a query to perform a BLASTp search with an e-value cutoff of 1e-15. The resulting hits from the HMM and BLASTp searches were merged. Redundant sequences were then eliminated (at a 98% threshold) using CD-Hit (Fu et al. 2012). These searches retrieved several receptors including those activated by other neuropeptides and monoamines. To identify putative GnRH and CRZ receptors, all the receptor sequences were clustered in CLANS (https://toolkit.tuebingen.mpg. de/tools/clans) (Frickey and Lupas 2004), using the BLOSUM 62 matrix and extracted BLAST high scoring pairs with an e-value of 1e-15 as a threshold. The CLANS analysis file is available in supplementary file 6, Supplementary Material online. To identify the different clusters, we used the linkage-clustering option with at least two links. Receptor sequences from GnRH/CRZ, vasopressin/oxytocin and neuropeptide-S/NGFFFamide clusters were extracted (supplementary file 7, Supplementary Material online) and used for phylogenetic analysis. Sequences were aligned using the iterative refinement method E-INS-i in MAFFT version 7.0 (Katoh et al. 2019). The alignment was trimmed with TrimAl in gappy mode (Capella-Gutiérrez et al. 2009). The maximum-likelihood tree was produced using PHYML (ngphylogeny.fr) with the LG general amino acid replacement matrix and four discrete gamma models (Le and Gascuel 2008). Branch support was calculated based on 1,000 replicates with the aLRT-SH-like methodology (Guindon et al. 2010 3.-Evolution of GnRH and CRZ signaling systems. Animal phylogeny showing the occurrence of GnRH (purple), AKH (green), ACP (yellow), and CRZ (red) signaling systems. An empty box denotes absence of the signaling system. An asterisk indicates species in which the receptors have been functionally characterized and the question mark indicates uncertainty about the presence of that receptor in the genome. ACP (orange cross) and CRZ (red cross) systems have been independently lost in multiple lineages. Two major gene duplication events are marked in the phylogeny with numbers: (1) GnRH and CRZ signaling systems arose in a common ancestor of the Bilateria and (2) AKH and ACP signaling systems arose by duplication of the GnRH signaling system in a common ancestor of the Arthropoda. The lancelet Branchiostoma floridae, an invertebrate, is the closest relative of vertebrates to possess both the GnRH and CRZ signaling systems. Given the low quality of Petromyzon marinus genome assembly (supplementary file 5, Supplementary Material online), we cannot rule out the possibility that CRZ signaling is also found in vertebrates. Silhouettes were obtained from https://www.phylopic.org/.
peptides were first prepared in 50% Dimethyl sulfoxide. Two putative GnRH receptors and three putative CRZ receptors were identified in B. floridae based on our clustering and phylogenetic analyses. These receptors were codon-optimized for mammalian cell lines, synthesized and cloned into pcDNA3.4(+) vector by Genscript synthesis services. A 5′ partial Kozak translation initiation sequence (CCACC) was also included to improve expression. The codon-optimized receptor sequences are provided in supplementary file 9, Supplementary Material online. HEK293-G5a (Angio-proteomie CAT no. cAP-0200GFP-AEQ-Cyto) cells were cultured in 96 well-plates containing 100 μl of Dublecco's Modified Eagle medium (DMEM; Thermo; Cat. No. 10566016) supplemented with 10% foetal bovine serum (Thermo; Cat. No. 10082147). The plates were kept in an incubator at 37°C and 5% CO 2 . Upon reaching 85% confluency, cells were cotransfected with plasmids encoding individual receptors or the empty pcDNA3.1 + vector (control), and the Gαqi9 promiscuous protein [Addgene; Cat. No. 125711 {Masharina et al. 2012}], as described previously (Yañez-Guerra et al. 2022). Transfections were carried out with 60 ng of each plasmid and 0.45 μl of the transfection reagent Transfectamine 5,000 (AAT-bioquest; Cat. No. 60022) per well. Two days post-transfection, the media was replaced with fresh DMEM medium supplemented with 4 mM coelenterazine-H (Thermo Fisher Scientific; Cat. No. C6780). The cells were incubated for 2 h and then used for the luminescence assay. Various concentrations of the three synthetic peptides ranging from 10 −4 M to 10 −12 M were prepared in DMEM media and tested on the cells. Luminescence levels were recorded and integrated over a 65-s measurement period in a FlexStation 3 Multi-Mode Microplate Reader (Molecular Devices). A minimum of two independent transfections (biological replicates) for each receptor and two-three technical replicates for each peptide concentration were performed. The responses were normalized to the maximum response obtained following the addition of peptide in each experiment (100% activation) and to the response obtained with the vehicle media (0% activation). Normalized data from the independent transfections was used to plot the dose-response curves (fitted with a four-parameter curve) using the package drc in R (Ritz et al. 2015). The raw data for the emtpy-vector control and receptor characterization are provided in supplementary files 10 and 11, Supplementary Material online, respectively. The script used for the dose-response curve analysis is available on Github (https://github.com/Imnotabioinformatician/ Branchiostoma_Dose_response_curves).