Evolution of the Growth Hormone Gene Duplication in Passerine Birds

Abstract Birds of the order Passeriformes represent the most speciose order of land vertebrates. Despite strong scientific interest in this super-radiation, genetic traits unique to passerines are not well characterized. A duplicate copy of growth hormone (GH) is the only gene known to be present in all major lineages of passerines, but not in other birds. GH genes plausibly influence extreme life history traits that passerines exhibit, including the shortest embryo-to-fledging developmental period of any avian order. To unravel the implications of this GH duplication, we investigated the molecular evolution of the ancestral avian GH gene (GH or GH1) and the novel passerine GH paralog (GH2), using 497 gene sequences extracted from 342 genomes. Passerine GH1 and GH2 are reciprocally monophyletic, consistent with a single duplication event from a microchromosome onto a macrochromosome in a common ancestor of extant passerines. Additional chromosomal rearrangements have changed the syntenic and potential regulatory context of these genes. Both passerine GH1 and GH2 display substantially higher rates of nonsynonymous codon change than non-passerine avian GH, suggesting positive selection following duplication. A site involved in signal peptide cleavage is under selection in both paralogs. Other sites under positive selection differ between the two paralogs, but many are clustered in one region of a 3D model of the protein. Both paralogs retain key functional features and are actively but differentially expressed in two major passerine suborders. These phenomena suggest that GH genes may be evolving novel adaptive roles in passerine birds.


Introduction
Gene duplication is among the most important forces in evolution (Ohno 1970). Duplicated genes can reinforce preexisting functions or, less constrained by negative selection and often in a new regulatory context, duplicates can accumulate mutations and evolve new roles (Ohno 1970;Lynch and Force 2000;Assis and Bachtrog 2015;Jiang and Assis 2017). When functional divergence involves subdivision of previous functions between duplicates, they are said to be subfunctionalized; when a new function arises in one copy, it is said to be neofunctionalized (Conrad and Antonarakis 2007;Chen et al. 2013;Long et al. 2013). Duplication followed by functional divergence has occurred multiple times in the vertebrate growth hormone (GH) protein family, which has expanded via whole genome and segmental gene duplications in various lineages (McKay et al. 2004;Ocampo Daza and Larhammar 2018). Here we investigate the origin, evolution, and expression of one such gene duplication that occurred at the root of the avian order Passeriformes.
Passeriformes represent a monophyletic (Raikow 1982;Hackett et al. 2008) super-radiation comprising ∼6,500 of the ∼10,800 extant bird species (Gill et al. 2021), making them the most speciose order of birds and one of the most speciose clades of vertebrates of similar age (in this case ∼47 Ma) (Oliveros et al. 2019). The reasons for their apparent evolutionary success have long been debated. Raikow (1986) argued that the large number of passerine species is "likely an artifact of classificatory history," prompting counterarguments proposing traits or mechanisms that may account for passerine diversity. Traits proposed include true vocal learning (Fitzpatrick 1988;Vermeij 1988) and sexually dimorphic plumage (Barraclough et al. 1995), both thought to drive speciation via sexual selection. Alternatively, passerine speciation might be driven by strictly adaptive traits, such as a generalist ability to survive in new environments (Baptista and Trail 1992), the ability to build complex, camouflaged nests (Olson 2001), enhanced neural estrogen signaling (Schlinger et al. 2022), or greater behavioral plasticity, memory, and cognition than other avian lineages (Gesicki 2019).
However, many of these proposed key adaptations are not restricted to passerines. Vocal learning is present in parrots and hummingbirds (Johnson and Clark 2020). Sexually dimorphic plumage, found in almost every avian order, is likely the ancestral state for birds (Kimball and Ligon 1999). Complex nest building is not unique to passerines either; Olson (2001) cites the example of the mud nests built by swallows without acknowledging the equally complex nests made by swifts and other birds (Steeves et al. 2020). The cognitive abilities of parrots are also highly developed (Emery 2006). Biogeographical explanations of passerine species diversity are likewise inadequate; passerine diversification rates, while linked to major climatological and geological events, are decoupled from continent colonization events (Oliveros et al. 2019), and, while island systems have been shown to promote diversification of some passerine clades (McCullough et al. 2022), it is not clear why this effect would not drive commensurate net diversification in other avian orders. Thus, after over 40 years of active consideration, there are no clearly favored hypotheses on the causes of passerine diversity.
What then remain as possible mechanisms or key adaptations that might explain passerine diversity? Whether high speciation rates, low extinction rates, or both have driven passerine diversity, there may be underlying adaptive mechanisms. Compared with other birds, passerines have rapid developmental periods, short generation times, and many offspring (Cooney et al. 2020). Such traits are likely influenced by hormonal physiology. Indeed, hormones of the GH/insulin-like growth factor axis are known to influence several aspects of development and life history in birds such as body size, time to reach maturity, clutch size, egg weight, clutch interval, and lifespan (Lodjak et al 2018;Socha and Hrabia 2019). Could adaptive innovations in hormone physiology contribute to the evolutionary success of passerines? GH, also called somatotropin, is the ancestral hormone of the GH protein family, controlling adult body size across all vertebrates (Kawauchi et al. 2002). The expansion of the GH family via gene duplication in other taxa has led to a variety of functional hormones with novel adaptive roles, such as the convergent evolution of placental lactogens in disparate eutherian mammalian lineages (Alam et al. 2006;Papper et al. 2009). GH family genes have also evolved distinct and overlapping functions in diverse lineages. For example, prolactin, ancestrally an osmotic regulator in fishes (Ocampo Daza and Larhammar 2018), drives milk secretion in mammals (Soares 2004), crop milk production in pigeons and doves (Horseman and Buntin 1995), and brooding behavior in diverse birds (Angelier and Chastel 2009). Yuri et al. (2008) described a duplication of the GH gene in passerine birds and presented evidence for accelerated evolution and positive selection of the paralogs. Arai and Iigo (2010) independently discovered this duplication and provided evidence of differential expression of the two genes. With the recent release of hundreds of newly assembled genomes from virtually every family of extant birds, a search for lineage-specific genes in Passeriformes found that the GH gene duplication is the only novel gene or gene copy yet identified that appears to be present in all major lineages of this clade (Feng et al. 2020). Intrigued by the correlation of this genomic innovation with passerine diversification, we wished to explore its significance further.
Here, we evaluate molecular evolution of the GH paralogs using complete GH sequences from 183 passerines and 159 other birds. The rich genomic resources now available allowed us to examine the chromosomal context of both paralogs in multiple species and resolve the phylogenetic placement of key events including the GH duplication, lineage-specific deletions, and rearrangements of syntenic genes. We provide strong evidence of positive selection on both passerine GH paralogs, the first RNA-seq evidence on transcriptional functionality and tissue specificity of both paralogs in a passerine, and speculate on the potential neofunctionalization or subfunctionalization of these genes in the major lineages of passerines.

Copy Number and Duplication Origin
We retrieved a total of 497 full length GH coding sequences (CDS) from 342 bird genomes including all major lineages of living birds, comprising 159 non-passerines with a single copy, 154 passerines with two paralogs retrieved, and 29 passerines with only one full length paralog retrieved (supplementary table S1, Supplementary Material online). BLAST searches did not identify more than one copy in any non-passerine or more than two copies in any passerine. Gene tree estimation demonstrates that passerine GH sequences form reciprocally monophyletic clades that split at the root of Passeriformes ( fig. 1), consistent with a single GH gene duplication event in living birds. We designate the paralogs belonging to these clades GH1 and GH2 (see Methods for naming rationale).
All taxa for which we recovered two paralogs had one in each clade, and taxa with only one paralog recovered were widely dispersed on the tree ( fig. 1). Lack of a clear phylogenetic pattern to these absences suggests that no major group of passerines has lost the GH duplication and that most or all "missing" paralogs may be due to the draft or low-coverage nature of many bird genomes, as previously concluded by Feng et al. (2020). To test this conclusion, we searched for these "missing" paralogs in the raw read data for the 29 species whose genome assemblies only contained a single copy. For 28 of the 29 taxa, we found reads clearly attributable to the "missing" paralog; only Lanius ludovicianus failed to yield evidence of a second paralog (GH2 in this case). Thus, we found evidence that at least partial copies of both paralogs are present in 182 of the 183 passerines we studied. The majority of "missing" paralogs (21 of 29) were GH1s, possibly because GH1 resides on a microchromosome, while GH2 is on a macrochromosome. Microchromosomes usually have higher GC content and are more difficult to sequence and assemble (Peona et al. 2021).
The syntenic context of avian GH genes lends strong evidence to a single GH gene duplication event and shows that this event is part of a series of genic and genomic changes. Of 52 non-passerine genomes in which we examined syntenic context, SCN4A and CD79B were directly upstream of GH in 44 genomes and the remaining eight genomes had incomplete assemblies in this region (supplementary  table S8, Supplementary Material online). Similarly, in 24 passerine genomes in which we examined syntenic context, the GH1 gene was adjacent to SCN4A and CD79B in all cases. For passerine and non-passerine genomes alike, downstream synteny for GH1 is consistent, with PLEKHM1 and ARHGAP27 present within a few genes of GH/GH1 in every genome with a complete assembly in this region ( fig. 2), with one exception, Lonchura striata (noted in supplementary table S8, Supplementary Material online). These genes flanking GH/GH1 reside on microchromosome 27 of chicken (Gallus gallus) and zebra finch (Taeniopygia guttata). As the linkage of CD79B and GH occurs across Tetrapoda, for example Xenopus tropicalis genome assembly UCB_Xtro_10.0 and Homo sapiens genome assembly GRCh38.p13 (O'Leary et al. 2016), we infer this syntenic context to be ancestral and passerine GH1 to be orthologous to the single-copy non-passerine GH ( fig. 2).
In this same set of 24 passerine genomes, the derived paralog (GH2) consistently occurs with MAP6 and MOGAT2 downstream in a syntenic context that maps to zebra finch chromosome 1 ( fig. 2). Upstream synteny for this paralog is less informative due to multiple local chromosomal rearrangements ( . The ancestral segment containing GH that was originally duplicated from chromosome 27 to chromosome 1 also included at least two other predicted genes downstream: RDM1 and an unnamed leucine-rich repeat (LRR) gene. A series of deletions on chromosome 1 and tandem duplications or deletions on chromosome 27 have changed the number of RDM1 and/or LRR copies in various lineages, while GH1 and GH2 are widely, possibly universally, retained in passerines (figs. 1 and 2).

Gene Structure
All avian GH genes examined share the same gene structure, with five exons and four introns ( fig. 2C). Retention of introns in the derived passerine paralogs (GH2) suggests that the duplication occurred at the DNA level, and did not involve retroposition. The segmental nature of the original duplication discussed above is also consistent with this conclusion. Like human GH1, avian GH genes are typically 217 amino acids in length (observed range 214-218 residues). Small indels of 1-3 codons in single taxa or clades account for the length variation (supplementary table S2, Supplementary Material online).
Like those of other exported proteins, GH amino acid sequences typically begin with signal peptides that target them for secretion. The vast majority of non-passerine GH sequences are predicted with high likelihood to have signal peptides: 158 of 160 species examined across 38 orders (supplementary table S3, Supplementary Material online). Most passerine GH1 sequences are also predicted with high likelihood to have signal peptides, including 156 of 162 sequences examined. In contrast, many fewer passerine GH2 sequences are predicted to include a signal peptide (116 of 175 examined), and the likelihoods of those that do are often substantially lower. Only one of the 154 passerine species for which we could examine both paralogs is not predicted to a signal peptide site in either paralog, Chaetops frenatus. The predicted lengths of signal peptides range from 24 to 29 residues, with 27 residues being by far the most common (supplementary table S3  GH paralogs, which have 5′ UTRs from 28 to 119 bp (mean 56.5) and 3′ UTRs from 18 to 112 bp (mean 86.33).
The base composition of the passerine paralogs has diverged post-duplication ( fig. 3). While the GC content of both paralogs must initially have been identical, all three passerine suborders now display substantial differences between GH1 and GH2 in GC content (mean difference = 4.87%; range 1.42-8.12% across all species; supplementary table S5, Supplementary Material online). However, GH1 has higher GC content in Acanthsitta (suborder Acanthisitti) and oscines, while GH2 is higher in suboscines (suborder Tyranni).

Accelerated Evolution and Positive Selection in the Passerine GH Paralogs
Visual examination of the GH gene tree based on protein-CDS suggests that passerine GH paralogs generally have longer branches than non-passerine GH ( fig. 1). We confirmed this pattern with a likelihood ratio test using BASEML software (from the PAML package of programs for Phylogenetic Analysis by Maximum Likelihood v. 4.9j; Yang 2007), allowing for three different molecular clock rates: a background rate, a rate for the passerine GH1 clade, and a rate for the passerine GH2 clade. This test strongly rejected all three rates being

FIG. 2.-(A)
Local syntenic contexts of GH genes on chromosome 27 (ancestral locus) and chromosome 1 (duplicate locus in passerines). Pentagons indicate relative gene placements and orientations (distance not to scale). Rectangles indicate a variable number of tandem copies of LRR. Post-duplication, many changes to local syntenies have occurred in passerines. Each row in the diagram represents a local synteny pattern found in multiple species in avian outgroups, the sister taxon to passerines (parrots), and each passerine suborder, except for Acanthisitta, for which only one genome exists. The underlying syntenies informing this diagram, as well as a few single species exceptions to it, are reported in supplementary table S8, Supplementary Material online. (B) Cladograms of relationships among the passerine suborders and outgroups depicted in part A, with symbols denoting key rearrangements on chromosomes 27 and 1. Star: segmental duplication of a part of chromosome 27 containing GH, RDM1, and an LRR gene onto chromosome 1. Navy (darker) circles: deletions of LRR genes. Red (lighter) circles: deletions of RDM1 genes. (C) Gene structures of avian GH paralogs. Darker, colored regions denote exons; lighter, gray regions denote introns. Exon/CDS lengths, which do not vary with the paralog, are listed (rare taxon-specific indels are listed in supplementary  Both paralogs experienced similar post-duplication rate increases with maximum likelihood estimates of 4.8x/5.9x (constrained/unconstrained topology of the input tree) for GH1, and 4.3x/5.5x for GH2.
Because the increase in clock rate could have arisen due to neutral processes, we next conducted formal tests of positive selection. Using CODEML software (PAML v. 4.9j;Yang 2007), we tested a null model where the ratios of nonsynonymous to synonymous substitution rates (dN/dS or ω) must be less than one, simulating no positive selection, versus three alternate models where sites with ω > 1 were allowed on foreground branches. The best-fitting model tested (likelihood ratio test against the null P < 0.0001) allowed positive selection on both passerine GH1 and GH2. Of the single-clade models, both passerine GH1 and GH2 were individually highly significant for the model with ω > 1 allowed, but the model with positive selection allowed on GH2 displayed a larger improvement (supplementary table S6B, Supplementary Material online). Similar results were obtained for both constrained and unconstrained tree topologies.
Next, we performed a more conservative aBSREL (adaptive branch-site random effects likelihood) test of positive selection (Smith et al. 2015) on the two ancestral branches leading to the passerine GH1 and GH2 clades. Significant positive selection was indicated only on the branch leading to the GH2 clade (P = 0.0429 with the constrained topology, P = 0.0622 with the unconstrained topology, both after Holm-Bonferroni correction for multiple testing). This result suggests a burst of positive selection immediately following the duplication event. Overall, the relative rate and selection test results provide evidence for positive selection on both passerine GH1 and GH2, with a stronger signal on the GH2 clade. The

Sequence Conservation and Structural Implications
Other regions of the proteins show varying levels of sequence conservation across birds ( fig. 4). Non-passerine GH is more conserved than passerine GH1 and GH2, despite spanning a much longer period of evolutionary divergence: ∼47 MY for passerines (Oliveros et al. 2019) and 90-100 MY for extant lineages of non-passerine birds (Jarvis et al. 2014;Kimball et al. 2019). Overall amino acid sequence divergence between GH1 and GH2 within species varies from 5.07% to 17.51%, with a mean divergence of 13.44% for oscines and 10.04% for suboscines (supplementary table S7, Supplementary Material online). Mean amino acid sequence divergence of GH1 from non-passerine GH (11.59%) is similar to that between GH2 and non-passerine GH (10.93%). While many sites are variable across the alignment, there is only one fixed difference between the paralogs studied here; site 90 is always glycine in GH1 and always lysine in GH2 (supplemental data: GH amino acid alignment, Supplementary Material online; fig. 4). Site 90 is also glycine in all non-passerine GH studied, adding support to the conclusion that GH1 is the ancestral paralog and implying that lysine in GH2 is the derived condition at this site.
Other features of interest include two pairs of cysteines known to form two disulfide bridges in GH (Ocampo Daza and Larhammar 2018). These are universally conserved. Four predicted alpha helices are generally more conserved than the rest of the polypeptide. Sites crucial to binding with the growth hormone receptor (GHR) in human GH1 (de Vos et al. 1992) generally show no evidence of positive selection in passerine GH1 or GH2 (with limited exceptions; see fig. 4), in contrast to mammalian GH paralog evolution, where positively selected sites (PSS) tend to cluster in these GHR-binding areas (Buggiotti and Primmer 2006). Serine site 133 can be phosphorylated in human GH1 (Giorgianni et al. 2004) and is widely conserved in all birds ( fig. 4). Site 175 can also be phosphorylated in human GH1, but does not have a phosphorylatable residue in birds. However, nearby site 177 is a serine or threonine in most bird lineages and thus potentially phosphorylatable (supplemental data: GH amino acid alignment, Supplementary Material online). This site is under positive selection in passerine GH1, where Ser has frequently been replaced by amino acids (Gly, Ala, Arg, Asn) that cannot be phosphorylated.
The potential importance of both conserved features and PSS to the tertiary structure of the protein is evident in a 3D model of the folded polypeptide ( fig. 5). The four conserved alpha helices form a structural core, a shared feature of GH family proteins (Ocampo Daza and Larhammar 2018). The four conserved cysteine residues form two closely apposed pairs, as needed for two disulfide bridges. Many sites aligning with GHR-binding sites in human GH1 are clustered in three of the core alpha helices and two smaller helices that are all accessible on the surface of the 3D model ( fig. 5B). One core alpha helix that is buried in the model has no such sites. Both potentially phosphorylatable sites also appear on the surface of the model ( fig. 5B). Finally, 14 of the 21 sites inferred to be under positive  (Jumper et al. 2021;Varadi et al. 2021) are marked along the 217 amino acid avian consensus sequence. The percent identity at each amino acid site of various groups is overlaid, with 100% conserved sites marked in a darker color, green. Insertions with less than 10% frequency are masked. Key features of human GH1 are shown aligned with avian GH sequences, including the signal peptide, sites that bind GHR, and phosphorylation sites (Giorgianni et al. 2004). At the bottom, the passerine GH1/GH2 consensus sequence bar with numbered exons shows the placement of PSS. Only sites with P > 0.95 according to the Bayes empirical Bayes test of PAML v. 4.9j (Yang 2007) (209) that lies near the end of a highly conserved region of the primary sequence that is bent into a surface loop by a disulfide bridge that lies near the carboxy terminus of the polypeptide ( fig. 5B). Also in this region of the 3D model is site 90 ( fig. 5A), which carries the only fixed amino acid replacement between GH1 and GH2 (Gly > Lys). This site is involved in receptor binding in humans (de Vos et al. 1992).

Changes to Non-coding Regions
The promoter region of chicken GH has been well characterized (Ip et al. 2004). In this region, the proximal TATA box is retained in both passerine GH paralogs ( fig. 6). The binding site for transcription factor Pit-1 is also likely retained, but with some sequence changes in the first few bases. The core binding site for the inhibitory thyroid hormone response element (TRE) is likewise conserved. Interestingly, a second TATA box upstream shows paralog-specific conservation and degeneration. The ancestral sequence (TATAAAT) is conserved in all passerine lineages except GH1 of Acanthisitta and both GH1 and GH2 of suboscines. While the function of this putative TATA box is unknown, multiple TATA boxes in promoter regions are associated with multiple transcription initiation sites in other eukaryotic genes (Hahn et al. 1985;Grace et al. 2004). All passerine GH introns appear to retain splice site functionality with the preservation of GU-AG sites, except in intron 1 from the suboscine Tachuris rubigastrus GH1 (data not shown).

Gene Expression
Transcriptome sequencing from multiple tissues provides the first direct evidence for the expression of both GH1 and GH2 in a suboscine, the wire-tailed manakin (Pipra filicauda). GH1 was expressed at very high levels in the pituitary gland (>75,000 transcripts), at moderate levels in testes (5-34 transcripts) and at very low levels (≤5 transcripts) in ten brain areas tested (note that the log scale varies among panels in fig. 7). In contrast, expression of GH2 in the pituitary (≤43 transcripts) was significantly lower than GH1 expression ( fig. 7). No GH2 transcripts were detected in the testes. Levels of GH1 expression in the pituitary varied with social status and in testes with time of sacrifice; these effects are detailed elsewhere (Bolton et al. 2022). Expression of GH1 was low across brain areas, but somewhat variable between individuals. Nevertheless, in eight of ten brain regions tested, GH1 showed significantly higher expression than GH2. In fact, GH2 transcripts were not detected in any brain areas tested. These expression patterns in our suboscine data differ from those previously reported in oscines (Arai and Iigo 2010;Xie et al. 2010).  (Giorgianni et al. 2004). GHR: sites aligning with GH receptor binding sites in human GH1 (de Vos et al. 1992). 90: site 90 is glycine in all GH1 and lysine in all GH2 studied here.

Discussion
Multiple lines of evidence now combine to strongly suggest a role for selection in passerine GH paralog evolution. First, both GH paralogs are retained in virtually all passerine lineages, despite at least four post-duplication deletion events affecting other genes included in the segment originally duplicated ( fig. 2). Second, formal tests analyzing patterns of nonsynonymous and synonymous changes reject neutrality in favor of positive selection on various branches in the phylogeny after duplication (figs. 4 and 5). Not only did the molecular clock rate increase for both passerine GHs ( fig. 1 and supplementary table S6, Supplementary Material online), but the GH2 paralog experienced a burst of positive selection directly after the duplication event, driving its amino acid sequence apart from the ancestral one. Third, GH1 and GH2 retain key regulatory and structural sequence features necessary for the expression and function of GH family proteins (figs. 4-6). The combination of large-scale structural conservation together with accelerated evolution at mostly differing amino acid sites in the two paralogs suggests that they may have evolved novel or specialized functions. Dosage effects may also be important; the rapid development of passerines (Cooney et al. 2020) might plausibly be related to the presence of two copies, and both paralogs are expressed in pituitary ( fig. 7; see also Arai and Iigo 2010), which is the main site of GH expression during somatic growth. However, the predicted loss of a signal peptidase cleavage site in at least one-third of GH2 genes examined suggests that the products of these genes cannot be exported from the cells of origin, and thus a simple dosage effect on the endocrine functions of GH is unlikely in these species.
From the syntenic and sequence evidence we present, it is evident that the two copies of passerine GH can be explained by a single short segmental duplication and interchromosomal translocation in the common ancestor of Passeriformes, as previously inferred by Feng et al. (2020). We here show that the segment originally duplicated likely contained at least two additional genes, copies of which have been lost or gained on both chromosomes in various lineages. This mechanism contrasts with other expansions of the GH family which have taken place primarily via tandem duplication (Wallis et al. 1998;Alam et al. 2006;Wallis and Wallis 2006;Papper et al. 2009) and whole genome duplication (Whittington and Wilson 2013). Additionally, as the exon-intron structure is retained across avian GH paralogs, it seems clear that GH2 originated via segmental duplication rather than retroposition, which typically creates single-exon genes with introns removed (Cusack and Wolfe 2007). In passerines, post-duplication deletions and a series of rearrangements on chromosome 1 have altered the syntenic context and possibly the regulatory milieu of GH1 and GH2 among the major passerine lineages. In addition, the base composition of the paralogs has diverged substantially post-duplication. This may be a consequence of translocation onto a macrochromosome, as avian micro-and macrochromosomes differ on average in their GC content (Warren et al. 2017; O'Connor et al.  Our work, using the rich genomic resources now publicly available for birds, builds on the discovery of passerine GH2 and initial tests of evolutionary rate acceleration and positive selection performed by Yuri et al. (2008). Our data confirm their observation of a rate increase in amino acid sequence evolution of both passerine GH1 and GH2 in comparison to non-passerine avian GH. Yuri et al. estimated larger rate increases than those that we found (supplementary table S6, Supplementary Material online), likely due to their more limited taxon sample and use of partial sequences that happened to span a region (84 amino acids from sites 51-134) that is evolving more rapidly than the overall sequence ( fig. 4). The rate increases in GH evolution we estimated from complete CDS of a much larger taxon sample are still distinctly higher than expected from any previously observed genome-wide rate changes. For example, Yuri et al. (2008) estimated a roughly 2× increase for passerine GH introns and Nam et al. (2010) estimated an average rate increase of 0.08× for CDS in the zebra finch genome by comparison of 8,384 genes with their 1:1 orthologs from chicken. Yuri et al. (2008) detected evidence of positive selection on two amino acid sites, 99 and 107 as numbered herein. With a much larger taxon sample of complete sequences, we confirmed one of these sites (99) and found further evidence of positive selection on a total of 21 sites, 11 in each paralog, one of which is shared in GH1 and GH2.
The shared site is amino acid 27, the last residue in the predicted signal peptide of the vast majority of avian GH genes (supplementary table S3, Supplementary Material online). Signal peptidase is predicted to cut these polypeptides between sites 27 and 28, so 27 is in the −1 position with respect to cleavage. The −1 position is crucial for correct recognition and cleavage by signal peptidase (Auclair et al. 2011). Mutations here may destroy the signal peptidase cleavage site altogether if the replacement residue is incompatible, such as one with a polar or bulky side chain. Other mutations may affect the efficiency of signal peptidase binding (Choo and Ranganathan 2008). Therefore, positive selection on site 27 indicates evolutionary pressure for the two passerine paralogs to interact differently with signal peptidase. Consistent with diverging roles for the two paralogs, 63 of 154 passerine species sampled have lost a predicted signal peptidase cleavage site in either GH1 or GH2, most often in GH2. In many other species, the probability that such a site is present is substantially reduced. Changes destroying or modifying signal peptidase's affinity would likely affect whether GH proteins are exported from the cell, and if so, how efficiently. Export is, of course, a crucial step in the typical function of a protein hormone. GH genes without signal peptides may be specialized for intracellular roles of the protein.
Other PSS are clustered at one end of the 3D model of the folded protein ( fig. 5A), as is one of the cysteine pairs presumed to form a disulfide bridge ( fig. 5B). Disulfide bridges are often integral to maintaining protein tertiary structure. Also found in this region is site 90, which is known to be involved in binding to the GH receptor in humans (de Vos et al. 1992) and carries the derived Gly > Lys replacement in all passerine GH2. The change from a short, nonpolar side chain (glycine) to a long, charged one (lysine) may be of functional significance in receptor binding. The proximity of many PSS to these potentially important structural features suggests that selection may be acting to further modify protein function by changing binding affinity to the GH receptor or other ligands.
Degeneration or silencing are the most common fates of duplicate genes. However, the presence of many universally conserved sites and features suggests that the amino acid sequences of these paralogs are not in the process of degeneration. Among these conserved features are the four core alpha helices, the universally conserved cysteines expected to form two disulfide bridges, one site for potential phosphorylation, and most regions of the mature protein that likely interact with the GHR. Rather than degenerating, each paralog may be freed from some selective constraints and adapting under positive selection.
Neither of these paralogs have been transcriptionally silenced. Instead, gene expression data reported here and in previous studies suggests divergent patterns of GH1 and GH2 expression depending on species, and, possibly, lineage. In the suboscine Pipra, GH2 expression was limited entirely to the pituitary among the tissues we examined, while GH1 was expressed in multiple tissues. In the oscine Corvus, GH2 was expressed in a wider range of extrapituitary tissues than GH1 (Arai and Iigo 2010). Both GH1 and GH2 are expressed in brain of the oscine zebra finch, but only GH1 is differentially expressed after familiar versus novel song stimuli (Xie et al. 2010). Differing patterns of GH paralog expression may be influenced by the syntenic environment, which has changed dramatically due to the translocation from chromosome 27 to chromosome 1 and further rearrangements on chromosome 1 in both oscines and suboscines ( fig. 2). It is possible that these changes have brought GH2 under the influence of distinct distal regulatory elements in different taxa. Although proximal regulatory elements immediately upstream of both paralogs are generally conserved, base substitutions altering some key regulatory elements in this region ( fig. 6) may also be involved in observed expression differences ( fig. 7).
The divergent patterns of sequence evolution and gene expression of the passerine GH paralogs are consistent with functional divergence after duplication to more specialized or novel roles through subfunctionalization or neofunctionalization. That similar levels of CDS divergence have accumulated between each passerine paralog and ancestral avian GH is consistent with subfunctionalization. The loss of signal peptide cleavage sites in one paralog of many passerine species may also be consistent with subfunctionalization to an autocrine or intracrine role, because GH is believed to have endocrine, paracrine, and autocrine/ intracrine roles (e.g., Baudet et al. 2009;Hattori 2009;Harvey and Baudet 2014). The pattern of gene expression we report in the suboscine Pipra is likewise consistent with subfunctionalization, with GH1 expression in a range of tissues reminiscent of GH expression in chicken (Harvey and Hull 2003;Martínez-Moreno et al. 2014), but with GH2 limited to the pituitary. One possibility is that the pituitary functions of ancestral GH have been subdivided in passerine GH1 and GH2. However, the pattern previously reported for the oscine Corvus is more consistent with neofunctionalization, as GH2 is expressed in novel tissue types compared with ancestral GH1 (Arai and Iigo 2010). Given the multiple functions of ancestral GH, it seems plausible that both sub-and neofunctionalization have occurred across >47 Ma in a large radiation such as passerine birds.
The duplication of a gene so crucial to life history strategy, body composition, and reproduction may be a key adaptation driving the acceleration of passerine development, metabolism, and shorter generation times. These factors may also help explain a faster mutation rate (Gillooly et al. 2005;Nabholz et al. 2008), which correlates to increased diversification rates in birds (Lanfear et al. 2010). This accelerated evolution may have aided adaptation to novel environments as the passerine radiation spread across the globe from its putative origin on the Australian landmass (Barker et al. 2004;Oliveros et al. 2019).
Further research is needed into the developmental stageand tissue-specific expression patterns of the two paralogs. These fundamental biological data will help elucidate the roles of these duplicate genes in passerine growth and development. Our investigations of the molecular evolution and gene expression of the passerine GH paralogs lend powerful evidence to the adaptive fates of these GH paralogs post-duplication. Our findings show the retention of function and expression of both passerine GH paralogs and suggest that they are evolving in a manner consistent with novel or specialized functions. These derived functions may include crucial adaptations to the accelerated life history strategy associated with passerine diversity.

Determining Copy Number and Synteny
For our initial descriptive analysis, we used 76 wellannotated avian genomes (24 passerine and 52 nonpasserine) available on NCBI (O'Leary et al. 2016), plus the unannotated chromosome-level genome assemblies for the passerines Acanthisitta chloris and Manacus candei. Source genomes used in these and following analyses are listed in supplementary table S1, Supplementary Material online. To identify GH sequences and copy number, we searched annotated genomes for "somatotropin" and "growth hormone," which yielded hits for genes annotated "somatotropin," "somatotropin-like," "growth hormone," "GH," and "GH1." We then used the mRNA sequence of each annotated gene to query the focal taxon (taxid matching the genome of origin) using blastn. We evaluated the top 10 matching hits (or all hits, if fewer than ten resulted). From this pool, we retained all hits matching genomic sequences with a sequence identity above 60%, a gapless CDS, and no annotation that better represented the identity of the matching region.
We examined the syntenic context of these annotated GH genes. When chromosome-level assemblies were available, we noted the chromosomal location of the GH paralog(s). For each annotated avian genome, we recorded the genes flanking GH genes. For non-passerines, we also noted the two genes flanking either side of the homologous genomic region where a segment containing GH2 is inserted in passerines ( fig. 2). Additionally, to identify syntenies on unannotated chromosome-level assemblies for M. candei and A. chloris, we used blast, querying genes annotated on a previous assembly from the same species' genome (in the case of Acanthisitta) and from relatives' genomes (for both Manacus and Acanthisitta). Many annotated genomes did not convey enough information to reconstruct the full syntenies shown in figure 2 due to scaffolds ending in the middle of key regions. For a list of genomes informing the placement of the genes in these regions, see supplementary table S8, Supplementary Material online.
To characterize changes that have occurred between passerine lineages in the local genomic context of GH2, we used the chromosomal and local syntenic context of GH genes as well as pairwise dotplots to visualize runs of high sequence similarity. We extracted sequences 25 Mb in length centered on the GH2 translation start site from chromosome-level genome assemblies from each passerine suborder: oscine (T. guttata), suboscine (Chiroxiphia lanceolata), and acanthisittine (A. chloris). To obtain dotplots focused on the displacement of the segment containing GH2 in oscines compared with suboscines and Acanthisitta, we ran pairwise alignments of these three sequences in D-GENIES (Cabanettes and Klopp 2018) using the Minimap2.24 aligner (Li 2018), selecting "many repeats," and mapping results with the R version 4.2.0 (R Core Team 2022) package pafr (Winter 2020) (supplementary fig. S1, Supplementary Material online).

GH Sequence Curation and Alignment
For the initial categorization of GH paralogs as GH1 (ancestral) or GH2 (duplicate), we used sequences from the set of 76 annotated avian genomes from NCBI (O'Leary et al. 2016). GH paralogs were categorized as ancestral (GH1), or novel (GH2), based on their annotated syntenic context. To retrieve additional GH sequences, we queried 280 unannotated genomes with BLAST using the GH sequence(s) of the closest available annotated relative (supplementary table S1, Supplementary Material online). We assigned retrieved unannotated sequences the identity of GH1 or GH2 based on sequence similarity to annotated GH1 and GH2 paralogs. Retrieved sequences underwent quality checks: complete coding DNA sequences without multiple stop codons, gaps, or Ns were retained, as were sequences with other nucleotide ambiguities (R, Y, S, W, K, M). One sequence (Struthio camelus GH1) was discarded because it was highly divergent from all other bird sequences and contained multiple stop codons, also likely due to misassembly. We calculated base composition on our dataset of avian GH sequences that were free of ambiguities.
Failure to retrieve one full sequence from certain non-passerine genomes or two full sequences from certain passerine genomes was most likely due to genome incompleteness or mis-assembly. When the genome assembly of passerine species yielded only one complete paralog (black tips shown in fig. 1), we used blastn to search for evidence of a second copy in the raw read data posted for those species in the NCBI Short Read Archive. We used a 100 bp query sequence from the closest relative of each species (as per the phylogeny of Oliveros et al. 2019), selected from a region that typically shows 10-15% sequence divergence between GH1 and GH2. For example, we queried Ptilorrhoa leucosticta raw reads (no GH1 copy previously retrieved) with a 100 bp sequence from Rhagologus leucostigma GH1, and determined whether short read hits had higher sequence identity with previously retrieved P. leucosticta GH2 or the query R. leucostigma GH1. In cases where we found raw reads that closely matched the "missing" paralog from the related species, we considered this evidence that at least a partial copy of the "missing" paralog exists in this species genome, although it is not present in the currently available genome assembly.
For signal peptide prediction, we used amino acid translations of the same avian GH sequences as used in the selection tests described in Methods: Tests of Selection. For each of these sequences, we used SignalP 6.0 (Teufel et al. 2022) to predict the likelihood of each sequence containing a signal peptide. For sequences where the likelihood was >0.5, SignalP 6.0 predicted both the presence of a cleavage site and the probability of the cleavage site being in the predicted location. We chose the settings "Eukarya" to limit signal peptide prediction sites to Sec/SPI (Signal Peptidase I) sites only, as well as the "slow" setting to use the more complex and accurate protein language model (Teufel et al. 2022).
We performed the coding CDS alignment (supplemental data: GH CDS alignment, Supplementary Material online) in Geneious (Geneious® 2020.2.5) using Clustal Omega and then visually checked it. To analyze the upstream cisregulatory region, we retrieved 1 kb sequences 5′ of the translational start site, plus the first three codons to anchor the alignment, from 132 passerine GH1s, 146 GH2s, and 5 non-passerine avian GHs. We aligned these 1,009 bp sequences with the Geneious plugin of the sequence aligner MAFFT v. 7 (Katoh and Standley 2013) using default settings: algorithm = auto, gap open penalty 1.53, offset value 0.123, and scoring matrix 200 PAM/k = 2. We then annotated the 1,009 bp chicken sequence with the regulatory elements identified by Ip et al. (2004) and realigned the 250 bp region proximal to the translation start site with the same default settings in MAFFT to increase the accuracy of the alignment around regulatory element binding sites (supplemental data: GH flanking region alignment, Supplementary Material online), as the full 1,009 bp alignment featured numerous indels affecting the overall alignment. We considered sequence motifs aligning with the regulatory elements annotated on the chicken sequence to be homologous after inspection of the alignment.

Nomenclature
Given the clear evolutionary origin to the GH paralogs we found (see Results: Copy Number and Duplication Origin), for passerine sequences, we chose to use the symbol GH1 to refer to the ancestral paralog and GH2 to refer to the novel derived paralog. GH is used to refer to the non-passerine avian paralog as per the Chicken Gene Nomenclature Consortium (Burt et al. 2009). Previous authors, who had less data from fewer species, have used a variety of nomenclature. Yuri et al. (2008) named the paralogs based on the PCR product length of the partial passerine sequences examined, with GH long referring to what ultimately proved to be the ancestral copy and GH short referring to the novel copy. Feng et al. (2020) followed this usage. However, gene lengths are variable and some ancestral paralog sequences are shorter than novel paralog sequences (e.g., Catharus ustulatus has a 4,166 bp GH1 and a 5,033 GH2; Passer domesticus has a 4,287 bp GH1 and 5,206 bp GH2). Arai and Iigo (2010) assigned the novel copy the name GH1A and the ancestral copy GH1B; but these symbols unfortunately turned out to be counterintuitive given the evolutionary history now evident. Xie et al. (2010) used GH1 (ancestral) and GH1A (novel); while workable, we argue that GH1 and GH2 more clearly and succinctly convey the ancestral/descendant evolutionary relationship while allowing for the distinct possibility that both passerine paralogs function differently from nonpasserine GH.

Phylogenetic Tree Estimation
We estimated the trees used in the evolutionary analyses from the Geneious alignment of GH CDS using RAxML v. 8 (Stamatakis 2014) option "-f a," which searches for the best maximum likelihood scoring tree and performs a rapid bootstrap analysis in one program run. We used 1,000 bootstrap replicates under the GTRCAT model of rate heterogeneity. Two topologies were obtained and used in all evolutionary analyses: one from an unconstrained maximum likelihood tree search (the unconstrained tree) and a second (the constrained tree) in which the search was constrained to trees compatible with an input topology specifying many widely accepted relationships of the taxa involved (see supplementary fig. S2, Supplementary Material online for the unconstrained and constrained topologies, as well as the input constraint topology). Many of the constrained relationships represent short internodes deep in the tree; such relationships are unlikely to be recovered with a short single gene CDS alignment but are well supported by much larger phylogenomic datasets. Both trees were used throughout all evolutionary analyses to test for any effect of using the constrained tree versus unconstrained tree, making the biologically plausible assumption that the actual evolutionary history of GH genes (aside from duplication in the passerine clade) may be better represented by the constrained topology formed from the literature consensus derived from a multitude of sites. Relationships still unresolved in the avian literature were not constrained (e.g., the placement of Opisthocomus hoazin within Neoaves). The constraint topology was invoked in RAxML with the -g setting.

Molecular Clock
We performed tests of the relative molecular clock rate of protein-coding nucleotide sequence evolution on the constrained and unconstrained trees with the BASEML package in PAML version 4.9j (Yang 2007). The null model enforced a global clock rate, while alternative models allowed local clock rates to be determined for designated foreground clades. For both phylogenetic topologies tested, we designated each of the following as the foreground clade in a separate test: passerine GH1, passerine GH2, and passerine GH1 + GH2. We also performed a fourth test with each topology in which passerine GH1 and GH2 were allowed to have separate clock rates. See figure 1 for a visualization of these foreground and background branch sets.

Tests of Selection
We estimated the timing and sequence location of positive selection in the avian GHs with branch-site tests. We used the CODEML program in PAML v. 4.9j (Yang 2007) to estimate the branch-and site-specific ratios of nonsynonymous to synonymous substitutions (dN/dS or ω). In the branchsite test, the null model M7 (all branches and sites had a beta-distributed ω < 1) is compared with various alternate M8 models. M8 specifies the M7 condition on background branches and sites, allowing an additional category of ω on user-specified foreground branches (including all of their sites) where ω > 1, indicating elevated nonsynonymous changes and positive selection on branches and sites where this rate category occurred. We specified the following foreground clades: (A) the passerine GH1 clade, (B) the passerine GH2 clade, (C) all passerine GH (both GH1 and GH2), and (D) the two branches immediately following the duplication event to explicitly test the hypothesis of a burst of positive selection following duplication ( fig. 1). The tests on A, B, and C are discussed in Results, but the test on clade D failed to estimate ω for this test, likely due to the small number of test branches compared with the size of the dataset, so it was discarded. We also employed the aBSREL test (Smith et al. 2015) in the HyPhy 2.5 family of methods (Kosakovsky Pond et al. 2020) using the Datamonkey server (Weaver et al. 2018). aBSREL represents a stricter version of the CODEML branch-site test, as it allows more rate categories of ω, including ω > 1 on background branches. We tested the same A, B, C, and D foreground branch sets as in CODEML.
We then employed the Bayes empirical Bayes test in the CODEML program on the GH1 and GH2 clades individually to identify amino acids under selection. We placed the amino acids identified under positive selection on the translated CDS alignment, along with other amino acid features identified as important in the GH literature (see fig. 4), as well as on a 3D structural model of G. gallus GH available from AlphaFold v. 2.1.1 (Jumper et al. 2021;Varadi et al. 2021;fig. 5). We also modeled zebra finch GH1 and GH2 sequences with AlphaFold v. 2.1.1 to confirm that major structural features (alpha helices, disulfide bridges) had the same predicted placement as in chicken GH (data not shown). To capture paralog sequence differences that may not show up as PSS, we visually examined the amino acid alignment of avian GH, GH1, and GH2 for fixed mutations between either passerine paralog and all other sequences.
Characterization of Expression of GH Axis Genes With RNA-seq As part of a separate study on male cooperative display behavior (Horton et al. 2019), 12 male wire-tailed manakins (P. filicauda) were collected from the wild and sacrificed at Tiputini Biodiversity Station, Ecuador. Whole brains, pituitary, and testes were extracted from birds 4-6 min postmortem and stored in dry ice until placed in liquid nitrogen.
The ten brain nuclei and two peripheral tissues were prepared as TruSeq Stranded mRNA Sample Prep Kit libraries per tissue and per individual. Sequencing was performed on an Illumina HiSeq 4000. We mapped resultant singleended reads to the annotated P. filicauda genome (GCA_003945595.1) using splice-aware RNA-seq mapper STAR v. 2.75 (Dobin et al. 2013) on two pass mode. We set mismatch tolerance to 0.04, allowing four mismatches per 100 bp read, which is less than the sequence divergence of each exon between Pipra GH1 and GH2, except the very short exon 1. Visual inspection in IGV (Interactive Genomics Viewer) showed no bias for mapping rates in exon 1 of either paralog. We calculated the total number of reads per gene and per exon using featureCounts v2.0.1 (Liao et al. 2014). We used a counts-per-million reads formula to calculate a length normalized count for genes using the total exon length. For detailed methods and permits for animal collection, import, and export, see Horton et al. (2019).