Newly Discovered Occurrences and Gene Tree of the Extracellular Globins and Linker Chains from the Giant Hexagonal Bilayer Hemoglobin in Metazoans

Abstract
Multicellular organisms depend on oxygen-carrying proteins to transport oxygen throughout the body; therefore, proteins such as hemoglobins (Hbs), hemocyanins, and hemerythrins are essential for tissue maintenance and cellular respiration. Vertebrate Hbs are among the most extensively studied proteins; however, much less is known about invertebrate Hbs. Recent studies of hemocyanins and hemerythrins have demonstrated that they are much more widely distributed than previously thought, suggesting that the diversity of oxygen-binding proteins across metazoans is underestimated. Hexagonal bilayer hemoglobin (HBL-Hb), a blood pigment previously known only from annelids, is a polymer composed of up to 144 extracellular globins and 36 linker chains. To further understand the evolutionary history of this protein complex, we explored the diversity of linkers and extracellular globins from HBL-Hbs using in silico approaches on the transcriptomes of 319 metazoans and one choanoflagellate. We found 559 extracellular globin and 414 linker genes transcribed in 171 species from ten animal phyla, with new records in Echinodermata, Hemichordata, Brachiopoda, Mollusca, Nemertea, Bryozoa, Phoronida, Platyhelminthes, and Priapulida. Contrary to previous suggestions that linkers and extracellular globins emerged in the annelid ancestor, our findings indicate that they putatively emerged before the protostome–deuterostome split. For the first time, we unveil the comprehensive evolutionary history of metazoan HBL-Hb components, which comprises multiple episodes of gene gain and loss. Moreover, because our study design surveyed linkers and extracellular globins independently, we were able to cross-validate our results, reducing the risk of false positives. We confirmed that the distribution of HBL-Hb components among animals has until now been underestimated.


Introduction
Aerobic cells require oxygen for their maintenance and growth; therefore, multicellular organisms such as metazoans depend on proteins to transport oxygen from the external environment to body tissues (Terwilliger 1998). These proteins play an essential role in organismal homeostasis and hence have been extensively investigated (Riggs 1976). Hemoglobins (Hbs) occur in all domains of life and display an extraordinary diversity of form and function; among them, vertebrate Hbs stand out as the most studied blood pigments (Royer 1992; Vinogradov et al. 2006, 2007; Gell 2018). Even though invertebrate Hbs present a wide variety of structures, much less is known about their diversity, distribution, and evolutionary history. Recent genomic and transcriptomic studies demonstrated that the known diversity of other oxygen-carrying proteins in animals, such as hemerythrins (Hrs) and hemocyanins (Hcs), was underestimated (Bailly et al. 2008; Martín-Durán et al. 2013; Costa-Paiva et al. 2017). For instance, Hrs were previously thought to be present in only four invertebrate phyla (Annelida, Brachiopoda, Priapulida, and Bryozoa) (Kurtz 1992), and Hcs were thought to be exclusive to Mollusca and Arthropoda (Burmester 2002). However, Bailly et al. (2008) reported a broad distribution of Hrs among annelids. Subsequently, Martín-Durán et al. (2013) and Costa-Paiva et al. (2017) discovered Hcs and Hrs in other metazoans, with novel records in seven and nine animal phyla, respectively. These data consistently showed the extent to which the diversity of oxygen transport and binding proteins in metazoans has been underestimated. The giant, extracellular, hexagonal bilayer hemoglobin (HBL-Hb) is a protein complex involved in oxygen transport in annelids (Vinogradov 1985; Weber and Vinogradov 2001). This protein group presents a unique quaternary structure and is known from a variety of annelids.
HBL-Hbs have high molecular masses in the range of 3,000-4,000 kDa and comprise 180 polypeptide chains, which are grouped into two categories: globins and nonglobins (Lamy et al. 1996). The earthworm (Lumbricus terrestris) HBL-Hb, which is among the best studied, contains 144 globin chains (each of which binds O2 reversibly), arranged into 12 dodecamers that assemble around a central core of 36 linker chains (12 trimers) (fig. 1) (Lamy et al. 2000; Royer et al. 2005, 2006). HBL-Hbs are synthesized intracellularly and secreted. Their extracellular location in blood vascular or coelomic fluid systems seems to correlate with their large molecular size, which prevents their excretory loss through tissue membranes (Weber and Vinogradov 2001). The protein complex is extremely stable, is resistant to autoxidation, and is capable of transporting O2 to tissues when transfused into mammals without producing side effects (Harrington et al. 2007). Such features make HBL-Hb a promising candidate blood substitute for human transfusions and therapeutics and for the preservation of organs, tissues, and cells. HBL-Hb represents one of four groups of invertebrate extracellular Hbs; it is a multisubunit complex made of globins, each with a single oxygen-binding site, and of linker chains that are devoid of heme (Vinogradov 1985; Weber and Vinogradov 2001). Giant extracellular Hbs have also been referred to as erythrocruorins or chlorocruorins, but because these terms were not used uniformly, they are no longer employed.
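As a quick consistency check, the chain counts given above for L. terrestris HBL-Hb add up to the stated total of 180 polypeptide chains:

```latex
\underbrace{12 \times 12}_{\text{globin dodecamers}}
+ \underbrace{12 \times 3}_{\text{linker trimers}}
= 144 + 36 = 180 \ \text{polypeptide chains}
```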
The pioneering work of Gotoh et al. (1987) divided HBL-Hb globins into two main chains, A and B, based on their primary structures. Later, Yuasa et al. (1996) corroborated these results and suggested a further division into four paralogous globin types: A1/A2 and B1/B2. According to them, the gene duplication events that gave rise to the four subtypes occurred before the diversification of Annelida. Suzuki et al. (1990) were the first to describe two linker chains, L1 and L2, based on their sequences. Suzuki and Riggs (1993) described an additional chain, L3, and suggested that it resulted from an ancestral globin gene duplication event, producing sets of paralogous genes. They also demonstrated that linkers have a conserved cysteine-rich domain that is absent from globin sequences but homologous to the low-density lipoprotein receptor class A (LDL-A) module found in many metazoans, for example, humans and frogs (Sudhof et al. 1985; Mehta et al. 1991). The LDL-A module makes linkers conformationally capable of binding the globin dodecamers to form the hexagonal bilayer structure of the HBL-Hb complex (Suzuki and Riggs 1993). Fushitani et al. (1996) described a fourth linker chain, L4; however, this chain appears to be a minor component, most likely a variant of the L3 chain (Vinogradov 2004; Royer et al. 2006). Nevertheless, the evolutionary history of linkers remains unclear.
Previous studies of the evolutionary history of extracellular globins and linker chains have been conducted on a very limited sample of annelid species. Given the recent discoveries of additional Hcs and Hrs with expanded sampling (Bailly et al. 2008; Martín-Durán et al. 2013; Costa-Paiva et al. 2017), we revisited the evolution of Hbs, and in particular of HBL-Hbs. To better understand the diversity and expression of the extracellular globins and linkers that make up HBL-Hbs, we interrogated transcriptomic data from across bilaterians, including heavy sampling of annelids. Consistent with findings for other oxygen-carrying proteins, namely hemocyanin and hemerythrin, our results show a much broader distribution of extracellular globins and linkers across metazoans, suggesting that the underestimation of oxygen-carrying protein diversity across animals is a general pattern.

Materials and Methods
Transcriptomes of 319 metazoans from 16 phyla and one choanoflagellate were used in this work; information about all species is listed in supplementary file 1, Supplementary Material online. Transcriptomic data were originally obtained as part of the WormNet II project to resolve annelid phylogeny. Samples were collected using several techniques, including intertidal collecting, dredging, and box coring. All samples were preserved in RNAlater or frozen at −80 °C. RNA extraction, cDNA preparation, and sequencing followed Kocot et al. (2011) and Whelan et al. (2015). Total RNA was extracted either from whole animals or from the body wall. After extraction, RNA was purified using TRIzol (Invitrogen) or the RNeasy kit (Qiagen) with on-column DNase digestion. The SMART cDNA Library Construction Kit (Clontech) was used to reverse transcribe the single-stranded RNA template. Double-stranded cDNA synthesis was completed with the Advantage 2 PCR system (Clontech). The Genomic Services Lab at the HudsonAlpha Institute (Huntsville, AL) barcoded and sequenced libraries using Illumina technology. Paired-end reads were 100 or 125 bp long, generated with either v3 or v4 chemistry on Illumina HiSeq 2000 or 2500 platforms (San Diego, CA). Finally, paired-end transcriptome data were digitally normalized to an average k-mer coverage of 30 using normalize-by-median.py (Brown et al. 2012) and assembled using Trinity r2013-02-25 with default settings (Grabherr et al. 2011).
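The idea behind digital normalization (Brown et al. 2012) is that a read is discarded once the k-mers it contains have already been seen often enough, which flattens coverage before assembly. The following is a minimal, simplified sketch of that rule with the coverage cutoff of 30 used here; it is not the khmer implementation behind normalize-by-median.py. It assumes exact dictionary counting in place of khmer's probabilistic counting, ignores read pairing, and uses k = 20 and the toy reads purely as illustrative values.

```python
from statistics import median


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(reads, k=20, cutoff=30):
    """Simplified single-pass digital normalization.

    A read is kept only if the median count of its k-mers, counted over
    reads kept so far, is below `cutoff`; the k-mers of each kept read are
    then added to the count table. Exact dictionary counting stands in for
    khmer's probabilistic counting, and read pairing is ignored.
    """
    counts = {}
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # shorter than k: dropped in this sketch
        if median(counts.get(km, 0) for km in read_kmers) < cutoff:
            kept.append(read)
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


# Toy example: 100 identical reads are reduced to `cutoff` copies,
# while a read with unseen k-mers is still kept.
reads = ["ACGTTGCAAGGC"] * 100 + ["TTTTTCCCCC"]
print(len(normalize_by_median(reads, k=5, cutoff=30)))  # 31
```

Redundant high-coverage reads stop contributing once their k-mers reach the cutoff, whereas reads from low-coverage transcripts pass through unchanged.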
To search in silico for putative extracellular globin and linker genes associated with HBL-Hbs, we employed the Trinotate annotation pipeline (http://trinotate.github.io/; last accessed December 11, 2018) (supplementary file 2, Supplementary Material online) (Grabherr et al. 2011), which uses a BLAST-based method against the EggNOG 4.5.1 (Huerta-Cepas et al. 2016) and KEGG (Kanehisa et al. 2016) databases to provide Gene Ontology (GO) annotations. The GO is a standardized functional classification system that describes the properties of genes and their products using a dynamically updated controlled vocabulary (Gene Ontology Consortium 2004). The Trinotate pipeline uses the following software: HMMER 3.2.1, for protein domain identification (Finn et al. 2011); TMHMM 2.0, for prediction of transmembrane helices (Krogh et al. 2001); RNAmmer 1.2, for prediction of ribosomal RNA (Lagesen et al. 2007); SignalP 4.1, for prediction of signal peptide cleavage sites (Petersen et al. 2011); GOseq, for GO enrichment analysis (Young et al. 2010); and EggNOG 4.5.1, for searching orthologous groups (Huerta-Cepas et al. 2016). Because HBL-Hb comprises two major protein components, linkers and globins, two independent searches were performed so that the results of each search could cross-validate the other and false-positive records could be eliminated. Retrieved sequences were manually verified by inspecting each functional annotation in order to select sequences annotated by Trinotate as linkers or extracellular globins. In one search, sequences annotated as linkers were retained, whereas extracellular globins were retrieved in a separate search. Contigs putatively identified as linkers or extracellular globins were subsequently translated into amino acids using TransDecoder with default settings (https://transdecoder.github.io/; last accessed October 3, 2018).
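One way to tabulate the cross-validation between the two independent searches is at the species level: because HBL-Hb requires both components, species in which both searches return hits are mutually corroborated, and species supported by only one search can be flagged for closer inspection. The sketch below assumes a hypothetical input of (species, contig_id) pairs from each search; it does not parse Trinotate output, and the species and contig names are toy placeholders.

```python
def cross_validate(linker_hits, globin_hits):
    """Compare two independent searches at the species level.

    Both arguments are iterables of (species, contig_id) pairs, a
    hypothetical format. Species found by both searches are corroborated;
    species found by only one search are flagged for closer inspection.
    """
    linker_species = {species for species, _ in linker_hits}
    globin_species = {species for species, _ in globin_hits}
    return {
        "both": sorted(linker_species & globin_species),
        "linker_only": sorted(linker_species - globin_species),
        "globin_only": sorted(globin_species - linker_species),
    }


# Toy input with hypothetical species and contig identifiers.
linkers = [("sp_A", "c101"), ("sp_A", "c102"), ("sp_B", "c205")]
globins = [("sp_A", "c110"), ("sp_C", "c301")]
print(cross_validate(linkers, globins))
# {'both': ['sp_A'], 'linker_only': ['sp_B'], 'globin_only': ['sp_C']}
```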
TransDecoder may produce multiple open reading frames; therefore, all translated amino acid sequences were evaluated with a Pfam domain check (Finn et al. 2016) using the EMBL-EBI protein database with an e-value cutoff of 10^-10. Linker and extracellular globin sequences with a confirmed Pfam domain and with more than 200 and 130 amino acids, respectively, were analyzed further. Manual evaluation was performed to verify the presence of the characteristic cysteine-rich amino acid signature found in linkers, including the LDL-A motif Cys-X(5-7)-Cys-X(5-6)-Cys-X(6)-Cys-Asp-X(3)-Asp-Cys-X(4)-Asp-Glu-X(2-4)-Cys (supplementary file 3, Supplementary Material online) (Negrisolo et al. 2001; Chabasse, Bailly, Sanchez, et al. 2006). For extracellular globins, the 12 invariant amino acid residues found in the main chains A and B were used to confirm identity, including the Cys-19 residue, which is the main feature distinguishing extracellular globins from intracellular ones (supplementary file 4, Supplementary Material online) (Shishikura et al. 1986; Gotoh et al. 1987; Yuasa et al. 1996; Negrisolo et al. 2001). Sequences retained after all these steps were considered linker or extracellular globin genes and were separated into two protein data sets: the globin data set, comprising all 559 extracellular globin genes retained after the validation steps, and the linker data set, comprising all 414 linker genes retained after the validation steps. Both data sets were independently aligned with MAFFT using the accurate E-INS-i algorithm (Katoh and Standley 2013), and gap-rich regions were removed with trimAl 1.2 (Capella-Gutiérrez et al. 2009) using a gap threshold of 0.5 for linker genes and 0.75 for extracellular globin genes (supplementary files 3 and 4, Supplementary Material online). Using Geneious 9.1.3 (Kearse et al. 2012), alignments were visually checked and trimmed to exclude residues upstream (5′) of the putative start codon. The resulting amino acid alignments were used for phylogenetic analyses.
Because our data were dominated by annelid samples, we reconstructed two gene genealogies each for extracellular globin genes and linker genes: for each gene, one tree included only annelid samples and the other spanned all metazoan samples. The four data sets used to build the trees were as follows: 1) EG-Ann: all 516 extracellular globin sequences found in 140 annelid species (fig. 2); 2) LIN-Ann: all 387 linker sequences found in 153 annelid species (fig. 3); 3) EG-Met: all 67 extracellular globin sequences from the 22 metazoans in which we found these genes, namely three hemichordates, two echinoderms, two brachiopods, four mollusks, one nemertean, one platyhelminth, one bryozoan, one phoronid, one priapulid, and six annelid species (fig. 4B); and 4) LIN-Met: all 37 linker sequences from the 26 metazoans in which we found these genes, namely two echinoderms, four hemichordates, two brachiopods, three platyhelminths, four mollusks, one nemertean, one bryozoan, one phoronid, one priapulid, and seven annelid species (fig. 4A). The annelids included in both metazoan trees were chosen to represent major families indicative of the breadth of Annelida. ProtTest 3.4 was used to select the best-fit model of protein evolution for each data set using the Akaike and Bayesian Information Criteria (Darriba et al. 2011). Bayesian phylogenetic inference was implemented in MrBayes 3.2.1 (Ronquist and Huelsenbeck 2003) with two independent runs, each containing four Metropolis-coupled chains run for 10^7 generations and sampled every 500th generation to approximate posterior distributions. To confirm that the chains had reached stationarity and to determine an appropriate burn-in, we evaluated trace plots of all MrBayes parameter outputs in Tracer v1.6 (Rambaut et al. 2014). The first 25% of samples were discarded as burn-in, and a majority-rule consensus tree was generated using MrBayes. Bayesian posterior probabilities were used to assess statistical support for each bipartition. Maximum likelihood trees were constructed with RAxML (Kozlov et al. 2018) using the following parameters: tree topology search using the best of BioNJ and NNI; the WAG model of amino acid substitution; uniform substitution rates among sites; and bootstrap support from 100 replicates. The resulting trees were visualized with FigTree 1.4.3 (Rambaut 2009) and rooted using midpoint rooting (Farris 1972; Hess and Russo 2007). Tertiary structures of extracellular globins and linkers were inferred using the automated protein structure homology-modeling server SWISS-MODEL (Arnold et al. 2006; Kiefer et al. 2009). Representative linker and extracellular globin genes from each newly reported phylum were used to confirm high similarity in tertiary structure (supplementary files 5 and 6, Supplementary Material online).
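The linker LDL-A signature and the minimum-length criteria described above can be written as a simple automated pre-screen. This is a minimal sketch, not the authors' procedure (which was manual): the regular expression transcribes the motif quoted in the text, "X" is matched by any residue, and the Pfam domain check and the globin invariant-residue check are not reproduced. The test sequence is synthetic.

```python
import re

# The linker LDL-A signature quoted in the text, as a regular expression:
# Cys-X(5-7)-Cys-X(5-6)-Cys-X(6)-Cys-Asp-X(3)-Asp-Cys-X(4)-Asp-Glu-X(2-4)-Cys
LDLA_MOTIF = re.compile(r"C.{5,7}C.{5,6}C.{6}CD.{3}DC.{4}DE.{2,4}C")

# Sequences had to be longer than these lengths (amino acids).
MIN_LENGTH = {"linker": 200, "globin": 130}


def passes_screen(protein: str, kind: str) -> bool:
    """Length check for both kinds, plus the LDL-A motif check for linkers.

    `kind` is "linker" or "globin". The Pfam domain check and the globin
    invariant-residue check are not reproduced here.
    """
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol
    if len(protein) <= MIN_LENGTH[kind]:
        return False
    if kind == "linker":
        return LDLA_MOTIF.search(protein) is not None
    return True


# Synthetic motif instance (not a real linker sequence).
motif = "C" + "A" * 5 + "C" + "A" * 5 + "C" + "A" * 6 + "CD" + "AAA" + "DC" + "AAAA" + "DE" + "AA" + "C"
print(passes_screen("M" + "A" * 200 + motif, "linker"))  # True
print(passes_screen("M" + "A" * 250, "linker"))          # False: no motif
```

A regular expression only finds candidates; sequences that pass still need the domain-level and manual checks described in the text.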
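The trimAl gap thresholds used above (0.5 for linkers, 0.75 for extracellular globins) control how gappy an alignment column may be before it is removed. As described in the trimAl documentation, the gap threshold is 1 minus the fraction of sequences allowed to have a gap in a column. The sketch below approximates that rule; it is not trimAl's implementation, and it treats only "-" as a gap character. The toy alignment is hypothetical.

```python
def trim_gappy_columns(alignment: list[str], gt: float) -> list[str]:
    """Keep columns whose fraction of non-gap characters is at least `gt`.

    Approximates trimAl's gap-threshold option (gap threshold = 1 - allowed
    fraction of gapped sequences); only "-" is treated as a gap here.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    n_seqs = len(alignment)
    keep = [
        col for col in range(length)
        if sum(seq[col] != "-" for seq in alignment) / n_seqs >= gt
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


aln = ["MA-K", "MA-K", "M--R", "MAC-"]
print(trim_gappy_columns(aln, 0.75))  # ['MAK', 'MAK', 'M-R', 'MA-']
```

A higher threshold, such as the 0.75 used for globins, removes more columns, because each retained column must be filled in a larger share of the sequences.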
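The MrBayes settings above imply simple sampling arithmetic: 10^7 generations sampled every 500 generations give 20,000 samples per run, of which the first 25% (5,000) are discarded. The posterior probability of a bipartition is the fraction of the remaining sampled trees that contain it, and a majority-rule consensus keeps those found in more than half. The sketch below illustrates this under stated assumptions. Each sampled tree is represented, hypothetically, as a set of clades (frozensets of tip names). The sample count ignores any extra sample written at the initial generation. Pooling of the two independent runs and MrBayes's own file formats are not reproduced.

```python
from collections import Counter

# Sampling arithmetic from the text: 10^7 generations, one sample every 500.
NGEN, SAMPLEFREQ, BURNIN = 10**7, 500, 0.25
samples_per_run = NGEN // SAMPLEFREQ                             # 20,000
kept_per_run = samples_per_run - int(samples_per_run * BURNIN)  # 15,000


def clade_posteriors(sampled_trees, burnin_fraction=BURNIN):
    """Posterior probability of each clade among post-burn-in trees.

    `sampled_trees` is a list of trees in sampling order; each tree is a set
    of clades, and each clade is a frozenset of tip names (a hypothetical
    representation).
    """
    n_burn = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burn:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule_clades(posteriors):
    """Clades present in more than half of the post-burn-in trees."""
    return {clade for clade, pp in posteriors.items() if pp > 0.5}


ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
trees = [{ac}, {ac}, {ab, cd}, {ab, cd}, {ab, cd}, {ab}, {ac}, {cd}]
pp = clade_posteriors(trees)                 # first 2 of 8 trees discarded
print(round(pp[ab], 3), round(pp[ac], 3))    # 0.667 0.167
print(majority_rule_clades(pp) == {ab, cd})  # True
```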
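Midpoint rooting (Farris 1972) places the root halfway along the longest path between any two tips of an unrooted tree. The text does not state which program performed the rooting, so the following is only a plain illustrative reimplementation. It assumes a hypothetical representation in which each node maps to {neighbour: branch_length}, and it returns the branch on which the root falls and its distance from one end of that branch.

```python
def _walk(tree, start):
    """Distance from `start` to every node, plus parent pointers back to `start`."""
    dist, parent = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint(tree):
    """Locate the midpoint root of an unrooted tree.

    `tree` maps each node to {neighbour: branch_length}. Returns (u, v, offset):
    the root lies on branch u-v, `offset` away from u.
    """
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    if len(leaves) < 2:
        raise ValueError("tree needs at least two leaves")
    # Find the two tips separated by the longest path.
    longest, tip_a, tip_b = -1.0, None, None
    for a in leaves:
        dist, _ = _walk(tree, a)
        for b in leaves:
            if dist[b] > longest:
                longest, tip_a, tip_b = dist[b], a, b
    # Rebuild the tip_a -> tip_b path from parent pointers.
    _, parent = _walk(tree, tip_a)
    path = [tip_b]
    while path[-1] != tip_a:
        path.append(parent[path[-1]])
    path.reverse()
    # Walk along the path until half of its length has been covered.
    half, walked = longest / 2, 0.0
    for u, v in zip(path, path[1:]):
        if walked + tree[u][v] >= half:
            return u, v, half - walked
        walked += tree[u][v]
    raise ValueError("could not place the midpoint")


toy = {
    "A": {"X": 1.0},
    "B": {"X": 1.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0},
    "C": {"Y": 5.0},
}
print(midpoint(toy))  # ('Y', 'C', 1.5)
```

In the toy tree, the longest tip-to-tip path (A to C) has length 7, so the root falls 3.5 units from A, which is 1.5 units along the long Y-C branch.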
Additionally, to increase phylogenetic coverage, genomes from GenBank were surveyed for the presence of linkers and extracellular globins. We employed TBLASTN with an e-value cutoff of 10^-5; TBLASTN searches translated nucleotide databases using a protein query (Altschul et al. 1990). We searched the NCBI genomes of 13 species from eight phyla (Porifera, Cnidaria, Ctenophora, Chordata, Placozoa, Nematoda, Tardigrada, and Arthropoda; supplementary file 7, Supplementary Material online) with two different queries: one comprising five linker sequences from five metazoan species previously selected for possessing HBL-Hbs (MH995534, MH995580, MH995661, MH995788, and MH995796), and the other comprising five extracellular globin sequences from five metazoan species previously selected for possessing HBL-Hbs (MH995926, MH995932, MH996036, MH996080, and MH996186).
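The e-value cutoff can be passed to tblastn directly (through its -evalue option), but it is also easy to apply to saved results. The sketch below post-filters BLAST+ tabular output (-outfmt 6, default 12-column layout) at the 10^-5 cutoff used here. The two hit lines are invented for illustration and are not results from this study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text: str, max_evalue: float = 1e-5):
    """Return (query, subject, evalue) for hits at or below the e-value cutoff."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue <= max_evalue:
            hits.append((rec["qseqid"], rec["sseqid"], evalue))
    return hits


# Invented example lines (query, subject, ..., evalue, bitscore).
example = (
    "q1\ts1\t45.0\t100\t50\t2\t1\t100\t300\t600\t1e-20\t150\n"
    "q2\ts2\t30.0\t50\t30\t1\t1\t50\t10\t160\t0.01\t30\n"
)
print(significant_hits(example))  # [('q1', 's1', 1e-20)]
```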

Results
In the initial screening of the Trinotate output, we recovered 4,699 nucleotide sequences annotated as extracellular globins and 1,632 annotated as linkers. After translation into amino acids and thorough validation, including Pfam domain evaluation and selection by minimum length, 559 extracellular globin genes and 414 linker genes remained (supplementary file 2, Supplementary Material online). These genes are actively transcribed in 171 species representing ten animal phyla (table 1). Linkers and extracellular globins had not previously been reported from Echinodermata, Hemichordata, Brachiopoda, Mollusca, Nemertea, Bryozoa, Phoronida, Platyhelminthes, or Priapulida. The number of expressed extracellular globin genes per species ranged from one (in 26 species) to 12 (in Amynthas sp. [Megascolecidae, Annelida] and Terebellides stroemii [Trichobranchidae, Annelida]). The number of expressed linker genes ranged from one (in 55 species) to 11 (in Randiella sp. [Randiellidae, Annelida]). Transcriptomes from one hemichordate (Balanoglossus aurantiaca), two platyhelminths (Acipensericola petersoni and Elopicola sp.), and 14 annelid species contained linker genes but no extracellular globin genes. Other species from these phyla expressed both globin and linker genes (table 1); because we used transcriptomic data, we can only make inferences about the presence of gene signatures and do not draw conclusions about absences. None of the 13 GenBank genomes surveyed contained linker or extracellular globin genes.
The tertiary structures of linker and extracellular globin genes inferred using the SWISS-MODEL server resulted in proteins with putative respiratory function. Therefore, they could be considered potentially functional proteins capable of assembling into HBL-Hb, and the presence of complete linkers and extracellular globins was confirmed in each newly recorded phylum. However, verifying from bioinformatic data alone that linkers and globins are assembled and act as an oxygen-carrying protein complex is not straightforward; we have nonetheless taken steps to confirm that the genes under examination possess the features required of functional genes. Complete amino acid alignments of extracellular globin genes had a maximum sequence length of 158 residues, and the linker alignment had a maximum length of 236 residues. All extracellular globin sequences started with a methionine, and the start methionine was recovered for nearly all linker sequences, except those of B. aurantiaca (Hemichordata), Hypomenia sp. (Mollusca), Leptochiton rugatus (Mollusca), Spathoderma clenchi (Mollusca), and Stereobalanus canadensis (Hemichordata). Nevertheless, these sequences were retained because of their high similarity to the remaining linker sequences. All extracellular globin and linker sequences contained their characteristic signature residues, which is a key indicator of potential respiratory function (Gotoh et al. 1987; Yuasa et al. 1996; Negrisolo et al. 2001; Chabasse, Bailly, Sanchez, et al. 2006).
The best-fit model for all data sets was WAG. As expected, the topology of the EG-Ann gene tree (fig. 2) did not mirror the recent Annelida phylogeny (Weigert and Bleidorn 2016). The four paralogous globin types, A1, A2, B1, and B2, were not recovered as monophyletic groups. Only the two originally proposed main chains, A and B, were recovered as clades with strong statistical support (PP > 0.85; fig. 2). In the LIN-Ann gene tree (fig. 3), topological relationships also did not match annelid phylogeny, and the tree supported the monophyly of each linker chain, L1, L2, and L3 (fig. 3). Neither of the trees spanning metazoan sampling reflected recent phylogenies of Metazoa (Whelan et al. 2015; Halanych 2016; Kocot et al. 2017). In the EG-Met tree, globin chains A and B clustered into two major clades with high support values (PP = 1.0; fig. 4B), but further subdivision into A1/A2 and B1/B2 groups was not recovered (fig. 4B). In the LIN-Met tree, the three main linker chains did not form monophyletic groups when metazoan sequences were included (fig. 4A). Finally, in the metazoan trees (EG-Met and LIN-Met; fig. 4A and B), highlighted boxes of the same color (red, blue, yellow, and green clades) indicate strongly supported clades (PP = 1.0) that include the same species in both trees. Although the species in each of these clades are not closely related phylogenetically, they clustered as sister groups in both metazoan trees with strong statistical support.

Discussion
Extracellular globin genes and associated linker genes are much more diverse and broadly distributed than previously recognized (Weber and Vinogradov 2001). Here, actively transcribed linker genes and extracellular globin genes were found in 16 species from nine phyla other than Annelida, including the first record of extracellular globin genes in deuterostomes. Because their distribution was thought to be exclusive to Annelida, all previous evolutionary hypotheses on the emergence of HBL-Hbs suggested that the molecule was already present in, but limited to, the common annelid ancestor (Yuasa et al. 1996; Negrisolo et al. 2001; Bailly 2002).
Considering that HBL-Hb oligomers are formed by 144 extracellular globins and 36 linker chains, independent evolution of this protein complex in multiple lineages is unlikely. As no linker or extracellular globin genes were found in any of the non-nephrozoan genomes surveyed, such as those of sponges and cnidarians, we suggest that HBL-Hbs likely arose in the nephrozoan ancestor (fig. 5), just prior to the deuterostome–protostome split. Some phylogenetically unrelated species, for example, annelids and starfishes (green boxes, fig. 4A and B), clustered as sister groups in both the EG-Met and LIN-Met trees with high support values, which may indicate that linkers and extracellular globins have coevolutionary dynamics. Goodman et al. (1988) and Hardison (1998) proposed the existence of a common ancestral globin gene, present before the invertebrate-vertebrate divergence, that is paralogous to the ancestral myoglobin gene. They suggested that, because the primitive Hb of metazoans was probably monomeric, the multisubunit Hbs represent an independently derived state in Annelida (Goodman et al. 1988). Considering that HBL-Hbs most likely appeared in the nephrozoan ancestor rather than in the annelid ancestor, this independently derived state can be extrapolated to nephrozoans. Yuasa et al. (1996) proposed an evolutionary model for HBL-Hbs in which the common ancestor of this molecule was a protein formed only by globin chains, with the linker chains added later to form the final hexagonal bilayer structure. The work of Shishikura et al. (1986) indicates that assembly of the extracellular globin trimers that form the globin dodecamer of HBL-Hbs is possible only in the presence of the Cys-19 residue. This free cysteine residue is responsible for the disulfide bond of extracellular globin trimers and is the main feature that distinguishes extracellular globins from intracellular ones (Shishikura et al. 1986; Gotoh et al. 1987).
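Because the Cys-19 residue is the diagnostic feature of extracellular globins, checking for it in a new sequence amounts to reading off the residue that aligns to position 19 of a reference chain. The sketch below assumes that "Cys-19" is numbered on an ungapped reference globin present in the same alignment. The reference and query strings are synthetic, and the function names are hypothetical.

```python
def residue_at(ref_aligned: str, query_aligned: str, ref_pos: int) -> str:
    """Query residue aligned to position `ref_pos` (1-based, gaps skipped) of the reference."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("both sequences must come from the same alignment")
    pos = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            pos += 1
            if pos == ref_pos:
                return query_char
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def has_cys19(ref_aligned: str, query_aligned: str) -> bool:
    """True if the query carries a cysteine at reference position 19."""
    return residue_at(ref_aligned, query_aligned, 19).upper() == "C"


# Synthetic aligned pair: the reference has a two-column gap, so its
# residue 19 (a Cys) sits in alignment column 21.
ref = "A" * 9 + "--" + "A" * 9 + "C" + "AA"
query = "G" * 20 + "C" + "GG"
print(has_cys19(ref, query))  # True
```

A query with a gap or a different residue in that column would return False, marking it for closer inspection rather than automatic rejection.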
More recently, six additional "globins" have been identified in vertebrates, namely androglobin, cytoglobin, globin E, globin X, globin Y, and neuroglobin, which form the globin superfamily alongside Hb and myoglobin (reviewed by Burmester and Hankeln 2014). These proteins are structurally conserved but differ in their amino acid sequence composition and cellular functions, for example, signal transduction and lipid and nitric oxide metabolism. The ancestor of this protein superfamily is estimated to have arisen about 1.5 billion years ago (Burmester and Hankeln 2014). These data, alongside our novel observations of giant extracellular Hbs, illustrate a much broader representation and functional repertoire of "globins" in extant species.
According to the pioneering work of Gotoh et al. (1987), the gene duplication event that gave rise to the two paralogous extracellular globin chains A and B occurred before the diversification of Annelida. They suggested that the two paralogous extracellular globin genes arose from a single ancient duplication event, comparable to the one that produced the α and β chains of vertebrate Hbs (Gotoh et al. 1987). Although later works recovered clades A and B (Negrisolo et al. 2001; Bailly 2002), they failed to support the further subdivisions A1, A2, B1, and B2 as monophyletic groups, which agrees with our findings. Thus, our results confirm the early hypothesis of two main paralogous globin chains, A and B (Gotoh et al. 1987), in both annelids and other metazoans, whereas the further subdivisions (A1/A2 and B1/B2) were not consistently recovered in either group. Regarding linkers, Suzuki et al. (1990) and Suzuki and Riggs (1993) described three linker chains among annelids (L1, L2, and L3). Negrisolo et al. (2001), analyzing seven linker sequences, were unable to recover the three linker subtypes as monophyletic groups. Using 22 linker sequences, Chabasse, Bailly, Sanchez, et al. (2006) clustered linkers into two main clades, but with low statistical support. Employing more than 400 linker sequences, our much larger sample generated results that support the existence of three linker types, L1, L2, and L3, in Annelida, in agreement with Suzuki et al. (1990) and Suzuki and Riggs (1993). However, when other metazoan taxa were included in the analyses, none of the three linker types was recovered as a monophyletic group. We thus corroborate the existence of three linker groups in annelids; however, this does not appear to hold at higher taxonomic levels. Although the three linker clades L1, L2, and L3 had low bootstrap values in the LIN-Ann gene tree (fig. 3), we considered these groups valid because they have been widely recorded and discussed in the literature for annelid species (Suzuki et al. 1990; Suzuki and Riggs 1993; Royer et al. 2006) and because posterior probability values also supported the monophyly of the three groups (fig. 3). Moreover, our findings confirm that Annelida harbors the greatest diversity of oxygen-carrying proteins of all animal phyla (Costa-Paiva et al. 2017). As with Hrs and Hcs, the expansion of extracellular globins in invertebrates may be associated with diverse biological functions other than oxygen transport. For example, Hrs and Hcs participate in metal (cadmium) detoxification and innate immunity, respectively. Hbs, including those from humans, blood clams (e.g., Tegillarca granosa), alligators, and fish, are precursors of antimicrobial peptides and act as enzymatic antioxidants under certain conditions in vivo and in vitro (Coates and Decker 2017). Lamy et al. (2000) suggested that the presence of all three linker types is not required for assembly of the HBL-Hb molecule. Indeed, they showed that the linker chains can replace one another in HBL-Hb assembly and that the molecule can assemble with only one or two types of linker chain. Their results also suggest that the globin dodecamer is unstable without linkers (Lamy et al. 2000). Additionally, the inferred tertiary structures of the novel extracellular globins suggest that they could have a putative respiratory function (supplementary file 5, Supplementary Material online). Using only bioinformatic data, we cannot state that the large complex structure of HBL-Hbs is the same in all metazoan lineages; it could be a unique adaptation within Annelida. However, the role of the linker subunits is predominantly structural, because in their absence the functional globin dodecamers do not assemble into the hexagonal bilayer structure (Lamy et al. 1996). We therefore believe that the co-occurrence of linkers and extracellular globins is sufficient evidence that large HBL-Hb complexes made of linkers and globins are also present in metazoans other than annelids. Because linkers can replace one another and the subdivision of extracellular globins into A1/A2 and B1/B2 groups appears to be largely descriptive, we argue that species expressing extracellular globins and linkers could be capable of assembling these subunits into HBL-Hbs to facilitate oxygen transport.
FIG. 5. Hypothesized relationships among metazoan phyla derived from recent phylogenomic studies (Whelan et al. 2015; Halanych 2016; Kocot et al. 2017). Names in bold represent analyzed taxa; n is the total number of species analyzed in each phylum; blue rectangles represent the number of species with at least one extracellular globin sequence, and green rectangles the number of species with at least one linker sequence. NA, nephrozoan ancestor.
Our findings demonstrate that HBL-Hb components are much more diverse than traditionally assumed and occur in multiple metazoan lineages, and that both linkers and extracellular globins were likely present in the nephrozoan ancestor. Comprehensive phylogenetic analyses of transcriptomic data from >100 metazoans corroborated the results of Gotoh et al. (1987), who classified extracellular globins into two major chains (A and B). Conversely, our data did not support the subdivision of extracellular globins into A1/A2 and B1/B2 groups, indicating that these are not natural subfamilies. Our data also supported the subdivision of linker units into L1, L2, and L3 in annelids only. Although the reconstructed tertiary structures of the novel extracellular globins showed putative oxygen-binding sites, studies of the biochemical properties of HBL-Hbs within the newly recorded groups are the next step needed to confirm their oxygen-carrying capabilities.