Emergence of a Thrombospondin Superfamily at the Origin of Metazoans

Abstract Extracellular matrix (ECM) is considered central to the evolution of metazoan multicellularity; however, the repertoire of ECM proteins in nonbilaterians remains unclear. Thrombospondins (TSPs) are known to be well conserved from cnidarians to vertebrates, yet to date have been considered a unique family, principally studied for matricellular functions in vertebrates. Through searches utilizing the highly conserved C-terminal region of TSPs, we identify undisclosed new families of TSP-related proteins in metazoans, designated mega-TSP, sushi-TSP, and poriferan-TSP, each with a distinctive phylogenetic distribution. These proteins share the TSP C-terminal region domain architecture, as determined by domain composition and analysis of molecular models against known structures. Mega-TSPs, the only form identified in ctenophores, are typically >2,700 aa and are also characterized by N-terminal leucine-rich repeats and central cadherin/immunoglobulin domains. In cnidarians, which have a well-defined ECM, Mega-TSP was expressed throughout embryogenesis in Nematostella vectensis, with dynamic endodermal expression in larvae and primary polyps and widespread ectodermal expression in adult Nematostella vectensis and Hydra magnipapillata polyps. Hydra Mega-TSP was also expressed during regeneration and siRNA-silencing of Mega-TSP in Hydra caused specific blockade of head regeneration. Molecular phylogenetic analyses based on the conserved TSP C-terminal region identified each of the TSP-related groups to form clades distinct from the canonical TSPs. We discuss models for the evolution of the newly defined TSP superfamily by gene duplications, radiation, and gene losses from a debut in the last metazoan common ancestor. Together, the data provide new insight into the evolution of ECM and tissue organization in metazoans.


Introduction
The extracellular matrix (ECM) of metazoans holds great fascination because of its roles in support of metazoan multicellularity and organismal complexity. In vertebrates, ECM is highly tissue specific and typically composed of around 150-300 proteins, along with carbohydrate components (Naba et al. 2012). Comparisons of the genome-predicted ECM proteins of bilaterian and nonbilaterian animals have demonstrated to date a suite of ECM proteins that are conserved between cnidarians and human. These include fibrillar-type collagens; associated proteins that are central to the production and assembly of collagen fibrils; laminin and collagen IV as core components of basement membranes, and thrombospondin (TSP) and SPARC/osteonectin as highly conserved matricellular glycoproteins (Exposito et al. 2010;€ Ozbek et al. 2010;Bertrand et al. 2013;Fidler et al. 2017;Lommel et al. 2018). These ECM proteins are considered likely to have contributed to the origin of metazoan multicellularity and are deduced to have originated in or after the last metazoan common ancestor because equivalent ECM proteins are not encoded in choanoflagellates or Capsaspora owczarzaki that represent the closest unicellular relatives of metazoans (King et al. 2008;Fairclough et al. 2013;Suga et al. 2013;Williams et al. 2014;Brunet and King 2017).
The above research has emphasized the conservation of certain types of ECM proteins. Current genomic and transcriptomic resources make it feasible to explore another question of importance with regard to the evolution of complex multicellularity and the estimated rapid radiation of nonbilaterian phyla : the radiation and diversification of ECM proteins within early-diverging metazoan phyla. Here, we investigated this issue with regard to the TSPs, secreted glycoproteins that have wide pathophysiological significance in mammalian ECM and the pericellular environment (Adams and Lawler 2011;Murphy-Ullrich and Sage 2014;Stenina-Adognravi and Plow 2019). TSPs are multidomain, calcium-binding glycoproteins, many of which oligomerize cotranslationally as trimers (subgroup A) or pentamers (subgroup B) (Adams and Lawler 2011;Vincent et al. 2013). TSPs with domain architectures related to subgroup B have been identified in cnidarians and Drosophila, where they have roles in body axis maintenance and developmental ECM organization, respectively (Chanana et al. 2007;Tucker et al. 2013;Lommel, Strompen, Hellewell et al. 2018).
TSPs have many "domain relatives," that is, proteins that have a single type of domain in common with TSPs. In particular, the epidermal growth factor (EGF)-like and thrombospondin type 1 domains (TSRs) are each widely represented in other categories of extracellular proteins (Tucker 2004;Wouters et al. 2005) and are also found outside the Metazoa (Adams and Lawler 2011). To date, the domain architecture of the C-terminal region of TSPs, that forms an integrated structural unit (comprising tandem repeated EGFlike domains, a series of calcium-binding TSP type 3 repeats and a C-terminal, L-type lectin domain), has been considered a unique signature of the TSPs (Kvansakul et al. 2004;Carlson et al. 2005).
A recent exploratory study of the TSPs of the anthozoan cnidarian, Nematostella vectensis, identified four transcribed paralogs. In phylogenetic trees based on the conserved TSP Cterminal region, one of these, Nv85341, has an unexpectedly close phylogenetic relationship with a thrombospondin of Ciona intestinalis termed TSP-DD (Tucker et al. 2013). Originally identified from expressed sequence tags, C. intestinalis TSP-DD was characterized by its apparent N-terminal discoidin-like domain (DD) and is secreted from cells as a monomer. At the time of its identification in 2010, protein sequence orthologs of TSP-DD were restricted to invertebrate deuterostomes (Bentley and Adams 2010).
The apparently anomalous identification of a possible TSP-DD-like polypeptide in N. vectensis led us to new investigations of early-diverging metazoans (cnidarians, poriferans, and ctenophores). We report here on previously undisclosed categories of TSP-related proteins, which we have designated mega-thrombospondin (mega-TSP), sushi-thrombospondin (sushi-TSP), and poriferan-TSP. All the predicted proteins are clearly related to TSPs by inclusion of the characteristic TSP Cterminal region domain architecture and differ in other domains and their phylogenetic distributions within the Metazoa. We present the first systematic evaluation of these proteins, their phylogenetic relationships with canonical TSPs, and the first analysis of biological function of a mega-TSP. These data illuminate the existence of an unappreciated TSP superfamily and lead to a new evolutionary scenario for the emergence of the canonical TSPs with implications for understanding of early metazoan evolution.

Identification of New Categories of Conserved TSP-Related Proteins
Comparative genomic and transcriptomic searches were carried out initially with the predicted partial protein sequence of N. vectensis TSP85341 (Nv85341) and then with other representative TSPs. These led to the identification of further predicted TSP-related protein sequences in multiple cnidarians. Because some of these sequences are predicted as much longer polypeptides than a canonical TSP (e.g., 2,827 aa for XP_012565470 of Hydra vulgaris, whereas canonical TSPs range from 840 to 1,152 aa), the complete nucleotide sequence of the open reading frame (ORF) of Nv85341 was obtained by DNA sequencing from cDNA prepared from RNA from 2-month-old juvenile N. vectensis polyps. Similarly, a related ORF of Hydra magnipapillata (seq379420, XM_002157707) was confirmed and extended from a transcriptome database (Hydra 2.0 Web Portal, https://research.nhgri.nih.gov/hydra/; last accessed October 2018). The full ORF has 99% identity to XP_012565470 of H. vulgaris. The complete N. vectensis and H. magnipapillata protein sequences and closely related partial sequences from other cnidarians were then used to query genomic and transcriptomic databases at NCBI and other repositories, which led to identification of further categories of proteins.
The most frequently identified type of TSP-related protein was conserved in multiple metazoan phyla from ctenophores to basal chordates, yet was not identified in ecdysozoans (arthropods and nematodes) or craniates (hagfish, lamprey and jawed vertebrates). We designated these proteins "mega-thrombospondins" (mega-TSPs; MT) because of their very large size (typically >2,700 aa) and because the discoidin domain is not present in the earliest emerging forms (see below). In most species, a single mega-TSP was identified; some species encode two distinct gene products (table 1A). We also identified a separate category of TSP-related protein that contains repeated sushi domains near the N-terminus; this was designated "sushithrombospondin" (sushi-TSP; ST). The sushi-TSP group was identified (as of February 2019) to be encoded in poriferans of classes Calcarea and Homoscleromorpha and cnidarians of class Anthozoa, orders Scleractinia and Actiniaria (table 1B). A distinct form of TSP-related protein, containing multiple tandem TSRs, was identified only in poriferans and designated as "poriferan-TSP" (PT) (table 1C). For all three categories of TSP-related protein, evidence of transcription was obtained from expressed sequence tag and/or transcriptome databases, with some ORFs being predicted directly from transcriptome databases. An additional form of TSP, more closely related in domain organization to pentameric TSPs but lacking a distinct N-terminal domain or coiled-coil domain, was identified uniquely from the transcriptome of the calcareous sponge Sycon ciliatum (table 2).
These studies also disclosed variation in the number of TSP-related proteins per species in poriferans, albeit that some identifications were uncertain due to ORFs that are incomplete for the N-terminus. The poriferans examined Evolution of Thrombospondin Superfamily . doi:10.1093/molbev/msz060 The domain architecture of EGF-like domains, TSP type 3 repeats and a single L-type lectin-like domain (also sometimes annotated as "TSP-C" or "Concanavalin A-like" (Con A-like) domain at the C-terminus is a defining feature of canonical TSPs ( fig. 1A; the animal groups to which these species belong are given in table 1). Domain analysis of full-length sequences representative of the newly identified TSP-related proteins showed that each has a distinctive, characteristic domain architecture. Thus, all mega-TSPs identified have the conserved features of a N-terminal secretory signal peptide; an extensive region (around 1,000 aa) predicted as multiple, tandem repeated leucine-rich repeats (LRRs); a central region of variable length and domain composition in which one or more cadherin domains are a well-conserved feature; a canonical TSP C-terminal region of 900-1,000 aa, followed by a second globular domain at the C-terminus that is variably annotated as "TSP-C," "ConA-like," or "MAM" (meprin, A-5 protein, and receptor protein-tyrosine phosphatase mu [Beckmann and Bork 1993] 1A and B).

MBE
In mega-TSPs from ctenophores, the central region includes a single cadherin domain ( fig. 1B). In mega-TSPs from sponges, the region also includes the immunoglobulin-like domain and, in Leucosolenia complicata mega-TSP, a "tyrosine kinase ephrin A/B receptor-like" domain (IPR01641) in addition to a cadherin-like domain: it can be noted that immunoglobulin-like and cadherin domains belong to the same "Greek key," beta-sandwich fold group (Hutchinson and Thornton 1993). All cnidarian mega-TSPs examined include one or more cadherin domains in the central region, and H. magnipapillata mega-TSP (HmMT) also includes a beta-barrel PA14 domain (InterPro011658) ( fig. 1B). In mega-TSPs from bilaterians, the central region also typically includes a fibronectin type III domain (also a member of the Greek key fold group), one or more F5_F8 or DD domains, and a growth factor receptor cysteine-rich domain (IPR009030; a fold related to the "tyrosine kinase ephrin A/B receptor-like domain") ( fig. 1B). Thus, the DD is not intrinsic to the mega-TSPs but appears as a bilaterian-specific domain addition.
Sushi-TSPs are shorter proteins (around 1,000-1,300 aa) that also lack a coiled-coil domain and are distinguished by the presence of three or more tandem sushi/short consensus repeat domains in the N-terminal region and a single EGF-like domain. In common with the mega-TSPs, the sushi-TSP of the sponge, Oscarella carmela, has tandem TSP-C/Con A-like and Con A/MAM-like domains proximal to the C-terminus, whereas sushi-TSPs from Orbicella faveolata and Acropora digitifera (both cnidarians) include three Con A-like domains at the C-terminus (examples in fig. 1C).
The poriferan-TSPs, identified in sponges from classes Demospongia, Calcarea, and Homoscleromorpha, have a closer resemblance of domain composition to canonical TSPs, but lack a laminin-G-like N-terminal domain or a coiled-coil domain and contain extended sets of tandem TSR domains (only three TSR are present in TSP1 and TSP2 of vertebrates). The full-length sequences identified from the demosponges Amphimedon queenslandica (Srivastava et al. 2010), Haliclona tubifera (GenBank transcriptome GFAV00000000.1), and Xestospongia testudinaria (Ryu et al. 2016) (table 1), each also contain an uncharacterized region of about 130 aa after the signal peptide, followed by five or six TSR domains, four EGF-like domains, tandem TSP type 3 repeats and a single, Con A-like, C-terminal domain ( fig. 1D). A unique form of TSP-related protein was also identified in the calcareous sponge S. ciliatum, in which the secretory signal peptide is followed by an unrelated  1E).
Because none of the TSP-related proteins were identified in nonmetazoans, we examined the representation of their component domains in other eukaryotes, with emphasis on the lineages of protists most closely related to Metazoa (supplementary fig. 1A, Supplementary Material online). With the exceptions of TSP-N and TSP-C domains and the TSP type 3 repeats, all other major conserved domains of TSP superfamily members are characterized in the InterPro database as conserved in other unikont lineages. The structural foldfamily of the TSP-N and TSP-C domains, as represented by LN-G and Con-A domains, is widely represented in unikonts (supplementary fig. 1B, Supplementary Material online). We also searched the genome-predicted proteins of relevant protist species, with emphasis on the closest relatives of metazoans, for the major conserved domains of TSP superfamily members. This approach yielded similar data, although the vWF_C domain was not identified in these species. The most widely conserved domains were the LRR, EGF-like, and sushi domains: all the other domains showed more limited phylogenetic representation yet were all represented in choanoflagellates (supplementary fig. 1C, Supplementary Material online). Thus, the separate domains of TSP superfamily members evolved before the Metazoa.

Molecular Modeling of LRRs and TSP C-Terminal Region
To examine whether the predicted LRRs and TSP-like Cterminal regions of the TSP-related proteins conformed to known structures, molecular modeling was carried out on the relevant sequence regions of mega-TSPs and sushi-TSP from cnidarians and a sponge, respectively ( fig. 2A).
HmMT was used to model the predicted LRR region of 863 aa. This region could be modeled as a set of tandem LRR with good correspondence to known structures of tandem LRR, with the inner face characterized by betasheets and a predominance of loop structures on the outer face ( fig. 2B).
The HmMT C-terminal region sequence shared 42% identity with the C-terminal region of TSP5, on which structure (PDB 3FBY, Tan et al. 2009), the model was based. The second ConA-like domain shared 21% identity with the structure of PDB 4DQA and the model was based on this structure. The separate alignments of HmMT with 3FBY and 4DQA overlapped in part and were in agreement in the overlap region, which added support to the modeled conformation of the two ConA-like domains with respect to each other ( fig. 2C). The short alpha-helical C-terminal extension from the ConA domain of the 3FBY crystal structure provided a reasonable point of attachment for modeling the second ConA domain in HmMT, allowing sufficient space to comfortably accommodate the model of the second ConA-like domain. The Cterminal region of N. vectensis mega-TSP (NvMT) was also modeled on PDB 3FBY (of 46% identity) and the second Con A-like domain on PDB 4DQA (of 21% identity). Again, the model had good correspondence to the TSP domain  2D). There was no sequence overlap in these alignments, so the orientation between the two Con A-like domains is less well supported. For the model of the C-terminal region of S. ciliatum sushi-TSP (ScST), the type 3 repeats and first ConA-like domain had 49% sequence identity with 3FBY and corresponded well with this structure ( fig. 2E). However, the second ConA-like domain could not be modeled due to low identity with the structures in the RCSB structural databank. The sequence identity of the second ConA-like domain between HmMT and NvMT is 31%, between HmMT and ScST is 20.1%, and for NvMT and ScST is 22%. Thus, these domains of mega-TSPs are more similar to each other than they are to the available structures. In view of the possible conservation of a second ConA-like domain of ScST, the 4DQA structure is shown placed in an analogous position ( fig. 2E). Overall, the models of the TSP-like C-terminal regions corresponded very well to known TSP structures, supporting the relationship with canonical TSPs.
Procheck (Laskowski et al. 1993) was used to evaluate the "protein-like" quality of the models with respect to phi-psi angles for residue types in Ramachandran space, bond lengths, and other parameters. This revealed that 98.6% of residues 108-971 of HmMT in the LRR model and 99.6% of the residues in the NvMT C-terminal region model were in favored or allowed regions of Ramachandran space and the quality of the models' geometry was better than a 2-2.5-Å structure. The extended N-terminal LRR of the HmMT model showed adequate correspondence given that the sequence homology with templates was low and discontinuous. For the modeled C-terminal regions, the conservation of disulfide bridges with the TSP C-terminal structures provided additional confidence in self-consistency (table 3).

Partial Conservation of Functional Motifs of TSPs in TSP Superfamily Proteins
Several peptide motifs have been identified within the C-terminal regions of canonical TSPs that have important roles in LGN, laminin G-like N-terminal domain; cc, coiled coil; vWF_C, von Willebrand factor C domain; TSR, thrombospondin type 1 domain; EGF, epidermal growth factor-like domain; PRT, proline-arginine-threonine-rich region; LRRs, leucine-rich repeats; CAD, cadherin domain; DD, discoidin domain; FNIII, fibronectin type III domain; GFR-cys-rich, growth factor receptor cysteine-rich domain; Ig, immunoglobulin domain; and SCR, short consensus repeat.
Evolution of Thrombospondin Superfamily . doi:10.1093/molbev/msz060 MBE cell or ECM interactions (arrowed in fig. 2C-E). The possible conservation of these motifs was first examined by multiple sequence alignment of the C-terminal regions. In TSP1, an Arginine-glycine-aspeactic acid (RGD) motif in type 3 repeat 7 coordinates a calcium ion and can bind integrins avb3 and aIIbb3 (Lawler and Hynes 1989;Kvansakul et al. 2004). A RGD motif at the equivalent position was conserved in subgroup A TSPs of chordates, Marsupenaeus japonicus TSP and a single mega-TSP (supplementary fig. 2A, Supplementary Material online). A Lysine-glycine-aspartic acid (KGD) motif in type 3 repeat 6 of TSP1 and equivalently positioned RGD in TSP2, also coordinate a calcium ion (Kvansakul et al. 2004;Carlson et al. 2005). Equivalent KGD or RGD motifs are conserved in some other TSPs, some poriferan-TSPs and a single mega-TSP (supplementary fig. 2B, Supplementary Material online). Within the L-lectin/Con A-like domain of TSPs, a highly conserved DDD motif coordinates calcium ions and controls deposition of trimeric or pentameric TSPs into cell-derived ECM (Kim, Christofidou et al. 2015). The DDD motif is conserved in the first L-lectin/Con A domain of most sushi-TSPs and some poriferan-TSPs, but not in mega-TSPs (supplementary fig. 2C, Supplementary Material online). The second, Con A-like domain of Orbicella faveolata sushi-TSP also contains a DDD motif but this did not align with the conserved position. The absence of the DDD motif from the Con A-like domains of mega-TSPs implies that these proteins do not share the mechanism for ECM retention identified for canonical TSPs.
To deeply investigate the locations of these motifs and their potential for calcium coordination, relevant regions of the C-terminal molecular models were compared in detail to the TSP1 C-terminal structure PDB 1UX6 (Kvansakul et al. 2004 3D). Thus, in both models, this site appears to provide sufficient negative charge to support calcium binding and it is predicted that calcium binding is retained ( fig. 3D). Cysteines C181-C161 in the NvMT model also appeared to be conserved with the disulfide C876-C856 in 1UX6 which constrains the structure and may stabilize potential calcium-binding sites (Kvansakul et al. 2004).
Considering the RGD motif of 1UX6, the analogous residues in HmMT are PVG (residues 217-219 of the model). In the 1UX6 structure, these residues constrict the entrance to a short loop with D902, S903, D904, G905, and D906, and these three aspartates chelate a calcium ion. In the HmMT model, the analogous loop is extended and the negatively charged residues do not appear to be in favorable positions to support calcium binding ( fig. 3B). In NvMT, the residues corresponding to RGD of 1UX6 are VGD (residues 219-221 of the model). Similar to the HmMT model, the associated loop is extended in comparison with the RGD-containing loop of 1UX6 ( fig. 3E). Without a significant deviation from the fold predicted in the model, it is unlikely that the neighboring acidic residues such as E206 and E217 of NvMT can provide the necessary orientation of negative charge for coordination of a calcium ion. Thus, the models suggest that the negative charge in this region would be insufficient to support calcium binding.
With regard to the conserved DDD motif in the L-lectin/ Con A-like domain of TSPs, the corresponding residues in HmMT are GID (aa312-314 of the model). Although D314 of HmMT overlays D1003 of 1UX6, N337, of HmMT overlays Q1027, and the HmMT model provides two glutamates (E311 and E336) in this vicinity, on balance, it appears that the required orientation of charges for calcium binding is unlikely to be achieved at the single aspartate residue ( fig. 3C). In the NvMT model, the residues corresponding to DDD are GTD (NvMT aa312-314; shown in cyan in fig. 3F). Similarly, by inspection, this region appears unlikely to provide sufficient negative charge for calcium ion binding. In summary, Cabinding is predicted to be conserved in type 3 repeat 6, but not at the RGD-equivalent site or in the Con A-like domains of mega-TSPs.

Expression Pattern of Mega-TSP in N. vectensis
In view that mega-TSPs have the widest phylogenetic distribution of the newly identified superfamily members, we chose to focus on mega-TSPs for initial biological investigations. Cnidarians were chosen with regard to their wellstudied biology and as the only early-diverging phylum in which canonical TSPs have been studied. Nematostella vectensis is well developed as a laboratory model for the anthozoan class of cnidarians (Layden et al. 2016). Tissue expression patterns of NvMT were examined by in situ hybridization at several life-cycle stages. In motile planula larvae (ca. 72-h postfertilization [pf]), the ectodermal layer was negative and the NvMT transcript was detected specifically as faint expression in the aboral endoderm ( fig. 4A). In primary polyps, faint expression continued in the endoderm, but not in the pharynx (shown at 5d pf in fig. 4B and C). In more developed MBE polyps, endodermal expression became more evident at the mesenteries and was most obvious in some cells in the endodermal layer at the tips of tentacles; this signal became stronger as the tentacles developed (shown at 7d pf in fig. 4D-F; boxed areas in fig. 4E and F are enlarged in fig. 4G and H). These patterns were not detected with an NvMT sense probe ( fig. 4I). We also analyzed NvMT transcript expression from a transcriptomic database that includes timepoints from the earliest stages of embryogenesis (Warner et al. 2018). The timecourse indicated relatively high NvMT transcript levels in the earliest stages up to 12hpf (likely due to maternal transcripts), lower expression in blastulae, increasing expression throughout the gastrula (24h-48hpf) and planula (48h-120hpf) stages, and maintenance of higher expression in juveniles (from 120hpf onward) (supplementary fig. 3A, Supplementary Material online). The data on planula larvae and juveniles are in line with the in situ data.
In adult N. vectensis polyps, examined as transverse sections through several body regions, the NvMT transcript was detected most strongly in the ectodermal layer. This was particularly apparent in a cross section through the pharynx of an animal with retracted tentacles (fig. 4J, the sense control is shown in fig. 4K), whereas the signal in the body wall of the pharynx was weaker. Sections through the gastric region of the body column also revealed widespread expression in the body wall and associated tissues and higher expression in the ectoderm than the endoderm ( fig. 4L and M). In internal tissues, transcript expression was higher in the ovarian cyst and mesenteric filament than in body wall muscles ( fig. 4M). Analysis of a head regeneration transcriptome timecourse database (Warner et al. 2018) indicated that NvMT expression drops irregularly during initial wound healing and then progressively increases during later stages of tissue remodeling and tentacle regrowth (supplementary fig. 3B, Supplementary Material online).

Expression Pattern and Functional Role of Mega-TSP in Head Regeneration of Hydra
Hydra is a representative of the medusozoans and has been studied for over 200 years as an accessible animal with a simple body plan and high regenerative capacity . The sequenced genome of H. magnipapillata encodes a single canonical TSP (Bentley and Adams 2010;Lommel, Strompen, Hellewell et al. 2018) and gene-silencing technology is well developed, therefore this species was chosen for examination of mega-TSP expression and function. Wholemounts of H. magnipapillata polyps had striking expression of HmMT in the body wall with the exception of the tentacles. Signal intensity was highest around the hypostome, mouth, gastric, and budding regions and was weaker in the peduncle and basal disc ( fig. 5A; a negative control is shown in fig. 5D). Cross sections through the gastric region emphasized the extensive and uniform signal within the body wall ( fig. 5B). Higher power views clarified that HmMT was expressed preferentially in the outer, ectodermal cell layer of the body wall, with little or no signal detected in the endoderm (fig. 5C).
The elevated expression of HmMT in the head and budding region of Hydra suggested a possible function in head patterning or regeneration. Quantified real-time polymerase chain reaction (qPCR) of HmMT transcript during head regeneration after transection showed a small initial decrease, recovery to similar abundance between 6 and 18 h, and increased abundance between 24 and 48 h followed by a decline ( fig. 5E). The 24-48-h period corresponds to the onset of assembly of a new head (MacWilliams 1983).
To test for a functional role of HmMT in regeneration, we analyzed the capacity for head or foot regeneration of polyps in which HmMT expression was silenced by electroporation of short interfering RNAs (siRNA). For these experiments, a transgenic Hydra line expressing green fluorescent protein (GFP) in ectodermal cells and red fluorescent protein (RFP) in endodermal cells (Carter et al. 2016) was used ( fig. 6A), so that the effectiveness of knockdown could be monitored by applying siGFP together with siHmMT. In animals electroporated with siGFP only, the GFP signal was abolished in the ectodermal layer of the affected side of the polyp (compare fig. 6A with fig. 6D; insets show enlarged views of the body wall). Head regeneration of siGFP animals was completed normally by 96 h after transection (compare fig. 6B, C, and  figure 6G is explained by the continuous tissue flow from the midgastric region toward the head and tentacles that involves the mesoglea and epithelial cells (Aufschnaiter et al. 2011). The movement of GFP-positive cells toward the head region is likely slowed more sharply on the side closest to the electroporation source. The functional requirement for HmMT in head regeneration was reproducible with three different siRNAs to HmMT used in different pairwise combinations ( fig. 6J). In contrast, none of the siRNAs blocked foot regeneration ( fig. 6K). These data establish that HmMT has a significant role in head regeneration of Hydra.

Phylogenetic Relationships of TSPs and TSP Superfamily Proteins
Our comparative genomics analysis established that mega-TSPs are the only TSP-related proteins present in ctenophores, whereas multiple forms are present in poriferans and cnidarians ( fig. 7A). Indeed, mega-TSP, sushi-TSP, and poriferan-TSP are each represented by distinct gene transcripts in the calcareous sponge S. ciliatum ( fig. 7A and  table 2).
Next, molecular phylogenetic analyses were carried out on the relationships of the newly identified TSP-related proteins and the canonical TSPs. With regard to mega-TSPs and sushi-TSPs that have more than one Con A-like domain, an unrooted phylogenetic tree of individual Con A-like domains showed that all the "N-terminal" Con A-like domains (i.e., the Con A-like domain immediately after the type 3 repeats) are more closely related to each other than to any of the additional Con A-like domains (supplementary fig. 4, Supplementary Material online). Therefore, further phylogenetic analysis was based on the region comprising the last Evolution of Thrombospondin Superfamily . doi:10.1093/molbev/msz060 MBE EGF-like domain, type 3 repeats and first Con A-like domain. Each category of sequences formed a distinct clade separate from the canonical TSPs. The tree topology as recovered by several methods placed sushi-TSPs closer to mega-TSPs and poriferan-TSPs closer to canonical TSPs ( fig. 7B). Considerable sequence diversity was apparent (from branch lengths) within the poriferan TSPs and mega-TSPs. Within the canonical TSPs, as expected from prior studies, subgroup A TSPs formed a separate subclade from other TSPs ( fig. 7B). Although TSP sequences from cnidarians to human were included, the TSP clade showed more limited evolutionary divergence than mega-TSPs or poriferan-TSPs, suggestive of a shorter evolutionary divergence time, and/or higher conservation/greater selection pressure.

Discussion
Among the hundreds of proteins in the metazoan extracellular milieu, TSPs have been considered a unique form of matricellular and ECM protein. Here, we identify, by comparative genomics, domain composition and molecular modeling, multiple forms of TSP-related proteins (mega-TSP, sushi-TSP, and poriferan-TSP) that all share a domain region equivalent to the C-terminal region of TSPs. These identifications provide the first evidence for evolution of a TSP superfamily. Whereas secreted TSP-like proteins are not present in the close relatives of metazoans (Williams et al. 2014), we find the component domains are encoded, indicating a likely origin of the TSP superfamily as novel gene product(s) in the metazoan common ancestor. Within the Metazoa, one or more superfamily members are transcribed along with canonical TSPs in many species, yet all are absent from nematodes, and only canonical TSPs were identified in tardigrades and arthropods. This is in line with the distinctive ECM composition of cuticle-bearing nematodes and other ecdysozoans (Johnstone 2000). Mega-TSPs are encoded in many modern metazoan phyla, from Ctenophora to Chordata, but appear absent from the Ecdysozoa, tardigrades and vertebrates.
Potential molecular activities of mega-TSP and sushi-TSP were explored by analysis of molecular structures and motifs. Evolution of Thrombospondin Superfamily . doi:10.1093/molbev/msz060 MBE molecular patterns through either the inner or outer face of the LRR structure (Bella et al. 2008;Ng and Xavier 2011). Our molecular modeling supported that the LRR domain of mega-TSP can adopt the characteristic curved structure with distinct concave and convex faces. Molecular models of the Cterminal regions of two mega-TSPs and a sushi-TSP predicted that these polypeptide sequences would form structures highly related to the equivalent regions of canonical TSPs. For TSP1, RGD-dependent cell attachment and spreading depends on extracellular calcium ion concentration and the status of intramolecular disulfide bonds (Sun et al. 1992;Kvansakul et al. 2004). The models from mega-TSPs predicted that the equivalent loop is longer and does not provide the charge environment to support coordination of a calcium ion. A RGD motif is also lacking at the site, yet the equivalently positioned motif is VGD, which has been related to b1 integrin binding in the integrin-inhibitory activities of certain disintegrins from snakes or blood-sucking insects (Calvete et al. 2003;Assumpcao et al. 2012). The KGD motif located in the penultimate type 3 repeat of certain TSPs also coordinates a calcium ion (Kvansakul et al. 2004;Carlson et al. 2005). The mega-TSP models predict that the environment of this motif would support calcium ion binding, yet the equivalent motif is IGD. This motif also has potential to bind integrins, as IGD motifs within the type I domains of fibronectin support fibroblast migration involving avb3 integrin (Millard et al. 2007;Maurer et al. 2012). Thus, whereas specific aspects of Ca-coordination appear distinct, integrin binding appears a likely conserved property.
In view that mega-TSP has a much wider phylogenetic range than poriferan-TSP and sushi-TSP, mega-TSP was chosen for molecular and biological studies. We focused on cnidarians because of their value to study fundamental aspects of tissue organization. Expression patterns of mega-TSP transcripts were examined in N. vectensis and H. magnipapillata, each of which has distinct advantages for study of Model for the evolution of the TSP superfamily based on "ctenophores first" metazoan evolution. (D) Model for the evolution of the TSP superfamily based on "poriferans first" metazoan evolution. In (C and D), numbers refer to proposed gene duplication events. The circular diagrams indicate the number of Con A-like domains present in the different proteins and whether a DDD motif is present. Brackets indicate that the DDD motif is found in some but not all extant species orthologs. Light gray ¼ hypothesized ancestral form, dark gray ¼ extant form. See Discussion for details. Shoemark et al. . doi:10.1093/molbev/msz060 MBE developmental stages (Nematostella) or regeneration of adult polyps (Hydra). In situ hybridization demonstrated widespread and dynamically regulated expression of NvMT during development, with transcripts confined to the endoderm until late stage polyps, with graded expression increasing toward the distal ends of tentacles. Ectodermal expression was prominent in the mesenteries of adult N. vectensis polyps, and ectodermal expression was conserved in the body wall of Hydra polyps. Distinctions in the patterns might reflect subfunctionalization of NvMT due to the greater number of TSP superfamily members expressed by N. vectensis (Tucker et al. 2013). The ectodermal expression of HmMT along the entire longitudinal axis of the Hydra is reminiscent of the expression patterns of major fibrillar collagens, Hcol1, Hcol2, and Hcol3 (Deutzmann et al. 2000;Zhang et al. 2007;Tucker and Adams 2014), indicating that HmMT might form part of the interstitial matrix of the mesoglea, rather than the basal lamina which is produced from the endoderm ). In addition, HmMT transcript, similar to other ECM-encoding transcripts, was identified to be increased during head regeneration.
We focused on hydra to examine the function of HmMT. In cnidarian polyps, a single oral-aboral body axis controls patterning of the head and foot regions. This axis is set up by a constant, positive feedback loop of Wnt3 production at the hypostome as the major driver of head organization, in conjunction with a gradient of foot inhibitor from the base of the polyp (Bode 2011). Upon transection of the body column in the gastric region, gradient repatterning at the transection site activates dynamic restructuring of mesoglea and cell layers to generate a new head on the basal body fragment and a new foot on the head fragment (Sarras et al. 1991;Aufschnaiter et al. 2011). Upon siRNA-mediated silencing of HmMT in adult polyps, a striking necessity for HmMT to support head regeneration was discovered. Identical phenotypes were obtained with three pairwise combinations of siRNA to HmMT, supporting that these resulted from ontarget effects. The need to stimulate tissue remodeling to reveal an acute phenotype has been documented for other hydra transcripts with functions in axis determination and may relate to the native rate of protein turnover in the head region (Aufschnaiter et al. 2011). Dysfunctional stem cell migration could play a role in the observed effect of HmMT silencing. Zhang and Sarras (1994) showed that perturbation of ECM structure in hydra, with RGDS synthetic peptide or an antibody to fibronectin, dramatically inhibited the migration of interstitial stem cells (i cells). Although hydras depleted of i cells can regenerate, normal head regeneration involves i cell activity (Chera et al. 2009). Alternatively, as head regeneration is dependent on de novo synthesis of mesoglea, a disordered basement membrane or interstitial matrix assembly caused by the lack of HmMT could block morphollaxis of the epithelial cells, which is a key process in the early phase of head regeneration (Agata et al. 2007).
Considering together the domain architectures and motifs, molecular modeling predictions, molecular phylogenetic data and analysis of transcribed sequences, models were formulated for the evolution of the TSP superfamily. With regard to the "ctenophore-sister" model of animal phylogeny (Whelan et al. 2017) and that mega-TSP was the sole form identified in ctenophores, one model is that a mega-TSP-like encoding sequence originated in the last metazoan common ancestor and that the TSP superfamily then evolved by multiple gene duplication events and subsequent domain shuffling that led to the evolution of paralogs with distinct domain compositions ( fig. 7C). In this model, the gene duplication giving rise to sushi-TSP in modern poriferans and cnidarians occurred after the divergence of ctenophores (i.e., in the last common ancestor of poriferans, placozoans, cnidarians, and bilaterians), with subsequent loss of this gene from all lineages except poriferans and cnidarians. The absence of a coiled-coil domain from mega-TSP, sushi-TSP, and poriferan-TSP indicates that oligomerization by this mechanism is a late-evolving property specific to the TSPs.
Alternative considerations, based on the "poriferan-sister" model (Feuda et al. 2017), involve more complexity of the ancestral state, evolutionary divergences, and gene losses. Because mega-TSP and sushi-TSP are both represented in extant poriferans, an evolutionary "big bang" giving rise to mega-TSP and sushi-TSP ancestors in the metazoan common ancestor is proposed ( fig. 7D). In this model, the mega-TSPonly state of ctenophores would result from gene loss. Assuming poriferan-TSP arose from a poriferan-specific gene duplication (discussed further below), the origin of canonical TSPs in cnidarians is posited to have resulted from gene duplication and rapid domain losses/domain shuffling and divergence of sushi-TSP. However, the molecular phylogenetic data do not clearly support this proposal.
The models also consider the conservation or absence of the DDD calcium-binding motif of the TSP L-lectin/Con-A like domain. This motif was absent from mega-TSPs. Because a DDD motif is apparent in the single Con A-like domain of poriferan-TSP and is variably present in sushi-TSP, we propose that this motif debuted in a common ancestor of sushi-TSP, and the poriferan-TSP/TSP ancestor, with subsequent lineage-specific losses ( fig. 7C). Gene duplication and domain shuffling/loss is proposed to have given rise to sushi-TSP (with two Con A-like domains) and the ancestor of poriferan-TSP and TSPs (with a single Con A-like domain). Duplication of the latter gene in the last common ancestor of poriferans, cnidarians, and bilaterians is proposed to have given rise to poriferan-TSP and canonical TSP, with loss of the poriferan-TSP-encoding gene from the cnidarian/bilaterian common ancestor ( fig. 7C). In the "poriferan-first" model, the same considerations lead to the proposal that TSPs arose from a duplication of the sushi-TSP gene ( fig. 7D).
With regard to the evolution of poriferan-TSP, our analysis demonstrated that all three categories of TSP-related proteins are expressed in this phylum. As others have noted more generally, there was very high in-clade sequence variation between different classes of Porifera (Renard et al. 2018). Different poriferan species encode distinct profiles of TSPrelated proteins and yet lack canonical TSPs. In the above models, we take the parsimonious assumption that canonical TSPs debuted in cnidarians. However, the presence of TSR domains in poriferan-TSPs, and also in subgroup A TSPs of Evolution of Thrombospondin Superfamily . doi:10.1093/molbev/msz060 MBE deuterostomes, poses an interesting issue of whether poriferan-TSPs represent orthologs or paralogs of subgroup A TSPs. A conceivable alternative would be that poriferan-TSP also arose in the metazoan common ancestor, with subsequent gene loss in ctenophores and a convoluted evolutionary pathway involving loss of TSR domains and the origin of a B-like TSP in the cnidarian/bilaterian common ancestor. The prior model for TSP evolution is that subgroup A TSPs arose in the last deuterostome common ancestor (Bentley and Adams 2010) and several lines of evidence support this view. Conservation of synteny is apparent for each of the five TSP genes of vertebrates, and the concept that these genes arose by duplications early in the vertebrate lineage is supported by the paralogous gene locations of THBS1, THBS3, THBS4, and COMP/THBS5 within the human genome (McKenzie et al. 2006). The present molecular phylogenetic analysis indicates that poriferan-TSPs form a separate clade to the canonical TSPs, whereas subgroup A TSPs clearly place closely to the pentameric TSPs ( fig. 7C). Thus, it seems most plausible that poriferan-TSP evolved separately in the sponge ancestor and represents a phylum-specific paralog. Overall, the data reveal for the first time that TSPs represent a discrete branch within a superfamily of previously unappreciated complexity and diversity, which appears to have originated at the debut of the Metazoa. The functional significance of Hydra mega-TSP implicates major roles in tissue patterning or dynamics. Further study of this superfamily may illuminate principles of metazoan tissue organization.

Multiple Sequence Alignment and Phylogenetic Trees
Data sets included the TSPs listed above and TSP-related proteins from the set in table 1 that represented the phylogenetic range of the predicted proteins. Multiple sequence alignments of amino acid sequence regions including the last EGF domain, the TSP type 3 repeats and the first Ltype lectin-like/Con A domain, or of single Con A-like domains, were prepared in MUSCLE 3.8 (Edgar 2004) or webPRANK (Löytynoja and Goldman 2010) at default parameters through the resources of EBI. For preparation of phylogenetic trees, variations present in <5% of the sequences leading to gapping were removed. Phylogenetic trees were constructed by the maximumlikelihood method in PhyML 3.0 (Guindon et al. 2010) at default parameters with 200 cycles of boot-strapping. Tree diagrams were rendered from the Newick outputs in iTOL (Letunic and Bork 2011 (Kim et al. 2007) with 24% identity; the region 529-691 (LRSNS to GPNSE), on pdb 4MN8 (Sun et al. 2013) with 27% identity; the region 692-837 (EILH to AEMS), on 4FDW (the crystal structure of a putative cell surface protein (Bacova_01565) from Bacteroides Ovatus Atcc 8483 at 2.05 A resolution), with 25% identity, and the region 838-1,016 on pdb 4MN8, with 27% identity. The segments were then conjoined by hand, guided by the repeating fold. The resulting model was energy minimized in water and 150-mM NaCl in Gromacs 4.6.7 (van der Spoel, E. Lindahl, B. Hess, and the GROMACS development team, www.gromacs.org) (Hess et al. 2008), with reasonable energies, indicating that there were no clashes or unreasonable bonds. The modeling of C-terminal regions focused in each case on the sequence from the start of the type 3 repeats to the C-terminus of the second Con A-like domain. The models for HmMT, NvMT, and ScST (scpid30246) C-terminal domain sequences were produced using HHPRED sequence alignment software (Söding et al. 2005) and Modeler (Webb and Sali 2014) (https://salilab.org/modeller/). The HmMT model shared 42% sequence identity with pdb 3FBY, the C-terminal region of human cartilage oligomeric matrix protein, also known as thrombospondin-5 (TSP-5) (Tan et al. 2009), between residues DNCEY and LQVR (designated as the TSP-like C-terminal domain). The second ConA-like domain of HmMT (from residues CLERMN to FLSQK) was modeled from pdb 4DQA, with which the HmMT second Con A domain shared 20.7% sequence identity. The C-terminal region from NvMT shared 46% identity with pdb 3FBY and was modeled between residues DNCP to EAKCA on this structure. The putative second ConA-like domain of NvMT was modeled for the region from residues VNQAL to FFVTYP against pdb 4DQA, with which it shared 21% identity. The first Con A-like domain from ScST (scpid30246; ScST) was modeled from residues SSSSVEC to CLRN on 3FBY, with which it shared 49% sequence identity. The second ConAlike domain of ScST, from TVDL to GPPRT, shared only 18% identity with the nearest structure, 4DQA, which was considered too low to produce a meaningful model.
All the resulting models were read into the Sybyl suite of software (Sybylx2.1.1, Tripos Inc.) where they were inspected. Cysteine residues likely to form disulfides were oriented within optimal reach of each other, oxidized, and disulfide bonds formed. The models were then minimised and overlaid for comparison with two TSP structures from pdb, neither of which had been used in model building: 1UX6 (TSP1 C-terminal fragment; Kvansakul et al. 2004) and 1YO8 (TSP2 C-terminal region; Carlson et al. 2005). The Hm model was further supported by the superposition of the DNC repeats with the native PDB 3FBY and 1YO8 crystal structures (1YO8 was not used to produce the model) in which overlying (conserved) disulfides constrain the structure and may stabilize potential calcium-binding sites. Models were visualized and figures prepared with the UCSF Chimera package (Pettersen et al. 2004) (http://www.cgl.ucsf.edu/ chimera). Procheck (Laskowski et al. 1993) was run from the CCP4 suite of software (http://www.ccp4.ac.uk/dist/ html/procheck_man/index.html).

Laboratory Culture of Cnidarians and siRNA Treatment
Hydra Species Hydra magnipapillata strain 105 and transgenic H. vulgaris AEP strain were cultured in hydra medium (HM) (1 mM CaCl 2 , 0.1 mM MgCl 2 , 0.1 mM KCl, 1 mM NaH 2 CO 3 , pH 7.8) at 18 C and fed two to three times a week with freshly hatched Artemia salina nauplii. Before experiments, animals were starved for 24 h. For qPCR, RNA was prepared as described in Lommel et al. (2018). For siRNA experiments, a transgenic H. vulgaris strain expressing ectodermal GFP (actin::GFP) and endodermal RFP (actin::RFP) (Carter et al. 2016), provided by Robert Steele's laboratory was used. siRNAs were designed for the HmMT C-terminal region (siHmMT1), the LRR domain region (siHmMT2) and the region between the LRR and EFG domains (siHmMT3). siRNAs for GFP and HmMT were purchased from Sigma-Aldrich/ Merck at HPLC grade (see supplementary table S2, Supplementary Material online, for all siRNA sequences). Electroporation was performed as described (Lommel et al. 2018). 3 lM siRNA in total (1 lM siGFP and 2 lM scrambled siGFP or combinations of 1 lM each of siHmMT1, siHmMT2, and siHmMT3) was added to the cuvette. Twenty four hours after electroporation, living animals (n ¼ 20) were transferred into a new Petri dish containing hydra medium and kept under normal hydra conditions. The feeding routine was restarted 2 days after electroporation. For the regeneration experiments, animals were allowed to recover for 7 days and fed at least four times. The animals were cut at 50% body length using a sterile scalpel blade (NeoLab sterile surgical blade No. 15). After three washing steps in hydra medium for 10 min at 18 C, the head and foot regenerates, respectively, were kept separately in Petri dishes containing hydra medium. Regeneration was documented after 72 and 96 h, using a Nikon SMZ25 stereomicroscope equipped with a Nikon DS-Ri2 high-definition camera.
Nematostella vectensis A colony of mature N. vectensis polyps descended from clones CH2 and CH6 from the University of California Davis Bodega Bay Marine Laboratory were raised as described (Hand and Uhlinger 1992). Nematostella polyps and embryos in the Heidelberg laboratory were kept in 1/3 seawater (Tropic Marine) at 18 C in the dark and fed once or twice a week with freshly hatched Artemia nauplii.

In Situ Hybridization
Nematostella vectensis Adults: Sexually mature N. vectensis were raised and prepared for in situ hybridization as described (Tucker and Gong 2014). The fixed tissues were infiltrated and embedded in Paraplast (Fisher Scientific) and 8-lm-thick sections collected on Superfrost Plus (Fisher Scientific) glass slides. Sections were treated with Proteinase K at room temperature for 3 min. After postfixation, rinsing and dehydration, 1 lg/ml of riboprobe (from the pCRII plasmid containing the NvMT nucleic acid region encoding aa 956-1,275 of NvMT, which is unique to mega-TSPs to minimize the possibility of crosshybridization with canonical TSP) was added to the slides and hybridization carried out at 65 C overnight followed by colorization. Larvae and primary polyps: In situ hybridization of planula larvae and primary polyps of N. vectensis was performed as described (Watanabe et al. 2014). In brief, specimens were fixed with 4% paraformaldehyde/PBST (PBS, 0.1% Tween 20) for 1 h at room temperature. Hybridizations were carried out in a hybridization solution containing 1% SDS at 60 C for at least 22 h. Sense (control) and antisense digoxigenin-labeled RNA probes corresponded to the region encoding aa 956-1,275 of NvMT.

Hydra magnipapillata
Customized LNA digoxygenin-labeled RNA probes were designed and produced by Exiqon (5 0 DigN/ AGTAAACTTGTGCTGTAGATGT/3 0 Dig_N; Position in target: 1,207-1,229) corresponding to the antisense strand of HmMT cDNA. The whole-mount in situ hybridization procedure was performed as described previously (Schüler et al. 2015). For hybridization, the LNA probe was added to a final concentration of 1 mM in fresh HS and hybridized for 60 h at 55 C.

Vibratome Sectioning of Hydra
Freshly in situ-stained Hydra polyps were embedded in 1.5 ml PBS containing 0.5% gelatin, 30% bovine albumin-fraction V (Carl Roth) and solidified with 105 ll glutaraldehyde (Sigma). Cross sections (100 lm) were cut from the midgastric region with a VT1000 S vibrating blade microtome (Leica) and mounted on microscope slides in Mowiol 4-88(Carl Roth).

Microscopy and Image Processing
Images were captured with a Nikon Eclipse 80i or E800 photomicroscope using NIC filters and a Nikon Digital Sight DS-U1 color camera and processed with Adobe Photoshop CS6 and ImageJ.