Dynamic Evolution of Retroviral Envelope Genes in Egg-Laying Mammalian Genomes

Abstract Independently acquired envelope (env) genes from endogenous retroviruses have contributed to the placental trophoblast cell–cell fusion in therian mammals. Egg-laying mammals (monotremes) are an important sister clade for understanding mammalian placental evolution, but the env genes in their genomes have yet to be investigated. Here, env-derived open reading frames (env-ORFs) encoding more than 400 amino acid lengths were searched in the genomes of two monotremes: platypus and echidna. Only two env-ORFs were present in the platypus genome, whereas 121 env-ORFs were found in the echidna genome. The echidna env-ORFs were phylogenetically classified into seven groups named env-Tac1 to -Tac7. Among them, the env-Tac1 group contained only a single gene, and its amino acid sequence showed high similarity to those of the RD114/simian type D retroviruses. Using the pseudotyped virus assay, we demonstrated that the Env-Tac1 protein utilizes echidna sodium-dependent neutral amino acid transporter type 1 and 2 (ASCT1 and ASCT2) as entry receptors. Moreover, the Env-Tac1 protein caused cell–cell fusion in human 293T cells depending on the expression of ASCT1 and ASCT2. These results illustrate that fusogenic env genes are not restricted to placental mammals, providing insights into the evolution of retroviral genes and the placenta.


Introduction
Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections to host germ cells. ERVs have provided de novo genes and new functions in their hosts (Johnson 2019). In therian mammals, various retroviral envelope (env) genes have been reported to be co-opted for placental fusogenic genes during evolution, the original of which is responsible for viral entry by fusing the retrovirus and the target cell membrane. Syncytin-1 and syncytin-2 derived from ERV-W and ERV-FRD1, respectively, are thought to be responsible for the cell-cell fusion of human placental trophoblasts (Mi et al. 2000;Blond et al. 2000;Blaise et al. 2003;Mallet et al. 2004). Placental genes derived from env are not restricted to humans, but also to rodents (Dupressoir et al. 2005(Dupressoir et al. , 2011Redelsperger et al. 2014), rabbits ), carnivorans (Cornelis et al. 2012), ruminants Nakaya et al. 2013), tenrecs , and even in opossums belonging to marsupials (Cornelis et al. 2015). Moreover, non-mammalian species such as the viviparous Mabuya lizards were reported to acquire an env-derived gene, syncytin-Mab1, which was involved in placental cell-cell fusion (Cornelis et al. 2017). Given the widespread occurrence of ERVs in vertebrates (Hayward et al. 2015) and the ancient origin of retroviruses (Aiewsakun and Katzourakis 2017), the ERV invasion into the host genomes should have been occurring in egg-laying mammals (i.e., monotremes), which would be of particular interest to speculate the situation before the placental birth in mammals. Recently, the chromosome-level-assembled genomes of platypus and short-beaked echidna have been reported (Zhou et al. 2021). It enables us to investigate the monotreme ERVs in more detail. Indeed, in platypus and echidna genomes, we previously reported highly conserved genes that were derived from a retroviral reverse transcriptase, although their molecular functions are unclear ). Here, we comprehensively identified nearly full-length env-coding open reading frames (hereafter called env-ORFs) in the platypus and echidna genomes and found their dynamic evolution. Furthermore, a fusogenic Env protein identified in the echidna genome was experimentally verified. Our study demonstrates the ongoing acquisition of fusogenic env genes in mammals that do not utilize a placenta, providing key insights into the evolution of retroviral genes and the placenta.

Discovery of env-ORFs in the Platypus and Echidna Genomes
To obtain ORFs encoding nearly full-length envelope proteins in monotremes, we first extracted ORFs encoding more than 400 amino acids (i.e., 1,200 nucleotides starting from the start codon) from the chromosome-level-assembled reference genomes of platypus and echidna (Zhou et al. 2021). We then extracted ORFs encoding envelope proteins by sequence similarity search (see Materials and Methods). This search identified only two env-ORFs in the platypus genome. This observation is consistent with previous studies showing very few ERV-derived ORFs in the platypus (Nakagawa and Takahashi 2016;Ueda et al. 2020). In contrast, we identified 121 env-ORFs in the echidna genome. These monotreme env-ORFs were clustered into nine groups based on the putative amino acid sequences. We named these env-ORF groups as env-Oan1 and -Oan2 (platypus: Ornithorhynchus anatinus) and env-Tac1 to -Tac7 (echidna: Tachyglossus aculeatus) ( fig. 1). Detail information of each env-ORF such as ORF length, chromosome, locus, direction, and overlapping gene or neighboring genes were summarized in supplementary data set S1, Supplementary Material online. The env-Oan1 and -Oan2 are found to have only one copy for each in the platypus genome. For echidna, the env-Tac1, -Tac2, -Tac4, and -Tac7 were also low copies (one copy for env-Tac1 and -Tac2, two copies for env-Tac7, and three copies for env-Tac4), whereas env-Tac3, -Tac5, and -Tac7 had 18, 77, and 19 copies, respectively. These high-copy-number env-ORFs were shallowly branched in the phylogenetic tree ( fig.  1), suggesting the recent expansion in the echidna genome. Proviral structures containing env-ORFs were identified based on their genomic sequences ( fig. 2 and supplementary data set S2, Supplementary Material online). Note that, for the high-copy-number env-ORFs (i.e., env-Tac3, -Tac5, and -Tac6), we generated consensus sequences of env-ORFs with flanking sequences and identified their proviral structures. The topologies of the phylogenetic trees of putative Gag and Pol proteins were consistent with those of the Env proteins and also suggested that these ERVs belong to either gammaretroviruses (env-Tac1, -Tac2, -Tac3, and -Tac4) or betaretroviruses (env-Tac5, -Tac6, and -Tac7) ( fig. 3). Nucleotide sequences of 5′-LTR and 3′-LTR were almost identical in the high-copy proviruses encoding env-Tac3 (86.7% to 100%), env-Tac5 (94.5% to 100%), and env-Tac6 (84.4% to 99.5%) (supplementary data set S3, Supplementary Material online), suggesting that they still actively expand in echidna genomes.

Tissue Expression of Transcripts Encoding env-ORFs
To investigate the transcription of env-ORFs, we calculated the expression levels of genes consisting of transcripts encoding env-ORFs from the platypus and echidna tissue transcriptome data ( fig. 4A). In the platypus tissues, we could not detect the transcripts of env-Oan1 and -Oan2. In contrast, env-Tac1, -Tac2, -Tac4.1, and eight env-Tac5 loci were expressed in echidna tissues ( fig. 4B). The env-Tac1 showed the highest expression among these env-ORFs, whereas the env-Tac2 was expressed only in male samples. When multimapped reads were included in the analysis, the expression levels of env-Tac1 were increased, suggesting the presence of env-Tac1-like sequences in the echidna genome ( fig. 4C). In contrast, the expression level of env-Tac4.1 did not change when uniquely mapped reads were used. All expressed sequences in env-Tac5 genes showed specificity to the ovary ( fig. 4B). The expression levels of the env-Tac5 genes, such as env-Tac5.29, were greatly increased when multimapped reads were used, suggesting that their sequences were similar ( fig. 4C). These transcriptome analyses revealed that env-ORFs are actively transcribed in echidna tissues and have different tissue specificities among env-ORF groups.

env-Tac1 Potentially Encodes a Functional Env Protein
The Env-Tac1 protein is closely related to reticuloendotheliosis virus (REV), which belongs to the RD114/D-type retrovirus (RDR) interference group (figs. 1 and 5A). Previously, Niewiadomska and Gifford reported a REV-like env-ORF lacking the SU domain in the echidna genome, as detected by PCR (Niewiadomska and Gifford 2013). The env-Tac1 was thought to be a full-length copy of this ERV group, which was suggested by our phylogenetic analysis of REV-Env proteins (fig. 5A). A multiple sequence alignment of the Env-Tac1 protein with the Env proteins of REV and spleen necrosis virus (SNV), which is a subgroup of REV, showed that the Env-Tac1 protein retains functionally important amino acids ( fig. 5B). In particular, the SDGGGXXDXXR motif is important for the binding to receptors, amino acid transporter type 1 (ASCT1 [also known as SLC1A4]) and/or amino acid transporter type 2 (ASCT2 [SLC1A5]) (Sinha and Johnson 2017). The predicted topology of the Env-Tac1 protein suggested a typical Env protein structure with a signal peptide at the N-terminus and a single transmembrane (TM) domain at the C-terminus ( fig. 5C). Together, the Env-Tac1 protein retained amino acid motifs for functional RDR Env protein.

Env-Tac1 Protein Interacts with ASCT1 and ASCT2 to Induce Membrane Fusion
The fusogenic activity of the Env-Tac1 protein was experimentally characterized under the assumption that the Env-Tac1 uses ASCT1 and/or ASCT2 as receptors. First, we conducted a cell-cell-fusion-dependent Lac Z assay to verify the cell-cell fusion activity of the Env-Tac1 protein in the presence of ASCT1 or ASCT2 ( fig. 6A). The Env-Tac1 protein was revealed to mediate cell-cell fusion in 293T cells in the presence of both human and echidna ASCT1 and ASCT2 ( fig. 6B). The fusion activity was the highest on echidna ASCT2. These experimental data support that the Env-Tac1 Kitao et al. · https://doi.org/10.1093/molbev/msad090 MBE protein is the RDR Env protein that utilizes both ASCT1 and ASCT2 as receptors. To investigate whether the Env-Tac1 protein enables retroviral entry, we generated murine leukemia virus (MLV)-based pseudotypes with the Env-Tac1 protein carrying the LacZ marker ( fig. 6C). 293T cells stably expressing echidna ASCT1 and ASCT2 (293T-tacASCT1 and 293T-tacASCT2, respectively) were generated using a piggyBac system. We found that the Env-Tac1 protein mediated the infection of both 293T-tacASCT1 and 293T-tacASCT2 cells ( fig. 6D). The infectious units were higher in 293T-tacASCT2 cells than those in 293T-tacASCT1 cells, which is consistent with the higher cell-cell fusion activity on tacASCT2. Env proteins of SNV mediated infection only in 293T-tacASCT1 cells, suggesting that there are differences in amino acids that determine whether they are directed toward ASCT1 or ASCT2. We examined the expression of ASCT1 and ASCT2 in various echidna tissues and found that ASCT2 was particularly highly transcribed in the kidney ( fig. 6E), in which env-Tac1 was also expressed ( fig. 4B). These data indicate that the Env-Tac1 protein is a potential fusogen in echidna tissues utilizing ASCT1 and ASCT2 as receptors.

Estimation of env-tac1 Insertion
We conducted a rough estimation of the times of the ERV insertion based on the 5′-and 3′-LTR similarities. The genome substitution rate of monotremes is estimated at Envelope Genes in Egg-Laying Mammals · https://doi.org/10.1093/molbev/msad090 MBE ∼2.6 × 10 −3 substitutions per site per million years (Zhou et al. 2021). It is known that 5′-and 3′-LTRs of an ERV are identical when inserted. Therefore, assuming that 5′-and 3′-LTRs of ERV-Tac1 are under neutral selection, the p-distance and inserted age of the two LTR nucleotide sequences were calculated for all ERVs obtained in this study (supplementary data set S3, Supplementary Material online). As for env-Tac1, the p-distance of the 5′-and 3′-LTRs is 3.9 × 10 −3 substitutions per site, suggesting the ERV of env-Tac1 can be inserted around 1.5 million years ago (MYA). Considering the two monotremes diverged around 55 MYA (Zhou et al. 2021), the env-Tac1 could insert specifically in the echidna lineage. Indeed, the estimated time of all ERVs found in this study exhibits <55 MYA, excluding Env-Tac6.14 (estimated 60.1 MYA) (supplementary data set S3, Supplementary Material online). This result suggests that those ERVs were inserted after the speciation of the platypus and echidna lineages, which is concordant with the observation of the phylogenetic tree ( fig. 1).
To further investigate whether the homologous genes of env-Tac1 exist in the platypus genome as well as marsupial or eutherian genomes, we extensively searched using the FASTA program (E-value ≤ 1E −10 ) with the nucleotide

FIG. 2. Proviruses encoding platypus and echidna env-ORFs.
For each env-ORF, its retroviral structure, including gag, pro, and pol genes, was illustrated. Short bars indicate start codons, and tall bars indicate stop codons. ORFs of more than 300 nt were filled. ORFs from a stop codon to a stop codon are shown at the top, and ORFs from an ATG triplet to a stop codon are shown at the bottom. Retroviral gene names (gag, pro, pol, and env) were also indicated. 5′-and 3′-LTRs were filled with gray. Percent nucleotide identity between 5´-and 3´-LTRs was indicated on the right side of the sequences. For multicopy env-ORFs (i.e., env-Tac3, -Tac5, and -Tac6), consensus sequences were constructed.   . 2) and representative retroviruses was used for tree construction. "ERV_Tac" indicates a provirus containing the equivalent number of env-Tac gene (e.g., ERV_Tac1 is a provirus containing env-Tac1 ). An open circle in an internal node indicates >95% bootstrap support (1,000 replicates).
Envelope Genes in Egg-Laying Mammals · https://doi.org/10.1093/molbev/msad090 MBE between the env-Tac1 and the env-Tac1-like sequences, we obtained the most similar sequence (i.e., a sequence showing the lowest E-value) for each species and generated a maximum-likelihood-based tree adding RDR env genes used in figure 5A. As a result, the env-Tac1, including echidna ERV (Niewiadomska and Gifford 2013), made a clade with envelope genes of REV and SNV (supplementary fig. S1, Supplementary Material online), suggesting that there are no homologous sequences in the mammalian genomes examined in this study. This observation is consistent with the estimated insertion time of ERV-Tac1.

Discussion
The acquisition of the ERV-derived genes has been associated with the evolution of placentation in therians (Imakawa et al. 2015(Imakawa et al. , 2022. The capture of syncytin genes from retroviral env genes are thought to be a pivotal step for the emergence of placenta in egg-laying ancestors . Indeed, the very small numbers of ERVs in the platypus genomes arose the hypothesis that intensive invasion of therian genomes may facilitate the co-option of ERV-derived placental genes (Chuong 2013). However, our study revealed that the echidna genome contained more than 100 nearly full-length env-ORFs. The number of echidna env-ORFs was not unusual compared with that in other mammals. For example, 37 and 255 methionine-starting env-ORFs longer than 400 codons were found in the human and mouse genomes, respectively (Nakagawa and Takahashi 2016), and 162 nearly full-length env-ORFs have been identified in 9-banded armadillo (Malicorne et al. 2016). Thus, the copy number of env-ORFs varies greatly among mammalian species, including the monotremes, during evolution. This observation also indicates that horizontal gene transfer via retroviruses could be common in mammals considering the evolutionary timescale.
Among echidna env genes, we demonstrated the cellcell fusion activity of the Env-Tac1 protein ( fig. 6B and  D). Env-Tac1 and its receptor ASCT2 were found to be highly transcribed in the kidney (figs. 4B and 6E) suggesting that the fusogenic activity may be important function for kidney of echidnas. However, the biological process in which the fusogenic activity involved is still unclear. In addition, we have not examined whether the env-Tac1 mRNA is translated. We previously revealed that env-ORFs, including syncytin-1 and syncytin-2, require specific RNA motifs named SPRE to be translated (Kitao et al. 2021); however, the env-Tac1 does not contain the SPRE motif, and its translational regulatory mechanism is currently unknown. Further studies are required to characterize the molecular function of env-Tac1.
Identifying the evolutionary conservation of retroviral genes is necessary for determining their co-options on host functions. Previously, we reported retroviral reverse  ). The dN/dS analysis revealed that these genes were under purifying selection between the two species. In this study, however, no orthologous env genes were found in platypus and echidna, indicating that env genes are not conserved in monotremes. Given that most of the ERVs encoding env-ORFs still retained retroviral ORFs other Envelope Genes in Egg-Laying Mammals · https://doi.org/10.1093/molbev/msad090 MBE than env genes ( fig. 2), these env-ORFs could be under the process of mutational decay. Future comparative analyses of the evolutionary conservation of individual env genes in other echidna species, whose genomes have not yet been sequenced, will reveal the evolutionary conservation of individual env genes. In addition, we cannot distinguish whether each ERV cluster identified in this study expanded due to independent invasions or retrotranspositions, which required further studies. In conclusion, we demonstrated the diversity of env genes in monotremes and identified an env-derived gene in echidna showing cell fusion ability, which indicates the dynamic evolution of env-derived genes in mammals.

Phylogenetic Analysis
To infer the phylogenetic relationship of monotreme env-ORFs obtained in this study, amino acid sequences of 20 retroviral Env were retrieved from representative retroviruses (accession numbers are listed in supplementary data set S5, Supplementary Material online). In total, 153 amino acid sequences were aligned using the MAFFT v7.487 with the L-INS-i method (Katoh and Standley 2013). Conserved TM domains of the Env proteins (regions after furin cleavage RXR/KR motif) were extracted. For the Env-Tac1 phylogeny belonging to the RDR interference group, we generated a multiple sequence alignment of full-length amino acid sequences of the Env-Tac1 and RDR Env proteins (supplementary data set S5, Supplementary Material online) using the same procedure. IQ-TREE2 v2.0.8 (Minh et al. 2020) was used to construct the phylogenetic tree with 1,000 replicates generated by an ultrafast bootstrap approximation (Hoang et al. 2018). The tree was visualized using FigTree v1.4.4 (https://github.com/rambaut/figtree/releases).

Characterization of Proviruses of env-ORFs
The nucleotide sequences of env-ORFs with 10-kbp downstream and upstream extensions were retrieved using  BEDtools v2.30.0 (Quinlan and Hall 2010). 5′-and 3′-LTRs were identified using LTRharvest (Ellinghaus et al. 2008) and LTRdigest (Steinbiss et al. 2009) included in GenomeTools v1.6.2. To obtain consensus sequences for the multicopy env genes (env-Tac3, env-Tac5, and env-Tac6), the proviral sequences were aligned with MAFFT as previously described. Then minor insertions were removed using T-coffee v11.0.8 (Notredame et al. 2000) with the "rm_gap 60" option. A consensus sequence for each env group was generated using the cons program in the European Molecular Biology Open Software Suite v6.5.7.0 (32). The ORFs in proviruses were visualized using ApE v3.0.6 (Davis and Jorgensen 2022). Characterization of proviral ORFs was based on the protein motif search in the HMMER web server (Finn et al. 2011). For the estimation of the integration time, 5′-and 3′-LTRs for each ERVs were aligned using MAFFT as previously described, and the p-distance was calculated using the distmat program in the EMBOSS with default parameters.

Analysis of Transcriptome Data
RNA-seq data of platypus (20 samples from 6 tissues) (Marin et al. 2017) and echidna (11 samples from 7 tissues) (Zhou et al. 2021) were obtained from the SRA database whose accession IDs were summarized in supplementary data set S6, Supplementary Material online. Low-quality reads were trimmed and filtered using fastp v0.19.5 with default options (Chen et al. 2018). The filtered reads were mapped to each reference genome using HISAT2 v2.1.0 (Pertea et al. 2016). For each RNA-seq dataset, we conducted genome-guided transcript assemblies using StringTie2 v2.1.6 with "−rf" options to specify the strandedness (Kovaka et al. 2019). The resultant GTF files and the RefSeq GTF file were merged using StringTie2 with the "−merge" option to generate a merged GTF file. To quantify the expression level of each env-ORF, we calculated transcripts per kilobase million (TPM) for 20 platypus and 11 echidna RNA-seq samples using the Stringtie2 program v2.1.6 with "-e −rf" options (Kovaka et al. 2019). Exons in the merged GTF file which overlapped with env-ORFs were identified using BEDtools intersect with "-s" option to specify the strandedness (Quinlan and Hall 2010), and TPM of genes whose exons were overlapped with env-ORF were regarded as expression levels of env-ORFs. For extraction of unique-mapped reads, we collected reads that had "NH:i:1" flag, which indicates they were uniquely mapped, in aligner-generated SAM files.

Plasmids
To construct the Env-Tac1 expression plasmid (phCMV3-Env-Tac1), dsDNA encoding the env-Tac1 ORF was synthesized by Eurofins Genomics (Tokyo, Japan). The synthesized dsDNA was inserted into EcoRI and BamHI sites of phCMV3 (#P003300, Genlantis, San Diego, CA, USA) using NEBuilder HiFi DNA Assembly Master Mix (#M5520AA, New England Biolabs, Ipswich, MA, USA). For the construction of the SNV-Env expression plasmid (phCMV3-SNV-Env), the SNV-Env-coding region was amplified from pPR102 (Dougherty et al. 1989) using PCR and inserted into the EcoRI and BamHI sites of phCMV3. To construct the human ASCT1 and ASCT2 expression plasmids (phCMV3-hsASCT1 and phCMV3-hsASCT2), human ASCT1 and ASCT2 were amplified from 293T cDNA and inserted into the EcoRI and BamHI sites of phCMV3. For the construction of the echidna ASCT1 and ASCT2 expression plasmids (phCMV3-tacASCT1 and phCMV3-tacASCT2), dsDNA fragments coding the ASCT1 and ASCT2 were synthesized based on the RefSeq sequences of XM_038750847.1 (ASCT1) and XM_038766835.1 (ASCT2) (Eurofins Genomics K.K., Tokyo, Japan) and inserted into the EcoRI and BamHI sites of phCMV3. For the construction of the echidna ASCT1 and ASCT2 expression piggyBac plasmids (pPB-tacASCT1 and pPB-tacASCT2), the GFP coding region of the piggyBac plasmid (#VB900088-2265rnj, VectorBuilder, Chicago, IL, USA) was removed by inverse PCR and was replaced with the echidna ASCT1 and ASCT2 fragments using NEBuilder HiFi DNA Assembly Master Mix. All PCRs described above were carried out using KOD One Master Mix (#KMM-101, TOYOBO, Osaka, Japan) with a C1000 Touch thermal cycler (Bio-Rad, Hercules, CA, USA).

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.