Endogenous parvoviral elements in the pit viper (Protobothrops mucrosquamatus) and in three members of three mammalian orders: implications for ecology and evolution of genera Amdoparvovirus and Protoparvovirus.

Amdoparvoviruses (family Parvoviridae: genus Amdoparvovirus) infect carnivores and are a major cause of morbidity and mortality in farmed mink. Relatively little is known about amdoparvovirus evolution, partly because so few endogenous parvoviral elements (PVe) derived from amdoparvovirus-like viruses have been identified. In this study, we systematically screened animal genomes to identify PVe showing a high degree of similarity to amdoparvoviruses, and investigated their genomic, phylogenetic and protein structural features. We report the first full-length, amdoparvovirus-derived PVe, in the genome of the Transcaucasian mole vole (Ellobius lutescens). Furthermore, we identify four further PVe in mammal and reptile genomes that are intermediate between amdoparvoviruses and protoparvoviruses (genus Protoparvovirus) in terms of their phylogenetic placement and genomic features. In particular, we identified a genome-length PVe in the genome of a pit viper (Protobothrops mucrosquamatus) that is protoparvovirus-like in its phylogenetic placement and in the structural features of its capsid protein (as revealed by homology modeling), but exhibits characteristically amdoparvovirus-like features, including (i) a putative middle ORF gene, (ii) the lack of a phospholipase A2 (PLA2) domain, and (iii) the putative transcription pattern of VP1. These findings indicate that either (i) amdoparvoviruses evolved from protoparvoviruses via a series of transitional forms, or (ii) there are as yet uncharacterised parvovirus lineages that possess a mixture of proto- and amdoparvovirus-like characteristics. Our investigation also provides evidence that the amdoparvovirus host range has extended to rodents in the past, and that reptilian parvoviruses exist outside of genus Dependoparvovirus.
Finally, we show that PVe in the mole vole and pit viper encode intact, expressible replicase genes, adding to a growing body of evidence that these genes have repeatedly been co-opted or exapted in vertebrate genomes.


BACKGROUND
The virus family Parvoviridae comprises small, single-stranded DNA viruses that infect vertebrate (subfamily Parvovirinae) and invertebrate (subfamily Densovirinae) hosts. The small (4-6 kb) genome is flanked by palindromic terminal repeats, which form hairpin-like secondary structures characteristic of each genus [1]. Although parvovirus genomes share little sequence similarity, they are highly conserved in overall structure, containing two large gene cassettes that encode the non-structural (NS) and structural (VP) proteins. The N-terminal region of the minor capsid protein VP1 includes a highly conserved phospholipase A2 (PLA2) motif, which is considered essential for escape from endosomal compartments after the virus enters the host cell [2]. The parvovirus capsid is icosahedral with T=1 symmetry, displaying a jelly roll fold of conserved β-sheets linked by variable surface loops, designated variable regions (VRs) I to IX [3].

Amdoparvovirus is a newly defined genus in the family Parvoviridae [4]. Its type species, Aleutian mink disease virus (AMDV), causes an immune complex-associated progressive syndrome in mink (family Mustelidae: genera Neovison and Mustela) called Aleutian disease or plasmacytosis. First described in 1956, Aleutian disease is currently considered one of the most important infectious diseases affecting farm-raised mink [5,6]. AMDV infection is known to be widespread in wild as well as farmed mink [5], and amdoparvoviruses have been identified in other carnivore species, including raccoon dogs [7], foxes and skunks [8][9][10][11]. However, relatively little is known about the biology of amdoparvovirus infection in the natural environment. For example, it is unclear whether amdoparvoviruses cause disease in wild animals, or whether the host range of the genus extends beyond carnivores.
Furthermore, relatively little is known about the long-term evolutionary history of amdoparvoviruses compared with some other parvovirus genera. In particular, the ancient origins of the Dependoparvovirus and Protoparvovirus genera have been demonstrated through the identification of endogenous parvoviral elements (PVe) at orthologous loci in distantly related mammalian species [12][13][14][15]. PVe are sequences homologous to parvovirus genomes that occur sporadically in eukaryotic genomes. They are presumed to derive from ancient parvovirus genomes via a form of horizontal gene transfer in which virus-derived DNA sequences are incorporated into the germline of the ancestral host species (presumably via infection of germline cells), such that they are inherited as host alleles [7]. PVe provide a rare and useful source of retrospective

METHODS
Genome screening in silico
Vertebrate genome assemblies were obtained from NCBI genomes (Table S1). Screening was performed using the database-integrated genome-screening (DIGS) tool (available from http://giffordlabcvr.github.io/DIGS-tool/). The DIGS procedure used to identify PVe comprised two steps. In the first, a parvovirus protein sequence (e.g. Rep or Cap) was used as a probe to search a genome assembly file using the tBLASTn program [23]. In the second, sequences that produced statistically significant matches to the probe were extracted and classified by comparison to a set of virus reference genomes (Table S2), this time using the BLASTx program [23]. Results were captured in a MySQL database.

We applied a systematic approach to naming PVe. Each element was assigned a unique identifier (ID) constructed from two components. The first component is the classifier 'PVe'. The second component is itself a composite of two distinct subcomponents separated by a period: (i) the name of the lowest-level taxonomic group (i.e. species, genus, subfamily, or other clade) into which the element can be confidently placed by phylogenetic analysis; and (ii) a numeric ID that uniquely identifies the insertion (for cases where multiple, closely related PVe occur in the same species genome).

Sequence analysis
ORFs were inferred by manual comparison of putative peptide sequences to those of closely related exogenous parvoviruses. The identified endogenous sequences were characterized and annotated using the Artemis Genome Browser

We screened whole genome sequence (WGS) assemblies of 688 animal species (Table S1) for PVe showing a high degree of similarity to amdoparvoviruses. Similarity searches using the replicase (NS) and capsid (VP) proteins of AMDV identified six such PVe (Table 1).
These sequences were identified in the genomes of five species: one reptile, the pit viper (Protobothrops mucrosquamatus), and four mammals, comprising three placental species (one rodent and two afrotherians) and one marsupial species, the Tasmanian devil (Sarcophilus harrisii).

In all cases, the genomic regions exhibiting similarity to amdoparvoviruses were located within large contigs, leaving little doubt that they represented PVe rather than contaminating virus. Further investigation of these loci revealed additional PVe sequences upstream and downstream of the amdoparvovirus-like regions initially identified via DIGS. We generated an overview of the six PVe loci, delineating the locations of promoters, polyadenylation signals, parvovirus genome features, and transposable element insertions (Figure 1). The relative lengths and locations of the preserved fragments, compared with the exogenous amdoparvovirus genome, are shown in

We identified two amdoparvovirus-derived sequences in the genome of the Transcaucasian mole vole (Ellobius lutescens). The first of these elements spanned a near-complete genome containing both the NS and VP genes, while the second spanned the majority of the NS gene, with no identifiable VP (Figure 2). Following the systematic approach described in the Methods section, these elements were given the identifiers PVe-Amdo-EllLut.1 and PVe-Amdo-EllLut.2, respectively (Table 1). For brevity, we refer to these elements as EllLut.1 and EllLut.2 in subsequent text.

Amdoparvovirus genomes encode a short middle ORF (M-ORF) of unknown function between the two major open reading frames (ORFs), NS and VP. As shown in Figure 1, a region of potentially protein-coding sequence corresponding to the M-ORF of AMDV is present in the first of these elements.
A methionine (M) residue that might represent the start codon of an M-ORF gene product could not be identified; however, this is also the case for several exogenous amdoparvovirus isolates.

The putative NS ORF of EllLut.1 has gaps relative to AMDV; indeed, the entire N-terminal region is absent up to residue 28G of the protein encoded by the AMDV NS gene. This gap in NS is apparently due to a deletion, since the corresponding region is present in EllLut.2. A partial VP ORF could be identified downstream of the NS gene, corresponding to VP1u and the N-terminus of VP2, as well as nucleotides encoding the last 173 aa of the C-terminus. Interestingly, the complete missing fragment could be identified downstream of the NS ORF, surrounded by LINE and SINE elements. None of these VP homologues contained a PLA2 motif. The VP1u-encoding sequence included a possible intron, consistent with the typical VP1u transcription pattern of amdoparvoviruses [22]. The putative VP ORF encoded by EllLut.1 has a gap relative to the AMDV VP that spans most of the 5' region of the gene, presumably because these sequences were deleted. Frameshifting mutations are present in the NS pseudogene of EllLut.1, and one is present in its VP pseudogene. The element is integrated into a locus that is homologous to mouse chromosome 12.

The EllLut.2 element comprises the NS gene alone (Figures 1 and 2). This element is integrated into a locus immediately adjacent to the sequences encoding the MAF BZIP transcription factor G (MAFG) gene, which in the mouse genome is located in the 11qE2 region of chromosome 11. The otherwise intact NS gene lacks a methionine start codon; however, a conventional ATG start codon located just upstream of the three stop codons that disrupt the MAFG gene could initiate translation of a MAFG-NS fusion product (Figure 1).
The identification of a potential promoter sequence downstream of the MAFG gene supports the existence of such a fusion protein.

Phylogenies based on both VP and NS aa sequences showed that the two mole vole elements were relatively closely related and grouped robustly within the clade defined by exogenous amdoparvoviruses (Figure 3). We identified empty integration sites in the E. talpinus genome at the loci where the EllLut.1 and EllLut.2 elements are integrated in E. lutescens. This indicates that both elements were integrated into the E. lutescens germline after these two species diverged ~10 million years ago (MYA) [33,34]. It seems unlikely, however, that either element was integrated very recently. Firstly, the degraded nature of EllLut.1 indicates that it has been resident in the germline for some time. Furthermore, the genomes of two E. lutescens individuals have been sequenced (genomic DNA was obtained from the livers of a male and a female individual). Both PVe were present in both individuals, and the EllLut.2 element in the female had a 13-14 bp deletion relative to that in the male.

The identification of empty integration sites in E. talpinus facilitated the identification of the genomic flanking regions for both PVe. In the case of EllLut.2, the genomic flanks occur close to the coding region of the NS gene, suggesting that the element was derived from an mRNA that was reverse transcribed and integrated into the nuclear genome of an ancestral germline cell. By contrast, EllLut.1 appears to be derived from genome-length nucleic acid, although its partial, otherwise non-disrupted downstream VP ORF might have gone through the same integration process as the NS gene of EllLut.2.
In addition to containing regions of both NS and VP, this element includes a region of 3' sequence, between the end of VP and the beginning of the genomic flanking sequence, that exhibits similarity to the 3' untranslated region of AMDV and contains inverted repeats capable of folding into a stem-loop structure (data not shown). We could not, however, identify sequences corresponding to the 5' UTR in the EllLut.1 element. No methionine start codon could be identified at the start of the NS pseudogene encoded by this element.
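Two of the sequence features discussed in this section reduce to simple string checks: whether an upstream ATG could read through into a coding region that lacks its own start codon (as proposed for the MAFG-NS fusion), and whether a region contains inverted repeats able to form a stem-loop. The sketch below is hypothetical. The function names, the fixed stem length, the loop limits and the toy sequences are illustrative assumptions. The ORFs in this study were inferred by manual comparison, not with this code, and exact-match stem detection ignores folding energetics, which dedicated folding software would assess.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def in_frame_stops(seq, start, end):
    """0-based positions of stop codons in the frame that begins at `start`."""
    seq = seq.upper()
    return [i for i in range(start, end - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def nearest_upstream_atg(seq, region_start):
    """Closest in-frame ATG upstream of `region_start`, or None.

    Walks back codon by codon; returns None if an in-frame stop codon
    (or the start of the sequence) is reached before any ATG, i.e. when
    no upstream start codon could read through into the region.
    """
    seq = seq.upper()
    i = region_start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon == "ATG":
            return i
        if codon in STOP_CODONS:
            return None
        i -= 3
    return None


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters kept as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_stems(seq, stem_len=6, min_loop=3, max_loop=20):
    """(left, right) start positions of exact inverted-repeat stem pairs.

    seq[right:right + stem_len] is the reverse complement of
    seq[left:left + stem_len], with a loop of min_loop..max_loop nt between.
    """
    seq = seq.upper()
    hits = []
    for left in range(len(seq) - stem_len + 1):
        arm = reverse_complement(seq[left:left + stem_len])
        first = left + stem_len + min_loop
        last = min(left + stem_len + max_loop, len(seq) - stem_len)
        for right in range(first, last + 1):
            if seq[right:right + stem_len] == arm:
                hits.append((left, right))
    return hits


# Toy sequences (invented for illustration, not taken from any PVe):
print(nearest_upstream_atg("ATGAAACCC" + "GGGTTT", 9))  # 0: open frame from ATG
print(nearest_upstream_atg("ATGTAACCC" + "GGGTTT", 9))  # None: TAA blocks read-through
print(in_frame_stops("ATGAAATAGCCC", 0, 12))            # [6]
print(find_stems("GCGCAA" + "TTTT" + "TTGCGC"))         # [(0, 10)]
```

The read-through check mirrors the argument made for EllLut.2: an upstream ATG is only relevant if the frame between it and the NS coding region stays open.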

Amdoparvovirus-like PVe in the pit viper germline
We identified a PVe in the genome of the pit viper (Protobothrops mucrosquamatus) that encoded a nearly complete parvovirus genome. The entire PVe sequence was ~4.5 kb in length and was integrated into the complementary strand of the Protobothrops genome. It comprised two major ORFs and a minor ORF, as well as a clearly identifiable and potentially functional downstream promoter. Furthermore, two polyadenylation signals could be identified. The pit viper PVe was flanked by partial palindromic repeats resembling the amdoparvoviral hairpin structures (Figure 4).

The first major ORF exhibited a relatively high degree of aa identity to the AMDV NS protein (35%, with no deletions). The second major ORF, which was disrupted by two stop codons and two frameshifts, was homologous to amdoparvovirus VP-encoding genes (36% aa identity with skunk amdoparvovirus VP). This ORF did not possess a conventional Met start codon to express VP1; however, a three-aa leader exon identified upstream may provide one, as in amdoparvoviruses. When aligned with the EllLut.1 and exogenous amdoparvovirus VP aa sequences, the pit viper VP showed deletions of various lengths, limited almost exclusively to the variable regions (Figure 5a). These VRs form surface loops responsible for host-virus interactions, such as immunogenicity and receptor attachment. The only insertion, six aa long, was present in VR VIII (Figure 5a). In contrast, the EllLut.1 VP displayed an organization very similar to that of the exogenous amdoparvovirus VPs. Despite lacking the PLA2 domain, all exogenous amdoparvovirus VP aa sequences contain a polyglycine (poly-G) stretch, which is suspected to be responsible for externalizing VP1u, enabling PLA2 enzymatic function (where present) and exposing the nuclear localization signal (NLS) [35].
The poly-G stretch was present in the predicted pit viper VP sequence, but absent from the EllLut.1 VP (Figure 6). Interestingly, the VP1u regions of both PVe contained the putative NLS.

The NS gene of the pit viper PVe clustered as an outgroup to a clade containing the mole vole PVe and exogenous amdoparvoviruses (Figure 3a). However, this relationship was not strongly supported by bootstrap values. The pit viper PVe locus occurs on a contig that has not been mapped to a specific chromosome. Nevertheless, the pre-integration locus could be identified in WGS data from two other reptilian species: the Burmese python (Python bivittatus) and a colubrid, the common garter snake (Thamnophis sirtalis). Both genomes contain the LINE transposon of the RTE-BovB class [36] but lack any parvovirus-homologous sequences.

A small ORF was identified between the putative NS and VP genes. Its position, similar to that of the amdoparvovirus M-ORF, suggests that this ORF could represent a highly divergent form of the amdoparvovirus M-ORF gene.
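Percent amino acid identity values such as the 35% and 36% quoted in this section depend on how alignment columns are counted. A minimal sketch follows, assuming identity is computed over the ungapped columns of a pairwise alignment; this is one common convention, and the convention used for the values above is not stated. The function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Columns where either sequence has a gap are skipped (an assumed
    convention); the comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment (invented): 4 identical residues out of 5 ungapped columns.
print(percent_identity("MKT-LV", "MRTALV"))  # 80.0
```

Counting gapped columns in the denominator would give 4/6 (about 66.7%) for the same toy alignment, which is why identity figures are only comparable when computed the same way.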

Amdoparvovirus-like PVe in marsupial and afrotherian germlines
We identified three additional short, fragmentary matches to amdoparvoviruses in mammalian genomes (Table 1). One of these, identified in the Cape hyrax (Procavia capensis), has been reported previously [14]. We identified two additional matches of similar length in the Tasmanian devil (Sarcophilus harrisii) and aardvark (Orycteropus afer) genomes, both of which turned out to contain complete or near-complete VP genes (Figures 1 and 2). As all three of these sequences were heavily disrupted by stop codons, frameshifts and retrotransposable elements, their former coding sequences were present only as small fragments. With the exception of the aardvark PVe, which harbored a partial, highly disrupted NS homologue of 343 aa, only minimal traces of the former replication protein-encoding genes could be detected (Figure 1). At the N-terminus of their putative VP1 sequences, a well-preserved calcium-binding loop of the PLA2 domain could be identified; however, the catalytic core was barely recognizable in the Cape hyrax PVe and completely absent in the aardvark PVe (Figure 6). As in the EllLut.1 VP, the poly-G stretch was absent in all three cases, although only the Cape hyrax PVe lacked the NLS.

In phylogenies based on NS (Figure 3a), the aardvark PVe grouped together with the Mpulungu bufavirus of shrews [18], as a robustly supported sister group to rodent, ungulate and carnivore protoparvoviruses. VP-based phylogenies unite the PVe from the pit viper, Tasmanian devil, hyrax and aardvark in a sister clade to protoparvoviruses (with the exception of the bufavirus), but with low support. The clustering of the Tasmanian devil PVe as an outgroup to the afrotherian PVe was well supported.
(Figure 4b and c).

In this study, we screened animal genomes in silico to identify PVe showing a high degree of similarity to amdoparvoviruses. We identified six such sequences, and used comparative approaches to investigate them.
Our analysis revealed new information about the biology of amdoparvoviruses and their evolutionary relationships with protoparvoviruses. We also found comparative evidence that some of the parvovirus replicase genes we identified may have been co-opted or exapted by the host species genomes in which they occur.

The genomic fossil record of amdoparvovirus-like parvoviruses
The most conspicuously amdoparvovirus-like PVe were identified in the genome of the Transcaucasian mole vole. These two elements, which share ~80% identity at the amino acid level, grouped robustly within the amdoparvovirus clade in ML phylogenies (Figure 3). They also exhibit characteristic features that support their grouping within the genus Amdoparvovirus, including the absence of the PLA2 domain from the predicted VP protein sequence and the presence and location of the intron in their VP1u-encoding region (Figures 1 and 6). No other PVe grouped within the genus Amdoparvovirus. However, we identified several that displayed a mixture of amdoparvovirus and protoparvovirus features. In the pit viper element, certain aspects of genome organization suggested an amdoparvoviral origin, specifically (i) the presence of a single promoter and two polyadenylation signals, and (ii) the attributes of the VP1u intron and the absence of PLA2. In phylogenetic terms, however, these PVe appear to be marginally more closely related to protoparvoviruses than to amdoparvoviruses (Figure 3).

Our phylogenies support a common evolutionary origin for: (i) genus Amdoparvovirus; (ii) genus Protoparvovirus; (iii) the recently described 'bufaviruses', which appear to represent highly divergent protoparvoviruses [18,40]; and (iv) PVe derived from a basal, amdoparvovirus-like protoparvovirus lineage (referred to hereafter as 'AP').
Within the AP lineage, the pit viper PVe consistently groups in a basal position (Figure 3), close to the split between the Amdoparvovirus and Protoparvovirus genera. Furthermore, this PVe harbors the most amdoparvovirus-like characteristics of the AP stem group. This suggests it might derive from a virus ancestral to contemporary amdoparvoviruses, potentially implying that genus Amdoparvovirus originated in lower vertebrate hosts, possibly reptiles. In this scenario, the afrotherian and marsupial PVe of the AP stem group represent transitional forms along a pathway from the complete PLA2 domain of protoparvoviruses to the PLA2-lacking VP1u of amdoparvoviruses. Thus, the absence or heavy degradation of the catalytic core in these PVe might reflect different stages of PLA2 loss during evolution. However, these PVe could also be remnants of an as yet unknown parvovirus lineage, distinct from Amdoparvovirus, that combined features of both amdo- and protoparvoviruses and may now be extinct.
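The support statements used throughout this comparison ("robustly supported", "low support", bootstrap values) rest on nonparametric bootstrap resampling of alignment columns. The resampling step itself can be sketched as follows; tree inference on each replicate is out of scope. This is a minimal illustration under stated assumptions: the function names and the toy alignment (taxon labels and residues) are hypothetical and do not reproduce the alignments analysed here.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate of a multiple sequence alignment.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Columns are sampled with replacement, and the same column indices are
    applied to every taxon so that homologous sites stay together.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=1):
    """Generate reproducible pseudo-replicates using a seeded RNG."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (invented residues, illustrative labels only).
toy = {"AMDV": "MKTAYIAK", "EllLut1": "MKSAYLAK", "PitViper": "MRSGYLEK"}
reps = bootstrap_replicates(toy, n_replicates=3)
for rep in reps:
    print(rep)
```

A tree is then inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it. In practice, phylogenetics software performs both steps; the sketch only shows what resampling columns means.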

Evolution of amdoparvovirus and protoparvovirus capsid proteins
We used homology modelling to infer the structures of the capsid (VP) proteins encoded by the ancestral parvoviruses that gave rise to the mole vole and pit viper PVe. Comparison of the predicted VP protein structures with those of exogenous amdoparvoviruses revealed that most differences are limited to highly variable regions, i.e. the VR loops and VP1u. Currently, we cannot be certain whether these differences reflect changes that occurred prior to germline incorporation. However, as the VRs are situated on the virion surface and play an important role in receptor attachment and host immune interactions, this question should be amenable to experimental investigation [37]. All known amdoparvoviruses infect carnivore hosts. Since successful infection and replication in a carnivore cell versus a rodent or reptilian cell likely requires very different surface features, we might expect to see a high degree of divergence in these regions. Consistent with this, the number of deletions in the VRs of the putative pit viper PVe VP protein sequence was relatively high compared with that in the mole vole PVe. Fundamental differences in the reptile antiviral response (namely, a greater reliance on innate, non-specific immune processes as opposed to adaptive ones) could account for this [38]. This idea has previously been proposed to explain the smooth surface features of invertebrate-infecting densoviruses [39].

The results of homology modelling should always be interpreted with caution; nevertheless, the more protoparvovirus-like capsid appearance of both PVe is intriguing, as it is consistent with the close evolutionary relationship between the Amdoparvovirus and Protoparvovirus genera suggested by the phylogenies. Possibly, genus Amdoparvovirus has changed tropism or receptor specificity during its evolution.
In other words, these PVe might share an ancestral tropism or receptor specificity with carnivore protoparvoviruses, in contrast to AMDV, hence their similar appearance.

As in the case of the VRs, we cannot say for certain whether the inferred loss of PLA2 function (see Figure 3) occurred before or after integration; however, the otherwise intact and clearly recognizable calcium-binding loop suggests that it occurred before. Potentially, the loss of the poly-G stretch from the capsid (see Figure 6) occurred in the progenitor virus, and could be related to the loss of PLA2 function. The insertion in VR VIII of the pit viper VP is a unique finding of unknown significance; however, dependoparvovirus-derived PVe in marsupial genomes have also been found to harbor extended VR VIIs, similar to the VP encoded by AmdoPVe-EllLut [40].

Timescale of amdoparvovirus evolution
Unfortunately, the present data do not allow us to establish minimum ages for any of the PVe described here, since we did not identify any orthologous PVe in related species. Nevertheless, the mutational degradation observed in many of these elements suggests that their origins are likely to be as ancient as those of other PVe (i.e. extending back many millions of years). Furthermore, in the case of the mole vole, the presence of two EllLut.2 alleles was indicated (one containing a deletion), suggesting that, at the very least, these elements have been present in the species gene pool for multiple generations.

While we cannot be certain how recently in evolutionary history the mole vole and pit viper PVe were generated, we could establish maximum age bounds for these elements, based on the identification of empty integration sites in related species.
In the case of the pit viper PVe, identification of an empty orthologous insertion site in colubrid snakes established that incorporation into the viper germline occurred within the last 34-54 Mya [41]. In the case of the mole vole, empty insertion sites were identified in the most closely related sister taxon (E. talpinus), establishing that the PVe in E. lutescens were incorporated after these species diverged ~10 Mya. In reality, the mole vole elements could be much younger than this, but even if we assume they are ~10 million years old, we might still expect to observe the close relationship with contemporary amdoparvoviruses indicated by our phylogenies, despite the rapid rates of evolution observed in parvoviruses. This is due to the time-dependent bias of viral evolutionary rates, which can vary by several orders of magnitude depending on the timeframe of measurement [42]. In fact, given that the timescale of parvovirus evolution has already been shown to extend back many millions of years, we think it is likely that amdoparvoviruses have similarly ancient origins, and that the identification of additional PVe will eventually confirm this.
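The age-bracketing logic used above can be made explicit. An orthologous insertion shared with a relative gives a minimum age (the divergence time from that relative), whereas an empty pre-integration site gives a maximum age. The sketch below encodes this rule with hypothetical names. It ignores complications such as incomplete lineage sorting, precise deletion of an element, or assembly gaps at the locus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Comparison:
    species: str              # relative examined at the orthologous locus
    divergence_mya: float     # time since it split from the host lineage
    insertion_present: bool   # is the orthologous PVe present there?


def age_bounds(comparisons: List[Comparison]) -> Tuple[Optional[float], Optional[float]]:
    """(minimum_age, maximum_age) of an insertion in Mya; None = unconstrained.

    Sharing the insertion with a relative means it predates that split
    (the oldest such split gives the minimum age); an empty pre-integration
    site means it postdates that split (the youngest such split gives the
    maximum age).
    """
    shared = [c.divergence_mya for c in comparisons if c.insertion_present]
    empty = [c.divergence_mya for c in comparisons if not c.insertion_present]
    return (max(shared) if shared else None, min(empty) if empty else None)


# Mole vole case from the text: empty site in E. talpinus (~10 Mya split),
# no orthologous PVe found anywhere -> no minimum, maximum ~10 Mya.
print(age_bounds([Comparison("Ellobius talpinus", 10.0, False)]))  # (None, 10.0)
```

For the pit viper element, the same rule would take the colubrid divergence (34-54 Mya in the text) as the maximum age; a single number must be chosen from that range, and the upper value is the more conservative choice.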

Insights into amdoparvovirus and protoparvovirus host range
Our findings suggest that the apparently restricted host range of amdoparvoviruses (order Carnivora) may simply reflect limited sampling. Recently, a partial amdoparvovirus capsid sequence was identified via metagenomic screening of samples derived from the least horseshoe bat (Rhinolophus pusillus), suggesting that bats (order Chiroptera) may also harbour amdoparvoviruses. However, as with all virus sequences recovered via metagenomic sequencing, there remains a degree of uncertainty regarding host associations [43], whereas the identification of clearly amdoparvovirus-derived PVe in the genome of the Transcaucasian mole vole constitutes unequivocal evidence that amdoparvoviruses have infected rodents in the past.

The pit viper PVe described here is the first to be identified in a reptile genome and provides the first evidence for the existence of reptilian parvoviruses outside of the genus Dependoparvovirus. Interestingly, Dependoparvovirus, the other genus of Parvoviridae known to infect reptiles, is also suspected to be of reptilian origin [1].

PVe-encoded replicase genes appear to have been exapted or co-opted
Amongst the various parvovirus-derived EVEs reported so far, several have contained intact or nearly intact replicase genes [44,45]. In this report, we identify yet another relatively intact, independently acquired parvovirus replicase in a rodent genome. Intriguingly, analysis of the locus indicated that this replicase may be expressed as a fusion protein with the partial MAFG gene product. As MAFG is a transcription factor, this fusion protein might have gained new, beneficial functions related to the host cell's transcription machinery and gene expression. Furthermore, the PVe we identified in the pit viper genome also appears to encode an intact, expressible parvovirus replicase gene, and represents the first non-mammalian example of this phenomenon. Fixation of any given PVe is extremely unlikely and, when it does occur, would be expected to take many host generations. Thus, the observations that (i) PVe encoding replicase genes have been independently acquired by multiple vertebrate species, and (ii) the capacity of these PVe to express mRNA and protein has apparently been maintained, suggest that these genes are being functionalised and selected for in some as yet undetermined way.

Conclusions
Our analysis of PVe indicates that either (i) amdoparvoviruses and protoparvoviruses evolved from a common ancestor via a series of transitional forms, or (ii) there are, or have been in the past, parvovirus lineages that possess a mixture of proto- and amdoparvovirus-like characteristics. Our investigation also provides insight into parvovirus host range, suggesting that amdoparvoviruses likely infect a broad range of mammalian species, and that a lineage of protoparvovirus-like viruses infects reptiles and might have given rise to today's exogenous amdoparvoviruses.
Finally, we provide further evidence that replicase genes of PVe derived from multiple Parvovirinae genera have been co-opted or exapted in vertebrate genomes.

structure for all three further models. The bar shows the distance from the capsid center in Ångströms, and the structures are colored accordingly. The pentagon marks the five-fold axis, the triangles the three-fold axis, and the two-fold axis is indicated by an ellipse. The arrows mark the VRVIII region of the pit viper EVE and the VRVII region of the Transcaucasian mole vole EVE (AmdoPVe-EllLut-1) capsids, which contain the only insertions compared to amdoparvovirus VR regions. (C) Ribbon diagrams of the VRVII (left) and VRVIII (right) loops of CPV (blue), Aleutian mink disease virus (AMDV) (black), AmdoPVe-EllLut-1 (pink) and the pit viper EVE (yellow) capsid models and structures.

Figure 6. Alignment of the minor capsid protein VP1 N-termini of representative members of the genera Protoparvovirus (yellow) and Amdoparvovirus (cyan), including the VP1 unique region and the location of the phospholipase A2 (PLA2) domain, if present (bold). The calcium-binding site is marked with blue arrows and black arrows highlight the catalytic core. A glycine-rich stretch is present at the beginning of the region overlapping the major capsid protein VP2 (framed), which is suspected to be responsible for conformational changes during endosomal escape. As in amdoparvoviruses, the pit viper and vole PVe completely lack the PLA2 domain, whereas only the catalytic core is absent in the aardvark EVE. Remnants of the G-rich region are recognizable in the Tasmanian devil PVe, but are entirely absent in the aardvark, rock hyrax and vole PVe. The putative nuclear localization signal (underlined) is present in all endogenous sequences with the exception of the rock hyrax PVe. Abbreviations: EllLut-1, Transcaucasian mole vole endogenous amdoparvovirus; PV, parvovirus.