Differential Evolution of Antiretroviral Restriction Factors in Pteropid Bats as Revealed by APOBEC3 Gene Complexity

Abstract
Bats have attracted attention in recent years as important reservoirs of viruses deadly to humans and other mammals. These infections are typically nonpathogenic in bats, raising questions about innate immune differences that might exist between bats and other mammals. The APOBEC3 gene family encodes antiviral DNA cytosine deaminases with important roles in the suppression of diverse viruses and genomic parasites. Here, we characterize pteropid APOBEC3 genes and show that species within the genus Pteropus possess the largest and most diverse array of APOBEC3 genes identified in any mammal reported to date. Several bat APOBEC3 proteins are antiviral, as demonstrated by restriction of retroviral infectivity using HIV-1 as a model, and recombinant A3Z1 subtypes possess strong DNA deaminase activity. These genes represent the first group of antiviral restriction factors identified in bats with extensive diversification relative to homologues in other mammals.


Introduction
Bats are reservoirs of emerging viruses that are highly pathogenic to other mammals, including humans. In recent years, viruses such as Ebola, Marburg, Hendra, Nipah, Sosuga, and severe acute respiratory syndrome coronavirus (SARS-CoV) have crossed species barriers from their natural hosts into humans (Calisher et al. 2006; Smith and Wang 2013; Amman et al. 2015; Hayman 2016). Intriguingly, experimental infections have demonstrated that specific bat species can be infected with Ebola, Marburg, SARS-CoV, Hendra, and Nipah viruses without the development of clinical symptoms (Swanepoel et al. 1996; Williamson et al. 1998; Middleton et al. 2007; Watanabe et al. 2010; Jones et al. 2015). This has led to efforts to uncover differences that might exist between the antiviral strategies of bats and other mammals (Zhang et al. 2013; Zhou et al. 2016).
Based upon studies of endogenous retroviruses, bats are an important source of mammalian retroviruses and many cross-species transmissions of retroviruses from bats to other mammals have been documented (Hayward et al. 2013; Cui et al. 2015). A number of antiviral restriction factors are known to be capable of inhibiting the normal replication of retroviruses; among the most well studied of these are the members of the APOBEC3 (A3) protein family. A3 proteins are capable of restricting the replication of several virus families including retroviruses, hepadnaviruses, and parvoviruses, in addition to retrotransposons, which are genomic parasites (Chen et al. 2006; Muckenfuss et al. 2006; Renard et al. 2010). A3 proteins are DNA cytosine deaminases capable of hypermutating viral genomes and sterically hindering nascent DNA synthesis (Sheehy et al. 2002; Lecossier et al. 2003; Refsland and Harris 2013). All A3 proteins contain one or two zinc-coordinating cytosine deaminase domains (Z-domains), which are required for deaminase activity (Chiu and Greene 2008). A3 proteins are divided into three phylogenetic subgroups, Z1, Z2, and Z3, which are distinguished by the conservation of amino acid residues within the Z-domain (LaRue et al. 2009; Refsland and Harris 2013). We hypothesized that, given the prolonged history of coexistence between bats and their retroviruses, differences exist relative to other mammals in the antiviral genes responsible for retroviral restriction.
Pteropid bats are species within the family Pteropodidae, and are implicated as reservoirs of emerging human viral pathogens (Hayman 2016). Here, we report assemblies of the A3 loci in the closely related pteropids, Pteropus vampyrus and Pteropus alecto, which harbor Nipah and Hendra viruses, respectively (Halpin et al. 2000; Rahman et al. 2010). Several bat genomes (Fang et al. 2015) have been published in recent years; however, no available assembly has resolved the A3 gene locus. This failure is likely due to the presence of many repetitive sequences, generated by extensive gene duplication and recombination, which are known to hinder genome assembly (Zhang et al. 2013; Fang et al. 2015).
In this study, pteropid A3 loci were generated through the step-wise extension of cDNA-mapped gene scaffolds followed by remapping of sequence read archives, supported experimentally by an A3 expression analysis of bat spleen tissue. The functionality of P. alecto A3 subtypes was investigated through analysis of their ability to deaminate DNA and restrict the infectivity of a model retrovirus. The role of A3-mediated restriction of ancient bat retroviruses was assessed using a hypermutation analysis of endogenous retroviruses within the genome of P. vampyrus.

Results
Pteropid Bats Express a Diverse Range of APOBEC3 Gene Products
Pteropid A3 gene expression was initially determined by searching for A3 homologues in the P. alecto transcriptome database through a tBLASTn analysis using the human A3 proteins A3A, A3C, and A3H, representing the three zinc-coordinating cytosine deaminase motif (Z-domain) subtypes A3Z1, A3Z2, and A3Z3, as search queries. A3 expression was subsequently assessed through analysis of cDNA (complementary DNA) generated from P. alecto spleen tissue, which harbors multiple immune cell types known to express A3 genes in humans (Refsland et al. 2010). This analysis revealed 20 distinct A3 mRNA transcripts derived from seven A3Z1 genes, five A3Z2 genes, and a single A3Z3 gene (supplementary table S1, Supplementary Material online). Pairwise comparison of these A3 transcripts (supplementary fig. S1, Supplementary Material online) reveals a wide range of sequence identities between pteropid A3 genes, indicating considerable diversity among homologs: A3Z1 genes range from 86% to 97% identity, whereas A3Z2 genes range from 54% to 99% identity.
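As an illustration of how pairwise identity ranges of this kind can be computed, the following minimal Python sketch calculates percent identity between pre-aligned sequences. It assumes that the sequences are already aligned to equal length and that columns containing a gap in either sequence are ignored; the exact definition of identity used for supplementary fig. S1 is not stated here, and the toy sequences are hypothetical, not A3 transcripts.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_range(aligned):
    """Minimum and maximum pairwise identity across a dict of aligned sequences."""
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    ]
    return min(values), max(values)


# Hypothetical aligned toy sequences (illustration only).
toy = {
    "seq_a": "ACGTACGTAC",
    "seq_b": "ACGTACGTAA",
    "seq_c": "ACGAACGAAA",
}
lowest, highest = identity_range(toy)
```

Applied to the members of a single subtype, a range computed in this way corresponds to statements such as "86% to 97% identity".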
In addition to the expected A3Z1, A3Z2, and A3Z3 subtypes, several A3 amino acid sequences contained Z-domains distinct from the canonical domains as defined by a specific combination of key conserved residues (LaRue et al. 2009). This group, designated A3Z2B, carried both the A3Z2 domain-specific tryptophan-phenylalanine (WF) residues and the A3Z1 domain-specific arginine-adjacent isoleucine (RI) motif (fig. 1A). Of the 20 unique transcripts, 19 harbored a single Z-domain and one contained two Z-domains, an N-terminal A3Z2B domain paired with a C-terminal A3Z3 domain. These results indicate that pteropid bats express a more diverse repertoire of A3 mRNA than reported to date for other mammals (fig. 1C).

Generation of the Pteropid APOBEC3 Gene Locus
The generation of the pteropid A3 loci was accomplished in a step-wise manner outlined in supplementary fig. S2, Supplementary Material online. The A3 gene locus in Eutherian mammals is located between the CBX6 and CBX7 genes (LaRue et al. 2008). BLASTn searches using the P. alecto A3 cDNA products as queries revealed that publicly accessible bat genomes of P. alecto and P. vampyrus did not contain assembled A3 gene loci located within the boundaries of CBX6 and CBX7. However, the bat A3 cDNA could be mapped, albeit across numerous discontiguous scaffolds, to the P. vampyrus genome accessible through the Ensembl database. The A3 locus contains many large, repetitive sequence elements that result in the failure of automated assembly. This problem was overcome by matching exons derived from sequenced cDNA, allowing the informed pairing of discontiguous scaffolds. This was not possible using the P. alecto genome as it did not contain sufficient assembled scaffolds to which A3 cDNA products could be mapped. It was reasonable to map P. alecto cDNA products against the P. vampyrus genome as these species are very closely related, with a recent divergence of ~4.4 Ma and 97-99% genomic nucleotide sequence identity (Almeida et al. 2014). The P. vampyrus A3 locus was then generated through a step-wise extension of preexisting P. vampyrus gene scaffolds to join the gaps between exon-paired scaffolds, followed by remapping of the sequence read archives (SRA) to validate and generate the 190 kb P. vampyrus A3 locus. The P. alecto A3 locus was generated by mapping P. alecto reads against the P. vampyrus A3 locus and was found to be slightly shorter at approximately 188 kb as a result of multiple short 3-20 nt insertions and deletions. These indels occurred in the A3 loci since the divergence of P. alecto and P. vampyrus, likely as a result of nonhomologous recombination events, the frequency of which is known to be increased in regions of the genome that contain sections of repetitive DNA (Metzenberg et al. 1991).
Within the P. alecto A3 locus, 18 gene coding regions were identified, including 12 A3Z1, one prototypical A3Z2 (herein described as A3Z2A), four A3Z2B, and one A3Z3 (fig. 1B). The organization and number of the A3 genes was found to be identical between P. vampyrus and P. alecto. Neural network promoter prediction revealed promoter elements immediately upstream of each gene. Interferon-stimulated response element (ISRE) transcription factor binding sites for interferon regulatory factors and NF-κB were also identified in each promoter region (supplementary data S1, Supplementary Material online). These data indicate that each pteropid A3 gene is most likely capable of being transcribed and that their expression is potentially upregulated as part of the antiviral interferon response.
Differential Evolution of Antiretroviral Restriction Factors in Pteropid Bats. doi:10.1093/molbev/msy048 MBE
All P. alecto A3 cDNA sequences were successfully mapped against the assembled P. alecto loci, revealing various alternative splicing schemes that allow many of the A3 genes to express numerous transcripts (supplementary fig. S3 and table S1, Supplementary Material online). A comparison of the A3 loci of P. vampyrus and P. alecto against the A3 loci of other mammals (fig. 1C) reveals that pteropid bats possess the largest A3 locus yet reported, containing 18 putative A3

Bats Possess a Novel Subtype of APOBEC3
Analysis of the phylogenetic relationships between the A3 proteins of P. alecto and other mammals (fig. 2) confirms the expected phylogenetic positions of the A3Z1, A3Z2A, and A3Z3 proteins, and reveals that the A3Z2B proteins, containing both Z1- and Z2-defining sequence motifs, phylogenetically cluster within the broader A3Z2 clade. Notably, pteropid A3Z2A and A3Z2B are positioned a greater sequence distance from each other than from the A3Z2 proteins of distantly related mammals, revealing that pteropid bats possess four phylogenetically distinct A3 Z-domain subtypes, two of which appear to be derivatives of the Z2 clade.

Bat APOBEC3 Proteins Are Potent DNA Mutators
We tested the activity of the bat A3 proteins for which expression was verified (supplementary table S1, Supplementary Material online) by assessing their ability to cause mutations in E. coli through a rifampicin mutagenesis assay, utilizing human A3B and A3G as positive controls because they are known to be expressed and enzymatically active in E. coli (Holden et al. 2008; Shi et al. 2015). Mutations in the rpoB gene increase with mutagenic stress mediated by A3 (Harris et al. 2002; Shi et al. 2015). Mutational frequencies above background levels were only observed for A3Z1 proteins, and we found that four bat A3Z1s showed a 10-fold or greater increase in mutation frequency (fig. 3A). In particular, A3Z1c isomer B showed over 1000-fold increase above empty vector and 100-fold above A3Z1c isomer A. To determine if the difference in mutation frequency correlated with differences in mutation signatures, we sequenced a PCR-amplified portion of the rpoB gene known to harbor hotspots for rifampicin resistance. Bat A3 mutation signatures were compared against human A3B and A3G (fig. 3B), which have dinucleotide sequence preferences for 5′-TC and 5′-CC, respectively. We found that bat A3Z1c isomer A was highly specific for causing mutation at a known 5′-TCG hotspot at nucleotide position 1585, which is also the preferred target site for A3B (Shi et al. 2015). In contrast, A3Z1c isomer B targeted sites at positions 1565, 1585, and 1592, which have 5′-TCT, 5′-TCG, and 5′-TCC contexts, respectively. Decreased specificity may account for the high mutation frequency observed in the rifampicin mutagenesis assays. These results show that bat A3 proteins are functionally active and can mutate cytosines in a diverse range of dinucleotide contexts.

Bat APOBEC3 Proteins Restrict HIV-1 Infectivity
The antiviral activity of bat A3 proteins was determined by assessing the capacity of representatives of each bat A3 subtype to restrict HIV-1 infectivity in target cells. HIV-1 encodes the anti-A3 countermeasure, Vif, a protein that causes the proteolytic degradation of A3 through the recruitment of the cellular E3 ubiquitin ligase (Marin et al. 2003; Sheehy et al. 2003; Kane et al. 2015). HIV-1 stocks were generated by cotransfection of 293T producer cells with bat A3 and an HIV-1 variant that does not express Vif. HIV-1 particles were harvested from the cell culture supernatant, and virion production was determined by quantifying virion-associated reverse transcriptase (RT) activity. HIV-1 infectivity was assessed by inoculating human TZM-bl target cells, which express luciferase in the presence of HIV-1 infection, with virus normalized to equivalent levels of virion-associated RT activity. Four bat A3 proteins demonstrated a dose-dependent inhibition of HIV-1 infectivity (fig. 4). The sole double Z-domain bat A3 protein, A3Z2Bd-Z3, produced the largest reduction in HIV-1 infectivity, which was comparable to inhibition by the human A3 protein, A3G. These data indicate that bat A3 proteins are functional restriction factors.
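The readout of this assay can be sketched as simple arithmetic. The following minimal Python example expresses the luciferase signal of each A3-containing sample as a percentage of the Vif-deficient HIV-1 control, then summarizes three independent assays as a mean and standard error of the mean. The function names and all luciferase values are hypothetical and for illustration only; inputs are assumed to have been normalized for RT activity beforehand.

```python
from math import sqrt
from statistics import mean, stdev


def relative_infectivity(sample_rlu, control_rlu):
    """Luciferase signal of a sample as a percentage of the Vif-deficient control."""
    if control_rlu <= 0:
        raise ValueError("control luciferase signal must be positive")
    return 100.0 * sample_rlu / control_rlu


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two independent assays are required")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical relative light units from three independent assays
# (not data from the study).
control = [1000.0, 1200.0, 800.0]   # virus produced without A3
with_a3 = [500.0, 480.0, 480.0]     # virus produced with an A3 plasmid

percent = [relative_infectivity(s, c) for s, c in zip(with_a3, control)]
avg, sem = mean_and_sem(percent)
```

With these toy numbers the three assays give 50%, 40%, and 60% of control infectivity, so the mean is 50% and the error bar is the standard error across the three values.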
FIG. 2. Mammalian APOBEC3 phylogeny. The phylogenetic relationships of mammalian A3 proteins were inferred using the maximum likelihood method and JTT + G model with 1,000 bootstrap replicates. A3 subgroups Z1, Z2, and Z3 are paralogs, which have expanded independently. Bootstrap values for nodes represented in >40% of trees are shown. A3Z1, A3Z2/Z2A, A3Z2B, and A3Z3 clades are shown in green, orange, pink, and blue, respectively. The scale represents the number of amino acid substitutions per site.

Ancient Bat Retroviruses Were Hypermutated by APOBEC3
A3 activity results in C to U lesions in the negative (cDNA) strand, which lead to positive (genomic) strand G to A mutations in endogenous retroviruses (ERVs) within host genomes (Perez-Caballero et al. 2008). A3 proteins also exhibit local sequence-context bias in the nucleobases they deaminate. The presence of strand-biased, context-specific A3 mutagenic signatures in ERVs constitutes unambiguous evidence of a past encounter with one or more cellular A3 enzymes (Lee et al. 2008). To assess hypermutation of pteropid ERVs, we identified a group of closely related ERVs from each of the Betaretrovirus and Gammaretrovirus genera present in the genome of P. vampyrus. We interrogated the P. vampyrus genome since the P. alecto genome assembly does not contain sufficient contiguous endogenous retroviral sequences to conduct this analysis. A previously reported group of bat endogenous betaretroviruses, sub-classified as Group 7 betaretroviruses, was selected and a group of bat endogenous gammaretroviruses (herein described as Group 1 gammaretroviruses) was identified in the manner described previously (Hayward et al. 2013). Phylogenies were generated using the maximum likelihood method and an ancestral reconstruction was performed by extrapolation of the common ancestor of each group (see supplementary fig. S4, Supplementary Material online). Mutations in each ERV were identified by comparison against the group's common ancestor. Strand-biased G to A hypermutation and context-specific GG and GA dinucleotide preferences were identified in both groups (fig. 5). These results are consistent with the hypothesis that ancient bat retroviruses were subjected to A3-mediated hypermutation, supporting the expected role of bat A3 proteins as antiviral restriction factors. The divergence times of the bat A3Z1 genes were estimated using a molecular clock dating analysis that indicates these genes arose through duplication events beginning approximately 27 Ma (supplementary fig. S5, Supplementary Material online).
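The counting step behind such a hypermutation analysis can be sketched as follows. This minimal Python example compares an ERV with its reconstructed ancestor, tallies plus-strand substitutions (so that G to A can be compared with C to T), and records the ancestral dinucleotide context of each G to A change. It assumes the two sequences are already aligned to equal length, and taking the 3′ neighbour from the ancestor rather than from the ERV is a choice made for this sketch. The sequences are hypothetical, and the statistical comparison (Fisher's exact test in fig. 5) is not implemented here.

```python
from collections import Counter

BASES = set("ACGT")


def count_substitutions(ancestor, erv):
    """Count plus-strand substitutions, keyed as 'X>Y', between aligned sequences."""
    if len(ancestor) != len(erv):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in zip(ancestor.upper(), erv.upper()):
        if a != b and a in BASES and b in BASES:  # skip gaps and ambiguity codes
            counts[a + ">" + b] += 1
    return counts


def g_to_a_contexts(ancestor, erv):
    """Count the ancestral dinucleotide (mutated G plus its 3' neighbour) of each G>A change."""
    if len(ancestor) != len(erv):
        raise ValueError("sequences must be aligned to the same length")
    anc, seq = ancestor.upper(), erv.upper()
    contexts = Counter()
    for i in range(len(anc) - 1):
        if anc[i] == "G" and seq[i] == "A" and anc[i + 1] in BASES:
            contexts[anc[i:i + 2]] += 1
    return contexts


# Hypothetical toy alignment (illustration only).
ancestor = "ATGGCAGATCGGA"
erv = "ATAGCAAATTGAA"

substitutions = count_substitutions(ancestor, erv)
contexts = g_to_a_contexts(ancestor, erv)
```

In this toy alignment G to A changes outnumber C to T changes three to one, and the G to A changes fall in GG and GA contexts, which is the qualitative pattern described above.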

Discussion
This study was undertaken to address the question of differential antiviral gene evolution in bats, and more specifically, whether important differences exist in a major class of genes that control and restrict retroviruses and genomic retroelements. We demonstrated that bats possess 18 A3 genes, 13 of which are transcriptionally active in spleen tissue, with several proving active in a heterologous E. coli base mutation assay or an HIV-1 restriction assay. In addition to the expected A3Z1, A3Z2, and A3Z3 phylogenetic subtypes, we identified an additional division within the bat A3Z2 clade, in which we designated A3Z2A as the prototypical A3Z2 subtype and A3Z2B as a subdivision that possesses a combination of the invariant amino acid residues that define both the A3Z1 and A3Z2 subtypes. Representative A3 loci have previously been delineated for every Eutherian order with the exceptions of Chiroptera (bats) and all orders within the clade Afrotheria (e.g., elephants, tenrecs, dugongs, and hyraxes) (Nakano et al. 2017). With the caveat that the A3 loci of the vast majority of mammalian species remain to be resolved, the pteropid bat species studied here are shown to possess the largest and most phylogenetically diverse antiviral A3 gene repertoire of any mammal reported to date.
Assessment of the functionality of bat A3 proteins through a rifampicin mutagenesis assay revealed several A3Z1 proteins as potent mediators of cytosine deamination, with A3Z1c isomer B demonstrating a mutational frequency greater than human A3B and A3G. Although some A3Z1 proteins and A3 proteins of other subtypes were not found to be catalytically active in this assay, numerous factors could account for inactivity within E. coli, such as protein misfolding or missing cellular cofactors and localization requirements. The capacity of bat A3 proteins to restrict retroviral infection was observed through the use of HIV-1 as a model target for A3 functionality. Four bat A3 proteins representing single and double Z-domain proteins of the Z2A, Z2B, and Z3 subtypes were observed to be capable of inhibiting HIV-1 infectivity in a dose-dependent manner. The bat A3Z1 proteins found to be mutagenic were not found to be capable of restricting HIV-1 infectivity. A possible explanation for this is that not all bat A3 proteins are capable of restricting this specific virus, as is the case for human A3 proteins such as A3B and A3C, which inhibit SIV but not HIV, whereas A3F and A3G are potent inhibitors of HIV (Sheehy et al. 2003; Yu et al. 2004; Zheng et al. 2004). Although many endogenous retroviruses are present in bat genomes (Cui et al. 2012; Hayward et al. 2013), no exogenous bat retroviruses have yet been reported. To address our expectation that bat retroviruses could be subjected to restriction by bat A3 proteins, we performed a hypermutation analysis of the endogenized retroviruses present in the genome of P. vampyrus and found evidence of strand-biased, context-dependent cytosine deamination of both beta- and gammaretroviruses. These results implicate bat A3 proteins as the mediators of G to A hypermutation in bat retroviruses.
FIG. 4. Antiviral activity of A3 proteins. Vif-deficient HIV-1 was generated in the presence of A3 proteins. Human 293T cells were transfected with 400 ng of the HIVΔvif expression plasmid pIIIBΔvif and variable amounts (25-600 ng) of human or bat A3 expression plasmids. TZM-bl indicator cells were inoculated with 293T cell culture supernatants normalized for viral particles by virion-associated RT activity. Infectivity relative to HIVΔvif alone was measured by quantifying luciferase activity in TZM-bl lysates. HIV, human immunodeficiency virus type 1; hA3G, human APOBEC3G; Z1c Iso A, Pteropus alecto APOBEC3Z1c isomer A; Z1c Iso B, Pteropus alecto APOBEC3Z1c isomer B; Z2A Iso A, Pteropus alecto APOBEC3Z2A isomer A; Z2Ba Iso C, P. alecto APOBEC3Z2Ba isomer C; Z3 Iso A, P. alecto APOBEC3Z3 isomer A; Z2Bd-Z3, P. alecto APOBEC3Z2Bd-Z3. Error bars represent the standard error of the mean from three independent assays.

The large number of A3Z1 genes present in the pteropid A3 locus is surprising given the scarcity of this subtype among other mammals (LaRue et al. 2009). The presence of 12 A3Z1 genes in pteropid bats indicates an evolutionary advantage for pteropid bats in the maintenance and expansion of this subtype. The most comparable APOBEC3 protein in humans is A3A, the sole APOBEC3 single-domain Z1 protein. A3B and A3G are double-domain proteins that have a Z1 paired with a Z2A domain (fig. 1C). Endogenous A3A is cytoplasmically located (Land et al. 2013) but accesses the nucleus during telophase, the final stage of mitosis. It has restrictive activity against retrotransposons (Chen et al. 2006; Koito and Ikeda 2013) and viruses including HIV-1, adeno-associated virus, and human papillomavirus (Chen et al. 2006; Berger et al. 2011; Warren et al. 2015).
In humans, evidence suggests that A3Z1 domain-containing proteins may represent a double-edged sword, as potent inhibitors of genomic retrotransposons and a possible threat to the cell's own genomic sequence integrity (LaRue et al. 2008; Roberts et al. 2013; Taylor et al. 2013; Burns et al. 2015; Green et al. 2016). One possible reason for the observed expansion of A3Z1 genes in bats may be related to the evolution of flight, which is hypothesized to be an important link to the physiological conditions allowing bats to be sources of high viral diversity (O'Shea et al. 2014). Increased metabolic capacity has been linked to the evolution of sustained flight in bats (Shen et al. 2010), and the by-products of oxidative metabolism include reactive oxygen species that are known to damage cellular DNA (Cooke et al. 2003). The most notable genetic adaptations involved in the development of flight in bats have been found to exist in the genes responsible for the detection and repair of cellular DNA damage (Zhang et al. 2013; Zhou et al. 2016). It is possible that enhanced tolerance to DNA damage would allow for the expansion of genes such as A3 whose products are known to cause such damage.
Many studies have indicated that A3 enzymes restrict endogenous transposable elements such as LINE-1 (long interspersed nuclear element 1) and Alu (Muckenfuss et al. 2006; Stenglein and Harris 2006; Kinomoto et al. 2007; Wissing et al. 2011). Interestingly, LINE-1 elements became extinct in the pteropid bat lineage approximately 24 Ma (Cantrell et al. 2008). Preliminary dating of the A3 gene expansions (supplementary fig. S5, Supplementary Material online) indicates that the timing of the initial expansion of the bat A3Z1 subtype may coincide with this event, and further investigation of a possible connection between pteropid A3 genes and the LINE-1 extinction may be warranted.

FIG. 5. Hypermutation analysis of endogenized retroviruses reveals that ancient bat retroviruses were subjected to restriction by A3 proteins. (A) Comparison of guanine to adenine (G to A) versus cytosine to thymidine (C to T) mutations on the coding (+) strand of endogenous retroviruses reveals strand-biased cytosine deamination. (B) Subsequent analysis of the dinucleotide context of the G to A mutations in the coding strand reveals that the GG to AG and GA to AA mutations typical of human A3G are the most common mutations in the endogenous retroviruses of bats. Statistical significance was calculated using Fisher's Exact Test. *P < 0.05; **P < 0.01; ***P < 0.001.
This study represents the first report of the differential evolution of an important antiviral gene family in bats relative to other mammals. The full account and explanation of the ability of bats to act as such effective hosts and transmitters of zoonotic viral pathogens will likely consist of many complex and interlinking factors beyond simple metrics such as gene numbers and amino acid sequence characteristics. Ongoing studies of this expanded gene family should further demystify the strategies and mechanisms utilized by bats that result in an impressive tolerance for viral infections.

Generation of Bat Transcriptomes
Approval for the use of bat tissues was granted by the Commonwealth Scientific and Industrial Research Organization's Australian Animal Health Laboratory Animal Ethics Committee (Protocol AEC1281). Transcriptome data sets of P. alecto were generated from pooled total RNA from mitogen-stimulated spleen, lymph node, white blood cells, and unstimulated bone-marrow and thymus from one pregnant female and one adult male, and the unstimulated thymus of a juvenile male as described previously (Papenfuss et al. 2012 Ensembl: ENST00000348946]. The human A3 transcripts were translated into protein sequences using the CLC Genomics Workbench 8.0 (https://www.qiagenbioinformatics.com/; last accessed April 3, 2018). To identify transcripts of interest, the tBLASTn function of CLC Genomics Workbench was used with the following parameters: BLOSUM62 matrix, word size = 3, E-values < 1 × 10⁻¹², gap costs of existence 11, extension 1, with no filtering of regions of low complexity. Transcripts were confirmed as A3 homologues by first identifying, in their translated protein sequences, the conserved Z-domain motif Hx₁Ex₂₁₋₂₇S/TWSPCx₂₋₄C (where "x" indicates a nonconserved position; Conticello et al. 2005; LaRue et al. 2009), and then identifying the A3 subtype-specific conserved motifs described in detail in LaRue et al. (2008, 2009).
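A screen for the conserved Z-domain motif can be sketched with a regular expression. The minimal Python example below assumes that "S/T" denotes a single position occupied by either serine or threonine and that the x subscripts give the permitted number of nonconserved residues; this reading, the pattern itself, and the function name are assumptions of the sketch rather than the procedure used in CLC Genomics Workbench. The test sequences are synthetic and are not real A3 proteins.

```python
import re
from typing import List, Tuple

# Assumed reading of the motif: H-x-E-x(21-27)-[S/T]-W-S-P-C-x(2-4)-C
Z_DOMAIN = re.compile(r"H.E.{21,27}[ST]WSPC.{2,4}C")


def find_z_domains(protein: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-overlapping Z-domain motif matches."""
    return [m.span() for m in Z_DOMAIN.finditer(protein.upper())]


# Synthetic sequences for illustration only.
single_domain = "MK" + "HAE" + "A" * 24 + "SWSPC" + "AA" + "C" + "GG"
double_domain = single_domain + "LL" + "HVE" + "G" * 21 + "TWSPC" + "GGGG" + "C"

single_hits = find_z_domains(single_domain)
double_hits = find_z_domains(double_domain)
```

Counting the matches gives a quick way to separate single Z-domain from double Z-domain candidates before the subtype-specific motifs are examined.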

cDNA Analysis
To confirm A3 gene product expression, polymerase chain reaction (PCR) amplification assays were performed using cDNA generated from P. alecto spleen tissue and various combinations of forward and reverse primers designed using the A3 transcript predictions identified in the transcriptome analysis. P. alecto trapping, tissue collection, and RNA extraction were performed as reported previously (Cowled et al. 2011), except that RNAlater (Ambion)-preserved spleen from four male adult bats was pooled prior to tissue homogenization, followed by total RNA extraction with the Qiagen RNeasy Mini kit with on-column removal of genomic DNA with DNase I. Total RNA was reverse transcribed into cDNA with the Qiagen Omniscript reverse transcriptase according to the manufacturer's protocol, except that the reaction contained 100 ng/μl total RNA, 1 μM oligo(dT)18 (Qiagen), and 10 μM random hexamers (Promega).
All PCR amplification assays were performed using the Roche FastStart High Fidelity PCR system (Cat # 04738292001) utilizing an annealing temperature gradient of 54-64 °C in 2 °C increments. Each reaction was made up to a total of 20 μl, containing 1 unit of polymerase, 2 ng of total cDNA, and 8 pmol of each primer. All other parameters for PCR amplification were performed according to the manufacturer's recommended protocol. PCR reaction products were separated by agarose gel electrophoresis, using a 1% (w/v) agarose gel run at 120 V for 25 min. All DNA bands were excised from the gel and purified using the Qiagen QIAquick Gel Extraction kit (Cat # 28706) according to the manufacturer's protocol. The gel-extracted PCR products were cloned into the pCR2.1 plasmid vector using the Invitrogen TOPO TA Cloning kit (Cat # 450641) according to the manufacturer's protocol and transformed into Escherichia coli TOP10 (Invitrogen). The nucleotide sequences of clones were determined by Sanger sequencing using the M13F (5′-GTAAAACGACGGCCAG-3′) and M13R (5′-CAGGAAACAGCTATGAC-3′) sequencing primers.

APOBEC3 Gene Locus Assembly
To identify the gene scaffolds representing fragments of the A3 gene locus in the P. vampyrus genome, a BLASTn analysis was performed using the A3 gene products amplified in the cDNA analysis as queries. The P. vampyrus genome was downloaded from Ensembl and all BLASTn searches were performed locally using CLC Genomics Workbench with the following parameters: match score 2, mismatch cost −3, word size = 5, E-values < 1 × 10⁻⁵, gap costs of existence 5, extension 2, with no filtering of regions of low complexity. To extend the sequence of each scaffold, 150 nt at both the 5′ and 3′ termini was used as the query sequence for a BLASTn search against the P. vampyrus Illumina whole genome sequencing (WGS) project SRA [SRA: SRP047390] using the NCBI BLASTn suite (http://blast.ncbi.nlm.nih.gov/Blast.cgi; last accessed April 3, 2018) with the following parameters: program = megablast, maximum target sequences = 500, E-values < 1 × 10⁻²⁰, word size = 28, maximum matches in a query range = 13, match score = 1, mismatch cost = 2, gap costs = linear, with no filtering or masking. All hits with a percentage identity of 97% or higher were downloaded and assembled using the CLC Genomics Workbench Assemble Sequences tool into a contiguous consensus sequence using the following parameters: minimum aligned read length = 20, alignment stringency = high, conflicts = Vote (A, C, G, T). Consensus sequences were aligned with the original query to confirm that the assembly represented a terminally extended match of the original query sequence; they were subsequently used as a new query in an otherwise identical BLASTn search against the same SRA. This process was iteratively repeated until all scaffolds could be extended out far enough that their sequences overlapped with another scaffold. The extended and overlapping scaffolds were joined end to end to create a contiguous reference sequence construct representing the P. vampyrus A3 gene locus.
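The logic of this iterative extension can be illustrated with a toy Python sketch. It is not the procedure used in the study: exact substring matching stands in for the megablast search and consensus assembly, only the 3′ end is extended, and the function names, query length, overlap threshold, and sequences are all hypothetical.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right` (0 if too short)."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def extend_right(scaffold, reads, query_len):
    """One round: use the 3' terminal query to find reads and append the longest overhang."""
    query = scaffold[-query_len:]
    best = ""
    for read in reads:
        pos = read.find(query)  # exact match stands in for a BLASTn hit
        if pos != -1:
            overhang = read[pos + len(query):]
            if len(overhang) > len(best):
                best = overhang
    return scaffold + best


def extend_until_overlap(scaffold, reads, target, query_len, min_overlap, max_rounds=100):
    """Repeat extension until the scaffold overlaps `target` or no read extends it."""
    for _ in range(max_rounds):
        if suffix_prefix_overlap(scaffold, target, min_overlap):
            break
        extended = extend_right(scaffold, reads, query_len)
        if extended == scaffold:
            break  # no further extension possible
        scaffold = extended
    return scaffold


def join_scaffolds(left, right, min_overlap):
    """Join two overlapping scaffolds end to end, keeping the shared region once."""
    n = suffix_prefix_overlap(left, right, min_overlap)
    if n == 0:
        raise ValueError("scaffolds do not overlap")
    return left + right[n:]


# Hypothetical toy locus, scaffolds, and reads (illustration only).
locus = "AACCGGTTACGTAGCTTGCAATCGGATCCA"
left_scaffold, right_scaffold = locus[:10], locus[20:]
reads = [locus[5:17], locus[12:25]]

extended = extend_until_overlap(left_scaffold, reads, right_scaffold, query_len=5, min_overlap=4)
joined = join_scaffolds(extended, right_scaffold, min_overlap=4)
```

In the toy case two rounds of extension are enough for the left scaffold to reach the right one, after which the pair can be joined into a single contiguous sequence.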
To validate the draft A3 locus reference sequence, all P. vampyrus WGS SRA were downloaded from NCBI and locally converted into paired FASTQ data sets using the NCBI SRA Toolkit 2.4.3 (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi; last accessed April 3, 2018). Each data set was subjected to a quality control check using the FastQC program 0.11.3 (Babraham Bioinformatics, Cambridge). Paired FASTQ data sets were imported into CLC Genomics Workbench using the Illumina import function and the CLC Genomics Workbench Next Generation Sequencing (NGS) Toolkit was used for all following tasks. All data sets were trimmed on the basis of quality using the default parameters. Data sets with short paired-read distances were assessed for the presence of overlapping pairs and these were merged where appropriate using the default parameters. The P. vampyrus data set included long paired reads with insert sizes of 2-20 kb, which are helpful for the resolution of repetitive gene regions. Validation of the P. vampyrus A3 locus was performed by mapping all reads to the reference sequence construct using the following parameters: linear gap cost, length fraction = 0.9, similarity fraction = 0.9, with automatic detection of paired read distances. Other parameters were left in their default settings. Potential gene coding regions were identified by mapping P. alecto A3 cDNA sequences against the locus through a local BLASTn analysis using CLC Genomics Workbench with default settings. Promoter elements were predicted using the online Neural Network Promoter Prediction 2.2 tool (Reese 2001) (http://www.fruitfly.org/seq_tools/promoter.html; last accessed April 3, 2018) using the default parameters. ISRE were predicted using the online PROMO transcription factor binding site prediction tool (Farré et al. 2003) (http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3; last accessed April 3, 2018) using the parameters: Species: Mammalia, Factors: Mammalia.

APOBEC3 Phylogenetic Analysis
The identities of the putative bat A3 gene products were confirmed by phylogenetic analysis. All reference sequences were downloaded from Ensembl (https://www.ensembl.org/; last accessed April 3, 2018) and aligned with P. alecto sequences using MUSCLE (Edgar 2004). Regions of low conservation and uncertain alignment were removed using the Gblocks program (Talavera and Castresana 2007). Phylogenetic relationships were inferred using the maximum likelihood (ML) method available in the MEGA6 program (Tamura et al. 2013), employing nearest-neighbor interchange and branch swapping, and the best-fit model of amino acid substitution was found to be JTT + G (Tamura et al. 2013). To assess the robustness of each node, 1,000 bootstrap replicates were performed.

Rifampicin Mutagenesis Assay
Bat A3 cDNAs were PCR amplified from parental vectors and subcloned into pET24(+) vectors (EMD Biosciences). Plasmids were transformed into calcium-competent C43 (DE3) strain E. coli (gift from Dr. Do-Hyung Kim) and grown on LB plates with kanamycin. Single colonies were selected and grown in LB with kanamycin for 28 h to ensure stationary phase growth. Thereafter, cultures were either spread on rifampicin plates or serially diluted in M9 salt media and spread on kanamycin plates. Colony counts were obtained after 18 h. The rpoB mutation frequency was obtained by dividing the number of viable colonies on rifampicin plates by the number of colonies on kanamycin plates and correcting for the dilution factor. A portion of the rpoB gene was PCR amplified using the primers: forward 5′-TTGGCGAAATGGCGGAAAACC and reverse 5′-CACCGACGGATACCACCTGCTG. PCR products were enzymatically purified using Exonuclease I and rSAP (NEB). Purified fragments were sequenced using the forward primer (GeneWiz, NJ).
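The mutation frequency calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function and parameter names are hypothetical, and it assumes that equal volumes were plated on both plate types, so that only the dilution factors need correcting.

```python
def rpob_mutation_frequency(rif_colonies, kan_colonies,
                            kan_dilution_factor, rif_dilution_factor=1.0):
    """Rifampicin-resistant mutants per viable cell.

    Assumption (not stated in the text): equal volumes were plated on
    rifampicin and kanamycin plates, so only dilution is corrected for.
    A dilution factor of 1e6 means the culture was diluted one-million-fold.
    """
    if kan_colonies <= 0:
        raise ValueError("kanamycin plate must have at least one colony")
    # Scale each count back to the undiluted culture.
    viable = kan_colonies * kan_dilution_factor
    resistant = rif_colonies * rif_dilution_factor
    return resistant / viable


# Hypothetical counts: 42 colonies on an undiluted rifampicin plate and
# 150 colonies on a kanamycin plate spread from a 10^6-fold dilution.
frequency = rpob_mutation_frequency(42, 150, kan_dilution_factor=1e6)
```

With these made-up counts the frequency is 42 / (150 × 10^6), that is, 2.8 × 10^-7 resistant mutants per viable cell.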

Human Cell Cultures
Human embryonic kidney cells (293T cells) and human epithelial cervical adenocarcinoma cells modified to express constitutively high levels of HIV-1 receptors and coreceptors (TZM-bl cells) were utilized in the inhibition of infectivity assay. The TZM-bl cells were obtained through the NIH AIDS Research and Reference Reagent Program (Wei et al. 2002). All cell cultures were maintained at 37 °C, 5% CO2 in Dulbecco's modified Eagle's medium (Thermo) supplemented with heat-inactivated fetal bovine serum (100 ml/l; Invitrogen), glutamine (292 µg/ml; Invitrogen), and the antibiotics penicillin (100 units/ml; Invitrogen) and streptomycin (100 units/ml; Invitrogen).

A3 Retroviral Restriction Assay
The antiviral activity of bat A3 was assessed by examining the capacity to restrict HIV-1 infectivity. Bat A3 cDNAs were PCR amplified from parental vectors and subcloned into pcDNA3.1 mammalian expression vectors (Thermo Fisher, MA). 293T cells were cotransfected with different quantities of bat A3 or human A3G (phu-A3GHA; [Mariani et al. 2003]) expression plasmids (25-600 ng) and 400 ng of pIIIBΔvif (Simon et al. 1995), which generates HIV-1 virions in the absence of Vif. Untransfected cells and cells transfected only with pIIIBΔvif were used as controls. Transfected cell cultures were incubated at 37 °C, 5% CO2 for 48 h, and then virion-containing supernatants were collected and clarified by centrifugation at 200 × g for 5 min. Virus production was determined by quantifying virion-associated RT activity, as previously described (Wapling et al. 2005). TZM-bl cells were inoculated with RT normalized supernatant and incubated at 37 °C, 5% CO2 for 48 h. HIV-1 infection of TZM-bl cells was assessed through quantitation of luciferase activity in cell lysates as previously described (Tyssen et al. 2010).

Hypermutation Analysis
The impact of cytosine deamination on the genomes of ancient bat retroviruses was assessed by hypermutation analysis (Lee et al. 2008). The genomes of all ERVs used in the analysis were extracted from the P. vampyrus genome. Multiple outgroups were tested to confirm that the position of the ancestral node remained consistent. For the betaretroviruses, we . The most closely related betaretrovirus and gammaretrovirus to our ERV groups were JSRV and MMLV, respectively, and these were employed as outgroups for the phylogenies. Sequence alignments and phylogenies were constructed as described for the APOBEC3 phylogenetic analysis, and the best-fit model of nucleotide substitution was found to be T92 + G for both groups of retroviruses. Ancestral sequences were computationally inferred in the MEGA6 program (Tamura et al. 2013) from the mutations accumulated in each ERV since integration, by estimating the ancestral state of each node in the maximum likelihood phylogenetic tree. The state chosen is the one that maximizes the probability of the given sequence data. Mutational biases were then evaluated by analyzing an alignment of the ancestral sequence and the ERVs (supplementary data S2 and S3, Supplementary Material online) using the online Hypermut tool (http://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermut.html; last accessed April 3, 2018) (Rose and Korber 2000) with context enforced on the ancestral reference sequence. The Hypermut tool incorporates Fisher's exact test to determine statistical significance.
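To illustrate the kind of significance test involved, the sketch below computes a one-sided Fisher's exact P value for a 2 × 2 table of mutated versus unmutated sites in a deaminase target context versus a control context. It is an assumption-laden illustration only: the function name and the example counts are hypothetical, and Hypermut's own context definitions and site counting are not reproduced here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Rows: target context, control context (illustrative labels).
    Columns: mutated sites, unmutated sites.
    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least this many mutations in the
    target context by chance.
    """
    row1 = a + b          # all sites in the target context
    col1 = a + c          # all mutated sites
    n = a + b + c + d     # all sites
    upper = min(row1, col1)
    # Exact integer counts of tables at least as extreme as the observed one.
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, col1)


# Hypothetical counts: 3 of 4 target-context sites mutated,
# 1 of 4 control-context sites mutated.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

For this small made-up table the P value is 17/70, about 0.24, so such counts would not indicate a significant mutational bias.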

Molecular Clock Dating Analysis
To estimate the time of the expansion of A3Z1 genes in the bat genome, molecular dating was performed using the Bayesian evolutionary method as implemented in BEAST v1.8.2 (Drummond et al. 2012). All reference nucleotide sequences were aligned using MUSCLE and adjusted manually. A recombination test was then performed using the RDP, GENECONV, and MaxChi methods implemented in RDP3 (Martin et al. 2010). After removing recombinant sequences, all sequences were submitted to BEAST for estimation of the divergence time. Fossil node calibrations were taken from published literature (Kumar et al. 2017 Ma. Bayesian phylogeny was inferred using the GTR + G substitution model, the Yule speciation tree prior, and the uncorrelated lognormal relaxed clock prior. The BEAST processes were run for ten million generations until all relevant parameters converged [effective sample size (ESS) > 200], with 10% of the Bayesian Markov chain Monte Carlo (MCMC) chains discarded as burn-in.
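The burn-in and ESS criteria can be illustrated with a short sketch. The helper names are hypothetical, and the ESS estimator shown is a common autocorrelation-based textbook form, which is an assumption here and not necessarily the one used by BEAST or its companion tools.

```python
def discard_burn_in(samples, burn_in_percent=10):
    """Drop the first burn_in_percent of a sampled MCMC trace."""
    n_drop = len(samples) * burn_in_percent // 100
    return samples[n_drop:]


def effective_sample_size(samples):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple and widely used stopping rule (an assumption of this sketch).
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def has_converged(samples, threshold=200):
    """Apply the ESS > 200 criterion to the post-burn-in trace."""
    return effective_sample_size(discard_burn_in(samples)) > threshold
```

Strongly autocorrelated traces yield an ESS well below the raw number of samples, which is why a threshold on ESS, and not on chain length alone, is used as the convergence criterion.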

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.