Gene Duplication and Gain in the Trematode Atriophallophorus winterbourni Contributes to Adaptation to Parasitism

Abstract Gene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis to other major parasitic trematodes. First, we reconstructed the species phylogeny, and dated the split of A. winterbourni from the Opisthorchiata suborder to approximately 237.4 Ma (±120.4 Myr). We then addressed the question of which expanded gene families and gained genes are potentially involved in adaptation to parasitism. To do this, we used hierarchical orthologous groups to reconstruct three ancestral genomes on the phylogeny leading to A. winterbourni and performed a GO (Gene Ontology) enrichment analysis of the gene composition of each ancestral genome, allowing us to characterize the subsequent genomic changes. Out of the 11,499 genes in the A. winterbourni genome, as much as 24% have arisen through duplication events since the speciation of A. winterbourni from the Opisthorchiata, and as much as 31.9% appear to be novel, that is, newly acquired. We found 13 gene families in A. winterbourni to have had more than ten genes arising through these recent duplications; all of which have functions potentially relating to host behavioral manipulation, host tissue penetration, and hiding from host immunity through antigen presentation. We identified several families with genes evolving under positive selection. Our results provide a valuable resource for future studies on the genomic basis of adaptation to parasitism and point to specific candidate genes putatively involved in antagonistic host–parasite adaptation.


Introduction
The adoption of a parasitic lifestyle represents a major niche shift that has occurred multiple times across the tree of life (Poulin and Randhawa 2015;Weinstein and Kuris 2016). The similar selective pressures involved in exploiting hosts have resulted in convergent macroevolutionary features, such as a tendency for morphological simplification (O'Malley et al. 2016) and the associated genome compaction, reduction, and streamlining across many parasite lineages (Peyretaillade et al. 2011;Chang et al. 2015;Lu et al. 2019;Slyusarev et al. 2020). At the same time, parts of the parasite genome involved in, for example, host exploitation and life-cycle complexity may have experienced expansions. Comparative genomic analyses have implied that gene duplications can drive innovation in gene function during radiations of parasitic lineages (Zarowiecki and Berriman 2015).
Novel gene functions involved in the response to host immunity may be particularly important for the evolution of parasitism. For example, mucins, a family of heavily glycosylated surface epithelial proteins, have undergone multiple rounds of duplication in the blood fluke, Schistosoma mansoni. Mucins frequently recombine, generating antigenic variation through splice variants (Roger et al. 2008). Increased life-cycle complexity, especially within the parasitic flatworms (Poulin and Randhawa 2015), may have also driven the evolution of functional novelty involved in host exploitation strategies. For instance in S. mansoni, multiple duplication events in the gene superfamily SCP/TAPS (sperm-coating protein/TPx/antigen 5/ pathogenesis-related protein 1) have led to an array of proteins that are now associated with an active role in penetration of the snail host tissues (Cantacessi and Gasser 2012). Duplicated genes, which evolve beyond sequence recognition, can also give rise to lineage-specific genes ("gained" genes), which can confer specific, novel traits, important in adaptation of that lineage to its particular niche (David et al. 2008;Takeuchi et al. 2016).
With the whole genome sequences of over 30 nematodes (roundworms) and 25 platyhelminth (flatworms, including trematodes) species, it has been possible to characterize the births and expansions of new gene families arising by duplication at key taxonomic levels (Rö delsperger 2018). Nematodes and platyhelminths are two invertebrate animal phyla consisting of parasitic and free-living organisms with the parasitic ones causing major animal, crop, and human diseases, as well as being a major economic burden (Disease and Injury Incidence and Prevalence Collaborators, GBD 2016; International Helminth Genomes Consortium 2019).
The microphallid Atriophallophorus winterbourni (syn. Microphallus sp. or Microphallus livelyi) is a digenean trematode parasite native to the lakes of New Zealand (Blasco-Costa et al. 2020). It alternates between two hosts in its life cycle; the intermediate host is Potamopyrgus antipodarum, a prosobranch dioecious mud snail (Warwick 1952;Winterbourn 1970) and the final hosts are waterfowl, mainly dabbling ducks (Lively and McKenzie 1991). Multihost life cycle is a general characteristic of all digenean trematodes, and always includes a molluscan species as an intermediate host and a vertebrate as the final host (Galaktionov and Dobrovolskij 2003) (supplementary box S1, Supplementary Material online). The metacercarial asexual stage of A. winterbourni develops in the gonad of the snail, which is consequently castrated. The adult worm stage occurs in the gut of waterfowl, where the worms reproduce sexually, producing eggs released with the waterfowl feces (Lively and McKenzie 1991). Atriophallophorus winterbourni notably lacks several life cycle stages known to occur in other digenean trematodes, including sporocyst, redia, cercaria, and possibly miracidia stages ( fig. 1). Unlike some other well-studied digenean trematodes (see fig. 1 and supplementary box S1, Supplementary Material online), A. winterbourni is not known to infect humans and has low virulence in its final bird host. The Potamopyrgus-Atriophallophorus system has been studied intensively because the parasite seems to be in a tight coevolutionary relationship with its host in natural populations (Lively et al. 2004). The host-parasite interaction has been used to test alternative explanations for the maintenance of sex in Potamopyrgus snails (Lively 1987(Lively , 1989. Previous field and laboratory studies suggest that A. winterbourni adaptation to local host populations is genotype specific to a degree that the parasite population can adapt to specifically infect the most common host genotypes, which creates negative frequencydependent dynamics between the two (Dybdahl and Lively 1996;Lively et al. 2004;Jokela et al. 2009). Additionally, recent experimental evidence has indicated that the parasite alters the behavior of the snail, causing it to migrate to the shallow parts of the lake where the final host resides (Feijen F., Buser C., Klappert K., Kopp K., Lively C., Zajac N., Jokela J., in preparation).
In this study, we assembled de novo the A. winterbourni reference genome, annotated protein-coding genes, and assigned putative functions using Gene Ontology (GO). With the knowledge from previous studies of pathways and gene families potentially important in trematode adaptation to parasitism, we used comparative genomics to contrast A. winterbourni with other trematodes. We studied the evolution of homologous gene families across the phylogeny of platyhelminths using hierarchical orthologous groups (HOGs), or sets of orthologs/paralogs which all originate from a single gene in the last common ancestor of a clade of interest (Altenhoff et al. 2013). By tracing HOGs along the species tree, it is possible to infer the evolutionary history of gene loss, gain, and duplications since the ancestral gene. Using HOGs, we reconstructed the ancestral digenean trematode genome, the Plagiorchiida ancestral genome, and the ancestral genome before the split of Xiphidiata and Opisthorchiata suborders. Using these ancestral genomes, we identified the evolutionary events (duplications, gains, and retention of 1:1 orthologs) that shaped each gene family in the lineage leading to A. winterbourni. We characterized the duplicated, gained, and 1:1 orthologs (i.e., conserved/ retained) genes shared between all trematodes, as well as those specific to A. winterbourni. We discuss the relevance and function of these gene families in A. winterbourni and search for signatures of positive selection in two of the largest gene families. We use the inferred changes in the gene content to better understand the genetic novelty necessary for adaptation to parasitic lifestyles in the lineage leading to A. winterbourni. Through outlining candidate genes for parasitism, we provide a basis for future studies on the genomics of parasite-host coevolution and we broaden the knowledge on trematode evolutionary history.

Parasite Collection and DNA Extraction
Potamopyrgus antipodarum snails infected with A. winterbourni were collected from Lake Alexandrina (New Zealand, South Island) in January 2017 from several shallow localities (<1.5 m) by pushing a kicknet through the vegetation. The snails were transported to the Swiss Federal Institute of Aquatic Science (Eawag, Dü bendorf, Switzerland) within 2 weeks of collections and were kept in boxes of 500 snails in a flow-through system that filtered the water every 12 h. Snails were fed spirulina ad libitum (Arthospira platensis, Spirulina California, Earthrise) once a day.
Infected snails were individually dissected and 200-1,000 A. winterbourni metacercariae were isolated under 10Â-20Â magnification. The metacercariae were hatched into adult worms (see supplementary methods S1, Supplementary Material online for details). Obtaining adult worms was necessary to separate the parasite from the double-walled metacercarial cyst that contained both the parasite and the snail DNA (Galaktionov and Dobrovolskij 2003). The worms were lysed using a CTAB buffer and Proteinase K (2 mg/ml) with overnight incubation at 55 C (Yap and Thompson 1987). DNA was isolated using a chloroform: isoamyl alcohol solution (24:1) and precipitated with sodium acetate (3 M). The resulting pellet was washed twice with 70% ethanol. DNA was stored in RNase/DNase-free water (Sigma-Aldrich, MO) at À20 C until sequencing library preparation.

Estimation of Genome Size
To guide the de novo genome assembly, genome size was estimated using flow cytometry with Propidium Iodide staining (CyFlow Space, Sysmex). Atriophallophorus winterbourni worms were hatched according to the above-described protocol. A pool of 15 worms was stained for 1 h with Propidium Iodide (according to the Partec protocol of CyStain PI Absolute T kit) and treated with DNase-free RNase. Three batches of 15 worms were measured, each taken from a different snail host. The DNA content of 2C nuclei was calculated using heads of isoline Drosophila melanogaster males and a laboratory clone of Daphnia galeata as two independent standards. For the haploid DNA content of Drosophila melanogaster, a value of 175 Mb (Bennett et al. 2003) was used and for Daphnia galeata a value of 158 Mb (S. Dennis, personal communication, December 12, 2019). Each standard was run separately with each batch of worms.

Sequencing
The DNA of A. winterbourni was sequenced using Illumina and Pacific Biosciences Technologies (Ambardar et al. 2016). For Illumina sequencing, two infected snails were selected from a shallow water habitat from one site sampled at Lake Alexandrina. A total of 200 ng DNA was extracted from approximately 800-1,000 worms and was sent to the Functional Genomics Center Zurich (University of Zurich, Zurich) for library preparation and paired end sequencing using the Illumina HiSeq4000 sequencing platform. A single TruSeq library was constructed from the DNA using the TruSeq Nano DNA library prep kit according to Illumina protocols, obtaining an average of 500 bp insert size. The library was sequenced without indexing on a single Illumina lane. For Pacific Biosciences sequencing, we selected 33 infected snails from two different sites from a shallow water habitat with a high infection prevalence within Lake Alexandrina. We assumed no distinct or significant population structure for the parasite from different sites within the same habitat zone, as previously shown for the snail host (Paczesniak et al. 2013). Genomic material was isolated from a pool of approximately 13,000-30,000 worms. The high-molecular weight DNA with an average length of 45,000 bp (assessed with a Bioanalyser) was sent for sequencing to the Functional Genomics Center Zurich (University of Zurich, Zurich), where it was sequenced with the Pacific Biosciences RSII sequencing platform. A 10-kb SMRT-bell library was constructed from a total of 10 mg of DNA. The library was sequenced using three SMRT cells using P6/C4 chemistry. Primary filtering was performed by Functional Genomics using the SMRT Link software from Pacific Biosciences. We performed secondary filtering, choosing only reads of at least 1,000 bp in length and with read quality >80%. No error correction was performed on the PacBio data at this stage, as it was corrected later with the Illumina data during the hybrid assembly.

Illumina Data Correction
A quality trimming step was performed with Trimmomatic 0.35 on the raw Illumina HiSeq data before proceeding with the assembly. Adapter sequences were removed and bases with a phred quality score below 5 were removed from the start and the end of the reads. Reads were scanned with a sliding window of 4 and were clipped if the average quality per base dropped below 15. Reads shorter than 50 bp were discarded. The reads were then submitted to PRINSEQ (Schmieder and Edwards 2011) for filtering for ambiguous bases (Ns), characters different than A, C, G, T or N, and for removal of exact duplicates. For assessment of contamination, we used taxonomic interrogation of the paired reads with Kraken v2, standard database (Wood and Salzberg 2014).

Hybrid Assembly and Annotation
Paired reads from Illumina were used together with long reads from Pacific Biosciences for a hybrid assembly with the MaSuRCA 3.2.3 assembler using default parameters (Zimin et al. 2013). Redundans 0.13c (Pryszcz and Gabald on 2016) and AGOUTI (Zhang et al. 2016) were used for improvements. Redundans improves the quality of the assembly by reduction, scaffolding, and gap closing (Pryszcz and Gabald on 2016). The reduction steps consist of identification and removal of heterozygous contigs, based on pairwise sequence similarity searches. Heterozygous contigs are expected to have high sequence identity (Pryszcz and Gabald on 2016). The quality was assessed using the N50 statistic, BUSCO 3.0.2 (Benchmarking Universal Single Copy Orthologs) (Waterhouse et al. 2018), and Blobtools 0.9.19.5 (Laetsch and Blaxter 2017). BUSCO 3.0.2 assesses the completeness of single copy orthologs based on evolutionary-informed expectations about gene content using the lineage data set metazoa_odb9. Blobtools 0.9.19.5 was used for taxonomic partitioning of the assembly. All scaffolds >50,000 bp (2,718 scaffolds) plus a random sample of scaffolds <50,000 bp from the assembly (2,661 scaffolds) were submitted to BLAST 2.3.0 using the NCBI nr database for taxonomic annotation. Taxonomic assessment of those scaffolds was used as input for Blobtools. The paired and filtered Illumina reads and PacBio reads of at least 1,000 bp in length and with read quality >80% were mapped back to the final assembly with BWA-MEM 0.7.17, yielding an average of 143Â coverage per base (125Â from the Illumina reads and 18Â from the PacBio reads).
Genome annotation was performed using the Maker 2.31.9 annotation pipeline (for details see supplementary methods S2, Supplementary Material online) (Cantarel et al. 2008

Comparative Genomics and Ancestral Genome Reconstruction
We selected 20 species of platyhelminthes and nematodes for comparative genomic analysis. The choice of both nematodes and trematodes was based on their comparisons in other helminth genomic analyses (Zarowiecki and Berriman 2015; International Helminth Genomes Consortium 2019) and will allow for future comparison of trematodes to model species of nematodes. The species consisted of 14 digenean trematodes (including our focal species), 3 species of parasitic cestodes, 1 species of parasitic monogeneans, and 2 species of free-living nematodes (see supplementary box S1, Supplementary Material online, fig. 2). We chose these species on the basis of close relatedness to A. winterbourni and quality of the genome assembly and annotation (species also used in International Helminth Genomes Consortium [2019]). The proteomic, genomic, and transcriptomic sequences for analysis were obtained from the NCBI database of invertebrate genomes (ftp.ncbi.nlm.nih.gov) and from the EBI database (ftp://ftp.ebi.ac.uk/). For the analysis, we used the most recent genomes from those databases with available transcriptomic data (CDS_genomic) and protein annotation (see supplementary table S1, Supplementary Material online).
The OMA standalone (Orthologous Matrix) software was used for inference of HOGs of genes shared between species . This software conducts an all-againstall comparison to identify the evolutionary relationships between all pairs of proteins included in the custom-made database of the 20 genomes. The program was run with default parameters and with the "bottom-up" algorithm for inference of HOGs. Caeorhabditis elegans and P. pacificus were specified as outgroup species. After obtaining the phylogenetic species tree (see next section), OMA was rerun with the precise species tree specified.
The data obtained from OMA was then analyzed with the python library pyHam (Train et al. 2019). With pyHam we reconstructed a model of the ancestral genomes at each stage of the phylogeny leading to A. winterbourni and carried out all comparisons between ancestral and extant genomes to obtain classes of duplicated, gained, retained, or lost genes (see Jupyter notebook supplementary material S6, Supplementary Material online). We also used pyHam to visualize genomic changes along each branch of the phylogenetic tree.
Phylogenetic Species Tree OMA Groups, that is, Orthologous Groups, from the OMA output were used for phylogenetic tree construction, as they are stringent groups of orthologs and do not contain paralogs (Zahn-Zabal, Dessimoz, Glover 2020). The phylogenetic tree was constructed following the protocol of Dylus et al. (2020). Briefly, Orthologous Groups containing at least 15 species of monogeneans, cestodes, and trematodes were extracted using the custom script filter_groups.py from the git repository: https://github.com/DessimozLab/f1000_PhylogeneticTree. Nematodes were excluded from precise phylogenetic and time tree reconstruction, as they are too evolutionarily distant. Within each Orthologous Group, sequences were aligned using MAFFT (mafft 7.273, 1,000 cycles of iterative refinement) (Katoh et al. 2009). The separate alignments were concatenated into one supermatrix using a custom script con-cat_alignment.py from the git repository: https://github.com/ DessimozLab/f1000_PhylogeneticTree. The final size of the supermatrix was 145,802 sites for all 18 species. No columns from the supermatrix were excluded. The supermatrix was used as input for IQ-TREE maximum-likelihood phylogenetic tree construction (Trifinopoulos et al. 2016;Kalyaanamoorthy et al. 2017;Hoang et al. 2018) using the ModelFinder Plus option for finding the best fitting model. Branch support was calculated with 1,000 Ultrafast bootstrap alignments and 1,000 iterations. The maximum-likelihood tree was confirmed with ASTRAL III (Zhang et al. 2018) by constructing a species tree from gene trees of the 238 Orthologous Groups. Each gene tree was first constructed with IQ-TREE using ModelFinder Plus for choosing an appropriate model; branch support was calculated with 1,000 bootstrap alignments and 1,000 iterations. The IQ-TREE tree, together with the supermatrix, were used in Mega-X 6.06 for time tree reconstruction using the Maximum-Likelihood RelTime method (Tamura et al. 2013). We used two pieces of evidence for time calibration, discussed in the Results.

GO Enrichment Analysis
We performed GO annotation for each species using Pannzer2, EggNOG (Diamond mapping mode) and OMA (Huerta-Cepas et al. 2016;Altenhoff et al. 2018;Tö rö nen et al. 2018). Each extant species genome was functionally annotated with orthology-informed putative functions using OMA, Pannzer2, and EggNOG reaching between 26% and 96% of genes annotated for each species (supplementary table S2, Supplementary Material online). We then performed GO enrichment analysis using GOATOOLS (Klopfenstein et al. 2018), which finds statistically over-and under-represented GO terms in the set of genes of interest compared with all the GO terms in the background population. For analyses that were species specific, the background set was all the genes in the genome. For analyses of ancestral genomes, the background population was all the ancestral genes, that is, the set of HOGs at that taxonomic level. To get the GO terms for any particular ancestral gene/HOG, we took the union of all the GO terms in the extant "children" species. Fisher's exact test was used for computing uncorrected P values. The P values were then corrected using the Bonferroni method and retained if the corrected P value was <0.05. Subsequently, all enriched GO terms were categorized into GO slim categories using the AGR subset (Alliance of Genome Resources, http://geneontology.org/docs/download-ontology/, l a s t accessed: May 7, 2020) and unique genes within each enriched GO slim category were counted. For each GO term, the IC (Information Content) score was calculated as: IC(t) ¼ Àlog(P(t)) with P(t) being estimated as the empirical frequency of the term in the UniProt-GOA database (Barrell et al. 2009). The average IC was calculated for each GO slim term using the IC values of all enriched GO terms in each category (Mistry and Pavlidis 2008;Mazandu and Mulder 2014). GO slim terms were used in summarizing the data.

Estimation of dN/dS in Gene Families in Atriophallophorus winterbourni
HOGs 25969 and 36190 with over 30 A. winterbourni genes were investigated for signatures of positive selection. All proteins within the two families were submitted to NCBI BLASTP to find their best hit against the nr database and obtain putative functions. We then applied the protocol from Jeffares et al. (2015) to estimate the nonsynonymous to synonymous substitution rate ratio within each HOG and to investigate whether selection models explain the data better than null models (Yang 1997;Kohlhase 2006). Protein sequences were aligned using Clustal Omega (Madeira et al. 2019), then converted to codon alignment in Phylip format with PAL2NAL (Suyama et al. 2006). Positive selection analyses are sensitive to alignment errors; thus the gap-ridden alignment of HOG 36190 was subjected to a more stringent alignment filtering, guided by the approach proposed by Moretti et al. (2014) (for details, see supplementary methods S5, Supplementary Material online). Branch site models in codeml were used to estimate dN, dS, and x (dN/dS) (model ¼ 2, NSsites ¼ 2). The likelihood ratio test (LRT) was used to determine significance. Gene trees were constructed with protein sequence alignments using IQ-TREE (Trifinopoulos et al. 2016;Kalyaanamoorthy et al. 2017;Hoang et al. 2018). First an initial parsimony tree was created by a phylogenetic likelihood library; 168 protein models were then tested for best fit with the data according to the Bayesian Information Criterion. Branch support was calculated with 1,000 bootstrap alignments (ultrafast bootstrap) and 1,000 iterations. The models chosen were JTT þ F þ G4 for HOG 25969 (general matrix with empirical amino acid frequencies from the data and discrete Gamma model with four categories) and WAG þ G4 for HOG 36190 (general matrix with discrete Gamma model with four categories).
FIG. 2.-Phylogenetic tree and classification of species used in the analysis. The data used for the tree were all Orthologous Groups from the OMA analysis with genes from at least 15 species present (238 groups of orthologs). The combined data was used in IQ-TREE to create a robust consensus species tree. The tree and the combined alignment of 238 groups of orthologs was used in Mega-X 6.06 for reconstruction of the time tree. The scale below indicates divergence time in million years (Myr). Each node has a divergence time with the confidence interval indicated in brackets in million years and a bootstrap support indicated after a slash.

Results and Discussion
Genome of A. winterbourni

Species Phylogeny and Molecular Clock
To reconstruct a robust maximum-likelihood phylogenetic tree, 238 Orthologous Groups (groups containing only orthologs, with a maximum one gene per species) shared between at least 15 out of 18 species of Platyhelminths were used. The phylogenetic estimate was well resolved and congruent with previous publications based on genetic markers or whole genomes ( fig. 2) . 2). The divergence time estimates across the phylogeny were inferred using several independent pieces of evidence, used as calibration points for Time Tree: the existence time of the prototrematode first associated with a molluscan host around 400 Ma, and the origin of Schistosoma species in the Creataceous period (66-145 Ma) (Gibson 1987;Hausdorf 2000;Blair et al. 2001;Peterson et al. 2004;Parfrey et al. 2011).
Evolutionary Patterns of Gene 1:1 Orthology, Gain, Loss, and Duplication across Trematoda The OMA analysis identified 38,144 HOGs among all the species included (2 Nematodes and 18 Platyhelminthes). Specifically, in A. winterbourni 5,815 gene families were found (comprising 7,828 out of a total of 11,499 genes, 68.1%) with the rest being identified by OMA as singletons not belonging to any family (3,671 genes, 31.9%). Comparisons of three ancestral genomes among the trematode phylogeny (the ancestral Trematoda, the ancestral Plagiorchiida, and the Opisthorchiata/Xiphidiata ancestor) revealed many duplicated and gained gene families ( fig. 3,  supplementary fig. S7, Supplementary Material online). A particularly high proportion of genomic novelty was inferred during the initial speciation of Trematoda from the Trematoda/ Cestoda common ancestor (37.2% of newly acquired genes), and again during the divergence of A. winterbourni from the most recent Opisthorchiata/Xiphidiata ancestor (31.9% of newly acquired genes, fig. 3B). The proportion of duplicated genes in the A. winterbourni genome was also high (24%) when compared with the Trematoda/Cestoda split (10.9%) ( fig. 3B). In A. winterbourni, many of the duplicated genes were found in expanded gene families (503 genes comprising 66 HOGs with a minimum of 5 duplications per HOG) and 13 of these HOGs were massively expanded, with over ten duplicated genes since the Opisthorchiata/Xiphidiata speciation (supplementary table S4, Supplementary Material online).
We found only 660 genes lost in the ancestral Trematode from the previous ancestor. We observed a progressive increase in the number of lost genes to the Opisthorchiata/Xiphidiata ancestor ( fig. 3A). The Plagiorchiida ancestor exhibited comparable gene loss to gene gain and duplication whereas in the Opisthorchiata/ Xiphidiata ancestor, gene loss exceeded the number of duplications or gains ( fig. 3A).

1:1 Orthologs in Trematodes
Based on previous studies, we assumed that many of the genes that remain conserved throughout speciation are housekeeping genes, the building blocks of the organism, and necessary for life, growth, and reproduction (Wu et al. Additionally, we found 28 single-copy orthologs present in all species, which have been maintained since the trematode ancestor. Examination of their functions through the annotations of best studied trematodes (Fasciola hepatica [NCBI 2017], Schistosoma mansoni [Protasio et al. 2012;Wang et al. 2016]) revealed that the 28 retained gene families shared between them all were largely involved in cell functioning and growth, division, and cell-to-cell or protein-to-protein interactions (supplementary results S4, table S7, Supplementary Material online).

Genes Duplicated and Gained in Trematodes
We hypothesized that the duplicated genes are more likely to be adaptive than the single-copy orthologs due to the redundant second copy being functionally The proportions (on the bars) and the total numbers (next to the bars) of retained (pink), duplicated (green), and gained (yellow) genes in each reconstructed ancestral genome leading to A. winterbourni. The oldest ancestral genome is on the left-hand side and the extant A. winterbourni genome on the right-hand side. The total number of genes per genome is above each bar beneath the name. (C) Heatmaps summarising the GO enrichment analysis of the duplicated and gained genes in the three reconstructed ancestral genomes and the extant genome of A. winterbourni. All enriched GO terms were categorized into GO slims, listed on the y-axis of each heatmap. The colors indicate the mean IC value of each GO slim category and the number printed on top is the number of unique genes within that GO slim category (see Materials and Methods). maintained through positive selection to play a new or same role within the organism (Ohno 2013;Yang et al. 2015). Multiple duplications within gene families would further suggest an adaptive importance of these key HOGs. The novel (gained) genes may similarly indicate areas of genetic innovation that were crucial in adoption of new hosts, expansion/streamlining of life cycles, and adaptation to changing environments. The origins of the gained genes may stem from neofunctionalization or high divergence of duplicated genes, therefore also potentially involved in adaptive functions as suggested by the gene duplication model of Ohno (2013).
Examination of the enriched functions from the trematode ancestor to the most recent ancestor of A. winterbourni are presented in figure 3C and appear to indicate a progressive gain and duplication of potentially adaptive genes. An "ancestral GO enrichment" analysis of the ancestral genomes was used to retrieve the putative functions of all gained genes (shared between at least 70% of the extant species in Trematoda, 66.7% of Plagiorchiida, or 50% of Xiphidiata/ Opisthorchiata) and duplicated genes (minimum five duplicated genes per family) (supplementary methods S6, Supplementary Material online). Here, we concentrate on functional analysis of ancestral genomes because the inferred gene duplications, gains, and losses are based on evidence present in all of the extant genomes. For example, a gene is inferred to be gained at a particular ancestral level if it is present in at least two species only in that clade. Therefore, ancestral genomes (i.e., internal nodes in the species tree) are more robust than extant genomes in terms of inferred evolutionary duplications, gains, and losses. Additionally, by only considering gained genes present in the majority of the extant species of a given clade, or duplicated genes present with at least five copies, we have more confidence that we are looking at bona fide gains and duplications. The ancestral genome annotation was based on combining the GO terms assigned to the extant genomes. We further categorized the enriched GO terms into GO slim categories to give a broader overview of the functions and counted unique genes within each of those categories (summarized in fig. 3C). Although there was a similar number of enriched functions for the duplicated and gained genes in the Trematoda and Plagiorchiida ancestors, we found more functions enriched in duplicated than in gained genes in the Xiphidiata/Opisthorchiata ancestor and in A. winterbourni. Considering only the duplicated genes, from the trematode ancestral genome to the A. winterbourni genome, there was a progressive increase in the number of enriched GO slim functions over time, and an overall increase in the number of unique genes contributing to each function. The increase in the number of unique genes could possibly reflect the increasing importance of this function over time or increased duplication rate of certain families.
We present the average IC per GO slim category, which can be used as a proxy to estimate the specificity of a particular GO term (see Materials and Methods). The higher the IC, the more specific a term. For the gained genes, we found a progressive increase in IC value of the different GO slim categories but we did not find an increase in the number of enriched GO slim functions or the number of unique genes within them (fig. 3C). The increase in average IC values of GO slim categories enriched for gained genes could suggest an increase in specificity of functions over time (fig. 3C). These observations are best illustrated with enriched GO slim functions such as catalytic activity (GO:0003824), including microtubule motor activity, but also cellular component organization (GO:0005634), including actin bundle filament organization and response to stimulus. A literature review relates them to the importance of the microtubule-based and actin-based cytoskeletal system building the outer body layering (tegument), through which the parasite interacts with the host environment. Microtubule associated proteins in the tegument, including tubulin, paramyosin, actin, dynein light chains, and various antiporters, participate in absorption and secretion (e.g., nitrogen utilization), transport of vesicles from sub-tegumental cells to the tegument cytoplasm, and cell motility (Githui et al. 2009;Young et al. 2010). Molecular characterization and immunostaining studies have also shown dynein light chains to function as tegument associated antigens (Hoffmann and Strand 1997;Yang et al. 1999;Jones et al. 2004), important in hiding from host immunity. The tegument has been shown to be an essential structure for adaptation to the external environment (Kim et al. 2012) including the pH of the digestive system of the hosts. Indeed, our results show dynein light chain, tegument-associated antigen, and a tubulin-beta chain to be the functions of 3 of the 12 HOGs duplicated since the Trematode ancestor and with at least 3 copies in 75% of the extant species (supplementary table S8, Supplementary Material online). We also found dynein light chain to be the putative function of one of the most duplicated HOGs in A. winterbourni (supplementary table S4, Supplementary Material online, see next section), as well as a HOG duplicated in all 14 trematode species (supplementary table S9, Supplementary Material online). Thus, we speculate the functions related to the tegument to be also of great importance in our focal species.
The results might indicate acquisition of more complex and specific adaptations to hosts and environments over time. More experimentally validated GO annotations in our species of interest could shed light on this hypothesis in the future.

Gene Loss in Trematodes
Gene loss is known to be common for intracellular parasites (Sakharkar et al. 2004;Corradi 2015) and it is much rarer in parasites with complex life cycles and multiple hosts (Zarowiecki and Berriman 2015). However, in several helminths there has been a loss of a mitochondrial gene atp8 (Egger et al. 2017) or cytochrome P450 redox enzymes (Tsai et al. 2013) as well as other functional losses and gene family contractions (International Helminth Genomes Consortium 2019). Here, we again focused on ancestral genomes because they are inferred by the accumulation of gene presence and absence information from the extant genomes, that is, if a gene is not found in all the extant genomes of a clade, we can assume it was lost in the last common ancestor of that clade. Thus, ancestral genome analysis is less prone to being undermined by poorer quality genomes (Deutekom et al. 2019). In our study, the robustness was exhibited by the number of losses being always much lower in ancestral than extant genomes (supplementary fig. S7, Supplementary Material online). We also performed a GO enrichment of the lost genes for A. winterbourni as well as the ancestors leading to it. For the ancestral genomes, the background population for GO enrichment was the union of all the GO terms in the extant children species constituting the previous ancestor to the ancestor of interest.
Although there was a progressive increase in the number of genes lost from Trematoda to Opisthorchiata/Xiphidiata ancestor, a GO enrichment analysis of lost genes did not reveal any functions to be enriched in the Trematoda or Opisthorchiata/Xiphidiata ancestor. In the Plagiorchiida ancestor we found loss of genes related to intrinsic components of membrane (GO:0016021) and wide pore channel activity (GO:0022829). We did not find any enrichment of GO terms for the lost genes in A. winterbourni. Since the functions of the lost genes appear to not be related to any specific biological processes, we speculate that there is a greater importance of gene gains and duplications in adaptation to parasitism.

Role of Gene Duplication and Gain in Driving Adaptation of A. winterbourni
The Opisthorchiata species exhibit a high similarity in life cycle traits and set of hosts. The A. winterbourni genome exhibited comparable proportions of gained, retained, and duplicated genes since the Opisthorchiata/Xiphidiata ancestor (31.9%, 44.1%, 24%, respectively) as Opisthorchis viverrini (41%, 50.5%, 8.5%, respectively), that is, in both species the highest proportion of genes was retained and the smallest proportion of genes was duplicated. On the other hand, Opisthorchis felineus exhibited a much higher proportion of genes originating through duplication since Opisthorchiata/ Xipihidiata ancestor (52.4%) and Clonorchis sinensis had the most genes originating through gain since the Opisthorchiata/Xiphidiata ancestor (54.3%). Thus, across the four species, sometimes gene duplication and sometimes gene gain seems to play a greater role in gene family evolution. However, it is important to note that inferences regarding gene duplications, gains, and losses in extant species rather than ancestral species are impacted to a greater extent by fragmentation in genome assemblies, likely inflating the numbers in these categories of genes.
The A. winterbourni genome revealed a massive expansion of 13 HOGs that occurred after the speciation from Opisthorchiata/Xiphidiata ancestor (over ten duplicated genes/HOG, comprising 221 genes, supplementary table S4, Supplementary Material online). Comparing A. winterbourni to the Opisthorchiata/Xiphidiata ancestor, two gene families stood out due to the presence of more than 30 A. winterbourni genes: HOG 25969, with 31 genes in A. winterbourni out of 56 genes in all trematodes, and HOG 36190, with 36 genes in A. winterbourni out of 72 genes. In these two families, 29 and 31 genes originated through duplication since the Opisthorchiata/Xiphidiata ancestor for HOG 25969 and HOG 36190, respectively. In any other trematode, only 1-5 copies were found. These genes were investigated for being artificially duplicated due to a high proportion of BUSCO duplicated genes found within the assembly. Genes could be considered artificial duplications due to being fragmented by breaks between scaffolds (Alkan et al. 2011). We looked at the positions of the duplicated genes of HOG 25,969 and 36,190 on their scaffolds, and we did not find this to be the case (supplementary table S10, Supplementary Material online). We thus concluded our genes are likely real duplications rather than artificial duplications due to assembly fragmentation.

Functions of Massively Expanded Gene Families in A. winterbourni
Examination of GO annotations of the 13 HOGs with over ten recently duplicated genes (supplementary table S4, Supplementary Material online) led us to speculate that the genes are likely involved in host tissue invasion and exploitation (metallohydrolases, Baskaran et al. 2017), escape from host immunity (serpins, Bao et al. 2018), and host behavioral manipulation (glutamine synthase, Helluy and Thomas 2010) (supplementary table S4, Supplementary Material online).
Specifically, we examined the two most highly duplicated gene families in depth. We determined HOG 36190 (36 genes in A. winterbourni) to be a gene family of putative glutamine synthases (supplementary table S4, Supplementary Material online). Already from the Plagiorchiida ancestor to the Opisthorchiata/Xiphidiata ancestor there was a significant enrichment in biological processes and cellular components related to glutamine family amino acid metabolic processes, including glutamate ammonia ligase activity (GO:0004356), positive regulation of synaptic transmission, glutamatergic (GO:0051968), glutamate binding (GO:0016595), glutamate catabolic process (GO:0006538), and glial cell projection (GO:0097386) (supplementary table S5, Supplementary Material online). The glutamine biosynthesis pathway is a pathway in which one of the end products is proline, a nonessential amino acid. An extremely active proline pathway has already been observed in most helminths infecting humans (Fasciola hepatica, Schistosomes), with host derived arginine used as a substrate (Ertel and Isseroff 1976;Toledo and Fried 2010;Mehlhorn 2016). These excessive proline levels have been implicated in the pathogenesis of trematode infections. Proline alters antioxidant defenses, activating secondary metabolite virulence factors, but also provides an energy source for a metabolic shift appropriate for adaptation to the host environment (Ertel and Isseroff 1976;Toledo and Fried 2010;Mehlhorn 2016). Glutamine synthase has also been found to be a marker for glial cells, immunity cells of the central nervous system. A study on Microphallus papillorobustus, a trematode parasite of Gammarus crustaceans, found disruption of the glutamine metabolism in the brain of the gammarids due to astrocyte-like glia and nitric oxide production by the parasite metacercariae, resulting in altered neuromodulation and behavior of the host (Helluy and Thomas 2010). The gene family is thus especially interesting and a potential candidate in parasite-host interactions as previous research has shown A. winterbourni to be affecting the behavior of its snail host (Feijen et al., in preparation;Levri and Lively 1996).
The second highest-duplicated gene family in A. winterbourni was HOG 25969, with 31 genes. It consists of proteins putatively encoding for O-sialoglycoprotein endopeptidase, tRNA N6-adenosine threonylcarbamoyltransferase, metallohydrolase, and/or glycoprotease/Kae1, all related to DNA repair, protein binding, and metal ion binding (supplementary table S4, Supplementary Material online). The GO annotation indicates the family to be potentially involved in DNA repair, nuclease activity, and nucleic acid phosphodiester bond hydrolysis. Metalloproteases have been found to be duplicated and under positive selection in other parasitic worms (Strongyloides papillosus), showing them to be involved in host tissue penetration at final larval stage (Baskaran et al. 2017).

Signatures of Selection in Two Expanded Gene Families of A. winterbourni
We next investigated the two highly duplicated (>30 genes) HOGs described above for signatures of positive selection. Signatures of positive selection were detected by comparing the dN/dS ratio at branches leading to the radiation of A. winterbourni genes, indicated with a #1 on the gene tree, with the dN/dS ratio of background branches ( fig. 4,  supplementary fig. S9, Supplementary Material online). Selection is generally considered negative/purifying if x (or dN/dS) is less than 1, neutral if x is 1, and positive if x is greater than 1. The tree is unrooted. Each name is a species name followed by the original gene name (protein name). Atriophallophorus winterbourni gene names are shortened version of gene names in supplementary table S10, Supplementary Material online. The numbers above branches indicate ultrafast bootstrap support, for the #1 branches the bootstrap support is after a backslash. The branches labelled with #1.X indicate the separation between the foreground branches and the background branches (distinction used in codeml for investigation of selection). The test for selection compares the dN/dS between the foreground branch and the background branches. The total number of genes in this HOG per trematode species is given next to each species name.
HOG 36190 was the most massively expanded HOG and selection was found to be acting on some but not all genes within this family. In the dN/dS ratio analysis, the null model (allowing x 1) explained the data better than the alternative model (allowing x > 1) for 2 out of 3 of the investigated branches, indicating neutral evolution (table 2, supplementary  fig. S9, Supplementary Material online). The signature of selection was detected only on one branch, a long branch leading to a subset of 13 A. winterbourni genes within this family (supplementary fig. S9, Supplementary Material online, branch #1.3). Eleven sites were identified as >50% probability to be under positive selection with one having a probability >90%. From this we conclude that selection might be acting on some, but not all, genes within this family potentially indicating a certain structure evolving under positive selection. However, considering we do not find selection on any other branches in the gene tree, it also has to be taken into account that genes in this family might be highly proliferating due to being in genomic locations prone to duplication events. Their increasing number can be causing redundancy, which can ultimately be deleterious to the organism (Schiffer et al. 2016).
For the gene family HOG 25969 the alternative model (allowing x > 1) explained the data better than the null model (allowing x 1) for all the investigated branches, indicating a signature of positive selection on all of the investigated foreground branches (table 2, fig. 4). With this result we followed up with the post-hoc Bayes Empirical Bayes (BEB) analysis implemented in the alternative model (Yang et al. 2005).
For the branch leading to all 31 A. winterbourni genes, the BEB analysis identified 39 amino acids residues to be under positive selection in the alignment with 4 sites having an over 95% probability of being selected. For sites under positive selection among different subsets of foreground lineages, see table 3. Analysis of positive selection on the structures of the enzyme showed the active, DNA or mental binding site to be under highest probability of selection suggesting an important role (supplementary fig. S8, results S5, Supplementary Material online). However, without experimental characterization it is difficult to say what role the family might be playing in A. winterbourni.

Conclusions
In our study, we report a de novo sequenced genome of a digenean trematode parasite, A. winterbourni, its phylogenetic position among other digenean trematodes, and the time of speciation of its ancestor from Opisthorchiata suborder. Using 14 other currently available and well-studied parasitic digenean trematodes, we reconstruct the ancestral trematode genome and investigate which genes have originated through duplication, which were gained and which have remained conserved (retained) through each speciation point until the extant genome of A. winterbourni. The comparative genomic approach is a powerful tool for identifying candidate duplicated gene families involved in adaptation. We find 13 gene families expanded recently in A. winterbourni, and for two we infer signatures of positive selection. Our description of candidate gene families putatively involved in parasite infectivity will facilitate the identification of genomic regions directly involved in the host-parasite coevolutionary arms race and will facilitate studying coadaptation in the laboratory. Gene expression studies in diverse life-cycle stages and functional confirmation via, for example, RNAi knockout studies will be required to provide a direct link between the genes and phenotypes involved. By focusing on gene duplications and retention across the digenean trematodes our work informs on the genomic basis of adaptation to parasitic lifestyles and paves the way for future adaptation genomics focusing on antagonistic relationships between host and parasites.