A reference genome for the long-term kleptoplast-retaining sea slug Elysia crispata morphotype clarki

Abstract Several species of sacoglossan sea slugs possess the incredible ability to sequester chloroplasts from the algae they consume. These “photosynthetic animals” incorporate stolen chloroplasts, called kleptoplasts, into the epithelial cells of tubules that extend from their digestive tracts throughout their bodies. The mechanism by which these slugs maintain functioning kleptoplasts in the absence of an algal nuclear genome is unknown. Here, we report a draft genome of the sacoglossan slug Elysia crispata morphotype clarki, a morphotype native to the Florida Keys that can retain photosynthetically active kleptoplasts for several months without feeding. We used a combination of Oxford Nanopore Technologies long reads and Illumina short reads to produce a 786-Mb assembly (N50 = 0.459 Mb) containing 68,514 predicted protein-coding genes. A phylogenetic analysis found no evidence of horizontal acquisition of genes from algae. We performed gene family and gene expression analyses to identify E. crispata genes unique to kleptoplast-containing slugs that were more highly expressed in fed versus unfed developmental life stages. Consistent with analyses in other kleptoplastic slugs, our investigation suggests that genes encoding lectin carbohydrate-binding proteins and those involved in regulation of reactive oxygen species and immunity may play a role in kleptoplast retention. Lastly, we identified four polyketide synthase genes that could potentially encode proteins producing UV- and oxidation-blocking compounds in slug cell membranes. The genome of E. crispata is a quality resource that provides potential targets for functional analyses and enables further investigation into the evolution and mechanisms of kleptoplasty in animals.


Introduction
The ability to feed on algae and temporarily sequester functional chloroplasts (called kleptoplasts) has convergently evolved multiple times in invertebrates (Van Steenkiste et al. 2019).In no lineage is this strategy more common than in the gastropod superorder Sacoglossa, a group of sap-sucking sea slugs (Händeler et al. 2009;Maeda et al. 2021).In slug digestive cells, phagocytosed algal material is selectively degraded, leaving only kleptoplasts intact, which provide the slug with nitrogen and fixed carbon (Trench et al. 1974;Raven et al. 2001;Curtis et al. 2010;Cruz et al. 2020).The length of time kleptoplasts remain photosynthetically active differs between slug species, ranging from a few days in species such as Elysia cornigera to over eleven months in Plakobranchus ocellatus (Clark et al. 1990;Evertsen et al. 2007;Händeler et al. 2009).Species that can incorporate functional kleptoplasts for up to 2 weeks of starvation are referred to as short-term-retention forms, and those that retain functional kleptoplasts for over 20 days are considered long-term forms (Christa, Händeler, et al. 2014).Phylogenetic analyses indicate that the ability to retain short-term and long-term kleptoplasts has evolved multiple times in sacoglossan slugs (Christa et al. 2015;Maeda et al. 2021).Moreover, kleptoplasts acquired from the same algal donor species can have wildly different retention times in different slug species.For example, Elysia crispata (synonym Elysia clarki) retains long-term kleptoplasts for several months from multiple algal donors, including Penicillus capitatus, whereas Elysia patina also harbors chloroplasts from P. capitatus, but the organelles are fully degraded within 2 weeks (Curtis et al. 2010).This suggests that slug-dependent factors contribute to kleptoplast stability and retention.Yet the mechanisms by which sacoglossan sea slugs maintain functional kleptoplasts for extended periods of time remain unclear.
In algae, most chloroplast genes are encoded in the nuclear genome, and key proteins for photosynthesis such as photosystem

Slug and algal material and laboratory culturing in aquaria
Wild-caught E. crispata were purchased from a local aquarium shop in Lafayette, Indiana (United States).The slugs were shipped overnight to the shop after being collected from the vertical walls of an undisclosed canal in the Florida Keys (United States).
Slugs were acclimated in 20 L aquaria containing artificial seawater (ASW) prepared using distilled water and Instant Ocean Reef Crystals at a specific gravity of 1.023 at 25°C (77°F).Slugs were then transferred to connected 180-200 L aquaria containing ASW recirculated through a 280 L sump tank and a Pentair SMART UV sterilizer to reduce microorganismal growth.The water temperature was maintained at 25°C using a standard aquarium heater.Aquaria were illuminated with white LED lights (Fluval Sea Marine 3.0 LED Aquarium Lights set at 75% of maximum intensity) on a 12:12 light:dark cycle.Every week, ∼50% of the system volume was replaced with fresh ASW, and freshly cultured algae was introduced.Aquaria were monitored daily for the deposition of egg masses.Egg masses were collected within 1 day of deposition.
Macroalgae were cultured on 15 × 30 × 1 cm gridded plastic racks in a separate 220 L aquarium containing recirculated ASW filtered through a 20 L sump tank.To help promote algal growth, a constant turbulent water flow was maintained using 3 circulation pumps positioned across the top, middle, and bottom of the aquarium in alternating directions.Weekly maintenance consisted of treating the water with 50 µg mL −1 rifampicin and changing out 100% of the ASW 24 h later.Aquarium ASW was supplemented 3 times per week with 1 h of gentle bubbling of compressed carbon dioxide and dosing with 100 mL each of 20 g L −1 potassium nitrate (Thermo Fisher, Waltham, MA, United States), 1.5 g L −1 potassium dihydrogen phosphate (Thermo Fisher, Waltham, MA, United States), and 1:20 diluted Guillard's F/2 trace elements (Algae Research Supply, San Diego, CA, United States).

Sampling, RNA extraction, and RNA sequencing and processing
Wild-caught adult slugs deposited egg masses ∼2-3 weeks after being introduced in the laboratory aquaria.Egg samples (Fig. 1d) for RNA extraction were rinsed in fresh ASW, flash-frozen in liquid nitrogen, and stored at −80°C within 24 h of deposition.Each biological replicate was derived from using the entire egg mass deposited by an individual slug.
To obtain free-swimming veligers, crawling larvae, and juvenile slugs, each remaining collected and rinsed egg mass was transferred to an individual petri dish containing 25 mL of freshly prepared ASW and supplemented with 5 µg mL −1 rifampicin and 500 ng mL −1 ivermectin to eliminate the potential growth of cyanobacteria, bacterial pathogens, and predators.Individual egg masses were maintained at room temperature under white LED light (Fluval Sea Marine 3.0 LED Aquarium Lights set at 75% of maximum intensity) on a 12:12 light:dark cycle.Egg masses were carefully rinsed and filtered through 52-micron nylon mesh and transferred to new ASW with rifampicin and ivermectin every 2 days.Free-swimming veliger larvae (Fig. 1e) hatched ∼21-28 d after egg deposition and were collected by centrifuging the ASW media at 1000 × g for 1 min.Pellets of free-swimming veliger samples were rinsed in ASW, flash-frozen in liquid nitrogen, and stored at −80°C until RNA extraction.Each biological replicate was derived from using all the free-swimming veligers that hatched from a single egg mass deposited by an individual slug.The hatchlings from the remaining unused egg masses were allowed to metamorphose into larvae.Crawling larvae (Fig. 1f) were maintained as pools derived from individual egg masses and kept in petri dishes as described for egg masses.Crawling larvae samples were prepared as described for veliger samples ∼3-5 d after metamorphosis.Each biological replicate was derived from using all the larvae derived from a single egg mass.The larvae derived from the remaining unused egg masses were introduced into 16 × 20 cm Pyrex dishes containing freshly prepared ASW, 5 µg mL −1 rifampicin, 500 ng mL −1 ivermectin, and ∼1 g of macroalgae.After reaching a length of ∼1-2 cm, juvenile slugs (Fig. 1c and g) were collected, flash-frozen in liquid nitrogen, and stored at −80°C until RNA extraction.Each biological replicate consisted of 3 individual slugs.
Total RNA was isolated from 100 mg of pulverized flash-frozen E. crispata eggs, free-swimming veligers, crawling larvae, or juvenile slugs using TRIzol reagent (Thermo Fisher Scientific, Waltham, MA, United States) according to the manufacturer's protocol.Extracted RNA samples were digested with DNaseI (New England Biolabs, Ipswich, MA, United States) and column purified using an RNA concentrator and cleanup system (Zymo Research, Irvine, CA, United States).The eluted RNA samples were quantified using a NanoDrop spectrophotometer.Purified samples were measured for RNA integrity (RIN) scores at the Purdue Genomics Core Facility.Samples with RIN scores ≥7 were selected for library prep.Three biological replicates of each E. crispata developmental stage of the slug were sent to Novogene Corporation Inc. (Sacramento, CA, United States) for library construction (NEBNext Ultra RNA Library Prep Kit, New England Biolabs) from 1 µg total RNA and Illumina sequencing.Paired-end (2 × 150 bp) reads were quality filtered and adapter trimmed using fastp v0.20.1 (Chen et al. 2018).

Sampling, DNA extraction, and genome sequencing and assembly
Genomic DNA (gDNA) for Illumina sequencing was extracted from E. crispata eggs reared in petri dishes (without algae).Samples were pulverized in liquid nitrogen using a TissueLyser II (Qiagen, Hilden, Germany).DNA was extracted using 1 of 2 methods, a CTAB-based protocol for samples with high mucopolysaccharide content (dx.doi.org/10.17504/protocols.io.kxygxp81zl8j/v1) modified from Winnepenninckx et al. (1993) or a DNAzol Reagent (Invitrogen, Waltham, MA, United States) protocol (dx.doi.org/10.17504/protocols.io.n92ldprrnl5b/v1).Quantity and quality of DNA for each sample were assessed using a Qubit 4 fluorometer (Invitrogen, Waltham, MA, United States) and a TapeStation 4150 (Agilent, Santa Clara, CA, United States).No difference in quality between the CTAB-and DNAzol-based protocols was observed, and the samples were pooled prior to Illumina sequencing.The sequencing library was constructed using an NEBNext DNA library prep kit (New England Biolabs), and 2 × 150-bp paired-end reads were sequenced using an Illumina NovaSeq 6000 by Novogene Corporation Inc. (Sacramento, CA, United States).Illumina gDNA read quality was assessed by FastQC v0.10.0 (Babraham Bioinformatics 2011).Illumina TruSeq adapters and low-quality reads were removed with fastp v0.20.1 (Chen et al. 2018) using default parameters.
gDNA for Oxford Nanopore Technologies (ONT) sequencing was extracted from 2 E. crispata sample types: brains dissected from wild adults and whole bodies of wild adults starved for a minimum of 2 weeks (Table S1).Both the CTAB and DNAzol protocols referenced above were used to generate highmolecular-weight DNA for ONT long-read sequencing (Table S1).Short DNA fragments less than 10 kb were depleted using an SRE XS kit (Circulomics, Baltimore, MD, United States) according to the manufacturer's protocol.At least 1.9 μg of high-molecularweight gDNA was used as input for ONT LSK-109 library ligation kits and sequenced on R9 MinION flow cells.Six flow cells were necessary to generate sufficient read depth for whole genome assembly (Table S1).Base calling of ONT reads was performed with Guppy v6.1.2(Oxford Nanopore Technologies, Oxford, United Kingdom).ONT reads were quality filtered by Filtlong (Wick 2022) using the Illumina reads as an external reference and with the following parameter settings: -trim -split 500 -keep_percent 90 -min_length 3000.
Two programs for genome assembly were evaluated: CANU v2.2 (Koren et al. 2017) using an estimated genome size of 711 Mb and Flye v2.9 (Kolmogorov et al. 2019) using thekeep-haplotypes and -nano-raw parameter settings.Three programs for error correction and genome polishing were evaluated: Racon v1.5.0 (Vaser et al. 2017), Medaka v1.6.0 (nanoporetech 2022), and NextPolish v1.4.0 (Hu et al. 2020).Racon and Medaka polishing runs were performed using ONT reads only.NextPolish polishing runs were performed using ONT and Illumina reads as references.Racon polishing was iteratively performed up to 4 times.All polishing programs were run using default parameters.Polished assemblies were compared to each other and to the unpolished starting assemblies using QUAST-LG v5.2.0 (Mikheenko et al. 2018) and BUSCO v4.0.6 (Manni et al. 2021) run in genome mode with the eukaryo-te_odb10, metazoan_odb10, and mollusca_odb10 lineage data sets.The Flye assembly was selected as the final published assembly as it resulted in longer contigs (Table S2) and better BUSCO recovery compared to CANU (Table S3).Polishing results were comparable across the different iterations.No single iteration performed best for all BUSCO odb10 data sets: 3 rounds of Racon polishing yielded the best BUSCO recovery of the eukaryote_odb10 data set (96.08% complete and single copy), 1 round of Racon polishing followed by Medaka and NextPolish yielded the best BUSCO recovery of the metazoan_odb10 data set (93.82% complete and single copy), and a single round of Medaka polishing yielded the best BUSCO recovery of the mollusca_odb10 data set (87.76% complete and single copy).Therefore, the 3 best iterations were all carried forward to gene annotation and protein prediction (see Genome and gene annotation methods section below).
Ultimately, 1 round of Racon polishing followed by Medaka and NextPolish was selected as the final polishing strategy based on its superior BUSCO recovery of all three odb10 data sets when run in protein mode (Table S3).A flowchart of the genome assembly workflow is provided in Fig. S2.

Genome and gene annotation
De novo repeat identification was performed using RepeatModeler v2.0.1 (Flynn et al. 2020), and the resulting repeat library was used to inform repeat masking with RepeatMasker v4.0.7 (Smit et al. 2017).Gene model and protein prediction was conducted using BRAKER2 v2.1.5(Hoff et al. 2019;Brůna et al. 2021).For the initial run, BRAKER2 was supplied the E. crispata assembly with (1) repeats soft-masked, (2) a custom protein database comprised of metazoa_odb10 and the predicted proteomes of 23 mollusk genome assemblies on NCBI (Table S4), and (3) the E. crispata Illumina RNA-Seq data aligned to the genome using STAR v2.7.10 (Dobin et al. 2013).We noticed that highly conserved, multicopy genes (e.g.tandem duplicates of histone protein H3) were missing from this initial run, and manual investigation revealed that these types of gene families had been incorrectly modeled as repetitive elements by RepeatModeler.Therefore, we performed a second BRAKER2 run using the unmasked assembly.Gene models resulting from BRAKER2 annotation with and without the use of the masked reference assembly were independently filtered to retain the highest-quality gene models.Unique gene models resulting from the unmasked run were identified through coordinate intersection with the masked gene set using BEDTools (Quinlan and Hall 2010) and were added to the final gene set if they satisfied the following criteria: (1) had a significant BLASTp hit (e-value < 1e −3 ) to a database of model animal proteomes that included human (GRCh38), zebrafish (GRCz11), mouse (GRCm39), and 2 mollusks Crassostrea virginica (C_virginica-3.0)and Pomacea canaliculata (v1.0), (2) did not intersect with a classified repeat element (LINE, SINE, LTR, etc.), and (3) did not have exons that were >50% simple sequence repeats.A flowchart of these curation steps is provided in Fig. S3.
During the Alien Index (AI) analysis (see Methods section on HGT below), we flagged 194 contigs as bacterial contamination, which were excluded from the nuclear genome assembly.Gene models were removed if they were located on any contig that was flagged from the AI pipeline.Additionally, we flagged the proximal end of contig_7836 as misassembly likely resulting from kleptoplast contamination.The region consisted of 16,750 kb and contained 4 genes, including 2 genes annotated as Photosystem I P700 chlorophyll a apoprotein (PsaA/PsaB)-like.Typically, PsaA and PsaB are both encoded on the chloroplast genome of chlorophyte algae.To investigate the accuracy of the assembly at this locus, Illumina gDNA and ONT reads were aligned to the genome using BWA MEM (Li and Durbin 2009) and minimap2 (Li 2016) using default parameters, respectively.Alignments relative to genes and repeat elements were visualized using Integrated Genomics Viewer (Robinson et al. 2011).No Illumina reads joined the locus in question to the rest of contig_7836.only a single ONT read mapped to the "gap" between the contig ends, but it mapped poorly (Fig. S6).Altogether, this indicated that this region was likely incorporated into the nuclear genome assembly in error rather than a true HGT event.Therefore, this region of contig_7836 was manually removed from the final genome assembly and the 4 gene models removed from the final gene set.The resulting E. crispata assembly and gene annotations following these steps were designated as final (v1).

Search for HGT
We searched the E. crispata genome for possible HGT using the AI score as previously described (Wisecaver et al. 2016).Briefly, each predicted protein sequence was queried against the same custom protein database used for BlobTools (see above) with DIAMOND v2.0.8.146 (Buchfink et al. 2015).A custom python script sorted the DIAMOND results based on the normalized bitscore (nbs), where nbs was calculated as the bitscore of the single best-scoring Highscoring Segment Pair to the subject sequence divided by the best bitscore possible for the query sequence (i.e. the bitscore of the query aligned to itself).The AI score is given by the formula: AI = nbsO − nbsM, where nbsO is the normalized bit score of the best hit to a species outside of the Metazoa lineage (NCBI: txid33208), nbsM is the normalized bit score of the best hit to a species within Metazoa skipping all hits to Placobranchoidea (NCBI: txid71491), a sublineage within Sacoglossa that contains all currently available genomes from kleptoplastic slugs including E. crispata.AI scores range from −1 to 1, being greater than 0 if the predicted protein sequence had a better hit to a nonmetazoan sequence, suggestive of either HGT or contamination (Wisecaver et al. 2016).Contigs were flagged as contamination if the minimum AI score for all genes on a contig was > 0 or if at least half of genes on a contig had an AI score > 0.1.All contamination-flagged contigs were bacterial in origin and were excluded from the nuclear genome assembly (see Methods section on genome and gene annotation above).
We filtered HGT candidates (AI > 0) to those that were most likely to be phylogenetically informative by requiring AI > 0.1 and total database hits ≥ 50.Phylogenetic trees of protein sequences were constructed for all filtered AI-flagged HGT candidates.Full-length proteins corresponding to the top 200 hits (e-value < 1e −10 ) to each HGT candidate were extracted from the local database using esl-sfetch (Eddy 2009).Protein sequences were aligned with MAFFT v7.471 using the E-INS-i strategy and the BLOSUM30 amino acid scoring matrix (Katoh and Standley 2013) and trimmed with trimAL v1.4.rev15 using its gappyout strategy (Capella-Gutierrez et al. 2009).Phylogenies were inferred using maximum likelihood as implemented in IQ-TREE v1.6.12 (Nguyen et al. 2015) using an empirically determined substitution model and 1000 rapid bootstrap replications.The phylogenies were midpoint-rooted, and branches with local support < 95 were collapsed using the ape and phangorn R packages (Paradis et al. 2004;Schliep 2011).Phylogenies were visualized using ETE v3 (Huerta-Cepas et al. 2016) and inspected manually to identify phylogenetically supported HGT candidate proteins.

Species phylogenies
Sacoglossan 28S ribosomal DNA and histone H3 sequences were downloaded from the NCBI nucleotide database (Fig. S7), and corresponding 28S and H3 sequences in E. crispata were identified using BLAST.For the 28S phylogeny, sequences were aligned with MAFFT v7.471 using the G-INSI-i iterative refinement method (Katoh and Standley 2013).H3 sequences were aligned with MAFFT using the RevTrans online web portal (Wernersson and Pedersen 2003).Maximum likelihood phylogenies were constructed using IQ-TREE v2.2.0 (Minh et al. 2020) using the built-in ModelFinder to determine the best-fit nucleic acid substitution model (Kalyaanamoorthy et al. 2017) and 1000 ultrafast bootstrap replicates.The concatenated ML tree was constructed based on the combined 28S and H3 data matrix (Chernomor et al. 2016) consisting of 2 partitions and 2006 sites.Phylogenies were visualized using ETE v3 (Huerta-Cepas et al. 2016).

Gene expression quantification and differential expression analysis
Quantification of gene expression was performed using Kallisto v0.46.2 (Bray et al. 2016).The Kallisto index was built using all BRAKER2 predicted transcripts with the default k-mer size of 31.Transcripts per million (TPM) gene abundance values from Kallisto were scaled using the average transcript length averaged over samples and to library size, using the lengthScaledTPM option in tximport v1.18.0 (Soneson et al. 2016).Quality control of the data was performed via principal component analysis, which showed clustering of all biological replicates and strong separation of crawling larva and juvenile slug sample types.Veliger and egg sample types also showed distinct clusters but with slight overlap of 1 egg replicate close to 1.5 standard deviations from the veliger cluster in PC1 and PC2 space (Fig. S8).
A differential expression analysis was performed to compare gene expression in the juvenile slug samples (postfeeding) to the 3 prefeeding developmental stages (egg, veliger, and crawling larvae).Raw gene counts from Kallisto were passed to EdgeR v3.32.1 (Robinson et al. 2009), and only genes with an average TMP > 1 in at least 1 of the 4 developmental stages were retained.Raw gene counts were normalized using the TMM (trimmed mean of M values) method (Robinson and Oshlack 2010).Exact tests were conducted using a trended dispersion value and a double tail reject region.The false discovery rate (FDR) was calculated using the Benjamini-Hochberg (BH) procedure (Benjamini and Hochberg 1995).

Identification and analysis of orthologous gene families
Homology between the predicted proteomes of E. crispata and 4 other gastropods (Table S5) was determined using OrthoFinder v2.5.4 (Emms and Kelly 2019) with sequence similarity searches performed by DIAMOND (Buchfink et al. 2015).When multiple protein isoforms were present, the longest protein sequence per gene was selected for the OrthoFinder analysis.Functional annotations were performed on the predicted proteome of E. crispata using the eggNOG-mapper v2 online service using default settings (Cantalapiedra et al. 2021).Hypergeometric tests for enrichment of functional categories were performed in python using the SciPy library hypergeom (Virtanen et al. 2020), and P-values were adjusted for multiple comparisons using the StatsModels library multitest with the BH method (Benjamini and Hochberg 1995;Seabold and Perktold 2010).Gene ontology (GO) terms were collapsed to medium length lists of representative GO terms using the simRel somantic similarity score in Revigo (Supek et al. 2011).

Ketoacyl synthase domain phylogeny
The E. crispata polyketide synthase gene EcPKS1 (Torres et al. 2020) was queried using blastp against the same custom database used for the AI analysis, and full-length sequences corresponding to the top 2000 significant hits (e-value < 1e −10 ) were extracted.All significant hits were queried using hmmscan (Eddy 2009) against 3 ketoacyl synthase domain profiles present in fatty acid and polyketide synthases (including EcPKS1): PF00109, beta-ketoacyl synthase, N-terminal domain; PF02801, beta-ketoacyl synthase, C-terminal domain; and PF16197, ketoacyl synthetase C-terminal extension.For each homolog, the maximum sequence region that spanned all 3 domain profiles was extracted and retained for phylogenetic analysis; extracted domain regions < 350 or >650 amino acids were excluded.To reduce redundancy in the final phylogeny, database sequences were collapsed with CD-HIT v4.8.1 using a sequence identity threshold of 0.8 (Fu et al. 2012); sequences from kleptoplastic slugs were not collapsed.The final sequence set was aligned with MAFFT v7.471 using the L-INS-i strategy (Katoh and Standley 2013).Phylogenies were inferred using maximum likelihood as implemented in IQ-TREE v1.6.12 (Nguyen et al. 2015) using an empirically determined substitution model and 1000 rapid bootstrap replications.The phylogeny was visualized using ETE v3 (Huerta-Cepas et al. 2016).

E. crispata genome assembly and annotation
Sequencing data for the E. crispata genome assembly consisted of 18.98 Gbp of ONT long reads (2,504,236 reads with an N50 of 8,728 bp) and 75.25 Gbp of paired-end Illumina short reads.A k-mer frequency analysis of the Illumina data indicated an estimated heterozygosity of 0.88% and a projected genome size of 711.11Mb (Fig. S9).The final genome assembly consisted of 8,089 contigs spanning 786.3 Mb, with an N50 of 458.73 kb.To evaluate completeness and coverage, we aligned the ONT and Illumina gDNA reads to the E. crispata genome assembly.Coverage histograms showed a single dominant peak at 20× and 86× coverage for ONT and Illumina reads, respectively (Fig. S9), indicating that the genome assembly was largely homozygous without a significant amount of redundant haplotigs.The proportion of the genome with ≥10× read support was high: 751.15 Mb (96.29%) using ONT reads and 757.17Mb (96.29%) using Illumina reads.
Repetitive elements made up 29.85% of assembly bases (Table S6).Most repetitive elements were unclassified, which comprised 17.7% of the genome assembly.Simple repeats made up 7.19%. of the genome.Classified retroelements and DNA transposons comprised 2.85% and 1.47% of the genome, respectively.The repeat content in E. crispata is comparable to the repeat content observed in the other sequenced sacoglossan genomes (29-33% repetitive) (Cai et al. 2019;Maeda et al. 2021).
A total of 68,514 genes encoding 71,367 transcripts were predicted using a combination of ab initio, homology-based, and transcriptome-based prediction methods (Table 1).The average protein-coding gene was 5,779.57-bplong and contained 4.82 exons.Functional annotations were ascribed to 24.23%, 15.51%, and 15.01% of genes using the PFAM, KEGG, and GO databases, respectively (Table 1).Within the E. crispata proteincoding gene set, 96.65% (922/954) of BUSCO conserved metazoan genes were identified as complete, and of those, 94.03% were present in single copy, and 5.97% were duplicated (Table S3).Furthermore, 99.22% (253/255) of BUSCO conserved eukaryota genes were identified as complete; of those, 92.50% were present in single copy, and 7.51% were duplicated (Table S3).This level of BUSCO recovery is the greatest of any currently available genome assembly from kleptoplastic sacoglossans (Fig. 2a; Table S7).
The E. crispata genome assembly is similar to E. marginata in genome assembly size and number of protein-coding genes (Fig. 2a).Gene count is noticeably elevated in E. crispata, E. marginata, and P. ocellatus (n > 68,000) compared to a fourth kleptoplastic Sacoglossa, E. chlorotica (n = 24,980), as well as the nonkleptoplastic sea hare, Aplysia californica (n = 19,945).We suspected the difference in gene count was due to an abundance of de novo-predicted gene calls in the former species set.To address this, we determined the number of E. crispata-predicted proteincoding genes with transcriptome-or homology-based support.Most of the de novo predicted genes in E. crispata (66.55%) showed evidence of expression, having an average length-scaled TPM > 1 in at least 1 of the 4 developmental stages surveyed (Fig. 2b; Table 1).Fewer genes had homology-based support; 41.58% were assigned to gene families containing 1 or more species in addition to E. crispata, and 26.21% of genes were assigned 1 or more functional annotations via EggNOG-mapper (Fig. 2b; Table 1).We created a high-confidence gene set of 27,767 genes, which we defined as those genes with 2 or more sources of support (i.e.RNA-seq, orthology, and EggNOG-mapper).Median gene length and exons per transcript were increased in the high-confidence gene set compared to the total set (Table 1).Both gene sets can be downloaded from FigShare (see Data availability).When accessing the data, we encourage users to select the gene set most appropriate for their biological question.

Assessment of HGT
We calculated the AI for all E. crispata predicted proteins to identify possible contamination and cases of HGT.The AI screen flagged 194 contigs as bacterial contamination, which were subsequently excluded from the final assembly (see Materials and methods).Only 7 proteins were flagged as candidate HGTs with top hits to diverse organisms in Apicomplexa, Ochrophyta, Proteobacteria, and Riboviria (Table S8).None of the candidate HGTs had predicted functions associated with photosynthesis or plastids (Table S9).We used a custom phylogenetic pipeline to manually evaluate all candidate HGTs and found that all but 1 lacked phylogenetic support for HGT.The only HGT with phylogenetic support was Ecla3748g428790 (Fig. 3), a single-exon gene with homology to a fragment of RNA-dependent RNA polymerase (RdRp) from negative-sense RNA viruses of invertebrates.The Ecla3748g428790 transcript had low but measurable expression support in our RNA-seq data set with length-scaled TPM > 1 in 3 of 4 developmental stages surveyed (Table S9).RdRp genes appear to be frequently transferred into the nuclear genomes of their eukaryotic hosts (Dolja and Koonin 2018), supporting the inference that this sequence was horizontally acquired from a virus in E. crispata.

Phylogenetic placement
Slugs were identified as the clarki morphotype of E. crispata based on macroscopic morphological features, including a near-uniform green coloration with small round white spots, a transparent and tapered foot, and the presence of a gap in the parapodia in the midline near the head (Fig. 1a, b, and g) (Pierce et al. 2006;Krug et al. 2016).To confirm the placement of our assembled genome in the Sacoglossa species tree, we constructed concatenated maximum likelihood phylogeny consisting of 2 nuclear loci coding for the 28S large ribosomal subunit and histone protein H3.Both 28S and H3 have served as DNA barcode genes in a previous phylogenetic analysis of the lineage (Händeler et al. 2009;Christa, Gould, et al. 2014, 2015;Krug et al. 2016;Martín-Hervás et al. 2021).The 28S sequence from the E. crispata genome was 99.58% identical (1406/1412 bp) to 28S amplified from another E. crispata collected from the Florida Keys (GenBank accn: KM230502, Krug et al. 2016), and the H3 genome sequence was 100% identical (327/ 327 bp) H3 amplified from the same sample (GenBank accn: KM040828, Krug et al. 2016), supporting its identification as E. crispata based on morphology.Genome assembly sequences grouped with E. crispata and sister species Elysia ellenae (Ortea et al. 2013;Krug et al. 2016) in both the concatenated and single-gene phylogenies with strong support (ultrafast bootstrap ≥ 98; Fig. 4; Fig. S7).

Gene expression across developmental life stages
We tracked E. crispata gene expression across 4 developmental life stages.Three life stages (egg, veliger, and crawling larva) were grown in the absence of algae and contained no kleptoplasts.The fourth life stage, juvenile slug, was allowed to feed on macroalgae and acquired green pigmentation indicative of active kleptoplast sequestration (Fig. 1g).Most predicted genes were not expressed under any of the four life stages (n = 22,925; average length-scaled TPM < 1); most genes in this category (95.7%; n = 21,932) were not assigned to the high-confidence set (Fig. 5a; Table S9).In contrast, gene models that were expressed across all 4 developmental stages (n = 19,772; average length-scaled TPM > 1) were mostly comprised of high-confidence genes (Fig. 5a).A total of 5,863 gene models were uniquely expressed in the kleptoplast-containing juvenile slugs with no measurable expression in the unfed life stages; of those, 2,044 gene models (34.9%) were considered high-confidence (Fig. 5a).RNA-seq analysis identified 3318 differentially expressed genes (DEGs) that were significantly more abundant in kleptoplastcontaining juvenile slugs compared to the unfed developmental life stages (−log 2 FC > 2; BH adjusted P-value < 0.01; Fig. 5b).A test for significant enrichment of GO functional categories in DEGs identified 561 enriched GO terms (hypergeometric test, BH-adjusted P-value < 0.05; Table S10).Among the most abundant were GO terms such as GO: 0051239, regulation of multicellular organismal process; GO: 0032879, regulation of localization; GO: 0009605, response to external stimulus; and GO: 0002376, immune system process (Table S10).We hypothesize that DEGs in this analysis include candidate genes involved in long-term kleptoplast retention.

Candidates for kleptoplasty-associated genes
To distinguish conserved gastropod developmental genes from genes potentially involved in kleptoplasty, we performed an OrthoFinder analysis (Emms and Kelly 2019) using the predicted proteome of E. crispata and 4 additional gastropod genomes (Fig. 6).Included in our analysis were genomes from 2 sacoglossan slugs capable of long-term kleptoplast retention, E. chlorotica (Cai et al. 2019) and P. ocellatus (Maeda et al. 2021).A third Sacoglossa was capable of short-term kleptoplast retention, E. marginata (Maeda et al. 2021).Lastly, 1 nonkleptoplastic gastropod relative (the sea hare, A. californica) was also included (see Table S6 for genome accessions).The OrthoFinder analysis identified 119,336 orthogroups (predicted gene families), of which 21,821 were present in 2 or more species in the analysis.Excluding singletons, most orthogroups were present in all 5 gastropod species (n = 11,249; Fig. 6a).Within these conserved orthogroups, 1,054 E. crispata sequences had significantly higher expression in juvenile slugs compared to the earlier developmental stages (Fig. 6b).Functional enrichment of GO terms indicated that many of these genes in E. crispata have putative functions in the regulation of  S12 for references.See Fig. S7 for single-gene phylogenies of 28S and H3.multicellular organismal processes (GO: 0051239), developmental processes (GO: 0032502, and GO: 0050793), and anatomical structure morphogenesis (GO: 0009653) (Fig. 6c; Table S10).
A total of 2,165 orthogroups were present in all kleptoplastic Sacoglossa and absent in the nonkleptoplastic sea hare (Fig. 6a).Within these sacoglossan-specific orthogroups, 554 E. crispata sequences had significantly higher expression in juvenile slugs compared to the earlier developmental stages (Fig. 6b).These genes showed significant enrichment of GO categories involved in the regulation of ROS (GO: 2000377 and GO: 1903427), regulation of nitric oxide (GO: 0045019), and adaptive immune response (GO: 0002819) (Fig. 6d; Table S10).
Previous genomic and transcriptomic analyses of P. ocellatus identified 3 types of genes that were highly duplicated in the P. ocellatus genome and overexpressed in kleptoplast-containing tissues, suggesting they may be involved in kleptoplast retention (Maeda et al. 2021).We searched the E. crispata genome for sequences belonging to these 3 gene families: apolipoprotein D-like (KEGG: K03098), cathepsin D-like (KEGG: K01379), and lectin-like genes (KEGG: ko04091).The E. crispata genome contained 11 apolipoprotein D-like genes and 6 cathepsin D-like genes (Table S9), but neither gene family was enriched in genes with higher expression in kleptoplast-containing slugs compared to earlier developmental stages.In contrast, the E. crispata genome contained 307 lectin-like genes, 40.4% of which had higher expression in kleptoplast-containing slugs (Table S9).Moreover, lectins was the only KEGG functional category significantly enriched in the sacoglossan-specific gene set (hypergeometric test, BH-adjusted P-value = 1.3e −6 ; Table S11).
A polyketide synthase in E. chlorotica (EcPKS1) was recently shown in vitro to synthesize the precursor for 7-and 8-propionate pyrones (Torres et al. 2020), which are proposed to function as UVand oxidation-blocking compounds in slug cell membranes (Powell et al. 2018).Type 1 iterative polyketide synthase genes in kleptoplastic slugs, including EcPKS1, appear to have evolved from fatty acid synthase (FAS) genes present in all animals that are required for primary metabolism (Torres et al. 2020).We identified several PKS/FAS homologs encoded in the E. crispata genome (Fig. 7).A phylogeny of the ketoacyl synthase domain from these sequences showed that 1 E. crispata gene (Ecla2149g328170) grouped with FAS genes from E. chlorotica and other gastropods.Four genes in E. crispata were members of an expanded clade of sacoglossan-specific PKS-like genes (Fig. 7b).Ecla12132g95200 (EclaPKS1) grouped with EcPKS1 in a subclade specific to species capable of long-term kleptoplast retention.Ecla7432g605620 (EclaPKS2) and Ecla7432g605630 (EclaPKS2-like) grouped with EcPKS2.In a third PKS clade, Ecla2359g352500 (EclaPKS3) grouped with an uncharacterized gene in E. chlorotica (GenBank accn: RUS77019), which we called EcPKS3 following the naming convention used previously by others (Torres et al. 2020).EclaPKS1 and EclaPKS3 were mostly highly expressed in kleptoplast-containing juvenile slugs (Fig. 7c); however, neither was statistically significantly overexpressed in juvenile slugs compared to the unfed developmental stages (Table S9).

Discussion
The E. crispata genome assembly serves as a high-quality reference for future functional and evolutionary studies in kleptoplastic sea slugs.With calculated BUSCO scores of 93-99% (Table S7), the E. crispata gene space is the most complete of all sequenced sacoglossans (Fig. 2) and contends with the most complete of all mollusk genomes (Caurcel et al. 2021).As in previous genome level analyses of kleptoplastic sea slugs (Wägele et al. 2011;Bhattacharya et al. 2013;Maeda et al. 2021), we found no evidence of HGT from host algae.Nevertheless, our search strategy detected 1 case of probable HGT from viruses (Fig. 3).Alternative mechanisms for kleptoplast maintenance must be considered and investigated to better understand how these sea slugs can retain functional kleptoplasts for so long in the absence of algal nuclearencoded chloroplast genes.
The ability to retain long-term kleptoplasts has evolved multiple times independently in Sacoglossa, but the exact number and timing of transitions from short-term to long-term kleptoplasty remains unclear (Christa et al. 2015).E. crispata belongs to the Elysia West Atlantic subclade (Krug et al. 2016), a lineage that is unique in that it contains several species that retain longterm kleptoplasts including E. crispata, E. chlorotica, Elysia diamedea, and E. viridis (Fig. 4; Table S12).The prevalence of long-term kleptoplasts in this subclade suggests that long-term retention may be an ancestral characteristic of this lineage.However, another species within the subclade, E. serca, does not retain functional kleptoplasts (Clark et al. 1990), and several members of this subclade (e.g.E. canguzua, E. ellenae, and E. evelinae) are yet to be studied for kleptoplast retention.Characterizing the retention time for these species and determining the timing of transition(s) to long-term kleptoplasty in the West Atlantic subclade would inform our understanding of how kleptoplasty evolved in sacoglossan slugs.A single ancestral origin for long-term retention in this subclade would suggest that 1 or more rare innovations facilitated this key transition.In contrast, if long-term retention arose repeatedly Orange intersection indicates orthogroups present in all species; green intersection indicates orthogroups present in all kleptoplastic species and absent in the outgroup.b) For the orange and green intersections, DEGs with increased abundance in fed slugs compared to unfed life stages (see Fig. 5b).Tests for enrichment of functional categories were performed on these 2 DEG categories: c) top 25 most abundant GO categories significantly enriched in DEGs with homologs in all other species and d) top 25 most abundant GO categories significantly enriched in DEGs with homologs in all other kleptoplastic species and absent in the outgroup (see Table S10).GO categories highlighted in the text are bolded.
in the subclade (i.e.parallel or convergent evolution), the transition to long-term retention may be the result of relatively simple genetic mechanisms.Addressing this question requires additional taxonomic sampling and phenotyping in this group.Combined, comparative genomic and gene expression analyses identified 554 genes in E. crispata that were both unique to kleptoplast-containing slugs and more highly expressed in juvenile slugs compared to unfed early developmental stages.Genes within this intersection included lectin carbohydrate-binding proteins (Table S11) as well as those involved in regulation of ROS, production of nitric oxide, and adaptive immunity (Fig. 6d).Similar gene families were identified in a genomic analysis of P. ocellatus, another long-term kleptoplastic Sacoglossa (Maeda et al. 2021), providing additional evidence that genes in these categories may play a role in prolonged kleptoplast retention in these slugs.Recently, Torres et al. (2020) identified 3 type 1 polyketide synthase genes in E. chlorotica that may play a role in protecting the kleptoplast from UV and oxidative damage.Our analysis of this gene family identified 4 PKS genes in E. crispata (Fig. 7), including 1 belonging to a new PKS3 subclade, suggesting that polyketide-based products in kleptoplastic slugs may be more diverse that previously recognized.Future work is necessary to develop E. crispata into a genome-enabled model system for functional investigation of these genes and others to better understand the genetic and cellular mechanisms of kleptoplasty.

Fig. 1 .
Fig. 1.Elysia crispata morphotype clarki.a) Wild-caught adult slug, dorsal view, showing near-uniform dark coloration with small round light spots indicative of clarki morphotype.b) Close up of wild-caught adult slug showing parapodial gap indicative of clarki morphotype.c) Lab-reared cornucopia of slugs.d) Egg clutch 24-h postdeposition.e) Free-swimming veliger.Larvae.f) Crawling larvae.g) Lab-reared juvenile slug, ventral view, showing transparent and tapered foot indicative of clarki morphotype; pigmentation indicates presence of kleptoplasts.Scale bar in b and g = 10 mm.Scale grid in d, e, and f = 2 × 2 mm.

Fig. 2 .Fig. 3 .
Fig. 2. Genome assembly and annotation statistics.a) Comparison of assembly completeness and contiguity of 4 kleptoplastic sea slugs (E.crispata, E. chlorotica, P. ocellatus, and E. marginata) and the sea hare A. californica.b) Area-proportional Venn diagram (Micallef and Rodgers 2014) depicting overlapping support for E. crispata gene models.Support categories are as described in Table1.Genes with 2 or more sources of support were assigned to the high-confidence gene set.

Fig. 4 .
Fig. 4. Phylogenetic analysis of sacoglossan sea slugs.Maximum likelihood analysis of concatenated 28S and H3 loci.E. crispata genome sequence is in red and bolded.Asterisk (*) indicates additional E. crispata sequence from Krug et al. (2016).Tree was rooted on the Thuridilla/Plakobranchus clade.Numbers along select branches indicate IQ-TREE ultrafast bootstrap support values (≥95) for the descendant nodes.The color bar indicates the kleptoplast retention state for each species: long-term retention (>20 days) and short-term retention (>1 and <20 days); see Table S12 for references.See Fig. S7 for single-gene phylogenies of 28S and H3.

Fig. 5 .
Fig. 5. Gene expression in different slug life stages.a) Upset plot displaying intersection of gene expression across 3 unfed life stages (purple: eggs, veligers, and crawling larvae) and in algae-fed juvenile slugs (green).Filled circles and connecting lines indicate expression support under those conditions (average length-scaled TPM > 1).above bars indicate the total number of genes present in each intersection.Red bars indicate the proportion of each intersection comprised of high-confidence genes.b) Distribution of E. crispata gene expression comparing algae-fed slugs to developmental stages.Dashed lines indicate significance thresholds for differential expression (−log 2 FC > 2; adjusted < 0.01).Number indicates count of DEGs with increased abundance in fed slugs.

Fig. 6 .
Fig. 6. family analysis of DEGs.a) Upset plot depicting intersection of orthogroups among four kleptoplastic sea slugs and the sea hare A. californica.Orange intersection indicates orthogroups present in all species; green intersection indicates orthogroups present in all kleptoplastic species and absent in the outgroup.b) For the orange and green intersections, DEGs with increased abundance in fed slugs compared to unfed life stages (see Fig.5b).Tests for enrichment of functional categories were performed on these 2 DEG categories: c) top 25 most abundant GO categories significantly enriched in DEGs with homologs in all other species and d) top 25 most abundant GO categories significantly enriched in DEGs with homologs in all other kleptoplastic species and absent in the outgroup (see TableS10).GO categories highlighted in the text are bolded.

Table 1 .
Summary statistics of E. crispata gene models.
b Assigned 1 or more annotation via EggNOG-mapper.c Excluding orthogroups unique to E. crispata.