The gene-rich genome of the scallop Pecten maximus

Abstract
Background: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter-feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management.
Findings: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species.
Conclusions: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.

These studies demonstrate that bivalve genomes are often 1 Gb or more in size, and generally exhibit large amounts of heterozygosity, related to their tendency to be broadcast spawners with excellent dispersal capabilities, resulting in large degrees of panmixia. Gene expansion has been noted as a characteristic of the clade, with some species exhibiting tandem duplications and gene family expansions, particularly in genes associated with shell formation and physiology (e.g., HSP70 [36]).
Here we describe the genome of the king scallop, P. maximus, which has been assembled from Pacific Biosciences (PacBio), 10X Genomics, and Hi-C libraries. It is a well-assembled and complete resource and possesses a particularly large gene set, with duplicated genes making up a substantial part of this complement. This genome and gene set will be useful for a range of investigations in evolutionary genomics, aquaculture, population genetics, and the evolution of novelties such as eyes and colouration, for many years to come.

Sample information, DNA extraction, library construction, sequencing, and quality control
A single adult Pecten maximus (NCBI:txid6579; marinespecies.org:taxname:140712) was purchased commercially, marketed as having been collected in Scotland. The shell was preserved and is deposited in the Natural History Museum, London, with registration number NHMUK 20170376. The adductor muscle was used for high molecular weight DNA extraction using a modified agarose plug-based extraction protocol (Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol, Bionano Genomics, San Diego, CA, USA). DNA was cleaned using a standard phenol/chloroform protocol (phenol:chloroform:isoamyl alcohol 25:24:1, followed by centrifugation and ethanol precipitation), the concentration was determined with a Qubit high sensitivity kit, and high molecular weight content was confirmed by running on a Femto Pulse (Agilent, Santa Clara, CA, USA).
Figure caption (fragment): Pecten maximus is a member of the superfamily Pectinoidea, which includes Pectinidae (scallops), Propeamussiidae (glass scallops), and Spondylidae (spiny oysters), and together with their close relatives (Anomioidea, jingle shells; Dimyoidea, dimyarian oysters; and Plicatuloidea, kittenpaw clams) these superfamilies form the order Pectinida. C, Distribution map of P. maximus, showing range (dark blue) of species across northern Europe and surroundings (map from simplemaps, distribution according to [2]).
PacBio and 10X Genomics linked-read libraries were made at the Wellcome Sanger Institute High-Throughput DNA Sequencing Centre by the Sanger Institute R&D and pipeline teams using established protocols. PacBio libraries were made using the SMRTbell Template Prep Kit 1.0 and 10X libraries using the Chromium Genome Reagent Kit (v2 Chemistry). These libraries were then sequenced on Sequel 1 and Illumina HiSeq X Ten platforms, respectively, at the Wellcome Sanger Institute High-Throughput DNA Sequencing Centre. The raw data are available from the European Nucleotide Archive, with accession number ERS3230380. Hi-C reads were created by the DNA Zoo Consortium (www.dnazoo.org) and submitted to NCBI with accession number SRX6848914. Read quality, adapter trimming, and read length were assayed using NanoPlot and NanoComp (PacBio reads) [38] and FastQC (10X reads, FastQC, RRID:SCR_014583) [39] (Supplementary File 1 [40,41]). PacBio libraries provided ∼65.9× coverage of this genome; 10X reads and Hi-C provided a further 113.7× and 63.4× estimated coverage, respectively, assuming a genome size of 1.15 Gb as estimated from our reads (see Fig. 2). A summary of statistics relating to these reads can be found in Table 1.
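The coverage values quoted above are simply total sequenced bases divided by the assumed genome size. The following is a minimal sketch of that relationship, not the authors' pipeline: the function names are hypothetical, and the base totals it prints are back-calculated from the stated coverage rather than taken from Table 1.

```python
# Genome size assumed in the text when estimating coverage (1.15 Gb).
GENOME_SIZE_BP = 1.15e9


def coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Estimated fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_bases(fold, genome_size=GENOME_SIZE_BP):
    """Inverse relationship: bases required to reach a given fold coverage."""
    return fold * genome_size


# Fold-coverage values quoted in the text for each library type.
reported = {"PacBio": 65.9, "10X": 113.7, "Hi-C": 63.4}

# Back-calculated (illustrative) sequence yield for each library.
implied = {lib: implied_bases(fold) for lib, fold in reported.items()}

for lib, bases in implied.items():
    print(f"{lib}: ~{bases / 1e9:.1f} Gb of sequence at {coverage(bases):.1f}x")
```

Changing `GENOME_SIZE_BP` to one of the other genome size estimates discussed later (for example the k-mer estimate) shows how sensitive the quoted coverage is to that assumption.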

Genome assembly
PacBio reads were first assembled with wtdbg2 v2.2 using the "-x sq" preset option for PacBio Sequel data [42]. The PacBio reads were then used to polish the contigs using Arrow (genomicconsensus package, PacBio tools). This was followed by a round of Illumina polishing using the 10X data, which consisted of aligning the 10X data to the contigs with longranger align, calling variants with freebayes (freebayes, RRID:SCR_010761) 1.3.1 [43], and applying homozygous non-reference edits to the assembly using bcftools-consensus [44]. Medium-range scaffolding was performed using Scaff10X v.4.2 [45]. Longer-range Hi-C-based scaffolding was then performed on the 10X assembly by the DNA Zoo Consortium using 3D-DNA [46], followed by manual curation of difficult regions by means of Juicebox Assembly Tools [47]. A further round of polishing with Arrow was performed on the resulting scaffolds, with reads spanning gaps contributing to filling in assembly gaps. This was followed by a further 2 rounds of freebayes (freebayes, RRID:SCR_010761) Illumina polishing. Finally, the assembly was analysed and manually curated by inspection using the gEVAL browser [48].
Figure 2 (caption fragment): [...] [50] of content of the P. maximus genome. Note that little-to-no contamination of the assembly can be observed, with the small amount of sequence annotated as non-metazoan mirroring the metazoan content in GC content and average coverage. Additional Blobplot plots and data, including those separated by phylum/superkingdom, can be found in Supplementary File 2. D, Hi-C contact map based on assembly created using 3D-DNA and Juicebox Assembly Tools (see [51] for an interactive version of this panel).
Full statistics regarding our assembly can be seen in Table 2. The assembly contains a total of 918,306,378 bp, across 3,983 scaffolds. The N50 is 44,824,366 bp, with 50% of the genome found in 10 scaffolds. The Hi-C analysis identified that P. maximus possesses 19 pairs of chromosomes, in agreement with a prior study [52], and these are well recovered in our assembly, with 844,299,368 bp (92%) of our assembly in the 19 biggest scaffolds, the smallest of which is 32,483,354 bp, and the largest 60,076,705 bp in length; only 0.08% of the assembly is represented as Ns (691,874 bp). The assembly was screened for trailing Ns, and for contamination against databases of common contamination sources, adapter sequences, and organelle genomes derived from NCBI (using the megaBLAST algorithm, requiring e-value ≤1e−4, sequence identity ≥90%, and for organelle genome comparisons, match length ≥500 [53]). This process identified no contamination. The Hi-C contact map for the final assembly (Fig. 2D) demonstrates the integrity of the chromosomal units. The interactive version of the contact map is available at [51] (powered by Juicebox.js [54]) and on the DNA Zoo website [55]. Our assembly is the most contiguous of all published bivalve genome assemblies to date (Table 3).
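For readers unfamiliar with the contiguity statistics quoted above, the N50 is the length of the scaffold at which the running total of scaffold lengths, sorted longest first, reaches half of the assembly size; the number of scaffolds needed to get there is often called the L50. The sketch below illustrates the standard definition on made-up lengths; it is not the tool used to generate Table 2, and the function name and toy values are assumptions.

```python
def n50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    Scaffolds are sorted longest first; N50 is the length of the scaffold
    at which the cumulative length reaches half the total, and L50 is the
    number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Toy scaffold lengths (arbitrary units), not the real assembly.
toy_lengths = [60, 50, 45, 40, 30, 10, 5, 5, 3, 2]
toy_n50, toy_l50 = n50(toy_lengths)
print(f"N50 = {toy_n50}, reached after {toy_l50} scaffolds")
```

Applied to the real scaffold lengths, the same calculation would correspond to the reported N50 of 44,824,366 bp, with half of the assembly contained in 10 scaffolds.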

Assembly assessment
The total size of our assembly, 918 Mb, falls short of previous estimates of the genome size of P. maximus, with flow cytometry estimating a genomic C-value of 1.42 [56]. Assessments of genome size based on k-mer counting using GenomeScope (10,000 cov cut-off) [57] suggest that the complete genome size is ∼1.025 Gb (Fig. 2A). Estimates using PacBio reads and Minimap2 [58], showing base pair count at each depth, put the genome size at 1,146 Mb, which is more in line with flow cytometry results. This discrepancy is likely to be caused by heterochromatic regions inaccessible to current sequencing technologies. The expected genome size of P. maximus is slightly larger than many other sequenced bivalve species, and our assembly size (in base pairs) is in line with that of other sequenced scallop species (Table 3). It is, however, half the size of the genomes of the sequenced mussels G. platifrons and M. philippinarum. Scallops therefore have intermediate genome sizes on average when compared to other molluscs: larger than oysters such as C. gigas and gastropods such as Lottia gigantea, but smaller than mussels and cephalopods. The reasons for these differences in genome size are at present unclear but may include gene duplications, repetitive element expansions, and, in some cases, whole-genome duplications (WGDs) [59].
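To put the different genome size estimates on a common scale, a C-value in picograms can be converted to base pairs using the widely used approximation 1 pg ≈ 978 Mb. The sketch below assumes that the C-value of 1.42 is expressed in picograms (the unit is not stated in the text) and uses a hypothetical helper name; the remaining figures are those quoted above.

```python
# Widely used approximation: 1 pg of double-stranded DNA ~ 978 Mb.
MB_PER_PG = 978.0

ASSEMBLY_MB = 918.3  # total assembly span reported in the text


def c_value_to_mb(picograms):
    """Convert a C-value in pg to an approximate genome size in Mb."""
    return picograms * MB_PER_PG


estimates_mb = {
    # Assumption: the C-value of 1.42 quoted in the text is in picograms.
    "flow cytometry (C-value 1.42)": c_value_to_mb(1.42),
    "PacBio read depth (Minimap2)": 1146.0,
    "k-mer counting (GenomeScope)": 1025.0,
}

for method, size_mb in estimates_mb.items():
    fraction = ASSEMBLY_MB / size_mb
    print(f"{method}: {size_mb:.0f} Mb; assembly spans {fraction:.0%} of this")
```

Under that assumption, the flow cytometry estimate is the largest of the three, and the assembly represents a correspondingly smaller fraction of it than of the sequence-based estimates.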
To confirm the efficacy of the contamination screen performed during the assembly process, we verified the absence of parasitic or pathogenic sources by creating a Blobplot (Fig. 2C) using Blobtools (Blobtools, RRID:SCR_017618) [50]. We observed very few scaffolds (1.94 Mb, or ∼0.21% of our assembly) with blast similarity to Proteobacteria, but with coverage values and GC content exactly mirroring the rest of the assembly. In the majority of these cases, the assignment to Proteobacteria will be due to a chance blast match with high similarity over a small region of the contig length, rather than actual bacterial origin (Supplementary File 2 [40,41]). The vast majority of the assembly (885.71 Mb) was assigned to the clade Mollusca, as expected (Fig. 2C).
The reasonably high level of observed heterozygosity calculated by GenomeScope (GenomeScope, RRID:SCR_017014) [57] from raw reads (1.71%, Fig. 2A) in the P. maximus assembly is a common phenomenon in broadcast-spawning marine invertebrates [60]. When using this resource for studies focusing on genetic diversity, it should be noted that we used freebayes-polish on our final assembly, so no detectable heterozygosity will remain in the assembled sequence. In our raw reads, levels of heterozygosity in P. maximus were higher than those found in the Sydney rock oyster Saccostrea glomerata (0.51%), or the Pacific oyster C. gigas (0.73%). Both of these oyster samples were derived from selective breeding programmes, which would reduce heterozygosity compared to wild populations [24].
Repetitive elements have been noted as playing an important role in genome evolution in molluscs, and in bivalves in particular (e.g., [61]). We used RepeatModeler (RepeatModeler, RRID:SCR_015027) and RepeatMasker (RepeatMasker, RRID:SCR_012954) [62] to identify and mask regions of the genome containing previously identified or novel repetitive sequences (Table 4). With the caveat that not all repetitive elements have been classified, it seems that long terminal repeats (LTRs) are less common in P. maximus compared to other species (0.52%, cf. 1.35% in S. glomerata and 2.5% in C. gigas) but that short interspersed nuclear elements (SINEs) are more common (2.19%, cf. 0.09% in S. glomerata and 0.6% in C. gigas). A total of 27.0% of the genome was classified as repetitive elements, with 16.7% of the genome made up of elements not present in preconfigured RepeatMasker libraries (but likely shared with other bivalve species). While the genome of P. maximus is large by scallop standards, its size is not due to large amounts of repetitive elements because 27.0% is low compared to many other genome resources. For example, C. gigas has a repeat content of 36% [19], and S. glomerata, 45.0% [24].

Gene prediction and annotation
Gene sequences were predicted using Augustus (Augustus: Gene Prediction, RRID:SCR_008417) annotation software [63], with 1 novel [40,41] and several previously published P. maximus RNA sequencing (RNAseq) datasets [64,65] used for training. The novel dataset was derived from 2 samples of P. maximus mantle tissue from the same specimen used for genomic DNA extraction. These were sequenced on an Illumina HiSeq to a depth of 338,910,597 reads. After initial trimming of poor-quality sequence and residual adapters with TrimGalore v0.6 [66], this library was assembled using Trinity (Trinity, RRID:SCR_013048) v2.8.4 [67] with all default settings. Following assembly, chimeric, fragmented, or locally misassembled transcripts were filtered using Transrate v1.0.3 [68], where "good" transcripts were retained, followed by DETONATE (DETONATE, RRID:SCR_017035) v1.11 with the bowtie2 option [69], where transcripts scoring <0 were discarded. Transcripts were then clustered using cd-hit-est v4.8.1 [70] at an identity threshold of 95% (-c 0.95 -n 8 -g 1), and the representative sequence of each cluster was retained. The non-masked genome was used as the basis for gene prediction, to avoid artefacts, missed exons, or missing gene portions caused by gene overlap with masked areas of the genome. Training was first performed using the aforementioned RNAseq datasets, as part of the AUGUSTUS pipeline (which incorporates BLAT alignment [71]). After training, the resulting hints file was submitted once more to Augustus for prediction, with options regarding untranslated regions (UTRs) and gene prediction on both strands set to "true." The same messenger RNA files used for initial training were also provided to AUGUSTUS for this prediction step.
Table footnote (fragment): These data, with comparison to Gastropoda, can be seen in Table 1 of Sun et al. [72].
Note that UTR prediction with AUGUSTUS is imperfect in non-model organisms, and UTR regions provided here are current best estimates and would benefit from full-length RNA sequencing (e.g., Isoseq, on the PacBio platform). This annotation resulted in an initial set of 215,598 putative genes (with 32,824 genes having ≥2 alternative isoforms), resulting in 249,081 discrete transcript models. We filtered the initial gene set by comparing our gene models to 7 previously published bivalve resources (A. purpuratus, A. farreri, M. yessoensis, C. gigas, P. fucata, G. platifrons, and M. philippinarum) using Orthofinder2 (OrthoFinder, RRID:SCR_017118), and retained genes with orthologues shared with other species (57,574 genes, further details below). To ensure that we did not discard transcribed genes absent from other bivalves but present in our resource, we also retained those genes with an empirically determined "good" hit in the nr database, lenient enough to recover genes from more distantly related species but stringent enough to avoid chance similarity (23,541 genes, diamond blastp, --more-sensitive --max-target-seqs 1 --outfmt 6 qseqid sallseqid stitle pident evalue --evalue 1e-9 [73]), a total of 81,115 genes. However, we then removed from this combined total any genes that had a match within our identified repetitive elements (13,374 genes, tblastn, -evalue 1e-29 -max_target_seqs 1 -outfmt '6 qseqid staxids evalue' [53]). This e-value cutoff was chosen after initial trials to include genes that mapped to pol, env, tc3 transposase, Gag-Pol, and reverse transcriptase genes in automated blast. This resulted in a final, 67,741-gene, curated set, of which 16,693 genes possess ≥1 alternative transcript. Full, curated and annotated gene sets in a variety of formats can be found in online repositories [40,41].
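The curation logic described above amounts to a set operation: keep genes supported by an orthologue or by an nr hit, then discard any that match repetitive elements. The sketch below illustrates this with toy gene identifiers and then checks the arithmetic of the reported counts. The function and variable names are hypothetical, and it assumes (as the reported total of 81,115 implies) that the 23,541 nr-supported genes are counted in addition to the 57,574 orthologue-supported genes.

```python
def curate(all_genes, with_orthologue, with_nr_hit, repeat_matches):
    """Keep genes with orthologue or nr support, then drop repeat matches."""
    supported = (with_orthologue | with_nr_hit) & all_genes
    return supported - repeat_matches


# Toy example with made-up gene identifiers.
toy_genes = {"g1", "g2", "g3", "g4", "g5"}
toy_final = curate(
    toy_genes,
    with_orthologue={"g1", "g2"},
    with_nr_hit={"g2", "g3", "g4"},
    repeat_matches={"g4"},
)
print(sorted(toy_final))

# Arithmetic check using the counts reported in the text.
orthologue_supported = 57_574
nr_supported_additional = 23_541  # assumed to exclude orthologue-supported genes
repeat_like = 13_374

combined = orthologue_supported + nr_supported_additional
curated = combined - repeat_like
print(f"combined = {combined:,}; curated = {curated:,}")
```

With these inputs the combined and curated totals reproduce the 81,115 and 67,741 genes given in the text.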
This number, while still high in comparison to the number of genes found in many metazoan species, is comparable to the number of unigenes (72,187) in the Argopecten irradians resource [74]. To confirm the veracity of these gene models as transcribed genes, we mapped samples from a number of previously sequenced, independent RNAseq experiments to our gene models using STAR 2.7 [75] and the --quantMode GeneCounts option. This records only the reads corresponding to 1 gene, with no multimappers recorded, and is thus a highly stringent test of transcription. Of our 67,741 curated "high-confidence" gene models, 47,159 (69.6%) were transcribed in the novel mantle-specific RNA dataset presented in this article. From independent samples, 33,553 genes were transcribed in the mantle of the sole control sample from a previous heat stress experiment [64]. A total of 48,882 genes were expressed in 2 replicate late veliger controls from an experiment where embryos were exposed to a range of water conditions (varying pH, PRJNA298284) and 39,640 were expressed in MiSeq reads sampled from mixed adductor muscle, hepatopancreas, and male and female gonad tissue (PRJEB17629). In total, 57,368 of our 67,741 curated high-confidence gene models (84.7%) are supported by these independent RNAseq experiments, 54,153 (79.9%) of which were found in samples other than our novel transcriptome. These mapping results have been made available for download as Supplementary File 3 [40,41]. It should be noted that this is likely an underestimate of transcription, given that multi-mapping reads were discounted from consideration. If additional tissues and life stages were targeted, given the fact that these genes have known orthologues in closely related species (see Orthofinder2 results above), it is likely that almost all of our gene models would be found to be expressed.
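As an illustration of how such support figures can be derived, the sketch below reads STAR GeneCounts-style tables (tab-separated, with four leading summary rows whose names begin with "N_", followed by one row per gene giving unstranded and per-strand counts) and takes the union of genes with at least one uniquely assigned read across samples. The sample contents are invented, the function name is hypothetical, and the threshold of one read is an assumption rather than the authors' stated criterion.

```python
import io

SUMMARY_PREFIX = "N_"  # STAR summary rows, e.g. N_unmapped, N_multimapping


def expressed_genes(handle, column=1, min_reads=1):
    """Return gene IDs with at least `min_reads` in the chosen count column.

    Column 1 holds unstranded counts; columns 2 and 3 hold the two
    stranded alternatives.
    """
    genes = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0].startswith(SUMMARY_PREFIX):
            continue  # skip blank lines and the summary rows
        if int(fields[column]) >= min_reads:
            genes.add(fields[0])
    return genes


HEADER = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t3\t3\n"
    "N_ambiguous\t1\t1\t1\n"
)
# Two invented samples covering four invented genes.
sample_a = HEADER + "geneA\t12\t6\t6\ngeneB\t0\t0\t0\ngeneC\t3\t3\t0\ngeneD\t0\t0\t0\n"
sample_b = HEADER + "geneA\t0\t0\t0\ngeneB\t7\t7\t0\ngeneC\t1\t0\t1\ngeneD\t0\t0\t0\n"

supported = expressed_genes(io.StringIO(sample_a)) | expressed_genes(io.StringIO(sample_b))
total_models = 4
print(f"{len(supported)}/{total_models} models supported ({len(supported) / total_models:.1%})")

# The percentage reported in the text follows the same calculation.
reported_fraction = 57_368 / 67_741
print(f"reported support: {reported_fraction:.1%}")
```

Because multi-mapping reads never reach the per-gene rows, recently duplicated genes with near-identical sequence are the ones most likely to be missed by this kind of tally.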
The 84,866 transcripts in our high-confidence gene set (some genes possess >1 transcript) have a mean of 5 exons. This is fewer than that seen in M. yessoensis (7 exons on average) or P. fucata (6 on average) (Table S8, [22]). This may indicate a degree of fragmentation in our gene models (although that is not observed empirically), or alternatively, that some of the genes in our gene models have been copied via retrotransposition and lack introns, which would lower the average exon number and contribute to the high number of genes seen in this species.
We have performed annotation of gene complements using 2 automated methods. BLAST annotation was performed with peptide sequences using DIAMOND against the nr database (locally updated 11 November 2019) with more lenient settings than used for curation of our gene models (tblastn, --more-sensitive --max-target-seqs 1 --outfmt 6 qseqid sallseqid stitle pident evalue --evalue 1e-3 --threads 4 [73]), with 88,824 of our unfiltered gene models recovering a hit, although this figure includes hits to repetitive elements removed in our curated dataset (Supplementary File 4 [40,41]). Of the 67,741 high-confidence genes, 59,772 possess a hit in the nr database (88.2%), indicating a highly annotatable dataset. We also used the KEGG-KAAS automatic annotation server, using peptide sequence and the Bidirectional Best Hit (BBH) method. The standard eukaryotic species set, complemented with L. gigantea, Pomacea canaliculata, C. gigas, M. yessoensis, and Octopus bimaculoides, was used for annotation, with 14,495 of our gene models mapping to KEGG pathways (Supplementary File 5 [40,41]).

Gene complement and expansion
We investigated the gene complement of P. maximus to understand the nature of the events that resulted in it and other scallops possessing a large number of annotated genes compared with related mollusc species. This analysis was performed predominantly using Orthofinder2 (-t 8 -a 8 -M msa -T fasttree settings and using only the longest transcript per gene for P. maximus, Fig. 3A). These statistics reveal that P. maximus exhibits little gene loss compared with other related species. The percentage of orthogroups containing P. maximus genes is very high (83.4%) compared to every other species examined. Pecten maximus has therefore lost fewer genes from the ancestrally shared cassette than any of the other species listed. Pecten maximus also possesses 518 species-specific orthogroups, comparatively more than any other species listed. These genes could be true novelties because they are not found in any of the 8 other species of bivalve examined here, but they may be derived from repetitive content, as the unfiltered P. maximus gene set was used as the basis of this comparison.
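The two summary statistics discussed above (the percentage of orthogroups containing genes from a species, and the number of species-specific orthogroups) can be derived from a per-orthogroup table of gene counts. The sketch below shows the calculation on an invented table; the species labels, counts, and function name are hypothetical and are not taken from the Orthofinder2 output for this study.

```python
# Invented gene counts per orthogroup and species (labels are hypothetical).
toy_counts = {
    "OG1": {"Pmax": 3, "Myes": 1, "Cgig": 1},
    "OG2": {"Pmax": 2, "Myes": 0, "Cgig": 0},
    "OG3": {"Pmax": 0, "Myes": 1, "Cgig": 2},
    "OG4": {"Pmax": 1, "Myes": 1, "Cgig": 0},
}


def summarise(counts, species):
    """Summarise orthogroup membership for one species.

    Returns the percentage of orthogroups containing at least one gene
    from `species`, and the number of orthogroups made up solely of genes
    from that species.
    """
    present = [og for og, row in counts.items() if row.get(species, 0) > 0]
    specific = [
        og
        for og in present
        if all(n == 0 for sp, n in counts[og].items() if sp != species)
    ]
    return {
        "pct_orthogroups": 100 * len(present) / len(counts),
        "species_specific": len(specific),
    }


for sp in ("Pmax", "Myes", "Cgig"):
    print(sp, summarise(toy_counts, sp))
```

A high percentage of orthogroups containing a species, as reported for P. maximus, indicates that few ancestrally shared gene families have been lost in that lineage.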
Using these results, we are also able to understand the prevalence of gene duplication across the phylogeny of bivalves. Gene duplication events were inferred from the orthogroup analysis and mapped onto the phylogeny of the 8 bivalve species examined here (Fig. 3B). We conclude that gene duplication events are common in extant species of bivalve, and some gene duplicates are shared by leaf nodes as a result of events in the stem lineage. However, duplications in P. maximus are particularly prevalent. With 28,880 unique duplications, P. maximus has more than double the number of duplicates of any other species, with M. yessoensis the next closest example. However, it should be noted that not all gene annotations were performed in an identical fashion, and particularly if genes have been missed in other species, e.g., through sparse RNAseq for gene prediction, this will negatively influence their counts in these results.
Of the genes that are shared with other lineages, P. maximus has a highly complete complement (Fig. 3C). No other species examined here possesses as many shared orthogroups in total or shares as many with other species. In pairwise comparisons, only the mussels M. philippinarum and G. platifrons show similar numbers of shared orthogroups with each other, but not with other species. This is consistent with the previous finding that the scallop M. yessoensis is closer in gene complement to the oysters C. gigas and P. fucata than the oysters are to one another [22], a fact reflected in early divergence of these 2 distantly related oyster species [77]. Scallops in general therefore have a better-conserved gene cassette compared to the ancestral genotype than exhibited in oysters.
We conclude that P. maximus has a well-conserved gene set, which has been added to substantially by gene duplication. Its large gene complement is therefore explained by a strong pattern of gene gain, coupled to very little gene loss.

Hox genes
The prevalence of gene duplication within P. maximus led us to consider whether a WGD event had occurred in this lineage. As a test for this, we used the well-conserved Hox and Parahox gene clusters, which are normally preserved as intact complexes and duplicated in the presence of additional WGD events (e.g., [78,79]).
P. maximus possesses a single Hox cluster spanning nearly 1.73 Mb (from 28,829,013 to 30,558,725 bp) on scaffold HiC_scaffold_2_arrow_ctg1 (Fig. 4A). It also features a single Parahox cluster on scaffold HiC_scaffold_5_arrow_ctg1. The complex, like that of M. yessoensis [22], is stereotypical. This evidence, along with a lack of any obvious signal in our k-mer plots (Fig. 2) or previous karyotypic work [52], suggests that no WGD has taken place, although this possibility cannot be completely excluded.

Immunity to neurotoxins
Bivalves are known to accumulate a number of toxins derived from phytoplankton, and human ingestion of contaminated bivalves can result in 5 known syndromes: amnesic shellfish poisoning (ASP) caused by domoic acid (DA), paralytic shellfish poisoning (PSP) from saxitoxin (STX), diarrhetic shellfish poisoning from okadaic acid and analogues, neurotoxic shellfish poisoning caused by brevetoxin and analogues, and azaspiracid shellfish poisoning from azaspiracid [16]. Adult P. maximus are relatively immune to STX and DA and, as such, may be vectors for the syndromes PSP and ASP, which are of the greatest concern to human health [80,81]. STX and brevetoxin are neurotoxins that bind to the voltage-gated sodium channel, blocking the passage of nerve impulses [83]. Previous studies have shown that genetic mutations within the voltage-gated sodium channel gene Nav1 confer immunity in taxa that accumulate STX (e.g., the soft-shell clam Mya arenaria [84], scallop Azumapecten farreri [21], and copepods Calanus finmarchicus and Acartia hudsonica [85]) or other similarly acting neurotoxins such as tetrodotoxin (TTX) (e.g., pufferfish, Tetraodon nigroviridis and Takifugu rubripes; salamanders [86-89]; and the venomous blue-ringed octopus [90]).
The P. maximus Nav1 gene possesses the expected canonical domain structure observed in other taxa. Furthermore, it possesses the characteristic threonine residue in Domain 3 (Fig. 5, position 1,425 in reference to rat sodium channel IIA), also described in the other 2 scallop species sequenced so far, which has been shown to confer resistance to these toxins in pufferfish, copepods, and the venomous blue-ringed octopus [85-87]. It does not, however, have the E945D mutation seen in the soft-shell clam M. arenaria and some pufferfish, which experimental evidence suggests also confers resistance [84], nor the D1663H or G1664S mutations in the blue-ringed octopus [90]. Instead, it has 1 novel and 2 ancestrally shared changes (shared with scallops and other bivalves) that may be of interest in studying alternative means of resistance in this molecule.
Unlike STX and TTX, DA does not directly target sodium channels; instead it mimics glutamate and binds preferentially to glutamate receptors including N-methyl-D-aspartate (NMDA), kainate, and α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors, leading to elevated levels of intracellular calcium and, potentially, calcium toxicity [9,13].
Figure 4 (caption fragment): Arrows show direction of transcription where known. B, Phylogeny of P. maximus Hox and Parahox genes alongside those of known homology from previous work [95,96] inferred using MrBayes (MrBayes, RRID:SCR_012067) [97] under the Jones model (1,000,000 generations, with 25% discarded as "burn-in") from a MAFFT alignment under the L-INS-I model [98]. Numbers at base of nodes are posterior probabilities, shown to 2 significant figures. Branches are coloured by gene.
Figure 5: Domain alignments (generated using MAFFT using the E-INS-I model [98]) of the sodium channel Nav1 showing residues (text in red, highlighted in yellow) implicated in resistance to the neurotoxins tetrodotoxin (TTX) and saxitoxins (STX). Species of vertebrate and mollusc known to be resistant to TTX or STX [86-89] are shown alongside species and sub-populations with no resistance to these toxins. Species (and sub-populations) that produce or accumulate these toxins with little or no ill effect are marked with a skull-and-crossbones. Pecten maximus (bold text) shares a threonine residue in domain 3 known to confer neurotoxin resistance in several other species. It also has a number of residues (shown in green text with amber background) in Domains 3 and 4, which are either unique to P. maximus or shared with other resistant shellfish, but not seen in other species. These residues are good candidates for testing for a functional role in resistance in the future.
A recent study, however, has shown that extracellular sodium concentration plays a crucial role in excitotoxicity of DA [99], suggesting that mutations that we observe at Nav1 may also confer a degree of immunity to DA in P. maximus. This has ramifications for the study of neurotoxin resilience and prevalence in the increasingly important commercially fished populations of P. maximus.

Conclusions
The genome of Pecten maximus presented here is a well-assembled and annotated resource that will be of utility to a wide range of investigations in scallop, bivalve, and molluscan biology. It is, to date, the best-scaffolded genome available for bivalves, despite the heterozygosity seen in this clade. Given that this assembly is based on state-of-the-art long-range data and has undergone structural verification, this resource will be key for comparative analysis of structural variation and long-range synteny. The curated gene set of this species exhibits little loss compared to other sequenced bivalve species and possesses numerous duplicated genes, which have contributed to the largest gene set observed to date in molluscs. The genes are well annotated, with 88.2% of our high-confidence gene set mapped to a known gene. This genome has already yielded a range of insights into the biology of P. maximus and will provide a basis for investigations into fields such as physiology, neurotoxicology, population genetics, and shell formation for many years to come.