Chromosome-Level Genome Assembly of the Viviparous Eelpout Zoarces viviparus

Abstract The viviparous eelpout Zoarces viviparus is a common fish across the North Atlantic and has successfully colonized habitats across environmental gradients. Due to its wide distribution and predictable phenotypic responses to pollution, Z. viviparus is used as an ideal marine bioindicator organism and has been routinely sampled over decades by several countries to monitor marine environmental health. Additionally, this species is a promising model to study adaptive processes related to environmental change, specifically global warming. Here, we report the chromosome-level genome assembly of Z. viviparus, which has a size of 663 Mb and consists of 607 scaffolds (N50 = 26 Mb). The 24 largest represent the 24 chromosomes of the haploid Z. viviparus genome, which harbors 98% of the complete Benchmarking Universal Single-Copy Orthologues defined for ray-finned fish, indicating that the assembly is highly contiguous and complete. Comparative analyses between the Z. viviparus assembly and the chromosome-level genomes of two other eelpout species revealed a high synteny, but also an accumulation of repetitive elements in the Z. viviparus genome. Our reference genome will be an important resource enabling future in-depth genomic analyses of the effects of environmental change on this important bioindicator species.

Section S1: Nuclear genome characteristics

Fig. S1: BlobPlot analysis, depicting (A) the cumulative length of the scaffolds versus the cumulative base count, showing that 94% of the genome is assembled in the first 24 scaffolds, which correspond to the 24 chromosomes of the haploid Z. viviparus genome, and (B) the logarithmic genome coverage (y-axis) against the GC content (x-axis). The taxonomic assignment of contigs indicates that no contamination is present.

Imperfect tandem repeats (TRs) were identified with the Phobos software v3.3.12 using the command line parameters "-s 10 -U 51 --outputFormat 1 --NsAsMissense". These settings identify TRs in the unit size range 1-51 bp. Since the highest unit length can include imperfect matches to TRs of even longer unit lengths, we used the resulting data to analyse only the repeat content in the unit size range 1-50 bp. A TR is reported if it reaches a minimum score of 10, whereby each match scores +1 and each mismatch or gap scores -5. The first unit is not scored. Unknown nucleotides are treated as mismatches. Repeat patterns reported below are always given in normalised form. A normalised repeat pattern represents all patterns that differ from it by (i) cyclic permutations of the repeat pattern and (ii) cyclic permutations of the reverse complement. The normalised form is the pattern that comes first when all patterns it represents are sorted alphabetically.
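The definition of the normalised form can be illustrated with a minimal Python sketch. It is not part of Phobos; the function names (reverse_complement, rotations, normalised_form) are hypothetical, and the sketch assumes patterns contain only the unambiguous bases A, C, G and T.

```python
# Illustrative sketch only: not the Phobos implementation.
# Assumes patterns consist solely of A, C, G and T.

def reverse_complement(pattern: str) -> str:
    """Return the reverse complement of a DNA pattern."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(pattern))


def rotations(pattern: str) -> list[str]:
    """Return all cyclic permutations of a pattern."""
    return [pattern[i:] + pattern[:i] for i in range(len(pattern))]


def normalised_form(pattern: str) -> str:
    """Return the alphabetically first pattern among all cyclic
    permutations of the pattern and of its reverse complement."""
    pattern = pattern.upper()
    candidates = rotations(pattern) + rotations(reverse_complement(pattern))
    return min(candidates)


if __name__ == "__main__":
    # Both strands of the vertebrate telomere motif share one normalised form.
    print(normalised_form("TTAGGG"))
    print(normalised_form("CCCTAA"))
```

Under this definition, a repeat and its reverse-strand counterpart are counted as the same pattern regardless of where the repeat unit is taken to start.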

TRs found in the assemblies of the Z. viviparus genome and genomes of relatives
The proportion of nucleotides in the genome assembly of Z. viviparus that belong to TRs is 7%, which is comparable to genome assemblies of other species belonging to the infraorder Zoarcales (Fig. S5). The highest repeat content was found in the genome assembly of M. gelatinosum, with a genomic coverage > 10% and a particularly high coverage contributed by TRs in the unit length range 11-50 bp. In most of the analysed genomes, short tandem repeats with a unit size of 1-6 bp are dominant (Fig. S6). One notable exception is the M. gelatinosum genome, in which we detected a single dominant repeat pattern, AAAAAAAAATATTTTTTTAGTTACTTTGG, with a genomic coverage of 4.1%. In Z. viviparus, di-nucleotide repeats have the highest genomic coverage (1.8%), followed by 28 bp repeats (1.1%) and 20 bp repeats (0.3%).
Z. viviparus has 29 repeats that are more than 100,000 bp long (Table S1). Notably, all long repeats are based on only a few repeat patterns. The longest connected repeat array was found on chromosome 18 and is 1,831,213 bp long, extending from nucleotide 4 to nucleotide 1,831,216. This repeat was split by Phobos into three overlapping parts. The first split is due to a shift in the alignment. The second split results from a shift in the pattern (Table S1). The repeat has a pattern length of 41 bp in the first two parts and a pattern length of 20 bp in the third part (Table S1). After this 1.8 Mb repeat, a 96 bp interruption is followed by the same 20 bp repeat pattern for another 158,801 bp. Effectively, chromosome 18 therefore starts with a 1,990,110 bp long repeat array. The main contributor to the 28 bp peak found for the Z. viviparus genome (Fig. S6) is a 28 bp pattern (AATGAGTGAGGTCATGTCAGACCAGTGC), which occurs in only 35 repeat loci but has a total genomic coverage of 0.7%. The longest repeat with this pattern is found on chromosome 8 in the coordinate range 25,904,445 to 26,592,201, with a repeat length of 687,757 bp and a perfection of 92.7%. Patterns that differ only by a single nucleotide substitution are found as the main pattern in several long TRs (Table S1). Moreover, several large unplaced scaffolds consist entirely of a single repeat with this pattern or one of its almost identical variants (Table S2). None of these long repeats were softmasked by RepeatMasker.
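The repeat lengths quoted above follow from the reported coordinates if these are read as 1-based and inclusive. The short Python check below makes that arithmetic explicit; the helper name span_length is hypothetical, and the inclusive-coordinate convention is an assumption that is consistent with the reported numbers.

```python
# Arithmetic check of the reported repeat lengths.
# Assumption: coordinates are 1-based and inclusive.

def span_length(start: int, end: int) -> int:
    """Length of an interval with inclusive start and end coordinates."""
    return end - start + 1


# Longest connected repeat array on chromosome 18.
chr18_array = span_length(4, 1_831_216)

# Adding the 96 bp interruption and the following 158,801 bp repeat
# gives the effective length of the array at the start of chromosome 18.
chr18_effective = chr18_array + 96 + 158_801

# Longest repeat with the 28 bp pattern on chromosome 8.
chr8_repeat = span_length(25_904_445, 26_592_201)

if __name__ == "__main__":
    print(chr18_array, chr18_effective, chr8_repeat)
```

Note that the effective length of 1,990,110 bp includes the 96 bp interruption between the two repeat stretches.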

Telomeric repeat analysis
To further explore the chromosome-level nature of our Z. viviparus genome assembly, we searched for the telomeric repeat motif (TTAGGGx/CCCTAAx)n using a C++ program written by C. Mayer, whereby x can represent one or more additional bases. This TR was first described in the telomeric regions of human chromosomes (Moyzis et al. 1988), but was shown to be conserved across vertebrates including fish (Ocalewicz 2013) and is even considered the ancestral telomere repeat motif of metazoans (Traut et al. 2007). To identify telomeric regions within the Z. viviparus genome, we searched for exact matches of TTAGGG/CCCTAA in the first (5' end) and last (3' end) 40 kb of the 24 main scaffolds of the assembly (Table S3). While the TR was clearly present at the beginning and/or end of eight scaffolds, no significant accumulation of the TR was found in the remaining 16 scaffolds.
In these 16 scaffolds, a variant of the TR motif might be present, or the telomeres were not successfully assembled. Indeed, the assembly of repetitive sequence elements such as telomeres is challenging. For the Z. viviparus genome assembly, this problem is likely aggravated by its high repeat content, which is especially pronounced in the 16 scaffolds in which we could not detect the standard vertebrate telomere repeat motif (Tables S1-S3).
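The exact-match search in the scaffold ends can be sketched in a few lines of Python. This is not the C++ program used for the analysis; the function name telomere_counts and the dictionary keys are hypothetical, and the sketch covers only exact matches of TTAGGG and CCCTAA, not the extended (TTAGGGx/CCCTAAx)n variants.

```python
# Illustrative sketch only: not the C++ program used in the study.
# Counts exact matches of the telomere motif in both scaffold ends.

def telomere_counts(
    sequence: str,
    window: int = 40_000,
    motifs: tuple[str, ...] = ("TTAGGG", "CCCTAA"),
) -> dict[str, dict[str, int]]:
    """Count exact motif matches in the first and last `window` bases.

    If the sequence is shorter than `window`, both ends cover the
    whole sequence.
    """
    sequence = sequence.upper()
    head = sequence[:window]   # 5' end
    tail = sequence[-window:]  # 3' end
    return {
        "5prime": {motif: head.count(motif) for motif in motifs},
        "3prime": {motif: tail.count(motif) for motif in motifs},
    }


if __name__ == "__main__":
    # Toy scaffold: CCCTAA repeats at the start, TTAGGG repeats at the end.
    toy = "CCCTAA" * 5 + "ACGT" * 10 + "TTAGGG" * 3
    print(telomere_counts(toy, window=40))
```

Because neither TTAGGG nor CCCTAA can overlap with a shifted copy of itself, the non-overlapping counts returned by str.count equal the total number of exact occurrences in each window.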
Table S3: Telomeric repeat content in the 24 main scaffolds of the Z. viviparus genome assembly. Counts of more than 100 repetitions of the TTAGGG motif or its reverse complement CCCTAA are indicated in bold.

Fig. S2: Snail plot presenting statistics of the Z. viviparus genome assembly (Z_viviparus_1_2). The plot is organised in bins, each comprising 2% of the 663 Mb assembly. The dark and light blue arcs represent the GC/AT content across the genome, whereas the dark and light orange arcs indicate the scaffolds used to calculate the N50 and N90 lengths on the radial scale, respectively. The scaffold length distribution is shown in dark grey on a logarithmic radial scale, where dashed lines show successive orders of magnitude. The longest scaffold is shown in red (chromosome 1, 33.2 Mb). The light grey spiral displays the cumulative scaffold length on a base-10 logarithmic scale, whereby the white scale lines show successive orders of magnitude.

Fig. S3: Merqury assembly spectrum plots for (A) the haplotypes initially generated by Hifiasm and (B) the haplotype-resolved final assembly, i.e., after the scaffolding and gap closing procedure. The high proportion of shared k-mers (green) indicates that no irregularities were introduced by technical artefacts (e.g., bias during sequencing and/or assembly).

Fig. S5: Comparison of genomic TR coverage for the genome assemblies of Z. viviparus, L. maculatus, A. ocellatus, M. gelatinosum, P. gunnellus and L. pacificus for TRs in different repeat unit length classes.

Fig. S6: Comparison of genomic TR coverage for individual repeat unit lengths.

Table S1: Z. viviparus has 29 repeats in the unit size range 1-50 bp that are longer than 100,000 bp. The two repeats marked by (*) are overlapping and have the same repeat pattern. They are reported as two separate repeats, since Phobos could not align the repeat across a larger gap in the alignment. All longer repeats across multiple chromosomes are based on a small number of repeat patterns, indicated by colours in the table. Similar repeat patterns have similar colours. Some repeat units themselves consist of multiple similar copies.

Table S2: Scaffolds consisting entirely of variants of a single repeat pattern. TRs are coloured consistently with Table S1.