A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus)

Abstract The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.


Introduction
The ecology and evolution of the reed warbler (Acrocephalus scirpaceus) has been of interest for over 40 years (Thorogood et al. 2019) as it is one of the favorite host species of the brood-parasitic common cuckoo (Cuculus canorus) (Davies and Brooke 1989;Stokke et al. 2018). Decades of field experiments have demonstrated behavioral coevolution and spatial and temporal variation in species interactions (e.g., Thorogood and Davies 2013). However, the reed warbler's response to climate change has begun to attract increasing attention. Reed warblers are experiencing far less severe declines in population size than is typical for long-distance migrants (Both et al. 2010;Vickery et al. 2014). In fact, they are expanding their breeding range northwards into Fennoscandia (J€ arvinen and Ulfstrand 1980;Røed 1994;Stolt 1999;Brommer et al. 2012), and have generally increased their productivity following the rise in temperature (Schaefer et al. 2006;Eglington et al. 2015;Meller et al. 2018). They are also showing rapid changes in phenology (Halupka et al. 2008), and migratory behavior; instead of crossing the Sahara, monitoring suggests that some reed warblers now remain on the Iberian Peninsula over winter (Chamorro et al. 2019). Morphological traits such as body mass and wing shape have been shown to change rapidly in reed warbler populations, indicating possible local adaptation (Kralj et al. 2010;Salewski et al. 2010;Saetre et al. 2017). Genetic differentiation is generally low between reed warbler populations, but moderate levels of differentiation have been connected to both migratory behavior (Proch azka et al. 2011) and wing shape (Kralj et al. 2010). Reed warblers thus provide a promising system to study population, phenotypic, and genetic responses to climate change.
Although there has been an increasing number of avian genome assemblies in recent years (e.g., Feng et al. 2020), many nonmodel species, including the reed warbler, are still lacking a genome resource. To date, the closest relative to the reed warbler with a published reference genome is the great tit (Parus major) (GCA_001522545.3, deposited in NCBI; Laine et al. 2016), but the unpublished genome of the garden warbler (Sylvia borin) is available in public databases (GCA_014839755.1, deposited in NCBI). There is also a genome in preprint from the Acrocephalus genus, the great reed warbler (A. arundinaceus) (Sigeman, Strandh, et al. 2020), but the scaffolds are not chromosome length.
Here, we present the first genome assembly of the reed warbler, based on PacBio, 10Â, and Hi-C sequencing, with descriptions of the assembly, manual curation, and annotation. This genome will be a valuable resource for a number of studies, including studies of coevolution, population genomics, adaptive evolution, and comparative genomics. For reducedrepresentation sequencing (e.g., RAD-seq) studies, it will help produce a more robust SNP set than with a de novo approach (Shafer et al. 2017). It will facilitate the detection of selective sweeps, and provide the physical localization of variants (Manel et al. 2016), thus giving insight into the potential genes involved in adaptation. Furthermore, the genome will be an important resource in the study of chromosomal rearrangements in birds.

Genome Annotation
The GC content of the reed warbler genome assembly was 41.9%. The total repeat content of the assembly was 10.94%, with LTR elements as the most common type of repeat (4.50%) followed by LINEs (4.11%) (table 1).
Using the Comparative Annotation Toolkit, based on a whole-genome multiple alignments from Cactus, we predicted 14,645 protein coding genes, with an average Coding DNA Sequence (CDS) length of 1,782 bp, and an average intron length of 2,918 bp (table 1). The annotated genes had 97.5% completeness (based on predicted proteins).

Synteny Analysis
The reed warbler genome showed high synteny with the great tit genome, though with some notable differences ( fig. 1). The reed warbler chromosome 6 is a fusion of great tit chromosomes 7 and 8, and reed warbler chromosome 8 is a fusion of great tit chromosomes 6 and 9. Interestingly, these chromosomes are not fused in the garden warbler genome (supplementary fig. 1, Supplementary Material online), but correspond to the great tit chromosomes. This suggests that the fusions evolved relatively recently, perhaps at the base of the Acrocephalidae branch within Sylvioidea, but further research is needed to determine this. Hi-C contact maps confirm that the chromosomes assembled in the reed warbler genome are unbroken (supplementary fig. 2, Supplementary Material online). Interchromosomal rearrangements are rare in avian evolution (Ellegren 2010;Skinner and Griffin 2012), with some exceptions, such as in the orders Falconiformes (Damas et al. 2017) and Psittaciformes (Furo et al. 2018). In fact, in all or most species of Psittaciformes, chicken chromosomes 6 and 7, and 8 and 9 are fused (Furo et al. 2018;Kretschmer et al. 2018)-the same chromosomes involved in the fusions discovered in the reed warbler genome. We can only speculate about the significance of this without more data. Passeriformes, the sister group of Psittaciformes, exhibit much lower rates of interchromosomal rearrangements, despite being a large, highly diverse order (Kretschmer et al. 2021). There is still a large knowledge gap in the cytogenetics of birds (Degrandi et al. 2020), and more research is needed to determine the rarity of the fusions we discovered in the reed warbler genome.
We furthermore confirm the previously identified neo-sex chromosome (Pala et al. 2012;Sigeman, Ponnikas, et al. 2020), a fusion between the ancestral chromosome Z and a part of chromosome 4A (according to chromosome naming from the zebra finch). This fusion is thought to have occurred at the base of the Sylvioidea branch (Pala et al. 2012) and is shared with all species of Sylvioidea studied so far (Sigeman, Ponnikas, et al. 2020). Figure 1 clearly shows that reed warbler chromosome Z corresponds to great tit chromosome Z, plus a part of great tit chromosome 4A, whereas reed warbler chromosome Z corresponds to garden warbler chromosome Z (supplementary fig. 1, Supplementary Material online).

Conclusion
In this study, we present the first assembled and annotated genome for the reed warbler A. scirpaceus. We have accomplished this through utilizing long read PacBio sequencing, and scaffolding with paired-end 10Â and Hi-C reads. In addition to the previously identified autosomesex chromosome fusion shared by all members of Sylvioidea, we found unequivocal evidence of two novel macrochromosomal fusions in the reed warbler genome. Further research is needed to determine the evolutionary age of these fusions, especially because they are not present in the garden warbler genome, suggesting they are relatively new. This genome will serve as an important resource to increase our knowledge of chromosomal rearrangements in birds, both their prevalence and their significance for avian evolution. Furthermore, the genome will, through the identification of genetic variants and information of the function of associated genes, provide a deeper insight into the evolution of the reed warbler, a bird which will continue to fascinate researchers for years to come.

Materials and Methods
Sampling and Isolation of Genomic DNA Blood was collected from a brachial vein of a female reed warbler (subspecies A. scirpaceus scirpaceus, NCBI Taxonomy ID: 126889) in Storminnet, Porvoo (60 19 0 24.9 00 N, 25 35 0 23.0 00 E), Finland, on May 22, 2019. Catching and sampling procedures complied with the Finnish law on animal experiments and permits were licensed by the National Animal Experiment Board (ESAVI/3920/2018) and Southwest Finland Regional Environment Centre (VARELY/ 758/2018). Reed warblers were trapped with a mist net, ringed and handled by E.K. under his ringing license. The blood ($80 ml) was divided and stored separately in 500 ml ethanol, and in 500 ml SET buffer (0.15M NaCl, 0.05M Tris, 0.001M EDTA, pH 8.0). The samples were immediately placed in liquid nitrogen, and kept at À80 C when stored. We performed phenolÀchloroform DNA isolation on the sample stored in SET buffer, following a modified protocol from Sambrook et al. (1989).

Library Preparation and Sequencing
DNA quality was checked using a combination of a fluorometric (Qubit, Invitrogen), UV absorbance (Nanodrop, Thermo Fisher) and DNA fragment length assays (HS-50 kb fragment kit from AATI, now part of Agilent Inc.). The PacBio library was prepared using the Pacific Biosciences Express library preparation protocol. DNA was fragmented to 35 kb. Size selection of the final library was performed using BluePippin with a 15 kb cut-off. Six single-molecule realtime (SMRT) cells were sequenced using Sequel Polymerase v3.0 and Sequencing chemistry v3.0 on a PacBio RS II instrument. The 10Â Genomics Chromium linked-read protocol (10Â Genomics Inc.) was used to prepare the 10Â library, and due to the reed warbler's smaller sized genome, only 0.7 ng/ml of high molecular weight DNA was used as input. A high-throughput chromosome conformation capture (Hi-C) library was constructed using 50 ml of blood, following step 10 and onwards in the Arima Hi-C (Arima Genomics) library protocol for whole blood. Adaptor ligation with Unique dual indexing (Illumina) were chosen to match the indexes from the 10Â linked-read library for simultaneous paired-end sequencing (150 bp) on the same lane on an Illumina HiSeq X platform. Both libraries were quality controlled using a Fragment analyzer NGS kit (AATI) and qPCR with the Kapa library quantification kit (Roche) prior to sequencing.
The sequencing was provided by the Norwegian Sequencing Centre (https://www.sequencing.uio.no, last accessed September 17, 2021), a national technology platform hosted by the University of Oslo and supported by the "Functional Genomics" and "Infrastructure" programs of the Research Council of Norway and the South-Eastern Regional Health Authorities.

Genome Size Estimation and Genome Assembly
The genome size of the reed warbler was estimated by a kmer analysis of 10Â reads using Jellyfish v. 2.3.0 (Marc¸ais and Kingsford 2011) and Genome Scope v. 1.0 (Vurture et al. 2017), with a k-mer size of 21. The estimated genome size was 1,130,626,830 bp.

Curation
The assembly was decontaminated and manually curated using the gEVAL browser (Chow et al. 2016;Howe et al. 2021), resulting in 521 corrections (breaks, joins and removal of erroneously duplicated sequence). HiGlass (Kerpedjiev et al. 2018) and PretextView (https://github.com/wtsi-hpag/ PretextView, last accessed September 17, 2021) were used to visualize and rearrange the genome using Hi-C data, and PretextSnapshot (https://github.com/wtsi-hpag/ PretextSnapshot, last accessed September 17, 2021) was used to generate an image of the Hi-C contact map. The corrections made reduced the total length of scaffolds by 0.5% and the scaffold count by 44.6%, and increased the scaffold N50 by 20.2%. Curation identified and confirmed 29 autosomes and the Z and W chromosomes, to which 98.6% of the assembly sequences were assigned.

Genome Quality Evaluation
We assessed the quality of the assembly with the assembla-thon_stats.pl script (Bradnam et al. 2013) and investigated the completeness of the genome with Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 5.0.0 (Simão et al. 2015), searching for 8,338 universal avian single-copy orthologs (aves_odb10).

Genome Annotation
We used a repeat library provided by Alexander Suh called bird_library_25Oct2020 and described in Peona et al. (2020) to softmask repeats in the reed warbler genome assembly. Softmasked genome assemblies for golden eagle (Aquila chrysaetos), chicken (Gallus gallus), great tit (Parus major), Anna's hummingbird (Calypte anna), zebra finch (Taeniopygia guttata), great reed warbler (Acrocephalus arundinaceus), icterine warbler (Hippolais icterina), collared flycatcher (Ficedula albicollis), and New Caledonian crow (Corvus moneduloides) were downloaded from NCBI. The triangle subcommand from Mash v. 2.3 (Ondov et al. 2016) was used to estimate a lower-triangular distance matrix, and a Python script (https://github.com/marbl/Mash/issues/9#issuecomment-509837201, last accessed September 17, 2021) was used to convert the distance matrix into a full matrix. The full matrix was used as input to RapidNJ v. 2.3.2 (Simonsen et al. 2008) to create a guide tree based on the neighbor-joining method. Cactus v. 1.3.0 (Armstrong et al. 2020) was run with the guide tree and the softmasked genome assemblies as input.
We also downloaded the annotation for chicken, and used it as input to the Comparative Annotation Toolkit (CAT) v. 2.2.1-36-gfc1623d (Fiddes et al. 2018) together with the hierarchical alignment format file from Cactus. Chicken was used as reference genome, reed warbler as the target genome and the AUGUSTUS (Stanke et al. 2008) species parameter was set to "chicken." InterProScan v. 5.34-73 (Jones et al. 2014) was run on the predicted proteins to find functional annotations, and DIAMOND v. 2.0.7 (Buchfink et al. 2021) was used to compare the predicted proteins against UniProtKB/Swiss-Prot release 2021_03 (The UniProt Consortium 2021). AGAT v. 0.5.1 (Dainat 2021) was used to generate statistics from the GFF3 file with annotations and to add functional annotations from InterProScan and gene names from UniProtKB/Swiss-Prot. BUSCO v. 5.0.0 was used to assess the completeness of the annotation.

Synteny Analysis
We aligned the assembly against the great tit (Parus major) and the garden warbler (Sylvia borin) genome assemblies with minimap2 v. 2.18-r1015 and extracted only alignments longer than 5,000 bp. The bundlelinks from circos-tools v. 0.23 was used to merge neighboring links using default options and a plot was created using circos v. 0.69-8.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.