Abstract

Powerful whole-genome studies of conservation genetics, evolution, and biogeography become possible for non-model organisms when reference genomes are available. Here, we report the sequence and assembly of the whole genome of the little vermilion flycatcher (Pyrocephalus nanus; family Tyrannidae), an endemic, endangered, and declining species of the Galapagos Islands. Using PacBio HiFi reads to assemble long contigs and Hi-C reads for scaffolding, we assembled a genome of 1.07 Gb comprising 267 contigs in 152 scaffolds, with a scaffold N50 of 74 Mb, a contig N50 of 17.8 Mb, 98.9% of the assembly assigned to candidate chromosomal sequences, and 99.72% of the 10,844 single-copy orthologs in the BUSCO passeriformes dataset present. In addition, we used the novel HiFiMiTie pipeline to fully assemble and verify all portions of the mitochondrial genome from HiFi reads, obtaining a mitogenome of 17,151 bases containing 13 protein-coding genes, 22 tRNAs, 2 rRNAs, two control regions, and a unique structure of control region duplication and repeats. These genomes will be a critical tool for much-needed studies of phylogenetics, population genetics, biogeography, and conservation genetics of Pyrocephalus and related genera. This genome, and the studies that use it, can inform conservation management and taxonomic revision and improve understanding of the evolution and diversification of this genus within the Galapagos Islands.

Significance

The genus Pyrocephalus (family Tyrannidae) comprises four recognized species with nine subspecies, distributed across North, Central, and South America, including the Galapagos Islands. The taxonomy of the genus is in flux, and the species endemic to Galapagos include one recently extinct species (Pyrocephalus dubius) and several declining populations of another vulnerable species (Pyrocephalus nanus). This genome will provide a valuable reference for much-needed phylogenetic, population genetic, and conservation genetic work on this genus and species. This and additional studies are under way to help advise active management of this species in Galapagos.

Introduction

The Galapagos Islands are recognized for their geographical isolation, high endemism, and the biogeographic patterns of evolution within the archipelago. Of 213 native bird species recorded, 48 are endemic (Charles Darwin Foundation 2024). Of these endemic species, a complete genome has been assembled at the scaffold level for only four: the Medium Ground Finch (Geospiza fortis) (Zhang et al. 2014), the Galapagos Flightless Cormorant (Phalacrocorax harrisi) (Burga et al. 2017), the Galapagos Penguin (Spheniscus mendiculus) (Pan et al. 2019), and the Small Tree Finch (Camarhynchus parvulus) (Rubin et al. 2022).

Due to increasing anthropogenic pressure and decreasing population sizes of many endemic birds in Galapagos (Fessl et al. 2015; Dvorak et al. 2017, 2020; Geladi et al. 2021), there is a pressing need for high-resolution genetic conservation studies of threatened species in Galapagos. One important endemic species is the Galapagos little vermilion flycatcher, Pyrocephalus nanus, which was once distributed throughout most of the Galapagos archipelago (12 islands) (Gifford 1919), but in recent decades, its populations have disappeared from four islands and are on the verge of disappearing from two more (Merlen 2013; Fessl et al. 2015; Leuba et al. 2020).

Pyrocephalus nanus is found in multiple distinct climate and ecosystem types (BirdLife International 2023), from xeric/desert ecosystems near the coast (Rothschild and Hartert 1899; Gifford 1919) to the tops of volcanoes with evergreen humid vegetation and frequent precipitation (Mosquera et al. 2022). The lack of a reference genome or detailed genetic study has limited investigations into its population history, taxonomy, evolutionary biology, and adaptations to various ecosystem types, although preliminary work suggests that there is significant variation among populations (Carmi et al. 2016). Current bird conservation programs in Galapagos aim to recover and restore the former range of this species; knowledge obtained from genomic data will therefore be useful for conservation planning and action (García-Dorado and Caballero 2021). The aim of this study is to assemble the whole genome of P. nanus as a tool for guiding research and management. Additionally, the genome should be a useful resource for studies of Tyrannidae, one of the largest avian families, which includes Pyrocephalus.

Results

Reference Whole Genome

We sequenced two SMRT cells of PacBio HiFi reads, which produced 6.84 million reads with 788.79 billion bases, an N50 read length of 12,741 bp, and a mean read length of 12,551 bp. The Hi-C library produced 103.42 million paired reads, of which we retained 99.94 million after cleaning. We removed 145 sequences totaling 19,364,419 bases from the HiFiasm contig-only assembly that purge_dups classified as Haplotig, Repeat, Junk, or Highcov. After contig assembly, removal of the purge_dups sequences, scaffolding with Hi-C reads, and manual contig placement and orientation, we obtained a final assembly of 1.073 Gb in 152 scaffolds and 267 contigs (Fig. 1 and Table 1). By comparing the long scaffolds to the chromosome-level genome assembly of the zebra finch (NCBI RefSeq GCF_003957565.2) and locating telomere sequences at scaffold ends, we were able to organize the scaffolds of the P. nanus genome into 38 chromosomes (supplementary table S2). These comprise 37 autosomes and the sex chromosome Z, which we identified as the fifth largest chromosome (Fig. 1). Of the 38 chromosome-designated sequences, 34 contain at least one telomere and 10 have telomeres at both ends.
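As an illustration of how terminal telomeres can be flagged in an assembly, the following minimal Python sketch looks for tandem copies of the canonical vertebrate telomere repeat (TTAGGG, or its reverse complement CCCTAA) within a window at each end of a sequence. The function name, the window size, and the minimum repeat count are assumptions chosen for illustration; they are not the parameters or scripts used in this study.

```python
import re

# Canonical vertebrate telomere repeat and its reverse complement.
TELOMERE_FWD = "TTAGGG"
TELOMERE_REV = "CCCTAA"


def telomere_ends(seq, window=1000, min_repeats=10):
    """Return (has_start_telomere, has_end_telomere) for one sequence.

    An end is flagged when its terminal `window` bases contain a tandem run
    of at least `min_repeats` copies of the repeat in either orientation.
    Both thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    pattern = re.compile(
        "(?:%s){%d,}|(?:%s){%d,}"
        % (TELOMERE_FWD, min_repeats, TELOMERE_REV, min_repeats)
    )
    start = seq[:window]   # first `window` bases
    end = seq[-window:]    # last `window` bases
    return bool(pattern.search(start)), bool(pattern.search(end))
```

A sequence flagged at both ends would correspond to a chromosome with telomeres at the top and the bottom, as reported for 10 of the chromosome-designated sequences.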

Fig. 1. a) Graphical representation of assembled genome statistics (genome size 1.07 Gbp), with an N50 and N90 scaffold size comparison graph produced with Quast (Gurevich et al. 2013). Chr 1, 2, 3… denote the numbered chromosomes in the assembled genome of the little vermilion flycatcher (LVF); ChrZ-Lvf is the sex chromosome. b) Kmer histogram produced with GenomeScope2, showing genome length (len) derived from PacBio sequences, kmer coverage (kcov), read error rate (err), genome unique length (uniq), and duplication (dup). c) Hi-C contact maps from Juicebox Assembly Tools (JBAT) v.1.11.08 after manual refinement, orientation of contigs, and their delineation within chromosomes; scale in megabases (Mb). d) Pyrocephalus nanus is endemic to the Galapagos archipelago, a volcanic island system located in the Eastern Tropical Pacific. A red dot in the center of the map marks the highlands of Santa Cruz Island, where the sample was collected. The species is sexually dimorphic; adult plumage is shown on the right.

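Table 1

Sample information, assembled genome statistics, and accession numbers

Category | Item | Value
Species | Scientific name | Pyrocephalus nanus
Species | Common name | Little vermilion flycatcher
Location of sample collection | Site | Island: Santa Cruz; Province: Galapagos Islands; Country: Ecuador
Location of sample collection | Coordinates | S0.63172° W90.36300°
BioProjects and voucher | NCBI BioProject | PRJNA1040305
BioProjects and voucher | NCBI BioSample | SAMN38255597
BioProjects and voucher | NCBI Genome accession | JAWZSU000000000
BioProjects and voucher | Genome assembly | bPyrNan1_0.fasta
BioProjects and voucher | NCBI SRA accession raw reads data | PRJNA1040305
Genome sequence | PacBio HiFi reads (run) | 2 PacBio SMRT cells; 6.84 million HiFi reads
Genome sequence | Illumina HiC (run) | NovaSeq 6000; 103.42 million pair reads
Genome assembly quality metrics | Number of scaffolds | 152
Genome assembly quality metrics | Total size of scaffolds (bp) | 1,072,479,546
Genome assembly quality metrics | Longest scaffold (bp) | 117,695,848
Genome assembly quality metrics | Shortest scaffold (bp) | 12,661
Genome assembly quality metrics | Mean scaffold size (bp) | 7,055,786
Genome assembly quality metrics | Median scaffold size (bp) | 81,135
Genome assembly quality metrics | N50 scaffold length (bp) | 74,038,366
Genome assembly quality metrics | N90 scaffold length (bp) | 12,777,097
Genome assembly quality metrics | L50 scaffold count | 6
Genome assembly quality metrics | L90 scaffold count | 21
Genome assembly quality metrics | N50 contig length (bp) | 17,811,915
Genome assembly quality metrics | N90 contig length (bp) | 4,626,274
Genome assembly quality metrics | Number of contigs | 267
Genome assembly quality metrics | Number of contigs in scaffolds | 144
Genome assembly quality metrics | Number of contigs not in scaffolds | 123
Genome assembly quality metrics | GC (%) | 42.30
Genome assembly quality metrics | Gaps (100 Ns) | 115
BUSCO completeness (passeriformes, n = 10,844) | C; S; D; F; M | 99.7%; 99.6%; 0.1%; 0.1%; 0.2%
Organelle | Whole mitochondrial genome, 17,151 bp | 22 tRNAs, 2 rRNAs, 2 CRs, and 13 PCGs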

Assessment of the Nuclear Genome

GenomeScope2 modeling with 21-mers estimated a genome size of 791 Mbp, smaller than the 1.07 Gbp final assembly, with 2% heterozygosity and only 45 Mbp of repeats. HiFiasm reported a homozygous kmer coverage peak of 24, matching GenomeScope's estimate of 24.1, and did not find a heterozygous peak. The 1.07 Gbp assembly contained 120 Mbp of repeats (11.20%); repeat content is often underrepresented in the GenomeScope model. To assess the completeness of the genome, we used BUSCO v.5.4.7 (Simão et al. 2015) and compleasm v0.2.2 (Huang and Li 2023) and combined the results. BUSCO + compleasm identified 99.72% complete BUSCO genes (single-copy orthologs: 99.63%, duplicated: 0.09%, fragmented: 0.06%, missing: 0.22%, n: 10,844 genes) (Table 1, see Supplementary material, part II).

Genome Annotation

RepeatMasker masked 11.20% (see supplementary table S1) of the 1.07 Gbp genome, or 120 Mbp, as repeats. This is less than in many other avian genomes but similar to several other recently sequenced flycatcher genomes. All annotations were done bioinformatically without RNA-seq data. BRAKER initially found 31,920 start codons and 31,931 stop codons. Functional gene annotation was used to remove low-confidence gene models. The final gene annotation included 30,101 genes and 31,748 mRNAs, with a total of 161,303 exons. Genes composed 13.26% of the genome. Gene names and/or descriptions were assigned to 19,303 of the identified mRNAs. Additional annotation statistics can be found in the supplement, as well as full annotation (.gff, .fna, .faa) files (included as Supplementary material).

Mitochondrial Genome

A total of 167 HiFiasm-corrected HiFi reads mapped to the mitogenome, and the HiFiMiTie pipeline unambiguously assembled these into a complete circular mitochondrial genome of 17,151 nucleotides. The genome has a remnant control region (CR) flanked by tRNA Glu and tRNA Phe, and a primary control region flanked by tRNA Thr and tRNA Pro (see supplementary fig. S1, Supplementary Material online). A total of 113 HiFi reads contained control region segments: each of the remnant CR sections was 178 nt in length, and each of the primary CR sections was 1,430 nt in length. As is typical for avian mitogenomes, there were 22 tRNAs, 2 rRNAs, and 13 protein-coding genes (supplementary fig. S1, Supplementary Material online; see details in part I of the Supplementary material).

As a check, the pipeline runs megahit (Li et al. 2016) on the same 167 HiFi reads. The resulting sequence is identical except where the primary control region contains repeats. Repeat contraction or duplication is a common problem with kmer-based assemblers such as megahit. The HiFiMiTie primary assembly mode uses segmented multiple sequence alignment and is typically a more reliable tool for resolving repeated regions when highly accurate long reads are available.
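Because a circular mitogenome can be written starting at any base and on either strand, comparing two assemblies of it calls for a rotation-aware check. The sketch below shows one simple way to do this in Python; the function names are hypothetical and the code is not part of HiFiMiTie or megahit.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def same_circular_sequence(a, b):
    """True if b is a rotation of a, on either strand.

    Any rotation of a circular sequence appears as a substring of the
    sequence concatenated with itself.
    """
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    doubled = a + a
    return b in doubled or reverse_complement(b) in doubled
```

Two assemblies that differ only in repeat copy number within the control region would have different lengths and so fail this check, which is the kind of discrepancy described above.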

Discussion

Here, we present a highly complete genome of P. nanus and the first genome of a Galapagos passerine assembled with PacBio HiFi sequences. The haploid genome size of 1.073 Gb is similar to that of other genomes in this bird family, which are between 1.0 and 1.1 Gb (Ruegg et al. 2018). Of the eight complete genomes currently assembled for the family Tyrannidae (NCBI), none was made with PacBio HiFi sequences. These other assemblies have large numbers of scaffolds, ranging from 1,692 to 43,947 with a mean of 16,422 (see supplementary table S3, Supplementary Material online). Using HiFi reads, we obtained longer average contig sizes, and with Hi-C reads, fewer scaffolds. This high-resolution genome is also an important resource for future studies of the family Tyrannidae (441 species), the largest avian family (Billerman et al. 2020), and of the subfamily Fluvicolinae (∼130 species), which includes Pyrocephalus, its closest related taxa Alectrurus, Arundicola, Gubernetes, and Colonia, and several other poorly studied genera (Ohlson et al. 2008; Feng et al. 2020). In our haplotype assembly, we obtained a BUSCO + compleasm complete score of 99.72%, suggesting strong conservation of genes in our assembly and a highly complete genome assembly (Huang and Li 2023). This assembly is a step toward further genomic studies of the endemic species of the Galapagos archipelago.

Materials and Methods

Tissue Collection

A single egg of P. nanus was collected while monitoring nests on Santa Cruz Island during the 2021 breeding season. The nest had been blown to the ground by strong winds, and the author collected from it a single broken egg containing a well-developed but dead embryo. Approximately 40 min after death, the embryo was preserved in 96% alcohol and stored in a freezer at −27 °C. Prior to analysis, the frozen sample was exported from Galapagos to the California Academy of Sciences.

PacBio High-Molecular-Weight DNA Extraction, Library Preparation, and Sequencing

We prepared libraries following standard PacBio recommendations; full protocol details for library preparation can be found in PacBio (2012) and Van Dam et al. (2021). We confirmed large amounts of high-molecular-weight DNA using FemtoPulse (Agilent, Santa Clara, CA). DNA was sent to the QB3 Genomics facility at the University of California Berkeley for HiFi library preparation and sequencing on a Pacific Biosciences Sequel II platform, sequencing two SMRT cells with HiFi version 2 chemistry.

In Situ Hi-C Library Preparation

Additional muscle tissue from the same sample was homogenized on ice using a sterile razor blade. An in situ Hi-C library was prepared as described in Rao et al. (2014) with a few modifications. Briefly, after the streptavidin pull-down step, the biotinylated Hi-C products underwent end repair, ligation, and enrichment steps using the NEBNext UltraII DNA Library Preparation kit (New England Biolabs Inc, Ipswich, MA). Titration of the number of PCR cycles was performed as described in Belton et al. (2012).

Genome Assembly

For HiFi data preparation, cutadapt v.4.4 (Martin 2011) was used to remove any read shorter than 1,000 bp or containing a PacBio SMRTbell adapter in any position. The cutadapt arguments revcomp, error-rate 0.1, overlap 35, discard, and minimum-length 1000 were used along with the -b adapter argument to create cleaned fastq HiFi reads. To assess genome size, we ran jellyfish v2.3.0 (Marçais and Kingsford 2011) using its count option on the long reads with a kmer size of 21, then jellyfish histo to produce a histogram of kmer frequencies. The histo file was uploaded to GenomeScope2 (qb.cshl.edu/genomescope/genomescope2.0) (Ranallo-Benavidez et al. 2020) to provide estimates of genome properties including total size, repeat content, and heterozygosity.
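To make the kmer-frequency histogram concrete, the following toy Python sketch counts 21-mers in a set of reads and tabulates how many distinct kmers occur at each multiplicity, which is the kind of table a kmer profile model takes as input. It is a conceptual illustration only, not a substitute for jellyfish, which is far more memory-efficient; counting canonical kmers (the lexicographically smaller of a kmer and its reverse complement) and the function names are assumptions made for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads, k=21):
    """Map multiplicity -> number of distinct canonical kmers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip kmers containing ambiguous bases such as N.
            if _VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())
```

In a real profile, the multiplicity at which the histogram peaks reflects the kmer coverage of the homozygous genome, and low-multiplicity kmers are dominated by sequencing errors.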

The two cleaned PacBio HiFi fastq read sets were assembled into genome contigs using hifiasm v.0.16.1-r375 (Cheng et al. 2021), run with the arguments --write-ec --write-paf. HiFiasm was run via a custom script that converts the program's gfa output to fasta files, with any circular records stored separately. Various statistics files were created from the fasta file, including N50, N90, and telomere locations in contigs, and BUSCO v.5.4.7 (Simão et al. 2015) was run using the avian passeriformes lineage dataset. To remove haplotypic duplicates, we ran purge_dups v.1.2.5 (Guan et al. 2020) using cutoffs -l 5 -m 36 -u 108 and excluded records from the contig assembly identified as duplicative. To scan for contaminants, we used Kraken2 (Wood et al. 2019) and blastn (Camacho et al. 2009). Taxonomy ID results from the blastn search were translated and sorted by clade, allowing the identification of any non-avian contigs, which were then removed from the contig-level assembly.
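The N50 and N90 statistics mentioned above follow a standard definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly size; the number of sequences needed to get there is the L50 (or L90). A minimal Python sketch of that definition, with a hypothetical function name, is:

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the sequence at which the cumulative sum of the
    descending-sorted lengths first reaches `fraction` of the total;
    Lx is how many sequences were needed to reach it.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
```

For example, `nx_stats(scaffold_lengths, 0.5)` applied to the scaffold lengths of an assembly would give its scaffold N50 and L50, and `nx_stats(scaffold_lengths, 0.9)` its N90 and L90.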

To scaffold contigs using Hi-C reads, the Illumina reads from the Hi-C tissue were cleaned and prepared in two steps. First, fastp v.0.23.2 (Chen et al. 2018) was run with the dedup argument to remove Illumina adapters and, after adapter removal, any read shorter than 100 bp together with its pair. Following this, Arima's pipeline (github.com/ArimaGenomics/mapping_pipeline) (Arima 2021) was used with the fastp-cleaned Hi-C reads as input to perform additional clean-up and to map the reads to the contig-level assembly with bwa (Li and Durbin 2009; Li 2013). The resulting bam file of mapped Hi-C reads and the fasta contig assembly were input into the YaHS v.1.2 scaffolding program (Zhou et al. 2023), which was run with the --no-contig-ec option. YaHS created .hic and .assembly files that were used to display Hi-C contact maps in Juicebox Assembly Tools (JBAT) v.1.11.08 (Durand et al. 2016a, 2016b) for manual refinement, and we interactively updated the location and orientation of contigs and their delineation within chromosomes.
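In a scaffolded assembly, adjacent contigs are joined by runs of N, and Table 1 reports gaps of 100 Ns. Each gap joins two contigs, so the number of contigs equals the number of scaffolds plus the number of gaps (152 + 115 = 267 in Table 1). The Python sketch below recovers contigs and gap counts from scaffold sequences under that convention; the function names and the 100-N threshold as a default are assumptions for illustration.

```python
import re


def split_scaffold(scaffold, min_gap=100):
    """Split one scaffold into contigs at runs of at least `min_gap` Ns."""
    gap = re.compile("[Nn]{%d,}" % min_gap)
    # Drop empty pieces that arise from leading or trailing N runs.
    return [piece for piece in gap.split(scaffold) if piece]


def count_contigs_and_gaps(scaffolds, min_gap=100):
    """Return (number of contigs, number of internal gaps) over all scaffolds."""
    n_contigs = 0
    n_gaps = 0
    for seq in scaffolds:
        pieces = split_scaffold(seq, min_gap)
        n_contigs += len(pieces)
        n_gaps += max(len(pieces) - 1, 0)
    return n_contigs, n_gaps
```

Shorter runs of N are left inside a contig, since they are treated here as ambiguous bases rather than scaffolding gaps.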

To assess the level of genome completeness, we ran both compleasm v.0.2.2 (Huang and Li 2023; a reimplementation of BUSCO using miniprot (Li 2023)), and BUSCO v.5.4.7 (Simão et al. 2015) with its default MetaEuk (Levy-Karin et al. 2020) mode, each using the 10,844 ortholog lineage dataset passeriformes_odb10. We updated any BUSCOs not found by compleasm that were found in the BUSCO MetaEuk results.
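The combination step described above can be sketched as follows, assuming each tool's result has already been reduced to a mapping from BUSCO identifier to a status label. The status labels, the dictionary layout, and the function names are assumptions for illustration; the actual BUSCO and compleasm outputs are tab-delimited tables with additional columns.

```python
from collections import Counter


def merge_busco_calls(compleasm, metaeuk):
    """Start from compleasm calls; fill in BUSCOs it missed from MetaEuk calls."""
    merged = dict(compleasm)
    for busco_id, status in metaeuk.items():
        not_found = merged.get(busco_id, "Missing") == "Missing"
        if not_found and status != "Missing":
            merged[busco_id] = status
    return merged


def summarize(calls):
    """Return the percentage of BUSCOs in each status category."""
    total = len(calls)
    if total == 0:
        return {}
    tally = Counter(calls.values())
    return {status: round(100.0 * n / total, 2) for status, n in tally.items()}
```

Only BUSCOs that compleasm did not find are updated, so a gene that compleasm reports as fragmented keeps that status even if the MetaEuk run reports it as complete.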

Genome Annotation

Prior to gene annotation, regions with repeats were identified using RepeatModeler v2.0.1 (Flynn et al. 2020). We combined the repeat models found using RepeatModeler with the avian repeat models from Repbase RepeatMasker libraries v20181026 into a single fasta file. These combined repeat models were used in RepeatMasker v4.0.9 (Smit et al. 2015) with the options -small, -xsmall, and -nolow to create a soft-masked version of the assembly file, which was used for the structural annotation of gene models, and to create a table of repeat types and lengths.
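In a soft-masked assembly, repeat regions are written in lowercase while the rest of the sequence stays uppercase, so the masked fraction can be read directly from the sequence. A minimal Python sketch of that calculation, with a hypothetical function name and operating on sequences already loaded as strings, is:

```python
def softmasked_fraction(sequences):
    """Return the fraction of bases written in lowercase (soft-masked)."""
    masked = 0
    total = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0
```

Applied to the soft-masked assembly, a value near 0.112 would correspond to the 11.20% repeat content reported in the Results.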

We annotated the genome without RNA sequence data using BRAKER (version 3.03, April 2023) to predict the location of genes (mRNAs, introns, exons, CDS). We used BRAKER's ProtHint pipeline (version 2.6.0, Georgia Tech GeneMark) and AUGUSTUS 3.5.0 (Gabriel et al. 2023) with the vertebrate amino acid sequences from Vertebrata_OrthoDB_10 (Kuznetsov et al. 2023). This is BRAKER pipeline B, previously called BRAKER2, which is employed when RNA-seq data are unavailable. Potential gene positions were output to gff files. We then eliminated any potential genes that did not have both a start and a stop codon and genes that were fully nested within other genes. Functional annotation of these genes began with a search for protein domains in the amino acid sequences found by BRAKER, using InterProScan-5.61-93.0 (with CDD-3.20, FunFam-4.3.0, PANTHER-17.0, Pfam-35.0, PIRSF-3.10, PRINTS-42.0, ProSitePatterns-2022_05, ProSiteProfiles-2022_05, SMART-9.0, SUPERFAMILY-1.75, and TIGRFAM-15.0 analyses). The sequences were also searched against the GenBank nt, nr, and SwissProt databases downloaded on March 27, 2023 and UniProt TrEMBL downloaded on May 15, 2021. Coding sequences were searched using BLASTN (v2.14) against the nt database. Translated protein sequences were searched using BLASTP (v2.14) against SwissProt and using diamond blastp v.2.1.6 (Buchfink et al. 2021) against the TrEMBL database and OrthoDB10 vertebrate orthologs. Protein domain IDs and Gene Ontology terms from the InterProScan output were added to the gff file for each gene and isoform model, as was the functional annotation description from the blast hit with the lowest E-value and highest score. These annotations were also added to the gene sequences in the amino acid and CDS fasta files.
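The two structural filters described above (requiring both a start and a stop codon, and dropping genes fully nested within other genes) can be sketched on simplified gene records as follows. The `Gene` record and its fields are hypothetical stand-ins for what would be parsed from the gff output, and two details are assumptions not stated in the text: nesting is tested only among gene models that pass the codon filter, and strand is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    seqid: str
    start: int
    end: int
    has_start_codon: bool
    has_stop_codon: bool


def is_nested(inner, outer):
    """True if `inner` lies entirely within `outer` on the same sequence."""
    return (
        inner is not outer
        and inner.seqid == outer.seqid
        and outer.start <= inner.start
        and inner.end <= outer.end
        # Genes with identical coordinates are not treated as nested.
        and (inner.start, inner.end) != (outer.start, outer.end)
    )


def filter_genes(genes):
    """Keep complete gene models that are not fully nested in another one."""
    complete = [g for g in genes if g.has_start_codon and g.has_stop_codon]
    return [
        g for g in complete
        if not any(is_nested(g, other) for other in complete)
    ]
```

The all-against-all comparison is quadratic in the number of genes; it is kept for clarity, and a sorted sweep over coordinates would be a natural replacement at genome scale.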

Mitochondrion Assembly

Mitochondrial sequence was derived from corrected HiFi reads from the nuclear genome assembly using an internally created program pipeline named HiFiMiTie (Henderson 2021) version 0.1. The HiFiMiTie pipeline was designed to extract and assemble mitochondrial sequence from PacBio HiFi long reads and also resolve control region heteroplasmy and repeats. It also discovers and annotates tRNAs, rRNAs, protein-coding genes, and up to two duplicate CRs. See details and full logged output in the Supplementary material, part I.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Acknowledgments

Ecuador is a signatory to the Nagoya protocol; collecting and export permits were approved by the Ministry of Environment of Ecuador and the Galapagos National Park. This study was conducted with permission of the Galapagos National Park Directorate research permit PC-06-21. The Ministry of Environment of Ecuador approved the genetic permit (MAE-DNB-CM-2016-0043) and sample export permit (N° 012-2021-EXP-CM-FAU-DBI/MAAE). This publication is contribution number 2550 of the Charles Darwin Foundation for the Galapagos Islands. The study was funded by a PhD fellowship awarded to D.J.A. from the World Wildlife Fund, Mr. Russell E. Train and Education for Nature, Agreement #RH94, a Lakeside Fellowship to D.J.A. from the California Academy of Sciences, Peter and Kris Norvig donation to the Charles Darwin Foundation, and a donation from Michael Simon to the California Academy of Sciences. Other funding sources to conduct fieldwork include Galapagos Conservation Trust; Galapagos Conservancy and Lindblad Expeditions—National Geographic. We thank Courtney Pike for feedback in improving the manuscript and sample collection.

Data Availability

The assembled genome is available in NCBI under submission ID SUB13890500 and accession number JAWZSU000000000. The raw sequence files used in the assembly are available in the NCBI Sequence Read Archive, accession number PRJNA1040305. Scripts and code used to perform these analyses are on GitHub (https://github.com/calacademy-research/).

Literature Cited

Arima X. Arima genomics mapping pipeline. 2021 [accessed 2019 Feb 9]. https://www.github.com/ArimaGenomics/mapping_pipeline/blob/master/arima_mapping_pipeline.sh.

Belton JM, McCord RP, Gibcus JH, Naumova N, Zhan Y, Dekker J. Hi–C: a comprehensive technique to capture the conformation of genomes. Methods. 2012:58(3):268–276. https://doi.org/10.1016/j.ymeth.2012.05.001.

Billerman SM, Keeney BK, Rodewald PG, Schulenberg TS. Birds of the world. Ithaca, NY: Cornell Laboratory of Ornithology; 2020.

BirdLife International. Pyrocephalus nanus. The IUCN Red List of Threatened Species 2023: e.T103682926A172654604. https://doi.org/10.2305/IUCN.UK.2023-1.RLTS.T103682926A172654604.en

Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021:18(4):366–368. https://doi.org/10.1038/s41592-021-01101-x.

Burga A, Wang W, Ben-David E, Wolf PC, Ramey AM, Verdugo C, Lyons K, Parker PG, Kruglyak L. A genetic signature of the evolution of loss of flight in the Galapagos cormorant. Science. 2017:356(6341):eaal3345. https://doi.org/10.1126/science.aal3345.

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009:10(1):1–9. https://doi.org/10.1186/1471-2105-10-421.

Carmi O, Witt CC, Jaramillo A, Dumbacher JP. Phylogeography of the vermilion flycatcher species complex: multiple speciation events, shifts in migratory behavior, and an apparent extinction of a Galápagos-endemic bird species. Mol Phylogenet Evol. 2016:102:152–173. https://doi.org/10.1016/j.ympev.2016.05.029.

Charles Darwin Foundation. Galapagos Species Database, dataZone. Charles Darwin Foundation. 2024 [accessed 2024 May 1]. https://datazone.darwinfoundation.org/en/checklist/.

Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018:34(17):884–890. https://doi.org/10.1093/bioinformatics/bty560.

Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021:18(2):170–175. https://doi.org/10.1038/s41592-020-01056-5.

Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016a:3(1):99–101. https://doi.org/10.1016/j.cels.2015.07.012.

Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, Aiden EL. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016b:3(1):95–98. https://doi.org/10.1016/j.cels.2016.07.002.

Dvorak M, Fessl B, Nemeth E, Anchundia D, Cotín J, Schulze CH, Tapia W, Wendelin B. Survival and extinction of breeding landbirds on San Cristóbal, a highly degraded island in the Galápagos. Bird Conserv Int. 2020:30(3):381–395. https://doi.org/10.1017/S0959270919000285.

Dvorak M, Nemeth E, Wendelin B, Herrera P, Mosquera D, Anchundia D, Sevilla C, Tebbich S, Fessl B. Conservation status of landbirds on Floreana: the smallest inhabited Galapagos Island. J Field Ornithol. 2017:88(2):132–145. https://doi.org/10.1111/jofo.12197.

Feng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, Xie D, Chen G, Guo C, Faircloth BC, et al. Dense sampling of bird diversity increases power of comparative genomics. Nature. 2020:587(7833):252–257. https://doi.org/10.1038/s41586-020-2873-9.

Fessl B, Anchundia D, Carrion-Tacuri J, Cimadom A, Cotín J, Cunninghame F, Dvorak M, Mosquera D, Nemeth E, Sevilla CR, et al. Galapagos landbirds (passerines, cuckoos, and doves): Status, threats and knowledge gaps. In Galapagos Reports 2015-2016. GNPS, GCREG, CDF and GC. Puerto Ayora, Galapagos, Ecuador. 2015.

Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020:117(17):9451–9457. https://doi.org/10.1073/pnas.1921046117.

Gabriel L, Brna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, Stanke M. BRAKER3: fully automated genome annotation using RNA-Seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv. 2023. Preprint, this version posted February 29, 2024. https://doi.org/10.1101/2023.06.10.544449.

García-Dorado A, Caballero A. Neutral genetic diversity as a useful tool for conservation biology. Conserv Genet. 2021:22(4):541–545. https://doi.org/10.1007/s10592-021-01384-9.

Geladi I, Henry P-Y, Mauchamp A, Couenberg P, Fessl B. Conserving Galapagos landbirds in agricultural landscapes: forest patches of native trees needed to increase landbird diversity and abundance. Biodivers Conserv. 2021:30(7):2181–2206. https://doi.org/10.1007/s10531-021-02193-9.

Gifford EW. Expedition of the California Academy of Sciences to the Galapagos Islands, 1905–1906. XIII. Field notes on the land birds of the Galapagos Islands and of Cocos Island, Costa Rica. Proc Cal Acad Sci Fourth Series. 1919:2:189–258.

Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020:36(9):2896–2898. https://doi.org/10.1093/bioinformatics/btaa025.

Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013:29(8):1072–1075. https://doi.org/10.1093/bioinformatics/btt086.

Henderson J. HiFiMiTie find & analyze metazoan mitochondria from HiFi reads. 2021. California Academy of Sciences, Institute for Biodiversity Science & Sustainability. [accessed 2021 Nov 15]. https://github.com/calacademy-research/HiFiMiTie.

Huang N, Li H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics. 2023:39(10):btad595. https://doi.org/10.1093/bioinformatics/btad595.

Kuznetsov D, Tegenfeldt F, Manni M, Seppey M, Berkeley M, Kriventseva EV, Zdobnov EM. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. 2023:51(D1):D445–D451. https://doi.org/10.1093/nar/gkac998.

Leuba C, Tebbich S, Nemeth E, Anchundia D, Heyer E, Mosquera DA, Richner H, Rojas Allieri ML, Sevilla C, Fessl B. Effect of an introduced parasite in natural and anthropogenic habitats on the breeding success of the endemic little vermilion flycatcher Pyrocephalus nanus in the Galápagos. J Avian Biol. 2020:51(8):1–13. https://doi.org/10.1111/jav.02438.

Levy-Karin E, Mirdita M, Söding J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome. 2020:8(1):1–15. https://doi.org/10.1186/s40168-020-00808-x.

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997, 26 May 2013, preprint: not peer reviewed. https://doi.org/10.48550/arXiv.1303.3997.

Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023:39(1):btad014. https://doi.org/10.1093/bioinformatics/btad014.

Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009:25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324.

Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016:102:3–11. https://doi.org/10.1016/j.ymeth.2016.02.020.

Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011:27(6):764–770. https://doi.org/10.1093/bioinformatics/btr011.

Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011:17(1):10–12. https://doi.org/10.14806/ej.17.1.200.

Merlen G. Gone, gone…going: the fate of the vermilion flycatcher on Darwin's islands. Galapagos Report. 2013:2012:180–188.

Mosquera D, Fessl B, Anchundia D, Heyer E, Leuba C, Nemeth E, Rojas Allieri ML, Sevilla CR, Tebbich S. The invasive parasitic fly Philornis downsi is threatening little vermilion flycatchers on the Galápagos Islands. Avian Conserv Ecol. 2022:17(1):6. https://doi.org/10.5751/ACE-02040-170106.

National Center for Biotechnology Information (NCBI). Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988]—[cited 2023 Aug 30]. https://www.ncbi.nlm.nih.gov/assembly/?term=tyrannidae.

National Center for Biotechnology Information (NCBI). Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988]—[cited 2023 Oct 24]. https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003957565.2/

Ohlson J, Fjeldså J, Ericson PG. Tyrant flycatchers coming out in the open: phylogeny and ecological radiation of Tyrannidae (Aves, Passeriformes). Zool Scr. 2008:37(3):315–335. https://doi.org/10.1111/j.1463-6409.2008.00325.x.

PacBio. Extracting DNA Using Phenol-Chloroform. 2012. [cited 25 July 2023]. Available: https://www.pacb.com/wp-content/uploads/2015/09/SharedProtocol-Extracting-DNA-usinig-Phenol-Chloroform.pdf.

Pan H, Cole TL, Bi X, Fang M, Zhou C, Yang Z, Ksepka DT, Hart T, Bouzat JL, Argilla LS, et al. High-coverage genomes to elucidate the evolution of penguins. GigaScience. 2019:8(9):giz117. https://doi.org/10.1093/gigascience/giz117.

Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020:11(1):1432. https://doi.org/10.1038/s41467-020-14998-3.

Rao SS. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014:159(7):1665–1680. https://doi.org/10.1016/j.cell.2014.11.021.

Rothschild LWRB, Hartert E. A review of the ornithology of the Galápagos Islands: with notes on the Webster-Harris expedition. Zoological Museum. 1899:85–142.

Rubin CJ, Enbody ED, Dobreva MP, Abzhanov A, Davis BW, Lamichhaney S, Pettersson M, Sendell-Price AT, Sprehn CG, Valle CA, et al. Rapid adaptive radiation of Darwin’s finches depends on ancestral genetic modules. Sci Adv. 2022:8(27):eabm5982. https://doi.org/10.1126/sciadv.abm5982.

Ruegg K, Bay RA, Anderson EC, Saracco JF, Harrigan RJ, Whitfield M, Paxton EH, Smith TB. Ecological genomics predicts climate vulnerability in an endangered southwestern songbird. Ecol Lett. 2018:21(7):1085–1096. https://doi.org/10.1111/ele.12977.

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015:31(19):3210–3212. https://doi.org/10.1093/bioinformatics/btv351.

Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015. 2015:289–300.

Van Dam MH, Cabras AA, Henderson JB, Rominger AJ, Pérez Estrada C, Omer AD, Dudchenko O, Lieberman Aiden E, Lam AW. The Easter Egg Weevil (Pachyrhynchus) genome reveals syntenic patterns in Coleoptera across 200 million years of evolution. PLoS Genet. 2021:17(8):e1009745. https://doi.org/10.1371/journal.pgen.1009745.

Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019:20(1):1–13. https://doi.org/10.1186/s13059-019-1891-0.

Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, Storz JF, Antunes A, Greenwold MJ, Meredith RW, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014:346(6215):1311–1320. https://doi.org/10.1126/science.1251385.

Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023:39(1):btac808. https://doi.org/10.1093/bioinformatics/btac808.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Qi Zhou

Supplementary data