Abstract

Despite being quite speciose (~10,000 extant species), birds have a fairly uniform genome size and karyotype (including the common occurrence of microchromosomes) relative to other vertebrate lineages. Storks (Family Ciconiidae) are a charismatic and distinct group of large wading birds with nearly worldwide distribution but few genomic resources. Here we present an annotated chromosome-level reference genome and chromosome orthology analysis for the wood stork (Mycteria americana), a species that has been federally protected under the Endangered Species Act since 1984. The reference assembly was produced from the blood of a wild female wood stork chick and has a length of 1.35 Gb, a contig N50 of 37 Mb, a scaffold N50 of 80 Mb, and a BUSCO score of 98.8%. We identified 31 autosomal pairs and two sex chromosomes in the wood stork genome, but failed to identify four additional autosomal microchromosomes previously found via karyotyping. Orthology analyses confirmed reported chromosomal fusions that are synapomorphies unique to storks and identified the chromosomes participating in these fusions. This study highlights the difficulty and potential problems associated with delineating microchromosomes in reference genome assemblies. It also provides a foundation for studying karyotype evolution in the core water bird clade that includes penguins, albatrosses, storks, cormorants, herons, and ibises. Finally, our reference genome will allow for numerous genomic studies, such as genome-wide association studies of local adaptation, that will aid in wood stork conservation.

Introduction

Bird genomes feature a relatively small and stable genome size (nuclear DNA content) compared to other vertebrate taxa (Tiersch and Wachtel 1991). The reason for small genome sizes in birds is unknown, but it has been hypothesized to be adaptive for the energy requirements associated with flight (Hughes and Piontkivska 2005). Genome organization within Aves is also quite consistent. Birds have a diploid number of 2n ≈ 80 which includes ~10 pairs of macrochromosomes (large chromosomes that can be flow-sorted) and many somewhat smaller microchromosomes, although the transition in size between macrochromosomes and microchromosomes is more gradual than this binary classification implies (Griffin et al. 2007). Compared to macrochromosomes, microchromosomes have high recombination rates and G + C content and are gene dense with little repetitive sequence (International Chicken Genome Sequencing Consortium 2004). Other vertebrate taxa, such as reptiles (Olmo 2008), amphibians (Morescalchi 1980), and fish (Ohno et al. 1969), contain species with microchromosomes as well. However, the presence of many microchromosomes seems to be particularly associated with avian genomes, which may have retained this feature from an original chordate ancestor (Waters et al. 2021).

Storks (Order Ciconiiformes, Family Ciconiidae) are a distinct lineage of large wading birds that constitute the only family in their order. Current molecular evidence places storks within the clade Pelecanimorphae as sister to Pelecanes, a clade that contains Order Suliformes (frigatebirds, gannets, boobies, darters, cormorants, and shags) and Order Pelecaniformes (ibises, spoonbills, herons, bitterns, shoebill, hamerkop, and pelicans) (Hackett et al. 2008; Burleigh et al. 2015; Kuramoto et al. 2015; Prum et al. 2015; Kimball et al. 2019; Kuhl et al. 2021). More broadly, storks are members of the core water bird clade, Aequornithes, which includes Gaviiformes (loons) and Feraequornithes (Burleigh et al. 2015; Sangster and Mayr 2021). Feraequornithes contains the Pelecanimorphae (ciconiiforms, suliforms, and pelecaniforms) and the Procellariimorphae (albatrosses, petrels, and penguins) (Burleigh et al. 2015; Sangster and Mayr 2021).

Traditionally, storks have been classified into three distinct lineages, tribes Mycteriini (genera Anastomus and Mycteria), Ciconiini (genus Ciconia), and Leptoptilini (genera Leptoptilos, Jabiru, and Ephippiorhynchus), based on morphology and behavior (Kahl 1987). Several lines of evidence, including karyotype analysis by cell staining (de Boer and van Brink 1982), a DNA–DNA hybridization study (Slikas 1997), comparison of cytochrome b sequences (Slikas 1997), and chromosome painting (Seligmann et al. 2019), suggest non-monophyly of the tribe Leptoptilini. The recent stork phylogeny of Rodríguez-Rodríguez and Negro (2021) supports this claim. Within this phylogeny, storks are divided into four groups: 1) Jabiru and Ephippiorhynchus, 2) Mycteriini, 3) Ciconiini, and 4) Leptoptilos. Groups 1 and 2 form a clade sister to a clade consisting of groups 3 and 4.

One member of tribe Mycteriini, the wood stork (Mycteria americana), is a species of conservation concern in the United States. The wood stork’s range includes the southeastern United States, Mexico, Central America, Cuba, and South America. In 1984, the U.S. government listed the wood stork as an endangered species (“Endangered and Threatened Wildlife and Plants; U.S. Breeding Population of the Wood Stork Determined to be Endangered; Final Rule,” February 28, 1984) due to the loss of suitable feeding habitat in southern Florida, the historical stronghold of the U.S. wood stork population (Ogden and Patty 1981). Northward range expansion and a concomitant increase in stork numbers in the succeeding decades motivated downlisting of the species in the United States from endangered to threatened status (“Endangered and Threatened Wildlife and Plants; Reclassification of the U.S. Breeding Population of the Wood Stork from Endangered to Threatened; Final Rule,” June 30, 2014). Complete delisting of the wood stork from the Endangered Species Act has recently been proposed on the grounds of recovery, including the perception of sufficient numbers and productivity to guarantee long-term viability of the U.S. wood stork population. However, the adaptive potential of the species remains unclear amidst climate-change-related threats including changes in seasonal rainfall patterns, warming temperatures, and sea level rise (“Endangered and Threatened Wildlife and Plants; Removal of the Southeast U.S. Distinct Population Segment of the Wood Stork From the List of Endangered and Threatened Wildlife”, February 15, 2023).

There are currently few genomic resources for storks, including only one chromosome-level assembly (that of the maguari stork, Ciconia maguari; NCBI BioProject PRJDB4709). The objective of this study is to build an annotated chromosome-level genome for the wood stork that will provide a detailed map of which genes are present on each chromosome and serve as a resource for conservation and evolutionary studies. In this paper, we additionally test for genome-level synapomorphies unique to storks to improve our understanding of genome evolution in birds.

Methods

Biological materials

The Jacksonville Zoo and Aquarium in northern Florida contains a wood stork rookery that was naturally established in 1999 (Bear-Hull et al. 2005). In May 2021, fresh blood samples from ten of the colony’s chicks were collected in tubes pre-coated with the anticoagulant EDTA and stored at −80 °C. DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA) following the manufacturer’s protocol. Birds were sexed genetically according to Griffiths et al. (1998) and Lee et al. (2010) to identify a female individual for genomic sequencing. In birds, the female is the heterogametic sex (ZW).

Nucleic acid library preparation

Following genetic sexing, a blood sample from a single female wood stork was sent to the commercial provider Cantata Bio (Scotts Valley, CA, USA) for nucleic acid library preparation. Two genomic libraries were produced: 1) a PacBio high-fidelity (HiFi) library (~20 kb) for long read sequencing, and 2) a Dovetail Omni-C library for short read sequencing and continuity ligation. Additionally, an RNA-Seq library was produced for genome annotation.

For HiFi library preparation, high-quality double stranded DNA was extracted from stork blood and purified using the Blood & Cell Culture DNA Mini Kit (Qiagen). Following purification, DNA was quantified using the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and the Qubit dsDNA Broad Range Assay Kit (Thermo Fisher Scientific). The library was prepared using the SMRTbell Express Template Prep Kit 2.0 (PacBio, Menlo Park, CA, USA) following the manufacturer’s protocol. The library was bound to DNA polymerase using the Sequel II Binding Kit 2.0 (PacBio).

Dovetail Omni-C library preparation followed methods described in Putnam et al. (2016). Briefly, chromatin was cross-linked using formaldehyde and extracted. Cross-linked chromatin was subsequently fragmented using DNase I, a sequence-independent endonuclease. The ends of the chromatin fragments were blunted and tagged with biotin, followed by proximity ligation to create chimeric molecules. Crosslinks were reversed and DNA was purified from protein. Next, DNA was treated to remove biotin that was not internalized within ligated fragments and sheared to ~350 bp mean fragment size. Sequencing libraries were generated using NEBNext Ultra enzymes (New England Biolabs, Ipswich, MA, USA) and Illumina-compatible adapters. Streptavidin beads were used to isolate biotin-containing fragments, which were subsequently amplified using polymerase chain reaction (PCR).

Extraction of total RNA was performed using the RNeasy Plus Mini Kit (Qiagen) following the manufacturer’s protocol. After extraction, RNA was quantified using: 1) the Qubit 2.0 Fluorometer (Thermo Fisher Scientific) with the Qubit RNA Broad Range Assay Kit (Thermo Fisher Scientific), and 2) the 4200 TapeStation system (Agilent, Santa Clara, CA, USA). DNase treatment, AMPure bead cleanup (Beckman Coulter Life Sciences, Indianapolis, IN, USA), and Qiagen FastSelect HMR rRNA (Qiagen) depletion were performed prior to library preparation. Library preparation was performed using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs) following the manufacturer’s protocol.

Sequencing and genome assembly

Genome assembly

Genomic assembly and annotation were executed by Cantata Bio. A list of all programs and versions used throughout the assembly process is available in Table 1.

Table 1.

Computer programs used in genome assembly, assembly validation and annotation, and chromosome/organelle delineation for the wood stork (Mycteria americana) reference genome.

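| Stage | Purpose | Program | Version |
| --- | --- | --- | --- |
| Genome Assembly | Initial de novo genome assembly | Hifiasm | 0.15.4 |
| Genome Assembly | Compare nuclear sequences to sequence database | BLAST | 2.9.0 |
| Genome Assembly | Removal of contaminants | BlobTools | 1.1.1 |
| Genome Assembly | Removal of haplotypic duplications | Purge_dups | 1.2.5 |
| Genome Assembly | Scaffolding | HiRise | 2.1.1 |
| Genome Assembly | Short read alignment | BWA | 0.7.17 |
| Assembly Validation and Annotation | Assess genome completeness | BUSCO | 4.05 |
| Assembly Validation and Annotation | Pipeline for identification of repetitive elements | RepeatModeler | 2.0.1 |
| Assembly Validation and Annotation | Algorithm for identification of repetitive elements | RECON | 1.08 |
| Assembly Validation and Annotation | Algorithm for identification of repetitive elements | RepeatScout | 1.0.6 |
| Assembly Validation and Annotation | Annotation and masking of repetitive elements | RepeatMasker | 4.1.0 |
| Assembly Validation and Annotation | RNA read alignment | STAR | 2.7 |
| Assembly Validation and Annotation | Gene prediction | AUGUSTUS | 2.5.5 |
| Assembly Validation and Annotation | Gene prediction | SNAP | 2006-07-28 |
| Assembly Validation and Annotation | Gene prediction | MAKER | 3.01.03 |
| Assembly Validation and Annotation | tRNA prediction | tRNAscan-SE | 2.05 |
| Chromosome/Organelle Delineation | Processing of Hi-C pairs | Pairtools | 1.0.2 |
| Chromosome/Organelle Delineation | Index .pairs file | Pairix | 0.3.7 |
| Chromosome/Organelle Delineation | Generate genomic contact matrix | Cooler | 0.8.11 |
| Chromosome/Organelle Delineation | Visualize genomic contact matrix | HiGlass | 1.11.7 |
| Chromosome/Organelle Delineation | Genome-to-genome comparison | D-GENIES | 1.4.0 |
| Chromosome/Organelle Delineation | Genome-to-genome alignment | Minimap2 | 2.24 |
| Chromosome/Organelle Delineation | Align reads to reference | BWA-MEM2 | 2.2.1 |
| Chromosome/Organelle Delineation | Post-alignment metrics | SAMtools | 1.16.1 |
| Chromosome/Organelle Delineation | Software environment | R | 4.2.1 |
| Chromosome/Organelle Delineation | Chromosome orthology | pafr | 0.0.2 |
| Chromosome/Organelle Delineation | Mitochondrion trimming | Geneious Prime | 2023.1.2 |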

Programs are listed in the order in which they were used, and each listed program is accompanied with its purpose and version.

The HiFi library was loaded onto a Sequel II 8M SMRT cell (PacBio) using the MagBindKit v2 (PacBio) and sequenced to 67× coverage using circular consensus sequencing (CCS) mode (Wenger et al. 2019). A draft genome assembly was built from the resulting reads using default parameters in Hifiasm v0.15.4 (Cheng et al. 2021). The Hifiasm output assembly (hifiasm.p_ctg.fa) was then compared to the BLAST (Basic Local Alignment Search Tool) v2.9.0 (Altschul et al. 1990) nucleotide database (nt). The resulting file was used as input for BlobTools v1.1.1 (Laetsch and Blaxter 2017), and scaffolds identified as possible contamination were removed from the assembly (filtered.asm.cns.fa). Finally, Purge_dups v1.2.5 (Guan et al. 2020) was used to remove haplotigs and contig overlaps (purged.fa).

The Omni-C library was sequenced on a HiSeqX platform (Illumina, San Diego, CA, USA) to ~30× coverage using 2 × 150 bp paired-end reads. The de novo assembly and the Omni-C library reads were used as input for HiRise v2.1.1 (Putnam et al. 2016), a software pipeline designed to scaffold genome assemblies using proximity ligation data. Briefly, the Omni-C reads were first mapped to the Hifiasm assembly using BWA v0.7.17 (Li and Durbin 2009). Only reads with mapping quality scores ≥50 were retained. Then, the separations of Omni-C read pairs mapped within draft scaffolds were used by HiRise to produce a likelihood model that identified and broke putative misjoins, scored prospective joins, and made novel joins.

Assembly validation

The qualities of the initial Hifiasm assembly and the scaffolded HiRise assembly were assessed for genome completeness using the program BUSCO v4.05 (Manni et al. 2021). BUSCO uses universal single-copy orthologs; for this project, the eukaryota_odb10 database, which includes 70 species and 255 single-copy orthologous genes, was used.

Genome annotation

First, repetitive regions (e.g. transposable elements) within the genome were identified de novo using the pipeline RepeatModeler v2.0.1 (Flynn et al. 2020). This pipeline used two distinct discovery algorithms to accomplish this task: RECON v1.08 (Bao and Eddy 2002) and RepeatScout v1.0.6 (Price et al. 2005). The repeat library produced from the pipeline was input into the program RepeatMasker v4.1.0 (Smit et al. 2013–2015) which annotated and masked the repeats in the assembly file.

The RNA-Seq library was run on the NovaSeq6000 platform (Illumina) in 2 × 150 bp configuration. RNA-Seq reads were mapped onto the genome using the RNA-Seq aligner STAR v2.7 (Dobin et al. 2013). The bam2hints tool within AUGUSTUS v2.5.5 (Stanke and Waack 2003) was used to generate intron-exon boundary hints. Coding sequences from four avian species, the little egret (Egretta garzetta), the crested ibis (Nipponia nippon), the chicken (Gallus gallus), and the zebra finch (Taeniopygia guttata), were used to train initial ab initio models for the gene prediction programs SNAP v2006-07-28 (Korf 2004) and AUGUSTUS; for AUGUSTUS, this included six rounds of prediction optimization. Following model training, gene prediction was performed in the repeat-masked assembly file using SNAP and AUGUSTUS. Gene prediction was also performed in the repeat-masked assembly file using the annotation pipeline MAKER v3.01.03 (Cantarel et al. 2008). UniProtKB/Swiss-Prot peptide sequences from the UniProt Knowledgebase (http://www.uniprot.org) and the protein sequences from the four avian species above (i.e. little egret, crested ibis, chicken, and zebra finch) were used when running MAKER. Annotation edit distance (AED) scores for each of the predicted genes were generated by MAKER to assess gene prediction quality.
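To make the AED score concrete, the following Python sketch computes it for one predicted gene against one evidence alignment, using the commonly cited nucleotide-level definition AED = 1 − (sensitivity + specificity) / 2. This is an illustration only and not MAKER's implementation; the function names, the half-open coordinate convention, and the toy exon coordinates are assumptions. Under this definition scores range from 0 (prediction and evidence agree exactly) to 1 (no overlap).

```python
def exon_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def aed(predicted_exons, evidence_exons):
    """Annotation edit distance between a prediction and its evidence.

    Assumed definition: AED = 1 - (SN + SP) / 2, where SN is the fraction
    of evidence nucleotides covered by the prediction and SP is the
    fraction of predicted nucleotides covered by the evidence.
    """
    predicted = exon_positions(predicted_exons)
    evidence = exon_positions(evidence_exons)
    if not predicted or not evidence:
        return 1.0  # nothing to compare, so treat as unsupported
    overlap = len(predicted & evidence)
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(predicted)
    return 1 - (sensitivity + specificity) / 2


# Toy coordinates (hypothetical): the prediction extends its second exon
# 50 bp beyond the evidence.
toy_prediction = [(0, 100), (200, 300)]
toy_evidence = [(0, 100), (200, 250)]
print(aed(toy_prediction, toy_evidence))
```

Lower scores indicate better agreement with the evidence, which is why the fraction of genes falling below a given AED threshold is used as a summary of annotation quality.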

The final genome annotation contained the intersection of the genes predicted by SNAP and AUGUSTUS. Genes were further characterized for their putative function by performing a BLAST search of the peptide sequences against the UniProt Knowledgebase. tRNAs were predicted using tRNAscan-SE v2.05 (Chan et al. 2021).

Determination of chromosome-level scaffolds including sex verification

After scaffolding with HiRise, a genomic contact matrix was produced to visualize chromosomal-level scaffolds. First, the command parse in the Pairtools v1.0.2 (Open2C et al. 2023) pipeline was used to identify valid ligation events present in the Omni-C data. Then, the pairs were sorted and PCR duplicates were removed using the Pairtools commands sort and dedup, respectively. Pairtools split was used to produce a .pairs file, and the .pairs file was indexed with Pairix v0.3.7 (Lee et al. 2022). A single resolution cool file was generated using the command cload pairix in Cooler v0.8.11 (Abdennur and Mirny 2020). Subsequently, a multi-resolution mcool file was generated using the command zoomify in Cooler. The genomic contact matrix, in the form of the mcool file, was visualized in the software HiGlass v1.11.7 (Kerpedjiev et al. 2018).
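A genomic contact matrix is essentially a symmetric table counting how often pairs of genomic bins were ligated together. The Python sketch below is a toy illustration of that idea, not the Pairtools or Cooler implementation: the function name, the rule of treating only exact coordinate duplicates as PCR duplicates, and the example coordinates are all assumptions made for illustration.

```python
def contact_matrix(pairs, bin_size, n_bins):
    """Build a symmetric bin-by-bin contact count matrix.

    pairs: iterable of (pos1, pos2) coordinates on one sequence.
    Exact duplicate pairs are collapsed first (a simplified stand-in
    for PCR duplicate removal).
    """
    # Normalise orientation so (a, b) and (b, a) count as the same pair.
    unique_pairs = {tuple(sorted(pair)) for pair in pairs}
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in unique_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


# Toy ligation pairs (hypothetical); the second is a duplicate of the first.
toy_pairs = [(100, 900), (100, 900), (150, 1200), (2100, 2500), (950, 2050)]
for row in contact_matrix(toy_pairs, bin_size=1000, n_bins=3):
    print(row)
```

In a real matrix, a correctly assembled chromosome appears as a square block of dense contacts along the diagonal, because loci on the same chromosome are ligated together far more often than loci on different chromosomes.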

Six different avian genomes with chromosome-level assemblies were downloaded from the GenBank database (Benson et al. 2013): 1) chicken (WGS master accession JAENSK000000000; BioProject PRJNA660757), 2) zebra finch (WGS master accession RRCB00000000; BioProject PRJNA489098), 3) common cuckoo (Cuculus canorus; WGS master accession JAGIYT000000000; BioProject PRJNA562015), 4) Humboldt penguin (Spheniscus humboldti; WGS master accession JAPZLJ000000000; BioProject PRJNA838343), 5) plumbeous ibis (Theristicus caerulescens; WGS master accession JAJGSR000000000; BioProject PRJNA774297), and 6) maguari stork (WGS master accession JAGFVN000000000; BioProject PRJNA715733) for chromosome orthology analysis. To assess reciprocity in orthology, each of these genomes was aligned with the wood stork genome twice using the program D-GENIES v1.4.0 (Cabanettes and Klopp 2018). One alignment used the wood stork genome as the query and the other genome as the target; the second alignment reversed these roles. The program utilized Minimap2 v2.24 (Li 2018), with the option for few repeats, for genome alignment and then generated dot-plots. Additionally, D-GENIES produced an association table for each comparison that included the best matching chromosome in the target for each scaffold in the query (or vice-versa) as well as PAF (Pairwise mApping Format) files that consisted of alignments between sequences.
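The idea behind an association table can be sketched in Python by tallying, for each query sequence in a PAF file, the matching bases contributed by each target sequence and reporting the target with the largest total. This is a hedged illustration and not the D-GENIES procedure, whose exact ranking criterion is not described here; the function names and the toy records are assumptions. The column positions follow the standard PAF layout (query name in column 1, target name in column 6, number of matching bases in column 10).

```python
from collections import defaultdict


def match_totals(paf_lines):
    """Sum matching bases per (query, target) pair from PAF records."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query, target, n_match = fields[0], fields[5], int(fields[9])
        totals[query][target] += n_match
    return totals


def best_targets(paf_lines):
    """Return the target with the most matching bases for each query."""
    totals = match_totals(paf_lines)
    return {q: max(t, key=t.get) for q, t in totals.items()}


# Toy PAF records (hypothetical names and coordinates).
toy_paf = [
    "\t".join(["scafA", "1000", "0", "400", "+", "chr1", "5000", "100", "500", "380", "400", "60"]),
    "\t".join(["scafA", "1000", "500", "900", "+", "chr2", "4000", "0", "400", "390", "400", "60"]),
    "\t".join(["scafA", "1000", "900", "1000", "-", "chr1", "5000", "600", "700", "95", "100", "60"]),
    "\t".join(["scafB", "800", "0", "800", "+", "chr2", "4000", "1000", "1800", "790", "800", "60"]),
]
print(best_targets(toy_paf))
```

Running the tally in both directions (each genome once as query and once as target) is what allows a one-to-one relationship to be distinguished from a case where one chromosome corresponds to two chromosomes in the other species.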

The package pafr v0.0.2 (Winter et al. 2020) in R v4.2.1 (R Core Team 2022) was used in conjunction with the PAF files to visualize and confirm orthology between wood stork chromosome scaffolds and chromosomes of chicken, finch, cuckoo, penguin, ibis, and maguari stork (see R Markdown). Briefly, we first visualized the coverage of the wood stork sex chromosome scaffolds (wood stork Z chromosome = MAMZ and wood stork W chromosome = MAMW) with the sex chromosomes of chicken (GGAZ and GGAW) and maguari stork (CMAZ and CMAW). Coverage plots between wood stork chromosomes and chicken and/or penguin chromosomes were produced when dot plots and association tables indicated non-1:1 chromosome orthology between species. Next, we used pafr to confirm orthology between microchromosomes MAM25-29 and their complements in Humboldt penguin, maguari stork, and chicken. Finally, we used the leftover microchromosomes in chicken, zebra finch, common cuckoo, and Humboldt penguin that were not orthologous to MAM1-29 to identify if any other scaffolds in the wood stork genome were chromosomes.
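The coverage of one sequence by alignments to another can be expressed as the fraction of its length spanned by at least one alignment. The Python sketch below shows one way to compute that fraction by merging overlapping alignment intervals; it is not the pafr implementation, and the function names and toy intervals are assumptions. Intervals are treated as 0-based and half-open, as in PAF start and end columns.

```python
def merged_length(intervals):
    """Total length covered by a set of possibly overlapping intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def fraction_aligned(intervals, sequence_length):
    """Fraction of a sequence covered by at least one alignment."""
    return merged_length(intervals) / sequence_length


# Toy alignment intervals on a 1,000 bp scaffold (hypothetical values).
toy_intervals = [(0, 400), (300, 600), (800, 900)]
print(fraction_aligned(toy_intervals, 1000))
```

Merging before summing matters because overlapping alignments would otherwise be counted twice and could push the apparent coverage above 100%.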

We compared the visualized alignment of the scaffold pertaining to the wood stork mitogenome with that of the mitogenomes of the other species. Based on an unusually large sequence length (33,032 bp) and incongruence with the other mitogenomes, the wood stork mitochondrion scaffold was trimmed using Geneious Prime v2023.1.2 (https://www.geneious.com). First, the scaffold was aligned to the white stork (Ciconia ciconia; GenBank accession NC_002197) mitogenome using a Geneious alignment (parameters included a global alignment with free end gaps and a cost matrix of 70% similarity) and trimmed. Geneious alignments between the trimmed sequence and chicken (CM028585.1), oriental stork (Ciconia boyciana; NC_002196.1), and black stork (Ciconia nigra; KF906246.1) mitogenomes available on GenBank were performed to confirm accurate trimming.

To verify using bioinformatics that our stork was a female, we mapped the sequencing reads back to the assembled wood stork genome using BWA-MEM2 v2.2.1 (Vasimuddin et al. 2019). We then used the function coverage in SAMtools v1.16.1 (Danecek et al. 2021) to determine coverage or read depth of each scaffold. SAMtools coverage also provided the mapping quality values for the wood stork reads mapped onto the wood stork reference; this reinforced confidence in which scaffolds could be assigned as chromosome-level scaffolds.
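The logic of depth-based sex verification can be sketched as follows. In a ZW female, the Z and W chromosomes are each present in a single copy, so their scaffolds would be expected to show roughly half the read depth of autosomal scaffolds. The Python sketch below normalizes per-scaffold mean depth by the autosomal median; the expected-half-depth reasoning, the function name, the scaffold labels, and the depth values are illustrative assumptions and not results from this study.

```python
from statistics import median


def depth_ratios(mean_depth, autosomes):
    """Express each scaffold's mean depth relative to the autosomal median.

    mean_depth: dict mapping scaffold name to mean read depth.
    autosomes: names of scaffolds assumed to be autosomal (diploid).
    """
    baseline = median(mean_depth[name] for name in autosomes)
    return {name: depth / baseline for name, depth in mean_depth.items()}


# Hypothetical depths, not values from the wood stork assembly.
toy_depth = {"auto1": 60.0, "auto2": 62.0, "auto3": 58.0, "Z": 31.0, "W": 29.0}
ratios = depth_ratios(toy_depth, autosomes=["auto1", "auto2", "auto3"])
for name, ratio in ratios.items():
    print(name, round(ratio, 2))
```

Under these assumptions, ratios near 0.5 for both Z and W scaffolds would be consistent with a female, whereas a male (ZZ) would be expected to show a Z ratio near 1 and essentially no reads on W.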

Results

Sequencing and genome assembly

Genome assembly and assembly validation

We obtained ~4.7 million (67 gigabase-pairs [Gbp]) PacBio CCS reads, which resulted in 67× coverage for the initial de novo genome assembly using Hifiasm. The initial de novo genome assembly using Hifiasm had a total length of 1.35 Gbp across 359 contigs (342 scaffolds) and a contig (and scaffold) N50 of 36,845,572 base-pairs (bp) after primary filtering. The initial assembly had a longest contig length of 131,390,114 bp and 17 gaps. After scaffolding this assembly with HiRise using the Dovetail Omni-C library, the final assembly retained a total length of ~1.35 Gbp. The final assembly contained 280 scaffolds; the number of contigs in the final assembly remained at 359 because the scaffolding process does not change the number (or length) of contigs. The scaffold N50 for the final assembly was 80,020,930 bp. The final assembly contained 70 gaps and had a BUSCO score of 98.8%.
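For readers unfamiliar with the N50 statistic reported above, it is the sequence length at which, after sorting sequences from longest to shortest, the running total first reaches half of the assembly length. The Python sketch below computes it for a toy list of lengths; the function name and the lengths are hypothetical and are not the wood stork contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that point is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs hold half of 300
```

Because scaffolding joins contigs into longer sequences without altering the contigs themselves, the scaffold N50 can rise substantially while the contig N50 stays the same.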

Genome annotation

Annotation predicted 28,238 genes in the assembly, and these genes accounted for 38.87 Mb or 2.88% of the length of the final assembly. The average gene length was 1.38 kb, and there were 2,731 single-exon genes identified. At least 50% of predicted genes had AED scores <0.25 and at least 80% of predicted genes had AED scores <0.5, signifying that most predicted genes in the annotation were well supported by external evidence (Supplementary Fig. S1). Out of 255 BUSCO genes searched, BUSCO analysis of predicted genes identified 226 (88.6%) complete single-copy BUSCOs and 18 (7.1%) fragmented BUSCOs. Eleven (4.3%) BUSCOs were missing. In total, 15.4% of the genome was masked as repetitive. Of the genome, 5.9% consisted of Class I TE repeats, 0.1% of Class II TE repeats, 0.2% of low complexity repeats, and 0.9% of simple repeats.
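The summary percentages above follow directly from the reported counts. As a quick arithmetic check, the Python sketch below recomputes them from the figures given in the text, with the assembly length taken as the rounded value of 1.35 Gbp; the variable and function names are illustrative.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given digits."""
    return round(100 * part / whole, digits)


# BUSCO categories for the predicted gene set, as reported in the text.
busco_counts = {"complete single-copy": 226, "fragmented": 18, "missing": 11}
busco_total = sum(busco_counts.values())  # 255 genes searched

for category, count in busco_counts.items():
    print(category, percent(count, busco_total))

# Gene space relative to the assembly (assembly length is approximate).
gene_bases = 38.87e6
gene_count = 28_238
assembly_bases = 1.35e9

print("mean gene length (kb):", round(gene_bases / gene_count / 1000, 2))
print("genic fraction (%):", percent(gene_bases, assembly_bases, digits=2))
```

The three BUSCO categories sum to the 255 genes searched, and the mean gene length and genic fraction agree with the values reported above.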

Determination of chromosomal-level scaffolds including sex verification

Visualization with HiGlass of the largest 53 scaffolds in the wood stork reference assembly identified 26 well-defined scaffolds (Fig. 1A). Based on orthology analyses, these 26 scaffolds corresponded to wood stork chromosomes 1-24 (MAM1-24) and two MAMZ scaffolds (a major MAMZ scaffold that accounted for most of the chromosome and a minor MAMZb that was at a distal end of the Z chromosome; Supplementary Table S1 and Fig. S2). After the 26th scaffold, scaffolds became less defined, with some scaffolds having little intra-scaffold contact (Fig. 1B). Synteny plots with maguari stork, Humboldt penguin, and chicken confirmed that five scaffolds after scaffold 26 in the contact matrix were most likely wood stork chromosomes 25-29 (MAM25-29; Fig. 2). At least 50% of the orthologous maguari stork chromosome had synteny with the wood stork scaffold and at least 60% of the wood stork scaffold was aligned to the orthologous penguin microchromosome. One exception was MAM27, which was orthologous to maguari stork chromosome 29 (CMA29) but showed little orthology to chicken or penguin microchromosomes. Two scaffolds pertaining to MAMW (designated MAMWa and MAMWb) and an additional MAMZ scaffold (designated MAMZc) were also present after scaffold 26 (Supplementary Fig. S2). The two MAMW scaffolds covered about a quarter of maguari stork chromosome W (CMAW) but did not have significant alignments with chicken chromosome W (GGAW; Supplementary Fig. S2). MAMZc aligned to distal portions of both chicken chromosome Z (GGAZ) and maguari stork chromosome Z (CMAZ; Supplementary Fig. S2).

Fig. 1.

Genomic contact matrix of the wood stork (Mycteria americana) reference genome visualized with HiGlass v1.11.7 (Kerpedjiev et al. 2018). A) The contact matrix of the largest 53 scaffolds is depicted with thin black borders around the contacts for the 24 largest scaffolds and with a thicker black border (bottom right corner) around the contacts for the remaining 29 scaffolds. B) A close-up of the contact matrix for Scaffolds 25–53 with thin black borders around the contacts for these scaffolds.

Fig. 2.

Synteny plots of wood stork (Mycteria americana) chromosomes 25–31 (MAM25–31) against orthologous chromosomes in other avian species. Positions are displayed in megabases (Mb). Each stork chromosome is aligned to the orthologous chromosome in Humboldt penguin (Spheniscus humboldti; SHU), except for MAM27 which is aligned to chicken (Gallus gallus) chromosome 31 (GGA31). Each stork chromosome is also aligned to the orthologous chromosome in maguari stork (Ciconia maguari; CMA). Unplaced scaffolds in the maguari stork genome are represented as “CMA US.”

Two additional microchromosomes were defined, wood stork chromosomes 30 and 31 (MAM30 and MAM31), based on synteny plotting between wood stork scaffolds and additional microchromosomes in the Humboldt penguin genome (Fig. 2). These two microchromosomes were not previously identified in the maguari stork genome assembly, but unique unplaced scaffolds in the maguari stork assembly were identified that had at least 60% synteny with these microchromosomes. Additional scaffolds in the wood stork genome were suspected of being orthologous to additional penguin microchromosomes due to substantial sequence alignments (>30% of wood stork scaffold aligned to a penguin microchromosome). However, there were no substantial sequence alignments between these wood stork scaffolds and maguari stork unplaced scaffolds (see R Markdown).

Chromosome orthology analyses based on dot plots and association tables revealed fissions and fusions of chromosomes in the wood stork and the other avian species studied (Supplementary Table S1 and Fig. S3). Wood stork chromosomes 4 and 10 (MAM4 and MAM10) were orthologous to different sections of chicken chromosome 4 (GGA4; Supplementary Fig. S4), but orthologous to separate chromosomes in the other avian species (Supplementary Table S1). Several wood stork chromosomes were fusions of two chromosomes in chicken: 1) MAM6 was a fusion of GGA6 and GGA10, 2) MAM7 was a fusion of GGA8 and GGA9, and 3) MAM8 was a fusion of GGA11 and GGA13 (Fig. 3; Supplementary Fig. S5; Table S1). These fusions were shared with maguari stork, but not with zebra finch, common cuckoo, Humboldt penguin, or plumbeous ibis (Supplementary Table S1).

Fig. 3.

Fissions and fusions identified in clade Feraequornithes. Orthologs of chicken (Gallus gallus) chromosomes 1 (GGA1) and 6–14 (GGA6-14) in a) Humboldt penguin (Spheniscus humboldti; SHU), b) wood stork (Mycteria americana; MAM), and c) plumbeous ibis (Theristicus caerulescens; TCA). Arrows (and colors) correspond to the fate of the chicken chromosomes in the genomes of the three other species.

Orthology analyses additionally identified fissions and fusions of chromosomes in chicken and the other avian species. Chicken chromosome 1 (GGA1) was orthologous to two chromosomes in zebra finch (TGU1 and TGU1A, data not shown) and two chromosomes in plumbeous ibis (TCA2 and a portion of TCA3; Supplementary Table S1; Fig. 3c). Also of note were the fusions observed within the clade Feraequornithes (Fig. 3). These fusions were not shared between Humboldt penguin, plumbeous ibis, and storks, but did involve complements of the same chicken chromosomes 6–14 (GGA6-14). In Humboldt penguin, chromosome 5 (SHU5) was a fusion of GGA6 and GGA8 and chromosome 6 (SHU6) was a fusion of GGA7 and GGA9 (Fig. 3a). In wood stork (and maguari stork), as previously described, MAM6 was a fusion of GGA6 and GGA10, MAM7 was a fusion of GGA8 and GGA9, and MAM8 was a fusion of GGA11 and GGA13 (Fig. 3b). Substantial chromosomal rearrangements occurred in the plumbeous ibis genome (Fig. 3c). Plumbeous ibis chromosome 5 (TCA5) was a fusion of GGA7, GGA8, and GGA14, chromosome 8 was a fusion of GGA9 and GGA11, and chromosome 9 was a fusion of GGA10 and GGA12. It appears that GGA1 experienced a fission event in plumbeous ibis, in which part of the chromosome became plumbeous ibis chromosome 2 (TCA2) and the other part of the chromosome fused with GGA6 to become plumbeous ibis chromosome 3 (TCA3).
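The statement that these fusions are not shared among the three lineages can be checked with a short Python sketch that encodes the fusions listed in this paragraph and intersects the fused chicken chromosome pairs between species. This is an illustrative sketch only. Treating every pair of chicken chromosomes within a multi-way fusion (such as TCA5) as fused is a simplifying assumption, since adjacency within the fused chromosome is not considered, and the function names are hypothetical.

```python
from itertools import combinations

# Fusions reported in the text: species -> chromosome -> chicken orthologs.
FUSIONS = {
    "Humboldt penguin": {
        "SHU5": {"GGA6", "GGA8"},
        "SHU6": {"GGA7", "GGA9"},
    },
    "wood stork": {
        "MAM6": {"GGA6", "GGA10"},
        "MAM7": {"GGA8", "GGA9"},
        "MAM8": {"GGA11", "GGA13"},
    },
    "plumbeous ibis": {
        "TCA3": {"GGA1", "GGA6"},
        "TCA5": {"GGA7", "GGA8", "GGA14"},
        "TCA8": {"GGA9", "GGA11"},
        "TCA9": {"GGA10", "GGA12"},
    },
}


def fused_pairs(species):
    """Return every unordered pair of chicken chromosomes fused in a species."""
    pairs = set()
    for chicken_chrs in FUSIONS[species].values():
        for pair in combinations(sorted(chicken_chrs), 2):
            pairs.add(frozenset(pair))
    return pairs


def shared_fusions(species_a, species_b):
    """Return the fused chicken chromosome pairs common to two species."""
    return fused_pairs(species_a) & fused_pairs(species_b)


# Compare every pair of species; an empty set means no shared fusion.
comparison = {
    (a, b): shared_fusions(a, b) for a, b in combinations(FUSIONS, 2)
}
```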

Initially, the scaffold containing the wood stork mitochondrial sequence appeared to be a composite of duplicated contigs (Supplementary Fig. S6). After alignment and trimming, the final wood stork mitochondrial sequence was 17,347 bp in length and appeared orthologous to chicken, white stork, black stork, and oriental stork mitogenome sequences.

The mean coverage of wood stork autosomes 1–24 (MAM1-24) was 31.39. Coverage of the major Z scaffold (MAMZ) was approximately half that of the autosomes (17.63), confirming that the sequenced individual was the heterogametic sex (female).
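The reasoning behind this check can be sketched in a few lines of Python using the two coverage values reported above. In birds, females are ZW and carry a single Z, so Z coverage near half the autosomal mean is expected, whereas males (ZZ) should show a ratio near 1. The tolerance of 0.15 around each expected ratio is an arbitrary assumption made for illustration, not a threshold used in this study, and the function names are hypothetical.

```python
AUTOSOME_MEAN_COVERAGE = 31.39  # mean coverage of MAM1-24, from the text
Z_COVERAGE = 17.63              # coverage of the major Z scaffold, from the text


def coverage_ratio(z_coverage, autosome_coverage):
    """Return Z coverage relative to mean autosomal coverage."""
    if autosome_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    return z_coverage / autosome_coverage


def infer_sex(ratio, tolerance=0.15):
    """Classify by Z:autosome ratio (tolerance is an illustrative assumption)."""
    if abs(ratio - 0.5) <= tolerance:
        return "female (ZW)"  # one Z copy: about half the autosomal coverage
    if abs(ratio - 1.0) <= tolerance:
        return "male (ZZ)"    # two Z copies: about equal to autosomal coverage
    return "ambiguous"


ratio = coverage_ratio(Z_COVERAGE, AUTOSOME_MEAN_COVERAGE)
call = infer_sex(ratio)
```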

Discussion

Despite new technologies and diminishing costs facilitating the production of more accurate and complete genomic resources for non-model organisms, challenges remain in assembling genomes with complex architecture. Unlike human and other mammalian genomes, most avian genomes, many reptile genomes, and some fish and amphibian genomes include microchromosomes, or chromosomes <0.5 µm in size, in addition to macrochromosomes (Srikulnath et al. 2021; Waters et al. 2021). These microchromosomes are often gene-rich, have little repetitive sequence, and comprise up to a third of the total genome content in avian genomes. Recent genome assemblies for chicken characterize all chromosomes including microchromosomes (Masabanda et al. 2004), but many avian genomes are being deposited into public repositories (e.g. GenBank) with discrepancies between the karyotype number from cytological studies and the number of chromosome sets identified in genome assemblies. This discordance is likely due to the difficulty of distinguishing some microchromosomes from unplaced scaffolds, which results in the number of chromosomes in final genome assemblies being underestimated.

We produced a highly contiguous genome assembly of the wood stork and were able to identify 31 autosomal pairs in the wood stork genome. Francisco and Galetti Junior (2000), and more recently de Sousa et al. (2023), determined 2n = 72 for wood stork based on karyotype analysis. Based on this diploid number, we would expect 35 autosomal pairs (36 chromosome pairs, one of which is the sex chromosome pair) in the wood stork genome, and thus our assembly and subsequent analysis have either failed to assemble or failed to identify four microchromosomes. This is not surprising, as microchromosomes are small and can be hard to both identify and assemble. Future studies aimed at producing complete sequences for the wood stork genome may utilize alternative techniques such as isolating and sequencing individual microchromosomes.
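The arithmetic behind the expected number of autosomal pairs can be written out as a small Python sketch. It assumes a single pair of sex chromosomes (Z and W), and the function name is hypothetical.

```python
def expected_autosomal_pairs(diploid_number, sex_chromosome_pairs=1):
    """Return the number of autosomal pairs implied by a diploid number (2n)."""
    if diploid_number % 2 != 0:
        raise ValueError("diploid number must be even")
    return diploid_number // 2 - sex_chromosome_pairs


KARYOTYPE_2N = 72           # wood stork diploid number from karyotyping
ASSEMBLED_AUTOSOMES = 31    # autosomal pairs identified in the assembly

expected = expected_autosomal_pairs(KARYOTYPE_2N)
unidentified = expected - ASSEMBLED_AUTOSOMES
```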

Chromosome-level assemblies of wood stork and maguari stork identified chromosome synapomorphies that are most likely unique to Order Ciconiiformes. Previous research partially characterized these synapomorphies using chicken probes for GGA1-9 on the jabiru and the maguari stork (Seligmann et al. 2019) and chicken probes for GGA1-11 on the wood stork (de Sousa et al. 2023). Our data support the identified GGA8/GGA9 fusion in storks, and our comparisons with other members of the clade Feraequornithes (Humboldt penguin and plumbeous ibis) provide additional support that this fusion is unique to storks. Additionally, our study identified that GGA10 was the unidentified chromosome in Seligmann et al. (2019) that was fused with GGA6 in storks. This is in partial disagreement with de Sousa et al. (2023), who stated that GGA6 was fused with an unidentified chromosome in wood stork, but characterized GGA10 as an independent chromosome in the wood stork karyotype.

The two stork fusions (GGA8/GGA9 and GGA6/GGA10) were not found in a chromosome painting study of three members of Pelecaniformes, the grey heron (Ardea cinerea), the little egret (Egretta garzetta), and the crested ibis (Nipponia nippon), further suggesting that these fusions are stork specific (Wang et al. 2022). The inclusion of additional taxa from within the clade Feraequornithes in studies of karyotype evolution is needed to support these hypotheses.

Our study also identified a fusion, GGA11/GGA13, that was present in both stork species but in none of the other species tested, including Humboldt penguin and plumbeous ibis. The GGA11/GGA13 fusion has also been identified in the neotropic cormorant (Nannopterum brasilianum) (Kretschmer et al. 2021) and was partially characterized for the wood stork by de Sousa et al. (2023), who found GGA11 fused to an unidentified chromosome. Further research is needed to determine whether the GGA11/GGA13 fusion in some members of Feraequornithes occurred independently in separate lineages or was the product of a single event.

The reference genome described here will allow for broader studies of genome evolution in the core water bird clade (including penguins, albatrosses, storks, cormorants, herons, and ibises) and numerous genomic studies of wood storks specifically. In particular, the genomic population structure of wood storks has not been adequately tested and evidence of local adaptation of wood storks to nesting and feeding sites has yet to be assessed. Such studies will provide a solid foundation for wood stork management and conservation.

Acknowledgments

The authors thank Donna Bear (Jacksonville Zoo and Gardens) for supplying wood stork blood samples, which were collected under US Fish and Wildlife Service migratory bird permit TE08606C, US Geological Survey banding permit 23946, and IACUC approval USCA-IACUC-006. We also thank Jordan Zhang and Mark Daly (Cantata Bio) for genome assembly, annotation, answering our questions about assembly statistics, and connecting us with other avian genome researchers. Nicolas Alexandre (University of California Berkeley) provided advice on which programs to use to analyze the genome post-assembly and Linelle Abueg (Rockefeller University) provided us with genomes from the Vertebrate Genomes Project and answered our questions regarding VGP assembly. Finally, we thank Philippe Bordron (Centre INRA de Toulouse) for providing help with the D-GENIES program.

Funding

This material is based upon work supported by the National Science Foundation under Grant No. 2129600.

Data availability

We have deposited the primary data and code underlying these analyses as follows: 1) Description of the biological source material was deposited at GenBank under BioSample SAMN36274196. 2) Annotated chromosome-level assembly was deposited at GenBank under accession JAUNZN000000000 in BioProject PRJNA990753. 3) Raw DNA and RNA sequence reads were deposited under BioProject PRJNA990753 in NIH’s Sequence Read Archive (SRA). 4) Pairwise mApping Format (PAF) files and chromosome orthology code can be found at https://github.com/rflamio/stork_chrs.

References

Abdennur N, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020:36:311–316. https://doi.org/10.1093/bioinformatics/btz540

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990:215:403–410. https://doi.org/10.1016/S0022-2836(05)80360-2

Bao Z, Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002:12:1269–1276. https://doi.org/10.1101/gr.88502

Bear-Hull D, Brooks W Jr, Maharaj AM. 2005, October 12–15. Tracking movements of an adult male wood stork Mycteria americana during the breeding season at the Jacksonville Zoo and Gardens [conference session]. 29th Waterbird Society Meeting, Jekyll Island, GA, United States.

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013:41:D36–D42. https://doi.org/10.1093/nar/gks1195

Burleigh JG, Kimball RT, Braun EL. Building the avian tree of life using a large-scale, sparse supermatrix. Mol Phylogenet Evol. 2015:84:53–63. https://doi.org/10.1016/j.ympev.2014.12.003

Cabanettes F, Klopp C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ. 2018:6:e4958. https://doi.org/10.7717/peerj.4958

Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008:18:188–196. https://doi.org/10.1101/gr.6743907

Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021:49:9077–9096. https://doi.org/10.1093/nar/gkab688

Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021:18:170–175. https://doi.org/10.1038/s41592-020-01056-5

International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004:432:695–716. https://doi.org/10.1038/nature03154

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021:10:1–4. https://doi.org/10.1093/gigascience/giab008

de Boer LEM, van Brink JM. Cytotaxonomy of the Ciconiiformes (Aves), with karyotypes of eight species new to cytology. Cytogenet Cell Genet. 1982:34:19–34. https://doi.org/10.1159/000131791

de Sousa RPC, Campos PSB, dos Santos MS, O’Brien PC, Ferguson-Smith MA, de Oliveira EHC. Cytotaxonomy and molecular analyses of Mycteria americana (Ciconiidae: Ciconiiformes): insights on stork phylogeny. Genes. 2023:14:816. https://doi.org/10.3390/genes14040816

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-Seq aligner. Bioinformatics. 2013:29:15–21. https://doi.org/10.1093/bioinformatics/bts635

Endangered and Threatened Wildlife and Plants; Reclassification of the U.S. Breeding Population of the Wood Stork from Endangered to Threatened; Final Rule, 79 Fed. Reg. 37078-37103; June 30, 2014.

Endangered and Threatened Wildlife and Plants; Removal of the Southeast U.S. Distinct Population Segment of the Wood Stork From the List of Endangered and Threatened Wildlife, 88 Fed. Reg. 9830-9850; February 15, 2023.

Endangered and Threatened Wildlife and Plants; U.S. Breeding Population of the Wood Stork Determined to be Endangered; Final Rule, 49 Fed. Reg. 7332-7335; February 28, 1984.

Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 2020:117:9451–9457. https://doi.org/10.1073/pnas.1921046117

Francisco MR, Galetti Junior PM. First karyotypical description of two American Ciconiiform birds, Mycteria americana (Ciconiidae) and Platalea ajaja (Threskiornithidae) and its significance for the chromosome evolutionary and biological conservation approaches. Genet Mol Biol. 2000:23:799–801. https://doi.org/10.1590/S1415-47572000000400015

Griffin DK, Robertson LBW, Tempest HG, Skinner BM. The evolution of the avian genome as revealed by comparative molecular cytogenetics. Cytogenet Genome Res. 2007:117:64–77. https://doi.org/10.1159/000103166

Griffiths R, Double MC, Orr K, Dawson RJG. A DNA test to sex most birds. Mol Ecol. 1998:7:1071–1075. https://doi.org/10.1046/j.1365-294x.1998.00389.x

Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020:36:2896–2898. https://doi.org/10.1093/bioinformatics/btaa025

Hackett SJ, Kimball RT, Reddy S, Bowie RCK, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han K-L, Harshman J, et al. A phylogenomic study of birds reveals their evolutionary history. Science. 2008:320:1763–1768. https://doi.org/10.1126/science.1157704

Hughes AL, Piontkivska H. DNA repeat arrays in chicken and human genomes and the adaptive evolution of avian genome size. BMC Evol Biol. 2005:5:12. https://doi.org/10.1186/1471-2148-5-12

Kahl MP. An overview of the storks of the world. Colonial Waterbirds. 1987:10:131–134. https://doi.org/10.2307/1521251

Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Azhir A, Kumar N, et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018:19:125. https://doi.org/10.1186/s13059-018-1486-1

Kimball RT, Oliveros CH, Wang N, White ND, Barker FK, Field DJ, Ksepka DT, Chesser RT, Moyle RG, Braun MJ, et al. A phylogenomic supertree of birds. Diversity. 2019:11:109. https://doi.org/10.3390/d11070109

Korf I. Gene finding in novel genomes. BMC Bioinf. 2004:5:59.

Kretschmer R, de Souza MS, Furo IO, Romanov MN, Gunski RJ, Garnero ADV, de Freitas TRO, de Oliveira EHC, O’Connor RE, Griffin DK. Interspecies chromosome mapping in Caprimulgiformes, Piciformes, Suliformes, and Trogoniformes (Aves): cytogenomic insight into microchromosome organization and karyotype evolution in birds. Cells. 2021:10:826. https://doi.org/10.3390/cells10040826

Kuhl H, Frankl-Vilches C, Bakker A, Mayr G, Nikolaus G, Boerno ST, Klages S, Timmermann B, Gahr M. An unbiased molecular approach using 3ʹ-UTRs resolves the avian family-level tree of life. Mol Biol Evol. 2021:38:108–127. https://doi.org/10.1093/molbev/msaa191

Kuramoto T, Nishihara H, Watanabe M, Okada N. Determining the position of storks on the phylogenetic tree of waterbirds by retroposon insertion analysis. Genome Biol Evol. 2015:7:3180–3189. https://doi.org/10.1093/gbe/evv213

Laetsch DR, Blaxter ML. BlobTools: interrogation of genome assemblies [version 1; peer review: 2 approved with reservations]. F1000Research. 2017:6:1287. https://doi.org/10.12688/f1000research.12232.1

Lee JC, Tsai LC, Hwa PY, Chan CL, Huang A, Chin SC, Wang LC, Lin JT, Linacre A, Hsieh HM. A novel strategy for avian species and gender identification using the CHD gene. Mol Cell Probes. 2010:24:27–31. https://doi.org/10.1016/j.mcp.2009.08.003

Lee S, Bakker CR, Vitzthum C, Alver BH, Park PJ. Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs. Bioinformatics. 2022:38:1729–1731. https://doi.org/10.1093/bioinformatics/btab870

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018:34:3094–3100. https://doi.org/10.1093/bioinformatics/bty191

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009:25:1754–1760. https://doi.org/10.1093/bioinformatics/btp324

Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38:4647–4654. https://doi.org/10.1093/molbev/msab199

Masabanda JS, Burt DW, O’Brien PCM, Vignal A, Fillon V, Walsh PS, Cox H, Tempest HG, Smith J, Habermann F, et al. Molecular cytogenetic definition of the chicken genome: the first complete avian karyotype. Genetics. 2004:166:1367–1373. https://doi.org/10.1534/genetics.166.3.1367

Morescalchi A. Evolution and karyology of the amphibians. Bolletino di zoologia. 1980:47:113–126. https://doi.org/10.1080/11250008009438709

Ogden JC, Patty BW. The recent status of the wood stork in Florida and Georgia. In: Odom RQ, Guthrie JW, editors. Proceedings of the nongame and endangered wildlife symposium (Technical Bulletin WL5). Atlanta, GA, USA: Georgia Department of Natural Resources Game and Fish Division; 1981. p. 97–102.

Ohno S, Muramoto J, Stenius C, Christian L, Kittrell WA, Atkin NB. Microchromosomes in holocephalian, chondrostean and holostean fishes. Chromosoma. 1969:26:35–40. https://doi.org/10.1007/bf00319498

Olmo E. Trends in the evolution of reptilian chromosomes. Integr Comp Biol. 2008:48:486–493. https://doi.org/10.1093/icb/icn049

Open2C, Abdennur N, Fudenberg G, Flyamer IM, Galitsyna AA, Goloborodko A, Imakaev M, Venev SV. 2023. Pairtools: from sequencing data to chromosome contacts. Preprint at bioRxiv. https://doi.org/10.1101/2023.02.13.528389, preprint: not peer reviewed.

Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005:21:i351–i358. https://doi.org/10.1093/bioinformatics/bti1018

Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, Lemmon AR. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature. 2015:526:569–573. https://doi.org/10.1038/nature15697

Putnam NH, O’Connell BL, Stites JC, Rice BJ, Blanchette M, Calef R, Troll CJ, Fields A, Hartley PD, Sugnet CW, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016:26:342–350. https://doi.org/10.1101/gr.193474.115

R Core Team. R: a language and environment for statistical computing (Version 4.2.2). R Foundation for Statistical Computing; 2022. https://www.R-project.org/ [accessed 2022 Sep 21].

Rodríguez-Rodríguez EJ, Negro JJ. Integumentary colour allocation in the stork family (Ciconiidae) reveals short-range visual cues for species recognition. Birds. 2021:2:138–146. https://doi.org/10.3390/birds2010010

Sangster G, Mayr G. Feraequornithes: a name for the clade formed by Procellariiformes, Sphenisciformes, Ciconiiformes, Suliformes and Pelecaniformes (Aves). Vertebr Zool. 2021:71:49–53. https://doi.org/10.3897/vz.71.e61728

Seligmann ICA, Furo IO, dos Santos MS, Tagliarini MM, Araujo CCD, O’Brien PCM, Ferguson-Smith MA, de Oliveira EHC. Comparative chromosome painting in two Brazilian stork species with different diploid numbers. Cytogenet Genome Res. 2019:159:32–38. https://doi.org/10.1159/000503019

Slikas B. Phylogeny of the avian family Ciconiidae (storks) based on cytochrome b sequences and DNA–DNA hybridization distances. Mol Phylogenet Evol. 1997:8:275–300. https://doi.org/10.1006/mpev.1997.0431

Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. Institute for Systems Biology; 2013–2015. http://www.repeatmasker.org [accessed 2023 Feb 2].

Srikulnath K, Ahmad SF, Singchat W, Panthum T. Why do some vertebrates have microchromosomes? Cells. 2021:10:2182. https://doi.org/10.3390/cells10092182

Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003:19:ii215–ii225. https://doi.org/10.1093/bioinformatics/btg1080

Tiersch TR, Wachtel SS. On the evolution of genome size in birds. J Hered. 1991:82:363–368. https://doi.org/10.1093/oxfordjournals.jhered.a111105

Vasimuddin M, Misra S, Li H, Aluru S. 2019. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). p. 314–324. https://doi.org/10.1109/IPDPS.2019.00041

Wang J, Su W, Hu Y, Li S, O’Brien PCM, Ferguson-Smith MA, Yang F, Nie W. Comparative chromosome maps between the stone curlew and three ciconiiform species (the grey heron, little egret and crested ibis). BMC Ecol Evol. 2022:22:23. https://doi.org/10.1186/s12862-022-01979-x

Waters PD, Patel HR, Ruiz-Herrera A, Álvarez-González L, Lister NC, Simakov O, Ezaz T, Kaur P, Frere C, Grützner F, et al. Microchromosomes are building blocks of bird, reptile, and mammal chromosomes. Proc Natl Acad Sci USA. 2021:118:e2112494118. https://doi.org/10.1073/pnas.2112494118

Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019:37:1155–1162. https://doi.org/10.1038/s41587-019-0217-9

Winter D, Lee K, Cox M. pafr: read, manipulate and visualize “Pairwise mApping Format” data. R package version 0.0.2; 2020. https://CRAN.R-project.org/package=pafr [accessed 2023 Jan 29].

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
Corresponding Editor: Alexander Suh