The assembled and annotated genome of the pigeon louse Columbicola columbae, a model ectoparasite

Abstract The pigeon louse Columbicola columbae is a longstanding and important model for studies of ectoparasitism and host-parasite coevolution. However, a deeper understanding of its evolution and capacity for rapid adaptation is limited by a lack of genomic resources. Here, we present a high-quality draft assembly of the C. columbae genome, produced using a combination of Oxford Nanopore, Illumina, and Hi-C technologies. The final assembly is 208 Mb in length, with 12 chromosome-size scaffolds representing 98.1% of the assembly. For gene model prediction, we used a novel clustering method (wavy_choose) for Oxford Nanopore RNA-seq reads to feed into the MAKER annotation pipeline. High recovery of conserved single-copy orthologs (BUSCOs) suggests that our assembly and annotation are both highly complete and highly accurate. Consistent with the results of the only other assembled louse genome, Pediculus humanus, we find that C. columbae has a relatively low density of repetitive elements, the majority of which are DNA transposons. Also similar to P. humanus, we find a reduced number of genes encoding opsins, G protein-coupled receptors, odorant receptors, insulin signaling pathway components, and detoxification proteins in the C. columbae genome, relative to other insects. We propose that such losses might characterize the genomes of obligate, permanent ectoparasites with predictable habitats, limited foraging complexity, and simple dietary regimes. The sequencing and analysis for this genome were relatively low cost, and took advantage of a new clustering technique for Oxford Nanopore RNAseq reads that will be useful to future genome projects.


Introduction
Parasites represent a large proportion of eukaryotic biodiversity, and it is estimated that 40% of insect diversity is parasitic (de Meeû s and Renaud 2002). Parasitic lice (Insecta: Phthiraptera) comprise a group of about 5000 species that parasitize all orders of birds and most orders of mammals (Mullen and Durden 2009;Clayton et al. 2015). Two thirds of louse species are associated with only a single host species (Durden and Musser 1994;Smith 2004). The genus Columbicola comprises 91 known species, all found on pigeons or doves (Bush et al. 2009;Gustafsson et al. 2015;Adly et al. 2019); most of these louse species are found on a single host species (Johnson et al. 2007(Johnson et al. , 2009.
Like all feather lice (suborder Ischnocera), Columbicola are "permanent" parasites that complete their entire life cycle on the body of the host (Marshall 1981). Feather lice feed primarily on feathers, which they metabolize with the assistance of endosymbiotic bacteria (Fukatsu et al. 2007;Smith et al. 2013). The feather damage caused by lice has a chronic effect that leads to reduced host survival (Clayton et al. 1999) and mating success (Clayton 1990). Birds are able to defend themselves against feather lice by preening them with the beak. However, Columbicola lice escape from preening by hiding in grooves between feather barbs, and the sizes of these grooves scale with host body size. In microevolutionary time, the result is stabilizing selection on body size of lice (Clayton et al. 1999;. In macroevolutionary time, the result is that host defense (preening) and body size interact to reinforce the host specificity and size matching of Columbicola species to their hosts (Clayton et al. 2003). Similarly, selection for visual crypsis drives the evolution of color similarities between Columbicola species their hosts (Bush et al. 2010(Bush et al. , 2019. Within the feather lice, the biology of C. columbae (Figure 1) is better known than that of any other louse species, including details about its morphology, physiology, ecology, and behavior (Martin 1934;Stenram 1956;Rakshpal 1959;Nelson and Murray 1971;Eichler et al. 1972;Rudolph 1983;Clayton 1990Clayton , 1991Clayton and Tompkins 1995;Clayton et al. 1999Clayton et al. , 2003Clayton et al. , 2008Harbison and Clayton 2011). A unique feature of the C. columbae study system is that its host, the rock pigeon Columba livia, has been under artificial selection by pigeon breeders for millennia, resulting in dramatic phenotypic variation (Darwin 1868;Shapiro and Domyan 2013), similar to that seen across the 300þ other species of pigeons and doves (Gibbs et al. 2001). This variation makes it possible to transfer C. columbae among diverse size and color phenotypes within the single native host species. Recently, we showed that switching lice to pigeons of different sizes and colors elicits rapid population-level changes in louse size and color (Bush et al. 2019;Villa et al. 2019). Despite the wealth of phenotypic data about real-time adaptation of C. columbae to changes in host environment, the underlying molecular mechanisms remain unknown.
A deeper understanding of louse evolution and genetics is limited largely by a paucity of genomic resources. The louse with the best available genomic resources is the blood-feeding human body louse Pediculus humanus, the draft genome of which was assembled using low-coverage shotgun sequencing (Kirkness et al. 2010). Pediculus humanus had the smallest insect genome known at that time (108 Mb), with a repertoire of 10,773 annotated genes. Presently, what we know about the genomic signatures of parasitism in Phthiraptera is largely limited to this one species. Prior work on C. columbae has generated valuable DNA sequence datasets for phylogenetics  and studies of mitochodrial evolution (Sweet et al. 2021), but whole-genome data are still lacking.
Here, we report a high-quality draft genome assembly and annotation for C. columbae that incorporates short-read Illumina (Bennett 2004) sequences, long-read Oxford Nanopore (Jain et al. 2016) sequences, and scaffolding using Hi-C data (van Berkum et al. 2010). These new resources will enable genomic approaches to understanding the molecular basis of rapid adaptation in C. columbae. More generally, the C. columbae genome provides comparative genomic data to understand the molecular basis of traits associated with parasitism that are shared among lice.

Animal tissue samples
All lice used in this study were drawn from natural populations infesting wild-caught feral rock pigeons (C. livia) from Salt Lake City, UT. We maintained 15-20 infested pigeons in cages to provide a reliable source of C. columbae for sequencing.
We reduced the nucleotide heterozygosity of our colony by creating a partially inbred population of lice. Initially, a single pair of lice (1 male and 1 female) was arbitrarily drawn from the pigeon colony and allowed to reproduce on a new, individually caged louse-free feral pigeon. After a period of 21 days, all immature lice were removed from the pigeon using CO 2. At this point, these F1 lice were all full siblings. All offspring were then individually placed in glass vials with pigeon feathers for food, and allowed to mature. Rearing lice individually in vials ensured that F1s could not mate. Once mature, a single pair of unmated F1 adults (1 male and 1 female) were arbitrarily chosen and placed on a new, louse-free feral pigeon to mate and reproduce. Thus, all offspring on this new pigeon were the product of full-sibling mating and represented the first generation of inbreeding. These methods were repeated for eight generations.
After eight rounds of full-sibling inbreeding, the partially inbred lice were transferred to a new louse-free pigeon and left to mature and produce offspring. We left the lice on this pigeon for 4 months, which allowed the population to grow enough to provide sufficient numbers for sequencing. The lice used for Illumina genomic DNA sequencing were derived from this partially inbred population. Reduced heterozygosity should have resulted in higher quality polishing of the Oxford Nanoporederived contigs with our Illumina data (see below). We pooled 100 adult lice for Illumina genomic DNA sequencing, 2000 adult lice for Oxford Nanopore genomic DNA sequencing, and 100 adult lice for Illumina RNA sequencing. All lice were drawn from the same partially inbred laboratory population, except for the lice used in Oxford Nanopore DNA sequencing, which were drawn from the main laboratory population from which the inbred population was derived. Ultimately, the inbred louse population was too small to provide sufficient material for Oxford Nanopore DNA sequencing. We generated Oxford Nanopore RNA sequencing reads from four different life stages of lice, all drawn from the inbred population (100 lice each from nymphal instars 1, 2, and 3 adults).

Isolation of genetic material
DNA was isolated by grinding with a disposable homogenizer pestle (VWR, Radnor PA, USA) on ice for 30 min followed by DNA extraction with the Qiagen DNeasy extraction kit (Qiagen, Germantown, MD, USA). DNA for long read sequencing was extracted using the Qiagen DNA Blood and Tissue Midi kit. RNA was isolated using the Qiagen Oligotex mRNA mini kit.

Illumina genomic DNA and RNA sequencing
Illumina DNA sequencing was performed using an Illumina HiSeq 2500 sequencer at the University of Utah High Throughput Genomics Core. We generated four libraries with mean insert sizes of 180, 500, 3500, and 8200 bp. Genomic DNA was sequenced with paired-end 125-bp reads. cDNA sequencing was also performed on the Illumina HiSeq 2500 sequencer, producing paired end reads with a read length of 125 bp. Oxford Nanopore genomic DNA and RNA sequencing We generated long read genomic data using Oxford Nanopore MinION sequencers and a custom library preparation designed to increase read length. This protocol followed the standard procedure for producing 1d 2 reads with kit LSK308 (Oxford Nanopore community, https://community.nanoporetech.com/protocols/), with the following modifications. (1) During all alcohol washes of magnetic SPRI beads, an additional wash was performed using Tris-EDTA to remove small DNA fragments. This step was performed quickly and without disturbing the beads to avoid dissolving all available DNA into solution.
(2) All elutions from magnetic SPRI beads were performed after an incubation in elution buffer at 37 for 30 min. These practices improve the length of Oxford Nanopore sequencing reads (Urban et al. 2015).

Genome size estimation
We used the following formula (Liu et al. 2020) to estimate genome size from 21-mers counted from the Illumina sequencing data using jellyfish (Marc¸ais and Kingsford 2011): where G is the genome size, n is the total number of sequenced bases, c is the expected sequence coverage depth, L is the average sequencing read length, and K is the k-mer length.

Genome assembly
We used Trimmomatic version 0.36 (Bolger et al. 2014) to trim Illumina input reads using the following settings: ILLUMINACLIP: adapters.fa: 2:30:10 LEADING: 20 TRAILING: 20 MINLEN: 30 CROP: 85. We then used fastq-join from ea-utils version 1.1.2-537 (Aronesty 2011) to join all short reads into pair joined reads, and used these throughout the assembly process. We used Canu version 1.6 ) with the parameter genome Size ¼ 220m to assemble Oxford Nanopore genomic DNA reads, then polished the assembled contigs using pilon v1.22 (Walker et al. 2014) and the Illumina genomic DNA reads. The pilon software was run with the following switches: -changes -vcf -vcfqetracks -fix all.
The polished draft assembly was scaffolded by Phase Genomics using their proprietary scaffolding software (Burton et al. 2013;Bickhart et al. 2017;Peichel et al. 2017). We supplied Phase Genomics with approximately 1600 lice preserved at À80 for high molecular weight DNA extraction, Hi-C library preparation, and sequencing (Belton et al. 2012).

Transcript selection and assembly
Standardized pipelines do not yet exist for selecting transcripts from raw Oxford Nanopore RNAseq reads. Therefore, we produced a custom pipeline that identifies putatively full-length transcripts to serve as evidence for genome annotation. In short, we aligned all RNAseq reads using Minimap (Li 2018), then clustered these alignments into sets that represent a gene using Carnac-LR (Marchet et al. 2019). We wrote a program, wavy_choose, that extracts the aligned reads from the original data, then identifies reads that likely represented full-length transcripts using scipy's function scipy.signal.find_peaks_cwt() (Du et al. 2006; Figure 2). A more detailed version of the pipeline follows, below.
Minimap aligns long, low-quality reads against one another, and can do so in an all-by-all comparison. Carnac-LR then clusters these reads into groups according to their alignments. Each Carnac-LR-clustered group of mRNA sequencing reads should represent all of the reads associated with a single gene, but if a gene has multiple alternative transcripts, Carnac-LR will not distinguish between them. The custom tool wavy_choose takes all of the clustered reads identified by Carnac-LR and identifies clusters within clusters that are most similar in both length and sequence. Because Oxford Nanopore reads are generally long enough to span an entire mRNA transcript, wavy_choose identifies the reads most likely to be complete transcripts by identifying the most common read lengths. It then removes all nonfulllength reads from the analysis. This tool is especially well suited to transcript discovery, as multiple alternative transcripts may be identified from a single cluster of reads with overlapping sequence, and wavy_choose makes no assumptions as to the number of transcripts to identify.
The function find_peaks_cwt() uses continuous wavelet transformation, a technique from signal processing (Grossmann and Morlet 1984) to identify peaks in a 2-dimensional dataset. It does this by first convolving (transforming) the dataset to amplify the portion of the dataset that matches a wavelet with specified parameters (here, the default "Mexican hat" wavelet) and dampens the portions of the dataset that do not match the wavelet. The program then identifies local relative maxima that appear at the specified peak widths (here, 50-200 bp) and have sufficiently high signal-to-noise ratio (here 1.0). This widely accepted technique is straightforward to apply in this context, but it is limited to detecting transcripts that have unique lengths. Two alternative transcripts of matching lengths would appear as a single peak in the length histogram. In these cases, reads from both alternative transcripts were retained in the final dataset. We kept at least one read per cluster of reads.
Untrimmed Illumina cDNA reads were assembled using Trinity with the -jaccard_clip setting (Grabherr et al. 2011). Figure 2 wavy_choose identifies likely full-length transcripts from clustered Oxford Nanopore reads. Depicted here is a histogram of read lengths (blue) for one carnac-LR-clustered set of reads. wavy_choose is able to identify two length peaks (red lines) in this transcript set, and discards all reads of other lengths. This process simplifies the transcriptome evidence dataset for MAKER, which uses the identified reads for gene annotation.

Genome annotation
We used a combination of wavy_choose-selected Oxford Nanopore-derived transcripts, Illumina RNAseq-derived Trinity assemblies, and orthology information from Swissprot as evidence for gene models in MAKER (Cantarel et al. 2007), a widely used genome annotation tool. We used AUGUSTUS 3.3.1 to perform the gene finding portion of the MAKER pipeline. BUSCO (Simão et al. 2015) trains AUGUSTUS as part of the BUSCO pipeline, so we ran BUSCO on the genome assembly and used its AUGUSTUS training model during gene finding. We used both WU BLAST (Chao et al. 1992) and InterProScan (Jones et al. 2014) to match genes to their orthologs in the Uniprot-Swissprot database, and to provide the GO terms associated with genes in the final annotation set.

Feature density analysis
We used bedops (Neph et al. 2012) to generate a.bed file of sliding windows across all chromosomes, then used bedmap (Neph et al. 2012) to count genes and repetitive sequences in these windows. Sliding windows were 1 Mb in width with a step length of 100 kb. For genes, we counted the total number of features identified by MAKER as "gene"s in its output.gff file. For repeats, we counted all MAKER-identified repeatmasker "match"es.

Detection of bacterial contaminants
After assembly and annotation, we manually checked the louse genome for contamination with bacterial genomic sequences by identifying regions with unusually high gene density, repeatmasker-identified (Chen 2004) artifacts, and contiguous runs of bacterial genes. We also used kraken (Wood and Salzberg 2014) with the DustMasked MiniKraken DB database (https://ccb.jhu. edu/software/kraken/dl/minikraken_20171101_8GB_dustmasked. tgz) to identify known bacterial kmer contaminants.
We identified two sections of the genome that likely contained bacterial contamination, and removed them from the final assembly. The first section, at the beginning of chromosome 4, had a higher density of genes than any other region of the genome (280 genes per 10 kb, vs. 64 genes per 10 kb in the bacteria-free genome). It also had a paucity of repetitive elements (262 repeats per 10 kb, as opposed to 800 elsewhere). MAKER's annotation (see below) indicated that the majority of the region's genes were bacterial in origin, and BLASTn searches (Zhang et al. 2000) against the NCBI nr database (https://blast.ncbi.nlm.nih.gov/) confirmed this, as did kraken. The region also contained the annotation's only instance of an explicit bacterial artifact identified by repeatmasker. The second region, on chromosome 8, was flagged as containing bacterial content by kraken. Both the chromosome 4 and 8 regions contained genes annotated by MAKER as similar to genes from the Sodalis clade, which contains the endosymbiont of the tsetse fly and a known bacterial endosymbiont of C. columbae (Fukatsu et al. 2007;Smith et al. 2013). Two hundred nineteen of the 554 genes in the chromosome 4 section are annotated as being Sodalis-related, as are 3 of the 4 genes in the chromosome 8 section. Thus, the totality of evidence led us to conclude that these regions on chromosomes 4 and 8 of our preliminary C. columbae genome assembly were bacterial contaminants from a known Sodalis-clade endosymbiont.
Lice were starved for 24 h and the transparent gut was checked visually for content before DNA and RNA extraction, reducing the likelihood of contamination due to eukaryotic tissue in the gut. Nevertheless, these measures do not completely rule out sequence contamination from the pigeon host, humans, or other eukaryotes. We searched for contamination from eukaryotes by performing BLASTn searches (Zhang et al. 2000) against the human reference genome (Schneider et al. 2017), the C. livia reference genome (Holt et al. 2018), and the NCBI nr nucleotide database. We did not find any regions in the C. columbae genome greater than 3 kb in length and with identity greater than 90% to any of the target sequence databases. Therefore, we concluded that there is not substantial eukaryotic contamination in the final assembly.

Data availability
Raw sequence data for this project are publicly available through NCBI SRA (SAMN16076762-SAMN16076765). All analysis scripts are available through GitHub at https://github.com/jgbaldwin brown/jgbutils. The genome assembly and annotation are available at NCBI GenBank (PRJNA662097).

Genome size estimation
We generated 2.92 Â 10 10 bases of genomic sequence using the Illumina short-read platform (mean read length after trimming ¼ 107.2 bp). We estimated the genome size via k-mer counting (Liu et al. 2020) using jellyfish (Marc¸ais and Kingsford 2011) (Figure 3). Using a k-mer size of 21, we estimate the genome size of C. columbae to be 230 Mb, within the range expected for insects.

Genome assembly summary
We generated a high-quality draft genome assembly using a combination of Illumina and Oxford Nanopore sequencing data, and Hi-C scaffolding (Table 1). Our initial, unscaffolded assembly with Canu consists of 1193 contigs with a total length of 206 Mb, and an N50 contig length of 511 kb. We scaffolded the assembly using Hi-C data, producing chromosome-size scaffolds from the initial contigs. The final assembly comprises 12 chromosomesized scaffolds and 380 small scaffolds, totaling 208 Mb of sequence. The N50 scaffold length for the final assembly is 17.7 Mb. Karyotyping evidence (Ries 1932) indicates that the C. columbae genome consists of 12 holocentric chromosomes. Based on this physical evidence, and the striking difference in size between the 12 largest scaffolds and all other scaffolds in the assembly ( Figure 4), we predict that each of the 12 largest scaffolds in the assembly represents one of the 12 karyotyped chromosomes.

Annotation
We annotated the genome using the MAKER pipeline, with transcriptome evidence from Trinity-assembled Illumina RNAseq reads and wavy_choose-selected Oxford Nanopore RNAseq reads (Figure 2). We identified 19,139 transcripts from 13,362 genes. 13,246 of these genes are functionally annotated by BLAST using the Swissprot database, 8354 are functionally annotated by similarity to InterPro or Pfam, and 13,248 are functionally annotated by either Swissprot, InterPro, or Pfam. MAKER produces a combined quality statistic called Annotation Edit Distance (AED); Eilbeck et al. 2009;Holt and Yandell 2011). Perhaps owing to our use of long-read transcriptome sequencing, 10.3% of our annotated transcripts have ideal AED scores of 0 ( Figure 5), and only 5.6% of annotated transcripts have low-quality AED scores above 0.5. The abundance of low AED scores and relative dearth of low scores are indicators of a high-quality annotation.

Genome completeness
We used BUSCO (Simão et al. 2015) to measure completeness of the genome by counting the number of highly conserved, single copy genes that should be present in insects ( Table 2). The reference genome, transcriptome, and translated transcriptome contain complete copies of 96%, 90%, and 87% of insect BUSCOs, respectively. These high values indicate that the C. columbae genome assembly is sufficiently complete for downstream comparative genomic analyses.

Repetitive elements
Repeatmasker identified 20.2 Mb (9.70%) of the genome as repetitive content. Of this 20.2 Mb, 65.8% is DNA transposons, 14.8% is LINEs, 8.6% is simple repeats, and 5.7% is LTR transposons (Wicker et al. 2007). The remainder (5.1%) is an assortment of transposable elements, low-complexity regions, and satellites (Table 3). Repetitive content in the C. columbae genome is, therefore, considerably higher than in P. humanus. In the latter species, only 1% of the genome is annotated as class I (retrotransposons, including LTR, LINE, and SINE) or class II (DNA transposons) transposable elements, and 6.9% is tandem repeats (Kirkness  Cumulative coverage of various assemblies Figure 4 Cumulative coverage of initial and final (scaffolded) C. columbae genome assemblies, illustrating the improvement in contiguity by scaffolding with Hi-C data. All scaffolds in the assembly are plotted largest to smallest, from left to right. The x-axis indicates cumulative length of an assembly, and the y-axis corresponds to the cumulative portion of the genome covered by initial contigs (red dots) and final scaffolds (blue dots).  . One caveat to this conclusion is the lower contiguity of the earlier P. humanus assembly. Because genomes often fail to assemble at repetitive sites, the P. humanus assembly may have captured a smaller proportion of repetitive sequences than the more contiguous C. columbae assembly. Kirkness et al. (2010) predicted that the monophagous, permanently parasitic lifestyle of lice should lead to reduced genomes due to the reduced need to seek food and avoid enemies compared to free-living species. While Kirkness et al. identified a reduction in gene families related to sensing, their conclusion that overall genome size is affected by lifestyle is not supported by the genome size of C. columbae, which has a genome size and number of genes that are more typical for a free-living insect. Indeed, both C. columbae and P. humanus appear to have a full complement of genes, and while P. humanus has a small genome and a reduction in transposable elements, C. columbae has neither of these. The pattern of reduced genome size and reduction in TE content without loss of genes is characteristic of highpopulation-size species (Lefé bure et al. 2017). However, a robust estimate of the population size of P. humanus, combined with evidence ruling out alternative hypotheses, would be necessary to demonstrate that population size drove the reduced genome size in P. humanus. Other authors (Oliver et al. 2007) have hypothesized that large populations may not actually be under selection to have smaller genomes.

Genomic evidence for the lack of centromeres
Centromeres are characterized by a depletion of genic content and an increase in repetitive content (Jain et al. 2018). Based on these criteria (Figure 6), we find no evidence for centromeres in any of the C. columbae chromosomes. Presence of genes is moderately anti-correlated with presence of simple repetitive sequences (r ¼ À0.28, 1 Mb sliding windows). Still, the overall repeat density is not correlated with gene density, and both measures are relatively consistent across the genome. Many chromosomes (cf., Figure 6, chromosomes 6 and 7) have a twin-peaked pattern of simple repeats, in which chromosome ends and centers have high genic content and low repeat content, but the genomic segments between the ends and the center have high repeat content and low-genic content. It is possible that these twin peaks of simple repeat content are the centromeres in a polycentromeric chromosome, and that the chromosomes were actually misclassified as holocentric based on karyotyping evidence.

Comparisons to the closest sequenced relative
The closest relative of C. columbae with an assembled genome is the human body louse P. humanus. C. columbae, and P. humanus are thought to have diverged 65 million years ago . Pediculus humanus has five metacentric chromosomes and one telocentric chromosome (Kirkness et al. 2010), in contrast to the 12 putatively holocentric chromosomes described here. Pediculus humanus has a genome assembly size of 108 Mb, approximately half that of the 208-Mb C. columbae genome assembly. The C. columbae genome has a typical genome-wide GC content of 36%, while P. humanus has an extremely AT-rich genome with 28% GC content, making C. columbae the more typical insect genome of the two.

Synteny analysis
We used the default settings of SynIma (Farrer 2017) to identify synteny between C. columbae and P. humanus (Figure 7). We were unable to test for chromosome-scale syntenic blocks between P. humanus and C. columbae due to the low contiguity of the P. humanus genome. However, we found very few locations in which synteny is broken between a P. humanus scaffold and a C. columbae scaffold, showing that short-range synteny is almost entirely conserved between these species.

Functional annotation reveals depletion of environmental sensing and metabolic genes
Pediculus humanus has a small complement of opsins (3, as opposed to 275 in D. melanogaster) and G protein-coupled receptors (GPCR, 104, as opposed to 408 in D. melanogaster) (Kirkness et al. 2010;Thurmond et al. 2019). Similarly, we find that only 2 annotated genes in C. columbae are associated with the opsin gene ontology term (GO:00007602) and only 107 genes are associated with the GPCR GO category (GO:00004930). This reduced repertoire of sensory system genes supports the hypothesis that the relatively static environments encountered by lice and other ectoparasites relaxes selection on the ability to sense and respond to stimuli in more variable environments (Kirkness et al. 2010). Columbicola columbae is incapable of surviving off of its obligate host, so there might be little selection to retain complex visual, olfactory, or other complex sensory acuity. We find support for the hypothesis that specific gene families, such as those relating to sensory capabilities and metabolism, are reduced in obligate parasites (Jackson 2015).
Pediculus humanus is massively depleted in terms of odorant receptors, gustatory receptors, and chemosensory proteins, and C. columbae shows the same pattern. For example, C. columbae has only 13 genes with olfactory receptor activity (GO: 0004984) and P. humanus has only 10, compared with 152 in D. melanogaster (Thurmond et al. 2019). Columbicola columbae has 2 genes associated with taste receptor activity (GO: 0008527) and P. humanus has 6, yet D. melanogaster has 150. We speculate that this dramatic depletion of taste receptor genes is due to the homogeneous diet of ectoparasitic lice. The diet of C. columbae, for instance, consists entirely of pigeon feathers and flakes of dead skin (Ash 2008;Nelson and Murray 1971;Singh et al. 2010). Another highly depleted gene functional category in P. humanus is the insulin signaling/TOR pathway. Kirkness et al. (2010) show that the canonical pathway appears nonfunctional in P. humanus, with only one gene having P. humanus EST-derived evidence for its expression. BLAST evidence indicates that other TOR pathway genes are reduced to a single copy in P. humanus, including genes such as dilps and eIF-4E (class I), which respectively have 6 and 7 copies in D. melanogaster (Kirkness et al. 2010). We find the same qualitative result in C. columbae, with no annotated genes associated with the insulin receptor signaling pathway (GO:0008286). Finally, the complement of detoxification genes is depleted in both P. humanus and C. columbae, with C. columbae having no annotated genes associated with detoxification (GO:0098754).
The striking reduction in sensory and metabolic gene categories in C. columbae and P. humanus could be due to independent gene loss in each lineage, inheritance of a depleted repertoire from a common ancestor, or a combination of the two. Loss of the same suite of genes in each species would be consistent with inheritance of a reduced sensory repertoire from a common ancestor, while loss of different genes in each species would indicate independent reductions. Reciprocal best BLAST hits of C. columbae and P. humanus genes to a shared outgroup, Drosophila melanogaster, indicate that the identities of the lost and retained

Figure 7
Short range synteny is largely conserved between C. columbae (bottom) and P. humanus (top) genomic scaffolds. Lines connecting scaffolds from each genome assembly represent the positions of orthologous genes. P. humanus contigs were aligned to the C. columbae genome in order and orientation using SynIma. Chromosome-size scaffolds in C. columbae are labeled 1-12.
genes are mostly the same between the two louse species (Table 4), thereby supporting the hypothesis of ancestral loss. We note the possibility that these "missing" genes are not actually absent from the genomes of C. columbae and P. humanus, but are simply not annotated in their respective genomes. However, the BUSCO completeness score of 96.4% for the C. columbae genome renders large-scale incompleteness and misannotation less likely.