Novel Insights into Chromosome Evolution in Birds, Archosaurs, and Reptiles

Homologous synteny blocks (HSBs) and evolutionary breakpoint regions (EBRs) in mammalian chromosomes are enriched for distinct DNA features, contributing to distinct phenotypes. To reveal HSB and EBR roles in avian evolution, we performed a sequence-based comparison of 21 avian and 5 outgroup species using recently sequenced genomes across the avian family tree and a newly-developed algorithm. We identified EBRs and HSBs in ancestral bird, archosaurian (bird, crocodile, and dinosaur), and reptile chromosomes. Genes involved in the regulation of gene expression and biosynthetic processes were preferably located in HSBs, including for example, avian-specific HSBs enriched for genes involved in limb development. Within birds, some lineage-specific EBRs rearranged genes were related to distinct phenotypes, such as forebrain development in parrots. Our findings provide novel evolutionary insights into genome evolution in birds, particularly on how chromosome rearrangements likely contributed to the formation of novel phenotypes.


INTRODUCTION
A prominent feature of animal genome evolution is the non-random rearrangement of chromosomes (Pevzner and Tesler 2003). For millions of years genomes of multiple species have maintained homologous synteny blocks (HSBs), demarcated by dynamic "evolutionary breakpoint regions" (EBRs) (Figure 1). Evidence suggests that each of them evolves by distinctly different mechanisms ): HSBs maintain the order of genes related to organismal development whereas EBRs often affect chromosomal regions related to lineage-specific biology (Groenen, et al. 2012;Ullastres, et al. 2014). These data are somewhat mammal-centric and conclusions thus may not hold for other amniotes. While the availability of genetic maps and chromosome assemblies of the chicken, turkey, and zebra finch genomes provided an important insight into avian chromosome evolution (Burt, et al. 1999;Völker, et al. 2010;Warren, et al. 2010), a comprehensive study at the sequence level is lacking, making unclear if bird chromosomes follow similar patterns of evolution as their mammalian counterparts.
Using a new EBR-detection approach applied to 21 bird genomes assembled to whole chromosomes or large scaffolds (Zhang, et al. 2014), and four non-avian reptile genomes of similar quality, we examined the association of EBRs and multispecies HSBs (msHSBs) with gene networks, transposable elements (TEs) and conserved non-coding sequences. We identified gene networks that: (1) were preferentially reshuffled during avian chromosome evolution, or (2) have been maintained in msHSBs for millions of years of evolution. Our results represent the first comprehensive sequence analysis of chromosome evolution in birds and reptiles, demonstrating how chromosome evolution may have acted upon the formation of various phenotypes.

RESULTS AND DISCUSSION
Lineage-specific EBRs in birds. We developed an interactive resource for genome synteny comparison in 26 species (Evolution Highway; http://eh-demo.ncsa.uiuc.edu/birds; Supplementary Table 1). We aligned 20 avian and five outgroup genomes to the chicken genome to define syntenic fragments at three resolutions of rearrangement detection: 100Kbp, 300Kbp and 500Kbp ( Figure 1). We developed and evaluated (Supplementary Table 2, 3 and 4) a method of detecting EBRs within scaffolds of scaffold-based assemblies that combines an algorithmic approach to identify putative EBRs (Supplementary Table 5) with independent PCR verification of these regions in several assemblies to find paired read spanning levels in scaffolds associated with confirmed EBRs in order to estimate and minimise the number of chimeric joints in the final EBR list (Supplementary Table 5 and 8). This resulted in 0-22% false positives and 33-45% false negatives in our EBR set, depending on the sequencing coverage of each assembly (Supplementary Table 7). At 100Kbp resolution 1,796 avian EBRs were assigned to phylogenetic nodes and 1,021 (56.85%) passed our chimeric scaffold detection quality controls. Out of 1,021 EBRs, 42 were specific to all Galliformes, and 16 were specific to the chicken lineage (Figure 1 and Supplementary Table 5). We detected a total of 874 lineage-specific EBRs, i.e. assigned to lineages leading to each species in our set after the divergence from the most recent common ancestor with other included species (Supplementary Table 5).
Lineage-specific EBRs are enriched in TEs in birds. In mammals, lineage-and orderspecific EBRs are enriched for TEs that were active at the time of lineage/order formation (Groenen, et al. 2012;Schibler, et al. 2006), and TEs can promote chromosome rearrangements by non-allelic homologous recombination (Bailey, et al. 2004).
In birds, we found that one or more of four families of TEs (LINE-CR1, LTR-ERVL, LTR-ERVK, and LTR-ERV1) were significantly enriched in lineage-specific EBRs among 19 bird species (>100bp on average in the EBR-or non-EBR-containing non-overlapping 10Kbp genome intervals; FDR<10%; Figure 2). The only exceptions were ostrich and Adelie penguin lineage-specific EBRs, which had a significant negative association with the LINE-CR1 elements and LINE-CR1 and LTR-ERVL elements, respectively, implying the presence of still unidentified lineage-specific TEs associated with EBRs in these two species. Our findings suggest that lineage-specific EBRs are associated with the presence of TE elements in birds, following the trend previously reported for mammals (Groenen, et al. 2012).

Multispecies (ms)HSBs in avian and reptile genomes.
To evaluate if msHSBs were maintained during bird evolution, five sets of msHSBs (the regions of genomes that were not interrupted by EBRs; Supplementary Table 10 and 11) were defined: avian, archosaurian, archosaurian/testudines, sauropsid, and amniote. We detected 1,746 avian msHSBs, covering 76.29% of the chicken genome. Using the Kolmogorov-Smirnov test, the distribution of msHSB sizes was tested for goodness-of-fit to an exponential distribution, following previous publications Pevzner and Tesler 2003). We detected 21 msHSBs longer than the maximum lengths expected from a random distribution of EBRs (Supplementary  Table 10 and 11), indicating that large msHSBs could be maintained in evolution of bird and other reptile genomes (Supplementary Table 10). Six amniote-, four sauropsid-, three archosaurian/testudines-, three archosaurian-, and five avian-msHSBs were significantly longer than would be expected from a random distribution of EBRs (Supplementary Table 10).
To unravel the potential functional role of msHSBs in reptilian genomes we asked whether msHSBs were enriched in avian conserved non-coding elements (CNEs), many of which are gene regulatory sequences or miRNA (Zhang, et al. 2014), and chicken genes. All five msHSB sets were highly enriched (p-value <3e-12) in avian CNEs, with a ratio between CNE base pairs in msHSBs and other genome intervals ranging from 1.45 for avian to 1.62 for archosaurian/testudines msHSBs ( Table 1). The density of chicken genes in all msHSBs followed the opposite trend, with msHSBs having significantly fewer genes than other genome intervals (ranging from 0.58 for avian msHSBs to 0.74 for sauropsid and amniote ones; p-value <3e-12; Table 1). To test if CNEs enrichment in msHSBs is not due to the reduction in the number of genes in msHSBs, we renamed all coding bases as additional CNE bases within the 91,947 windows in the chicken genome used to analyse the CNE density. We compared the original and obtained CNE densities in each window and found that the increment was very low with the average genome-wide ratio of the obtained to the real CNE bases of 1.02. We repeated this experiment for msHSB windows and non-msHSB windows separately and observed very similar values (1.02 for both). These values are much lower than the ratio of CNE bases in msHSBs compared to other genome intervals (Table 1), suggesting that the enrichment of CNEs in msHSBs detected is not due to the lack of genes in msHSBs.
Overall, msHSBs in birds and other reptiles are gene-sparse but enriched for bird-specific non-randomly conserved DNA sequences (Table 1). Avian and reptile msHSBs lack coding genes but are enriched in CNEs, and at least the largest msHSBs are non-randomly maintained in evolution. This likely reflects the existence of selection against chromosome rearrangements in some avian genome intervals.

Signatures of gene-functional enrichment in msHSBs.
To identify if there are gene pathways associated with bird and/or reptile msHSBs we measured gene ontology (GO) enrichment in msHSBs. We analysed msHSBs >1.5Mbp in the chicken genome, covering from 8.03% to 18.12% of the genome in amniote and avian msHSBs, respectively and 10,830 genes with a single ortholog in human and chicken. We identified functional enrichment in all five sets of msHSBs ( Figure 3 and Supplementary Table 12; FDR<10%).
The development of primary sexual characteristics term-related genes were significantly enriched in avian, archosaurian and archosaurian/testudines msHSB sets. Out of these 17 genes distributed across 12 chicken chromosomes, only one (BMPR1B) was found in an avian-specific msHSBs but absent from the remaining msHSB sets. BMPR1B plays a role in ovulation (Onagbesan, et al. 2003), and in formation of the bird three-digit limb (Welten, et al. 2005). A bird-specific CNE found 100bp upstream to BMPR1B contains two transcription factor binding sites (TFBSs) for AP-1 (known as cJun) and NF-E4. The AP-1 transcription factor superfamily plays a role in the regulation of apoptosis during limb development in chickens (Suda, et al. 2014), and could account for the reported differences in expression of BMPR1B in birds compared to other vertebrates (Brawand, et al. 2011). Therefore, the presence of this CNE containing a relevant TFBS could contribute to formation and stability of this msHSB in avian evolution.
Appendage and limb development genes (19 genes in 12 avian msHSBs on eight chicken chromosomes) were significantly enriched in the avian msHSB set only. Five genes were in avian-specific msHSBs (SHOX, DLX5, DLX6, HOXA11, and BMPR1B). DLX5 is under positive selection in birds (Zhang, et al. 2014) and mis-expression in chicken embryos leads to feather fusions and loss (Rouzankina, et al. 2004). In line with a previous study (Lowe, et al. 2015) reporting CNEs near feather-related genes controlling the expression of these genes, we found a bird-specific CNE 1.9Kbp, containing a TFBS for TGGCA-binding proteins upstream of DLX5. The HOXA11 gene is expressed during the proximodistal limb bud development leading to the formation of ulna and radius bones (Zeller, et al. 2009), and is under positive selection in birds (Zhang, et al. 2014). Overall, msHSBs are enriched for genes related to clade-specific phenotypes, suggesting a link between the formation of these genomic regions and clade-specific traits.
Functional categories of genes in lineage-specific EBRs. To evaluate potential associations between gene functional groups and lineage-specific EBRs, we performed GO enrichment analysis in EBRs from the 21 bird genomes. Only EBRs from genomes assembled with the aid of maps and those that passed our chimeric scaffold quality control were included in this analysis (Supplementary Table 5). We considered enriched GO terms those with genes in at least four EBRs per species to detect the terms affected by multiple chromosome rearrangements. Twenty-three categories were significantly enriched in EBRs in lineages leading to eight bird species (Table 2 and Supplementary Table 13).
The EBRs leading to budgerigar after the divergence from the ancestor of Passeriformes/parrots tended to reshuffle genes involved in forebrain development.
Remarkably, the same term was also enriched in avian and archousaurian msHSBs, however, the gene pathways affected by EBRs and msHSBs were different (Figure 3 and 4). The msHSBs contained genes related to three of the five conserved canonical signalling pathways involved in forebrain development in vertebrates (Bertrand and Dahmane 2006;Rhinn, et al. 2006): the Hedgehog pathway (SHH, Gli2 and Gli3), the WNT pathway (WNT3A, betacatenin and Lef-1) and the FGF pathway (FGF8 and SOX2) (Harrison-Uy and Pleasure 2012; Quinlan, et al. 2009) (Figure 4). Several studies demonstrated that WNT3A is expressed in mouse dorsal telencephalon, but not in chicken (Hollyday, et al. 1995), possibly explaining the anatomical differences between the forebrain in these species (Robertshaw and Kiecker 2012;Shimogori, et al. 2004). In contrast, the budgerigar lineage-specific EBRs contained genes related to the NOTCH1-NUMB pathway (Figure 4) as well as DRAXIN. All three genes are involved in differentiation of neurones (Islam, et al. 2009;Wakamatsu, et al. 1999).
Although all vocal-learner bird species (songbirds, parrots and hummingbirds) have a 'vocal brain nuclei' in the forebrain, parrots, in addition, have a unique song-system compared to other vocal-learners (Chakraborty, et al. 2015;Jarvis 2004). To the best of our knowledge, this is the first report of distinct components of the same developmental network being found in the evolutionary stable and dynamic parts of animal genomes.
In summary, we demonstrated that genome synteny comparison represents a powerful tool to detect ancestral and lineage-specific genome-rearrangements, as well as evolutionary stable chromosomal intervals. Consistent with previous studies in mammals Murphy, et al. 2005), chromosome breakage in reptiles and birds is not random but associated with genomic features including TEs and CNEs. We identified functional categories of genes enriched in conserved regions maintained from ancestral chromosomes or in some lineage-specific EBRs with genes related to ancestral-or lineage-specific biology.
The most interesting result of EBR contribution to avian evolution (budgerigar) in our set was associated with the highest quality genome supported by additional mapping information.
Therefore, the availability of more genomes supported by maps or assembled to a chromosome level will allow us to identify further genomic changes that contributed to the formation of existing species and clades.

Identification of SFs.
Alignments of 20 bird genomes and five outgroup genomes were performed against chicken genome using SatsumaSynteny (Grabherr, et al. 2010) (Supplementary Table 1). Syntenic fragments (SFs) were defined using three sets of parameters to detect genome rearrangements that are ≥500Kbp, ≥300Kbp and ≥100Kbp in the chicken genome with SyntenyTracker (Donthu, et al. 2009).
Identification and classification of EBRs. Breakpoint regions (BRs) were defined as the intervals delimited by two adjacent SF boundaries on the same reference chromosome. We developed a new multi-step approach to detect and classify EBRs from chromosome-level and fragmented assemblies. Briefly, we identified all potential BRs for every target genome pairwise comparison with the reference at each resolution in the reference genome coordinates. Then BRs from all pair-wise genome comparisons were cross-compared for reference genome coordinate overlaps. If a target genome was not assembled to chromosomal level, only BRs found within the scaffolds of the target assembly were classified as EBRs. We performed a phylogenetic classification of BRs using an ad hoc likelihood ratio approach, by calculating likelihoods for all possible classifications for each BR. The ratios of likelihoods were calculated for the first and second most likely classifications and were used as a quantitative basis for assigning BRs to phylogenetic branches, thereby qualifying them as EBRs, and distinguishing EBRs from so called uncertain BRs that could not be unambiguously assigned to a specific phylogenetic branch (see Supplementary data for more details).
To test the accuracy of our EBR classification approach we: a) compared the EBRs detected by our algorithm in the cattle genome to the previously published manually-defined cattle EBRs (Supplementary Table 2) and b) simulated a set of rearranged genomes with predefined phylogeny of EBRs (Supplementary Figure 2). We compared these EBRs and their classification to the EBRs detected and classified by our algorithm from the same set of genomes (Supplementary Table 3 and 4). Since many of the assemblies used in this study were sequenced and assembled at scaffold level using NGS technologies, we developed a methodology to distinguish between putative assembly errors and lineage-specific EBR in NGS assemblies. First, we tested the EBR intervals by PCR using primers from the EBRflanking DNA regions for three genomes with different sequencing coverage (63x, 85x and 105x). We calculated a minimum paired-read spanning coverage from the read libraries in all potential EBR intervals in the same genomes and correlated the levels of coverage to the rates of positive and negative PCR results to estimate the paired-read spanning level for each sequencing coverage that resulted in the minimum number of false positive and false negative EBRs (Supplementary Table 7 and 8). We applied these thresholds to other genomes with similar sequencing coverage (Supplementary Table 8).
To avoid possible underestimation of EBR numbers that would lead to detection of false regions of multispecies synteny we chose the highest (100Kbp) resolution to define msHSBs. The 500Kbp set was selected for gene enrichment analysis in EBRs to further minimize the effects of potential assembly errors in EBRs.

Identification of msHSBs.
Multispecies HSBs were defined as the regions of reference chromosomes with no EBRs or uncertain BRs detected in our set of species. Five sets of msHSBs were defined: (i) avian msHSBs, including all birds, (ii) archosaurian msHSBs, including birds and crocodiles, (iii) archosaurian/testudines msHSBs, in birds, crocodiles, and turtles, (iv) sauropsida msHSBs, including all reptiles, and (v) amniote msHSBs, identified in all species studied. The distribution of msHSB sizes was tested for goodness-of-fit to an exponential distribution using the Kolmogorov-Smirnov test following previous publications Pevzner and Tesler 2003) (Supplementary Table 9 and 10).

Functional analysis of genes in EBRs and msHSBs.
Coordinates of all genes with a single known ortholog in the chicken and human genomes were downloaded from Ensembl (v.74).
We focused on this set of genes because the follow-up analyses used functional annotation of genes generated mostly for mammalian genomes. To avoid genes that could be located in mis-assembled parts of both genomes or have erroneous definitions of orthology in Ensembl, we used the gene list to build chicken-human pairwise HSBs with SyntenyTracker using the gene coordinates. This allowed the detection of "singleton" and "out-of-place" genes located in unexpected positions within or between HSBs. These genes were removed from further analyses. We assigned the genes to EBRs or msHSBs following the previously published procedures ). For the identification of GO terms overrepresented in msHSBs, we considered msHSBs >1.5Mbp in the chicken genome to avoid genes that could be located in proximity to EBRs. To evaluate gene functional enrichment in EBRs, we considered genes that were located within or ±300Kbp from EBR boundaries. We used the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang, et al. 2008) to detect overrepresented GO terms in our datasets. We considered as significantly enriched terms with >2 fold-enrichment and false discovery rate (FDR) <10% in EBRs or msHSBs relative to all other regions on chicken chromosomes.

Comparing densities of TEs in EBRs and other parts of bird genomes. Lineage-specific
EBRs identified in chicken genome coordinates were translated into the coordinates of target bird genomes using the correspondence between SF boundary coordinates in the chicken and target genomes. In the resulting EBR sets and chicken-specific EBRs we calculated the densities of TEs from major families and compared to those in other intervals of each target genome (RepeatMasker, RepBase v.18), as previously described (Elsik, et al. 2009;Groenen, et al. 2012;).

Density of bird-specific CNEs and genes in msHSBs.
Bird-specific conserved elements (Zhang, et al. 2014) defined in galGal3 coordinates were filtered to remove elements present in coding parts of chicken genes and all mRNA sequences mapped to the chicken genome, leaving only putative conserved non-coding elements (CNEs). Then, we used LiftOver (Kent, et al. 2003) to translate the CNE coordinates to galGal4 assembly to make the data compatible with our HSBs sets. We repeated filtering steps for the new genome coordinates obtained.
The set of elements that was not overlapping with coding sequences after two filtering steps represented the bird CNEs in the chicken genome. Densities of CNEs and chicken genes (UCSC; all known gene set) were calculated in all msHSBs sets, and were compared to the rest of the reference genome using the previously published pipeline ).
After the GO enrichment analysis was performed, we screened the avian-specific CNEs nearby genes in the enriched categories for TFBSs using PROMO (Messeguer, et al. 2002) with a dissimilarity margin ≤10% with TFBSs found in chicken.
Key words: Chromosome rearrangements, birds, reptiles, genome evolution, comparative genomics. and approved the manuscript for publication.

Competing financial interests
The author(s) declare no competing financial interests. homologous synteny blocks (HSBs). Blue and red blocks define SFs in target genomes in "+" and "-" orientation, respectively compared to the chicken chromosome 5 defined at 100Kbp resolution, with target species scaffold or chromosome numbers indicated inside the blocks. Only the columns with genomes assembled to chromosomes (turkey, duck, zebra finch, Anole lizard, and opossum) contain complete HSBs while blocks in the remaining columns represent either HSBs or SFs. EBRs are defined as white intervals in between either two adjacent SFs originating from the same scaffold in a target genome or two adjacent HSBs.

Figure legends
Reference-specific EBRs are represented by the white intervals that overlap in all species. The arrowheads point to a chicken-specific and a Galloanserae-specific EBRs. Pale grey boxes demarcate avian msHSBs that are > 1.5Mbp in the chicken genome. Asterisks demark genomes with modified scaffold IDs for better visibility. All reference chromosome and target genome alignments are available from the avian Evolution Highway website: http://ehdemo.ncsa.uiuc.edu/birds.

Fig. 2. Relationship between lineage-specific evolutionary breakpoint regions (EBRs)
and transposable elements (TEs) in avian species. The phylogenetic tree is based on (Jarvis, et al. 2014). Red bars indicate a significant enrichment of TEs from one or more abundant avian TE families (LINE-CR1, LTR-ERVL, LTR-ERVK and LTR-ERV1) in lineage-specific EBRs (p-value<0.05; FDR<10%); green bars show significant negative associations of TEs with lineage-specific EBRs (p-value<0.05; FDR<10%); and grey bars indicate elevated numbers of the TE families in lineage EBRs (higher number of TEs in EBRs compared to the rest of the genome but not reaching a significance level of p-value<0.05 and FDR<10% likely due to a low number of lineage-specific EBRs resulting in a low power of the statistical test).