Evolutionary dynamics of sex chromosomes of palaeognathous birds

Standard models of sex chromosome evolution propose that recombination suppression leads to the degeneration of the heterogametic chromosome, as is seen for the Y chromosome in mammals and the W chromosome in most birds. Unlike other birds, palaeognaths (ratites and tinamous) possess large non-degenerate regions on their sex chromosomes (PARs or pseudoautosomal regions), despite sharing the same sex determination region as neognaths (all other birds). It remains unclear why the large PARs of palaeognaths are retained over more than 100 MY of evolution, and the impact of these large PARs on sex chromosome evolution. To address this puzzle, we analysed Z chromosome evolution and gene expression across 12 palaeognaths, several of whose genomes have recently been sequenced. We confirm at the genomic levels that most palaeognaths (excepting some species of tinamous) retain large PARs. As in neognaths, we find that all palaeognaths have incomplete dosage compensation on the regions of the Z chromosome homologous to degenerated portions of the W (differentiated regions or DRs), but we find no evidence for enrichments of male-biased genes in PARs. We find limited evidence for increased evolutionary rates (faster-Z) either across the chromosome or in DRs for most palaeognaths with large PARs, but do recover signals of faster-Z evolution similar to neognaths in tinamou species with mostly degenerated W chromosomes (small PARs). Unexpectedly, in some species PAR-linked genes evolve faster on average than genes on autosomes. Increased TE density and longer introns in PARs of most palaeognaths compared to autosomes suggest that the efficacy of selection may be reduced in palaeognath PARs, contributing to the faster-Z evolution we observe. Our analysis shows that palaeognath Z chromosomes are atypical at the genomic level, but the evolutionary forces maintaining largely homomorphic sex chromosomes in these species remain elusive.

W chromosome in most birds. Unlike other birds, palaeognaths (ratites and tinamous) possess large 23 non-degenerate regions on their sex chromosomes (PARs or pseudoautosomal regions), despite 24 sharing the same sex determination region as neognaths (all other birds). It remains unclear why the 25 large PARs of palaeognaths are retained over more than 100 MY of evolution, and the impact of these 26 large PARs on sex chromosome evolution. To address this puzzle, we analysed Z chromosome 27 evolution and gene expression across 12 palaeognaths, several of whose genomes have recently been 28 sequenced. We confirm at the genomic levels that most palaeognaths (excepting some species of 29 tinamous) retain large PARs. As in neognaths, we find that all palaeognaths have incomplete dosage 30 compensation on the regions of the Z chromosome homologous to degenerated portions of the W 31 (differentiated regions or DRs), but we find no evidence for enrichments of male-biased genes in 32 PARs. We find limited evidence for increased evolutionary rates (faster-Z) either across the 33 chromosome or in DRs for most palaeognaths with large PARs, but do recover signals of faster-Z 34 evolution similar to neognaths in tinamou species with mostly degenerated W chromosomes (small 35 PARs). Unexpectedly, in some species PAR-linked genes evolve faster on average than genes on 36 autosomes. Increased TE density and longer introns in PARs of most palaeognaths compared to 37 autosomes suggest that the efficacy of selection may be reduced in palaeognath PARs, contributing to 38 the faster-Z evolution we observe. Our analysis shows that palaeognath Z chromosomes are atypical 6 emu scaffolds (Sackton et al. 2018), and then aligned additional palaeognaths (Fig. 1) to emu. 148 We then ordered and oriented putatively Z-linked scaffolds in non-ostrich assemblies into 149 pseudo-chromosomes using the ostrich Z chromosome as a reference (Fig. S1). Visualization 150 of pseudo-chromosome alignments (Fig. S1) showed little evidence for inter-chromosomal 151 translocations, as expected based on the high degree of synteny across birds (Ellegren 2010); 152 an apparent 12Mb autosomal translocation onto the ostrich Z chromosome is a likely mis-153 assembly (Fig. S2).

156
We next annotated the pseudoautosomal region (PAR) and differentiated region (DR) 157 of the Z chromosome in each species. The DR is thought to arise as a result of inversions on 158 the Z or W chromosome that suppress recombination between the Z and the W, eventually 159 leading to degeneration of the W-linked sequence inside the inversion and hemizygosity of 160 should be about twice that of genes with intact W-linked gametologs (in the PAR). 172 For seven species genomic sequencing data from females, either newly generated in 173 this study (lesser rhea, thicket tinamou, Chilean tinamou) or previously published (emu, 174 ostrich, cassowary, North Island brown kiwi, white-throated tinamou), we annotated PAR 175 and DR regions using coverage (Fig. 1B, Fig. S3). While some variation in coverage 176 attributable to differences in GC content is apparent, the coverage reduction in the DR region 177 is robust (Fig. 1B). For three of these species, we also used newly generated (emu) or 178 published (ostrich, Chilean tinamou) male and female RNA-seq data; using expression ratios 179 to annotate DR/PAR boundaries produced results consistent with coverage analysis in all 180 three of these species (Fig. 1B, Fig. S3, Fig. S4). We used expression ratios alone to 181 demarcate the DR/PAR boundaries in little spotted kiwi and Okarito kiwi (Fig. S4), which we 182 found to be in similar genomic locations in both species, and also in the same locations as the 183  with one exception in the Okarito brown kiwi (Fig. S6). Overall, we see little evidence for 239 accumulation of male-biased genes in palaeognath PARs, and suggest that the lack of 240 degeneration of the emu W chromosome, and likely other palaeognathous chromosomes is 241 probably not due to resolution of sexual antagonism through acquisition of sex-biased genes. 242 Large PARs are associated with lack of faster-Z evolution in palaeognaths 243 The unusually large PARs and the variation in PAR size make Palaeognathae a 244 unique model to study faster-Z evolution. To test whether Z-linked genes evolve faster than 245 autosomal genes, we computed branch-specific dN/dS ratios (the ratio of nonsynonymous 246 substitution rate to synonymous substitution rate) using the PAML free-ratio model for We included include 23 neognaths and 12 palaeognaths in our analysis. Overall, Z-254 linked genes in neognaths (with few exceptions) have a significantly higher dN/dS ratio than 255 autosomal (chr 4/5) genes, suggesting faster-Z evolution (Fig. 3). This result is consistent 256 with a previous study involving 46 neognaths (Wang et al. 2014). In contrast to neognaths, 257 the majority of palaeognaths, and all but two species with large PARs, had similar dN/dS 258 ratios for autosomal and Z-linked genes, and thus did not show evidence for faster-Z 259 evolution (Fig. 3). 260 The lack of a faster-Z effect for palaeognaths with large PARs is perhaps not surprising, since 261 PAR-linked genes are not expected to evolve faster than autosomal genes under standard 262 models of faster-Z evolution. We divided Z-linked genes into those with presumed intact W-263 linked gametologs (PAR genes) and those with degenerated W-linked gametologs (DR 264 genes). Surprisingly, we see little evidence for faster-Z evolution in palaeognaths even for 265 DR genes: only in cassowary, thicket tinamou and white-throated tinamou do DR genes show 266 accelerated dN/dS and dN relative to autosomes (Fig. 4, Fig. S8). Thicket tinamou and white-267 throated tinamou possess small PARs typical of neognaths, and faster-Z has also been 268 11 observed for white-throated tinamou in a previous study (Wang et   The observation of faster-DR evolution in cassowary (p = 0.009, two-sided 273 permutation test) suggests that faster-DR evolution may not be limited to species with 274 12 extensive degeneration of the W chromosome (e.g., with small PARs). However, an 275 important caveat is that the cassowary genome (alone among the large-PAR species) was 276 derived from a female individual, which means that some W-linked sequence could have 277 been assembled with the Z chromosome, especially for the region with recent degeneration. 278 This would cause an artefactual increase in divergence rate. 279 Unexpectedly, in four species of palaeognaths we find evidence that genes in the PAR 280 evolve faster than autosomal genes on chromosomes of similar size (chr4/5), which is not 281 predicted by either the positive selection or genetic drift hypothesis for faster-Z evolution 282 (Fig. 4) suggest that small PARs evolve slower in birds (Smeds et al. 2014). The small number of 298 PAR-linked genes in white-throated tinamou (N=9) suggests some caution in interpreting this 299 trend is warranted. The kiwis also show a trend towards faster-PAR evolution, though this is 300 only statistically significant in Okarito brown kiwi (p = 0.010, two-sided permutation test) 301 (Fig. 4). Interestingly, species with evidence for faster-PAR evolution also have suggestive 302 evidence of relatively faster rates of W chromosome degeneration. In particular, tinamous 303 have intermediate or small PARs (Fig. 1), suggesting that sex chromosomes may not be as 304 stable in these species as in ratites. Similarly, while PARs in the little spotted kiwi and great 305 spotted kiwi are large compared to neognaths, they are relatively shorter than in ostrich, emu 306 and cassowary, suggesting additional degeneration of the W chromosomes. Moreover, in 307 North Island brown kiwi, coverage for female reads suggests an on-going degeneration of the 308 W chromosome (Fig. S3). However, further study will be needed to confirm this trend. white-throated tinamou, r = 0.98) (Fig. S9, Table S2). Extrapolating from autosomal data, we 320 would expect PARs (smaller than 50Mb in all species) to have lower TE density than chr5 321 (~63Mb) or chr4 (~89Mb) if similar evolutionary forces are acting on them to purge TEs. 322 Strikingly, we find that all palaeognaths with large PARs harbor significantly higher TE 323 densities on the PAR than autosomes (Fig. 4B), which suggests reduced purging of TEs on 324 PARs. For DRs, it is unsurprising that TE densities are much higher than in chr4/5 (Fig. 4B) 325 since both reduced recombination rates (due to no recombination in females) and reduced Ne 326 (due to hemizygosity of the DR in females) will reduce the efficacy of selection. In fact, TE 327 14 densities of the DRs are also higher than those of all macro-chromosomes, as well as those of 328 the PARs (Fig. 4B). 329 Intron size is probably also under selective constraint (Carvalho and Clark 1999) The pattern of larger intron sizes in the PARs remains unchanged when all macro-344 chromosomes were included for comparison (Fig. S9). Similar to PARs, DRs also show 345 larger intron sizes relative to chr4/5 (p < 0.00081, Wilcoxon rank sum test). 346 Finally, codon usage bias is often used as proxy for the efficacy of selection and is 347 predicted to be larger when selection is more efficient (Shields et al. 1988). To assess codon 348 usage bias, we estimated effective number of codon (ENC) values, accounting for local 349 nucleotide composition. ENC is lower when codon bias is stronger, and thus should increase 350 with reduced efficacy of selection. As expected, ENC values showed a strong positive 351 correlation with chromosome sizes (Table S2), and are higher for DR-linked genes in most 352 species (although not rheas, the little spotted kiwi, or the Okarito brown kiwi) (Fig. S10). least a plausible proxy for recombination rate. In contrast to the results for collared 371 flycatcher, GC3s of palaeognath PARs were significantly lower than those of chr4/5s (p < 372 2.23e-05, Wilcoxon sum rank test) (Fig. 5A, Fig. S9, Fig. S11), except for white-throated 373 tinamou and thicket tinamou. Inclusion of the other macro-chromosome does not change the 374 16 pattern (p < 0.0034). Moreover, distribution of GC3s along the PAR is more homogeneous 375 compared to chr4 or chr5, except for the chromosomal ends (Fig. S11). 376 Overall, then, we find evidence from TE density and intron size that efficacy of 377 selection may be reduced on the PAR in most large-PAR palaeognaths, potentially because of 378 reductions in recombination rate (as suggested by reduced GC3s), although we note the 379 signature from codon bias (ENC) is more ambiguous. If indeed recombination rate is reduced 380 relative to a similarly sized autosome for most large-PAR species, that could explain why we 381 see some evidence for faster-PAR evolution in palaeognaths. Notably, the two species in our dataset that presumably share heteromorphic sex 400 chromosomes derived independently from neognaths (white-throated tinamou and thicket 401 tinamou) do show evidence for faster-Z evolution, and in particular faster evolution of DR 402 genes. In contrast, palaeognaths with small DR and large PAR do not tend to show evidence 403 for faster-DR, even though hemizygosity effects should be apparent (the exception is 404 cassowary, which may be an artifact due to W-linked sequence assembling as part of the Z). 405 A previous study on neognaths shows that the increased divergence rate of the Z is 406 mainly contributed by recent strata while the oldest stratum (S0) does not show faster-Z 407 effect (Wang et al. 2014). Neognaths and palaeognaths share the S0, and since their 408 divergence only a small secondary stratum has evolved in palaeognaths ). In 409 particular, the DR of palaeognaths without heteromorphic sex chromosomes is largely 410 dominated by this shared S0 stratum. The absence of faster-Z effect in palaeognath DR where 411 S0 dominates is therefore largely consistent with the results of the study on the neognath S0 412 stratum. A possible mechanism to explain this pattern is that, in S0, the reduced effective We also detect evidence for faster evolution of genes in the PAR for tinamous and 424 some species of kiwi. Since the PAR is functionally homomorphic and recombines with the 425 homologous region of the W chromosome, it is not clear why this effect should be observed 426 in these species. However, a common feature of the PARs of tinamous and kiwis is that they 427 are relatively shorter than PARs of other palaeognaths. This raises at least two possible 428 explanations for the faster-PAR effect in tinamous and kiwis: 1) the differentiation of the sex 429 chromosomes is more rapid compared with other palaeognaths, and at least some parts of the 430 PARs may have recently stopped recombining and actually become DR but undetectable by 431 using the coverage method; or 2) the PARs are still recombining but at lower rate, resulting in 432 weaker efficacy of selection against deleterious mutations. 433

Efficacy of selection and recombination rate 434
Multiple lines of evidence suggest a possible reduction in the efficacy of selection in 435 the PAR across all palaeognaths with a large PAR. Specifically, we find both an increase in 436 TE density and an increase in intron size in PARs. In contrast, we do not find clear evidence 437 for a reduction in the degree of codon bias in PARs. However, it is possible that genetic drift 438 (Marais et al. 2001), GC-biased gene conversion (Galtier et al. 2018) and/or mutational bias 439 (Szövényi et al. 2017) may also affect the codon bias, which may weaken the correlation 440 between codon usage bias and the strength of natural selection. 441 It is unclear, however, why the efficacy of selection may be reduced in PARs. One 442 possible cause is that the PARs may recombine at lower rate than autosomes. This is a 443 somewhat unexpected prediction because in most species PARs have higher recombination 444 rates than autosomes (Otto et al. 2011). In birds, direct estimates of recombination rates of the 445 PARs are available in both collared flycatcher and zebra finch, and in both species PARs 446 recombines at much higher rates than most macrochromosomes (Smeds et al. 2014;Singhal 447 et al. 2015). This is probably due to the need for at least one obligate crossover in female 448 meiosis, combined with the small size of the PAR in both collared flycatcher and zebra finch. 449 In palaeognaths where PARs are much larger, direct estimates of recombination rate 450 from pedigree or genetic cross data are not available. Our observation that GC3s are 451 significantly lower in large palaeognath PARs than similarly sized autosomes is at least 452 consistent with reduced recombination rates in these species. However, a recent study on 453 greater rhea shows that the recombination rate of the PAR does not differ from similarly 454 sized autosomes in females (del Priore and Pigozzi 2017), but this study did not examine 455 males. Since the recombination rates differ extensively between sexes (van Oers et al. 2014; 456 Halldorsson et al. 2016;Bhérer et al. 2017), more data is needed to test whether sex-average 457 recombination rate of the PAR differs from autosomes. Additionally, a previous study of emu 458 conducted prior to the availability of an emu genome assembly suggested that the PAR has a 459 higher population recombination rate than autosomes (Janes et al. 2009). However, of twenty 460 two loci in that study, seven appear to be incorrectly assigned to the sex chromosomes based 461 on alignment to the emu genome assembly (Table S3), potentially complicating that 462 conclusion. The relatively small size of that study and recently improved resources and 463 refined understanding of recombination rates across chromosome types provide opportunities 464 for a new analysis. Further direct tests of recombination rate on ratite Z chromosomes are 465 needed to resolve these discrepancies. 466

Identification of the Z chromosome, PARs and DRs 516
The repeat-masked sequence of ostrich Z chromosome (chrZ) (Zhang et al. 2015) was 517 used as a reference to identify the homologous Z-linked scaffolds in recently assembled 518 palaeognath genomes (Sackton et al. 2018). We used nucmer function from MUMmer 519 package (Kurtz et al. 2004) to first align the ostrich Z-linked scaffolds to emu genome; an 520 emu scaffold was defined as Z-linked if more than 50% of the sequence was aligned. The Z-521 linked scaffolds of emu were further used as reference to infer the homologous Z-linked 522 sequences in the other palaeognaths because of the more continuous assembly of emu 523 genome and closer phylogenetic relationships, and 60% coverage of alignment was required. 524 During this process, we found that a ~12Mb genomic region of ostrich chrZ (scf347, scf179, 525 scf289, scf79, scf816 and a part of scf9) aligned to chicken autosomes. The two breakpoints 526 can be aligned to a single scaffold of lesser rhea (scaffold_0) (Fig. S1), so we checked 527 whether there could be a mis-assembly in ostrich by mapping the 10k and 20k mate-pair 528 reads from ostrich to the ostrich assembly. We inspected the reads alignments around the 529 breakpoint and confirmed a likely mis-assembly (Fig. S2). The homologous sequences of this 530 region were subsequently removed from palaeognathous Z-linked sequences. When a smaller 531 ostrich scaffold showed discordant orientation and/or order, but its entire sequence was 532 harbored within the length of longer scaffolds of other palaeognaths (Fig. S1), we manually 533 changed the orientation and/or order of that scaffold for consistency. After correcting the 534 orientations and orders of ostrich scaffolds of chrZ, a second round of nucmer alignment was 535 performed to determine the chromosomal positions for palaeognathous Z-linked scaffolds. 536 One way to infer the boundary between the PAR (pseudoautosomal region) and DR 537 (differentiated region) is to compare the differences of sequencing depth of female DNA. 538 Because the DR is not recombining in female and W-linked DR will degenerate over time 539 (and thus diverge from Z-linked DR), the depth of sequencing reads from the Z-linked DR is 540 generally expected to be half of that from the PAR or autosomes. This approach was applied 541 to cassowary, whose sequence is derived a female individual. For emu, female sequencing 542 was available from Vicoso et al. (Vicoso, Kaiser, et al. 2013). To facilitate the PAR 543 annotation, we generated additional DNA-seq data from a female for each of lesser rhea, 544 Chilean tinamou and thicket tinamou. Default parameters of BWA (v0.7.9) were used to map 545 DNA reads to the repeat-masked genomes with BWA-MEM algorithm (Li 2013), and 546 22 mapping depth was calculated by SAMtools (v1.2) (Li et al. 2009). A fixed sliding window 547 of 50kb was set to calculate average mapping depths along the scaffolds. Any windows 548 containing less than 5kb were removed. Significant shifts of sequencing depth were annotated 549 as the boundaries of the PARs and DRs. 550 Another independent method for PAR annotation is based on gene expression 551 differences between male and female of PAR-and DR-linked genes. To reduce the effect of 552 transcriptional noise and sex-biased expression, 20-gene windows were used to calculate the 553 mean male-to-female ratios. The shifts of male-to-female expression ratios were used to 554 annotate approximate PAR/DR boundaries. This method was applied to little spotted kiwi, 555 Okarito brown kiwi, emu and Chilean tinamou. Given the small divergence between little 556 spotted kiwi and great spotted kiwi, it is reasonable to infer that the latter should have a 557 similar PAR size. Neither female reads nor RNA-seq reads are available for greater rheas and 558 elegant crested tinamou, so the PAR/DR boundaries of lesser rhea and Chilean tinamou were 559 used to estimate the boundaries respectively. 560

Comparison of genomic features 561
To estimate GC content of synonymous sites of the third position of codons (GC3s), 562 codonW (http://codonw.sourceforge.net) was used with the option '-gc3s'. The exon density 563 was calculated by dividing the total length of exon over a fixed 50k windows by the window 564 size. Similarly, we summed the lengths of transposable elements (TEs, including LINE, 565 SINE, LTR and DNA element) based on RepeatMasker outputs (Kapusta et al, personal 566 communication) to calculate density for 50k windows. Intron sizes were calculated from gene 567 annotations (GFF file). Codon usage bias were quantified by effective number of codons 568 (ENC) using ENCprime. We used intronic sequences to estimate background nucleotide 569 frequency to further reduce the effect of local GC content on codon usage estimates. 570 Wilcoxon sum rank test were used to assess statistical significance. 571

Divergence analyses 572
The estimates of synonymous and non-synonymous substitution numbers and sites 573 were extracted from PAML (Yang 2007) outputs generated by free-ratio branch models, 574 For the newly generated samples (emu brains, gonads and spleens), RNA extraction 593 was performed using RNeasy Plus Mini kit (Qiagen). The quality of the total RNA was 594 assessed using the RNA Nano kit (Agilent). Poly-A selection was conducted on the total 595 RNA using PrepX PolyA mRNA Isolation Kit (Takara). The mRNA was assessed using the 596 RNA Pico kit (Agilent) and used to make transcriptome libraries using the PrepX RNA-Seq 597 for Illumina Library Kit (Takara). HS DNA kit (Agilent) was used to assess the library 598 quality. The libraries were quantified by performing qPCR (KAPA library quantification kit) 599 and then sequenced on an NextSeq instrument (High Output 150 kit, PE 75 bp reads). Each 600 library was sequenced to a depth of approximately 30M reads. The quality of the RNA-seq 601 data was assessed using FastQC. Error correction was performed using Rcorrector; unfixable 602 reads were removed. Adapters were removed using TrimGalore!. Reads of rRNAs were 603 removed by mapping to the Silva rRNA database. 604 We used RSEM (v1.2.22) (Li and Dewey 2011) to quantify the gene expression 605 levels. RSEM implemented bowtie2 (v2.2.6) to map the RNA-seq raw reads to transcripts 606 (based on a GTF file for each species), and default parameters were used for expression 607