A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals

Abstract Background Ducks have a typical avian karyotype that consists of macro- and microchromosomes, but a pair of much less differentiated ZW sex chromosomes compared to chickens. To elucidate the evolution of chromosome architectures between ducks and chickens, and between birds and mammals, we produced a nearly complete chromosomal assembly of a female Pekin duck by combining long-read sequencing and multiplatform scaffolding techniques. Results A major improvement of genome assembly and annotation quality resulted from the successful resolution of lineage-specific propagated repeats that fragmented the previous Illumina-based assembly. We found that the duck topologically associated domains (TAD) are demarcated by putative binding sites of the insulator protein CTCF, housekeeping genes, or transitions of active/inactive chromatin compartments, indicating conserved mechanisms of spatial chromosome folding with mammals. There are extensive overlaps of TAD boundaries between duck and chicken, and also between the TAD boundaries and chromosome inversion breakpoints. This suggests strong natural selection pressure on maintaining regulatory domain integrity, or vulnerability of TAD boundaries to DNA double-strand breaks. The duck W chromosome retains 2.5-fold more genes relative to chicken. Similar to the independently evolved human Y chromosome, the duck W evolved massive dispersed palindromic structures, and a pattern of sequence divergence with the Z chromosome that reflects stepwise suppression of homologous recombination. Conclusions Our results provide novel insights into the conserved and convergently evolved chromosome features of birds and mammals, and also importantly add to the genomic resources for poultry studies.

Ducks have a typical avian karyotype that consists of macro-and micro-chromosomes, but a pair of much less differentiated ZW sex chromosomes compared to chicken. To elucidate the evolution of chromosome architectures between duck and chicken, and between birds and mammals, we produced a nearly complete chromosomal assembly of a female Pekin duck by combining long-read sequencing and multiplatform scaffolding techniques. Results: The major improvement of genome assembly and annotation quality resulted from successful resolution of lineage-specific propagated repeats that fragmented the previous Illumina-based assembly. We found that the duck topologically associated domains (TAD) are demarcated by putative binding sites of the insulator protein CTCF, housekeeping genes, or transitions of active/inactive chromatin compartments, indicating the conserved mechanisms of spatial chromosome folding with mammals.
There are extensive overlaps of TAD boundaries between duck and chicken, and also between the TAD boundaries and chromosome inversion breakpoints. This suggests strong natural selection on maintaining regulatory domain integrity, or vulnerability of TAD boundaries to DNA double-strand breaks. The duck W chromosome retains 2.5fold more genes relative to chicken. Parallel to the independently evolved human Y chromosome, the duck W evolved massive dispersed palindromic structures, and a sequence divergence pattern with the Z chromosome that reflects stepwise suppression of homologous recombination. Conclusions: Our results provide novel insights into the conserved and convergently evolved chromosome features of birds and mammals, and also importantly add to the genomic resources for poultry studies. Birds have the largest species number but one of the smallest genome sizes among terrestrial 61 vertebrates. This has attracted extensive efforts into elucidating the diversity of their 62 'streamlined' genomes that give rise to the tremendous phenotypic diversity since the era of 63 cytogenetics [1]. The karyotype of birds exhibits two major distinctions from that of mammals: 64 first, it comprises about 10 pairs of large to medium sized chromosomes (macrochromosomes) 65 and about 30 pairs of much smaller sized chromosomes (microchromosomes) [2]. During the 66 over 100 million years (MY) of avian evolution, there were few interchromosomal 67 rearrangements among most species [3][4][5]

except for falcons and parrots (Falconiformes and 68
Psittaciformes) [6][7][8][9]. Among the published karyotypes of over 800 bird species, the majority of 69 them have a similar chromosome number around 2n=80 [10]. These results indicate that the 70 chromosome evolution of birds is dominated by intrachromosomal rearrangements. Genomic 71 comparisons between chicken, turkey, flycatcher and zebra finch [11,12] found that birds, similar 72 to mammals [13,14], have fragile genomic regions that were recurrently used for mediating 73 intrachromosomal rearrangements, and these regions seem to be associated with high 74 recombination rates [15] and low densities of conserved non-coding elements (CNEs) [5]. 75 However, compared to mammals [13,14,16], much less is known about the interspecific 76 diversity within avian chromosomes, particularly microchromosomes (but see [5,12]) at the 77 sequence level, due to the scarcity of chromosome-level bird genomes. 78 The other major distinction between the mammalian and avian karyotypes is their sex 79 chromosomes. Birds have a pair of female heterogametic (male ZZ, female ZW) sex 80 chromosomes that originated from a different pair of ancestral autosomes than the eutherian 81 XY [17,18]. Since their divergence about 300 MY ago, sex chromosomes of birds and mammals 82 have undergone independent stepwise suppression of homologous recombination, and produced 83 a punctuated pattern of pairwise sequence divergence levels between the neighboring regions 84 termed 'evolutionary strata' [19][20][21]. Despite the consequential massive gene loss, both chicken 85 W chromosome (chrW) and eutherian chrYs have been found to preferentially retain dosage-86 sensitive genes or genes with important regulatory functions [22]. In addition, the human chrY 87 has evolved palindromic sequences that may facilitate gene conversions between the Y-linked 88 gene copies [23], as an evolutionary strategy to retard the functional degeneration under the non-89 recombining environment [24]. Interestingly, such palindromic structures have also been reported 90 on sex chromosomes of New World sparrows and blackbirds [25], and more recently in a plant 91 species, the willow[26], suggesting it is a general feature of evolving sex chromosomes. Both 92 cytogenetic work and Illumina-based genome assemblies of tens of bird species suggested that 93 bird sex chromosomes comprise an unexpected interspecific diversity regarding both their 94 lengths of recombining regions (pseudoautosomal regions, PAR), and their rates of gene loss [20, 95 27]. For example, PARs cover over two thirds of the length of ratite (e.g., emu and ostrich) sex 96 chromosomes [28], but are concentrated at the tips of the chicken and eutherian sex 97 chromosomes. However, so far only the chicken chrW has been well-assembled using the 98 laborious iterative clone-based sequencing method [22], and the majority of genomic sequencing 99 projects tend to choose a male bird to avoid the repetitive chrW. This has hampered our broad 100 and deep understanding of the composition and evolution of avian sex chromosomes. 101 The Vertebrate Genomes Project (VGP) has taken advantage of the development of long-102 read (PacBio or Nanopore) sequencing, linked-read (10X) and high-throughput chromatin 103 conformation capture (Hi-C) technologies to empower rapid and accurate assembly of 104 chromosome-level genomes including the sex chromosomes, in the absence of physical 105 maps [29]. Further, Hi-C can uncover the three-dimensional (3D) architecture of chromosomes 106 that is segregated in active (A) and inactive (B)   and 82-fold of Hi-C reads from a male individual (Figure 1, Supplementary Table S1), and 136 assembled the genome with a modified VGP pipeline [29]. To identify the female-specific chrW 137 sequences, we also generated 72-fold Illumina reads from a male individual to compare to the 138 previously published female reads. Our primary assembly of PacBio long reads assembles the 139 entire genome into 1,645 gapless contigs (Supplementary Table S2), resulting in a 14-fold 140 reduction of contig number (1,645 vs. 227,448) and 212-fold improvement of contig continuity 141 measured by N50 (5.5Mb vs. 26.1Kb) compared to the BGI1.0 genome. To scaffold the contigs, 142 we first corrected their sequence errors with 92-fold female Illumina reads, then oriented and 143 connected them into 942 scaffolds with 10X linked-reads, BioNano optical maps and Hi-C reads. 144 As Hi-C data provides linkage but not orientation information, in our final step of chromosome 145 anchoring, we incorporated an RH linkage map [32] and reduced the scaffold number further 146 down to 755. We however detected 69 cases of conflicts of orientation between the RH map and 147 the Hi-C scaffolds, manifested as inversions. By carefully examining the presence/absence of 148 raw PacBio reads, Illumina mate-pairs, and syntenic chicken/goose sequences [40,41] Table S3). The 159 remaining 4.4% (62.1 Mb) of the genome not anchored is likely due to their repetitive sequence 160 composition or lack of linkage markers, or alternative haplotype sequences not removed by 161 purge haplotigs. In particular, the macrochromosomes have become much more continuous 162 (Figure 1b-c), and we have assembled majorities of microchromosomes that were all unmapped 163 in the BGI1.0 genome. 164 The ZJU1.0 genome assembly also has a higher level of completeness measured by its 165 almost gapless sequence composition (0.37% vs. 3.17%), and substantial numbers of annotated 166 telomeric and centromeric regions (Figure 2a, Supplementary Table S4-5), compared to the 167 BGI1.0 assembly. We filled in a total of 116.2 Mb sequences of gaps within or between the 168 BGI1.0 scaffolds, which were enriched for repetitive elements and GC-rich sequences 169 ( Supplementary Fig. S3-4). This can be explained by the inability of Illumina reads to span or 170 resolve the repeat regions with high copy numbers or complex structures, and the sequencing 171 bias against the GC-rich regions [42][43][44]. Indeed, we found specific transposable elements (TE) 172 that are enriched in the filled gaps ( Supplementary Fig. S4). These include the chicken repeat 1 173 (CR1) retroposon CR1-J2_Pass and the long terminal repeat (LTR) GGLTR8B that have 174 undergone recent lineage-specific bursts in duck after its divergence with other Galloanserae 175 species (Figure 2b, Supplementary Table S6). These apparent evolutionarily young repeats in 176 ducks show a lower level of sequence divergence from their consensus sequences 177 ( Supplementary Fig. S5), and tend to insert into other older TEs and form a nested repeat 178 structure ( Supplementary Fig. S6). 179 Assembly of exon sequences embedded in such complex repetitive regions also led to the 180 improvement of gene model annotations in our new assembly (e.g., Figure 2c). Overall, our new 181 gene annotation combining a total of 17 duck tissue transcriptomes and chicken protein queries 182 has predicted 15,463 protein-coding genes, including 71 newly annotated chrW genes. We have 183 recovered 8,238 missing exons in the BGI1.0 assembly from 2,099 genes, of which 745 genes 184 were completely missing. We also corrected 683 partial genes, and merged them into 356 genes 185 in the new assembly. The overall quality of our new duck genome is better than that of the 186 previous Sanger-based zebra finch, and comparable to the latest version of chicken [

Different genomic landscapes of duck micro-and macro-chromosomes 191
Our high-quality genome assembly and annotation of Pekin duck uncovered a different genomic 192 landscape between the macro-and micro-chromosomes. Duck microchromosomes have a higher 193 gene density than macrochromosomes per Mb sequence or per TAD domain (P< 2.2e-16, 194 Wilcoxon test). The recombination rate estimated from the published population genetic data [45] 195 is also on average 2.3-fold higher on microchromosomes than on macrochromosomes (16.3 vs.  Table S4-5). We found 22 telomeric sites among the 31 chromosomes, of which 11 were 208 interstitial telomeric repeat (ITR) sites inside the chromosomes (Figure 3a anchors') at the duck TAD boundaries (Supplementary Fig. S15a-b), suggested an active 'loop 263 extrusion' mechanism involving both the extruding factors cohesin protein complex along 264 chromatin and the counteracting CTCF protein [66]. In support of this, TAD boundaries that 265 overlap with DNA loops have a significantly higher density of putative CTCF binding sites than 266 any other TAD boundaries (Supplementary Fig. S15c)  We hypothesize that the TAD units or TAD boundaries are probably under strong selective 281 constraint during evolution. This is suggested by some congenital diseases and cancer cases 282 caused by disruptions of TADs through structural variations [70], and also sharing of TAD 283 boundaries between distantly related species [65,71]. A substantial proportion (42.6%) of duck 284 TAD boundaries are shared with those of chicken (Figure 4c). This is probably an underestimate 285 given that different tissues of Hi-C data were used here to identify TADs for the two bird 286 species. A comparable level of conservation of human TAD boundaries (53.8%) has also been 287 observed with mouse[65], and expectedly a lower level (26.8%) of conservation has been 288 observed between human and chicken [55]. The other evidence of strong selective constraints 289 acting on the integrity of TADs come from our findings here on the pattern of chromosomal 290 inversion breakpoints of duck, whose TAD insulation scores are significantly (P< 2.2e-16, 291 Wilcoxon test) lower (Figure 4d) than the TAD interior regions. That is, inversions more often 292 precisely occurred at the TAD boundaries rather than within the TADs, i.e., disrupting the pre-293 existing TADs. Only one third of the detected inversions have both their breakpoints located 294 within the TADs, whereas the remaining two thirds have both or one of their breakpoints 295 overlapping with the TAD boundaries (Figure 4e-g). Novel TAD boundaries that were created 296 by the duck-specific inversions (e.g., Figure 4g) tend to have significantly higher insulation 297 scores, i.e., weaker insulation strengths than those that are conserved between duck and chicken 298 (Supplementary Fig. S18). This suggests that natural selection may more frequently target 299 evolutionarily older and stronger TAD boundaries. We have to point out the alternative 300 explanation for the overlap between the TAD boundaries and inversion breakpoints (Figure 4e) 301 is that chromatin loop anchors bound by CTCF protein are more likely genomic fragile sites 302 vulnerable for DNA double-strand breaks[72] that induce the inversions. Consistent with this 303 explanation, we found that the TAD boundaries that overlap with inversion breakpoints (Figure  304 4h, bottom) have a significantly (P<0.001, Chi-square test) higher percentage of loop anchors 305 than others (Figure 4h, top). 306 Since the novel TADs generated by chromosome inversions (e.g., Figure 4g) may create 307 aberrant or new promoter-enhancer contacts, and consequently divergent gene expression during 308 evolution, we further compared the levels of gene expression divergence in the conserved TADs 309 vs. those novel TADs that encompass inversion breakpoints between chicken and duck. 310 Interestingly, genes that are close to the novel TAD boundaries created by inversions only show 311 slightly but not significantly higher levels of expression divergence than the genes located in the 312 conserved TADs, except for some tissue (Supplementary Fig. S19). This reflects that the TAD 313 boundary changes have only affected a few genes' expression patterns. It can be also explained 314 by other regulatory divergences (e.g., in cis-elements) within the conserved TADs during the 315 long-term divergence between chicken and duck, that have increased the target genes' expression 316 divergence to the same degree as that in the novel TADs. 317 318 Sex chromosome evolution of Pekin duck 319 The Pekin duck provides a great model for understanding the process of avian sex chromosome 320 evolution because their differentiation degree is between those of ratites and chicken [27]. 321 Previous comparative cytogenetic work found that the FISH probe of chicken chrZ cannot 322 produce hybridization signals on chicken chrW because of their great sequence divergence, but 323 instead can paint the entire chrW of duck and ostrich, suggesting that substantial sequence 324 homology has been preserved between the Z/W chromosomes of the two species since the 325 recombination was suppressed [27,65]. The size of duck chrW is nevertheless smaller (estimated 326 size 51Mb) [73,74] compared to chrZ, probably because of extensive large deletions. 327 Our new duck genome has assembled most of its chrZ, except for 1.3 Mb unanchored 328 sequences, into one continuous sequence of 84.5Mb long (Supplementary Fig. S20) including 329 one 2.2Mb long PAR at the chromosome tip (Figure 5a). This is consistent with previous 330 cytogenetic work showing only one recombination nodule concentrated at the tip of the female 331 duck sex chromosomes [75]. Consistently, the PAR shows a significantly (P< 2.2e-16, Wilcoxon 332 test) higher rate of recombination than the rest Z-linked SDR that do not have recombination in 333 females (Figure 5a). The distribution of GC content also exhibits a sharp shift at the PAR 334 boundary because of the effect of gBGC (Supplementary Fig. S21). The evolution of chicken 335 chrZ is marked by the acquisition of large tandem arrays of four gene families that are 336 specifically expressed in testis [18]. In contrast, we have not found similar tandem arrays of testis 337 genes on chrZ of duck, and all of the four Z-linked chicken testis gene families are located on the 338 autosomes of duck (Supplementary Fig. S22). The assembled duck chrW assembly contains 36 339 scaffolds with a total length of 16.7Mb (about one third of the estimated size), all of which are 340 almost exclusively mapped by female reads (Supplementary Fig. S20). It marks an 8.8-fold 341 increase in size compared to our previous assembly using Illumina reads [20,76], and is much 342 longer than the most recent assembly of chicken chrW (6.7 Mb) [22]. We have annotated a total 343 of 71 duck W-linked SDR genes, and all of them are single copy genes, compared to 27 single-344 copy genes and one multicopy gene on the chicken chrW, with 20 genes overlapped between the 345 two (Figure 5b). The only multicopy chicken W-linked gene HINTW with about 40 copies [22] is 346 present as a single-copy gene on the duck chrW. These results indicate that duck and chicken 347 have independently evolved their sex-linked gene repertoire since their species divergence. The 348 duck chrW retained more genes than chicken, and represents an intermediate stage of avian sex 349 chromosome evolution between those of ratites and chicken. 350 Due to the intrachromosomal rearrangements of chrZ, most birds (including duck) except 351 for ratites have retained few ancestral gene syntenies of their proto-sex chromosomes before the 352 suppression of homologous recombination [20,76], and exhibit dramatic reshufflings of their old 353 evolutionary strata. In order to accurately reconstruct the history of duck sex chromosome 354 evolution, we used a newly produced chrZ assembly of emu in our group to approximate the 355 avian proto-sex chromosomes. Almost all (15.2Mb, 91%) of the duck chrW sequences can be 356 aligned to the chrZ of emu, and form a clear pattern of four evolutionary strata. This is 357 manifested as a gradient of Z/W pairwise sequence divergence, i.e., a gradient of the age of strata 358 along the chrZ, which is named from the old to the young, as stratum 0, S0 to S3, (Figure 5a). 359 Within each stratum, chrW scaffolds of similar levels of sequence divergence are clustered and 360 separated from the neighbouring strata with different divergence levels ( Supplementary Fig.  361   S23). The genes enclosed in each stratum are consistent with our previous annotation of the duck 362 evolutionary strata based on the BGI1.0 genome, and show a consistent gradient of synonymous 363 substitution rates (Supplementary Fig. S24) between the Z-and W-linked alleles according to 364 the age of the strata where they reside. We have not found any chrW scaffolds that span the 365 boundaries of neighbouring strata, probably because of some complex repeat sequences that 366 accumulate at the boundary. Interestingly, the inferred boundaries between evolutionary strata on 367 chrZ, i.e., the breakpoints between the inverted regions within or between the strata (8 out of 9 368 boundaries shown in Figure 5a) tend to have a low TAD insulation score, i.e., to overlap with 369 TAD boundaries or loop anchors (Supplementary Fig. S25). This again strongly supports the 370 idea that loop anchors or TAD boundaries are likely the genomic fragile regions that induced 371 inversions. 372 Because of the lack of recombination, majorities (30 or 42.9%) of W-linked genes 373 probably have become pseudogenes or long non-coding RNA genes due to the frameshift 374 mutations or premature stop codons (Supplementary Fig. S26). The other pronounced signature 375 of functional degeneration of chrW is accumulation of TEs. The duck chrW shows a much 376 higher genomic proportion (46.5% vs. 10.1%) and a different composition of TEs compared to 377 the genome average (Figure 5c). The W-linked repeats are concentrated at those families that 378 have specifically expanded their copy numbers in the duck after it diverged from other 379 Anseriformes (Supplementary Fig. S27, Supplementary Table S8). Among them, different TE 380 families exhibit opposing trends of colonizing the different evolutionary strata of different ages 381 (Figure 5d, Supplementary Fig. S28). TE families that have been propagating since the 382 ancestor of Neoaves (e.g., CR1-J2_Pass, Supplementary Fig. S6)[77] are more enriched in the 383 older strata, while TE families that were specifically propagated in the duck (e.g., TguERV3_I-384 int, Figure 2b) are more enriched in the younger strata. This suggests that older evolutionary 385 strata might be saturated for old TEs relative to TEs with recent activities. Particularly, duck or 386 Anseriformes enriched repeats are nested with each other and form 38 palindromes dispersed 387 across the entire chrW (Figure 5e) frequently exposes the TAD boundaries to double-strand breaks, and induces chromosomal 414 inversions involving the entire TAD. This mechanism may also account for the common 415 genomic fragile sites found in both birds and mammals that have been reused during evolution to 416 mediate genomic rearrangements [7,11,13,80]. Overall, despite divergent chromosomal 417 composition, our results revealed conserved mechanisms of chromosome folding and 418 rearrangements between birds and mammals. 419 The two clades of vertebrates also evolved convergent sex chromosome architectures. Our 420 finding that the duck chrW has suppressed recombination with chrZ in a stepwise manner is 421 similar to the pattern of evolutionary strata between the human X and Y chromosomes [19]. As 422 the result of recombination suppression, the duck chrW has accumulated massive TEs, some of 423 which formed dispersed palindromes along the chromosome. Unlike other sex-specific 424 palindromes reported in primates, birds and willow [25,26,[81][82][83] genome-wide distribution of the 190bp sequences was generated by binning the genome with a 494 50kb non-overlapping window to find the local enrichment of copy numbers, which was defined as 495 the putative centromeres. For telomeres, we used the known vertebrate consensus sequence[104] 496 'TTAGGG/CCCTAA' to search for the clusters of consensus sequence on both strands from the 497 above tandem repeat annotation. Consensus sequence enriched genomic blocks in a 50kb window 498 were then defined as the putative telomere regions. 499

Building the chromosomal sequences and identifying the sex-linked sequences 501
To anchor Pekin duck scaffolds onto chromosomes, we first collected the ordered 1689 RHmap 502 linked contigs[32] and 155 BAC clone sequences[33] from the previous studies. We aligned these 503 sequences, as well as the Illumina duck genome[36] (BGI1.0) to the new duck scaffolds we 504 generated by nucmer[105] (v3.23) packages (http://mummer.sourceforge.net) and only kept the best 505 hits for each sequence. Scaffolds were orientated and ordered first based on the RHmap contigs that 506 span more than one scaffold, then by BAC sequences whose order was determined previously by 507 FISH, and finally by the syntenic relationship with the BGI1.0 genome. We also corrected 508 scaffolding errors using the raw PacBio reads, if the order of our scaffolds had conflicts with that of 509 RHmap or BAC sequence order (Supplementary Fig. S2). 510 To identify the sex-linked sequences, Illumina reads from both sexes were aligned to the 511 scaffold sequences using BWA ALN[106] with default parameters. Read depth of each sex was 512 then calculated using SAMtools [107] in 5kb non-overlapping windows, and normalized against the 513 median value of depths per single base pair throughout the entire genome, respectively, to enable 514 the comparison between sexes. To identify the Z-linked sequences, the depth ratio of male-vs-515 female (M/F) was calculated for the genomic regions mapped by reads for each sequences, with a 516 minimum 80% coverage in both sexes, and sequences with a depth ratio ranging from 1.5 to 2.5 517 were assigned as Z-linked. To identify the W-linked sequences, we calculated M/F depth ratio as 518 well as M/F coverage ratio and assigned scaffolds to W-linked when either ratio was within the 519 range from 0.0 to 0.25 as W-linked sequences (Supplementary Fig. S21). Since we do not have 520 linkage markers on the W chromosome, we ordered the W scaffolds based on their unique aligned 521 position with the Z chromosome using RaGOO[108] (v1.1) with default parameters 522 (https://github.com/malonge/RaGOO). 523 To identify the inversions in the duck genome, genomic syntenic blocks between chicken 524 and duck, and emu and duck were constructed using nucmer (v3.1) with the parameters: -b 500 -l 525 20. Then inversions between chicken and duck were manually checked by plotting the dot plot 526 between the two species. The duck specific inversions were identified by excluding chicken-527 specific inversion, using emu as the outgroup. 528 529

Hi-C analyses 530
Hi-C read mapping, filtering, correction, binning and normalization were performed by HiC-531 Pro[109] (v2.10.0) with the default parameters. In brief, Hi-C reads of chicken [110] (sourced from 532 FR-AgENCODE project) and duck were mapped to the respective reference genome and only 533 uniquely mapped reads were kept. Then each uniquely mapped reads were assigned to a restriction 534 fragment and invalid ligation products were discarded. Data was then merged and binned to 535 generate the genome-wide interaction maps at 10kb and 50kb resolution. TADs were identified by 536 HiCExplorer[111] (v3.0) with the application hicFindTADs. First, HiC-Pro interaction maps were 537 transformed to h5 format matrix by hicConvertFormat with parameters: --inputFormat hicpro --538 outputFormat h5. Then the h5 matrix was imported to hicFindTADs with parameters:--outPrefix 539 TAD --numberOfProcessors 32 --correctForMultipleTesting fdr. hicFindTADs identifies the TAD 540 boundaries through an approach that computes a TAD insulation score. Genomic bins with low 541 insulation scores relative to neighboring regions were defined as local minima and called as the 542 TAD boundaries. Human CTCF[112] motif was used as a query for FIMO in MEME [113] 543 (v4.12.0) to identify the putative CTCF binding sites. CTCF density in every 10kb non-overlapping 544 sliding window along the genome was calculated to check its enrichment at the TAD boundaries. 545 We identified the A/B compartments using the pca.hic function from HiTC[114] (High Throughput 546 Chromosome Conformation Capture analysis) R package with default parameters, and the 10kb 547 matrix generated by HiC-Pro as the input. We identified the chromatin loops by Mustache[115] 548 with the parameters: -p 32 -r 10kb -pt 0.05, after converting the h5 format matrix to mcool matrix 549 format by hicConvertFormat with parameters: --inputFormat h5 --outputFormat mcool. 550

Evolutionary strata 551
To demarcate the evolutionary strata, all the repeat masked duck W-linked scaffolds were aligned to 552 emu Z chromosome using LASTZ[116] (v0.9) with parameters: --step=19 --hspthresh=2200 --553 inner=2000 --ydrop=3400 --gappedthresh=10000 --format=axt, and a score matrix set for the distant 554 species comparison. Alignments were converted into 'net' and 'maf' results using UCSC Genome 555 Browser's utilities (http://genomewiki.ucsc.edu/index.php/). Based on 'net' and 'maf' results, the 556 identity of the aligned sequence was calculated for each alignment block with a 10kb non-557 overlapped window and then we oriented the aligned W-linked sequences along the Z 558 chromosomes. Then we color-coded the pairwise sequence divergence level between the Z/W 559 sequences to demarcate the evolutionary strata. 560  Interstitial telomere sequences were labelled with green triangles on the chromosome. Putative 617 centromeres (red lines) and telomeres (green lines) were inferred by the enrichment of 618 centromeric and telomeric repeat copies, which show a sharp peak. We then show the 619 recombination rate and GC content calculated in non-overlapping 50kb windows, as well as two 620

Gene expression analyses
repeat families (GGERVL-A-int and CR1-J2 Pass) that we identified to be enriched at 621 centromeric regions and chrW. We also show the male vs. female (    Click here to access/download; Figure;Fig4 Figure 5