Reduced chromatin accessibility underlies gene expression differences in homologous chromosome arms of diploid Aegilops tauschii and hexaploid wheat

Abstract Background Polyploidy is centrally important in the evolution and domestication of plants because it leads to major genomic changes, such as altered patterns of gene expression, which are thought to underlie the emergence of new traits. Despite the common occurrence of these globally altered patterns of gene expression in polyploids, the mechanisms involved are not well understood. Results Using a precisely defined framework of highly conserved syntenic genes on hexaploid wheat chromosome 3DL and its progenitor 3 L chromosome arm of diploid Aegilops tauschii, we show that 70% of these gene pairs exhibited proportionately reduced gene expression, in which expression in the hexaploid context of the 3DL genes was ∼40% of the levels observed in diploid Ae tauschii. Several genes showed elevated expression during the later stages of grain development in wheat compared with Ae tauschii. Gene sequence and methylation differences probably accounted for only a few cases of differences in gene expression. In contrast, chromosome-wide patterns of reduced chromatin accessibility of genes in the hexaploid chromosome arm compared with its diploid progenitor were correlated with both reduced gene expression and the imposition of new patterns of gene expression. Conclusions Our pilot-scale analyses show that chromatin compaction may orchestrate reduced gene expression levels in the hexaploid chromosome arm of wheat compared to its diploid progenitor chromosome arm.


Background
Polyploidy arises from the duplication or fusion of genomes and has occurred frequently in the lineages of many organisms, from fish to flowering plants [1]. The ancestral flowering plant lineage has undergone ≥2 genome duplication events, with subsequent multiple genome duplication events in different lineages [2]. Radical alterations of gene expression patterns are commonly observed consequences of polyploidization. In newly formed al-lotetraploids of Arabidopsis arenosa and Arabidopsis thaliana ∼5% of genes were expressed at different levels than in the parental lines [3]. In Tragopogon allopolyploids, ∼76% of homoeologous genes displayed additive expression (the average of expression measured in each parental line), and ∼20% of transcripts exhibited non-additive expression in which gene expression levels varied from the average of parents [4]. Over longer time scales, a tendency of homoeologous gene expression from one parental genome to dominate over the other has been observed in several polyploids, including cotton, Brassicas, and maize [5]. The mechanisms underlying these genome-scale changes in gene expression are poorly understood.
Many crop species have undergone relatively recent polyploid events prior to and during domestication, leading to new traits and improved performance [6]. Large genomic segments such as entire chromosomes from related species are also added to crop varieties to introduce new traits [7]. Understanding how these large-scale genomic changes influence genome stability and gene expression, and how they give rise to improved performance in crops, is therefore centrally important from both practical and research perspectives. Several hypotheses have been proposed to explain genomic interactions in hybrids and polyploids, ranging from complementation of differing alleles, misregulation of gene expression, epigenetic changes, and the activities of transposable elements (TEs) [8][9][10].
The wheat group of the Triticeae is characterized by stable tetraploid and hexaploid species that exhibit greater diversity, adaptability, and potential for domestication than their diploid progenitors. Multiple types of genomic changes, including altered expression patterns of genes and TEs, and epigenetic changes, have been proposed to contribute to this "genomic plasticity" [11]. The hexaploid bread wheat genome (Triticum aestivum) arose from the very recent integration of the diploid Aegilops tauschii DD genome into a tetraploid Triticum turgidum AABB genome [12]. The 3 component genomes are very closely related, sharing common ancestry in the Triticeae lineage ∼6.5 million years ago. In newly synthesized allohexaploid wheat [13] and tetraploid AADD and S S AA (where S S is Aegilops longissima) [14] wheat lines between 60% and 80% of genes were additively expressed, suggesting a dynamic re-adjustment of gene expression patterns as a consequence of allopolyploidy. Rapid asymmetric changes in short RNA, histone methylation, and gene expression in the 2 allotetraploids are thought to contribute to genome-biased gene expression and the activation of TE transcription [14]. RNA-sequencing (RNA-seq) analyses of newly formed allotriploid ABD and stable allohexaploid AABBDD lines generated from T. turgidum and Ae tauschii showed rapid and extensive changes in gene expression in triploid tissues that were partly restored upon genome duplication [15]. The overall very high conservation of sequences and gene order of D genome chromosomes in diploid Ae tauschii and hexaploid wheat [16] provides an important opportunity to assess differences between diploid and hexaploid states of very similar chromosomes. Here we analyse gene expression, DNA methylation, and chromatin accessibility of a set of well-defined syntenic genes from diploid chromosome 3 L of Ae tauschii and hexaploid chromosome 3DL of bread wheat and show that reduced chromatin accessibility underlies large-scale changes in gene expression between diploid and hexaploid states.

Plant materials
The Paragon elite wheat (Triticum aestivum, NCBI:txid4565; AABBDD) variety was used because it is a commonly used experimental line with a sequenced genome and extensive functional genetic resources. The diploid progenitor species Aegilops tauschii (NCBI:txid200361; DD), accession AL8/78, that has a sequenced genome was used for most experiments. Two divergent Ae tauschii lines, Clae23 and ENT336, were also used for comparative transcriptomics. All plants were grown in a glasshouse with supplementary lighting (12-24 • C, 16/8 h light/dark cycle).

Chromosome 3DL arm assembly and annotation
A set of bacterial artificial chromosome (BAC) scaffolds of flowsorted chromosome 3DL [17] was extended and scaffolded using wheat Pacific Biosciences assemblies from Triticum 3.1 [18] as templates ( [19]; Additional File 1). These scaffolds were further extended using Fosill long mate-pair reads [17]. The chromosome 3DL pseudomolecule was assessed by mapping the resulting 504 scaffolds to International Wheat Genome Sequencing Consortium (IWGSC) 3D [20]. The scaffolds were localized and assigned to a specific order and strand and linked (Additional File 2). Order discrepancies were manually corrected and 100 Ns were placed between neighbouring scaffolds to mark the sequence gap (Supplementary Table S1). Chromosome 3DL genes were predicted by ab initio methods, using EST and de novo transcript assemblies. Predicted gene models were curated manually and given a confidence score using RNA evidence, protein alignments, and ab initio predictions. Pseudogenes were annotated and identified as those genes with predicted exon-intron structures conforming to the GT-AG intron rule, but with no consensus or translatable coding sequences (CDS). A total of 3,927 genes were identified on 3DL of which 3,540 were located on anchored scaffolds (Additional File 4). Seventy-five percent of the annotated genes were predicted with high confidence, and 192 pseudogenes were identified on the basis of gene models with conserved exon-intron structures that lacked an identifiable coding sequence. Approximately 70% of the assembled sequences were repetitive (Supplementary Table S2), comprising 52% class I long terminal repeat retrotransposons and 12% class II DNA transposons. Supplemental Table S3 describes the assembly of chromosome 3DL. Gene order between hexaploid wheat 3DL and Ae tauschii AL8/78 genome pseudomolecules was determined using gene annotations from Luo et al. [21]. An additional 1,266 additional genes were identified on chromosome 3 L using wheat 3DL gene models and Ae tauschii RNA-seq and transcript assemblies to assign a total of 4,121 genes to chromosome 3 L.

RNA sequence datasets
Triplicated samples of wheat and Ae tauschii AL8/78 were collected from leaves and roots of 14-day-old greenhouse-grown plants, 4-day-old seedlings germinated on filter paper, and 10 days after pollination (DAP) and 27 DAP developing grains (Supplementary Table S3), frozen in liquid nitrogen and stored at −80 • C. Total RNA was extracted as described [22]. Illumina TruSeq messenger RNA libraries were constructed according to the manufacturer's protocol. All sequencing was carried out on an Illumina HiSeq 2500, with 100 bp paired-end read metric, TruSeq SBS V3 Sequencing kit, and version 1.12.4.2 RTA.

Assay for transposase-accessible chromatin sequence datasets
Nuclei were isolated rapidly from triplicated wheat and Ae tauschii leaf protoplast samples and subjected to transposition using Nextera transposase (Illumina). Purified DNA was used as a control. Amplified tagmented DNA was sequenced on a HiSeq X Illumina sequencer (Illumina HiSeq X, RRID:SCR 016385) with 150 bp paired-end reads (Novagene, Shenzhen, China). Assay for transposase-accessible chromatin sequencing (ATAC-seq) reads matching mitochondrial and chloroplast genomes were identified and removed using the T. aestivum chloroplast genome (GenBank accession No. NC 002762), T. aestivum mitochondrial genome (GenBank accession No. AP008982), and the Ae tauschii chloroplast genome (GenBank accession No. NC 022133).

Extensive conservation of genes on wheat chromosome 3DL and Ae tauschii 3 L
Accurate long-range assemblies and consistent annotations are essential prerequisites for comparing chromosome-scale features. We focused on group 3 chromosomes of pooid grasses becase they have extensive conserved gene order and clearly defined orthologous relationships [16,23]. We prepared a BACbased long-range assembly of wheat chromosome 3DL and compared this to a long-range assembly of chromosome 3 L of Ae tauschii [21]. Comparison of the 504 3DL scaffolds with the recently available IWGSC v1.0 assembly of chromosome 3D [20] corrected 13 orientation discrepancies in the IWGSC v1 assembly, including a 25.37-Mb segment (Fig. 1A), and its collinear relationship to chromosome 3 L of Ae tauschii (Fig. 1B). Extensive conserved syntenic gene order between wheat 3DL and Ae tauschii 3 L was seen (Fig. 1C), in which 3,456 of the 3,927 predicted 3DL wheat genes align to 3 L genes with a mean of 99.66% sequence identity across their full CDS (Additional File 4). Most of the 207 non-syntenic genes on wheat 3DL were located towards the telomeric ends of the chromosome arms.

Gene expression patterns in diploid 3 L and hexaploid 3DL
Supplementary Fig. S1 shows expression profiles of 3DL and 3 L genes in 2-week-old greenhouse Paragon wheat and Ae tauschii AL8/78 plants, from 4-day-old seedlings, and from 10 and 27 DAP developing grain sampled tissues (Supplementary Table S3). In total, 2,375 (68.72%) of the syntenic genes from 3DL (1,893 genes) and/or 3 L (2,217 genes) were expressed at transcripts per million (TPM) ≥ 1 in these tissues (Additional File 4). Comparison of TPM values of syntenic gene pairs between hexaploid 3DL and diploid 3 L ( Fig. 2A) showed a significant trend in reduction of TPM values in the hexaploid context to ∼40% of those measured in diploid genomic context for ∼70% of the pairs of genes. This pattern of gene expression difference, which we called proportionately reduced expression, is shown as grey bars in Fig. 2B that mapped across the chromosome arms of wheat 3DL and Ae tauschii 3 L. To validate the use of TPM for comparing gene expression between hexaploid wheat and diploid Ae tauschii, we measured the ex-  pression of a set of 14 genes present as single copies in the wheat AA, BB, DD, and Ae tauschii DD genomes (Additional File 5) using RNA extracted from 50,000 leaf protoplasts and quantitative RT-PCR using a standard curve of different numbers of a plasmid molecule. For 11 of the 14 gene sets, absolute transcript levels showed the same pattern of reduction in expression in the 3DL hexaploid context to ∼40% of that measured in the D genome diploid context (Fig. 2C). The sum of AA, BB, and DD expression values was 1.2 times the diploid value on average (Fig. 2C), verified by our quantitative PCR results. This consistency validated the use of TPM values for comparing gene expression values in diploid and hexaploid genome contexts.

Differentially expressed genes
Differentially expressed genes (DEGs) were categorized by median-normalizing wheat TPM values to Ae tauschii TPM values by multiplying them by 1.7604 to take account of the 70% of genes that had reduced expression levels in 3DL compared to 3 L ( Fig. 2A). A threshold value of ≥4-fold change (either up in wheat or up in Ae tauschii) was then established as a conservative measure, compared to 30% variation used in whole-genome analyses [24]. This defined a class of 674 DEGs (28.38% out of the 2,375 expressed genes) in the 5 examined tissues (Supplementary Table S4). Fig. 2B describes these genes, which are represented as red bars for those with ≥4-fold increase in wheat or blue bars for ≥4-fold increase in Ae tauschii on a chromosome map. It is possible that these differences in expression were due to differences between the sequenced Ae tauschii variety AL8/78 and the donor of the D genome to bread wheat. We therefore carried out RNA-seq analyses of 2 other diverse Ae tauschii varieties, Clae23 and ENT336. Leaf RNA-seq data from Clae23 and ENT336 were mapped to 3 L ( Supplementary Fig. S2). Of the 262 (17% of 1,564 expressed genes) leaf-specific DEGs identified in wheat 3DL and Ae tauschii AL8/8 3 L, 106 were conserved in all 3 Ae tauschii varieties. Mapping of these conserved DEGs to 3DL and 3 L showed that they occurred along the entire chromosome arm, with no evidence for regional differences (Fig. 2D).
Most gene sequences (CDS and untranslated region [UTR]) had >99% sequence similarity between wheat 3DL and Ae tauschii 3 L, with a pattern of greater diversity towards chromosome ends (Fig. 3A). Promoter sequence variation was also more pronounced towards chromosome ends (Fig. 3B). However, there was no clear relationship between promoter and gene sequence divergence and differential patterns of gene expression. In addition, only 4 of the 106 leaf DEGs showed minor differences in exon-intron structure ( Supplementary Fig. S3).
We identified 86 DEGs on 3DL that were up-regulated in 27 DAP developing grain of wheat compared to Ae tauschii (Additional File 6). Sixty-three were classified using GO terms as involved in protein targeting and degradation, RNA transcription, processing, and translation regulation. Several promoter motifs were enriched in these genes, suggesting that the DD genome contributed genes with new roles in the later stages of grain development. Thirty-three of these 86 DEGS had previously defined tissue-specific gene expression patterns [25] with 27 most highly expressed in 20 DAP aleurone layer samples. These 33 genes were also expressed in 20 DAP whole endosperm, 20 DAP transfer cell, and 20 and 27 DAP starchy endosperm tissue samples. Thus hexploidy leads to differential regulation of 3 L genes during later stages of grain development in wheat. Sequence differences between gene transcripts of 106 syntenic genes with conserved differential gene expression patterns on the long arms of wheat 3DL and 3 L of Ae tauschii. The vertical axis shows percent similarities in predicted transcript sequences, and on the x-axis the telomeric end of the chromosome arm is on the right. Red dots indicate the 106 consensus DEGs among 3 Ae tauschii accessions, and the grey dots identify genes with proportionately reduced expression. Increased transcript sequence divergence is seen towards the telomere (Spearman correlation test −0.6913737 in 8-Mb bins), while DEGs and genes with proportionately reduced expression were distributed along the chromosome arm. (B) Sequence differences between promoters of 106 syntenic genes with conserved differential gene expression patterns on the long arms of wheat 3DL and 3 L of Ae tauschii. The vertical axis shows percent similarities in predicted promoter sequences, defined as 2 kb upstream of the predicted transcription start site. Red dots indicate the 106 consensus DEGs among 3 Ae tauschii accessions, and the grey dots identify genes with proportionately reduced expression. Promoter sequence divergence was also more pronounced towards the telomeres (Spearman correlation test −0.398358), but there was no clear relationship between promoter divergence and differential gene expression.

Comparative DNA methylation
The role of gene body and promoter methylation in the altered patterns of gene expression observed in chromosomes 3 L and 3DL was determined using exome capture and bisulphite sequencing of Ae tauschii AL8/78, and whole-genome bisulphite sequencing of hexaploid wheat. Reads were mapped to each complete genome, those mapping to chromosomes 3 L and 3DL were identified, and their methylation in CpG, CHG, or CHH contexts characterized. Overall levels of 3 L and 3DL methylation were similar, with 89.9%, 59.4%, and 3.8% methylation of CpG, CHG, and CHH sites for wheat and 87.1%, 53.4%, and 3.4% for Ae tauschii AL8/78 (Supplementary Table S5). Average methylation levels were assessed across normalized promoter and gene body lengths of expressed and non-expressed genes for CpG, CHG, and CHH contexts to define relationships between gene methylation and gene expression. Fig. 4 shows a decrease of CpG methylation at the transcriptional start site (TSS) and transcription termination site (TTS) compared to the promoter and gene body regions. This pattern of reduced TSS methylation of CpG and CHH contexts is more marked for expressed genes than non-expressed genes in both 3 L and 3DL. CpG and CHG methylation at the TSS (±20 bp) is significantly lower in expressed genes compared to non-expressed genes (CpG sites: P < 0.0001, t = 5.9739, df = 80; CHG sites: P < 0.0001, t = 6.3446, df = 80).
To identify differentially methylated regions (DMRs) DNA methylation at CpG, CHG, and CHH sites was averaged independently across each gene-body and promoter region. A gene/promoter region was only analysed if a minimum of 5 methylated cytosines were included in the region, each with a minimum bisulphite sequencing coverage of 5× for Ae tauschii 3 L and 10× for Paragon 3DL, because it had a higher sequence depth coverage. This identified 2,709 unique gene-regions (81.9% of the total genes) and 2,182 unique promoter regions across the CpG/CHG/CHH contexts for chromosome 3DL and 2,952 unique gene-regions (71.5% of the total genes) and 2,719 unique promoter regions for chromosome 3 L (Supplementary Table S6). Comparison of methylation between Paragon and Ae tauschii for 2,224 gene pairs in total (64.35%) across CpG/CHG/CHH gene and promoter sites (1,920 genes and 1,368 promoters) identified DMRs. DMRs were defined if a CpG region showed a difference in methylation of ≥50% (q < 0.05), a CHG region showed a difference of ≥25%, or a CHH site showed a difference of ≥10%. Only 11.3% of differentially methylated genes or promoters were correlated with differences in gene expression between Paragon 3DL and Ae tauschii 3 L (Table 1). This showed that differential promoter and gene methylation may account for only a small proportion of observed gene expression differences between diploid 3 L and hexaploid 3DL genes.

Pseudogene analyses
Pseudogene formation as a consequence of polyploidization has been proposed to be an important driver of genomic change [26]. A total of 192 pseudogenes were defined on chromosome 3DL on the basis of well-defined exon-intron structures using EST, protein, and RNA-seq evidence, but without a predicted CDS region. Among the 160 pseudogenes that were syntenic on 3DL and 3 L, 66 (43 in Paragon and 49 in AL8/78) were expressed with ≥1 TPM in the examined tissues. DNA methylation analyses of the 160 syntenic pseudogenes revealed CpG, CHG, or CHH gene methy-   Table S7 compares the methylation levels of pseudogenes with the methylation patterns of all analysed genes on 3 L and 3DL. Methylation levels were generally elevated in pseudogenes in both Paragon and Ae tauschii, by 13.2% for CpG sites and 12.6% for CHG sites, in comparison with non-pseudogenes. Methylation levels were not increased at CHH sites in pseudogenes (mean difference −0.4%). We identified 7 pseudogenes on wheat 3DL that were intact genes in Ae tauschii 3 L, indicating a potential recent origin (Supplementary Table  S8). Three of these Ae tauschii 3 L genes were expressed, but none were in wheat, confirming their identification as pseudogenes. Available methylation data for 4 of these gene pairs showed a large increase in CpG methylation in the wheat pseudogenes compared with their functional Ae tauschii counterparts.

Chromatin accessibility
Differences in chromatin accessibility are key factors in chromosome-scale patterns of gene expression changes seen in dosage compensation [27] and in large-scale transcriptional reprogramming [28]. We therefore carried out ATAC-seq [29] in nuclei prepared from hexaploid wheat and diploid Ae tauschii AL8/78 leaf protoplasts. Supplementary Fig. S4 shows the profile of ATAC fragment sizes on chromosomes 3 L and 3DL. The ∼10.5 bp periodicity reflects cleavage of the DNA helix, and a trace of single and double nucleosome spacing can be seen in the 3 L ATAC fragment length frequency plot. In both Ae tauschii and wheat, normalized density plots of ATAC peaks showed that these were mainly found in 5 UTR + promoter regions (Supple-mentary Fig. S5), consistent with more accessible chromatin in putative transcriptional regulatory regions observed in mammalian cells [29]. Peak lengths showed that regions of accessible chromatin extended to >5 nucleosomes, with a peak at 1-2 nucleosome spacing. In Ae tauschii leaf nuclei 4,960 ATAC peaks on 936 genes were identified on diploid 3 L and 2,970 ATAC peaks in 1,187 genes were identified on hexaploid 3DL leaf nuclei (Supplementary Table S9). Interestingly, 26.53% of wheat and 28.64% of Ae tauschii ATAC peaks mapped to intergenic regions. Approximately 60% of these intergenic ATAC peaks were in regions that were not annotated as repeats, while nearly 75% of the ATAC peaks that mapped to annotated repeats were found over simple sequence repeats (SSRs) (Supplementary Fig. S6A). Only a small proportion of the annotated repeats on 3DL and 3 L had accessible chromatin, the largest of which was 8% of SSR on 3 L (Supplementary Fig. S6B). ATAC peak length density plots (Fig. 5A) showed an overall restriction of accessible chromatin in genic regions of hexaploid 3DL compared to diploid 3 L, with highly significant differences in genes with both proportionately reduced and differential patterns of gene expression (Fig. 5B). Typical patterns of reduced ATAC peak coverage of promoters of 2 syntenic pairs of wheat genes compared with their Ae tauschii counterparts involved loss of additional peaks and reduced width of a common peak and are shown in Supplementary Fig. S7. Peak length distributions mapped to TSS of hexaploid 3DL and diploid 3 L genes showed that ∼60% of accessible regions were found within 100 kb of the TSS in diploid 3 L genes. In contrast, 50% of accessible chromatin was found within 3 kb of hexaploid 3DL gene TSSs (Fig. 5C). This large-scale restriction of genic chromatin accessibility in The chromosomal regions are intergenic, CDS+introns, 5 UTR+2 kb upstream putative promoter region, and 3 UTR+2 kb downstream. ATAC peak length distributions are shown in base pairs on the horizontal axis. The colour scale shows ATAC peak frequency distribution per bin. (B) Box plots of ATAC peak lengths across intergenic regions, genes with proportionately reduced expression patterns, and genes showing differential expression between wheat 3DL and Ae tauschii 3 L. The significance of ATAC peak length differences was assessed by 1-way analysis of variance. Peak length differences for balanced genes between wheat at Ae tauschii were significant at 6.99E−68, and 1.34E−10 for differentially expressed genes. (C) Normalized distance distribution of ATACsequence peaks relative to the TSS of genes in hexaploid wheat and diploid Ae tauschii genes. hexaploid 3DL compared with diploid 3 L genes mapped across the entire chromosome arms (Fig. 6).
Of the 683 genes with differential ATAC peaks between hexploid 3DL and diploid 3 L, 133 of 159 genes (84%) that were differentially expressed in leaf tissue (the tissue used for ATACseq) also had differential ATAC peaks ( Fig. 7; Supplementary Table S10). Of these, most different ATAC peaks occurred in the 5 UTR+promoter regions. These analyses show that the majority of DEGs in hexaploid 3DL had reduced chromatin accessibility mainly over their 5 UTR5+promoter regions. These patterns indicate that chromatin accessibility influenced differential gene expression.

Discussion
We used long-range assemblies and detailed annotations of homologous chromosome arms of diploid Ae tauschii (3 L) and hexaploid wheat (3DL) to characterize the expression of precisely defined syntenic gene pairs from these chromosome arms in diploid and hexaploid genome contexts. Approximately 70% of expressed gene pairs showed a proportionate reduction of gene expression in the hexaploid chromosome arm to 40% of that in the diploid chromosome. This pattern of reduced expression probably reflects balancing gene expression, in which ∼70% of 1:1:1 AABBDD homoeologs were shown to be expressed at an approximate mid-point level [24]. Similar overall reductions in expression have been observed in newly formed wheat hybrids [14,30,31], suggesting that a global re-alignment of gene expression to near-diploid levels is an early consequence of hexaploidy. The remaining 30% of gene pairs in diploid 3 L and hexaploid 3DL exhibited differential expression patterns that were significantly higher or lower in the hexaploid 3DL context compared to the diploid 3 L context. This proportion is also similar to that observed in analyses of complete wheat gene sets in the hexaploid genome [24]. We were not able to identify any relationships between sequence divergence in putative promoter or gene sequences between 3 L and 3DL gene pairs that may account for these differential expression patterns. A significant proportion of DEGs were expressed during the later stages of grain development, consistent with the differential contributions of the AA, BB, and DD genomes to functional modules involved in seed development identified by co-expression analyses [32]. Enrichment of promoter motifs in these 87 DEGs is consistent with a model in which transcriptional regulation from the AABB genomes may differentially regulate DD genes during grain development. Such emergent patterns of cis-trans interactions in polyploid genomes have been modelled as an important consequence of polyploidy [33,34].
Differences in gene methylation have been proposed to contribute to differences in gene expression between progenitor and allopolyploid species [35,36]. Only 11% of DEGs had different gene and promoter methylation patterns that might cause altered expression, indicating a relatively minor influence of methylation differences on gene expression patterns in 3DL and 3 L, which might be partly affected by gene capturing efficiency. Of the 7 pseudogenes found in 3DL that had an intact homolog in 3 L, bisulphite sequence data for 4 showed large increases in CpG methylation. This is consistent with extensive methylation of pseudogenes seen in many plant species [37,38]. Whether DNA methylation is a cause or consequence of pseudogene formation is not known in these examples.
Given the lack of evidence for differences in sequence composition (including gene loss and pseudogenization) or DNA  Relationships between differentially expressed genes and differential ATAC peaks. The Venn diagram shows 683 genes with differential ATAC peaks on any region of genes on hexaploid 3DL and diploid 3 L. In total, 133 of 159 genes that are differentially expressed in leaf tissue (the tissue used for ATAC-seq) have differential ATAC peaks. Of these, most differential ATAC peaks occurred in the 5 UTR+promoter region (shown as UTR5). methylation that could account for the major patterns of altered gene expression observed across chromosomes 3 L and 3DL, what mechanisms may be responsible? Changes in small RNA and chromatin, as measured by chromosome immunofluorescence [14], accompany the formation of new allotetraploid wheat lines, suggesting that some forms of chromatin modification may contribute to altered gene expression in wheat allopolyploids. Dynamic interplay between nucleosomes, transcription factors, and chromatin remodelling proteins alters physical access to DNA in chromatin [39] and provides a direct measure of chromatin states involved in gene expression, such as the occupation of promoter and enhancer sequences by transcription factors and other proteins. ATAC sequencing [29] was used to assess chromatin accessibility in leaf nuclei of hexaploid wheat and diploid Ae tauschii. Chromatin in promoter and 5 UTR regions of genes in the diploid context had more accessible chromatin, both in terms of peak numbers and peak lengths, than in the hexaploid context. This restriction of chromatin accessibility in the hexaploid context extended across the 3DL chromosome arm, encompassing genes with both proportionately reduced and differential expression. Proportionately reduced gene expression on 3DL was correlated with reduced chromatin accessibility compared to diploid 3 L. Differences in chromatin accessibility, as measured by ATAC, are thought to be due to passive competition for DNA between nucleosomes and transcription factors, chromatin remodelling and architectural proteins [39]. It is possible that in the hexaploid context reduced chromatin access across genic regions of 3DL may be due to altered competition for DNA between increased nucleosome formation or reduced transcription factor levels, or a combination of both.
The overall similarities in chromatin accessibility in intergenic regions of both 3 L and 3DL may be due to higher-order nucleosome packaging in heterochromatin, which is characteristic of intergenic DNA in grass genomes [40]. The relatively high levels of chromatin accessibility in SSRs may reflect possible altered nucleosome interactions with these atypical sequences, as Alu repeats influence nucleosome spacing in human cells [41].
The introduction of a divergent set of gene regulatory proteins from the 4-Gb Ae tauschii genome into a 12-Gb tetraploid nucleus may lead to altered interactions between the new set of transcription factors and nucleosomes, thus altering chromatin accessibility. Models of homoeolog expression patterns in allopolyploids that include varying affinities and concentrations of transcription factors for gene regulatory sequences in an allopolyploid showed a strong effect of inter-genome interactions on altered gene expression [34]. Recently, the conforma-tion of cotton chromosomes was shown to be strongly affected by polyploidy, which altered topologically associating domain (TAD) boundaries and chromatin states [42]. Similar changes may occur in wheat upon polyploidization.

Potential Implications
We have identified chromosome-wide changes in chromatin accessibility in a pair of homologous Triticeae chromosome arms in diploid and hexaploid genome contexts that may establish and maintain the large-scale differences in gene expression observed upon formation of polyploid genomes. A wide range of chromatin analysis methods are currently available for studying genome-scale changes in chromatin in newly formed polyploids to further explore mechanisms that impart new patterns of gene expression in polyploid genomes. These analyses will be able to establish comprehensive mechanisms that explain the rapid emergence and stable maintenance of new traits in polyploid crop plants such as bread wheat.

Transcription analyses
Reads were mapped to the complete Triticum3.1 genome assembly with 3DL assemblies replaced by our chromosome 3DL pseudomolecule. Expression differences between wheat and Ae tauschii used the HISAT2-StringTie pipeline [46] to compute TPM values (described online at [47]). Absolute quantitative RT-PCR was used to validate TPM value comparisons between the diploid and hexaploid chromosome arms. Fourteen non-DEG genes were identified and 2 pairs of primers for RT-PCR were designed by the HomoeologPrimer (HomoeologPrimer, RRID:SCR 0 17559) (Additional File 5). One pair was designed to amplify all 3 gene copies in hexaploid wheat and the single copy in diploid Ae tauschii, and a second pair was designed to specifically amplify only the DD genome copy. Triplicate samples of 50,000 leaf protoplasts from wheat and Ae tauschii AL8/78 from young leaves were collected and RNA extracted. Quantitative PCR was performed on complementary DNA using a LightCycler 480 System (Roche), and a standard curve of 8-80 m molecules of the 5,340bp plasmid pETnT was used to estimate absolute transcript levels.

Bisulphite sequencing
A full description is provided on protocols.io [48]. Bisulfite converted wheat paired-end sequences were aligned to wheat genome assemblies using Bismark (version 0.18.1) [49]. The methylation status of each cytosine residue across the sequences was identified using the Bismark methylation extractor tool and the percentage of reads methylated per cytosine residue across the wheat 3DL and Ae tauschii sequences was calculated.

ATAC-seq sequencing
A protocol adapted for wheat and Ae tauschii leaf nuclei is described (see details at [50]) and was used with the ATACseqMap-pingPipeline (ATACseqMappingPipeline, RRID:SCR 017558). Duplicate reads were removed using the Picard tools MarkDuplicates program [49]. All reads aligning to the forward strand were offset by +4 bp, and all reads aligning to the reverse complement strand were offset by −5 bp [51]. ATAC-Seq peak regions of each sample were called using MACS2 (v2.1.2 dev) [52] and only peak areas generated in all 3 independent experiments were used in subsequent analyses. ATAC-Seq peaks for which the distance between proximal ends was <10 bp were merged. Four classes of peak areas were assessed for chromatin accessibility: 5 UTR + promoter (including from the ATG to 2 kb upstream), CDS (the gene coding region and predicted intron sequences), 3 UTR + downstream (including the 3 UTR to 2 kb downstream), and intergenic (regions > 2 kb distance from genes).

Availability of Supporting Data and Materials
The chr3DL assembly and annotation data generated in this study have been submitted to the EBI European Nucleotide Archive (ENA) database (https://www.ebi.ac.uk/ena) under accession number PRJEB23358. Paragon RNA-seq data are under PRJEB29855; Ae tauschii AL8/78 RNA-seq data are under PR-JEB23317 as described in the Ae tauschii genome article [21]. Ae tauschii Clae 23 leaf RNA-seq data are under PRJEB29859; Ae tauschii ENT336 leaf RNA-seq data are under PRJEB29860; Paragon ATAC-seq data are under PRJEB29868; AL8/78 ATAC-seq data are under PRJEB29869. Gene methylation data for Ae tauschii AL8/78 3 L and Paragon wheat 3DL data are under PRJEB31186. All supporting data and materials are available in the GigaScience database GigaDB [53].  Figure S7. ATAC peaks on pairs of syntenic genes in hexaploid wheat and diploid Ae tauschii". "Table S1 "Sequence assembly of wheat chromosome 3DL Scaffolds