Transcript Length Mediates Developmental Timing of Gene Expression Across Drosophila

The time required to transcribe genes with long primary transcripts may limit their ability to be expressed in cells with short mitotic cycles, a phenomenon termed intron delay. As such short cycles are a hallmark of the earliest stages of insect development, we tested the impact of intron delay on the Drosophila developmental transcriptome. We find that long zygotically expressed genes show substantial delay in expression relative to their shorter counterparts, which is not observed for maternally deposited transcripts. Patterns of RNA-seq coverage along transcripts show that this delay is consistent with their inability to completely transcribe long transcripts, but not with transcriptional initiation-based regulatory control. We further show that highly expressed zygotic genes maintain compact transcribed regions across the Drosophila phylogeny, allowing conservation of embryonic expression patterns. We propose that the physical constraints of intron delay affect patterns of expression and the evolution of gene structure of a substantial portion of the Drosophila transcriptome.


Rejection of a kinetic mechanism restricting early expression of long zygotic transcripts
A third potential explanation for the low expression level of long transcripts followed by a progressive reduction in the ratio of short/long gene expression is transcriptional kinetics. Consider a simple model under which a single RNA Pol II complex transcribes a locus at any one time. Assuming that expression level eventually reaches saturation, multiple short transcripts could be produced in the same amount of time required to transcribe a long transcript, allowing short genes to reach saturation while delaying maximal expression of longer genes. However, there are several lines of evidence that argue against this possibility. First, while it is not known whether cotranscription of multiple RNA Pol II complexes is a common feature of long genes in general, it has been observed in situ at a number of specific loci, suggesting that such a simple model does not capture biological reality (e.g., Beyer and Osheim 1988;Osheim and Beyer 1989). Second, a purely kinetic model cannot explain the decreasing ratio of

Analysis of 5ʹ′:3ʹ′ ratios over embryogenesis
We calculated the 5ʹ′:3ʹ′ ratio for each transcript that a) was included as part of the embryonic timecourse (see Methods), b) had a transcript of at least 2 kb in length, c) had only a single TSS according to the FlyBase 5.43 annotation, and d) did not overlap another transcript, leaving 3,396 loci with 5ʹ′:3ʹ′ ratios to analyze. Only long zygotic genes show a significant change, with the 5ʹ′:3ʹ′ ratio decreasing over time (slope = -0.162, R 2 = 0.811, p = 0.00905; p > 0.05 for all other categories; supplementary figure S3). We note that extension of the regressions to all 12 time points results in a significant negative slope among all four gene categories (p < 0.05); however long zygotic genes show a significantly steeper negative slope than the other gene categories (ANCOVA, p < 0.001), as expected by predictions of the intron delay hypothesis.

No evidence for delayed initiation of transcription in long genes based on histone marks
As a further confirmation of a lack of evidence for postponement of transcriptional initiation of long zygotic genes, we explored chromatin profiles in a window 1 kb upstream of the TSS using ChIP-Seq data from the modENCODE Consortium (2010) generated using embryos collected during four-hour windows over the first 12 hours of embryogenesis. It is important to note that these windows are too broad to draw conclusions regarding chromatin occupancy during the syncytial timecourse (though a recent study found that there is no evidence for substantial chromatin organization in promoters prior to mitotic cycle 14, as revealed by the lack of We first confirmed that H3K4me3 occupancy upstream of the TSS was correlated with expression level within the embryonic timecourse (where expression level was calculated as the average RPKM among the two embryonic time points within each 4 hour chromatin timepoint; supplementary figure 4A). Indeed, while expression levels of zygotic genes were positively correlated with H3K4me3 occupancy, the correlation was stronger for short, as compared to long genes, as expected if length is limiting expression.
Notably, occupancy differences of active chromatin over the first 12 hours were inconsistent with an initiation-based model as occupancy of long zygotic genes did not increase over the time period (supplementary figure 4A). While upstream occupancy of the repressive chromatin mark H3K27me3 was negatively correlated with expression level for short zygotic genes, we observed no significant correlation in the case of long zygotic genes (supplementary figure 4B). Furthermore, there was no significant decrease in repressive chromatin mark occupancy over the period, again providing no support to a delayed initiation-based model during this period of embryogenesis.

Transcript length and zygotic origin are not associated with particular functional classes
In order to better characterize potential functional differences among transcripts, we analyzed the representation of the different size categories of maternal and zygotic  (supplementary table S3). In contrast, we found that zygotic genes did not show significant over-representation among any GO biological process categories (p > 0.05). Furthermore, when zygotic and maternal genes were compared to one another, neither long nor short transcripts were significantly overrepresented in any GO category. Consequently, there is not a clear set of functions associated with whether zygotic genes are subject to or escape intron delay.

Analysis of 5ʹ′:3ʹ′ ratios over embryogenesis
Using a custom script, combined with HTseq-count at the locus level with the 'union' option, we counted the number of reads spanning the 5ʹ′ and 3ʹ′ 1 kb of each transcript excluding any intronic sequence during the first 12 hours of development, during which zygotic activation takes place. Read counts were quantile normalized using the R aroma.light package (Bengtsson and Hössjer 2006) and RPKMs calculated.

Gene ontology analysis
All maternal or zygotic loci considered significantly expressed in either the embryonic or syncytial timecourses were analyzed for functional over-and underrepresentation using FatiGO (Al-Shahrour et al. 2007) on the Babelomics version 4.3 webserver at (http://babelomics.bioinfo.cipf.es/functional.html). Gene lists were compared either to one another or the whole FlyBase 5.43 annotation among GO biological process levels from three to nine using two-tailed tests and retaining only p values < 0.05 when adjusted for multiple tests by the software.

Analysis of histone marks
Raw developmental timecourse ChIP-seq reads derived from antibodies to histone modifications H3K4me3 and H3K27me3 were obtained from GEO (accession numbers to all datasets are found in supplementary table S2) (modENCODE Consortium 2012).
All reads were mapped uniquely to the FlyBase D. melanogaster genome release 5 using Bowtie version 0.12.8 and allowing 2 mismatches (Langmead et al. 2009). Base-level coverage was assessed in a 1 kb window upstream of the TSS of genes with a single annotated TSS. Coverage was normalized between time points and chromatin marks by dividing by the total number of mapped reads by 10 6 .  S1).

Al
Supplementary Figure S2. Reproduction of fig. 3B showing all 12 developmental time points. The pattern of more rapidly declining read coverage over the first 5 kb of transcripts for zygotic (blue) as compared to maternal (red) transcripts holds during the stages not show in fig. 3B. Furthermore, the difference in measured slopes decreases significantly over the 12 2-hour developmental windows. Note that beginning at the 4h time point, expression levels, and hence coverage patterns of genes of maternal origin are largely determined by zygotic transcription.
Supplementary Figure S3. 5ʹ′:3ʹ′ ratios of coverage among different transcript categories support the intron delay model. Under a regulatory model, the 5ʹ′ and 3ʹ′ ends of all transcripts should have relatively similar read coverage. Under the intron delay model, however, the 5ʹ′:3ʹ′ ratio of long genes should be > 1 during early development, and decrease as development progresses. Median 5ʹ′:3ʹ′ ratios over the embryonic timecourse as determined from total RNA data are indicated for zygotic (blue) and maternal (red) genes. Short (< 5 kb) and long (≥ 5 kb) genes are indicated as circles and triangles, respectively. The area shaded in grey indicates time points during which expression levels of genes of maternal origin is largely determined by zygotic transcription. The 5ʹ′:3ʹ′ ratio decreased over the first six hours of development in long zygotic genes, but not in any other category, as expected under the predictions of the intron delay model. Figure S4. The presence of heterochromatic marks measured during four hour windows during the embryonic timecourse are inconsistent with transcriptional initiation-based repression of long zygotic genes (modENCODE Consortium 2012). A) Upstream coverage of ChIP-seq data from the active chromatin mark, H3K4me3, is significantly correlated with expression level across 4-hour developmental windows spanning the first 12 hours of embryogenesis, indicating that it is predictive of active transcription. Spearman correlation coefficients are strongest for short zygotic genes, as expected if long genes are limited from proper transcription by intron delay. Median base-level coverage of H3K4me3 among long zygotic genes does not increase over this interval, as would be predicted from a purely transcription initiation-based model of delay. Error bars indicate the standard error of the mean. B) At no time point is upstream coverage of the repressive chromatin mark, H3K27me3, significantly negatively correlated with expression of long zygotic genes. Furthermore, there is no significant pattern of decreasing upstream coverage as again would be consistent with a purely transcription initiation-based model of delay. ns, non-significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001.