Increased gene dosage and mRNA expression from chromosomal duplications in Caenorhabditis elegans

Abstract Isolation of copy number variations and chromosomal duplications at high frequency in the laboratory suggested that Caenorhabditis elegans tolerates increased gene dosage. Here, we addressed if a general dosage compensation mechanism acts at the level of mRNA expression in C. elegans. We characterized gene dosage and mRNA expression in 3 chromosomal duplications and a fosmid integration strain using DNA-seq and mRNA-seq. Our results show that on average, increased gene dosage leads to increased mRNA expression, pointing to a lack of genome-wide dosage compensation. Different genes within the same chromosomal duplication show variable levels of mRNA increase, suggesting feedback regulation of individual genes. Somatic dosage compensation and germline repression reduce the level of mRNA increase from X chromosomal duplications. Together, our results show a lack of genome-wide dosage compensation mechanism acting at the mRNA level in C. elegans and highlight the role of epigenetic and individual gene regulation contributing to the varied consequences of increased gene dosage.


Introduction
Chromosomal aneuploidies or copy number variations (CNVs) are associated with a wide range of phenotypes in many organisms (Tang and Amon 2013;Durrbaum and Storchova 2016). To understand the effect of gene dosage on gene expression, a series of studies compared DNA copy number and mRNA levels in aneuploid cells and animals (Kojima and Cimini 2019). In plants and mammalian cells, partial and full aneuploidies showed complex responses to alterations in gene dosage, with secondary effects on the transcriptome and nonlinear correlation between copy number and expression (Aït Yahya-Graison et al. 2007;Huettel et al. 2008). In Drosophila melanogaster, buffering mechanisms were proposed to act through the gene regulatory networks to dampen the effect of gene dosage (Zhang et al. 2010;Malone et al. 2012). In Saccharomyces cerevisiae, gene expression largely correlated with copy number with some variation between genes (Gasch et al. 2016;Torres et al. 2016;Taggart and Li 2018).
While there is no effective compensation of chromosomal aneuploidies for the rest of the genome, X chromosomes are highly regulated in many animals. Typically, females (XX) contain twice the number of X chromosomes than males (XY) but the 2 sexes express similar levels of X chromosomal transcripts. Strategies that equalize X chromosome expression between sexes differ in different animals. In mammals (human and mouse), one of the X chromosomes is silenced in XX females; in flies (Drosophila), the single X chromosome is upregulated by a factor of 2 in XY males; and in worms (Caenorhabditis), both X chromosomes are downregulated by a factor of 2 in XX hermaphrodites (Meyer 2005;Samata and Akhtar 2018;Dossin and Heard 2021). While strategies differ, in each system, a multisubunit protein complex specifically targets and regulates X chromosome transcription in one of the 2 sexes through epigenetic mechanisms (Ercan 2015).
A high rate of CNVs was reported in mutation accumulation experiments in Caenorhabditis elegans (Lipinski et al. 2011;Konrad et al. 2018). In addition, 80% of the genome could be isolated as chromosomal duplications in the laboratory, leading to the conclusion that the worm is relatively tolerant to increased gene dosage (Hodgkin 2005). To understand the effect of increased gene dosage in C. elegans, we characterized large chromosomal duplications in several strains using DNA-seq and analyzed the effect of increased gene dosage in 3 chromosomal duplications and 1 fosmid insertion strain using mRNA-seq. Like other species, there was a complex response to increased gene dosage in C. elegans. While the average mRNA level increased for genes located within large duplications and the integrated fosmid, genes within the same chromosomal duplication showed varying levels of mRNA increase. An X chromosomal duplication that recruits the somatic dosage compensation complex (DCC) showed lower mRNA increase compared to one that did not recruit, also demonstrating the contribution of epigenetic regulation to the effect of gene dosage.

DNA-seq
At least 20 worms were hand-picked as young adults. Worms were washed by settling animals at least 3 times with 1 ml M9 and starved overnight to remove gut bacteria. Following a final M9 wash, worms were resuspended in 100 ml TE and frozen. For DNA isolation, 400 ll of lysis buffer (0.1 M Tris-HCl; 0.1 M NaCl; 50 mM EDTA; 1.25% SDS) was added and worms were sonicated using Bioruptor 30 s on/off at high for 30 min. Sonicated DNA was isolated using Qiagen MinElute kit and Illumina DNA sequencing libraries were prepared as described previously (Albritton et al. 2014). Single-or paired-end sequencing was performed using Illumina HiSeq-2000 and aligned to genome version WS220 (ce10) using Bowtie2 (version 2.3.2) with default settings (Langmead and Salzberg 2012). All replicate and read number information is provided in Supplementary File 1. Samtools version 1.6 (Li 2011) was used to merge replicates before running bamCompare from Deeptools version 3.3.1 (Ram ırez et al. 2016), using the following options: -binSize 500, -scaleFactorsMethod None, -normalizeUsing CPM, -operation log2, -minMappingQuality 30, -outFileFormat bedgraph, and -ignoreDuplicates. Copy number analysis was performed with CNVnator version 0.3.3 (Abyzov et al. 2011) comparing data from mutant strains to reference genome WS220 (ce10) using bin_size ¼ 1,000. CNVnator output files listing deletions and duplications are provided in Supplementary File 2. Overlap of CNVs with genes was determined by Galaxy (https://usegalaxy.org/) tools using coverage option in "Operate on Genomic Intervals" (Afgan et al. 2018) and provided in Supplementary File 3. Only genes within the duplications and deletions greater than 10 kb were considered for further analysis.

mRNA-seq
Mixed-stage embryos were collected from N2, TY1916, BC4289 and SP117 by bleaching gravid adults. L2-L3 worms were collected by plating embryos and growing for 22-26 h at 22.5 C, and young adults next day. Worms were resuspended in at least 10 volumes of Trizol and stored at À80 C. RNA preparation was performed as previously (Albritton et al. 2014). Briefly, samples were freeze cracked 3-5 times, followed by TRIzol purification, and cleaned up with Qiagen RNeasy kit. From 0.5 to 10 lg of total RNA, mRNA was purified using Sera-Mag oligo(dT) beads (Thermo Scientific), sheared and stranded Illumina sequencing libraries were prepared using a previously published protocol (Parkhomchuk et al. 2009). Sequencing was performed with Illumina HiSeq-2000 and reads were aligned to genome version WS220 with Hisat2 version 2.2.1 (Kim et al. 2019) using default parameters. Count data were calculated using HTSeq version 0.13.5 (Putri et al. 2022) and differential expression was performed using the R package DESeq2 version 1.30.1 (Love et al. 2014). FPKM values were generated using Cufflinks version 2.2.1 with optionsp 8 -library type fr-first strand (Trapnell et al. 2010). FPKM and DEseq2 output are provided in Supplementary File 4. In Fig. 3, log2 fold change values were median centered by subtracting the genome median from each value.

Gene enrichment analysis in OP37
Genes differentially expressed in OP37 compared to N2 were analyzed using the Wormbase tool Gene Set Enrichment Analysis (Angeles-Albores et al. 2016). PHA-4 ChIP-seq binding peaks from OP37 L3 larvae were downloaded from the modERN project (Kudron et al. 2018). The results of the enrichment analyses and the list of differentially expressed genes are provided in Supplementary File 5.

Characterization of chromosomal duplications and deletions using DNA-seq
To analyze the effect of gene dosage on mRNA expression, we used previously isolated strains with megabase-scale duplications (Supplementary File 1). Since prior characterization of the duplications was done by visible genetic markers (Hodgkin 2005;Edgley et al. 2006), we performed DNA-seq to map genes that were duplicated or deleted. First, we calculated average read coverage within 1-kb windows tiled across the genome. Plotting the ratio of coverage to wild type confirmed previously mapped duplications ( Fig. 1).
To further characterize the strains, we used CNVnator to generate a list of genomic windows with increased and decreased coverage compared to N2 (Abyzov et al. 2011;Supplementary File 2). In addition to the mapped changes, many of the strains showed smaller CNVs that were different from the N2 strain ( Fig. 2). Some CNVs were common among the duplication strains, which may have originated from a laboratory strain polymorphic to N2 (Vergara et al. 2009). Notably, there were multiple smaller deletions and insertions within and at the boundaries of the larger duplications (Supplementary File 2). One example is the presence of a duplication near $40 kb deletion on the X chromosome in the VC100 strain. The presence of additional changes in the boundaries may be due to imperfect repair of the double-strand breaks used to induce aneuploidies, followed by selection to laboratory conditions (Farslow et al. 2015).

Increased gene dosage and mRNA expression from chromosomal duplications
To analyze the effect of chromosomal duplications on gene expression, we performed mRNA-seq in 2 homozygous-viable strains with various length and number of duplications. SP117 contains a $5 Mb X chromosomal duplication attached to chromosome I (mnDp10). TY1916 contains a $1 Mb duplication from the X chromosome attached to chromosome IV (yDp11). TY1916 also contains previously uncharacterized changes involving the right arm of chromosome IV and the left end of chromosome V (Fig. 1). The left tip of chromosome V coverage (0.67) is similar to that of the characterized duplication from the X chromosome (0.60), thus appears to be duplicated. In the case of region with increased coverage on chromosome IV, the median ratio of coverage is lower (0.40) than that of the X duplication (0.60), thus this region may be present as free duplication or heterozygous in the population of worms used for DNA-seq.
To identify the genes affected by the chromosomal duplications, we used the CNVnator defined regions and categorized genes as duplicated or deleted (if transcription start-end of the gene fully overlaps with the CNV), affected (if overlap is 1 bp or more), or unaffected (no overlap; Supplementary File 3). We then plotted the log2 ratio of mRNA coverage in the duplicated strain compared to control along the wild-type chromosomes, highlighting genes within each category. Across all duplications, average mRNA-seq expression was increased, suggesting that gene dosage correlates with mRNA expression in C. elegans (Fig. 3). Notably, the effect on individual genes was variable, showing a range of log2 ratios for genes located within the same duplication (Supplementary File 4).
In mRNA-seq analyses using ratios, inclusion of lowly expressed genes reduces the magnitude of observed effect (Deng et al. 2011). To address this problem, we filtered out genes with FPKM values less than 1 in any wild-type replicate, and replotted the 3 large duplications in SP117 and TY1916 (Supplementary Fig.  1a). The median mRNA-seq increase for all 3 duplications was similar with or without filtering (1.58 vs 1.51 for mnDp10, 2.34 vs 2.49 for yDp11, 1.51 vs 1.43 for duplication on chr IV in TY1916), thus lowly expressed genes did not significantly skew the analysis. We also plotted FPKM expression values for each gene between the large duplication strain SP117 and wild type ( Supplementary Fig. 1b). Although the correlation is noisier at lower FPKM values, the shift in mRNA level is clear for genes located at the duplication.
To probe further into variability, we addressed if the tighter scatter of log2 ratios on the X chromosome (Fig. 3) is due to lower expression noise, which was shown for the single X chromosome upregulated by the DCC in flies . To address noise, we calculated the coefficient of variation for each gene (filtering out those with FPKM < 1) using wild-type mRNA-seq data replicates in embryos, larvae, and young adults ( Supplementary Fig.  1c). Overall, there was less variation in larvae, likely due to temporal dynamics of embryogenesis in embryos and germ cell development in young adults adding variability between collection of worms for mRNA-seq replicates. In all 3 developmental stages, the median coefficient of variation for X chromosomal genes was in between other chromosomes, suggesting that dosage compensation in worms does not reduce expression noise below that of autosomes.

Increased mRNA expression and indirect effects of a multicopy integrated fosmid
While megabase-scale chromosome duplications showed an average increase in mRNA level, we wondered if smaller chromosomal segments with higher copy numbers also increase mRNA expression. To this end, we used a multicopy fosmid integration strain, where GFP-3xflag tag was inserted to the C terminus of pha-4 gene within a fosmid containing 3 other genes (Sarov et al. 2012). The fosmid was then integrated into the genome randomly and a transgenic line that was vigorous and expressing the GFPtagged pha-4 was selected. This strain named OP37 was used to study the binding sites of pha-4, a transcription factor required for several developmental processes including the pharynx (Gaudet and Mango 2002;Zhong et al. 2010). The copy number for the fosmid was calculated to be 5.6 (Sarov et al. 2012).
The genes within and surrounding where the fosmid originated from are shown along with the DNA read coverage highlighting the increased gene dosage (Fig. 4a). The insertion site of the fosmid has not been mapped. Notably, the mRNA level of all 4 genes including pha-4 was increased in the OP37 strain compared to N2 control (Fig. 4b)  GFP tagging on mRNA expression or detection. The mRNA level of the other 3 genes on the fosmid increased by a factor of $3.6to 4.8, reflecting the increase in their copy number.
Highlighting the indirect effect on the transcriptome, 871 genes were identified to be up-or downregulated in the OP37 strain, despite the fosmid including only 4 genes. Analysis of genes that were differentially expressed revealed no enrichment for PHA-4 bound genes (25% of upregulated and 27% of downregulated genes were bound by PHA-4 compared to 27% of all genes; Fig. 4c). It is possible that $2.4-fold increase in pha-4 mRNA level does not cause an increase at the protein level or that the presence of multiple genes within the fosmid reduces pha-4 specific changes on the transcriptome. The differentially expressed genes in the OP37 strain showed tissue-specific enrichment possibly reflecting the developmental functions of pha-4 and the neighboring genes ( Fig. 4d; Supplementary File 5).

The mRNA increase from a X duplication differs across developmental stages
Next, we wondered how epigenetic regulation of the X chromosomes in different tissues would affect the consequence of an X chromosomal duplication. To address this question, we first considered the germline, where the X chromosomal genes are repressed compared to autosomal genes during early meiosis (Schaner and Kelly 2006). We reasoned that if a X chromosomal duplication is also subject to repression, the mRNA increase would be lower in adult worms. Indeed, genes located within the 5 Mb X chromosomal duplication in the SP117 strain showed less increase in mRNA expression in young adults compared to embryo and larvae (Fig. 5a). X chromosome repression in the germ cells is measurable in whole animals because germ cells outnumber the somatic cells in adults, leading to lower X chromosome expression compared to autosomes (Fig. 5b). Thus, our results suggest that repression of genes within the 5 Mb X chromosomal duplication in the germline negates the mRNA effect of the duplications specifically in this tissue.
The duplication strain BC4289 contains a $1 Mb translocation from chromosome IV to X allowing us to test if autosomal genes attached to the X chromosome are repressed in germ cells. Unlike the 5 Mb X-to-I translocation in SP117, the 1 Mb IV-to-X translocation in BC4289 showed a similar increase in mRNA expression in young adults compared to larvae (Fig. 5c). Thus, germline repression may not affect autosomal genes even when they are physically attached to the X chromosome. In the future, multiple autosome to X duplications should be analyzed to test the generality of this conclusion. Nevertheless, the SP117 data suggest that epigenetic regulation of X chromosomal genes reduces the effect of a large X duplication on mRNA expression in the germline.

Somatic dosage compensation reduces the level of mRNA increase in X chromosomal duplications
In C. elegans, hermaphrodite X chromosomes are repressed by a factor of 2 by somatic dosage compensation (Kramer et al. 2016  specifically binds to and represses transcription initiation from both X chromosomes (Kruesi et al. 2013;Kramer et al. 2015). DCC is recruited specifically to the X chromosomes by several cisregulatory elements on the X (McDonel et al. 2006;Ercan et al. 2007;Jans et al. 2009;. Recruitment leads to spreading of the complex to nearby chromatin, creating robust binding across the X chromosomes Street et al. 2019).
Previous studies used immunofluorescence to analyze the ability of several X chromosomal duplications to recruit the DCC (Csankovszki et al. 2004;Blauwkamp and Csankovszki 2009). These studies found that $1 Mb duplication in TY1916 (yDp11) does not recruit the DCC. Although one strong and one weak recruitment sites are present within yDp11, insertion of few recruitment elements is not sufficient to recruit high levels of DCC to an autosome , explaining the lack of robust DCC binding to this shorter duplication. The same stretch of DNA within the larger $5 Mb duplication in SP117 (mnDP10), is bound by the DCC (Csankovszki et al. 2004), due to the additional $4 Mb region containing 5 strong and 15 weaker recruitment elements. Since genes in yDp11 are also in mnDP10, we were able to test if epigenetic repression by the DCC reduced the effect of increased dosage across the shared genes (Fig. 5d). Indeed, the average mRNA increase from the 1 Mb region differed in the 2 strains (Fig. 5e). The median increase of mRNA expression from the 1 Mb duplication was lower when bound by the DCC (SP117) compared to when unbound (TY1916).
In TY1916, there are 4 copies of genes located within the 1 Mb yDp11; 2 X chromosomal copies under DCC-mediated repression and 2 duplicated copies on chromosome IV without DCC. In SP117, all 4 copies of the genes within the commonly duplicated region between mnDp10 and yDP11 are under DCC-mediated repression. In both strains, the expected level of median mRNA increase (based on copy number and the assumed DCC repression by a factor of 2) was lower than that of observed; $2.3-fold vs 3 in TY1916 and $1.5-fold vs 2 in SP117 [after filtering genes (FPKM < 1) median increase was $2.5 for TY1916 and 1.5 for SP117]. It would be interesting to know if other mechanisms of repression such as lamina-mediated organization of repressive chromatin contribute to repression of X chromosomal duplications (Snyder et al. 2016;. Regardless, the lower level of mRNA increase from the same set of duplicated genes in the SP117 (DCC-bound) strain compared to TY1916 (DCC-unbound; Fig. 5f) indicates that DCC-mediated repression reduced the consequence of increased gene dosage on mRNA expression.

Discussion
In C. elegans, several observations supported the idea that this organism is robust to changes in gene dosage in laboratory (Hodgkin 2005;Lipinski et al. 2011;Sarov et al. 2012;Konrad et al. 2018). Here, we showed that tolerance to increased gene dosage is not due to a genome-wide compensation mechanism acting at the mRNA level. However, it remains possible that effective mechanisms of compensation exist at the protein level (Dalley et al. 1993;Chang and Liao 2020). Post-transcriptional control of gene expression is common in C. elegans particularly in the germline (Nousch and Eckmann 2013), a tissue whose function is central to the viability of all isolated strains.    I II III IV V  I II III IV 5. mRNA expression in X duplications subject to repression in the soma and germline. a) SP117 contains a large $5 Mb duplication of chr X that is attached to chr I. Violin plots of log2 fold change from DESeq2 analysis of SP117 compared to N2 in embryos, L2-L3 larvae and young adult worms. Significance of difference between average log2 fold change for genes within and outside the duplication from chr X is tested by t-test. Increased mRNA expression from the duplication in embryos and larvae is highlighted with an arrow. b) Violin plot of log2 fold change from DESeq2 analysis of YA to L2-L3 in N2 and SP117. X chromosomes are repressed in the germ cells that are present in young adults but not in L2-L3 larvae Difference in average change in expression between X and A is tested by t-test. c) BC4289 contains a $1 Mb chr IV duplication attached to chr X. Violin plots of log2 fold change from DESeq2 analysis of BC4289 compared to N2 in L2-L3 larvae and young adults is shown. Significance of difference between average log2 fold change of genes within and outside the duplication from chr IV is tested using t-test. d) A schematic representing N2 control, SP117 and TY1916 genomes. The 1 Mb duplicated region (dark blue) in TY1916 (yDp11) does not bind to the somatic DCC. The 4 Mb uniquely duplicated region in SP117 (mnDp10) is indicated in light blue. Within mnDp10, genes within the duplicated region that overlap with yDp11 (dark blue) is bound by the DCC. e) Violin plots of log2 fold change of SP117 and TY1916 compared to N2 in L2-L3 larvae. The 1.5-and 2.5-fold refer to the ratio of the median for the 1 Mb common duplicated region compared to the unaffected X chromosomal genes [lowly expressed genes are filtered (FPKM < 1)]. The arrows indicate the mRNA-seq increase in the duplicated regions, with P-values generated from t-test. f) Violin plot of log2 fold change ratio for SP117 directly compared to TY1916. The increase in mRNA expression of the same set of genes is lower in SP117 when $1 Mb X chromosome duplication is bound by the DCC compared to when unbound in TY1916.
Our analysis of chromosomal duplications in C. elegans indicates that similar to other organisms (Kojima and Cimini 2019), increased gene dosage leads to increased mRNA expression with a high degree of variability between genes (Fig. 3). The variation in response to gene dosage could be due to measurement errors based on technical and biological variation in mRNA expression. It could also be due to feedback through gene regulatory networks (Malone et al. 2012). Indirect effect of increased gene dosage on the transcriptome was particularly evident in the fosmid integration strain (Fig. 4). It is possible that compensatory changes in the expression of other genes render chromosomal duplications and CNVs viable in the laboratory.
Partial aneuploidies and CNVs in specific genes have been implicated in a wide range of diseases (Inaki and Liu 2012;Tang and Amon 2013;Durrbaum and Storchova 2016). To determine the cause and consequences of these CNVs on organism phenotypes, it is necessary to understand the variation both between genes and between tissues. Here, our results demonstrate that tissuespecific mechanisms of X chromosomal gene repression control the level of mRNA increase from certain duplications in the germline and the soma. Thus, increased gene dosage could result in varying levels of mRNA-seq increase, when the same set of genes is under different epigenetic regulation. Overall, our work adds C. elegans to the line of research analyzing partial aneuploidies, demonstrating a lack of genome-wide dosage compensation mechanism acting at the mRNA level, highlighting gene-to-gene variability and the contribution of epigenetic regulation to the mRNA response to gene dosage.

Data availability
A list of new and published data used in this study is provided in Supplementary File 1 with GEO accession numbers. The new data are available at GEO Series number GSE198682. Supplementary Material is included at figshare: https://doi.org/ 10.25387/g3.19762756.