Single-cell transcription profiles in Bloom syndrome patients link BLM deficiency with altered condensin complex expression signatures

Abstract Bloom syndrome (BS) is an autosomal recessive disease clinically characterized by primary microcephaly, growth deficiency, immunodeficiency and predisposition to cancer. It is mainly caused by biallelic loss-of-function mutations in the BLM gene, which encodes the BLM helicase, acting in DNA replication and repair processes. Here, we describe the gene expression profiles of three BS fibroblast cell lines harboring causative, biallelic truncating mutations obtained by single-cell (sc) transcriptome analysis. We compared the scRNA transcription profiles from three BS patient cell lines to two age-matched wild-type controls and observed specific deregulation of gene sets related to the molecular processes characteristically affected in BS, such as mitosis, chromosome segregation, cell cycle regulation and genomic instability. We also found specific upregulation of genes of the Fanconi anemia pathway, in particular FANCM, FANCD2 and FANCI, which encode known interaction partners of BLM. The significant deregulation of genes associated with inherited forms of primary microcephaly observed in our study might explain in part the molecular pathogenesis of microcephaly in BS, one of the main clinical characteristics in patients. Finally, our data provide first evidence of a novel link between BLM dysfunction and transcriptional changes in condensin complex I and II genes. Overall, our study provides novel insights into gene expression profiles in BS on an sc level, linking specific genes and pathways to BLM dysfunction.


Introduction
Bloom syndrome (BS, MIM: 210900) is a rare genetic disease that was first described in 1954 in children with growth retardation and sunlight sensitivity (1). BS is defined as an autosomal recessive disease with clinical characteristics of primary microcephaly, pre-and postnatal growth deficiency, short stature, immunodeficiency and a predisposition to cancer with an early age of onset (2)(3)(4). A narrow face with a sunlight-sensitive erythematous rash on the cheeks and café-au-lait spots are the differential characteristics of BS (1,5). Mainly, biallelic loss-of-function mutations in BLM including nonsense and missense variants as well as frameshifting indels cause the BS phenotype (4). The BLM gene encodes the BLM RecQ-like helicase, an enzyme that unwinds double-stranded DNA during DNA replication and repair processes (6). Furthermore, the BLM helicase forms a part of a multiprotein complex called BTR complex, which also includes topoisomerase III alpha (TOP3A) and RecQ-mediated genome instability protein 1 (RMI1) and 2 (RMI2) (7)(8)(9). Recently, homozygous frameshift mutations in TOP3A and RMI1 as well as loss of the RMI2 gene as a result of a large homozygous deletion have been linked to a BS phenotype (10,11).
The BTR complex plays important roles in the maintenance of genome stability, dissolving DNA intermediates such as G-quartets, D-loops and Holliday junctions (12). The BLM helicase moves the branches of double Holliday junctions toward each other and the topoisomerase III alpha unhooks the hemicatenated structure, resulting in noncrossover products (13,14). For now, the BTR complex is the only known protein complex that is able to dissolve the double Holliday junctions in such a manner that no exchange between sister chromatids occurs during the homology-directed repair process (15,16). Further actions of the BTR complex such as localization to stalled replication forks and DNA end resection during double-stranded break (DSB) repair emphasize its importance for the protection of the genetic material (17). The cellular phenotypes of BS related to genomic instability include, e.g., mitotic aberrations such as chromatid breaks, anaphase bridges and high levels of somatic mutations (11,16,18). Gene expression levels are also affected in BS, but studies investigating the transcriptional profile alterations in BLM deficiency are rare. One recent study examined the expression profiles of BS fibroblasts by means of RNA microarray and found out that the expression levels of the genes with G4 motifs were deregulated in BS cell lines (19). Another recent study investigated the expression levels in BS by microarray analysis and emphasized the regulation of inflammatory interferon-stimulated genes in BS cells (20). Montenegro et al. (21) performed RNA-seq for two BS patients and reported abnormal expression levels of immune response and apoptosis-associated genes. Here, we report a comprehensive study of transcription profiles of BS patient fibroblast cell lines obtained by single-cell (sc) transcriptome sequencing. We define specific deregulated gene expression levels as well as specific pathway alterations in BS cells and highlight the power of sc transcriptomics for elucidating the molecular pathogenesis in BS.

Results
Two age-matched wild-type (WT) control fibroblast cell lines (GM08398 and GM00409) and three BS patient fibroblast cell lines (GM02520, GM02932 and GM02548; purchased from Coriell Institute for Medical Research) were used for scRNA profiling. The causative mutations were confirmed using Sanger sequencing on DNA isolated from each cell line (Supplementary Material, Fig.  S1). For single-cell transcriptome sequencing (scRNAseq), single cells were dispensed to a 5184-well microchip using ICELL8 system (Takara Bio) (22). We prepared libraries using the Nextera XT DNA library preparation kit (Illumina) and sequenced on an Illumina HiSeq4000.

Quality control of sc transcriptome data
To determine the differences in transcriptional profiles between the BS patient cells and WT cells, we performed scRNAseq on three BS patient fibroblast cell lines and two WT fibroblast cell lines (Supplementary Table S1). Quality control (QC) of the raw data was carried out by the CogentAP pipeline and a total of 567 out of 653 cells passed the initial QC filter. A total of 70.4% of the total reads mapped to exonic reads (Fig. 1A). The mean of the read counts per cell varied between 190 586 and 244 819 reads (Fig. 1B) and the number of expressed genes per cell was ∼7000 for each cell line (Fig. 1C). In addition, the level of ribosomal RNA contamination in the raw data was low (almost zero), as well as the level of mitochondrial RNA percentage, which was below our set quality target of 10%. (Supplementary Material, Fig. S2). Next, the dimension reduction of the data was done by using Uniform Manifold Approximation and Projection (UMAP) (23) (Fig. 1D). Despite some overlapping clusters that could be due to the nature of the UMAP (24), this analysis displayed that cells belonging to the same cell line were clustering at close positions in space and had similar expression profiles, whereas cells from different cell lines were distinguishable in terms of their localization in the UMAP and their expression profiles (Fig. 1D). Together, these results showed that the quality of the obtained scRNAseq data was robust and suitable for further analysis of transcriptional profiles of BS and WT fibroblasts on an sc level.  Table S1). We compared scRNAseq data from fibroblasts of three BS patient fibroblast in comparison to two WT controls and analyzed the results via overrepresentation analysis (ORA) for various gene sets. ORA for gene ontology (GO) terms revealed highly significant terms related to molecular signatures of BS such as mitotic defects and chromosomal aberrations (25). Strikingly, the detected deregulated terms amongst the top 20 most significant GO terms were mainly related to BS pathogenesis ( Fig. 2A). The GO term with the highest significance level was 'chromosome segregation' with >100 deregulated genes in the group, followed by 'mitotic nuclear division' and 'mitotic sister chromatid segregation' classes as the second and third most significant GO terms, respectively. Overall, ORA for GO terms from single cells revealed highly deregulated pathways in the BS cells in line with the molecular characteristics of the disease.
Next, we performed ORA using the Medical Subject Headings (MeSH) (26) to determine the association of the deregulated genes with disease-related terms. The top 20 headings with the lowest adjusted P-value contained terms relating to BS pathogenesis, such as genomic instability and various cancer-associated terms (Fig. 2B). In addition, we observed BS as a detected MeSH, which can be interpreted as an additional confirmation of the respective cell lines. The first three most significant terms of this analysis were 'aneuploidy', 'chromosomal instability' and 'genomic instability', each with >50 deregulated genes. Considering the cellular function of the BLM helicase in DNA replication and repair processes, deregulated gene expression levels of the three highest ranking MeSH terms (aneuploidy, chromosomal and genomic instability) were expected outcomes obtained by the sc transcriptome data of BS fibroblasts.
Moreover, we observed that all of the top 20 terms of the MeSH-based analysis were cancer-related terms and included cancer predisposition syndromes such as Fanconi anemia (FA) and Li-Fraumeni syndrome (27,28). One of the major and well-defined clinical characteristics of BS is the predisposition to different cancer types with a mean age of onset of 24 years (3). Our results obtained More than 70% of all reads mapped to exonic reads. (B) After removing cells with <10 000 total reads or <300 genes, the number of reads per cell was around 250K for all samples including two WT (WT1, WT2) and three BS patient fibroblast cell lines (BS1, BS2, BS3). (C) After removing genes with <100 total reads or expressed in fewer than three cells, the overall gene number per cell was between 7000 and 8000 genes for all samples. (D) The UMAP analysis of raw data presents the different gene expression profiles of each cell line originating from healthy and patient cell lines. Clustering between cells from different cell lines can be observed on the UMAP because of differential expression profiles of single cells. The plot was generated using ggplot2 package for R-programming language (72).
from MeSH analysis of sc expression profiles were in line with the clinical synopsis of BS, providing a molecular link between gene expression levels and clinical cancer predisposition characteristics of patients with BS.

Genes of the FA pathway are enriched in BS
We scrutinized the deregulated pathways in relation to known functions of the BLM helicase. BLM helicase is known to act on DNA intermediates that occur during DNA replication and repair processes such as the homologous recombination pathway (15). The function of BLM in these pathways has been widely studied as a part of the BTR complex (16,29). Interestingly, the expression levels of members of the BTR complex were not significantly deregulated in our scRNAseq data, indicating that dysregulation of BLM expression does not significantly affect the expression levels of the other members of the BTR complex, namely, TOP3A, RMI1 and RMI2. Still, we can currently not exclude an effect of BLM deletion on protein stability of BTR complex members (Supplementary Material, Table S2).
We then investigated the expression levels of genes of the FA pathway, which was already detected by ORA for MeSH as a highly deregulated pathway (Fig. 2B). Mutations in the various genes of the FA pathway are linked to the autosomal recessively inherited FA disorder (30), and interestingly, FA shows overlapping clinical features with BS such as primary microcephaly, growth deficiency and cancer predisposition (31). On the molecular level, FA and FA-associated proteins take part in cellular processes such as DNA replication, DSB repair and replication fork maintenance (32). Therefore, we first assessed the cell cycle stages bioinformatically as previously described (33) and observed a similar proportion of cells in the G2/M phase and a similar expression pattern for specific cell cycle markers for both sample groups ( Supplementary Material, Fig. S3). It is well established that various FANC proteins directly interact with BLM during these processes (34)(35)(36). We continued our analysis by comparing the expression levels of each gene of the FA group and observed several significantly deregulated genes of the FA pathway (Fig. 3). Interestingly, almost all of the significantly deregulated genes had higher expression levels in the BS fibroblasts compared with control fibroblasts (Supplementary Material, Table S3). One of the important interaction partners of BLM is FANCM at stalled replication forks (35) and another important interaction of BLM was observed with BRCA1, e.g. at the telomeres of chromosomes (37). Therefore, it should be noted that both of these genes had higher expression levels in BS cell samples compared with WT controls (Supplementary Material, Fig. S4). Additionally, FANCD2 expression was among the largest differences in expression levels in the class of FA-related genes, followed by FANCI expression level (Supplementary Material, Table S3).

Condensin complex I and II genes are overexpressed in BS cells
Condensin complexes are ring-like multiprotein complexes that organize the DNA and chromatin in loops to form the cylinder-shaped mitotic chromosomes (38,39). Thereby, condensin complexes are important for, e.g., maintaining chromosomal stability, correct segregation of chromatids and transcriptional control (40). Condensin complexes I and II are composed of the SMC proteins (SMC2 and SMC4) and the non-SMC proteins. Our sc transcriptome data analysis surprisingly revealed a highly significant upregulation of the genes coding for either condensin I or condensin II complexes, namely, SMC2, SMC4, NCAPG, NCAPG2, NCAPH, NCAPD2 and NCAPD3 (Fig. 3, red color). NCAPH2, which encodes a condensin-II-complex-associated protein, was not significantly altered in our data; however, we interpret this result as a putative gene dropout because of the sc analysis (41). The gene expression levels of most of the members of condensin I and II complexes were approximately two times higher in BS cells than in control cells and the changes were highly significant (Supplementary Material, Table S3). Based on our transcriptome data obtained from single cells, we hypothesize a link between BS and expression of condensin complex genes. In all BS samples, we observed higher expression levels of SMC2 and SMC4 (Supplementary Material, Fig. S4, bottom row) as well as of other members of the condensin complexes I and II. As a further experimental proof, we observed increased SMC2 and SMC4 protein levels in Western blots (Supplementary Material, Fig. S5). Overall, we suggest a molecular connection between condensation of the chromosomes and BLM function, which can be investigated in detail in future studies.

Genes associated with primary microcephaly are upregulated in BS
One of the main clinical manifestations in BS is primary microcephaly. We therefore analyzed in detail the expression levels of genes which have been previously associated with autosomal recessive forms of primary microcephaly (MCPH). As shown in Figure 3, we found altered expression levels in a significant number of MCPH-associated genes including CDK5RAP2, CENPE, SASS6 and ASPM. Among the 27 microcephaly-associated genes (42), significant differences in expression levels  (42) are highlighted with different colors as the following: condesin I/II complex, red; FA group, black; and genes that are clinically associated with microcephaly, blue. The plot was generated using EnhancedVolcano package from Bioconductor for R programming language. Complete gene list and the corresponding values are given in Supplementary Material, Table  S3. were observed in 12 genes (Fig. 3, blue color, Supplementary Material, Table S3). In addition, also mutations in genes encoding condensin I and II complex members are associated with primary microcephaly, e.g., NCAPD2, NCAPH, NCAPG2 or NCAPD3, which showed altered expression levels in our data as well (43,44).

Discussion
Technical advances in sc transcriptomics offer the opportunity to determine in more detail changes in transcriptional signatures in human disease and, thereby, to gain deeper insights into the underlying disease-causing mechanisms. In this study, we systematically analyzed transcription signatures and molecular pathways in sc RNA sequencing in primary dermal fibroblasts from three patients with BS. The performance of the ICELL8 (22) platform used in our analysis allowed the investigation of a high number of single cells passing our high-quality standards. Considering all the objectives, the Limma + Voom method used for analyzing the scRNAseq data was the most effective analysis tool among other bulk and sc RNA sequencing analysis models (45,46). We based our analysis on three patient cell lines compared with two age-matched WT controls to receive statistically significant results. It is important to note that both, WT and BS cells presented with a similar proportion of cells in G2/M phase (Supplementary Material, Fig. S3) and there was no significant difference in the expression of cell cycle-specific marker genes. We defined specifically deregulated gene expression levels as well as significantly altered pathways in BS cells, showing the power of sc transcriptomics for elucidating the molecular pathogenesis in BS. Associated pathways include previously described processes relating to, e.g., mitosis, chromosome segregation and cancer.
As described, we confirmed the deregulation of FA pathway genes in BS. This represents not only a molecular link between BS and FA, but also explains the clinical overlap between both syndromes. It is well established that several FANC proteins can directly interact with BLM during different processes (34)(35)(36), among them FANCM and FANCD2 (35,47), both of which showed altered expression in our analysis. FANCD2 has a well-described function during DNA repair as well as an important role in replication fork management and stabilization, which implies its independent roles from the rest of the FA-related genes and proteins. Also, FANCD2 has been shown to localize to ultrafine DNA bridges, which occur frequently in BS patient cells because of the high rate of mutations and resulting replication stress (48,49).
Regarding the genomic instability present in BS cells, we suggest that the enrichment of the FA pathway might be due to increased replication fork stalling and destabilization resulting from the high rate of somatic random mutations in the genome based on the affected DNA repair mechanisms (3,47,50).
Autosomal recessive primary microcephaly (MCPH) is a rare disorder mainly characterized by severe microcephaly at birth and mental retardation of variable degree in the absence of any additional significant neurological findings, malformations or growth anomalies (51). MCPH is a very heterogeneous disorder and numerous genes have been identified which, on a cellular level, play an important role during cell division processes, regulation of the cell cycle and in DNA damage response (52)(53)(54). Since BS patients also present with a syndromic form of primary microcephaly, we determined the expression profiles of 27 genes that were previously associated with microcephaly. A substantial number of 12 of these genes were deregulated, most significantly ASPM, a well-characterized MCPH gene, and CDK5RAP2, mutations in which have been reported in patients with primary microcephaly and/or Seckel syndrome (55)(56)(57). CDK5RAP2 encodes the centrosomal protein CDK5RAP2, which was shown to have an important role in centriole cohesion, microtubule dynamics and spindle orientation (58,59). Previously, we showed that loss of functional CDK5RAP2 in human fibroblasts leads to severe mitotic defects including fragmented centrosomes, unbalanced amounts of centrin and pericentrin and multiple or misplaced centrosomes (57). The analysis of brain organoids from a patient with Seckel syndrome and homozygous CDK5RAP2 mutation determined defects in neurogenesis caused by an altered balance of symmetric and asymmetric cell division of neuronal precursor cells (60,61). Overall, our scRNAseq data analysis strongly points towards a common or at least overlapping molecular pathogenesis of microcephaly in BS and MCPH.
Most interestingly, we observed a robust link between BLM deficiency and transcriptional upregulation of condensin complex I and II genes. This link has not yet been established. Condensins are multi-subunit SMCcontaining protein complexes found in all eukaryotes (62). They are built upon specific pairs of long coiledcoil subunits that heterodimerize by the central 'hinge' domain situated at one end of the coil. Additional subunits are recruited to the condensin complex and, depending on the proteins associated to the complex, condensin I and II complexes are defined (63). We observed a significant upregulation of SMC2 and SMC4 as well as other complex subunits (NCAPG2, NCAPG, NCAPH, NCAPD2 and NCAPD3). Condensin complexes are essential for the structural organization of mitotic and meiotic chromosomes. Studies have shown that chromosome assembly is impaired when condensin subunits are dysfunctional or depleted. Cells attempt to undergo anaphase but chromosomes frequently fail to segregate, leading to lagging chromosomes and the formation of chromatin bridges (43,64). In humans, autosomal recessive mutations in genes encoding condensin complex proteins have been associated with microcephalic syndromes (43). It is of further interest that recently described recessive loss-of-function mutations in NCAPG2 cause a neurodevelopmental disorder with microcephaly, and analysis of patient fibroblasts found not only abnormal chromosome condensation, but also augmented anaphase chromatin-bridge formation, similar to the anaphase-bridge formation observed in patients with BS and BLM dysfunction (44). On the other hand, BS cells lacking resolvases show strong condensation defects, especially at the common fragile sites resulting from unresolved recombination products, which then lead to ultrafine anaphase bridges (18,65,66). Our results will have to be confirmed by future functional studies aiming to elucidate the functional interaction of BLM and condensin complexes and it will be also of interest to investigate the expression and function of BLM in patients with different forms of condensinopathies.
In summary, we have defined gene expression profiles and pathway signatures in patients with BS caused by BLM deficiency using sc transcriptome analysis. We observed deregulation of specific gene sets related to the molecular characteristics of BS, such as mitosis, chromosome segregation, cell cycle regulation and genomic instability. In addition, we found specific and already described upregulation of BLM-interacting proteins, e.g., related to the FA pathway. Significant deregulation of genes associated with inherited forms of primary microcephaly might explain, in part, the molecular pathogenesis of microcephaly in BS. Finally, our data provide the first evidence of a novel link between BLM dysfunction and transcriptional changes in condensin complex I and II genes. Overall, our study provides novel insights into gene expression profiles in BS on a singlecell level linking specific genes and pathways to BLM dysfunction.

Cell culture
Two WT control fibroblast cell lines (GM08398 and GM00409) and three BS patient fibroblast cell lines (GM02520, GM02932 and GM02548) were purchased from Coriell Institute for Medical Research (see Supplementary Material, Table S1 for details). The cells were maintained in Dulbecco's Modified Eagle Medium (Gibco, Life Technologies) supplemented with 10% fetal bovine serum (Superior; Sigma-Aldrich), 100 U/ml penicillin, and 100 μg/ml streptomycin in a 37 • C incubator with 5% CO 2 . On the day of sc sorting, cells were harvested with 0.05% trypsin-EDTA (Gibco, Life Technologies) and diluted in 1× DPBS before further steps.

Mutation confirmation of the cell lines
Genomic DNA was extracted from one WT control and three patient cell lines using DNeasy Blood and Tissue kit (Qiagen). Specific primers were designed to cover ∼500 base pairs around the mutations and the regions were PCR-amplified using Qiagen Multiplex PCR kit (Qiagen). The PCR products were sequenced with corresponding primers using canonical Sanger sequencing at SeqLab-Microsynth (Göttingen, Germany).

Sc transcriptome sequencing
Resuspended cells were stained with Hoechst 33342 and Propidium Iodide (ReadyProbes Cell Viability Imaging Kit, Thermo Fisher) in accordance with the manufacturer's instructions. After microscope check for proper staining, the cells were counted using CASY cell counter and serial dilutions were prepared to obtain 1 cell/50 nl concentration before sc sorting. Single cells were dispensed to a 5184-well microchip using ICELL8 (Takara Bio) system (22) and the well content was evaluated with image processing software of Takara. The wells containing one living single cell were selected with the help of DAPI-positive (one signal spot) and propidium iodide-negative signals. After cell lysis at −80 • C, cDNA synthesis was done by SMART-Seq Single Cell kit (Takara Bio) and the library was prepared with Nextera XT DNA library preparation kit (Illumina). The generated library was sequenced on an Illumina HiSeq4000 and the fastq files were generated.

QC, alignment and counting of reads
First, the resulting fastq file was demultiplexed using the demux script of the CogentAP analysis pipeline version 1.0 (Cogent NGS Analysis Pipeline, Takara Bio). The single-end 51 base-pair reads were analyzed by using the analysis part from the CogentAP pipeline. This analysis utilized cutadapt (67) (version 1.18) for trimming of bad quality bases and over-represented sequences from reads. Afterwards, the reads were aligned against the reference genome GRCh38.12 with Ensembl version 94 using the STAR RNA-Seq aligner (68) (version 2.7.8a). For the counting of gene-level expression values, featureCounts (69) (2.0.1) was used. Cell samples with <10 000 reads or <300 expressed genes were excluded from the analysis. Additionally, the cells with more counts than three median absolute deviations from the total median log10library size were marked as outliers and were excluded. As the last step, cells with a mitochondrial fraction of 0-0.2 and an intergenic fraction of 0-0.1 were excluded as well.

Differential gene expression and ORA
The R-package Limma + Voom (45) for the differential gene expression analysis was used on the filtered and quality-controlled count data. Limma + Voom fits a linear model using the weighted least squares for each gene and combining it with Empirical Bayes smoothing of standard errors. The resulting differentially expressed gene lists were multiple test adjusted (Benjamini-Hochberg) (70) and used as input for the ORA of Gene-Ontology, KEGG and MeSH terms using the clusterProfiler (71) package from the R-programming language.

Supplementary Material
Supplementary Material is available at HMG online