Vimentin binds to G-quadruplex repeats found at telomeres and gene promoters

Abstract
G-quadruplex (G4) structures that can form at guanine-rich genomic sites, including telomeres and gene promoters, are actively involved in genome maintenance, replication, and transcription, through finely tuned interactions with protein networks. In the present study, we identified the intermediate filament protein Vimentin as a binder with nanomolar affinity for those G-rich sequences that give rise to at least two adjacent G4 units, named G4 repeats. This interaction is supported by the N-terminal domains of soluble Vimentin tetramers. The selectivity of Vimentin for G4 repeats versus individual G4s provides an unprecedented result. Based on GO enrichment analysis performed on genes having putative G4 repeats within their core promoters, we suggest that Vimentin recruitment at these sites may contribute to the regulation of gene expression during cell development and migration, possibly by reshaping the local higher-order genome topology, as already reported for lamin B.


INTRODUCTION
DNA is organized into hierarchical layers inside the nucleus. These are established through long-range chromatin interactions (1), which allow the creation of loops, where sets of genes are brought together into topologically associating domains (TADs) (2) and enhancer-promoter communication is facilitated (3). The spatial association of TADs with similar properties leads to the creation of the transcriptionally active (euchromatic) and repressive (heterochromatic) compartments (4). Heterochromatin is mainly found at the nuclear periphery, while euchromatin fills the nucleoplasm (5). Still, regions of active chromatin are found near nuclear pore complexes (6). This organization is constantly reshaped during early development and differentiation, reflecting the required gene expression program (7).
While CCCTC binding factor (CTCF) and the cohesin complex have been identified as principally responsible for creating loops and TADs (8), lamin B receptor seems to play a pivotal role in tethering heterochromatin to the nuclear periphery (9). However, the molecular mechanisms underlying enhancer-promoter contacts and compartmentalization are still not fully elucidated. In this regard, accumulating evidence suggests that, at accessible chromatin regions (10), DNA folding into non-canonical structures may drive the recruitment of architectural proteins to promote gene clustering (11,12).
Among all DNA non-canonical secondary structures, G-quadruplexes (G4s) are tetra-helical arrangements that form within guanine-rich tracts. Here, four guanines interact through Hoogsteen hydrogen bonds to form planar arrays (G-tetrads) that stack one upon the other, building the core of the structure (13). G-quadruplexes can adopt different topologies depending on the relative orientation of the DNA strands (parallel, antiparallel, hybrid) and the shape of the loops connecting the G-tetrads (14). G-quadruplexes have been detected within the human genome (15). They are found at telomeres, 5′ UTR regions, introns, and gene promoters. Interestingly, G4 motifs are depleted within housekeeping genes, while they are enriched within developmental and oncogenic ones, suggesting that they may play a specific role in the regulation of these gene clusters during cell development and cancer progression (15). Ligands that selectively bind to G4s at promoters have been shown to influence the expression of the associated genes, thus supporting a functional role of these structures (16).
Noteworthy, the enrichment in putative G4-forming sequences within enhancers and at TAD boundaries suggests that G-quadruplexes may regulate gene expression through their involvement in the three-dimensional organization of the genome (17). This picture is supported by the ability of G4s to recruit architectural and chromatin remodeling factors such as the (SWI/SNF)-like chromatin remodeler ATRX (18), the architectural protein HMGB1 (19,20), and the heterochromatin-associated protein HP1α (21). Recently, Li et al. showed that G4 structures participate in YY1-mediated DNA looping, thus providing experimental evidence for this model (12).
Still, a clear correlation between the complex G4 structural features and protein recruitment is lacking.
In the present study, we focused on peculiar G4 arrangements, to which we will refer here as G4 repeats. G4 repeats comprise two or more adjacent G4 modules, which eventually give rise to end-to-end mutual interactions. They were first characterized for the human telomeric sequence, where multiple adjacent G4s interact through transient stacking of the external tetrads (22). More recent studies highlighted the ability of gene promoter sequences to also give rise to G4 repeats (23-25). Among them, the hTERT sequence, located within the core promoter of telomerase, folds into three interacting parallel three-quartet G4s (23), the ILPR sequence, located within the promoter of insulin, folds into two cross-talking hybrid four-quartet G4s (24), while KIT2KIT*, which is found within the core promoter of c-KIT, folds into two interacting parallel three-quartet and antiparallel two-quartet G4s (25).
With the aim to identify proteins able to interact with G4 repeats, we performed pull-down assays with KIT2KIT* using nuclear extracts from the KIT-positive HGC-27 cell line. Noteworthy, we found the architectural protein Vimentin as the best interactor. By using a small panel of sequences, we showed that Vimentin binds to G4 repeats regardless of their sequence and topology and with high selectivity with respect to other DNA arrangements.
Vimentin is the first reported protein selective for G4 repeats versus individual G4s. This points to G4 repeats as unique structural elements involved in the higher-order genome architecture.

Oligonucleotides
Oligonucleotides were purchased lyophilized and RP-HPLC purified from Eurogentec (Seraing, Belgium) and used without further purification. The DNA sequences are listed in Table 1.
Oligonucleotides were resuspended in nuclease-free water from Thermo Fisher Scientific (Waltham, MA, USA) to obtain 100 μM stock solutions, which were then diluted in the proper buffer for further analyses. The concentrations of the initial stock solutions were measured by UV absorbance at 260 nm on a Uvikon XS, using molar absorption coefficients calculated with a nearest neighbour model (26). Solutions were annealed by heating at 85 °C for 7 min and then left to equilibrate overnight at room temperature. Equilibrated solutions were doped with KCl to promote G-quadruplex folding. For duplex DNA preparation, oligonucleotides were annealed in the presence of equimolar amounts of the complementary strand.

Protein sample preparation
Recombinant human Vimentin was purchased lyophilized and RP-HPLC purified from LS-Bio (Seattle, WA, USA) and was stored at −80 °C in 8 M urea, 5 mM Tris-HCl (pH 7.5), 1 mM dithiothreitol, 1 mM EDTA, 0.1 mM EGTA. The day before use, Vimentin was renatured following a protocol developed by Herrmann and colleagues to avoid extensive polymerization into filaments (27). Briefly, the protein was dialyzed at room temperature against dialysis buffer (5 mM Tris-HCl (pH 8.4), 1 mM EDTA, 0.1 mM EGTA, and 1 mM dithiothreitol) containing progressively reduced urea concentration (6, 4, 2 and 1 M urea). Dialysis against a large volume of dialysis buffer was continued overnight at 4 °C. The next day, dialysis was continued into tetramer buffer (5 mM Tris-HCl (pH 8.4)) for 1 h at room temperature. After dialysis, the concentration of Vimentin monomers was determined by measuring the absorption at 280 nm with ε = 24 900 cm⁻¹ M⁻¹.
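The concentration determination above follows the Beer-Lambert law. A minimal sketch of the arithmetic is given here, assuming a 1 cm path length (not stated for this measurement) and a purely hypothetical absorbance reading; the function names are illustrative only.

```python
def monomer_concentration_molar(a280, epsilon=24900.0, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L.

    epsilon is the monomer coefficient quoted in the text (cm^-1 M^-1);
    the 1 cm path length is an assumption.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def tetramer_concentration_molar(a280, epsilon=24900.0, path_cm=1.0):
    """Four monomers per tetramer, so the tetramer concentration is c/4."""
    return monomer_concentration_molar(a280, epsilon, path_cm) / 4.0


# Hypothetical reading, for illustration only.
a280_example = 0.249
print(f"monomer:  {monomer_concentration_molar(a280_example) * 1e6:.2f} uM")
print(f"tetramer: {tetramer_concentration_molar(a280_example) * 1e6:.2f} uM")
```

Dividing by four is what links the monomer concentration measured here to the tetramer concentrations used in the binding analyses.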

Pull-down assays
600 μl of Streptavidin-coated paramagnetic particles (Promega, Milan, Italy) were washed three times with 600 μl of PBS-1× (Euroclone, Milan, Italy) and then resuspended in 60 μl of PBS-1×. 2 μM 5′-biotinylated KIT2KIT* was previously annealed in 10 mM potassium phosphate (pH 7.4) and subsequently equilibrated overnight in the presence of 150 mM KCl, to promote G-quadruplex folding. For duplex DNA preparation, 2 μM 5′-biotinylated KIT2KIT* was annealed in 10 mM potassium phosphate (pH 7.4) in the presence of equimolar amounts of the complementary strand. The oligonucleotide was then added to the beads and incubation was performed for 30 min at room temperature. Beads were washed three times with 600 μl of oligonucleotide buffer and then resuspended in 60 μl of pull-down buffer (20 mM HEPES (pH 7.9),

Circular dichroism spectroscopy
Circular dichroism (CD) spectra were acquired on a JASCO J-810 spectropolarimeter equipped with a Peltier temperature controller. CD spectra were recorded from 235 to 330 nm with the following parameters: scanning speed of 100 nm/min, band width of 2 nm, data interval of 0.5 nm and response of 2 s. Measurements were performed using a 1 cm path length quartz cuvette at an oligonucleotide concentration of 2 μM in 10 mM Tris-HCl (pH 7.4), 150 mM KCl. Observed ellipticities were converted to molar ellipticity (deg·cm²·dmol⁻¹), calculated using the DNA residue concentration in solution.
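The conversion to molar ellipticity can be sketched as follows, assuming the observed signal is read in millidegrees and using the common relation [θ] = θ(mdeg) / (10 · c · l), with c the residue concentration in mol/L and l the path length in cm. The example reading and the oligonucleotide length are hypothetical.

```python
def molar_ellipticity_per_residue(theta_mdeg, strand_conc_molar,
                                  n_residues, path_cm=1.0):
    """Return [theta] in deg*cm^2*dmol^-1, normalised per DNA residue.

    theta_mdeg        : observed ellipticity in millidegrees
    strand_conc_molar : oligonucleotide (strand) concentration in mol/L
    n_residues        : number of nucleotides in the oligonucleotide
    path_cm           : cuvette path length in cm
    """
    residue_conc = strand_conc_molar * n_residues  # mol/L of residues
    return theta_mdeg / (10.0 * residue_conc * path_cm)


# Hypothetical reading: 10 mdeg for a 40-nt oligonucleotide at 2 uM, 1 cm cell.
print(molar_ellipticity_per_residue(10.0, 2e-6, 40))
```

Normalising by residue concentration rather than strand concentration allows spectra of oligonucleotides of different lengths to be compared on the same scale.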

Fluorescence anisotropy
Fluorescence anisotropy measurements were performed on a JASCO FP-6500 spectrofluorometer equipped with polarization devices and with a Peltier temperature controller.
Measurements were performed at 25 °C using a 1 cm path length quartz cuvette with the following parameters: 495 nm excitation wavelength, 520 nm emission wavelength, band width of 5 nm, response 8 s, sensitivity high, two acquisitions. The instrument G factor was determined prior to anisotropy measurements. 5′-6-FAM labelled oligonucleotides were used at a concentration of 5 nM in 5 mM Tris-HCl (pH 8.4), 150 mM KCl. Titrations were performed by adding increasing concentrations of recombinant Vimentin to the oligonucleotide solution. After mixing, the solution was left to equilibrate for 10 min at room temperature before acquisition. Experiments were performed in triplicate. Acquired data were fitted according to a 1:1 binding model with the following equation:

$$A_{obs} = A_0 + \Delta A \cdot \frac{([DNA] + [VIM] + K_D) - \sqrt{([DNA] + [VIM] + K_D)^2 - 4\,[DNA]\,[VIM]}}{2\,[DNA]}$$

where [DNA] and [VIM] stand for DNA and tetrameric Vimentin concentrations, respectively; A_obs is the observed anisotropy value; A_0 is the anisotropy value in the absence of protein; ΔA represents the total change in anisotropy between free and fully bound DNA, and K_D is the equilibrium dissociation constant.
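A 1:1 binding fit of this kind can be sketched numerically as below. The closed form used is the standard quadratic solution for 1:1 binding with ligand depletion, which is assumed here to match the model referred to in the text. The titration points, A_0 and ΔA are hypothetical, and the fitting strategy (a scan over K_D with A_0 and ΔA solved by linear least squares) is an illustrative choice, not the authors' procedure.

```python
import math


def fraction_bound(dna, vim, kd):
    """Fraction of DNA bound for 1:1 binding (all concentrations in nM)."""
    s = dna + vim + kd
    return (s - math.sqrt(s * s - 4.0 * dna * vim)) / (2.0 * dna)


def anisotropy(dna, vim, kd, a0, delta_a):
    """Predicted anisotropy: A_obs = A_0 + delta_A * fraction bound."""
    return a0 + delta_a * fraction_bound(dna, vim, kd)


def _profile(kd, dna, vims, obs):
    """For a fixed K_D, solve A_0 and delta_A by linear least squares."""
    f = [fraction_bound(dna, v, kd) for v in vims]
    n = len(f)
    mf, mo = sum(f) / n, sum(obs) / n
    sff = sum((x - mf) ** 2 for x in f)
    sfo = sum((x - mf) * (y - mo) for x, y in zip(f, obs))
    delta_a = sfo / sff
    a0 = mo - delta_a * mf
    sse = sum((a0 + delta_a * x - y) ** 2 for x, y in zip(f, obs))
    return sse, a0, delta_a


def fit_kd(dna, vims, obs, log_lo=-1.0, log_hi=4.0, steps=200):
    """Coarse scan over log10(K_D), then golden-section refinement."""
    width = (log_hi - log_lo) / steps
    grid = [log_lo + i * width for i in range(steps + 1)]
    best = min(grid, key=lambda g: _profile(10 ** g, dna, vims, obs)[0])
    lo, hi = best - width, best + width
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if _profile(10 ** c, dna, vims, obs)[0] < _profile(10 ** d, dna, vims, obs)[0]:
            hi = d
        else:
            lo = c
    kd = 10 ** ((lo + hi) / 2.0)
    _, a0, delta_a = _profile(kd, dna, vims, obs)
    return kd, a0, delta_a


# Hypothetical noise-free titration: 5 nM DNA, Vimentin tetramer in nM.
dna_nM = 5.0
vim_nM = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0]
simulated = [anisotropy(dna_nM, v, 25.7, 0.05, 0.10) for v in vim_nM]
kd_fit, a0_fit, da_fit = fit_kd(dna_nM, vim_nM, simulated)
print(f"K_D = {kd_fit:.1f} nM, A_0 = {a0_fit:.3f}, delta_A = {da_fit:.3f}")
```

The quadratic form matters here because the DNA concentration (5 nM) is not negligible relative to a K_D in the tens of nanomolar, so the free protein concentration cannot simply be equated with the total.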

Trypsin limited proteolysis
The solution of Vimentin in 5 mM Tris-HCl (pH 8.4) was loaded on a Pierce™ Detergent Removal Spin Column (Thermo Scientific) and eluted in the same buffer. then analyzed by SDS-PAGE. Labelled bands were cut and subjected to 'in-gel' trypsin digestion and LC-MSᴱ analyses for peptide sequencing. The sequence of full-length Vimentin used in the experiments was also confirmed by MS analysis of the recombinant protein (calculated average mass 53520.6 Da and measured mass 53520.6 Da, Supplementary Figure S1).

In-gel digestion
In-gel digestion of protein bands was performed according to Shevchenko et al. (31). Briefly, excised bands were cut into small cubes, washed with water, with 50% acetonitrile in water and shrunk with neat acetonitrile. Gel particles were swelled in 10 mM dithiothreitol, 0.1 M NH₄HCO₃ and incubated for 45 min at 56 °C. After cooling at room temperature, the supernatant was replaced with the same volume of iodoacetamide solution (55 mM iodoacetamide in 0.1 M NH₄HCO₃) and the tubes were incubated for 30 min in the dark at room temperature. After removal of the iodoacetamide solution, gel pieces were washed again with water followed by 50% acetonitrile in water and shrunk with neat acetonitrile to completely remove the Coomassie staining. The gel particles were eventually rehydrated on ice in a solution containing 5 ng/μl of trypsin (Promega, modified sequencing grade) in 50 mM NH₄HCO₃. After complete rehydration, gel pieces were covered with 50 mM NH₄HCO₃ and incubated overnight at 37 °C. The supernatants were then transferred to clean tubes and peptides were extracted from gel particles upon incubation with 5% formic acid in water followed by dilution with an equal volume of neat acetonitrile. All the peptide-containing supernatants were combined and dried using a Speed-Vac system (Savant).

Mass spectrometry analyses
The tryptic digests of the gel bands were analysed using a Xevo G2-S QTof (Waters) equipped with a Waters Acquity H- The following parameters were used in the MASCOT search: trypsin specificity; maximum number of missed cleavages, 1; fixed modification, carbamidomethyl (Cys); variable modifications, oxidation (Met); peptide mass tolerance, ±10 ppm; fragment mass tolerance, ±15 ppm; protein mass, unrestricted; mass values, monoisotopic. A protein was considered identified when two unique peptides with statistically significant scores (P < 0.05) were obtained.
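The ppm tolerances quoted for the database search can be illustrated with a short sketch of how a relative mass error is computed and checked. The m/z values are hypothetical and the helper names are not part of MASCOT.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm):
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical ions checked against the peptide (10 ppm) and
# fragment (15 ppm) tolerances mentioned in the text.
print(within_tolerance(1000.008, 1000.0, 10.0))   # about 8 ppm off
print(within_tolerance(1000.012, 1000.0, 10.0))   # about 12 ppm off
print(within_tolerance(1000.012, 1000.0, 15.0))
```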
MSᴱ data of the gel bands from the proteolysis experiment were processed with BiopharmaLynx, setting trypsin as digest reagent, one missed cleavage and carbamidomethyl cysteine as fixed modification. The MS ion intensity threshold was set to 100 counts, and the MSᴱ threshold was set to 100 counts. MS mass match tolerance and MSᴱ mass match tolerance were set to 10 ppm. The peptide list obtained from the LC-MS analysis of Vimentin at 0 min of proteolysis was reduced to represent only peptides with an intensity higher than 3000 counts and it was considered as the reference list of tryptic peptides. For the digests of the bands at the different times of incubation, peptides with a signal higher than 100 counts were considered identified, provided that they displayed a retention time in accordance with the same peptide in the reference list. In order to identify the region of Vimentin in the different gel bands, the percent ratio between the intensity of each peptide in the LC-MS analysis of the digest of the band and in the reference list was calculated. This calculation was performed in order to account for the different ionization efficiency of the different peptides.
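The peptide filtering and percent-ratio calculation described above can be sketched as follows. The peptide labels, intensities and retention times are hypothetical, and the retention-time window (`rt_tol`) is an assumption, since the text does not give a numerical criterion.

```python
def reference_list(peptides, min_intensity=3000):
    """Keep time-0 peptides with intensity above the reference threshold.

    peptides maps a peptide label to (intensity, retention_time_min).
    """
    return {p: v for p, v in peptides.items() if v[0] > min_intensity}


def percent_ratios(band, reference, min_counts=100, rt_tol=0.5):
    """Percent ratio of band intensity to reference intensity per peptide.

    A peptide is considered identified if its signal exceeds min_counts
    and its retention time lies within rt_tol minutes (assumed window)
    of the same peptide in the reference list.
    """
    ratios = {}
    for pep, (intensity, rt) in band.items():
        if pep not in reference or intensity <= min_counts:
            continue
        ref_intensity, ref_rt = reference[pep]
        if abs(rt - ref_rt) <= rt_tol:
            ratios[pep] = 100.0 * intensity / ref_intensity
    return ratios


# Hypothetical data for illustration only.
time_zero = {"PEP_A": (10000, 12.0), "PEP_B": (2500, 20.0), "PEP_C": (8000, 30.0)}
band_digest = {"PEP_A": (5000, 12.1), "PEP_B": (2000, 20.0), "PEP_C": (90, 30.0)}
ref = reference_list(time_zero)
print(percent_ratios(band_digest, ref))
```

Expressing each peptide relative to its own time-0 intensity removes the peptide-specific ionization efficiency, so ratios can be compared along the sequence.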

QPARSE search and GO enrichment analysis
The whole set of human genes was downloaded from ENSEMBL (32) (GENCODE v34) and the 100 nucleotides upstream of the transcription start site (TSS) of each gene were extracted to search for double and triple putative G4 repeats (double triple G4 PQS). The pattern search was performed using QPARSE (33) with the following options: i) for double G4 repeats we searched for islands of at least 3 (-m 3) and up to 4 (-M 4) Gs/Cs with connecting loops of maximum 5 nucleotides (-L 5), and at least five perfect islands (-p 5) out of eight islands (-n 8) with the bulged islands that contain only one gap of length 1 (-l 1), ii) for triple G4 repeats we searched for islands of at least three (-m 3) and up to four (-M 4) Gs/Cs with connecting loops of maximum five nucleotides (-L 5), and at least eight perfect islands (-p 8) out of 12 islands (-n 12) with the bulged islands that contain only one gap of length 1 (-l 1). The searched pattern was extended to both the forward and reverse strands (using the parameters -b C and -b G to search for PQS in the reverse and forward strand, respectively). For comparison purposes, we calculated the GC content of the 100 bp upstream of the TSS of all the sequences. To assess whether genes containing putative G4 repeats were enriched in functional categories or signalling pathways, we selected genes with a high GC content similar to that found in genes containing the searched motifs. We used this list as background population. The list of double triple G4 PQS was analyzed against the background population with the DAVID tool (34).
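To make the search criteria more concrete, the sketch below uses a regular expression as a simplified stand-in for the pattern search. It is not QPARSE: it only accepts perfect islands (no bulges, so the -p and -l options are not reproduced), and the minimum loop length of 1 nt is an assumption. The GC-content helper mirrors the calculation used to build the background list.

```python
import re


def pqs_pattern(base, n_islands, min_len=3, max_len=4, max_loop=5):
    """Regex for n_islands runs of `base` separated by loops of 1..max_loop nt."""
    island = f"{base}{{{min_len},{max_len}}}"
    loop = f"[ACGT]{{1,{max_loop}}}"
    return re.compile(f"(?:{island}{loop}){{{n_islands - 1}}}{island}")


def has_putative_g4_repeat(seq, n_islands=8):
    """True if either strand carries n_islands consecutive perfect islands.

    G-runs are searched for the forward strand and C-runs for the reverse
    strand; 8 islands correspond to a double and 12 to a triple G4 repeat.
    """
    seq = seq.upper()
    return any(pqs_pattern(b, n_islands).search(seq) for b in "GC")


def gc_content(seq):
    """GC content as a percentage of sequence length."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical sequences for illustration only.
double_like = "GGGA" * 7 + "GGG"      # eight G-islands
single_like = "GGGA" * 3 + "GGG"      # four G-islands
print(has_putative_g4_repeat(double_like), has_putative_g4_repeat(single_like))
print(gc_content(double_like))
```

A dedicated tool remains necessary for the actual analysis, since imperfect islands, which the regular expression ignores, are explicitly allowed by the search options listed above.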

Identification of nuclear proteins that bind to the KIT2KIT* G4 repeat
To identify nuclear proteins that interact with the KIT2KIT* G4 repeat, pull-down assays were performed with nuclear extracts from the KIT-positive HGC-27 cell line. Streptavidin-coated paramagnetic beads were derivatized with the biotinylated oligonucleotide and subsequently incubated with nuclear extracts. Bound proteins were eluted with a KCl gradient (Supplementary Figure S2A). A last fraction was obtained by boiling beads in denaturing Laemmli sample loading buffer (29). When resolved by SDS-PAGE (Figure 1A), this last fraction exhibited three main bands that were cut and subjected to in-gel trypsin digestion and LC-MSᴱ analyses for protein identification. Two of them (bands S1 and S2 at ∼15 kDa and ∼28 kDa, respectively) corresponded to Streptavidin monomer and dimer that detached from the beads during the boiling procedure (Table 2; detailed data are given as Supplementary Material-Excel file). That aside, the band at ∼50 kDa (band V in Figure 1A) was associated with the intermediate filament protein Vimentin (Supplementary Figure S2B). Most importantly, no enrichment in Vimentin was observed within the last fraction (Figure 1B).

Vimentin selectively binds to G4 repeats
To validate the binding of Vimentin to the G4-folded KIT2KIT*, we performed electrophoretic mobility shift assays (EMSA) with the purified recombinant protein. The oligonucleotide was equilibrated in 150 mM KCl to promote G-quadruplex formation before protein addition. To avoid Vimentin polymerization into filaments, binding reactions were carried out at pH 8.4. Indeed, under these experimental conditions, Vimentin is stably arranged into tetramers (27). As shown in Figure 2A, at pH 8.4, free Vimentin tetramers migrate toward the anode as well as Vimentin-DNA complexes. Vimentin proved able to bind to KIT2KIT*, leading to a well-defined band belonging to the complex. A fraction of free DNA appeared as a retarded band as a result of partial dissociation of the complex during the run. Noteworthy, no complex was observed when Vimentin was incubated with the isolated KIT2 and KIT* G-quadruplexes.
To determine whether KIT2KIT* recognition was based on sequence composition or structural features, we tested other DNA sequences for which the folding into G4 repeats akin to KIT2KIT* was already reported. Figure 2B shows EMSA performed with the insulin-linked polymorphic region (ILPR) and the human telomerase promoter (hTERT). With both oligonucleotides Vimentin formed single well-defined complexes. The heterogeneity of the sequences tested so far suggests that Vimentin recruitment is driven by DNA folding into G4 repeats, irrespective of G-quadruplex topology and number of G-tetrads. Therefore, to better characterize this interaction, we moved to the telomeric sequence, as it constitutes an easily tunable model for G4 repeats. Indeed, by increasing the number of TTAGGG repeats, the resulting oligonucleotide folds into one (TEL), two (2TEL), three (3TEL) or four (4TEL) adjacent G-quadruplexes (22). Moreover, Vimentin association with telomeres has already been observed within living cells (35). As expected, Vimentin did not bind to the single telomeric G4 (Figure 2C) while it interacted with the G4 repeats. With 2TEL and 4TEL it formed a single complex, while in the presence of 3TEL, two different complexes were detected, possibly reflecting the reported conformational heterogeneity of this oligonucleotide in solution (36).
To assess whether the distance between the adjacent G4s might affect the binding of Vimentin, we increased the length of the linker connecting the two G4 motifs within 2TEL. In particular, starting from this model, we elongated the central 3 nt loop (TTA) to 6 and 9 nt, so that it comprised two or three TTA repeats (2TEL(6) and 2TEL(9), respectively). EMSA performed with both these sequences showed only a smearing of the bands belonging to the free oligonucleotides, thus reflecting weak interactions with the protein (Supplementary Figure S3).
The selectivity of Vimentin for G-quadruplex versus duplex and single-stranded DNA was proved by performing EMSA with double-stranded KIT2KIT* and 2TEL and with a 49-mer G-rich oligonucleotide (G-rich noG4) unable to fold into G4, as demonstrated by circular dichroism and thermal difference spectra (TDS) (Supplementary Figure S4). Vimentin interacted only weakly with double-stranded DNA, leading to poorly defined complexes (Supplementary Figure S5). As regards the unfolded single-stranded oligonucleotide, no binding was detected. Interestingly enough, when the same experiments were performed under KCl-free conditions, Vimentin interacted with both single and double-stranded oligonucleotides (Supplementary Figure S5), in line with the already reported association of Vimentin with G-rich DNA (37,38). The fact that this interaction is largely impaired in the presence of 150 mM KCl suggests that the binding of Vimentin to duplex and unfolded oligonucleotides is likely driven by non-specific electrostatic interactions. Conversely, the protein binding to G4 repeats relies on a more specific binding pattern that still occurs in the presence of the metal ion.

One Vimentin tetramer binds to two adjacent G4s with nanomolar affinity
The stoichiometry of Vimentin binding to G4 repeats was investigated according to the Job method (30), the complex formed at variable Vimentin and oligonucleotide molar fractions being resolved by agarose gels and quantified. The Job plot derived for the complex of Vimentin with the telomeric G4 repeat 2TEL (Figure 3A) showed maximal complex formation at 0.2 DNA molar fraction, corresponding to a 1:4 DNA:Vimentin binding stoichiometry. Thus, interaction is likely to occur between a Vimentin tetramer and two adjacent G4s. Consistently, in the presence of 4TEL, the Job plot showed a maximum at 0.1 DNA molar fraction, confirming the recruitment of two Vimentin tetramers on a stretch of four contiguous G4s (Figure 3B).
To quantitatively determine the affinity of Vimentin for the telomeric G4 repeat, we followed the change in fluorescence anisotropy of 5′-6-FAM labelled 2TEL upon titration with the protein. Analyses were performed considering the concentration of Vimentin tetramers and data were fitted according to a 1:1 binding model, showing that Vimentin binds strongly to 2TEL, with a K D value of 25.7 ± 2.4 nM (Figure 3C).
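The reading of a Job plot maximum as a stoichiometry can be sketched with the arithmetic below, assuming that the Vimentin molar fraction is expressed in monomers (which is what a 1:4 DNA:Vimentin ratio for one tetramer implies). The function names are illustrative.

```python
def vimentin_per_dna(x_dna_max):
    """Vimentin monomers per DNA at the Job plot maximum: (1 - x) / x."""
    if not 0.0 < x_dna_max < 1.0:
        raise ValueError("molar fraction must lie between 0 and 1")
    return (1.0 - x_dna_max) / x_dna_max


def expected_dna_fraction(monomers_per_dna):
    """DNA molar fraction at which a 1:n complex is maximal: 1 / (1 + n)."""
    return 1.0 / (1.0 + monomers_per_dna)


# Maxima reported in the text for 2TEL and 4TEL.
print(vimentin_per_dna(0.2))         # one tetramer per 2TEL
print(vimentin_per_dna(0.1))         # close to two tetramers per 4TEL
print(expected_dna_fraction(8))      # ideal maximum for two tetramers
```

For two tetramers (eight monomers) per DNA the ideal maximum lies at a DNA molar fraction of 1/9, about 0.11, which is consistent with the maximum near 0.1 reported for 4TEL.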

Vimentin binding to G4 repeats competes with filament assembly
To assess whether Vimentin assembly into filaments affects its G4 binding properties, we performed EMSA with 2TEL in 150 mM KCl, at pH 7.4 (Figure 3D). Indeed, as previously reported by Herrmann et al. (27), lowering the pH to 7.4 in the presence of high ionic strength causes extensive polymerization of Vimentin. This clearly emerges from the almost complete disappearance of the band corresponding to Vimentin tetramers in agarose gels (third lane of Figure 3D). Interestingly, when 2TEL was added to polymerized Vimentin, the complex with the soluble tetrameric form of the protein formed, as evidenced by the appearance of its characteristic band. In line with the dynamic reversible assembly of Vimentin (39), the protein polymerization into filaments does not prevent the binding to G4 repeats. Instead, this interaction competes with filament assembly.

Vimentin N-terminal domain is involved in the interaction with G4 repeats
To identify the Vimentin domains that are involved in the interaction with G4 repeats, we performed limited proteolysis experiments on tetrameric Vimentin in the absence and in the presence of stoichiometric amounts of 2TEL. Proteolysis was performed with trypsin since Vimentin contains several lysine and arginine residues homogeneously distributed along the sequence (Supplementary Figure S6). Proteolysis reaction mixtures obtained after different times of incubation were resolved by SDS-PAGE (Figure 4) and the sequence of the fragments was determined by in-gel trypsin digestion followed by LC-MSᴱ analyses, as reported in Table 3. Analysis of the Vimentin band at time = 0 min (band 0 in Figure 4) gave a sequence coverage of 91% (Supplementary Material-Excel file), with only some internal regions missing (amino acids 97-99, 235-269, 292-293 and 310-312). In the absence of DNA, after 1 min from trypsin addition, full-length Vimentin was converted into two main species (bands 3 and 4 in Figure 4), corresponding to the protein lacking respectively the N-terminal domain (band 3), and both the C-terminal and N-terminal domains (band 4). This result fits with the intrinsically unfolded state of the N-terminal and C-terminal domains that makes them readily subjected to proteolysis.
Addition of G4 repeats promoted a delay in proteolysis. Of note, after 1 min from trypsin addition, two different species were generated from the full-length protein (bands 1 and 2 in Figure 4). These correspond to Vimentin lacking the first three amino acids at the N-terminal domain (band 1) and to the same fragment further deleted of the C-terminal domain (band 2). Concomitantly, the full digestion of the N-terminal domain required to convert them into fragments 3 and 4, respectively, is reduced significantly. These data indicate that, in the presence of G4 repeats, the N-terminal domain is more resistant to proteolysis, thus pointing to its direct involvement in the DNA binding.

Putative G4 repeats are found within the promoter of genes involved in the cellular response to external stimuli, cell-cell communication and locomotion
So far, we showed that Vimentin selectively binds to G4 repeats. To search for sequences putatively able to adopt such conformation, we previously developed the QPARSE tool and highlighted their non-random distribution within human gene promoters (33). Here, we refined our search focusing on the first 100 bp upstream of the transcription start site (TSS) of the genes, since this is the region where we previously found the highest frequency of putative G4 repeats (33). Moreover, it comprises the already characterized G4 repeats KIT2KIT* and hTERT, for which a role in controlling the expression of the downstream gene has been experimentally confirmed (40,41).
All the promoter regions corresponding to genes annotated in GENCODE v34 (38404 sequences) were downloaded from ENSEMBL (32). The software identified 1477 genes containing at least one putative double G4 repeat (two adjacent G4s) and 295 sequences containing a putative triple G4 repeat (three adjacent G4s). Sequences with a triple G4 repeat were a subset of those with a double G4 repeat, as expected, apart from only one gene due to the slightly different searching criteria. Overall, the retrieved sequences potentially able to fold into a double or triple G4 repeat are 1478. The median GC content of these sequences is 80% and 98% of this subset shares a GC content >60%. We refer to this list as double triple G4 PQS. We further selected a background population for comparison including 14 053 sequences that share the same high GC content with double triple G4 PQS (greater than 60%) but do not contain any putative double or triple G4 repeat. We call this list GC rich BKG.
Using these two lists of genes, we performed a GO enrichment analysis using the DAVID tool (34), to look for a link between the reported physio-pathological roles of the G4 repeat-containing genes and those related to Vimentin. The most interesting results are summarized in Figure 5 (detailed results are found in Supplementary Material-Excel file).
Cellular component analysis revealed a significant enrichment in cell membrane components, particularly in those engaged in cell junctions. Both biological process and functional analyses highlighted an enrichment in proteins responsible for cell-cell communication, signal transduction and locomotion, together with an overrepresentation of genes involved in neurogenesis and nervous system development. We performed the same analyses with the Panther tool (42) and again we found a significant enrichment in plasma membrane components, particularly those participating in cell surface signaling, cellular response to external stimuli and cell-cell communication.
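The logic of an over-representation test such as the one underlying a GO enrichment analysis can be sketched with a one-sided hypergeometric test. This is a generic illustration, not the statistic implemented in DAVID or Panther, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(pop_size, pop_hits, list_size, list_hits):
    """P(X >= list_hits) when drawing list_size genes without replacement.

    pop_size  : genes in the reference population
    pop_hits  : genes in the population annotated with the GO term
    list_size : genes in the list of interest
    list_hits : genes in the list annotated with the GO term
    """
    upper = min(pop_hits, list_size)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, i) * comb(pop_size - pop_hits, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 genes, 5 annotated, list of 5 with 3 annotated.
print(hypergeom_enrichment_p(20, 5, 5, 3))
```

The choice of reference population drives the result, which is why a GC-matched background list is used in the analysis above rather than the whole genome.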

DISCUSSION
Vimentin is an intermediate filament protein highly expressed within migratory cells that are present at the early stage of embryonic development (43). Its postnatal expression is restricted to motile cells such as fibroblasts, endothelial cells, lymphocytes, and Schwann cells (44). Noteworthy, epithelial cells rely on Vimentin expression to acquire fibroblast-like morphology and increased migratory capacity during epithelial to mesenchymal transition (EMT), which occurs both during physiological tissue development/regeneration and pathological cancer progression toward metastasis (45).
As a structural protein with main cytosolic localization, the function of Vimentin reported so far is to orchestrate cytoskeletal rearrangements and mechano-signaling in support of cell migration (44). However, accumulating evidence currently points to specific functions of Vimentin at the nuclear level. Among them, studies conducted on poorly differentiated metastatic cancer cells revealed the presence of Vimentin within their nuclear matrices, while it was no longer detected upon induction of cell differentiation (46,47). Moreover, in human embryo fibroblasts, Vimentin has been found in tight association with telomeres and centromeres and a correlation between DNA guanine enrichment and Vimentin binding was highlighted (35).
In the present study, we found that the fraction of Vimentin that was present within nuclear extracts of undifferentiated HGC-27 cells, binds to DNA in a structure dependent/sequence independent manner. Indeed, it efficiently interacts with different G4-folded DNA sequences with almost no binding to the corresponding unfolded/duplex conformations. As a further level of specificity, the presence of at least two adjacent G4s is required for Vimentin recruitment, regardless of the topology of the participating G4s (parallel + antiparallel for KIT2KIT*, hybrid for telomeric G4s and ILPR, parallel for hTERT) and the total number of G-tetrads (five for KIT2KIT*, six for 2TEL, eight for ILPR, nine for hTERT). Indeed, Vimentin does not bind to individual G4s, regardless of their number of G-tetrads (two for KIT*, three for KIT2 and TEL) and topology (antiparallel for KIT*, parallel for KIT2 and hybrid for TEL). Moreover, the close proximity of the G4 units comprising the G4 repeat is required for efficient recruitment of Vimentin. This is the first time to our knowledge that a G4-binding protein displays selectivity for G4 repeats.
Tolstonog et al. already addressed the interaction of Vimentin with intra- and inter-molecular G4 structures and observed that the best interaction with the protein occurred with bi- or tetra-molecular G4s (38). Interestingly, the new evidence presented here helps to rationalize this outcome. Indeed, based on the sequence composition of the oligonucleotides they studied, only the multimeric arrangements are compatible with the formation of G4 repeats. Moreover, also within their models, the distance between the adjacent G4s modulated the binding of the protein.
Vimentin exists in a highly dynamic state within living cells, where post-translational modifications drive filament assembly/disassembly, in response to changes that occur in the extracellular environment (39,48). In the present study, we showed that Vimentin binds to DNA G4 repeats in the tetrameric form, the stoichiometry of the complexes being one Vimentin tetramer every two adjacent G4s. Noteworthy, this interaction shifts the Vimentin assembly equilibrium toward the naturally present soluble fraction (49). Vimentin assembly into filaments proceeds through the lateral association of Vimentin tetramers into unit length filaments (ULF) followed by the N-terminal to C-terminal longitudinal annealing of ULF to yield mature filaments (50). Our limited proteolysis experiments showed that the interaction of Vimentin with G4s occurs at the N-terminal domains, and this can affect the longitudinal annealing of ULF into filaments. Noteworthy, Vimentin tetramers switch from A11-type (where the N-terminal domains are oriented toward the centre of the tetramer) to A22-type (where the N-terminal domains are placed at the edge of the tetramer) during ULF formation (51). Therefore, the binding of G4 repeats to the soluble A11-type tetramers may also prevent the type-switching and, consequently, ULF formation.
The high affinity of Vimentin for G4 repeats fits with its already reported association with telomeres and centromeres. The in vitro evidence acquired herein of Vimentin binding to G4 repeats at gene promoters suggests that the same interaction may occur within living cells as well. In this regard, GO enrichment analysis performed on genes having putative G4 repeats within their core promoters revealed an overrepresentation of cell membrane components, particularly those participating in cell-cell communication and cell surface signalling. Notably, among them is the zinc finger protein SNAI1, the expression of which was shown to be directly regulated by Vimentin during EMT (45). It is thus tempting to suggest that the binding of Vimentin at G4 repeats may contribute to the regulation of the expression of the associated genes, possibly through wider DNA topological changes. In this regard, Vimentin was shown to influence not only nuclear shape and mechanics, but also chromatin condensation within mesenchymal cells (52). It has been reported that soluble pools of Vimentin-related lamin A/C and B can contact euchromatin within the nucleoplasm, promoting gene clustering within the so-called euchromatin lamin associated domains (eLADs), ultimately regulating their expression (53,54). Of particular interest is the involvement of lamin B1-eLADs in the chromatin reorganization that occurs during EMT. Indeed, lamin B1 was shown to bind to G-rich promoters of genes that belong to the EMT pathway, helping the establishment of the EMT transcriptional program (54). Soluble pools of Vimentin may exert a similar function, or even provide lamin B with a way to contact DNA at G-rich promoters. Indeed, the direct interaction of the Vimentin C-terminal domain with lamin B has already been reported in vitro (55).
GO enrichment analysis also highlighted the presence of putative G4 repeats within the promoters of genes involved in neurogenesis and nervous system development. In this regard, the relevance of Vimentin in neurological development is well established (56,57). In a recent study, high levels of soluble Vimentin were detected in the axoplasm of neurons following neuronal injury. Vimentin was found to translocate from the site of injury to the soma through direct interaction with β-importin and dynein-mediated retrograde transport. The authors point to a signalling role for soluble Vimentin during neural injury, in support of neurite regeneration (58). Notably, the direct interaction of Vimentin with β-importin provides a mechanism for its entry into the nucleus in response to peripheral stimuli.
Overall, this evidence supports a correlation between the functions of soluble Vimentin and those of the genes containing putative Vimentin binding sites at their promoters.
To conclude, in the present study, we identified the intermediate filament protein Vimentin as a selective binder for G4 repeats. The fact that Vimentin does not bind to individual isolated G4s could provide it with a way to contact DNA at specific genomic loci, including telomeres, centromeres, and a distinct subset of gene promoters. Further studies are needed to unravel the biological significance of this interaction within living human cells. Our working hypothesis is that G4 repeats may exist as primary structural elements, able to drive the recruitment of architectural proteins to ultimately reshape the higher-order genome folding during important physiological processes such as cell development, differentiation and migration, thus favouring the establishment of the required gene expression program.

DATA AVAILABILITY
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (59) partner repository with the dataset identifier PXD026505 and 10.6019/PXD026505.