Homodimerization regulates an endothelial specific signature of the SOX18 transcription factor

Abstract During embryogenesis, vascular development relies on a handful of transcription factors that instruct cell fate in a distinct sub-population of the endothelium (1). The SOXF proteins, which comprise SOX7, SOX17 and SOX18, are molecular switches that modulate arterio-venous and lymphatic endothelial differentiation (2,3). Here, we show that, within the SOXF family, only SOX18 has the ability to switch between a monomeric and a dimeric form. We characterized the SOX18 dimer in vitro using binding assays, and in vivo using a split-GFP reporter assay in a zebrafish model system. We show that SOX18 dimerization is driven by a novel motif located in the vicinity of the C-terminus of the DNA binding region. Insertion of this motif in a SOX7 monomer forced its assembly into a dimer. Genome-wide analysis of SOX18 binding locations on the chromatin revealed enrichment for a SOX dimer binding motif, correlating with genes with a strong endothelial signature. Using a SOX18 small molecule inhibitor that disrupts dimerization, we revealed that dimerization is important for transcription. Overall, we show that dimerization is a specific feature of SOX18 that enables the recruitment of key endothelial transcription factors, and refines the selectivity of the binding to discrete genomic locations assigned to endothelial-specific genes.


INTRODUCTION
Understanding how transcription factors (TFs) orchestrate gene expression to instruct a phenotypic output is fundamental to biology and future therapeutics. Dynamic control of gene transcription is particularly important during development as cell lineages are established. In mammals, many members of the SOX (SRY-related High-Mobility Group (HMG) box) family act as central regulators of gene expression to govern cell fate in a variety of key processes (4)(5)(6)(7), such as vascular network assembly (8), cartilage formation and sex determination (9,10), neurogenesis (11), as well as early stage development and embryonic stem cell pluripotency (12). These crucial roles are highlighted by the fact that many mutations in SOX genes cause severe congenital diseases in humans, such as XY sex reversal (SRY), campomelic dysplasia (SOX9), Waardenburg-Hirschsprung syndrome (SOX10) and anophthalmia-esophageal-genital syndrome (SOX2). A prominent feature of the SOX proteins is a 79-amino-acid region that delineates the HMG-box, the DNA binding and bending domain. The HMG-box is present in all groups of SOX proteins (A-H, 20 SOX proteins) and is classically used as a reference to align and compare these proteins since this region is highly conserved (7). It is made up of three α-helices, whereby α1 and α2 are involved in DNA binding while α3 is involved in protein-protein interactions (13). The HMG-box recognizes a heptameric consensus sequence (5′-(A/T)(A/T)CAA(A/T)G-3′) on the DNA. The activity of SOX proteins at these binding locations is modulated by varying combinations of protein-protein interactions, which can cause activation or repression of transcription (14)(15)(16). In addition to the HMG-box common to all SOX genes, individual subgroups possess other functional domains, including transactivation (TAD), coiled-coil and proline-rich domains.
The presence of these domains within the same group is likely to account for functional redundancy, an essential safety net that ensures proper embryonic development (4). In particular, SOX proteins within the F group (SOX7, SOX17 and SOX18) regulate various aspects of vascular development (17)(18)(19) and often do so in a redundant manner (20).
Nevertheless, SOX18 is central to both angiogenesis and lymphangiogenesis (21). In humans, several mutations in SOX18 are linked to the Hypotrichosis-Lymphedema-Telangiectasia and Renal Syndrome (HLTRS). HLTRS is a rare syndrome characterized by defects in hair follicle development (hypotrichosis), fluid accumulation in the limbs (lymphedema), presence of haemorrhagic blood vessels (telangiectasia), and renal defects as probands develop to adulthood. These features indicate that SOX18 function is required for the proper development of blood and lymphatic vasculature in humans during embryogenesis (17,22). A series of de novo mutations causing HLTRS have been identified within the HMG domain and the TAD and have been associated with a broad spectrum of syndrome severity (23). The murine counterpart of HLTRS is caused by natural mutations (allelic series: Ragged mice) in SOX18, which lead to truncated proteins. The truncated SOX18 acts as a dominant-negative protein that suppresses the endogenous activity of SOX7 and SOX17 (24,25). The phenotype of the Ragged mutant mice is characterized by severe vascular dysfunction, leading to the loss of endothelial cell junction integrity, which gives rise to generalized haemorrhage, loss of lymphatic vascular function and a blockade of the hair follicle cycle, mirroring the human syndrome (21). Despite an integral role for the SOX18 genetic pathway in vascular development, there is a lack of information regarding its molecular mode of action.
Self-association, from dimers to higher-order oligomers, is often used by proteins to modulate activity and tune cellular responses (26). The capacity for self-association is even more significant for TFs since this ability modulates the physiological rate of gene transcription and may lead to deleterious effects when uncontrolled (3,27,28). It is particularly relevant in the case of SOX (29) proteins. An example of such a potent and functionally dynamic TF is SOX9. SOX9 can dimerize upon binding to DNA (30). Many studies have shown that the configuration of SOX9 as a monomer or a DNA-bound dimer leads to the binding of different enhancers, inducing differential gene transactivation (31,32). The relevance of the dimer function is dramatically illustrated in campomelic dysplasia disorder (33,34) where mutations interfere with SOX9 dimerization ability.
During a screen of small molecules that could disrupt lymphangiogenesis in zebrafish, we showed that the SOX18 interaction network could be targeted pharmacologically (35). This work suggested that formation of SOX18 complexes is crucial for vascular development, and we set out to investigate the potential link between protein-protein interactions and target gene selectivity. In the present study, we used single molecule techniques and protein binding assays to study the behaviour of SOXF proteins in vitro. Here, we demonstrate that SOX18 has a unique ability to homodimerize, as opposed to other members of the F group. Using systematic truncation analysis, we identified and characterized a novel dimerization domain that is highly conserved during evolution. We validated this discovery in vivo, by developing a split-GFP biosensor of SOX18 dimerization in zebrafish larvae. Further, we found a specific palindromic doublet of SOX-binding consensus sequences in the human genome, evidence for the formation and importance of the SOX18 dimer. Gene ontology (GO) enrichment analysis of the subset of genes assigned to the SOX18 homodimer reveals a specific endothelial signature and includes genes essential to vascular development. Finally, we validated that pharmacological disruption of the SOX18 dimer interferes with the expression level of a subset of genes, linking physical interaction and transcriptional output.

Materials and Methods
Plasmid preparation and cell-free expression. Proteins were genetically encoded with enhanced GFP (GFP), mCherry and cMyc (myc) tags, and cloned into the following cell-free expression Gateway destination vectors: N-terminal GFP tagged (pCellFree G03), N-terminal Cherry-cMyc tagged (pCellFree G07) and C-terminal Cherry-cMyc tagged (pCellFree G08) (36). Human RBPJ (Q06330, SUH_HUMAN), GATA2 (P23769) and MEF2C (BC026341) Open Reading Frames (ORFs) were sourced from the Human ORFeome collection, versions 1.1 and 5.1, and the Human Orfeome Collaboration OCAA collection (Open Biosystems), as previously described (37), and cloned at UNSW. The ORFs in the pDONOR223 entry clones were exchanged with the ccdB gene in the expression plasmid by LR recombination (Life Technologies, Australia). The full-length human SOX18 gene was synthesized (IDT DNA, USA) and transferred to pCellFree vectors using Gateway cloning. Translation-competent Leishmania tarentolae extract (LTE) was prepared as previously described (38,39). GFP- and Cherry-tagged proteins were expressed separately for 15 min at 27 °C to start transcription, then were mixed and co-expressed for 3 h.
Preparation of the SOX18 truncation constructs. The DNA sequences encoding the desired domains were obtained by PCR amplification of the SOX18 WT construct with the combination of primers listed in Supplementary Table S1. PCR amplification was performed using Phusion polymerase. The PCR fragments were isolated by electrophoresis and purified using the Promega Wizard® SV Gel and PCR Clean-Up System. These fragments were then cloned into the Gateway destination vectors (pCellFree G03 or pCellFree G08) by LR recombination (Life Technologies, Australia) as described previously.
Construction of the SOX18DIM/SOX7 swap construct. The SOX18DIM/SOX7 swap construct was made by exchanging the 50 amino acids following the HMG box of SOX18 WT (YRPRRKKQARKARRLEPGLLLPGLAPPQPPPEPFPAASGSARAFRELPPL) with the 50 amino acids following the HMG box of SOX7 WT (YRPRRKKQAKRLCKRVDPGFLLSSLSRDQNALPEKRSGSRGALGEKEDRG). The swap construct was obtained as a gBlock (IDT), and was exchanged with the ccdB gene in the donor plasmid (pDONOR223) by BP recombination (Life Technologies, Australia), then with the ccdB gene in the expression plasmid (pCellFree G03 and pCellFree G08) by LR recombination (Life Technologies, Australia) as described previously.
Nucleic Acids Research, 2018, Vol. 46, No. 21 11383
Construction of the SOX7DIM/SOX18 swap construct. The SOX7DIM/SOX18 swap construct was created by exchanging the 50 amino acids following the HMG box of SOX7 WT (YRPRRKKQAKRLCKRVDPGFLLSSLSRDQNALPEKRSGSRGALGEKEDRG) with the 50 amino acids following the HMG box of SOX18 WT (YRPRRKKQARKARRLEPGLLLPGLAPPQPPPEPFPAASGSARAFRELPPL). The swap construct was obtained as a gBlock (IDT), and was exchanged with the ccdB gene in the donor plasmid (pDONOR223) by BP recombination (Life Technologies, Australia), then with the ccdB gene in the expression plasmid (pCellFree G03 and pCellFree G08) by LR recombination (Life Technologies, Australia) as described previously.
Multiple sequence alignment. Putative SOX18 homodimerization domains from 9 different species (Mus musculus, Danio rerio, Xenopus tropicalis, Gallus gallus, Anolis carolinensis, Orcinus orca, Monodelphis domestica, Latimeria chalumnae and Callorhinchus milii) were obtained using the 50 amino acid human SOX18 homodimerization domain as a query in Protein BLAST (NCBI). Multiple sequence alignment of the SOX18 homodimerization domain of 10 different species (including human), as well as the corresponding 50 amino acid region of the human SOXF family (SOX7, SOX17 and SOX18), was performed using Clustal Omega.
AlphaScreen assay. The assay was performed as previously described (37,40), using the cMyc detection kit and Proxiplate-384 Plus plates (PerkinElmer). The LTE lysate co-expressing the proteins of interest was diluted in buffer A (25 mM HEPES, 50 mM NaCl). For the assay, 12.5 µl (0.4 µg) of Anti-cMyc coated Acceptor Beads in buffer B (25 mM HEPES, 50 mM NaCl, 0.001% NP40, 0.001% casein) were aliquoted into each well. This was followed by the addition of 2 µl of diluted sample, at different concentrations, and 2 µl of biotin-labeled GFP-Nanotrap in buffer A. Then 2 µl (0.4 µg) of Streptavidin coated Donor Beads diluted in buffer A was added, and the plate was incubated in the dark for 1.5 h at room temperature. The AlphaScreen signal was measured on an Envision Plate Reader (PerkinElmer), using the manufacturer's recommended settings (excitation 680/30 nm for 0.18 s, emission 570/100 nm after 37 ms). The resulting bell-shaped curve is an indication of a positive interaction, while a flat line reflects a lack of interaction between the proteins. Measurements of each protein pair were performed in triplicate. A binding index was calculated as: BI = All experiments were performed using independent and technical triplicates (N = 6, n = 3).
Single-molecule spectroscopy. GFP- and Cherry-tagged SOX18 proteins were expressed separately for 15 min at 27 °C to initiate transcription and then were mixed and co-expressed for 3 h. 20 µl samples were used for each experiment. These were placed into a custom-made silicone 192-well plate equipped with a 70 × 80 mm glass coverslip (ProSciTech, Australia). Plates were analysed on a Zeiss LSM710 microscope with a Confocor3 module, at room temperature. Two lasers (488 and 561 nm) were co-focussed in the well solution using a 40×/1.2 NA water immersion objective (Zeiss, Germany); fluorescence was collected and split into GFP and Cherry channels by a 560 nm dichroic mirror. The GFP emission was further filtered by a 505-540 nm band-pass filter and the Cherry emission was filtered by a 580 nm long-pass filter (41).
In vitro mRNA synthesis of BiFC reporters and microinjection into zebrafish embryos. Restriction enzyme digestion was performed to linearize 5 µg of each mVENUS-based BiFC reporter construct. Following linearization, BiFC reporter DNA was purified (DNA Clean & Concentrator™-5 Kit, Zymo), 1 µg of which was used as a template for in vitro mRNA synthesis (mMESSAGE mMACHINE SP6 RNA Synthesis Kit, Ambion). Synthesized BiFC reporter mRNA was purified (RNA Clean & Concentrator™-5 Kit, Zymo) and 1 nl of 100 ng/µl mRNA was co-injected with phenol red into the yolk sac of single-cell zebrafish embryos. Embryos were maintained in E3 media (5.0 mM NaCl, 0.17 mM KCl, 0.33 mM CaCl2, 0.33 mM MgSO4) at 28 °C until they reached 4-5 hpf.
Zebrafish embryo imaging. 4-5 hpf zebrafish embryos were screened for fluorescence using a fluorescent stereo microscope (M165FC, Leica). Embryos identified as fluorescent had the chorion manually removed prior to being embedded in a 2% methylcellulose-containing 35 mm glass-bottom dish. Zebrafish embryos were imaged live using confocal laser scanning microscopy (LSM710, Zeiss), whereby a 514 nm laser was used to visualise mVENUS-based BiFC reporters. Fluorescent and brightfield images were taken as Z-stacks at 10 × magnification with a 0.45 NA dry objective and at 40 × magnification with a 1.3 NA oil objective. Post-acquisition image processing was performed using FIJI (FIJI Is Just ImageJ) to generate maximum intensity projections and fluorescence/brightfield composites. Timelapse images were taken over a 10 h period.
Purified full-length mouse SOX18. A cDNA clone of mouse Sox18 was PCR amplified and cloned into the pOPIN-GST vector, to generate N-terminally tagged HIS-GST-SOX18. A sequence-verified construct was cotransfected with flashBACULTRA (Oxford Expression Technologies, Oxford, United Kingdom) baculovirus DNA into Spodoptera frugiperda Sf9 cells to obtain recombinantly expressed HIS-GST-SOX18. High Five cells (BTI-TN-5B1-4) in Sf-900™ II serum-free medium were infected at a cell density of 1.5 × 10^6 cells/ml with a multiplicity of infection (MOI) of 5 PFU/cell, and incubated at 21 °C for 144 h before harvest. The cell pellet from 100 ml of expression culture was resuspended in 30 ml of phosphate lysis buffer (50 mM sodium phosphate, 500 mM sodium chloride, 1% Triton X-100, 2 mM magnesium chloride, one tablet of cOmplete Protease Inhibitor Cocktail, pH 7.5) and sonicated on ice for 20 s. The cell lysate was centrifuged at 17 000 × g for 40 min at 4 °C. The supernatant was incubated with Benzonase Nuclease (Merck Millipore) for 1 h at room temperature for DNA degradation, before being mixed with 500 µl GST resin (GE Healthcare Life Sciences, Sweden) and incubated on a rotating wheel at room temperature for 1 h. The sample was centrifuged at 500 × g for 1 min to remove unbound protein in the supernatant. The resin was further washed with 50 resin volumes (RV) of wash buffer (50 mM sodium phosphate, 500 mM NaCl, pH 7.5), with unbound protein removed by centrifugation as above. Bound protein was eluted from the resin with 3 × 1 RV elution buffer (50 mM Tris, 500 mM NaCl, 10 mM reduced glutathione, pH 8.0), collecting the supernatant by centrifugation as above.
Purified mouse SOX HMG fragments. The HMG domain of mouse SOX18 was BP cloned from cDNA templates (IMAGE cDNA clone IDs: Sox18: 3967084) into a pDONR™221 pENTRY vector, sequenced and recombined into a pETG20A or a pHisMBP expression plasmid using Gateway® LR Technology (42). Constructs were transformed into Escherichia coli BL21(DE3) cells (Luria-Bertani, 100 µg/ml Ampicillin) and purified as described above.
Design of the synthetic palindromic probes. A double-stranded (ds) 37 bp-long DNA probe centred on a synthetic IR5 element was designed. GC-rich flanking and spacer sequences were used to avoid confounding off-site protein-DNA binding. Three spacer lengths were designed: 1 (IR1), 5 (IR5) and 10 (IR10) bp. The DNA probes were obtained from IDT (IDT DNA, USA).
Sequences for the probes are:
EMSA (electrophoresis mobility shift assay). EMSAs were performed using a DNA elements with 5 cy5 (Cyanine
Fluorescence polarization assay. Protein-DNA binding was measured by fluorescence polarization, using fluorescein 5′-phosphate-tagged ds DNA probes. Three spacer lengths were tested: 1, 5 and 10 bp. The DNA-binding assay was performed in 20 µl, in black 384-well plates, using mouse full-length SOX18 or a SOX-HMG fragment incubated in 30 mM HEPES buffer pH 7.5, supplemented with 100 mM KCl, 40 mM NaCl, 10 mM ammonium acetate, 10 mM guanidinium HCl, 2 mM MgCl2, 0.5 mM EDTA and 0.01% Nonidet NP-40. Protein concentrations ranged from 5 to 150 nM, in the presence of a constant 5 nM labelled DNA. Controls consisted of: free labelled DNA (low FP milli-Polarization index, mP); labelled DNA in presence of protein (negative control, high mP); labelled DNA and protein in presence of a 400-fold excess of unlabelled DNA (positive control, low mP). Plates were sealed, briskly agitated in the dark at room temperature for 5 min, then centrifuged at 1800 rpm for 10-20 s to flatten the sample meniscuses. Plates were allowed to equilibrate for another 15 min at room temperature, before reading fluorescence polarization on a Tecan M1000Pro (excitation 485 nm, emission 525 nm). All experiments were performed using independent and technical triplicates (N = 3, n = 3). At a given constant temperature and viscosity, the fluorescence polarization index (mP) is proportional to the molecular size of the binding complex. Binding data were fitted to the Hill equation using GraphPad Prism version 7.03 for Windows, GraphPad Software (La Jolla, CA, USA).
RT-PCR: dose effect of Sm4 treatment. Total RNA was extracted using an RNeasy Mini kit (Qiagen, 74106) according to the manufacturer's protocol, including on-column DNA digestion. cDNA was synthesised from 1 µg of purified RNA using the High Capacity cDNA Reverse Transcription kit (Life Technologies, 4368813). Amplification and quantitation of target cDNA was performed in technical triplicates of at least three biological replicates using the SYBR green (Life Technologies, 4312704) method. Reactions were run in 10 µl in 384-well plates using a ViiA 7 Real-Time PCR system. The housekeeper gene GAPDH was selected based on the stability of its expression after validation by cross-referencing against expression of other housekeeper genes including 18S ribosomal RNA and beta-actin. Primer efficiencies were calculated using LinRegPCR, and amplification data were analysed using ViiA7 software and the Q-gene PCR analysis template.
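As an illustration of the Hill-equation fit applied to the fluorescence polarization data, the following minimal Python sketch evaluates a four-parameter Hill curve and recovers the half-saturation concentration by a crude grid search. It is not the GraphPad Prism procedure used in this study: the function names, the parameter values (baseline and plateau mP, Hill coefficient) and the synthetic data points are assumptions chosen only for demonstration, and the grid search stands in for a proper non-linear regression.

```python
def hill(protein_nM, mp_min, mp_max, k_half_nM, n):
    """Hill equation: polarization (mP) as a function of protein concentration (nM)."""
    x = protein_nM ** n
    return mp_min + (mp_max - mp_min) * x / (k_half_nM ** n + x)


def fit_k_half(concs, signals, mp_min, mp_max, n=1.0):
    """Grid search (1..300 nM, 1 nM steps) for the half-saturation concentration.

    Hypothetical helper: baseline, plateau and Hill coefficient are held fixed,
    and only k_half is scanned by least squares.
    """
    best_k, best_err = None, float("inf")
    for k in range(1, 301):
        err = sum((hill(c, mp_min, mp_max, k, n) - s) ** 2
                  for c, s in zip(concs, signals))
        if err < best_err:
            best_k, best_err = float(k), err
    return best_k


# Synthetic titration over the 5-150 nM range described in the methods.
# The mP values are generated from assumed parameters, not measured data.
concs = [5, 10, 25, 50, 100, 150]
signals = [hill(c, 50.0, 250.0, 40.0, 2.0) for c in concs]
k_est = fit_k_half(concs, signals, 50.0, 250.0, n=2.0)
print(k_est)
```

With real measurements, all four parameters would normally be fitted together with their confidence intervals, which is what dedicated curve-fitting software provides.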

Formation of complexes within the SOXF group
To assess whether members of the SOXF group have the potential to self-interact, we first measured physical interactions using different in vitro assays. Transcription factors are notoriously difficult proteins to work with, and SOXF proteins are no exception. The small DNA-binding domain can be expressed and purified in recombinant form, but the full-length proteins are difficult to obtain. The N-terminal and C-terminal domains of SOX18 are likely intrinsically disordered, further reducing the probability of high-resolution structural studies using crystallography.
Therefore, in order to characterize the behaviour of full-length SOX7, SOX17 and SOX18 proteins, we turned to cell-free protein translation. In recent years, our laboratory has successfully expressed and studied difficult targets using a eukaryotic cell-free system based on Leishmania tarentolae (39,40,43). When supplemented with plasmids encoding the SOXF proteins, the cell-free system produces full-length proteins in 3 hours, with minimal truncations (Supplementary Figure S1).
One of the advantages of cell-free expression is the ability to co-express different constructs, and we used this to investigate protein self-oligomerisation. Specifically, we co-expressed GFP- and mCherry-tagged SOXF proteins and used the two tags for affinity capture and single-molecule fluorescence detection. The proteins were labelled at either end (N- or C-terminal) to assess the effect of the fluorophore on protein-protein interactions (PPIs) (Supplementary Figure S2).
First, we performed a proximity assay (AlphaScreen, AS) to measure interaction between protein pairs. The assays were performed directly from the cell-free co-expressions, without enrichment or purification steps that could perturb weak complexes. In AS, the interaction between the two proteins brings donor and acceptor beads into close proximity, generating a luminescent signal ( Figure 1A). The amplitude of the signal produced is proportional to the degree of physical interaction between two proteins. Previously reported interactions such as the SOX9 dimer (44), SOX18-MEF2C (45) and SOX18-RBPJ (35) were used as positive controls ( Figure 1B) whereas the known lack of interaction between SOX18 and GATA2 was used to define a baseline level for the AS signal. When testing the SOXF group, AS revealed a strong binding between the SOX18-GFP/SOX18-mCherry pair while SOX7 and SOX17 did not form homodimer complexes ( Figure 1C). To verify that the genetically encoded tags did not prevent interaction, we tested different configurations of the fluorophores in this assay and identified that the N-GFP-SOX18/ SOX18-C-mCherry pair gave the strongest AS signal. For SOX18, all other configurations did lead to a positive, albeit weaker AS signal, while none of the combinations in the case of SOX7 and SOX17 yielded a positive AS signal (Supplementary Figure S2).
To further characterise SOX18 dimer complexes and their ability to recruit protein partners, we took advantage of single molecule spectroscopic assays. This approach measures the photon emission of individual molecules of GFP or mCherry in a defined confocal volume ( Figure 1D). After co-expression of GFP and mCherry tagged SOX18 proteins, the samples were rapidly diluted to working concentrations of approximately 100 pM. In these conditions, individual protein complexes are observed as they travel through the confocal excitation volume. A single GFP or mCherry fluorophore can emit a maximum of 90-100 photons per millisecond (40), and we used this calibration to quantify the size of complexes. In the trace obtained for GFP and mCherry tagged SOX18, we did not detect large bursts of fluorescence (>200 photons) that would indicate that the proteins form large oligomers or non-specific aggregates. We did observe the presence of slightly larger bursts in both GFP and mCherry channels, with intensities in the 100-200 photon range ( Figure 1E, arrows). These bursts suggest the formation of SOX18 complexes containing two GFP or two mCherry fluorophores.
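The brightness calibration described above can be turned into a simple classification rule. The Python sketch below is illustrative only: it takes the upper bound of 100 photons per millisecond per fluorophore from the text, and the function name, the hard thresholds and the example burst values are assumptions rather than the analysis pipeline used in this study. Background subtraction and burst detection are ignored.

```python
PHOTONS_PER_FLUOROPHORE = 100  # upper bound per ms for one GFP or mCherry (from the text)


def classify_burst(photons_per_ms):
    """Assign a single-channel burst to a size class from its peak brightness."""
    if photons_per_ms <= PHOTONS_PER_FLUOROPHORE:
        return "monomer"            # at most one fluorophore
    if photons_per_ms <= 2 * PHOTONS_PER_FLUOROPHORE:
        return "dimer"              # 100-200 photons: two fluorophores
    return "oligomer"               # >200 photons: large oligomer or aggregate


# Hypothetical burst intensities (photons/ms), for illustration only.
bursts = [80, 95, 150, 180, 60, 350]
counts = {}
for b in bursts:
    label = classify_burst(b)
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

Such hard cut-offs are a simplification, since real burst intensities also depend on the path each molecule takes through the confocal volume.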
This observation was further confirmed using two-colour coincidence detection, as shown in Figure 1E. The fluorescence trace shows frequent co-diffusion of two SOX18 proteins tagged separately with GFP and mCherry. In this experiment, GFP-labelled and Cherry-labelled proteins were expressed separately in LTE, then mixed together and allowed to interact for 1 h before the assay. In all cases, the mixtures were diluted to pM concentrations immediately before testing. A fluorescence signal was recorded in the GFP channel and the Cherry channel over 500 s. The signal was then analyzed as a succession of individual events. For each event, the ratio of Cherry fluorescence to the total fluorescence was calculated. The number of events for each ratio C was counted and normalized to the total number of events. This fraction of events P(C) is plotted as a function of the coincidence ratio (C). Gaussian curves are overlaid on the histograms: the green Gaussian curve corresponds to GFP only, the red Gaussian to Cherry only; the yellow Gaussian highlights the presence of both GFP and Cherry in the focal volume.
At these concentrations, the random simultaneous presence of two proteins in the small detection volume is highly improbable. Thus, the method provides a direct visualization of protein-protein binding. In the single molecule coincidence assay, binding stoichiometry can be inferred by measurement of the coincidence ratios of the protein complexes. By simply measuring the fraction of mCherry in the total fluorescent bursts, protein stoichiometries can be plotted, which clearly show that SOX18 forms a 1:1 dimer with a coincidence ratio C = mCherry/(GFP + mCherry) = 0.5 ( Figure 1F).
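The coincidence analysis can be sketched in a few lines of Python. The example below computes C = mCherry/(GFP + mCherry) for each event and bins the values into a normalized histogram P(C), as described in the text. The function names, the number of bins and the example photon counts are assumptions made for illustration, and events are taken to be bursts that already passed a detection threshold (so GFP + mCherry is never zero).

```python
def coincidence_ratio(gfp, cherry):
    """C = mCherry / (GFP + mCherry) for one fluorescence event."""
    return cherry / (gfp + cherry)


def coincidence_histogram(events, n_bins=10):
    """Fraction of events P(C) falling in each of n_bins equal bins over [0, 1]."""
    counts = [0] * n_bins
    for gfp, cherry in events:
        c = coincidence_ratio(gfp, cherry)
        idx = min(int(c * n_bins), n_bins - 1)  # C = 1.0 goes into the last bin
        counts[idx] += 1
    total = len(events)
    return [k / total for k in counts]


# Hypothetical (GFP, mCherry) photon counts per event, for illustration only.
events = [(100, 0), (0, 100), (100, 100), (90, 110), (120, 100)]
hist = coincidence_histogram(events)
print(hist)
```

In this toy data set, GFP-only events fall in the first bin, Cherry-only events in the last, and the co-diffusing events cluster around C = 0.5.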
Taken together, AS and single molecule coincidence results firmly demonstrate that SOX18 has the ability to form a dimer, unlike SOX7 or SOX17.

SOX18 dimer recruits specific protein partners
The identification of SOX18 homodimers prompted us to determine the stoichiometric ratios for different assembly complexes formed with protein partners such as MEF2C, RBPJ and RXRA (Figure 1F). In this assay, we used GATA2 as a negative control for SOX18 interaction. The frequency distribution of the coincidence ratio between mCherry-SOX18 and GFP-tagged MEF2C or RBPJ corresponds to a 2:1 interaction (C = 0.66), whereas binding to RXRA occurs in a 1:1 ratio (C = 0.5). These data provide evidence that SOX18 dimers recruit MEF2C or RBPJ, whereas monomeric SOX18 is able to recruit RXRA monomers.
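The link between stoichiometry and coincidence ratio can be made explicit with a short calculation. The Python sketch below assumes that every mCherry and GFP fluorophore contributes the same brightness, so that a complex with m mCherry-tagged and n GFP-tagged subunits gives C = m/(m + n). The candidate list and function names are hypothetical and serve only to show how an observed ratio maps to the nearest simple stoichiometry.

```python
def expected_ratio(n_cherry, n_gfp):
    """Expected coincidence ratio C for a complex, assuming equal brightness per fluorophore."""
    return n_cherry / (n_cherry + n_gfp)


# Candidate mCherry:GFP stoichiometries (an assumed, non-exhaustive list).
CANDIDATES = {"1:1": (1, 1), "2:1": (2, 1), "1:2": (1, 2)}


def closest_stoichiometry(observed_c):
    """Return the candidate whose expected ratio is nearest to the observed C."""
    return min(CANDIDATES,
               key=lambda name: abs(expected_ratio(*CANDIDATES[name]) - observed_c))


print(closest_stoichiometry(0.66))  # mCherry-SOX18 with GFP-MEF2C or GFP-RBPJ
print(closest_stoichiometry(0.5))   # mCherry-SOX18 with GFP-RXRA
```

Under this assumption, two mCherry-SOX18 molecules bound to one GFP-tagged partner give C = 2/3, which matches the reported value of 0.66.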

SOX18 homodimer forms in vivo in zebrafish larvae
As a demonstration that SOX18 has the capability to homodimerise in vivo during development, we investigated dimer formation using a zebrafish-based model system. To follow the formation of the SOX18 dimer in developing zebrafish larvae, we engineered a fluorescent reporter based on a split-fluorescent protein (split-FP) biosensor and took advantage of this construct in transient transgenic reporter experiments. Bimolecular fluorescence complementation (BiFC) assays are powerful tools for the visualisation of protein-protein interactions in both cell and zebrafish model systems (46,47). We found that a mVENUS-based split-FP biosensor was the most suitable for use in zebrafish embryos for the visualisation of SOX18 dimerization events. The selected mVENUS biosensor incorporates the N-terminus of mVENUS fragmented at amino acid 155 on the N-terminus of one SOX18 (NmVENUS155-SOX18), and the C-terminus of mVENUS fragmented at amino acid 155 on the N-terminus of another SOX18 (CmVENUS155-SOX18). These fragments were tagged to SOX18 via a flexible 3xGGGS linker (Supplementary Figure S3). The mRNA encoding these biosensors was co-injected into zebrafish embryos at the single-cell stage to promote ubiquitous expression of this TF during early stage development (Figure 2, top left panel).
Live imaging of the biosensor-injected larvae at around 4-5 hpf revealed the presence of mVENUS expression specifically in the nuclei (Figure 2, middle panel and Movie 1). In parallel, FosLZ/JunLZ heterodimers coupled to the BiFC reporter system were used as a positive control. Zebrafish embryos injected with a similar concentration of this FosLZ/JunLZ biosensor mRNA display fluorescence in both nuclear and cytoplasmic localisations at the same developmental stage (Figure 2, right panel). To further validate the specificity of the split-FP biosensor assay, we established a negative control using a truncated version of SOX18 protein that does not harbour the HMG-box and nuclear localisation sequence (NLS) ( 84-205). Transgenic zebrafish embryos expressing this mutant split-FP biosensor did not show any fluorescent signal in cell nuclei (Figure 2, bottom left panel). Therefore, the use of a BiFC reporter system further confirmed the capability of SOX18 to form a dimer in vivo.

Mapping of the SOX18 dimerization domain
To pinpoint a putative SOX18 dimerization (DIM) domain, we generated a series of truncated constructs and tested their binding ability in AS. The truncations were designed based on the known domains of SOX18 full-length, as shown in Figure Figure  3C. This 50 amino acids region is highly conserved throughout evolution in SOX18. However, this region is not conserved in SOX7 or SOX17.
To validate the importance of this region in the dimerization process ( Figure 3D), we swapped the 50 amino acids post-HMG-box of SOX18 with the corresponding 50 amino acids of SOX7 (SOX18DIM/SOX7-swap mutant). We also performed the reciprocal experiment whereby the putative SOX18 DIM domain was inserted into the corresponding site on the SOX7 protein. This chimeric protein corresponds to a SOX7DIM/SOX18-swap mutant. The SOX7 sequence was used since this TF was shown not to dimerize in AS and single molecule spectroscopy assays.
The homodimerization ability of the two swap mutants was tested in AS (Figure 3E) and single molecule two-colour coincidence (Figure 3F). In both assays, insertion of the exogenous SOX7 region into SOX18 caused a loss of interaction, indicating that this 50 amino acid region encompasses a motif that is necessary for the dimerization process. Conversely, insertion of the SOX18 DIM domain enabled the SOX7DIM/SOX18 swap mutant to homodimerize, whereas SOX7 WT does not. These experiments establish that the DIM domain is sufficient to drive the dimerization process. The fact that dimerization is not restored to the same level for SOX7DIM/SOX18 as compared to SOX18 WT indicates that the dimerization is likely to be stabilized by secondary interactions outside the DIM domain that may be specific to SOX18.
Multiple sequence alignment of the SOX18 DIM domain across 10 different species shows that the residues are mostly conserved throughout evolution (Figure 3C), especially in the region next to the third α-helix of the HMG domain (aa 161 to 168) as well as the FRELPPL motif, located in the last 17 amino acids preceding the TAD domain (aa 197-203). Further comparison of the DIM domain within the human SOXF group reveals that the hydrophobic sequence FRELPPL is a specific feature of SOX18 (the equivalent sequences in SOX7 and SOX17 are less hydrophobic), suggesting a potential role for this sequence in SOX18 homodimerization. To further investigate the role of this motif in SOX18/SOX18 interaction, we performed an AS assay between full-length SOX18 and a deletion mutant that lacks the FRELPPL motif (aa 197-203). The deletion of this motif suppresses dimer formation (Supplementary Figure S4). In contrast, SOX18 FL was still able to form a dimer with the SOX18 deletion mutant lacking the first hydrophobic motif ( 161-168) (Supplementary Figure S4). The DIM domain is a novel and unique feature of SOX18, with key hydrophobic motifs involved in the homodimerization process.

A SOX18 homodimer binding motif is present on the chromatin
In order to find a trace of the SOX18 dimer in the genome, we investigated the presence of a dimer-binding motif on the chromatin. To this end, we applied a motif based sequence analysis tool, Spaced Motif Analysis (SpaMo) (48), to search for an enrichment of a secondary SOX motif on the chromatin at a fixed distance from genomic SOX18 binding sites. We analysed the spacing between primary SOX18 binding sites and a putative secondary SOX site on the reported 23,635 peaks from the SOX18 ChIP-seq analysis (35), and identified a signature dimer motif that corresponds to a palindrome of the archetypical SOX motif 5 -AACAAT-3 , spaced by 5 nucleotides (Inverted repeat 5, IR5, P-value = 0.005) ( Figure 4A, B). Since SOX proteins have a highly conserved HMG box and a very similar consensus-binding motif (5 -AACAAT-3 or the reverse complement 5 -ATTGTT-3 ), the spacing enrichment was identified for three combinations of SOX motifs: SOX18-SRY (IR5a), SOX18-SOX18 (IR5b) and SRY-SRY (IR5c), all corresponding to the inferred motif 5 -AC/TAATnnnnnATTGT-3 ( Figure 4B). The IR5 motif closely resembles known dimer motifs identified for SOXE proteins such as SOX9 25,26 . Electrophoretic mobility shift assay (EMSA) experiments demonstrated that two SOX18-HMG domains, as well as two SOX9-HMG domains could simultaneously bind to this IR5 motif (Supplementary Figure S5).
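To make the structure of the inverted-repeat motif concrete, the following Python sketch scans a DNA sequence for the inferred pattern A(C/T)AAT, followed by a spacer of fixed length, followed by ATTGT. It is a simple regular-expression illustration, not the SpaMo algorithm, which evaluates spacing enrichment statistically across ChIP-seq peaks. The function names and the test sequence are assumptions introduced for this example.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq, spacer=5):
    """Return (start, match) for every A[CT]AAT-N{spacer}-ATTGT site, including overlaps."""
    pattern = re.compile(r"(?=(A[CT]AAT[ACGT]{%d}ATTGT))" % spacer)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence containing one IR5 element with a GC-rich spacer.
test_seq = "GGGACAATGCGCGATTGTCCC"
print(reverse_complement("ACAAT"))                # second half-site of the palindrome
print(find_inverted_repeats(test_seq, spacer=5))  # one IR5 site at position 3
print(find_inverted_repeats(test_seq, spacer=1))  # no IR1 site in this sequence
```

Changing the `spacer` argument to 1 or 10 gives the corresponding IR1 and IR10 patterns used for the synthetic probes.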
To further confirm this observation, we took advantage of a fluorescence polarisation (FP) assay using FAM-labelled oligonucleotides harbouring IR motifs with different spacer lengths (IR1: 1 bp, IR5: 5 bp and IR10: 10 bp). In this assay, the increase in molecular weight that accompanies formation of the protein-DNA complex is reflected by an increase in the FP index (mP). This approach revealed that SOX18 full-length protein produces a maximum binding activity (higher mP index) in the presence of an IR5 binding site (Figure 4C). There is approximately twice as much occupancy of SOX18 full-length on a probe that contains an IR5 motif, compared to one that has an IR1 motif (Figure 4C), since steric hindrance prevents cohabitation when the spacer is shorter. Occupancy on an IR1 probe could be restored to levels similar to those seen for an IR5 probe by using a SOX18-HMG fragment (aa 1-109), which allows for more physical overlap (Figure 4D).
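As an illustration of the spacing constraint discussed above, the following minimal sketch scans a DNA string for two SOX half-sites in inverted orientation separated by a fixed spacer. It is not the SpaMo procedure used in this study: the function name `find_ir_sites`, the regular expression (a literal reading of the inferred motif 5′-AC/TAATnnnnnATTGT-3′) and the example sequence are assumptions made for illustration only, and only the forward strand is searched.

```python
import re


def find_ir_sites(sequence, spacer=5):
    """Return (start, matched_site) for inverted-repeat SOX sites.

    Assumption: half-sites are A[CT]AAT and ATTGT, separated by exactly
    `spacer` unspecified nucleotides (spacer=5 corresponds to IR5).
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=(A[CT]AAT[ACGT]{{{spacer}}}ATTGT))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence containing a single IR5-like site.
toy_sequence = "GGG" + "ACAAT" + "CCGGA" + "ATTGT" + "TTT"
ir5_hits = find_ir_sites(toy_sequence, spacer=5)
ir1_hits = find_ir_sites(toy_sequence, spacer=1)
```

Changing `spacer` to 1 or 10 gives the IR1 and IR10 arrangements used for the FP probes, so the same helper can be used to check which spacing a given probe or peak sequence carries.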

SOX18 dimerization is not simply a juxtaposition event on the DNA
In order to tease apart a cooperative binding mechanism from a co-binding event that does not involve a PPI, we performed AS experiments using SOX18 and the SOX18DIM/SOX7-swap mutant in presence of an oligonucleotide harbouring the IR5 palindromic sequence. The lack of dimerization capability of the mutant protein only allows monomeric binding. Incubation of the IR5 probe in presence of SOX18 reaches a plateau phase almost instantly with only a mild increase of the maximum AS signal observed. In contrast, the SOX18DIM/SOX7-swap mutant responded in a dose-dependent manner to an increase of the IR5 probe concentration ( Figure 5A). The main difference between the wild type and the mutant protein lies in their abilities to elicit protein-protein interactions, and in particular homo-dimer formation.
Next, we evaluated the effects of pharmacologically disrupting SOX18 dimer formation in this context. The small molecule inhibitor Sm4 interferes with SOX18-dependent PPIs, including its homodimerization (35). As previously described, Sm4 significantly disrupted the SOX18 homodimer in the absence of IR5 motif-containing oligonucleotide, with an IC50 value around 3 µM (Figure 5B). However, in the presence of the IR5 probe, the AS signal intensity was unperturbed upon addition of Sm4 at up to 100 µM (Figure 5C). This suggests that despite the disruption of SOX18 dimer formation caused by Sm4, two SOX18 monomers can still co-bind to the IR5 motif and produce an AS signal, in a similar fashion to the SOX18DIM/SOX7-swap protein.
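To make the contrast between the two conditions concrete, the sketch below evaluates a simple one-site inhibition curve at the concentrations quoted above. This is an idealised model and not the fitting procedure used for Figure 5B: the function name `fraction_signal_remaining`, the Hill slope of 1 and the assumption of complete inhibition at saturating dose are all illustrative assumptions.

```python
def fraction_signal_remaining(conc_um, ic50_um=3.0, hill=1.0):
    """Fraction of the uninhibited signal left at inhibitor concentration conc_um (in µM).

    Assumptions: one-site inhibition model, Hill slope of 1 by default,
    and an IC50 of about 3 µM as quoted in the text for Sm4 without DNA.
    """
    if conc_um < 0:
        raise ValueError("concentration must be non-negative")
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)


# Predicted residual signal, without DNA, at the highest dose tested with the IR5 probe.
residual_at_100_um = fraction_signal_remaining(100.0)
```

Under these assumptions only a few percent of the dimer signal would be expected to remain at 100 µM Sm4 in the absence of DNA, which is why an unperturbed signal in the presence of the IR5 probe points to co-binding of two monomers rather than to a surviving protein-protein interaction.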
Finally, when AS was performed in the presence of DNA harbouring a single consensus SOX binding motif, we observed a small increase of the signal strength as the concentration of probe increases (to 1 µM) until all dimers are displaced by binding to individual DNA probes (>5 µM) (Figure 5D). This effect is specific to the SOX18 dimer (Supplementary Figure S6). Taken together, these experiments show that formation of the SOX18 dimer does not require the presence of IR5 (contrary to SOX18DIM/SOX7) even though the dimer can be stabilized by the presence of DNA.

[...] homodimerization domain across 10 different species, using the human SOX18 homodimer domain as a reference. Residues are grouped into colours, based on their chemical and physical properties. Bottom: multiple sequence alignment of the putative SOX18 dimerization domain with the corresponding domains in SOX17 and SOX7 proteins. The 50 amino acids directly following the high mobility group (HMG) domain of the SOXF protein family reveal residues that are non-conserved and therefore possibly involved in the unique ability of SOX18 to homodimerize. Residues that are not conserved between all SOXF family members are highlighted in red. For all, fully conserved residues are marked by an asterisk (*), partially conserved residues that retain high similarity are marked by a colon (:), partially conserved but weakly similar residues are marked by a full stop (.) and residues that have no conservation are left blank ( ). Protein alignment was performed using Clustal Omega. (D) Schematic representation of the constructs: (top) SOX18 WT and SOX7 WT; (bottom) SOX18DIM/SOX7 and SOX7DIM/SOX18 swaps. (E) Typical AlphaScreen curves obtained for SOX18 WT with MEF2C and SOX7DIM/SOX18 showing respectively a positive signal above 10 000 cps. Lack of signal for the SOX18DIM/SOX7 swap indicates a loss of the dimerization propensity. SOX18-GATA2 is used as a negative control. (F) Value of coincidence obtained from the two-colour coincidence experiments performed on SOX18 WT, SOX18/7, SOX7/18 and SOX7 WT co-expression as a GFP/Cherry pair. Data were analysed as in Figure 1 and the percentage of coincident events (0.25 < C < 0.75) was plotted for the different constructs, reflecting their ability to homodimerize.

SOX18 dimer has an endothelial specific signature
Analysis of the SOX18 ChIP-seq data set revealed 747 unique genomic regions harbouring at least one of the three IR5 motifs in their sequence. The IR5 motif was identified by scanning for a more or less relaxed secondary SOX binding site in the vicinity of a primary SOX motif. We chose three different combinations of motifs since the consensus binding sequence for SOX proteins is short and often degenerate (49). To be exhaustive, we considered the following variations: SOX18-SRY (IR5a), SOX18-SOX18 (IR5b) and SRY-SRY (IR5c). Genomic Regions Enrichment of Annotations Tool (GREAT) (50) analysis of the genome-wide distribution of the SOX18 ChIP-seq peaks containing an IR5a-c motif assigned a total of 964 genes to these sequences. Genotype-Tissue Expression (GTEx) analysis of this gene list revealed that about one-third of them are significantly expressed by endothelial cells (Supplementary Figure S7). In particular, some putative regulatory sequences containing an IR5a-c motif have been assigned to specific vascular endothelial markers that include, but are not limited to, FLT1, Endomucin, SEMA3D, MEF2A, MAP4K4 and NRP1, as well as other genes known to be involved in angiogenesis such as IL33 and KLF4 (Supplementary Figure S7). Further analysis of SOX18 ChIP-seq peaks containing an IR5a-c motif using EpiExplorer software (51) enabled us to define the overlap with histone marks and DNase hypersensitivity regions publicly available from the ENCODE consortium (Supplementary Figures S8-S9 and Supplementary Table S2). A large portion of the peaks intersect with active regulatory regions of the HUVEC genome, with 371 regions showing at least 50% overlap with no less than two histone marks for active transcription. Conversely, some IR5 motifs (∼50%) overlap with at least one repressive mark (H3K27me3 or H3K36me3) (Supplementary Figure S8A). This observation indirectly suggests that the SOX18 dimer has the potential to act as both a repressor and an activator of transcription. This capability is likely to be modulated by protein partner recruitment and by cell subtype.

[...] Figure S2). Only IR5 and not IR1 can accommodate the dimer of full-length SOX18, but both can bind as efficiently to the HMG box.
To further assess the functional relevance of the SOX18 dimer in endothelial cells, we took advantage of a previously published RNA-seq dataset where SOX18 was overexpressed in HUVECs, in the presence and absence of the small molecule inhibitor Sm4 (35). The overexpression of SOX18 caused a broad range of genes to be up- or down-regulated (3621, 53% up) (Figure 6A, grey dots, Supplementary Figure S10A). GO analysis showed enrichment for biological processes involved in angiogenesis (1.67-fold, FDR < 0.01), hematopoiesis (1.52-fold, FDR < 0.01) and wound healing (1.44-fold, FDR < 0.05), typical SOX18 functions (Supplementary Figure S10B).
This list of SOX18-responsive genes was then cross-referenced to the list of genes associated with IR5a-c motifs in order to ascertain which of the putative dimer genes would be most likely to be biologically relevant in an endothelial cell context. We found a set of 261 genes that met these criteria, being both responsive to SOX18 overexpression and having at least one of the three IR5 motifs in their putative regulatory elements (Figure 6A, red/blue/green dots). GO analysis of this gene subset revealed a strong enrichment for endothelial and angiogenic terms within biological processes, particularly negative regulation of endothelial cell proliferation (21.3-fold enrichment, FDR < 0.05), positive regulation of angiogenesis (7.49-fold, FDR < 0.05) and positive regulation of vascular development (6.78-fold, FDR < 0.05) (Figure 6B). The enrichment for these terms in the IR5 gene set was much higher than in the non-filtered set of SOX18-responsive genes. This suggests that the non-dimeric and the dimeric forms are involved in distinct biological processes (Supplementary Figure S10C).
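The cross-referencing step described above amounts to a set intersection followed by an enrichment comparison. The sketch below shows one way this could be written; it is not the pipeline used in this study. The function names, the placeholder gene identifiers (`GENE_A`, `GENE_B`) and the counts passed to `fold_enrichment` are hypothetical, and fold enrichment is taken here in its usual sense of observed over expected term frequency.

```python
def cross_reference(responsive_genes, motif_genes):
    """Genes that are both expression-responsive and associated with a motif."""
    return sorted(set(responsive_genes) & set(motif_genes))


def fold_enrichment(hits_in_subset, subset_size, hits_in_background, background_size):
    """Observed term frequency in a gene subset divided by its background frequency."""
    observed = hits_in_subset / subset_size
    expected = hits_in_background / background_size
    return observed / expected


# Toy lists: IL33 and KLF4 are described in the text as IR5-containing and
# SOX18-responsive, PROX1 as a target without IR5; GENE_A/GENE_B are placeholders.
responsive = ["IL33", "KLF4", "PROX1", "GENE_A"]
ir5_associated = ["KLF4", "IL33", "GENE_B"]
dimer_candidates = cross_reference(responsive, ir5_associated)

# Hypothetical counts, for illustration only.
toy_enrichment = fold_enrichment(5, 50, 10, 1000)
```

Comparing the value returned for the intersected subset with the value for the full responsive list mirrors the comparison between the IR5 gene set and the non-filtered set of SOX18-responsive genes.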
To further validate these findings, we analysed the effect of the protein-protein interaction disruptor Sm4 on SOX18-responsive genes (Figure 6C). Several dimer genes were affected by Sm4; interestingly, 90% of those genes were also positively regulated by SOX18 overexpression. Sm4 strongly affected a subset of IR5 genes, highlighted in Figure 6C. A full list of the IR5 genes affected by Sm4 is provided in Supplementary Table S3. Dysregulation of gene expression was further profiled by qRT-PCR analysis (Figure 6D). The results validated the genome-wide overlap analysis with histone marks, suggesting that the dimer has the capability to activate or repress transcription, since the small molecule inhibitor was able to enhance or repress gene expression. Lastly, the effect of Sm4 on the transcriptional activity of SOX18 was interrogated in further detail for known key endothelial regulators such as IL33, KLF4 or PROX1 (Figure 6E). Sm4 selectively caused a SOX18-dependent dose response on the expression of genes harbouring an IR5 motif (IL33 and KLF4) (Figure 6E). SOX18-dependent IL33 activation was inhibited by Sm4, whereas KLF4 activation was enhanced. In contrast, PROX1, a known SOX18 target gene that only contains monomer motifs in its regulatory region of intron 1, was not significantly affected by Sm4 treatment (Figure 6E). These results show that the SOX18 dimer has a distinct molecular role from the monomer and selectively regulates a subset of endothelial-specific genes that are likely to be context dependent.

DISCUSSION
Here, we describe the molecular basis for the dimerization of the SOX18 transcription factor, a key player during endothelial cell fate determination. We quantitatively describe this homotypic interaction, uniquely observed within the SOXF group, and demonstrate the existence and functional relevance of the SOX18 homodimer by showing the presence of a homodimer signature in the genome and by modulating gene expression through pharmacological interference with a small molecule inhibitor.
In humans, 20 Sry-related high-mobility-group box (SOX) genes have been identified, characterised, and categorised into 8 groups (A-H) (29). Across all SOX proteins, the HMG-box is highly conserved. In contrast, protein regions outside this DNA-binding domain (52) are highly variable in length and amino-acid composition. The HMG-box is thought to be central to target gene selectivity via both specific DNA motif recognition and protein partner recruitment. The functional consequences of switching the HMG-box between SOX2 and SOX17 have been shown to affect endodermal programming, by altering enhancer selection in combination with differential recruitment of OCT4 (53). In recent years, it has become apparent that the domains outside of the HMG-box also contribute to protein partner recruitment.
Only a handful of SOX proteins have been shown to dimerize (for review, see (14,16)) even though the high-throughput SELEX approach has predicted that most SOX TFs are likely to form homodimers (54). SOX dimerization behaviours fall into three distinct groups. Some SOXs, such as the ones in the E Group (SOX8, SOX9, and SOX10), homodimerize in a DNA-dependent manner. SOXE proteins encode a unique 40 amino acid dimerization (DIM) domain which precedes the HMG-box (55). SOXE TFs dimerize in a highly cooperative fashion, but only do so in the presence of a (A/T)(A/T)CAA(A/T)G palindromic DNA binding sequence (56)(57)(58). Dimers of SOXE factors are able to accommodate a range of variably spaced half-sites (30,59), as opposed to other TFs that favour composite DNA elements with fixed spacing. All three SOXE proteins also effectively heterodimerize with one another, but do not dimerize with non-SOXE proteins. Interestingly, truncated DIM-SOXE fragments can also effectively dimerize with isolated SOXE HMG boxes, suggesting that a single SOXE group DIM domain is necessary and sufficient to mediate dimer formation. In this process, the dimerization is driven in the main by DIM-HMG intramolecular interactions communicated to the HMG of the juxtaposed SOX protein rather than by direct DIM-DIM intermolecular interactions. In contrast, for SOX18 the presence of two DIM domains seems to be mandatory for dimer assembly. Indeed, we show that SOX18 WT and the SOX18 mutant (minus FRELPPL motif, Δ197-203) are not able to form a dimer (Supplementary Figure S4). SOX2 is another protein able to form a dimer in a DNA-dependent fashion. It has been shown that both monomeric and dimeric forms are present in human neutrophils (60). The dimerization propensity of SOX2 has been validated at a transcriptional level whereby the dimerization of SOX2 is triggered by the presence of bacterial DNA, and unlike the monomeric form, activates the TAB2-TAK1 complex, leading to the stimulation of the innate immune response (61). As in SOX18, the Group B homolog (GBH) domain required for SOX2 dimerization is at the C-terminus of the SOX2 HMG-box.
In contrast to the DNA-dependent dimerization processes of the SOXE group, members of the D-Group (SOX5/SOX6/SOX13) are known for dimerizing via a leucine zipper motif in a DNA-independent manner (62). A coiled-coil domain mediates homo- and heterotypic interactions within the SOXD group (63). This dimerization domain is situated in the N-terminal part of the protein and enables cooperative binding to clustered SOX-responsive elements (64). Our study supports the idea that regions located outside the HMG-box play an essential role in the dimerization process since we locate the SOX18 DIM domain within a unique 50 amino acid region adjacent to the 3rd α-helix of the HMG-box. This localization is in good agreement with our previous observation that binding of an antibody raised against the 3rd α-helix of the HMG-box prevents homodimerization (14). As with the SOXD group, we speculate that the self-assembly process of SOX18 might be DNA-independent, since dimerization occurs both in the presence and in the absence of an IR5 oligonucleotide.
As for SOX2, SOX9, and SOX10, a subset of SOX-responsive genes is specifically regulated by SOX18 dimer activity. In the case of SOXE proteins, dimerization partially drives transcriptional output specificity. For instance, SOX10 homodimer binding sites are found in enhancers of several SOX10 target genes, including connexin-32, protein zero and myelin basic protein. Occupation of both SOX binding sites is required to drive promoter activities (65). SOX10 dimers also influence the formation of multi-protein complexes and transcriptional activity from these promoters (57). SOX9 homodimer-binding sequences are found in the enhancers of collagen and it has been shown that the SOX9 dimer recruits SOX5/6 dimers to activate Col2a1 transcription. In a similar fashion, we show that the SOX18 dimer has the capability to recruit the Notch effector RBPJ or the transcription factor MEF2C (this study, Figure 1F), probably to further regulate transcription of dimer-responsive genes (35,66). In most cases, it seems that the presence of a non-compact SOX-binding motif is a good marker to track potential transcriptional regulation by a dimer. In the case of SOX18, and in contrast to SOXE proteins, the spacer size is critical for cooperative binding and is found mainly in enhancer regions located 50 kb to 500 kb from gene transcription start sites.
In conclusion, structural and functional variations within different members of the SOX family make the identification and characterisation of the dimerization process a tedious exercise. Different modalities of self-assembly, involving the DNA, the HMG-box and specific motifs in C-terminal and N-terminal positions outside the DNA-binding domain, contribute to the diversity of self-assembly mechanisms. Our work shows that the mechanism of SOX18 dimer formation is a unique feature within the F-group, and involves a distinct binding motif, which permits the transcriptional signature of SOX18 to be distinguished from confounding, closely related, and redundant SOX7 and SOX17 activities.