A conserved motif in the disordered linker of human MLH1 is vital for DNA mismatch repair and its function is diminished by a cancer family mutation

Abstract
DNA mismatch repair (MMR) is essential for the correction of DNA replication errors. Germline mutations of the human MMR gene MLH1 are the major cause of Lynch syndrome, a heritable cancer predisposition. In the MLH1 protein, a non-conserved, intrinsically disordered region connects two conserved, catalytically active structured domains. This region has so far been regarded as a flexible spacer, and missense alterations in it have been considered non-pathogenic. However, we have identified and investigated a small motif (ConMot) in this linker that is conserved in eukaryotes. Deletion of the ConMot or scrambling of the motif abolished mismatch repair activity. A mutation from a cancer family within the motif (p.Arg385Pro) also inactivated MMR, suggesting that ConMot alterations can be causative for Lynch syndrome. Intriguingly, the mismatch repair defect of the ConMot variants could be restored by addition of a ConMot peptide containing the deleted sequence. This is the first instance of a DNA mismatch repair defect conferred by a mutation that can be overcome by addition of a small molecule. Based on the experimental data and AlphaFold2 predictions, we suggest that the ConMot may bind close to the C-terminal MLH1-PMS2 endonuclease and modulate its activation during the MMR process.


INTRODUCTION
DNA mismatch repair (MMR) is a highly conserved repair system that acts predominantly in the correction of base-base mismatches and insertion-deletion loops arising during replication. It increases replication fidelity by three orders of magnitude ( 1 ). Its heterozygous inactivation in the germline causes Lynch syndrome (OMIM #120435), a dominantly inherited cancer predisposition that mainly affects the endometrium and colorectum, but also other organs. Lynch syndrome accounts for 2-5% of all colorectal cancer (CRC) cases, but approximately 15% of sporadic CRC patients also show somatic inactivation of MMR, mostly due to hypermethylation of the MLH1 promoter ( 2 ).
MutS proteins are ATPases and versatile detectors of base mismatches and insertion-deletion loops in the DNA duplex ( 5 , 7-10 ). Their mode of mismatch detection has been studied in several crystal structures and involves kinking of the DNA duplex at the mismatched site (for review, see ( 11 )). Minimal MMR reactions have been reconstituted in vitro from purified components and, in the case of the human system, rely on MutSα, MutLα and some additional factors (RPA, RFC, PCNA, exonuclease I and DNA polymerase) (12)(13)(14). However, the mechanism of the repair reaction is still under discussion (15)(16)(17). It is difficult to assess experimentally since both protein dimers can assume a multitude of conformational states, depending on the loading of their ATP binding sites and on interactions with homoduplex DNA, heteroduplex DNA or with each other (18)(19)(20)(21). Moreover, MutS proteins may utilize different modes of movement on DNA (for mismatch search and for repair initiation) (for review, see ( 22 )). The repair reaction is initiated by mismatch recognition, which authorizes binding of ATP by MutS ( 21 , 23 ). This induces a comprehensive conformational transition that has been found to transform MutS proteins into stably sliding DNA clamps, and sliding of MutS proteins as well as of MutS-MutL complexes is considered to be an important intermediate of the DNA mismatch repair reaction (24)(25)(26)(27)(28). Alternatively, it has been suggested that MutL directs MutS to remain stationary in the vicinity of the mismatch after ATP binding, implying that repair is initiated by mispair-bound MutS-MutL complexes ( 29 , 30 ). ATP binding allows interaction of MutS with MutL dimers via their N-terminal domain ( 19 , 21 , 31 , 32 ). Before initiating DNA strand removal, MMR has to identify the newly replicated (and therefore erroneous) strand.
While hemi-methylated GATC sites are used for this task in some gammaproteobacteria ( 33 ), interaction with PCNA at the replication fork is considered to allow strand discrimination in other organisms (34)(35)(36)(37)(38). After strand identification, the endonucleolytic activity of MutL proteins is activated ( 39 , 40 ). This activation is PCNA-dependent and introduces nicks in the wrongly replicated DNA strand in the vicinity of the mismatch, possibly through direct interactions of the MutL N-terminal domains (NTD) and C-terminal domains (CTD) (41)(42)(43)(44)(45)(46). The nicks are introduced on both the 5' and 3' sides of the mismatch in the newly replicated strand to facilitate its digestion (46)(47)(48)(49).
Therefore, MutL proteins are mediators between mismatch recognition and removal of the faulty DNA strand. Their precise mode of function is still under discussion. MutL proteins contain ATPase activity in their structured NTD ( 50 ). Furthermore, their structured CTDs confer constitutive dimerization ( 51 , 52 ). ATP binding has been found to cause transient dimerization of the NTDs ( 53 ) and conformational condensation of the protein ( 44 ). MutL proteins possess DNA binding capability ( 49 , 54-57 ). They have been observed to move on DNA duplexes and even traverse obstacles like nucleosomes ( 27 , 58 , 59 ), presumably by forming a ring around the DNA with their flexible linkers and dimerized N-terminal domains ( 11 ). Additionally, there is evidence that MutL proteins bind DNA cooperatively and can form active polymers on DNA ( 60 , 54 ).
While the catalytic activities of both structured domains of MutL (NTD and CTD) have been investigated, it is unclear how the two domains communicate to facilitate the repair reaction and which role the MutL linkers play. The linkers represent intrinsically disordered regions (IDR) ( 70 ). Although unstructured, such regions occur in about one third of eukaryotic proteins, and they often fulfil specific functions, frequently involving protein interactions, either within the same protein or with other proteins (71)(72)(73). Small motifs or modules located within these IDRs may facilitate such tasks ( 74 ). The linker regions in MLH1 and PMS2 show no general evolutionary conservation in sequence and differ significantly in length, ranging from 100 to 300 amino acids and reaching 800 amino acids in the PMS2 homologue MLH3. Small coding variations in these regions found in patients are considered non-pathogenic because practically all pathogenic variants observed so far are located in the structured N- and C-terminal domains ( 15 ). Although they are flexible and of significant length, the linkers are required for repair: proteolytic cleavage of the linker in vitro reduces MutL DNA binding capacity, and cleavage in vivo abrogates MMR ( 75 ). Deletions in the yeast MLH1 linker affected MMR capacity, DNA-stimulated ATPase activity, movement on DNA and nicking ( 76 ). It was concluded that the linkers are relevant for completing the ATPase cycle and therefore for both movement on DNA and efficient endonuclease activation. A recent study 'handcuffed' the linkers by inserting rapamycin-dependent FRB and FKBP dimerization domains at specific positions of the MLH1 and PMS2 linkers; inter-dimer handcuffing close to the C-terminal dimerization domain caused no or only weak effects, while N-terminal handcuffing decreased MMR activity and DNA binding ( 77 ).
However, ATPase and endonuclease activities were increased, suggesting an inappropriate activation of MutLα. Handcuffing the MLH1 linker with itself produced the most pronounced MMR defect: while no effect on ATPase activity was detectable, DNA binding and endonuclease activity were reduced, suggesting that the MLH1 linker may be important for conformational rearrangements that bring the DNA strand to the endonuclease site.
We have investigated the human MLH1 linker region and identified a small motif (subsequently termed ConMot) that, in contrast to the rest of the linker, displays a high degree of conservation exclusively in eukaryotes. Genetic variants of unclear clinical significance have been reported in this motif in cancer patients. We have analyzed the role of the ConMot and of the variants located therein with respect to their effects on protein expression and mismatch repair activity.

Nucleic Acids Research, 2023, Vol. 51, No. 12 6309

Cell culture, reagents, vectors and variant synthesis
HEK293 and HEK293T cells were purchased from DSMZ German Collection of Microorganisms and Cell Culture (Braunschweig, Germany) and maintained in D-MEM containing 10% FCS and antibiotics/antimycotics. The pcDNA3 expression vector containing the entire open reading frame of human MLH1 was a gift of Dr Hong Zhang (Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA). The pSG5 expression vector containing full-length human PMS2 cDNA was provided by Dr Bert Vogelstein (Johns Hopkins Oncology Center, Baltimore, MD, USA). Amino-acid positions in MLH1 refer to the 756-amino-acid reference MLH1 sequence (NP_000240.1).
Sequence variants of the MLH1 expression vector were generated using the Q5 site-directed mutagenesis system (New England Biolabs, Ipswich, MA, USA) with the appropriate primers (Supplementary Table S1).

Mismatch repair assay
Mismatch repair activity was assessed as described in detail before ( 79 , 81 ) (Supplementary Figure S3). In short, 50 µg of nuclear extract of HEK293T cells, which do not express MLH1-PMS2 due to hypermethylation of the MLH1 promoter ( 78 ), was supplemented with 5 µg native extract of HEK293T cells co-transfected with plasmids encoding MLH1 and PMS2; the MLH1 plasmid encoded either the wildtype sequence or the variant to be investigated. Mismatched DNA plasmid substrate (Supplementary Figure S3), prepared according to previously published procedures (81)(82)(83), was added, and reactions in a final volume of 25 µl were incubated at 37 °C for 15 min. Repair efficiency was scored by purifying the mismatched DNA plasmid from the reaction and assessing the fraction of homoduplex sequence by restriction digestion of the mismatched site with EcoRV. The substrate was additionally linearized with AseI.
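Scoring repair as the fraction of EcoRV-cleavable homoduplex reduces to a short calculation on band intensities. The sketch below is a minimal illustration, not the published analysis: it assumes that repair of the mismatch restores an intact EcoRV site, that EcoRV + AseI digestion of repaired plasmid yields two fragments while unrepaired plasmid is only linearized by AseI, and that densitometry values are already background-corrected. The function names, the normalization to wildtype and the optional subtraction of repair seen in uncomplemented HEK293T extract are hypothetical choices.

```python
from typing import Sequence


def repair_fraction(cleaved_bands: Sequence[float], uncut_band: float) -> float:
    """Fraction of plasmid converted to the EcoRV-cleavable (repaired) sequence.

    cleaved_bands: intensities of the fragments produced by EcoRV + AseI
    uncut_band:    intensity of the AseI-linearized, EcoRV-resistant plasmid
    """
    cleaved = sum(cleaved_bands)
    total = cleaved + uncut_band
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved / total


def relative_activity(variant: float, wildtype: float, background: float = 0.0) -> float:
    """Repair by a variant as percent of wildtype MLH1-PMS2.

    background: hypothetical repair fraction of the MLH1-deficient extract
    without complementation, subtracted from both values before normalizing.
    """
    span = wildtype - background
    if span <= 0:
        raise ValueError("wildtype repair must exceed background")
    return 100.0 * (variant - background) / span


# Hypothetical densitometry values, for illustration only
wt = repair_fraction([30.0, 20.0], 50.0)
variant = repair_fraction([10.0, 10.0], 80.0)
print(f"{relative_activity(variant, wt, background=0.05):.1f}% of wildtype")
```

Expressing each variant relative to wildtype, rather than as a raw cleaved fraction, makes values comparable between experiments with different substrate batches.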
For complementation assays with peptide, the indicated peptide(s) were prepared as stock solutions of 50 or 500 µM in 20 mM HEPES-KOH pH 7.6 and added to the MMR reactions.
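The volume of peptide stock needed for a given final concentration in the 25 µl reaction follows from C1V1 = C2V2. A minimal sketch, assuming the peptide is pipetted directly into the reaction and that the added volume should stay at or below a hypothetical 5 µl cap; the cap, the rule of preferring the more dilute stock and the function names are illustrative assumptions, not part of the published protocol.

```python
from typing import Tuple

STOCKS_UM = (50.0, 500.0)   # peptide stock concentrations stated in the text (µM)
REACTION_UL = 25.0          # final MMR reaction volume stated in the text (µl)
MAX_ADDED_UL = 5.0          # hypothetical cap on peptide volume added per reaction


def stock_volume_ul(final_um: float, stock_um: float,
                    reaction_ul: float = REACTION_UL) -> float:
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to add (µl)."""
    return final_um * reaction_ul / stock_um


def choose_stock(final_um: float) -> Tuple[float, float]:
    """Return (stock µM, volume µl), preferring the more dilute stock
    as long as the added volume stays within MAX_ADDED_UL."""
    for stock in sorted(STOCKS_UM):
        volume = stock_volume_ul(final_um, stock)
        if volume <= MAX_ADDED_UL:
            return stock, volume
    raise ValueError(f"{final_um} µM needs more than {MAX_ADDED_UL} µl of stock")


# Final concentrations mentioned in the titration experiments
for target in (1.0, 7.5, 75.0):
    stock, vol = choose_stock(target)
    print(f"{target:>5} µM final: {vol:.2f} µl of the {stock:.0f} µM stock")
```

The two stock concentrations cover the titration range: low final concentrations are pipetted accurately from the 50 µM stock, while 75 µM is only reachable in a small volume from the 500 µM stock.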

Circular dichroism (CD) and melting analysis of ConMot peptide
For the CD analysis, 33 µM ConMot peptide was used in 20 mM Na-phosphate buffer pH 7.2 at 20 °C. Three scans were performed from 190 to 300 nm using a 1 mm light path. A blank was measured with buffer. For the melting analysis, 33 µM ConMot peptide was used in 20 mM Na-phosphate buffer pH 7.2. Detection was at 222 nm, and the temperature was scanned from 25 to 95 °C. A blank with buffer only was subtracted.
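Interpreting a 222 nm CD signal usually starts by converting raw ellipticity to mean residue ellipticity and comparing it with helix and coil reference values. The sketch below uses the concentration, path length and peptide length stated in the text; the helix and coil baselines are one empirical calibration commonly used for peptides and are an assumption here (other calibrations exist, and some normalize by the number of peptide bonds, n - 1, instead of residues). The -1.0 mdeg reading is hypothetical and the function names are illustrative.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Convert observed ellipticity (mdeg) to mean residue ellipticity
    in deg cm^2 dmol^-1: [theta] = theta / (10 * c * l * n)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


def helix_fraction(mre_222: float, n_residues: int, temp_c: float) -> float:
    """Estimate fractional helicity from [theta] at 222 nm, assuming the
    empirical baselines [theta]_H = -40000 (1 - 2.5/n) + 100 T and
    [theta]_C = 640 - 45 T (T in degrees Celsius)."""
    helix = -40000.0 * (1.0 - 2.5 / n_residues) + 100.0 * temp_c
    coil = 640.0 - 45.0 * temp_c
    return (mre_222 - coil) / (helix - coil)


# Conditions from the text: 33 µM peptide, 1 mm (0.1 cm) cell, 32-mer, 20 °C.
# The -1.0 mdeg reading is a hypothetical example value.
mre = mean_residue_ellipticity(-1.0, 33e-6, 0.1, 32)
print(f"[theta]222 = {mre:.0f} deg cm^2 dmol^-1, "
      f"helix fraction ~ {helix_fraction(mre, 32, 20.0):.2f}")
```

A peptide lacking secondary structure, as reported for the ConMot peptide, would give a helix fraction close to zero and no cooperative transition of the 222 nm signal during the temperature scan.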

Bioinformatic analyses
Sequences for alignments were retrieved using BLAST with the human MLH1 protein sequence (NP_000240) as query.
Retrieved sequences were restricted to the organism classes or kingdoms in question by limiting the search to specific TaxIDs. The resulting hits were manually curated according to established procedures ( 84 ): only one MLH1 sequence per organism was retained, incomplete sequences were removed, and each retrieved sequence was verified to correspond to an MLH1 (and not a PMS2 or other) protein, which can easily be identified by the highly conserved C-terminal FERC sequence of eukaryotic MLH1 proteins ( 48 , 85 ). By this procedure, 567 sequences from animals, 348 from fungi and 117 from embryophyta could be identified, while few sequences were available from other eukaryotic kingdoms (3 from red algae, 9 from green algae, 8 from Amoebozoa, 1 from Apusozoa and 2 from choanoflagellates). Sequences were handled in JalView ( 86 ). Alignments were performed with Muscle ( 87 ). Conservation scores were calculated from these alignments using ConSeq ( 88 ). Conservation images of motifs were created using WebLogo 3 ( 89 ).
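To illustrate how a conservation peak can be located within an aligned linker, the sketch below scores each alignment column and then searches for the best-conserved window. It is a simplified stand-in, not ConSeq: ConSeq uses a phylogeny-aware estimate, whereas this sketch uses plain per-column Shannon entropy normalized to the 20-letter amino-acid alphabet, ignores gaps and weights all sequences equally. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

LOG2_20 = math.log2(20)  # maximum entropy for 20 amino acids


def column_conservation(column: str, gap: str = "-") -> float:
    """1 - H/log2(20) for one alignment column; gaps are ignored,
    and an all-gap column scores 0."""
    residues = [aa for aa in column.upper() if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / LOG2_20


def conservation_profile(alignment: List[str]) -> List[float]:
    """Per-column conservation for equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_conservation("".join(col)) for col in zip(*alignment)]


def best_window(profile: List[float], width: int) -> Tuple[int, float]:
    """0-based start and mean score of the most conserved window."""
    start = max(range(len(profile) - width + 1),
                key=lambda i: sum(profile[i:i + width]))
    return start, sum(profile[start:start + width]) / width


# Hypothetical toy alignment with an invariant MVRTD core
aln = ["GSKAMVRTDSPQ", "GNEAMVRTDAPE", "ASRQMVRTDTKE"]
prof = conservation_profile(aln)
print(best_window(prof, 5))  # prints: (4, 1.0)
```

With real MLH1 alignments, the windowed profile would be expected to peak at the ConMot while the surrounding linker scores low.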
The model of the MLH1-PMS2 complex was built using AlphaFold ( 90 ) version 2.2.2 with all options set to default except the max recycles parameter, which was increased from 3 to 12 to maximise model accuracy. AlphaFold was run using the AlphaPulldown pipeline ( 91 ).

Eukaryotes have a conserved motif within the intrinsically disordered MLH1 linker region
MutL proteins, including human MLH1 and PMS2, possess two structured, highly conserved functional domains: the ATPase function is located in the N-terminus (NTD), while an endonucleolytic activity is located in the C-terminus (CTD) of PMS2 ( Figure 1 A and B). The CTDs also confer a constitutive dimerization of both proteins ( 51 , 52 ), while the N-terminal ATPase domain can transiently dimerize, controlled by the binding and hydrolysis of ATP ( 53 ).
Both structured domains are connected by a non-conserved linker of variable length, which has been suggested to contribute to DNA binding ( 75 ). This sequence represents an intrinsically disordered region (IDR) of the protein, as illustrated by a PONDR analysis (Figure 1 A) ( 93 ). Consistent with the low sequence and length conservation of this IDR, small coding alterations in this region that have been identified in humans are usually considered non-pathogenic ( 15 ).
Depending on the composition of sequences analyzed in MutL alignments, a small conserved area becomes evident within the disordered linker region (Figure 1 A). Along with increased conservation, the PONDR score is reduced, suggesting that this area may fold into a secondary structure (Figure 1 A). A comprehensive alignment analysis demonstrated that the peak of conservation within the linker corresponds to a highly conserved MVRTD motif found exclusively in MLH1 proteins of the eukaryotic domain, which is embedded in a less conserved motif spanning 22-24 amino acids (Figure 1 C and D). Kingdom-specific alignments show that these flanking sequences comprise residues that are highly conserved within their respective eukaryotic kingdom but show some variation between kingdoms (Supplementary Figure S1).
The strong sequence conservation suggests a biological significance of this motif, which, for ease of reading, is subsequently termed ConMot. Notably, human variants within the ConMot sequence have been identified in cancer patients (Figure 1 E) ( 94 ).
Further sequence evaluation showed that not only is a reduced PONDR score predicted for the ConMot, but structure prediction algorithms also suggest an elevated propensity to form a secondary structure. The peptide folding algorithm PEP-FOLD3 predicts a potential α-helical structure in two short sequences of the human ConMot, interrupted by the highly conserved MVRTD motif (Supplementary Figure S2A). Therefore, the sequence may have the potential to fold into a secondary structure (Supplementary Figure S2B). We have analyzed a 32-mer peptide containing the human ConMot sequence by circular dichroism and melting analyses (Supplementary Figure S2C and D). Neither investigation provided evidence for secondary structure formation by the ConMot peptide.
However, the investigated 32-mer peptide is rather short for forming stable secondary structures and may lack protein or DNA interactions that would stabilize a secondary structure under biological conditions.

Deletion and scrambling of the ConMot and one human variant abolish DNA mismatch repair
A prominent task of MLH1 is its contribution to the DNA mismatch repair reaction. To assess the relevance of the ConMot for MLH1 function, we therefore introduced artificial and human variants into MLH1, expressed the MLH1-PMS2 heterodimers in MLH1-deficient HEK293T cells and investigated their capacity to repair a G-T mismatch in a test plasmid.
We constructed a deletion variant (ΔConMot) in which the conserved ConMot sequence comprising 20 residues was removed (red in Figure 1 B). To test whether shortening of the MLH1 linker region can by itself disturb mismatch repair, we generated a deletion-comparison variant in which a non-conserved part of the linker, also comprising 20 residues, was removed (ΔCompare) (blue in Figure 1 B). To assess the relevance of the central conserved motif sequence, a scrambled variant was investigated in which the MVRTD sequence was substituted by the sequence DTMVR.
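The three kinds of constructs described above (a patient-type missense substitution, a 20-residue deletion and a composition-preserving scramble of the core motif) can be sketched at the protein-sequence level. This is an illustration only: the sequence used is a hypothetical placeholder, not MLH1, the deletion coordinates are made up because the exact ΔConMot and ΔCompare boundaries are not given here, and the helper names are not from any published tool. The parser handles only simple three-letter protein substitutions written in the style used in the text (e.g. p.Arg385Pro).

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def apply_missense(seq: str, variant: str) -> str:
    """Apply a substitution such as 'p.Arg385Pro' (1-based position),
    checking that the reference residue matches first."""
    m = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", variant)
    if m is None or m.group(1) not in THREE_TO_ONE or m.group(3) not in THREE_TO_ONE:
        raise ValueError(f"unsupported variant notation: {variant}")
    ref, pos, alt = THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def delete_span(seq: str, start: int, length: int) -> str:
    """Remove `length` residues beginning at 1-based position `start`."""
    if start < 1 or start - 1 + length > len(seq):
        raise ValueError("deletion extends beyond the sequence")
    return seq[:start - 1] + seq[start - 1 + length:]


def scramble_motif(seq: str, motif: str, scrambled: str) -> str:
    """Replace the single occurrence of `motif` with a permutation of it."""
    if sorted(motif) != sorted(scrambled):
        raise ValueError("scrambled sequence must keep the motif's composition")
    if seq.count(motif) != 1:
        raise ValueError("motif must occur exactly once")
    return seq.replace(motif, scrambled)


# Placeholder sequence (NOT MLH1) with the R of MVRTD at position 385
toy = "A" * 382 + "MVRTD" + "G" * 20
r385p = apply_missense(toy, "p.Arg385Pro")
core_scramble = scramble_motif(toy, "MVRTD", "DTMVR")
deletion = delete_span(toy, 373, 20)  # hypothetical 20-residue span
print(r385p[382:387], core_scramble[382:387], len(toy) - len(deletion))
# prints: MVPTD DTMVR 20
```

Checking the reference residue before substituting guards against off-by-one errors between 1-based variant positions and 0-based string indices.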
Wildtype and variant MLH1-PMS2 proteins were well expressed (Figure 2 A). Although most human MLH1 missense variants confer functional defects as well as pathogenicity by destabilizing the MLH1 protein ( 79 ), neither the human variants nor the artificial variants in the linker region investigated here had any effect on expression, suggesting that even gross alterations in the linker region do not significantly destabilize the MLH1 protein. This is consistent with the absence of large secondary structures in the linker region, and also suggests that potential interactions of the ConMot with the MLH1 NTD or CTD do not significantly contribute to protein stability. The shortening conferred by the deletion variants was visible as corresponding size shifts (Figure 2 A).
We next tested the DNA mismatch repair capacity of all variants by assessing the restoration of a mismatched restriction site in a test plasmid (Supplementary Figure S3). Deletion of the ConMot from MLH1 fully abrogated mismatch repair (ΔConMot), while a more C-terminally located deletion of identical size (ΔCompare) had no detectable effect on repair activity (Figure 2 B, left panel). Scrambling of the highly conserved MVRTD motif also significantly reduced repair capacity.
The applied MMR assay procedure is suitable not only for research purposes but also for clarifying the pathogenicity of human variants ( 79 , 81 , 95 ). We therefore tested human variants of the ConMot (Figure 1 E). Of these, the proline substitution of Arg385, located within the MVRTD motif, displayed a loss of mismatch repair activity (Figure 2 B, right panel). The catalytic activities of the other substitutions, including a cysteine and a histidine substitution of the same residue, were similar to that of the wildtype sequence.
The core MVRTD motif is highly conserved across the complete eukaryotic domain, while the flanking sequences are also conserved, albeit in a kingdom-specific manner (Supplementary Figure S1). This may suggest that the core motif is invariant for mechanistic reasons, while the flanking sequence may fulfil other functions that have allowed alternative sequences to evolve. To directly compare the contribution of the core sequence versus the flanking sequences, we used the core-scramble variant and a hybrid variant with a plant ConMot as it is present, for example, in walnut and mungo bean. Exclusively in the plant kingdom, a two-residue (alanine-glycine) insertion in the ConMot flanking sequence is common (Supplementary Figure S1), and, while conservation is high in plants, the pattern of conservation differs from that in animals (Figure 3 B). Therefore, in the human-plant hybrid, the ConMot contains a functional flanking sequence, albeit from a different organism.
We compared the core-scramble and human-plant hybrid variants with the ConMot deletion variant and a full-scramble variant. All variants were well expressed (indistinguishable from wildtype) (Figure 3 C). The core-scramble variant again had approximately 40% activity, comparable to that of the human-plant hybrid (Figure 3 D). This shows that the flanking sequence, although variable in inter-kingdom comparisons, is only fully functional within its species context, and that core and flanking sequences contribute to a similar degree to the functionality of the ConMot.

A ConMot peptide restores mismatch repair capacity of ΔConMot
Since the ConMot is localized in the flexible MLH1 linker region, it potentially mediates a protein-protein interaction or an interaction with DNA; both could be reconciled with current assumptions on the function of the MLH1 linker. We performed competition/supplementation tests using a 32-mer peptide containing the human ConMot sequence to gain information on the contribution of the ConMot to the DNA mismatch repair reaction.
First, we attempted to interfere with a regular DNA mismatch repair reaction by adding the ConMot peptide to a reaction using extract of MMR-proficient HEK293 cells (Figure 4 A). Even 75 µM ConMot peptide did not detectably interfere with the catalytic repair activity of the cell extract, suggesting that the ConMot peptide does not compete with the MLH1-ConMot in a manner that causes a mismatch repair defect.
Second, we used extract of the repair-deficient cell line HEK293T supplemented with either wildtype MutLα (MLH1-PMS2) or the ΔConMot deletion variant, corresponding to the experiment shown in Figure 2 B. In this experiment, we again added ConMot peptide. Interestingly, the DNA mismatch repair defect of the ΔConMot variant could be overcome by the addition of exogenous ConMot peptide (Figure 4 B). In contrast, a scrambled control peptide did not restore MMR activity. A smaller peptide (20 aa) containing only the highly conserved ConMot sequence was not active either, suggesting that residues neighboring the ConMot sequence are required for activity. Since the presence of the 20-mer ConMot peptide did not affect MMR restoration by the 32-mer ConMot peptide, it is likely that target binding, rather than activation, is compromised in the smaller peptide (Figure 4 B).
We investigated the effect of the ConMot peptide more closely by titrating the peptide concentrations required for activation. We determined that the mismatch repair reactions performed in this study contained approximately 280 … (Supplementary Figure S4A and B), which is in good agreement with previous quantifications of MLH1-PMS2 in cells and extracts ( 96 ). An increase in MMR activity became detectable at a ConMot peptide concentration of 1 µM, 50% activation was achieved at 7.5 µM, and almost complete activation at 75 µM (Figure 4 C and D). Dissociation constants (KD) of peptides binding to proteins are typically in the range of 1-10 µM ( 97 ). We compared the measured concentration-dependent MMR activities with theoretical binding curves of (peptide) ligands to (protein) substrates for KD values of 1, 5 and 10 µM (Supplementary Figure S4C) at the determined MLH1 concentration. The degree of MMR restoration corresponded very well to the theoretical binding fraction for a KD of 7.5 µM (Figure 4 D). It is therefore plausible that binding of the ConMot peptide (to a protein) underlies the restoration of MMR activity.
In the same way, we analyzed the effects of the ConMot peptide on the MMR activity of the defective human MLH1 missense variant p.Arg385Pro (Figure 5). Restoration of MMR activity was again observed and required the same peptide concentrations as with the ΔConMot deletion variant, suggesting that the human p.Arg385Pro variant abolishes the interaction of the MLH1-ConMot with its target binding site, and that addition of ConMot peptide can replace the function of a mutated ConMot.
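The comparison of peptide titrations with theoretical binding curves can be sketched with the exact quadratic solution for 1:1 binding. This sketch assumes, as the text argues, that MMR restoration is proportional to the fraction of MLH1-PMS2 occupied by peptide. Because the measured MLH1-PMS2 concentration is only partially reproduced above, `P_UM` is a hypothetical placeholder, and the "measured" activities in the demo are synthetic values generated from a KD of 7.5 µM, not data from the paper. Function names are illustrative.

```python
import math
from typing import Sequence


def fraction_bound(ligand_um: float, protein_um: float, kd_um: float) -> float:
    """Fraction of protein in complex for 1:1 binding (exact quadratic).

    Uses the conjugate form 2L / (b + sqrt(b^2 - 4PL)), with b = P + L + KD,
    which avoids cancellation when the protein concentration is small; the
    discriminant is mathematically >= 0 and clamped only against rounding.
    """
    b = protein_um + ligand_um + kd_um
    disc = b * b - 4.0 * protein_um * ligand_um
    return 2.0 * ligand_um / (b + math.sqrt(max(disc, 0.0)))


def best_kd(ligand_um: Sequence[float], activity: Sequence[float],
            protein_um: float, candidates: Sequence[float]) -> float:
    """Candidate KD (µM) whose binding curve best matches the fractional
    MMR restoration by least squares."""
    def sse(kd: float) -> float:
        return sum((fraction_bound(l, protein_um, kd) - a) ** 2
                   for l, a in zip(ligand_um, activity))
    return min(candidates, key=sse)


P_UM = 0.3                # hypothetical protein concentration (µM)
doses = [1.0, 7.5, 75.0]  # peptide concentrations mentioned in the text (µM)
synthetic = [fraction_bound(d, P_UM, 7.5) for d in doses]  # noise-free, made up
print(best_kd(doses, synthetic, P_UM, [1.0, 5.0, 7.5, 10.0]))
```

When the peptide is in large excess over the protein, the curve reduces to L / (L + KD), so half-maximal restoration falls at a peptide concentration close to KD, which is why 50% activation at 7.5 µM matches a KD of about 7.5 µM.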

The ConMot variant p.Arg385Tyr partly restores MMR deficiency of the endonuclease variant p.Tyr750Arg
In our AlphaFold2 model of the MLH1-PMS2 heterodimer, the ConMot is predicted with high confidence to bind to the CTD of MLH1 (Supplementary Figure S5, Figure 6 A). The p.Arg385 residue of the ConMot is predicted to be in sufficient proximity to p.Tyr750 to engage in a side-chain-side-chain interaction (Figure 6 B). Both p.Arg385 and p.Tyr750 are highly conserved, with p.Arg385 being the most highly conserved residue of the ConMot (Figure 6 B).
We speculated that a putative p.Arg385-p.Tyr750 side-chain interaction may be accessible to experimental verification. We exchanged these residues and created the variants p.Arg385Tyr, p.Tyr750Arg and a double variant with interchanged side chains (p.Arg385Tyr + p.Tyr750Arg). As controls, we generated p.Arg385Ala and p.Arg385Glu. All variant proteins were well expressed (Figure 6 C).
While the p.Arg385Tyr substitution was well tolerated in terms of MMR activity, the p.Tyr750Arg alteration caused a severe MMR defect (Figure 6 D). Intriguingly, this defect was partially reverted in the double variant (p.Arg385Tyr + p.Tyr750Arg): the repair activity of this double variant was more than twice that of p.Tyr750Arg (it increased from 22.4% to 50.2% on average). This observation was highly reproducible and significant (P = 0.0028). This was not the case for the two control alterations of p.Arg385 to alanine and to glutamate: these had no reverting effect on the defect of the p.Tyr750Arg variant, but instead acted additively, resulting in a more pronounced MMR defect (Figure 6 D). Following the concept of co-evolution, detrimental effects of missense substitutions may be restored by subsequent adaptive substitutions of another residue that is typically in spatial and/or functional contact with the first one ( 98 ). Detection of such coupled substitutions in multiple sequence alignments is therefore informative for identifying or confirming direct side-chain contacts in three-dimensional structures ( 99 ) and is exploited by prediction algorithms such as AlphaFold ( 90 ).
The expected result of simultaneously substituting two independent residues is a defect at least as severe as that of the more deficient single variant. More likely, both substitutions exert an additive effect and result in a more pronounced defect of the combined variant, similar to what we have reported before for two small coding variants in MLH1 that conferred an additive defect on protein stability and resulted in a disease phenotype ( 95 ). Likewise, in this experiment, the two control variants (p.Arg385Ala and p.Arg385Glu) behaved in this manner and both further aggravated the defect of p.Tyr750Arg.
Therefore, although restoration of repair was incomplete in the p.Arg385Tyr + p.Tyr750Arg double variant, the finding that the second alteration (p.Arg385Tyr) relieved the functional defect of the primary mutation (p.Tyr750Arg) is most likely related to a spatial or functional interaction of both side chains, particularly since the two residues were only interchanged. While further investigations are required to confirm and characterize ConMot binding to the MLH1-CTD, this result provides evidence that the interaction of p.Arg385 with p.Tyr750 was correctly predicted by AlphaFold2.
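The reasoning about double variants can be made quantitative by comparing observed activity with the activity expected if both substitutions acted independently. The "min" null model below encodes the expectation stated in the text (the double variant is no better than its more deficient single variant); the multiplicative model is a common convention in epistasis analyses and is an additional assumption introduced here. The p.Tyr750Arg and double-variant values are the averages given in the text; the p.Arg385Tyr value of 1.0 is a placeholder for "well tolerated", since its exact activity is not stated here.

```python
from typing import Literal

Model = Literal["multiplicative", "min"]


def expected_double(a1: float, a2: float, model: Model = "multiplicative") -> float:
    """Expected activity (fraction of wildtype) of a double variant if the
    two substitutions act independently.

    multiplicative: each substitution scales activity independently
    min:            ceiling set by the more deficient single variant
    """
    if model == "multiplicative":
        return a1 * a2
    if model == "min":
        return min(a1, a2)
    raise ValueError(f"unknown model: {model}")


def epistasis(observed: float, a1: float, a2: float,
              model: Model = "multiplicative") -> float:
    """Observed minus expected activity; > 0 suggests compensation
    (suppression), < 0 an aggravated, e.g. additive, defect."""
    return observed - expected_double(a1, a2, model)


# p.Tyr750Arg (22.4%) and the double variant (50.2%) are averages from the text;
# 1.0 for p.Arg385Tyr is a placeholder for "well tolerated".
A_R385Y, A_Y750R, A_DOUBLE = 1.0, 0.224, 0.502
for model in ("multiplicative", "min"):
    print(model, round(epistasis(A_DOUBLE, A_R385Y, A_Y750R, model), 3))
```

Under either null model, the double variant shows positive epistasis, whereas the aggravated defects reported for p.Arg385Ala and p.Arg385Glu combined with p.Tyr750Arg would correspond to negative values.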

DISCUSSION
The MutL linker regions have recently attracted increased attention concerning their functional role in the mismatch repair reaction ( 15 , 59 , 76 , 77 ). Due to their low degree of sequence and length conservation, human missense alterations in these regions are commonly considered innocuous ( 15 ). Our investigations demonstrate the existence of a small motif (ConMot) within the human MLH1 linker that is vital for MLH1 function. A similar significance has very recently been reported for a corresponding sequence in yeast mlh1 ( 100 ). The ConMot is conserved exclusively in eukaryotes and comprises 20 amino acids (22 in plants).

The ConMot sequence and alteration tolerance
The ConMot consists of a central, universally conserved core motif of five amino acids (MVRTD) and of flanking sequences whose conservation is kingdom-specific. While deletion of the ConMot and complete scrambling rendered MLH1 fully deficient in MMR, gross variants of the core motif (core-scramble) and of the flanking sequence (a human chimera containing the flanking sequence of plants) retained approximately 40% catalytic activity. This suggests that the contribution of the ConMot to MMR does not depend on single residues. This is supported by the observation that, of all tested missense alterations, only some substitutions of the most highly conserved residue, Arg385, reduced MMR activity. While three tested substitutions were tolerated (p.Arg385Cys/His/Tyr), three others caused a defect similar in strength to that of the core-scramble variant (p.Arg385Pro/Ala/Glu). The most pronounced effect was observed for the human patient variant p.Arg385Pro, possibly because the sterically inflexible proline imposes a more general strain on target binding by the ConMot than other missense variants and even the core-scramble variant. While exact determination of ConMot target binding will be required to explain these results satisfactorily, it seems plausible that ConMot activity primarily depends on binding to its target, and that this binding is strongly cooperative in nature and therefore rather tolerant of a number of alterations.

The potential role of the ConMot in MLH1 function
Since the ConMot is a short motif, it is unlikely to perform any specific activity on its own; rather, it will need to bind a target to perform its function and facilitate MLH1-PMS2 activity. Since the ConMot is situated in the flexible linker, it enjoys some spatial freedom (Figure 7 A-a). Our finding that an isolated ConMot peptide can replace the MLH1-ConMot indicates that the exact position of the ConMot within the protein is not critical, suggesting that it moves to its target during the repair process. This is consistent with the observation that the corresponding yeast motif was also active when moved within the yMLH1 linker or even when transferred to the yPMS1 linker ( 100 ).
The current results of the peptide complementation experiments allow interesting conclusions concerning the target binding site of the ConMot: in general, the target may be either another protein or DNA (a third partner, #1, Figure 7 A-b), or, alternatively, the ConMot may engage in an interaction within the MLH1-PMS2 dimer (#2) (Figure 7 A-c).
We have observed that DNA mismatch repair is not attenuated by addition of even high concentrations of ConMot peptide (Figure 4 A). This finding supports an intramolecular interaction (Figure 7 A-c), since formation of a ternary complex with another partner (Figure 7 A-b) would likely be impaired in the presence of excess ConMot peptide, which would out-compete the MLH1-ConMot for binding to this partner and thereby displace it from the ternary complex (Figure 7 A-d). In contrast, if the MLH1-ConMot is involved in an intramolecular interaction (Figure 7 A-c), addition of excess ConMot peptide is unlikely to interfere with the DNA repair reaction. Either the ConMot peptide is excluded from the complex because it cannot compete with the much more efficient intramolecular interaction of the MLH1-ConMot (Figure 7 A-e), or it displaces the MLH1-ConMot without interfering with the repair reaction (which is possible, see below) (Figure 7 A-f).
The results observed with the ConMot-deletion variant of MLH1 (ΔConMot, Figure 7 A-g) also suggest an interaction of the MLH1-ConMot within the MLH1-PMS2 dimer. In these experiments, it was possible to overcome the MMR deficiency of the MLH1 ΔConMot variant by addition of exogenous ConMot peptide. If the MLH1-ConMot were involved in ternary complex formation, the effect of peptide addition would (again) have been the displacement of the ternary partner, without reconstituting repair activity (Figure 7 A-hi). In contrast, the finding that addition of ConMot peptide restores mismatch repair activity suggests that the peptide has re-established an intramolecular or inter-subunit interaction within MLH1-PMS2 (Figure 7 A-jk).
It should be noted that, in contrast to our results, peptide addition in yeast inhibited the activity of wildtype yMLH1-yPMS1 ( 100 ). However, in these experiments, only part of the complete DNA mismatch repair reaction (nonspecific endonuclease activation in the absence of a mismatch) was measured under rather artificial conditions (presence of manganese instead of magnesium), so the observed inhibition may arise only in this experimental set-up (see Supplementary Table S2 for a detailed comparison of reaction conditions). Additionally, the use of a shorter peptide (25 aa instead of 32 aa) may have contributed to the conflicting results, since our experiments showed that peptide length is a relevant factor in activation (Figure 4 B). Since the experiments described here reflected the behavior of the complete MMR reaction under largely biological conditions and consistently demonstrated non-inhibition of the wildtype reaction and reconstitution of repair in ConMot-mutant variants, we assume that this activating effect better reflects the biological function of the ConMot.
A further observation corroborating an internal interaction of the ConMot is that the AlphaFold2 prediction, which suggested binding of the ConMot to the MLH1-CTD with high confidence (Supplementary Figure S5), also predicted a functional interaction of the ConMot residue p.Arg385 with the endonuclease residue p.Tyr750, for which we provided experimental evidence (Figure 6).
In summary, the evidence suggests that the ConMot interacts with a site located within the MLH1-PMS2 dimer (Figure 7 B). There, it would be in contact with p.Tyr750 in the C-terminal helix of MLH1, which contributes an essential Zn²⁺-binding Cys residue (Cys765) to the composite MLH1-PMS2 endonuclease site (46, 101). From this position, the ConMot could therefore modulate endonuclease activity, or even contribute catalytically relevant residues, consistent with the observed MMR defect and with the observation that mutation of the corresponding motif in yeast inactivated endonuclease activity (100). Binding of the ConMot to this site would also be consistent with the finding in FeBABE experiments that ConMot residues lie close to the DNA backbone (102), since a site this close to the endonuclease would also be close to the DNA.
Besides a potential involvement in endonuclease activation, the proposed interaction of the ConMot with the MLH1-CTD would subdivide the central cavity of MLH1-PMS2 that is formed by N-terminal dimerization (Figure 7 C). A smaller linker ring of approximately 300 Å circumference would form alongside a larger ring comprising the remaining part of the MLH1-CTD, the MLH1- and PMS2-NTDs, and the major part of the PMS2 linker. This may help to direct endonuclease activity in a controlled manner to the newly synthesized DNA strand, whose identity is communicated by PCNA at the replication fork (38). Alternatively, the ConMot may contribute to the formation of active MutL polymers, which have been observed on DNA, by binding neighboring subunits (60).
Taken together, the ConMot is a conserved, movable, short motif located in an IDR that likely mediates interaction with a structured protein domain. It therefore has features attributed to so-called short linear motifs (SLiMs), which frequently mediate weak and/or transient protein interactions (74, 84, 103, 104).
Besides the interaction of the ConMot with the MLH1-CTD, a helical motif in the PMS2 linker is present in our AlphaFold2 predictions and in previous predictions for yeast and other organisms (100). This P-motif is predicted to bind to the MLH1-CTD (Figure 7 B and C). This interaction would tether the most C-terminal portion of the PMS2 linker to the structured MLH1-CTD, so that the PMS2 linker is effectively 'connected' to the MLH1-CTD instead of the PMS2-CTD (P-Linker-N in Figure 7 C). It has plausibly been suggested that this arrangement avoids steric problems in interactions during the repair process (100), since it places the PMS2 linker at the greatest possible distance from the PMS2 PCNA interaction motif (64) and the endonuclease domain (48) (Figure 7 C).

Human variants in the ConMot
So far, pathogenic small coding variants causing Lynch syndrome have been identified almost exclusively in either the conserved NTD or the CTD, where they can impair diverse aspects of MLH1 function (105). In the NTD, they may interfere with the ATPase cycle (50, 80) and/or with MLH1-MSH2 interaction (83). In the CTD, they may disturb dimerization (51), frequently affect protein stability (79) and/or impair endonuclease activity (47, 85). In contrast, the MLH1 linker region appeared largely devoid of pathogenic small coding variants (15). Our present data demonstrate that the ConMot represents an exception to this rule.
We investigated several human alterations reported within the ConMot. Of these, the p.Arg385Pro alteration displayed a mismatch repair defect. It is located in the core of the strongly conserved MVRTD motif. The substitution to proline likely introduces steric strain that interferes with the ability of the ConMot to adopt a proper conformation for target binding. This alteration was originally identified in a patient with a large adenoma with focal high-grade dysplasia and a family history of CRC (family ID R-RM6) (106) (Supplementary Figure S6). Microsatellite instability (MSI) in the tumor tissue is a hallmark of MMR-defective tumors (107), but MSI testing was only performed in the adenoma tissue of the index patient and was therefore not informative (106). The Arg385Pro allele (rs63750430) has been identified at low frequencies in Asian and European populations (https://gnomad.broadinstitute.org/variant/3-37067243-G-A?dataset=gnomad_r2_1) (108). As with many small coding alterations, insufficient information is available for this alteration; therefore, it has remained unclear whether this variant is causative for cancer. Correspondingly, it is listed as a 'variant of uncertain significance' (class 3) in the clinical reference database for variant classifications (https://www.insight-database.org/classifications/). The MMR defect of this variant indicates that it is likely pathogenic and therefore causative for Lynch syndrome in this family by affecting the proper function of the MLH1 ConMot.
While the sample size is too small for a final evaluation, it is interesting to note that the average age at cancer diagnosis is rather high in the family with the p.Arg385Pro variant. We have shown before that incomplete inactivation of MLH1 by missense alterations results in a milder Lynch syndrome phenotype with lower penetrance and a higher average age of cancer onset (79), and a similar observation has recently been published for an MSH2 variant (109). In the case of the ConMot, a complete loss of MMR activity was observed only in the deletion and full-scramble variants, while other alterations, such as the 'Core-Scramble' and the 'Animal-plant hybrid' variants, still retained some activity (Figures 2 and 3). It is therefore possible that small ConMot alterations retain partial activity, entailing a milder Lynch syndrome phenotype.
Intriguingly, the DNA mismatch repair defect of the p.Arg385Pro variant could be overcome by addition of ConMot-peptide. To our knowledge, this is the first example of restoring mismatch repair activity to an MMR-deficient human MLH1 mutant by addition of a small compound.
In contrast to p.Arg385Pro, conservative substitutions of this residue to histidine and cysteine (p.Arg385His and p.Arg385Cys) did not affect catalytic activity. No pathogenicity classification exists for either substitution due to a lack of clinical data. Interestingly, the homologous substitutions in yeast both displayed defective MMR (100). This may reflect a different sensitivity of the motif to substitutions in the yeast and the human system. The p.Tyr379Cys variant has been clinically classified as 'likely not pathogenic' (class 2) based on co-segregation data (https://www.insight-database.org/classifications/). Consistent with this, we did not find a catalytic defect for this variant. Although tyrosine 379 is a conserved residue of the ConMot, the substitution is rather conservative, suggesting that the ConMot-target interaction is not significantly disturbed.
Although the p.Val384Asp variant is a non-conservative exchange of a highly conserved, small lipophilic residue, there was no detectable impairment of MMR. Consistent with this experimental result, this alteration is a polymorphism in East Asia (rs63750447, up to 2.7% of controls), which speaks against a causative role in cancer; on the other hand, it is the most prevalent somatic MLH1 alteration reported in different tumor entities (110). According to our data, the variant is unlikely to be causative for Lynch syndrome; further investigations may show whether a causative association with other diseases exists.
In summary, we have described the identification and preliminary characterization of a conserved eukaryotic protein motif located in the otherwise unstructured, unconserved linker region of MLH1 proteins. This motif is indispensable for human mismatch repair and most likely engages in an intramolecular interaction with the CTD of MLH1 during the repair process. Although the motif shows a certain degree of tolerance to substitutions, we have shown that variants in the ConMot can disable mismatch repair and can therefore be causative for Lynch syndrome.

DATA AVAILABILITY
The data underlying this article are available in the article and in its online supplementary material.