A comprehensive in vivo screen of yeast farnesyltransferase activity reveals broad reactivity across a majority of CXXX sequences

Abstract
The current understanding of farnesyltransferase (FTase) specificity was pioneered through investigations of reporters like Ras and Ras-related proteins that possess a C-terminal CaaX motif that consists of 4 amino acid residues: cysteine–aliphatic1–aliphatic2–variable (X). These studies led to the finding that proteins with the CaaX motif are subject to a 3-step post-translational modification pathway involving farnesylation, proteolysis, and carboxylmethylation. Emerging evidence indicates, however, that FTase can farnesylate sequences outside the CaaX motif and that these sequences do not undergo the canonical 3-step pathway. In this work, we report a comprehensive evaluation of all possible CXXX sequences as FTase targets using the reporter Ydj1, an Hsp40 chaperone that only requires farnesylation for its activity. Our genetic and high-throughput sequencing approach reveals an unprecedented profile of sequences that yeast FTase can recognize in vivo, which effectively expands the potential target space of FTase within the yeast proteome. We also document that yeast FTase specificity is largely influenced by restrictive amino acids at the a2 and X positions rather than by resemblance to the CaaX motif, as previously thought. This first complete evaluation of CXXX space expands the complexity of protein isoprenylation and marks a key step forward in understanding the potential scope of targets for this isoprenylation pathway.

Ras signaling GTPases are often cited as classical examples of isoprenylated proteins (Willumsen, Christensen et al. 1984; Wright and Philips 2006; Zhou, Wiener et al. 2016). Ras isoprenylation is catalyzed by FTase, which acts upon a C-terminal CaaX motif consisting of 4 amino acid residues: cysteine-aliphatic1-aliphatic2-variable (X). FTase appends the C15 isoprenoid lipid donated by farnesyl pyrophosphate to the CaaX cysteine via a thioether linkage. After farnesylation, Ras undergoes the coupled modifications of endoproteolysis to remove the -aaX portion of the motif, followed by carboxylmethylation of the farnesylated cysteine that becomes the new C-terminal residue. These PTMs regulate Ras plasma membrane localization and function and have been extensively studied due to the significance of Ras GTPases in human diseases such as cancer (Tamanoi 2011; Cox, Der et al. 2015; Hobbs, Der et al. 2016). As a result, previous Ras-based investigations have led to the general model that the 3 CaaX PTMs (i.e. isoprenylation, proteolysis, and methylation) commonly occur across a range of proteins that are collectively referred to as CaaX proteins. CaaX PTMs exist in all eukaryotic species, and the impact of these PTMs on CaaX protein biology is highly conserved across model systems (Omer, Kral et al. 1993; Cazzanelli, Pereira et al. 2018; Ravishankar, Hildebrandt et al. 2023). This is especially evident in the conserved similarities between mammalian and yeast FTase structure and function (Kohl, Diehl et al. 1991; Gomez, Goodman et al. 1993; Omer, Kral et al. 1993).
Not all farnesylated proteins undergo all 3 CaaX PTMs. An example is the Saccharomyces cerevisiae Hsp40 chaperone Ydj1 (ScYdj1), which is farnesylated on its C-terminal CASQ sequence but is not subjected to endoproteolysis or carboxylmethylation. Farnesylation of Ydj1 is required for optimal yeast growth at elevated temperatures (>37°C), and subjecting Ydj1 to all 3 CaaX PTMs negatively impacts this Ydj1-dependent thermotolerance growth phenotype (Caplan et al. 1992; Hildebrandt et al. 2016). We have defined this noncanonical pathway leading to a farnesylation-only PTM as the "shunt" farnesylation pathway, which is characterized by the lack of downstream modifications (i.e. nonproteolyzed and noncarboxylmethylated), possibly due to the inability of the downstream protease to recognize and cleave sequences that do not adhere to the CaaX consensus sequence (Fig. 1). The observation of the shunt pathway raises the question of whether the previous use of canonically modified CaaX protein reporters (e.g. Ras, a-factor), which are subject to the additional constraints of proteolysis and carboxylmethylation, has limited the breadth of sequences that can be identified as FTase substrates.
Many past investigations have attempted to probe FTase specificity. Initially, the screening of FTase substrates was explored in vitro on a case-by-case basis using recombinant CaaX proteins and synthetic peptides with [3H]-FPP or [3H]-GGPP, which was inconvenient and labor intensive for large-scale studies (Caplin, Hettich et al. 1994). In vitro investigations using peptide libraries and metabolic labeling have extended this effort but have been limited in scope and cost-prohibitive, making it difficult to conduct systematic studies of the 8,000 CXXX sequence space (Reiss, Stradley et al. 1991; Boutin, Marande et al. 1999; Krzysiak, Scott et al. 2007; Krzysiak, Aditya et al. 2010; Wang, Dozier et al. 2014; Tate, Kalesh et al. 2015; Suazo, Schaber et al. 2016; Storck, Morales-Sanfrutos et al. 2019). While in silico prediction models based on structural analysis of mammalian FTase are available and have potential to define isoprenylatable sequences, these models often fail at predicting farnesylated proteins with noncanonical CaaX motifs such as ScYdj1 (CASQ), HsDNAJA2 (CAHQ), ScPex19 (CKQQ), HsLkb1 (CKQQ), and ScNap1 (CKQS) and often lack orthologous in vivo reporter data to validate predictions (Collins, Reoma et al. 2000; Sapkota, Kieloch et al. 2001; Reid, Terry et al. 2004; Maurer-Stroh and Eisenhaber 2005; Lane and Beese 2006; Maurer-Stroh, Koranda et al. 2007; London, Lamphear et al. 2011; Berger, Yeung et al. 2022). Most of the in vivo reporters used to probe CaaX space have relied on CaaX protein reporters requiring a 3-step modification and the superposition of 3 enzyme specificities for their activities (Boyartchuk, Ashby et al. 1997; Stein, Kubala et al. 2015). Moreover, recent reports demonstrating FTase activity on shortened and extended sequences highlight the flexibility of CaaX substrate lengths, additionally complicating the recognized model of FTase CaaX specificity (Ashok, Hildebrandt et al. 2020; Schey, Buttery et al. 2021).
The reliance of previous studies on canonical protein reporters and incomplete peptide reporter sets, in conjunction with our shunt pathway observations (i.e. nonproteolyzed and noncarboxylmethylated), led us to hypothesize that the full scope of FTase targets remains unknown. To address this gap in knowledge, we developed ScYdj1 as a genetic reporter to elucidate the specificity of the yeast FTase across all 8,000 CXXX sequences. Our results indicate that yeast FTase has a much broader target specificity than previously defined by the canonical CaaX motif.

Yeast strains
Strains used in this study are listed in Table 1. All plasmid transformations into yeast were performed using a lithium acetate-based transformation procedure unless otherwise stated (Elble 1992).

Plasmid construction for Ydj1-CKQX variants and His-Nap1
Plasmids used in this study are listed in Table 2. Newly created plasmids needed for this study were constructed using PCR-directed recombinational cloning consistent with previously reported methods (Oldenburg, Vo et al. 1997; Hildebrandt, Cheng et al. 2016; Berger, Kim et al. 2018). For Ydj1-CKQX plasmids, pWS1132 that was linearized with NheI was co-transformed into yWS304 along with a PCR-derived DNA fragment encoding a new CaaX sequence. For the His-Nap1 plasmid (pWS1474), pRS316 that was linearized with BamHI, XbaI, and XhoI was co-transformed into BY4741 along with a PCR-derived DNA fragment encoding the NAP1 gene. The resultant Nap1 plasmid (pWS1318) was additionally modified to encode an octa-His tag at the 5′ end of the NAP1 ORF; the tag was introduced by recombination using a PCR-derived fragment and pWS1318 that had been linearized with BsaBI. In all cases, candidate plasmids recovered from yeast surviving appropriate genetic selection were evaluated by both restriction digest and DNA sequence analyses (GENEWIZ/Azenta Life Sciences, South Plainfield, NJ; Eurofins Genomics, Louisville, KY).

Preparation of naïve yeast library containing Ydj1-CXXX variants
The E. coli-derived Ydj1-CXXX plasmid library was transformed into yWS304 (ydj1Δ) using the Frozen-EZ Yeast Transformation II Kit (Zymo Research, Irvine, CA) according to the manufacturer's instructions such that over 3 million individual colonies were recovered across multiple SC-Uracil plates on the same day. The colonies were collected by gently washing the plates with liquid medium, and the cells were concentrated by centrifugation to remove excess supernatant. The collected cells were stored at −80°C as 100-µL aliquot stocks in 15% glycerol at a concentration of ∼2 × 10⁹ cells per mL. For all subsequent studies, freshly thawed stock vials were used and not re-frozen for repeated use.

Thermotolerance selection of Ydj1-CXXX variants en masse and plasmid library recovery
An aliquot of the naïve yeast library expressing Ydj1-CXXX variants was thawed and used for each iteration of this assay. In brief, ∼582,000 cells were diluted into 40 mL of room temperature SC-Uracil liquid media, and this premix was divided across 4 test tubes where sets of 2 tubes were incubated at either permissive (25°C) or restrictive temperature (37°C). Each test tube containing diluted culture was rapidly thermally equilibrated using an appropriate temperature water bath for 30 min before being placed onto a rotating wheel in an incubator at the same temperature. The cultures were incubated for ∼24-48 h until an A600 of 2.0 was reached, cells were collected by centrifugation, and plasmids were isolated from the recovered cell populations using a commercial kit (OMEGA Bio-Tek E.Z.N.A. Yeast Miniprep kit) as per the manufacturer's instructions. The experiment was performed twice across 2 different days, with each condition performed in duplicate, for a total of 8 replicates per temperature condition.

NGS and data analysis
The yeast plasmids recovered after thermoselection were subjected to a shortened PCR of 15 cycles with the oligonucleotide pair oWS1408 (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAGCATATAATCCCTGCTTTA-3′) and oWS1409 (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCCAGGGGTGGTGCAAACTATG-3′) using a high-fidelity Q5 polymerase (New England Biolabs, Ipswich, MA) to attach overhang sequences. The PCR products were cleaned (OMEGA Bio-tek E.Z.N.A. Cycle Pure kit), quantified (Synergy H1 Hybrid Multi-Mode Microplate Reader), and submitted to the Georgia Genomics Bioinformatics Center (GGBC, Athens, GA) for Illumina MiSeq library assembly and sequencing (single-paired 100-bp reads starting 38 bases upstream of the CXXX region of interest).
The NGS analysis yielded over 10 million total reads, with ∼96% of individual reads at or above the quality score of Q30. All 8 replicates from 25°C yielded high-quality reads, while only 7 replicates from 37°C passed the quality cutoff. These 15 experimental samples, which represented about 3 million total reads, were carried forward for downstream analysis. In parallel, NGS was used to sequence 10 replicates of the E. coli plasmid library and 10 replicates of the plasmids extracted from the naïve yeast library prior to selection.
To assess the likelihood of farnesylation of Ydj1-CXXX variants, the reads within each replicate were first assessed to determine the number of occurrences for each unique CXXX sequence present. The occurrence of each CXXX sequence was summed across all replicates of the same temperature condition and then normalized to a frequency value. The latter was calculated by dividing the occurrence value for each unique CXXX by the total number of occurrences of all CXXX sequences (i.e. frequency = count_unique/count_total) for each of the 25°C and 37°C data sets. Lastly, the frequency value of each unique CXXX sequence was used to determine a unique enrichment factor (EF) score. This was calculated by dividing the frequency value of a unique CXXX sequence at restrictive temperature by that of the 25°C condition (i.e. EF score = frequency_37°C/frequency_25°C). Some CXXX sequences were not recovered at 25°C yet were present at the higher temperature (n = 9; CFFM, CIFF, CILF, CIYM, CNWC, CTFA, CVFF, CVLW, CWIA). To account for such cases, a correction factor of +1 was applied across all frequency values at 25°C so that 1 would be the denominator for CXXX sequences with 0 occurrences at 25°C. We also applied the same correction factor of +1 to all frequency values at 37°C for consistency.
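The EF calculation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study: the function names (`pool_replicates`, `enrichment_factors`) are hypothetical, and the +1 correction is assumed here to be added to the pooled occurrence counts before normalization, which is one reading of the description above.

```python
from collections import Counter


def pool_replicates(replicates):
    """Sum per-sequence read counts across replicates of one temperature."""
    pooled = Counter()
    for counts in replicates:
        pooled.update(counts)  # Counter.update adds counts together
    return pooled


def enrichment_factors(counts_permissive, counts_selective, pseudocount=1):
    """Return {CXXX: EF}, with EF = frequency(37 C) / frequency(25 C).

    Assumption: the +1 correction is applied to every pooled count in
    both data sets before frequencies are computed.
    """
    sequences = set(counts_permissive) | set(counts_selective)
    adj_p = {s: counts_permissive.get(s, 0) + pseudocount for s in sequences}
    adj_s = {s: counts_selective.get(s, 0) + pseudocount for s in sequences}
    total_p = sum(adj_p.values())
    total_s = sum(adj_s.values())
    return {
        s: (adj_s[s] / total_s) / (adj_p[s] / total_p)
        for s in sequences
    }
```

With this convention, a sequence that is absent at 25°C contributes a count of 1 to the denominator, so its EF stays finite.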

Weblogo sequence alignments
The top 5% (n = 400) of sequences for both Ydj1-based and Ras-based screens were evaluated by Weblogo (http://weblogo.berkeley.edu/logo.cgi) using a custom color scheme (Crooks, Hon et al. 2004). Cys was set to blue; polar charged amino acids were set to green (Asp, Arg, Glu, His, and Lys); polar uncharged residues were set to black (Asn, Gln, Ser, Thr, and Tyr); branched chain amino acids were set to red (Ile, Leu, and Val); all other residues were set to purple (Ala, Gly, Met, Phe, Pro, and Trp).

Heatmap analysis
CXXX variants were clustered into 20-member groups having shared amino acid pairs. An average EF value was determined for these groups to allow for contextual analyses exploring the relationships between a1 vs a2, a1 vs X, and a2 vs X. The averages were then analyzed with Microsoft Excel version 16.65 using the Conditional Formatting and Color Scales (Green-Yellow-Red) function to produce 3 distinct heatmaps (HMs). The values reported within each cell of a heatmap represent the average EF value for each respective 20-member group. The EF scores associated with Ras Recruitment System (RRS) data (Stein et al. 2015) were evaluated in the same manner.
To determine the confidence interval (C.I.), the average HM values were averaged across each individual row and column of the heatmap, and the averages were used to determine a SD for either the row or column set of values. The SD was next used to determine a 95% C.I. that was in turn used to establish high and low cutoffs for identifying positive selection or negative restriction. A pattern was deemed significant when 18 or more of each 20-member set (i.e. ≥90%) were above or below the statistical cutoff.
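The pattern call described above can be illustrated with the following Python sketch. It is not the study's analysis, which was carried out in Microsoft Excel. The function names are hypothetical, and the exact C.I. formula is an assumption: here the cutoffs are taken as the mean of the row averages ± 1.96 × SD/√n, with the SD computed from the row averages.

```python
from math import sqrt
from statistics import mean, stdev


def ci_cutoffs(row_averages, z=1.96):
    """Low/high cutoffs from a 95% C.I. around the mean of the row averages.

    Assumption: C.I. half-width = z * SD / sqrt(n); the text does not
    spell out the formula that was used.
    """
    center = mean(row_averages)
    half_width = z * stdev(row_averages) / sqrt(len(row_averages))
    return center - half_width, center + half_width


def significant_patterns(heatmap, min_members=18):
    """heatmap maps a row label (amino acid) to its list of 20 cell values.

    A row is called 'positive' or 'negative' when at least `min_members`
    of its cells lie above the high cutoff or below the low cutoff.
    """
    low, high = ci_cutoffs([mean(row) for row in heatmap.values()])
    calls = {}
    for label, row in heatmap.items():
        if sum(v > high for v in row) >= min_members:
            calls[label] = "positive"
        elif sum(v < low for v in row) >= min_members:
            calls[label] = "negative"
    return calls
```

The same function can be applied to columns by transposing the heatmap before calling it.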

Likelihood of prenylation calculations for each CXXX variant
The averaged EF values from each of the 3 HMs were summed to produce a HM score. For example, to predict the likelihood of prenylation for CASQ, the averages of Cys-Ala-Ser-X (n = 20 varied at the X position), Cys-Ala-X-Gln (n = 20 varied at the a2 position), and Cys-X-Ser-Gln (n = 20 varied at the a1 position) were first calculated; then, these 3 HM values were summed to represent the HM score of CASQ.
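As a worked illustration of the HM score, the following Python sketch computes the three 20-member group averages for a query sequence and sums them. The function name `hm_score` and the dictionary layout (CXXX string mapped to EF) are hypothetical choices for this example, and the dictionary is assumed to contain an EF for all 8,000 CXXX sequences.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hm_score(sequence, ef):
    """Sum of the 3 group averages that share two residues with `sequence`.

    `sequence` is a 4-letter CXXX string (e.g. "CASQ"); `ef` maps every
    CXXX string to its enrichment factor.
    """
    _, a1, a2, x = sequence
    vary_x = mean(ef[f"C{a1}{a2}{aa}"] for aa in AMINO_ACIDS)   # e.g. Cys-Ala-Ser-X
    vary_a2 = mean(ef[f"C{a1}{aa}{x}"] for aa in AMINO_ACIDS)   # e.g. Cys-Ala-X-Gln
    vary_a1 = mean(ef[f"C{aa}{a2}{x}"] for aa in AMINO_ACIDS)   # e.g. Cys-X-Ser-Gln
    return vary_x + vary_a2 + vary_a1
```

Because the query sequence itself belongs to all 3 groups, its own EF contributes to each of the 3 averages.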

Ydj1-CXXX thermotolerance assay
The assay was performed as previously described (Hildebrandt, Cheng et al. 2016; Berger, Kim et al. 2018; Ashok, Hildebrandt et al. 2020). In brief, yeast cultures in SC-Uracil media were incubated at 25°C until saturation, and then a 10× serial dilution was applied in YPD before being pinned onto YPD solid medium. Plates were incubated for 2-4 days at 25, 37, and 39°C prior to results being digitally scanned face down without lids using a Canon flatbed scanner (300 dpi; TIFF format). Scanned images were adjusted for consistency in image rotation, contrast, and size before being copied into Microsoft PowerPoint version 16.65 for final figure assembly. The experiment was performed twice in duplicate.

Gel-shift assay
The assay was performed as previously described (Hildebrandt, Cheng et al. 2016; Berger, Kim et al. 2018; Schey, Buttery et al. 2021). Whole cell lysates of mid-log cells were prepared and separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE; 6% stacking with 9.5% resolving gel) and then transferred onto nitrocellulose. Blots were blocked with 5% milk and then sequentially incubated with rabbit anti-Ydj1 primary antibody (courtesy of A. Caplan) and HRP-conjugated goat antirabbit secondary antibody (Kindle Biosciences, Greenwich, CT). Signal was detected using the KwikQuant Western Blot Detection Kit (Kindle Biosciences) and a KwikQuant Imager as per the manufacturer's instructions. Protein bands were quantified using NIH ImageJ, and the resulting values were used for calculating ratios for prenylated and nonprenylated bands. The blots containing yeast Nap1 were treated similarly but incubated with mouse anti-His primary antibody (Thermo Fisher Scientific, Waltham, MA) and HRP-conjugated sheep antimouse secondary antibody (Kindle Biosciences, Greenwich, CT).

Decision tree modeling
Sequence motif features were generated by one-hot encoding for each of the variable CaaX motif sites: a1, a2, and X. Additional binary features were included to describe whether the variable residue was within a given set of residues. To define these sets, we enumerated all possible amino acid combinations with a max set size of 5. In total, 65,097 features were considered. While building the decision tree classifier, entropy was used to evaluate the quality of potential splits. To determine more generalizable rules, trees were allowed a maximum depth of 3, and all nodes were required to have a minimum of 50 samples. This method was implemented using Scikit-learn (v0.22.2) using the DecisionTreeClassifier class.
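The feature space and split criterion described above can be illustrated with a standard-library Python sketch. It does not reproduce the Scikit-learn model itself; the helper names are hypothetical. It assumes that sets of size 1 play the role of the one-hot features, a reading under which the enumeration gives 21,699 residue sets per position and 65,097 features across the 3 positions, matching the stated total.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIONS = {"a1": 1, "a2": 2, "X": 3}  # index within a CXXX string
MAX_SET_SIZE = 5


def residue_sets(max_size=MAX_SET_SIZE):
    """Yield every amino acid set of size 1..max_size."""
    for size in range(1, max_size + 1):
        for combo in combinations(AMINO_ACIDS, size):
            yield frozenset(combo)


def feature_count(max_size=MAX_SET_SIZE):
    """Number of (position, residue set) binary features."""
    per_position = sum(comb(len(AMINO_ACIDS), k) for k in range(1, max_size + 1))
    return per_position * len(POSITIONS)


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(sequences, labels, position, residue_set):
    """Entropy reduction from splitting on 'residue at position is in set'."""
    idx = POSITIONS[position]
    inside = [y for s, y in zip(sequences, labels) if s[idx] in residue_set]
    outside = [y for s, y in zip(sequences, labels) if s[idx] not in residue_set]
    n = len(labels)
    weighted = (len(inside) / n) * entropy(inside) + (len(outside) / n) * entropy(outside)
    return entropy(labels) - weighted
```

In Scikit-learn, the constraints named in the text correspond to the `criterion`, `max_depth`, and `min_samples_leaf` (or `min_samples_split`) parameters of `DecisionTreeClassifier`; which of the two minimum-sample parameters was used is not stated.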

Results
Many CXXX sequences sustain Ydj1-dependent growth of yeast at high temperature
Ydj1 follows an isoprenylation-only pathway in contrast to canonical reporters (i.e. Ras GTPases, a-factor) that undergo additional downstream modifications (Trueblood, Boyartchuk et al. 1997; Stein, Kubala et al. 2015) (Fig. 1). As such, the use of Ydj1 as a reporter to probe yeast FTase specificity reduces the risk of introducing additional specificity bias from CaaX proteases. To evaluate the substrate scope of FTase, we adapted methods from a previous yeast study that utilized a Ras-based CXXX reporter (i.e. RRS) along with competitive growth enrichment and NGS methods (Stein, Kubala et al. 2015). In our case, we investigated the ability of Ydj1-CXXX variants to sustain high-temperature growth in a ydj1Δ genetic background (Supplementary Fig. 1). A plasmid library of Ydj1-CXXX variants was generated using single codons for each amino acid so that all 8,000 CXXX sequences are represented in a relatively small library (see Materials and methods). In contrast to traditional plasmid library construction, which relies on fully or partly degenerate oligonucleotides, a Trimer 20-based strategy was used with the aim of yielding a balanced library with respect to codon redundancy (i.e. no over-representation of amino acids with multiple codons) and no introduction of early stop codons, which effectively reduces the number of copies needed to reach statistical confidence for full coverage (Patrick, Firth et al. 2003; Firth and Patrick 2005). While 110,500 independent clones were minimally required for statistical 100% coverage, our library contains 606,800 independent clones (i.e. 6× full coverage). In contrast, constructing a library with fully degenerate oligonucleotides to vary 3 amino acids would have required 3,200,000 independent clones for 100% coverage, leading to much higher labor and time costs for this study.
The Ydj1-CXXX plasmid library DNA was purified from E. coli before being transformed into the ydj1Δ yeast strain to yield a naïve ydj1Δ/YDJ1-CXXX yeast library. Both the E. coli plasmid library and the naïve yeast library were confirmed by NGS to contain all potential Ydj1-CXXX variants (Supplementary File 1). Ligation efficiency of vector alone was determined to be ∼0.3% relative to vector plus insert, indicating that a small portion of the library could encode YDJ1-SASQ (i.e. uncut parent plasmid). Consistently, YDJ1-SASQ was observed at frequencies of 0.065 and 0.071% in the E. coli and naïve yeast libraries, respectively. The frequencies of all CXXX sequences from the E. coli and naïve yeast libraries were also graphed, and no significant changes in the frequency profile between the 2 libraries were observed (Supplementary Fig. 2). This analysis further revealed that the libraries were not perfectly balanced such that the extremes exhibited a ∼10× range in abundance.
The naïve yeast library was propagated for ∼8 generations at permissive (25°C) and selective (37°C) temperatures in liquid media until the culture was well saturated (A600 ∼2.0). The selective temperature condition was expected to enrich for farnesylated Ydj1-CXXX variants. NGS methods were then used to identify all the CXXX sequences present in each culture, from which frequencies were determined for each CXXX sequence in each temperature group. Similarities in experimental design allowed for direct comparison of Ydj1 and Ras-based data sets. In the RRS study, the likelihood of farnesylation was reported by an EF (RRS EF) that was defined by the frequency of a specific CXXX sequence occurring at 37°C divided by its frequency at 25°C (i.e. frequency_selective/frequency_permissive). A high RRS EF was interpreted as a high possibility of farnesylation. We performed a similar calculation for the Ydj1 data.
From our analysis, we observed that EFs for the Ydj1-based screen exhibited a narrower range (EF: 0.036 for CWWC to 13.898 for CVFF) relative to the RRS EFs for the Ras-based screen (RRS EF: 0.008 for CLRS to 70.667 for CYCM). We interpret these ranges to indicate that the majority of all CXXX sequences can support growth in the Ydj1-based screen, whereas a smaller number of sequences undergo higher enrichment during the thermoselection process in the Ras-based screen. The comparison also revealed that the EF profiles offered 2 distinct sequence landscapes, especially for farnesylated sequences that are well characterized: noncanonical CASQ (ScYdj1) and CKQQ (HsSTK11/Lkb1) and canonical CVIA (Sc a-factor) and CVLS (HsH-Ras). In the Ydj1 NGS-based screen, noncanonical sequences CASQ (EF: 1.608) and CKQQ (EF: 1.627) outperformed canonical CaaX sequences such as CVIA (EF: 0.406) and CVLS (EF: 0.494) (Fig. 2a). In contrast, in the Ras-based screen, CASQ (RRS EF: 0.399) and CKQQ (RRS EF: 0.399) were significantly less enriched, while CVIA (RRS EF: 10.124) and CVLS (RRS EF: 11.804) ranked among the top hits (Fig. 2b). The EFs of all CXXX sequences resulting from the Ydj1 screen are reported in Supplementary File 2.
Our observations suggest that the Ydj1-based screen best enriches noncanonical sequences, whereas the Ras-based screen best enriches canonical sequences. The top 5% (n = 400) of hits from the Ydj1 NGS-based assay did not exhibit an obvious consensus sequence (Fig. 2c); however, the same number of top hits from the Ras-based screen was enriched with aliphatic amino acids at the a1 and a2 positions (Fig. 2d). In the context of the Ras reporter, an aliphatic amino acid is especially prominent at the a2 position, which has been historically regarded as a requirement for FTase specificity. The top 5% of hits from both the Ydj1-based and Ras-based screens were also displayed as a 4D plot (Fig. 2e, f; the size of the spot is the 4th dimension and represents relative abundance). This analysis revealed that the top hits were more widely dispersed across the CXXX sequence space in the Ydj1-based data relative to the Ras-based data and that the most abundant sequences differed between the data sets. Analysis of the data as 3D plots, viewed along each of the 4D plot axes such that lysine (K) is the nearest amino acid, provided additional details about the sequence space covered by each screen (Supplementary Fig. 3). In these 3D plots, the spots representing relative abundance are stacked behind each other, and the total number of spots occupying a particular node is not easily discerned. Although this arrangement makes it difficult to make detailed conclusions about the depth of coverage at each node (i.e. the number of amino acids present), it allows for clear conclusions about restrictions (i.e. amino acids that are not tolerated independent of context). From the perspective of a1, both data sets lack several amino acids at a2 (i.e. D, E, G, K, and R) and a single amino acid at X (i.e. R) (Supplementary Fig. 3a, b). Specific to the Ras-based data, additional amino acids were absent at a2 (H and Y) and X (K, P, and W) or less prevalent at a2 (i.e. A, P, Q, S, T, and W). From the perspective of a2, 1 amino acid was absent at the a1 position within the Ydj1-based data set (i.e. P). One amino acid was absent at the X position in both data sets (i.e. R), and several amino acids were absent at the X position within the Ras-based data set only (i.e. K, P, R, and W) (Supplementary Fig. 3c, d). From the perspective of X, 1 amino acid was absent at the a1 position within the Ydj1-based data set (i.e. P), and the amino acids that were absent at a2 from the perspective of a1 were again identified, with 1 additional amino acid being absent (i.e. S) in the Ras-based data set (Supplementary Fig. 3e, f). A general interpretation of these observations is that many amino acids can be accommodated at each position of the CXXX sequence independent of whether Ydj1 or Ras is the reporter, except for charged residues that are strictly not tolerated at a2 in either case. There also appear to be additional exceptions in the context of the Ras reporter at both the a2 and X positions that likely reflect its need for additional modification beyond initial isoprenylation.
To further analyze our Ydj1 NGS-based data, we crosschecked CXXX sequences that were previously identified as being farnesylated through a limited scope genetic Ydj1-based Temperature Screen (YTS) (Berger, Kim et al. 2018). Mostly noncanonical sequences were identified in the earlier study (n = 153). Many of these sequences exhibited high EFs in the present study (Fig. 3a). When superposed on the Ydj1 EF profile, most YTS-identified sequences were positioned in the 4th quartile (n = 77) with fewer in the 3rd quartile (n = 67) and fewest in the 2nd quartile (n = 9). None of the YTS hits were present in the 1st quartile. In contrast, superposition of the top hits from the Ras-based screen (n = 496; RRS EF > 3; the cutoff for positive hits of the Ras-based screen) on the Ydj1 EF profile revealed positioning of sequences in the 2nd, 3rd, and 4th quartiles of the EF profile (n = 228, n = 114, and n = 153, respectively), with the fewest in the 1st quartile (n = 1) (Fig. 3b). The few canonical sequences identified by YTS (Fig. 3a, dashed box; n = 15) remained within the range displayed by RRS sequences. Together, these results are fully consistent with our Ydj1 NGS-based method being useful for enriching noncanonical sequences that are farnesylated in addition to canonical sequences observed using the Ras reporter.

Noncanonical CKQX sequences are targeted by FTase
Emerging evidence indicates that some CKQX sequences are farnesylated despite the presence of nonaliphatic amino acids at the a1 and a2 positions. CKQQ farnesylation is well documented for human STK11/Lkb1, human Nap1L1, and yeast Pex19, whereas CKQS derived from yeast Nap1 is farnesylated in the context of the Ydj1 reporter (Collins, Reoma et al. 2000; Sapkota, Kieloch et al. 2001; Tate, Kalesh et al. 2015; Storck, Morales-Sanfrutos et al. 2019; Berger, Yeung et al. 2022). We additionally confirmed that farnesylation of CKQS occurs in the natural context of yeast Nap1 itself using a gel-shift assay that evaluates the mobility of farnesylated species by SDS-PAGE (Supplementary Fig. 4a). Of note, farnesylation has an opposite effect on Nap1 mobility when compared with Ydj1. Moreover, the ScanProsite tool (https://prosite.expasy.org/scanprosite/) identifies 5,179 entries across UniProtKB/Swiss-Prot and UniProtKB/TrEMBL reference proteome sequences ending in CKQX across eukaryotes (Supplementary File 3). The majority of these sequences (i.e. 65%) contain CKQQ or CKQS. Out of the 20 CKQX variants, most (n = 17 positive hits) displayed a Ydj1 EF consistent with a high likelihood of farnesylation (Fig. 4a). The remaining sequences (n = 3) had low EFs, were well separated from the other CKQX sequences on the EF profile, and were expected to have a low likelihood of modification (i.e. negative hits). By comparison, none of the 20 CKQX sequences were predicted to be targets of FTase in the Ras-based EF profile (Fig. 4b).
The farnesylation status of Ydj1-CaaX variants was monitored by thermotolerance and gel-shift assays (Caplan, Tsai et al. 1992; Hildebrandt, Cheng et al. 2016). The thermotolerance assay confirmed the 17 CKQX positive hits to have growth profiles similar to that of wildtype Ydj1, whereas the negative hits (n = 3) had a growth profile consistent with unmodified Ydj1 (Fig. 4c and Supplementary Fig. 4b). In strong agreement with the EFs and thermotolerance profiles, a prenyl-dependent gel-shift was observed for the 17 positive Ydj1-CKQX hits, and no shift was observed for the 3 negative hits. It should be noted that CKQR displayed a doublet gel pattern in both the presence and absence of FTase. The reason for this doublet pattern is unknown; however, the pattern was identical with and without FTase, so CKQR was deemed as having 0% prenylation.

Fig. 2.
Comparison of results from the Ydj1-CXXX NGS screen and previously published Ras-CXXX NGS screen. Enrichment profiles of a) Ydj1-based and b) Ras-based screens offer 2 distinct sequence landscapes. Annotated controls: noncanonical ScYdj1 (CASQ) and HsLkb1/STK11 (CKQQ) and canonical Sc a-factor (CVIA) and HsH-Ras (CVLS). Weblogo frequency representation of top 5% (n = 400) of hits for c) Ydj1-based and d) Ras-based screens. A 4D representation of the space occupied by top 5% of hits recovered from e) Ydj1-based and f) Ras-based screens. The size of the dot is the 4th dimension and corresponds to the EF of the CXXX motif within the respective screen.

Fig. 3.
Relative performance of previously identified farnesylated CXXX sequences. Sequences previously identified in a) a Ydj1-based genetic selection or b) a Ras-based competitive growth assay were superimposed on the Ydj1 EF profile. Each sequence is represented as a point with an associated score where the second Y-axis refers to the scoring range used in these previous studies (YTS T-score, Ydj1-based Temperature Screen Thermotolerance Score; RRS EF, Ras Recruitment System enrichment factor). The limited number of canonical sequences that were identified by YTS is identified by a dashed box (n = 15).

Statistical analyses reveal yeast FTase specificity
Our evaluation of genetic and biochemical data suggested that FTase is not limited to target sequences having aliphatic amino acids at a 1 and a 2 . To identify factors that could better indicate target specificity, we assessed the likelihood of prenylation for sets of CaaX motifs that shared the same context, differing only at a single amino acid position. Using heatmap analysis, we sought to identify potential patterns marked by positive selection (colored in green) or negative restriction (colored in red) across a 1 , a 2 , and X positions in both Ydj1-based and Ras-based results (Fig. 5). These patterns were identified as instances where the HM values of 18 or more of each 20-member set (i.e. > 90% of EF Fig. 4. Evaluation of CKQX variants as FTase substrates. Enrichment profiles of CKQX sequences in a) Ydj1-based and b) Ras-based screens. c) Thermotolerance and gel-shift assays of Ydj1-CKQX variants provide supporting evidence toward the prenylation of noncanonical CKQX sequences by yeast FTase. Thermotolerance assays were performed by culturing yeast to the same density, applying a 10× serial dilution to the cultures, spotting dilution sets onto YPD-rich media, and incubating at indicated temperatures. Notably, plate-based selection requires a slightly higher temperature to mimic the growth profile observed on liquid-based selection. The strains used were transformants of yWS304 containing the indicated Ydj1-CXXX variant. Gel-shift assays were performed for the indicated Ydj1-CXXX variants in the presence and absence of FTase activity. Total yeast extracts were analyzed by SDS-PAGE and immunoblotting. The strains used were yWS2544 (+FTase) and yWS2542 (ram1Δ; -FTase. Seq, sequence; EF, enrichment factor; %, percent prenylation. scores) were well above or below the statistical average for all values (n = 400) within a heatmap (Supplementary File 4).
For the Ydj1-based data, no positive patterns were observed for the a1 position in combination with a2 (Fig. 5a); however, 1 restrictive pattern, D at a1, was observed in combination with X (Fig. 5b). Generally, a1 appears to tolerate many different types of amino acids and does not seem to strongly influence reactivity with FTase in either a positive or negative manner. When considering the a2 position, no positive patterns were observed, but several negative patterns were evident. Specifically, D, K, and R were restrictive in combination with either a1 (Fig. 5a) or X (Fig. 5c), with E being additionally restrictive in combination with a1. Generally, a2 also appears to tolerate many different types of amino acids except for charged amino acids that negatively influence reactivity with FTase. For the X position, a positive pattern, Q at X, was observed in combination with a1 (Fig. 5b). Several negative patterns were also evident at the X position. Both P and R were restrictive in combination with either a1 or a2, with K being additionally restrictive in the context of a1. Generally, X seems to tolerate many different types of amino acids except for positively charged amino acids and structurally constrained proline (Fig. 5c). Thus, despite repeated reports that FTase targets canonical CaaX sequences, our Ydj1-based data strongly indicate that FTase tolerates most CXXX sequences unless amino acids are present that restrict its specificity. Similar heatmap analysis of the Ras-based data also revealed fewer positive than negative patterns. The 2 positive patterns were I at a2 (Fig. 5d) and M at X (Fig. 5e), both in combination with a1. Most of the negative patterns observed were constrained to a2 and X positions in all contexts (Fig. 5c, f). The a1 position, on the other hand, displayed the fewest negative patterns; only 2 were observed, E in combination with a2 and K in combination with X.
The combined observations that both Ydj1-based and Ras-based data reveal high tolerance at the a1 position while a2 and X positions have more restrictions are consistent with previous reports on FTase specificity (Reid, Terry et al. 2004;Stein, Kubala et al. 2015).
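The pattern criterion used in the heatmap analysis can be illustrated with a minimal sketch. The source states only that 18 or more of the 20 values in a set must lie "well above or below" the average of all 400 heatmap values; the `margin` parameter, the function name `find_patterns`, and the dictionary layout below are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def find_patterns(heatmap, margin, min_members=18):
    """Flag residues whose 20-member set sits consistently above or below the grand mean.

    heatmap: dict mapping (row_residue, col_residue) -> HM value (400 entries).
    margin: hypothetical distance from the grand mean that counts as
            "well above" or "well below"; the source does not give this value.
    """
    grand_mean = mean(heatmap.values())
    patterns = {}
    for row in AMINO_ACIDS:
        values = [heatmap[(row, col)] for col in AMINO_ACIDS]
        n_high = sum(v > grand_mean + margin for v in values)
        n_low = sum(v < grand_mean - margin for v in values)
        if n_high >= min_members:
            patterns[row] = "positive"
        elif n_low >= min_members:
            patterns[row] = "negative"
    return patterns

# Toy heatmap (not real data): every value is 1.0 except the row for D.
toy_heatmap = {(r, c): (0.0 if r == "D" else 1.0)
               for r in AMINO_ACIDS for c in AMINO_ACIDS}
toy_patterns = find_patterns(toy_heatmap, margin=0.5)
```

In this toy case only the D row is flagged, as a negative pattern, because all 20 of its values fall below the grand mean minus the margin.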
To fully account for the contextual information from heatmap analysis, the 3 heatmap values from the Ydj1 NGS data set were summed for each specific CaaX motif to obtain an HM score. This score represents the sum of 3 averages, where each average accounts for 20 EF data points. The advantage of the HM score is that it normalizes statistical outliers that could skew results when using individual EF scores. The HM scores, when a cutoff of 3 was applied, correlated well with prior evidence of prenylation gathered from previous studies, outperforming published models such as Prenylation Prediction Suite (PrePS), RRS, and our recently published SVM algorithm for predicting farnesylation of noncanonical sequences (Maurer-Stroh and Eisenhaber 2005;Stein, Kubala et al. 2015;Berger, Yeung et al. 2022) (Table 3).

The Ydj1 gel-shift patterns of these sequences revealed a doublet in both the presence and absence of FTase, so they were deemed as having 0% prenylation; the reason for the doublet pattern is unknown.
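The HM score described above can be sketched as follows. This is one reading of "the sum of 3 averages, where each average accounts for 20 EF data points": for a given motif, each average holds two positions fixed and varies the third over all 20 amino acids. The function names, the EF dictionary layout, and the assumption that a score at or above the cutoff indicates prenylation are illustrative choices, not details confirmed by the source.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def hm_score(ef, a1, a2, x):
    """Sum of three 20-member EF averages for the motif C-a1-a2-x.

    ef: dict mapping (a1, a2, x) -> enrichment factor for all 8,000 CXXX sequences.
    """
    avg_vary_x = mean(ef[(a1, a2, r)] for r in AMINO_ACIDS)   # a1 and a2 fixed
    avg_vary_a2 = mean(ef[(a1, r, x)] for r in AMINO_ACIDS)   # a1 and X fixed
    avg_vary_a1 = mean(ef[(r, a2, x)] for r in AMINO_ACIDS)   # a2 and X fixed
    return avg_vary_x + avg_vary_a2 + avg_vary_a1

def predicted_prenylated(ef, a1, a2, x, cutoff=3.0):
    """Assumed decision rule: HM score at or above the cutoff predicts prenylation."""
    return hm_score(ef, a1, a2, x) >= cutoff

# Toy EF table (not real data): EF is 1.0 everywhere except 0.0 when a2 is D.
toy_ef = {(p, q, r): (0.0 if q == "D" else 1.0)
          for p in AMINO_ACIDS for q in AMINO_ACIDS for r in AMINO_ACIDS}
```

Because each term averages over 20 sequences, a single extreme EF value shifts the score by at most one-twentieth of its deviation, which is the outlier-damping property noted in the text.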
Lastly, we trained a predictive model using the results of the Ydj1-based NGS screen to predict whether a given CXXX sequence can be modified based on a subset of questions (decisions) and possible consequences in a decision tree model (Fig. 6 and Supplementary Fig. 5). Each question pertains to whether certain amino acids are present in the a1, a2, and X positions. The decision tree was created by systematically identifying the best set of questions for predicting whether a CXXX sequence can be modified. This allowed us to identify a general set of rules that govern FTase substrate specificity. Based on our model, the biggest determinant of FTase activity is the restriction of D, E, K, or R at the a2 position, followed by the restriction of K, P, or R at the X position. However, amino acids such as I, L, M, and V at the a2 position seem to neutralize the negative effects of K, P, or R at the X position. Our findings repeatedly raise the likelihood that restrictions, rather than tolerance, are applied to a few impermissible amino acids at a2 and X positions of potential substrates modified by FTase. It should be noted that although this simple flowchart is useful in understanding general rules of yeast FTase specificity, HM scores provide a more individualized prediction of the prenylation status of each CXXX sequence.
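The general rules stated above can be written as a short function. This is a sketch of the simplified flowchart only, not the full decision tree of Supplementary Fig. 5, and the function and constant names are hypothetical.

```python
RESTRICTIVE_A2 = set("DEKR")   # charged residues reported as restrictive at a2
RESTRICTIVE_X = set("KPR")     # residues reported as restrictive at X
RESCUING_A2 = set("ILMV")      # a2 residues reported to offset a restrictive X

def predict_farnesylated(cxxx):
    """Apply the simplified rules to a 4-residue CXXX sequence.

    Returns True if the sequence is predicted to be a yeast FTase substrate.
    The a1 residue is not consulted, reflecting its reported high tolerance.
    """
    if len(cxxx) != 4 or cxxx[0] != "C":
        raise ValueError("expected a 4-residue sequence starting with C")
    _, _a1, a2, x = cxxx
    if a2 in RESTRICTIVE_A2:
        return False
    if x in RESTRICTIVE_X and a2 not in RESCUING_A2:
        return False
    return True
```

For example, under these rules `predict_farnesylated("CAQK")` is False because K at X is restrictive, whereas `predict_farnesylated("CAIK")` is True because I at a2 offsets it. As noted in the text, such binary rules are coarser than per-sequence HM scores.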

Discussion
In this study, the use of Ydj1 as a reporter instead of Ras allowed for the identification of yeast FTase targets that extend beyond the canonical CaaX consensus sequence. Notably, this study corroborates previously published observations that noncanonical sequences can be targets of FTase. Restrictions by a small number of amino acids, rather than adherence to the canonical CaaX consensus, seem to guide FTase specificity. We believe that these findings can be used to improve tools for predicting farnesylation status that will outperform current prediction methods.
Structural analysis of mammalian FTase has previously shown that the aaX portion of the protein substrate makes extensive van der Waals contact with the adjacent isoprenoid group, which is hypothesized to modulate product release (Long, Casey et al. 2002;Reid, Terry et al. 2004). However, these interactions are not predicted when nonaliphatic residues are at a1 and a2. Similar structural studies of yeast FTase have not been reported. Nevertheless, our finding that yeast FTase can tolerate nonaliphatic residues at a1 and a2 raises the possibility that the catalytic site of yeast FTase may display a greater degree of structural flexibility beyond the conformations that could be sampled from mammalian FTase in complex with canonical substrates. Our findings also lead us to predict that binding of charged amino acids (D, E, K, and R) may be unstable in the a2 binding pocket. Likewise, certain amino acids (K, P, and R) at the X position may be unable to coordinate with the product binding pocket of the FTase. These situations may result in the formation of a farnesylated peptide that is unable to be released from the enzyme, thus inhibiting FTase from turning over such substrates efficiently. These possibilities could be resolved by future structural studies involving noncanonical CXXX peptide substrates in complex with either yeast or mammalian FTase, which are structurally and functionally homologous (Kohl, Diehl et al. 1991;Gomez, Goodman et al. 1993;Omer, Kral et al. 1993).
Our study was focused on the recognition of CXXX sequences with respect to FTase specificity. It has been observed, however, that upstream sequences can impact FTase specificity (Cox, Graham et al. 1993). The impact of upstream sequences has been explored by prediction algorithms, most notably by PrePS (Maurer-Stroh and Eisenhaber 2005). It is critical to acknowledge that the Ydj1-CXXX NGS screen recovered sequences that support high-temperature growth in the context of Ydj1. Thus, any effect of upstream sequence has not been explored in this work. Additionally, it is possible that some Ydj1-CaaX variants affect growth in ways unrelated to prenylation status, which could impact some of our results. Still, our Ydj1 HM scoring system outperformed predictions made by PrePS, RRS, and our SVM machine learning algorithm (Maurer-Stroh and Eisenhaber 2005;Stein, Kubala et al. 2015;Berger, Yeung et al. 2022) (Table 3). Combining PrePS predictions, which consider upstream sequence, with the observations presented in this work is expected to provide the most robust prediction for farnesylation of individual protein targets. As more of the prenylation predictions are validated using additional techniques in the future, it will be possible to determine which prediction methods are most reliable.
Fig. 6. A simplified decision tree for predicting yeast FTase substrates. The output of Ydj1 EF scores was fit into a decision tree model considering each of the variable CaaX motif sites: a1, a2, and X. Our results strongly suggest that a major determinant of FTase activity depends on a select few discriminatory amino acids at a2 (D, E, K, and R) and X (K, P, and R) positions, with the potential for certain allowable amino acids at a2 (I, L, M, and V) that can mitigate discrimination caused at the X position. For the full decision tree, see Supplementary Fig. 5.

Another caveat to the study is the potential for modification of our reporter by GGTase-I, the prenyltransferase that appends a C20 geranylgeranyl group to the cysteine of certain CaaX motifs. The X position is considered to differentiate the specificity of FTase and GGTase-I, where CaaX sequences ending in L, F, M, or I are more likely to be GGTase-I targets (Caplin, Hettich et al. 1994;Hartman, Hicks et al. 2005). The current study samples all possible CXXX combinations, which raises the possibility that some sequences may be modified by GGTase-I. Yet, Ydj1 variants with the CaaL/F/M/I motif (n = 36; a = I, L, V) had a relatively low average EF score (EF: 1.23 ± 0.71) compared to the top 5% of sequences described in Fig. 2c (EF: 3.54 ± 1.13). The more broadly defined category of CxxL/F/M/I (n = 1600; x = all 20 amino acids) also had a relatively low average EF score (EF: 1.26 ± 0.88). Further investigations will be necessary to fully resolve the CXXX space targeted by FTase vs GGTase-I, which could be achieved using a similar strategy to that described in this study but utilizing a ram1Δ genetic background.
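The sizes of the two motif classes compared above follow directly from enumeration, as the sketch below shows. The helper names are hypothetical, and the use of the sample standard deviation for the ± term is an assumption, since the source does not state which dispersion measure was reported.

```python
from itertools import product
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GGTASE_LIKE_X = "LFMI"  # X residues associated with GGTase-I targets in the text

def motif_class(a1_set, a2_set, x_set):
    """Enumerate every CXXX sequence with the allowed residues at a1, a2, and X."""
    return ["C" + a1 + a2 + x for a1, a2, x in product(a1_set, a2_set, x_set)]

def summarize_ef(ef_by_sequence, sequences):
    """Return (mean, sample standard deviation) of EF scores for a motif class."""
    scores = [ef_by_sequence[s] for s in sequences]
    return mean(scores), stdev(scores)

caa_lfmi = motif_class("ILV", "ILV", GGTASE_LIKE_X)              # 3 x 3 x 4 sequences
cxx_lfmi = motif_class(AMINO_ACIDS, AMINO_ACIDS, GGTASE_LIKE_X)  # 20 x 20 x 4 sequences
print(len(caa_lfmi), len(cxx_lfmi))  # 36 1600
```

Given a dictionary of EF scores keyed by sequence, `summarize_ef` would yield values of the kind quoted in the text for each class.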
Many proteins undergo a process known as isoprenylation. Historically, FTase is recognized as targeting the CaaX sequence, but emerging evidence, including that reported here, has indicated that FTase substrates are not limited to the canonical CaaX sequence. Our methods, which primarily focused on yeast FTase, provide insights into the broader specificity of this process, which likely extends to human and other FTase enzymes. The discovery of farnesylated noncanonical sequences such as CKQX associated with large families of proteins like Stk11/Lkb1 and Nap1, which lack a canonical CaaX motif, opens a new understanding of post-translational isoprenylation and the potential for future discoveries in this field. Our results also demonstrate that the target specificity of FTase is mainly due to restrictions at the a2 and X positions, rather than selection toward a canonical CaaX sequence. As Ras and other canonical reporters rely on multiple modifications for function, previous studies using these proteins may have reported on sequences that are suited for the combined specificities of FTase and downstream enzymes (i.e. CaaX proteases ScSte24 or HsRce1) instead of the specificity of FTase alone. Our study, in combination with data from previous studies, provides new ways to advance the understanding of the specificities of each enzymatic step associated with the post-translational isoprenylation pathway, laying out a crucial step toward identifying its cellular targets and the extent to which they are modified.

Data availability
Yeast strains and plasmids are available upon request. All relevant data sets for this study are included in the supplemental files of the manuscript. Supplementary File 1 contains frequency of CXXX sequences observed in naive libraries. Supplementary File 2 contains EF results of Ydj1-based NGS. Supplementary File 3 contains CKQX hits identified on UniProt. Supplementary File 4 contains heatmap analysis. The data derived through the Ras Recruitment System was previously published and reanalyzed here (Stein, Kubala et al. 2015).
Supplemental material available at G3 online.