Stochastic scanning events on the GCN4 mRNA 5’ untranslated region generate cell-to-cell heterogeneity in the yeast nutritional stress response

Abstract Gene expression stochasticity is inherent in the functional properties and evolution of biological systems, creating non-genetic cellular individuality and influencing multiple processes, including differentiation and stress responses. In a distinct form of non-transcriptional noise, we find that interactions of the yeast translation machinery with the GCN4 mRNA 5’UTR, which underpins starvation-induced regulation of this transcriptional activator gene, manifest stochastic variation across cellular populations. We use flow cytometry, fluorescence-activated cell sorting and microfluidics coupled to fluorescence microscopy to characterize the cell-to-cell heterogeneity of GCN4-5’UTR-mediated translation initiation. GCN4-5’UTR-mediated translation is generally not de-repressed under non-starvation conditions; however, a sub-population of cells consistently manifests a stochastically enhanced GCN4 translation (SETGCN4) state that depends on the integrity of the GCN4 uORFs. This sub-population is eliminated upon deletion of the Gcn2 kinase that phosphorylates eIF2α under nutrient-limitation conditions, or upon mutation to Ala of the Gcn2 kinase target site, eIF2α-Ser51. SETGCN4 cells isolated using cell sorting spontaneously regenerate the full bimodal population distribution upon further growth. Analysis of ADE8::ymRuby3/GCN4::yEGFP cells reveals enhanced Gcn4-activated biosynthetic pathway activity in SETGCN4 cells under non-starvation conditions. Computational modeling interprets our experimental observations in terms of a novel translational noise mechanism underpinned by natural variations in Gcn2 kinase activity.


INTRODUCTION
Gene expression stochasticity underpins a wide range of phenomena that are critical to organism functionality and viability, including cellular auto-regulatory circuits, phenotypic variation, differentiation, stress responses, synchrony in circadian clocks, and probabilistic fate decisions such as viral latency (1-8). These various lines of evidence point to a major role for noise in evolution (9-12). At the same time, other reports have revealed that noise is a potentially damaging source of imprecision, for example impacting on signaling and regulation (13-16). In response to this threat of disorder, living systems use multiple mechanisms to keep randomness under control. Overall, understanding gene expression noise, and the mechanisms used in living systems to manage it, is essential to achieving a complete understanding of biology. Such knowledge also provides important guiding principles for the design and engineering of biological systems.
Gene expression noise is generally categorized into two components: intrinsic noise, which is attributed to the inherent stochasticity of expression from a specified gene system, and extrinsic noise, which results from fluctuations in the intracellular environment, for example linked to the cell cycle and/or changes in the capacity of the expression machinery (17-19). It has previously been observed that total noise squared ($\eta_{tot}^2$) equates to the sum of the intrinsic and extrinsic noise components, $\eta_{int}^2 + \eta_{ext}^2$ (13). Stochastic variations in the expression of reporter genes encoding fluorescent proteins are reflected in heterogeneity in the levels of these proteins in individual cells. The work on gene expression noise in eukaryotes (predominantly high-throughput genome-wide studies) has generally emphasized the influence of cell-to-cell variations in mRNA abundance that are driven by fluctuations in transcription, whereby correlations have been identified between noise level and variables that include promoter structure, gene function and chromatin density (16, 20).
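The additive relationship between the noise components can be checked numerically. The sketch below uses the widely cited dual-reporter estimators for intrinsic, extrinsic and total noise; this is an assumption for illustration, since the exact procedure used in this work is the one described in the cited references. The function name `noise_components` and the synthetic two-reporter data are hypothetical.

```python
import random

def noise_components(x, y):
    """Dual-reporter estimates of squared intrinsic, extrinsic and total noise.

    x, y: per-cell intensities of two reporters measured in the same cells.
    Assumption: the standard dual-reporter estimators, not necessarily the
    exact procedure used by the authors.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    mean_prod = sum(a * b for a, b in zip(x, y)) / n
    mean_sq_sum = sum(a * a + b * b for a, b in zip(x, y)) / n
    eta_int2 = mean_sq_diff / (2 * mean_x * mean_y)
    eta_ext2 = (mean_prod - mean_x * mean_y) / (mean_x * mean_y)
    eta_tot2 = (mean_sq_sum - 2 * mean_x * mean_y) / (2 * mean_x * mean_y)
    return eta_int2, eta_ext2, eta_tot2

# Synthetic data (hypothetical): a shared per-cell factor models extrinsic
# noise; independent per-reporter factors model intrinsic noise.
rng = random.Random(0)
x, y = [], []
for _ in range(5000):
    shared = rng.lognormvariate(0.0, 0.3)
    x.append(1000 * shared * rng.lognormvariate(0.0, 0.1))
    y.append(1000 * shared * rng.lognormvariate(0.0, 0.1))

eta_int2, eta_ext2, eta_tot2 = noise_components(x, y)
print(eta_int2, eta_ext2, eta_tot2)
```

With these estimators the identity total = intrinsic + extrinsic holds algebraically for any data set, so the sum is a consistency check and not an independent result.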
In contrast, until recently, there has been very little progress in understanding how posttranscriptional steps might contribute to noise generation. One comparable study in the yeast Saccharomyces cerevisiae found that noise strength for GFP gene expression increased linearly with translation efficiency (varied by changing codon usage; 21). On the other hand, two other studies in yeast indicated that intrinsic noise scales inversely with protein abundance (20, 22), but did not differentiate between transcriptional and posttranscriptional contributions. In contrast, intrinsic noise in mammalian cells does not always show this relation at lower protein abundance values (23). Other work suggested that a high tRNA adaptation index is correlated with noise (24). Given the apparent contradictions in previous research, we recently performed detailed studies of the contribution of translational events to gene expression noise generation (25, 26). These studies on reporter genes have shown that constraints on translation initiation imposed by structural elements in the 5'UTR provide an additional source of noise. This suggested that more complex endogenous 5'UTRs could impose a variety of noisy behavioural features on the expression of eukaryotic genes.
In this context, it is relevant to consider how gene expression noise can be linked to stochastic switching between states that influence cell viability in variable environmental conditions. This has led to discussion of the concept of 'bet-hedging', which is a term borrowed from the financial sector to describe investment in opposite outcomes in order to provide protection against monetary losses. Analogous strategies based on non-genetic heterogeneity (noise) are now thought to underpin viability and/or survival in a number of organisms. For example, slow-growing 'persister' cells of Escherichia coli can withstand extended exposure to antibiotic treatment, and can switch to faster growth once the antibiotic is removed (27). Human melanomas can contain small subpopulations of cells that divide slowly and are thus resistant to chemotherapy (28). Transcriptional bet-hedging cases have been identified in S. cerevisiae: first, slower-growing cells that produce higher levels of the trehalose-synthesis regulator Tsl1 enhance the probability that a population will survive heat stress (29); second, probabilistic activation of the galactose gene regulatory network in a subpopulation of cells enables yeast to undergo a more rapid metabolic transition from glucose to galactose (30). However, understanding of the mechanistic basis of such bet-hedging phenomena is limited and, looking at the wider picture, it is evident that the potential contribution of translation has received very little attention. At the same time, there is evidence that cell-to-cell heterogeneity can also be advantageous in stable, even benign, environments (31).
In this new study, we have turned our attention to the stochasticity of translation initiation on another example of a complex 5' untranslated region, that of GCN4 in Saccharomyces cerevisiae. An extensive body of previous work has identified the key deterministic features of translational regulation of the GCN4 transcriptional activator in terms of cell-population data averages (32, 33) (Figure 1). The Gcn4 protein upregulates the transcription of genes involved in a large number of biosynthetic pathways, including those for amino acids, purines and vitamin-cofactors, and also activates synthesis of mitochondrial carrier proteins, amino acid transporters and autophagy proteins. Translation via the GCN4 5'UTR (5'UTR GCN4), which contains four short upstream open reading frames (uORFs), is induced in response to starvation of amino acids, purines or glucose, as well as exposure to sodium chloride and rapamycin (32). Under non-starvation conditions, the fourth uORF in the GCN4 5'UTR is generally assumed to act as a barrier to translation of the downstream open reading frame corresponding to the GCN4 coding DNA sequence (CDS) by promoting dissociation of ribosomal subunits and mRNA molecules following termination of its encoded three-amino-acid polypeptide (Figure 1) (32). In contrast, uORF1 functions as a positive element that promotes further downstream scanning ('rescanning') that allows rebinding of the ternary complex (TC: GTP-eIF2-Met.tRNAiMet) and thus reacquisition of the competence to initiate on downstream start codons, most importantly uORF4. uORFs 1 and 4 are known to be the major regulatory elements acting on translation of GCN4, since removal of uORFs 2 and 3 has only a minimal effect on the regulatory functionality of the 5'UTR GCN4 (32).
Under induction (nutritional stress) conditions, phosphorylation of the α subunit of eIF2 [by general control nonderepressible 2 (Gcn2) kinase] converts eIF2 to a competitive inhibitor of the guanine nucleotide exchange factor eIF2B. This has two effects: first, it suppresses global protein synthesis; second, it modulates the re-initiation kinetics of ribosomal preinitiation complexes scanning downstream of uORF1 so that some can scan past uORF4 and re-initiate instead on the start codon of the GCN4 CDS, thus differentially activating translation of this reading frame (Figure 1).
In contrast to the earlier body of GCN4-related research mentioned above, the present study focuses on regulatory heterogeneity at the single-cell level. We find that a subset of cells in any given yeast population manifests a high 5'UTR GCN4-mediated expression state under non-starvation (non-induced) conditions, and we investigate the stochastic processes that underpin this phenomenon. We conclude that the 5'UTR GCN4 generates a previously unknown type of translational stochasticity that, in turn, results in a corresponding degree of cellular heterogeneity with respect to activation by Gcn4 of biosynthetic pathways under non-starvation conditions. We also illustrate, using a newly developed mathematical model, that the existence of the sub-population manifesting enhanced 5'UTR GCN4-mediated translation can only be explained in terms of stochastic events that are not inherent to the canonical GCN4 regulatory model that applies to population averages.
Figure 1. GCN4 translational regulation scheme. According to this extensively tested model (32), at high amino acid availability (A), normal physiological levels of the ternary complex (TC: GTP-eIF2-Met.tRNAiMet) drive rapid binding to the 40S ribosomal subunit, enabling it to scan effectively. Moreover, TC binds 40S relatively rapidly (post-termination) downstream of uORF1 (a uORF that promotes post-termination scanning and re-initiation), so that the 40S can resume scanning and recognize a downstream uORF (reinitiate). If this downstream element is uORF4, the context of the termination codon promotes release of ribosomes and prevents further scanning. In non-starvation conditions, therefore, uORF4 acts as a block to initiation on GCN4. When amino acid availability is low (or this state is mimicked by adding the inhibitor 3-AT) (B), high levels of uncharged tRNA cause the Gcn2 kinase to phosphorylate eIF2α, which in turn inhibits the GDP-GTP exchange activity of eIF2B. This causes a reduction in the intracellular TC abundance, thus slowing the rate of TC binding to the 40S subunit. Thus, many 40S subunits now scan through uORFs 2-4 without reinitiating, only to bind TC in the long region between uORF4 and the GCN4 main ORF, thus enabling (re)initiation on GCN4. The GCN4 5'UTR (5'UTR GCN4) retains much of its regulatory capacity after removal of uORFs 2 and 3 (32), and therefore we have focused on the roles of uORFs 1 and 4 here. The stochastic properties of 5'UTR GCN4-mediated translational regulation are described for the first time in the current work.

Strain construction
The Saccharomyces cerevisiae strains used in this study were all derived from the background strain PTC830: MATα ura3-1 leu2-3,112 his3-11,15 can1-100 (a derivative of W303). The ymNG expression reporter constructs were integrated at the CAN1 locus. The genes encoding the yEGFP and mRuby3 fluorescent proteins were integrated at the C-termini of the natural genomic GCN4 CDS and ADE8 CDS, respectively.

Flow cytometry, cell sorting and gene expression stochasticity analysis
Cells were prepared for flow cytometry as described previously (25, 26). Yeast cells expressing the yEGFP or ymNeonGreen reporter genes were excited using a 488 nm laser, and fluorescence was collected through 505 nm long-pass and 530/30 nm band-pass filters on a BD Fortessa X20 flow cytometer. For dual-colour reporter strains, yEGFP or ymNeonGreen was excited and fluorescence was collected using the same laser and filters as described above, while mRuby3 was excited using a 561 nm laser and its fluorescence collected through 600 nm long-pass plus 610/20 nm band-pass filters. The data were recorded using the 'Area' option. Flow cytometry data were exported from the acquisition program (FACSDiva) in the FCS3.0 format with a data resolution of 2^18. A custom R programme was written [using the flowCore, flowViz and flowDensity Bioconductor packages; as described previously (25, 26)] to calculate statistics for each file.
For calculating the coefficients of variation, cytometry files were processed as follows: (i) the first second, and final 0.2 s, of data were removed to minimize errors due to unstable sample flow through the cytometer; (ii) thresholds of 40 000-100 000 and 10 000-90 000 for the FSC and SSC gates, respectively, were typically used to limit the influence of cellular debris and aggregated cells; (iii) for the remaining data, the FSC and SSC values of the highest density centre of the FSC-SSC scatterplot were calculated, and the distance of the ith sample to the centre was determined: $\mathrm{distance}_i = \sqrt{(\mathrm{FSC}_i - \mathrm{FSC}_{centre})^2 + (\mathrm{SSC}_i - \mathrm{SSC}_{centre})^2}$; (iv) the fluorescent reporter (e.g. ymNG) data within the radius were used to calculate the coefficient of variation (CV), i.e. $\mathrm{CV} = \sigma/\mu$ (the standard deviation divided by the mean). The intrinsic, extrinsic and total noise from dual-color reporter flow cytometry data were calculated as described previously (25, 26).
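Steps (ii) to (iv) of this gating procedure can be sketched in a few lines. The code below is a minimal illustration in Python, not the authors' R programme: the histogram-based estimate of the density centre, the bin width, the function names and the synthetic data are all assumptions, and step (i) (time trimming) is omitted because it depends on the acquisition time channel.

```python
import math
import random
from collections import Counter

def density_centre(fsc, ssc, bin_width=2000):
    """Centre of the most populated 2D histogram bin.

    Assumption: a simple binned mode stands in for the 'highest density
    centre'; bin_width is an illustrative value.
    """
    counts = Counter((int(f // bin_width), int(s // bin_width))
                     for f, s in zip(fsc, ssc))
    (bin_f, bin_s), _ = counts.most_common(1)[0]
    return (bin_f + 0.5) * bin_width, (bin_s + 0.5) * bin_width

def gated_cv(fsc, ssc, fluo, radius,
             fsc_gate=(40_000, 100_000), ssc_gate=(10_000, 90_000)):
    """Return (CV, number of cells) for events within `radius` of the centre."""
    # Step (ii): rectangular FSC/SSC thresholds against debris and aggregates.
    cells = [(f, s, v) for f, s, v in zip(fsc, ssc, fluo)
             if fsc_gate[0] <= f <= fsc_gate[1]
             and ssc_gate[0] <= s <= ssc_gate[1]]
    # Step (iii): Euclidean distance of each event to the density centre.
    centre_f, centre_s = density_centre([c[0] for c in cells],
                                        [c[1] for c in cells])
    kept = [v for f, s, v in cells
            if math.hypot(f - centre_f, s - centre_s) <= radius]
    # Step (iv): CV = standard deviation / mean of the gated fluorescence.
    mean = sum(kept) / len(kept)
    sd = math.sqrt(sum((v - mean) ** 2 for v in kept) / (len(kept) - 1))
    return sd / mean, len(kept)

# Synthetic events (hypothetical): fluorescence scales with FSC (a crude
# extrinsic factor) times an independent intrinsic factor.
rng = random.Random(1)
fsc = [rng.gauss(59_000, 8_000) for _ in range(20_000)]
ssc = [rng.gauss(27_000, 6_000) for _ in range(20_000)]
fluo = [1_000 * (f / 59_000) * rng.lognormvariate(0.0, 0.1) for f in fsc]

cv_tight, n_tight = gated_cv(fsc, ssc, fluo, radius=4_000)
cv_wide, n_wide = gated_cv(fsc, ssc, fluo, radius=40_000)
print(cv_tight, n_tight, cv_wide, n_wide)
```

In this toy data set, shrinking the gate radius removes most of the size-linked variation, so the CV of the tightly gated cells falls towards the intrinsic contribution.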
Fluorescent reporter (e.g. ymNG) data were obtained from six independent experiments, whereby the centre point for the scatter plot analysis was either set automatically, or manually at FSC = 59 000 / SSC = 27 000. The average number of cells analyzed given a radius limit of 4000 was approximately 780 (900). This gate radius was chosen as a compromise point at which, over multiple experiments, the variation between experiments was minimal and the number of cells analysed provided statistically meaningful results. This procedure is similar to one reported previously (20) except that, by focusing on the cell density centre, we have been able to maximize the number of cells that are sampled.

Live-cell imaging and data analysis
One day prior to an experiment, single colonies from each of the strains were picked and grown overnight in YNB (plus amino acids; 2% glucose) to saturation with shaking at 30 °C. The following morning, cells were diluted to give an optical density at 600 nm (OD600) of ∼0.2, and these diluted cultures were grown to mid-log phase (OD600 ∼0.6) with shaking at 30 °C. This procedure allowed us to maintain the cultures in the exponential growth phase right up to the time of measurement. For image acquisition, 3 µl of mid-log phase culture from each strain were loaded onto clean slides.
Live-cell images were acquired on a Nikon Ti Spinning Disk confocal microscope, equipped with a 60× objective (1.4 numerical aperture, using 1.515 refractive index oil) and an ANDOR DU-888X EMCCD camera. The temperature of the incubation chamber was set at 30 °C. Cells were imaged in two dimensions. Images were acquired for each field of view using yEGFP/ymNeonGreen (excitation 488 nm, emission 521 nm) and mRuby3 (excitation 561 nm, emission 610 nm) filter sets in this order. The background strain PTC830 (not expressing any reporter gene) was imaged alongside the sample strain as a negative control.
Cell segmentation and image analysis were performed using a custom-written R programme (using EBImage Bioconductor packages). A common pre-processing step involves cleaning up the images by removing local artifacts or noise through smoothing. Images were smoothed using the wrapper function, which performs Gaussian filter smoothing. Image segmentation was performed to identify the individual cells using the watershed algorithm over the binary image (34).

CRISPR/Cas9 genome editing
In this study, the site-directed mutant strains were generated using CRISPR-Cas9 genome editing technology as described previously (35). The vector (pML104) for the one-plasmid system of CRISPR-Cas9 gene editing (www.addgene.org) contains both the Cas9 gene and the guide RNA expression cassette. To design the specific single guide RNA (sgRNA), we used an online tool (http://wyrickbioinfo2.smb.wsu.edu/crispr.html) to aid identification of unique guide RNA target sites in the yeast genome. RNA expression cassette double-stranded DNA fragments (gBlocks) corresponding to the selected guide RNAs were synthesized by Integrated DNA Technologies. Yeast transformations for CRISPR-Cas9 gene editing were performed as described previously (35). A synthetic double-stranded gBlock template with the targeted gene mutation was used to introduce each nucleotide substitution. To validate creation of the mutants, transformants were isolated for genomic DNA extraction and sequence analysis.

Microfluidic devices for single cell studies
The configuration of the microfluidic devices used in this work (Supplementary Figure S1) was based on the principle of hydrodynamic pressure holding individual cells within suitably designed jail traps comprising PDMS pillars. We derived our devices from a design outlined previously (36), and used a chrome-plated glass photolithography mask for their manufacture at a tolerance of ±0.15 µm. After calibration of the spreading of the SU8 photoresist by a spin coater, silicon wafer moulds were created to the desired dimensions and tolerances. These moulds were utilized in the manufacture of PDMS casts for use in the microfluidics experiments. Media flow through the assembled microfluidics devices was managed using a Fluigent pressure-controlled system, which allowed automatic feedback under the control of Fluigent AIO software. The statistics of individual cell division rates during experiments confirmed that nutrient provision in the microfluidics device supported exponential growth at rates (cell-cycle completion time 88 ± 21 min) comparable with the maximal rates observed in the low-density exponentially growing batch cultures used elsewhere in this work. Cell clumps were removed from cell cultures prior to loading into the microfluidics device using a combination of mild sonication (15 s at low power) in a sonication bath and filtration through a 10 µm filter. Control experiments using flow cytometry and live-cell imaging showed that short exposure to such low-power sonication elicited no detectable stress response from the cells.

Computational modeling
The model is derived from the totally asymmetric simple exclusion process [TASEP; (37)], as applied previously to GCN4 translation (38).
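For readers unfamiliar with TASEP, the following minimal sketch simulates the generic textbook version with open boundaries and random sequential updates: particles enter at the first site, hop one site forward only if the next site is empty, and leave from the last site. It is not the GCN4 model used in this work, which would additionally need uORFs, scanning, ternary complex binding and reinitiation. The function name and the parameters `alpha` (entry probability) and `beta` (exit probability) are illustrative assumptions.

```python
import random

def simulate_tasep(n_sites, alpha, beta, n_sweeps, seed=0):
    """Generic open-boundary TASEP with random sequential updates.

    Returns (time-averaged occupancy per site, exits per sweep).
    Illustrative only: not the GCN4-specific model of the paper.
    """
    rng = random.Random(seed)
    lattice = [0] * n_sites          # 1 = site occupied, 0 = empty
    occupancy = [0] * n_sites
    exits = 0
    for _ in range(n_sweeps):
        # One sweep = n_sites + 1 randomly chosen update attempts.
        for _ in range(n_sites + 1):
            i = rng.randrange(-1, n_sites)   # -1 encodes the entry move
            if i == -1:
                # Entry: possible only if the first site is free.
                if lattice[0] == 0 and rng.random() < alpha:
                    lattice[0] = 1
            elif i == n_sites - 1:
                # Exit from the last site.
                if lattice[i] == 1 and rng.random() < beta:
                    lattice[i] = 0
                    exits += 1
            elif lattice[i] == 1 and lattice[i + 1] == 0:
                # Bulk hop: exclusion forbids moving onto an occupied site.
                lattice[i], lattice[i + 1] = 0, 1
        for j in range(n_sites):
            occupancy[j] += lattice[j]
    density = [count / n_sweeps for count in occupancy]
    return density, exits / n_sweeps

density, current = simulate_tasep(n_sites=50, alpha=0.2, beta=0.8,
                                  n_sweeps=2000)
print(sum(density) / len(density), current)
```

With a low entry probability and a high exit probability, as in the call above, the lattice stays sparsely occupied and the throughput is limited by entry, which is loosely analogous to initiation-limited translation.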

Heterogeneity of expression from the GCN4 5'untranslated region
We investigated the expression characteristics of the GCN4 system at the single-cell level using a genomic fusion construct comprising the GCN4 5'UTR (5'UTR GCN4) coupled to the yeast-optimised mNeonGreen coding sequence (ymNG CDS; Figure 2A). In order to focus on translational control mediated by the 5'UTR GCN4, this construct was placed downstream of the constitutive PTEF1 promoter. We reported previously that this promoter generates single cell mRNA copy number data that fit well to a standard negative binomial distribution across a yeast population (26). Examination of the cell-to-cell heterogeneity of ymNG expression with the help of fluorescence microscopy revealed that the cell population did not uniformly manifest the non-induced state (Figure 2B). Unexpectedly, flow cytometry revealed that a significant proportion (≥3.0% of total cells) of the population manifests high fluorescence intensity (Figure 2C), while most of the cells show lower fluorescence activity that is consistent with the expected non-induced state. Indeed, the intensity range recorded for the stochastically enhanced translation state (SETGCN4) sub-population of cells mapped to the fluorescence distribution observed for cells responding to amino-acid starvation, as induced by the addition of 3-aminotriazole (3-AT; Figure 2D, E), a competitive inhibitor of imidazoleglycerol-phosphate dehydratase (encoded by HIS3) whose addition mimics amino acid starvation (32). In summary, the SETGCN4 cells seem to conform to a separate distribution from the main fluorescence distribution of non-induced cells, whereby the SETGCN4 distribution is narrower, has a higher fluorescence intensity mean, and accommodates approximately 3% of the total population. As we consider in more detail later in this paper, the fluorescence intensity data for the whole population do not fit to a single normal or skewed-normal distribution.
A full analysis is provided in the Supplementary Data section (including the embedded Supplementary Figure S10 in that section).

A stochastic asymmetric expression distribution determined by the GCN4 5'UTR
The observation that, in an exponentially growing culture, there is a sub-population of non-starved cells in which expression of the CDS downstream of the 5'UTR GCN4 is at a level characteristic of starvation-stressed cells led us to investigate in further detail the noise characteristics of translation mediated via this 5'UTR (as manifested in cell-to-cell heterogeneity). A plot of forward-scattering vs side-scattering data (Figure 3A) for the reporter construct strain (described in Figure 2A), plus a version of this plot highlighting the distributions of these variables for the respective cell populations (Supplementary Figure S2), reveal that the SETGCN4 state is observed in cells (highlighted in red) with a wide variety of sizes, shapes and internal structures. We used fluorescence-activated cell sorting (FACS) to isolate cells belonging to the most intensively fluorescing half of the SETGCN4 sub-population (Figure 3B) from the remaining cell population and asked whether this sub-group would maintain its high-expression status over further generations of growth. The result was striking: within 10 generations (< 20 h) of further growth, the original dominant non-induced fluorescence distribution overlapping with the smaller SETGCN4 sub-population was re-established (Figure 3C). The same result was achieved if solely non-SETGCN4 cells were selected as the starting point for this regrowth experiment. In other words, the asymmetric expression distribution (including the SETGCN4 state) is a default status that is determined by stochastic processes that are inherent to any randomly selected population of yeast cells. Therefore, the SETGCN4 sub-population does not correspond to a distinct genetic variant. Each flow cytometry experiment reveals the distribution of fluorescence intensity in the individual members of a cell population within a relatively short time window.
We therefore utilized microfluidics in combination with time-lapse fluorescence microscopy to perform continuous live-imaging of exponentially growing individual cells within a cell population over the full cycle of cell division. Small volumes of exponentially growing yeast cultures were introduced into a temperature-controlled microfluidic device so that the trap array was populated with single (budding) cells (illustrated by images of selected cells in Figure 3D-H). The growth environment within the microfluidics device was maintained in a stable steady-state using precisely regulated pressure-driven pumps that continuously refreshed the growth medium. The fluorescence intensity of cells that remained immobilized (while still growing normally) in the microfluidic traps was observed over 90 minutes, revealing a distribution of fluorescence values across the trapped population (see an example of one such experiment in Figure 3I).
Monitoring more than 1000 single growing cells over an extended timeframe in this way again revealed the existence of the highly fluorescing SETGCN4 sub-population, whereby the relatively small cell sample size (compared to the flow cytometry experiments) resulted in a coarser distribution, as well as more variability between experiments. The latter is highlighted here by the orange-coloured bars corresponding to the cells manifesting the top-3% fluorescence intensities (see the example experiment shown in Figure 3I; the bin size affects the granularity of the graph). In conclusion, this individual-cell-imaging based analysis confirms the existence of the SETGCN4 sub-population as well as its main quantitative features.
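Selecting the cells with the top-3% fluorescence intensities amounts to thresholding at the 97th percentile of the per-cell intensity values. The sketch below shows one way to do this with the Python standard library; the function name, the percentile interpolation method (the default of `statistics.quantiles`) and the synthetic bimodal data are assumptions for illustration only.

```python
import random
import statistics

def flag_top_fraction(intensities, percentile=97):
    """Return (threshold, per-cell flags) for cells above the given percentile."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles;
    # index percentile - 1 is the requested percentile.
    cut_points = statistics.quantiles(intensities, n=100)
    threshold = cut_points[percentile - 1]
    return threshold, [value > threshold for value in intensities]

# Synthetic intensities (hypothetical): a dominant low-expression mode plus
# a small high-expression mode.
rng = random.Random(3)
cells = ([rng.lognormvariate(6.0, 0.35) for _ in range(1940)] +
         [rng.lognormvariate(8.0, 0.15) for _ in range(60)])

threshold, is_top = flag_top_fraction(cells)
fraction_top = sum(is_top) / len(cells)
print(threshold, fraction_top)
```

A fixed-percentile cut always flags about 3% of cells by construction, so it identifies which cells lie in the upper tail but does not by itself test whether a distinct high-expression mode exists.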

GCN4 5'UTR structure underpins the SET phenomenon
In further experiments, we investigated the role of the 5'UTR GCN4 in generating this stochastic distribution. An important feature of the 5'UTR GCN4 is the influence of its uORFs on the behaviour of ribosomal complexes that interact with it. Elimination of uORF1 has previously been shown to abrogate translational derepression mediated by the 5'UTR GCN4; the explanation for this loss of regulation is that the modification allows a greatly increased percentage of ribosomal pre-initiation complexes to initiate on uORF4 and then to dissociate from the mRNA, so that very few of them can initiate on the GCN4 CDS (32). We found that mutation of the uORF1 start codon to AUA markedly reduced the size of the SETGCN4 sub-population (Figure 4). It also changed the reporter expression behaviour brought about by 3-AT induction, yielding an overlapping bimodal fluorescence distribution and, overall, a threefold-reduced mean value for the reporter fluorescence (Supplementary Figure S3). This outcome is consistent with the primary role of uORF1 as a reinitiation-promoting structural element whose functional impact can only be partially reproduced by uORF2, at least when located in its normal position within the 5'UTR GCN4 (32). In further experiments, we replaced uORF1 by uORF4. In line with the demonstrated property of uORF4 to promote ribosomal release after termination on its stop codon, this modification also markedly limited the induced level of gene expression and strongly reduced the size of the SETGCN4 sub-population (data not shown). These were very similar outcomes to those observed after modification of the uORF1 start codon to AUA (Figure 4 and Supplementary Figure S3). We conclude from this section of the work that modifications of the uORF-based architecture of the 5'UTR GCN4 that partially or fully eliminate the nutrient-stress-induced translation regulatory mechanism also suppress development of the SETGCN4 state.
The fluorescence distributions observed in these cases are similar to those manifested by a control reporter construct featuring a short unstructured 5'UTR that does not contain uORFs, as considered in the modeling discussion in the Supplementary Data section (specifically the embedded Supplementary Figure S12 in that section).

The role of eIF2α phosphorylation
The degree of phosphorylation of eIF2α is known to play a key role in translational regulation of GCN4. It modulates the ability of reinitiating ribosomal 43S complexes to bypass uORF4 and thus to initiate translation on the start codon of the main GCN4 CDS (32). We tested the hypothesis that the existence of the SETGCN4 cells is linked to eIF2α phosphorylation. We did this in two ways. First, we compared the distribution of fluorescent reporter activity in a gcn2Δ genetic background (Figure 5B) with that obtained with a wild-type control strain (Figure 5A; see also overlay of these respective panels in Supplementary Figure S4). Flow cytometry revealed that, in the absence of Gcn2 kinase, translational activation of the genomic GCN4 reporter construct in the wider population was, as expected from earlier work (32), abrogated. In addition, we observed that the SETGCN4 cells were no longer evident (Figure 5B). Second, we observed a very similar result using a strain in which the phosphorylation target of Gcn2 kinase, eIF2α-Ser51, was mutated to Ala (Figure 5C). In the latter case, activation of Gcn2 (through addition of 3-AT) in the eIF2α-Ser51Ala mutant cells leads to a small degree of narrowing of the expression profiles of the reporter construct in the non-activated and activated states (Figure 5C), perhaps because of some other effect of Gcn2 activation in the cell. Overall, the most consistent feature of both types of experiment was the disappearance of the SETGCN4 sub-population (compare, for example, the profile delineated by the broken vertical lines in Figure 2C).

Extrinsic versus intrinsic noise
In order to obtain more information about the translational stochasticity generated by the 5'UTR GCN4, we used a previously reported procedure to apply light-scatter gating analysis to the fluorescence cytometry data (20, 26). This procedure is generally used to provide information on the relative contributions of extrinsic and intrinsic components to the total gene expression noise observed in a given population of cells. We applied this procedure to the population of uninduced cells of a strain carrying the chromosomally integrated reporter construct shown in Figure 2A. The light-scatter gating analysis reveals an interesting property of the SETGCN4 subpopulation of cells (Figure 6A); in comparison with the whole population (red plots), translation initiation mediated by the 5'UTR GCN4 in the SETGCN4 cells manifests a much reduced component of extrinsic noise (blue plots). Assessment of the smaller gate radius data also indicates that the SETGCN4 subpopulation manifests a lower intrinsic noise level (blue plot lines in Figure 6A). These observed changes in noise characteristics can be reproduced by the system model we describe below. Since we had already demonstrated that the presence of an active Gcn2 kinase is linked to the expression state in SETGCN4 cells (Figure 5), we examined whether variation in the basal activity of this kinase might potentially constitute one form of extrinsic noise that could contribute to the generation of the bimodal distribution of 5'UTR GCN4-supported translation initiation that is observed in non-induced cell populations. We transformed the PTEF1-5'UTR GCN4-ymNG strain using a centromeric plasmid carrying GCN2.
Since centromeric plasmids can manifest mean copy numbers per cell of up to at least five (39), expression of a promoter-gene combination from such a plasmid can generally be expected to be increased, and show enhanced variability compared to the same promoter-gene combination in a genomic locus. The presence of a significantly increased basal activity of Gcn2 kinase will, even in the absence of induction (by starvation or the addition of 3-AT), result in an increased level of phosphorylation of eIF2α. The results of the flow cytometry experiments (Figure 6B) reveal that this markedly increases the size of the sub-population of SETGCN4 cells, consistent with the proposal that variations in Gcn2 activity represent a natural generator of extrinsic noise in the 5'UTR-mediated regulatory system controlling GCN4 translation.

Figure 6. (A) [...] (20) indicates that expression dependent on the 5'UTR GCN4 manifests a comparatively high level of noise (red data points), even in the absence of induction by starvation (or 3-AT). The plot reveals the effect of progressively constraining the heterogeneity of the cells (by reducing the gate radius) on the %CV (coefficient of variation as a percentage) value. As the gate radius is constrained (moving along the x-axis from right to left), the contribution of extrinsic factors is reduced, and ultimately a minimum value is reached that is primarily attributable to the intrinsic noise component. The SET GCN4 sub-population of cells (blue data points, representing those cells whose ymNG fluorescence intensity lies above the 97th percentile of total population fluorescence intensity values) manifests a lower level of overall noise, indicating that there is greater homogeneity in the rates of GCN4 gene expression in these cells, and a smaller extrinsic component. (B) Flow cytometry fluorescence intensity profile of a strain carrying the genomic P TEF1-5'UTR GCN4-ymNG construct (see Figure 2A) plus a plasmid-borne GCN2 gene that overproduces the Gcn2 protein (to stochastically varying degrees because of copy number variations). We observed a high-noise, (overlapping) bimodal distribution that features peaks typical of non-induced cells (major peak outlined by broken blue line) and of a greatly increased proportion of SET GCN4 cells (see region indicated by the horizontal blue bracket).

Expression state of Gcn4-activated biosynthetic pathway genes
If the SET GCN4 state is linked to an increased level of the Gcn4 transcriptional activator in these cells, this should switch on expression of downstream genes involved in amino acid biosynthesis. We accordingly integrated yEGFP at the C-terminal end of the genomic GCN4 CDS and mRuby3 at the C-terminal end of the genomic ADE8 CDS. ADE8 is a Gcn4-regulated gene that encodes phosphoribosyl-glycinamide transformylase, an enzyme that catalyses a step in the purine nucleotide biosynthetic pathway. This enabled us to measure both the relative in vivo abundance of the Gcn4::yEGFP fusion and, in parallel, the degree of downstream transcriptional activation imposed by this fusion protein in exponentially growing cells. Analysis of the fluorescence characteristics of randomly selected, non-induced, individual cells revealed the relationship between the intracellular abundance of the yEGFP fusion protein and ADE8::mRuby3 expression. We identified a strong correlation between the abundance of the respective fusion proteins (Figure 7). It is also noticeable that a large percentage of those cells manifesting the highest yEGFP fluorescence values (including the SET GCN4 subpopulation) also manifest above-proportional increases in mRuby3 fluorescence. This is consistent with increased activation of the P ADE8 promoter at higher levels of Gcn4. Overall, these data confirm that in all cells of the population, including the SET GCN4 cells, ADE8::ymRuby3 was subject to GCN4::yEGFP-dependent induction.

A possible mechanism underpinning the SET GCN4 state
We explored whether we could model a scenario in which a population of cells manifests a variation in an activator that provides an explanation for the observed abundance of SET GCN4 cells. Understanding what assumptions are required to simulate the results that we have observed helps us to elucidate the likely mechanistic basis of the SET GCN4 phenomenon. We have applied a model of GCN4 translation (Figure 8A) that is based on the totally asymmetric simple exclusion process (TASEP; 37) as more recently applied to GCN4 translation (38). The model involves binding of the 43S complex to the 5' end of the mRNA at rate α, followed by scanning at rate v and initiation at uORF1. Following termination of uORF1 translation, the small ribosomal subunit (40S) remains on the transcript with a probability η and scans at rate v to the next start codon, which is located at the start of either uORF4 or the GCN4 CDS. We have ignored uORF2 and uORF3 in this version of the model because the regulatory properties of the GCN4 5'UTR are largely maintained in their absence (32). Downstream of uORF1, a new ternary complex (TC) can bind to the scanning 40S at rate λ, forming the complex we refer to as 40S-TC. The model and its parameters are presented and discussed in full detail in the Supplementary Data section. In this model, the rate at which 40S-TC initiates at the GCN4 start codon is equal to the scanning speed v multiplied by the probability that 40S-TC will reach the GCN4 start codon. We make the assumption that the rate α at which the 43S complex binds to the 5' end is much lower than the scanning rate v, i.e. that the total density of initiation complexes concurrently scanning the 5'UTR is also low. This assumption is justified by the results of ribosome density mapping experiments that detect the presence on the 5'UTR of only approximately one ribosome under repressing conditions, and approximately two ribosomes under derepressing conditions (40). 
We have estimated the values of λ/v and η under repressing conditions (0.018 and 0.62, respectively) on the basis of previous studies: one examining the effects of the respective uORFs on translation events on the GCN4 5'UTR (41) and another looking at the dynamics of yeast ribosomal scanning (42). Taking into account the induction behaviour observed using flow cytometry (see, for example, Figure 2E), λ/v becomes 0.011 under derepressing conditions, whereby λ assumes the value 0.33 s⁻¹. Under repressing conditions, λ = 0.54 s⁻¹. Using a value of 0.08 s⁻¹ for α, the model predicts the relationships between the value of λ and the probabilities of reinitiation on uORF4 and on GCN4 shown in Figure 8B (see also the Supplementary Data section). These plots highlight the respective reinitiation probabilities predicted at the different values of λ under repressing and derepressing conditions. The variables in this model can be adjusted to obtain an optimal model fit for a range of experimental conditions.
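As a quick consistency check on the quoted parameter values, the scanning rate v implied by each pair (λ, λ/v) can be computed directly. The short Python sketch below uses only the numbers stated in the text; the variable and function names are illustrative assumptions and are not taken from the published model files. Both conditions imply the same scanning rate of about 30 s⁻¹, and α is several hundred times smaller than v, in line with the low-density assumption made for the model.

```python
# Parameter values quoted in the text (rates in s^-1; ratios are dimensionless).
LAMBDA_REPRESSED = 0.54      # TC binding rate, repressing conditions
RATIO_REPRESSED = 0.018      # lambda / v, repressing conditions
LAMBDA_DEREPRESSED = 0.33    # TC binding rate, derepressing conditions
RATIO_DEREPRESSED = 0.011    # lambda / v, derepressing conditions
ALPHA = 0.08                 # 43S binding rate at the 5' end


def scanning_rate(lam, lam_over_v):
    """Scanning rate v implied by a TC binding rate and the ratio lambda / v."""
    return lam / lam_over_v


v_repressed = scanning_rate(LAMBDA_REPRESSED, RATIO_REPRESSED)
v_derepressed = scanning_rate(LAMBDA_DEREPRESSED, RATIO_DEREPRESSED)

# alpha / v indicates how sparsely the 5'UTR is loaded with scanning complexes.
loading_ratio = ALPHA / v_repressed

print(f"v (repressing)   = {v_repressed:.1f} s^-1")
print(f"v (derepressing) = {v_derepressed:.1f} s^-1")
print(f"alpha / v        = {loading_ratio:.4f}")
```

The agreement between the two values of v reflects the fact that, in this parameterization, only the TC binding rate λ changes between repressing and derepressing conditions.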
Figure 8. [...] This model predicts relationships between the rate of TC binding to scanning 40S and the rate of reinitiation on uORF4 and on the GCN4 coding region, respectively (B). The main distribution of protein synthesis (in each cell) is dictated by a number of processes that occur simultaneously (C), and can be explained assuming the value of the translation rate under repressing conditions. In contrast, the SET GCN4 cell distribution can be explained by cell-to-cell variations in the TC binding rate. Here, we have modeled the generation of the TC binding rate variations in response to variations in Gcn2 activity. Combining these two strands of modeling, we obtain an asymmetric profile that is remarkably close to the observed behavior of the GCN4 system (D).

Building on the above model, we propose a simple rationalization (outlined in detail in the Supplementary Data section) of the 5'UTR GCN4-dependent translation profile observed in a population of yeast cells (as reflected, for example, in the fluorescence data shown in Figure 3B) that comprises a normal distribution of fluorescent reporter expression that overlaps with the enhanced fluorescence distribution corresponding to the SET GCN4 state. We have employed a simple two-step gene expression model to explain these two distributions (Figure 8C). The main normal distribution can be explained assuming the value of the translation rate under repressing conditions. In contrast, the SET GCN4 cell distribution can be explained by cell-to-cell variations in the TC binding rate λ. We have demonstrated that one potential cause for these variations is cell-to-cell heterogeneity in the activity of Gcn2 kinase. We expect that λ assumes a maximum value of λ_repressed under conditions in which Gcn2 activity is below a threshold value (see above), while λ assumes a lower (derepressing) value at Gcn2 kinase activities greater than the threshold value. 
We have shown that positive fluctuations of Gcn2 activity can increase derepression in the cell population (see Figure 6). However, negative fluctuations of Gcn2 kinase cannot increase λ beyond λ_repressed because this is limited by the rate control coefficients of other translation initiation factors (43). These considerations generate an asymmetric profile (Figure 8D) that closely resembles the experimentally observed behaviour of the system (Figure 3).
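The threshold argument can be made concrete with a small simulation. The Python sketch below is not the published model (which is described in the Supplementary Data section and deposited in BioModels). It assumes, purely for illustration, that Gcn2 activity is normally distributed across cells in arbitrary units, that the threshold lies two standard deviations above the mean, and that λ switches as a step function between the two values quoted in the text. The function names and the distribution parameters are hypothetical.

```python
import numpy as np

LAMBDA_REPRESSED = 0.54    # s^-1, value quoted in the text (upper limit of lambda)
LAMBDA_DEREPRESSED = 0.33  # s^-1, value quoted in the text


def tc_binding_rate(gcn2_activity, threshold):
    """Step-function assumption: lambda is capped at LAMBDA_REPRESSED below the
    threshold and drops to LAMBDA_DEREPRESSED at or above it."""
    activity = np.asarray(gcn2_activity, dtype=float)
    return np.where(activity < threshold, LAMBDA_REPRESSED, LAMBDA_DEREPRESSED)


def simulate_lambda(n_cells, threshold, mean=1.0, sd=0.25, seed=0):
    """Draw a hypothetical Gcn2 activity for each cell and map it to lambda."""
    rng = np.random.default_rng(seed)
    activity = rng.normal(mean, sd, n_cells)
    return tc_binding_rate(activity, threshold)


def derepressed_fraction(lam):
    """Fraction of cells whose lambda lies below the repressed maximum."""
    return float(np.mean(np.asarray(lam) < LAMBDA_REPRESSED))
```

Under these assumptions, negative fluctuations in Gcn2 activity leave λ unchanged, whereas sufficiently large positive fluctuations move a cell into the derepressed group, so only one tail of the activity distribution alters the expression profile. Raising the mean activity (a crude stand-in for plasmid-borne GCN2) enlarges the derepressed group, in qualitative agreement with Figure 6B.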

DISCUSSION
In this study, we have identified a form of non-genetic variation that operates at the translation level. In an exponentially growing yeast culture, this manifests itself in the form of a subset of cells in which the 5'UTR-mediated derepression of GCN4 expression is activated as the result of stochastic, rather than deterministic, factors. This phenomenon is reproducibly observed under three different sets of experimental conditions that we have studied using flow cytometry, fluorescence microscopy of cells grown in batch culture, and fluorescence microscopy of cells over a longer timescale within a microfluidics environment in which the growth medium is continuously refreshed. The SET GCN4 subpopulation comprises <5% of the total cell population, which is comparable to the limited sizes of subpopulations manifesting transcriptional heterogeneity in relation to other regulatory responses in yeast [for example, carbon source switching (30); stress protection by Tsl1 (29)]. We have investigated the provenance of this state of stochastic enhanced translation of the GCN4 gene (the SET GCN4 state). The experimental data reported in this study reveal that the SET GCN4 state is underpinned by the operational features of the GCN4 5'UTR acting at the translation step of gene expression. Mutational analysis of uORFs 1 and 4 shows that these structural elements play key roles in the generation of the SET GCN4 sub-population. This is consistent with the idea that stochastic variation in the mechanisms underpinning the fate (uninterrupted scanning or reinitiation) of pre-initiation complexes as they encounter the uORF4 start codon drives the SET GCN4 phenomenon. We have also demonstrated that the SET GCN4 state is dependent on phosphorylation of eIF2α. 
Nucleic Acids Research, 2023, Vol. 51, No. 13 6619
Mutational disruption of both uORF-mediated scanning modulation and GCN2 phosphorylation eliminates the SET GCN4 state and concomitantly reconfigures the fluorescence intensity distribution to a more symmetrical form. Moreover, it is notable that, in those cases (Figures 4 and 5) in which we have inactivated the GCN4 translational regulation mechanism via previously described mutations (32), the fluorescence intensity data assume the form of a normal distribution (see also the Supplementary Data section). This suggests that, under standard conditions, the mRNA-scanning stochasticity that generates the SET GCN4 state is superimposed on the transcriptional noise driven by the promoter.
At first sight, it is tempting to conclude that the SET GCN4 state is comparable to the phenotype of cells carrying a Gcd⁻ (constitutive derepression) mutation (32). However, our results indicate that the SET GCN4 state is limited to only a subset of cells by virtue of stochastic variation; i.e. it is not generated by a nutritional starvation response. It seems likely that the SET GCN4 state is only feasible if most, if not all, of the GCN4 mRNA molecules in any given cell are translationally activated, since otherwise the observed high level of expression in each SET GCN4 cell would not be possible. This suggests the involvement of some form of extrinsic noise, i.e. variations in the global cellular environment. The dominant contribution of an extrinsic noise source, such as Gcn2, would also explain the occurrence of a sub-population of individual cells in which GCN4 translation is evidently activated under non-starvation conditions. Moreover, in another relevant study, translation complex profiling has provided additional insight into the interactions between 40S ribosomal small subunits (SSUs) and GCN4 mRNA molecules within the cell (44). In that work, translation complex profile sequencing (TCP-seq) revealed the presence of an SSU footprint over the GCN4 main ORF AUG at a frequency that suggests that there is a low level of GCN4 translation on mRNAs extracted from non-starved (non-derepressed) yeast cultures. This TCP-seq result is consistent with the occurrence of the SET GCN4 state in a sub-population of non-starved yeast cells, as described in the present work.
Consideration of the above findings prompted us to explore how cell-to-cell heterogeneity with respect to at least one extrinsic factor might act upon these operational features to generate the SET GCN4 state. Our results indicate that the activity of Gcn2 kinase exerts a strong influence on the SET GCN4 phenomenon. Previous work has demonstrated that inactivation of Gcn2 kinase prevents GCN4 derepression in response to starvation of yeast cells for amino acids or nucleosides (32). In this study, we have observed not only that inactivation of GCN2 eliminates the SET GCN4 sub-population, but also that the presence of an excess of Gcn2 activity increases the size of the SET GCN4 sub-population. These data are consistent with a scenario in which SET GCN4 cells distinguish themselves from the bulk of a yeast cell population by virtue of the exceptionally high level of Gcn2 kinase activity that they possess. This could potentially be due to an enhanced abundance of Gcn2 kinase that is in its basal activity state, since activation linked to induction mediated by the usual nutritional stress pathway would normally be expected to cause marked growth limitation of the SET GCN4 cells, a feature of this sub-population that we have not observed. However, at this stage we cannot rule out the possibility that Gcn2 kinase is subject to partial activation in these cells. For example, partial dephosphorylation of Ser577 in Gcn2, which is known to activate Gcn2-mediated phosphorylation of eIF2α (32), might potentially also be involved, and clarification of this question will need to be the subject of future work. It is notable that 5'UTR GCN4-mediated noise is reduced in SET GCN4 cells, primarily because of suppression of the extrinsic component. This is consistent with an increased activity of the extrinsic factor that is responsible for generating the SET GCN4 state, although it does not inform us how the increased activity is achieved.
Our observations identify Gcn2 kinase as a source of 5'UTR GCN4-mediated noise that contributes to the SET GCN4 phenomenon. We have not established whether other factors are also relevant. For example, the actin-binding yeast impact homologue 1 (Yih1) competes with Gcn1 for binding to Gcn2, thus inhibiting the stimulation of Gcn2 kinase by uncharged tRNAs under nutritional stress conditions (45). Future work will therefore need to investigate whether Yih1, or other potential modulators of Gcn2 kinase activity, also play a role in determining the characteristics of the SET GCN4 state. Moreover, it remains to be ascertained whether there exists an alternative route to modulating ribosomal interactions with the 5'UTR GCN4 that is at least partially uncoupled from regulation of global protein synthesis.
Of further note is that there is no a priori reason to assume that the relative size of the SET GCN4 sub-population is fixed, and this could potentially vary from species to species and from strain to strain. This is because the magnitude of the extrinsic noise fluctuations may be dependent on multiple (as yet undefined) factors. It is also likely to relate to the growth environment and thus to selective forces that, in turn, determine the balance between metabolic burden and enhanced competitiveness. In this context it is important to note other work in yeast showing that rate control of protein synthesis is shared across multiple components of the translation machinery (43). This raises the possibility that heterogeneity in the intracellular abundance of multiple translation apparatus proteins may contribute to the observed SET GCN4 state, perhaps working in synergy with variations in the activity of Gcn2. It would not be surprising to find that such complex functional interactions lie at the root of the type of regulatory heterogeneity we describe here, but their elucidation will require future investigation.
Finally, we note that uORF-mediated posttranscriptional control is also observed in other yeast mRNAs (46) as well as in higher eukaryotes (47). Mammalian and insect (e.g. Drosophila) cells possess ATF4, a transcription-factor-encoding gene that, analogously to GCN4, is subject to uORF-mediated translational regulation (48). It is worth noting that this type of regulatory principle is also thought to apply to other genes, including ATF5 and CHOP (C/EBP homologous protein) (49). The ATF4 5'UTR contains two uORFs, and translation of the second of these uORFs, as opposed to translation of the main ATF4 reading frame, is again subject to modulation by the state of phosphorylation of eIF2α. Remarkably, the ATF4 regulatory pathway is critical for the survival and proliferation of at least two tumour cell types in response to nutrient deprivation (50). It also plays a role in controlling autophagy (a natural regeneration process that removes defective components in cells) and apoptosis (a process of programmed cell death essential to both the maintenance of homeostasis and tissue growth/development). A striking manifestation of this is that an ATF4 mutant in Drosophila prevents heads from emerging from the thorax during pupation (49).
In conclusion, this study has established the existence of a form of stochastic noise in what appears to be the default operational status of a translational regulatory switch under non-starvation conditions in S. cerevisiae. This translational noise is in addition to the transcriptional noise that is generally evident in the expression pathway for all genes. We have determined many of the key features of this novel stochastic system, but future work will need to characterize in detail the full significance of the SET GCN4 expression state in terms of further mechanistic details, the associated fitness trade-offs, and the potential role of such a system in evolutionary terms. Given the existence of multiple genes subject to uORF-mediated translational regulation in eukaryotes, it is possible that related forms of translational stochasticity, and possibly of translational bet-hedging, are operational in at least some of these systems. The mechanisms underpinning such stochastic phenomena may be found to be advantageous for the host solely under certain growth conditions, while being close to neutral or disadvantageous in others, so that any overall positive selective value becomes evident only in varying environments. Moving forward from this primary study of the mechanistic basis for gene expression heterogeneity generated by interactions between the translation apparatus and a uORF-containing 5'UTR, future work will need to explore the relationships between this phenomenon and cell viability, stress responses and evolution.

DATA AVAILABILITY
Flow cytometry data generated in this study are available via the FlowRepository website (https://flowrepository.org/id/FR-FCM-Z6ZE). The model presented in this work is available via the EBI BioModels (51) website (MODEL2302230001).