Towards a rational approach to promoter engineering: understanding the complexity of transcription initiation in prokaryotes

Abstract Promoter sequences are important genetic control elements. Through their interaction with RNA polymerase they determine transcription strength and specificity, thereby regulating the first step in gene expression. Consequently, they can be targeted as elements to control predictability and tuneability of a genetic circuit, which is essential in applications such as the development of robust microbial cell factories. This review considers the promoter elements implicated in the three stages of transcription initiation, detailing the complex interplay of sequence-specific interactions that are involved, and highlighting that DNA sequence features beyond the core promoter elements work in a combinatorial manner to determine transcriptional strength. In particular, we emphasize that, aside from promoter recognition, transcription initiation is also defined by the kinetics of open complex formation and promoter escape, which are also known to be highly sequence specific. Significantly, we focus on how insights into these interactions can be manipulated to lay the foundation for a more rational approach to promoter engineering.


Introduction
Synthetic biology aims to apply engineering concepts to biology. In this context, the gold standard of synthetic biology is a collection of well-defined, predictable, and tuneable parts that form the building blocks for expression of individual genes and, ultimately, complex genetic circuits (Arkin 2013, Garcia and Trinh 2019). In industrial biotechnology, toolboxes of predictable and tuneable gene expression parts that function in industrially relevant hosts will be essential for the creation of robust and cost-effective microbial cell factories (Lucks et al. 2008, An and Chin 2009, Mutalik et al. 2013, Liu et al. 2018, Costello and Badran 2021).
To achieve predictability, it is important to minimize the influence of both environmental and genetic context on the performance of genetic parts. Extensive part characterization and predictive models can help us to predict how environmental factors and surrounding DNA context alter the regulation of gene expression. In addition, gene regulation that is orthogonal to the host metabolism facilitates predictability by reducing unwanted environmental and host-related context, allowing the part to function largely independently of host stress or environmental changes. Equally, optimizing the metabolic flux of an expression pathway is required to prevent the negative consequences of overproduction and to maximize desired product formation. To ensure optimal expression of each pathway component, a toolbox of genetic control elements is necessary for fine-tuning gene expression levels.
In the context of industrial bioprocesses, it is often beneficial to regulate gene expression at the transcriptional level. Because transcription is the first step in gene expression, this has multiple benefits, including conserving energy and cellular resources such as RNA polymerases, and providing the capability to respond rapidly to changes in environmental conditions (Segall-Shapiro et al. 2014, Bervoets and Charlier 2019, Ma et al. 2022). There are three main stages of transcription: initiation, elongation, and termination. In this review, we focus on transcription initiation, as it is the point where transcription specificity is defined, yielding orthogonality (Browning and Busby 2004). Promoter sequences are regions of DNA that, through recognition and interaction with the sigma subunit of RNA polymerase (RNAP), signal the starting point for transcription. Promoter strength varies considerably: production of full-length transcripts can differ by over 10 000-fold between promoter sequences, making promoters good targets for regulating gene expression levels (McClure et al. 1983, Mazumder and Kapanidis 2019).
Although extensive collections of functional promoters exist, there is a considerable lack of unique and well-characterized promoters that can be used in industrially relevant hosts (Hossain et al. 2020, De Wannemaeker et al. 2022). In addition, many commonly used promoters are context specific due to their interaction with the host metabolism, resulting in environmental interference, which is highly undesirable when designing robust microbial cell factories that will be subject to harsh environmental conditions (Collado-Vides et al. 1991, Lu et al. 2009, Gilman and Love 2016).
To combat this, well-defined, orthogonal, and tuneable promoters that function in a broad range of host organisms can be created using promoter engineering. Although promoter engineering has already been used frequently and effectively to generate variation in gene expression levels, the field can still be improved to further increase the level of control over gene regulation (Blazeck and Alper 2013, Xu et al. 2019). Within promoter sequences there are multiple conserved elements that are known to contribute to successful transcription initiation. Many previous promoter engineering attempts focus on one of these promoter regions at a time, ignoring the complexity of the interactions that define transcriptional output and therefore limiting their power and predictability. The fact that transcription initiation is defined by more than recognition of the promoter sequence by the sigma factor is often overlooked, further compounding the problem. Single mutations, even in the sequence immediately upstream of the transcription start site or upstream of the main promoter boxes, can cause global changes in promoter activity, highlighting the complexity of the system and the knowledge required when engineering promoters based solely on a small number of promoter elements (Urtecho et al. 2019, Saecker et al. 2021).
Through a more complete understanding of the interactions between the sigma factor and promoter sequence that define the strength and specificity of transcription initiation, we can lay the foundations for a more rational approach to promoter engineering. To this end, several recent studies have further elucidated the complexity of these interactions through the development of thermodynamic models that leverage in-depth knowledge of the transcription mechanism to achieve improved predictions of gene expression levels. In addition, massively parallel reporter assays have allowed high-throughput assembly and measurement of large promoter libraries that represent different combinations of promoter elements, including upstream and downstream regions, and their interactions with each other, providing large datasets that are already being used for forward promoter engineering (Einav and Phillips 2019, Urtecho et al. 2019, Lagator et al. 2020, LaFleur et al. 2022). Such knowledge can greatly expand the potential of promoter engineering through, e.g. the creation of smarter promoter libraries that increase tunability, the design of promoters with defined regulation or expression properties, or the addition of improved insulator sequences that reduce the influence of the surrounding promoter context. Of particular interest is the potential of re-engineering the interaction between the sigma factor and DNA, which could allow the design of promoters with specific desired characteristics and orthogonal interaction with a coengineered sigma factor, allowing gene expression that is primarily controlled independently of the host metabolism.
In the subsequent sections, we break down the steps involved in transcription initiation and the promoter sequence features that determine strength and specificity at each stage, demonstrating the complexity of transcriptional regulation by promoter sequences and providing insight into the relative importance of different promoter positions in the process. The information given relates primarily to interactions with Escherichia coli σ70 although, because the mechanism of transcription is highly conserved, much of it is relevant to promoter interactions with alternative sigma factors and with sigma factors from other organisms. However, sigma factors that differ more significantly in structure and mechanism from σ70 can show substantially different modes of transcriptional regulation from those described here. Whilst existing reviews in this area focus on promoter recognition, they do not take into account open complex formation and promoter escape, which are also mediated largely by interactions between promoter and sigma factor and have a significant role in determining transcriptional output (Hook-Barnard and Hinton 2007).
Here, we build on these reviews, highlighting that the kinetics of open complex formation and promoter escape are also highly dependent on promoter sequence.

Regulation of transcription initiation in prokaryotes
The steps in transcription initiation are controlled by the interaction of the sigma factor with the promoter sequence and are summarized in Fig. 1. Recent structural studies (Chakraborty et al. 2012, Feklistov et al. 2017, Boyaci et al. 2019, Chen et al. 2020, 2021, Saecker et al. 2021) have further elucidated the complex and dynamic mechanism and regulation involved in transcription initiation, giving more insight into the role of particular promoter sequence characteristics in determining transcriptional output. However, the specific details of the mechanisms, intermediates, and kinetics presented here are still debated and likely differ depending on the promoter sequence, further adding to the complexity (Mazumder and Kapanidis 2019).
In the first step of transcription initiation, the sigma factor (comprising four domains denoted σ1, σ2, σ3, and σ4) recognizes the promoter sequence, primarily at the conserved −35 and −10 hexamer positions, causing the RNAP holoenzyme (consisting of subunits α, α, β, β', ω, and σ) to be recruited to the promoter. Both specific and nonspecific interactions occur between the promoter and RNAP holoenzyme, forming the closed complex (Fig. 1, box 1). Secondly, promoter melting and open complex formation take place through separation of 13 base pairs of double-stranded DNA from position −11 to +2, forming the transcription bubble (Fig. 1, box 2). Initial RNA synthesis and promoter escape constitute the final stage of transcription initiation, culminating in dissociation of the RNAP holoenzyme from the promoter sequence, which allows it to proceed along the DNA and enter transcription elongation (Fig. 1, box 3). Each of these stages, and the specific and nonspecific interactions involved, is described in more detail in the subsequent sections.
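Promoter positions are numbered relative to the transcription start site (+1) with no position 0, which is why the bubble from −11 to +2 spans 13 bp rather than 14. The minimal Python sketch below makes this convention explicit; `span_length` and `to_index` are hypothetical helper names of our own, and `to_index` assumes the 0-based string index of the +1 nucleotide is known.

```python
def span_length(start: int, end: int) -> int:
    """Number of base pairs between two promoter positions, inclusive.

    Promoter numbering skips 0: position -1 is immediately followed by +1
    (the transcription start site).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in promoter numbering")
    if start > end:
        raise ValueError("start must lie upstream of end")
    length = end - start + 1
    # Crossing from negative to positive positions skips the nonexistent 0.
    if start < 0 < end:
        length -= 1
    return length


def to_index(position: int, tss_index: int) -> int:
    """Map a promoter position to a 0-based string index, given the index of +1."""
    if position == 0:
        raise ValueError("position 0 does not exist in promoter numbering")
    return tss_index + position - 1 if position > 0 else tss_index + position


print(span_length(-11, 2))   # transcription bubble -> 13
print(span_length(-55, 15))  # closed complex footprint -> 70
```

The same helper gives 70 bp for the −55 to +15 closed complex footprint discussed in the next section.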

Closed complex formation
As the first step of transcription initiation, initial promoter recognition is of great interest in promoter engineering, as altering the early interaction of the sigma factor with the DNA will directly affect the specificity and strength of a given promoter. In the first stage of promoter recognition, RNAP contacts the promoter sequence, forming a closed complex in which RNAP directly contacts the double-stranded DNA and spans positions −55 to +15 relative to the transcription start site (Li and McClure 1998). The closed complex DNA is thought to be bent by 17° at the −10 box, positioning downstream promoter DNA above the RNAP DNA-binding cleft in preparation for further steps of transcription (Chen et al. 2020). Bending of the −35 box between positions −38 and −48 has also been reported and is also thought to direct the closed complex towards the enzyme active site (Ruff et al. 2015).
A recent study by Feklistov et al. (2017) proposed that fluctuations in RNAP clamp opening and closing allow efficient and dynamic promoter recognition. In this process, the open-clamp form of RNAP scans the genome for upstream promoter elements such as the −35 box and UP element. Dynamic binding of RNAP to these sequence features forms the closed complex, in which both general and base-specific contacts between upstream promoter elements and the RNAP holoenzyme are formed and further stabilized.
However, closed complex formation involves an abundance of built-in complexity. The dynamic nature of the process means that interactions are readily reversible and subject to extensive regulation, limiting the impact of promoter sequence alone in determining transcription rate (Saecker et al. 2021). The relative contributions of conserved upstream elements to the initiation and kinetics of closed complex formation differ between promoters. This is outlined in detail below and summarized in Fig. 2.

−35 box interactions
Initial chemical cross-linking studies gave the first indication of both nonspecific and base-pair-specific recognition of the −35 box by domain 4 of σ70 (Simpson 1979, Park et al. 1980, Hilton and Whiteley 1985, Buckle et al. 1991). Subsequently, amino acid substitutions in the housekeeping sigma factor that alter or relax −35 consensus were identified, implicating positions −33 and −31 in base-specific promoter recognition through interaction with amino acids E585, R588 and R584, respectively (Gardella et al. 1989, Siegele et al. 1989). These studies were complemented by detailed information from crystal structures, which confirmed the base-specific interaction of −33G and −31C of a σ70 consensus hexamer with amino acids E585 and R584, which are found in the helix-turn-helix motif of σ4.2. In addition, a number of amino acids were identified as having a role in creating nonspecific contacts with −35 box DNA (Campbell et al. 2002, Murakami et al. 2002, Feklistov and Darst 2011, Bae et al. 2015).
Whilst the −35 box consensus is defined as TTGACA, numerous hexamer compositions result in stable interactions and favourable transcription initiation. This is demonstrated by a recent free energy model developed by LaFleur et al. (2022) that provides interaction free energies for different promoter sequence compositions, allowing the effect of different promoter positions on transcriptional output to be quantified. The learned interaction energies indicate that, whilst −33G is present in 6 of the 10 most favourable hexamers, A, T, and C at this position also feature in the top 10 most favourable −35 box compositions. This dataset provides a highly useful resource for determining the stability of interactions between RNAP and DNA at different promoter compositions and can, therefore, be utilized in a rational and data-driven approach to promoter engineering (LaFleur et al. 2022). In addition, a recent study by Liu et al. (2022) has further defined the role and impact of certain positions of the −35 box on closed complex formation by re-engineering the interaction between the −35 box and the sigma factor, giving a further example of how this knowledge can be used in a promoter engineering context to yield promoter sequences that function orthogonally to the host metabolism.
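Free energy models of this kind often treat each promoter position as contributing additively to the RNAP-DNA interaction energy. The sketch below illustrates that idea for the −35 hexamer with a toy energy matrix; every value in `TOY_MATRIX_35` is invented for illustration and is NOT a parameter fitted by LaFleur et al. (2022). Because the toy matrix tolerates A, C, and T at −33, several non-consensus hexamers rank just below TTGACA, mirroring the qualitative point above.

```python
from itertools import product

BASES = "ACGT"

# Hypothetical energies (arbitrary units; lower = more favourable) for the
# -35 hexamer positions -35..-30. Illustrative placeholders only, NOT the
# parameters fitted by LaFleur et al. (2022).
TOY_MATRIX_35 = [
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": -1.0},     # -35
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": -1.0},     # -34
    {"A": -0.6, "C": -0.6, "G": -1.0, "T": -0.6},  # -33: G preferred, others tolerated
    {"A": -1.0, "C": 0.0, "G": 0.0, "T": 0.0},     # -32
    {"A": 0.0, "C": -1.0, "G": 0.0, "T": 0.0},     # -31
    {"A": -0.5, "C": 0.0, "G": 0.0, "T": 0.0},     # -30
]


def hexamer_energy(hexamer: str, matrix=TOY_MATRIX_35) -> float:
    """Sum position-specific energies, assuming positions contribute additively."""
    if len(hexamer) != len(matrix):
        raise ValueError("hexamer length must match the matrix")
    return sum(matrix[i][base] for i, base in enumerate(hexamer.upper()))


def top_hexamers(n: int = 10, matrix=TOY_MATRIX_35):
    """Rank all 4**6 hexamers from most to least favourable (energy, sequence)."""
    scored = (
        (hexamer_energy("".join(h), matrix), "".join(h))
        for h in product(BASES, repeat=len(matrix))
    )
    return sorted(scored)[:n]


for energy, seq in top_hexamers(5):
    print(seq, round(energy, 2))
```

Replacing the toy matrix with measured per-position energies would turn the same additive scoring into a screening tool, but the additivity assumption itself ignores the interactions between elements emphasized throughout this review.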
Whilst the sequence of the −35 box is the most conserved of the upstream elements and, at many promoters, has the biggest influence on promoter recognition by the RNAP holoenzyme, this often leads to the oversimplified assumption that it acts independently to define initial promoter recognition. It is not uncommon for promoters to lack a −35 box, which is compensated for by the presence of other upstream promoter elements, as discussed further below.

−10 box interactions
Whilst it was previously suggested that the −10 box was also involved in closed complex formation, it is now thought that this initial scanning does not recognize specific bases of the −10 box. However, at this stage, DNA-backbone interactions of the −10 box with basic residues of σ2 (R436/R441/R451) and σ3 (K462/R456) help to stabilize the closed complex and, therefore, influence its productivity (Feklistov et al. 2017, Chen et al. 2020). Additionally, a recent study demonstrated that −35 and −10 box binding are not independent of each other, indicating that the composition of the −10 box influences −35 box recognition (Einav and Phillips 2019). In terms of base-specific interactions, −10 box recognition is likely coupled to DNA melting through a dynamic mechanism and is, therefore, discussed further in the context of open complex formation.

Extended −10 box interactions
A conserved extended −10 motif, located two base pairs upstream of the −10 box at some promoters, contributes to promoter sequence recognition through interaction with σ3. The base at position −14 of the nontemplate strand forms base-specific contacts with amino acid E458, which is located in the α-helix of σ3. Amino acid residue H455 of σ2 is also thought to make direct contacts with the phosphate backbone of the nontemplate strand at this position, indicating that σ2 also has a role in extended −10 box recognition (Barne et al. 1997, Murakami et al. 2002). Whilst the consensus extended −10 motif is thought to be TGN, calculated interaction free energies of RNAP-DNA binding indicated that multiple compositions would be favourable for transcription, indicating that this preference is not fixed (LaFleur et al. 2022).
The importance of the extended −10 box in closed complex formation is further demonstrated by its ability to compensate for the absence of other consensus promoter elements. Previous research shows that, where a consensus −35 box is absent, the extended −10 box can act alone to anchor RNAP to the correct position on the DNA (Keilty and Rosenberg 1987, Barne et al. 1997, Bown et al. 1999, Mitchell et al. 2003, Sanderson et al. 2003, Haugen et al. 2008, Ruff et al. 2015). In addition, the extended −10 box can compensate for a longer spacer, with a high percentage of promoters containing the extended −10 consensus 'TGN' motif having an 18 bp spacer, compared with the consensus length of 17 bp (Mitchell et al. 2003, Ruff et al. 2015). Furthermore, an extended −10 box in combination with a consensus −35 box is known to compensate for a weak −10 box, further demonstrating the flexibility in the sequence elements involved in promoter recognition (Hook-Barnard and Hinton 2007).
Whilst the extended −10 box is generally less critical for functional transcription than the −35 or −10 hexamer, its importance for transcription is promoter dependent. In one recent study, removing the consensus extended −10 box resulted in a deleterious phenotype, indicating that this motif is required for functionality at certain promoters (Park and Wang 2021). The extended −10 box is therefore an important target for promoter engineering and can be directly modified to alter promoter recognition preference. Further, its ability to compensate for other promoter elements should be considered when designing a promoter engineering strategy, to ensure that changes made to other promoter elements are not undermined by this compensation.
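When auditing a promoter design for unintended compensation, one simple check is whether a TGn motif sits immediately upstream of the −10 hexamer. A minimal sketch, assuming the common convention that the −10 hexamer occupies −12 to −7 and the TGn motif −15 to −13 (so TG lies at −15/−14); `has_extended_minus10` is a hypothetical helper, and exact matching to TG ignores the sequence flexibility noted above.

```python
def has_extended_minus10(seq: str, minus10_start: int) -> bool:
    """Return True if a TG dinucleotide sits at -15/-14, i.e. one base
    (the 'n' at -13) upstream of a -10 hexamer starting at -12.

    `minus10_start` is the 0-based index of the first -10 hexamer base.
    """
    if minus10_start < 3:
        return False
    return seq[minus10_start - 3:minus10_start - 1].upper() == "TG"


# 17 bp spacer whose last three bases form a TGn motif (hypothetical design).
spacer = "ACGTACGTACGTAC" + "TGA"
promoter = "TTGACA" + spacer + "TATAAT"
print(has_extended_minus10(promoter, 6 + len(spacer)))
```

A check like this could flag library members in which spacer randomization accidentally creates a TGn motif that masks the intended effect of −35 box or spacer changes.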

UP element interactions
Some promoters have an AT-rich region located upstream of the −35 box that is also implicated in promoter recognition. This sequence, located between −40 and −60 and known as the UP element, helps to anchor the sigma factor at the correct location on the DNA and is also known to cause bending of DNA between −35 and −60, which is thought to stabilize the closed complex and facilitate progression to further steps of transcription (Davis et al. 2011). The C-terminal domains of the two α-subunits of the RNAP holoenzyme (α-CTD) interact with the UP element through both general and base-specific interactions (Ross et al. 1993, Estrem et al. 1999, Gourse et al. 2000, Ross et al. 2001). This interaction is sequence dependent, as the narrow minor groove characteristic of UP element sequences favours α-CTD binding, a trend also observed in the recent free energy model described by Salis and colleagues (Ross et al. 1993, LaFleur et al. 2022). The UP element is composed of two AT-rich subsites, known as the proximal and distal sites, each of which interacts with one α-CTD. Promoters may have one or both of these, and their contributions to transcriptional output are thought to be additive (Einav and Phillips 2019, LaFleur et al. 2022). Whilst the distal element has previously been shown to function comparably to the full-length UP element, indicating that it has the greatest influence on promoter recognition (Estrem et al. 1999), more recent studies have shown that both elements can be influential to promoter functionality (LaFleur et al. 2022).
When combined with a canonical −35 box, a consensus UP element sequence was shown to increase transcription up to 330-fold, confirming its potential as an interesting target in promoter engineering (Rao et al. 1994, Estrem et al. 1998, LaFleur et al. 2022). Mutation of the AT-rich tracts at consensus UP elements can alter formation and stability of the closed complex, allowing modification of promoter specificity or strength. Previous studies show that DNA bending can be abolished with only single base pair mutations, highlighting the sequence-specific nature of the UP element and the potential for altering these interactions in a rational and directed way (Ruff et al. 2015). As with the extended −10 box, examples of weak −35 elements being compensated for by strong UP elements are abundant, warranting the same caution when designing a rational promoter engineering strategy and showing that, when determining the kinetics and stability of closed complex formation, it is important to consider the additive strength of the combination of promoter elements.
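To connect fold changes such as the 330-fold UP element effect with the free energy models discussed above, one can assume, as a simplification, that transcriptional output scales with the Boltzmann factor exp(−ΔG/RT). Under that assumption a 330-fold increase corresponds to roughly 3.6 kcal/mol of additional favourable interaction energy at 37 °C, and additive free energy contributions (such as those attributed to the proximal and distal UP subsites) translate into multiplicative fold changes. The function names below are our own.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold(fold_change: float, temp_c: float = 37.0) -> float:
    """Favourable free energy change (kcal/mol) equivalent to a fold change,
    assuming output is proportional to exp(-dG / RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log(fold_change)


def fold_from_ddg(ddg: float, temp_c: float = 37.0) -> float:
    """Inverse of ddg_from_fold: fold change produced by a favourable ddG."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(ddg / rt)


print(round(ddg_from_fold(330), 2))  # about 3.57 kcal/mol at 37 C

# Additive energies multiply fold changes (hypothetical subsite energies).
proximal, distal = 1.0, 2.0
print(fold_from_ddg(proximal + distal), fold_from_ddg(proximal) * fold_from_ddg(distal))
```

This equivalence only holds within the assumed equilibrium picture; as discussed below, open complex formation and promoter escape kinetics can also limit output.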

Spacer region
The base pairs between the −35 and −10 boxes, known as the spacer region, are also an important determinant of promoter strength and specificity. Correct spacer length is essential for promoter recognition, as it positions the RNAP on the DNA so that the relevant sigma factor domains line up with their corresponding promoter elements. A previous study calculated that 44% of E. coli promoters have a spacer length of 17 bp (Mitchell et al. 2003). It is well documented that promoters with this consensus length have higher transcriptional output than otherwise identical promoters with either a 16 or 18 bp spacer (Stefano and Gralla 1982, Aoyama et al. 1983, Mulligan et al. 1985). Complementing this, a recent RNAP thermodynamic model predicted that the interaction free energy of a 16 or 17 bp spacer was most favourable, whilst spacers of 15, 19, and 20 bp had high positive interaction free energies, indicating that these were considerably less favourable for transcription initiation. Notably, a spacer length of 20 bp gave the most positive interaction free energy of all the promoter compositions in the study, indicating that long spacers were highly detrimental to functional transcription (LaFleur et al. 2022). Together, these findings demonstrate that spacer length is a highly important determinant of transcriptional output, and that altering spacer length is a powerful way to manipulate both promoter strength and specificity.
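Spacer length is defined as the number of bases between the last base of the −35 hexamer and the first base of the −10 hexamer. A minimal sketch for measuring it in designed constructs, assuming exact consensus boxes (TTGACA and TATAAT) can be located by string search; real promoters rarely match consensus perfectly, so `spacer_length` is a hypothetical helper for illustration only.

```python
def spacer_length(seq: str, box35: str = "TTGACA", box10: str = "TATAAT") -> int:
    """Count bases between the end of the -35 box and the start of the -10 box.

    Uses exact matching to the given hexamers (a simplification). Returns -1
    if either box is missing or they do not occur in the expected order.
    """
    s = seq.upper()
    i35 = s.find(box35)  # first occurrence of the -35 box
    if i35 == -1:
        return -1
    i10 = s.find(box10, i35 + len(box35))  # first -10 box downstream of it
    if i10 == -1:
        return -1
    return i10 - (i35 + len(box35))


# Constructs with 16, 17 (consensus length) and 18 bp spacers.
for n in (16, 17, 18):
    promoter = "TTGACA" + "G" * n + "TATAAT"
    print(n, spacer_length(promoter))
```

Because `str.find` returns the first match, a spacer that itself contains a box-like hexamer would be mis-measured; a position-weight or free-energy scan would be more robust.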
However, as is the common theme of these promoter recognition interactions, there are examples where nonoptimal spacer length is compensated for. For example, as previously mentioned, consensus extended −10 boxes of housekeeping sigma factors can compensate for longer spacers, giving functionality to the promoter and further highlighting the need to consider promoter elements in combination when determining the transcriptional output of a given promoter sequence (Mitchell et al. 2003). Rotation of σ4 is thought to be required to accommodate different spacer lengths, meaning that spacer length variation is limited by the possible rotation of σ4, which is sigma factor dependent (Zuo and Steitz 2015). Interestingly, in the case of σS, amino acid residue E458 binds to spacer region DNA, initiating the formation of a kink in the DNA, which allows this alternative sigma factor to utilize promoters with nonoptimal spacer lengths (Typas and Hengge 2006). This knowledge gives insight into how sigma factor engineering could be used alongside promoter design to functionalize promoter sequences that have nonoptimal promoter features. Such designed promoter sequences would be nonfunctional for transcription by native sigma factors but functional with a coengineered sigma factor, creating possibilities for orthogonal gene expression.
Aside from spacer length, spacer sequence is also implicated in determining transcription initiation rate. Individual base substitutions in the spacer region have been shown to affect transcription rate at given promoter sequences (Chan and Busby 1989, Mellies et al. 1994, Thouvenot et al. 2004). Consequently, the spacer region is often targeted by promoter libraries that aim to modulate promoter strength without influencing specificity (De Mey et al. 2007, Bervoets et al. 2018, Van Brempt et al. 2020). Spacer sequence likely influences transcription initiation primarily through changes in DNA secondary structure and curvability (Kanhere and Bansal 2005, Bansal et al. 2014). This can explain the observed influence of GC-rich spacers, which have previously been associated with increased promoter activity compared with AT-rich spacers at given promoters (Repoila and Gottesman 2003, Liu et al. 2004, Klein et al. 2021). A recent AI-based tool developed by Wang and colleagues uses a data-driven approach to create promoter regions flanking the −35 and −10 boxes that are optimized for DNA shape and curvability, giving an example of how this knowledge can be exploited for a more rational approach to promoter engineering (Zhang et al. 2023).
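As a first-pass descriptor when comparing spacer variants in a library, the GC fraction of the spacer can be computed as below. This is only a crude proxy: the text attributes spacer effects mainly to DNA shape and curvability, which GC fraction does not capture directly, and the exact-match box search in the hypothetical `spacer_gc` helper is a simplifying assumption.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (0.0 for an empty string)."""
    s = seq.upper()
    if not s:
        return 0.0
    return sum(base in "GC" for base in s) / len(s)


def spacer_gc(seq: str, box35: str = "TTGACA", box10: str = "TATAAT"):
    """GC fraction of the spacer between exact -35 and -10 boxes, or None."""
    s = seq.upper()
    i35 = s.find(box35)
    if i35 == -1:
        return None
    start = i35 + len(box35)
    i10 = s.find(box10, start)
    if i10 == -1:
        return None
    return gc_fraction(s[start:i10])


# Hypothetical 17 bp spacer variants.
for spacer in ("GCGCGCGCGCGCGCGCG", "ACGTACGTACGTACGTA"):
    print(spacer, spacer_gc("TTGACA" + spacer + "TATAAT"))
```

Shape-aware descriptors (for example, predicted minor groove width) would be the natural next step if GC fraction alone fails to explain library measurements.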
Interestingly, sequence-specific recognition of the spacer has also been observed in a study by Zenkin and colleagues. A region of the spacer from −22 to −18, termed the Z element, with a loosely defined consensus sequence of AACCT, was shown to interact with the β subunit of RNAP, facilitating closed complex formation. A truncated promoter lacking a −35 box and other upstream features was able to initiate transcription in the presence of the Z element, whilst transcription was abolished when the Z element was removed. Furthermore, in the background of a weak −10 box, a consensus Z element worked together with an intact −35 box to allow promoter utilization. This again demonstrates the high level of complexity in the regulation of promoter recognition and the vast array of element combinations that can work together to achieve closed complex formation. Promoter sequence-specific interactions are thought to contribute to this regulation, as the influence of the Z element differed with sequence. Although the exact mechanism of Z element recognition is unknown, it has been linked to conserved residues Y47 and R48 of the β subunit, which are in close proximity to the promoter DNA. Mutation of these amino acids abolished Z element interactions, indicating direct contact with the DNA and suggesting that they could be manipulated to regulate promoter recognition preferences (Yuzenkova et al. 2011). These insights can further guide promoter engineering strategies that target the spacer region, potentially increasing the level of promoter tuneability that can be achieved.
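Because the Z element is defined by position (−22 to −18) rather than by a fixed motif anywhere in the sequence, scanning a candidate promoter for it requires mapping promoter coordinates onto sequence indices. A minimal sketch, assuming the index of the +1 nucleotide is known; scoring by mismatches to AACCT is our simplification given the loosely defined consensus, and `z_element_mismatches` is a hypothetical helper.

```python
Z_CONSENSUS = "AACCT"  # loosely defined consensus at positions -22..-18


def to_index(position: int, tss_index: int) -> int:
    """0-based string index of a promoter position, given the index of +1."""
    if position == 0:
        raise ValueError("position 0 does not exist in promoter numbering")
    return tss_index + position - 1 if position > 0 else tss_index + position


def z_element_mismatches(seq: str, tss_index: int) -> int:
    """Number of mismatches between positions -22..-18 and AACCT."""
    start = to_index(-22, tss_index)
    end = to_index(-18, tss_index) + 1  # slice end is exclusive
    window = seq[start:end].upper()
    if start < 0 or len(window) != len(Z_CONSENSUS):
        raise ValueError("sequence does not cover positions -22 to -18")
    return sum(a != b for a, b in zip(window, Z_CONSENSUS))


# Hypothetical construct with +1 at index 40, so -22 falls at index 18.
seq = "G" * 18 + "AACCT" + "G" * 30
print(z_element_mismatches(seq, tss_index=40))
```

A mismatch count treats every position as equally important, which the sequence-dependent Z element effects described above suggest is unlikely to hold in practice.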

Open complex formation
Once RNAP is recruited and anchored into position on the promoter DNA, transcription can proceed through unzipping of the DNA from positions −11 to +2, forming a transcription bubble; the resulting complex is referred to as the open complex (RPo), and the process is summarized in Fig. 3. This strand separation is crucial to position the transcription start site (TSS) near the catalytic Mg2+ in the active site of the RNAP core enzyme, which lies between the β' and β subunits, allowing the enzyme to catalyse initial RNA synthesis at the TSS (Zhang et al. 1999, Feklistov et al. 2017, Boyaci et al. 2019). The ability of the sigma factor to melt DNA is fundamental for functional transcription initiation, making the kinetics of open complex formation an important consideration in promoter engineering.
Recent structural and biochemical studies have elucidated the mechanism of open complex formation at a number of promoters and identified several reaction intermediates. Nucleation of bubble formation involves initial separation of double-stranded DNA by flipping the nontemplate strand base at position −11 into a complementary pocket on the sigma factor. This step is dynamic and highly reversible, especially in the absence of interactions that stabilize the newly formed ds/ss junction (Fig. 3A). Following successful nucleation of the transcription bubble, the strands are further separated between positions −10 and −6, and single-stranded DNA of the template strand is pulled into the positively charged active site cleft. The strand-separated state can be stabilized by flipping of the nontemplate nucleotide at position −7 into a second complementary pocket on the protein (Fig. 3B). Bubble expansion to position +2 and further stabilization of the strand-separated state result in an open complex that is capable of functional transcription initiation (Fig. 3C).
These studies suggest that, although bacterial σ70 promoters most likely follow the same mechanism for open complex formation, the reaction kinetics and the importance and longevity of different intermediates vary significantly with promoter sequence and RNAP composition (Ruff et al. 2015, Boyaci et al. 2019, Chen et al. 2021, Saecker et al. 2021). In addition, even at a given promoter, open complex kinetics may not be uniform, and both stable and dynamic populations can exist. For example, at the lacCONS promoter, a particular open complex intermediate is formed with a frequency of 40%, whilst the remaining 60% of open complexes form without this intermediate (Malinen et al. 2022). The subsequent sections further outline the stages of open complex formation, detailing the interactions and important promoter features that are implicated in the kinetics of formation and the stability of the various intermediates.

−10 box
The sequence features of the −10 box are the primary determinant of open complex kinetics. Significantly, high AT content in the −10 box promotes strand separation, as A-T base pairs require less energy to separate than G-C base pairs, explaining the AT-rich consensus of the −10 box recognized by housekeeping sigma factors (Mo 2006, Khandelwal and Bhyravabhotla 2010). High AT content alone is known to allow fast open complex kinetics. Whilst −11A and −7T are highly conserved, and flipping of these bases into complementary protein pockets contributes to fast initiation kinetics, their absence can be compensated for by a sufficiently AT-rich −10 box, which lowers the energy required to unzip the DNA without the need for base flipping (Heyduk and Heyduk 2014, Ruff et al. 2015). Equally, a recent thermodynamic model of RNAP-promoter interactions predicted that some GC-rich hexamers also bind favourably to RNAP to activate transcription, indicating that the −10 box need not be AT rich (LaFleur et al. 2022). This flexibility allows housekeeping sigma factors to initiate transcription at diverse promoter sequences that are not a perfect match to consensus. It further highlights the complexity of the system by indicating that there is no single formula of sequence features that must be present. Instead, interactions are additive and should not be considered independently when assessing promoter strength and specificity.
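A deliberately crude way to compare how easily different −10 hexamers might melt is to compare AT fraction and count Watson-Crick hydrogen bonds (two per A-T pair, three per G-C pair). This sketch is illustrative only: duplex stability also depends strongly on base stacking (nearest-neighbour effects), and, as noted above, RNAP contacts can override sequence-intrinsic stability, so these numbers should not be read as a predictive model. The example hexamers other than TATAAT are arbitrary.

```python
# Watson-Crick hydrogen bonds per base pair: A-T has 2, G-C has 3.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hbond_count(seq: str) -> int:
    """Total hydrogen bonds in a duplex, given one strand's sequence."""
    return sum(HBONDS[base] for base in seq.upper())


def at_fraction(seq: str) -> float:
    """Fraction of A or T bases (0.0 for an empty string)."""
    s = seq.upper()
    return sum(base in "AT" for base in s) / len(s) if s else 0.0


for hexamer in ("TATAAT", "TATGCT", "GCGCGC"):
    print(hexamer, round(at_fraction(hexamer), 2), hbond_count(hexamer))
```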
At many σ70 promoters, the primary interaction that nucleates initial transcription bubble formation is the flipping of the nontemplate strand base −11A out of the base stack and into a complementary protein pocket formed by the aromatic residues F427, Y430, W433, and W434 of σ2.3 (Panaghie et al. 2000, Lee, Lim and Adhya 2004, Feklistov and Darst 2009, 2011). This base flipping nucleates strand separation and creates flexibility in the strand, allowing a 90° bend in the DNA, which is thought to direct single-stranded downstream DNA towards the active site cleft (Feklistov and Darst 2011, Saecker et al. 2011, Ruff et al. 2015). It is currently unknown exactly what initiates base flipping, but it has been suggested that transient closure of the RNAP clamp causes twisting of the DNA, which facilitates flipping of −11A into its complementary protein pocket (Ruff et al. 2015, Chen et al. 2020).
The importance of −11A base flipping for open complex formation is demonstrated by a study in which alanine substitutions at the four amino acids that form the −11A specificity pocket prevented open complex formation without affecting initial closed complex formation (Cook and DeHaseth 2007). This complements a previous study, which demonstrated that amino acid substitutions at Y430 and W433 caused defective nucleation of melting (Juang and Helmann 1994). Furthermore, when looking at the first 3 bp of the −10 box, the free energy model developed by LaFleur et al. (2022) showed that 10 out of the 14 most favourable motifs, characterized by interaction free energies less than one, had an A at position −11, further indicating that this position is relatively conserved (LaFleur et al. 2022). However, despite this preference, there are numerous examples of functional σ70 promoters where the nucleotide at position −11 is not A, demonstrating that there is flexibility within this process. In such cases, it is likely that stronger closed complex interactions compensate for the lack of base flipping, creating the stability required for successful transcription (Koo et al. 2009).
Significantly, due to their high melting capacity, defined by the presence of all four melting pocket residues, housekeeping sigma factors are more tolerant of deviations from consensus promoter elements, whilst some alternative sigma factors require a near-exact consensus sequence for successful promoter recognition (Helmann and Chamberlin 1988, Lonetto et al. 1992, Koo et al. 2009). This could be explained by the idea that base flipping by housekeeping sigma factors removes the requirement for strong closed complex interactions, as it quickly stabilizes the initial open complex, preventing dissociation of the complex (Koo et al. 2009). All group 3 sigma factors lack one or more of the four critical melting residues that form the −11A pocket in housekeeping sigma factors (Feklistov et al. 2014). Mutating these residues back to the housekeeping consensus improved the ability to melt DNA and relaxed the requirement for a perfect promoter, indicating that these amino acid residues can be targeted to alter promoter specificity. Conversely, where required, stringent promoter recognition could be achieved by mutating the relevant melting pocket residues, indicating how this information could be used for rational promoter design (Koo et al. 2009).
Interestingly, some alternative sigma factors have similar mechanisms for transcription bubble nucleation despite having very different consensus promoter motifs and more stringent promoter regulation. An example of this is the analogous 'melting loop' of σE, which forms a complementary pocket for the base −11C, facilitating base flipping and subsequent transcription bubble nucleation in a comparable manner. Campagne et al. (2014) engineered the σE melting loop to recognize different bases at this position, showing that the composition of these amino acid residues directly affected the promoter consensus. This demonstrates a powerful strategy for engineering promoter-sigma factor pairs, in which open complex nucleation is initiated specifically by the sigma factor carrying the complementary melting pocket.
The base at position −12 (consensus T) is highly conserved, indicating its importance for function (Feklistov and Darst 2011). This can be partially explained by the fact that certain combinations of nucleotide bases within a promoter are known to encourage base unstacking, further facilitating base flipping and thereby influencing the reaction kinetics of bubble nucleation (Haugen et al. 2008, Ruff et al. 2015, Chen et al. 2021). Such stacking interactions dictate that the A nucleotide of a TA motif is the easiest base to flip, meaning that a T at position −12 is the most thermodynamically favourable base for flipping −11A into its complementary protein pocket (Feklistov et al. 2014). Furthermore, this base pair at position −12 is directly and sequence-specifically contacted on the template strand by Q437 and T440 of σ2, an interaction that remains during transcription bubble nucleation (Ruff et al. 2015). Mutation of these amino acid residues altered the preferred promoter consensus at the −12 position, implicating this position of the −10 box in sequence-specific interactions with the sigma factor (Siegele et al. 1989, Waldburger et al. 1990). The involvement of these residues in the interaction with −12T was further confirmed by the crystal structure of σ2 bound to −10 box DNA (Murakami et al. 2002). Despite the thermodynamic preference for T at position −12, and the fact that the top 4 most favourable −10 motifs predicted by the free energy model of LaFleur et al. (2022) all begin with TA, it is important to highlight that compositions with C, G, and A at position −12 also exhibit negative interaction energies. The fact that 9 out of the 14 most favourable combinations, characterized by interaction free energies less than 1, do not involve −12T further emphasizes that the consensus at this position is not fixed (LaFleur et al. 2022).
Following transcription bubble nucleation, the ds/ss junction is stabilized through a conformational change in amino acids W433 and W434, which form a wedge that interacts with the exposed −12T base. Base flipping and the subsequent conformational changes in W433/W434 are often dynamic, meaning that the structure of this early transcription bubble intermediate cannot always be resolved (Chen et al. 2020). It has also been suggested that transient melting of −12T occurs, adding to the difficulty of elucidating the structure and reaction kinetics of the intermediate at this stage (Chen et al. 2020). Highlighting their importance in stabilizing the ds/ss junction at σ70 promoters, mutation of W433 and W434 results in a highly deleterious phenotype, showing that these residues are essential for functional transcription of housekeeping genes (Park and Wang 2021). As an important enabler of open complex formation, interactions of the sigma factor with −12T could be targeted to redefine the selectivity or kinetics of transcription bubble nucleation and initial stabilization, in order to influence transcription rate and functionality.
In order to progress to a functional open complex, further bubble propagation and stabilization are required, processes that are also known to be sequence specific. During this process, bases of the nontemplate strand are bound through phosphate backbone interactions with sigma factor residues outside the active site cleft, promoting strand separation. This is supported by early evidence from chemical crosslinks that show interactions between sigma factor residues and nontemplate strand DNA (Simpson 1979, Park et al. 1980, Hilton and Whiteley 1985, Buckle et al. 1991). In more stable open complexes, these interactions are maximized and the bases of the nontemplate strand at positions −10 to −8 are stacked on each other (Saecker et al. 2021). Consequently, these interactions could be manipulated to influence open complex stability and kinetics.
To further stabilize the strand-separated state, single-stranded bases of the template strand at the −10 box are pulled into the positively charged active site cleft, where they are captured by electrostatic interactions with the inside of the cleft (Feklistov et al. 2017). A base-specific protrusion pocket on the sigma factor, which interacts with the template strand base at position −9, is thought to be specific for pyrimidines, meaning that this position has a preference for C or T on the template strand, which helps stabilize the open complex when present (Chen et al. 2020). At most promoters this interaction is not rate limiting and therefore generally has little influence on the kinetics of open complex formation. However, at weaker promoters it could be an important consideration and, therefore, a possible target for altering open complex stability.
Base −7T is highly conserved in promoters of both housekeeping and alternative sigma factors across the whole bacterial domain, implying its importance for function (Moyle et al. 1991, Feklistov and Darst 2011, Heyduk and Heyduk 2014). Its importance is further highlighted by the recent free energy model, in which all of the top 10 most favourable −10 box motifs feature a T at position −7. Additionally, of the 23 −10 hexamers exhibiting negative free energies, indicative of a favourable interaction energy, 16 featured −7T (LaFleur et al. 2022). Based on studies of open complex kinetics, it is thought that flipping of base −7T into a second complementary protein pocket on the sigma factor (formed by residues from σ1.2, σ2.1, and σ2.3) promotes further bubble propagation and stabilization, allowing the bubble to extend from position −11 to +2 (Feklistov and Darst 2011). Even when upstream bases are premelted by mismatched base pairing, a T to A mutation at this position prevented full bubble formation, suggesting that this base is involved in bubble propagation rather than in nucleation of bubble formation (Chen et al. 2020). Aside from its role in bubble stabilization, flipping of −7T into its protein pocket is a prerequisite for displacement of σ1.1 from the active site cleft (Chen et al. 2020), providing a further explanation for the importance of this position in open complex formation and increasing its potential as a target in promoter engineering (Heyduk and Heyduk 2014, Ruff et al. 2015).

Discriminator element
The discriminator element spans promoter positions −6 to −3 and is known to play a role in the kinetics and stability of the open complex. Interactions between the sigma factor and discriminator positions −6 and −4 trap single-stranded template DNA inside the active site channel, promoting separation of the strands and thereby further stabilizing the growing transcription bubble. Changes in discriminator length and sequence are known to alter these interactions, leading to changes in the structure and stability of the RPo and, therefore, influencing transcription rate (Saecker et al. 2021).
Studies on a number of promoters suggested that a GGG motif on the nontemplate strand between −6 and −4 gave the longest RPo half-life. Of particular note in this work is the base −5G, which was shown to increase the half-life 10- to 50-fold at five different promoters when compared to −5C (Haugen et al. 2006). In addition, it has been suggested that −6G can increase transcription rates at certain promoters if it is flipped into a complementary hydrophobic protein pocket on the surface of σ1.2, which stabilizes RPo by preventing reannealing of the strands (Haugen et al. 2006, Zhang et al. 2012, Karpen and deHaseth 2015). Despite this proposed discriminator consensus, the recent free energy model published by LaFleur et al. (2022) shows that the 20 most favourable 3 bp discriminator motifs had comparable transcription initiation rates, indicating that there is not one defined consensus sequence. Interestingly, many of the higher-ranking motifs in terms of interaction free energy did not have a G at position −5 or −6, and the discriminator GGG had a positive interaction free energy. This indicates that, in the in vitro context in which this study was conducted, this motif was not particularly favourable for transcription initiation, and suggests that a long RPo half-life does not necessarily correlate positively with a higher transcription rate (LaFleur et al. 2022).
The discriminator element interacts with σ1.2 to modulate these effects, with amino acid M102 making specific contacts with the nontemplate strand at position −5 (Haugen et al. 2006, 2008, Zhang et al. 2012, Basu et al. 2014, Zuo and Steitz 2015). Zhang et al. (2012) further confirmed this by showing that an alanine substitution at M102 decreases transcription rate by decreasing open complex lifetime. Interestingly, rRNA promoters often have a −5C consensus, as the resulting short open complex lifetime allows them to be subject to extensive regulation (Travers 1990, Haugen et al. 2006). Thus, manipulating the discriminator sequence of promoters with external regulators is one way in which this knowledge could be applied in promoter engineering. Although the discriminator is not a highly conserved promoter element, these interactions can help stabilize the open complex, which is particularly significant at promoters with otherwise weak consensus motifs.
Figure 4. The mechanism of initial RNA synthesis and promoter escape in the RNAP active site cleft. The interactions between the RNAP holoenzyme (α2ββ′ωσ) and the promoter are given at each stage of growth of the initial transcribing complex (ITC), culminating in promoter escape. Solid lines show RNAP-DNA interactions, whilst disrupted interactions are represented by broken lines. Positions on the template strand are given, numbered according to their distance from the transcription start site.
In addition to discriminator sequence, discriminator length also plays a role in the stability of the open complex. Research shows that the lifetime of the open complex decreases as discriminator length increases (Jeong and Kang 1994, Liu and Turnbough Jr 1994, Lewis and Adhya 2004). Some of the loss of stability seen with longer discriminator elements is a result of a requirement for prescrunching of the DNA strands to produce an open complex with the designated TSS (Vvedenskaya et al. 2016).
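Because both discriminator sequence (for example −5G versus −5C, or a GGG motif) and discriminator length matter, it can help to extract them explicitly when comparing candidate promoters. The sketch below is a hypothetical helper, `discriminator_features`; it assumes the discriminator is every base between the end of the −10 hexamer and the TSS, and that positions such as −5 are counted from the hexamer under canonical spacing. It only reports features and makes no prediction of open complex lifetime.

```python
def discriminator_features(seq: str, minus10_start: int, tss_index: int) -> dict:
    """Describe the discriminator of a promoter (hypothetical helper).

    seq           -- nontemplate strand, 5'->3'
    minus10_start -- 0-based index of the first base of the -10 hexamer
    tss_index     -- 0-based index of the +1 (TSS) base
    The discriminator is taken as every base between the end of the
    -10 hexamer and the TSS (an assumption for this sketch).
    """
    seq = seq.upper()
    minus10_end = minus10_start + 6  # index just past the -10 hexamer
    if not (0 <= minus10_start and minus10_end <= tss_index < len(seq)):
        raise ValueError("indices must place the hexamer upstream of the TSS")
    disc = seq[minus10_end:tss_index]
    return {
        "length": len(disc),
        "sequence": disc,
        # First three bases after the hexamer (-6 to -4 with canonical spacing)
        "ggg_motif": disc[:3] == "GGG",
        # Second base after the hexamer (-5 with canonical spacing)
        "minus5_base": disc[1] if len(disc) > 1 else None,
    }


# Hypothetical fragment: -10 hexamer, 6-nt discriminator, TSS base (A), downstream bases
example = "TATAAT" + "GGGACT" + "A" + "CGT"
print(discriminator_features(example, minus10_start=0, tss_index=12))
```

Passing explicit indices, rather than searching for a hexamer, keeps the sketch honest: locating the −10 box and the TSS is itself a nontrivial prediction problem.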

Core recognition element
The core recognition element (CRE), located at positions −4 to +2 relative to the TSS, also has a role in stabilizing the open complex, allowing the transcription bubble to propagate to position +2. Interactions of the RNAP core enzyme (β residues 84-642) with all nucleotides of the CRE were demonstrated in crosslinking experiments (Naryshkin et al. 2000). Importantly, +2G interacts sequence-specifically with a pocket formed by β subunit residues R151, I445, D446, R451, L538, and V547, which further stabilizes the transcription bubble. Although not required for successful transcription initiation, G at this position gives a 5-fold lower RNAP off rate than any other base, meaning that +2G makes the open complex more stable. Furthermore, nontemplate strand position +1 interacts with β amino acid W183, contributing to unstacking of +1 and +2 and consequently making the +2 base available to interact with the CRE pocket (Zhang et al. 2012). As stable open complexes often lead to higher transcription rates, manipulating these stabilizing interactions could be a way to modulate transcription rate.

Initial transcription and promoter escape
As the link between initiation and elongation, promoter escape defines transcriptional output at many promoters, and is therefore an important consideration for promoter prediction tools and promoter engineering (Reppas et al. 2006, Hatoum and Roberts 2008, Ko and Heyduk 2014). It should be noted that much of the information presented in this section relates to transcriptional regulation when NTP concentrations are low. When NTP concentrations are sufficiently high, promoter escape is not thought to be rate-limiting, and many of the transcriptional regulation mechanisms described below are not observed (LaFleur et al. 2022). Because cellular NTP levels can vary with environmental conditions, these mechanisms of promoter escape remain relevant when considering the predictability of biological systems under industrial conditions. The steps in initial RNA synthesis and subsequent promoter escape are shown in Fig. 4, which highlights the regulation involved in the transition to productive transcription.
When template strand DNA positions +1 and +2 are situated in the active site cleft, initial binding of nucleoside triphosphates (NTPs) at position +1 results in creation of the initial transcribing complex (ITC), an RNA-DNA hybrid to which further NTPs can be added. Subsequent translocation of the growing complex within the active site is essential to ensure that the template strand is in the correct position for further NTP binding and ITC elongation (Revyakin et al. 2006). Because the RNAP holoenzyme remains held in position on the promoter DNA by the interactions involved in open complex formation, translocation of the short ITC within the active site cleft generates stress in the initiation complex: the upstream promoter DNA cannot move, so downstream DNA is scrunched as the template strand moves further into the active site cleft, as illustrated in Fig. 4 (Kapanidis et al. 2006, Revyakin et al. 2006, Winkelman et al. 2015, Chen et al. 2021).
This stress build-up destabilizes RNAP-promoter contacts to drive promoter escape, and is relieved as RNAP-DNA contacts are broken (Henderson et al. 2019). A mechanism of downstream-to-upstream disruption of interactions has been proposed, starting with disruption of the interactions between RNAP and the discriminator and CRE elements, which are thought to be broken by translocation of an ITC containing a 5-mer RNA transcript into the active site (Fig. 4 B). Disruption of the −10 box contacts is commonly the slowest step and results from translocation for synthesis of an ITC containing an RNA 9-mer (Fig. 4 C). In the final stage of promoter escape, −35 box and upstream contacts are broken as a result of translocation for synthesis of an 11-mer RNA transcript (Fig. 4 D) (Henderson et al. 2017, Plaskon et al. 2021).
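The proposed downstream-to-upstream order of contact disruption can be written as a simple lookup from RNA length to the contact sets expected to have been broken. This is a schematic sketch only: the thresholds (5, 9, and 11 nt) are the approximate values given in the text, the helper name `contacts_broken` is hypothetical, and real promoters will differ; it is not a kinetic model of promoter escape.

```python
# Approximate RNA lengths (nt) at which each set of RNAP-promoter contacts is
# broken, in the downstream-to-upstream order described in the text.
# These thresholds are schematic and will vary between promoters.
DISRUPTION_STEPS = [
    (5, "discriminator and CRE"),
    (9, "-10 box"),
    (11, "-35 box and upstream elements"),
]


def contacts_broken(rna_length: int) -> list:
    """Return the contact sets expected to be broken once the ITC carries
    an RNA of `rna_length` nucleotides (hypothetical helper)."""
    if rna_length < 0:
        raise ValueError("RNA length cannot be negative")
    return [name for threshold, name in DISRUPTION_STEPS if rna_length >= threshold]


for n in (3, 5, 9, 11):
    print(n, contacts_broken(n))
```

Keeping the steps in an ordered list makes the sequential nature of the model explicit: each contact set is only lost once all downstream contacts have already been broken.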
Whilst some of these growing ITCs escape the promoter (typically when the RNA transcript is 9-15 nucleotides long), allowing them to transition to transcription elongation and produce full-length RNA transcripts, other complexes are nonproductive and release the short RNA (typically 2-10 nucleotides long) from the hybrid as an alternative way to relieve the translocation stress (Heyduk and Heyduk 2018). These complexes then return to the open complex state in a process known as abortive initiation. These nonproductive complexes, which represent 30%-50% of open complexes under the reported conditions, can become stuck in cycles of abortive initiation and do not produce functional transcripts (Ko and Heyduk 2014, Duchi et al. 2016, Henderson et al. 2017, 2019).
Significantly, a considerable number of open complexes encounter a transcriptional pause when the ITC reaches a length of +6, as a result of σ3.2 occupying the RNA exit channel of the RNAP holoenzyme. This regulates continuation of transcription by creating a physical barrier that must be displaced before the RNA transcript can be extended further. Further demonstrating its influence, partial deletion of σ3.2 significantly reduced pausing at ITC +6 (Duchi et al. 2016). In one study of transcriptional pausing at the lac promoter, 80%-90% of transcripts were shown to enter this pause at ITC +6, and 20% of these did not exit the pause at the first attempt under the conditions tested (Dulin et al. 2018). As with the other promoter escape mechanisms, the regulation of transcriptional pauses was shown to be NTP concentration dependent. At low NTP concentrations, the complexes were trapped in the paused state until NTP concentrations were sufficiently high (Dulin et al. 2018). Whilst 70%-80% of stalled complexes underwent abortive transcription, a second subset was thought to enter the scrunching-unscrunching pathway, in which the RNA transcript was retained but transcription did not continue to the productive cycle (Dulin et al. 2018).
In this way, the productive pathway competes with the abortive and scrunching pathways to regulate transcriptional output. These processes are regulated, at least in part, by the composition of the promoter sequence (Hsu et al. 2003, 2006, Chen et al. 2021, Saecker et al. 2021). The sections below outline the promoter elements involved in the regulation of initial RNA synthesis and the kinetics of promoter escape.
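The lac promoter pausing figures quoted above can be combined into a single estimate of how many transcripts remain paused after a first exit attempt. The short calculation below assumes the two percentages simply multiply, that is, that the 20% failure rate applies uniformly across the 80%-90% of transcripts that enter the +6 pause; the function name is hypothetical.

```python
def fraction_not_exiting_first_attempt(f_enter_pause: float, f_fail_exit: float) -> float:
    """Fraction of all transcripts still paused after a first exit attempt.

    f_enter_pause -- fraction of transcripts that enter the +6 pause
    f_fail_exit   -- fraction of paused transcripts that fail to exit at the first attempt
    """
    for f in (f_enter_pause, f_fail_exit):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return f_enter_pause * f_fail_exit


# Range quoted for the lac promoter: 80%-90% enter the pause, 20% of these fail to exit
low = fraction_not_exiting_first_attempt(0.80, 0.20)
high = fraction_not_exiting_first_attempt(0.90, 0.20)
print(f"{low:.0%} to {high:.0%} of all transcripts")
```

Under this assumption, roughly 16%-18% of all transcripts fail to leave the +6 pause at the first attempt, which illustrates how a single σ3.2-dependent barrier can measurably shape transcriptional output.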

Upstream promoter elements
The strength of the RNAP-DNA interactions involved in formation of both the closed and open complexes, as described above, has the biggest influence on the kinetics of promoter escape. Whilst it might be assumed that promoter sequences with a strong match to consensus motifs would result in high transcriptional output, these sequences can be nonfunctional in low NTP conditions because the interaction between promoter and RNAP fails to be disrupted, preventing progression to later stages of transcription (Vo et al. 2003, Hsu et al. 2006, Ko and Heyduk 2014). In support of this, the highest gene expression levels in vivo were observed when either the −35 box or the −10 box deviated from exact consensus (Henderson et al. 2019, Urtecho et al. 2019). This can be explained by the fact that stable open complexes require more translocation stress, and synthesis of a longer ITC, to disrupt the initiation complex and allow promoter escape. As a result, stable open complexes often show increased transcriptional pausing, higher yields of abortive products, and lower promoter escape efficiency (Ellinger et al. 1994, Vo et al. 2003, Duchi et al. 2016, Saecker et al. 2021). As promoter escape can be the rate-limiting step at strong consensus promoters in low NTP conditions, it is an important target for promoter engineering (Einav and Phillips 2019).
Regulation of promoter escape is both sequence and position specific. The information in the previous sections relating to formation of a stable transcription bubble can be considered in this context, with the same promoter elements now having an inverse effect on productive transcriptional output. Consistent with its strong influence on open complex stability, changes in the −10 box sequence had the largest effect on promoter escape kinetics at low NTP concentrations, once again highlighting its importance (Ko and Heyduk 2014). However, it should be noted that the two are not always inversely correlated. In one example, mutations at positions −10 and −6 affected the rate of promoter melting but did not influence the rate of promoter escape, showing that it is possible to affect one without changing the other (Ko and Heyduk 2014).
Independently of its influence on the stability of interactions with the sigma factor, the sequence of the −10 box also influences the kinetics of initial transcription. In the RNA exit channel, steric and electrostatic interactions cause the 5-mer ITC to be pushed against σ3.2, resulting in contacts between the nontemplate −10 nucleotide and σ2.3, and between template strand position −10 and σ3 (Zuo and Steitz 2015). These contacts are involved in displacement of σ3.2 from the exit channel, which is essential for further RNA synthesis and the progression of transcription. Thus, the bases of the −10 box influence both whether, and how quickly, the newly synthesized RNA proceeds through the RNA exit channel.

Discriminator element
The discriminator element also has a significant influence on the kinetics of initial RNA synthesis. Interaction of DNA template strand positions −4 and −3 with σ3.2 residues D514, D516, D517, and F522 helps to preorganize the template strand and force it to adopt an A-form helical conformation, which facilitates NTP addition by placing the strand in the correct position within the active site cleft (Zhang et al. 2012, Zuo and Steitz 2015). Mutations or deletions of amino acid residues that directly contact the DNA template strand impaired initial NTP binding (Pupov et al. 2014). Consistent with this, Henderson et al. (2017) showed that a consensus discriminator element leads to longer abortive products and higher promoter escape efficiencies, indicating that a strong discriminator prevents abortive initiation at early stages of initial transcription in low NTP conditions. Conversely, the presence of a Z-element-like sequence at position −4 is thought to affect transcriptional output by interacting with the β subunit of RNAP during initial transcription, resulting in stabilization of σ factor-dependent RNAP pausing and increased abortive initiation from a synthetic promoter (Yuzenkova et al. 2011). These observations show that changes in the discriminator sequence can influence promoter escape, and thus transcription rate. This is particularly significant for strong consensus promoters, which could be matched with discriminator elements that favour promoter escape in cases where high transcription rates are required.

Initially transcribed sequence
The composition of the initially transcribed sequence (ITS) from +1 to +20 can have a large influence on the outcome of transcription when NTP concentrations are low (Hsu et al. 2006, Davis et al. 2011, Heyduk and Heyduk 2018, Mazumder and Kapanidis 2019). Previous studies showed that changing the ITS altered promoter strength by more than 10-fold (Kammerer et al. 1986, LaFleur et al. 2022). In particular, the ITS has been shown to influence the ratio of abortive to full-length transcripts (Hsu et al. 2006). In general, a correlation between purine content of the ITS and high productive yield was observed, giving the first indication that this effect is sequence dependent (Hsu et al. 2006). Interestingly, whilst this influence is often independent of the sequence of the upstream promoter elements, certain upstream promoters can modulate the impact of the ITS through allosteric effects (Heyduk and Heyduk 2018). This adds a further level of complexity and again highlights that elements should not be considered independently of each other, even as far downstream as the ITS. The ITS should therefore be considered in its wider promoter context, implying that it should be defined as part of the promoter sequence.
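The purine-content correlation reported for the ITS is easy to quantify for a candidate sequence. The sketch below is a hypothetical helper, `its_purine_fraction`, assuming the ITS is supplied 5′ to 3′ on the nontemplate strand starting at +1 and that only the +1 to +20 window is of interest; the example sequences are invented. Purine content is a general correlate, not a predictor, since upstream elements can modulate the effect of the ITS.

```python
def its_purine_fraction(seq_from_plus1: str, window: int = 20) -> float:
    """Fraction of purines (A/G) in the ITS, read 5'->3' on the nontemplate
    strand starting at +1; only the first `window` bases (+1 to +20 by
    default) are used (hypothetical helper)."""
    its = seq_from_plus1.upper()[:window]
    if not its or set(its) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    return sum(base in "AG" for base in its) / len(its)


# Hypothetical ITS sequences, for illustration only
print(its_purine_fraction("AGGAGAAGGAAGAGCTCTAA"))
print(its_purine_fraction("TCCTCTTCCTTCTCGAGATT"))
```

Truncating to a fixed window keeps comparisons between constructs on the same +1 to +20 footing, even when longer downstream sequence is supplied.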
One of the ways in which the ITS controls promoter escape and initial transcription is through regulation of transcriptional pausing. Base preferences within the ITS have been identified that match known elongation pausing signals. The YG sequence motif is associated with long-lived initiation pauses, as it is difficult for RNAP to transcribe, resulting in pausing in the context of other stresses such as that caused by σ3.2 blocking the RNA exit channel (Bauer et al. 2016, Dulin et al. 2018). The requirement for displacement of σ3.2 from the RNA exit channel induces a significant pause at most promoter sequences, making this a common point of regulation. A study on the lac promoter showed that replacing +6T and +7G on the nontemplate strand with +6G and +7T resulted in a higher rate of initial transcription due to a shortening of the σ3.2-induced transcriptional pause (Dulin et al. 2018). Additional studies showed that a range of other combinations at these positions also resulted in reduced transcriptional pausing (Bauer et al. 2016, Duchi et al. 2016). It is thought that amino acid D446 of the β subunit of RNAP directly contacts ITS position +7G, influencing the transcriptional pause; with an alanine substitution at D446, very few transcripts were able to extend beyond six nucleotides (Dulin et al. 2018). This preference was promoter specific, again indicating that upstream promoter sequence has a significant effect on the influence of the ITS. At low NTP concentrations, YG repeats at any position of the ITS affected promoter escape when combined with stresses such as DNA scrunching or expansion of the transcription bubble (Dulin et al. 2018). This is because ITC translocation is unfavourable and rapidly reversible in the absence of the correct NTP, due to the resulting stress build-up, making the kinetics of promoter escape strongly dependent on NTP concentration (Dulin et al. 2018).
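Because YG steps (CG or TG, using the IUPAC code Y for a pyrimidine) in the ITS are associated with initiation pausing, a first screening step is simply to locate them. The sketch below is a hypothetical helper, `find_yg_steps`, assuming the ITS is given on the nontemplate strand starting at +1 and reporting the position of the pyrimidine in each step. The example fragments are invented and are not the lac ITS; they only mirror the kind of +6/+7 swap described above. Whether a given YG step actually causes a pause depends on promoter context and NTP levels.

```python
def find_yg_steps(seq_from_plus1: str) -> list:
    """Return the positions (+1-based, nontemplate strand) of the pyrimidine
    in each YG step (CG or TG) of an initially transcribed sequence
    (hypothetical helper)."""
    its = seq_from_plus1.upper()
    return [
        i + 1  # convert 0-based index to +1-based position (there is no position 0)
        for i in range(len(its) - 1)
        if its[i] in "CT" and its[i + 1] == "G"
    ]


# Hypothetical ITS fragments: a TG step at +6/+7, and the same sequence with
# those two bases swapped to GT
print(find_yg_steps("AAAAATGAAA"))
print(find_yg_steps("AAAAAGTAAA"))
```

Reporting positions rather than a simple count matters here, because the text indicates that a YG step at +6/+7 coincides with the σ3.2-dependent pause.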
The ITS also regulates promoter escape through interactions that contribute to the stability of the growing RNA-DNA hybrid. High stability in the first 10 bp, which is defined by the composition of the ITS, correlated with fast promoter escape. This can be explained by the fact that high stability reduces the chance of dissociation or misalignment of the transcript within the active site (Heyduk and Heyduk 2018). Furthermore, the structure of the RNA 5′ end, which is also defined by the ITS, plays a role in the kinetics of promoter escape through its interaction with σ3.2. Charge repulsion between the growing RNA and σ3.2 is involved in displacing σ3.2 from the active site cleft to allow elongation of the initial transcribing complex (Basu et al. 2014, Pupov et al. 2014). Further supporting this, studies show that the structure of the 5′ RNA end controls the rate of productive pause exit when the ITC reaches +6, the point at which σ3.2 is known to be displaced from the RNA exit channel (Dulin et al. 2018). Whilst the 5′ RNA structure is influential, amino acid substitutions in σ3.2 did not significantly affect these interactions, indicating that base-pair-specific interactions with specific amino acids of σ3.2 are not involved in this regulation (Pupov et al. 2014).

Conclusion
The knowledge presented in this review, relating to the influence of promoter sequence on the mechanisms and kinetics of transcription initiation, can be used to develop a more rational approach to promoter engineering. This knowledge-based approach can help us to design more predictable, tuneable, and orthogonal promoter sequences, which is important in the context of regulation of gene expression.
A highly significant take-home message is that transcription initiation is complex, and that underestimating this complexity undermines the predictability of promoter engineering. Whilst the transcription mechanism is largely conserved, the kinetics of transcription initiation are strongly promoter sequence dependent. Promoter activity, strength, and specificity are a function not just of DNA recognition and binding, but also of DNA melting and promoter escape, meaning that the stages of transcription initiation cannot be considered independently of each other. For example, a balance in RPo stability is required for functional transcription, as stable open complexes are often limited at promoter escape.
An interplay of different promoter characteristics and their interactions with the sigma subunit of the RNAP holoenzyme determines the resulting transcriptional output. Although each promoter region has a specific role in recognition, melting, and initial transcription, these interactions are additive and sit within the more general context of the promoter sequence, structure, and surrounding regulation, meaning that individual promoter motifs cannot be looked at independently when defining promoter strength and specificity. Building on suggestions from previous reviews, different classifications of promoter sequence could be envisaged that categorize promoters based on the combination of promoter elements they contain (Hook-Barnard and Hinton 2007). Such a classification could be used to define the promoter engineering strategy. For example, promoters with weak −35 and −10 consensus motifs are known to show high dependence on other promoter features, such as a consensus Z element. This could indicate which interactions should be targeted to tune promoter strength or specificity at a given promoter sequence.
Of equal importance, the knowledge gathered here shows that upstream and downstream regions must be defined as part of the promoter sequence, as they often have a significant influence on promoter output. This is supported by several studies, such as that of Davis et al. (2011), who characterized a set of promoter sequences that include a downstream insulator, following the observation that the 20–30 nucleotides after the transcription start site can also significantly affect promoter clearance, and thus transcription initiation rate (Davis et al. 2011, Balzer Le et al. 2020). Urtecho et al. (2019) studied the relationships between promoter elements by combining different elements across different promoter backgrounds. They showed that, whilst 74% of variation resulted from changes in the −35 and −10 boxes, 19% of variation could not be explained by linear relationships between elements and likely resulted from more complex relationships between them, further confirming the complexity of promoter regulation. However, these combinations focus on upstream promoter elements and still cover a limited number of combinations, meaning that the complexity is likely still underestimated.
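The distinction between variation explained by linear (additive) relationships and the remainder can be illustrated with a minimal sketch. It assumes a complete, balanced grid in which every hypothetical −35 variant is combined with every hypothetical −10 variant and log expression is measured for each pair. For such a grid, the least-squares additive model (grand mean plus a row effect plus a column effect) is given exactly by the row and column means, and any variance it leaves unexplained reflects non-additive interactions between elements. This is not the analysis pipeline used by Urtecho et al. (2019), and the grids in the test are invented for illustration.

```python
from statistics import mean

def additive_fit(levels):
    """Least-squares additive fit log_expr[i][j] ~ mu + a_i + b_j.

    Rows index hypothetical -35 variants and columns hypothetical -10
    variants. For a complete, balanced grid the fit is exact from means.
    """
    n_rows, n_cols = len(levels), len(levels[0])
    grand = mean(v for row in levels for v in row)
    row_means = [mean(row) for row in levels]
    col_means = [mean(levels[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[row_means[i] + col_means[j] - grand for j in range(n_cols)]
            for i in range(n_rows)]

def variance_explained(levels):
    """Fraction of variance (R^2) captured by the additive model."""
    pred = additive_fit(levels)
    grand = mean(v for row in levels for v in row)
    ss_tot = sum((v - grand) ** 2 for row in levels for v in row)
    if ss_tot == 0:
        raise ValueError("no variation to explain")
    # Residuals are the non-additive (interaction) component.
    ss_res = sum((levels[i][j] - pred[i][j]) ** 2
                 for i in range(len(levels)) for j in range(len(levels[0])))
    return 1.0 - ss_res / ss_tot
```

A purely additive grid such as `[[1, 2, 3], [2, 3, 4]]` gives an R² of 1.0, whereas `[[1, 2], [2, 5]]`, where the combined effect exceeds the sum of the parts, gives 8/9; the missing ninth is variance that only an interaction term could capture.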
Promoter prediction tools are already being improved to take such observations into account. A number of recent models have provided significant insight into the relative contributions of different promoter motifs to transcription rate, providing a valuable resource for studying the influence of promoter sequence on transcription initiation rate (Einav and Phillips 2019, Urtecho et al. 2019, Lagator et al. 2020, LaFleur et al. 2022). The Salis promoter calculator, which is based on a thermodynamic model of RNAP–DNA interactions, already considers a wide range of promoter elements in its training dataset and generates quantitative predictions of the favourability of different combinations of upstream and downstream elements. Such datasets can be used for forward engineering of promoter sequences, providing the opportunity to create promoters with desired expression properties or to remove sequences in the surrounding DNA that might interfere with transcription. This illustrates how an in-depth understanding of, and quantitative data on, the interactions involved in transcriptional control can be used to rationally design new promoter sequences (LaFleur et al. 2022). In a further example of how a data-driven approach can be used in promoter engineering, Wang and colleagues combined knowledge-based promoter design with AI-based generation of promoter sequences to significantly improve promoter performance, further highlighting this potential (Zhang et al. 2023).
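The general logic of forward engineering with a thermodynamic model can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the Salis promoter calculator's terms, parameters, or code: it assumes that each promoter element contributes a free-energy term (kcal/mol), that the terms sum, and that the relative initiation rate scales as a Boltzmann factor of the total. The element names, energy values, and the helper functions `relative_rate` and `closest_to_target` are all hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def relative_rate(dg_terms, temp_k=310.15, dg_ref=0.0):
    """Relative initiation rate from summed free-energy terms (kcal/mol).

    Hypothetical assumption: rate is proportional to exp(-dG_total / RT),
    reported relative to a reference promoter with total energy dg_ref.
    """
    dg_total = sum(dg_terms.values())
    return math.exp(-(dg_total - dg_ref) / (R_KCAL * temp_k))

def closest_to_target(candidates, target, **kwargs):
    """Return the design whose predicted relative rate is nearest the
    target on a log scale, i.e. with the smallest fold-change error."""
    return min(
        candidates,
        key=lambda name: abs(math.log(relative_rate(candidates[name], **kwargs) / target)),
    )

# Hypothetical per-element contributions (kcal/mol); not fitted values.
designs = {
    "weak_-10": {"-35": -1.5, "-10": -1.0, "spacer": 0.3, "downstream": 0.2},
    "consensus_-10": {"-35": -1.5, "-10": -3.0, "spacer": 0.3, "downstream": 0.2},
    "consensus_-10_poor_downstream": {"-35": -1.5, "-10": -3.0, "spacer": 0.3, "downstream": 1.5},
}
```

Taking the weak design as reference (`dg_ref=-2.0`), `closest_to_target(designs, 10.0, dg_ref=-2.0)` selects `"consensus_-10"`. Including a downstream term in every design reflects the point above that sequence after the transcription start site can shift output, so an otherwise strong core promoter may need its surrounding DNA adjusted to hit a target level.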
Whilst there is already a wealth of knowledge that can be applied to promoter engineering, the complexity of transcription initiation remains a limiting factor. Detailed mechanisms of RPo formation and the associated reaction kinetics continue to be elucidated, and, owing to the dynamic nature of transcription initiation, the structures of certain unstable intermediates are not fully resolved. Furthermore, many of these structural models of transcription initiation are based on a limited number of promoters under a single set of environmental conditions, limiting their broad applicability.
It is known that the mechanism of transcription initiation, and especially the reaction kinetics, are strongly dependent on both the promoter sequence and the environmental context. Significantly, under different environmental conditions certain promoter features become more or less influential, meaning that the importance of different promoter motifs for transcriptional output varies with the wider promoter sequence and environmental context. Whilst recent quantitative models of transcription rate allow increasingly accurate predictions of promoter strength, the complexity of the system, and this context dependence, mean that there are still significant limitations to our knowledge of transcription initiation. Nonetheless, recent rapid advancements in the field, coupled with high-throughput screening techniques and greater data processing capabilities, suggest that these gaps in our knowledge can be quickly filled, enhancing our ability to reliably predict and forward engineer improved promoter sequences for use in a variety of applications.

Figure 1. The stages of transcription initiation in prokaryotes. An overview of the interactions between the RNAP holoenzyme (ααββ′ωσ) and promoter elements at each stage of transcription is given. TSS denotes the transcription start site.

Figure 2. Interactions involved in formation of the closed complex. Both base-specific (solid lines) and nonspecific (broken lines) interactions between the RNAP holoenzyme (ααββ′ωσ) and promoter DNA are shown.
, Boyaci et al. 2019, Chen et al. 2020). Whilst it is now thought that specific −10 box features are not recognized by the open-clamp form of RNAP, which forms the closed complex, subsequent transient clamp closure results in recognition of −10 box sequences. If favourable sequence features are present, this clamp closure results in nucleation of transcription bubble formation, starting the transition to RPo. If −10 box features are unfavourable for transcription initiation, the clamp reopens and continues scanning the genome for potential promoter sequences (Feklistov et al. 2017).

Figure 3. The proposed intermediates in the transition from closed to open complex. The interactions between the RNAP holoenzyme (ααββ′ωσ) and −10 box promoter DNA that result in transcription bubble formation are shown. Solid lines indicate base-specific DNA contacts, whilst broken lines show non-base-specific interactions of the holoenzyme with the DNA phosphate backbone. Both the template strand (T) and nontemplate strand (NT) of the promoter are shown. The numbers above each promoter element indicate distance from the transcription start site.