Examining the Effects of Temperature on the Evolution of Bacterial tRNA Pools

Abstract The genetic code consists of 61 codons coding for 20 amino acids. These codons are recognized by transfer RNAs (tRNAs) that bind to specific codons during protein synthesis. All organisms utilize less than all 61 possible anticodons due to base pair wobble: the ability to have a mismatch with a codon at its third nucleotide. Previous studies observed a correlation between the tRNA pool of bacteria and the temperature of their respective environments. However, it is unclear if these patterns represent biological adaptations to maintain the efficiency and accuracy of protein synthesis in different environments. A mechanistic mathematical model of mRNA translation is used to quantify the expected elongation rates and error rate for each codon based on an organism's tRNA pool. A comparative analysis across a range of bacteria that accounts for covariance due to shared ancestry is performed to quantify the impact of environmental temperature on the evolution of the tRNA pool. We find that thermophiles generally have more anticodons represented in their tRNA pool than mesophiles or psychrophiles. Based on our model, this increased diversity is expected to lead to increased missense errors. The implications of this for protein evolution in thermophiles are discussed.


Introduction
Life exists in a wide variety of environments, even those that would be considered extreme relative to the environments humans occupy, ranging from deep-sea hydrothermal vents to arctic regions (Merino et al. 2019).Organisms found in these extreme environments play an important role in the cycling of nutrients and energy, and they are also important for the decomposition of organic matter (Merino et al. 2019).Thus, there is interest in studying the mechanisms and evolutionary adaptations that allow these organisms-broadly referred to as "extremophiles"-to not only survive but thrive in these extreme environments.Of particular interest is the ability of bacteria to thrive in extreme temperature environments.
A key process hypothesized to undergo adaptation to extreme environments is the process of protein synthesis.A large portion of a cell's resources is dedicated to protein synthesis, with about 50% in growing bacteria cells (Buttgereit and Brand 1995;Russell and Cook 1995).Although the basic process of protein synthesis is conserved across all domains of life, extremophilic organisms are hypothesized to have undergone many adaptations to maintain functional protein synthesis under conditions that would stress their mesophilic counterparts.For example, thermophilic (heat-loving) bacteria are hypothesized to have higher GC% in their protein-coding genes, ribosomal RNA (rRNA), and transfer RNA (tRNA) as an adaptation to higher temperatures, although the relationship between temperature and GC% of protein-coding genes appears to be due to shared ancestry rather than true adaptation.Relative to mesophiles, thermophiles and psychrophiles are hypothesized to invest less or more, respectively, in translation resources.Thermophiles are observed to have relatively few tRNA and ribosomal rRNA genes encoded in their genomes, while psychrophiles are observed to have a greater number of these genes.Relatedly, previous work found that the degree of codon usage bias varies across psychrophiles, mesophiles, and thermophiles, with psychrophiles having the most biased genomes and thermophiles having the least biased.This is hypothesized to reflect stronger (weaker) selection on translation efficiency at lower (higher) temperatures due to slower (faster) diffusion and chemical reaction rates (Vieira-Silva and Rocha 2010;Arella et al. 2021).
A crucial aspect of protein synthesis is the tRNA pool, making it a likely target for potential adaptations related to environmental factors.Notably, there have been numerous observations made about the apparent impact of environmental temperatures on the evolution of the tRNA pool.Aside from the apparent relationship of GC% and the number of tRNA genes with temperature, adaptation to extreme environments is also thought to occur via tRNA nucleotide modifications.For example, the modified nucleotide pseudouridine Ψ is more prevalent in psychrophilic bacteria, a supposed adaptation to maintain the flexibility of tRNA at colder temperatures.Additionally, the number of anticodons represented is hypothesized to be a possible adaptation related to temperature.Species generally use fewer than all possible 61 anticodons due to wobble matching at the third nucleotide of the codon.Previous work concluded thermophiles have the most diverse tRNA pools (i.e. the most anticodons represented) despite having the fewest number of tRNA genes, on average (Satapathy et al. 2010).As part of the same analysis, psychrophiles were found to have the least diverse tRNA pools, on average (Satapathy et al. 2010).This observation could have implications for the effectiveness of wobble base pairing as a function of temperature; however, this analysis was based on a limited set of species and failed to account for the shared ancestry of the species.
The structure of the genetic code and the currently established wobble rules are hypothesized to reduce the frequency and the impact of missense errors (Koonin and Novozhilov 2009).Despite this, mRNA translation is far more error-prone than either DNA replication or transcription, with an incorrect amino acid being incorporated into a peptide chain 1 out of every 1,000 to 10,000 codons (Ogle and Ramakrishnan 2005;Drummond and Wilke 2009).A key factor shaping missense error rates are the ratios of cognate to near-cognate tRNA for each codon (Shah and Gilchrist 2010;Mordret et al. 2019).Presumably, a more diverse tRNA pool would lead to higher missense error rates, potentially leading to other evolutionary adaptations to either reduce the frequency of missense errors or reduce the impact of missense errors when they occur.
Here, we examine the evolution of the tRNA pools across a phylogenetically diverse set of bacteria.We find that thermophilic bacteria generally have a greater number of tRNA anticodons represented in their genomes compared with mesophiles or psychrophiles.This increase in tRNA diversity is expected to increase the average missense error rate under standard wobble rules.We speculate the increased tRNA diversity may result from a lowered efficiency of wobble base pairing at higher temperatures, leading to fitness benefits for expanding the number of tRNAs represented in the tRNA pool.

Evolution of tRNA Pool across All Bacteria
Our final data set contained the tRNA gene copy numbers (tGCNs) for 334 mesophilic, 5 psychrophilic, and 67 thermophilic bacteria (Fig. 1a).Genome-wide GC% ranged from 26% to 74% with a median of 51%.The diversity of the tRNA pool (i.e. the number of anticodons represented in the tRNA pool) ranged from 25 to 45 with a median of 41, and the size of the tRNA pool (i.e. the total number of tRNA genes) ranged from 28 to 142 with a median of 48 (Fig. 1b and c).We note that the tRNA diversity for mesophiles appears to be bimodal with a small peak at 33 and a larger peak at 43 (Fig. 1b).Interestingly, many species with 33 anticodons represented in the tRNA pool are common psychrotrophs: bacteria capable of surviving at temperatures < 10 °C, but grow best in the range of mesophilic temperatures.This includes bacteria such as Psychrobacter arcticus, Pseudoalteromonas atlantica, and Shewanella halifaxensis.
We performed phylogenetic regressions to investigate the relationship between genomic GC%, tRNA diversity, total tGCN, and missense error rates ϵ M across bacteria (see Materials and Methods: Model of Translation Errors for details on estimating missense error rates from tGCN).These analyses indicate that higher GC% is associated with higher tRNA diversity, and both higher GC% and tRNA diversity are associated with higher missense error rates as determined by the median ϵ M for each species (Fig. 2).Interestingly, we observe that the relationship between GC% and tRNA diversity appears to be primarily driven by tRNA that recognizes GC-ending codons (not considering possible wobble; Fig. 3a).The representation of tRNA with anticodons recognizing codons ending in A or T/U varies with GC%, but only weakly (Fig. 3b).A summary of model comparisons can be found in the supplementary table S1, Supplementary Material online.
Testing for Systematic Differences in the tRNA Pool as Related to Environment Previous work found that thermophiles have a greater tRNA diversity compared with mesophiles and psychrophiles (Satapathy et al. 2010).This analysis was based on a limited set of bacteria and failed to control for the shared ancestry of the species.Controlling for the effects of GC content in our phylogenetic regression, we observe that tRNA diversity is, on average, higher in thermophiles.Thermophiles were found to have approximately one to two more anticodons represented in the tRNA pool, on average, compared with mesophiles with a similar GC% (Table 1; β Thermo = 1.77,P < 0.0001).In contrast, we find no evidence that thermophiles have fewer total tRNA genes, on average, compared with mesophiles with a similar GC% (Table 1; β Thermo = −2.90,P = 0.15).Although the slope estimate for psychrophiles was positive (Table 1; β Psychro = 3.91, P = 0.34), it was not statistically significant, such  -Summary of genome-wide GC%, total tGCNs, and tRNA diversity across the bacteria under consideration.a) Variation in genomic GC% and the tRNA pool vary across the species considered, as well as the evolutionary relationships of the species.Not all organisms studied are represented in the tree.b) Distribution of total number of tRNA genes across all species for which data were available.c) Distribution of the diversity of anticodons represented in tRNA genes across all species for which data were available.
Examining Effects of Temperature on Evolution of tRNA Pools that we have insufficient evidence to claim that psychrophiles have a greater number of tRNA genes than their mesophilic counterparts, contrary to previous claims.However, this is likely due to a lack of statistical power, as we only had five psychrophiles in our final set of bacteria.Unlike with tRNA diversity, we found no evidence that GC content had a significant relationship with the total number of tRNA found in a genome (Table 1; β GC = 3.96, P = 0.56).
Based on the general relationship between tRNA diversity and missense error rates observed across all bacteria, the increased diversity of thermophilic tRNA pools is expected to be associated with an increased missense error rate compared with psychrophiles and mesophiles.Our results support this hypothesis, as thermophiles had a median missense error rate approximately 5 × 10 −5 greater than a mesophile with the same GC% (Table 1; β Thermo = 0.000054, P = 0.012).

Discussion
Here, we present evidence that extreme temperature environments shape the evolution of the tRNA pool, a crucial feature of genome evolution and protein synthesis dynamics.Overall, we find that GC-rich bacteria generally have more anticodons represented in the tRNA pool (i.e. a higher tRNA diversity).This increase in tRNA diversity is primarily through tRNA that recognizes GC-ending codons.Notably, the presence/absence of tRNA with anticodons of the form CNN and GNN (where N is any of the four nucleotides) was previously observed to be variable across bacteria (Wald and Margalit 2014).Additionally, tRNA with anticodons of the form ANN are largely avoided due to modification to inosine (I), which could match with codons NNC, NNA, or NNU, increasing the potential for missense errors (Wald and Margalit 2014).As an NNC codon can only be decoded by an anticodon of the form GNN or INN (the latter of which may be largely avoided), and an NNG codon can only be decoded by an anticodon of the form CNN or UNN, it is unsurprising that the diversity of tRNA recognizing GC-ending codons scales with GC content.Based on a mechanistic model of ribosome elongation, a more diverse tRNA pool is expected to increase the frequency of missense errors.
We find that the number of anticodons represented in the tRNA pool is generally higher in thermophilic bacteria compared with mesophiles with a similar GC%.We speculate this could reflect the weakened efficiency of wobble base pairing at the third nucleotide due to the greater instability of chemical bonds at higher temperatures.Consistent with thermophiles generally having a more diverse tRNA pool, they also have a higher median missense error rate compared with mesophiles.We emphasize missense error rate estimates are based on a theoretical model of translation that includes parameters (e.g.wobble efficiency) estimated in Escherichia coli.The safest interpretation of our missense error rate estimates is the error rates expected in E. coli given its tRNA pool looked like a particular species.Similar to our hypothesis that wobble may be less efficient in thermophiles due to the higher temperatures, missense errors may be less likely at higher temperatures due to the instability in the codon-anticodon bonds.If increased temperatures do not offset the potential negative consequences of a more diverse tRNA pool, then the higher missense error rates may have multiple implications for the evolution of protein synthesis in thermophiles.
First, we speculate higher missense error rates due to a more diverse tRNA pool in thermophiles could be compensated for via the evolution of more accurate ribosomes or changes to tRNA modifications.Previous work concluded that hyperaccurate ribosomes substantially decrease growth rates due to kinetic inefficiency (Ruusala et al. 1984).The impact of higher temperatures on reaction rates may help offset some of the negative consequences of more accurate ribosomes.Such arguments are similar to those attempting to explain the apparent weaker codon usage bias observed in thermophiles compared with mesophiles and psychrophiles: as higher temperatures increase Top table: phylogenetic regression parameter estimates when comparing tGCN diversity across mesophiles, thermophiles, and psychrophiles while taking GC% into account.This can be represented by the formula tGCN diversity = β Meso + β Thermo x Thermo + β Psychro x Psychro + β GC x GC , where x Thermo and x Psycrho indicate if the bacterium is a thermophile or psychrophile (i.e.x Thermo/Psychro = 1 and 0 otherwise).This means the slope estimates β Thermo and β Psychro represent the mean value of tGCN diversity relative to a mesophilic bacterium.This regression assumed an OU model of trait evolution, which was 2.92 AIC units better than the same regression using a BM model of trait evolution.Middle table: phylogenetic regression parameter estimates when comparing total tGCN across mesophiles, thermophiles, and psychrophiles while taking GC% into account.This can be represented by the formula Total tGCN = β Meso + β Thermo x Thermo + β Psychro x Psychro + β GC xGC, where x Thermo and x Psycrho indicate if the bacterium is a thermophile or psychrophile (i.e.x Thermo/Psychro = 1 and 0 otherwise).This means the slope estimates β Thermo and β Psychro represent the mean value of the total tGCN relative to a mesophilic bacterium.This regression assumed an OU model of trait evolution, which was 53 AIC units better than the same regression based on a BM model.Bottom table: phylogenetic regression parameter estimates when comparing the median missense error rates across mesophiles, thermophiles, and psychrophiles while taking GC% into account.We note this can be represented by the formula Median Missense Error Rate = β Meso + β Thermo x Thermo + β Psychro x Psychro + β GC x GC , where x Thermo and x Psycrho indicate if the bacterium is a thermophile or psychrophile (i.e.x Thermo/Psychro = 1 and 0 otherwise).This means the slope estimates β Thermo and β Psychro represent the mean value of the median missense error rates relative to a mesophilic bacterium.This regression assumed an OU model of trait evolution, which was 124 AIC units better than the same regression based on a BM model.reaction rates, the ability of natural selection to act on translation efficiency via codon usage is weakened (Vieira-Silva and Rocha 2010; Botzman and Margalit 2011).The presence or absence of certain tRNA modifications in E. coli is known to impact decoding accuracy of the ribosome (Manickam et al. 2016).More in-depth studies of ribosome and tRNA function in extremophiles are needed to empirically determine if differences in the elongation process covary with environmental temperature.
Second, the higher missense error rates may have significant consequences for thermophilic proteome evolution.Thermophilic proteomes are known to contrast with their mesophilic counterparts.Thermophilic proteins generally accumulate amino acid substitutions at a slower rate than mesophilic proteins, hypothesized to be due to amino acid mutations being generally more destabilizing at higher temperatures (Zeldovich et al. 2007;Cherry 2010;Faure and Koonin 2015;Venev and Zeldovich 2018).As missense errors alter a protein sequence, missense errors could have a higher fitness burden in thermophiles.Thus, a higher missense error rate is expected to impose a greater fitness burden on thermophiles.A prominent hypothesis proposes natural selection against missense errors increases the frequency of "optimal" codons in both highly expressed genes and at functionally important positions in the protein (Drummond and Wilke 2008;Yang et al. 2010).Although thermophiles generally have weakly biased codon usage, little work has examined codon usage at functionally important sites.Future work will focus on detecting signals of adaptive codon usage in thermophiles related to selection against missense errors.Future research will also benefit from considering the presence and absence of tRNA modification enzymes, which can significantly impact protein synthesis (Nedialkova and Leidel 2015;Manickam et al. 2016;Diwan and Agashe 2018).We again emphasize that many of our analyses and hypotheses were extrapolated from observations in mesophiles.Comparative analyses of genome and proteome evolution as it relates to environmental temperature will greatly benefit from empirical research into the biochemistry and molecular biology of extremophilic organisms.

Obtaining tGCNs
We obtained tGCNs for 1,093 bacteria from a previous study (Diwan and Agashe 2018).The total number of tRNA genes for each bacterium was calculated by summing the tGCNs across all represented anticodons.The tRNA diversity value was calculated by summing the number of unique anticodons represented in the tRNA pool (e.g. a species with tRNA representing all 61 anticodons would have a tRNA diversity of 61).Organisms are categorized as psychrophilic, mesophilic, or thermophilic depending on their optimal growth temperature: psychrophiles prefer below 15 °C, thermophiles above 45 °C, and mesophiles between 15 and 45 °C (Taylor and Vaisman 2010).Optimal growth temperatures and GC content were taken from a database of optimal growth temperatures (Helena-Bueno et al. 2023).

Model of Translation Errors
To estimate codon-specific elongation rates and missense error rates (i.e.rate of incorrect amino acid incorporation per codon), we implemented a mechanistic model that takes in the tGCNs of a species (Shah and Gilchrist 2010).Standard wobble rules were applied for bacteria as previously done.For each codon in a species, the cognate, pseudo-cognate, and near-cognate tRNA were identified.A cognate tRNA is defined as a tRNA with an anticodon that either (i) perfectly matches a codon or (ii) matches a codon via wobble rules, with either case resulting in the correct amino acid being inserted into the growing peptide chain.Pseudo-cognates are anticodons that code for the same amino acid as the focal codon, but do not follow standard wobble rules.Near-cognates are anticodons having a single nucleotide mismatch, but do not code for the correct amino acid.For each codon i, we used a species tGCN and model of wobble pairing to estimate the cognate elongation rate R C (i), near-cognate elongation rate R N (i), and the missense error rate ϵ M (i) = R N (i)/(R C (i) + R N (i) + R D ), where R D is the expected ribosome drop-off rate (i.e. the rate at which premature translation termination occurs) set to be 3.146 × 10 −3 s −1 for all codons.Elongation rates for each species were scaled such that the harmonic mean across all codons was 12.5 amino acids per second as done previously.Additional model details can be found elsewhere (Shah and Gilchrist 2010).We emphasize that the parameters used to specify our model of translation are based on empirical measures taken from the mesophile E. coli, but this model is applied to the tGCN data for each bacteria under consideration.The safest interpretation of our missense error rate estimates is the expected error rates in E. coli if it had the same tRNA pool as the bacteria under consideration.

Comparing Data across Species
Phylogenetic linear regressions were used to test for differences in the tRNA pool as a function of genome-wide GC content and the classification of the species (thermophile, mesophile, psychrophile) while accounting for the shared ancestry of the bacteria (Felsenstein 1985).The phylogenetic tree was taken from a prior study of tRNA pool evolution across bacteria (Diwan and Agashe 2018).This phylogenetic tree was time-calibrated using the penalized likelihood FIG.1.-Summary of genome-wide GC%, total tGCNs, and tRNA diversity across the bacteria under consideration.a) Variation in genomic GC% and the tRNA pool vary across the species considered, as well as the evolutionary relationships of the species.Not all organisms studied are represented in the tree.b) Distribution of total number of tRNA genes across all species for which data were available.c) Distribution of the diversity of anticodons represented in tRNA genes across all species for which data were available.
FIG.2.-Across-species relationships between GC content, missense errors rates, and tRNA diversity.Phylogenetic regressions across all bacteria are represented by solid lines.* indicates parameter estimates significantly (P < 0.05) different from 0. a) Demonstration of a positive relationship between tRNA diversity and GC% across bacteria (Spearman correlation coefficient between PIC R = 0.521, P < 2.2e−16).b) Demonstration of a positive relationship between tRNA diversity and predicted median ϵ M (Spearman correlation coefficient between PIC R = 0.386, P = 7.412e−16).c) Demonstration of a positive relationship between the median missense error rate for each species and GC% across bacteria (Spearman correlation coefficient between PIC R = 0.400, P = <2.2e−16).

Table 1
Phylogenetic regression parameter estimates