Fail-safe genetic codes designed to intrinsically contain engineered organisms

Abstract One challenge in engineering organisms is taking responsibility for their behavior over many generations. Spontaneous mutations arising before or during use can impact heterologous genetic functions, disrupt system integration, or change organism phenotype. Here, we propose restructuring the genetic code itself such that point mutations in protein-coding sequences are selected against. Synthetic genetic systems so-encoded should fail more safely in response to most spontaneous mutations. We designed fail-safe codes and simulated their expected effects on the evolution of so-encoded proteins. We predict that fail-safe codes supporting expression of 20 or 15 amino acids could slow protein evolution to ∼30% or 0% of the rate of standard-encoded proteins, respectively. We also designed quadruplet-codon codes that should ensure all single point mutations in protein-coding sequences are selected against while maintaining expression of 20 or more amino acids. We demonstrate experimentally that a reduced set of 21 tRNAs is capable of expressing a protein encoded by only 20 sense codons, whereas a standard 64-codon encoding is not expressed. Our work suggests that biological systems using rationally depleted but otherwise natural translation systems should evolve more slowly and that such hypoevolvable organisms may be less likely to invade new niches or outcompete native populations.

the following biological and biochemical arguments to choose between some pairs of amino acids: M over I, to avoid re-engineering translation initiation and because isoleucine is physicochemically similar to leucine; K over N, because lysine can form intramolecular salt links and act as a general base; Q over H, because glutamine can both accept and donate hydrogen bonds, and to retain one amino acid with an amide side chain; and D over E, because aspartate acts as a biosynthetic precursor for methionine, threonine, and lysine [Kanehisa and Goto 2000]. We chose W over C due to the following code structure argument: F (UUY) and Y (UAY) are constrained to two codons each, such that either C codon (UGY) would be adjacent by point mutation to F or Y. W (UGG) is not similarly constrained, thus we chose W. We concede that convincing arguments can be made for including different combinations of amino acids, but any resulting codes using codons that are nonadjacent by point mutation will have the same predicted evolutionary rate.

Proposed set of tRNAs for instantiating RED15 and RED20, along with their genomic locations in E.
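The code-structure argument for choosing W over C can be checked mechanically. The following is a minimal sketch, not the authors' design software: it assumes the standard codon assignments (Phe UUY, Tyr UAY, Cys UGY, Trp UGG), and the helper names `point_neighbors`, `adjacent_amino_acids`, and `is_nonadjacent` are hypothetical, introduced here for illustration only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the conventional UCAG table order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_neighbors(codon):
    """Return the nine codons reachable from `codon` by one base substitution."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]


def adjacent_amino_acids(codon):
    """Amino acids (or '*') encoded by the point-mutation neighbors of `codon`."""
    return {STANDARD_CODE[n] for n in point_neighbors(codon)}


def is_nonadjacent(codons):
    """True if no two codons in the set are one point mutation apart."""
    chosen = set(codons)
    return all(n not in chosen for c in chosen for n in point_neighbors(c))


if __name__ == "__main__":
    for codon in ("UGU", "UGC", "UGG"):
        neighbors = sorted(adjacent_amino_acids(codon))
        print(codon, STANDARD_CODE[codon], neighbors)
```

Under these assumptions, both Cys codons have Phe and Tyr codons among their point-mutation neighbors, whereas UGG does not. `is_nonadjacent` applies the same test to a whole candidate codon set, which is the property the text relies on when it states that nonadjacent codes share the same predicted evolutionary rate.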

Supplementary Figure 4: Accounting for tRNA promiscuity has a limited effect on the expected evolutionary rates of fail-safe codes. (a) Effect of wobble decoding on RED20 and RED15. Table and mutation-distance network representations of the codes that result from considering wobble decoding (PROMISC20 resulting from RED20 and PROMISC15 resulting from RED15). Color signifies the rank-ordered hydropathy of the amino acids (I is most hydrophobic, R is most hydrophilic). Shaded boxes represent null codons that may be recognized by wobble decoding. A detailed discussion of tRNA promiscuous decoding is included in the Materials and Methods section. (b) Mean fitness traces for fail-safe codes with and without considering tRNA promiscuity (n = 1000 replicates). Bold lines indicate the mean fitness of a batch culture, averaged across replicates. Shaded regions represent the standard deviation of the mean fitness across replicates. The Standard Code is represented in blue. RED20 and RED15 are represented as the solid and dashed orange lines, respectively. PROMISC20 and PROMISC15 are represented with solid and dashed purple lines, respectively. (c) P_contain at steady state vs. f ) for invasive strains using fail-safe codes with and without considering wobble decoding (n = 300 replicates). Colors are the same as in Fig. 5b. Lighter shaded lines represent bootstrap-resampled traces of the data. Color represents P_contain magnitude, varying from 0 (yellow) to 1 (purple). P_contain reaches a steady-state value in the limit of large t.

include at least one amino acid in the RED20-supported amino acid set, except for 81H (bolded). 81H, however, is found in only two of 46 variants in the multiple sequence alignment.