INRI-seq enables global cell-free analysis of translation initiation and off-target effects of antisense inhibitors

Abstract Ribosome profiling (Ribo-seq) is a powerful method for the transcriptome-wide assessment of protein synthesis rates and the study of translational control mechanisms. Yet, Ribo-seq also has limitations. These include difficulties with the analysis of translation-modulating molecules such as antibiotics, which are often toxic or challenging to deliver into living cells. Here, we have developed in vitro Ribo-seq (INRI-seq), a cell-free method to analyze the translational landscape of a fully customizable synthetic transcriptome. Using Escherichia coli as an example, we show how INRI-seq can be used to analyze the translation initiation sites of a transcriptome of interest. We also study the global impact of direct translation inhibition by antisense peptide nucleic acid (PNA) to analyze PNA off-target effects. Overall, INRI-seq presents a scalable, sensitive method to study translation initiation in a transcriptome-wide manner without the potentially confounding effects of extracting ribosomes from living cells.


INTRODUCTION
Protein synthesis is one of the most energy-consuming processes in living cells, making its precise regulation a crucial matter of cellular economy (1,2). While mRNA levels determined by RNA-sequencing (RNA-seq) are often used as a proxy for protein synthesis in global gene expression analysis (3), final protein abundance does not always correlate with mRNA levels (4)(5)(6)(7). This is due to a multitude of regulatory mechanisms, for example, direct control of the translational machinery (1) and post-transcriptional control of mRNAs by base-pairing small RNAs (sRNAs) (8) or intrinsic mRNA structure (9).
Over the past decade, ribosome profiling (Ribo-seq) has become a primary method to more directly measure protein synthesis in a transcriptome-wide manner (10)(11)(12). Ribo-seq is based on RNA-seq analysis of ribosome-protected fragments (RPFs), which are ribosome-bound mRNA fragments that survive nuclease treatment after cell lysis because they are covered and protected by translating ribosomes (13)(14)(15). In Escherichia coli, RPFs typically are 15-45 nt in length (16). Since each RPF represents the position of one ribosome, their sum not only reveals which mRNAs but also which parts of the coding sequence (CDS) of these mRNAs were being translated at the point of sampling. Thus, Ribo-seq can map the landscape of actively translating ribosomes in vivo to provide global information about translational pausing (16), stalling (17) and start site usage (18)(19)(20), as well as an estimate of protein copy numbers (6).
While Ribo-seq has greatly advanced the study of translation-related processes, the method has not been without limitations. Coverage of weakly expressed genes remains challenging, preventing many genes from being detected in common study designs. This includes certain gene classes with notoriously low expression under standard growth conditions, such as toxins whose expression is only triggered by specific stresses that lead to cell death or inhibition of growth (21). Similarly, Ribo-seq of microbes from important ecological habitats such as the human gut (22) remains difficult since many of them cannot be cultured in the laboratory, though recent efforts have started to tackle this challenge (7).
On the mechanistic level, Ribo-seq-based studies of molecules affecting translation can be hampered by cellular responses. For instance, the antibiotic retapamulin (RET) acts right after translation initiation when the ribosome is still at the start codon, which has enabled the global annotation of translation initiation sites (TISs) in Escherichia coli (18,23). However, this study required genetic inactivation of the ABC transporter TolC to prevent export of the antibiotic prior to it exerting its effect. While this was possible in E. coli, a lack of either genetic tools or knowledge of transporters would preclude similar studies in many other organisms. Lastly, since Ribo-seq is performed on living cells, it can be difficult to dissect direct and indirect effects on translation. This is exemplified by antisense antibiotics (24)(25)(26), whose import via carrier peptides broadly affects gene expression in addition to the desired antisense inhibition of the targeted gene of interest (27,28).
To overcome some of these limitations, we have developed in vitro Ribo-seq (INRI-seq) for the global study of translation in a cell-free manner. INRI-seq uses the commercially available PURExpress in vitro translation system combined with an in vitro-synthesized, fully customizable transcriptome for better control of individual mRNA levels. INRI-seq obviates the need for translation-modulating compounds to traverse cellular membranes and the extraction of ribosomes from a large number of living cells. As proof of concept, we apply INRI-seq to a synthetic E. coli transcriptome and show that the method faithfully validates known TISs and can be used to predict new TISs. In addition, we use the system to study the fidelity of translational inhibition by antisense peptide nucleic acid (PNA), demonstrating on-target specificity and defining base-pairing criteria that influence the off-target effects of these short antisense oligomers. INRI-seq bears great potential as a scalable alternate method to study translation control mechanisms and translation-modulating compounds in other organisms and organismal communities, including eukaryotes.

Design and synthesis of the synthetic transcriptome
To obtain a synthetic variant of the E. coli MG1655 transcriptome containing the first 51 codons of each gene, a pool of single-stranded DNA oligonucleotides was designed by extracting the sequence of the first 153 nt of each annotated open reading frame (start codon + 50 codons). At the 5′ end, the 30 nt upstream of each start codon were added as 5′ UTR and extended by addition of a T7 RNA polymerase promoter (GTTTTTTTTAATACGACTCACTATAGGG). At the 3′ end, a TAA stop codon was added and extended by the first 28 nt of the 3′ UTR of E. coli hns (TCTTTTGTAGATTGCACTTGCTTAAAAT). The final pool of 4386 DNA oligonucleotides (JVOpool-001) was ordered from Integrated DNA Technologies at a scale of 1 pmol/oligo and is listed in Supplementary Table S6.
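The oligonucleotide layout described above can be sketched in a few lines of Python. This is only an illustration of the design rule, not the script used to build JVOpool-001: the function name build_oligo and the toy genome are hypothetical, and the sketch handles only genes on the plus strand of the supplied sequence.

```python
# Flanking sequences as given in the text.
T7_PROMOTER = "GTTTTTTTTAATACGACTCACTATAGGG"  # 28 nt
HNS_3UTR = "TCTTTTGTAGATTGCACTTGCTTAAAAT"     # 28 nt
STOP_CODON = "TAA"


def build_oligo(genome, start, utr_len=30, cds_len=153):
    """Assemble one pool oligo (hypothetical helper).

    genome: plus-strand sequence as a string
    start:  0-based index of the first nt of the annotated start codon
    """
    if start < utr_len or start + cds_len > len(genome):
        raise ValueError("gene too close to the sequence boundary")
    utr5 = genome[start - utr_len:start]    # 30 nt of native 5' UTR
    cds = genome[start:start + cds_len]     # start codon + 50 codons
    return T7_PROMOTER + utr5 + cds + STOP_CODON + HNS_3UTR


# Toy example: 30 nt of "UTR", an ATG, and 150 nt of filler "CDS".
toy_genome = "A" * 30 + "ATG" + "C" * 150 + "G" * 10
oligo = build_oligo(toy_genome, start=30)
print(len(oligo))  # 28 + 30 + 153 + 3 + 28 nt
```

With this layout the start codon always begins at 0-based position 58 of the oligo (28 nt promoter plus 30 nt 5′ UTR), which is convenient when peak positions are later expressed relative to the start codon.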
Using the KAPA HiFi HotStart PCR kit (Roche), double-stranded DNA was generated from JVOpool-001. Per 50 µl of PCR reaction, 20 ng of JVOpool-001, 10 µl HF buffer, 1 µl dNTPs, 1 µl DNA polymerase and 2 µl each of the oligos JVO-18582 and JVO-18583 (10 µM each) were used. The PCR was run with the following protocol: 95 °C for 3 min; 10 cycles of 95 °C for 20 s followed by 60 °C for 20 s and 72 °C for 15 s; 72 °C for 2 min. A total of 400 µl of this PCR was run and purified using column-based cleanup (Macherey-Nagel).
The RNA of the synthetic transcriptome was obtained by in vitro transcription of 500 ng each of the double-stranded DNA pool in two 40 µl reactions using the MEGAscript T7 transcription kit (Thermo Fisher) according to the manufacturer's instructions, except that the reactions were performed overnight. The next day, the reactions were treated with 2 µl each of TURBO DNase (Thermo Fisher) at 37 °C for 15 min. The RNA was denatured by addition of 40 µl of 2× GLII loading buffer (95% formamide, 18 mM EDTA, pH 8, 0.02% (w/v) SDS, 0.025% (w/v) bromophenol blue, 0.025% (w/v) xylene cyanol) and incubation at 95 °C for 5 min, then placed on ice. To purify the RNA, it was separated by 6% denaturing PAGE with 7 M urea in 1× TBE, stained with EtBr, cut from the gel and eluted with 750 µl RNA elution buffer (0.1 M NaOAc, pH 5.4, 0.1% (w/v) SDS, 10 mM EDTA, pH 8) at 4 °C overnight. The next day, the RNA-containing supernatant was mixed with 800 µl acidic phenol-chloroform-isoamylalcohol and centrifuged at 4 °C for 15 min. The aqueous phase was collected, split into two tubes and precipitated with 1 ml each of ice-cold EtOH. Following precipitation at −20 °C for at least 1 h, the samples were centrifuged at 4 °C for at least 30 min, the supernatants discarded, the pellets washed with 400 µl ice-cold 70% EtOH and centrifuged again at 4 °C for 15 min. Finally, the supernatants were discarded and the pellets were dried at room temperature for 5 min and resuspended in 25 µl water. The resuspended pellets were pooled into one tube to obtain the final synthetic transcriptome. RNA purity and integrity were tested by denaturing urea PAGE.

In vitro Ribo-seq (INRI-seq)
To globally investigate translation in vitro, the PURExpress In Vitro Protein Synthesis Kit (New England Biolabs) was used to translate the synthetic transcriptome described above. To denature the synthetic transcriptome, it was incubated at 95 °C for 2 min, then placed on ice. For each sample, a 25 µl reaction containing 10 µl solution A (PURExpress), 7.5 µl solution B (PURExpress), 5 µl water and 2.5 µl of the denatured synthetic transcriptome (stock concentration 10 µM; final concentration 1 µM) was incubated at 37 °C for 15 min in order to allow the ribosomes to start unhindered translation. Then, 1.25 µl of retapamulin (stock concentration 100 µg/ml in DMSO; final concentration 5 µg/ml) was added to block translation and the incubation continued at 37 °C for 30 min. To stop the reaction, 175 µl of ice-cold stop buffer (20 mM Tris-HCl, pH 8, 100 mM NH4Cl, 10 mM MgCl2, 5 mM CaCl2, 1 mM DTT, 0.4% Triton X-100, 0.1% NP-40, 200 U/ml RNase-inhibitor, 5 µg/ml retapamulin) was added and the reaction put on ice.
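As a quick check of the concentrations quoted above, the dilution arithmetic can be written out as follows. This is a minimal sketch; the helper name final_concentration is hypothetical, and units are simply carried through (µl for volumes, µM or µg/ml for concentrations).

```python
def final_concentration(stock_conc, added_volume, volume_before):
    """Concentration after adding `added_volume` of a stock to `volume_before`.

    All volumes must share one unit; the result has the unit of `stock_conc`.
    """
    return stock_conc * added_volume / (volume_before + added_volume)


# Transcriptome: 2.5 µl of a 10 µM stock in a 25 µl reaction (22.5 µl other components).
transcriptome_uM = final_concentration(10.0, 2.5, 22.5)

# Retapamulin: 1.25 µl of a 100 µg/ml stock added to the 25 µl reaction.
ret_ug_per_ml = final_concentration(100.0, 1.25, 25.0)

print(transcriptome_uM)         # 1.0
print(round(ret_ug_per_ml, 2))  # 4.76
```

The retapamulin value of about 4.8 µg/ml, which accounts for the added volume, is consistent with the nominal final concentration of 5 µg/ml stated in the protocol.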
When the effect of PNAs on translation was investigated, the same protocol was followed with the following exceptions: The PNA stock solutions (10× the concentrations of the respective final concentrations) were incubated at 55 °C for 5 min. Then, 2.5 µl of the PNA stock solution and 2.5 µl water were added to 2.5 µl of the denatured synthetic transcriptome and the mixture annealed at 37 °C for 5 min, then placed on ice. The resulting 7.5 µl were then mixed with 10 µl solution A and 7.5 µl solution B and the translation carried out as above.
RPFs were generated by addition of 1.4 µl of MNase (420 U in total; Thermo Fisher) and incubation at 25 °C for 1 h. The reaction was quenched with 2 µl of 0.5 M EGTA, pH 8 and placed on ice. To isolate 70S monosomes containing the desired RPFs, 185 µl of the sample was loaded on a 10-55% sucrose gradient (20 mM Tris-HCl, pH 7.5, 100 mM NH4Cl, 10 mM MgCl2, 5 mM CaCl2, 1 mM DTT) formed in an open-top polyclear ultracentrifugation tube (Seton Scientific). The gradient was centrifuged at 4 °C and 35 000 rpm for 2.5 h using an SW 40 Ti rotor (Beckman Coulter). Afterward, the 70S monosome fraction (∼1 ml) was collected using a Biocomp Model 153 gradient station and snap-frozen in liquid N2.

Nucleic Acids Research, 2022, Vol. 50, No. 22 e128
The frozen monosome fraction was thawed on ice and split into two tubes. To each tube, 800 µl of acidic phenol-chloroform-isoamylalcohol was added, the samples vortexed for 15 s and then centrifuged at 4 °C for 15 min. The aqueous phases were collected and precipitated by addition of 1 µl GlycoBlue (Thermo Fisher) and 1.4 ml of ice-cold precipitation mix (30:1 EtOH:3 M NaOAc, pH 6.5) at −20 °C for at least 1 h. The samples were centrifuged at 4 °C for at least 30 min, the supernatants discarded, the pellets washed with 400 µl ice-cold 70% EtOH and centrifuged again at 4 °C for 15 min. Finally, the supernatants were discarded and the pellets were dried at room temperature for 5 min and one pellet resuspended in 25 µl water. The solution was then transferred to the other tube to resuspend the corresponding second pellet.
30 µl of 2× GLII loading buffer were added and the sample denatured at 95 °C for 5 min, then placed on ice. As ladder, 10 µl of microRNA Marker (New England Biolabs) were mixed with 2.5 µl Low Range ssRNA Ladder (New England Biolabs), 25 µl 2× GLII loading buffer and 22.5 µl water and then denatured like the sample. The RNA samples containing the RPFs and the ladder were separated by 15% denaturing PAGE with 7 M urea in 1× TBE. The RPFs were visualized by staining the gel with SybrGold (Thermo Fisher) and cut from the gel in the range of 15-45 nt. Gel extraction was performed as described above, except that 1 µl GlycoBlue was added during ethanol precipitation. Finally, the pellets were dried at room temperature for 5 min and one pellet resuspended in 25 µl water. The solution was then transferred to the other tube to resuspend the corresponding second pellet.

cDNA library preparation
The purified RPFs were subjected to library preparation for next-generation sequencing (vertis Biotechnologie). The RNA 5′ ends were phosphorylated with T4 polynucleotide kinase (New England Biolabs). Adapters were ligated to the 5′ and 3′ ends of the RNA and first-strand cDNA synthesis was performed using M-MLV reverse transcriptase. The cDNA was then amplified using a high fidelity DNA polymerase and purified with Agencourt AMPure XP beads (Beckman Coulter). The cDNA samples were pooled in equimolar amounts and the rDNA within the pool was depleted targeting 24 unique sequences (Supplementary Table S1) using a CRISPR-Cas9 protocol similar to DASH (29). The final pool was subjected to sequencing on an Illumina NextSeq 500 system using 75 nt single-end read length.

Quantification of INRI-seq data
Reads of the INRI-seq experiments were preprocessed and mapped against the oligo pool using tools from the BBTools suite (https://sourceforge.net/projects/bbmap/). To remove sequencing adapters and low-quality bases (Phred quality score <10) from raw reads, BBduk was used. The resulting reads were mapped against the oligo pool using BBMap (v38.79). Aligned reads were then assigned to genes and quantified using the featureCounts (v2.0.1) method of the Subread package (30).
To facilitate the visualization of the 3′ ends of reads in coverage plots, wiggle files were created with the bamCoverage (v3.4.3) tool from the deepTools package (31). Here, the counts per million (CPM) normalization option was used to normalize for read depth per library. To get exact positional information on the ribosome position, only the last base of the 3′ end of each aligned read was profiled using the --Offset option of bamCoverage. The resulting coverages of the wiggle files were then visualized using the Integrative Genomics Viewer (IGV) (32) and used for the identification of translation initiation sites.
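The idea behind the 3′ end profiling can be illustrated with a small, dependency-free Python sketch. It is not a replacement for bamCoverage: the function name three_prime_cpm and the tuple-based read representation are assumptions made for this example, and CPM is computed here against the total number of reads passed in.

```python
from collections import Counter


def three_prime_cpm(alignments):
    """Return {(reference, position): CPM} for the 3'-terminal base of each read.

    alignments: iterable of (reference, start, end, strand) with 0-based,
    half-open coordinates on the reference (hypothetical input format).
    """
    counts = Counter()
    total = 0
    for ref, start, end, strand in alignments:
        # The 3' end is the last aligned base on "+" and the first on "-".
        pos = end - 1 if strand == "+" else start
        counts[(ref, pos)] += 1
        total += 1
    if total == 0:
        return {}
    scale = 1_000_000 / total  # counts per million reads
    return {key: n * scale for key, n in counts.items()}


reads = [
    ("cspE", 50, 75, "+"),
    ("cspE", 48, 75, "+"),
    ("cspE", 40, 60, "+"),
    ("hfq", 30, 55, "-"),
]
profile = three_prime_cpm(reads)
print(profile[("cspE", 74)])  # 500000.0
```

Because reads are mapped against the sense-strand oligo pool, nearly all informative reads are expected on the "+" strand; the "-" branch is kept only for completeness.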

Analysis of translation inhibition by PNA
R packages from the tidyverse-suite and edgeR (v3.30.0) were used to analyze the in vitro translation of the oligo pool (33,34). A raw count table of quantified reads was imported into the edgeR environment. Oligos with <4.21 CPM were removed. This cutoff was calculated as 10/L, where L is the minimum library size in millions, as proposed by Chen et al. (35). Next, read counts were normalized with edgeR's trimmed mean of M values normalization method (36). Differential translation was measured by first estimating the quasi-likelihood dispersions with the glmQLFit function and then comparing conditions with the glmQLFTest function. Transcripts with false discovery rate-adjusted (37) P-values <0.001 and log2 fold changes >1 were considered to be differentially translated. In order to screen for possible off-targets of the acpP-PNA in the oligo pool, the PNA sequence was mapped against the whole oligo pool using SeqMap, accepting alignments with up to one mismatch (38).
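The CPM cutoff rule (10/L, with L the smallest library size in millions) can be illustrated as follows. The analysis itself was done in R with edgeR; this Python sketch only reproduces the arithmetic. The library sizes are hypothetical values chosen so that the cutoff comes out near 4.21, and the "at least min_samples libraries" rule in keep_oligo is an assumption, since the text does not state in how many libraries an oligo had to pass.

```python
def cpm_cutoff(library_sizes, min_count=10):
    """Cutoff in CPM equivalent to `min_count` reads in the smallest library."""
    smallest_in_millions = min(library_sizes) / 1_000_000
    return min_count / smallest_in_millions


def keep_oligo(cpm_values, cutoff, min_samples=1):
    """Keep an oligo if its CPM reaches the cutoff in at least `min_samples` libraries."""
    return sum(value >= cutoff for value in cpm_values) >= min_samples


# Hypothetical library sizes (total mapped reads per library).
library_sizes = [2_375_000, 3_100_000, 2_900_000]
cutoff = cpm_cutoff(library_sizes)
print(round(cutoff, 2))  # 4.21

print(keep_oligo([1.0, 6.5, 2.0], cutoff))  # True
print(keep_oligo([1.0, 3.5, 2.0], cutoff))  # False
```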

Metagene analysis
3′ end-aligned INRI-seq data were compared to an in vivo ribosome profiling dataset of E. coli pre-treated with retapamulin (18). The raw coverage files were downloaded from GEO: GSE122129 and adjusted manually so that both datasets show the 3′ end of ribosomal footprints. To generate the metagene plots of Figure 2A and Supplementary Figure S2F, footprints from −10 to +55 and from −30 to +180 relative to the start codon were extracted, respectively. The footprints were then normalized against the total read depth of these reads. The plots were then created using functions of the R packages ggplot2 and dplyr (34).
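A metagene profile of this kind can be sketched as a position-wise sum over genes, divided by the total signal in the window. The plots in the paper were produced in R; the Python function below is only an illustration. The name metagene and the list-based per-oligo profiles are assumptions; for the oligo design used here, the start codon would sit at 0-based index 58 (28 nt promoter plus 30 nt 5′ UTR).

```python
def metagene(profiles, start_index, upstream, downstream):
    """Aggregate 3' end counts around the start codon.

    profiles:    {gene: list of counts per oligo position} (hypothetical format)
    start_index: 0-based index of the first nt of the start codon
    Returns {offset: fraction of window signal}, offsets -upstream..+downstream.
    """
    lo = start_index - upstream
    hi = start_index + downstream + 1
    summed = [0.0] * (hi - lo)
    for counts in profiles.values():
        if lo < 0 or hi > len(counts):
            continue  # skip profiles that do not cover the whole window
        for i, value in enumerate(counts[lo:hi]):
            summed[i] += value
    total = sum(summed)
    if total == 0:
        return {i - upstream: 0.0 for i in range(len(summed))}
    return {i - upstream: value / total for i, value in enumerate(summed)}


toy = {"a": [0, 0, 0, 0, 4, 0], "b": [0, 0, 1, 0, 3, 0]}
result = metagene(toy, start_index=2, upstream=1, downstream=2)
print(result)  # {-1: 0.0, 0: 0.125, 1: 0.0, 2: 0.875}
```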

Computational analysis of translation initiation sites
To identify annotated TISs, an algorithm was implemented using custom python scripts. The algorithm scans through each of the oligo sequences base by base and looks for peaks. For annotated TISs, it searches for peaks in the region of 15 nt (±3 nt) inside the CDS, because that is where the 3′ end of the reads is located during translation initiation, when a ribosome is attached (39). If a peak was identified (CPM normalized depth >5) in at least two replicates, it was considered a TIS in the INRI-seq dataset (Supplementary Figure S6A, Supplementary Table S2).
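The calling rule for annotated TISs can be sketched as below. This is not the authors' script: the function names are hypothetical, a "peak" is simplified to any position whose CPM exceeds the threshold, and each replicate is represented as a plain list of CPM values per oligo position.

```python
def has_tis_peak(profile, start_index, cpm_min=5.0, distance=15, tolerance=3):
    """True if any position 12-18 nt downstream of the start codon exceeds cpm_min.

    profile:     list of CPM-normalized 3' end counts per oligo position
    start_index: 0-based index of the first nt of the annotated start codon
    """
    lo = max(start_index + distance - tolerance, 0)
    hi = start_index + distance + tolerance
    return any(value > cpm_min for value in profile[lo:hi + 1])


def call_annotated_tis(replicate_profiles, start_index, min_replicates=2, cpm_min=5.0):
    """Call a TIS if the peak criterion holds in at least `min_replicates` replicates."""
    hits = sum(
        has_tis_peak(profile, start_index, cpm_min=cpm_min)
        for profile in replicate_profiles
    )
    return hits >= min_replicates


def single_peak(position, height, length=242):
    """Toy profile with one non-zero position (for illustration only)."""
    profile = [0.0] * length
    profile[position] = height
    return profile


# Start codon at index 58; peaks 16 and 15 nt downstream in two of three replicates.
replicates = [single_peak(74, 8.0), single_peak(73, 6.0), single_peak(10, 0.0)]
print(call_annotated_tis(replicates, start_index=58))  # True
```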
In addition to the confirmation of annotated TISs, in vivo retapamulin-treated ribosome profiling had been used previously to search for alternative TISs, which do not coincide with annotated start codons (18). Custom python scripts were run to find alternative TISs and to compare them to the in vivo data (Supplementary Figure S6B). Each oligo was screened for peaks with >5 CPM outside its annotated TIS. Only peaks with a relative density (reads of peak divided by the total reads for the respective oligo) > 10% were considered. Additionally, only peaks with a start codon (ATG, TTG, GTG, CTG, ATC and ATT) 15 nt (±3 nt) upstream of the peak were considered. To prevent the identification of more than one alternative TIS for the same site, peaks in close proximity (up to 5 nt in distance) were merged and the highest respective peak was selected. Next, the reading frame of the alternative TIS relative to the annotated reading frame was noted and a stop codon downstream of the alternative TIS was searched for, designating the alternative TIS as in-frame or out-of-frame. Finally, the in vivo Ribo-RET data (18) was searched for the alternative TIS using the same settings.
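Three of the filters described above (start codon 15 ± 3 nt upstream of the peak, merging of peaks within 5 nt, and frame assignment) can be sketched as follows. The function names are hypothetical and the code is an illustration of the stated criteria rather than the scripts used in the study; the relative-density and CPM filters are assumed to have been applied beforehand.

```python
START_CODONS = {"ATG", "TTG", "GTG", "CTG", "ATC", "ATT"}


def upstream_start_codons(seq, peak_pos, distance=15, tolerance=3):
    """Start codons whose first nt lies 12-18 nt upstream of a peak position."""
    hits = []
    for d in range(distance - tolerance, distance + tolerance + 1):
        i = peak_pos - d
        if i >= 0 and seq[i:i + 3] in START_CODONS:
            hits.append((i, seq[i:i + 3]))
    return hits


def merge_peaks(peaks, max_gap=5):
    """Merge (position, height) peaks lying within max_gap nt; keep the highest."""
    clusters = []
    for pos, height in sorted(peaks):
        if clusters and pos - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((pos, height))
        else:
            clusters.append([(pos, height)])
    return [max(cluster, key=lambda p: p[1]) for cluster in clusters]


def is_in_frame(alt_start, annotated_start):
    """True if the alternative start codon shares the annotated reading frame."""
    return (alt_start - annotated_start) % 3 == 0


seq = "C" * 20 + "GTG" + "C" * 30
print(upstream_start_codons(seq, peak_pos=36))          # [(20, 'GTG')]
print(merge_peaks([(40, 6.0), (43, 9.0), (60, 7.0)]))   # [(43, 9.0), (60, 7.0)]
print(is_in_frame(136, 58), is_in_frame(62, 58))        # True False
```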

Start codon analysis
The upstream sequences (37 nt upstream) from our identified TIS peaks were screened for potential start codons (ATG, GTG, TTG, CTG, ATC, ATT), Shine-Dalgarno (SD) motifs (AGG, GGA) and A-rich (AAA) motifs by counting these motifs using a custom R script. Then the frequency of occurrence was visualized as log2 counts for annotated TISs detected by INRI-seq, annotated TISs not detected by INRI-seq, alternative TISs detected by INRI-seq, and peaks without called TIS in Supplementary Figure S3A-D using the ComplexHeatmap R package (v4.2) (40).
Additionally, the compositions of start codons were compared between annotated TISs and alternative TISs for both INRI-seq and Ribo-RET. For this, the abundances of start codons were divided by the total number of the respective TIS type and visualized as percentages (Supplementary Figure S3G).
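Positional motif counting of this kind can be sketched as shown below. The original analysis used a custom R script; this Python version is only an illustration, the function name count_motifs_by_position is hypothetical, and it assumes that all upstream sequences have the same length and share the same orientation relative to the peak.

```python
def count_motifs_by_position(upstream_seqs, motifs):
    """Count how often each motif starts at each position of the upstream window.

    upstream_seqs: equal-length sequences ending at the TIS peak
    motifs:        trinucleotides of interest
    Returns {motif: [count at position 0, count at position 1, ...]}.
    """
    if not upstream_seqs:
        return {motif: [] for motif in motifs}
    n_positions = len(upstream_seqs[0]) - 2  # number of trinucleotide start positions
    counts = {motif: [0] * n_positions for motif in motifs}
    for seq in upstream_seqs:
        for i in range(n_positions):
            triplet = seq[i:i + 3]
            if triplet in counts:
                counts[triplet][i] += 1
    return counts


MOTIFS = ["ATG", "GTG", "TTG", "CTG", "ATC", "ATT", "AGG", "GGA", "AAA"]
toy_seqs = ["AAAGGA", "CAAAGG"]
table = count_motifs_by_position(toy_seqs, MOTIFS)
print(table["AAA"])  # [1, 1, 0, 0]
print(table["AGG"])  # [0, 0, 1, 1]
print(table["GGA"])  # [0, 0, 0, 1]
```

For a heatmap on a log2 scale, a pseudocount would typically be added before taking the logarithm so that positions with zero counts remain defined; whether and how this was done in the original script is not stated.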

Prediction of RNA secondary structure
To investigate the strength of secondary structures of our mRNAs around their TISs, regions between −30 and +15 relative to the start codon were extracted and the minimum free energy (MFE) was predicted using RNAfold (v2.4.14) (41). As control, RNA sequences of the same length were generated using the dinucleotide content of the E. coli genome. Shuffling was done with the esl-shuffle command of the HMMER suite (v3.3.2) (42). Resulting MFE predictions were then plotted in combined beeswarm and boxplots (Supplementary Figures S2G and S3E). Wilcoxon signed rank tests were applied to test for significant differences between categories.

In vitro translation followed by western blotting
The acpP::gfp fusion transcript was obtained as previously described (43). E. coli genomic DNA was PCR amplified with JVO-18305 and JVO-18306 (Supplementary Table S7) and cut with NheI and NsiI. Plasmid pXG-10 (43) was also cut with NheI and NsiI and ligated with the acpP insert. After transformation and plasmid isolation, the plasmid containing the acpP::gfp fusion template was PCR amplified with JVO-18200 and revSF to obtain the dsDNA for in vitro transcription. After in vitro transcription and purification as described above, in vitro translation of the acpP::gfp fusion transcript was carried out as for INRI-seq with the following exceptions: Translation reaction volumes were scaled down to a total of 10 µl, the final transcript concentration was 100 nM, no RET was added to the reactions and translation was carried out for 2 h. After the incubation, 2.5 µl of 5× protein loading buffer (300 mM Tris-HCl, pH 6.8, 50% (v/v) glycerol, 10% (w/v) SDS, 500 mM DTT, 0.05% (w/v) bromophenol blue) was added and the samples incubated at 95 °C for 2 min and placed at room temperature. The samples were then separated by 12% SDS-PAGE followed by blotting onto a PVDF membrane (GE Healthcare). The membrane was stained with Ponceau S (Sigma-Aldrich) as loading control and incubated with a mouse α-GFP antibody (Roche). An HRP-coupled goat α-mouse antibody (Thermo Fisher) was finally used to develop the blot.

Design and workflow of INRI-seq
To obtain a synthetic transcriptome for INRI-seq, we designed a single-stranded DNA oligonucleotide pool consisting of 4386 unique oligonucleotides covering all annotated CDSs of E. coli K-12 MG1655 (NC_000913.3). For each CDS, we included the natural, i.e. annotated, start codon (generally ATG, GTG or TTG) followed by the first 150 nt of the open reading frame (ORF) (Figure 1). We also included the ribosome binding site by addition of 30 nt of the respective 5′ untranslated region (UTR), which is the average length of 5′ UTRs in E. coli (44). At the 3′ end, we added a strong TAA stop codon (45) and the 3′ UTR of the E. coli hns gene for uniform termination of translation. Importantly, this 3′ UTR doubled as primer binding site to generate a double-stranded DNA (dsDNA) pool. Finally, a T7 RNA polymerase promoter was added to the 5′ end of each oligonucleotide to facilitate in vitro transcription, while at the same time representing the second primer binding site for pool amplification. The sequence in between the flanking regions (T7 promoter and 3′ UTR) is completely flexible, and can be replaced with the sequence of any gene of interest. In sum, each oligonucleotide in the pool was 242 nt long (Figure 1), which is below the 250 nt limit for oligonucleotide pool production offered by some companies. Furthermore, the included 52 codons should allow all the steps of translation (initiation, elongation, and termination) to occur in a close-to-natural manner.
Following a limited-cycle PCR of the oligonucleotide pool, we used the resulting dsDNA as template for T7 in vitro transcription. RNA-seq analysis of the obtained synthetic transcriptome detected 4225 of the 4386 synthetic mRNAs at an abundance of >10 reads per million (median abundance: 125 reads per million), showing almost complete (∼96%) coverage of the ORFs included in the original DNA pool design (Supplementary Figure S1A).

Figure 1. Workflow of INRI-seq. INRI-seq employs a synthetic transcriptome, for the generation of which one ssDNA oligonucleotide is synthesized for each gene of E. coli MG1655. Every gene contains a T7 RNA polymerase promoter, 30 nt of its natural 5′ UTR, its natural start codon, the first 150 nt of its CDS, a TAA stop codon, and a 28-nt 3′ UTR. This pool of oligonucleotides is converted to dsDNA by limited cycle PCR and then transcribed in vitro to obtain the synthetic transcriptome. Following in vitro translation of the synthetic transcriptome, RET is added and the translation continued, leading to ribosomes stalled at the start codons of the transcripts. RNA not protected by ribosomes is cleaved with MNase and the reaction is sedimented on a sucrose gradient. After 70S monosome isolation, RPFs are extracted from the ribosomes and size-selected on a gel. The purified RPFs are converted to cDNA, rDNA is depleted and the resulting sample is sequenced. Attributing the read density to the 3′ ends of the sequenced reads allows identification of the ribosome positions and with it analysis of TISs.
Next, we applied the INRI-seq protocol to study the TISs of the synthetic E. coli transcriptome in vitro. We used RET, an antibiotic known to stall the ribosome at the start codon immediately after initiation (Figure 1) (18,23). Since an initiating ribosome protects 14-16 nt of the mRNA downstream of the first nucleotide of the start codon (39), the 3′ ends of RPFs will identify the TIS of the translated CDS following Ribo-seq (20). Using toeprinting, a previous study showed that RET-induced stalling also occurs in the in vitro system applied for INRI-seq, suggesting that in vitro TIS identification would be feasible (18). Since RET does not inhibit translation elongation (18), its addition to translating ribosomes causes polysomes to collapse into monosomes if translation is continued. We found that under our experimental conditions, addition of RET at a final concentration of 5 µg/ml for 30 min after initial translation for 15 min causes polysome collapse (Supplementary Figure S1B). This is considerably longer than the 5 min of RET treatment necessary in vivo (18), suggesting that in vitro translation of the shorter CDSs used here might be less efficient.
To purify RET-stalled monosomes, the associated transcripts were cleaved with MNase and the digested samples were subsequently run on sucrose gradients, as is common for bacterial Ribo-seq (Figure 1) (16). After collection of the 70S peak, RPFs were extracted with acidic phenol-chloroform followed by size selection (15-45 nt) of RNA in a denaturing polyacrylamide gel. These isolated RPFs can directly be used for library preparation and sequencing (Figure 1). Preliminary sequencing showed 24 fragments of ribosomal RNAs (rRNA; 9 fragments for 23S rRNA, 13 for 16S rRNA, and 2 for 5S rRNA) to be strongly enriched in this short RNA fraction (Supplementary Figure S1C). In the final INRI-seq protocol, these 24 rRNA fragments are depleted from the final cDNA library using a Cas9-based DASH protocol (29), which reduces rRNA reads from >90% to <1% (Supplementary Figure S1D, Supplementary Table S1). This in vitro Ribo-seq protocol enables us to globally study translation initiation in a synthetic E. coli transcriptome.

INRI-seq identifies annotated TISs
We sequenced five independent INRI-seq libraries (∼8 million reads per sample) to study the TISs of E. coli, as de- [...] (18). Manual inspection of the data further revealed that the position of RPF density generally overlapped very well between the in vitro and the in vivo data, as exemplified by the cspE transcript, where the only detected density was close to the start codon (Figure 2B). Indeed, in both the INRI-seq and Ribo-RET datasets, the distance of the RPF peak density to the cspE start codon was exactly 16 nt (Figure 2C). The overlap was also present for transcripts with unexpected density distributions such as hfq, for which two peaks 13 and 16 nt downstream of the start codon were detected by both methods (Supplementary Figure S2D, E). Globally, INRI-seq confirmed the annotated TISs of 3059 out of 4386 (∼70%) genes using stringent cut-off criteria (RPF peak with counts per million mapped reads (CPM) ≥5 within 12-18 nt of the start codon in at least two of the five replicates; Figure 2D, Supplementary Table S2). Using the same criteria, in vivo Ribo-RET verified only 780 (∼18%) annotated TISs (18) (Supplementary Table S2), suggesting that INRI-seq is more sensitive, likely due to more even transcript abundance in the pool. No TIS was detected for 812 transcripts that passed our detection thresholds (Figure 2D). For 99 of them, a peak was detected in only one of the five replicates, whereas the others did not show a clear peak that could be attributed to a TIS (Supplementary Figure S2F, Supplementary Table S2). This could be due to a variety of reasons, for example the requirement of translation activation by sRNAs or RNA folding, as in the case of certain riboswitches. We therefore predicted the strength of RNA secondary structures, which can hinder translation (9,46,47), from −30 to +15 nt with respect to the corresponding start codons of the annotated TISs (Supplementary Figure S2G).
The predicted secondary structures around the 1091 annotated TISs that INRI-seq did not detect (Figure 2D) were indeed stronger than those of the 3059 detected ones, which could explain why we failed to detect some of them (Supplementary Figure S2G). Furthermore, pseudogenes that are part of the NC_000913.3 annotation and that might not be translated are present in our transcriptome. We also cannot exclude misannotation of some of the start codons. We observed no major differences in the start codon sequences of the annotated TISs that we identified and the ones that we missed (Supplementary Figure S3A, B). Similarly, we detected the expected enrichment of Shine-Dalgarno (SD)-motifs upstream of these start codons, as well as an enrichment of A-rich sequences, which were recently shown to promote translation initiation (48).

INRI-seq confirms alternative TISs
Stalling ribosomes at start codons using antibiotics is a powerful way of analyzing where on a transcript translation initiation takes place (18)(19)(20). In addition to validating known TISs, these kinds of datasets can also be searched for new TISs, revealing alternative in-frame (Figure 3A) or out-of-frame TISs (Figure 3B) within annotated genes. To investigate whether INRI-seq is able to identify alternative TISs, we compared our data to in vivo Ribo-RET data (18). Of the 64 alternative TISs identified in vivo, we detected 51 (Supplementary Table S3). Most of the alternative TISs that were missed in the INRI-seq data set were >150 nt downstream of the annotated start codons and therefore not part of our synthetic transcriptome.
Next, we inspected some examples in more detail. The gene encoding arginine decarboxylase, speA, was previously shown to contain an in-frame alternative TIS that shortens the secretion signal-containing N-terminus of SpeA by 26 amino acids (aa) and leads to cytoplasmic rather than periplasmic localization of this isoform (18). In accordance with the in vivo Ribo-RET data, INRI-seq displayed a strong peak of RPF density 16 nt downstream of this alternative start codon (Figure 3C). Importantly, INRI-seq also detected the start codon of the 43-aa short yqgB gene encoded upstream of speA, which was not possible in vivo, most likely due to its low abundance (Figure 3C). Different from the alternative in-frame TIS in speA, the oppA gene encoding an oligopeptide uptake protein as well as the OppX RNA sponge (49,50) contains an out-of-frame TIS for an alternative 7-aa ORF (18). Our INRI-seq data supports this prediction, exhibiting the same RPF peaks as the in vivo data (Figure 3D). While it is unclear if this 7-aa peptide has a biological function, translation of this alternative frame could impact the translation of oppA, similarly to upstream ORFs that can regulate translation of their downstream genes (51). In conclusion, these examples illustrate how INRI-seq not only captures annotated TISs but also reports TISs that were not detected by in vivo Ribo-seq.

INRI-seq identifies putative new TISs
After confirming that INRI-seq is able to detect both annotated and known alternative TISs with high sensitivity, we asked whether this method also enables the detection of putative new TISs. To do so, we searched for peaks with a relative density of ≥10% of the total reads identified for a given gene. These were further filtered to only include peaks of ≥5 CPM, which had to be present in at least two of the five replicates. Using these criteria, we identified 918 putative new TISs, 279 of which could be assigned to genes for which INRI-seq did not detect the annotated TIS (Figure 2D, Supplementary Table S3). Similar to the annotated TISs (Supplementary Figure S3A, B), the putative new TISs showed an enrichment of ATG start codons, although alternative start codons such as GTG or TTG occurred in higher frequency (Supplementary Figure S3C). In contrast, no enrichment of SD-motifs upstream of our putative new TISs was observed. We did, however, notice a strong enrichment of A-rich sequences upstream of these TISs, which also promote translation initiation (48), indicating that these putative new TISs might be translated in absence of classic SD-motifs. Finally, we analyzed the INRI-seq peaks (relative density of ≥10% of the total reads, ≥5 CPM in at least two of the five replicates) for which no TIS was called (Supplementary Figure S3D). As expected, no start codons were enriched 12 to 18 nt upstream of these peaks. There was, however, an enrichment in ATG start codons and SD-motifs closer to the peaks, indicating that our distance cutoffs were too stringent to call these potential TISs.
Stable mRNA secondary structures around TISs negatively influence translation initiation (9,46,47). We therefore analyzed the predicted free energy from −30 to +15 nt with respect to the corresponding start codons of the putative new TISs. While the putative new TISs showed a slightly stronger predicted secondary structure when compared to annotated TISs, the structures were significantly weaker than the ones of random sequences (Supplementary Figure S3E). This suggests that the putative TISs might mediate translation initiation.
INRI-seq detected the majority of TISs reported in the in vivo Ribo-RET data, only missing the annotated TISs of 104 genes that could be verified in vivo (Supplementary Figure S3F). For only five genes, both datasets detected alternative TISs without a signal for their annotated TISs, whereas both the annotated and the same alternative TISs were identified for 39 genes in both studies. Finally, we examined the distribution of used start codons among the annotated versus the alternative TISs (Supplementary Figure S3G). While ∼88% of annotated E. coli TISs use ATG start codons, the alternative TISs of both INRI-seq and in vivo Ribo-RET identified considerably more non-ATG start codons: >50% of the alternative TISs use GTG, TTG, CTG, ATC or ATT as the start codon. While these less common start codons were less frequent than ATG among the alternative TISs, they still occurred more frequently 12-18 nt upstream of INRI-seq peaks than other trinucleotides, supporting their role in translation initiation (Supplementary Figure S3C). Overall, these data indicate that INRI-seq is able to detect the majority of annotated and alternative TISs previously reported in vivo. We then studied two interesting candidates of putative new TISs in more detail.

[Figure 3, caption partially recovered] (C) The second RPF peak identified in speA belongs to a known, in-frame alternative start codon (18). (D) INRI-seq detects two RPF peaks at the 5′ end of oppA. The second RPF peak identified in oppA belongs to a known, out-of-frame alternative start codon (18). (E) INRI-seq detects an RPF peak at the 5′ end of uidR. The detected RPF peak cannot be attributed to the annotated start codon of uidR but rather derives from an in-frame alternative start codon. (F) INRI-seq detects an RPF peak toward the 5′ end of wbbI. The detected RPF peak cannot be attributed to the annotated start codon of wbbI but rather derives from an out-of-frame alternative start codon. Gray and bold, annotated start codon. Yellow and bold, alternative start codon. Orange, SD sequence. Red, stop codon.
UidR is a transcriptional repressor of the uid operon, which is involved in transport and degradation of β-glucosides (52). INRI-seq showed a clear RPF density peak toward the 5′ end of uidR, but downstream of the annotated start codon (Figure 3E). The distance of the identified peak agreed well with an in-frame ATG four codons downstream of the annotated start codon, suggesting that the UidR protein might be 4 aa shorter than annotated (Figure 3E). Indeed, this shorter N-terminus is supported by a recent mass spectrometry study of the N-termini of E. coli proteins, which detected the exact peptide (MQTEAQPTR) that corresponds to the TIS identified by INRI-seq (53). By contrast, in vivo Ribo-RET (18) showed only minimal density in this position, leaving the TIS of uidR ambiguous.
Another interesting alternative TIS candidate identified by INRI-seq is present in wbbI (Figure 3F), the gene encoding β-1,6-galactofuranosyltransferase (54). As for uidR, neither INRI-seq nor Ribo-RET detected RPF density corresponding to the annotated start codon of wbbI (Figure 3F). While in vivo Ribo-RET identified an alternative GTG start codon two codons upstream of the annotated start codon, INRI-seq detected a peak corresponding to an out-of-frame ATG close to the 5′ end of the gene. This internal ORF encodes a 31-aa peptide, for which we could not find any homologous domains or sequences using Pfam or PHMMER queries, respectively (55,56). Follow-up in vivo studies will be necessary to reveal whether this peptide is produced within the cell. Interestingly, although in vivo Ribo-RET did not identify this alternative TIS, it revealed another out-of-frame internal ORF encoding a 47-aa peptide within wbbI (18). Due to its distance from the annotated start codon, INRI-seq was unable to detect this additional alternative TIS. Yet, it is intriguing that a single gene might contain two different out-of-frame ORFs. Overall, INRI-seq is able to validate known alternative TISs and allows identification of putative new TISs, which should facilitate follow-up in vivo studies.

Analysis of in vitro translation inhibition by PNAs
Translation inhibition by antisense oligomers, specifically peptide nucleic acid (PNA), is an upcoming field of antibiotics research. PNAs have the potential for species-specific killing, only targeting the pathogen while leaving beneficial microbiota untouched (24)(25)(26). Similar to bacterial sRNAs (8), antisense oligomers inhibit translation by blocking ribosome access to the ribosome binding site of their targets (57). Yet, the high sample volumes required by standard Ribo-seq studies pose a challenge for studies of translation control and potential off-targeting by such antisense antibiotics in vivo because of the high manufacturing costs of PNAs. We therefore decided to exploit the small-volume, cell-independent nature of INRI-seq to quantitatively study the effects of varying PNA concentrations on translation in a transcriptome-wide manner.
To test whether PNA-mediated translation inhibition can be analyzed in vitro, we adopted the well-established translation inhibition by a 10-mer antisense PNA that sequesters the start codon region of the E. coli acpP mRNA, which encodes the essential acyl carrier protein (Figure 4A) (28,58,59). To this end, we created a translational fusion transcript containing 50 nt of the 5′ UTR and the first 8 codons of acpP fused to the CDS of gfp. Upon addition of increasing concentrations of acpP-PNA, in vitro synthesis of the AcpP::GFP fusion protein was strongly reduced (Figure 4B). Importantly, this was not the case when a scrambled version of the PNA (acpP-PNA-scr), i.e. the same nucleobases as in acpP-PNA but in a randomly shuffled order, was added to the reaction. At an equimolar ratio of acpP-PNA to acpP::gfp (100 nM), translation was inhibited by ∼17%, whereas a 5-fold excess of acpP-PNA caused a ∼74% drop in translation (Figure 4C). These results are consistent with a previous in vitro study analyzing PNA efficiency in a less-defined reticulocyte extract using DNA as template (60) and demonstrate that our in vitro system can be used to study the effect of antisense oligomers.

Global analysis of PNA target inhibition using INRI-seq
Next, we set out to use INRI-seq to study the global effects of acpP-PNA on translation of its intended target as well as potential off-targets. We employed the same protocol that we used for the TIS analysis, except that we preannealed the synthetic transcriptome with varying concentrations of acpP-PNA or acpP-PNA-scr before starting the translation reactions. Similarly to the observed repression of the acpP::gfp fusion above (Figure 4B, C), INRI-seq reported a clearly reduced RPF density on the synthetic acpP mRNA with increasing acpP-PNA concentrations (Figure 5A [...] Figure S4A). This suggested that the in vitro translation kit probably contains traces of highly abundant RNAs such as acpP. This hampers accurate estimations of ratios between acpP-PNA and its targets. Still, even low acpP-PNA concentrations showed a strong effect on acpP translation.
In the E. coli genome, the acpP mRNA is co-transcribed with the downstream gene fabF (61). acpP-PNA has been shown to lead to rapid decay of both cistrons following translation inhibition in cells, even though fabF is not a target of the antisense oligomer (28). In agreement with the notion that fabF is down-regulated in vivo because its transcript is coupled to acpP, INRI-seq did not detect any change in translation of fabF in vitro (where the two transcripts are uncoupled), suggesting that INRI-seq faithfully reports only direct PNA-induced translational changes (Figure 5B).

INRI-seq identifies direct PNA off-targets
We then analyzed the full spectrum of acpP-PNA-regulated targets. While acpP clearly was the primary transcript affected by the antisense oligomer, translation of a few other transcripts was also reduced at higher acpP-PNA concentrations (Figure 5C). One of them was yqjF, which has a stretch of 9 complementary bases to acpP-PNA within its translation initiation region (Figure 5D). Only the very 5′ nucleotide of the PNA is a mismatch to yqjF (C:U). This suggests that at high concentration, acpP-PNA will also repress mRNAs that harbor a mismatch to the PNA sequence. Crucially, this was also true for the additional off-targets gpmM and ugpQ. The translation initiation regions of both genes are complementary to acpP-PNA with only one mismatch at the 3′ end of the PNA (Figure 5C, Supplementary Figure S4B, C). Finally, translation of the gpp transcript, which has complementarity to the antisense oligomer around its start codon, but with a T-G pair formed by the PNA's second position (Supplementary Figure S4D), was downregulated at 1 µM acpP-PNA as well, although it did not meet our strict cut-off conditions (adjusted p < 0.001, |log2 FC| > 1; Figure 5C).
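The significance cut-offs quoted above (adjusted p below 0.001 and an absolute log2 fold change above 1) amount to a simple two-condition filter. The sketch below shows one way to apply them; the class, the function names, and the example values are hypothetical and are not taken from the study's data or analysis code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneResult:
    gene: str
    log2_fc: float  # log2 fold change, PNA-treated versus control
    p_adj: float    # multiple-testing-adjusted p-value


def is_significant(
    r: GeneResult, max_p_adj: float = 0.001, min_abs_log2_fc: float = 1.0
) -> bool:
    """True if the gene passes both cut-offs, regardless of direction."""
    return r.p_adj < max_p_adj and abs(r.log2_fc) > min_abs_log2_fc


def downregulated(results: List[GeneResult]) -> List[str]:
    """Names of genes that pass the cut-offs and have reduced translation."""
    return [r.gene for r in results if is_significant(r) and r.log2_fc < 0]


# Hypothetical values, made up purely to exercise the filter.
example = [
    GeneResult("geneA", -2.5, 1e-6),  # passes both cut-offs, downregulated
    GeneResult("geneB", -0.8, 1e-6),  # fold change too small
    GeneResult("geneC", -1.5, 0.01),  # adjusted p too large
    GeneResult("geneD", 1.7, 1e-5),   # significant, but upregulated
]
print(downregulated(example))
```

A transcript such as gpp, described above as downregulated without meeting the cut-offs, would fall into the same category as the hypothetical geneB or geneC here.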
The downregulated off-target arnC exhibits full complementarity to the antisense oligomer (Figure 5C, Supplementary Table S5). However, to our surprise, the acpP-PNA binds ∼120 nt downstream of the start codon of arnC (Figure 5E) where antisense repression of translation initiation is unlikely (57,62,63). Interrogating our INRI-seq TIS dataset for alternative TISs within arnC, we indeed detected a peak in RPF density corresponding to an alternative in-frame TTG start codon, located a few nucleotides upstream of the acpP-PNA binding site (Supplementary Figure S5A). RPF coverage around the annotated TIS was not affected by PNA addition (Supplementary Figure S5B), while the alternative in-frame TIS clearly showed the expected depletion in RPFs when PNA was added (Supplementary Figure S5C). Since no TISs were identified for arnC by in vivo Ribo-RET (18), it remains unclear whether this alternative TIS is used in vivo.
Overall, these results suggest that, while target mismatches within the PNA sequence are generally thought to render the PNA inactive (58,64), mismatches at the 5′ or 3′ ends of the PNA are tolerated, leading to target regulation at high PNA concentration. In particular, of the 16 genes that harbor an acpP-PNA complementary sequence with one mismatch within their translation initiation region, six have the mismatch in position 1 or 10 of the PNA (Supplementary Table S5). Among those, the off-targets gpmM, ugpQ and yqjF are significantly downregulated, as discussed above. Of the other three, upp and yodC showed a tendency to be downregulated at the highest acpP-PNA concentrations, though they missed our statistical cut-offs (Figure 5C), while araH was not regulated at all. The ten genes whose translation initiation regions include a complementary sequence to acpP-PNA with one mismatch in positions 2-8 showed no significant regulation in the presence of the antisense oligomer. The same was true for all potential off-targets with two or more mismatches.
This analysis suggests that INRI-seq is able to accurately determine direct transcriptome-wide PNA-mediated inhibition of translation in response to defined PNA concentrations in a small reaction volume. Thus, it not only allows quantitative analysis of the direct effects of the PNA on the intended target transcript, but also enables the identification of PNA off-targets.
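The position-dependent mismatch analysis described above can be illustrated with a short scan for near-complementary sites. This is a sketch under stated assumptions rather than the procedure used in the study: the 10-mer in the test is a made-up placeholder and not the acpP-PNA sequence, the function names are hypothetical, binding is assumed to be antiparallel, sequences use the DNA alphabet, and mismatch positions are numbered from the 5′ end of the PNA.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(
    pna: str, region: str, max_mismatches: int = 1
) -> List[Tuple[int, List[int]]]:
    """Scan `region` for sites complementary to `pna`.

    Returns (start_in_region, mismatch_positions) for every window with at
    most `max_mismatches` mismatches. Positions are 1-based along the PNA,
    counted from its 5' end.
    """
    site = reverse_complement(pna)  # perfect target site as read on the mRNA
    n = len(site)
    hits = []
    for start in range(len(region) - n + 1):
        window = region[start:start + n]
        # Antiparallel pairing: window index i pairs with PNA position n - i.
        mismatches = sorted(n - i for i in range(n) if window[i] != site[i])
        if len(mismatches) <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def classify(mismatches: List[int], pna_len: int) -> str:
    """Label a site as 'perfect', 'terminal' (only end positions) or 'internal'."""
    if not mismatches:
        return "perfect"
    if all(pos in (1, pna_len) for pos in mismatches):
        return "terminal"
    return "internal"
```

Under the pattern reported above, sites labeled "terminal" would be the ones to inspect for regulation at high PNA concentration, whereas "internal" single-mismatch sites showed no significant regulation.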

DISCUSSION
In this study, we established INRI-seq, a cell-free method to globally study translation of a fully customizable synthetic transcriptome. Presenting two applications for this method, we identified known and putative new E. coli TISs and determined the fidelity of translation inhibition by antisense oligomers. Although optimized here for these purposes, the INRI-seq protocol is easily adaptable to other translation-related questions. For example, our protocol for PNA analysis can also be applied to the identification of the global targetome of sRNAs, including transcripts that have low abundance in vivo. This would allow the expansion of regulatory networks of known sRNAs and the investigation of the targets of currently understudied sRNAs. For single sRNA targets, in vitro translation has been successfully used to investigate sRNA-mediated regulation, indicating that a global study based on INRI-seq is feasible (63,65,66). Importantly, INRI-seq allows the synthetic transcriptome to include targets that carry mutations in the putative sRNA binding site based on in silico predictions (67). This enables testing of targeting specificity in the same experiment.
Since INRI-seq uses a fully synthetic transcriptome, users can analyze any desired sequence compilation. For example, the method can be used to compare translation of transcriptomes of related bacterial species or to analyze translation within complex microbial communities such as the human gut microbiome. INRI-seq also enables the analysis of translation of phage transcripts independent of their host. It also facilitates the study of overlapping CDSs, a common feature in the complex genome organization of phages (68), by disentangling them into single transcripts. Furthermore, mutational studies, in which a single or multiple transcripts are present in tens or even hundreds of different variants, could be designed to investigate the influence of, for example, synonymous mutations on translation.
The design of the synthetic transcriptome is only limited by the length of the oligonucleotides: currently, up to 350 nt is commercially possible. This restriction necessitates the disruption of bacterial operon structures, which are known to be an important factor in translation regulation (9,69,70). Yet, this property can also be exploited to discriminate between direct and indirect effects: for example, we show that acpP-PNA directly inhibits translation of acpP, but not of fabF ( Figure 5A, B). In contrast, in vivo the PNA influences the stability of the entire dicistronic acpP-fabF mRNA (27,28), which makes it harder to assess its direct effect on translation.
The number of oligonucleotides in a pool (i.e. the available sequence space) is less restricted than the length of the oligonucleotides, with several commercial providers able to synthesize extensive oligo pools. This enables the study of more complex synthetic transcriptomes, including eukaryotic ones. In this case, eukaryotic cell-free translation systems, such as those commercially available for human or rabbit (71), should be used in order to obtain faithful results for these organisms.
INRI-seq allows the study of translation in a quantitative manner, which is supported by the even coverage of transcripts in the synthetic transcriptome (half of the transcripts have concentrations within a ∼2-fold range of the mean concentration) (Supplementary Figure S1A) and read counts in the INRI-seq data (Supplementary Figure S2B). Concentration-dependent analysis of translation-modulating molecules becomes feasible, because their exact concentration in the reaction is defined. This is not true in an in vivo setup, where the intracellular concentration is unknown. In addition, modulators can be added at high concentrations, which might otherwise lead to premature cell lysis when working with living cells. The small volume of the INRI-seq reactions (25 µl) is another advantage, especially if the translation modulator is limited or particularly expensive, like PNAs or antibiotic lead structures. Moreover, since there is no cell barrier or other resistance mechanisms to consider, comparative analysis of different types of antisense oligomers becomes possible, even though there is no general mechanism for the cellular delivery of these compounds (24,(72)(73)(74)(75). This enables the study of the direct targets of such molecules, which is often difficult to disentangle in an in vivo setup due to the simultaneous occurrence of direct and indirect effects.
In this study, we globally analyzed TISs of a synthetic E. coli transcriptome. Compared to a technically similar in vivo study (18), we validated almost four times more annotated TISs (3059 vs. 780), demonstrating the high sensitivity of INRI-seq. In addition, we detected 51 out of the 64 alternative TISs (Supplementary Table S3) detected by in vivo Ribo-RET (18). INRI-seq further identified 918 additional putative alternative TISs for 876 genes, of which 279 are the only detected TISs for the respective genes (Figure 2D). These results are in agreement with a recent study suggesting that 5-12% of prokaryotic genes might have misannotated TISs (76). One reason for TIS misannotation is that gene prediction algorithms generally rely on SD-motifs upstream of the start codon (76,77). Yet, translation of transcripts lacking SD-motifs is relatively frequent (77). For example, it was recently shown in E. coli that A-rich sequences promote translation initiation, whereas SD-motifs influence initiation efficiency, but are not necessary to determine the site of initiation (48). For the putative new TISs we detected in this study, we do not observe enrichment of SD-motifs; instead, we find enrichment of A-rich motifs (Supplementary Figure S3C), which indicates that these TISs might be translated in the absence of classic SD-motifs. Still, in vivo validation, for example by mass spectrometry, is required to verify whether these TISs are used within the cell. Analogous to our application of RET to map TISs, we expect INRI-seq to be a useful tool to study the effects of other translation-inhibiting drugs, such as macrolides, lincosamides or tetracyclines. INRI-seq further complements inverse toeprinting, a related in vitro method that enables the high-throughput analysis of ribosome stalling sites and their upstream arrest sequences (78).

Nucleic Acids Research, 2022, Vol. 50, No. 22 e128

To exploit the cell-free, low-volume nature of INRI-seq, we globally analyzed the influence of an antisense PNA on translation. In line with in vivo data (27,28), we found that acpP-PNA is specific for its target, acpP. Downregulation of PNA off-targets that harbor mismatches in the complementary sequence requires substantially higher PNA concentrations. In addition, mismatches are only tolerated in the terminal nucleotides of the PNA. Gratifyingly, this position-dependent effect of mismatches on PNA-mediated translation inhibition was also recently shown for Salmonella using a combination of minimum inhibitory concentration and RNA-Seq experiments (79). Further, a recent study investigating the effects of acpP-PNA on global transcript levels in a uropathogenic E. coli (UPEC) strain observed off-target effects very similar to those identified by INRI-seq: of the four off-targets identified by INRI-seq, gpmM and ugpQ were among the top regulated transcripts upon addition of acpP-PNA (27). These findings should aid future PNA design by providing a framework to weigh the predicted specificity of PNAs according to their predicted off-targets.

Limitations of the study
Despite the clear value of INRI-seq for the in vitro analysis of translation, as discussed above, there are several caveats to consider. The commercial in vitro translation kit (PURExpress, New England Biolabs) used in the INRI-seq protocol is based on the PURE system (80). It consists of individually purified E. coli components required for protein synthesis, such as initiation, elongation and termination factors as well as aminoacyl-tRNA synthetases, tRNAs, amino acids and ribosomes. This reaction mix allows efficient translation of most transcripts. Nevertheless, the lack of factors like the translation elongation factor EF-P, which is necessary for efficient translation of polyproline stretches (81,82), could hamper the study of specific transcripts. Likewise, the concentrations of the loosely associated ribosomal proteins S1 and bL31, which promote translation of certain mRNAs and the association of the ribosomal subunits, respectively (83,84), might be unnaturally low due to loss during ribosome purification. This might lead to less efficient translation compared to the in vivo situation.
RNA-binding proteins such as Hfq, which facilitates base pairing between sRNAs and their targets (85), are not part of the PURE system, either. Therefore, these proteins need to be added to the reaction mixture when analyzing the effects of sRNAs, as previously done in an analysis of the sRNA GlmZ (66). sRNAs that activate or repress translation of certain mRNAs are also lacking from the kit, which can influence the translation of particular transcripts (8). Similarly, the concentration of the single transcripts in our pool does not reflect the ratios present in vivo. This might lead to unnatural competition for ribosome binding, favoring translation of certain transcripts compared to the in vivo situation. Finally, the presence of contaminating RNAs (Supplementary Figure S1C), which are likely co-purified with the individual proteins of the kit, has to be kept in mind when designing concentration-sensitive, quantitative experiments. Despite these drawbacks, the PURE system works very well for the applications tested here. It enabled INRI-seq to globally analyze TISs of E. coli as well as the direct effects of PNA with high sensitivity. By using other cell-free translation systems, such as self-generated S30 extracts from bacteria other than E. coli (86), we expect INRI-seq to be applicable to more distantly related organisms such as the Gram-positive model Bacillus subtilis.
The necessary shortening of the transcripts in the synthetic transcriptome has disadvantages as well. It may influence mRNA folding, which in turn can impact translational efficiency (9) by, for example, affecting start codon recognition or diverting ribosomes to alternative TISs. The shortened transcripts may also engage in aberrant interactions, which could lead to sequestration of ribosome binding sites. Further, the shortened ORFs present in our transcriptome will result in the translation of unphysiological, truncated proteins, which might influence translation in an unpredictable manner. Nevertheless, INRI-seq provides a rapid, flexible in vitro protocol to investigate a wide range of translation-related questions that cannot be easily addressed in vivo. We expect this method to be an attractive alternative to in vivo Ribo-seq, particularly when availability or delivery of a translation-modulating molecule such as an antisense oligonucleotide of interest is limiting.