Site-specific labeling of RNA by combining genetic alphabet expansion transcription and copper-free click chemistry

Site-specific labeling of long-chain RNAs with desired molecular probes is an imperative technique to facilitate studies of functional RNA molecules. By genetic alphabet expansion using an artificial third base pair, called an unnatural base pair, we present a post-transcriptional modification method for RNA transcripts containing an incorporated azide-linked unnatural base at specific positions, using a copper-free click reaction. The unnatural base pair between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) functions in transcription. Thus, we chemically synthesized a triphosphate substrate of 4-(4-azidopentyl)-pyrrole-2-carbaldehyde (N3-PaTP), which can be site-specifically introduced into RNA, opposite Ds in templates by T7 transcription. The N3-Pa incorporated in the transcripts was modified with dibenzocyclooctyne (DIBO) derivatives. We demonstrated the transcription of 17-, 76- and 260-mer RNA molecules and their site-specific labeling with Alexa 488, Alexa 594 and biotin. This method will be useful for preparing RNA molecules labeled with any functional groups of interest, toward in vivo experiments.


INTRODUCTION
RNA molecules have enormous versatility within living organisms. Structural and biological studies of functional RNA molecules will be facilitated by the site-specific labeling and probing of target RNAs without the loss of activity. The present methods for the chemical synthesis of labeled RNA molecules or post-transcriptional modifications of RNA are very restrictive. Currently, chemical synthesis is limited by the length of the RNA molecule, and post-transcriptional modifications are applicable only to the site-specific 5′- or 3′-terminal labeling of transcripts. In addition, various types of modifications involve immense amounts of time and effort to synthesize each modified component.
Recently, genetic alphabet expansion technology using unnatural base pairs has rapidly advanced. By creating an unnatural base pair that functions as a third base pair in replication and transcription, an artificial fifth or sixth base could be introduced into DNA and RNA molecules at desired positions. Over the past 15 years, Benner's, Romesberg's and our group reported several types of unnatural base pairs that function as a third base pair in replication, transcription and/or translation (16-27). Among them, we developed two types of unnatural base pairs, between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px) (18,19) and between Ds and pyrrole-2-carbaldehyde (Pa), that function in replication and transcription (Figure 1A) (17,28,29). The Ds-Px pair exhibits extremely high efficiency and specificity as a third base pair in replication. However, the nucleotide of Px is relatively unstable under basic conditions, and thus the Ds-Pa pair is useful for the site-specific incorporation of the Pa nucleotide into RNA by transcription, opposite Ds in DNA templates. Thus, the combination of the Ds-Px pair for the preparation of Ds-containing templates by polymerase chain reaction (PCR) with the Ds-Pa pair for transcription enables the site-specific labeling of large RNA molecules (Figure 1B) (29,30). Furthermore, we previously reported that, by attaching an ethynyl group to the Pa base, the triphosphate substrate of the ethynyl-Pa base can be site-specifically incorporated into RNA opposite Ds in DNA templates, by transcription using T7 RNA polymerase (31). The ethynyl groups in the transcripts can be modified by copper(I)-catalyzed azide-alkyne cycloaddition, using azide derivatives with functional groups. This genetic alphabet expansion method using the ethynyl-Pa base and Ds-containing DNA templates has high potential.
However, the click reaction using copper is disadvantageous for subsequent applications of the modified RNA molecules, because the toxic copper contamination of the oligonucleotides impedes in vivo applications (32)(33)(34)(35).
Here, we present the site-specific labeling of transcripts by the combination of genetic alphabet expansion and copper-free click chemistry. We chemically synthesized a triphosphate of 4-(4-azidopentyl)-pyrrole-2-carbaldehyde (N3-Pa) (Figure 1C) and performed the site-specific incorporation of N3-PaTP into RNA, opposite Ds in templates, by T7 RNA polymerase. The N3-Pa-containing transcripts were then efficiently modified with cyclooctyne-based probes. This method enables the site-specific labeling of large RNA molecules with any probes of interest.

MATERIALS AND METHODS
Chemical synthesis
1H-, 13C- and 31P-NMR spectra of compounds dissolved in CDCl3, DMSO-d6 or D2O were recorded on a BRUKER (300-AVM) magnetic resonance spectrometer. Coupling constant (J) values are given in Hz and are correct to within 0.5 Hz. All reagents were purchased from Aldrich, Nacalai Tesque, TCI (Tokyo Chemical Industry Co., Ltd.) and Wako (Wako Pure Chemical Industries, Ltd.). Thin-layer chromatography was performed using TLC Silica Gel 60 F254 plates (Merck). Nucleoside derivatives were purified with a Gilson HPLC system using a preparative C18 column (μ-BONDASPHERE, Waters, 19 mm × 150 mm). The triphosphate derivatives were purified by chromatography on a DEAE Sephadex A-25 column (300 mm × 15 mm) and a C18 column (CAPCELL PAK MG III, SHISEIDO, 4.6 mm × 250 mm). High resolution mass spectra (HRMS) and electrospray ionization mass spectra (ESI-MS) were recorded on a Varian 901-MS spectrometer and a Waters micromass ZMD 4000 mass detector equipped with a Waters 2690 LC system, respectively.

Biological experimental methods
DNA fragments containing Ds were chemically synthesized with an automated DNA synthesizer (model 392, PerkinElmer Applied Biosystems) or an Oligonucleotide Synthesizer nS-8 (Gene Design) using phosphoramidites of the natural and Ds bases (Glen Research). DNA fragments consisting of only the natural bases were synthesized as described above or purchased from Invitrogen or Gene Design. The chemically-synthesized oligonucleotides were purified by gel electrophoresis.

T7 transcription
Transcription for the 17-mer RNA was performed in a reaction buffer (20 µl), containing 40 mM Tris-HCl (pH 8.0), 24 mM MgCl2, 2 mM spermidine, 5 mM DTT and 0.01% Triton X-100, in the presence of 1 mM natural base substrates (NTPs), 0, 1 or 3 mM N3-PaTP, 2 µM DNA template, and 50 U T7 RNA polymerase, unless otherwise indicated. Transcription for the tRNA was performed in the presence of 2 mM natural NTPs, 0, 1 or 2 mM N3-PaTP, and 0.5 µM DNA template, and transcription for the 260-mer RNA was performed in the presence of 2 mM natural NTPs, 0, 0.25, 0.5, 1.0 or 2.0 mM N3-PaTP, and 0.1 µM DNA template. Internally 32P-labeled transcripts for the 17-mer or tRNA were prepared by transcription reactions including 2 µCi [α-32P]ATP or 3 µCi [α-32P]CTP (PerkinElmer), respectively. To prepare transcripts labeled with 32P at the 5′-end, the gel-purified transcripts were labeled with [γ-32P]ATP (PerkinElmer) and T4 polynucleotide kinase (Takara), after treatment with Antarctic phosphatase (New England Biolabs). After transcription at 37 °C for 3 h, the reaction was quenched by adding an equal volume of denaturing solution, containing 10 M urea in 1× TBE. The mixtures were heated at 75 °C for 3 min, and the transcripts were purified by denaturing gel electrophoresis (15% PAGE-7 M urea for 17-mer RNA, 10% PAGE-7 M urea for tRNA and 6% PAGE-7 M urea for 260-mer RNA).

Nucleotide-composition analysis of T7 transcription
The gel-purified transcripts were digested by 0.075 U/µl RNase T2 (Sigma-Aldrich) at 37 °C for 2 h, in 15 mM sodium acetate buffer (pH 4.5). The digestion products were analyzed by 2D-TLC, using an HPTLC plate (10 × 10 cm, Merck) with the following developing solvents: isobutyric acid-ammonia-water (66:1:33 v/v/v) for the first dimension, and isopropyl alcohol-HCl-water (70:15:15 v/v/v) for the second dimension. The TLC plates were analyzed with an FLA-7000 bioimager (GE Healthcare). The quantification of each spot was averaged from three data sets.

Gel mobility shift assay
Biotinylated 260-mer transcripts were detected by gel mobility shift assays, using streptavidin (Promega). Binding experiments were performed in a reaction buffer (10 µl), containing 10 mM Tris-HCl (pH 7.6), 50 mM NaCl, 10 mM EDTA, 2 pmol 32P-labeled gel-purified RNA transcript, and 100 pmol streptavidin. After an incubation at 20 °C for 1 h, the products were separated by gel electrophoresis and analyzed with an FLA-7000 bioimager.
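As a quick check on the binding conditions, the amounts given above can be converted into concentrations and a molar excess. The following is a minimal sketch, assuming the 10 µl reaction volume stated for the binding buffer; the function name is hypothetical and not part of the original protocol.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in µM. One pmol per µl is numerically equal to 1 µM."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul


# Amounts quoted in the gel mobility shift protocol (10 µl reaction).
rna_um = micromolar(2, 10)             # 32P-labeled transcript
streptavidin_um = micromolar(100, 10)  # streptavidin
molar_excess = 100 / 2                 # streptavidin over RNA

print(rna_um, streptavidin_um, molar_excess)
```

Under these assumptions the transcript is at 0.2 µM and streptavidin at 10 µM, a 50-fold molar excess over the RNA.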

T7 transcription using N3-PaTP and Ds-containing DNA template (35-mer) and click reaction
We first examined the selectivity of the incorporation of N3-PaTP into RNA opposite Ds in templates by T7 transcription. To determine the accuracy of the selectivity, full-length transcripts (17-mer) were prepared by using 35-mer Ds-containing DNA templates and analyzed by denaturing gel electrophoresis (17). Transcription was performed by using 1 mM natural base substrates (NTPs) and 0 or 1 mM N3-PaTP, including 2 µCi [α-32P]ATP, with 2.5 units/µl T7 RNA polymerase at 37 °C for 3 h. When using the short Ds-containing DNA templates, the transcription efficiency involving the unnatural base pair is generally lower, by around 45% in this case (Figure 2B, lane 4), relative to that using DNA templates consisting of only the natural bases (Figure 2B, lane 1). As we previously reported (17,29,31), the transcription efficiency with long DNA templates (>50-mer) containing Ds is as high as that with natural base DNA templates, as shown later. In the absence of N3-PaTP, the transcription efficiency was significantly reduced (Figure 2B, lane 3), and thus N3-PaTP was predominantly incorporated into RNA, opposite Ds in templates.
For the nucleotide-composition analysis, the transcripts were digested by RNase T2 to give nucleoside 3′-phosphates (16,17). Since A is incorporated after the unnatural base position in the transcripts, the incorporated N3-Pa or misincorporated natural base nucleosides are labeled as 3′-32P-phosphates by adding [α-32P]ATP in transcription (the stars (*) in the sequence shown in Figure 2A indicate the 32P-labeled positions). After the digestion of transcripts by RNase T2, the labeled nucleoside 3′-phosphates were quantitatively analyzed by 2D-TLC (Figure 2C). When using the Ds-containing template, 93% and 96% of N3-Pa were incorporated into the transcripts, in the presence of 1 and 3 mM N3-PaTP, respectively (Table 1). When using the template consisting of only the natural bases, no N3-Pa was observed in the digested products, even in the transcription using 3 mM N3-PaTP. Therefore, N3-PaTP was selectively incorporated opposite Ds in the template.
Next, we examined the copper-free click reaction of the transcripts (17-mer), using the fluorescent Alexa Fluor 594-DIBO Alkyne (the dibenzocyclooctyne derivative of Alexa 594, Alexa 594-DIBO) (Figure 2A). For the click reaction, we used the transcripts obtained by T7 transcription in the presence of 1 mM N3-PaTP and NTPs. The transcripts (5 µM) were treated with 5 or 10 molar equivalents of Alexa 594-DIBO, in 200 mM sodium phosphate buffer (pH 7.0) at 37 °C for 5 h. The products were analyzed by gel electrophoresis (Figure 3A). Both the reacted and unreacted transcripts were identified from their radioactivities on the gel (32P-detection in Figure 3A), and Alexa 594 was detected by 532-nm excitation (Ex 532 nm in Figure 3A). Most of the transcripts were reacted with at least five molar equivalents of the DIBO reagent (lane 8 in Figure 3A), and no products were observed from the transcripts obtained by using the natural-base template in the presence of N3-PaTP (lanes 5 and 6 in Figure 3A), confirming that N3-PaTP is rarely misincorporated into RNA opposite the natural bases in templates. From the 32P-detection band densities of the gel, around 93% of the transcripts were reacted with five molar equivalents of the DIBO reagent. Since the incorporation selectivity of N3-PaTP was 93% when using 1 mM N3-PaTP and NTPs, all of the N3-Pa in the transcripts could be modified.
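The stoichiometry and the yield argument in this paragraph reduce to simple arithmetic, sketched below. This is illustrative only: the function names are hypothetical, the transcript concentration is taken as 5 µM, and the calculation assumes that the gel-derived click yield and the TLC-derived incorporation selectivity are fractions of the same transcript population and can therefore be divided directly.

```python
def dibo_concentration_um(transcript_um: float, equivalents: float) -> float:
    """DIBO reagent concentration (µM) for a given molar excess over transcript."""
    return transcript_um * equivalents


def fraction_of_azide_modified(click_yield: float, incorporation: float) -> float:
    """Share of incorporated N3-Pa that reacted, capped at 1.0.

    Assumes both inputs are fractions of the same transcript population.
    """
    if not 0 < incorporation <= 1:
        raise ValueError("incorporation must be in (0, 1]")
    return min(click_yield / incorporation, 1.0)


# 5 µM transcript treated with 5 or 10 molar equivalents of Alexa 594-DIBO.
low_dibo_um = dibo_concentration_um(5, 5)
high_dibo_um = dibo_concentration_um(5, 10)

# About 93% of transcripts reacted; incorporation selectivity was 93% at 1 mM N3-PaTP.
modified = fraction_of_azide_modified(0.93, 0.93)

print(low_dibo_um, high_dibo_um, modified)
```

With the quoted values, the reagent concentrations are 25 and 50 µM, and the ratio of click yield to incorporation selectivity is 1, which is the basis for the statement that essentially all of the incorporated N3-Pa was modified.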
We also examined the copper-free click reaction of the 17-mer transcripts with Alexa 488-DIBO and biotin-DIBO (Figure 3B). Alexa 488 was detected by 473-nm excitation. The reactivity of biotin-DIBO was lower than those of Alexa 488- and Alexa 594-DIBO, and increasing the molar equivalents (10 eq.) of the DIBO reagent and the reaction time (19 h) improved the reactivity (lane 12 in Figure 3B).
Figure legend (fragment): Products were fractionated on a 20% denaturing polyacrylamide gel containing 7 M urea and were detected by their radioactivity (32P-detection) or fluorescence (Ex 473 nm for Alexa 488 and Ex 532 nm for Alexa 594), using an FLA-7000 imager in the FAM mode (excitation 473 nm/emission filter Y520) and the Cy3 mode. The amounts of click products (yields) were determined from the band intensities, and each yield was averaged from three data sets.

Site-specific labeling of tRNA molecules containing N3-Pa by the copper-free click reaction
Since we confirmed the high incorporation selectivity of N3-PaTP and the high reactivity of N3-Pa for the copper-free click reaction using short RNA fragments (17-mer), we next performed the site-specific labeling of a tRNA molecule, as a longer transcript (76-mer). To incorporate N3-Pa specifically at positions 35, 47 and 59 in yeast tRNAPhe transcripts, we prepared each DNA template containing Ds (35Ds, 47Ds and 59Ds) by ligation, using chemically synthesized DNA fragments. We introduced 2′-O-methyl-ribonucleosides at two positions from the 5′-termini of the template strands, to reduce the by-products obtained by non-templated nucleotide additions during T7 transcription (38). Transcription was performed in the presence of 0, 1 or 2 mM N3-PaTP and 2 mM NTPs. After purification by denaturing gel electrophoresis, the transcripts were reacted with 10 molar equivalents of each DIBO reagent, Alexa 488-DIBO or Alexa 594-DIBO, at 37 °C for 5 h, and the fluorescently-labeled products were fractionated on a gel and analyzed (Figure 4). Each position of the tRNA was effectively labeled with Alexa 488 and Alexa 594. The labeling efficiencies of each transcript obtained by using 2 mM N3-PaTP (lanes 13-15 and 18-20 in Figure 4) were higher than those obtained by using 1 mM N3-PaTP (lanes 3-5 and 8-10 in Figure 4). Slight misincorporations of N3-PaTP opposite the natural bases were also observed (lanes 2, 7, 12 and 17 in Figure 4).
To determine the precise incorporation selectivity of N3-PaTP at different concentrations (1 or 2 mM) during transcription, the transcripts obtained using the 47Ds and 59Ds templates were subjected to nucleotide-composition analyses (Figure 5 and Table 2). When using 1 mM N3-PaTP, the incorporation selectivity was around 85-86%, and when using 2 mM N3-PaTP, the incorporation selectivity increased to around 93-95%. However, the misincorporation rates were also increased with the higher concentration of N3-PaTP (Table 2), as shown in lanes 2, 7, 12 and 17 in Figure 4. The misincorporation rates per base were 0.059% and 0.12% for transcription reactions in the presence of 1 mM and 2 mM N3-PaTP, respectively.

Transcription and site-specific labeling of the 260-mer RNA
To demonstrate the wide range of applications of this labeling method, we examined the transcription of a much longer RNA molecule (260-mer) and its specific labeling at position 43. For the transcription, a 282-bp double-stranded DNA template containing Ds was prepared by fusion PCR, involving another set of an unnatural base pair between Ds and Px that exhibits high fidelity in PCR, using a plasmid DNA (Figure 6A) (30). To evaluate the misincorporation of N3-PaTP opposite natural bases in the long template, we also prepared a control 282-bp double-stranded DNA template consisting of only the natural base pairs. T7 transcription was performed using these templates with different concentrations (0, 0.25, 0.5, 1.0 or 2.0 mM) of N3-PaTP and 2 mM natural NTPs, at 37 °C for 3 h. After transcription, the transcripts were purified by denaturing gel electrophoresis. The 5′-phosphate of the transcripts was removed by phosphatase, and then the 5′-terminus of each transcript was 32P-labeled with T4 polynucleotide kinase and [γ-32P]ATP, for further analysis.
The 32P-labeled transcripts (1 µM) were reacted with 50 molar equivalents of DIBO reagents (Alexa 594-DIBO or biotin-DIBO) at 37 °C for 5 h (for Alexa 594-DIBO) or for 19 h (for biotin-DIBO), and the modified transcripts were analyzed by denaturing gel electrophoresis (Figure 6B for Alexa 594 modification and Figure 6C for biotinylation). The presence of higher concentrations of N3-PaTP during the transcription of the Ds-containing DNA template increased the labeling efficiency with Alexa 594-DIBO (lanes 6-9 in Figure 6B). Simultaneously, the misincorporation rates of N3-PaTP opposite the natural bases also increased with the higher N3-PaTP concentrations, as shown in the Alexa 594-DIBO modification of the transcripts from the control DNA template (lanes 1-5 in Figure 6B). To estimate the incorporation selectivity of N3-PaTP opposite Ds and the misincorporation rate of N3-PaTP opposite the natural bases in transcription, we quantified the biotin-labeled transcripts prepared with biotin-DIBO, by gel-shift assays using streptavidin (Figure 6C). When using 0.25, 0.5, 1.0 and 2.0 mM N3-PaTP, the selectivities of the N3-Pa incorporation opposite Ds were 60%, 70%, 79% and 86%, respectively (lanes 11-18 in Figure 6C), and the misincorporation rates were 0.015%, 0.023%, 0.042% and 0.077% per base, respectively (lanes 1-10 in Figure 6C).
At first glance, the selectivity and misincorporation rates were lower than those of the tRNA transcripts. This might be because of the low reactivity of biotin-DIBO. Thus, even for long RNA molecules, the Ds and N3-Pa base pair system functions with high selectivity in transcription and post-transcriptional modification by the copper-free click reaction.

DISCUSSION
We developed a site-specific post-transcriptional modification method by the genetic alphabet expansion system, using the Ds-Pa pair combined with copper-free click chemistry. The chemically synthesized nucleoside 5′-triphosphate of the azide-conjugated unnatural Pa base was very stable under physiological conditions, and was efficiently and selectively incorporated into RNA opposite Ds in templates by T7 transcription. The incorporated N3-Pa in transcripts can be modified by copper-free click chemistry with any DIBO derivatives. Due to the difficulties of phosphoramidite synthesis with azide-nucleoside derivatives (13-15), the chemical synthesis of RNA containing azide groups is very limited. Thus, our unnatural base pair system would be very useful for the site-specific incorporation of azide groups into RNA. Furthermore, long RNA molecules (more than 200 bases) could be modified by this system, within the constraint of the base pairing fidelity. For the transcription of long RNA molecules, their DNA templates can be prepared and amplified by PCR involving the Ds-Px pair, as shown in Figure 6A.
Figure legend (fragment): ... incubated at 20 °C for 1 h, and then analyzed on a 6% denaturing polyacrylamide gel. The amounts of the complex (yields) between biotinylated transcripts and streptavidin were determined from the band intensities, and each yield was averaged from three data sets.
When using an equivalent molar ratio of the natural and N3-Pa base substrates, the incorporation selectivity of N3-PaTP opposite Ds is 85-86% and the misincorporation rate of N3-PaTP opposite the natural bases is 0.059% per base. This fidelity is very similar to our previous data for the biotinylated PaTP, with 90% selectivity and a 0.059% misincorporation rate (17). Furthermore, these unnatural base misincorporation rates are lower than those of the noncognate pairings among the natural bases. Our previous data for the misincorporation rate of biotinylated uridine 5′-triphosphate (UTP) opposite the natural bases, except for A, showed 0.138% per base in T7 transcription under similar conditions (17). This natural base misincorporation rate is equivalent to that (0.12% per base) in the transcription using 2 mM N3-PaTP and 2 mM natural NTPs, in which the selectivity of N3-PaTP opposite Ds increases to 93-95%. Thus, depending on the desired objectives, the molar ratio between the unnatural and natural base substrates should be adjusted. If the modification efficiency is important, then the combination of 2 mM N3-PaTP and 2 mM natural base NTPs should be chosen. In contrast, if a single modification is required, such as for analyzing a single RNA molecule, then reducing the N3-PaTP concentration is recommended.
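To make this trade-off concrete, a per-base misincorporation rate can be converted into the expected share of transcripts that carry at least one off-target N3-Pa. The sketch below is a simplified model and not the analysis used in this work: it assumes that every position misincorporates independently at the same per-base rate and, as a further simplification, counts all positions of the transcript. The function name is hypothetical.

```python
def fraction_with_off_target(per_base_rate_percent: float, length_nt: int) -> float:
    """Expected fraction of transcripts with at least one misincorporated N3-Pa.

    Simplified model: independent, identical misincorporation probability
    at each of length_nt positions.
    """
    if length_nt < 0:
        raise ValueError("length must be non-negative")
    p = per_base_rate_percent / 100.0
    return 1.0 - (1.0 - p) ** length_nt


# Per-base rates and transcript lengths quoted in the text.
trna_1mm = fraction_with_off_target(0.059, 76)     # tRNA, 1 mM N3-PaTP
trna_2mm = fraction_with_off_target(0.12, 76)      # tRNA, 2 mM N3-PaTP
long_2mm = fraction_with_off_target(0.077, 260)    # 260-mer, 2.0 mM N3-PaTP

for label, value in [("tRNA 1 mM", trna_1mm), ("tRNA 2 mM", trna_2mm), ("260-mer 2 mM", long_2mm)]:
    print(f"{label}: {value:.1%}")
```

Under these assumptions, roughly 4% of 76-mer transcripts would carry an off-target azide at 0.059% per base and roughly 9% at 0.12% per base, while for the 260-mer at 0.077% per base the share rises to roughly 18%. This illustrates why lowering the N3-PaTP concentration matters more as the transcript gets longer.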
The ability to incorporate a new variable unit into biopolymers will stimulate researchers' imagination. Fluorescently-labeled RNA molecules are useful to examine interactions with other fluorescently-labeled molecules, such as tRNA-ribosome and tRNA-rRNA interactions in translation, by fluorescence resonance energy transfer (FRET) in an in vitro system (39). The injection of labeled RNA molecules into a cell will allow observations of the localization of functional RNA molecules at the single cell level (40). However, our method is still just shy of being applicable to single-molecule analyses using labeled large RNA molecules, like rRNA, because of the slight misincorporation of N3-Pa opposite natural bases.
An advantage of the unnatural base pair system is that it can be applied to in vivo experiments. Recently, Romesberg's group demonstrated the introduction of an artificial DNA containing their unnatural base pair into Escherichia coli (24). Thus, our RNA modification method could be used for analyzing the localization of target RNA molecules that are expressed in a cell: by transcription with N3-PaTP using a cell in which the Ds-Px pair is introduced into a specific position of a target gene, the target transcripts modified with N3-Pa could be observed by histological staining using DIBO fluorescence. To realize this in vivo imaging, an efficient incorporation method of fluorescent click chemistry reagents into cells without any negative effects should be developed in the future.
We now have two sets of labeled RNA molecules, in which one has N3-Pa at a desired position and another has ethynyl-linked Pa (Eth-C4-Pa) that we previously reported (31). The combination of N3-Pa and Eth-C4-Pa in two RNA molecules might be fixed by clicking with each other through the intermolecular RNA-RNA interaction. By using the mixture of N3-PaTP and Eth-C4-PaTP in transcription, the tertiary structure of a single RNA molecule could be fixed between two incorporation sites with a 50% chance if the N3-Pa and Eth-C4-Pa sites are spatially close to each other.
Another type of double-labeling in a single RNA molecule is also possible by this Ds-Pa pair transcription system. Modified Ds bases, as well as Ds, can be incorporated into a specific position of transcripts, opposite Pa in templates, by T7 transcription (41,42). One of the modified Ds bases is a strongly fluorescent 7-(2,2′-bithien-5-yl)-imidazo[4,5-b]pyridine (Dss) base (excitation max.: 370 nm, emission max.: 442 nm) (41,43), and FRET experiments of functional RNA molecules can be performed by site-specific double-labeling with Dss and other fluorescent groups with an excitation range of 440-550 nm, such as Cy3 and Alexa 555, which can be introduced into an N3-Pa position in transcripts. Another unnatural base, 2-amino-6-thienylpurine (s), is also fluorescent (excitation max.: 352 nm, emission max.: 434 nm) and can be incorporated into RNA opposite Pa by T7 transcription (42,44). The double-labeling of functional RNA molecules by the combination of fluorescent-Pa and s is in progress.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.