High-throughput identification of RNA localization elements in neuronal cells

Abstract Hundreds of RNAs are enriched in the projections of neuronal cells. For the vast majority of them, though, the sequence elements that regulate their localization are unknown. To identify RNA elements capable of directing transcripts to neurites, we deployed a massively parallel reporter assay that tested the localization regulatory ability of thousands of sequence fragments drawn from endogenous mouse 3′ UTRs. We identified peaks of regulatory activity within several 3′ UTRs and found that sequences derived from these peaks were both necessary and sufficient for RNA localization to neurites in mouse and human neuronal cells. The localization elements were enriched in adenosine and guanosine residues. They were at least tens to hundreds of nucleotides long as shortening of two identified elements led to significantly reduced activity. Using RNA affinity purification and mass spectrometry, we found that the RNA-binding protein Unk was associated with the localization elements. Depletion of Unk in cells reduced the ability of the elements to drive RNAs to neurites, indicating a functional requirement for Unk in their trafficking. These results provide a framework for the unbiased, high-throughput identification of RNA elements and mechanisms that govern transcript localization in neurons.


INTRODUCTION
In a variety of cell types across a range of species, thousands of RNA molecules are asymmetrically distributed within cells (1)(2)(3)(4). The localization of many of these RNAs is critical for specific cellular functions and developmental patterning (2,5,6). In general, the localization of these RNAs is thought to be controlled by sequence elements, often termed 'zipcodes', that mark an RNA as one to be transported to a specific subcellular location (1,7,8). These sequences are often found in the 3 UTR of the localized transcript and function through the recruitment of RNAbinding proteins (RBPs) that mediate the transport (9). With the exception of a handful of examples (10), the identity of the localization regulatory sequence and the RBP that recognizes it are unknown.
Massively parallel reporter assays (MPRAs) have been used to identify regulatory mechanisms underlying a variety of gene expression regulatory processes including transcription (11), RNA splicing (12,13), RNA stability (14), lncRNA nuclear localization (15,16) and protein abundance (17). These methods test thousands of potential regulatory sequences in parallel in a single experiment. Sequences to be tested can be random or naturally occurring elements drawn from existing genomes. These sequences are integrated into reporter constructs, expressed in cells, and then populations of cells or nucleic acids are isolated based on the phenotype or process to be tested. The abundance of each member of the MPRA library can then be quantified, and MPRA members associated with particular phenotypes or processes are identified. MPRAs therefore allow the rapid and unbiased detection of active regulatory elements from broad sequence pools. Despite the ability of MPRAs to obtain insights into gene regulation, they have not yet been widely used in the study of RNA localization.
In recent years, transcriptomic approaches have been applied to the study of RNA localization, particularly in neuronal cells. In general, these approaches involve isolating and sequencing RNA from cells grown on microporous membranes that allow mechanical fractionation into cell body and neurite fractions (4,(18)(19)(20)(21)(22). High-throughput sequencing of RNA samples derived from these fractions have defined a fairly consistent set of transcripts that are neuriteenriched (19). However, mechanisms underlying how these transcripts become localized, including sequences within them required for localization, are almost completely un-

Identification of localized RNAs from published datasets
Data from 32 previously published subcellular RNAseq datasets (18,19) from both neuronal cell lines and primary mouse cortical neurons were analyzed to identify genes whose transcripts were consistently enriched in neurites. Neurite enrichments were calculated by comparing cell body and neurite expression values using DESeq2 (23), removing any gene that received fewer than 20 read counts in either the cell body or neurite sample. In order to facilitate the comparison of neurite enrichments across samples, enrichment values were then normalized within samples using Z-scores. To define a single enrichment value for each gene, we calculated the median Z-score for a gene across all samples. We then sorted genes by their median Z-score and chose genes that had both high median scores and were consistently neurite-enriched across all samples.
For subcellular fractionation, the cells were grown on porous, transwell membranes with a pore size of 1.0 micron (Corning, #353102). Prior to seeding the cells, the bottom of the membrane was coated with 0.2% Matrigel (Corning #356237, diluted in growth medium). Matrigel was allowed to polymerize for an hour at 37 • C and 5% CO 2 . One million cells were plated on each membrane and membrane was placed in one well of a deep six well plate (Corning, # 353502). One 6-well plate constituted one replicate. The cells were allowed to attach for an hour, and then induced to differentiate into a more neuronal morphology. To induce differentiation of the cell lines, medium was changed to differentiation medium (DMEM/F-12 with 1% Penicillin-Streptomycin solution and DMEM with 1% Penicillin-Streptomycin solution, for CADs and N2As, respectively) and cells were allowed to grow for 48 h in differentiation medium.
After 48 h, the soma and neurite fractions were harvested. The media was aspirated, and the membranes were washed once with 1 ml PBS. 2 ml PBS was added to the well and 1 ml PBS was added to the top (soma) side of the membrane. The top of the membrane was then scraped gently (to avoid puncturing the membrane) but thoroughly with a cell scraper and the soma fraction were collected into a 15 ml falcon tube on ice. After scraping, the membranes were removed from the insert using a razor blade and were soaked in 500 l RNA lysis buffer (Zymo, #R1050) at room temperature for 15 min. This constituted the neurite fraction.
In parallel, the 6 ml of soma suspension in PBS was spun down at 2000g at 4 • C and resuspended in 600 l PBS. 100 l of this soma sample was then used for RNA isolation using the Zymo QuickRNA MicroPrep kit (Zymo Research, #R1050). RNA was also isolated from the 500 l neurite lysate using the same kit. The efficiency of the fractionation was analyzed by western blotting using ß-actin (Soma and neurite marker, Sigma #A5441, 1:5000 dilution) and Histone H3 (Soma marker, Abcam #ab10799, 1:10 000 dilution) antibodies.

RT-qPCR quantification of reporter transcripts
Two wells of the 6-well plate served as one replicate for the RT-qPCR experiment such that one 6-well plate is seeded per reporter construct to be tested in triplicate. The cells were plated on the transwell membrane and fractionated as mentioned above. The RNA from both soma and neurite fractions was collected and purified using Zymo Quick-RNA MicroPrep kit (Zymo Research, #R1050) including the recommended on column DNase I treatment step for 20 min. cDNA from 100 ng of purified RNA from each fractions was synthesized using iScript™ Reverse Transcription Supermix (Bio-Rad, #1708841) in a 10 l volume with a longer incubation time of 30 mins for reverse transcription step. The cDNA was diluted to 20 l with RNAse-free water. 2 l of diluted cDNA is used as the template for qPCR to estimate the abundance of Firefly and Renilla transcripts in soma and neurite fractions. The qPCR reaction was performed using Taqman Fast Advanced Master Mix (Life Technologies) with differently labeled Taqman probe sets for firefly and renilla luciferases (Life Technologies) allowing the use of renilla luciferase RNA counts as an internal control. Firefly luciferase RNA was quantified using a VIC-labeled probe while Renilla luciferase RNA (control) was quantified using a FAM-labeled probe.The observed Ct values of the transcripts were within the recommended dynamic range of the assay. Reactions were carried out using the Biorad CFX384/CFX-Opus 384 thermocycler with the following conditions: UNG activation at 50 • C for 2 min, followed by polymerase activation at 95 • C for 30 s and 40 cycles of 95 • C for 5 s, and 60 • C for 30 s. Finally, a melting curve was performed by incubating samples at 65 • C for 15 s followed by a temperature gradient increase at 0.5 • C/s to 95 • C. Each sample was measured with three technical replicates. To ensure no contamination, no reverse transcriptase and no template controls were performed. Fold enrichment was calculated using the Ct method. MIQE guidelines were followed for all qPCR experiments.

Integration of plasmids into cultured cells
CAD and N2A LoxP cell lines were plated in 5 × 6-well plates at 5 × 10 5 cells per well in respective growth medium 12-18 h before transfection. Cells were then co-transfected with the cloned reporter plasmid mixed with 1% of plasmid expressing Cre recombinase. To transfect one well of a 6-well plate, 1.5 g of reporter plasmid and 15 ng of Creplasmid was mixed with 3 l Lipofectamine LTX reagent (Invitrogen, #15338100), 1.5 l PLUS reagent and 100 l Opti-MEM following the manufacturer's protocol. Cells were incubated with the transfection mixtures for 24 h, followed by the media change. The cells were incubated for an additional 24 h allowing for recovery and expression of the antibiotic resistance before addition of puromycin (2.5 g/ml). The cells were selected in the puromycin until the cells in the control wells died. The cells with stably integrated reporter plasmids were expanded in the growth medium with puromycin. Three-fourth of the expanded cell lines were frozen, and the remaining were induced with 1 g/ml doxycycline for 48 h to express the reporters. After induction, the cells were lysed with TRIzol reagent (Invitrogen, #15596018) and RNA was purified according to the manufacturer's instructions to analyze the number of integrants and diversity of the library.

smFISH visualization of reporter transcripts
CAD cells with integrated reporter constructs/peak oligos were seeded on PDL coated glass coverslips (neuVitro, #H-18-1.5-pdl) placed in the 12-well plate at ∼5 × 10 4 cells per well. Cells were allowed to attach for 1 h, followed by media change to serum depleted media with 1 g/ml doxycycline. Cells were allowed to express constructs and differentiate for 48 h.
After 48 hours, the media was aspirated, and cells were washed 1× with PBS. Cells were fixed in 3.7% formaldehyde in 1× PBS for 15 min at room temperature, then washed 2× with 500 l PBS. Next, cells were permeabilized with 70% ethanol at room temperature for 4 h or at 4 • C overnight. The cells were washed with smFISH Wash buffer A at room temperature for 5 min. In the meantime, hybridization reaction was prepared as follows: 2 l of Stellaris FISH Probe (against Firefly luciferase CDS conjugated with Quasar 670) were added to 200 l of Hybridization buffer. A hybridization chamber was prepared using an empty opaque tip box with wet paper towels below and parafilm covering the top. Hybridization reaction prepared as above (200 l per reporter construct) was added on top of the parafilm and the PDL coated glass coverslips were placed with cell side down facing the hybridization reaction. The hybridization chamber with the coverslips was incubated at 37 • C overnight. Next day, glass coverslips were transferred to fresh 12-well plates with cell side up and incubated with smFISH Wash Buffer A at 37 • C in the dark for 30 min. Samples were stained with fresh smFISH Wash Buffer A supplemented with 100 ng/ml DAPI (Sigma, #D9542-1mg) and incubated in the dark at 37 • C for 30 min. DAPI staining buffer was washed with fresh smFISH Wash Buffer B and incubated for 5 min at room temperature. Coverslips were then mounted upside-down onto slides with 6 l Fluoromount G (Southern Biotech) and sealed with nail polish. Slides were imaged on a widefield DeltaVision Microscope (GE) using 60X objective with 1.4 numerical aperture with 1.33 refractive index oil. The excitation and emission wavelengths used were 547 and 583 nm, respectively.

Oligonucleotide library design and synthesis
The code for designing the MPRA oligos is available at https://github.com/TaliaferroLab/OligoPools/blob/master/ makeoligopools/OligoPools shortstep 260nt.py. This script designed 260 nt oligonucleotides with a step size in between neighboring oligonucleotides of 4 nt. For a given gene, oligonucleotides were designed against the 3 UTRs of all protein-coding transcripts with well-defined 3 ends (as defined as not having the 'cds end NF' or 'mRNA end NF' tags). The polyA site of the UTR was also required to be positionally conserved in humans. This was assessed by getting the syntenic region surrounding the mouse polyA site in the human genome using UCSC liftOver. A polyA site was defined as conserved if there was a polyA site within 200 nt of the corresponding region of the human genome. Additionally, UTRs longer than 10 kb were excluded. UTRs of multiple filter-passing transcripts for a single gene were merged together to create a meta-UTR. Oligonucleotides were then designed against this meta-UTR with the addition of 260 nt upstream and downstream of the beginning and end of the UTR in order to give full coverage of the ends of the UTR with multiple oligonucleotides. 20 nt PCR handles were then added to the ends of every oligonucleotide. The pool of oligonucleotides was synthesized by Twist Biosciences.

Oligonucleotide library cloning
The oligonucleotide pool obtained from Twist Bioscience was resuspended in 10 mM Tris-EDTA buffer, pH 8.0 to a concentration of 10 ng/l. The libraries for each reporter (Firefly and GFP) were amplified by performing 2× PCR reactions 50 l each, using Kapa HiFi HotStart DNA Polymerase (Kapa Biosystems, #KK2601) according to the manufacturer's instructions. We used 10 ng of the original pool as the input DNA template in the reaction and performed 15 cycles in total. The oligonucleotide pool was amplified using primers specific to the 20-nt common sequence and an overhang containing sequence specific to the cloning site for each reporter. After amplification, the PCR reaction was digested with Exonuclease I, at 37 • C for 2 h to digest the single-stranded template and primers. The DNA was then purified using Zymo DNA Clean & Concentrator kit (Zymo Research, #D4013) according to the manufacturer's protocol.
The reporter plasmid was linearized by digesting it with PmeI (NEB) and BstXI (NEB) at 37 • C for 4 h to clone the library into the 3 -UTR of Firefly and GFP reporter, respectively. Digested plasmid DNA was gel extracted using Zymoclean Gel DNA Recovery Kit (Zymo Research, #D4008). The digested plasmid and amplified DNA library were assembled using Gibson Assembly reaction (NEB) using the insert: vector molar ratio of 6:1 at 50 • C for 30 min. The cloned reporter plasmid (∼200 ng DNA) was purified to get rid of excess salts and was then transformed into Escherichia coli using MegaX DH10B T1R Electrocompetent Cells (ThermoFisher, #C640003), using Biorad GenePulser electroporator. The transformed cells are grown in recovery medium at 37 • C for an hour and then plated on pre-warmed Luria broth (LB) agar-Carbenicillin 15-cm plates and incubated at 37 • C overnight. The next day, the colonies were collected in a culture tube on ice by scraping the plates using LB medium and spreader. The bacterial culture was pelleted at 4000 rpm for 20 min. The reporter plasmids libraries were purified using ZymoPURE Plasmid Maxiprep kit (Zymo Research, #D4203). Restriction digestion was performed to confirm that the plasmid library contains only a single insert of the right size.

Simulation of MPRA quantification
In order to determine if short read aligners like Bowtie2 would be able to accurately assign reads to a set of oligonucleotides that differ from each other by only 8 nt (4 nt on each end), we performed a simulation of the MPRA results using different step sizes. The code for this simulation is available at https://github.com/TaliaferroLab/OligoPools/ tree/master/ScreenSimulations.
Oligonucleotides were synthesized from simulated UTRs drawn from the sequence of mouse chromosome 1. UTRs were given random sizes between 500 and 5000 nt. Each oligo was then assigned a random abundance within the simulated MPRA sample. A fastq file of 10 million reads using these abundances was then created. The paired-end fastq files contained the first 97 nt of the oligonucleotide in the forward read and the last 91 nt of the oligo in the reverse read, mimicking the situation encountered in real data after trimming adapters. Deletions and mutations were modeled into the reads at per-base rates of 0.0001 and 0.002, respectively.
This simulated library was then aligned to the reference oligonucleotides using bowtie2 (24) and the following command: bowtie2 -q -end-to-end -fr -no-discordant -no-unal -p 4 -x Bowtie2Index/index -1 forreads.fastq -2 revreads.fastq -S sample.sam Quantifications of oligonucleotides from the aligned reads were then compared to the known quantifications produced in the simulation. This process was then repeated with the addition of the bowtie2 parameter -D 150.
Targeted RNA sequencing of MPRA library 500 ng total RNA from each soma and neurite fractions was taken to synthesized cDNA in a 20 l reaction using Su-perScript IV Reverse Transcriptase (ThermoFisher) according to the manufacturer's protocol, with primers specific to Firefly and GFP CDS containing an 8-nt unique molecular identifier (UMI) and a partial Illumina read 1 primer sequence. The incubation time at extension step (55 • C) was increased to an hour. Post reverse transcription, 1ul each of RNAse H and RNaseA/T1 mix was added directly into the RT-reaction and incubated at 37 • C for 30 min to digest the remaining RNA and RNA:DNA hybrids. The cDNA was purified using Zymo DNA Clean & Concentrator kit (Zymo, #D4013) using 7:1 excess of DNA binding buffer recommended for binding ssDNA.
For library preparation, each purified reporter cDNA reaction was split into five PCR reactions (4 l cDNA/PCR) and amplified using a reporter specific forward primer with Illumina sequencing adaptors and a reverse primer binding the partial Illumina read 1 sequence with the remaining sequencing adaptors using Kapa HiFi HotStart DNA Polymerase (Kapa Biosystems, #KK2601) using 18× cycles for GFP and 23× for Firefly reporter. The five PCR reactions per sample were pooled together and purified using double SPRI beads protocol. In the first purification round, 0.5× SPRI beads were used to get rid of longer DNA products. The supernatant from this purification was then removed and additional SPRI beads were added to bring the final overall concentration to 0.8×. DNA bound to these beads was then washed and eluted. The library was quantified using Qubit dsDNA HS Assay Kits (ThermoFisher, # Q32854) and size of the library was verified using Tapestation (Agilent D1000 ScreenTape).

Alignment and quantification of MPRA results
Adapter sequences were removed from reads using cutadapt (25). Specifically, the sequences GGCGGAAAGATCGC CGTGTAAGTTTGCTTCGATATCCGCATGCTA and CTGATCAGCGGGTTTCACTAGTGCGACCGCAA GAG were trimmed from the 5 ends of the forward and reverse reads, respectively. The trimmed reads were then aligned to the reference oligonucleotide sequences using bowtie2 and the parameters outlined above (including -D 150). Typically, 99% of reads had the expected adapters, and 95% of those aligned to the reference oligonucleotides.
The number of unique UMIs (the first 8 nt of the reverse read) for each reference oligonucleotide was then calculated using https://github.com/TaliaferroLab/OligoPools/ blob/master/analyzeresults/UMIsperOligo.py. These UMI counts were then given to DESeq2 (23) to quantify oligonucleotide abundances in each sample and identify oligonucleotides enriched in soma or neurite samples.

Definition of oligonucleotide unions
To define windows of localization activity within UTRs that contained large numbers of neurite-enriched oligonucleotides, the following procedure was used. First, the UTR was traversed from 5 to 3 until a neurite-enriched oligo (as defined by a DESeq2-calculated FDR of <0.01) was found. When a neurite-enriched oligo was encountered, a new 'window' was opened. This window was extended oligo by oligo until an oligonucleotide that was not neurite-enriched was encountered. This position was defined as x. The distance to the next neurite-enriched oligo was defined as y. All of the oligonucleotides between x and x + y were therefore by definition not neuriteenriched. If all of the oligonucleotides between x + y and x + 2y were neurite enriched, then the window was extended to x + 2y and the process was continued. If not, the window was closed. Code to define these windows can be found at https://github.com/TaliaferroLab/OligoPools/ blob/master/analyzeresults/findMPRAseqs.py.

Generation, culture and motor neuron differentiation of induced human pluripotent stem cells (hiPSC)
Derivation of induced pluripotent stem cells (iPSC) from a healthy donor was described previously (26). hiPSC were grown as previously described . Briefly, hiPSC were maintained in mTESR plus (Stem Cell Technology, #100-0276) medium on hESC-qualified Matrigel (Corning, #354277) coated cell culture plates. hiPSC exhibited a normal karyotype and were regularly tested for my-coplasma contamination and found negative. Direct differentiation was performed essentially as described previously (27). Briefly, cluster formation was initiated when iPSC cultures reached 70-90% confluency using microwells (Ag-grewells800, Stem Cell Technology) at 3000 cells per cluster in the presence of 10 uM Rock Inhibitor Y-27632 (RI, Tocris #1254). Twenty four hours later, clusters were transferred to suspension culture plates at ∼600 clusters per 6well, maintained on an orbital shaker platform (set at rotational speed 100) in a regular cell culture incubator and motor neuron differentiation was initiated. Mature motor neurons were harvested between days 18-26 for downstream analysis.

Genetic engineering of hiPSC
hiPSC were dissociated into single cells using TrypLE by incubation at 37 • C for 8 min. Cells were quenched with mTESR+ media, diluted in PBS, and counted using Mox-iGo II cell counter (Orflow). 2 × 10 6 cells were transferred into microcentrifuge tubes, spun down and washed twice with PBS. Cells were then prepared for nucleofection of TALEN mediated knock-in (KI) (i) or Cre mediated RCME (ii). Nucleofection was performed using P3 buffer following the Amaxa P3 Primary cell 4D-Nucleofector kit protocol (V4XP-3024) using program: CB-150. (i) 0.5 g of AAVS1-TALEN-L and AAVS1-TALEN-R (gift from Dr Danwei Huangfu, Addgene plasmid # 59025) as well as 2 g of targeting plasmid were used for nucleofection of hPSC cells. Nucleofected hPSC were plated in 10cm plates with 10 uM RI, 1× CloneR (STEMCELL # 05888) and 1 M SCR7 (Excess Bioscience #M60082). Forty eight hours after plating, blasticidin selection (10 g/ml) was performed for 10 days. Surviving clonal colonies were picked and expanded for characterization. (ii) For RMCE, 7 × 10 6 cells were electroporated with a constitutive Cre plasmid and RIPE cassette using a BioRad GenePulser electroporation system using an exponential decay with 250 V and 500 F settings conditions. Electroporated cells were plated in 10 cm plates with 10 M RI, and 1× CloneR. 48 hours after plating, puromycin selection (0.5 g/ml) was performed for 48 h. Clonal colonies were picked after ∼10 days and expanded for further characterization.

CRISPR-mediated removal of endogenous RNA localization elements
Two sgRNAs for each gene (Net1 : UUUUUACAGU GUAUCAUGUG and GGGUCAUGCUCUUUACAG GG, Trak2: CAUGUGCUAAUUUAUACACG and UG UCCCUAGAGAACACAAGC) were obtained from Synthego and dissolved in 1× TE buffer at a final concentration of 100 M. Purified Cas9 protein was also obtained from Synthego. RNP complexes were assembled using a 3:1 molar ratio of sgRNA:Cas9 at room temperature for 15 min. The RNP complexes were then co-electroporated with a plasmid encoding GFP into CAD cells using the Neon transfection system (ThermoFisher) using the following settings: 1400 V, 1 pulse for 30 ms. The cells were allowed to recover for 72 h and then GFP-positive single cells were isolated using a MoFlo XDP100 cell sorter. The single cells were allowed to grow for 2 weeks and then expanded for screening.

Screening of CRISPR clones
Genomic DNA from single cell clones was isolated using the PureLink Genomic DNA Mini kit (Invitrogen). The clones were initially screened for the deletion of the localization element by PCR using primers F1 and R1 (Supplementary Figure S5C). After analyzing the full length PCR product in some clones using Sanger sequencing, we found that these apparently wildtype alleles were instead often inversion alleles. We therefore devised a new strategy to distinguish wildtype and inversion alleles (Supplementary Figure SSC). The knockout efficiencies of clones were determined using qPCR with two sets of primers to distinguish between wildtype alleles (Fq2 and Rq2, Supplementary Figure S5C) and edited alleles (Fq1 and Rq1, Supplementary Figure S5C). This qPCR experiment was performed using iTaq Universal Sybr Green Supermix and 100ng genomic DNA in a final volume of 10 l.

Computational predictions of RNA structure
RNA structure predictions were performed using RNAfold (28). For predicted structures of tested oligonucleotides, a window of 80 nt was slid 5 nt at a time across the oligonucleotide sequence. The median minimum free energy (MFE) of these windows was then used as the MFE for the oligonucleotide. For the prediction of guanosine residues participating in quadruplex interactions, the -g flag was used.

Analyses of sequence conservation
Sequence conservation was assessed using phastCons scores (29) downloaded from UCSC.

RNA pulldowns
The pull-down experiments and subsequent mass spectrometry were performed according to an adapted experimental pipeline that was previously established (30). Specifically, 10 l (∼20-35 ng/l) of cleaned-up PCR amplicon (primer sequences are provided in the table below, used with screening pooled library) were used as template of the in vitro transcription (HiScribe™ T7 Quick High Yield RNA Synthesis Kit, #E2050S, New England Biolabs), performed at 37 • C for 16 h, followed by DNase I treatment (37 • C for 15 min). IVT RNAs were then cleaned-up and concentrated (DNA Clean & Concentrator-5; #D4013, Zymo Research).
3 -Desthiobiotin labeling was carried following the manufacturer's' guidelines of Pierce™ RNA 3 End Desthiobiotinylation (ThermoFisher, #20163). Briefly, ∼115 pmol of each RNA was first subjected to fast denaturation in the presence of 25% v/v DMSO (85 • C for 4 min) to relax secondary structures and subsequently labelled at 16 • C for 16 h. RNA binding proteins were isolated using Pierce™ Magnetic RNA-Protein Pull-Down Kit (ThermoFisher, #20164). Briefly, 3 -desthiobiotin labelled RNAs were incubated with magnetic streptaividin-coated beads (50 l of slurry for each RNA probe) for 30 min at room temperature under agitation (600 RPM in a ThermoMixer, Eppendorf). 200 g of cell lysates (in Pierce IP lysis buffer; #87787, ThermoFisher) derived from fully differentiated CAD or N2a cells was then incubated with 3 -desthiobiotinylated-RNA/streptavidin beads at 4 • C for 1 h under agitation (600 RPM). 20 l of each eluate was then analyzed by mass spectrometry.
The sequence TAATACGACTCACTATAGG at the beginning of the forward primers encodes a T7 promoter. The biotinylated probe corresponding to the open reading frame of firefly luciferase served as a negative control.

Mass spectrometry
The pulldown samples were subjected to Trichloroacetic (TCA) precipitation. 20 l of each eluate was mixed with 80 l of water and 100 l of 10% TCA (5% TCA final concentration). The resulting protein pellets were washed twice with cold acetone, dried and dissolved as follows: 45 l of 10 mM Tris/2 mM CaCl 2 , pH 8.2 buffer; 5 l trypsin (100 ng/l in 10 mM HCl); 0.3 l trypsin Tris 1 M, pH 8.2 to adjusted to pH 8. The samples were then processed with microwave-assisted digestion (60 • C for 30 min) and dried. The dried digested samples were dissolved in 20 l ddH 2 O + 0.1% formic acid and transferred to autosampler vials for Liquid chromatography-mass spectrometry analysis (LC-MS/MS). 2 l of sample were injected on a nanoAcquity UPLC coupled to a Q-Exactive mass spectrometer (Thermo Scientific).
The protein identification and quantification was performed using MaxQuant v1.6.2.3 and the data were searched against the Swissprot mouse database. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD030720.

RBP knockdowns
Multiple dsiRNAs (3-5 siRNAs/protein) targeting mouse HnrnpA2, Unk and APC were obtained as TriFecta kit from IDT. dsiRNAs were transfected two days prior to differentiation of the CAD reporter constructs in the presence of 1 g/ml doxycycline followed by 48 h of differentiation. One 6-well plate per reporter (2 wells/replicate) to be tested was transfected using RNAiMax transfection reagent (Invitrogen) according to the manufacturer's protocol. The efficiency of the knockdown was confirmed by western blot (HnrnpA2) and qRT-PCR (HnrnpA2, Unk and APC).
The RNA was collected and purified using Zymo Quick-RNA MicroPrep kit (Zymo Research, #R1050) followed by DNase I treatment for 1 h at 37 • C. cDNA from 150 ng of purified RNA from each fractions was is synthesized using iScript™ Reverse Transcription Supermix (Bio-Rad, #1708841) in a 10 l volume with a longer incubation time of 30 mins for reverse transcription step. qPCR was carried out using iTaq Universal Sybr Green Supermix, 2 l of 1:4 diluted cDNA, and primers (IDT, see sequences below) against the coding sequence of endogenous HPRT (control) and HnrnpA2/Unk/APC mRNAs. Reactions were carried out using the Biorad CFX384/CFX-Opus 384 thermocycler with the following conditions: polymerase activation at 95 • C for 3 min and 40 cycles of 95 • C for 10 s, and 60 • C for 30 s. Finally, a melting curve was performed by incubating samples at 65 • C for 15 s followed by a temperature gradient increase at 0.5 • C/s to 95 • C. Each sample was measured with three technical replicates. To ensure no contamination, no reverse transcriptase and no template controls were performed. The knockdown efficiency was calculated using the Ct method. MIQE guidelines were followed for all qPCR experiments.

Identification of 3 UTRs sufficient for RNA localization activity
In order to study RNA localization in neuronal cells on a transcriptomic scale, we employed a mechanical fractionation technique using cells grown on microporous membranes (18,19,27,31) ( Figure 1A). With this technique, cells are plated on the top of membranes whose pores allow neurite growth to the underside of the membrane. However, cell bodies are restricted to the top of the membrane. After cell growth and differentiation, the cells are mechanically fractionated into soma and neurite fractions by scraping the top of the membrane. RNA is collected from both fractions and can be analyzed by RT-qPCR or high-throughput sequencing.
We have performed this technique on 32 mouse samples ranging from neuronal cell lines to primary cortical neurons (18,19,27). By amalgamating these results, we compiled repeated observations of the RNA localization patterns of thousands of transcripts and identified those which were reproducibly enriched in neurites (19). We chose to focus on eight genes whose transcripts were strongly and repeatedly neurite-enriched ( Figure 1B, Supplementary Figure S1A) with the goal of identifying sequences within these transcripts that regulate their localization. As controls, we also chose three genes that were reproducibly soma-enriched and one that was neither soma-nor neurite-enriched (Figure 1B, Supplementary Figure S1A). Reporter plasmids expressing the firefly and renilla luciferase transcripts are integrated into the genome through Cre-mediated recombination and are expressed from a bidirectional promoter. Sequences whose RNA localization activity will be tested are fused onto the 3 UTR of Firefly luciferase. The ratio of firefly to renilla luciferase transcripts in soma and neurite samples is measured using Taqman qPCR. Comparing these ratios in soma and neurite samples quantifies localization of the firefly luciferase transcript. (E) 3 UTRs of the indicated genes were fused to firefly luciferase and the neurite localization of the resulting transcript was quantified using RT-qPCR. The neurite localization of the firefly luciferase with no added 3 UTR was used as a control.
To narrow the possible sequence search space, we focused on the 3 UTR sequence of each of these genes as 3 UTRs have been found to regulate the localization of many RNAs (8,18,32,33). We found that the 3 UTRs of the chosen neurite-localized genes were more conserved than other 3 UTRs, suggesting that they may contain functional regulatory elements ( Figure 1C). Using RNAseq data from fractionated human motor neurons (19,27), we found that the localization of the human orthologs of these transcripts was very similar to their mouse counterparts, further suggesting that conserved elements within these transcripts mediate their transport (Supplementary Figure S1B).
To directly test if the 3 UTRs of these transcripts contained RNA localization regulatory elements, we incorporated their 3 UTRs into firefly and renilla luciferase genes driven by a bidirectional doxycycline-sensitive promoter.
The 3 UTRs to be tested were fused to the firefly luciferase transcript while the 3 UTR of the renilla luciferase transcript was kept constant, allowing it to serve as an internal control. We then site-specifically integrated this construct into CAD cells using cre/loxP-mediated recombination (34).
We then prepared soma and neurite fractions ( Figure 1A) and calculated the relative firefly luciferase and renilla luciferase transcript levels in the resulting RNA samples using Taqman RT-qPCR. By comparing these relative firefly to renilla luciferase transcript ratios in the soma and neurite fractions (i.e. calculating a ratio of ratios), we quantified the localization of various firefly luciferase fusion constructs ( Figure 1D).
As a control, we used a construct in which no endogenous 3 UTR was fused to the firefly luciferase transcript. We set the resulting neurite to soma ratio of firefly to renilla ratios of this construct to one. We then tested the effect of fusing the 3 UTRs of the identified neurite-and soma-enriched genes on this localization-quantifying ratio of ratios. We found that fusion of the 3 UTR of 5 of the 8 neurite-localized genes (Trak2, Cplx2, Net1, Cdc42bpg and Gdf11) to firefly luciferase substantially increased neurite enrichment ( Figure 1E). Conversely, none of the four control 3 UTRs had an effect on the neurite enrichment of firefly luciferase RNA. We therefore concluded that RNA localization regulatory elements were located somewhere within the identified active 3 UTRs.
To further confirm that a subset of these 3 UTRs were sufficient to drive RNA localization, we visualized the reporter constructs in cells using single molecule fluorescence in situ hybridization (smFISH) probes against the firefly transcript (Supplementary Figure S1C). As expected, reporter fusions that were neurite-enriched in the RT-qPCR experiment were also neurite-enriched when assayed using smFISH. Similarly, reporter transcripts that were not neurite-enriched in the RT-qPCR experiment were not neurite-enriched by smFISH.

Design of a massively parallel reporter assay
To define RNA localization regulatory elements within active 3 UTRs, we designed an MPRA covering the 3 UTRs of selected genes. Because previously identified localization regulatory elements were often quite large (50-250 nt) (10,35), we chose to use long oligonucleotides that were densely tiled in order to identify regulatory elements with high resolution. We designed 260 nt oligonucleotides that tiled the chosen 3 UTRs with one oligonucleotide every 4 nt (Figure 2A). This resulted in an oligonucleotide library with ∼8100 members in which each individual nucleotide within a UTR was incorporated into 65 distinct oligonucleotides (Supplementary files 1 and 2). As positive controls, we included 3 oligonucleotides that covered the entirety of the 150 nt neurite-localized long noncoding RNA BC1 (36). As negative controls, we included oligonucleotides that tiled the 3 UTRs of four RNAs that were not neurite-enriched ( Figure 1E) as well as oligonucleotides that tiled the length of the nuclear restricted lncRNA Malat1.
We were concerned that high-throughput sequencing read aligners like Bowtie2 (24) might have difficulty correctly assigning reads to oligonucleotides that differ from each other by only 4 nt steps. To assess this, we created multiple simulated MPRA libraries of ten thousand 260 nt UTR fragments. In these libraries, oligonucleotides were separated by 2, 5 or 10 nt steps. We created 10 million mock sequencing reads from these libraries, including mimicked oligonucleotide synthesis and sequencing errors (37). We found that decreasing the step size between oligonucleotides resulted in more reads being incorrectly assigned (Supplementary Figure S2A). We observed that the incorrectly assigned reads had lower mapping qualities (Supplementary Figure S2B). We therefore reasoned that allowing Bowtie2 more chances to find optimal alignments (controlled by the parameter -D) might increase performance. Increasing the value of the parameter from its default of 15-150 allowed Bowtie2 to correctly assign 100% of reads from the 2 nt step library (Supplementary Figure S2C). We therefore concluded that with the modified parameter, Bowtie2 can accurately assign reads from high density MPRA libraries, obviating the need for including a sequence-identifying barcode in the oligonucleotide.

Quality control of MPRA reagents and procedure
To check the quality of the oligonucleotide library, we analyzed it using paired-end high-throughput sequencing. We detected 100% of the expected oligonucleotides in the library and found that most of them were approximately equally abundant ( Figure 2B). We found that ∼65% of the oligonucleotides contained no mutations, insertions, or deletions (Supplementary Figure S2D). However, the majority of indels and mutations occurred in regions of the oligonucleotide sequenced by only one read of the pairedend reaction (Supplementary Figure S2E). In the middle of the oligonucleotide where the reads overlapped, we required that both reads contain a mutation in order to call a mutation. Because the observed mutation rate is much lower in the paired end overlap, we conclude that the majority of observed errors are sequencing errors, and that the oligonucleotide library is largely error-free.
After verifying the quality of the oligonucleotide library, it was cloned into two sequence contexts in a single plasmid: once into the 3 UTR of a gene encoding GFP and once into the 3 UTR of a gene encoding firefly luciferase. Each plasmid molecule therefore contained two different oligonucleotides incorporated into two different doxycycline-inducible reporter transcripts.
To express the reporter library at moderate and controllable levels, we integrated it into mouse CAD and N2A cells using cre/loxP-mediated recombination (34). This ensured that each cell is only expressing one firefly luciferase and one GFP reporter RNA from the same genomic locus to avoid artifacts of over-expression from transient transfection (38).
In order to use this strategy, we needed to know how many independent integration events we could generate. To calculate this, we integrated a plasmid containing a randomized 15 nt segment into 6 million CAD and N2A cells. After selecting for integrants with puromycin, we used targeted high-throughput sequencing to determine the number of unique 15 nt segments that were genomically integrated. We observed ∼600 000 unique CAD integration events and ∼200 000 unique N2A integration events (Supplementary Figure S2F), giving an cre-mediated integration efficiency of 5-10%, which agreed well with previously published values (39). We therefore concluded that integrating our 8100 reporter constructs was feasible. Indeed, after integration, we could detect and reliably quantify the vast majority of oligonucleotides ( Figure 2C, Supplementary Figure S2G).

MPRA results are reproducible across cell lines and reporter transcript scaffolds
We fractionated CAD and N2A cells containing the MPRA library into cell body and neurite fractions in quadruplicate. Using targeted RNAseq, we then quantified the relative abundance of each oligonucleotide in the firefly luciferase and GFP reporters in each fraction (Supplementary Tables S1-S4). UMIs that were incorporated during reverse transcription were used for quantification to exclude PCR amplification artifacts.
We found that oligonucleotide abundances within samples clustered by cellular compartment and that abundances within compartments were highly similar ( Figure 3A, Supplementary Figure S3A-C). We then identified oligonucleotides that were significantly enriched in a compartment using DESeq2 (23), using a false discovery rate (FDR) cutoff of 0.01. Using these parameters, we identified 379 and 220 oligonucleotides that were neurite-and somaenriched, respectively, in the GFP reporter construct in CAD cells. 269 and 132 oligonucleotides were neurite-and soma-enriched, respectively, using the GFP construct in N2A cells.
We then wondered if oligonucleotide enrichments were consistent across cell lines. We found that oligonucleotides that were neurite-enriched in CAD cells had significantly higher neurite enrichments in N2A cells than non-localized oligonucleotides. Conversely, oligonucleotides that were soma-enriched in CAD cells had significantly lower neurite enrichments than non-localized oligonucleotides ( Figure  3B). Further, neurite enrichments for all oligonucleotides in CAD and N2A cells were significantly correlated with each other (Supplementary Figure S3D, E).
To determine if oligonucleotide enrichments were consistent across reporter constructs, we similarly compared neurite enrichments for oligonucleotides embedded in the GFP and firefly luciferase reporters. Oligonucleotides that were significantly neurite-enriched when embedded within the GFP reporter had significantly higher neurite enrichments than expected when embedded in the firefly luciferase reporter (Figure 3C), indicating a broad concordance of results between reporter transcripts. These results suggest that the observed localization activity is independent of both the cell line used and the broader sequence context of the reporter.
If the observed data were robust, then we would expect that two oligonucleotides who neighbor each other in their position within a UTR to have similar neurite enrichments since they share 252 nt of sequence. To assess this, we calculated absolute differences in neurite enrichments for all neighboring oligos. As a control, we shuffled the positions of oligonucleotides along UTRs but within a gene and repeated the analysis. We observed that the neurite enrichments of neighboring oligonucleotides were significantly more similar to each other than in the shuffled control (Figure 3D). We then performed this analysis for each gene individually. Interestingly, in the neurite-enriched genes, neighboring oligonucleotides had similar neurite enrichments. However, this was not true for the soma-enriched genes (Supplementary Figure S3F).

Oligonucleotides drawn from neurite-enriched transcripts are more likely to be neurite-enriched than those drawn from soma-enriched transcripts
If our results were to make biological sense, we would expect that oligonucleotides drawn from neurite-enriched transcripts would be more likely to be neurite-enriched than those drawn from the soma-enriched control transcripts. Reassuringly, we found that oligonucleotides from neurite-enriched genes were 3 times as likely to themselves be neurite-enriched than those contained within somaenriched genes (P = 1e−21, binomial test) ( Figure 3E). When we considered each gene individually, most neuriteenriched genes contained far more neurite-enriched oligos than soma-enriched oligos. This includes our positive control, BC1, for whom two out of its three constituent oligos were significantly neurite-enriched (Supplementary Figure  S3G).

Neurite-enriched oligonucleotides cluster together along the 3 UTR to define peaks of activity
We then arranged oligonucleotides according to their position along their parental 3 UTRs and plotted their neurite enrichments. Encouragingly, we found that for all of the 3 UTRs that were sufficient to direct localization to neurites, peaks of activity within the 3 UTR were clearly visible (Figure 4A-E, Supplementary Figure S4A). These peaks repre- As a control, the positional relationship between all oligonucleotides was randomly shuffled, and the difference in neurite enrichment for neighboring oligonucleotides were recalculated. (E) Number of significantly neurite-and soma-enriched oligonucleotides among those drawn from UTRs from neurite-and soma-enriched genes. All significance tests were performed using a Wilcoxon rank-sum test. P value notation: * < 0.05, ** < 0.01, *** < 0.001, **** < 0.0001. sent RNA elements within the 3 UTR that regulate RNA localization. We were not able to detect peaks of activity in the 3 UTRs of genes that were not neurite-localized ( Figure  4F, Supplementary Figure S4B), further demonstrating the specificity of the experiment. Intriguingly, we did detect one clear peak of activity within the body of Malat1 (Supplementary Figure S4B). Importantly, these peaks span dozens of oligonucleotides, meaning that fully independent yet related oligonucleotides behaved similarly in the MPRA, giving us confidence in the results.
From this data, we formalized an approach for defining roughly contiguous windows of active oligonucleotides (see Methods). Essentially, contiguous stretches of significantly neurite-enriched oligos (FDR < 0.01) were combined to define 'oligonucleotide unions' (Figure 4G). The 4 nt tiling distance between neighboring oligonucleotides affords a high degree of resolution of the boundaries of these unions. Two oligonucleotides, although separated by only 4 nt, can show dramatically different neurite enrichments ( Figure 4H).

Characteristics of neurite-enriched oligonucleotides
We then asked if neurite-enriched oligonucleotides were enriched for specific properties that might define or contribute to their activity. We found that neurite-enriched oligonucleotides had significantly higher adenosine and guanosine contents than nonlocalized oligonucleotides ( Figure 4I) and were also enriched for A/G-rich kmers (40) (Supplementary Figure S4C). Conversely, neurite-enriched oligos were depleted for cytosine ( Figure 4I), consistent with recent reports that C-rich elements are important for nuclear retention of RNA molecules (15,16). Further, all oligonucleotide union sequences had higher average A/G contents than the rest of the 3 UTR that contained them ( Figure 4J, Supplementary Figure S4D-F). These results suggest that A/G richness is a key contributor to the ability of an RNA element to direct transcript localization in neurites.
Multiple previous reports have emphasized a likely role for RNA secondary structure in the definition of RNA localization regulatory elements (41,42). To assess the secondary structure character of our oligonucleotides, we computationally folded them using RNAfold (28). Surprisingly, we found no difference in the minimum free energy of soma-, non-, and neurite-enriched oligonucleotides (Supplementary Figure S4G). We did, however, find that neuriteenriched oligonucleotides were significantly more likely to contain predicted G-quadruplexes (Supplementary Figure  S4H), consistent with previous reports of the ability of Gquadruplex RNA sequences to drive RNA localization to neurites (19,43).
If the identified sequences were functional, we would expect them to be conserved. Unexpectedly, the sequences of neurite-enriched oligonucleotides were less conserved than those of soma-or non-enriched oligonucleotides ( Figure  4K). We suspect that this may be due to two factors. First, if a defining characteristic of our identified regulatory elements is their A/G richness, the exact nucleotide sequence of the element may be less important and therefore less A/G content in all 3 UTR 100 nt windows for nonlocalized (gray) and neurite-localized (purple) RNAs. Localized RNAs were defined as having a median Z-normalized neurite enrichment across 32 RNA localization experiments of at least 2. Nonlocalized RNAs were defined as those with Z-normalized neurite enrichments of less than 2. (N) Median Z-normalized neurite enrichment across 32 RNA localization experiments for genes whose 3 UTR either does (purple) or does not (gray) contain a 100 nt window with at least 75% A/G content. All significance tests were performed using a Wilcoxon rank-sum test. P value notation: * < 0.05, ** < 0.01, *** < 0.001, **** < 0.0001. likely to be conserved. Second, it may not be necessary for these elements to be positionally conserved and therefore alignable across genomes. It may be that the mere presence of the element anywhere within the 3 UTR is sufficient for function. If this were true, then the 3 UTRs of the human orthologs of the genes with active oligonucleotide windows should also contain stretches with high A/G content. We found that this was, in fact, the case ( Figure 4L), suggesting that similar mechanisms regulate the localization of human and mouse orthologous transcripts.
If A/G-rich windows were true regulators of neurite RNA localization, we would expect them to be enriched within neurite-localized RNAs. We therefore asked if, transcriptome-wide, localized RNAs were more likely than expected to contain A/G-rich windows in their 3 UTRs. We analyzed dozens of subcellular RNAseq datasets from a va-riety of neuronal cell types (18)(19)(20)32,44). For each dataset, we compared cell body and neurite expression values using DESeq2 (23) (see Methods), and Z-normalized neurite enrichments within the sample. We then binned RNAs based on their median enrichment with RNAs that had median Z-scores of at least 2 as 'localized' and RNAs with median Z-scores of less than 2 as 'nonlocalized'. We then identified the most A/G-rich 100 nt window within the 3 UTR of each gene. We found that localized RNAs were much more likely to contain a particularly A/G-rich window in their 3 UTRs than nonlocalized RNAs, supporting the idea that such features are associated with RNA localization ( Figure  4M).
We then inverted this analysis and asked if the presence of an A/G-rich window in a 3 UTR was predictive of that transcript being neurite-enriched. We defined A/G-rich Nucleic Acids Research, 2022, Vol. 50, No. 18 10637 windows as a 100 nt stretch of at least 75% A/G and binned RNAs according to whether or not they contained such a window in their 3 UTR. Surprisingly, we found that containing an RNA containing an A/G-rich window in its 3 UTR was not predictive of its localization to neurites (Figure 4N).
A/G-rich windows are therefore found in many neuritelocalized RNAs, but their presence alone is not predictive of RNA localization. There are two possibilities to explain this. First, it could be that additional 3 UTR features outside of an A/G rich window are necessary for neurite localization. However, because A/G-rich oligonucleotides were able to drive RNA localization on their own in the MPRA, we believe this is less likely. Alternatively, there could be additional features beyond just A/G richness within the A/Grich window (e.g. the precise order of the nucleotides or their secondary structure) that are required for localization regulatory activity.

Identified oligonucleotides are necessary and sufficient for transport of reporter RNAs
To verify the ability of individual oligonucleotides to direct RNA transport, we created reporter transcripts containing active oligonucleotides. For each MPRA-defined peak of activity along a 3 UTR, we defined a 'peak' oligonucleotide that lay near the center of the peak of activity (Figure 5A). We fused the peak oligonucleotide for 5 genes to our reporter construct. Using our cell fractionation and RT-qPCR assay, we found that each peak oligonucleotide was sufficient to drive the reporter to neurites (Supplementary Figure S5A). We then tested the localization regulatory activity of peak oligonucleotides from two genes, Net1 and Trak2, using smFISH. We found that in contrast to a reporter transcript lacking an oligonucleotide fusion, reporters containing peak oligonucleotides were highly enriched in neurites ( Figure 5B), again demonstrating that the peak oligonucleotides were sufficient for RNA transport.
We then compared the RNA localization of reporters containing either peak oligonucleotides or the entire 3 UTR from which the peak oligonucleotide was obtained using cell fractionation and RT-qPCR ( Figure 1D). We found that for Net1, although the peak oligonucleotide contained activity, it could not drive neurite-enrichment to the same extent as its parental 3 UTR. This may indicate that there are sequences in the UTR outside of the peak oligonucleotide that contribute to localization but are unable to drive localization on their own. Conversely, for Trak2, the peak oligonucleotide was slightly more active than its parental 3 UTR ( Figure 5C). These results suggest that contextual effects of the sequence surrounding the peak oligonucleotide can influence its regulatory ability.
Next, we compared the localization activities of 3 UTRs either containing or specifically lacking the identified peak oligonucleotides. For both Net1 and Trak2, we found that removal of the peak oligonucleotide completely abolished the activity contained within the entire 3 UTR (Figure 5D), indicating that the peak oligonucleotide is necessary for localization of the reporter transcripts.
We then attempted to identify regulatory elements with higher resolution by defining 'oligonucleotide intersections'. We reasoned that the sequence elements that were driving RNA localization were likely to be those in common to all oligonucleotides contained within an oligonucleotide union ( Figure 5A). Essentially, while an oligonucleotide union would contain the union of all active oligonucleotides within a given region, an oligonucleotide intersection would contain their intersection. We defined oligonucleotide intersections for all identified peaks of activity. Their lengths ranged from 56 nt to 224 nt (Supplementary Figure S5B). The oligonucleotide intersections for Net1 and Trak2 were 56 and 107 nt, respectively.
For both Net1 and Trak2, reporters containing oligonucleotide intersections were significantly less localized than reporters containing peak oligonucleotides, despite the fact that the oligonucleotide intersections are core, relatively long subsequences of the peak oligonucleotides ( Figure 5E). Peak oligonucleotides of length 260 nt contained significantly more activity than their constitutive oligonucleotide intersections of length 56 and 107 nt, suggesting that RNA localization elements are generally much larger than regulatory elements that control other RNA metabolic processes. This is in line with the sizes of previously defined localization regulatory elements (7,10).

Peak oligonucleotides are sufficient to drive RNA localization in human motor neurons
To assess the ability of the peak oligonucleotides to regulate RNA localization in other neuronal systems, we expressed our reporter transcripts in iPS-derived human motor neurons. As with the CAD and N2A systems, we sitespecifically integrated reporter genes into iPS genomes using cre/loxP-mediated recombination. Reporter and control iPS cells were then differentiated into motor neurons (27) and mechanically fractionated into cell body and neurite fractions. (Figures 1A, 5F). Reporter transcripts containing the mouse Net1 peak oligonucleotide were 30% more enriched in neurites than a control reporter transcript (P = 0.03, t-test) ( Figure 5G). This result, combined with the conservation of A/G rich sequences in the 3 UTRs of the human orthologs of the genes tested in the MPRA ( Figure 4L), suggests that A/G rich sequences also regulate RNA localization in human neurons.

Peak oligonucleotide sequences are necessary for transport of endogenous transcripts
Up to this point, all of the experiments aimed at testing the necessity and sufficiency of the localization regulatory elements had been performed using reporter RNAs. To test whether the identified elements in Net1 and Trak2 are required for the localization of endogenous transcripts, we used CRISPR/Cas9 to remove them from the genome of CAD cells (Supplementary Figure S5C). Using a qPCRbased strategy (Supplementary Figure S5C, D), we identified Net1 and Trak2 clones in which, for the majority of alleles, the peak oligonucleotide sequence had been deleted or inverted ( Figure 5H). were defined as the sequences present within any oligonucleotide contained within a stretch of neurite-localized oligonucleotides. Oligonucleotide intersections (red) were defined as those present within all oligonucleotides of a stretch of neurite-localized oligonucleotides. Peak oligonucleotides were those with high localization activity, often found at the center of an oligonucleotide union. (B) smFISH of reporter constructs containing peak oligonucleotide sequences. (C) RNA localization activity, as assayed by RT-qPCR, of peak oligonucleotide sequences and the full UTRs from which they were drawn. Observed neurite enrichments for each construct were compared to the enrichment of a control reporter construct lacking an active 3 UTR. (D) As in C, RNA localization activity of reporter constructs containing peak oligonucleotide or full UTRs lacking the sequences of the peak oligonucleotides. (E) RNA localization activity of reporter constructs containing peak oligonucleotides or oligonucleotide intersections as defined in (A). (F) Schematic of differentiation of human iPS cells into motor neurons. (G) RNA localization activity, as assayed in human motor neurons, of a reporter construct containing the peak oligonucleotide from the mouse Net1 gene. (H) Fraction of UTR alleles that remain wildtype for the sequence in between designed gRNA cut sites. Wildtype cells and CRISPR-generated clones in which approximately 360 bp was deleted from Net1 and Trak2 3 UTRs were interrogated. This fraction was calculated using a qPCR strategy that quantified the relative amount of wildtype and total alleles. (I) Neurite enrichment of endogenous Net1 and Trak2 RNA in wildtype and CRISPR-generated peak oligonucleotide deletion clones. All significance tests were performed using a t-test. P value notation: * < 0.05, ** < 0.01, *** < 0.001, **** < 0.0001. Normally, both Net1 and Trak2 endogenous RNAs are highly enriched in neurites ( Figure 1B). However, the deletion or inversion of the 260 bp peak oligonucleotide sequences within their 3 UTRs resulted in an almost complete loss of their neurite enrichment, demonstrating that these features are necessary for endogenous neurite RNA localization ( Figure 5I). The RNA elements identified in the MPRA are therefore necessary for the localization of the endogenous transcripts from which they were derived.

Unkempt is required for efficient localization of peak oligonucleotides
In order to identify RBPs required for the efficient transport of our identified peak oligonucleotides derived from Net1 and Trak2, we created biotinylated transcripts containing their sequences and incubated them with CAD and N2A cellular extract. We then retrieved the transcripts from the extract using streptavidin and identified proteins bound to them using mass spectrometry ( Figure 6A). As a control, we repeated the same procedure using a sequence derived from the open reading frame of Firefly luciferase. This sequence was present in all tested reporter constructs, both localized and unlocalized, as well as control reporter constructs (Figure 5B-E). It therefore has little intrinsic localization ability and is well-suited to serve as a control in this experiment. Proteins that were specifically bound to the peak oligonucleotides were identified as those more abundant in the peak oligonucleotide RNA pulldown compared to the control RNA pulldown.
The only two RNA binding proteins significantly enriched (FDR < 0.05) on both the Net1 and Trak2 peak oligonucleotide transcripts in both the CAD and N2A samples were Hnrnpa2 and Unkempt (Unk) ( Figure 6B, Supplementary Figure S6A-C, Supplementary Tables S5-S8). We further confirmed the association of Unk with the Net1 and Trak2 probes using immunoblotting ( Figure 6C, Supplementary Figure S6D). To assay the ability of Hn-rnpa2 and Unk to functionally regulate RNA localization through sequences contained within the Net1 and Trak2 peak oligonucleotides, we measured the neurite-enrichment of reporter constructs containing the peak oligonucleotides following Hnrnpa2 and Unk knockdown using siRNAs (Supplementary Figure S6E and F). While the knockdown of Hnrnpa2 expression did not result in reduced neurite localization of either peak oligonucleotide construct (Supplementary Figure S6G), the knockdown of Unk resulted in significant decreases in neurite localization for both constructs ( Figure 6D). Unk protein is therefore both associated with Net1 and Trak2 peak oligonucleotide RNA sequences and required for the efficient neurite localization of transcripts that contain them.
To test if Unk protein is required for the neurite localization of endogenous transcripts, we again used siRNA to knockdown Unk expression and then assayed the neurite enrichment of endogenous Net1 and Trak2 transcripts using RT-qPCR. We found that for both genes, reduction of Unk protein expression resulted in a significant decrease in the neurite enrichment of their transcripts ( Figure 6E, P < 0.01, t test).
RNA localized to the projections of mesenchymal cells through the action of A/G rich sequences can do so through the action of Adenomatous polyposis coli (APC) (45,46). In fact, APC was found to bind A/G rich sequences in vitro and in cells, lending support to this result (47). We therefore speculated that APC might be involved in the transport of the sequence elements we identified. However, we did not identify APC in our mass spectrometry experiment, and the 3 UTRs containing our identified localization elements were not bound by APC in a published CLIP-seq dataset (47). Further, APC knockdown had no effect on the localization of reporter RNAs containing peak oligonucleotides, suggesting that RNA-binding proteins that regulate transport to projections may in some cases be cell-type specific (Supplementary Figure S6H).

DISCUSSION
In this study, we identified RNA sequences that were both necessary and sufficient for the transport of a given transcript to neurites. Because the oligonucleotides we used were quite long (260 nt), this allowed us to capture regulatory elements that may also be large. In contrast to short (4-8 nt) RNA elements that regulate processes like alternative splicing and RNA stability, many of the previously identified localization regulatory elements are tens to hundreds of nucleotides long (10). The elements that we identified likely require similarly long lengths in order to be fully active. For two of the 260 nt active oligonucleotides that we identified, core subsequences of length 56 and 107 nt drawn from those oligonucleotides had significantly less localization activity than the full length oligonucleotide. The full activity of these elements therefore requires more sequence than was contained within the subsequences. This implies that there may be more to the character of these elements beyond simple linear sequence recognition.
A predominant feature of our identified regulatory elements was their peculiar nucleotide composition. Relative to non-enriched oligonucleotides, neurite-enriched oligonucleotides were strongly enriched for adenosine and guanosine residues. We could not discern a more fine-grained sequence enrichment beyond a general preference for adenosine and guanosine. Interestingly, A/G rich sequences have been previously identified in the regulation of localization to projections of mesenchymal cells (45), suggesting that such sequences may be general targeting elements for cellular extensions.
The observation of Unk as an RNA binding protein important for efficient localization of peak oligonucleotidecontaining transcripts is supported by the involvement of Unk in the maintenance of neuronal morphology (48). Interestingly, depletion of Unk results in the loss of neuronal shape while the ectopic overexpression of Unk results in the formation of projections in cells that do not normally have them. The relationship of the regulation of RNA localization by Unk to these functions remains to be seen.
At the time this study was performed, two other groups undertook similar approaches to identify RNA localization regulatory sequences in neuronal cells (30,49). The results Figure 6. Identification and functional validation of RBPs required for peak oligonucleotide localization. (A) Scheme of mass-spectrometry-based experiment to identify RBPs bound to peak oligonucleotide sequences. (B) RBPs derived from CAD extract that were significantly different in abundance(FDR < 0.05) in the Net1 peak oligonucleotide RNA pulldown than the control RNA pulldown. (C) Western blot of RNA pulldowns from N2A cell lysate using RNA baits composed of peak oligonucleotides (Net1 and Trak2) or a portion of the coding sequence of firefly luciferase (control). (D) Neurite-enrichments, as determined by cell fractionation and RT-qPCR, of Net1 and Trak2 peak oligonucleotide reporter transcripts following the siRNAmediated knockdown of Unk. (E) Neurite enrichment of endogenous Net1 and Trak2 following siRNA-mediated knockdown of Unk. All significance tests were performed using a Wilcoxon rank-sum test. P value notation: * < 0.05, ** < 0.01, *** < 0.001, **** < 0.0001. of this study are complementary to the findings of the other two studies and highlight the power of massively parallel approaches to understand RNA localization mechanisms. Still, there are notable differences between our study and the other two contemporaneous reports.
We chose to densely tile long (260 nt) oligonucleotides across a comparatively smaller number of UTRs while the other studies used shorter oligonucleotides (100-150 nt) tiled more sparsely across a larger number of UTRs. While the sparse design does allow interrogation of more localized RNAs, it is not well-suited to the detection of large (>100 nt) regulatory elements. In order to be detectable in an MPRA, the entire sequence of such an element must be contained within a single oligonucleotide, a scenario that is unlikely with smaller, more sparsely tiled oligonucleotides.
This may explain the fact that this study identified large regulatory elements while the other two identified much smaller, 4-15 nt elements. These larger elements may be stronger regulators of RNA localization as we observed up to 100-fold enrichment of reporter RNAs in neurites compared to the maximum 10-fold enrichment reported by the other studies. Further, we found that our identified elements contained essentially all of the localization regulatory activity within their constituent 3 UTRs as deletion of the element eliminated the ability of the entire 3 UTR to regulate RNA localization. In the contemporaneous studies, this was either not tested (49), or it was found that multiple small elements worked together to regulate RNA transport (30).
Additionally, the tight tiling of our oligonucleotides means that neighboring oligonucleotides share a large amount of sequence and are expected to therefore similarly regulate RNA localization. This means that we often obtained many independent measures of a given sequence's regulatory ability, giving us confidence in the results of the experiment. For example, the peak of regulatory activity identified in the UTR of the Net1 UTR spanned 64 independently derived reporter RNA measurements.
In sum, we have identified RNA elements that regulate transcript localization in both mouse and human neuronal cells using an unbiased MPRA approach. Given that hundreds to thousands of RNAs are known to be asymmetrically localized in a variety of cell types, the broad application of MPRAs to RNA localization may be successful in uncovering more elements that govern this process. Further, targeted MPRAs that interrogate specific features within identified elements through mutations and truncations may be useful to determine additional and/or more precise mechanisms that govern RNA localization.

DATA AVAILABILITY
All high-throughput RNA sequencing data as well as oligonucleotide quantifications have been deposited at the Gene Expression Omnibus under accession number GSE183192.