DDX3 depletion represses translation of mRNAs with complex 5′ UTRs

Abstract DDX3 is an RNA chaperone of the DEAD-box family that regulates translation. Ded1, the yeast ortholog of DDX3, is a global regulator of translation, whereas DDX3 is thought to preferentially affect a subset of mRNAs. However, the set of mRNAs that are regulated by DDX3 is unknown, along with the relationship between DDX3 binding and activity. Here, we use ribosome profiling, RNA-seq, and PAR-CLIP to define the set of mRNAs that are regulated by DDX3 in human cells. We find that while DDX3 binds highly expressed mRNAs, depletion of DDX3 particularly affects the translation of a small subset of the transcriptome. We further find that DDX3 binds a site on helix 16 of the human 18S ribosomal RNA, placing it immediately adjacent to the mRNA entry channel. Translation changes caused by depleting DDX3 differ from those caused by expressing an inactive point mutant, consistent with the different association of these genetic variant types with disease. Taken together, this work defines the subset of the transcriptome that is responsive to DDX3 inhibition, with relevance for basic biology and disease states where DDX3 is altered.

Inactivation of Ded1 in yeast leads to polysome collapse and global downregulation of translation (21,22). More recent work using genome-wide approaches showed that Ded1 is required for translation of most transcripts in yeast (1,23). In contrast, DDX3 depletion seems to affect translation of only a subset of expressed transcripts (2,4,24-26). Despite the importance of DDX3 to normal function and its alteration in diverse disease states, the set of genes that depend on DDX3 for translation is not clearly defined. Moreover, it has been challenging to relate DDX3 binding to functional effects on bound mRNAs, and it was unclear whether DDX3 functions outside of translation initiation, given that binding was detected in coding sequences and 3′ UTRs (3,12).
Here, we depleted DDX3 protein levels and measured alterations to translation and RNA abundance using ribosome profiling and RNA-seq. We also characterized DDX3 binding by PAR-CLIP, using the presence of T>C mutations as a diagnostic hallmark of protein-RNA interactions. We observed robust interactions between DDX3 and transcript 5′ UTRs, as well as a specific and conserved site on the 18S ribosomal RNA. We found that transcripts with structured 5′ UTRs are preferentially affected by DDX3. We used in vitro and cellular reporter systems to conclude that decreases in ribosome occupancy upon DDX3 depletion are driven by 5′ UTRs. Taken together, our results support a model for DDX3 function where interactions with the small ribosomal subunit facilitate translation on messages with structured 5′ UTRs, which, when inactivated, pathologically deregulates protein synthesis.

NGS data pre-processing
Ribo-seq fastq files were stripped of the adapter sequences using cutadapt. UMI sequences were removed and reads were collapsed to fasta format. Reads were first aligned against rRNA (accession number U13369.1), and to a collection of snoRNAs, tRNAs and miRNA (retrieved using the UCSC table browser) using bowtie2 (27) in the 'local' alignment mode.
Remaining reads were mapped to the hg38 version of the genome (without scaffolds) using STAR 2.6.0a (28) supplied with the GENCODE 32 .gtf file. A maximum of three mismatches and mapping to a maximum of 50 positions were allowed. De novo splice junction discovery was disabled for all datasets. Only the best alignment for each read was retained. Read counts for all libraries are in Supplementary Table S3.

PAR-CLIP peak calling
Peak calling for PAR-CLIP reads was performed with PARalyzer v1.5 (29) in the 'EXTEND BY READ' mode using the following parameters:
Coverage-normalized T>C conversions on rRNA for positions with 2000 reads or more (Supplementary Figure S4A) were mapped onto the 18S rRNA sequence from PDB entry 6FEC and visualized using UCSF Chimera (30).
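The coverage normalization of rRNA T>C conversions can be illustrated with a minimal sketch. The function name and the dictionary-based inputs are hypothetical and are not taken from the published pipeline; the only values drawn from the text are the 2000-read threshold and the idea of dividing conversions by coverage.

```python
def coverage_normalized_conversions(tc_counts, coverage, min_reads=2000):
    """Return T>C conversions divided by read coverage, per rRNA position.

    tc_counts: dict mapping position -> number of reads with a T>C conversion
    coverage:  dict mapping position -> total number of reads covering it
    Positions covered by fewer than `min_reads` reads are dropped
    (the text keeps positions with 2000 reads or more).
    """
    return {
        pos: tc_counts.get(pos, 0) / depth
        for pos, depth in coverage.items()
        if depth >= min_reads
    }


# Hypothetical example values, for illustration only.
example = coverage_normalized_conversions(
    tc_counts={527: 1000, 10: 50},
    coverage={527: 4000, 530: 2000, 10: 100},
)
```

Positions below the coverage threshold are excluded rather than set to zero, so that sparsely covered sites do not appear as conversion-free.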

Differential expression analysis
Count matrices for Ribo-seq and RNA-seq were built using reads mapping uniquely to CDS regions of protein-coding genes, using the Bioconductor packages GenomicFeatures, GenomicFiles and GenomicAlignments (31). Genomic and transcript regions were extracted using Ribo-seQC (32). Only reads mapping for more than 25 nt were used.
Differential analysis was performed using DESeq2 (33). Concordant changes were defined using an FDR cutoff of 0.01 for RNA-seq and Ribo-seq individually and requiring the same directionality in the estimated fold changes.
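The concordance rule can be written down as a small sketch. The function and argument names are hypothetical, the inputs are assumed to be per-gene adjusted p-values and log2 fold changes from separate RNA-seq and Ribo-seq analyses, and treating the cutoff as a strict inequality is an assumption.

```python
def concordant_direction(rna_lfc, rna_padj, ribo_lfc, ribo_padj, fdr=0.01):
    """Classify one gene as concordantly 'up', 'down', or None.

    A gene is concordant when both assays pass the FDR cutoff
    (assumed here to mean padj < fdr) and both estimated log2 fold
    changes have the same sign.
    """
    # Genes without an adjusted p-value in either assay are not classified.
    if rna_padj is None or ribo_padj is None:
        return None
    if rna_padj >= fdr or ribo_padj >= fdr:
        return None
    if rna_lfc > 0 and ribo_lfc > 0:
        return "up"
    if rna_lfc < 0 and ribo_lfc < 0:
        return "down"
    # Significant in both assays but in opposite directions.
    return None
```

Genes that are significant in both assays but change in opposite directions return None here, since they do not meet the same-direction requirement.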
Changes in translation efficiency were calculated using DESeq2 by using assay type (RNA-seq or Ribo-seq) as an additional covariate. Translationally regulated genes were defined using an FDR cutoff of 0.05 from a likelihood ratio test, using a reduced model without the assay type covariate, i.e. assuming no difference between RNA-seq and Ribo-seq counts (34).
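The quantity tested in this kind of design can be illustrated with a simplified calculation: in a log-linear model with assay and condition terms, the assay-by-condition interaction corresponds to the Ribo-seq log2 fold change minus the RNA-seq log2 fold change. The sketch below computes that difference directly from normalized counts. It is not the DESeq2 procedure (it omits dispersion estimation and the likelihood ratio test), and the function name and pseudocount are assumptions.

```python
import math


def log2_te_change(ribo_kd, ribo_ctrl, rna_kd, rna_ctrl, pseudocount=1.0):
    """Approximate log2 change in translation efficiency for one gene.

    Inputs are normalized (e.g. replicate-averaged) counts in the
    knockdown (kd) and control (ctrl) conditions. The pseudocount is an
    illustrative choice to avoid division by zero.
    """
    ribo_lfc = math.log2((ribo_kd + pseudocount) / (ribo_ctrl + pseudocount))
    rna_lfc = math.log2((rna_kd + pseudocount) / (rna_ctrl + pseudocount))
    # A negative value means ribosome occupancy fell more than RNA abundance.
    return ribo_lfc - rna_lfc
```

For example, a gene whose footprint counts halve while its RNA-seq counts stay constant has a value close to -1.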
For both RNA-seq and Ribo-seq, only genes with a baseMean >8 or above the bottom 10% of the library were used. GO enrichment analysis was performed with the topGO package (version 2.38.1; available from Bioconductor), using the Fisher test with default parameters.
The Random Forest regression was run using the randomForest package (version 4.6-14; available from CRAN) with default parameters. Lasso regression was performed on scaled variables using the glmnet package (35). The following features for each gene were used:
• TPM values using RNA-seq (in log scale);
• Baseline TE levels, defined as ratio of Ribo to RNA reads;
• Baseline RNA mature levels, defined as length-normalized ratio of RNA-seq reads in introns versus exons;
• GC content, length (in log scale) and ribosome density in: (38)
Feature importance (measured by mean decrease in accuracy for the random forest model) and correlation between predicted and measured test data were calculated on a 10-fold cross-validation scheme.
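The cross-validation scheme can be sketched independently of the regression method. In the example below, `fit` and `predict` are hypothetical callables standing in for any model (random forest, lasso or otherwise), and pooling held-out predictions across folds before computing the correlation is one reasonable choice, not necessarily the one used for the published figures.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def cross_validated_correlation(features, targets, fit, predict, k=10, seed=0):
    """Correlate held-out predictions with measured values.

    Each gene is predicted exactly once, by a model trained on the
    other k-1 folds.
    """
    predicted = [None] * len(targets)
    for test_idx in kfold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        for i in test_idx:
            predicted[i] = predict(model, features[i])
    return pearson(predicted, targets)
```

Because every prediction comes from a model that never saw the corresponding gene, the resulting correlation estimates performance on unseen transcripts rather than goodness of fit.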

Meta-transcript profiles
PARalyzer peaks (and peaks from the POSTAR2 repository (39)) were mapped onto transcript coordinates using one coding transcript per gene, chosen to have the longest 5′ UTR and the most common annotated start codon for that gene. Transcript positions were converted into bins using 15 bins for each 5′ UTR, 30 bins for each CDS and 20 bins for each 3′ UTR. Peak scores were normalized for each transcript (to sum to 1), and values were summed for each bin to build aggregate profiles, as in Figure 4B. When plotting profiles for different RBPs, the aggregate profiles were further normalized to a sum of 1. To build the average meta-transcript profile in Figure 4C, conversion specificity values were averaged per transcript bin. To create shuffled profiles in Figure 4F and Supplementary Figure S4, 5 random positions for each peak were taken from the same bound UTR.
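The binning and normalization steps can be sketched as follows. This is a minimal illustration that assumes 0-based transcript coordinates, one representative position per peak, and peaks that lie within the transcript; the function names and the data layout are hypothetical.

```python
# Number of bins per region, as stated in the text (15 + 30 + 20 = 65).
REGION_BINS = (("utr5", 15), ("cds", 30), ("utr3", 20))


def bin_index(pos, region_start, region_end, n_bins):
    """Map a position inside [region_start, region_end) to a bin number."""
    length = region_end - region_start
    return min((pos - region_start) * n_bins // length, n_bins - 1)


def transcript_profile(peaks, cds_start, cds_end, tx_length):
    """Build a 65-bin profile for one transcript, normalized to sum to 1.

    peaks: list of (position, score) tuples in transcript coordinates.
    """
    bounds = {
        "utr5": (0, cds_start),
        "cds": (cds_start, cds_end),
        "utr3": (cds_end, tx_length),
    }
    profile = [0.0] * sum(n for _, n in REGION_BINS)
    total = sum(score for _, score in peaks)
    if total == 0:
        return profile
    for pos, score in peaks:
        offset = 0
        for name, n_bins in REGION_BINS:
            start, end = bounds[name]
            if start <= pos < end:
                profile[offset + bin_index(pos, start, end, n_bins)] += score / total
                break
            offset += n_bins
    return profile


def aggregate_profiles(profiles):
    """Sum per-transcript profiles bin by bin."""
    return [sum(column) for column in zip(*profiles)]
```

Normalizing each transcript before summation gives every bound transcript the same weight in the aggregate, so that a few heavily bound mRNAs do not dominate the meta-transcript profile.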

De novo motif finding
The STREME algorithm (40) was used to perform de novo motif finding, using PAR-CLIP peaks from the POSTAR2 repository, selecting peaks from HEK 293 cells called with PARalyzer as above, and selecting for peaks in UTRs and intronic regions in protein-coding genes. The following parameters were used: -rna -minw 5 -maxw 15 -pvt 0.05 -totallength 1e7 -time 18000 -patience 6

Additional transcript features analysis
To compare read mapping locations within transcripts, a window of 25 nt around the start codon was subtracted from annotated 5′ UTRs and CDS. 5′ UTR and CDS regions in genomic and transcriptomic space were retrieved using Ribo-seQC. Counts on 5′ UTRs and CDS were first averaged between replicates. The ratio of 5′ UTR to CDS counts was calculated for each gene in the siDDX3 and control conditions. The log2 of the ratio siDDX3/control for those values represents the skew of counts towards the 5′ UTR in the siDDX3 condition:
$$\mathrm{skew} = \log_2 \frac{(\mathrm{5'UTR}/\mathrm{CDS})_{\mathrm{siDDX3}}}{(\mathrm{5'UTR}/\mathrm{CDS})_{\mathrm{control}}}$$
RNA in silico folding was performed on 5′ UTR sequences using RNAlfold (41) with default parameters. Average ΔG values per nucleotide were calculated by averaging the ΔG values of each structure overlapping that nucleotide. %GC content and T>C transition specificity (defined as ConversionEventCount / (ConversionEventCount + NonConversionEventCount)) for each PAR-CLIP peak were derived using the clusters.csv output file from PARalyzer. Gviz was used to plot tracks for RNA-seq, Ribo-seq and PAR-CLIP over different transcripts. The Wilcoxon rank-sum test was used for statistical testing and Cliff's delta was used to calculate effect sizes as described (42).
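Three of the quantities defined in this section are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, counts are assumed to be replicate-averaged and non-zero, and Cliff's delta is written in its textbook pairwise form rather than taken from the cited implementation.

```python
import math


def utr5_skew(utr5_kd, cds_kd, utr5_ctrl, cds_ctrl):
    """log2 of the (5' UTR / CDS) count ratio in siDDX3 relative to control."""
    return math.log2((utr5_kd / cds_kd) / (utr5_ctrl / cds_ctrl))


def conversion_specificity(conversion_events, non_conversion_events):
    """T>C transition specificity of a PAR-CLIP peak."""
    return conversion_events / (conversion_events + non_conversion_events)


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in xs, y in ys)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

A positive skew indicates relatively more footprints in the 5′ UTR after knockdown, and Cliff's delta ranges from -1 to 1, with 0 indicating fully overlapping distributions.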
Source code to reproduce figures can be found at: https://github.com/lcalviell/DDX3X_RPCLIP
Degron RP: DDX3X-mAID tagged HCT 116 cells expressing OsTIR1 (one 15 cm dish at 80-90% confluency per replicate, two replicates) were transfected with either wild-type DDX3X or DDX3X R326H. Twenty-four hours post-transfection, the media was changed and fresh media with 500 μM indole-3-acetic acid (IAA) was added to cells. Untransfected cells were treated with either DMSO or IAA. Forty-eight hours after auxin addition, cells were treated with 100 μg/ml cycloheximide (CHX) for two minutes to trap ribosomes, then harvested and lysed as described in (43). Briefly, cells were washed with PBS containing 100 μg/ml CHX and lysed in ice-cold lysis buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT, 100 μg/ml CHX, 1% (v/v) Triton X-100, 25 U/ml TurboDNase (Ambion)). 240 μl of lysate was treated with 6 μl RNase I (Ambion, 100 U/μl) for 45 min at RT with gentle agitation, and further digestion was halted by addition of SUPERase:In (Ambion). Illustra Microspin Columns S-400 HR (GE Healthcare) were used to enrich for monosomes, and RNA was extracted from the flow-through using the Direct-zol kit (Zymo Research). Gel slices of nucleic acids between 24 and 32 nt long were excised from a 15% urea-PAGE gel. Eluted RNA was treated with T4 PNK, and a preadenylated linker was ligated to the 3′ end using T4 RNA Ligase 2 truncated KQ (NEB, M0373L). Linker-ligated footprints were reverse transcribed using Superscript III (Invitrogen), and gel-purified RT products were circularized using CircLigase II (Lucigen, CL4115K). rRNA depletion was performed using biotinylated oligos as described in (44), and libraries were constructed using a different reverse indexing primer for each sample.

PAR-CLIP experiments
Flp-In T-REx HEK 293 cells expressing FLAG/HA-tagged DDX3X (45) were labeled with 100 μM 4-thiouridine for 16 h. PAR-CLIP was performed generally as described (46,47). Briefly, cells were UV-crosslinked with 0.15 J/cm2 at 365 nm, and stored at -80 °C. Obtained cell pellets were lysed in three times the cell pellet volume of NP-40 lysis buffer (50 mM HEPES-KOH at pH 7.4, 150 mM KCl, 2 mM EDTA, 1 mM NaF, 0.5% (v/v) NP-40, 0.5 mM DTT, complete EDTA-free protease inhibitor cocktail (Roche)), incubated for 10 min on ice and centrifuged at 13000 rpm for 15 min at 4 °C. Supernatants were filtered through a 5 μm syringe filter. Next, lysates were treated with RNase I (Thermo-Fisher Scientific) at a final concentration of 0.25 U/μl for 10 min at room temperature. Immunoprecipitation of the DDX3/RNA complexes was performed with FLAG magnetic beads (Sigma). After IP and washing, the protein-bound RNAs were 3′ dephosphorylated and 5′-end phosphorylated using T4 PNK with 0.01% Triton X-100, and the NIR fluorescent adaptor (5′-OH-AGATCGGAAGAGCGGTTCAGAAAAAAAAAAAA/iAzideN/AAAAAAAAAAAA/3Bio/-3′) was ligated to the RNA using truncated RNA ligase 2 K227Q (NEB) overnight at 16 °C, shaking at 1600 rpm. Crosslinked protein-RNA complexes were resolved on a 4-12% NuPAGE gel (Thermo-Fisher Scientific) and transferred to a nitrocellulose membrane. Protein-RNA complexes migrating at the expected molecular weight were excised, and RNA was isolated by proteinase K (Roche) treatment and phenol-chloroform extraction. Purified RNA was further ligated to 5′ adapters, reverse transcribed and PCR amplified. The amplified cDNA was sequenced on a NextSeq 500 device (Illumina).
Nucleic Acids Research, 2021, Vol. 49, No. 9 5339

In vitro transcription, capping, and 2′-O methylation of reporter RNAs
Annotated 5′ UTRs for selected transcripts were cloned upstream of Renilla Luciferase (RLuc) under the control of a T7 promoter, with 60 adenosine nucleotides downstream of the stop codon to mimic polyadenylation. 5′ UTR sequences are in Supplementary Table S4. Untranslated regions were cloned using synthetic DNA (Integrated DNA Technologies) or by isolation using 5′ RACE (RLM-RACE kit, Invitrogen). Template was PCR amplified from the plasmids using Phusion polymerase and the following primers, and gel purified, as described (42).
pA60 txn rev: TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT CTG CAG
pA60 txn fwd: CGG CCA GTG AAT TCG AGC TCT AAT ACG ACT CAC TAT AGG
100 μl in vitro transcription reactions were set up at room temperature with 1-5 μg of purified template, 7.5 mM ACGU ribonucleotides, 30 mM Tris-Cl pH 8.1, 125 mM MgCl2, 0.01% Triton X-100, 2 mM spermidine, 110 mM DTT, T7 polymerase and 0.2 U/μl Superase-In RNase inhibitor (Thermo-Fisher Scientific). Transcription reactions were incubated in a PCR block at 37 °C for 1 h. 1 μl of 1 mg/ml pyrophosphatase (Roche) was added to each reaction, and the reactions were subsequently incubated in a PCR block at 37 °C for 3 h. 1 unit of RQ1 RNase-free DNase (Promega) was added to each reaction, followed by further incubation for 30 min. RNA was precipitated by the addition of 200 μl 0.3 M NaOAc pH 5.3, 15 μg GlycoBlue co-precipitant (Thermo-Fisher Scientific) and 750 μl 100% EtOH. Precipitated RNA was further purified over RNA Clean & Concentrator-25 columns (Zymo Research). A glyoxal gel was run to assess the integrity of the RNA before subsequent capping and 2′-O methylation.

Preparation of cellular extracts for in vitro translation
Three to five 150 mm plates of HEK 293T or HCT 116 cells were trypsinized and pelleted at 1000g, 4 °C. One cell-pellet volume of lysis buffer (10 mM HEPES, pH 7.5, 10 mM KOAc, 0.5 mM Mg(OAc)2, 5 mM DTT, and 1 tablet Complete mini EDTA-free protease inhibitor (Sigma) per 10 ml) was added to the cell pellet, which was incubated on ice for 45 min. The pellet was homogenized by trituration through a 26G needle attached to a 1 ml syringe 13-15 times. Efficiency of disruption was checked by trypan blue staining (>95% disruption target). The lysate was cleared by centrifugation at 14000g for 1 min at 4 °C, 2-5 μl was reserved for western blot analysis, and the remainder was aliquoted and flash frozen in liquid nitrogen.

In vitro translation
5 μl in vitro translation reactions were set up with 2.5 μl of lysate and 20 ng total RNA (0.84 mM ATP, 0.21 mM GTP, 21 mM creatine phosphate, 0.009 units/ml creatine phosphokinase, 10 mM HEPES pH 7.5, 2 mM DTT, 2 mM MgOAc, 100 mM KOAc, 0.008 mM amino acids, 0.25 mM spermidine, 5 units RNasin Plus RNase inhibitor (Promega)) as described (48). Reaction tubes were incubated at 30 °C for 45 min, and expression of the reporter was measured using the Renilla Luciferase Assay System (Promega) on a GloMax Explorer plate reader (Promega).

Identifying mRNAs that depend on DDX3 for efficient translation
We performed ribosome profiling and RNA-seq to determine the set of transcripts that are affected by depletion of DDX3. DDX3X is an essential gene (26,49), so we transiently knocked down its expression using siRNA and collected ribosome-protected footprints in duplicate experiments (Figure 1A, B, Supplementary Figure S1A). Knockdown efficiencies were ∼90% and ∼70% in the two replicates (Figure 1B). Measuring changes in both RNA abundance and ribosome occupancy enabled us to distinguish between different modes of DDX3-mediated regulation. We found that depletion of DDX3 affects ribosome occupancy of a minority of messages (Figure 1C). Most changes in ribosome occupancy upon DDX3 depletion were decreases, broadly suggesting that the function of DDX3 is to increase ribosome occupancy. Genes such as DVL2, NT5DC2 and ODC1, the last of which is known to be translationally controlled (50), decreased in ribosome occupancy upon DDX3 depletion (Figure 1C, D, Supplementary Table S1). Diverse biological pathways were affected by DDX3 depletion. Histone mRNAs were enriched among the few translationally upregulated transcripts, which might reflect resistance to widespread translation suppression rather than an increase in protein synthesis. Genes related to neuronal branching belonged to the translationally downregulated set (Supplementary Figure S1B). To confirm the effects of DDX3 depletion on the translated transcriptome, we also established a cell line to rapidly and efficiently degrade endogenous DDX3 upon addition of auxin in human male-derived colorectal cancer HCT 116 cells, a near-diploid cell line amenable to genome engineering (Supplementary Figure S2A; (51)). As with the siRNA knockdown, induced degradation of DDX3 predominantly resulted in a marked decrease in the translation of a subset of cellular messages (Supplementary Figure S2B).
Translation efficiency (TE) changes were more similar than steady state RNA levels upon siRNA knockdown or chemical degradation (Supplementary Figure S2C, Supplementary Table S1), even though these experiments were performed in different cell lines (HEK 293T versus HCT 116) and with different depletion approaches, affirming DDX3 function in translation regulation.
Our data suggest that DDX3 directly affects translation of a subset of mRNAs. However, ribosome profiling measures ribosome density, which can be affected by changes to translation initiation, translation elongation, or ribosome stalling. DDX3 is thought to regulate translation through transcript 5′ UTRs, and we found genes that are regulated by their 5′ UTRs, such as ODC1, in the translationally downregulated set (Figure 1C; (50,52)). To test whether altered translation initiation contributes to the impact of DDX3 knockdown on ribosome density, we cloned DDX3-sensitive 5′ UTRs from this and previous work (3,25,53) upstream of a Renilla luciferase reporter and compared them to a control reporter that is not sensitive to DDX3 depletion. Since DVL2 has many annotated 5′ UTRs that overlap, we cloned the prevalent isoforms in HEK 293T cells using 5′ RACE, which yielded a short and a long isoform (Materials and Methods). We then made translation extracts from HEK 293T cells transfected with a nontargeting control siRNA or a DDX3 siRNA (Figure 1E). Next, reporter RNAs were in vitro transcribed, capped, and 2′-O methylated and used for in vitro translation in wild-type or DDX3-depleted lysate. We found that the 5′ UTRs from the DDX3-sensitive mRNAs ODC1, PRKRA, RAC1 and DVL2 also conferred DDX3 dependence to the luciferase reporter (Figure 1F). However, other reporter RNAs, such as ATF5 and RPLP1, did not change in in vitro translation upon DDX3 knockdown. RPLP1 was identified as an mRNA with uORF occupancy changes upon mutant DDX3 expression in prior work (3), while ATF5 was previously implicated in DDX3-dependent translation (25,53). To compare in vitro translation with translation in cellular contexts, we transfected the same reporter mRNAs into HCT 116 cells with DDX3 depleted using our degron system (Figure 1G, Supplementary Figure S2D).
The results of the in vitro and cellular translation assays show a high degree of concordance, with the exception of the CCNE1 5′ UTR, which was previously shown to require DDX3 for efficient translation (25). Therefore, based on these reporter experiments, we interpret ribosome occupancy changes upon DDX3 depletion as a result of misregulated translation initiation dynamics and refer to them as translation efficiency (TE).

Defining the features that mediate DDX3-dependent translation
The in vitro and cellular translation experiments implicated translation initiation and transcript 5′ UTRs in mRNAs that are sensitive to DDX3 depletion. To determine which features contribute to quantitative changes in translation upon knockdown of DDX3, we used known translational-control elements to generate a random forest regression model. A model with 28 features (Materials and Methods) was able to moderately predict the translation changes upon DDX3 knockdown (correlation between predicted and observed changes = 0.54, Supplementary Figure S3A), with few features driving the model performance (Figure 2A, Materials and Methods), such as baseline translation levels, GC content in coding sequences and 5′ UTRs, and density of the CERT motif. The CERT motif is a cytosine-rich element that has been implicated in eIF4E- and eIF4A-dependent translation through incompletely understood mechanisms (54,55). Interestingly, mRNAs that are sensitive to DDX3 depletion appear to be poorly translated in HEK 293T cells (Figure 2B). Figure 1D. performed similarly; conversely, a model built without using sequence predictors or baseline translation levels could not recapitulate translation downregulation effects (Supplementary Figure S3B, S3C). 5′ UTR and coding sequence GC content may be indications of increased RNA structure in these regions, possibly relevant for other aspects of cytoplasmic RNA processing (Discussion).
To test the ability of the random forest to select relevant features predictive of DDX3-mediated translation changes, we compared it to lasso regression, another method used to perform feature selection among a set of correlated predictors (Materials and Methods). The two methods largely agreed in pinpointing relevant features, with the random forest slightly outperforming the lasso (Supplementary Figure S3D, E). The features predicted to drive DDX3 sensitivity also showed different distributions among the regulated mRNAs, especially in the translationally downregulated set, indicating that the features identified by the random forest are indeed different between sets of transcripts (Figure 2B).
We measured the enrichment of ribosomes in transcript 5′ UTRs, under the hypothesis that depletion of DDX3 might lead to defective scanning and ribosome accumulation (1), or selective ribosome depletion on coding sequences. Indeed, we found more ribosomes in transcript 5′ UTRs relative to coding sequences upon DDX3 depletion (Figure 2C), especially in mRNAs that show translational downregulation. As an example, ribosome occupancy on HMBS is shown in Figure 2D; this transcript showed changes in ribosome density in its 5′ UTR and may therefore be regulated by upstream ORF (uORF) translation.

DDX3 crosslinks to ribosomal RNA and 5′ UTRs
The above ribosome profiling and RNA-seq experiments identified the set of transcripts that are affected by DDX3 depletion, but these transcripts could be affected by direct or indirect mechanisms. Therefore, to better define the set of transcripts that are direct targets of DDX3, we measured DDX3 binding sites with high specificity using PAR-CLIP (Figure 3A, Supplementary Figure S1A). Previous work using a complementary method (iCLIP) to measure DDX3 binding sites identified 5′ UTR and ribosomal RNA binding. Curiously, even though DDX3 is thought to regulate translation initiation, binding was also identified in coding sequences and 3′ UTRs (3,12). Here, we used the additional specificity afforded by T>C transitions induced by protein adducts on crosslinked uridine residues in PAR-CLIP to refine DDX3 binding sites across the transcriptome (46).
High-throughput sequencing of RNA fragments crosslinked to DDX3 identified a binding site for DDX3 on the 18S ribosomal RNA (Figure 3B; visualized on the structure from (56)). It is possible that these rRNA reads could arise from nonspecific interactions between RNA binding proteins and the highly abundant rRNA. However, while many rRNA fragments were sequenced following PAR-CLIP, there was only one site with high-confidence T>C transitions, spanning nucleotides 527-553 in the 18S rRNA (Figure 3B, Supplementary Figure S4A). This site maps to helix 16 of the 18S rRNA, similar to where Ded1 crosslinks to 18S rRNA in yeast, and does not contain post-transcriptionally modified rRNA nucleotides (1,57).
Helix 16 (h16) is on the small ribosomal subunit facing incoming mRNA, which might provide DDX3 access to resolve mRNA secondary structures to facilitate inspection by the scanning 43S complex (Figure 3C). The crosslink site on h16 is just opposite an RRM domain that has been assigned as eIF4B, another factor crucial in ribosome recruitment and scanning (56,58,59). This is consistent with observations that eIF4B and Ded1 cooperate in translation initiation on mRNAs (58). Recently, it has been proposed that this RRM domain may belong to eIF3g, another translation initiation factor (60), which is consistent with a reported interaction between eIF3 and DDX3 (26).
In addition to ribosomal RNA binding, we found that DDX3 interacts primarily with coding transcripts (Figure 4A, Supplementary Table S2). To identify where DDX3 binds mRNAs, we aggregated peaks across all expressed transcripts in a metagene analysis. We found that DDX3 primarily contacts transcript 5′ UTRs, with a small number of reads mapping in the coding sequence and 3′ UTR (Figure 4B). A large peak was also observed at the start codon, which could reflect kinetic pausing during subunit joining while DDX3 is still bound to the initiating ribosome (61). We used available CLIP data to compare the binding pattern of DDX3 to other known mRNA-binding proteins (39). We selected three RNA-binding proteins for comparison: eIF3b is a member of the multi-subunit initiation factor eIF3 (62), FMR1 interacts with elongating ribosomes (63), and MOV10 is involved in 3′ UTR-mediated mRNA decay (64,65). The binding pattern of DDX3 most closely resembles that of the initiation factor eIF3b (Figure 4B). We detected some DDX3 binding within coding sequences and even 3′ UTRs, which could arise from background signal or alternative binding modes. PAR-CLIP indicated that DDX3 binds abundant mRNAs (Supplementary Figure S4B), including ribosomal protein genes. While this observation can provide insights into the molecular functions of DDX3, it might reflect a limit in the sensitivity of our PAR-CLIP data. Therefore, we decided to investigate PAR-CLIP binding patterns that are not strongly confounded by expression levels. We used the frequency of T>C transitions at each site as a measure of the specificity of protein-RNA interactions (66). High-specificity crosslinks with frequent T>C transitions resided most often in 5′ UTRs (Figure 4C), as also shown for a translationally regulated transcript such as ODC1 (Figure 4D). Confirmation of this binding pattern comes from an independent assay of protein-RNA interaction, enhanced CLIP (eCLIP) (Supplementary Figure S4C; 67).
Next, we sought to describe the mRNA regions with enriched DDX3 binding. DEAD-box RNA helicases engage RNA by recognizing structural elements with poorly defined sequence context (68), which hinders the ability to extract meaningful sequence motifs from CLIP data. To investigate this phenomenon, we performed de novo motif finding on PAR-CLIP peaks in HEK 293 cells from the POSTAR2 repository (39; Materials and Methods). As expected, motifs extracted from different RBPs showed different degrees of specificity (Supplementary Figure S4D). DDX3 motifs, together with motifs from translation initiation factors and other ribosome interactors, showed poor significance when compared to other RBPs with more defined sequence specificity (such as members of the cleavage and polyadenylation machinery, or known 3′ UTR binders). This result called for a more targeted analysis of DDX3 binding sites. By investigating the sequence-structure context around mRNA peaks in 5′ UTRs, we observed that DDX3 binding sites identified by PAR-CLIP reside in highly structured regions (Figure 4E). We observed a higher guanine content (accompanied by predicted G-quadruplexes; Supplementary Figures S4E, F) upstream of the binding site; downstream of the peak summit, we detected high GC content resembling the CERT motif (Figure 4F), a regulatory motif highly predictive of DDX3-mediated translation regulation (Figure 2A). Moreover, we observed increased T>C conversion specificity in the 5′ UTRs of transcripts whose translation decreases upon DDX3 depletion (Figure 4G), indicative of a possibly stronger protein-RNA association at those regions. This binding pattern, combined with the observation that DDX3 co-fractionates with both polysomes and initiation complexes, suggests that DDX3 acts with the 40S during the process of translation initiation (2,69-71).
Taken together, we conclude that DDX3 binds and regulates the translation of poorly translated mRNAs exhibiting complex sequence-structure features in their 5′ UTRs.

Identifying translation changes caused by DDX3 mutations
De novo genetic variants in DDX3X cause developmental delay and intellectual disability in DDX3X syndrome (13-16). Interestingly, patients carrying inactivating point mutations in DDX3 display more severe clinical symptoms than patients with truncating mutations (16). Point mutations in DDX3 associated with medulloblastoma are dominant negative and act by preventing enzyme closure of DDX3 towards the high-RNA-affinity ATP-bound state (6-11). This suggests there may be different effects on translation between depletion of DDX3 and inhibition or expression of an inactive mutant.
To test the effect of mutants in DDX3 on translation, we transfected cells with plasmids containing wild-type or mutant DDX3 proteins after auxin-induced degradation of endogenous DDX3, switching expression from the wild-type sequence to an allele of interest (Figure 5A). We used this system to define acute changes to translation caused by DDX3 mutations without allowing the cells to adapt to the presence of inactive DDX3 alleles, which may also be lethal. We measured genome-wide translation changes caused by DDX3 mutants using ribosome profiling (Supplementary Table S1). A collection of genes related to the double-stranded DNA response was upregulated at the level of RNA abundance, likely due to differences in transfected DNA amounts (Figure 5B, upper right). Broadly, we noticed higher variability in ribosome occupancy changes (Figure 5B, y-axis) than in RNA level changes (Figure 5B, x-axis) when compared to the siDDX3 experiments (Figure 1C), suggesting that one functional difference between mutant and knockdown might involve regulation of RNA steady-state levels. Among the few regulated mRNAs, we observed a robust downregulation of ODC1 translation, which was directly bound by DDX3 (Figure 4D) and strongly downregulated in the DDX3 depletion experiment (Figure 1C). Supplementary Table S4. To further test the difference between knockdown and mutant DDX3 at the level of individual 5′ UTRs, we made in vitro translation extracts with wild-type DDX3, R326H DDX3, or R534H DDX3 and measured translation of a panel of reporters (Figure 5C; (3)). Broadly, we found two classes of reporter RNAs (Figure 5D; Supplementary Figure S5A, B). One class, including ODC1, PRKRA, RAC1 and the DVL2 isoforms, decreased in translation in all tested perturbations to DDX3. Another class, including ATF5, CCNE1 and RPLP1, selectively decreased in translation in mutant DDX3 extracts but not in knockdown extracts.
We sought to test chemical inhibition of DDX3, as it functionally mimics a dominant negative mutation by blocking ATP binding. We were unable to inhibit translation in a DDX3-dependent manner using RK-33 (72), and instead found that it acted as a general translation inhibitor (Supplementary Figure S5C). Taken together, we found that DDX3 sensitivity for translation is preserved in translation extracts and that depletion of DDX3 appears to have different outcomes on translation than inhibition or dominant negative variants.

DISCUSSION
DDX3X is an essential human gene that is altered in diverse diseases. Here, we use a set of transcriptomics approaches, machine learning and biochemistry to show that DDX3 regulates a subset of the human transcriptome, likely through resolving RNA structures in 5′ UTRs (Figure 6). Reporter experiments show that 5′ UTRs are sufficient to confer DDX3 sensitivity onto unrelated coding sequences. We conclude that DDX3 affects translation initiation through transcript 5′ UTRs. Our data suggest that the major role of DDX3 is in translation initiation and reveal translation differences between mutated and haploinsufficient DDX3 expression.
We identified binding between DDX3 and helix 16 (h16) on the human 40S ribosome. This is similar to binding sites identified previously using other CLIP approaches (1,3,12), confirmed here using T>C transitions defined by PAR-CLIP. Interestingly, histone mRNAs showed sustained translation levels upon DDX3 knockdown (Figure 1C). However, histone mRNAs represent a peculiar category of transcripts, often containing short UTRs, few introns and highly repetitive sequences, and lacking a poly-A tail. The quantitative contribution of some of these features has been resolved by our regression approach, with good agreement between the lasso and the random forest models. However, we believe that more tailored approaches are needed to precisely investigate regulation of specific classes of mRNAs. For instance, histone mRNA translation has recently been shown to depend on mRNA binding to the rRNA h16 helix (73). Our data suggest that there might be competition for h16 between DDX3 and histone mRNAs, and that their translation increases upon DDX3 knockdown due to increased accessibility of h16. The set of mRNAs that require h16 for their translation will be an interesting direction to pursue in the future.
Figure 6. A model of DDX3 in translational control. DDX3 binds to the small subunit via helix 16 (h16). Transcripts that are poorly translated in normal cells and that harbor increased intramolecular RNA structure (blue transcript; left) are sensitive to DDX3 depletion (right). Other mRNAs that are initially highly translated (black transcript) are unaffected.
Although DDX3 depletion primarily affected translation, some genes exhibited changes in steady-state RNA abundance upon DDX3 depletion. RNA-level changes could be mediated by indirect effects of a DDX3-dependent translation target, or reflect additional mechanisms of post-transcriptional gene regulation, possibly mediated by RNA structural features, codon composition, or other RNA decay pathways (74). Moreover, DDX3 is an important factor in stress granule complexes (75), and is possibly involved in granule-specific regulation of mRNA metabolism. Interestingly, we observed more RNA-level changes in DDX3-depletion experiments (Figure 1C and Supplementary Figure S2B) than in experiments using a mutation in DDX3 (Figure 5B). Potential interactions between DDX3 regulation of both ribosome occupancy and RNA levels will be further explored in future studies.
DDX3 is an abundant protein, with approximately 1.4 million copies per HeLa cell (76), or about half the abundance of ribosomes (77). We have interpreted data in this work by hypothesizing that DDX3 is functioning in cis by binding to the 40S ribosome and facilitating translation initiation on the associated mRNA (Figure 6). It is also possible that DDX3, alone or in combination with other DEAD-box proteins like eIF4A, functions in trans by activating an mRNA prior to 43S complex loading. Future work defining the binding site of DDX3 on the ribosome could enable separation of cis and trans functions to test these two models, although we note that the functional consequences of DDX3 depletion we have observed here are independent of its functioning in cis or trans.
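As a rough check on the stoichiometry quoted above, the following sketch back-calculates the ribosome count implied by the statement that DDX3 is present at about half the abundance of ribosomes. The ratio of 0.5 is treated as exact purely for illustration; the resulting number is an implied estimate under that assumption, not a measured value from the cited work.

```python
# DDX3 copy number per HeLa cell, as quoted in the text (76).
ddx3_copies_per_cell = 1.4e6

# "About half the abundance of ribosomes" (77), taken here as exactly 0.5.
# This exact ratio is an assumption made for the sake of the arithmetic.
ddx3_to_ribosome_ratio = 0.5

# Ribosome count per cell implied by the two figures above.
implied_ribosomes_per_cell = ddx3_copies_per_cell / ddx3_to_ribosome_ratio

print(f"Implied ribosomes per cell: {implied_ribosomes_per_cell:.1e}")
```

Under this assumption there would be on the order of one DDX3 molecule for every two ribosomes, which is compatible with, but does not by itself distinguish between, the cis and trans models.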
DDX3 is altered in numerous human diseases, including cancers and developmental disorders (5). Some diseases are characterized by missense variants (8–11), while others involve predominantly nonsense or frameshift variants (17,18), and still others present with a mixture of variant types (13–16). Our work suggests that variants in DDX3 that deplete protein levels may result in different translation changes than inactivating missense variants. We attempted to directly compare the translational changes upon DDX3 depletion identified in this work with previous data on expression of mutant DDX3, but stopped because of confounding variability in biological samples, library preparation and sequencing protocols. Defining how different mutation types in DDX3 affect gene expression, the underlying molecular mechanisms, and potential therapeutic interventions is an intriguing direction for the future.

DATA AVAILABILITY
Processed datasets and code to reproduce the main figures can be found here: https://github.com/lcalviell/DDX3X_RPCLIP.
Sequencing data can be retrieved using GEO accession numbers GSE125114 and GSE157063.