Pan Li, Xiaolin Zhou, Kui Xu, Qiangfeng Cliff Zhang, RASP: an atlas of transcriptome-wide RNA secondary structure probing data, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D183–D191, https://doi.org/10.1093/nar/gkaa880
Abstract
RNA molecules fold into complex structures that are important across many biological processes. Recent technological developments have enabled transcriptome-wide probing of RNA secondary structure using nucleases and chemical modifiers. These approaches have been widely applied to capture RNA secondary structure in many studies, but gathering and presenting such data from very different technologies in a comprehensive and accessible way has been challenging. Existing RNA structure probing databases usually focus on low-throughput or very specific datasets. Here, we present a comprehensive RNA structure probing database called RASP (RNA Atlas of Structure Probing), built by collecting 161 deduplicated transcriptome-wide RNA secondary structure probing datasets from 38 papers. RASP covers 18 species across animals, plants, bacteria, fungi and viruses, and categorizes 18 experimental methods, such as DMS-seq, SHAPE-Seq, SHAPE-MaP and icSHAPE. In particular, RASP curates up-to-date datasets from several RNA secondary structure probing studies of the RNA genome of SARS-CoV-2, the RNA virus that causes the ongoing COVID-19 pandemic. RASP also provides a user-friendly interface to query, browse and visualize RNA structure profiles, offering a shortcut to RNA secondary structures grounded in experimental data. The database is freely available at http://rasp.zhanglab.net.
INTRODUCTION
RNA is central to many biological processes, and a range of cellular mechanisms act on it to carefully regulate and refine gene expression (1). The specific secondary structures formed by non-coding RNAs (ncRNAs) are central to their regulation and functions (2–4). Recent studies have also found that mRNA secondary structures influence gene transcription, translation and decay (5). The secondary structures of many RNA virus genomes also have important functions. For example, the 3′UTRs of flaviviruses produce highly structured noncoding RNAs that are resistant to host nucleases (6). As more and more functions of RNA secondary structure are discovered, deciphering the structures themselves has become a priority.
During the past few decades, many computational methods for predicting RNA secondary structure have been developed (7–9). These methods work well mainly on shorter RNA sequences and have been a major source of information for RNA structure studies (10). However, computational prediction and modeling usually cannot account for complex cellular environments and therefore lack the resolution needed for in vivo studies. Small-molecule approaches have long been used to quantitatively measure RNA conformation (11). In the last few years, thanks to the development of high-throughput sequencing technology, RNA structure probing has entered the omics era, allowing simultaneous, transcriptome-wide measurement of RNA structure (yielding so-called 'structuromes') both in vitro and in cells (12–14).
The principles underlying RNA structure probing fall largely into two categories: nuclease cleavage and small-molecule probing (Figure 1). Nucleases P1 and S1 cut single-stranded RNA, whereas RNase V1 cuts double-stranded RNA (15,16). During reverse transcription, cDNA synthesis stops at the cleavage site, so high-throughput sequencing of the cDNA reveals which nucleotides are single-stranded and which are double-stranded. Small chemical probes such as 1M7, DMS, N3-kethoxal and NAI-N3 specifically modify single-stranded RNA bases (17–22). During reverse transcription, the reverse transcriptase either terminates at the modified site (17,18,20,21) or misincorporates nucleotides opposite the modification (19,22). By normalizing RT stop counts or mutation rates, a structure score can be assigned to each base that reflects the likelihood of that base being single-stranded or double-stranded.
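To make the normalization step concrete, the sketch below computes a mutation-rate-based score in the style of SHAPE-MaP (the modified-sample mutation rate minus the untreated background, scaled by a denatured control) and then applies the "2-8%" normalization often used for SHAPE data: the top 2% of values are set aside as outliers and all scores are divided by the mean of the next 8%. This is an illustrative sketch, not the pipeline behind any particular RASP dataset; the function names are hypothetical, and RT-stop methods such as icSHAPE use different formulas.

```python
def mutation_rate(mutations, depth):
    """Per-nucleotide mutation rate; None where there is no read coverage."""
    return [m / d if d > 0 else None for m, d in zip(mutations, depth)]


def raw_reactivity(modified, untreated, denatured):
    """SHAPE-MaP-style reactivity: (modified - untreated) / denatured.

    Positions missing in any sample (or with a zero denatured rate) get None.
    Negative values are kept as-is; some pipelines clip them to zero.
    """
    scores = []
    for m, u, d in zip(modified, untreated, denatured):
        if m is None or u is None or d is None or d == 0:
            scores.append(None)
        else:
            scores.append((m - u) / d)
    return scores


def normalize_2_8(scores):
    """Drop the top 2% as outliers and divide by the mean of the next 8%."""
    valid = sorted((s for s in scores if s is not None), reverse=True)
    if not valid:
        return list(scores)
    n_outliers = int(len(valid) * 0.02)
    n_average = max(1, int(len(valid) * 0.08))
    window = valid[n_outliers:n_outliers + n_average]  # never empty
    factor = sum(window) / len(window)
    if factor <= 0:
        return list(scores)  # cannot normalize a non-positive scale
    return [s / factor if s is not None else None for s in scores]
```

For example, `normalize_2_8(raw_reactivity(mod, unt, den))` turns three per-sample rate profiles into normalized scores on which roughly 1.0 marks a strongly reactive (likely single-stranded) base.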

Figure 1. Schematic of RNA probing technologies. Approaches for RNA secondary structure probing are based on either nuclease digestion or small-molecule chemical modification.
RNA secondary structure databases such as RMDB (23) and RSVdb (24) have been developed, but they typically focus on specific datasets with limited coverage. For example, RMDB contains diverse RNA structural mapping experiments but focuses on low-throughput experiments (23). RSVdb collects RNA structure data but is limited to DMS-based datasets (24). Given the increasing volume of experimental RNA structure data (especially from high-throughput technologies) and their broad relevance to biological processes, a comprehensive database is needed.
Here we describe a database, RASP, that collects 161 datasets from 38 papers (Table 1). RASP spans, categorizes and organizes 18 species across animals, plants, bacteria, fungi and viruses, and 18 different experimental methods. RASP contains almost all currently published transcriptome-scale data, including the most recent studies that probed the RNA secondary structure of the SARS-CoV-2 RNA genome, generated with technologies ranging from PARS (16,25), DMS-seq (17) and Structure-seq (18) to SHAPE (26), icSHAPE (21) and SHAPE-MaP (27). RASP provides a user-friendly interface to query, browse and download data. In addition, RASP implements analytical functions such as multiple sequence alignment, along with RNA secondary structure prediction and visualization. This resource should greatly broaden access to, and enable cross-comparison of, RNA secondary structure data, supporting research across fields.
Table 1. Data collection. All papers are grouped by species. See Supplementary Table S1 for more details.
Category | Species | Studies (year, journal, method, number of samples)
Animals | Human | (2014, Nature, DMS-seq, 6 samples) (2014, Nature, PARS, 5 samples) (2016, Science, DMS-seq & SHAPE, 4 samples) (2016, Cell, icSHAPE, 2 samples) (2017, Cell, DIM-2P-seq, 1 sample) (2017, Nature Methods, DMS-MaPseq, 1 sample) (2019, Nature Structural & Molecular Biology, icSHAPE, 6 samples) (2020, Nature Chemical Biology, Keth-seq, 4 samples)
Animals | Mouse | (2016, Science, DMS-seq & SHAPE, 11 samples) (2010, Nature Methods, FragSeq, 6 samples) (2014, Genome Biology, CIRS-seq, 1 sample) (2015, Nature, icSHAPE, 3 samples) (2015, Biochemistry, SHAPE-MaP, 2 samples) (2019, Nature Structural & Molecular Biology, icSHAPE, 6 samples) (2020, Nature Chemical Biology, Keth-seq, 6 samples)
Animals | Zebrafish | (2018, Nature Structural & Molecular Biology, DMS-seq, 7 samples) (2020, Genome Biology, icSHAPE, 8 samples)
Plants | Rice | (2017, Nucleic Acids Research, Structure-seq2, 4 samples)
Plants | Arabidopsis thaliana | (2014, Nature, Structure-seq, 1 sample)
Bacteria & Fungi | E. coli | (2016, Science, DMS-seq & SHAPE, 5 samples) (2017, eLife, DMS-seq, 7 samples) (2018, Cell, SHAPE-MaP, 3 samples)
Bacteria & Fungi | Yeast | (2010, Nature, PARS, 1 sample) (2012, Molecular Cell, PARTE, 5 samples) (2014, Nature, DMS-seq, 3 samples) (2016, Science, DMS-seq & SHAPE, 5 samples) (2017, Nature Methods, DMS-MaPseq, 6 samples) (2018, Nature, DMS-MaPseq, 2 samples) (2014, RNA, Mod-seq, 2 samples)
Bacteria & Fungi | P. putida | (2016, Science, DMS-seq & SHAPE, 5 samples)
Bacteria & Fungi | Synechococcus | (2016, Science, DMS-seq & SHAPE, 4 samples)
Bacteria & Fungi | Y. pseudotuberculosis | (2020, Nucleic Acids Research, Lead-seq, 2 samples)
Virus | HIV | (2009, Nature, SHAPE, 1 sample) (2014, Nature Methods, SHAPE-MaP, 3 samples) (2015, PLoS Computational Biology, SHAPE-MaP, 1 sample)
Virus | Dengue | (2019, Nature Communications, NAI-MaP, 1 sample) (2018, PNAS, SHAPE-MaP, 6 samples)
Virus | Zika | (2018, Cell Host & Microbe, icSHAPE, 2 samples) (2019, Nature Communications, NAI-MaP, 1 sample)
Virus | HCV | (2015, PNAS, SHAPE-MaP, 1 sample) (2016, Molecular Cell, SHAPE, 1 sample)
Virus | STMV | (2013, Biochemistry, SHAPE, 2 samples)
Virus | CMV | (2018, Nucleic Acids Research, SHAPE-seq, 2 samples)
Virus | IAV | (2019, Nucleic Acids Research, RAPiD-MaPseq, 2 samples)
Virus | SARS-CoV-2 | (2020, bioRxiv, SHAPE-MaP, 1 sample) (2020, bioRxiv, icSHAPE, 1 sample) (2020, bioRxiv, SHAPE-MaP, 2 samples)
DATA COLLECTION AND PROCESSING
We retrieved published papers containing high-throughput RNA structure probing data from PubMed. For experiments containing multiple conditions, we collected datasets for each condition separately. We classified datasets by species and by the experimental technology used (Table 1). Detailed information, including species, cell line, reagents used in the experiment and publication information, was also collected and used for classification during construction of the RASP database (Supplementary Table S1).
In principle, all RNA secondary structure probing methods generate a structure score (under various names) that provides a measure of the pairing probability of each nucleotide (14). When processed structural scores were provided, we downloaded them directly from the publications and integrated them into RASP (Figure 2, right). Otherwise, we downloaded the raw data and calculated the scores with the same pipeline used in the original paper (Figure 2, left). This processing information, along with the number of transcripts with structural scores, is listed in Supplementary Table S1.

DATABASE CONTENT AND USAGE
RASP offers several ways to interact with the stored data: search, browse and download, plus analysis functions including structure prediction, multiple sequence alignment and cross-dataset comparison (Figure 2, bottom).
Search interface for retrieving gene and sequence
RASP provides two query modes: 'search gene' and 'search sequence' (Figure 3A). In the gene-based mode, a user first selects one or more species, then enters a gene symbol (such as GAPDH), an Ensembl gene ID or a transcript ID of interest. Clicking 'Search Gene' returns a list of matches (Figure 3B). Each match includes the organism name, the genome location, the matched string (gene or transcript), the gene symbol, a transcript list and a genome browser link. The user then has two options: click a specific transcript to visualize its sequence and structure scores (see 'Visualize sequence and structure data for a transcript' section below) or click 'Go' to view the region in the genome browser (see 'Genome browser' section). In the sequence-based mode, a user selects a species, enters a DNA or RNA sequence and clicks 'Search Sequence'; RASP then searches the genome for the query sequence with blastn (28) and returns a hit list (Figure 3C). Each hit includes the organism name, the genome location, the E-value, the alignment between the query and target sequences, the gene symbol, a transcript list and a genome browser link. As described above, the user can visualize the data in the genome browser by clicking 'Go' or on a transcript by clicking it.
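The sequence search runs blastn against the selected genome. The source does not say which output format RASP requests; the sketch below assumes BLAST+ tabular output (`-outfmt 6` with its default 12 columns) and shows how such output could be turned into a hit list ranked by E-value, with the strand inferred from the subject coordinates. The E-value cutoff and the `Hit`/`parse_outfmt6` names are hypothetical, and mapping hits onto annotated transcripts is not shown.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str      # e.g. a chromosome name when searching a genome database
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float

    @property
    def strand(self) -> str:
        # blastn reports minus-strand hits with sstart > send
        return "+" if self.sstart <= self.send else "-"


def parse_outfmt6(text: str, max_evalue: float = 1e-5) -> list:
    """Parse tabular BLAST output and return hits sorted by E-value."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        hit = Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                  int(row["qstart"]), int(row["qend"]),
                  int(row["sstart"]), int(row["send"]), float(row["evalue"]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    # Best (smallest) E-value first, as in a ranked hit list
    return sorted(hits, key=lambda h: h.evalue)
```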

Figure 3. Gene-based and sequence-based query modes. (A) Search interface for retrieving genes and sequences. (B, C) Result interfaces for gene search (B) and sequence search (C).
Genome browser
RASP integrates JBrowse (29), which allows users to visualize and compare structure scores along the genome (Figure 4A). On the 'Browse' page, users select a species and enter the gene symbol or Ensembl gene ID of interest. Clicking 'Go' refreshes the browser to display the gene region. Structure data can be loaded in two ways: (i) the 'Click here to select datasets' button, which expands the selection panel; or (ii) the 'Select tracks' button at the top left of the browser region, which switches to the browser's track selection panel. Tracks can be filtered by criteria such as the technology, the reagents used in the experiment, the journal in which the paper was published, specific experimental conditions, the cell line, the strand and the experimental principle. Checking the 'Only show tracks with structural data coverage' box hides tracks that have no structural data in the displayed genomic region.
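Track selection amounts to filtering dataset metadata and, optionally, checking coverage of the displayed region. The sketch below is a hypothetical model of that logic, not JBrowse's or RASP's code: each track carries a metadata dictionary (keys such as `technology` or `cell_line` are illustrative) and a list of covered intervals, and a track is kept only if every criterion matches and, when a region is given, at least one interval overlaps it.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    name: str
    meta: dict                                   # e.g. {"technology": "icSHAPE"}
    covered: list = field(default_factory=list)  # (chrom, start, end), 0-based half-open

    def overlaps(self, chrom, start, end):
        """True if any covered interval overlaps [start, end) on chrom."""
        return any(c == chrom and s < end and start < e for c, s, e in self.covered)


def filter_tracks(tracks, criteria, region=None):
    """Keep tracks whose metadata value for every key is in the allowed set.

    criteria: {metadata key: set of allowed values}
    region:   optional (chrom, start, end); tracks without data there are dropped
    """
    selected = []
    for track in tracks:
        if all(track.meta.get(key) in allowed for key, allowed in criteria.items()):
            if region is None or track.overlaps(*region):
                selected.append(track)
    return selected
```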

Figure 4. Structural score visualization. (A) Browser page. (B) Structure scores of the human GAPDH and yeast RPL32 transcripts. (C) Structure scores on a transcript, with nucleotides colored by score. Users can select a sequence region to copy scores, download data or predict structure. In the 'Compare probing data' panel, users can compare two datasets with a scatter plot. The 'Statistics' panel displays the structure score distribution.
JBrowse also lets users download small amounts of data: highlight a region of interest, then click 'save track data' in the track menu to save the structure scores as a bedGraph file. Users can also rearrange tracks by dragging the track labels, and visualize their own custom tracks by uploading local files or providing hyperlinks. JBrowse also makes it easy to compare structure scores. Figure 4B shows two examples: the structure scores of the human GAPDH and yeast RPL32 transcripts obtained by different technologies. Users can likewise compare the structure of the same RNA under different conditions, which may help to visually identify how those conditions influence RNA structure.
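bedGraph is a plain-text format with four whitespace-separated columns: chromosome, 0-based start, exclusive end and value. As an illustration of what an exported track contains, the sketch below converts per-base scores into bedGraph lines, merging runs of identical values and skipping positions without data. It is not the code behind JBrowse's 'save track data' action; `to_bedgraph` is a hypothetical helper.

```python
def to_bedgraph(chrom, start, scores):
    """Convert per-base scores beginning at 0-based `start` into bedGraph lines.

    Consecutive bases with identical scores are merged into one interval;
    None (no data) breaks a run and is not written.
    """
    lines = []
    run_start, run_value = None, None
    for i, value in enumerate(list(scores) + [None]):  # sentinel flushes the last run
        pos = start + i
        if run_value is not None and value == run_value:
            continue  # extend the current run
        if run_value is not None:
            lines.append(f"{chrom}\t{run_start}\t{pos}\t{run_value}")
        if value is not None:
            run_start, run_value = pos, value
        else:
            run_start, run_value = None, None
    return lines
```

For example, scores `[0.5, 0.5, None, 1.0]` starting at position 100 become two intervals, `100-102` with value 0.5 and `103-104` with value 1.0.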
Visualize sequence and structure data for a transcript
By clicking a transcript on the search results page (see 'Search interface for retrieving gene and sequence' section) (Figure 3B, C), users open a new page that visualizes the transcript sequence and structure scores (Figure 4C). The page is organized into panels. The 'Summary of transcript' panel displays basic transcript information, including genome location and transcript biotype. The 'Selection' panel contains buttons for selecting sequence regions and performing operations on them, including copy, download, alignment and structure prediction. The 'Sequence' panel displays the full transcript sequence, with UTRs highlighted in orange. Users can click any two bases to select a region and copy or download it. To search for a subsequence, users enter it in the text area and click 'Search'; if it is found, the subsequence is selected. The 'Load probing data' panel loads probing data, after which the structure scores are displayed with base-specific colors in the 'Sequence' panel: high scores in red and low scores in blue. The 'Compare probing data' panel draws a scatter plot comparing the structure scores of any two selected datasets and reports the Pearson correlation coefficient. If a region is selected, only the data in that region are displayed. The 'Statistics' panel reports statistics and draws a distribution plot of the structure scores for the full transcript or the selected region.
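The 'Compare probing data' panel reports a Pearson coefficient. How RASP handles bases that lack a score in one of the two datasets is not stated; the sketch below assumes such positions are excluded, so only positions scored in both datasets within the selected region contribute. `pearson_in_region` is a hypothetical name, and `None` marks missing data.

```python
import math


def pearson_in_region(a, b, start=0, end=None):
    """Pearson r between two per-base score lists over [start, end).

    Only positions where both datasets have a value are used.
    Returns None if fewer than two shared positions exist or a series is constant.
    """
    pairs = [(x, y) for x, y in zip(a[start:end], b[start:end])
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)
```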
Multiple sequence alignment and score comparison
Users interested in the structure of homologous sequences can compare their structure scores on the 'Alignment' page. Multiple sequences and their corresponding structure scores can be entered (Figure 5A); upon clicking 'Submit', the RASP server aligns the sequences with MUSCLE (30) and returns a new page showing the aligned sequences and aligned structure scores (Figure 5B). The user can then click the 'Download' button to download the aligned sequences and structure scores. Users can also select a region of a transcript and click the 'Add' button on the transcript summary page (Figure 4C) to save the data, and then load it directly on the 'Alignment' page.
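MUSCLE aligns only the sequences, so the structure scores must be carried onto the alignment columns. The sketch below shows one straightforward way to do this, assuming gaps are written as '-' (as in MUSCLE's FASTA output): scores are placed on residues in order and gap columns receive no value. The per-column mean is an illustrative extra for comparing homologs, not a documented RASP feature; both function names are hypothetical.

```python
def align_scores(aligned_seq, scores, gap="-"):
    """Place per-base scores onto an aligned sequence; gap columns get None."""
    residues = sum(1 for c in aligned_seq if c != gap)
    if residues != len(scores):
        raise ValueError("number of scores must equal number of ungapped residues")
    remaining = iter(scores)
    return [None if c == gap else next(remaining) for c in aligned_seq]


def column_means(aligned_score_rows):
    """Mean score per alignment column, ignoring gaps and missing values."""
    means = []
    for column in zip(*aligned_score_rows):
        values = [v for v in column if v is not None]
        means.append(sum(values) / len(values) if values else None)
    return means
```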

Figure 5. Sequence alignment and structure prediction. (A, B) Query (A) and result (B) interfaces for sequence alignment. (C, D) Query (C) and result (D) interfaces for structure prediction.
Structure prediction
Deigan et al. used a grid search to fit the slope and intercept of a SHAPE pseudo-free-energy term against the known secondary structure and SHAPE scores of E. coli rRNA (31). We took the values they obtained (slope = 1.8, intercept = −0.6) as default parameters, but users can adjust these parameters to change the weight given to the structure scores. Users can also select a region of a transcript and click the 'Predict' button on the transcript summary page (Figure 4C) to jump directly to the 'Predict' page for structure prediction.
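In the Deigan et al. approach, a reactivity S(i) is converted into a pseudo-free-energy change ΔG_SHAPE(i) = m · ln(S(i) + 1) + b (kcal/mol), which is added to the folding energy for nucleotides in base-pair stacks. With the defaults above (m = 1.8, b = −0.6), unreactive nucleotides receive a small bonus for pairing and highly reactive ones a penalty. The sketch below evaluates this term for one nucleotide. Treating missing data as 0 and clipping negative reactivities to 0 are assumptions made here, not documented RASP behavior.

```python
import math


def shape_pseudo_energy(reactivity, slope=1.8, intercept=-0.6):
    """Deigan-style pseudo-free-energy change (kcal/mol) for one nucleotide.

    ΔG = slope * ln(reactivity + 1) + intercept
    Assumptions: missing data (None) contributes 0.0, and negative
    reactivities are clipped to 0 before the logarithm.
    """
    if reactivity is None:
        return 0.0
    r = max(reactivity, 0.0)
    return slope * math.log(r + 1.0) + intercept
```

With the defaults, the term is −0.6 at S = 0, crosses zero near S = e^(1/3) − 1 ≈ 0.40, and is about 1.38 at S = 2; raising the slope makes high reactivities penalize pairing more strongly.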
Clicking the 'Submit' button returns a page with the predicted structures (Figure 5D). The 'Summary of query information' panel lists the input sequence, constraints and parameters. The 'Prediction results' panel presents a list of predicted structures ranked by free energy. Users can visualize a structure with forna (32) by clicking the 'Go' button, or copy a Java command to visualize it with VARNA (33).
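Predicted secondary structures are commonly exchanged in dot-bracket notation, which forna accepts directly and VARNA can also read. The sketch below ranks (structure, free energy) results from most to least stable and converts a dot-bracket string into a list of 1-based base pairs using a stack. It handles only round brackets (no pseudoknot notation), and both function names are hypothetical; the source does not describe RASP's internal representation.

```python
def dot_bracket_pairs(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def rank_by_energy(predictions):
    """Sort (dot_bracket, free_energy) tuples from lowest (most stable) energy up."""
    return sorted(predictions, key=lambda p: p[1])
```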
Download
The Download page provides genome reference sequence files, annotation files and structure score files. Processed structure data are provided in bigWig and BED formats. bigWig is a binary format for data indexed by genomic coordinates; bigWig files can be converted to text-format bedGraph files with the bigWigToBedGraph program from the UCSC tools package (34). Users can refer to the 'Help' page for detailed instructions. Users can also download the BED text-format files directly.
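After conversion with the UCSC tool (`bigWigToBedGraph input.bigWig output.bedGraph`, where the file names are placeholders), the scores can be loaded with a few lines of code. The sketch below is a hypothetical helper, not a RASP utility: it expands each bedGraph interval into per-base scores keyed by chromosome and 0-based position, skipping empty, comment, `track` and `browser` lines.

```python
def read_bedgraph(lines):
    """Expand bedGraph intervals into {(chrom, 0-based position): score}."""
    scores = {}
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        for pos in range(int(start), int(end)):  # end is exclusive
            scores[(chrom, pos)] = float(value)
    return scores
```

For example, `with open("output.bedGraph") as fh: scores = read_bedgraph(fh)`. Expanding every base is convenient for a transcript-sized region but memory-hungry genome-wide; for large files, keeping the intervals is preferable.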
Special collection: RNA structure probing data of SARS-CoV-2
The recent COVID-19 outbreak has spread rapidly worldwide and caused tremendous damage to society and the economy. COVID-19 is caused by SARS-CoV-2, a highly infectious, single-stranded RNA virus. Previous studies of other RNA viruses, such as HCV, HIV, Dengue virus and Zika virus, have demonstrated that the secondary structures of their RNA genomes are important for the viral life cycle (26,35–37). Accordingly, much research effort has been dedicated to determining the secondary structure of the SARS-CoV-2 RNA genome using RNA structure probing methods including SHAPE-MaP, DMS-MaPseq and icSHAPE (38–43).
Given the urgent need to fight the ongoing pandemic, we made a special effort to collect existing RNA secondary structure data for the SARS-CoV-2 genome (Figure 6). We examined the datasets associated with six manuscripts deposited on bioRxiv, including one study from our own laboratory (38), and found that two of the six studies had released their processed structural data (39,40). We have collected these data and integrated them into RASP. Users can easily visualize, compare and download the data through our database server. We are actively monitoring progress in the field and will continue to add new results as they become available.

Figure 6. A special page for the secondary structure data of the SARS-CoV-2 RNA genome.
DISCUSSION
High-throughput RNA secondary structure probing has attracted considerable research interest (44), and large-scale RNA secondary structure datasets are accumulating rapidly. These datasets are helpful for modeling RNA secondary structure and for analyzing correlations between RNA structure and cellular processes such as transcription rate and translation efficiency. To meet the current need to comprehensively collect and process RNA probing data, we developed RASP, which covers 161 manually curated RNA probing datasets from 18 species generated with 18 different experimental technologies. In contrast to existing databases, RASP provides both comprehensive data and an analysis platform.
The current version of RASP mainly includes datasets from large-scale RNA secondary structure studies, together with tools for RNA structurome analysis. However, earlier low-throughput experiments have also generated a large body of RNA structural data. For comprehensiveness, future versions of RASP will include these low-throughput data, as well as results of studies focused on specific RNA targets such as the lncRNA HOTAIR (45). We also aim to integrate more analysis tools, such as RNA covariation analysis, discovery of conserved RNA structure elements, and multiple methods for RNA secondary structure prediction and modeling. We expect RASP to aid researchers, accelerate RNA structure research and support further analyses of existing large datasets.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Chinese Ministry of Science and Technology [2019YFA0110002 to Q.C.Z.]; National Natural Science Foundation of China [91740204, 31761163007, 91940306 to Q.C.Z.]. Funding for open access charge: Chinese Ministry of Science and Technology [2019YFA0110002 to Q.C.Z.]; National Natural Science Foundation of China [91740204, 31761163007, 91940306 to Q.C.Z.].
Conflict of interest statement. None declared.
REFERENCES
Author notes
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.