Non-coding RNA: a new frontier in regulatory biology

A striking finding in the past decade is the production of numerous non-coding RNAs (ncRNAs) from mammalian genomes. While it is entirely possible that many of those ncRNAs are transcriptional noise or by-products of RNA processing, increasing evidence suggests that a large fraction of them are functional and provide various regulatory activities in the cell. Thus, functional genomics and proteomics are incomplete without understanding functional ribonomics. As has been long suggested by the ‘RNA world’ hypothesis, many ncRNAs have the capacity to act like proteins in diverse biochemical processes. The enormous amount of information residing in the primary sequences and secondary structures of ncRNAs makes them particularly suited to function as scaffolds for molecular interactions. In addition, their functions appear to be stringently controlled by default via abundant nucleases when not engaged in specific interactions. This review focuses on the functional properties of regulatory ncRNAs in comparison with proteins and emphasizes both the opportunities and challenges in future ncRNA research.


INTRODUCTION
A major surprise since the completion of the human genome and subsequent sequencing of all biological model organisms is the limited number of protein-coding genes, which neither correlates with the complexity of organisms nor accounts for the selection pressure during the evolution of modern organisms [1]. In humans, the protein-coding sequences occupy only ~1.5% of the genome, and when considering intervening sequences (introns) within protein-coding genes and 5′ and 3′ untranslated regions, this number goes up to only ~28%. Much of the remaining portion of the human genome used to be considered 'junk' DNA because ~59% are repeat sequences; however, recent analysis by the Encyclopedia of DNA Elements (ENCODE) project suggests that ~80% of the genome appears to participate in some sort of biochemical activities that might be functionally important [2]. This suggests a general paradigm for functional DNA elements embedded in the non-coding part of mammalian genomes.
While initial microarray-based results met with skepticism, the ENCODE data generated by the latest deep sequencing technologies demonstrated that at least 70% of the human genome has the capacity to produce transcripts of various sizes, many of which are conserved in the animal kingdom [2]. Besides mRNAs already annotated, most other transcripts do not seem to encode proteins and are generally referred to as non-coding RNAs (ncRNAs) [3]. Although debate continues with respect to the possibility that some of these ncRNAs may still direct synthesis of short peptides, the consensus is that they are largely non-coding, which is supported by the evidence from ribosome profiling [4] and by the large-scale proteomics analysis performed on two ENCODE cell lines [5]. While most of these ncRNAs have yet to be biochemically characterized, we are witnessing functional assignment to an increasing number of ncRNAs, leading to the birth of a new discipline in biological research.
Like many emerging disciplines, the ncRNA field has received great attention in recent years from the general research community, and the progress made has been extensively reviewed from the perspective of mechanistic insights [6][7][8] and/or biological functions [9][10][11]. Instead of enumerating numerous great points that have been made in those reviews, here I highlight the biochemical property of ncRNAs in comparison with proteins to formulate ideas for future research, the uniqueness of ncRNA research, which calls for the great need to develop new experimental approaches, and the potential to exploit ncRNA as a new class of biomarkers or therapeutic targets in biomedical and biotechnological applications.

ncRNA: OLD AND NEW
ncRNAs may be new to the research community at large, but they have long been familiar to RNA researchers. Classic ncRNAs that have been intensively studied in the past five decades since the birth of molecular biology include small ncRNAs, such as transfer RNAs (tRNAs) for carrying amino acids, small nucleolar RNAs (snoRNAs) for RNA modifications, and small nuclear RNAs (snRNAs) for RNA splicing, and large ones, such as ribosomal RNAs (rRNAs) for protein synthesis (Box 1 and Fig. 1). These ncRNAs may be considered 'constitutive', because they are abundantly and ubiquitously expressed in all cell types and provide essential functions to the organism. This class may also include the telomere complex-associated guide RNA, which is essential for the end formation and maintenance of chromosomes in normal proliferating cells even though the telomere complex and the ncRNA in it are often compromised in cancer cells [12].
sequencing has identified an increasing number of long intergenic non-coding RNAs (lincRNAs) or simply long non-coding RNAs (lncRNAs), now listed in various databases [18,19], which have received great attention from the research community.
In general, ncRNAs have been classified based on an arbitrary size cut-off of 200 nt to separate small ncRNAs from lncRNAs. However, many ncRNAs may fall on both sides of this cut-off, such as enhancer-associated RNAs (eRNAs), promoter-associated transcripts (PATs), and the more recently emerged circular RNAs (circRNAs) (Box 1; Fig. 1). In fact, these ncRNAs have their own structural features at each end, as eRNAs and PATs have a cap, but no poly(A) tail [20], while circRNAs obviously have no ends, which add to the structural characteristics of other ncRNAs after processing (e.g. snRNAs with the 5′ tri-methylated cap, miRNAs with the 5′-phosphate, etc.). These features distinguish them from the class of lncRNAs (Box 1), which are transcribed and processed in an identical way to that of protein-coding genes (e.g. capping, splicing, and polyadenylation, see Fig. 1), and as such, their genes are also associated with characteristic chromatin marks (e.g. H3K4me3 at promoters and H3K36me3 in the gene body), which have been exploited for their prediction, identification, and characterization in mammalian genomes [21].
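The size-based convention and the end features described above can be written down as a small sketch. It is only an illustration: the `Transcript` fields, the choice of a strict "longer than 200 nt" boundary, and the example lengths are assumptions made here, not annotations taken from any database.

```python
from dataclasses import dataclass

SIZE_CUTOFF_NT = 200  # the arbitrary cut-off quoted in the text


@dataclass
class Transcript:
    name: str
    length_nt: int
    has_cap: bool
    has_polya: bool
    is_circular: bool = False


def size_class(t: Transcript, cutoff: int = SIZE_CUTOFF_NT) -> str:
    """Label a transcript 'long' or 'small' using the length cut-off only.

    Treating exactly `cutoff` nt as 'small' is an assumption of this sketch.
    """
    return "long" if t.length_nt > cutoff else "small"


def end_features(t: Transcript) -> str:
    """Summarise the end structure, which the size label does not capture."""
    if t.is_circular:
        return "no free ends"
    cap = "cap" if t.has_cap else "no cap"
    tail = "poly(A)" if t.has_polya else "no poly(A)"
    return f"{cap}, {tail}"


# Hypothetical transcripts: two eRNAs of different lengths land on
# opposite sides of the cut-off although they share the same end features.
examples = [
    Transcript("eRNA_short", 150, has_cap=True, has_polya=False),
    Transcript("eRNA_long", 800, has_cap=True, has_polya=False),
    Transcript("circRNA_x", 450, has_cap=False, has_polya=False, is_circular=True),
    Transcript("lncRNA_y", 2300, has_cap=True, has_polya=True),
]

for t in examples:
    print(t.name, size_class(t), "|", end_features(t))
```

The two hypothetical eRNAs receive different size labels but identical end descriptions, which is the sense in which the 200 nt cut-off is arbitrary for such classes.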
A common feature of newly identified ncRNAs is their highly regulated expression in different cell types or during development. Our current understanding of their functions, although still quite limited, suggests that these ncRNAs may have diverse regulatory activities (Box 1). Because ncRNAs are either transcribed from specific genomic loci or derived from segments of protein-coding genes, the question is whether all expressed ncRNAs that are detectable by sensitive technologies are functional or whether some of them simply reflect transcriptional noise or by-products of RNA processing [22]. A deeper question is whether the process of producing some of those ncRNAs, rather than the final products, is of biological importance because transcription of these ncRNAs is often associated with chromatin remodeling activities. Despite continuous debate on these valid questions, the field has experienced tremendous progress in elucidating the function and mechanism of various ncRNAs, particularly lncRNAs. Thus, for practical reasons, one may first focus on studying ncRNAs that already have some functional evidence, while ignoring many potential 'junk' RNAs, at least for the time being.

FUNCTION OF ncRNA IN COMPARISON WITH PROTEIN
The hypothesis of 'the RNA world' proposes that the development of life, which has to fulfill the requirement of having the ability to carry and replicate its genetic material, may have begun with RNA [23,24]. ncRNAs appear to have preserved most, if not all, of their original features and functions in modern organisms that have evolved to adopt more efficient strategies to replicate and express their genetic information along the central dogma from DNA to RNA to protein. As a result of exploring selective advantages of proteins and RNA during evolution, many functions of RNA have been passed on to proteins while others have been retained. In this regard, it might be informative to compare the function of ncRNAs with proteins to conceptualize ncRNA function and mechanism.

RNA as enzyme
One of the key functions of proteins is to catalyze chemical reactions. Some ncRNAs have long been known to preserve this critical function, known as catalytic RNA, such as the RNAs associated with RNase P required for tRNA processing [25] and auto-catalytic introns [26]. In fact, through in vitro selection from random sequences, one may select RNA capable of catalyzing RNA ligation [27] or polymerization [28]. Other ncRNAs preserve their catalytic function only when folded correctly with help of proteins. The best known example is rRNAs in which all key catalytic reactions in reading the coding information in mRNA are provided by the so-called RNA centers [29]. This may also be the case in the spliceosome, which is responsible for intron removal during pre-mRNA splicing and where the catalytic center may form with both RNA and proteins [30]. Therefore, although most catalytic activities of RNA have been passed onto proteins in modern organisms, at least some ncRNAs appear to have kept such function during evolution. Even so, some key functional properties of RNA are maintained in many ribonucleoprotein (RNP) machines. The best known examples are in fact miRNAs and piRNAs in argonaute-containing complexes where these tiny ncRNAs provide targeting information whereas the associated proteins execute the biochemical reactions [31,32]. We thus should not be surprised if many additional ncRNAs are found to make direct contribution to catalysis in the form of RNPs.

RNA as scaffold of molecular interactions
A major function of proteins in the cell is to engage in protein-protein, protein-DNA, and protein-RNA interactions in diverse biochemical reactions. These functions are mediated by specific domains, ~600 of which have been characterized to date among ~3000 potential ones [33][34][35]. In comparison, RNAs seem to have a similar, if not larger, capacity to perform such molecular interactions through their unique sequence motifs and secondary structures, the latter of which may adapt into different combinations when exposed to different environments or interacting with different proteins. In principle, a specific RNA moiety may interact with DNA or RNA through base-pairing whereas both primary sequences and secondary structures may serve as modules for interactions with specific proteins or protein complexes. For example, specific stem-loop domains in the 7SK RNA are known to interact with distinct protein components [36], and the lncRNA HOTAIR uses its 5′ domain to interact with Polycomb Repressive Complex 2 (PRC2) and its 3′ domain to recruit the histone H3 lysine 4 demethylase LSD1, thus coordinating two separate transcription repressor complexes to act on target genes [37]. The ability of an ncRNA to simultaneously engage in interactions with DNA and proteins has been exemplified with the rRNA gene-associated transcripts, which, together with the transcription factor TTF-1, recruit the DNA methyltransferase DNMT3b to CpG islands [38]. These examples illustrate unique advantages of ncRNAs in the regulation of gene expression.
The ncRNA steroid receptor RNA activator is one of the first examples documented to function as a transcription co-activator in gene activation [39], and we now know that many other ncRNAs appear to have such enhancer function [40]. Numerous studies have exposed the mechanisms of regulatory ncRNAs in transcriptional control, including (1) transcription interference by antisense RNA [41,42] (Fig. 2a), (2) direct inhibition of Pol II activity by Alu repeat-derived transcripts [43,44] (Fig. 2b), (3) sequestration of transcriptional regulators [45] (Fig. 2c), (4) guiding transcription regulators to specific regulatory loci through RNA-DNA base-pairing interactions [38] (Fig. 2d), (5) recruitment of additional transcription regulators [37] (Fig. 2e), and (6) mediating long-distance interactions between promoter and enhancer [40,46] (Fig. 2f). Each of these action mechanisms by specific lncRNAs on their target genes has been detailed in multiple recent reviews [6][7][8][11]. Interestingly, a recent study showed that two lncRNAs (PRNCR1 and PCGEM1) overexpressed in prostate cancer cells interact in a consecutive fashion with the androgen receptor to promote gene expression and cell proliferation in castration-resistant prostate cancer [47]. These and other findings emphasize the involvement of extensive RNA-dependent interactions in transcriptional control.

Cis-acting RNA as regulatory signal
A common property associated with many regulatory ncRNAs is their action in cis, meaning that they function at the genomic loci where they are transcribed [40], which is likely due to their rapid turnover once released from the site of synthesis. An analogy may be made in this case with secreted proteins synthesized on the endoplasmic reticulum (ER), where the signal peptide guides the protein into the lumen of the ER during translation and is then removed by a peptidase [48]. Some promoter-proximal ncRNAs appear to interfere in cis with transcription either through direct interaction with core components of the transcription machinery [49] or through separate RNA-binding proteins (RBPs) [50]. Certain lncRNAs, such as HOTTIP, appear to also act in cis because of the difficulty in restoring their functional requirement with exogenous transcripts [51]. However, inactivation of most lncRNAs by RNAi seems to invoke genome-wide responses, implying that those lncRNAs may function in trans to modulate gene expression in multiple locations in the genome [52].
One particular type of ncRNAs that function exclusively near the site of their production is enhancer-transcribed ncRNAs (or eRNAs) [53,54]. Recent studies demonstrated that eRNA production is essential for activating their targeted promoters [20,46,55,56]. As enhancer activities may reflect binding and activity of Pol II, which has been shown to induce chromatin remodeling [57] and promote DNA looping between enhancer and promoter [56], the question is whether or not the process of such transcriptional activities might be more functionally relevant than the RNA products. A BoxB-λN tethering strategy was first used to demonstrate HOTTIP in coordinating long-range chromatin interactions [51], and a recent study also took this approach to show that eRNA mediates DNA looping between enhancer and promoter [46].
Another class of potential cis-acting ncRNAs is PATs. Interestingly, most mammalian genes appear to express divergent transcripts from their promoters, a phenomenon that is not evident in yeast or Drosophila [58,59]. Currently, little is known about the function of these ncRNAs transcribed in the opposite direction of the genes. Interestingly, the antisense transcripts tend to lack U1-binding sites whereas the sense transcripts lack the polyadenylation signals [60]. These features might be responsible for the termination of antisense transcription while allowing sense transcription to proceed, as U1 is known to protect the genome by preventing premature transcriptional termination [61]. The sense PATs may also represent aborted transcription products of paused Pol II immediately downstream of mammalian promoters [62]. Interestingly, one such RNA signal has been well studied in HIV-1, where it attracts the HIV tat protein to bind and recruit additional transcription activators, particularly pTEFb, a Pol II CTD kinase, to release paused Pol II into the gene body [63]. A recent study indicates that many cellular genes may employ a similar mechanism through the splicing factor (SRSF2) to facilitate pause release of Pol II from gene promoter into gene body [64], thus suggesting a general role of PATs in providing signals for Pol II to enter productive elongation. It has also been demonstrated that nascent RNA from the gene body near the transcription start site may provide cis signals for the Polycomb Complexes to bind [65]. Another important message from these studies is that parts of pre-mRNAs from protein-coding genes may also be considered as a new class of ncRNAs in regulated transcription.

Trans-acting RNA as molecular sink
The molecular sink mechanism is a key strategy for proteins to function in signaling networks in mammalian cells. This concept has also been well documented with many RNA motifs in mRNAs as well as in transcripts from transcribed pseudogenes in mammalian genomes [66,67], again indicating that some parts of mRNAs also function as ncRNAs in nature. These RNA elements have been shown to sequester specific miRNAs to prevent their action on other target mRNAs, but the stoichiometry between competing ncRNAs and target RNAs has to be considered in each case for the physiological relevance of any sequestration effect detected [68]. Some specific lncRNAs have also been shown to sponge miRNA [69] and titrate transcription activators to inhibit cell cycle progression under starvation conditions [70] or in response to DNA damage [45]. Therefore, the entire repertoire of expressed RNAs, whether they are mRNAs or ncRNAs, may participate in diverse RNA-RNA or RNA-protein interaction networks to regulate various cellular activities.
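The point about stoichiometry can be made concrete with a deliberately simple calculation. The sketch below assumes that the miRNA is limiting, that every binding site (on targets or on the competing RNA) has the same affinity, and that the miRNA therefore distributes in proportion to site numbers. These assumptions, the function names, and the site counts are illustrative only and are not taken from the cited studies.

```python
def fraction_on_targets(target_sites: float, competitor_sites: float) -> float:
    """Share of bound miRNA sitting on target sites under equal affinity."""
    total = target_sites + competitor_sites
    if total <= 0:
        raise ValueError("at least one binding site is required")
    return target_sites / total


def fraction_diverted(target_sites: float, competitor_sites: float) -> float:
    """Share of bound miRNA drawn away from targets by the competitor."""
    return 1.0 - fraction_on_targets(target_sites, competitor_sites)


# Hypothetical site numbers per cell: a competitor contributing 1% as many
# sites as the target pool diverts about 1% of the miRNA, whereas one that
# matches the target pool diverts half of it.
for competitor in (100, 1_000, 10_000):
    diverted = fraction_diverted(10_000, competitor)
    print(f"competitor sites = {competitor:>6}: diverted fraction = {diverted:.3f}")
```

Under these assumptions a competing RNA has a noticeable effect only when its site number approaches that of the existing target pool, which is why the abundance of both partners needs to be measured before a sequestration effect is taken as physiologically relevant.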
Interestingly, analysis of poly(A−) RNA, which has been largely ignored in the past, revealed many stable ncRNA species, which have been abundantly detected in the oocyte nucleus [71]. One of the general mechanisms for these ncRNAs to remain stable may be that their ends are somehow sealed. Three strategies have been elucidated for stabilization of such ncRNAs. One is to ligate their 5′ and 3′ ends, thus forming circRNAs (see Fig. 1) [72,73]. This likely results from the action of the spliceosome, leading to the ligation of the upstream 3′ splice site to the downstream 5′ splice site of an exon, although the precise mechanism for their production remains to be understood. Interestingly, through characterizing poly(A−) RNAs, another strategy to 'seal' the ends was recently revealed, which is to prevent de-branching on some released introns [74]. This type of intron-derived circRNAs is thus sealed by the 2′-5′ phosphodiester bond formed at the branch-point during pre-mRNA splicing (see Fig. 1). The third strategy to protect the RNA ends is via some stable RNA moieties, such as those found in snoRNAs [75] or the formation of a triple helical structure, such as that characterized at the ends of the stable MALAT-1 RNA [76,77] and some virus-derived ncRNAs [78]. Such RNA structures, either alone or in complex with specific RBPs, protect the RNA from degradation after release from their pre-mRNA precursors.
Functionally, one specific circRNA has been shown to contain an array of binding sites for miRNAs, thus serving as a molecular sink to prevent the miRNAs from interacting with their targets [72,73]. The snoRNA-protected intronic ncRNAs appear to trap a critical RNA binding protein RBFox2, thus titrating its active pool for regulated splicing in the cell [75]. In fact, the classic RNA that serves as a molecular sink is the very abundant 7SK RNA, which has been well characterized to bridge pTEFb to its inhibitor HEXIM1 in the inactive pool of the CTD kinase in the cell [63]. It is unlikely, however, that a molecular sink is the only function associated with various stable ncRNAs. For example, the intron-derived circRNAs sealed by the 2′-5′ phosphodiester bond appear to play a positive role in transcription of their host genes, although the mechanism has remained elusive [74]. This finding further highlights the functional importance of various sequences in the pre-mRNA of protein-coding genes, as they not only give rise to miRNAs and snoRNAs, but also produce various circRNAs that appear to have both cis and trans functions.

RNA as ligand
Both small molecules and proteins are well known for their abilities to bind and induce conformational changes of their protein partners, thereby invoking signaling. ncRNAs appear to have a similar role in modulating protein conformation. One such example is a DNA damage-induced ncRNA from the cyclin D1 promoter-proximal region. This ncRNA binds to the RNA binding protein TLS to induce its conformational changes to unmask another domain in the protein for additional protein-protein interactions to take place, eventually leading to transcriptional repression [50].
The miRNA Let-7 also appears to act like a ligand in activating the Toll-like receptor 7, which appears to be a critical event in Let-7-induced neurodegeneration [79]. Small RNAs as ligands have also been exemplified by piRNAs, which, upon incorporation into the PIWI complex, induce conformational changes of the PIWI protein (MIWI in mice) to permit its ubiquitination by a specific E3 ligase [80]. This ncRNA-induced signaling event appears to play a vital role in spermiogenesis by triggering the eventual clearance of the piRNA machinery, a pathway proven to be essential for producing mature sperm in the testis. These findings illustrate that ncRNAs can function as ligands to regulate the conformation of their target proteins to trigger the next set of molecular interactions in some important biological processes. Future structural studies of RNPs may elucidate detailed mechanisms underlying such ncRNA-induced molecular switches.

RNA as organizer of cellular structures
Many ncRNAs are quite large in size and have been referred to as macroRNAs. The best example is the nuclear enriched abundant transcript 1 (NEAT-1). NEAT-1 has two isoforms (the larger one is ~23 kb in length and the smaller one is 3.7 kb in human, 3.2 kb in mouse), both of which are localized to a specific nuclear domain known as paraspeckles [81,82]. The function of paraspeckles is largely unknown, although a more recent study suggests an active role of NEAT-1 in facilitating the expression of some antiviral genes [83]. A large number of RBPs have been identified to be part of this nuclear structure, although a few core factors, such as Nono, PSP1, and PSF, appear to be selectively concentrated in this nuclear domain [84]. Many repeat-containing RNAs have been shown to associate with this structure, suggesting that the domain might arise from clustering some specific classes of ncRNAs along with their RBPs [85,86]. The larger NEAT-1 isoform appears to play a critical role in organizing such clusters, as targeted degradation of this ncRNA disrupted the structure [87,88], and ectopic expression of the large, but not the small, NEAT-1 isoform was sufficient to induce de novo formation of a paraspeckle-like structure around it [89].
The name of paraspeckle is due to the spatial relationship of the domain to another nuclear domain known as speckles [90]. As numerous factors implicated in the splicing reaction have been localized to this structure, it has been a cellular hallmark for the splicing machinery [91]. However, its primary function in pre-mRNA splicing has long been a subject of debate. A popular view is that this domain serves as a storage site for splicing factors; however, increasing evidence points to a more active role of the domain in gene expression via coordinating transcription and splicing reactions at its vicinity, thus suggesting that this nuclear domain may play a larger role in organizing the genome for concerted transcription and post-transcriptional processing events [92,93]. Interestingly, another large lncRNA, known as NEAT-2/MALAT-1 of ~7.5 kb in size, lies in the heart of individual nuclear speckles. The initial MALAT-1 transcript contains a tRNA-like structure at its 3′ end, which is processed to produce the mature MALAT-1 retained in the nucleus, releasing the tRNA-like small RNA to the cytoplasm [94]. Unlike NEAT-1, mature MALAT-1 does not seem to be responsible for the formation or maintenance of nuclear speckles [95]. However, depletion of this large lncRNA has been shown to affect specific events associated with nuclear speckles, such as SR protein phosphorylation [96], implying that the lncRNA is involved in various protein-protein interactions to facilitate the establishment and dynamics of this non-membrane-bound organelle in the nucleus. Interestingly, NEAT-2/MALAT-1 was originally identified as a nuclear ncRNA that was dramatically elevated in tumor cells [97], which appears to be important for metastasis of lung cancer [98], indicating that this macroRNA may have an active role in cancer initiation and/or progression through its function in regulated gene expression. 
It is however important to point out that knockout of either NEAT-1 or NEAT-2/MALAT-1 produced no obvious phenotypic defects, indicating that these ncRNAs are not essential for mouse development [95,99].
Contrary to the nuclear structures associated with active gene expression, other nuclear domains are functionally linked to gene repression, such as the Polycomb body in the nucleus, which contains protein complexes responsible for depositing repressive marks, such as H3K27me3, to chromatin. This domain contains numerous ncRNAs, including TUG1 [100]. While the precise role of this lncRNA has remained unclear, its association with the Polycomb body may compete with some common gene expression regulators that are partitioned between active and repressive domains in the nucleus, and regulated exchange between these domains appears to be a key event in switching the functional states of many genes [101]. Therefore, specific lncRNAs may provide signals or docking sites for regulatory proteins or protein complexes, thereby contributing to the organization of the human genome in the 3D space of the nucleus. More recently, repeat-derived ncRNAs were suggested to be a key part of the nuclear scaffold for maintaining chromosome territories [102].
Together, various nuclear domain-associated lncRNAs may be considered as part of a nuclear skeleton, in analogy with the cytoskeleton in the cytoplasm.

Secreted RNA as potential hormone
ncRNAs are made in the nucleus either from their own genes or genomic loci or processed from their host genes. As cells have very active machineries to degrade most transcribed RNAs, functional ncRNAs must have evolved some strategies to survive various RNA surveillance mechanisms. As described above, some ncRNAs have specific structures to protect their ends to make them inaccessible to exonucleases while others may gain protection by forming specific RNPs. A fraction of ncRNAs are able to not only survive degradation in the cell, but also make it to the extracellular space. So far, this has been documented for miRNAs, which appear to be assembled into microvesicles for secretion [103]. We are still at an early stage in understanding how some miRNAs are imported or assembled into microvesicles for secretion, and how the specificity, if any, might be established in such a process. In any case, the detection of secreted miRNAs in the circulation system seems to provide a unique set of biomarkers for disease diagnosis [104][105][106]. A more important question is what these secreted miRNAs might do in the circulation system. Do they function as hormones to act in distal organs? Initial studies provide some evidence for such a possibility [107,108]. Remarkably, some exogenous miRNAs from the food supply might also have such a role [109], although the finding remains to be substantiated [110]. Overall, the idea that RNAs can function as hormones remains a hypothetical function for secreted miRNAs.
In concluding this section, I wish to make the point that our current knowledge has significantly expanded the function of RNAs as information carriers. They appear to be able to perform a large array of cellular functions that have been ascribed to proteins. Importantly, we are still glimpsing the tip of the iceberg, despite the impression that many working principles have been elucidated with specific ncRNA examples.

STRATEGIES FOR FUNCTIONAL AND MECHANISTIC STUDIES OF ncRNA
Small ncRNAs, particularly miRNAs, are well known for their roles in diverse biological pathways. The existing examples of characterized lncRNAs have also demonstrated their widespread participation in biological functions, ranging from dosage compensation [111,112], cell cycle control [45,113], stem cell maintenance and differentiation [52,114,115], and development [116][117][118] to cancer etiology and progression [47,119,120]. Given their functional resemblance to proteins, essentially all experimental strategies developed to decipher protein functions may be applied to ncRNA research; however, because of their uniqueness as a linear chain of nucleic acids and the ability to fold into multiple secondary and tertiary structures, new approaches are also needed to study their functions and action mechanisms. In this section, I briefly discuss some common and unique approaches developed for ncRNA research (Box 2).

Experimental approaches to defining ncRNA function
As with protein-coding genes, one of the most important experimental approaches to study ncRNAs nowadays is to determine their unique expression patterns associated with a specific biological question under investigation and to conduct loss-of-function studies in a particular biological setting. Using modern genomics strategies, it has become routine to profile gene expression by RNA-seq in any given biological system [121,122], which may be combined with various affinity methods to detect RNA (both coding and non-coding) at different stages of gene expression [123,124]. The identification of the entire set of expressed lncRNAs would allow comparison under different experimental conditions or between different cell types to identify differentially expressed lncRNAs [116,125]. The challenge is to determine which specific lncRNA(s) to study further. Currently, most studies focus on differentially expressed lncRNAs that are expressed with sufficient abundance. By using siRNA or antisense oligonucleotides (ASO), the latter of which appear to be more efficient in depleting lncRNAs via endogenous RNase H activities [126], one can efficiently deplete specific lncRNAs to evaluate their functional requirement. If resources permit, this loss-of-function approach may be applied genome-wide to obtain a comprehensive set of lncRNAs involved in some defined biological processes, as exemplified in stem cells [52].
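The prioritization described above (differential expression combined with sufficient abundance) might be sketched as follows. This is a toy filter and not a substitute for a statistical RNA-seq workflow: the thresholds, the pseudocount, the abundance units, and the transcript names are all assumptions chosen for illustration, and no replicate-based testing is performed.

```python
import math


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of condition B over condition A, stabilised by a pseudocount."""
    return math.log2((b + pseudocount) / (a + pseudocount))


def select_candidates(expr, min_abundance=1.0, min_abs_log2fc=1.0):
    """Rank lncRNAs that pass an abundance filter and a fold-change filter.

    `expr` maps a transcript name to (abundance in A, abundance in B),
    for example in TPM. Returns (name, log2FC) pairs, largest change first.
    """
    selected = []
    for name, (a, b) in expr.items():
        if max(a, b) < min_abundance:
            continue  # too lowly expressed in both conditions
        lfc = log2_fold_change(a, b)
        if abs(lfc) >= min_abs_log2fc:
            selected.append((name, lfc))
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical abundances for three lncRNAs in two conditions.
expression = {
    "lnc_A": (3.0, 15.0),   # abundant and changing
    "lnc_B": (0.1, 0.4),    # changing but barely detectable
    "lnc_C": (20.0, 22.0),  # abundant but essentially unchanged
}

for name, lfc in select_candidates(expression):
    print(f"{name}: log2FC = {lfc:.2f}")
```

Only the hypothetical `lnc_A` survives both filters here, mirroring the practical bias toward abundant, differentially expressed lncRNAs as candidates for depletion by siRNA or ASO.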
The hard part of ncRNA research is to probe for the mechanism and explore new regulatory concepts. The cellular localization of specific ncRNAs may be first determined to obtain an approximation of their functional sites. As mRNAs are known to display remarkable localization patterns in the cell [127], the localization of ncRNAs, particularly lncRNAs, might be informative to their cellular functions. To understand the function of a specific lncRNA, it is often important to identify its protein partners. Furthermore, if the lncRNA under investigation acts in the nucleus to regulate gene expression, one will also need to determine its target genes. To identify protein partners, antibodies are very useful tools for protein research, but for lncRNA, one has to rely on some entirely distinct approaches. One such approach is to use affinity tagged (such as biotin) oligos to capture specific lncRNA followed by deep sequencing of linked DNA and/or by mass spectrometric analysis of associated proteins, a method known as CHART-seq [128], which has been applied to elucidate two-step spreading of Xist ncRNA complexes during X-chromosome inactivation [129]. A related method called ChIRP-seq was developed in parallel to survey lncRNA occupancy on genomic DNA [130]. This technique has been applied to probe the genomic interaction of the 7SK complex on so-called anti-pause enhancers [131].
To efficiently use this approach, it would be helpful to know the exposed RNA regions in the cell by probing RNA structure in living cells [132,133]. Two recent studies reported a more robust method based on dimethyl sulfate modification of exposed adenines and cytosines followed by deep sequencing of RNA containing the modified residues to achieve high-resolution mapping of the RNA secondary structure [134,135]. These new approaches will greatly accelerate the discovery of regulatory events on RNA targets by both ncRNAs and specific RBPs.
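As a rough illustration of how such data could be summarised, the sketch below turns per-nucleotide modification counts into a relative signal, keeping only adenines and cytosines because those are the residues the text says dimethyl sulfate reports on. It is not the pipeline of the cited studies: the normalisation to the highest A/C count within the transcript, the function name, and the input format are assumptions made here.

```python
from typing import List, Optional


def dms_reactivity(seq: str, counts: List[int]) -> List[Optional[float]]:
    """Scale modification counts at A/C positions to the range 0..1.

    Positions that are not A or C are returned as None because they carry
    no signal in this simplified view. Counts are divided by the highest
    A/C count in the transcript (an assumption of this sketch).
    """
    if len(seq) != len(counts):
        raise ValueError("sequence and counts must have the same length")
    seq = seq.upper()
    informative = [c for base, c in zip(seq, counts) if base in "AC"]
    top = max(informative, default=0)
    profile: List[Optional[float]] = []
    for base, c in zip(seq, counts):
        if base not in "AC":
            profile.append(None)
        elif top == 0:
            profile.append(0.0)
        else:
            profile.append(c / top)
    return profile


# Hypothetical 8-nt stretch with made-up read counts per position.
sequence = "GACUACGC"
read_counts = [5, 40, 4, 7, 20, 2, 9, 0]
for base, value in zip(sequence, dms_reactivity(sequence, read_counts)):
    print(base, "n/a" if value is None else f"{value:.2f}")
```

In this simplified reading, high values mark A or C residues that are accessible to the reagent and therefore likely unpaired, which is the kind of information that could guide the placement of capture oligos on exposed regions.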
Another approach is to tag an lncRNA with an MS2 moiety, thus permitting the capture of the lncRNA-containing RNP with an MS2 fusion protein [136]. An analogous strategy is to use an RNA tag that contains two specific hairpins, thus allowing tandem affinity purification of RNA-protein complexes [137]. This RNA-tagging strategy, however, can be problematic if the lncRNA acts only in cis or the overexpressed transcript is not effectively assembled into its native RNP complexes. This problem can be addressed by using the latest genome editing technology to tag specific ncRNA genes [138] (see below). Given the nucleic acid nature of lncRNA, future studies may also pursue chemical engineering methods that take advantage of specific sequences or structural moieties to introduce affinity groups for lncRNA localization and affinity purification.

Studying ncRNA from the angle of RBPs
It is conceivable that lncRNA functions are mostly mediated by specific RBPs, and, thus, focusing on specific RBPs of interest may be an effective route to study lncRNA function and mechanism in general. Recent studies indicate that mammalian genomes may express at least 1000 RBPs [139], many of which may not even carry annotated RNA-binding domains [140]. As a matter of fact, we do not know the exact distinction between DNA-binding proteins and RBPs, as they have been traditionally studied based on their interactions with DNA or RNA. As a result, some DNA-binding proteins may also bind RNA and the converse may also be true. For example, two recent studies demonstrated that the PRC2, which is responsible for depositing the repressive H3K27me3 mark on histone, actually has high affinity for RNA [141], explaining its extensive interaction with nascent RNA in the cell [65].
An important point is that the cross-linking immunoprecipitation (CLIP) technology and its variants have proven effective in identifying protein-associated RNAs and mapping such interactions in the genome [142]. Efficient and high-throughput methods have also been developed to determine the RNA binding specificity of RBPs [143,144], and an increasing number of RBPs have been mapped to mammalian genomes using CLIP technologies. Although most published studies to date have focused on understanding the function of RBPs in RNA metabolism, such as pre-mRNA splicing, the available mapping data indicate that many RBPs also show extensive interactions with diverse lncRNAs [145]. As the CLIP data accumulate and are organized in a database [146], one may mine such data to identify proteins mapped to specific lncRNAs under investigation. With candidate RBPs and lncRNAs in hand, loss-of-function studies can then be performed to identify common targets for further mechanistic dissection, as exemplified by the study of p53-regulated gene expression that involves both an lncRNA (lincRNA-p21) and a specific RBP (hnRNP K) [147].
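Mining CLIP data for proteins mapped to a given lncRNA amounts to an interval-overlap query, sketched below. The tuple layout, the function name, and the RBP labels are hypothetical. Coordinates are assumed to be 0-based and half-open, and peaks are required to lie on the same strand as the transcript. The sketch does not reflect the interface of any particular CLIP database.

```python
from collections import defaultdict

def rbps_on_lncrna(lncrna, clip_peaks):
    """Count CLIP peaks per RBP that overlap a lncRNA locus.

    lncrna: (chrom, start, end, strand), 0-based half-open coordinates.
    clip_peaks: iterable of (chrom, start, end, strand, rbp_name).
    Returns (rbp_name, peak_count) pairs, most peaks first.
    """
    chrom, start, end, strand = lncrna
    counts = defaultdict(int)
    for p_chrom, p_start, p_end, p_strand, rbp in clip_peaks:
        same_transcript_side = (p_chrom == chrom and p_strand == strand)
        # Half-open intervals overlap when each starts before the other ends.
        if same_transcript_side and p_start < end and p_end > start:
            counts[rbp] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy locus and peaks; all names and coordinates are made up.
locus = ("chr1", 1000, 5000, "+")
peaks = [
    ("chr1", 1200, 1250, "+", "RBP_X"),
    ("chr1", 4990, 5040, "+", "RBP_X"),  # partial overlap at the 3' end
    ("chr1", 3000, 3050, "-", "RBP_Y"),  # opposite strand, ignored
    ("chr1", 5000, 5050, "+", "RBP_Z"),  # adjacent but not overlapping
    ("chr2", 1200, 1250, "+", "RBP_Y"),  # different chromosome, ignored
    ("chr1", 2000, 2040, "+", "RBP_Y"),
]
partners = rbps_on_lncrna(locus, peaks)
print(partners)
```

RBPs recovered in this way are only candidates. As noted above, loss-of-function studies are still needed to establish common targets.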

Challenges in structural analysis of RNPs
A common approach in mechanistic studies of proteins or protein complexes is to define the specific protein domains engaged in a particular molecular interaction and to probe the detailed interaction mechanism by crystal structure analysis. Similar approaches are clearly needed for understanding RNA-protein interactions. The challenge in dissecting the RNA domains involved in such interactions with specific proteins has been showcased with HOTAIR, an lncRNA that interacts with two different chromatin remodeling complexes through distinct RNA segments [37]. However, there is great uncertainty in dissecting domains with in vitro transcribed RNA, as RNA may adopt distinct secondary structures when made in vitro versus when produced inside cells, where specific RBPs may be assembled onto the RNA during transcription and/or processing, possibly in a sequential fashion. This may make it difficult to reconstitute RNPs that contain multiple protein components for biochemical studies.
In the protein world, ultimate mechanistic insights are obtained from NMR or crystallography. The structure of the largest RNA machine, the ribosome in complex with tRNA and mRNA, has been resolved at the atomic level [148,149], and similarly, structures of miRNAs in argonaute proteins have been determined [150-152]. The structural approach has also been applied to an H/ACA box snoRNP particle [153] and a spliceosome sub-complex [154]. In general, however, it has been quite difficult to obtain crystals of many other RNPs, such as the spliceosome, in part because of the insufficient material one can purify from the cell or the inability to preserve relatively stable structures during purification for crystallization. The common practice in protein crystallization is to use recombinant proteins, but in light of various potential problems in assembling RNPs in vitro, it will be a major challenge to reconstitute large RNPs for structural studies.

Genome engineering to determine ncRNA function
As in the investigation of protein functions, decisive information is obtained in many cases by gene targeting, which has recently been applied to a set of lncRNAs [155]. We are at the dawn of applying this genetic approach to ncRNA research, especially in light of the recent development of the powerful TALEN and CRISPR/Cas technologies for genome engineering [138,156]. For instance, the CRISPR technology has been used to tag an lncRNA in its expression unit in the genome to allow capture of specific RNA-protein complexes assembled in vivo [157]. In this elegantly designed strategy, a small RNA hairpin is first inserted by CRISPR in front of the specific ncRNA under investigation in the genome. An inactive version of the Csy4 nuclease is next used to efficiently capture the hairpin as part of the RNA hybrid along with associated proteins. The affinity-purified RNP is then released for biochemical analysis by using imidazole to activate the Csy4 nuclease. The CRISPR technology can also be used to selectively remove specific ncRNA sequences embedded in their host genes, such as those transcribed as part of introns, to study their functional requirements. Recently, a catalytically inactive form of Cas9 was exploited to develop the CRISPRi system [158,159], which permits both positive and negative modulation of endogenous genes [160] and real-time imaging of the dynamic movement of specific genomic loci [161]. It is anticipated that the rapidly evolving CRISPR-based genome editing technologies will find wide application in studying genomic sequences encoding both small and large ncRNAs in the near future.

ncRNA as an integral part of genomics and proteomics
It has become increasingly evident that ncRNAs provide diverse regulatory functions in the cell, and regulatory RNA networks in general represent a crucial interface between genomics and proteomics (Fig. 3). Both small and large ncRNAs are subject to regulation by diverse mechanisms that control their expression, biogenesis, and degradation, all of which have been well documented with miRNAs and piRNAs [15,31]. As many lncRNAs are expressed from their own genes, a battery of transcription factors is likely involved in the regulation of these lncRNAs during development or in different cell types, in a similar way to the regulation of protein-coding genes.
Most lncRNAs have been characterized by their functions in the nucleus, and their interactions with various nuclear machineries may thus contribute to their nuclear retention. However, many lncRNAs are also detectable in the cytoplasm and clearly function there, as demonstrated with the BACE1-antisense transcript (BACE1-AS) and an Alu-containing lncRNA in the regulation of mRNA stability [162,163]. Because premature stop codons in mRNA trigger nonsense-mediated RNA decay (NMD) [164], this raises the question of how various lncRNAs escape such a pathway. One possibility is that lncRNAs are not scanned by ribosomes beyond their immediate 5′ sequences [4,165], as the translation process is known to activate the NMD pathway [166]. However, the key NMD initiator Upf1 appears to have the capacity to bind mRNAs as well as lncRNAs in a translation-independent manner [167]. At this point, we have little knowledge about whether cytoplasmic lncRNAs are sensitive to NMD, which represents an interesting subject for future studies.
One exciting future research area is to decipher the contribution of lncRNAs to local and long-distance genomic interactions (Fig. 3a,b). Functional studies of eRNAs and certain lncRNAs have exemplified the critical role of ncRNAs in mediating enhancer-promoter interactions [46,56,168]. Recent studies suggest that the Xist complex explores larger genomic domains to help spread the transcription repressor complex during X-chromosome inactivation [129,169]. This strategy may also be exploited for establishing both active and repressive domains that involve genomic segments separated by long linear distances on the same chromosome or even on different chromosomes, which may in turn contribute to the organization of the genome in the 3D space of the nucleus [170,171] (Fig. 3c). Research in this direction may represent a new frontier of ncRNA cell biology.
The intersection of ncRNA research with gene networks has been well established for miRNAs [172]. It is easy to imagine that numerous RNA-dependent protein-protein and protein-DNA interactions exist in the cell, but a systematic effort has yet to be undertaken to study such RNA-dependent interactions (Fig. 3d). Thus, analysis of gene networks would be incomplete without incorporating regulatory ncRNAs into various biological pathways. Towards this general goal, all classes of ncRNAs and their expression patterns have been organized in an integrated database [173]. Such a systems biology approach will greatly accelerate research on ribonomics and its integration with functional genomics and proteomics.

CONCLUSIONS
ncRNAs have undoubtedly become one of the 'hot' spots in modern biological and biomedical research. The existing data have abundantly demonstrated the connection of ncRNAs to diverse disciplines in biology, and have illuminated regulatory paradigms that have been largely attributed to proteins. As ncRNAs can be efficiently targeted by stable ASO, this approach may be explored as a method to target specific regulatory ncRNAs, both to understand their biological functions and action mechanisms in basic research and to develop novel strategies for disease intervention in clinical applications. The era of ncRNA research has both resulted in and benefited from the rapid advances in genomics technologies and informatics approaches made in recent years. However, we are clearly facing new challenges in dissecting the dark matter in the genome and understanding its mechanisms. As with many breakthroughs in the history of life science, opportunities and challenges come in equal measure, and it is up to prepared minds to seize the moment and make new breakthroughs.

Production of distinct classes of ncRNAs from mammalian genomes. Top: protein-coding (green lines) genes produce divergent PATs at the transcription start site. Certain exonic and intronic sequences have the capacity to generate circRNAs containing either 3′-5′ or 2′-5′ phosphodiester bonds. Many intronic sequences can also encode miRNAs or snoRNAs. rRNAs, tRNAs, and a subfraction of snRNAs are transcribed from separate genes. Bottom: similar to protein-coding genes, transcription enhancers also produce divergent transcripts, known as eRNAs. Most of the lncRNA genes contain at least one intron and are transcribed and processed in the same way as protein-coding genes except that they do not have coding potential (yellow line). miRNAs and piRNAs can also be derived from various intergenic regions.
Modes of ncRNA action on genomic DNA in regulated gene expression. lncRNAs are best characterized for their interactions with transcriptional regulators on functional DNA elements. (a) Various antisense transcripts, which appear to be quite widespread in humans and mice [42], may act as ncRNAs to interfere with Pol II elongation [41]. (b) Repeat-derived ncRNAs may block transcription. The prototype ncRNAs in this class are some transcribed Alu sequences, which bind to and interfere with Pol II function at gene promoters [44]. (c) A ncRNA may function as a decoy to compete for a specific transcription factor. The prototype for this mode is PANDA in sequestering the transcription factor NF-YA [45]. (d) A ncRNA may also facilitate the recruitment of a transcription regulator to a specific target site by engaging in base-pairing interactions with genomic DNA. The prototype for this mode is the rRNA gene PATs [38]. (e) A ncRNA may bridge protein-protein interactions between transcription regulators to enhance their activities on a common DNA target. The prototype for this mode is the ncRNA HOTAIR in bridging PRC2 and the lysine demethylase LSD1 to mediate gene silencing [37]. (f) A ncRNA may mediate long-
ncRNAs as integrated parts of gene networks. (a) ncRNAs mediate promoter-enhancer interactions to regulate the expression of various protein-coding genes. Protein-coding transcripts are also subject to regulation by miRNAs to fine-tune protein synthesis in the cytoplasm. (b) ncRNA genes produce various regulatory ncRNAs, which then participate in the regulated expression of both protein-coding and non-coding genes. (c) ncRNAs may play a critical role in the organization of the genome in the nucleus to coordinate the expression of gene clusters. (d) Regulated gene expression at both the transcriptional and posttranscriptional levels determines the cell type-specific proteome, and ncRNAs may also be extensively involved in protein interaction networks, which together contribute to gene networks in the cell.