In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly,
The design of RNAs with favorable traits is a promising endeavor that can be viewed as part of growing efforts in synthetic biology (1), as well as other applications. For example, it can be used to enhance the search for particular RNAs such as ribozymes and riboswitches in sequenced genomes (2), as well as other non-coding RNAs that may act as regulators of disease (3) or participate in catalysis (4). For riboswitches (5,6), aside from the classical problem of computationally designing transcription regulators and validating them experimentally (7,8) to complement purely experimental designs (9–11), the inverse RNA folding problem that was initially formulated and addressed in (12) can be used as a pre-processing step before BLAST for riboswitch identification (13). This recent use was also worked out for IRES-like structural subdomain identification in (14). It has the potential to advance the field described in (15) for conserved RNAs in general.
Thus, computational RNA design is of increasing biological importance. Since the first program for solving the inverse RNA folding problem (or RNA design) called
In recent years, several programs for RNA design have been developed with the goal of offering added features with respect to the original RNAinverse. Most of them are general-purpose programs for solving similar problems (19–27), and a few are more specialized for nanostructure design and for fixed-backbone 3D design, respectively (28,29). Recently, an extension to the problem was gradually developed (30–32) that allows designing sequences that fold into a prescribed shape, leaving some flexibility in the secondary structure of RNA motifs that do not necessarily possess a known functional role. This extension, when offering a fragment selection to the user, is called ‘fragment-based’ design because it is based on a user-selected secondary structure motif (the fragment) that possesses a functional role and is therefore inserted as a ‘fragment-based’ constraint into the design problem. The shape of the RNA can be represented as a tree-graph (33) that groups together a family of RNA secondary structures, all belonging to the same coarse-grained graphical representation.
The aforementioned extension led to a unique inverse RNA folding program called
Moreover, neither pseudoknots nor some experimental constraints, such as avoiding transcription slippage in the case of consecutive G nucleotides, have yet been implemented in our program. In the future, it will be desirable to add these and other experimentally motivated features to our program, as more experimental results with sequences designed by our program accumulate. In the following sections, the
The backend is written in Java EE and runs on Tomcat 8. It dispatches design tasks responsible for running
The input screen of the
Results are sent by email if an address is specified; otherwise, they are available upon completion in an interactive job mode. Aside from the Design Form page, a Search Result page is available in the top menu, should the user wish to re-analyze a previously computed result using its corresponding query name or identification. A general Help page is also available, as well as contextual tooltips that provide brief explanations for each field.
The results can be accessed through the web link provided to the user, and are guaranteed to be accessible for at least a week following their generation. In addition to keeping the web link for later use, the user has the option to download the results in Excel format for further analysis.
After the example parameters in the input screen of Figure 1 are inserted and the form is submitted, the main results screen appearing in Figure 2 is obtained. The query structure and associated sequence constraint appear at the top of the page. Below them are options for filtering the displayed results, and further below is a table listing the results. The table contains all the designed sequences that were generated. Each row provides a designed sequence result and its folded predicted structure in dot-bracket notation (12), its Shapiro tree-graph representation (33), minimum free energy in kcal/mol (calculated using
To give the user an estimate of run times for inputs of different lengths, Figure 3 reports run times for four different structures. The number of sequences designed was 20, the default. Tests were made using the default parameters, and run times are presented in log10 seconds. The fourth structure was included for timing purposes only: our method relies on energy minimization predictions, which are expected to become less accurate for lengths over 150 nt, so the output for the fourth structure is not likely to have any biological meaning. Some structures longer than 150 nt may nevertheless yield biologically meaningful results, and our web server supports inputs of up to 512 nt.
MATERIALS AND METHODS
The inverse RNA folding problem for designing sequences that fold into a given RNA secondary structure was introduced in (12). The approach to solve it by stochastic optimization relies on the solution of the direct problem (16–18). Initially, a seed sequence is chosen, after which a local search strategy was used in the original
Most of the constraints are inserted into the objective function in an additive manner with proper weights. This raises compatibility issues with rigid constraints such as sequence constraints, which could also be inserted into the objective function in future work, although at present they are left as rigid constraints for simplicity.
The weights are fixed; the rationale for their values is explained below, together with a description of each term. To start with, the first term, for the existence of the target motif, is a binary term and is in general the most important constraint, one that should be fulfilled exactly without any compromise. Therefore, a much larger weight of 10^3 relative to all others in the objective function is chosen for this term (32). In our problem of riboswitch identification, we may use it when we encounter a specific motif that we would like to preserve, such as the multi-branched loop of the guanine-binding aptamer. The neutrality, which measures mutational robustness, is a number between 0 and 1. Therefore, a weight of 10^2 is assigned to it.
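The additive weighting described above can be illustrated with a minimal sketch. Only the two weights stated in the text (10^3 for the motif term and 10^2 for the neutrality term) come from the source. The function name `objective`, the choice to score neutrality as an absolute deviation from a target value, and the generic `other_terms` list are assumptions made for illustration and are not the program's actual implementation.

```python
from typing import Iterable, Tuple

# Weights stated in the text.
MOTIF_WEIGHT = 10 ** 3
NEUTRALITY_WEIGHT = 10 ** 2


def objective(
    motif_present: bool,
    neutrality: float,
    target_neutrality: float,
    other_terms: Iterable[Tuple[float, float]] = (),
) -> float:
    """Additive penalty to be minimized (hypothetical form).

    motif_present: whether the selected motif exists in the predicted fold.
    neutrality, target_neutrality: values in [0, 1]; scoring the absolute
        deviation is an assumption of this sketch.
    other_terms: (weight, penalty) pairs standing in for the remaining
        terms, which are not specified here.
    """
    motif_penalty = 0.0 if motif_present else 1.0  # binary term
    score = MOTIF_WEIGHT * motif_penalty
    score += NEUTRALITY_WEIGHT * abs(neutrality - target_neutrality)
    for weight, penalty in other_terms:
        score += weight * penalty
    return score
```

Because the neutrality deviation is at most 1, its weighted contribution never exceeds 10^2, so in this sketch a missing motif always costs more than any neutrality mismatch.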
The neutrality of an RNA sequence of length L is calculated by the formula $\langle (L - d)/L \rangle$, where d is the base-pair distance between the secondary structure of the original sequence and the secondary structure of the mutant, and the angle brackets denote the average over all 3L one-mutant neighbors. The base-pair distance is evaluated by the
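As a worked illustration of the neutrality formula, the following sketch averages (L − d)/L over all 3L one-mutant neighbors. The folding routine is passed in as a hypothetical `fold` callable that maps a sequence to a dot-bracket string; which folding program is used is left open here. The base-pair distance is taken to be the number of base pairs present in exactly one of the two structures, which is an assumption of this sketch.

```python
from typing import Callable, Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs of a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(a) ^ base_pairs(b))


def neutrality(sequence: str, fold: Callable[[str], str]) -> float:
    """Average of (L - d) / L over all one-mutant neighbors."""
    length = len(sequence)
    reference = fold(sequence)
    total = 0.0
    count = 0
    for i, original in enumerate(sequence):
        for nt in "ACGU":
            if nt == original:
                continue
            mutant = sequence[:i] + nt + sequence[i + 1:]
            d = base_pair_distance(reference, fold(mutant))
            total += (length - d) / length
            count += 1  # reaches 3L for an ACGU-only sequence
    return total / count
```

A neutrality of 1 means that no single point mutation changes the predicted structure, while lower values indicate a sequence whose fold is more sensitive to mutation.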
Preliminary analyses (34) revealed that
When solving the inverse RNA folding problem, it is important to be able to address biological constraints in the form of structural constraints, as well as physical observables and sequence constraints. New programs that were recently developed such as
It should be noted that the flexibility allowed by the fragment-based design approach may also introduce spurious solutions, which could be more noticeable in specific cases. Some of these issues could be remedied in the future, especially as more practical experience is gained on biologically driven problems. Consequently, the user should be aware that, in special cases, the imposed constraints may not lead to the desired outcome from the biological standpoint. For example, the fragment-based constraint and the sequence constraints are not fully compatible, and this could produce designed sequences in which sequence constraints that were meant to appear inside a certain selected motif appear outside it, in adjacent motifs. Such compatibility issues could be alleviated in future versions of our approach by enforcing links between the different types of constraints, which is beyond the scope of the present work. At present, undesired results arising from these issues can be ignored or filtered out in a suitable post-processing step. Extensions for pseudoknot consideration and additional biologically driven constraints, including varied-length designed sequences, are also left as prospects for future work.
We thank Arik Goldfeld and Vitaly Shapira from the computer science laboratory at Ben-Gurion University for their help with our web server.
ISF within the ISF-UGC joint research program framework [9/14]. Funding for open access charge: ISF within the ISF-UGC joint research program framework [9/14].
Conflict of interest statement. None declared.