Fitness functions for RNA structure design

Abstract An RNA design algorithm takes a target RNA structure and finds a sequence that folds into that structure. This is fundamentally important for engineering therapeutics using RNA. Computational RNA design algorithms are guided by fitness functions, but not much research has been done on the merits of these functions. We survey current RNA design approaches with a particular focus on the fitness functions used. We experimentally compare the most widely used fitness functions in RNA design algorithms on both synthetic and natural sequences. It has been almost 20 years since the last comparison was published, and we find similar results with a major new result: maximizing probability outperforms minimizing ensemble defect. The probability is the likelihood of a structure at equilibrium and the ensemble defect is the weighted average number of incorrect positions in the ensemble. We find that maximizing probability leads to better results on synthetic RNA design puzzles and agrees more often than other fitness functions with natural sequences and structures, which were designed by evolution. Also, we observe that many recently published approaches minimize structure distance to the minimum free energy prediction, which we find to be a poor fitness function.


INTRODUCTION
Ribonucleic Acid (RNA) is a versatile molecule ( 1 ).It is fundamental to life ( 2 ) and has a breadth of roles including transcription and translation ( 3 ), catalyzing reactions ( 4 ), gene regulation ( 5 ), maintaining telomeres ( 6 ) and beyond.The functions of these RNAs are often governed by their structure ( 7 ).Additionall y, synthetic RN As can easily be constructed ( 8 ).A consequence of these properties is that RNA is a popular tool for bioengineering therapeutics (including mRNA vaccines ( 9 )) and biomachines (10)(11)(12).
RNAs have the quality that fast and (relati v e to other molecules) accurate algorithms exist to predict their struc-tures in silico (13)(14)(15)(16)(17). Since structure usually determines function in biology, a multitude of attempts have been made to de v elop algorithms to automaticall y design RN As with specific structures ( 18 ).We refer to this as the RNA design problem.Algorithmic RNA design dates back to at least the early 1990s ( 19 ).These algorithms have led to some promising, practical successes (20)(21)(22)(23).Howe v er, reliab ly solving RNA design puzzles is an open problem.No algorithm has solved the entire EteRNA100, a widely used RNA design benchmark that was hand-crafted by humans ( 24 ).Recent results suggest the RNA design problem is NP-hard by proving that the most general version of the problem is hard ( 25 ) as well as a simplified model ( 26 , 27 ).
Many techniques have been applied to the RNA design prob lem.We gi v e a non-e xhausti v e sample for conte xt.The first method was likely RNAinverse ( 19 ), which applied an adapti v e random walk.Later methods used stochastic local search ( 28 ) with seeding heuristics ( 29 ), genetic algorithms (30)(31)(32), constr aint progr amming (33)(34)(35), simulated annealing ( 36 ), hierarchical decomposition ( 37 , 38 ), Monte Carlo methods (39)(40)(41), global sampling ( 42 , 43 ) and ant colony optimisation ( 44 ).The EteRNA project has turned RNA design into a game in which solutions to RNA design puzzles can be cro w d sourced from human players ( 45 ).Recently, deep learning has also been applied to the problem in se v eral ways including learning from human solutions ( 46 ), and by a ppl ying r einfor cement learning ( 47 , 48 ).
Each of these RNA design algorithms generally have three components: (1) A computational model (encompassing multiple components ( 17 )), that can predict the structure of an RNA gi v en its sequence.(2) A fitness function , which determines the quality of a potential solution with r efer ence to a target structure.(3) A searc h algorithm , w hich explores the sequence space to find a desirable solution under the fitness function.
The choice of possible search algorithms is often limited by the choice of fitness function.
The thermodynamic 'nearest neighbor' model de v eloped by Turner, Mathews, Tinoco and many others ( 49 ) is ubiquitous in the RNA design litera ture.The implementa tion of the nearest neighbor model in the ViennaRNA package was e40 Nucleic Acids Research, 2023, Vol. 51, No. 7 PAGE 2 OF 10 used in our work.This is the most widely used implementation ( 13 ).Howe v er, it should be noted that other models exist and may offer new options for different fitness functions and search algorithms ( 17 ).
Generally speaking, design algorithms r equir e a fitness function to guide their search.This is a measure of how promising an intermediate solution is.The fitness functions used are almost as numerous as the algorithms.Howe v er, most algorithms use fitness functions that are a variant of one of the four main types.These will be defined formally la ter.They comprise structur e distance minimization , fr ee ener gy minimization , pr obability maximization and ensemble defect minimization .
Our aims are to summarize information about fitness functions for RNA design, and to provide an objecti v e analysis of them.The latter is achie v ed by e xperiments.We use a fitness function agnostic search algorithm to do a fair benchmark.Both our aims ar e r elevant to future works developing computational approaches to RNA design.The only similar work we are aware of is by Dirks et al. ( 50 ).We significantly expand and modernize the methods used, and find new results that differ somewhat from those of Dirks et al.We find that probability maximization performs better than any other fitness function, which differs from the previous result.This has implications for algorithms for RNA design, many of which currently use other fitness functions.

The computational RNA design problem
Before we begin our discussion of fitness functions, we must understand the rules of the game: what is RNA design?
An RNA is r epr esented by a string p ∈ { A , U , G , C } * where the letters correspond to nucleotides.We refer to this string p as the sequence .A secondary structure s is a (possibly empty) set of pairings between the indexes of p such that an index appears in s at most once and ther e ar e no two elements ( i , j ) ∈ s and ( k , l ) ∈ s such that i < k < j < l .In short, s is a properly nested set of pairings in p .In nature, some RN A structures contain improperl y nested pairings called pseudoknots.Also, some natural structures contain a single nucleotide binding to more than one other nucleotide forming a triplex or quadruplex.Since the computational model we use does not support these well, we do not consider them.Henceforth we elide secondary structure to just structure for brevity.
Call S ( p ) the set of all possible structures for a sequence p .Define a modelm ( p , s ) as a function that assigns e v ery s ∈ S ( p ) a score.Since the structure represents bonds that form as the RNA sequence folds, we call the process of finding the optimal structure folding and define FOLD ( m, p) = arg min s ∈ S ( p) m ( p, s).Ties for the arg min are broken arbitrarily.The solution to FOLD ( m, p) is the best guess of the true structure of the RNA under the model.Assuming the widely used thermodynamic model ( 49 ), FOLD ( m, p) can be computed in O (| p | 3 ) time ( 16 , 51 ).
The algorithmic RNA design problem is essentially to invert the folding problem.The folding problem takes a sequence p and finds an optimal structure s .The design problem inverts this relationship by taking a structure s and finding a sequence p that folds into s .Formally, we are gi v en a structure s and a model m .Define the in verse f olding func- Sometimes, there can be ties for the optimal fold FOLD ( m, p).This ha ppens w hen ther e ar e multiple structures with the minimum score.This issue is often overlooked in the literatur e. Ther e ar e many published r esults where a folding algorithm is run and whate v er arbitrary structure it produces in the case of ties is taken --a none xhausti v e list: ( 18 , 39 , 44 , 45 ).There are two ways to address the issue.We can say that a sequence p is a correct design for a structure s when it is tied with the optimal fold: m ( p, s) = m ( p, FOLD ( m, p)).Alternati v ely, we can say that p is a correct design iff FOLD ( m, p) = s and there are no ties.We adopt the latter tie-breaking strategy because it is more stringent and should lead to more accurate designs in vivo .
If ther e ar e ties for an optimal structure, it means the model cannot determine which is the most likely.An ideal design should ensure the target structure is non-ambiguously selected under the model.

Fitness functions
Success on the algorithmic RNA design problem as we have defined it is binary.Either a design is correct, or it is not.This is not directly usable for most search algorithms, which need to be able to make incremental progress toward a solution.This is why fitness functions are important.Their use is to estimate the effecti v eness of intermediate sequence solutions towards a target structure.Next, we describe the typical fitness functions used in RNA design.

Structure distance minimization.
The key idea in structure distance minimization is to make FOLD ( p, m ) as similar to the target structure as possib le.Gi v en a distance function d ( s 1 , s 2 ) between two structures, the structure distance for a gi v en potential solution sequence p and target structure t is d( t, FOLD ( p, m )).Minimizing the structure distance can lead to a correct design.This category of fitness function seems to be the most widely used ( 19 , 28 , 30 , 36 , 39 , 40 , 48 ).
It is important to choose a good distance function.In our experiments, we use three that are widely used in the literature.The first is base pair distance as implemented in the ViennaRNA software package ( 13 ).Base pair distance is the number of base pairs that occur in exactly one of the two compared structures.The second is the number of indexes in p where the structure differs from the target, defined as the number of incorrect nucleotides in ( 50 ).The third is Interaction Network Fidelity (INF) applied to secondary structures, which is the Mathews correla tion coef ficient of predicted and true base pairs ( 52 , 53 ).
A weakness of using structure distance is that it is unclear what to do in the case of ties for the optimal fold.The typical approach in the literature seems to be not to address this edge case and accept the arbitrary result of FOLD for structure distance comparison.We try se v eral methods of dealing with ties including using the average distance and minimum distance.Another weakness is that structure distance myopically considers only the results of FOLD while ignoring all other structures.This can make structure distance a poor guide when the target structure is close to op-PAGE 3 OF 10 Nucleic Acids Research, 2023, Vol. 51, No. 7 e40 timal, but not an optimal fold.Due to these w eaknesses, w e expected that structure distance is not an effecti v e fitness function.The expectation that structure distance is a poor metric may be considered controversial, since it is widely used.Howe v er, the pre vious wor k on fitness functions by Dirks et al. found structure distance (they call it MFE satisfaction) to have middling efficacy ( 50 ).
Fr ee ener g y minimization.Most e xisting RNA design algorithms use a thermodynamic nearest neighbour model.This model assigns each structure s ∈ S ( p ) a free energy scor e. Lower fr ee energy structur es ar e mor e likely at equilibrium and ther efor e mor e likely to be the true structure.Free energies can be used directly for RNA design.The utility of a potential design p for a target structure t can be defined as m ( p , t ).It seems na tural tha t we want to select designs that minimize the free energy of our target structure.This fitness function is often used in conjunction with other metrics ( 29 , 30 , 39 ), but it has been used independently too ( 50 , 54 ).
Despite its intuiti v e appeal, free energy minimization has a major weakness when used for algorithmic RNA design.Consider two sequences p 1 and p 2 and a structure s ∈ S ( p 1 ) ∩ S ( p 2 ) where m ( p 1 , s ) > m ( p 2 , s ).It is possible that FOLD ( m, p 1 ) = s and FOLD ( m, p 2 ) = s.In words, s may be more stable in p 2 compared to p 1 , but not the most stable among all structures with respect to p 2 while being the most stab le ov er all structures for p 1 .This is because a 'good' score in S ( p 1 ) may not be a 'good' score in S ( p 2 ).In essence, we cannot sensibly compar e scor es drawn from different thermodynamic ensembles.Additionally, if free energy minimization was a perfectly effecti v e fitness function, then either algorithmic RNA design would not be NP-hard or NP = P since a sequence with the minimum possible free energy for a structure can be found in polynomial time ( 29 ).Due to this w eakness, w e expect tha t free energy minimiza tion is not an effecti v e fitness function.Dir ks et al. pre viously found free energy minimization to be a relati v ely poor fitness function ( 50 ).
Pr obability maximization.The thermod ynamic equilibrium partition function for RNAs under the thermodynamic model can be computed efficiently, requiring O (| p | 3 ) time, the same as computing FOLD ( m, p) ( 55 ).This allows us to compute the conditional probability P ( s | p) of a structure s gi v en a sequence p .This probability can be maximized and used as a fitness function for RNA design.This fitness function has been used for RNA design, ( 19 , 31 , 50 ) albeit less widely than some other fitness functions.
Using probability maximization is similar to free energy minimization, but it addresses many of the major shortcomings.Because the probabilities are normalized to the same scale, it is more sensible to compare two probabilities taken from different sequences.In this sense, probability can be thought of as a 'better' free energy minimization.There is still a w eakness, how e v er.Suppose we are comparing two sequences p 1 and p 2 as potential designs for a target structure t .It is possible that t is the best ranked (by probability) for p 1 , but has a higher probability and a worse rank in p 2 .This can happen when the probability distributions looks dissimilar between sequences, for example p 1 could have a flat distribution and p 2 could have an exponential distribution.
This weakness is similar to the weakness of free energy minimization.Howe v er, it should hav e a lower negati v e impact, since it r equir es the distributions of structure probabilities for sequences to look dissimilar.For this reason, we expected that probability will be a better fitness function than free energy.Dirks et al. found that probability maximization was an effecti v e fitness function ( 50 ).
Ensemble defect minimization.Ensemble defect was originally introduced by Dirks et al. who found it to be the best fitness function they tested ( 50 ).They called it 'average number of incorrect nucleotides,' which describes what it measures.It was later called ensemble defect ( 37 ).In essence, the ensemble defect of a sequence p with respect to a target structure t is the sum of a certain distance function over the ensemble of structures S ( p ) weighted by probability.
A structure s is a set of paired nucleotide locations.For convenience, let s i = j if i and j ar e pair ed, and s i = i if i is not paired to any nucleotide.Let our distance function be the average number of incorrect nucleotides between a structure s and a target structure Define P ( s | p) as the probability of the structure s gi v en the sequence p .The ensemble defect of a sequence p for a target structure t is defined as: It is possible to compute D( p, t) in O (| p | 3 ) time, making it no less efficient to compute than comparable fitness functions ( 50 ).While not as widely used as structure distance minimization, ensemble defect is well known ( 33 , 37 , 38 , 50 , 56 ).It appears to combine the strengths of probability maximization and structure distance minimization while avoiding the weaknesses of both.We expect that it will be the most effecti v e fitness function.

MATERIALS AND METHODS
We ran two different experiments.The first used synthetic RNAs, and the second used real RNAs.In the first, we used a fitness function agnostic search algorithm to test the performance of each fitness function type on synthetic RNAs.In the second, we took real RNA sequences with conserved structures and determined if the fitness functions predict that the true sequence is a good design for the true structure.
We used the ViennaRNA package 2.5.0 ( 13 ) as the model and the RNAfold program as the folding algorithm for all our experiments.

Synthetic RNAs
A search algorithm is needed to test the fitness functions.It must be compatible with all fitness functions.We used an Adapti v e Random Walk (ARW), since it supports all fitness functions, is widely known, and is relati v ely simple.Dir ks et al. ( 50 ) used an ARW, as does the RNAinverse program in ViennaRNA ( 13 , 19 ).The algorithm is described in Algorithm 1, and makes a sequence of muta tions tha t maintain base pairs and accepting only those that improve the fitness function score.Our introduction points out the diversity of search algorithms for the RNA design problem.By restricting ourselves to a single search algorithm, we limit the conclusions we can make about different search algorithms.Howe v er, we hav e picked a search algorithm that is similar to local search subroutines in many algorithms ( 19 , 38 , 40 , 56 ), and is also comparable to techniques that have elements of local search like Monte Carlo methods, genetic and evolutionary algorithms, and simulated annealing.
The ARW was run using each fitness function.ARW was run for 1000 steps on each target structur e.A r esult sequence p was considered correct for a target structure t if it was the unique solution; that is it was the unique minimum free energy structure as judged by ViennaRNA ( 13 ).
If ther e wer e ties or if p did not fold into the target structur e, then a r esult was consider ed incorr ect.For each fitness function, we counted the number of correct and incorrect results.Better fitness functions should guide the ARW to good solutions more often, so the rate of corr ect r esults was used to measure fitness function efficacy.
Three synthetic data sets were used.A 40nt, 80nt, and 120nt set.In all sets, target structures were generated similarly.First, sequences were generated by picking each nucleotide with equal probability.A target structure was picked randomly for each sequence from the set of structures within Tkcal / mol of the minimum free energy.A different T value was picked in different data sets to adjust difficulty.We use this method because always picking a minimum free energy structure tended to generate structures that were easily solved by the ARW.The resulting set of target structures for each sequence was used as our testing data set.Wuchty's algorithm ( 57 ) in ViennaRNA ( 13 ) was used to generate all suboptimal structures within the free energy window.We ensured that a data set contained no duplicate structures.
For the 80nt and 120nt data sets, 1600 structures of lengths 80 and 120 respecti v ely were generated with T = 5.0 kcal / mol.For the 40nt data set, 1600 structures of length 40 with T = 1.0 kcal / mol were generated.These values (1600, 80, 120, 5, 40, 1) were picked before examining the perfor-mance results.Running the ARW for 1000 steps was an arbitr ary par ameter that was also chosen before examining performance results.Length 120 was picked because it was the largest length that ran in time on an AMD 5950X processor.Lengths 80 and 40 were picked arbitrarily.A window of T = 5.0 kcal / mol was chosen because 6.0 was too slow on our processor, and 1.0 was picked because it was significantly smaller than 5.0.
The code used to generate the data sets is provided in the Data Availability section.These can be used to generate the same set of structures used in our experiments.Statistics about the structural motifs in our data set are gi v en in Table 1 .
The previous work by Dirks et al. tested on 11 structures ( 50 ).Of these, 9 were variants of the a tRNA inspired three-way multiloop structure with varying stem lengths and numbers of unpaired nucleotides.Of the two remaining, one is a larger multiloop, and the other is a small pseudoknot.In contrast, we use 4800 structures with a much gr eater degr ee of di v ersity due to our method of generation.In particular, we expect our 80nt and 120nt data sets to contain much more difficult to solve structures.A major issues with the Dirks test set in addition to its small size is that all structur es wer e solved by both probability and ensemble defect.We belie v e har der puzzles are needed to distinguish between these two.
We do not belie v e that our structure generation method has a bias toward producing structures that are easier for any particular fitness function, because the results using synthetic data are consistent across all data sets and with those from real RNAs.Howe v er, we could not find a way to ensure no bias in da ta genera tion for this or any other method except picking structures uniformly at random.Uniform random structure generation could not be used because we found it almost always generates structures no fitness function can solve.
Recall that the ARW algorithm starts with a random seed sequence and modifies it.This sequence was picked uniformly at random.We tested all fitness functions on the three data sets with different seeding strategies for the ARW.This was to assess the sensitivity of the fitness functions to the seed sequence.Seed sequences were tested with 25% GC-content and 75% as well as picking sequences uniformly at random, which have 50% GC-content.
Structure distance.We tried se v eral variants of structure distance.We used 'base pair distance' (BPD), which is a standard distance measure included in ViennaRNA ( 13 ) and defined as the number of base pairs that occur in exactly one of the two compared structures.We also used a structural Hamming distance (HD) defined the same as in Equation ( 1 ).Finally, Interaction Network Fidelity (INF) was used ( 52 , 53 ).
For all of these distance functions, we tried three different ways of dealing with ties for the minimum free energy structure.This issue is described previously in Fitness Functions .We tried using the single arbitrary structure ViennaRNA returns; taking the average distance over all ties; and taking the minimum distance over all ties.
Both BPD and HD gi v e distance values in the range from 0 to n for n nucleotides.Because our ARW implementation Fr ee ener gy.The free energy as ev aluated b y the Vien-naRNA package ( 13 ) was used.Because a lower free energy is more stable, and our ARW maximizes fitness, we use − G as the fitness value, where G is the free energy change.
Free energy minimization produces sequences with a high GC-content ( 50 ).This is known to cause issues in RNA design ( 28 , 29 , 43 ).To address this, we test a variant of free energy minimization with a GC-content constraint which penalises e xcess GC e xponentially.The ARW maximizes Equation ( 3 ) where g is the target GC-content ratio (0.5 in our experiments) and ˆ g is the actual GC-content.
Probability.The probability P ( s | p) for a structure s gi v en a sequence p as evaluated by the ViennaRNA package ( 13 ) was used.This was achie v ed by computing the partition function for the sequence p .Probability was maximized directly by the ARW.
Ensemble defect.The normalized ensemble defect was calculated using the method provided in the ViennaRNA package ( 13 ).This is the same as in Equation ( 2), but normalized by sequence length: D( p,t) | p| .Because our ARW maximizes fitness, we optimized 1 − D( p,t) | p| .

Real RNAs
Many RNAs in nature hav e conserv ed structures.Consider a natural RNA sequence p with a known structure s .A fitness function f can be used to rank e v ery structure in S ( p ) as a design for p .We can assign each structure s ∈ S ( p ) a score f ( p , s ) then order S ( p ) according to these scores.A good fitness function is expected to rank the true structure s highly.Natural sequences have been 'designed' through natural selection to fold into their structure.While the computational folding algorithm onl y a pproximates this, we do expect that the real structure for a natural sequence is seen as a good design.A subtlety worth emphasizing is that we are not comparing multiple sequences for a single target structure, as with synthetic RNAs.Instead, we are looking at all the structures for a single, natural sequence that has a single true structure.Using natural sequences is important to fairly compare ensemble defect.It is claimed that ensemble defect is expected to make good predictions on natural sequences due minor base pairing fluctuations in vivo and the observed low ensemble defect of natural helices ( 37 , 38 , 50 ).Despite this, the pre vious wor k on fitness functions by Dirks et al. ( 50 ) did not test on natural sequences, so we consider this analysis novel.
We used ArchiveII , a curated collection of 3948 natural RNA sequences with conserved structures ( 58 ).For each, suboptimal folds were generated using the implementation of Wuchty's algorithm ( 57 ) in ViennaRNA ( 13 ).The suboptimal free energy window started at 0 kcal / mol and was increased in increments of 0.2 kcal / mol until at least 200 000 suboptimal structures were generated or the window exceeded 10 kcal / mol.The structures were ranked by their corresponding fitness function values.The rank of the RNA's true structure was found in the ranked list of structures.If it was not found, the closest structure by base pair distance ( 13 ) was used.If the closest structure differed by > 5% of the sequence length, the sequence was excluded from the experiment.A total of 1719 sequences were excluded this way.The 5% similarity threshold was picked because often the true structure does not appear in the suboptimal list (e.g., it contains a base pair not allowed in the nearest neighbour model), but a nearly identical structure does appear.
Probability, ensemble defect, and structur e distance wer e considered in our experiment.We opted not to test free energy.In the special case of this experiment, free energy necessarily always has the same ranking as probability because the sequence is fixed.For structure distance, we considered only Hamming distance breaking ties for the minimum free energy by averaging because it is r epr esentati v e of other structure distance methods on synthetic RNAs.

Synthetic RNAs
Tables 2 -4 summarize the results for synthetic RNAs.The first table shows probability and ensemble defect with the strongest performances on the easier synthetic data set.Ensemble defect and probability perform similarly.In order, the performance seems to be fr ee energy, structur e distance, ensemble defect, probability.This is somewhat similar to the results found by Dirks et al. ( 50 ) who found that both probability and ensemble defect solved their entire test set.
The second table contains results from the 80nt synthetic data set and shows probability with the highest correct rate and number of unique solutions.The same performance hierarchy is maintained, but probability pulls away by a lar ger mar gin.The number of unique solves is of particular inter est wher e probability dominates all other fitness functions.The thir d tab le contains results for the 120nt synthetic data set.We see similar results as in the 80nt but more exaggerated.All fitness functions have lower performance, but probability is barely impacted, while the performance of other fitness functions falls precipitously.
We tried se v eral variations of structur e distance.The r esults suggest that HD, BPD and INF are all comparable.Further, using an average to break ties is much better than arbitrary tie breaking (which is the norm in the literature) and minimum tie breaking.
The GC-content of seed sequences was tested using 25% and 75% GC-content seeds in addition to our default seeding strategy, which pr oduces r oughly 50% GC-content.Table 5 summarizes the results.While there is some minor variation, the seeds did not seem to have a significant effect on the number of solves or the final GC-content of the ARW f or an y fitness function.Note that Hamming distance with an average tie breaking strategy was used as r epr esentati v e for structure distance.5. Results on synthetic structures of all lengths with different GC-content percentages used for the seed sequences.For example, 80nt / 25% indicates the 80nt dataset was used and the seed sequence for the adapti v e w alk w as generated with 25% of the nucleotides as G or C. The r eported r esults ar e the number of correct solves and the find GC-content of the solution in brackets.For example, 512 (0.49) indicates 512 correct solutions and the average GC-content of the final sequences found by adapti v e walk using this fitness function was 49%

Real RNAs
We consider the ranks of the true structure (or the closest analog) as a percentage.Consider an example percentage of 30.2%.This means, on average, the true structure was ranked at the position 30.2%below the highest rank.Higher ranks and ther efor e lower per centages ar e better as they correspond to the true structure being seen more favorably by the fitness function.
Probability achie v ed a mean rank of 21.2% with a standar d de viation of 26.9% and a median of 8.0%.Ensemb le defect achie v ed a mean rank of 48.0% with a standar d deviation of 34.5% and a median of 47.8%.Structure distance achie v ed and mean rank of 48.2% with a standard deviation of 33.6% and a median of 47.9%.
Note that these results contain fewer solutions than on synthetic sequences.This is because the energy model used is imperfect, so natural sequence and structure pairs are not expected to be perfectly recognised as good designs.This is why we look at relati v e r anks r ather than unique MFE solutions.
These results are reflected in Figure 1 where probability dominates the lower percentiles and ensemble defect and structure distance dominates the higher percentiles.It appears as though natural sequences are designed in a way tha t correla tes with probability, but not ensemble defect or structure distance.
A r epr esentati v e e xample RNA is considered to illustrate why ensemble defect performs poorly compared to probability on natural RNAs.Consider the tRNA from Saccharomy ces Cere visiae ( 59 ). Figure 2 shows a small but informati v e sample of structures from this RNA.The top ranked structures by probability are di v erse with some three-and f our-wa y multiloops, including the true structure.The top ranked structures by ensemble defect have low diversity and are all three-way multibranch loops.Ensemble defect tries to find the probability weighted center of the structure space under its distance function.In this case, the ensemble defect structur es ar e a similar distance to the other structure clusters and ensemble defect picks this most central shape rather than the most accurate.
A second example is provided in Figure 3 .It is a small SRP ( 60 ).This is another case where ensemble defect does poor ly.The top r anked structures for probability include some di v ersity and contain se v eral structures similar to the true structure.The top ranked by ensemble defect have low di v ersity.In this case, the top structures for ensemble defect are similar to the three-way multiloop structures found by probability.Ensemble defect prefers these structures since they are relati v ely high probability in the ensemble, and they are also roughly equidistant (by probability weighted defect) from the other two structural clusters (f our-wa y m ultiloops and no m ultiloop).Ensemble defect makes a mistake by trying to pick a structure cluster that is central under its distance function, but being central does not appear to be correlated with being correct.

DISCUSSION
Ensemble defect did not perform the best in our experiments.Probability performed better.Dirks et al. reported that probability and ensemble defect both performed well ( 50 ).In their tests, both solved every RNA design challenge, with other fitness functions falling behind.We see a similar result in our 40nt data set (Table 2 ), which we expect has a similar difficulty to that of Dirks et al.Howe v er, the larger data sets (Tables 3 and 4 ) show probability having much stronger results.The explanation for the differences in our findings is that we tested on harder structures to design.If both fitness functions are solving e v ery design challenge, then the challenges are not hard enough to properly distinguish the fitness functions.We found the same pattern in real RNAs (Figure 1 ).Probability outperformed ensemble defect and structure distance.Our results seem to indicate that evolution has designed sequences in a way that correlates with probability, but not with ensemble defect or structure distance.Natural RNA sequences generally have low structural entropy ( 61 , 62 ).Maximising probability e v entually minimises entropy, so these observations may be linked.
An examination of specific RNAs where ensemble defect performs worse than probability is re v ealing.If there is more than one cluster of similar structures that are high probability, ensemble defect will compromise between the two clusters, which can lead to unrealistic averaging as can be seen in Figures 2 and 3 .We belie v e that a better interpretation when ther e ar e clusters is that a structure that at the center of any single cluster is likely, but one that is equidistant between two or more clusters is not.Probability captures this intuition.
It is possible that the issues with ensemble defect can be rectified by changing the distance function used (see Equation 1 ).Howe v er, a primary strength of ensemb le defect is that it can be efficiently computed.This is because of the intelligent choice of distance function ( 50 ).It may be diffi-cult to achie v e such an efficient algorithm with a different distance function.
Tables 2 -4 record GC-content.They suggest that free energy has a strong bias, which was expected given GC pairs have the lowest free energy change in the model.We expect to see 50% if the sequence was chosen randomly.Using GCcontrolled free energy as a fitness function did improve performance, but only a little.Probability and ensemble defect appear to have a slight bias in GC-content.The increase in GC-content seems to be correlated with increased performance on the test as well as increased difficulty.This may be explained by the thermodynamic model predicting GC pairs are more stabilizing than other pairs.Thus, harder design challenges m ust rel y on using the most stabilizing pairs in some situations, increasing the required GC-content.We note that Dirks et al. reported similar GC-contents, albeit slightly ele vated, possib ly due to their data set having a bias ( 50 ).Also, we found that changing the GC-content of the seed sequences for the ARW did not materially change our results (Table 5 ).
Structure distance appears to be the most widely used fitness function in the literature as we noted previously.The issue of minimum free energy tie breaking is not addressed in the existing literature.We address it here, and our results suggest that taking the average among the ties has the best result.We tested base pair distance, structural Hamming distance, and Interaction Network Fidelity.We did not observe any significant differences in performance.Overall, no variant of structure distance came close to the le v el of probability.As such, we recommend that probability is pr eferr ed in the future over structure distance despite its popularity.
In Figure 1 , we see that structure distance performs similarly to ensemble defect when used to rank real RNAs.Howe v er, it is less effecti v e than ensemb le defect on synthetic RNAs as seen in Tables 3 and 4 .We speculate that it could be the case that the evolutionary pr essur es on natural sequences are not captured by ensemble defect or structure distance, but ensemble defect is still a better guide than structure distance.
In the work presented here, we examined fitness functions in isolation.Howe v er, some e xisting techniques combine multiple fitness functions ( 31 , 39 ).Future work might test the performance of composite fitness functions.
Probability is the best performing fitness function, but it has a theoretical weakness described previously: in short probabilities taken from different distributions may not be comparab le.Future wor k may show that, in practice, RNA sequences of the same length have similar probability distributions over the corresponding structure ensemble.If this is true, then it would explain why comparing probabilities between different sequences works well in practice.Additionally, a better fitness function may be found that outperforms probability and does not have this issue.
We gi v e three major components comprising an RNA design algorithm: the computational model , the fitness function , and the search algorithm .It is interesting that the choice of computational model informs the space of possible fitness functions.For example, probability and ensemble defect r equir e that the model assigns the ensemble of structures probabilities.Howe v er, structure distance only r equir es a folding function.The choice of fitness function also constrains the search algorithm.To illustrate, ( 37 ) uses the properties of ensemble defect to weight mutations.A computational model based on the nearest neighbour thermodynamic model and dynamic programming recursions from ( 16) is widely assumed in RNA design, but there are alternati v es ( 17 ).We wonder if using a different paradigm altogether may affect the choice of fitness function and search algorithm.
The fitness function for design must lead the search algorithm toward good solutions.The goodness of those solutions is dependent on the computational model, which also has its own associated fitness function.Ther efor e, it must be the case that there is some relationship between the two fitness functions.The relationship between fitness functions for structur e pr ediction and fitness functions for design is indirect.For example, most successful folding algorithms minimize free ener gy ( 16 ), ho we v er we hav e demonstrated that doing the same for design is poor.Howe v er, some prediction algorithms maximize probability ( 17 ), and probability also works well for design as we have demonstrated.The reason for the relationship being complex is likely because the space of sequences is not comparable to the space of structures.In structure prediction, we must select from the space of all structures.For design, we must select from the space of all sequences, which is much larger.It is not clear tha t dif ferent fitness function hav e comparab le interpretations in both spaces.

DA T A A V AILABILITY
All source code and data is available at https://github.com/maxhwardg/fit-fns-for-rna-design .

e40 7 PAGE 8 OF 10 Figure 2 .
Figure 2. A sample of the structures for a tRNA ( Sacchar om y ces cer evisiae tdbR00000083 ).Colors corr espond to nucleotides in the true structur e depicted on the left.The top row is ranked by probability and the bottom row is ranked by ensemble defect.

Figure 3 .
Figure 3.A of the structures for a small ( Deinococcus r adiodur ans Dein.radi.AE000513 ).Colors correspond to nucleotides in the true structure depicted on the left.The top row is ranked by probability and the bottom row is ranked by ensemble defect.

Table 1 .
Statistics about the structural motifs in the Nearest-Neighbor model across the three synthetic data sets.Each data set contains 1600 structures.Average counts over the data set ( / 1600) are included in brackets.We define a helix as a sequence of consecutive base pairs with no loops

Table 2 .
Results on synthetic structures of length 40.The '# Correct' is the number of correct solutions out of 1600.The 'Correct Rate' is the ratio of the number of correct solutions and 1600.The 'Unique Solver' column contains the number of structures for which a fitness function was the only fitness function to find a correct solution

Table 3 .
Results on synthetic structures of length 80.The '# Correct' is the number of correct solutions out of 1600.The 'Correct Rate' is the ratio of the number of correct solutions and 1600.The 'Unique Solver' column contains the number of structures for which a fitness function was the only fitness function to find a correct solution

Table 4 .
Results on synthetic structures of length 120.The '# correct' is the number of correct solutions out of 1600.The 'correct rate' is the ratio of the number of correct solutions and 1600.The 'unique solver' column contains the number of structures for which a fitness function was the only fitness function to find a correct solution Comparison between probability, ensemble defect, and structure distance on real RNA sequences with known structures.Shows the cumulati v e number of RNAs for which the true structure (or the closest analog) was at or under a certain percentile when ranked by a fitness function against other structures.