Abstract

Summary: Computing the reversal distance and searching for an optimal sequence of reversals to transform a unichromosomal genome into another are useful algorithmic tools to analyse real evolutionary scenarios. Currently, these problems can be solved by at least two available softwares, the prominent of which are

GRAPPA
and
GRIMM
. However, the number of different optimal sequences is usually huge and taking only the distance and/or one example is often insufficient to do a proper analysis. Here, we offer an alternative and present
baobabLUNA
, a framework that contains an algorithm to give a compact representation of the whole space of solutions for the sorting by reversals problem.

Availability and Implementation: Compiled code implemented in Java is freely available for download at http://pbil.univ-lyon1.fr/software/luna/. Documentation with methodological background, technical aspects, download and setup instructions, interface description and tutorial are available at http://pbil.univ-lyon1.fr/software/luna/doc/luna-doc.pdf.

Contact:mdvbraga@gmail.com

Supplementary information:Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Computing the reversal distance between two unichromosomal genomes without duplications, insertions and deletions and finding one optimal sequence of reversals (that is, a sequence with a minimum number of reversals) that transforms one genome into the other can be solved in polynomial time, thanks to Hannenhalli and Pevzner (1999). These two problems have been the topic of several works, such as Tannier et al. (2007), and their solutions are valuable tools to analyse evolutionary scenarios. Currently, there are at least two available softwares to solve these problems. One is the package

GRAPPA
and the other is the software
GRIMM
, described respectively, in Moret et al. (2001) and Tesler (2002).

Nevertheless, there are many different solutions, with each solution representing an optimal sequence of reversals that sort one genome into another, and finding only one is often insufficient. Exploring the whole set of solutions is thus an interesting strategy to do a more realistic analysis. The first step in this direction was the enumeration of all solutions, thanks to an algorithm proposed by Siepel (2003). However, since the number of solutions is usually huge, the whole set is very hard to handle and this could be as useless as finding one of them. Bergeron et al. (2002) then proposed a model to represent the solutions in a compact way, grouping them into classes of equivalence. This allows to reduce the set to be handled and an algorithm to directly enumerate the classes was given by Braga et al. (2008). The number of non-equivalent solutions can be still too large, therefore, a method was proposed for filtering solutions using constraints (Braga, 2009).

In this work, we describe

baobabLUNA
, a framework that contains the implementation of the algorithm developed by Braga et al. (2008) to directly enumerate all the classes of equivalent solutions and also the further use of biological constraints to filter the classes.

2 DESCRIPTION

2.1 Permutations, reversals and sorting sequences

Genomes are represented by the list of homologous markers between them. These markers correspond to the integers 1, 2,…, n, with a plus or minus sign to indicate the strand they lie on. The order and orientation of the markers of one genome in relation to the other is represented by a signed permutation π = (π1, π2,…, πn−1, πn) of size n over {−n,…, −1, 1,…, n}, such that, for each value i from 1 to n, either i or −i is mandatorily represented, but not both. The identity permutation (1, 2, 3,…, n) is denoted by In.

A subset of numbers ρ⊆{1, 2,…, n−1, n} is said to be an interval of a permutation π if there exist i,j∈{1,…, n}, 1≤ijn, such that ρ={|πi|, |πi+1|,…, |πj−1|, |πj|}. Given a permutation π and an interval ρ of π, we can apply a reversal on the interval ρ of π, that is, the operation which reverses the order and flips the signs of the elements of ρ, that results in the permutation (π1,…, πi−1, −πj,…, −πi, πj+1,…, πn).

If s1ρ2…ρi is a sequence of reversals for a permutation π, we say that s sorts π into πT if the result of the consecutive application of the reversals ρ1, ρ2, …ρi on π is πT. The length of a shortest sequence sorting π into πT is called the reversal distance of π and πT, denoted by d(π, πT). Let s1ρ2…ρi be a sequence of reversals sorting π into πT. If d(π, πT)=i, then s is said to be an optimal sorting sequence. As an example, the sequence {1}{2}{4}{1, 2, 3} sorts (−3, 2, 1, −4) into I4 and is optimal.

2.2 Main functionalities

2.2.1 Computing traces

Given two permutations π and πT, the enumeration of all solutions (sequences) that sort π into πT can be done by iterating an algorithm given by Siepel (2003). However, the number of solutions is huge and the complexity of enumerating all of them is O(n2n+3) (Braga, 2009). Bergeron et al. (2002) introduced a more compact representation of the space of solutions, grouping them into equivalence classes called traces. All equivalent solutions in a trace are composed by the same reversals but in different orders. Observe however that this is not the formal definition of a trace, which can be obtained in Braga (2009). Braga et al. (2008) later proposed an algorithm to directly give one representative solution and the number of solutions in each trace. The complexity of this algorithm is also exponential in a property of the traces called width (Braga, 2009), but, as the number of traces is usually much smaller than the number of solutions, enumerating traces runs considerably faster.

The framework

baobabLUNA
contains the implementation of the algorithm developed by Braga et al. (2008). As a simple example of the gain represented by this algorithm with respect to the enumeration of all solutions, the 28 solutions that sort (−3, 2, 1, −4) into I4, can be grouped in only two traces, one is represented by {1}{1, 2, 3}{2}{4} and has 24 solutions, while the other is {1, 2, 4}{3}{1, 3, 4}{2, 3, 4} and has 4 solutions. More details on how the algorithm generates directly the traces and also counts the number of solutions in each trace can be obtained in Braga (2009).

2.2.2 Filtering traces with constraints

Biological constraints can be used to filter the traces of optimal sequences, as described in Braga (2009). Besides the two signed permutations π and πT, this approach requires a list C of compatible constraints for selecting the sequences that sort π into πT and respect the given constraints. Frequently, only a subset of the sorting sequences of a trace is in agreement with the constraints in C, and this subset is called C-induced subtrace. The result of applying this method is the complete set of non-empty C-induced subtraces of sequences sorting π into πT. Generally, we have no guarantee that a sorting sequence that respects all constraints exists, thus this approach can lead to an empty result.

One of the considered constraints is the list of common intervals detected between the two initial permutations, that may correspond to the clusters of co-localized genes between the considered genomes—an optimal sequence of reversals that does not break the common intervals may be more realistic than one that does break. This approach was previously used in several studies [see for instance, Diekmann et al. (2007)]. We used the common intervals initially detected and also a variation of this approach, described in Braga (2009), that is the list of common intervals progressively detected when sorting one permutation into another by reversals.

Another constraint implemented in

baobabLUNA
is called strata and is specific to the evolution of sexual X and Y chromosomes in mammals and some other organisms. Although X and Y are usually very different, they still share an identical region (called ‘pseudo-autosomal’ region) at one of their extremities and are believed to have evolved from an identical pair of chromosomes. This process is at the origin of sexual differentiation: the female XX and the male XY pairs. Current theories suggest that the pseudo-autosomal region, which originally covered the whole chromosomes, was successively pruned by a few big reversals on the Y chromosome (Lahn and Page, 1999). The successive limits of the pseudo-autosomal region on the X chromosome represent the limits of what have been called the ‘evolutionary strata’ of X chromosome and a sequence of reversals that could have created the strata on human X chromosome is given by Ross et al. (2005). The use of the strata as a constraint to filter the space of solutions of the sorting by reversals problem is described in Braga et al. (2008) and is used by Lemaitre et al. (2009) to evaluate the scenario of reversals given by Ross et al. (2005).

2.3 Experiments

In order to evaluate the performance of the algorithm that computes directly the traces, named

traces
, we used the algorithm
enumSol
that enumerates all solutions. We also tested the filters
perfTrcs
, that selects traces whose solutions do not break common intervals initially detected,
prgSubt
, which selects subtraces whose solutions do not break common intervals progressively detected and
strSubt
that selects subtraces whose solutions produce a given strata in the origin permutation. The analysed permutations are πA=(−12,11,−10,6,13,−5,2,7,8,−9,3,4,1) and πB=(−12,11,−10,−1,16,−4,−3,15,−14,9,−8,−7,−2,−13,5,−6) (both fictitious), Rfe=(1,3,−2,−11,5,−9,−10,8,6,−7,−4,12) and R2=I12 [the bacterium Rickettsia felis and its ancestor R2, reconstructed in Blanc et al. (2007)], X=I12 and Y=(−12,11,−2,−1,−10,−9,8,−5,7,6,−4,3), [human X and Y chromosomes, as the scenario proposed in Ross et al. (2005)]. The results are in Table 1 and show that computing traces directly indeed runs much faster than computing solutions. Moreover, the variants that take constraints in consideration usually run faster than computing all traces. Additional analyses and experimental results can be found in Braga (2009).

Table 1.

Computation results for each pair of permutations (the number of elements and reversal distance of each pair is given in the first column).

PERMUT. Algorithm NS NT Execution time 
πA, I12 
enumSol
 
8 278 540 − 
13.5 min
 
n=12, d=10 
traces
 
8 278 540 2151 
27 sec
 
 
perfTrcs
 
1 698 480 12 
4 sec
 
 
prgSubt
 
453 600 
2 sec
 
πB, I16 
enumSol
 
505 634 256 − 
16 h
 
n=16, d=12 
traces
 
505 634 256 21 902 
7.3 min
 
 
perfTrcs
 
122 862 960 171 
27 sec
 
 
prgSubt
 
5 963 760 
14 sec
 
Rfe, R
enumSol
 
546 840 − 
42 sec
 
n=12, d=9 
traces
 
546 840 13 
3 sec
 
 
prgSubt
 
263 088 
2 sec
 
X, Y 
enumSol
 
31 752 
5 sec
 
n=12, d=8 
traces
 
31 752 
1.3 sec
 
 
strSubt
 
420 
0.5 sec
 
PERMUT. Algorithm NS NT Execution time 
πA, I12 
enumSol
 
8 278 540 − 
13.5 min
 
n=12, d=10 
traces
 
8 278 540 2151 
27 sec
 
 
perfTrcs
 
1 698 480 12 
4 sec
 
 
prgSubt
 
453 600 
2 sec
 
πB, I16 
enumSol
 
505 634 256 − 
16 h
 
n=16, d=12 
traces
 
505 634 256 21 902 
7.3 min
 
 
perfTrcs
 
122 862 960 171 
27 sec
 
 
prgSubt
 
5 963 760 
14 sec
 
Rfe, R
enumSol
 
546 840 − 
42 sec
 
n=12, d=9 
traces
 
546 840 13 
3 sec
 
 
prgSubt
 
263 088 
2 sec
 
X, Y 
enumSol
 
31 752 
5 sec
 
n=12, d=8 
traces
 
31 752 
1.3 sec
 
 
strSubt
 
420 
0.5 sec
 

The columns NS and NT give, respectively, the number of sorting sequences and traces computed by each algorithm. Experiments were made on a 64 bit personal computer with two 3 GHz CPUs and 2 GB of RAM.

2.4 Download, setup and tutorial

Download and setup instructions, interface description and tutorial for computing traces (including the versions that take constraints in consideration) are available in http://pbil.univ-lyon1.fr/software/luna.

3 FINAL REMARKS

The framework

baobabLUNA
contains the implementation of a method proposed by Braga et al. (2008), that gives a compact representation of the solution space of the sorting by reversals problem, grouping solutions into traces. This is an interesting alternative to most of the previous methods that give either only one or all solutions, and are provided by tools such as
GRIMM
(Tesler, 2002) and
GRAPPA
(Moret et al., 2001). However, although the number of traces is much smaller than the number of solutions, it may be still too big to be interpreted, and in some cases, too big to be computed. Indeed, currently we are unable to compute traces for permutations with a reversal distance of about 20 or higher.

Different biological constraints can be used to filter the traces and reduce the universe to be handled. Nevertheless, there is no guarantee that a solution that respects the given constraints exists, thus this approach may lead to empty results.

ACKNOWLEDGEMENTS

The author is grateful to Marie-France Sagot and Christian Gautier for their constructive comments and to the Pôle Bioinformatique Lyonnais (PBIL) for hosting

baobabLUNA
web site.

Funding: Programme Alβan (E05D053131BR); French projects ANR (REGLIS NT05-3_45205 and MIRI BLAN08-1_335497); INRIA ArcoIris (associated with the University of São Paulo, Brazil); Rhône-Alpes Bioinformatics Center (PRABI).

Conflict of Interest: none declared.

REFERENCES

Bergeron
A
, et al.  . 
On the properties of sequences of reversals that sort a signed permutation
Journées Ouvertes en Biologie, Informatique et Mathématiques 2002
 , 
2002
France
Saint Malo
(pg. 
99
-
108
)
Blanc
G
, et al.  . 
Reductive genome evolution from the mother of Rickettsia
PLoS Genet.
 , 
2007
, vol. 
3
 (pg. 
103
-
114
)
Braga
M.DV
Exploring the Solution Space of Sorting by Reversals When Analyzing Genome Rearrangements
PhD Thesis
 , 
2009
France
Université Lyon 1
Braga
M.DV
, et al.  . 
Exploring the solution space of sorting by reversals with experiments and an application to evolution
Trans. Comput. Biol. Bioinform.
 , 
2008
, vol. 
5
 (pg. 
348
-
356
)
Diekmann
Y
, et al.  . 
Evolution under reversals: parsimony and conservation of common intervals
Trans. Comput. Biol. Bioinform.
 , 
2007
, vol. 
4
 (pg. 
301
-
309
)
Hannenhalli
S
Pevzner
P
Transforming cabbage into turnip (polyn. algorithm for sorting signed permutations by reversals)
J. ACM
 , 
1999
, vol. 
46
 (pg. 
1
-
27
)
Lahn
BT
Page
DC
Four evolutionary strata on the human X chromosome
Science
 , 
1999
, vol. 
286
 (pg. 
964
-
967
)
Lemaitre
C
, et al.  . 
Footprints of inversions at present and past pseudoautosomal boundaries in human sex chromosomes
Genome Biol. Evol.
 , 
2009
 
[Epub ahead of print, doi:10.1093/gbe/evp006, April 30, 2009]
Moret
B.ME
, et al.  . 
A new implementation and detailed study of breakpoint analysis
Proceedings of the 6th Pacific Symposium on Biocomputing
 , 
2001
Hawaii
Big Island
(pg. 
583
-
594
)
Ross
MT
, et al.  . 
The DNA sequence of the human X chromosome
Nature
 , 
2005
, vol. 
434
 (pg. 
325
-
337
)
Siepel
A
An algorithm to enumerate sorting reversals for signed permutations
J. Comput. Biol.
 , 
2003
, vol. 
10
 (pg. 
575
-
597
)
Tannier
E
, et al.  . 
Advances on sorting by reversals
Discrete Appl. Math.
 , 
2007
, vol. 
155
 (pg. 
881
-
888
)
Tesler
G
GRIMM: genome rearrangements web server
Bioinformatics
 , 
2002
, vol. 
18
 (pg. 
492
-
493
)

Author notes

Present address: Universität Bielefeld Technische Fakultät AG Genominformatik, Postfach 10 01 31, 33501 Bielefeld, Germany
Associate Editor: Alex Bateman
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Comments

0 Comments