Abstract

Summary: Cleaver is an application for identifying restriction endonuclease recognition sites that occur in some taxa but not in others. Differences in DNA fragment restriction patterns among taxa are the basis for many diagnostic assays for taxonomic identification and are used in procedures for removing the DNA of some taxa from pools of DNA from mixed sources. Cleaver analyses restriction digestion of groups of orthologous DNA sequences simultaneously to allow identification of differences in restriction pattern among the fragments derived from different taxa.

Availability: Cleaver is freely available without registration from its website () and can be copied, modified and redistributed under the terms of the GNU General Public Licence version 2 (). The program can be run as a script on computers that have Python 2.3 and the necessary additional modules installed, which allows it to run on Gnu/Linux, Unix, MacOSX and Windows platforms. Stand-alone executable versions are available for the Windows and MacOSX operating systems.

Contact: simon.jarman@aad.gov.au

1 INTRODUCTION

Restriction endonucleases are robust, cheap and widely available tools for analysing and manipulating DNA sequences. As analytical tools, endonucleases can be combined with DNA electrophoresis to provide some information about the sequence of short DNA molecules. A common way to differentiate between taxa is to generate a short (<2000 bp) PCR product from the DNA of an unknown organism and digest it with restriction endonucleases that produce different-sized fragments from different species. The fragments can then be analysed by a number of electrophoretic methods, such as PCR-RFLP to identify single organisms (e.g. Pfeiffer et al., 2004) or TRFLP to study the diversity and identity of multiple species (Avaniss-Aghajani et al., 1994; Clement et al., 1998). These assays have generally been developed empirically. However, a large and rapidly growing amount of sequence information for short DNA fragments from a diverse array of species is now available in public databases. Numerous phylogenetic studies have deposited their sequences in GenBank (). Projects aimed specifically at cataloguing easily amplifiable DNA sequences as unique taxonomic identifiers, such as the Ribosomal Database Project (Cole et al., 2005) and the DNA barcoding project (Hebert et al., 2003), have generated orthologous sequences for large numbers of diverse taxa. Cleaver can assist in designing such assays by identifying taxon-specific restriction endonuclease digestion patterns for a particular DNA fragment.
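
To illustrate how a PCR-RFLP pattern can be predicted from sequence alone, the following is a minimal Python 3 sketch (not Cleaver's own code) that digests a linear PCR product in silico. The function names, the toy sequences and the `cut_offset` convention are hypothetical. The sketch handles only plain recognition sequences without ambiguity codes and searches only the top strand, which is sufficient for palindromic sites such as EcoRI (G^AATTC).

```python
import re
from typing import List


def cut_positions(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return 0-based top-strand cut positions of a recognition site in seq.

    cut_offset is the number of bases from the start of the site to the
    cut, e.g. EcoRI (G^AATTC) has site "GAATTC" and cut_offset 1.
    """
    seq = seq.upper()
    # A zero-width lookahead also reports overlapping occurrences of the site.
    return [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths, in bp, from digesting a linear molecule such as a PCR product."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    # Skip zero-length pieces that a cut at either end would produce.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy 27-bp amplicons from two hypothetical taxa; taxon_b lacks the first EcoRI site.
taxon_a = "ATGCGAATTCGGATCCTTAGAATTCAA"
taxon_b = "ATGCGAATACGGATCCTTAGAATTCAA"

for name, seq in (("taxon_a", taxon_a), ("taxon_b", taxon_b)):
    print(name, fragment_lengths(seq, "GAATTC", 1))
```

Here EcoRI yields fragments of 5, 15 and 7 bp from taxon_a but 20 and 7 bp from taxon_b, so the two toy taxa would give different banding patterns on a gel.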

The sequence specificity of restriction endonucleases also makes them useful for manipulating pools of mixed DNA sequences. In many situations one set of sequences is wanted for downstream analysis and another set is not. One example is DNA-based identification of animal prey, where predator DNA tends to dominate the mixed PCR products obtained when universal primers are applied to DNA purified from animal stomach contents. This often makes amplification of prey DNA difficult and reduces the proportion of prey DNA clones in libraries generated from such PCR products. Predator DNA can be removed by digesting the mixed PCR products with an endonuclease that has recognition sites in predator DNA but not in prey DNA (Blankenship and Yayanos, 2005). Similarly, the efficiency of the suicide polymerase endonuclease restriction procedure (Green and Minz, 2005) for improving amplification of minor templates can be enhanced by identifying and using an endonuclease that has no recognition sites in the minor template but does have sites in the other templates in the pool. Cleaver can help to select appropriate endonucleases in situations such as these.
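
The selection criterion described above can be expressed as a simple set test. The Python 3 sketch below is a hypothetical illustration, not Cleaver's search routine: it returns the enzymes whose recognition site occurs, on either strand, in every sequence of one group and in no sequence of the other. The enzyme table, function names and toy sequences are assumptions made for the example.

```python
from typing import Dict, Iterable, List

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of seq."""
    seq = seq.upper()
    return site in seq or revcomp(site) in seq


def group_specific_enzymes(enzymes: Dict[str, str],
                           cut_group: Iterable[str],
                           keep_group: Iterable[str]) -> List[str]:
    """Enzymes with a site in every cut_group sequence and in no keep_group sequence."""
    # Materialise the groups because each one is scanned once per enzyme.
    cut_group, keep_group = list(cut_group), list(keep_group)
    return sorted(
        name for name, site in enzymes.items()
        if all(has_site(s, site) for s in cut_group)
        and not any(has_site(s, site) for s in keep_group)
    )


enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
predator = ["TTGGATCCAAGAATTC", "CCGGATCCTTAAGCTT"]
prey = ["TTGAATTCAAGCTTAA", "ACGTACGTAAGCTT"]

print(group_specific_enzymes(enzymes, predator, prey))
```

With these toy sequences only BamHI qualifies. For the suicide PCR case, the same function can be called with the other templates as `cut_group` and the minor template as `keep_group`. Requiring a site in every sequence of the cut group is a deliberately strict choice; a practical search might tolerate a few exceptions.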

2 FEATURES

The main interface of Cleaver is a graphical display split into two windows: one lists restriction endonucleases and their features, and the other lists DNA sequences and information about them. Endonucleases and sequences to be used in analyses are selected from these lists. Analyses are accessed through menus, and each generates a text or graphical result that is displayed in a new ‘top level’ window. Results can be copied and pasted from these windows using the system clipboard, or saved in their entirety to a file. Cleaver can list the fragment sizes produced by all user-selected endonucleases for all selected sequences, either as single digests, which give the fragment sizes produced by each endonuclease on each sequence separately, or as multiple digests, which give the fragment sizes for each sequence when it is digested by all selected endonucleases together. Other options analyse how frequently endonucleases cut, produce text-based or graphical restriction maps that compare the patterns of recognition sites among selected sequences, analyse terminal restriction fragment lengths, and search for endonucleases that cut DNA from one group of sequences but not DNA from another group.
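
The difference between single and multiple digests, and the idea of a terminal restriction fragment, can be sketched in a few lines of Python 3. This is a hypothetical illustration rather than Cleaver's implementation: the enzyme table format and function names are assumptions, overhangs are ignored (only top-strand cuts are modelled), and the labelled primer end is assumed to sit at position 0, since in TRFLP only the fragment carrying the labelled end is detected.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical enzyme table: name -> (recognition site, top-strand cut offset).
Enzymes = Dict[str, Tuple[str, int]]


def cut_positions(seq: str, enzymes: Enzymes) -> List[int]:
    """Sorted, de-duplicated top-strand cut positions for all enzymes combined."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    # Cuts at the very ends of the molecule do not create a new fragment.
    return sorted(c for c in cuts if 0 < c < len(seq))


def digest(seq: str, enzymes: Enzymes) -> List[int]:
    """Fragment lengths when seq is cut by every enzyme in the table together."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def terminal_fragment(seq: str, enzymes: Enzymes) -> int:
    """Length of the 5'-terminal fragment, assuming the label sits at position 0."""
    cuts = cut_positions(seq, enzymes)
    return cuts[0] if cuts else len(seq)


enzymes = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1)}
seq = "ATGCGAATTCGGATCCTTAGAATTCAA"

single = {name: digest(seq, {name: e}) for name, e in enzymes.items()}
multiple = digest(seq, enzymes)
print(single)    # each enzyme on its own
print(multiple)  # all selected enzymes together
print(terminal_fragment(seq, enzymes))
```

For this toy sequence the single digests give 5, 15 and 7 bp (EcoRI) and 11 and 16 bp (BamHI), the multiple digest gives 5, 6, 9 and 7 bp, and the labelled terminal fragment is 5 bp long.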

Cleaver allows the user to assign DNA sequences to taxonomic groups for use in searches for group-specific endonucleases. If the sequences are in a FASTA (or similar format) file, groups can be assigned manually by the user. Another option is to download sequences directly from GenBank, the DNA Data Bank of Japan or the EMBL database in the INSDSeq XML file format. This format includes taxonomic information on the organism from which each sequence was derived, along with several other categories of metadata. Cleaver can then assign sequences to groups at a taxonomic level chosen by the user, based on the taxonomy supplied in the INSDSeq file, so that sequences can be divided at the species, genus, family or higher level as required.
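
In INSDSeq XML the lineage is stored in the `INSDSeq_taxonomy` element as a semicolon-separated list, and the species name in `INSDSeq_organism`. The Python 3 sketch below, using the standard-library ElementTree parser, shows one possible way to group records at a chosen level. Because the lineage string does not label ranks, this sketch picks the level by position in the lineage; that choice, the function name and the toy records (hypothetical organisms with shortened lineages) are assumptions, and Cleaver may work differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Toy INSDSeq records for hypothetical organisms; lineages are shortened.
INSDSEQ_XML = """<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>SEQ1</INSDSeq_locus>
    <INSDSeq_organism>Prey species 1</INSDSeq_organism>
    <INSDSeq_taxonomy>Eukaryota; Metazoa; Arthropoda; Crustacea</INSDSeq_taxonomy>
    <INSDSeq_sequence>acgtgaattcacgt</INSDSeq_sequence>
  </INSDSeq>
  <INSDSeq>
    <INSDSeq_locus>SEQ2</INSDSeq_locus>
    <INSDSeq_organism>Prey species 2</INSDSeq_organism>
    <INSDSeq_taxonomy>Eukaryota; Metazoa; Arthropoda; Crustacea</INSDSeq_taxonomy>
    <INSDSeq_sequence>acgtggatccacgt</INSDSeq_sequence>
  </INSDSeq>
  <INSDSeq>
    <INSDSeq_locus>SEQ3</INSDSeq_locus>
    <INSDSeq_organism>Predator species 1</INSDSeq_organism>
    <INSDSeq_taxonomy>Eukaryota; Metazoa; Chordata; Aves</INSDSeq_taxonomy>
    <INSDSeq_sequence>acgtaagcttacgt</INSDSeq_sequence>
  </INSDSeq>
</INSDSet>"""


def group_by_lineage(xml_text: str, depth: int) -> Dict[str, List[str]]:
    """Group record loci by the lineage name found `depth` levels below the root."""
    root = ET.fromstring(xml_text)
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in root.iter("INSDSeq"):
        locus = record.findtext("INSDSeq_locus", "")
        organism = record.findtext("INSDSeq_organism", "")
        taxonomy = record.findtext("INSDSeq_taxonomy", "")
        lineage = [name.strip() for name in taxonomy.split(";")]
        # Beyond the end of the lineage, fall back to the organism (species) name.
        key = lineage[depth] if depth < len(lineage) else organism
        groups[key].append(locus)
    return dict(groups)


print(group_by_lineage(INSDSEQ_XML, 2))   # phylum-level groups in this toy data
print(group_by_lineage(INSDSEQ_XML, 10))  # species-level groups
```

At depth 2 the toy records split into an Arthropoda group (SEQ1, SEQ2) and a Chordata group (SEQ3); a depth past the end of the lineage groups them by organism name instead.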

Cleaver has a sequence editor that allows the user to view and manipulate sequences imported from files. Cleaver can read sequences in FASTA, CLUSTAL, MEGA and INSDSeq XML formats. Once loaded, sequences can be individually reversed, complemented, cut or edited manually. All sequences in memory can be aligned by Cleaver through CLUSTALW (Thompson et al., 1994). Oligonucleotide binding sites can be located with a search option, and columns of residues can be removed. Combining multiple sequence alignment with column removal allows the user to reduce the sequences in an alignment to only the regions that are truly orthologous, such as the portion of a DNA fragment that lies between two universal PCR primer binding sites. This is especially useful when sequences have been downloaded directly from databases as an unedited INSDSeq XML file. Edited sequences can be saved in a variety of file formats, including FASTA, CLUSTAL, MEGA, PAUP, PHYLIP and INSDSeq XML.
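
Several of these editing operations are simple string manipulations. The following Python 3 sketch, which is hypothetical and not taken from Cleaver, shows three of them: reverse-complementing a sequence, finding an oligonucleotide such as a primer on either strand, and trimming an alignment to a column range while dropping columns that are gaps in every row. It assumes alignments are equal-length strings that use '-' for gaps; mapping a primer hit in an ungapped sequence to alignment columns would need an extra step that is not shown.

```python
from typing import List, Tuple

# Complement table; gap characters are left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt-", "TGCAtgca-")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_oligo(seq: str, oligo: str) -> List[Tuple[str, int]]:
    """0-based start positions of oligo on the + and - strands of an ungapped seq.

    Positions are given on the top strand. A palindromic oligo is
    reported once for each strand.
    """
    seq, oligo = seq.upper(), oligo.upper()
    hits = []
    for strand, probe in (("+", oligo), ("-", reverse_complement(oligo))):
        start = seq.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(probe, start + 1)
    return hits


def keep_columns(alignment: List[str], start: int, end: int) -> List[str]:
    """Keep alignment columns start..end-1, then drop columns that are gaps in every row."""
    trimmed = [row[start:end] for row in alignment]
    width = len(trimmed[0]) if trimmed else 0
    keep = [i for i in range(width) if any(row[i] != "-" for row in trimmed)]
    return ["".join(row[i] for i in keep) for row in trimmed]


print(reverse_complement("ATGC"))
print(find_oligo("ATGCCGTTAGC", "AACGG"))
print(keep_columns(["ACGT-ACGTA", "ACGA-ACGTT", "ACGT-ACCTA"], 2, 8))
```

In this toy case the oligonucleotide AACGG binds only as its reverse complement, starting at position 3, and trimming the alignment to columns 2 to 7 also removes the column that is a gap in every sequence.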

An online manual describes the operation of Cleaver in detail. It can be opened in a web browser through a menu option in Cleaver, and it includes examples of Cleaver's operation based on files included with each distribution of Cleaver.

3 COMPARISONS WITH SIMILAR TOOLS

There are many software packages and websites available for displaying restriction digestion patterns in single DNA sequences. Free software examples include ‘BioEdit’ () and ‘CloneIt’ () (Lindenbaum, 1998), and many commercial products are also available. Web-based examples include ‘restrictionmapper’ () and ‘NEB cutter’ (). However, identifying restriction sites that can differentiate alleles or taxa requires software that compares restriction patterns among a group of sequences. The program ‘restrifind’ () simulates an image of the end result of agarose gel electrophoresis of different DNA sequences after restriction digestion, and a web-based tool to assist with TRFLP analysis () has also been developed (Marsh et al., 2000).

4 IMPLEMENTATION

Cleaver is written in Python 2.3 (). For the Gnu/Linux and MacOSX versions, its graphics are produced with the Qt 3.3 graphical user interface library (), as adapted for Python in the PyQt 3.13 package (). The Windows version uses a version of PyQt derived from the KDE-cygwin project. Cleaver uses CLUSTALW (Thompson et al., 1994) to align DNA sequences, and CLUSTALW is redistributed with Cleaver with permission.

The Windows version is distributed as a stand-alone executable that does not require the user to have Python installed on their system. This was produced using py2exe () and it has a graphical installer produced by InnoSetup (). The MacOSX version is also produced as an executable file, which was generated with py2app ().

This work was funded by the Australian Commonwealth Government.

Conflict of Interest: none declared.

REFERENCES

Avaniss-Aghajani,E.K. et al. (1994) A molecular technique for identification of bacteria using small subunit ribosomal RNA sequences. BioTechniques, 17, 144–149.
Blankenship,L.H. and Yayanos,A.A. (2005) Universal primers and PCR of gut contents to study marine invertebrate diets. Mol. Ecol., 14, 891–899.
Clement,B.G. et al. (1998) Terminal restriction fragment patterns (TRFPs), a rapid, PCR-based method for the comparison of complex bacterial communities. J. Microbiol. Methods, 31, 135–142.
Cole,J.R. et al. (2005) The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res., 33, D294–D296.
Green,S.J. and Minz,D. (2005) Suicide polymerase endonuclease restriction, a novel technique for enhancing PCR amplification of minor DNA templates. Appl. Environ. Microbiol., 71, 4721–4727.
Hebert,P.D.N. et al. (2003) Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B Biol. Sci., 270, 313–321.
Lindenbaum,P. (1998) CloneIt: finding cloning strategies, in-frame deletions and frameshifts. Bioinformatics, 14, 465–466.
Marsh,T.L. et al. (2000) Terminal restriction fragment length polymorphism analysis program, a web-based research tool for microbial community analysis. Appl. Environ. Microbiol., 66, 3616–3620.
Pfeiffer,I. et al. (2004) Diagnostic polymorphisms in the mitochondrial cytochrome b gene allow discrimination between cattle, sheep, goat, roe buck and deer by PCR-RFLP. BMC Genet., 5, 30.
Thompson,J.D. et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

Author notes

Associate Editor: Golan Yona