Abstract
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.
Introduction
Multiple sequence alignment (MSA) plays an important role in evolutionary analyses of biological sequences. MAFFT is an MSA program, first released in 2002 (Katoh et al. 2002). Because of its high performance (Nuin et al. 2006; Golubchik et al. 2007; Dessimoz and Gil 2010; Letsch et al. 2010; Sahraeian and Yoon 2011; Sievers et al. 2011), MAFFT has become popular in recent years. Since reviewing the previous version (version 6) in Katoh and Toh (2008b), we have been continuously improving its accuracy, speed, and utility in practical situations. These improvements and techniques were mostly reported in individual papers (Katoh et al. 2009; Katoh and Toh 2010; Katoh and Frith 2012; Katoh and Standley 2013). In this report, we demonstrate the different kinds of analyses that can be achieved with the new features, alone and in combination, using realistic examples. We also discuss limitations of the current version by giving examples of sequences incorrectly aligned by MAFFT, and describe our ongoing efforts to overcome these limitations.
Basic Concepts and Usage
As listed in table 1, MAFFT version 7 has options for various alignment strategies, including progressive methods (PartTree, FFT-NS-1, and L-INS-1) (Feng and Doolittle 1987; Higgins and Sharp 1988; Katoh and Toh 2007), iterative refinement methods (FFT-NS-i, L-INS-i, E-INS-i, and G-INS-i) (Barton and Sternberg 1987; Berger and Munson 1991; Gotoh 1993; Katoh et al. 2005), and structural alignment methods for RNAs (Q-INS-i and X-INS-i; Katoh and Toh 2008a). See Katoh and Toh (2008b) for details of these strategies. According to a recent comparative study based on the MetAl metric (Blackburne and Whelan 2012a, 2012b), there are two significantly different classes of MSA methods, similarity-based methods and evolution-based methods. MAFFT is classified as a similarity-based method. However, evolutionary information is useful even for similarity-based methods, because the sequences to be aligned are generated from a common ancestor in the course of evolution. In this respect, MAFFT takes evolutionary information into account.
Table 1. Options of MAFFT Version 7.
| Option name | Command | Description |
|---|---|---|
| For a large-scale alignment: progressive methods with the PartTree algorithm | | |
| NW-NS-PartTree1 | mafft --parttree --retree 1 input | Distance is by the 6mer method. |
| NW-NS-PartTree2 | mafft --parttree --retree 2 input | Distance is by the 6mer method. Guide tree is re-built. |
| NW-NS-DPPartTree1 | mafft --dpparttree --retree 1 input | Distance is estimated based on DP. |
| NW-NS-DPPartTree2 | mafft --dpparttree --retree 2 input | Distance is estimated based on DP. Guide tree is re-built. |
| For a medium-scale alignment: progressive methods | | |
| FFT-NS-1 | mafft --retree 1 input | Approximately two times faster than the default. |
| FFT-NS-2 | mafft input | Default. |
| For a small-scale alignment: iterative refinement methods | | |
| FFT-NS-i | mafft --maxiterate 16 input | Fastest of the four in this category. Uses WSP score (Gotoh 1995) only. |
| G-INS-i | mafft --maxiterate 16 --globalpair input | Uses WSP score and consistency (Notredame et al. 1998) score from global alignments. |
| L-INS-i | mafft --maxiterate 16 --localpair input | Uses WSP score and consistency score from local alignments. |
| E-INS-i | mafft --maxiterate 16 --genafpair input | Uses WSP score and consistency score from local alignments with a generalized affine gap cost (Altschul 1998). |
| If not sure which option to use | | |
| Automatic | mafft --auto | Selects an appropriate option from FFT-NS-2, FFT-NS-i and L-INS-i, according to the size of input data. |
| For a small-scale RNA alignment: structural alignment methods | | |
| Q-INS-i | mafft-qinsi input | Structure information is included in iterative refinement step. |
| X-INS-i-scarnapair | mafft-xinsi --scarnapair input | Uses pairwise structural alignment by MXSCARNA (Tabei et al. 2008). |
| To add new sequences into an existing MSA | | |
| Add | mafft --add new msa | The simplest option for alignment extension. |
| Addprofile | mafft --addprofile msa1 msa2 | msa1 must form a monophyletic cluster. |
| Addfragments | mafft --addfragments new msa | Suitable for short new sequences. |
| Addfragments, LAST | mafft --addfragments new --lastmultipair msa | Faster option, LAST (Kiełbasa et al. 2011) is required. |
| Addfragments, 6mer | mafft --addfragments new --6merpair msa | Faster option for conserved data. |
| Parameters | | |
| --bl #, --jtt #, --tm # | | Score matrices for protein alignment. |
| --kimura # | | Score matrix for nucleotide alignment. |
| Utility options | | |
| --anysymbol | | See main text. |
| --reorder | | |
| --clustalout | | |
| --phylipout | | |
| --namelength # | | |
| --adjustdirection | | |
| --adjustdirectionaccurately | | |
| --seed msa1 --seed msa2 … | | |
| --treein treefile | | |
| --treeout | | |
| --thread # | | |
Note.—N, the number of sequences; L, the sequence length;
All the options of MAFFT assume that the input sequences are all homologous, that is, descended from a common ancestor. Thus, all the letters in the input data are aligned. Genomic rearrangement or domain shuffling is not assumed, and thus the order of the letters in each sequence is always preserved, although the sequences can be reordered according to similarity. Most options in MAFFT assume that almost all the pairs in the input sequences can be aligned, locally or globally. In such a situation, there is a tradeoff between accuracy and speed. For example, the PartTree option (Katoh and Toh 2007) is a fast and rough method, whereas L-INS-i and G-INS-i are slower and more accurate. RNA structural alignment methods are generally more accurate and computationally more expensive because they need additional calculations (Katoh and Toh 2008a). However, this tradeoff does not always hold. In particular, the new options to add sequences into an existing alignment (Katoh and Frith 2012) require careful consideration of this tradeoff, as discussed later.
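As a minimal sketch of how the strategies in table 1 map to command lines, the following Python helper builds the argument list for a chosen strategy. The flags are copied from table 1; the dictionary keys, the function name `mafft_argv`, and the file name `input.fasta` are hypothetical labels introduced only for this illustration.

```python
import shlex

# Flags copied from table 1. The keys are labels chosen for this sketch,
# not names that MAFFT itself recognizes.
STRATEGY_FLAGS = {
    "parttree": ["--parttree", "--retree", "1"],
    "fft-ns-1": ["--retree", "1"],
    "fft-ns-2": [],  # default progressive method
    "fft-ns-i": ["--maxiterate", "16"],
    "g-ins-i": ["--maxiterate", "16", "--globalpair"],
    "l-ins-i": ["--maxiterate", "16", "--localpair"],
    "e-ins-i": ["--maxiterate", "16", "--genafpair"],
    "auto": ["--auto"],
}


def mafft_argv(strategy, input_path):
    """Return the argument list for one of the table 1 strategies."""
    try:
        flags = STRATEGY_FLAGS[strategy.lower()]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return ["mafft", *flags, input_path]


if __name__ == "__main__":
    # Print the commands, from fast and rough to slow and accurate.
    for name in ("parttree", "fft-ns-2", "l-ins-i"):
        print(shlex.join(mafft_argv(name, "input.fasta")))
```

The list returned by `mafft_argv` can be handed to `subprocess.run` with standard output redirected to a file, because the commands in this report write the alignment to standard output.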
Profile Alignments
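MAFFT has a subprogram, mafft-profile, that aligns two existing alignments:
mafft-profile alignment1 alignment2 > output
This method separately converts alignment1 and alignment2 to profiles and then aligns the two profiles.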
Assumptions on the phylogenetic relationship in different options of MAFFT. (A)
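MAFFT version 7 has an alternative option, --addprofile:
mafft --addprofile alignment1 alignment2 > output
This option accepts two existing alignments, alignment1 and alignment2. As noted in table 1, the first alignment must form a monophyletic cluster.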
Adding Unaligned Sequences into an MSA
As a result of advances in sequencing technologies, we increasingly need MSAs consisting of a larger number of sequences. There are several different approaches to enable construction of large MSAs, such as rapid algorithms and parallelization. Here, we describe an alternate approach: use of an existing alignment. There already exist databases of carefully aligned and annotated sequences (Cole et al. 2009; Sigrist et al. 2010; Punta et al. 2012), in which each MSA consists of a small number (typically up to ∼1,000) of sequences. We can use such MSAs as a backbone to build a larger MSA containing newly sequenced data. This is more efficient than rebuilding the entire MSA from a set of ungapped sequences. Moreover, this approach is relatively robust to low-quality sequences resulting from sequencing errors, misassemblies, and other factors. Such noise usually has a negative effect on the quality of an MSA, but there are situations where biologically important information is contained in low-quality sequences. In such a case, we first select highly reliable sequences to build a backbone MSA, and then add the other sequences, including low-quality ones, into the MSA. As a result, the quality of the final MSA is less affected by the low-quality sequences.
Inappropriate Applications of Profile Alignment
The
Another misapplication is as follows: 1) convert the existing alignment to a profile, 2) separately align each new sequence to the profile of the existing alignment, and 3) construct a full alignment from the individual alignments computed in the previous step. This approach is more reasonable than the first one but still problematic, because the phylogenetic positions of new sequences are assumed at the root of the tree, as illustrated in figure 1C. Results of this procedure for two cases are shown in table 2 and figure 2.
ITS alignments by different options of MAFFT, displayed on Jalview (Waterhouse et al. 2009). (A, B) Incorrect alignments by the FFT-NS-2 and L-INS-i algorithms, respectively. (C) An incorrect alignment by
Table 2. Comparison of Different Options Using the 16S.B.ALL Data Set (Mirarab et al. 2012).
| Data | Method | Accuracy | CPU Time | Actual Time^a |
|---|---|---|---|---|
| Case 1 | mafft --multipair --addfragments frags existingmsa | 0.9969 | 6.67 days | 18.3 h |
| | mafft --6merpair --addfragments frags existingmsa | 0.9949 | 3.76 h | 36.2 min |
| | mafft --localpair --add frags existingmsa | 0.9707 | 23.4 days^b | 2.43 days^b |
| | mafft --6merpair --add frags existingmsa | 0.9604 | 1.32 h | 1.44 h |
| | profile alignment | 0.2779 | 15.5 h | 1.60 h |
| Case 2 | mafft --6merpair --addfragments frags existingmsa | 0.9969 | 4.54 h | 33.8 min |
| Case 3 | mafft --6merpair --addfragments frags existingmsa | 0.9949 | 1.79 days | 5.91 h |
Note.—The estimated alignments were compared with the CRW alignment to measure the accuracy (the number of correctly aligned letters/the number of aligned letters in the CRW alignment). Calculations were performed on a Linux PC with 2.67 GHz Intel Xeon E7-8837/256 GB RAM (for the case marked with superscript alphabet “b”), or on a Linux PC with 3.47 GHz Intel Xeon X5690/48 GB RAM (for the other cases). Case 1: 13,822 sequences in the existing alignment × 13,821 fragments; Case 2: 1,000 sequences in the existing alignment × 138,210 fragments; Case 3: 13,822 sequences in the existing alignment × 138,210 fragments.
^a Wall-clock time with 10 cores. Command-line argument for parallel processing is
^b Full command-line options are as follows:
The --add and --addfragments Options
To overcome this limitation of profile alignment, in 2010, we implemented an option,
Along with popularization of second-generation sequencers, we sometimes need to align short reads to an existing alignment. Several tools (Berger and Stamatakis 2011; Löytynoja et al. 2012; Sun and Buhler 2012) for this purpose were developed between 2011 and 2012. A limitation of the
Test Case 1: Fungal Internal Transcribed Spacers Sequences
Here, we discuss how the
Suppose a situation where we need an MSA of approximately 300 full-length sequences and approximately 5,000 ITS1 or ITS2 sequences. One possible solution is to build an entire MSA at once. The result of the default option (FFT-NS-2) of MAFFT is obviously incorrect, as shown in figure 2A. ITS1 and ITS2 regions are forced to be aligned to each other. Even if a more computationally expensive (and usually more accurate) method, L-INS-i, is applied (CPU time = 98 h), the alignment is still obviously incorrect (fig. 2B).
Two-step strategies can solve this type of problem. That is, a set of full-length sequences taken from databases are first aligned to build a backbone MSA, and then the new ITS1 and ITS2 sequences are added into this backbone MSA, using the
Step 1:
mafft --auto full_length_sequences > backbone_msa
Step 2:
mafft --addfragments new_sequences backbone_msa > output
The second command is equivalent to
mafft --multipair --addfragments new_sequences backbone_msa > output
in which Dynamic Programming (DP) is used to compare the distances between every new sequence and every sequence in the backbone MSA. A faster alternative is
mafft --6merpair --addfragments new_sequences backbone_msa > output
where distances are rapidly estimated using the number of shared 6mers, instead of DP.
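To illustrate the idea of estimating distances from shared 6mers rather than from DP, the following Python sketch counts 6mers common to two sequences and converts the count into a rough distance. The normalization by the shorter sequence and the function names are assumptions made for this example; this is not the formula implemented in MAFFT.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b, k=6):
    """Number of k-mers common to both sequences, counted with multiplicity."""
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


def kmer_distance(a, b, k=6):
    """Rough distance: 1 - shared k-mers / k-mers in the shorter sequence.

    Assumed normalization for illustration only; it makes a fragment that is
    fully contained in a longer sequence have distance 0.
    """
    n_short = min(len(a), len(b)) - k + 1
    if n_short <= 0:
        return 1.0  # too short to contain any k-mer
    return 1.0 - shared_kmers(a, b, k) / n_short


if __name__ == "__main__":
    full_length = "ACGTACGTAC"
    fragment = full_length[2:9]
    print(kmer_distance(full_length, fragment))
    print(kmer_distance(full_length, "TTTTTTTTTT"))
```

Because only k-mer counting is involved, the cost grows with sequence length rather than with the product of the two lengths, which is the reason such estimates are much faster than DP.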
The result of the latter option (
This case suggests that it is crucial to select a strategy appropriate to the problem of interest. The most time-consuming method, L-INS-i, is not always the most accurate one. The difficulty of this problem for standard approaches comes from the fact that ITS1 sequences and ITS2 sequences are not homologous to each other and most pairwise alignments are impossible. Because of these nonhomologous pairs, the distance matrix used for the guide tree calculation is not additive; the distances between ITS1 and full-length sequences and those between ITS2 and full-length sequences are close to zero, whereas the distances between ITS1 and ITS2 are quite large. In this situation, it is difficult for normal distance-based tree-building methods to give a reasonable tree. Moreover, in the alignment step, the objective function of the L-INS-i is affected by inappropriate pairwise alignment scores between ITS1 and ITS2. Such problems can be avoided by just ignoring the relationship between ITS1 and ITS2, as done in the
In addition, a result of the second type of misuse of
Test Case 2: Bacterial SSU rRNA
Another case is the 16S.B.ALL data set by Mirarab et al. (2012). It consists of an MSA of 13,822 bacterial SSU rRNA sequences, taken from the Gutell Comparative RNA Website (CRW) (Cannone et al. 2002), and 138,210 fragmentary sequences, which were originally included in the CRW alignment but were ungapped and artificially truncated. In Katoh and Standley (2013), we used a subset (13,821 fragmentary sequences) prepared by Mirarab et al. (2012). In addition to this subset, here we use the full data set (138,210 fragmentary sequences) to examine the scalability. Suppose a situation where we already have a manually curated (or backbone) MSA and a newly determined set of many fragmentary sequences in a metagenomics project, and we need an entire MSA of them.
The first four lines in table 2 (case 1) show the performances of various options for such an analysis, with a relatively small data set (13,822 sequences in the existing alignment × 13,821 fragments). The accuracy of each resulting MSA was evaluated by comparing the MSA with the original CRW alignment. CPU time and wall-clock time for each method are also listed. As the sequences in this data set are highly conserved, the difference in accuracy between the default (
Again, the tradeoff between accuracy and speed does not hold. The application of a computationally expensive method based on L-INS-1 (
The “profile alignment” line in table 2 shows results of the second type of misuse of profile alignment (discussed earlier), in which the given alignment is converted to a profile and each new sequence is separately aligned to the profile. This result clearly indicates that the application of profile alignment must be avoided in this case, too. Users do not need to be too worried about this misuse, because this calculation is disabled in MAFFT unless the user modifies the code or writes a wrapper script.
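The accuracy measure used in table 2 (the number of correctly aligned letters divided by the number of aligned letters in the reference alignment) can be sketched in Python as follows. This is a simplified illustration that scores one fragment against a single anchor sequence from the existing alignment; the use of a single anchor, the function names, and the toy sequences are assumptions of this example, not the evaluation procedure used for table 2.

```python
def aligned_partners(fragment_row, anchor_row, gap="-"):
    """For each letter of the fragment, return the index of the anchor letter
    that shares its column, or None if the anchor has a gap there."""
    partners = []
    anchor_pos = 0
    for f, a in zip(fragment_row, anchor_row):
        anchor_has_letter = a != gap
        if f != gap:
            partners.append(anchor_pos if anchor_has_letter else None)
        if anchor_has_letter:
            anchor_pos += 1
    return partners


def fragment_accuracy(test, reference, fragment, anchor):
    """Correctly aligned letters / aligned letters in the reference.

    `test` and `reference` map sequence names to gapped rows.
    """
    test_p = aligned_partners(test[fragment], test[anchor])
    ref_p = aligned_partners(reference[fragment], reference[anchor])
    if len(test_p) != len(ref_p):
        raise ValueError("fragment differs between the two alignments")
    scored = [(t, r) for t, r in zip(test_p, ref_p) if r is not None]
    if not scored:
        return 0.0
    return sum(t == r for t, r in scored) / len(scored)


if __name__ == "__main__":
    reference = {"anchor": "ACGTACGT", "frag": "--GTAC--"}
    estimated = {"anchor": "ACGTACGT", "frag": "--GTA-C-"}
    print(fragment_accuracy(estimated, reference, "frag", "anchor"))
```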
The last two lines in table 2 (Cases 2 and 3) show the performance of the fast option (
Parallelization
MAFFT version 7 has an option for parallel processing, --thread.
For progressive methods, the result with the multithread version is identical to that of the serial processing version. However, for iterative refinement methods, the results are not always identical. We confirmed that the accuracy of the parallel version in this case is comparable with that of the serial version (Katoh and Toh 2010). The efficiency of parallelization depends on the alignment strategy. In the case of the
Utility Options
MAFFT version 7 also has several enhanced options for peripheral functions.
Estimating the Direction of DNA Sequences
In the case of nucleotide alignments, if some of the input sequences have an incorrect direction relative to the other sequences, the directions can be automatically adjusted by the --adjustdirection or --adjustdirectionaccurately option.
MAFFT cannot handle more complicated sequences with genomic rearrangements (translocations, duplications, or inversions). The web version of MAFFT displays dot plots between the first sequence and the remaining sequences, using the LAST local alignment program (Kiełbasa et al. 2011), for every nucleotide alignment run. By viewing the dot plots, a user can easily check for genomic rearrangements and the directions of input sequences. See Katoh and Standley (2013) for details and an example.
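The idea of adjusting the direction of a nucleotide sequence can be illustrated with a small Python sketch that compares a sequence and its reverse complement against a reference and keeps whichever shares more 6mers. This is only an illustration of the concept under assumed names (`orient_like`, `reverse_complement`); it is not the procedure MAFFT uses internally.

```python
# Complement table for DNA/RNA letters; U is treated like T.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=6):
    """Set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, seq, k=6):
    """Return (sequence, flipped), choosing the strand that shares more
    k-mers with the reference. Ties keep the original direction."""
    ref = kmers(reference, k)
    rc = reverse_complement(seq)
    forward = len(ref & kmers(seq, k))
    reverse = len(ref & kmers(rc, k))
    if reverse > forward:
        return rc, True
    return seq, False


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAGCCTAGGA"
    wrong_way = reverse_complement(reference[3:18])
    print(orient_like(reference, wrong_way))
```

Such a check only detects a sequence that is reversed as a whole; it does not address the rearrangements described above, for which dot plots remain the appropriate diagnostic.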
Input/Output
MAFFT version 7 has several enhancements in the flexibility of input/output. The following options related to input/output are available and can be combined with other options.
- --anysymbol: If the input data include unusual letters, such as U or J (in the case of protein data), MAFFT stops by default. The --anysymbol option allows these letters and nonalphabetical letters.
- --preservecase: By default, amino acid sequences are converted to upper case and nucleotide sequences are converted to lower case. This behavior can be changed by using the --preservecase option.
- --reorder: The order of sequences is the same as the input sequences by default, but the sequences can be sorted according to similarity to each other by the --reorder option.
- --phylipout and --clustalout: The output format is multi-fasta by default, but the phylip (interleaved) format and the clustal format can be selected.
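Because the default output is multi-fasta, a downstream script often needs to read the alignment back and confirm that all rows have the same length. The following Python sketch shows one way to do this; the function names and the toy alignment are hypothetical and are not part of MAFFT.

```python
def parse_fasta(text):
    """Parse multi-fasta text into a list of (name, sequence) pairs,
    joining sequences that are wrapped over several lines."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_length(records):
    """Return the common row length, or raise if the rows differ."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


if __name__ == "__main__":
    example = ">seq1\nacgt--ac\ngt\n>seq2\nac-tttac\ngt\n"
    rows = parse_fasta(example)
    print(rows, alignment_length(rows))
```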
Guide Tree and Phylogenetic Positions of New Sequences
Users can check the guide tree by using the
(A) A part of output of the
Note that this phylogenetic information is roughly estimated before the MSA calculation, not based on the MSA. Especially, with the fast option,
Parameters
For amino acid alignment, MAFFT uses the BLOSUM62 matrix by default. For nucleotide alignment, a 200PAM log-odds scoring matrix is generated assuming that the transition rate is twice the transversion rate. These matrices are suitable for aligning distantly related sequences. We selected these default parameters based on an expectation that, if the program works well for difficult (distantly related) cases, it should also work well for easy cases.
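To make the idea of a nucleotide log-odds matrix concrete, the following Python sketch derives one from the Kimura two-parameter model. Several choices here are assumptions made for illustration: that 200 PAM corresponds to 2.0 expected substitutions per site, that "twice" means the transition rate is twice the rate of each individual transversion, that base frequencies are uniform, and that scores are expressed in bits. The scaling and rounding actually used in MAFFT are not reproduced.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def k2p_probabilities(pam=200.0, rate_ratio=2.0):
    """Return (p_same, p_transition, p_each_transversion) under the Kimura
    two-parameter model. rate_ratio = alpha / beta (assumed interpretation)."""
    d = pam / 100.0  # expected substitutions per site (assumption)
    # Each base has one transition (rate alpha) and two transversions
    # (rate beta each), so d = (alpha + 2 * beta) * t.
    beta_t = d / (rate_ratio + 2.0)
    alpha_t = rate_ratio * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv


def log_odds_matrix(pam=200.0, rate_ratio=2.0):
    """Log-odds scores (in bits) against uniform base frequencies."""
    p_same, p_ts, p_tv = k2p_probabilities(pam, rate_ratio)
    matrix = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                p = p_same
            elif (a in PURINES) == (b in PURINES):
                p = p_ts  # purine<->purine or pyrimidine<->pyrimidine
            else:
                p = p_tv
            matrix[a, b] = math.log2(p / 0.25)
    return matrix


if __name__ == "__main__":
    m = log_odds_matrix()
    for a in BASES:
        print(a, " ".join(f"{m[a, b]:6.3f}" for b in BASES))
```

At this distance the match score is only modestly positive and the transversion score only modestly negative, which is consistent with the statement that such a matrix is intended for distantly related sequences.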
It is unclear whether this expectation is always correct. For example, in a benchmark using simulated protein sequences (Löytynoja et al. 2012) generated by INDELiBLE (Fletcher and Yang 2009), when we tested a more stringent scoring matrix, JTT 1PAM (Jones et al. 1992) with weaker gap penalties than the default, the benchmark scores were considerably improved. Despite this observation, we consistently used the default parameters in the benchmark in Katoh and Frith (2012), because it does not make sense to arbitrarily adjust parameters to a simulation setting. This observation suggests that the current default parameters of MAFFT might not be very suitable for aligning closely related sequences. However, this idea must be checked using actual biological sequences.
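Users can select scoring matrices other than the default. For amino acid alignment, the --bl #, --jtt #, and --tm # options select alternative score matrices, and for nucleotide alignment, the --kimura # option is available (table 1).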
One possible extension is to use different scoring matrices and gap penalties for different sequence pairs according to the divergence level, like ClustalW (Thompson et al. 1994). More studies using actual sequence data will be necessary before implementing this extension. It will also be necessary to adjust gap penalties, preferably based on a realistic evolutionary model of insertions and deletions.
Use of Structural Information
We have discussed possible improvements in MSAs of closely related sequences in the previous section. MSA of distantly related sequences is still a challenging problem.
Test Case 3: PIN Domain
Figure 4 shows a typical limitation of sequence-level alignment for a highly divergent set of three PIN domain-containing proteins: human regnase-1, VPA0982 from Vibrio parahaemolyticus, and the nuclease domain of Taq polymerase from Thermus aquaticus. These three proteins share a magnesium-binding site composed of three conserved aspartic acids. Figure 4A shows a superposition of the three structures (Protein Data Bank identifiers 3v33, 2qip, and 1taq, respectively). The middle aspartic acid is indicated by sphere representation, colored red. In Figure 4B, a typical MSA (by MAFFT-L-INS-i) is shown wherein the middle aspartic acid position is misaligned. In Figure 4C, a structure-informed MSA (described below), with the middle aspartic acid correctly aligned, is shown.
(A) Superposition of 3v33, 2qip, and 1taq structures visualized by PyMOL (Schrödinger LLC 2010). (B) MAFFT-L-INS-i sequence alignment displayed on Jalview (Waterhouse et al. 2009). Misaligned Ds are highlighted in red. (C) Structure-informed MSA with correctly aligned Ds. Alpha helices and beta sheets are shown in blue and yellow, respectively, in (A–C).
Strategy for Integrating Structural Alignments and MAFFT
It has long been known that structural information can be used to improve MSA calculations. This was the basis of the 3D Coffee program (O’Sullivan et al. 2004), and later the PROMALS3D package (Pei et al. 2008). Here, we address incorporation of protein structural information in MAFFT-based MSA construction. There are both conceptual issues and technical issues that complicate the process. Conceptually, we have to define structural similarity in such a way that it can easily be used in sequence alignments. We discuss our approach to this problem below in the context of integrating MAFFT with the structural alignment program ASH (Standley et al. 2004, 2007). On the technical level, structural information complicates matters simply because protein structures contain more information and more noise than sequence information.
Here, we focus on one essential feature of ASH: the equivalence score that is used to define structural similarity. A particular element in the structural similarity matrix, eij, takes the form of a Gaussian-shaped function of the inter-residue distance dij, where dij is the distance between two alpha carbons i and j in the two input structures and d0 is a parameter that defines tolerance in the score. The default behavior is to set d0 to 4 Å. The goal of ASH is to maximize the sum of eij over aligned residues. The residue-level equivalences, which form the basis of all ASH alignments, provide a convenient route for combining MAFFT and ASH. We can, for example, set a threshold value of eij and incorporate highly confident parts of the alignment into MAFFT to “seed” the MSA calculation. If we consider the case of the three PIN domain-containing structures in Figure 4, we can first compute structural alignments for the three unique pairs using ASH (ash_3v33A-2qipA, ash_3v33A-1taqA, and ash_2qipA-1taqA). After a threshold for residue equivalence is set, the three alignments are passed to MAFFT as seeds:
mafft-linsi --seed ash_3v33A-2qipA --seed ash_3v33A-1taqA --seed ash_2qipA-1taqA sequences > output
Because the sequence identities between the aligned structures are low, we see an improvement in the resulting MSA relative to conventional MAFFT (Fig. 4). Based on this approach, we are developing an integrative service for protein structure-informed MSA construction.
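The thresholding of residue equivalences described above can be sketched in Python. The text specifies only that the score is a Gaussian-shaped function of the alpha-carbon distance with tolerance d0 (4 Å by default); the specific form exp(-(d/d0)^2) used below, the threshold of 0.5, and the function names are assumptions made for this illustration and are not taken from the ASH source.

```python
import math


def equivalence(distance, d0=4.0):
    """Gaussian-shaped equivalence score for one residue pair.

    Assumed form: exp(-(distance / d0) ** 2), with distances in angstroms.
    """
    return math.exp(-((distance / d0) ** 2))


def confident_pairs(pairs, threshold, d0=4.0):
    """Keep the residue pairs whose equivalence reaches the threshold.

    `pairs` is an iterable of (i, j, distance) tuples for aligned residues
    i and j of the two structures.
    """
    return [(i, j) for i, j, d in pairs if equivalence(d, d0) >= threshold]


if __name__ == "__main__":
    # Toy distances for three aligned residue pairs (hypothetical values).
    aligned = [(10, 12, 1.0), (11, 13, 2.5), (12, 14, 6.0)]
    print([round(equivalence(d), 3) for _, _, d in aligned])
    print(confident_pairs(aligned, threshold=0.5))
```

Only the pairs that pass the threshold would be written into a seed alignment, so poorly superposed regions are left for MAFFT to align from sequence alone.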
Acknowledgments
The authors thank Drs. Wen Chen, C. André Lévesque, and Christopher Lewis, Agriculture and Agri-Food Canada, for permitting the use of the ITS data in this article and providing other challenging problems. This work was supported by Platform for Drug Discovery, Informatics, and Structural Life Science from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and the Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Japan.
