Condensation (C) domains in nonribosomal peptide synthetases catalyze peptide bond formation between two consecutively bound amino acids. The number of C-domains coincides with the number of peptide bonds in the product peptide. In this study, a phylogenetic approach was used to investigate the structural diversity of bacterial C-domains. Phylogenetic trees show that C-domains cluster into three functional groups according to the type of substrate donor molecule: l-peptidyl donors, d-peptidyl donors, and N-acyl donors. The finding that C-domain structure does not depend on the optical configuration of the amino acid acceptor molecule supports the idea that the conversion of the incorporated acceptor amino acid from the l- to the d-form occurs during or after peptide bond formation. l-peptidyl donors and d-peptidyl donors are suggested to have diverged before the lineages of Gram-positive and Gram-negative bacteria separated.
Nonribosomal peptide synthetases (NRPS) are large modular multifunctional enzymes that synthesize a remarkably diverse set of biologically active peptides and cyclic lipopeptides. These products include antibiotics, biosurfactants, siderophores, and immunosuppressants, as well as antitumor and antiviral agents, and they have important medical and biotechnological applications. The presence of d-amino acid residues is a hallmark of nonribosomal peptides. These residues create structural diversity, constrain the stereochemical conformation of the peptides, and confer resistance to proteolysis. Biosynthesis of nonribosomal peptides is carried out by catalytic units of NRPS, referred to as modules, and proceeds from the N-terminal to the C-terminal end, as in the ribosome-dependent mechanism. Each module is composed of functionally specific domains that catalyze different enzymatic activities: adenylation (A), thiolation (T), and condensation (C) domains. The A-domain is responsible for amino acid recognition and adenylation at the expense of ATP. The adenylated amino acid is then covalently attached to the phosphopantetheine carrier of the adjacent T-domain. The C-domain catalyzes peptide bond formation between two consecutively bound amino acids, extending the growing peptide chain. Additionally, a modifying epimerization (E) domain, which catalyzes the conversion of l-amino acids to d-isomers, is typically associated with a module that incorporates a d-amino acid. After synthesis of the linear intermediate peptide, cyclization and release of the product peptide are carried out by a C-terminal thioesterase (Te) domain. This modular organization of NRPS allows artificial alteration of the protein template with the aim of reprogramming it for the synthesis of novel peptides with improved properties.
Peptide bond formation by C-domains requires two adjacent aminoacyl molecules for initiation, or one peptidyl donor and one aminoacyl acceptor molecule during elongation of the peptide chain. The C-domain is about 450 amino acids in length and is located at the N-terminus of each module. The number of C-domains coincides with the number of peptide bonds in the product peptide. Although the A-domain primarily recognizes and activates specific substrate molecules based on their structure, it is probable that the structure of C-domains is also affected by the structure of the donor and acceptor substrates. In fact, an alignment of 74 C-domains indicated clustering according to the type of reaction catalyzed and the chirality of the donor and acceptor molecules. The clusters correspond to l-l condensations (lCl, where the notation is donorCacceptor), d-l condensations (dCl), δ-l-α-l condensations, condensations involving N-methylated amino acids, N-acylations, and reactions involving cysteinyl-peptides as donors. Biochemical characterization of the C5 domain of tyrocidine synthetase has revealed that this domain is indeed a dCl catalyst. It was suggested that lCd and dCd domains do not exist. However, only a limited number of amino acid sequences of putative lCd and dCd domains in NRPS were available to support this idea, so it remained unclear whether the putative l/dCd domains constitute a phylogenetic group distinct from l/dCl. We have recently cloned a gene cluster encoding arthrofactin synthetase (Arf), arfABC, from Pseudomonas sp. MIS38. The product lipopeptide, arthrofactin, contains seven d-form amino acid residues, and Arf is expected to contain five putative dCd domains and one putative lCd domain. We took advantage of this recent information to re-examine the relationship between the structure of the C-domain and its donor and acceptor molecules. It was found that lCl and putative lCd domains are not clearly separated within the l-peptidyl donors, nor are dCl and putative dCd domains within the d-peptidyl donors.
This tree topology allowed us to conclude that there is no C-domain type (l/dCd) that specifically recognizes d-form acceptor molecules.
Materials and methods
The amino acid sequences of C-domains in various NRPS were retrieved from publicly accessible databases (http://www.ncbi.nlm.nih.gov/entrez/). We selected NRPS whose product structures have been determined. These are the synthetases of anabaenopeptilides (Apd, DDBJ/EMBL/GenBank Accession No. AJ269505), arthrofactin (Arf, AB107223), bacillomycin D (Bam, AY137375), bacitracin (Bac, AF007865), chloroeremomycin (Cep, T17483), complestatin (Com, AF386507), fengycin (Fen, AF023464), gramicidin (Grs, X61658), iturin A (Itu, AB050629), lichenysin (Lic, U95370), mycosubtilin (Myc, AF184956), nostopeptolide (Nos, AF204805), pyoverdin D (Pvd, U07359), pyoverdin S (Pvs, AE016863), surfactin (Srf, X70356), syringomycin (Syr, AF047828), syringopeptin22 (Syp, AF286216), and tyrocidine (Tyc, AF004835), together with a condensation enzyme from vibriobactin synthetase (VibH, AAD48879). The sequences of the C-domains were aligned with the ClustalW program provided by the DNA Data Bank of Japan (DDBJ).
Phylogenetic trees were constructed using the distance method and the character-based method (protein parsimony, PROTPARS program) from the PHYLIP package v3.6. For the distance method, a distance matrix was calculated with the protein distance matrix program (PROTDIST), and the matrix was then transformed into a tree with the NEIGHBOR program. To assess the reliability of the tree, multiple data sets were generated using the SEQBOOT program with 1000 bootstrap replicates. A tree was built from each replicate using the PROTDIST and NEIGHBOR programs or the PROTPARS program, and bootstrap values were then computed with the CONSENSE program. The phylogenetic tree was visualized with the Phylodendron program (http://iubio.bio.indiana.edu/treeapp/treeprint-form.html). Both methods gave similar tree topologies, suggesting that the major clustering did not result from computational artefacts. Therefore, only the trees constructed by the distance method are presented in this paper.
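The resampling step of this workflow can be sketched in a few lines of Python. This is only an illustration of the idea behind column bootstrapping followed by a distance calculation, not a reimplementation of SEQBOOT or PROTDIST: the function names and the three toy sequences are invented for the example, and the simple p-distance (fraction of differing sites) is an assumed stand-in for the substitution models that a protein distance program would normally apply.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name, name)."""
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i in alignment
        for j in alignment
    }


# Invented toy alignment (hypothetical sequences, not real C-domains).
toy_alignment = {
    "dom_A": "MKV-LHHT",
    "dom_B": "MKVALHHS",
    "dom_C": "MRIALYHS",
}

rng = random.Random(0)  # fixed seed so the resampling is reproducible
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(1000)]
replicate_matrices = [distance_matrix(rep) for rep in replicates]
```

In the workflow described above, each replicate matrix would then be converted into a tree, and the frequency with which a given grouping recurs across the 1000 trees would give its bootstrap value.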
Results and discussion
C-domains of arthrofactin synthetase
Arf is an NRPS responsible for the biosynthesis of a cyclic lipoundecapeptide, arthrofactin, in Pseudomonas sp. MIS38. It consists of three large subunit proteins (ArfA, ArfB, ArfC) that contain 11 functional modules in total. Arf represents a novel type of NRPS architecture that features tandem Te-domains but lacks internal E-domains. There are 10 peptide bond forming C-domains (from ArfA_C2 to ArfC_C5) in Arf. One additional C-domain (ArfA_C1) was identified in the first module of ArfA. This arrangement has also been reported for several other NRPS. It is suggested that the first amino acid is initially acylated with a fatty acid by this domain. When d-amino acid residues occur in the peptide product, the available l-isomers are typically selected by the A-domains and then epimerized by a downstream E-domain. The peptide intermediate is then presented to the next C-domain, which is a dCl catalyst. It is unclear whether any lCd or dCd domains exist. According to the chirality of the amino acids in arthrofactin, d-d-d-d-d-d-l-d-l-l-l, we propose that the second to sixth C-domains in Arf (ArfA_C2/ArfB_C1/ArfB_C2/ArfB_C3/ArfB_C4) are potentially d-specific for both donor and acceptor molecules (putative dCd). Similarly, ArfC_C1 and ArfC_C3 would be dCl, ArfC_C2 would be a putative lCd, and ArfC_C4 and ArfC_C5 would be lCl.
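The assignment above follows mechanically from the chirality string if one assumes that each peptide bond forming C-domain takes residue i of the product as the donor terminus and residue i+1 as the acceptor. The short Python sketch below reproduces that bookkeeping; the function name is hypothetical, and ArfA_C1 is left out because it is proposed to act as an N-acyl donor domain rather than to join two amino acids.

```python
def classify_c_domains(chirality, domain_names):
    """Label each C-domain as <donor>C<acceptor> from the product chirality.

    chirality    -- residue configurations joined by '-', e.g. "d-l-l"
    domain_names -- peptide bond forming C-domains, in order of action
    """
    residues = chirality.split("-")
    if len(domain_names) != len(residues) - 1:
        raise ValueError("expected one C-domain per peptide bond")
    # Domain k joins residue k (donor side) to residue k+1 (acceptor side).
    return {
        name: f"{donor}C{acceptor}"
        for name, donor, acceptor in zip(domain_names, residues, residues[1:])
    }


ARF_CHIRALITY = "d-d-d-d-d-d-l-d-l-l-l"
ARF_DOMAINS = [
    "ArfA_C2", "ArfB_C1", "ArfB_C2", "ArfB_C3", "ArfB_C4",
    "ArfC_C1", "ArfC_C2", "ArfC_C3", "ArfC_C4", "ArfC_C5",
]

arf_types = classify_c_domains(ARF_CHIRALITY, ARF_DOMAINS)
for name, label in arf_types.items():
    print(name, label)
```

The labels produced this way are only the putative types implied by the product structure; whether lCd and dCd domains exist as distinct classes is exactly what the phylogenetic analysis examines.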
A phylogenetic tree of the C-domains in Arf clearly showed two clustered branches according to the chirality of the donor molecules rather than the acceptor molecules, namely l-peptidyl donors and d-peptidyl donors (Fig. 1). lCl and putative lCd domains are not clearly separated within the l-peptidyl donors, nor are dCl and putative dCd domains within the d-peptidyl donors. This finding suggests that C-domains have evolved in relation to their donor molecules rather than their acceptors. ArfA_C1, which probably catalyzes N-acylation of the first amino acid, is closely related to the l-peptidyl donors but forms an independent branch.
Sequence alignments of the 10 peptide bond forming C-domains revealed insertions or deletions of amino acid sequences that distinguish l-peptidyl donors from d-peptidyl donors (Fig. 2). These differences may reflect different substrate specificities for the chirality of the donor molecules. Unlike the NRPS from Gram-positive bacteria, Arf, Syr, and Syp in Pseudomonas do not contain E-domain genes in their synthetase gene clusters. An external epimerase is therefore presumed to be involved in the formation of the holo-NRPS. The differences between the sequences of l- and d-peptidyl donors may be attributable to binding or non-binding of this unknown external epimerase. The amino acid sequence of Arf has updated the previous information about C-domains of Gram-negative bacteria, especially the putative dCd domain. We therefore further analyzed the phylogeny of C-domains from various Gram-positive and Gram-negative bacteria.
C-domains of NRPS from Gram-positive and Gram-negative bacteria
The tree comprising a total of 162 C-domains does indeed show clustering according to the type of reaction catalyzed and the bacterial group (Fig. 3). C-domains are grouped into two main functional classes: l-peptidyl donors (lCl and putative lCd) and d-peptidyl donors (dCl and putative dCd). It was confirmed that domains with l-acceptors (l/dCl) and d-acceptors (l/dCd) are not clearly separated. The l-peptidyl donors and d-peptidyl donors are each subclustered into two groups according to the bacterial group, Gram-positive and Gram-negative. These results suggest that l-peptidyl donors and d-peptidyl donors diverged before Gram-positive and Gram-negative bacteria separated in their evolution. It should be noted that the d-peptidyl donors of the siderophore synthetases (Pvd, Pvs) in Pseudomonas are clearly closer in lineage to those of the Gram-positive filamentous actinomycetes (Cep, Com) than to those of the lipopeptide synthetases in Pseudomonas (Arf, Syr, and Syp). Both Pvd and Pvs contain internal E-domains, as do Gram-positive NRPS, whereas Arf, Syr, and Syp do not contain E-domain genes in their synthetase gene clusters. This may explain why Pvd/Pvs and Com/Cep share a similar d-peptidyl donor structure despite the difference between Gram-positive and Gram-negative bacteria.
N-acyl donors use fatty acyl-CoA as their starter unit. The N-acyl donor domain is, without exception, located at the N-terminus of the first subunit of NRPS that synthesize cyclic lipopeptides (CYLP) in both Gram-positive and Gram-negative bacteria. It has been suggested that the first amino acid of lipopeptides is initially N-acylated with a β-hydroxy fatty acid in this domain. CYLP containing a β-hydroxy fatty acid have various physiological activities and include enzyme inhibitors (fengycin, plipastatin), biosurfactants (arthrofactin, lichenysin, surfactin), and phytotoxins (syringomycin, syringopeptin22). In contrast, CYLP with a β-amino fatty acid modification, such as bacillomycin, iturin, and mycosubtilin, are antifungal substances. The first subunit of the iturin family synthetases (BamA, ItuA, MycA) exhibits remarkable complexity, with two extra N-terminal C-domains. The first extra C-domains (BamA/ItuA/MycA_C1), which are believed to transfer the β-amino fatty acid to a T-domain, are close relatives of the N-acyl donors for β-hydroxy fatty acids (Fig. 3). Meanwhile, the second extra C-domains (BamA/ItuA/MycA_C2), which have been proposed to acylate the activated asparagine with the β-amino fatty acid, are closer to the l-peptidyl donors than to the N-acyl donors for β-hydroxy fatty acids. This finding may suggest that the β-hydroxy fatty acid and the β-amino fatty acid differ in optical configuration. The chirality of the asymmetric alpha-carbon in the β-hydroxy fatty acid part of surfactin has been reported to be the R-form. It is interesting that SrfA_C1 is closer to the d-peptidyl donors than to the l-peptidyl donors, because the optical configuration of the asymmetric alpha-carbon of d-form amino acids is known to be the R-form. It is therefore possible that the clustering of N-acyl donors also follows the chirality of the donor molecule. It is of great interest whether the β-amino fatty acid in nonribosomal peptides has the R-form or the S-form configuration.
To analyze the relationship between the structures of C-domains and their donor/acceptor molecules in more detail, phylogenetic trees were constructed separately for Gram-positive and Gram-negative bacterial NRPS. In Gram-positive Bacillus, the lCl and dCl domains are relatively abundant (Fig. 4(a)). Only three putative dCd domains were identified, all in the iturin synthetase family (BamB/ItuB/MycB_C1). The dCl and putative dCd domains were also found in the Cep and Com synthetases of actinomycetes. CepB_C1 (lCd), CepC_C1 (lCl), and ComD_C1 (lCd) clustered not with the l-peptidyl donors but with the d-peptidyl donors. This may be attributable to the fact that they accept the unusual amino acids HPG and DHPG as acceptors. The side chain structures of these residues may affect the structure of the C-domains more strongly than the enantioselectivity for donor molecules does.
Phylogenetic analysis of C-domains from Gram-negative bacteria also showed three clusters according to the type of reaction catalyzed: l-peptidyl donors, d-peptidyl donors, and N-acyl donors (Fig. 4(b)). However, some C-domains of Syp showed unusual clustering. SypC_C12 is a dCl domain for the condensation of d-Dab and l-Thr, but it belongs to the l-peptidyl donor group. In contrast, SypA_C5 and SypC_C3 should be lCd domains, but they are members of the d-peptidyl donors. In addition, the C-domains for Dhb, which is an achiral molecule, such as SypA_C2 and SypB_C5, are also members of the d-peptidyl donors.
Evolutionary trace analysis of C-domains
VibH, which is responsible for acylation of the amine group of norspermidine with the aryl acid dihydroxybenzoate, was classified as a member of the N-acyl donors (Fig. 3). Since the crystal structure and active site structure of VibH have already been determined, evolutionary trace (ET) analysis was applied to extract functionally important residues conserved at different levels of partition among the C-domain groups (figure not shown). The aligned sequences, together with the VibH coordinates (1L5A), were submitted to the University of Cambridge ET server (http://www-cryst.bioc.cam.ac.uk/jiye/evoltrace/evoltrace.html). ET analysis of the 162 C-domains indicated that the major functional clusters (N-acyl, l-peptidyl, and d-peptidyl donors) are visible at partition P03. The catalytic residue His126 and the structural residue Asp130 are conserved in almost all partitions, suggesting that a common catalytic mechanism is shared. Asp130 was replaced with the structurally related Glu in PvdL_C2 and Pvs5_C2. Gly265 and Arg271 are also highly conserved. Gly265 lies near the solvent channel of VibH and may be important for the catalytic reaction. Substitution of Arg271 with Gln was observed in the first extra C-domains of Gram-positive bacteria. There was a significant difference between l-peptidyl and d-peptidyl donors at position 264. This residue is located in the active site of VibH, just before the conserved Gly265. In the l-peptidyl donor group, residue 264 was either Val or Ile. The corresponding residue in the d-peptidyl donor group was, in most cases, the bulkier hydrophobic residue Phe (see also Fig. 2). Tyr264 was also found in the GrsB_C1 domain. ET analysis revealed another signature difference at positions 180 (Asp) and 184 (Trp) in alpha-6 of VibH. In the d-peptidyl donors, Trp is highly conserved at position 180 and the residue at position 184 is Leu without exception, whereas the corresponding residues in the l-peptidyl donors are in most cases Gln and Trp, respectively.
It is interesting that these positions are occupied by Phe or Tyr and by Met or Arg, respectively, in the d-peptidyl donors from Gram-negative bacteria that do not contain internal E-domains (Arf, Syr, and Syp). Because the corresponding residues in VibH, Asp180 and Trp184, lie near the solvent channel, these differences probably relate to the different substrate specificities of C-domains for donor molecules.
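As a rough illustration of how the position-264 signature described above might be used to screen an aligned C-domain, the following Python sketch maps the residue aligned to VibH position 264 to a tentative donor class. The function name, the lookup table, and the "undetermined" fallback are assumptions made for this example. The residue-to-class assignments reflect only the trends reported here (Val or Ile in l-peptidyl donors, Phe in most d-peptidyl donors) and should not be read as a validated predictor.

```python
# Tentative signature at the position aligned to residue 264 of VibH.
# Only the trends stated in the text are encoded; anything else is left open.
SIGNATURE_264 = {
    "V": "l-peptidyl donor",
    "I": "l-peptidyl donor",
    "F": "d-peptidyl donor",
}


def tentative_donor_class(residue_at_264):
    """Return a tentative donor class for a one-letter residue code.

    Residues outside the reported trends (for example Tyr, as noted for
    GrsB_C1) are returned as "undetermined" rather than guessed.
    """
    return SIGNATURE_264.get(residue_at_264.upper(), "undetermined")


for residue in ["V", "I", "F", "Y"]:
    print(residue, tentative_donor_class(residue))
```

Because the signature holds only in most cases, such a lookup would at best complement the phylogenetic placement of a domain, not replace it.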
The phylogenetic analyses of C-domains presented here revealed their structural diversity and their relationship with donor and acceptor molecules. This study demonstrates that the optical configuration of the donor molecule is critical for the structure of the C-domain and that, as far as examined, there is no d-acceptor-specific C-domain. Several type-specific amino acid residues were identified within the putative solvent channel of C-domains, and these may be directly or indirectly responsible for determining substrate specificity. This phylogenetic approach should be useful not only for engineering novel NRPS but also for deducing the chirality of amino acids in product peptides.
N.R. acknowledges his post-doctoral fellowship from the Japan Society for the Promotion of Science (P04468). This work was supported by a Grant-in-Aid for Scientific Research for Exploratory Research from MEXT (No. 17510171) and by the Takeda Science Foundation.