Abstract

Summary: X-ray crystallography is the most widely used method to determine the 3D structure of protein molecules. One of the most difficult steps in protein crystallography is model-building, which consists of fitting a backbone and then amino acid side chains into an electron density map. Interpretation of electron density maps is a major bottleneck in protein structure determination pipelines, so automated techniques to interpret maps can greatly improve throughput. We have developed WebTex, a simple yet powerful web interface to TEXTAL, a program that automates the fitting of atoms into electron density maps. TEXTAL can also be downloaded for local installation.

Availability: Web interface, downloadable binaries and documentation at

Contact: textal@tamu.edu

The process of solving a protein structure by X-ray crystallography involves several steps, including cloning and crystallization of the molecule, data collection, phasing, model-building and refinement. A high degree of automation has been achieved for many of these steps, but model-building remains largely manual, since it is an intricate and laborious process that requires considerable expertise. Even with molecular visualization software, model-building can be very time-consuming, especially if the data are of poor quality.

Various tools and techniques have been proposed for automated electron density map interpretation: treating model-building and phase refinement as one unified procedure using free atom insertion in ARP/wARP (Perrakis et al., 1999); Cα tracing, fuzzy logic sequence assignment and real-space torsion angle refinement in X-AUTOFIT (Oldfield, 2003); template matching and iterative fragment extension in RESOLVE (Terwilliger, 2002); fitting α-helices and β-sheets, followed by local sequence assignment and extension through loops in MAID (Levitt, 2001); molecular-scene analysis (Leherte et al., 1997); database search (Diller et al., 1999); using templates from the Protein Data Bank (Jones et al., 1991); and template convolution (Kleywegt and Jones, 1997).

The TEXTAL system is based on artificial intelligence techniques that simulate the decision-making processes that crystallographers use in building protein models. First, a Cα trace is built by the CAPRA (C-Alpha Pattern Recognition Algorithm) system, which uses a neural network to predict positions of Cα atoms, and a heuristic search method to connect the Cα atoms into chains. Side chains are then modeled by retrieving, from a database, structure fragments associated with 5 Å spheres of electron density that match the density patterns in the input electron density map. This is followed by post-processing routines, including sequence alignment (Smith and Waterman, 1981) and real-space refinement (Diamond, 1971). One of the distinguishing features of TEXTAL is that it has been specifically designed to work even with average quality data; it often builds reasonable models given maps with resolution as low as 3 Å. In practice, many electron density maps are noisy and fall in the low-medium resolution category. Table 1 shows how TEXTAL typically performs on electron density maps at various resolutions. Figures 1 and 2 show examples of protein models built by TEXTAL, and compare them to their true structures. For a detailed discussion of TEXTAL, see Ioerger and Sacchettini (2003), McKee et al. (2005), Romo et al. (2005) and the documentation available at .
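The sequence-alignment step mentioned above can be illustrated with a minimal Smith-Waterman local alignment, here matching a fragment of predicted residue identities against a known sequence. This is only a sketch: the scoring values (match, mismatch, gap) and the example sequences are assumptions chosen for illustration, and the scoring scheme actually used by TEXTAL is not described here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of strings a and b (linear gap penalty).

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not TEXTAL's parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic-programming score matrix
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are clamped at 0 so an alignment can start anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical example: a known sequence and a built fragment with one
# misidentified residue (L instead of I).
known_sequence = "MKTAYIAKQR"
built_fragment = "AYLAK"
print(smith_waterman(known_sequence, built_fragment))  # (7, 'AYIAK', 'AYLAK')
```

A local (rather than global) alignment suits this setting because a built chain usually covers only part of the full sequence, and the alignment tolerates individual misidentified residues.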

Fig. 1

Cartoon of the backbone of nsf-d2, a monomer with 247 amino acids. The manually built and refined structure is shown in orange; the model built by TEXTAL is in green. TEXTAL builds 91% of the structure; most of the α-helices, β-sheets and loops are built fairly accurately, except for a small fragment (at the top, on the right) that TEXTAL builds correctly, but in a symmetry-related copy of the true structure.

Fig. 2

Comparison of a few side chains built by TEXTAL (shown in green) to those in the manually built structure (in orange) for epsin. TEXTAL correctly identifies 92% of the 149 side chains in epsin, and places them with a root mean square error of 0.8 Å, computed over all atoms.

Table 1

Quality of structures built by TEXTAL

Protein (PDB ID)        Resolution (Å)  Amino acids  Structure built (%)  Correct amino acid identities (%)  Cα RMS error (Å)  All-atom RMS error (Å)
Epsin (1EDU)            1.8             149          98                   92                                 0.57              0.80
Antitrypsin (1AS4)      2.1             376          81                   52                                 0.95              1.04
Nsf-d2 (1NSF)           2.4             247          91                   90                                 0.94              1.05
Penicillopepsin (3APP)  2.8             323          92                   77                                 0.82              0.87

Representative results on four models built by TEXTAL, given experimental electron density maps. TEXTAL typically builds >90% of the structure. The root mean square (RMS) error for Cα atoms is usually <1 Å. The RMS error for all atoms is ∼1 Å. Also, TEXTAL correctly predicts the identities of ∼80% of the amino acids on average (after sequence alignment).
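As a worked illustration of the quantities in Table 1, the sketch below computes an RMS coordinate error between a built model and a reference structure, and averages the table's columns. It assumes that atoms have already been paired one-to-one and that both models are in the same coordinate frame; how TEXTAL's evaluation pairs atoms is not specified here, and the function names are hypothetical.

```python
import math


def rms_error(built, reference):
    """RMS distance (in Å) between paired atoms given as (x, y, z) tuples.

    Assumes built[i] corresponds to reference[i] and that both models
    share one coordinate frame (no superposition is performed).
    """
    if not built or len(built) != len(reference):
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = sum(
        (bx - rx) ** 2 + (by - ry) ** 2 + (bz - rz) ** 2
        for (bx, by, bz), (rx, ry, rz) in zip(built, reference)
    )
    return math.sqrt(total / len(built))


# Values copied from Table 1:
# (% built, % correct identities, Cα RMS error, all-atom RMS error)
TABLE_1 = {
    "Epsin (1EDU)": (98, 92, 0.57, 0.80),
    "Antitrypsin (1AS4)": (81, 52, 0.95, 1.04),
    "Nsf-d2 (1NSF)": (91, 90, 0.94, 1.05),
    "Penicillopepsin (3APP)": (92, 77, 0.82, 0.87),
}


def column_means(table):
    """Mean of each numeric column across all proteins in the table."""
    return [sum(col) / len(col) for col in zip(*table.values())]


print([round(m, 2) for m in column_means(TABLE_1)])
```

For the four proteins listed, the column means come to about 90.5% of the structure built, about 78% correct identities, 0.82 Å for Cα atoms and 0.94 Å for all atoms, in line with the typical figures quoted in the caption.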

WebTex is a publicly accessible web interface that allows users to upload crystal diffraction data (reflection file or electron density map in XPLOR format), run various modules of TEXTAL and obtain (via email) solved models (in PDB format). The first version of WebTex was launched in June 2002; in its first three years, WebTex was used by ∼120 users in 70 institutions from 20 countries. This encouraged us to enhance the capability and flexibility of WebTex, provide users with the latest features of TEXTAL, and address the various requests made by users.

The latest release of WebTex, available since September 2005, allows users to run each of the following four main modules of TEXTAL independently:

  1. Identify a contiguous biological molecular unit of the protein in an electron density map (using the FINDMOL program); the repeating asymmetric unit often cuts the molecule into disconnected fragments, which makes electron density map interpretation difficult.

  2. Find chains of Cα atoms that represent the backbone (or main chain) of the molecule, using the CAPRA program.

  3. Build the complete model, i.e. determine and refine the main chain as well as the side chains.

  4. Given a trace of Cα atoms and an electron density map as inputs, determine and refine the side chains.
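To make the idea behind module 2 more concrete, the sketch below links candidate Cα positions into chains using only the geometric fact that consecutive Cα atoms in a protein are about 3.8 Å apart. It is a deliberately simple greedy stand-in and not the CAPRA algorithm, which relies on a neural network and a heuristic search; the function name, the tolerance and the greedy strategy are all assumptions made for illustration.

```python
import math


def link_ca_atoms(points, ideal=3.8, tol=0.5):
    """Greedily link candidate Cα positions into chains.

    points: list of (x, y, z) tuples in Å.
    Two candidates are considered linkable when their distance is within
    `tol` of `ideal`. Returns a list of chains, each a list of indices.
    This is an illustrative simplification, not CAPRA.
    """
    n = len(points)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(math.dist(points[i], points[j]) - ideal) <= tol:
                neighbors[i].append(j)
                neighbors[j].append(i)

    visited = set()
    chains = []
    # Start from points with the fewest neighbors, which are likely chain ends.
    for start in sorted(range(n), key=lambda i: len(neighbors[i])):
        if start in visited:
            continue
        chain = [start]
        visited.add(start)
        current = start
        while True:
            candidates = [k for k in neighbors[current] if k not in visited]
            if not candidates:
                break
            prev = current
            # Prefer the neighbor whose distance is closest to the ideal value.
            current = min(
                candidates,
                key=lambda k: abs(math.dist(points[prev], points[k]) - ideal),
            )
            chain.append(current)
            visited.add(current)
        chains.append(chain)
    return chains


# Four candidates along a line, given in scrambled order.
candidates = [(7.6, 0.0, 0.0), (0.0, 0.0, 0.0), (11.4, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(link_ca_atoms(candidates))  # [[1, 3, 0, 2]]
```

A greedy walk like this breaks down at branch points and in noisy maps, which is one reason a real tracer needs a scoring function and a more careful search.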

The latest version of WebTex offers many new features and user-definable parameters. For instance, WebTex can now build a model directly from structure factors (instead of requiring the user to prepare an electron density map, as in the previous version), and can use FINDMOL to automatically create a map centered on a complete molecule. Furthermore, WebTex now provides a flexible, context-sensitive interface that adapts itself automatically to the selected objective and options. This largely relieves users of the burden of choosing among many inter-dependent parameters. Figure 3 is a snapshot of the job submission page of WebTex.

Fig. 3

A typical web page to run various TEXTAL modules. The page adapts itself depending on the objective (e.g. build Cα trace or build side chains), type of input (e.g. reflection file or electron density map file), format of input (e.g. file or string for amino acid sequence), etc. There are also numerous features to see the progress of runs and peruse output files online (or download them).

Other new features of WebTex include the ability to view the status of submitted jobs, and to conveniently peruse all output files online. Users can also download individual output files, or a compressed file containing all output for each run. The final output (a model in PDB format) is sent via email; it takes ∼2 h to build a model for a protein of average size, i.e. ∼300 amino acids. Intermediate outputs and logs often contain valuable information that can help the user interpret the output and choose the next action to further improve the model.
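Since the final model arrives as a PDB file, a quick sanity check is to count the residues that were built. The sketch below extracts Cα atoms from ATOM records using the fixed-column layout of the PDB format; the function names and the sample coordinates are hypothetical, and the snippet does not reflect any particular detail of TEXTAL's output beyond it being in PDB format.

```python
def read_ca_atoms(pdb_text):
    """Return one dict per Cα atom found in the ATOM records of a PDB file.

    Uses the fixed columns of the PDB format (0-based slices):
    atom name 12-16, residue name 17-20, chain 21, residue number 22-26,
    x 30-38, y 38-46, z 46-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper, used only to build well-formed sample records."""
    return "ATOM  {:5d} {:4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res_name, chain, res_seq, x, y, z
    )


# Made-up coordinates for two residues, for illustration only.
sample = "\n".join([
    format_atom(1, " N  ", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, " CA ", "MET", "A", 1, 26.266, 25.413, 2.842),
    format_atom(3, " N  ", "LYS", "A", 2, 26.913, 26.639, 3.531),
    format_atom(4, " CA ", "LYS", "A", 2, 26.085, 27.767, 3.965),
])

ca_atoms = read_ca_atoms(sample)
print(len(ca_atoms), [a["res_name"] for a in ca_atoms])  # 2 ['MET', 'LYS']
```

Comparing the number of Cα atoms with the length of the sequence gives a rough estimate of the percentage of the structure built, the same quantity reported in Table 1.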

One key issue in protein crystallography is the value of diffraction data, which are often expensive to collect and process and are therefore viewed as proprietary. The web interface ensures confidentiality and security of data through various measures (such as encryption, license agreements, etc.). Furthermore, executable versions of TEXTAL can be downloaded and installed locally. WebTex is available free of charge.

This work was supported by the National Institutes of Health (Grant PO1-GM63210) and the Robert A. Welch Foundation (Grant A-0015).

Conflict of Interest: none declared.

REFERENCES

Diamond R. A real-space refinement procedure for proteins. Acta Crystallogr., 1971, vol. A27 (pg. 436-452)
Diller D.J., Redinbo M.R., Pohl E., Hol W.G.J. A database method for automated electron density map interpretation in protein crystallography. Proteins Struct. Funct. Genet., 1999, vol. 36 (pg. 526-541)
Ioerger T.R., Sacchettini J.C. The TEXTAL system: artificial intelligence techniques for automated protein model building. In: Sweet R.M., Carter C.W. (eds). Methods Enzymol., 2003, vol. 374 (pg. 244-270)
Jones T.A., et al. Improved methods for building models in electron density maps and the location of errors in these models. Acta Crystallogr., 1991, vol. A47 (pg. 110-119)
Kleywegt G.J., Jones T.A. Template convolution to enhance or detect structural features in macromolecular electron density maps. Acta Crystallogr., 1997, vol. D53 (pg. 179-185)
Leherte L., et al. Analysis of three-dimensional protein images. J. AI Res. (JAIR), 1997, vol. 7 (pg. 125-159)
Levitt D.G. A new software routine that automates the fitting of protein x-ray crystallographic electron density maps. Acta Crystallogr., 2001, vol. D57 (pg. 1013-1019)
McKee E.W., et al. FINDMOL: automated identification of macromolecules in electron-density maps. Acta Crystallogr., 2005, vol. D61 (pg. 1514-1520)
Oldfield T.J. Automated tracing of electron density maps of proteins. Acta Crystallogr., 2003, vol. D59 (pg. 483-491)
Perrakis A., et al. Automated protein model-building combined with iterative structure refinement. Nat. Struct. Biol., 1999, vol. 6 (pg. 458-463)
Romo T.D., Gopal K., McKee E.W., Kanbi L., Pai R., Smith J.N., Sacchettini J.C., Ioerger T.R. TEXTAL: AI-based structural determination for X-ray protein crystallography. IEEE Intell. Syst., 2005, vol. 20 (pg. 59-63)
Smith T.F., Waterman M.S. Identification of common molecular subsequences. J. Mol. Biol., 1981, vol. 147 (pg. 195-197)
Terwilliger T.C. Automated side-chain model-building and sequence assignment by template matching. Acta Crystallogr., 2002, vol. D59 (pg. 45-49)

Author notes

Associate Editor: Martin Bishop
