Analysis of binding of the family 2a carbohydrate-binding module from Cellulomonas fimi xylanase 10A to cellulose: specificity and identification of functionally important amino acid residues

The family 2a carbohydrate-binding module (CBM2a) of xylanase 10A from Cellulomonas fimi binds to the crystalline regions of cellulose. It does not share binding sites with the N-terminal family 4 binding module (CBM4-1) from the cellulase 9B from C.fimi, a module that binds strictly to soluble sugars and amorphous cellulose. The binding of CBM2a to crystalline matrices is mediated by several residues on the binding face, including three prominent, solvent-exposed tryptophan residues. Binding to crystalline cellulose was analyzed by making a series of conservative (phenylalanine and tyrosine) and non-conservative substitutions (alanine) of each solvent-exposed tryptophan (W17, W54 and W72). Other residues on the binding face with hydrogen bonding potential were substituted with alanine. Each tryptophan plays a different role in binding; a tryptophan is essential at position 54, a tyrosine or tryptophan at position 17 and any aromatic residue at position 72. Other residues on the binding face, with the exception of N15, are not essential determinants of binding affinity. Given the specificity of CBM2a, the structure of crystalline cellulose and the dynamic nature of the binding of CBM2a, we propose a model for the interaction between the polypeptide and the crystalline surface.


Introduction
Many microbial polysaccharide hydrolases are modular proteins, comprising a catalytic module and one or more ancillary modules. Amongst the latter, the most common type mediates binding to substrate (Gilkes et al., 1991; Tomme et al., 1995b). Of the substrate-binding modules, those that bind to cellulose, the cellulose-binding domains (CBDs), now referred to as carbohydrate-binding modules (CBMs), have attracted considerable interest (Gilkes et al., 1988; Linder et al., 1995a; Tomme et al., 1995a,b). CBMs are found in both cellulases and xylanases, for which the natural substrates are predominantly the cellulose and hemicelluloses in the cell walls of plants. CBMs serve to increase the effective concentrations of the enzymes on the substrate (Linder et al., 1995a; Tomme et al., 1995a). In at least one enzyme, the CBM may guide a cellulose chain into the active site of the catalytic module (Sakon et al., 1997; Irwin et al., 1998). CBMs are classified into families of related amino acid sequences (Tomme et al., 1995b). The members of a family are very similar in size, with average sizes for the different families ranging from ~36 to ~200 amino acids (Tomme et al., 1995b).
Certain families of CBMs, including members of family 2a, bind to crystalline cellulose, including bacterial microcrystalline cellulose (BMCC), and also to partially crystalline preparations, such as phosphoric acid-swollen cellulose (PASC). All CBMs of family 2a share the same basic structure: a β-barrel-type backbone with a relatively flat face on which there are several solvent-exposed aromatic amino acid residues that are involved in binding (Kraulis et al., 1989; Reinikainen et al., 1992; Poole et al., 1993; Din et al., 1994; Xu et al., 1995; Bray et al., 1996; Tormo et al., 1996; Brun et al., 1997; Nagy et al., 1998). Binding is driven by sorbent and protein dehydration (Creagh et al., 1996). It is not known if the module recognizes only crystalline regions of the sorbent or is also capable of binding the more amorphous cellulose structures present in PASC. A second class of CBMs, typified by the N-terminal family 4 binding module from the Cellulomonas fimi cellulase 9B (hereafter CBM4-1, formerly designated CBDN1), is known to bind PASC but not BMCC. These CBMs also have a β-sheet structure, but the binding region is a narrow groove lined with hydrophobic amino acid residues (Johnson et al., 1996a,b; Tomme et al., 1996).
Among the best characterized of the class of CBMs which bind both BMCC and PASC is the family 2a CBM from xylanase 10A of C.fimi (previously CBDCex; hereafter, CBM2a). It is 110 amino acids long and composed of two sheets of five and four anti-parallel β-strands (Xu et al., 1995). Binding of CBM2a to either BMCC or PASC is irreversible, such that complete removal of the module from the solution phase does not result in significant desorption over extended (i.e. several weeks) periods of equilibration (unpublished data). Although binding is irreversible, the interaction with the cellulose surface is dynamic. Surface diffusion measurements using fluorescence recovery techniques show that CBM2a, either in its isolated form or as a module in xylanase 10A, is mobile on the surface of crystalline cellulose (Jervis et al., 1997) and may therefore provide bound xylanase 10A access to a much broader field of substrate.
CBM2a contains three solvent-exposed tryptophan residues (W17, W54 and W72) which are highly conserved within members of family 2a and are aligned on the binding face of the module. The rate of oxidation of these tryptophans with N-bromosuccinimide was halved when CBM2a was bound to BMCC, suggesting a dynamic interaction of the tryptophans with the cellulose surface which, at least periodically, allows each residue to become solvent exposed and susceptible to oxidation. Complete oxidation of the three tryptophan residues eliminates binding but not proper folding of CBM2a, indicating the importance of these hydrophobic residues to the binding reaction (Bray et al., 1996). The individual contributions of W17, W54 and W72 to the overall binding energetics, however, are not known. It is also not clear whether any other solvent-exposed amino acid residues proximal to the three tryptophans make a significant contribution to the energetics of binding.
In this work, site-directed mutation was combined with a Langmuir-type adsorption isotherm analysis to determine the individual contributions of W17, W54, W72 and a number of neighboring residues to the overall energetics of CBM2a binding to BMCC. The choice of neighboring residues to include in the study was based on inspection of the structure of CBM2a (Xu et al., 1995) and the potential of the residue either to hydrogen bond or to establish close van der Waals contact with the sorbent surface. Results from this set of single-site mutants were used to establish a model for the binding of CBM2a to the surface of crystalline cellulose.

Bacterial strains and plasmids
The Escherichia coli strains used were JM101 and R1360. The expression vector pTug K-H6-IEGR-CBM2a is a derivative of the pTugA and pTugAS vectors described previously (Graham et al., 1995). It encodes resistance to kanamycin; it also carries a synthetic gene encoding the leader peptide of Xyn10A at the N-terminus, followed by six histidine residues, a factor Xa protease cleavage site (amino acid sequence, IEGR) and then CBM2a at the C-terminus.

Synthesis of DNA fragment encoding CBM2a
The DNA fragment encoding CBM2a was synthesized by a PCR-based procedure, similar to that used in the synthesis of a xylanase gene from Schizophyllum commune (Graham et al., 1993). The sequence incorporated a number of unique restriction sites to facilitate manipulation of the fragment after site-directed mutation and the codon bias was changed to that of E.coli, both without changing the encoded amino acid sequence. Six synthetic oligonucleotides were used as primers in the PCR assembly of the complete fragment (Table I).
The first reaction mixture contained 5 pmol each of longer primers SCx2 and SCx3, 200 µM of each deoxynucleoside triphosphate (dNTP), 3 U of Expand Hi-Fi Polymerase and its recommended buffer supplemented with DMSO to 5%, in a final volume of 50 µl. After five cycles of 1 min at 95°C followed by 3 min at 72°C in a Model 2400 Thermocycler (Perkin-Elmer), 20 pmol each of the flanking primers SCx1 and SCx4 were added. Synthesis of the segment was completed by 25 cycles of 45 s at 95°C, 45 s at 58°C and 1 min at 72°C. The second reaction mixture contained 1 ng of the product from the first reaction, 23.00, 2.30 or 0.23 pmol of primer SCx5 (all three primer concentrations yielded PCR product), 2.5 U of Pwo polymerase in its recommended buffer supplemented with DMSO to 5% and 200 µM of each dNTP, in a final volume of 50 µl. After five cycles of 1 min at 95°C followed by 3 min at 72°C, 20 pmol each of the flanking primers SCx1 and SCx6 were added and synthesis of the fragment was completed by 25 cycles of 45 s at 95°C, 45 s at 55°C and 1 min at 72°C. The full-length fragment was purified using a Qiaquick PCR purification kit (Qiagen), digested with EcoRI and HindIII, then ligated into pUC19 that had been digested with the same enzymes. After checking its sequence, the fragment was subcloned into the pTugK vector.

Site-directed mutation of CBM2a
All mutations were made by two-primer PCR mutagenesis (Ansaldi et al., 1996) or the 'megaprimer' method (Sarkar and Sommer, 1990; Smith and Klugman, 1997). A typical two-primer reaction contained 50 ng of template DNA, 5 pmol each of the flanking (primer BG2, Table II) and the mutagenic primers (Table II), 200 µM of each dNTP and 1 U of VentR Polymerase (NEB) in its recommended buffer. Amplification was obtained by 30 cycles of 30 s at 96°C, 30 s at 53°C and 60 s at 72°C. The product was purified with a Qiaquick PCR purification kit (Qiagen), digested with the appropriate restriction enzymes and ligated into a pTugK vector that had been digested with the same enzymes. Mutants of residues W17, W54, W72 and Q52 were made by this method. Mutants of residues N15, N18, Q83 and N87 were made by using a megaprimer strategy. The first product was amplified using the mutagenic primer and the BG2 flanking primer as described above. The product of this reaction was used as a megaprimer, with the oligonucleotide Lax16 as a flanking primer. PCR conditions were as described previously (Smith and Klugman, 1997). Briefly, 2 µg of megaprimer product were used with 50 ng of template (pTug K-H6-IEGR-CBM2a) and 5 pmol of the Lax16 flanking primer, added after five cycles of denaturation at 96°C for 1 min and extension at 72°C for 3 min. The remaining 30 cycles of the PCR program were as described above.

Production and purification of the polypeptides
The full-length fragment was ligated into the vector pTugK in frame. Inclusion of the leader peptide of Xyn10A in the fragment allows export of CBM2a to the periplasm of E.coli, from where it leaks into the culture supernatant (Graham et al., 1993; Ong et al., 1993). Since the aim was to obtain mutants of CBM2a with reduced affinity for cellulose, a hexahistidine sequence followed by a factor Xa site was added to the N-terminus of mature CBM2a so that the mutant polypeptides could be purified by immobilized metal-chelate affinity chromatography (IMAC) (Petty, 1996). This had the added advantage of avoiding the denaturing conditions required to desorb CBM2a from cellulose if the mutants, unlike the wild-type, could not be refolded following desorption (Ong et al., 1993).
In the event that the hexahistidine sequence affected the binding of CBM2a to cellulose, factor Xa could be used to obtain native CBM2a following purification by IMAC. E.coli strains JM101 or R1360 were used for the production of wild-type CBM2a and its mutants. Typically, 500 ml of tryptone yeast extract medium (TYP) (Sambrook et al., 1989) containing 1.5 mM potassium phosphate and 50 µg kanamycin/ml were inoculated with overnight cultures of an E.coli strain transformed with the appropriate vector. The cultures were shaken at 200 r.p.m. at 37°C until the OD600 was ~1.0, then induced with 0.3 mM IPTG. After a further 36 h, the cells were removed by centrifugation. The proteins in the supernatant were concentrated and exchanged into IMAC binding buffer (5 mM imidazole, 5 M NaCl, 20 mM Tris-HCl, pH 7.9) using a tangential flow filtration unit (Filtron Ultrasette, 1 kDa cut-off). The solution was passed through a column of Ni²⁺-Sepharose. The column was washed with binding buffer. Adsorbed polypeptide was eluted with binding buffer containing stepwise increases in the concentration of imidazole (Figure 1). The fractions containing polypeptide, detected by A280, were pooled, desalted and concentrated in a stirred ultrafiltration cell (Amicon; Filtron 1 kDa cut-off filter). The protein purity and approximate yield were estimated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Final protein concentrations were determined from the A280, using the calculated εmax values (Ong et al., 1993). The wild-type and mutants of CBM2a were obtained routinely in yields of up to 100 mg of purified polypeptide per liter of culture supernatant, except for the W54A mutant, for which the yield was ~20 mg/l. In each case, sufficient polypeptide was obtained for binding analysis.

Determination of affinities
The affinities of CBM2a and its mutants for cellulose were determined using a solution depletion method to generate binding isotherms. CBM2a variants were equilibrated in microcentrifuge tubes with 1 mg of cellulose in 50 mM potassium phosphate (pH 7.0) in a total reaction volume of 1 ml. The final total polypeptide concentrations ranged from 1 to 30 µM. Triplicate samples were incubated at 4°C for at least 3 h while rotating end over end. The cellulose was removed by centrifugation and the concentration of protein remaining in the supernatant was determined by A280 after subtracting A350 to allow for light scattering. The concentration of bound protein for a particular concentration of the CBM was the difference between a control incubated without cellulose and the free protein left after incubation with cellulose. An isotherm of [Bound] (µmol/g of cellulose) vs [Free] (µM) was generated and binding parameters were determined by non-linear regression using the Langmuir-type isotherm [Equation 1; modified from Din et al. (1994)], using GraphPad Prism 3.0 for Windows (GraphPad Software, http://www.graphpad.com).

$$[B] = \frac{[N_0]\,K_a\,[F]}{1 + K_a\,[F]} + G \qquad (1)$$

where [B] is the concentration of bound protein (µmol/g cellulose), [N0] is the total concentration of binding sites (µmol/g cellulose), Ka is the association affinity constant (l/µmol), [F] is the concentration of free protein (µmol/l) and G is a constant, calculated during the regression, that corrects for non-protein absorbance at the measured wavelengths. G therefore is the sum of all of the factors that contribute to absorption measurements not specific to CBM2a.
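The depletion calculation and the isotherm regression described above can be sketched in a few lines of Python. This is an illustrative sketch, not the GraphPad Prism procedure used in the study: the function names are hypothetical, the offset G is omitted for simplicity, and the fit exploits the fact that, for a fixed Ka, the best [N0] follows from linear least squares, so only log10(Ka) has to be searched (here by golden-section search over an assumed range).

```python
import math

def bound_per_gram(control_uM, free_uM, volume_ml=1.0, cellulose_mg=1.0):
    """Bound protein (umol/g cellulose) from a solution-depletion measurement.

    control_uM: protein in a control incubated without cellulose (umol/l)
    free_uM:    protein left in the supernatant after incubation (umol/l)
    """
    depleted_umol = (control_uM - free_uM) * volume_ml / 1000.0  # umol/l * l
    return depleted_umol / (cellulose_mg / 1000.0)               # per gram

def langmuir(free_uM, n0, ka):
    """Single-site Langmuir isotherm; n0 in umol/g, ka in l/umol (G omitted)."""
    return n0 * ka * free_uM / (1.0 + ka * free_uM)

def fit_langmuir(free, bound, log_ka_range=(-3.0, 3.0)):
    """Least-squares estimate of (n0, ka) from paired free/bound values."""
    def best_n0(ka):
        # For fixed ka the model is linear in n0: bound = n0 * x
        x = [ka * f / (1.0 + ka * f) for f in free]
        return sum(xi * b for xi, b in zip(x, bound)) / sum(xi * xi for xi in x)

    def sse(log_ka):
        ka = 10.0 ** log_ka
        n0 = best_n0(ka)
        return sum((b - langmuir(f, n0, ka)) ** 2 for f, b in zip(free, bound))

    # Golden-section search; assumes a single minimum inside the search range
    lo, hi = log_ka_range
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if sse(a) < sse(b):
            hi = b
        else:
            lo = a
    ka = 10.0 ** ((lo + hi) / 2.0)
    return best_n0(ka), ka

# Synthetic, noise-free demonstration data (not measurements from this study)
free_demo = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
bound_demo = [langmuir(f, 10.0, 2.0) for f in free_demo]
n0_fit, ka_fit = fit_langmuir(free_demo, bound_demo)
print(f"N0 = {n0_fit:.3f} umol/g, Ka = {ka_fit:.3f} l/umol")
```

With 1 mg of cellulose in 1 ml, the depleted concentration in µM is numerically equal to the bound amount in µmol/g, which is why the isotherm axes in the text can be compared directly. A Ka expressed in l/µmol is converted to M⁻¹ by multiplying by 10⁶.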

Competition binding isotherms
Preparations of CBM2a and CBM4-1 were each labeled with a unique fluorescent probe so that the concentration of each species in a binding reaction could be measured independent of the A280. Oregon Green 514 carboxylic acid succinimidyl ester (Molecular Probes, Eugene, OR) will react with one of two amino groups of CBM2a: the N-terminus and a single solvent-exposed lysine residue. Since neither of these sites is near the putative binding face, the Oregon Green label, similarly to fluorescein isothiocyanate (FITC), will not interfere with the cellulose-binding properties of the polypeptide (Jervis et al., 1997). 6-[(7-Amino-4-methylcoumarin-3-acetyl)amino]hexanoic acid succinimidyl ester (AMCA-X SE) will react with CBM4-1 at its N-terminus, also well removed from the binding groove. Both labeling reactions were performed as directed by the manufacturer. Briefly, 1 mg of the fluorescent probe dissolved in 100 µl of dimethyl sulfoxide (DMSO) was added to 15 mg of CBM in ~1 ml of 0.1 M sodium hydrogencarbonate buffer, pH 9.0. The solution was mixed in the dark at room temperature for 1 h. The labeled polypeptide was passed twice through a 5 ml Sephadex G-25 column (Pharmacia Biotech, Uppsala, Sweden), equilibrated with 50 mM potassium phosphate (pH 7.0), to separate any unbound probe from the protein. Fractions containing protein were collected and pooled. The labeling efficiency of each reaction was calculated: the number of moles of Oregon Green per mole of CBM2a was estimated to be 0.42, calculated from A280 and A506 of the pooled fractions. The number of moles of AMCA-X per mole of CBM4-1 was estimated to be 0.60, calculated from A280 and A352 of the pooled fractions.
Owing to the sensitivity of the fluorimeter (Perkin-Elmer LS 50 luminescence spectrometer), labeled CBM was diluted with unlabeled CBM so that the maximum measurable fluorescence emission, at 537 nm for CBM2a and 447 nm for CBM4-1, coincided with typical maximum concentrations of CBMs as measured by A280. This ratio was ~373 mol of unlabeled CBM2a per mole of labeled CBM2a and 87 mol of CBM4-1 per mole of labeled CBM4-1. The contribution of the fluorescent labels to A280 was negligible. Binding isotherms to PASC of CBM2a alone and in the presence of a total concentration of ~11 µM CBM4-1 were performed as described previously, except that all binding reactions were done in microcentrifuge tubes that had been blocked in a solution of 1% bovine serum albumin (BSA) for 2 h then well rinsed with distilled water. Since there was a linear relationship between the fluorescence and A280 of polypeptide dilutions (Figure 2, inset), the concentration of each binding module could be determined by fluorescence alone, independent of the A280. CBM2a was measured using an excitation wavelength of 506 nm and an emission wavelength of 537 nm. CBM4-1 was measured using an excitation wavelength of 352 nm and an emission wavelength of 447 nm. The concentration of CBM2a and CBM4-1 for each isotherm was calculated from standards of fluorescence vs CBM concentration (Figure 2, inset).
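Reading a concentration off a linear fluorescence standard curve, as described above, amounts to an ordinary least-squares line followed by its inversion. The sketch below illustrates that step only; the function names and the standard values are hypothetical and are not data from Figure 2.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_fluorescence(signal, slope, intercept):
    """Invert the standard curve to recover concentration (same units as x)."""
    return (signal - intercept) / slope

# Hypothetical standards: concentration (uM) vs fluorescence (arbitrary units)
standards_uM = [0.0, 5.0, 10.0, 20.0, 30.0]
standards_fl = [12.0, 162.0, 312.0, 612.0, 912.0]

slope, intercept = fit_line(standards_uM, standards_fl)
unknown_uM = concentration_from_fluorescence(462.0, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, unknown = {unknown_uM:.2f} uM")
```

Because each CBM carries a different probe with its own excitation and emission wavelengths, one such curve per module lets both concentrations be read from the same supernatant.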

Binding of the wild-type CBM2a to cellulose in the absence and presence of CBM4-1
Measured isotherms for binding of wild-type CBM2a and His6-CBM2a to BMCC at 4°C (50 mM phosphate buffer, pH 7.0) were indistinguishable, with equivalent equilibrium binding constants when experimental error is considered (Table III). The hexahistidine tag therefore does not affect binding and was not removed from the wild-type and mutant polypeptides compared here. Relative to a highly crystalline preparation from Valonia ventricosa, the crystallinity indices for the cellulose preparations used in this work are 0.76 for BMCC (Kulshreshtha and Dweltz, 1973) and 0.50 for Avicel (Wood, 1988). The crystallinity index for PASC is not known, but is likely to be significantly less than that of Valonia owing to the acid-swelling process used in its preparation. The affinity of His6-CBM2a for these insoluble cellulose preparations decreased in the order BMCC > PASC > Avicel (Table IV) and ranged from 3.2 × 10⁶ to 1.0 × 10⁶ M⁻¹, suggesting preferential binding of CBM2a to crystalline cellulose. The binding properties of the wild-type CBM2a on these substrates were indistinguishable from those of the His6-CBM2a construct (data not shown). Based on isothermal titration calorimetry (ITC) data, highly crystalline BMCC preparations offer two classes of binding sites to CBM2a, both of relatively high affinity and characterized by binding constants (Ka) within the range reported here (10⁶-10⁷ M⁻¹) (Creagh et al., 1996). Thus, possibly owing to differences in the surface energies of its various crystal faces, crystalline cellulose appears to present a heterogeneous array of binding sites to CBM2a.
The Ka values reported in Tables III-V are regressed using Equation 1, which assumes that the cellulose surface is uniform and offers only a single class of binding site. This approach is justified by the fact that the shape of the cumulative isotherm generated by the depletion method (used here) is relatively insensitive to the energetics and occupancy of lower-affinity/lower-occupancy sites, as opposed to that of the differential binding isotherm provided by ITC. The regressed Ka value from the cumulative depletion isotherm is an averaged value which reflects, in part, the relative contributions of each class of binding site on the cellulose surface. Since it is difficult and potentially misleading to apply a multi-site binding model to cumulative isotherm data, it is appropriate to use the single-site binding model (Equation 1) (Langmuir, 1918; Hinshelwood, 1940).
Based on this mode of analysis, the lower Ka values reported for binding of CBM2a to Avicel and PASC, compared with the affinity for BMCC, suggest that these forms of insoluble cellulose present a higher fraction of low-affinity sites to CBM2a, resulting in a lower Ka calculated from the cumulative isotherm. The nature of these low-affinity sites, however, is unclear. They could reflect natural differences in the abundance of high- and low-energy cellulose crystal faces. They could also originate from the ability of CBM2a to bind both crystalline and amorphous morphologies of cellulose, with the affinity of the module for amorphous regions being somewhat weaker.
We tested the latter possibility by measuring the binding of CBM2a to PASC in the presence and absence of near-saturating amounts of CBM4-1. CBM4-1 specifically binds amorphous cellulose, showing no significant affinity for crystalline preparations of cellulose such as V.ventricosa cellulose and BMCC (Coutinho et al., 1992; Tomme et al., 1996; Kormos et al., 2000). The Ka for CBM4-1 binding to either soluble cellulose polymers or PASC at 4°C (50 mM potassium phosphate buffer, pH 7.0) is 2.6 × 10⁵ M⁻¹, an order of magnitude less than the average Ka for CBM2a binding to PASC. Thus, for a fixed total CBM4-1 concentration of 11 µM, increasing concentrations of CBM2a should lead to the progressive displacement of bound CBM4-1 if CBM2a shows an affinity for amorphous regions of the PASC.
Binding of CBM2a to PASC was not significantly affected by the presence of CBM4-1 (Figure 2). With no CBM4-1 present, CBM2a binds with a Ka of 1.3 (±0.1) × 10⁶ M⁻¹. When saturating levels of CBM4-1 are present, the Ka for binding of CBM2a to PASC is reduced only slightly to 1.0 (±0.1) × 10⁶ M⁻¹. The sorbent capacity for CBM2a also remained unchanged (25.0 ± 0.1 µmol CBM2a/g PASC). Moreover, the amount of bound CBM4-1 decreased to no less than 75% saturation as the concentration of bound CBM2a approached saturation, indicating that most of the bound CBM4-1 is unaffected by the presence of CBM2a. Together, these results support a view that CBM2a binding is specific to crystalline regions of cellulose. Competition between bound CBM2a and CBM4-1 therefore only occurs at boundaries between crystalline and amorphous microstructures, such that ~25% of bound CBM4-1 is susceptible to displacement by CBM2a, presumably through a steric exclusion mechanism.
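To see why the retention of at least 75% of bound CBM4-1 argues against shared binding sites, it helps to ask what a simple competitive Langmuir model would predict if both modules did compete for one class of site. The sketch below is a hypothetical illustration, not an analysis performed in the study: it uses the Ka values quoted in the text, but the free concentrations are assumed round numbers (free CBM4-1 is set equal to the 11 µM total, and 10 µM free CBM2a is arbitrary).

```python
def competitive_occupancy(ka_a, free_a, ka_b, free_b):
    """Fractional occupancies of one shared site class by ligands A and B.

    Competitive Langmuir model: theta_i = Ka_i*[i] / (1 + Ka_a*[A] + Ka_b*[B]).
    Ka in M^-1, free concentrations in M.
    """
    denom = 1.0 + ka_a * free_a + ka_b * free_b
    return ka_a * free_a / denom, ka_b * free_b / denom

KA_CBM2A = 1.3e6      # M^-1, CBM2a on PASC (from the text)
KA_CBM41 = 2.6e5      # M^-1, CBM4-1 on PASC (from the text)
FREE_CBM41 = 11e-6    # M, assumed equal to the total concentration
FREE_CBM2A = 10e-6    # M, assumed illustrative value

# CBM4-1 alone, then with CBM2a competing for the same (hypothetical) sites
_, theta41_alone = competitive_occupancy(KA_CBM2A, 0.0, KA_CBM41, FREE_CBM41)
theta2a, theta41_mixed = competitive_occupancy(
    KA_CBM2A, FREE_CBM2A, KA_CBM41, FREE_CBM41
)
retained = theta41_mixed / theta41_alone
print(f"CBM4-1 occupancy alone: {theta41_alone:.2f}")
print(f"CBM4-1 occupancy with CBM2a: {theta41_mixed:.2f}")
print(f"fraction of bound CBM4-1 retained: {retained:.2f}")
```

Under these assumptions the shared-site model predicts that less than a quarter of the initially bound CBM4-1 would remain, far below the ≥75% observed, which is the contrast the displacement argument relies on.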

Binding of mutants of CBM2a to BMCC
Since CBM2a localizes to crystalline regions of cellulose, BMCC was chosen as the matrix for comparison of the affinities of the wild-type and mutant CBM2a polypeptides. Using the synthetic CBM2a-encoding DNA as a template and appropriate primers (Table II), 14 mutants of CBM2a were constructed by two-primer PCR cassette mutagenesis (see Materials and methods), including conservative (phenylalanine and tyrosine) and non-conservative (alanine) mutants of each of the three tryptophan residues, W17, W54 and W72, on the binding face. Other residues on the binding face of CBM2a that have the potential to participate in hydrogen bonds with the cellulose surface were substituted individually with alanine (Figure 3). Substitution of each of the surface tryptophan residues with tyrosine, phenylalanine or alanine reduced the affinity for BMCC, with alanine having a greater effect than the more conservative phenylalanine substitution (Table III). The affinities of the tyrosine mutants were W72Y > W17Y > W54Y, those of the phenylalanine mutants were W72F > W17F > W54F and those of the alanine mutants were W17A > W72A > W54A. The W54A mutant had the lowest affinity, two orders of magnitude lower than the wild-type, whereas the W72Y mutant had the highest affinity of the nine tryptophan mutants, about two-thirds of that of the wild-type (Table III).
Alanine was substituted for other residues on the binding face that could potentially form hydrogen bonds with cellulose. The mutation N87A reduced binding to half that of the native polypeptide; the N15A mutation reduced binding to one-fifth of native affinity. The mutations N18A and Q52A reduced binding only slightly; the mutation Q83A did not affect binding (Table V).

[Figure 3 (caption fragment): binding face of CBM2a, rendered (Koradi et al., 1996) based on the NMR solution structure (Xu et al., 1995).]

Discussion
CBM2a has a similar affinity for all forms of insoluble cellulose tested (Ka ≈ 10⁶ M⁻¹). Not surprisingly, the capacities of the various celluloses for this family 2a module vary widely, reflecting differences in total number of available binding sites (Table IV). CBM2a has the highest affinity for the crystalline cellulose preparation BMCC, whereas PASC has the greatest capacity for binding, suggesting that the acid swelling process increases the solvent-exposed surface area of crystalline cellulose. By competition experiments, we demonstrate that CBM2a and CBM4-1 bind independently to two different classes of binding sites in PASC. CBM4-1 binds soluble oligosaccharides and amorphous cellulose; it has no significant affinity for highly crystalline BMCC (Johnson et al., 1996a,b; Tomme et al., 1996). CBM4-1 binds saccharide chains within an obvious binding groove and has a maximum affinity for oligosaccharides, such as cellopentaose, with five or more pyranose rings. Binding is enthalpically driven and is fully reversible (Tomme et al., 1996). In contrast, CBM2a has a more planar binding face and does not bind small sugars or cellooligosaccharides. The thermodynamic driving force for binding is dominated by entropic effects, notably dehydration of apolar surface residues, and binding is irreversible (Creagh et al., 1996). In a preparation of PASC, if CBM4-1 binds to amorphous, disordered, individual cellulose chains, then CBM2a is presumably binding to discrete crystalline regions. Additionally, as discussed below, the observed binding capacity of BMCC for CBM2a is approximately equal to its total crystalline surface capacity. The capacity of cellulose to bind CBM2a is therefore a function of the total solvent-exposed surface area of the crystalline cellulose available to this module and does not directly correlate with the bulk crystallinity of the cellulose. Overall, BMCC has a larger proportion of crystalline cellulose than PASC; however, based on the larger capacity of PASC to bind CBM2a, a much larger proportion of the crystalline regions of PASC are solvent-exposed and available to the binding module. Based on ITC data (Creagh et al., 1996), crystalline cellulose appears to offer two classes of binding sites to CBM2a, both of relatively high affinity (10⁶-10⁷ M⁻¹). The affinity of CBM2a for BMCC, PASC and Avicel is similar; the relatively small differences reflect the proportion of high- and low-affinity binding sites of each cellulose preparation.
The single-site mutation studies indicate that binding of CBM2a involves the concerted interaction with crystalline cellulose of a cluster of solvent-exposed residues located on a planar surface region (see Figure 3) defined by the presence of three tryptophan residues (W17, W54 and W72). Each of these tryptophan residues makes a significant and different contribution to binding. W72A has a 15-fold lower affinity for BMCC than the wild-type His6-CBM2a. Replacement of W72 with either phenylalanine or tyrosine, however, results in relatively little loss of binding affinity, suggesting that the contact formed between W72 and the sorbent surface may involve hydrophobic interaction between the side chain and the apolar face of a glucopyranoside ring within the crystalline cellulose matrix.
The 100-fold loss in affinity exhibited by W54A is an order of magnitude more than that for any other mutant and reflects the essential role of this residue in binding. More conservative substitution of W54 with either phenylalanine or tyrosine reduces the binding affinity only 10-fold. Thus, like W72, W54 appears to couple to the crystalline cellulose matrix, at least in part, through favorable hydrophobic interactions. The importance of the apolar aromatic character of W54 and W72 therefore supports the ITC studies (Creagh et al., 1996), which indicate that binding is primarily driven by dehydration of residues in close contact with the crystalline cellulose surface. Dehydration of contacting sorbent and protein surfaces and the concomitant formation of strong van der Waals contacts are known to drive a wide range of specific and non-specific protein adsorption processes (Haynes and Norde, 1994). As observed for binding of CBM2a to BMCC (Creagh et al., 1996), the thermodynamic signature of an adsorption process dominated by dehydration effects is a large positive change in entropy and a large negative change in heat capacity, both due to the release of ordered water in the first and second solvation shells.
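The fold-changes in affinity quoted above can be put on an energy scale using the standard relation ΔΔG = RT ln(Ka,wt / Ka,mut). The sketch below is a worked illustration of that arithmetic at the 4°C assay temperature; the function name is hypothetical and the resulting energies are computed here for orientation only, not values reported in the text.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1

def ddg_kj_per_mol(fold_loss, temp_c=4.0):
    """Loss of binding free energy (kJ/mol) for a given fold-loss in Ka.

    ddG = R * T * ln(Ka_wt / Ka_mut); a positive value means weaker binding.
    """
    temp_k = temp_c + 273.15
    return GAS_CONSTANT * temp_k * math.log(fold_loss) / 1000.0

# Fold-losses taken from the text: W54A (~100), W72A (~15), W54F/W54Y (~10)
for label, fold in [("W54A", 100.0), ("W72A", 15.0), ("W54F/Y", 10.0)]:
    print(f"{label}: {fold:.0f}-fold loss ~ {ddg_kj_per_mol(fold):.1f} kJ/mol")
```

On this scale a 100-fold loss corresponds to roughly 10.6 kJ/mol at 4°C, exactly twice the cost of a 10-fold loss, since the energy depends on the logarithm of the affinity ratio.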
Replacement of the tryptophan at position 54 with phenylalanine leads to a 10-fold drop in affinity, in contrast to W72F, which has only a two-fold reduction from wild-type. This suggests that the pyrrole ring of W54 interacts favorably with the crystalline cellulose surface. Both W54, through its pyrrolic amine, and the cellobiose repeating unit of cellulose have the potential to form hydrogen bonds. The arrangement of β-linked glucopyranoside units in cellulose presents a uniform surface distribution of hydroxyl groups on the outside of each chain. In native cellulose the rigid β-1,4 glycosidic chain linkage positions these hydroxyl groups to allow all but one of the hydrogen bonds (per glucopyranoside) to be satisfied at both the (110) and (1-10) crystal faces through in-plane interchain interactions (see Figure 4). Thus, in the absence of local structural perturbation, cellulose I appears to offer a suitable proton acceptor for hydrogen bond formation with the pyrrolic amine of W54. The potential to form an intermolecular hydrogen bond suggests that W54 may provide binding specificity in addition to contributing to the overall driving force for adsorption to the crystalline cellulose surface owing, in large part, to dehydration of the hydrophobic indole ring and the underlying crystalline cellulose surface, which exhibits a pronounced hydrophobic character.

[Figure 4 (caption fragment): The arrangements shown allow for at least two of the tryptophan residues (highlighted in green), instrumental in binding, to interact with the edge of the pyranose rings of the staircase-like surface of crystalline cellulose. One side of the molecule is tilted towards the surface, allowing for other residues (highlighted in red) to interact transiently with the cellulose surface, forming hydrogen bonds with the available cellulose hydroxyl groups. The cellulose surface was constructed by using the structure of repeating cellobiose units arranged to approximate the proposed crystalline surface of BMCC (Gardner and Blackwell, 1974a,b; Hackney et al., 1994; Baker et al., 1997; Koyama et al., 1997).]
The importance to binding affinity of proper formation of an intramolecular hydrogen bond involving the pyrrolic amine of a surface tryptophan is suggested by mutants of W17 and N15.Replacement of the tryptophan at position 17 with either alanine or phenylalanine, neither of which has hydrogen bonding potential, results in an order of magnitude reduction in binding affinity.An equivalent reduction in affinity is observed for N15A.The similarity of the affinities of both of these mutants could be due to the necessity for an intramolecular hydrogen bond likely involving a bridging essential water molecule, between N15 and the pyrrolic amine of W17 locking W17 into a specific orientation.This argument is supported by the fact that the mutant W17Y maintains native-like affinity, presumably because the intramolecular hydrogen bonding 807 requirement for proper orientation of the side chain is at least weakly satisfied by the phenolic group of tyrosine.Intramolecular hydrogen bonding will be investigated further using spectroscopic methods.
Collectively, these results suggest that effective binding of a family 2a CBM to crystalline cellulose requires an aromatic group at position 72, a tryptophan at position 54 and either tyrosine or tryptophan, for possible hydrogen bond formation, at position 17. The tryptophan residues are well conserved among members of family 2a, consistent with the results obtained here by site-specific substitution. W72 is the least well conserved of the three surface tryptophans. In five of ~40 members, this position is occupied by tyrosine. The interchangeability of tryptophan and tyrosine at this position in CBM2a, also mirrored in other family 2a members, reinforces the role of hydrophobic interaction of an aromatic ring with the glucopyranoside rings of the cellulose surface. W54 is not effectively substituted by tyrosine, phenylalanine or alanine and it is invariant among family 2a members, indicating that this residue likely contributes to binding both through dehydration and by providing sorbent specificity via its hydrogen bonding potential. Tryptophans corresponding to W17 are also invariably conserved; the asparagine corresponding to N15 is also very well conserved (>65%), supporting the hypothesis that an intramolecular hydrogen bond between these residues is an important functional feature of family 2a modules.
The specificity of CBM2a for crystalline cellulose and the identification of amino acid residues involved in binding provide a sound basis for formulating a putative binding model. Previous models for the binding of CBMs view crystalline cellulose as composed of layers of parallel cellulose molecules, with the sugar rings in successive layers perfectly superimposed. The surface to which the CBM binds would then be an ordered array of parallel cellulose chains, with the planes of the pyranose rings parallel to the surface and fully exposed to the solvent. Binding was postulated to occur by direct stacking of the exposed aromatic residues on the binding face of the CBM onto the fully solvent-exposed pyranose rings of the (020) crystal face of the cellulose surface (Reinikainen et al., 1995; Tormo et al., 1996; Mattinen et al., 1997). The surface of crystalline cellulose, however, is more accurately described as a staircase (Figure 4) (Henrissat et al., 1988). In cross-section, viewed from one end, the crystal comprises layers with increasing numbers of parallel cellulose molecules; the molecules in each successive layer are offset slightly with respect to those in the layer above (Gardner and Blackwell, 1974b; Hackney et al., 1994; Baker et al., 1997; Koyama et al., 1997). The largest solvent-exposed surfaces, the two (110) and the two (1-10) crystal faces, comprise mostly the edges of the sugar rings (Figure 4); only the apical cellulose chains at two opposite vertices of the rectangular crystal fully expose one face of their sugar rings. BMCC comprises bundles of microfibrils. Each microfibril presents two solvent-exposed crystalline faces with cross-sectional dimensions estimated to be 15 nm [(110) face] by 40 nm [(1-10) face] (White and Brown, 1981; Kuga and Brown, 1987). Using the dimensions of the BMCC microfibril and the density of crystalline cellulose (1.5 g/cm³) (Meyer and Misch, 1937), we calculate the (110) crystal face to have a surface area of 3.34 × 10⁵ cm²/g cellulose and the (1-10) face to have a surface area of 8.91 × 10⁵ cm²/g. As calculated from the solution structure, the maximum area shadowed by a bound CBM2a molecule is 1.32 × 10⁻¹³ cm². At maximum capacity, assuming monolayer surface coverage and confluent packing of the CBM on the sorbent surfaces, the (110) crystal face could accommodate 4.2 µmol CBM2a/g BMCC and the (1-10) face could accommodate 11.2 µmol CBM2a/g BMCC. The predicted capacity of the two binding faces is therefore 15.4 µmol/g. Given that the estimates of available cellulose surface area assume maximum fiber dispersal and monolayer surface coverage, the observed capacity of 11.7 µmol/g cellulose (Table III) agrees remarkably well with this estimated maximum value. The fully exposed cellulose chains, located at the two vertices of the rectangular fibril (the obtuse edges), could accommodate at most 0.91 µmol/g cellulose, far less than the observed capacity. Therefore, CBM2a must be binding to one or both of the crystalline faces [(110) and (1-10)] of BMCC by interacting with the partially occluded sugar rings. The previous model, based on stacking onto fully exposed sugar rings, leads to a predicted capacity far less than that observed experimentally.
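The capacity estimate above can be retraced numerically. The sketch below is a reconstruction under stated assumptions rather than the authors' own calculation: it treats the microfibril as a long rectangular prism with a 15 nm by 40 nm cross-section (end faces neglected), so that the pair of opposite faces of one width has an area per gram of 2/(ρ × the other width), and it converts area to capacity by dividing by the 1.32 × 10⁻¹³ cm² footprint and Avogadro's number. The function names are illustrative. The geometric assumption gives areas within about 0.3% of the reported 3.34 × 10⁵ and 8.91 × 10⁵ cm²/g.

```python
AVOGADRO = 6.022e23      # molecules per mole
DENSITY = 1.5            # g/cm^3, crystalline cellulose (value quoted in the text)
FOOTPRINT = 1.32e-13     # cm^2 shadowed by one bound CBM2a (value quoted in the text)


def face_area_per_gram(other_side_nm, density=DENSITY):
    """Area (cm^2/g) of the two opposite faces of a long rectangular fibril.

    For a cross-section a x b and length L, the two faces of width a have
    area 2*a*L and the fibril mass is density*a*b*L, so the area per gram
    is 2 / (density * b), where b is the *other* cross-sectional side.
    Assumes a rectangular prism with end faces neglected.
    """
    other_side_cm = other_side_nm * 1e-7
    return 2.0 / (density * other_side_cm)


def capacity_umol_per_g(area_cm2_per_g, footprint_cm2=FOOTPRINT):
    """Monolayer capacity (umol/g) for confluent packing on the given area."""
    molecules_per_gram = area_cm2_per_g / footprint_cm2
    return molecules_per_gram / AVOGADRO * 1e6


# Geometric estimate: the 15 nm (110) face pairs with the 40 nm side, and vice versa.
area_110_geom = face_area_per_gram(other_side_nm=40.0)
area_1m10_geom = face_area_per_gram(other_side_nm=15.0)

# Capacities from the surface areas reported in the text.
cap_110 = capacity_umol_per_g(3.34e5)
cap_1m10 = capacity_umol_per_g(8.91e5)

print(f"(110) area, geometric estimate:  {area_110_geom:.3g} cm^2/g")
print(f"(1-10) area, geometric estimate: {area_1m10_geom:.3g} cm^2/g")
print(f"(110) capacity:  {cap_110:.1f} umol/g")
print(f"(1-10) capacity: {cap_1m10:.1f} umol/g")
print(f"total capacity:  {cap_110 + cap_1m10:.1f} umol/g")
```

With the reported surface areas, the per-face capacities and their sum round to the 4.2, 11.2 and 15.4 µmol/g given in the text.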
Tryptophan residues are involved in many protein-carbohydrate interactions (Quiocho, 1986; Vyas et al., 1988; Vyas, 1991; Rini, 1995; Weis and Drickamer, 1996; Taroni et al., 2000). The non-covalent interactions between the aromatic residues of a protein and the sugar ring are not restricted to parallel stacking; there are likely many other conformations that are both thermodynamically stable and able to mediate ligand binding in crystalline cellulose. Since most of the pyranose rings of the glucose units are partially occluded in crystalline cellulose, the tryptophans cannot stack directly on the sugar rings without a disruption of the cellulose structure. Other conformations, involving angled relationships or offset parallel stacking that allow interaction between tryptophan and the sugar rings, are therefore more likely to occur (Sun and Bernstein, 1996; Weis and Drickamer, 1996; McGaughey et al., 1998). With minimal movement from their equilibrium positions in the solution state, the surface tryptophans could form non-covalent interactions with the individual cellulose chains without substantial disruption of the crystal lattice. In CBM2a, the tryptophans essential for binding form a ridge along one face of the module. We propose that the tryptophans on this ridge bind to partially exposed cellulose chains, or 'steps', on the face of the cellulose crystal (Figure 4). The binding module tilts toward the 'staircase' so that other residues such as Q52 and N87, located alongside but slightly removed vertically from the vertex of the tryptophan ridge, are positioned to potentially form hydrogen bonds or van der Waals interactions with the groups at the edges of the cellulose chains. The module may bind along one cellulose chain or straddle several of the 'steps' that make up the cellulose surface. The arrangement of the tryptophan residues appears to allow contact with cellulose via only two of the three tryptophans at once: either W17 and W54 or W54 and W72.
The binding of CBM2a to BMCC is dominated by entropic effects (Creagh et al., 1996). Dehydration of the hydrophobic tryptophan-rich protein surface and the crystalline cellulose surfaces drives binding. The specificity of the module for the cellulose surface, via the formation of hydrogen bonds and van der Waals contacts, is provided by both the surface tryptophans (especially W54) and other residues on the binding surface. The model proposed here agrees with this mechanism of binding. With the exception of N15, residues of CBM2a immediately adjacent to the tryptophan binding ridge that have hydrogen bonding potential have little overall effect on binding affinity. A similar phenomenon has been described for CBMs from families 1 and 5 (Linder et al., 1995b; Simpson and Barras, 1999). These residues, often conserved in a family, may participate in specific intermolecular interactions between the protein and the cellulose, but do not provide the thermodynamic impetus for binding (Creagh et al., 1996). Their mutation to alanine has little or no effect on binding affinity.
The binding of CBM2a to crystalline cellulose is irreversible yet dynamic (Ong et al., 1993; Bray et al., 1996; Creagh et al., 1996). The domain moves in two dimensions over the cellulose surface without ever fully dissociating from it (Jervis et al., 1997). Sufficient multiple contacts must be maintained to prevent desorption. Since the module moves on the surface, the number of residues directly interacting with the cellulose surface must be in constant flux. For example, once CBM2a is bound to cellulose, none of the surface tryptophans is protected completely from oxidation by N-bromosuccinimide, although they are more difficult to oxidize (Bray et al., 1996). The accessibility to oxidation is consistent with a model for binding in which the tryptophans do not overlap the pyranose rings completely and only two of the three conserved tryptophans bind to BMCC at any instant. The mutation W54A reduced affinity the most, suggesting that this residue plays the major role in binding, perhaps allowing pivoting between binding by W17-W54 and W54-W72 pairs. Previous studies have demonstrated the abrogation or reduction of binding affinity when surface tryptophan residues of this and other family 2a CBMs are modified by site-directed mutation or chemical modification, clearly emphasizing the substantial role that tryptophans play in the binding of family 2a CBMs to insoluble cellulose (Poole et al., 1993; Din et al., 1994; Bray et al., 1996). Additionally, the conservation of aromatic residues, such as tyrosine and phenylalanine, on an exposed surface is common in CBMs from families 1, 3, 5 and 10. Representatives from each of these families bind to crystalline cellulose and, as with CBM2a, mutation of the exposed aromatic residues reduces the affinity for cellulose (Reinikainen et al., 1992; Nagy et al., 1998; Simpson and Barras, 1999; Ponyi et al., 2000; Raghothama et al., 2000). Many of the CBM2a variants constructed here with conservative substitutions maintain appreciable and biologically significant affinity for crystalline cellulose. We propose that all of the binding modules with affinity for crystalline cellulose share a mode of binding where the surface aromatics drive binding and make contact with the staircase-like cellulose surface.

Fig. 2. Competition binding isotherms. Binding of CBM2a to PASC alone and in competition with saturating concentrations of CBM4-1. The main panel plots the fraction bound vs the free concentration of CBM2a: filled circles indicate the binding of CBM2a alone, open circles indicate the binding of CBM2a in the presence of a saturating amount of CBM4-1 and the triangles indicate the fraction of CBM4-1 bound. The inset shows the relationship between fluorescence of Oregon Green-labeled CBM2a and the solution polypeptide concentration. The relationship between AMCA-X-labeled CBM4-1 fluorescence and polypeptide concentration is similar to that of labeled CBM2a and is not shown.

Fig. 3. Two views of a representation of CBM2a highlighting the position of the residues on the binding face that were mutated. This figure was created with MolMol (Koradi et al., 1996) based on the NMR solution structure (Xu et al., 1995).

Fig. 4. Three views each of two possible arrangements of CBM2a bound to the (110) face of crystalline cellulose. (I) CBM2a bound roughly along a single cellulose chain and (II) CBM2a bound across a number of cellulose chains. (A) Cross-section of the microfibril, with the internal cellulose chains omitted for simplicity; (B) side view of the solvent-exposed surface; (C) view along the cellulose surface. The arrangements shown allow for at least two of the tryptophan residues (highlighted in green), instrumental in binding, to interact with the edge of the pyranose rings of the staircase-like surface of crystalline cellulose. One side of the molecule is tilted towards the surface, allowing for other residues (highlighted in red) to interact transiently with the cellulose surface, forming hydrogen bonds with the available cellulose hydroxyl groups. The cellulose surface was constructed by using the structure of repeating cellobiose units arranged to approximate the proposed crystalline surface of BMCC (Gardner and Blackwell, 1974a,b; Hackney et al., 1994; Baker et al., 1997; Koyama et al., 1997).

Table I. Oligonucleotides used for the construction of the synthetic His6-CBM2a gene fragment

Table II. Oligonucleotides used for the construction of His6-CBM2a mutants. Nucleotides in bold type indicate the loci of changes made for amino acid substitution or introduction of a restriction endonuclease recognition site.

Table III. Binding of CBM2a tryptophan variants and native CBM2a to BMCC

Table IV. Binding parameters of His6-CBM2a to insoluble cellulose

Table V. Binding of CBM2a variants to BMCC