The major intrinsic proteins (MIPs) form a large protein family of ancient origin and are found in bacteria, fungi, animals, and plants. MIPs act as membrane channels that facilitate passive transport. Some MIPs allow small polar molecules such as glycerol or urea to pass through the membrane. However, most MIPs are thought to be aquaporins (AQPs), i.e., channels specific for water. Plant MIPs can be subdivided into the plasma membrane intrinsic protein, tonoplast intrinsic protein, and NOD26-like intrinsic protein subfamilies. By database mining and phylogenetic analyses, we have identified a new subfamily in plants, the Small basic Intrinsic Proteins (SIPs). Comparing SIP sequences with the conserved amino acid residues of other MIPs reveals characteristic features of SIPs. Possible functional consequences of these features are discussed in relation to the recently solved structures of AQP1 and GlpF. We suggest that substitutions at conserved and structurally important positions imply a different substrate specificity for the new subfamily.
Major intrinsic proteins (MIPs) are integral membrane proteins that facilitate the passive transport of small polar molecules across membranes. In many membranes, MIPs are very abundant and constitute the major membrane protein. Phylogenetic comparisons have divided plant MIPs into three subfamilies (Weig, Deswartes, and Chrispeels 1997; Johansson et al. 2000). Two of the subfamilies, plasma membrane intrinsic proteins (PIPs) and tonoplast intrinsic proteins (TIPs), are named according to the subcellular location of the proteins. The third group shows similarity to NOD26, a nodulin expressed in the peribacteroid membrane that surrounds the symbiotic nitrogen-fixing bacteria in soybean root nodules (Fortin, Morrison, and Verma 1987). These proteins are called NOD26-like MIPs (NLMs; Weig, Deswartes, and Chrispeels 1997) or NOD26-like intrinsic proteins (NIPs; Heymann and Engel 1999).
Most of the plant MIPs that have been tested function as water channels (for recent reviews, see Johansson et al. 2000 and Santoni et al. 2000). However, some plant MIPs show a broader permeability. For example, the aquaporin (AQP) Nt-AQP1, which belongs to the PIPs, transports water, glycerol, and urea, whereas Nt-TIPa mainly transports urea but also water and glycerol. Several NIPs have been shown to transport both water and glycerol (Dean et al. 1999; Weig and Jakob 2000). The recent determination of the structures of the water-specific AQP1 and the glycerol-specific GlpF, at resolutions of 3.8 and 2.2 Å, respectively, has identified residues that are important determinants of substrate specificity (Fu et al. 2000; Murata et al. 2000). These structures provide valuable information for predicting the substrate specificity of new forms of MIPs.
The Arabidopsis genome sequencing project has revealed many new genes (Arabidopsis Genome Initiative 2000). The new genes are automatically annotated according to their best matches in similarity searches, and in most cases this annotation provides a clue to their function. However, when the evolutionary distance to a well-characterized protein is large, this fast annotation can fail or be outright misleading. For example, accession AAC26712 has been annotated as a PIP, although it clearly belongs to the recently discovered NIP subfamily (NIP2;1; Johanson et al. 2001). To annotate genomic or EST accessions correctly at a reasonable resolution, it is important to perform phylogenetic analyses of protein families that identify the major branches of the phylogenetic tree. The aim of this paper is to present characteristic features of the Small basic Intrinsic Proteins (SIPs), a new plant MIP subfamily.
Materials and Methods
SIP genes and ESTs were identified by BLAST or TBLASTN searches at the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov:80/blast/blast.cgi) or at The Arabidopsis Information Resource (TAIR; www.arabidopsis.org/blast/).
MacVector 7.0 (Oxford Molecular Ltd, U.K.) was used to translate sequences and to calculate the pIs and molecular weights presented in figure 2. ClustalW (Thompson, Higgins, and Gibson 1994), included in MacVector 7.0, was used to generate multiple alignments of the translated sequences with the BLOSUM matrix and slow mode. The gap opening and gap extension penalties were set to 10.0 and 0.05, respectively. Alignments were manually inspected and adjusted to match conserved residues and to avoid gaps in transmembrane helices (Heymann and Engel 2000). The alignment of SIPs to GlpF and AQP1 is unambiguous in all helices except helices 2 and 5, which are harder to align because these regions lack conserved residues shared between SIPs and other MIPs.
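MacVector computed the molecular weights; as a rough illustration of what such a calculation involves, the sketch below sums average residue masses and adds one water for the free termini. The mass table is an assumption (typical published average residue masses), not MacVector's internal table, so results may differ slightly; `molecular_weight` is a hypothetical helper and the peptide is a toy placeholder.

```python
# Assumed average residue masses in daltons (free amino acid minus one water).
# Typical published values; MacVector's internal table may differ slightly.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water restores the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide (hypothetical helper)."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol left by translation
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Toy peptide, not a real SIP sequence.
    print(round(molecular_weight("MSAKK*"), 2))
```

Post-translational modifications are ignored, so this gives only the mass of the bare polypeptide chain.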
PAUP* 4.0b4a (Swofford 2000) was used for the phylogenetic analyses. The cytoplasmic N- and C-termini, defined on the basis of the alignment with AQP1 and GlpF as characters 1–92 and 336–374, respectively, were excluded from the analyses of the 374-character alignment. Bootstrap tests used 100 replicates, each analyzed with the full heuristic search option.
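PAUP* performs the character exclusion and bootstrap resampling internally. The sketch below only illustrates those two steps, assuming the alignment is held as a Python dict of equal-length gapped strings; the function names and the toy data are hypothetical. It does not build trees: in a real bootstrap test, each replicate would be passed to a full heuristic tree search and clade frequencies would be tallied across the 100 replicates.

```python
import random

# Column ranges (1-based, inclusive) of the 374-character alignment that the
# text defines as cytoplasmic termini and excludes from the analyses.
N_TERMINUS = (1, 92)
C_TERMINUS = (336, 374)
ALIGNMENT_LENGTH = 374


def trim_termini(alignment: dict[str, str]) -> dict[str, str]:
    """Keep only columns 93-335 (1-based) of every aligned sequence."""
    if any(len(seq) != ALIGNMENT_LENGTH for seq in alignment.values()):
        raise ValueError(f"expected a {ALIGNMENT_LENGTH}-column alignment")
    start = N_TERMINUS[1]      # 0-based index of column 93
    end = C_TERMINUS[0] - 1    # exclusive slice end = 0-based index of column 336
    return {name: seq[start:end] for name, seq in alignment.items()}


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one nonparametric bootstrap replicate)."""
    ncol = len(next(iter(alignment.values())))
    columns = [rng.randrange(ncol) for _ in range(ncol)]
    # The same column indices are applied to every sequence, so sites stay paired.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    toy = {"taxonA": "A" * 374, "taxonB": "C" * 374}  # hypothetical placeholder data
    core = trim_termini(toy)
    replicates = [bootstrap_replicate(core, rng) for _ in range(100)]
    print(len(core["taxonA"]), len(replicates))
```

Resampling whole columns, rather than individual residues, is what keeps each replicate a valid alignment of the same taxa.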
Results and Discussion
Phylogenetic Analysis and ESTs
BLAST searches of the Arabidopsis genomic sequence (Arabidopsis Genome Initiative 2000) reveal three novel MIP-like genes, T6K12.29, MRG7.25, and F24I3.30, all very different from the previously described plant MIP genes of the PIP, TIP, and NIP subfamilies (U. Johanson at the MIP 2000 meeting in Göteborg, Sweden, unpublished data). Phylogenetic comparison with Arabidopsis representatives of these three MIP subfamilies shows that the new proteins form a distinct, new subfamily of MIPs (fig. 1). T6K12.29 and MRG7.25 are more similar to each other than to F24I3.30. We suggest that these proteins be named SIP1;1, SIP1;2, and SIP2;1, respectively (see below and Johanson et al. 2001). Matching Arabidopsis ESTs verify that all three new genes are expressed and therefore likely to be fully functional (table 1). Judging from the low number of corresponding ESTs (2–6 per gene), the overall expression levels of the new genes are likely to be much lower than those of most PIPs and TIPs, which have very abundant mRNAs (Weig, Deswartes, and Chrispeels 1997).
ESTs encoding SIP-like proteins can be identified not only in Arabidopsis and other dicotyledons but also in several monocotyledons, one conifer, and one fern species (Ceratopteris richardii). This suggests that SIPs are widely distributed in the plant kingdom and can be found at least in all higher vascular plants. On the basis of the partial EST sequences, most can be classified as SIP1s; only one (Glycine max AW351321) is definitely of the SIP2 type (data not shown). According to the ESTs, SIP2s are expressed preferentially in roots, whereas SIP1s show no obvious expression pattern. Very recently, a phylogenetic analysis of MIPs identified among 470,000 maize ESTs revealed that maize has at least one SIP2 gene in addition to two SIP1 genes (Chaumont et al. 2001).
Characteristics of SIPs
We have previously noted that the different MIP subfamilies tend to have characteristic biochemical properties (Johansson et al. 2000). For instance, TIPs are in general smaller and more acidic than PIPs or NIPs. Proteins of the new subfamily are small like TIPs but, unlike TIPs, are highly basic (fig. 2). Hence, we suggest the name SIPs for this new MIP subfamily. Their small size is mainly due to a very short cytosolic N-terminal region compared with the other plant MIPs; the N-terminal region of SIPs is even shorter than that of TIPs and similar to that of AqpZ from Escherichia coli. The high isoelectric point of SIPs is partly caused by runs of lysines in the C-terminal region. Unlike those in the basic PIPs and NIPs, these basic residues are not part of any evident phosphorylation site (Johansson et al. 2000).
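The link between a run of C-terminal lysines and a high pI can be illustrated with a standard Henderson-Hasselbalch estimate of net charge, solved for zero charge by bisection. This is a sketch only: the pKa table is an assumption (close to one commonly used set), tools such as MacVector use their own values, and the sequences are toy placeholders rather than real SIPs. `net_charge` and `isoelectric_point` are hypothetical helpers.

```python
# Assumed pKa values (close to the set used by EMBOSS). Tools differ in their
# pKa tables, so these pIs will not exactly match MacVector's.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERM)) - 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in protein.upper():
        if aa in PKA_POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Net charge falls monotonically with pH, so a positive charge means pI > mid.
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    core = "MSAVGLLAFNPT"  # hypothetical toy core, not a real SIP sequence
    print(round(isoelectric_point(core), 2), round(isoelectric_point(core + "KKKKK"), 2))
```

Appending a short lysine run to an otherwise neutral toy core shifts its estimated pI by several pH units, which mirrors the qualitative effect described for the SIP C-termini.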
Heymann and Engel (2000) have compiled and aligned a total of 164 MIPs from bacteria, fungi, animals, and plants. On the basis of 46 type sequences, they identified highly conserved amino acid residues. This high degree of conservation suggests that all MIPs share a common fold and that the conserved residues have a crucial role in the structure and function of most MIPs. Indeed, the recently published structures of the human water channel AQP1 and the bacterial glycerol facilitator GlpF are very similar, despite their different substrate specificities and the long evolutionary distance between humans and bacteria (Fu et al. 2000; Murata et al. 2000). The SIPs clearly belong to the MIP family because they have most of the conserved amino acid residues found in other MIPs. However, SIPs deviate from other MIPs at several interesting positions, and these deviations are the focus of this paper.
The predicted topology of SIPs conforms to the general structure of MIPs: six transmembrane helices plus two shorter half-helices that together form a seventh transmembrane region and meet at the two NPA or NPA-like motifs (fig. 3). Thus, the overall fold of SIPs is likely to be similar to that of other MIPs such as AQP1 or GlpF. Both AQP1 and GlpF form right-handed helical bundles around the pore, with a constriction close to the asparagines of the NPA boxes. In GlpF, the interior of the channel is mainly hydrophobic, except for a helical polar stripe running through the channel that provides sites of hydrophilic interaction for the hydroxyl groups of glycerol. The polar stripe consists of backbone carbonyls from the extended chains preceding the two short half transmembrane helices, the asparagine side chains of the two NPA boxes, and a conserved arginine (R206) in the second half transmembrane helix (Fu et al. 2000).
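To locate candidate NPA or NPA-like boxes in a new sequence before aligning it, a simple pattern scan is one option. The sketch below assumes that "NPA-like" means N and P followed by any residue; that definition, the function name, and the toy sequence are illustrative assumptions, not the criteria used in this work. A scan of this kind can also return chance matches elsewhere in a protein, so the two boxes must still be confirmed by their positions in the alignment with AQP1 and GlpF.

```python
import re

# Assumption for illustration: treat "N, P, then any residue" as NPA-like.
# Real motif assignment relies on the alignment to AQP1 and GlpF, not on a regex.
NPA_LIKE = re.compile(r"NP[A-Z]")


def find_npa_boxes(protein: str) -> list[tuple[int, str, bool]]:
    """Return (1-based position, motif, is_canonical_NPA) for each NPA-like match."""
    return [
        (m.start() + 1, m.group(), m.group() == "NPA")
        for m in NPA_LIKE.finditer(protein.upper())
    ]


if __name__ == "__main__":
    toy = "GGNPAGGGNPTGG"  # hypothetical sequence, not a real SIP
    for pos, motif, canonical in find_npa_boxes(toy):
        print(pos, motif, "canonical" if canonical else "NPA-like")
```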
Table 2 lists some structurally important positions at which SIPs differ from GlpF and AQP1, and figure 4 shows some of these positions in the structure of GlpF. In general, the amino acid residues at these positions are conserved among non-SIP MIPs. Several deviations from the MIP consensus cluster at positions that affect loop B and helix B. Helix 1 contains a glutamate (E14) that is highly conserved among MIPs; the corresponding residue in all SIPs investigated is aspartate. In GlpF, the carboxyl group of E14 holds loop B in position by forming a hydrogen bond to the backbone NH of H66 (Fu et al. 2000). Because the aspartate side chain is one methylene group shorter than that of glutamate, maintaining the same interaction in SIPs would change the position of loop B relative to helix 1 and might result in a wider cytoplasmic vestibule. Three residues before H66, GlpF has a conserved S63. The corresponding S71 in AQP1 is thought to further stabilize loop B by hydrogen bonding to Y97 in helix 3. This interaction is not possible in SIPs, where the corresponding residues are G and R, respectively, although other types of interaction cannot be excluded. Positions A70–T72 are substituted in most of the SIPs examined. These positions are part of the core close to N68 in the first NPA box and may be important for the correct positioning of N68. D207 and P210 probably have a corresponding function in helix E, influencing the position of N203 in the second NPA box. The highly conserved R206, also in helix E, is absent from SIPs. In GlpF, the arginine side chain is part of a glycerol-binding site that has been proposed to form part of a selectivity filter responsible for glycerol specificity (Fu et al. 2000). Interestingly, mutating the corresponding R189 in the bacterial water channel AqpZ to V or S abolishes water transport (Borgnia et al. 1999). F200 is not conserved among MIPs and is replaced by P in most SIPs.
All these changes may affect the specificity of the channel, because the positions of the carbonyls of H66 and F200, together with the side chains of N68, N203, and R206, determine the geometry of part of the polar stripe inside the channel. Furthermore, some of the hydrophobic residues lining the GlpF channel are changed in SIPs and may in some cases form hydrogen bonds to a transported molecule. Particularly interesting is the threonine found in SIPs at the position corresponding to I187 in GlpF: this residue might disrupt the hydrophobic back of the channel and provide a novel polar patch directly opposite N68 and N203, again suggesting that SIPs differ in specificity from other MIPs.
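Comparisons such as those in table 2 amount to reading off, for each numbered GlpF (or AQP1) position, which residue a SIP has in the same alignment column. A minimal sketch of that lookup, assuming gapped sequences stored as equal-length strings with '-' for gaps; the function names and the toy alignment are hypothetical, and the result is only as reliable as the alignment itself, which is least certain in helices 2 and 5.

```python
def residue_at(aligned_ref: str, aligned_query: str, ref_pos: int) -> str:
    """Query residue aligned to residue number ref_pos (1-based) of the reference.

    Returns '-' if the query has a gap there. Assumes the aligned reference starts
    at its residue 1, so that counting non-gap characters reproduces its numbering.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            count += 1
            if count == ref_pos:
                return query_char
    raise IndexError(f"reference has fewer than {ref_pos} residues")


def compare_positions(
    aligned_ref: str, aligned_query: str, positions: list[int]
) -> dict[int, tuple[str, str]]:
    """Map each reference position to (reference residue, query residue)."""
    return {
        pos: (residue_at(aligned_ref, aligned_ref, pos), residue_at(aligned_ref, aligned_query, pos))
        for pos in positions
    }


if __name__ == "__main__":
    # Hypothetical toy alignment; real use would take full-length GlpF and a SIP.
    ref, query = "MEH-K", "MDHSR"
    print(compare_positions(ref, query, [2, 4]))
```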
In GlpF, the aromatic W48 and F200 form a hydrophobic corner that interacts with the alkyl chain of glycerol (Fu et al. 2000). In SIPs, they are replaced by the smaller, nonaromatic L or S and P, respectively. The most conserved amino acid residues in helices 2 and 5 are G49 and G184, respectively. These glycines allow close packing of the two helices. In SIPs, they are substituted by L and S, respectively, which will affect the packing of helices 2 and 5. Together, these changes could result in a wider pore. In addition, the less-conserved P180 in helix 5 is also involved in the interaction with helix 2, because it protrudes from helix 5 and fits between I56 and Y57 of helix 2. The effect of the polar threonine found at this position in SIPs is hard to predict. It might affect the packing of helices 2 and 5, but its hydroxyl group might also point toward the cytoplasmic vestibule, providing another hydrophilic interaction site for the substrate. However, because SIPs lack the conserved glycines, the exact alignment of SIPs to GlpF and AQP1 in helices 2 and 5 is less certain than in other regions.
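The two opposing trends described here, loss of bulk at the aromatic corner and gain of bulk at the glycines that normally permit close helix packing, can be made concrete with approximate residue volumes. The volume table is an assumption (commonly tabulated approximate values), the helper name is hypothetical, and volume differences alone cannot predict pore width; the sketch only puts rough numbers on the substitutions named in the text.

```python
# Approximate residue volumes in cubic angstroms.
# Assumed, commonly tabulated values; used here only for relative comparisons.
RESIDUE_VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}

# GlpF residue -> residue found in SIPs, as described in the text
# (for W48, one of the two reported SIP residues, L, is used).
SUBSTITUTIONS = {"W48": "L", "F200": "P", "G49": "L", "G184": "S"}


def volume_change(glpf_label: str, sip_residue: str) -> float:
    """Volume of the SIP residue minus that of the GlpF residue (first letter of the label)."""
    return RESIDUE_VOLUME[sip_residue] - RESIDUE_VOLUME[glpf_label[0]]


if __name__ == "__main__":
    for label, sip_aa in SUBSTITUTIONS.items():
        delta = volume_change(label, sip_aa)
        print(f"{label}{sip_aa}: {delta:+.1f} A^3")
```

Negative values at W48 and F200 and positive values at G49 and G184 match the qualitative picture in the paragraph; how these shifts combine in the folded protein cannot be inferred from volumes alone.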
In summary, SIPs differ at many important positions that are likely to affect the interior properties of the channel. Both the width of the pore and the pattern of hydrophilic and hydrophobic surfaces inside the channel may be altered in SIPs compared with other MIPs. This suggests that SIPs have a substrate specificity different from that of both AQP1 and GlpF. Sequence comparisons have previously suggested five positions, P1–P5, as the main determinants of substrate specificity for water and glycerol channels (Froger et al. 1998). Remarkably, mutating YW to PL at P4–P5 has been shown to be sufficient to convert one water channel, AQPcic, into a glycerol facilitator (Lagrée et al. 1999). In this context, it is interesting that all SIPs carry the canonical aquaporin pair, YW, at the corresponding positions.
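A rule based only on P4–P5, reduced to the two pairs mentioned here, shows why the SIP case is instructive. The sketch below is a deliberately crude, hypothetical classifier built from the YW (aquaporin) and PL (glycerol facilitator) pairs alone; it is not an established predictor and ignores P1–P3 entirely.

```python
def p4_p5_type(p4: str, p5: str) -> str:
    """Crude label from the residues at P4-P5 (illustrative heuristic only)."""
    pair = (p4 + p5).upper()
    if pair == "YW":
        return "aquaporin-like"
    if pair == "PL":
        return "glycerol-facilitator-like"
    return "unclassified"


if __name__ == "__main__":
    # SIPs carry YW at P4-P5, so this rule alone would call them aquaporin-like.
    print(p4_p5_type("Y", "W"))
```

Applied to SIPs, such a rule returns "aquaporin-like", so on its own it cannot capture the deviations at other positions discussed above, which point to a different specificity.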
Function and Intracellular Location of SIPs
We have identified a new type of MIP that is very different from previously identified plant MIPs. To elucidate the function of SIPs, it will be important to determine their intracellular localization, expression pattern, and substrate specificity; experiments addressing these questions are in progress. In the meantime, SIPs should be recognized as a new subfamily of plant MIPs and not confused with other MIPs, because they most likely have a function different from that of all previously characterized MIPs.
Claudia Kappen, Reviewing Editor
Keywords: MIPs, water channels, aquaporins, glycerol facilitators, SIPs
Address for correspondence and reprints: Urban Johanson, Department of Plant Biochemistry, P.O. Box 124, S-221 00 Lund, Sweden. firstname.lastname@example.org .
This work was funded by NFR, SJFR and EU-Biotech program (BIO4-CT98-0024). We are grateful to Per Kjellbom and Salam Al-Karadaghi for critical reading and helpful suggestions on the manuscript.