## Abstract

The major intrinsic proteins (MIPs) form a large protein family of ancient origin and are found in bacteria, fungi, animals, and plants. MIPs act as channels in membranes to facilitate passive transport across the membrane. Some MIPs allow small polar molecules like glycerol or urea to pass through the membrane. However, the majority of MIPs are thought to be aquaporins (AQPs), i.e., they are specific for water transport. Plant MIPs can be subdivided into the plasma membrane intrinsic protein, tonoplast intrinsic protein, and NOD26-like intrinsic protein subfamilies. By database mining and phylogenetic analyses, we have identified a new subfamily in plants, the Small basic Intrinsic Proteins (SIPs). Comparisons of sequences from the new subfamily with conserved amino acid residues in other MIPs reveal characteristic features of SIPs. Possible functional consequences of these features are discussed in relation to the recently solved structures of AQP1 and GlpF. We suggest that substitutions at conserved and structurally important positions imply a different substrate specificity for the new subfamily.

Major intrinsic proteins (MIPs) are integral membrane proteins that facilitate the passive transport of small polar molecules across membranes. In many membranes, MIPs are very abundant and constitute the major membrane protein. Plant MIPs have been classified into three different subfamilies in phylogenetic comparisons (Weig, Deswartes, and Chrispeels 1997; Johansson et al. 2000). Two of the subfamilies, plasma membrane intrinsic proteins (PIPs) and tonoplast intrinsic proteins (TIPs), are named according to the subcellular location of the proteins. The third group of MIPs shows similarity with NOD26, a nodulin expressed in the peribacteroid membrane surrounding the symbiotic nitrogen-fixing bacteria in nodules of soybean roots (Fortin, Morrison, and Verma 1987). These proteins are called NOD26-like MIPs (NLMs; Weig, Deswartes, and Chrispeels 1997) or NOD26-like intrinsic proteins (NIPs; Heymann and Engel 1999). Most of the plant MIPs that have been tested function as water channels (for recent reviews see Johansson et al. 2000 and Santoni et al. 2000). However, some plant MIPs show a broader permeability. For example, Nt-AQP1 (an aquaporin, AQP) belongs to the PIPs and transports water, glycerol, and urea, whereas Nt-TIPa mainly transports urea but also water and glycerol. Several NIPs have been shown to transport both water and glycerol (Dean et al. 1999; Weig and Jakob 2000). The recent determination of the structures of the water-specific AQP1 and the glycerol-specific GlpF, to resolutions of 3.8 and 2.2 Å, respectively, has identified residues that are important determinants of substrate specificity (Fu et al. 2000; Murata et al. 2000). These structures provide valuable information for predicting the substrate specificity of new forms of MIPs.
The genome sequencing project of Arabidopsis has revealed many new genes (Arabidopsis Genome Initiative 2000). The new genes are automatically annotated according to their best matches in similarity searches, and in most cases this annotation provides a clue to their function. However, this fast annotation sometimes fails or is directly misleading, for example, when the distance to a well-characterized protein is too great. Accession AAC26712, for instance, has been annotated as a PIP, although it clearly belongs to the recently discovered NIP subfamily (NIP2;1, Johanson et al. 2001). To annotate genomic or EST accessions correctly and to a reasonable resolution, it is important to perform phylogenetic analyses of protein families and identify the major branches in the phylogenetic tree. The aim of this paper is to present characteristic features of the Small basic Intrinsic Proteins (SIPs), a novel plant MIP subfamily.

## Materials and Methods

SIP genes and ESTs were found by BLAST or TBLASTN searches at the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov:80/blast/blast.cgi) or at The Arabidopsis Information Resource (TAIR; www.arabidopsis.org/blast/). MacVector 7.0 (Oxford Molecular Ltd, U.K.) was used to translate sequences and to calculate the pIs and molecular weights presented in figure 2. ClustalW (Thompson, Higgins, and Gibson 1994), included in MacVector 7.0, was used to generate multiple alignments of translated sequences using the blosum matrix and slow mode. The open gap penalty and extend gap penalty were set to 10.0 and 0.05, respectively. Alignments were manually inspected and adjusted to fit conserved residues and to avoid gaps in transmembrane helices (Heymann and Engel 2000). The alignment of SIPs to GlpF and AQP1 is unambiguous in all helices except helices 2 and 5. These regions are harder to align because they lack conserved residues shared between SIPs and other MIPs.

PAUP*4.0b4a (Swofford 2000) was used in phylogenetic analyses. Cytoplasmic N- and C-termini were excluded from the phylogenetic analyses of the 374-character–long alignment. Cytoplasmic N- and C-termini were defined as characters 1–92 and 336–374, respectively, based on the alignment with AQP1 and GlpF. One hundred bootstrap replicates were performed in bootstrap tests using the full heuristic search option.
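The column exclusion and bootstrap resampling described above can be illustrated with a minimal Python sketch. It is not the PAUP* procedure itself: the toy alignment, the function names, and the use of Python's `random` module are assumptions made for illustration, and only the column ranges (1–92 and 336–374 of 374) and the replicate count come from the text.

```python
import random

ALIGNMENT_LENGTH = 374          # total alignment length stated in the text
N_TERMINUS = (1, 92)            # excluded characters, 1-based inclusive
C_TERMINUS = (336, 374)         # excluded characters, 1-based inclusive


def retained_columns(length=ALIGNMENT_LENGTH, excluded=(N_TERMINUS, C_TERMINUS)):
    """Return 0-based indices of alignment columns outside the excluded ranges."""
    dropped = set()
    for start, end in excluded:
        dropped.update(range(start - 1, end))   # convert 1-based inclusive to 0-based
    return [i for i in range(length) if i not in dropped]


def select_columns(alignment, columns):
    """Build a new alignment containing only the given columns, in order."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap pseudo-alignment)."""
    length = len(next(iter(alignment.values())))
    sampled = [rng.randrange(length) for _ in range(length)]
    return select_columns(alignment, sampled)


# Hypothetical toy alignment: real input would be the aligned MIP sequences.
toy_alignment = {
    "seqA": "A" * 374,
    "seqB": "G" * 92 + "C" * 243 + "T" * 39,
}

columns = retained_columns()
trimmed = select_columns(toy_alignment, columns)

# One hundred pseudo-alignments; each would then be passed to a tree search.
replicates = [bootstrap_replicate(trimmed, random.Random(seed)) for seed in range(100)]
```

With the stated ranges, 243 characters (93–335) remain for tree building. Each bootstrap replicate has the same length as the trimmed alignment, and the support value for a branch is the fraction of replicate trees that contain it.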

ESTs encoding SIP-like proteins can be identified not only in Arabidopsis and other dicotyledons but also in several monocotyledons, one conifer, and one fern species (Ceratopteris richardii). This suggests that SIPs are widely distributed in the plant kingdom and can be found at least in all higher vascular plants. On the basis of the partial sequences of the ESTs, most of them can be classified as SIP1s; only one (Glycine max AW351321) is definitely of the SIP2 type (data not shown). According to the ESTs, SIP2s are expressed preferentially in roots, whereas there is no obvious pattern in the expression of SIP1s. Very recently, a phylogenetic analysis of different MIPs identified in 470,000 maize ESTs revealed that maize also has at least one SIP2 gene in addition to two SIP1 genes (Chaumont et al. 2001).

### Characteristics of SIPs

We have previously noted that the different MIP subfamilies tend to have characteristic biochemical properties (Johansson et al. 2000). For instance, TIPs are in general smaller and more acidic than PIPs or NIPs. Proteins of the new subfamily are small like TIPs but differ from them in being highly basic (fig. 2). Hence, we suggest the name SIPs for this new MIP subfamily. The main reason for their small size is a very short cytosolic N-terminal region compared with the other plant MIPs. The N-terminal region in SIPs is even shorter than in TIPs but similar to that of AqpZ from Escherichia coli. The high isoelectric point of SIPs is partly caused by runs of lysines in the C-terminal region. Unlike in the basic PIPs and NIPs, these basic residues are not part of any evident phosphorylation site (Johansson et al. 2000).

Heymann and Engel (2000) have compiled and aligned a total of 164 different forms of MIPs from bacteria, fungi, animals, and plants. On the basis of 46 different type sequences they identified highly conserved amino acid residues. The high degree of conservation suggests that all MIPs have a common fold, and that the conserved residues have a crucial role in the structure and function of most MIPs. The recently published structures of the human water channel AQP1 and the bacterial glycerol facilitator GlpF are indeed very similar, despite the difference in substrate specificity and the long evolutionary distance between man and bacteria (Fu et al. 2000; Murata et al. 2000). The SIPs clearly belong to the MIP family because they have most of the conserved amino acid residues found in other MIPs. However, SIPs show several interesting deviations from other MIPs, and these deviations are the focus of this paper.
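The "small and basic" criterion discussed above can be sketched with a few lines of Python. This is only a rough proxy and not the MacVector pI calculation used in the paper: it counts basic (K, R) against acidic (D, E) residues instead of computing an isoelectric point, and the two sequences are invented placeholders, not real MIP sequences.

```python
import re

BASIC_RESIDUES = set("KR")     # positively charged near neutral pH
ACIDIC_RESIDUES = set("DE")    # negatively charged near neutral pH


def charge_balance(sequence):
    """Count of basic residues minus count of acidic residues (crude basicity proxy)."""
    basic = sum(residue in BASIC_RESIDUES for residue in sequence)
    acidic = sum(residue in ACIDIC_RESIDUES for residue in sequence)
    return basic - acidic


def longest_lysine_run(sequence):
    """Length of the longest uninterrupted stretch of lysines (K)."""
    runs = re.findall(r"K+", sequence)
    return max((len(run) for run in runs), default=0)


def summarize(name, sequence):
    """Collect length, charge balance, and longest lysine run for one sequence."""
    return {
        "name": name,
        "length": len(sequence),
        "charge_balance": charge_balance(sequence),
        "longest_lysine_run": longest_lysine_run(sequence),
    }


# Hypothetical sequences for illustration only.
examples = {
    "basic_like": "MGAVLLSAFGKKKKRA",
    "acidic_like": "MDEEAVLLSAFGDEA",
}

summaries = [summarize(name, seq) for name, seq in examples.items()]
```

A positive charge balance together with a long lysine run near the C-terminus would be consistent with the pattern described for SIPs, but a proper comparison needs a real pI calculation with explicit pKa values.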

The predicted topology of SIPs conforms to the general structure of MIPs, with six transmembrane helices and two shorter transmembrane helices that together form a seventh transmembrane region connected by two NPA or NPA-like motifs (fig. 3). Thus, the overall fold of the SIPs is likely to be similar to that of other MIPs like AQP1 or GlpF. Both AQP1 and GlpF form right-handed helical bundles around the pore with a narrow part close to the asparagines of the NPA boxes. In GlpF, the interior of the channel is mainly hydrophobic, with the exception of a helical polar stripe running through the channel that provides sites of hydrophilic interaction for the hydroxyl groups of glycerol. The polar stripe consists of carbonyls from the backbone of the extended chain preceding the two short half transmembrane helices, the side chains of the asparagines of the two NPA boxes, and a conserved arginine (206) in the second half transmembrane helix (Fu et al. 2000).

Some structurally important positions where SIPs differ from GlpF and AQP1 are listed in table 2, and some of these positions are shown in the structure of GlpF in figure 4. In general, the amino acid residues at these positions are conserved among non–SIP-MIPs. Several deviations from the MIP consensus are clustered in positions that affect loop B and helix B. In helix 1, there is a very conserved glutamate (E14) among MIPs. The corresponding amino acid in all SIPs investigated is aspartate. In GlpF, the carboxyl group of E14 is important in fixing loop B in the right position by forming a hydrogen bond to the backbone NH of H66 (Fu et al. 2000). If the same interaction were maintained in SIPs, the shorter aspartate side chain would change the position of loop B relative to helix 1, which might result in a wider cytoplasmic vestibule. Three residues before H66 in GlpF there is a conserved S63. The corresponding S71 in AQP1 is thought to further stabilize loop B by hydrogen bonding to Y97 in helix 3. This interaction is not possible in SIPs, where the corresponding residues are G and R, respectively, although other types of interactions cannot be excluded. Positions A70–T72 are substituted in most of the SIPs examined. These positions are part of the core close to N68 in the first NPA box and may be of importance for the correct positioning of N68. D207 and P210 probably have a corresponding function in helix E, in that they influence the position of N203 in the second NPA box. The highly conserved R206, also in helix E, is not present in SIPs. In GlpF, the arginine side chain is part of a glycerol-binding site which has been proposed to constitute part of a selectivity filter responsible for the glycerol specificity (Fu et al. 2000). Interestingly, a V or S mutation of the corresponding R189 in the bacterial water channel AqpZ abolishes water transport (Borgnia et al. 1999). F200 is not conserved among MIPs and is replaced by P in most SIPs. All these changes may affect the specificity of the channel because the positions of the carbonyls of H66 and F200, together with the side chains of N68, N203, and R206, determine the geometry of part of the polar strip inside the channel. Furthermore, some of the hydrophobic residues inside the GlpF channel are changed in SIPs and may in some cases form hydrogen bonds to a transported molecule. In particular, the threonine in SIPs at the position corresponding to I187 in GlpF is interesting because this residue might disrupt the hydrophobic back and provide a novel polar patch directly opposite N68 and N203, again suggesting that SIPs have a different specificity compared with other MIPs.
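The position-by-position comparison above depends on mapping a residue number in a reference protein (such as E14 in GlpF) to a column of the alignment and then reading the residue a SIP carries in that column. The Python sketch below illustrates that bookkeeping. The short aligned strings are invented placeholders, the function names are hypothetical, and residue numbering is assumed to start at 1 at the first residue of the aligned reference, which may not match the numbering of a deposited structure.

```python
def column_of_residue(aligned_reference, residue_number):
    """Map a 1-based residue number of the ungapped reference to a 0-based alignment column."""
    count = 0
    for column, character in enumerate(aligned_reference):
        if character != "-":            # gaps do not consume residue numbers
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"reference has fewer than {residue_number} residues")


def compare_positions(aligned_reference, aligned_query, residue_numbers):
    """Return {residue_number: (reference_residue, query_residue)} for each position."""
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    comparison = {}
    for number in residue_numbers:
        column = column_of_residue(aligned_reference, number)
        comparison[number] = (aligned_reference[column], aligned_query[column])
    return comparison


def substitutions(comparison):
    """Keep only positions where the query differs from the reference."""
    return {n: pair for n, pair in comparison.items() if pair[0] != pair[1]}


# Hypothetical toy alignment: "-" marks a gap.
reference = "ME-GRA"     # ungapped reference residues: M1 E2 G3 R4 A5
query = "MDLG-A"

result = compare_positions(reference, query, [2, 3, 4])
changed = substitutions(result)
```

In this toy case position 2 reads E in the reference and D in the query, mirroring the glutamate-to-aspartate change described for helix 1, while position 4 falls in a gap of the query. In helices 2 and 5, where the alignment itself is uncertain, any such mapping inherits that uncertainty.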

In GlpF, the aromatic W48 and F200 form a hydrophobic corner that interacts with the alkyl chain of glycerol (Fu et al. 2000). In SIPs, they are replaced by the smaller nonaromatic L or S and P, respectively. The most conserved amino acid residues in helices 2 and 5 are G49 and G184, respectively. These residues allow a close packing of these two helices. In SIPs, the G residues are substituted by L and S, respectively, which will affect the packing of helices 2 and 5. The result of these changes could be a wider pore. In addition, the less-conserved P180 in helix 5 is also involved in the interaction with helix 2 because it protrudes from helix 5 and fits nicely in between I56 and Y57 of helix 2. It is hard to predict the effects of the polar threonine at this position in the SIPs. It might affect the packing of helices 2 and 5, but it is also possible that the hydroxyl group points toward the cytoplasmic vestibule, thereby providing another hydrophilic interaction site for the substrate. However, it is important to realize that the exact alignment of SIPs to GlpF and AQP1 in helices 2 and 5 is not as clear as in other regions because SIPs lack the conserved glycines. In summary, the SIPs are different at many important positions that are likely to affect the interior properties of the channel. Both the width of the pore and the hydrophilic-hydrophobic pattern inside the channel may be altered in SIPs, as compared with other MIPs. This suggests that SIPs have a substrate specificity different from that of both AQP1 and GlpF. From sequence comparisons, it has previously been suggested that there are mainly five positions, P1–P5, that are important determinants of substrate specificity for water and glycerol channels (Froger et al. 1998). Remarkably, mutating YW to PL at P4–P5 has been shown to be sufficient to convert one water channel, AQPcic, into a glycerol facilitator (Lagrée et al. 1999).
In this context, it is interesting to note that all the SIPs carry the canonical aquaporin sequence, YW, at the corresponding position.

### Function and Intracellular Location of SIPs

We have identified a new type of MIP that is very different from other previously identified plant MIPs. In order to elucidate the function of SIPs, it is important to determine their intracellular localization, expression pattern, and substrate specificity. Experiments are in progress to address these questions. In the meantime, it is important to recognize the SIPs as a new subfamily of plant MIPs and to avoid confusing them with other MIPs because the SIPs most likely have a different function compared with all previously characterized MIPs.

Claudia Kappen, Reviewing Editor

Keywords: MIPs, water channels, aquaporins, glycerol facilitators, SIPs

Address for correspondence and reprints: Urban Johanson, Department of Plant Biochemistry, P.O. Box 124, S-221 00 Lund, Sweden. urban.johanson@plantbio.lu.se.

Table 1. SIP-like ESTs from Plants and Their Minimal Expression Pattern Based on Information in the Sequence Accessions

Table 2. Structurally Important Amino Acids in MIPs and Their Role in the Structure of GlpF or AQP1 (or both)

Fig. 1. Phylogenetic comparison of protein sequences of the new SIP subfamily of MIPs to representatives for the previously described subfamilies of Arabidopsis PIPs, TIPs, and NIPs. This unrooted phylogram has been generated in PAUP4.0b using the distance method with minimum evolution as the optimality criterion. Parsimony analysis results in one tree (data not shown) with a topology very similar to that of the distance tree shown. The only difference from the distance tree is the position of TIP3;1 and TIP3;2, which branch closer to TIP1s in the parsimony tree. Each of the four subfamilies is monophyletic, regardless of method, with bootstrap values of 98%–100% depending on the tree-building method used. Protein accession numbers are found in Johanson et al. (2001). Cri BE641624, Gar AW729182, and Hvu AW982441 denote proteins translated from ESTs (table 1). AW982441 probably encodes the full-length protein, whereas the other two ESTs are probably slightly truncated at the C termini (no stop codon in the ESTs).

Fig. 2. Comparison of molecular weight and isoelectric point for representatives of different subfamilies of MIPs from Arabidopsis. For accession numbers, see figure 1.

Fig. 3. Alignment of GlpF and AQP1 to the SIPs included in the phylogenetic analysis. Dark and light shadings show positions with identical and similar amino acid residues, respectively, shared by at least six of the eight aligned sequences. Transmembrane helices common to both AQP1 and GlpF are boxed and numbered H1–H6. HB and HE together form a seventh transmembrane helix, connected by the two NPA boxes. The most common amino acid residue among MIPs, according to figure 3 in Heymann and Engel (2000), is shown at the top for some positions where SIPs differ from other MIPs. At the bottom, the SIP consensus at these positions is indicated. P4–P5 mark two positions that have been suggested to be important determinants of substrate specificity for water and glycerol channels (Froger et al. 1998).

Fig. 4. Part of the interior structure of GlpF, showing the asparagines 68 and 203 in the NPA boxes and some of the discussed amino acid residues as ball-and-stick models (Fu et al. 2000). Oxygen, nitrogen, and carbon atoms are depicted in dark, intermediate, and light gray, respectively (in the on-line version, oxygen and nitrogen atoms are visualized in red and blue, respectively). The top of the structure faces the periplasmic space, and the arrow indicates the orientation of the pore. The side chains of N68, N203, and R206, together with the carbonyl oxygen of H66, form part of the helical polar strip that interacts with the hydroxyl groups of glycerol. I187 and F200 form part of the hydrophobic back of the pore and the hydrophobic corner, respectively, which interact with the alkyl backbone of glycerol. In SIPs, I187 is replaced by a threonine, changing the pattern of polar residues in the pore. The structure of the hydrophobic pocket is changed by the replacement of F200 by a proline in most SIPs. R206 is conserved in almost all MIPs. However, the corresponding amino acid in SIPs is not conserved but varies from hydrophobic residues like isoleucine to polar residues such as asparagine. The carbonyl oxygen of H66 constitutes another polar interaction site in the pore of GlpF. The position of H66 is stabilized by E14. This glutamate is replaced by the one-carbon-shorter aspartate in all SIPs, possibly changing the position of the serine in SIPs corresponding to H66. This might result in a wider pore. The changed interior environment suggests that SIPs have a different substrate specificity compared with other MIPs.


This work was funded by NFR, SJFR and EU-Biotech program (BIO4-CT98-0024). We are grateful to Per Kjellbom and Salam Al-Karadaghi for critical reading and helpful suggestions on the manuscript.

## References

Arabidopsis Genome Initiative. 2000. Analysis of the genome of the flowering plant Arabidopsis thaliana. Nature 408:796-815.

Borgnia M. J., D. Kozono, G. Calamita, P. C. Maloney, P. Agre. 1999. Functional reconstitution and characterization of AqpZ, the E. coli water channel protein. J. Mol. Biol. 291:1169-1179.

Chaumont F., F. Barrieu, E. Wojcik, M. J. Chrispeels, R. Jung. 2001. Aquaporins constitute a large and highly divergent protein family in maize. Plant Physiol. 125:1206-1215.

Dean R. M., R. L. Rivers, M. L. Zeidel, D. M. Roberts. 1999. Purification and functional reconstitution of soybean nodulin 26. An aquaporin with water and glycerol transport properties. Biochemistry 38:347-353.

Fortin M. G., N. A. Morrison, D. P. Verma. 1987. Nodulin-26, a peribacteroid membrane nodulin is expressed independently of the development of the peribacteroid compartment. Nucleic Acids Res. 15:813-824.

Froger A., B. Tallur, D. Thomas, C. Delamarche. 1998. Prediction of functional residues in water channels and related proteins. Protein Sci. 7:1458-1468.

Fu D., A. Libson, L. J. Miercke, C. Weitzman, P. Nollert, J. Krucinski, R. M. Stroud. 2000. Structure of a glycerol-conducting channel and the basis for its selectivity. Science 290:481-486.

Heymann J. B., A. Engel. 1999. Aquaporins: phylogeny, structure, and physiology of water channels. News Physiol. Sci. 14:187-193.

Heymann J. B., A. Engel. 2000. Structural clues in the sequences of the aquaporins. J. Mol. Biol. 295:1039-1053.

Johanson U., M. Karlsson, I. Johansson, S. Gustavsson, S. Sjövall, L. Fraysse, A. R. Weig, P. Kjellbom. 2001. The complete set of genes encoding major intrinsic proteins in Arabidopsis provides a framework for a new nomenclature for major intrinsic proteins in plants. Plant Physiol. 126:1358-1369.

Johansson I., M. Karlsson, U. Johanson, C. Larsson, P. Kjellbom. 2000. The role of aquaporins in cellular and whole plant water balance. Biochim. Biophys. Acta 1465:324-342.

Lagrée V., A. Froger, S. Deschamps, J. F. Hubert, C. Delamarche, G. Bonnec, D. Thomas, J. Gouranton, I. Pellerin. 1999. Switch from an aquaporin to a glycerol channel by two amino acids substitution. J. Biol. Chem. 274:6817-6819.

Murata K., K. Mitsuoka, T. Hirai, T. Walz, P. Agre, J. B. Heymann, A. Engel, Y. Fujiyoshi. 2000. Structural determinants of water permeation through aquaporin-1. Nature 407:599-605.

Santoni V., P. Gerbeau, H. Javot, C. Maurel. 2000. The high diversity of aquaporins reveals novel facets of plant membrane functions. Curr. Opin. Plant Biol. 3:476-481.

Sayle R. 1995. RasMac molecular graphics 2.6. Biomolecular structures group, Glaxo Wellcome Research & Development, Stevenage, Hertfordshire, U.K.

Swofford D. L. 2000. PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4. Sinauer Associates, Sunderland, Mass.

Thompson J. D., D. G. Higgins, T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

Weig A., C. Deswartes, M. J. Chrispeels. 1997. The major intrinsic protein family of Arabidopsis has 23 members that form three distinct groups with functional aquaporins in each group. Plant Physiol. 114:1347-1357.

Weig A. R., C. Jakob. 2000. Functional identification of the glycerol permease activity of Arabidopsis thaliana NLM1 and NLM2 proteins by heterologous expression in Saccharomyces cerevisiae. FEBS Lett. 481:293-298.