Many genes have been described and characterized in which the choice among alternative polyadenylation sites at the 3′-end of the mRNA depends on the cellular environment. In this survey and summary article, 95 genes are discussed in which alternative polyadenylation is a consequence of tandem arrays of poly(A) signals within a single 3′-untranslated region. An additional 31 genes are described in which polyadenylation at a promoter-proximal site competes with a splicing reaction to influence expression of multiple mRNAs. Some have a composite internal/terminal exon which can be differentially processed. Others contain alternative 3′-terminal exons, the first of which can be skipped in some cells. In some cases the mRNAs formed from these three classes of genes are differentially processed from the primary transcript during the cell cycle or in a tissue-specific or developmentally specific pattern. Immunoglobulin heavy chain genes have composite exons; regulated production of two different Ig mRNAs has been shown to involve B cell stage-specific changes in trans-acting factors involved in formation of the active polyadenylation complex. Changes in the activity of some of these same factors occur during viral infection and takeover of the cellular machinery, suggesting that at least some aspects of the Ig model may apply more broadly. The differential expression of a number of genes that undergo alternative poly(A) site choice or polyadenylation/splicing competition could be regulated at the level of the amounts and activities of either generic or tissue-specific polyadenylation factors and/or splicing factors.
In the nuclei of eukaryotic cells, precursor RNAs made by RNA polymerase II undergo a series of post-transcriptional processing events to produce mature mRNAs, which are then exported to the cytoplasm. These modifications generally include methylation of the 2′ hydroxyl group of the ribose sugar(s) near the co-transcriptionally added cap, splicing to remove introns, and cleavage with subsequent polyadenylation (addition of ∼200 adenosines) at the 3′-end; internal base editing and methylation of adenosines at the N6 position also occur with some RNAs. The precursor RNAs for most genes are not processed efficiently and are instead rapidly degraded in the nucleus; the bulk of these RNAs never reach the cytoplasm. Therefore, small changes in overall RNA processing efficiency in a particular cell, or in the effective strength of a particular splicing or polyadenylation site, can serve as an important control point for gene expression in a tissue- or developmental stage-specific manner. A functional polyadenylation signal is required for transcription termination by RNA polymerase II ( 1–3 ); transport of the message from nucleus to cytoplasm is dependent on polyadenylation and splicing ( 4–7 ), and these processes are apparently coupled through the C-terminal domain of RNA polymerase II ( 8 ). The more efficient a poly(A) site is at processing in vitro, the more efficient it is at generating termination-competent RNA polymerase II elongation complexes and mature RNA ( 9 ). Poly(A) site strength can directly influence the amount of cytoplasmic RNA produced from a transcript ( 10 ); therefore, changing polyadenylation efficiencies can have a profound effect on the amount and nature of a gene product. In the cytoplasm, the poly(A) tail plays a role in message stability and translatability ( 11–13 ) and protects RNA from degradation by preventing its association with the degradation machinery ( 14 ).
While initiation of RNA polymerase II transcripts is an important starting point in gene expression, the job is not done until the poly(A) tail is added and the mRNA is exported and translated in the cytoplasm. Transcription ends well only when the message ends well; therefore, polyadenylation is an important means to the end of the message.
Splicing factors can be categorized into three broad groups based on activity: constitutive factors which are involved in the generic splicing reaction, positive regulators which improve recognition of weak alternatively used splice sites and negative regulators which antagonize the positive regulators. A recent review detailed a variety of splicing options which can be exercised by tissue-specific changes in the amounts of the various splicing factors as well as the sequences at the splice sites and their positioning within the pre-mRNA ( 15 ). Detailed mechanistic studies of the events surrounding the polyadenylation/cleavage reaction are just emerging. This review examines the large number of genes described to date in which the regulation of polyadenylation may play an important role.
Three sequence elements determine the precise site of 3′-end cleavage and polyadenylation in mammalian pre-mRNAs. The highly conserved poly(A) signal, the hexanucleotide AAUAAA, is present in the 3′-untranslated region (3′-UTR) near the mature mRNA end in many genes (∼80%), while AUUAAA is found less frequently. Other less conserved AU- or A-rich sequences have been observed near the 3′-end of mature mRNA in a smaller fraction of cases and function less well in vitro than AAUAAA ( 16 ). Cleavage occurs at the poly(A) addition site ∼11–23 nt downstream of the hexanucleotide. The third element is a GU- or U-rich region, usually 10–30 bases downstream of the cleavage site ( 17–20 ). The AAUAAA or some variant of it, the downstream region and their relative positions define the approximate site at which cleavage will occur for most poly(A) sites ( 21 ); secondary structure can shift the site of poly(A) tail addition slightly ( 22 ). In some viral genes there is an additional element upstream of the hexanucleotide which can aid in efficient poly(A) site recognition ( 23–30 ); the U1 snRNP-specific protein U1A can activate SV40 late polyadenylation by interacting both with a sequence element upstream of that poly(A) site and with the polyadenylation factor CPSF ( 31 , 32 ). Upstream elements have been identified in a few eukaryotic genes, such as the complement C2 gene ( 33 ) and the immunoglobulin (Ig) γ2a membrane-specific poly(A) site ( 34 , 35 ), although the mechanisms of their action and the factors involved are less clear.
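The positional rules above can be expressed as a simple sequence scan. This is a hypothetical sketch, not a validated poly(A) site predictor: it anchors only on AAUAAA or AUUAAA (ignoring the weaker AU- or A-rich variants), measures the 11–23 nt cleavage window from the last base of the hexamer, and calls a downstream element present if any 5-nt stretch lying 10–30 nt past that window contains at least four G or U bases, a threshold chosen purely for illustration. The names `find_pas_candidates` and `PolyASiteCandidate` are invented here.

```python
import re
from dataclasses import dataclass

# Canonical hexamer and its most common variant, as named in the text.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


@dataclass
class PolyASiteCandidate:
    hexamer: str
    hexamer_start: int                # 0-based index of the hexamer's first base
    cleavage_window: tuple[int, int]  # inclusive positions 11-23 nt past the hexamer
    has_downstream_element: bool      # GU-/U-rich stretch 10-30 nt past the window


def _gu_rich(region: str, size: int = 5, min_gu: int = 4) -> bool:
    """Hypothetical heuristic: any `size`-nt stretch with at least `min_gu` G or U bases."""
    return any(
        sum(base in "GU" for base in region[i:i + size]) >= min_gu
        for i in range(len(region) - size + 1)
    )


def find_pas_candidates(seq: str) -> list[PolyASiteCandidate]:
    """Scan an RNA (or DNA) sequence for hexamer-anchored candidate poly(A) sites."""
    rna = seq.upper().replace("T", "U")
    # Zero-width lookahead so overlapping hexamers are all reported.
    pattern = "(?=(" + "|".join(PAS_HEXAMERS) + "))"
    candidates = []
    for match in re.finditer(pattern, rna):
        start = match.start()
        last_hex_base = start + 5
        first, last = last_hex_base + 11, last_hex_base + 23
        # The exact cleavage position within the window is unknown, so search the
        # union of the 10-30 nt ranges downstream of every position in the window.
        downstream = rna[first + 10:last + 31]
        candidates.append(
            PolyASiteCandidate(match.group(1), start, (first, last), _gu_rich(downstream))
        )
    return candidates
```

For example, `find_pas_candidates("GGAAUAAA" + "C" * 40 + "UUUUU")` reports one AAUAAA candidate at position 2 with a downstream element. Upstream elements, secondary structure and the weaker AU-rich signals mentioned above all fall outside this sketch.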
The biochemistry of pre-mRNA cleavage and polyadenylation has been well characterized and reviewed recently ( 36–39 ). Protein factors required for accurate cleavage and polyadenylation have been isolated from HeLa cell nuclear extracts and calf thymus. These factors include: cleavage and polyadenylation specificity factor (CPSF) ( 40–42 ), cleavage stimulatory factor (CstF) ( 43 ), poly(A) polymerase ( 44–47 ), whose phosphorylation and activity vary during the cell cycle ( 48 ), cleavage factors Im and IIm (CF Im and CF IIm) ( 49 , 50 ) and poly(A) binding protein II, which binds to the growing poly(A) tail ( 51 ). These factors interact to form a complex on the precursor RNA prior to the cleavage and polyadenylation reactions. The key factors are conserved between yeast and mammals; they are, in fact, better conserved than the cis-acting elements (reviewed in 52 ).
CPSF is a multisubunit factor that recognizes the poly(A) signal sequence AAUAAA ( 43 , 53 , 54 ) and is required for both cleavage and polyadenylation ( 40 , 42 , 43 ). CstF is required for the cleavage reaction ( 55 ) and stabilizes the interaction of CPSF with AAUAAA through protein-protein interactions and through protein-RNA interactions with the region downstream of the poly(A) site. Sites which interact strongly with CstF in vitro are strong sites in vivo ( 55 , 56 ). The 64 kDa subunit of human CstF contains an RNA-binding domain which can be crosslinked to RNAs containing GU- or U-rich regions downstream of the poly(A) site in the presence of CPSF, AAUAAA and the other subunits of CstF ( 57 , 58 ). The 50 kDa subunit of human CstF exhibits regions of identity and similarity with mammalian G protein β subunits and has seven copies of a characteristic transducin repeat motif ( 59 ). The 77 kDa subunit of CstF bridges the 64 and 50 kDa subunits and contains the putative nuclear localization signal most likely responsible for transporting the 50:77:64 trimer into the nucleus, where CstF functions ( 60 ). The 77 kDa subunit of CstF shares extensive homology with the product of the Drosophila modifier gene suppressor of forked, su(f); mutations of su(f) change the relative utilization of poly(A) sites in genes with inserted transposable elements. This strong homology suggests that human CstF may also regulate the use of specific poly(A) sites.
U1 snRNP and its A protein appear to play central roles in the coupling of splicing and polyadenylation ( 61 ). It is well established that U1 snRNP is involved in 5′ splice site recognition and definition of internal exons ( 62 ). The A protein of U1 can interact with the 160 kDa subunit of CPSF and activate polyadenylation ( 32 ). The A protein of U1 can also bind to its own pre-mRNA and inhibit poly(A) polymerase by interaction with the C-terminus of the polymerase, thereby controlling U1A production ( 61 , 63 ). Furthermore, binding of U1 to 3′-terminal exons has been observed and a correlation between U1 binding and polyadenylation activity was established ( 64 ). These observations are consistent with an involvement of U1 in the definition of 3′-terminal exons and coordination of splicing and cleavage/polyadenylation. U1 snRNP therefore plays a pivotal role in precursor RNA processing and subtle changes in its amount, characteristics or factors with which it interacts could profoundly influence alternative poly(A) site choice.
Genes with Tandem Terminal Poly(A) Sites, a Large and Growing Family
Although the majority of eukaryotic gene transcription units possess a single polyadenylation signal, numerous examples of transcription units with multiple poly(A) sites, all within a single 3′-terminal exon, have been described over the past several years; we have compiled these in Table 1 and contrast them in Figure 1A–C with the other types of genes whose messages could have multiple poly(A) sites. We have included in the tables only those genes for which there is solid evidence for more than one RNA species, i.e. Northern blots or nuclease protection assays; not included are a large number of other genes which have been sequenced and suggested, but not directly shown, to encode more than one mRNA because of several potential poly(A) signals in the 3′-end. Space limitations prohibit us from including a complete bibliography on each gene listed, so we have chosen to include either the first or the most comprehensive reference for each entry in the table.
Of the genes listed in Table 1 having multiple, tandem poly(A) sites in a 3′-terminal exon, many have not been examined in enough different tissues and cell types to determine if there is differential processing of the poly(A) sites. Use of these multiple sites may be regulated or may instead reflect random use of signals with varying inherent strengths. Polyadenylation may be important enough to nuclear processing and export to warrant two or more chances at recognition by the cellular machinery.
There are at least 33 genes listed in Table 1 which show changes in the distribution of 3′-ends of the mRNAs produced, depending on the developmental stage, the growth state of the cell or the tissue in which they are expressed, with testes being a hotspot for differential poly(A) site use. These differentially expressed genes are indicated in the Notable features columns. Dihydrofolate reductase (DHFR) is an extreme example of multiple poly(A) sites, with seven spread over 5 kb of sequence. Transcription proceeds through all seven poly(A) sites and terminates ∼1 kb downstream of the last one ( 65 ), indicating that the multiple forms of mRNA arise by processing and not by transcription termination between some of the sites. S1 protection analyses of steady-state DHFR mRNAs from growing versus resting cells show a different distribution of 3′-ends in the two states ( 66 ), although it is not known whether this reflects cell cycle-specific differences in stability or in processing. The same question of nuclear processing versus differential cytoplasmic stability arises with some of the examples in the tables, although for others the experiments addressing this question have been done and the results noted.
Since the multiple forms of mRNA in Table 1 generally differ only at the 3′-end and not in the coding regions, it may not be obvious how differential poly(A) site use could influence protein expression. However, if the different forms of mRNA have different stabilities or translatability, then use of alternative poly(A) sites can raise or lower the final amount of protein product per unit of precursor RNA transcribed. An interesting example of regulation of a gene with tandem poly(A) sites, listed in Table 1, is the gene for eukaryotic initiation factor 2α (eIF-2α), a key factor for protein synthesis, in which multiple poly(A) sites in the 3′-UTR give rise to two fairly common mRNAs of 1.6 and 4.2 kb ( 67 ). While both messages are on polyribosomes, the 1.6 kb message is less stable than the 4.2 kb mRNA species in T cells, but is more readily translated in vitro. Shifting to expression of the shorter message could therefore increase the amount of protein produced. The ratios of the 1.6 to 4.2 kb species ranged from 1:1 in brain and skeletal muscle to 10:1 in placenta, liver and pancreas. T cells activated with ionomycin and PMA leave G0 and enter S phase; shortly after this treatment the 1.6 kb mRNA increased in abundance 11.5-fold, while the 4.2 kb mRNA increased only 4-fold, indicating that the poly(A) site leading to the 1.6 kb mRNA is favored in S phase.
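The S-phase preference can be made explicit by comparing the ratio $R = [1.6\,\text{kb}]/[4.2\,\text{kb}]$ before and after activation, using only the fold increases reported above:

```latex
\frac{R_{\text{after}}}{R_{\text{before}}}
  = \frac{[1.6\,\text{kb}]_{\text{after}} / [1.6\,\text{kb}]_{\text{before}}}
         {[4.2\,\text{kb}]_{\text{after}} / [4.2\,\text{kb}]_{\text{before}}}
  = \frac{11.5}{4} \approx 2.9
```

The 1.6 kb species thus gains roughly 3-fold relative to the 4.2 kb species. Because these are steady-state measurements, this ratio by itself does not separate a change in poly(A) site choice from a change in relative mRNA stability.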
It was recognized many years ago that treatment of resting T cells with agents which caused them to proceed from G0 to S phase caused increases in polyadenylation enzyme activity ( 68–70 ) and that an increased rate of polyadenylation of mRNA was a rapid response to entry into S phase ( 71 ). The recent observation of changes in poly(A) polymerase activity during the cell cycle ( 48 ) and increases in CstF-64 amount following B cell stimulation ( 72 ) indicate that changing the site of poly(A) addition on a given transcript may be a response to the cellular environment through activation of the polyadenylation machinery. Increases in polyadenylation activity favor use of the first poly(A) site in the eIF-2α primary transcript, ultimately producing more protein per primary transcript, both from more efficient nuclear RNA processing and transport and from more efficient cytoplasmic translation of the 1.6 kb mRNA.
A third poly(A) site for the eIF-2α gene is used only in testes to produce a 1.7 kb mRNA, indicating that in the testes a different distribution of polyadenylation factors may be recognizing different cis-acting elements in the primary transcript. The overall pattern of processing in the eIF-2α gene is conserved between mouse and man and is presumably important for its regulation. As indicated in Table 1, seven genes clearly show a pattern of differential stability or translatability of the various mRNA products, including the cationic amino acid transporter gene, cyclic AMP-responsive element modulator, cyclooxygenase-2, eIF-2α, histone H1⁰, splicing factor PR264/SC35 and vascular endothelial growth factor. Fourteen more genes have been suggested to contain potential regulatory elements between the poly(A) signals, but the mRNA half-lives were not measured directly. Several other examples where an instability element was postulated but subsequently shown not to have differentially stable mRNAs are also indicated in Table 1.
Generation of Alternative 3′-Ends by Competition Between Polyadenylation and Splicing: Composite Versus Skipped Exons
Two other major classes of gene organization leading to the generation of alternative poly(A) sites on mRNA are illustrated in Figure 1B and C; the genes in each class are listed in Tables 2 and 3. The final protein products of both types of genes can differ at their C-termini depending on which processing pathway is followed. Exons are generally categorized as 5′-terminal, internal or 3′-terminal, the last carrying the polyadenylation signal in its 3′-UTR. A number of genes listed in Table 2 contain composite exons whose 5′ splice sites can be either silent, causing them to behave as 3′-terminal exons, or active, causing them to behave as internal exons, depending on the tissue in which the gene is expressed; we call these composite, in/terminal exons. Genes like the immunoglobulin heavy chains have an exon serving either as the 3′-terminal exon in one mRNA (use of pA1) or as an internal exon in a second mRNA which ends with a normal 3′-terminal exon found further downstream (use of pA2). Primary transcripts from other genes, like calcitonin/calcitonin gene-related peptide listed in Table 3, are processed into two mRNAs either by using the first alternative 3′-terminal exon with its poly(A) site (pA1) or by skipping that exon entirely and splicing the second 3′-terminal exon into the transcript, using pA2 instead. The distance between the poly(A) sites in these two classes of genes can be quite large (>3 kb in Ig genes), and differential sites of transcription termination between the poly(A) sites could change the distribution of 3′-end use in mRNA. Levels of basal polyadenylation factors, splicing factors and termination factors could all contribute to cell type-specific 3′-end formation. Considerations of differential mRNA stability, as discussed for the genes described in Table 1, also apply to the genes in Tables 2 and 3.
Composite Exons Which Can Serve as Internal or 3′-Terminal Elements
To understand the behavior of composite exons, a number of studies have been done with synthetic constructs. The addition of an adenovirus 5′ splice site to a 3′-terminal exon was shown to negatively affect polyadenylation at the adjacent poly(A) site in HeLa cells ( 73 ). A 5′ splice site consensus sequence was necessary and sufficient to inhibit polyadenylation when inserted into a 3′-UTR of papillomavirus in BPV-1-transformed mouse cells ( 74 ). However, those 5′ splice sites are strong sites; it is interesting to note that the 5′ splice sites in most cellular composite exons are quite weak and may not bind tightly to all components of the splicing machinery. Furthermore, regions with limited sequence complementarity to the 5′ splice site in a 3′-terminal exon were shown to have a positive effect on polyadenylation through an interaction with U1 snRNP, but when these sequences were mutated to more closely match the consensus, the positive effect was lost ( 64 ). The mechanism by which the choice is made to splice or polyadenylate a composite in/terminal exon probably varies based on the balance of splicing and polyadenylation factors present in the tissue. This potential balancing act has parallels with tissue-specific variations in the amounts of positive and negative alternative splicing factors which can influence alternative splicing (reviewed in 15 ).
The Ig heavy chain genes represent the best-studied examples of complex transcription units in which composite exons can switch between being internal or 3′-terminal; their differential expression during B cell development may provide insights into the expression and trans-acting factors operating on the processing of some of the other 19 members of the group listed in Table 2. Ig µ heavy chains are expressed in pre-, immature and mature B cells and some plasma cells. The α, δ, ε and γ heavy chains are expressed in memory and plasma cells. RNA from each of the five classes of immunoglobulin heavy chain genes (α, δ, ε, γ and µ) can be alternatively processed to produce two types of mRNAs, one encoding the membrane-bound receptor for antigen on the surface of mature and memory B cells, the other encoding the secreted form of the Ig protein ( 75–77 ). Polyadenylation at the secretory-specific poly(A) site and splicing in of the membrane-specific exons to the composite in/terminal exon are two mutually exclusive events. During the development of B cells there is a regulated shift from production of the membrane- to the secretory-specific form of Ig mRNA and protein; the secretory-specific forms predominate in terminally differentiated plasma cells (reviewed in 78 , 79 ) and can exceed the membrane form by 100:1. The total amount of cytoplasmic Ig mRNA also increases by 30- to 100-fold in plasma cells. Differences in mRNA stability alone cannot account for the shift to secretory-specific mRNA production, for while there is an increase in the half-life of Ig mRNA following differentiation to the plasma cell stage ( 80–84 ), this increase occurs equally with both the secretory- and membrane-specific species ( 83 , 84 ).
However, a 5-fold increase in transcription of the Ig locus, coupled with more efficient conversion of the primary transcript to mature secretory-specific mRNA through increased polyadenylation, would contribute significantly to both the increase in abundance and the shift towards secretory species.
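As an illustrative decomposition (an assumption introduced here, not a measurement from the cited studies), suppose steady-state cytoplasmic Ig mRNA scales as the product of transcription rate, the fraction of primary transcripts converted to mature mRNA, and mRNA lifetime. The fold changes in each step then multiply, and the reported 30- to 100-fold total increase and 5-fold transcriptional increase constrain the remaining terms:

```latex
F_{\text{total}} = F_{\text{txn}} \times F_{\text{proc}} \times F_{\text{stab}},
\qquad
F_{\text{proc}} \times F_{\text{stab}}
  = \frac{F_{\text{total}}}{F_{\text{txn}}}
  = \frac{30 \text{ to } 100}{5}
  = 6 \text{ to } 20
```

Under this assumption, more efficient processing and the longer half-life must together supply a 6- to 20-fold contribution. Because the half-life increase applies equally to secretory and membrane species ( 83 , 84 ), only the processing term can change their ratio.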
For the µ heavy chain gene, the site of transcription termination shifts from downstream of the membrane exons in B cells to a region between the secretory poly(A) site and the membrane exons in plasma cells ( 85–88 ). In some plasma cells the membrane exons are not even transcribed and secretory-specific mRNA results. In contrast, there is no change in the site of transcription termination for Ig α and γ heavy chains; here termination always occurs at approximately the same location, ∼1 kb downstream of the last membrane exon ( 35 , 89 , 90 ). Therefore, changes in the site of transcription termination play a role in the expression of Ig µ but not Ig γ or α secretory-specific mRNA. This difference may be the result of the unique location of the Ig µ heavy chain exons, which lie only 9 kb upstream of the δ constant region exons; both sets of exons are expressed in a common precursor at the mature B cell stage ( 91 ). The extent of δ gene transcription is also regulated by differential termination and polyadenylation, which involves sequence elements both within the µ membrane poly(A) site and a segment between the µ and δ coding sequences ( 92 ).
RNA processing events play a major role in determining the final amounts and the ratios of the two forms of Ig mRNA. Early experiments demonstrated that during B cell differentiation use of alternative cleavage/polyadenylation sites modulates the production of the two mRNAs from an Ig γ gene ( 93 , 94 ) and from the Ig µ gene ( 85 , 86 , 95–97 ). In later studies, an increase in the efficiency of polyadenylation at Ig secretory-specific poly(A) sites was seen in plasma cells versus mature and memory B cells for µ and α ( 98 , 99 ) and γ sites ( 100 , 101 ), as measured by the relative use of tandem poly(A) sites in a 3′-terminal UTR in vivo. Transfection experiments have failed to identify cis-acting sequences within the immunoglobulin γ gene responsible for the observed regulation of poly(A) site choice ( 102 ) or the µ splicing versus poly(A) choice ( 103 ). Attempts to determine which is the default pathway in non-B cells, Ig secretory or membrane mRNA production, have given different answers depending on the Ig heavy chain gene: a study of the IgG gene indicated that the membrane processing pattern occurs predominantly in non-lymphoid cells ( 104 ), while transfection of a hybrid SV40/IgM gene into a variety of non-lymphoid cells indicated that the secretory processing pattern was the default ( 103 ). Neither study determined the potential differences in Ig transcription termination sites or mRNA stabilities in non-lymphoid cells which might influence the interpretations. While there is a balance between polyadenylation and splicing which shifts in B cell development ( 79 ), there is no measurable change in efficiency of splicing either between B cell stages or in comparison with several different non-B cell lines ( 95 , 98 , 104–106 ). Therefore, the differential expression of Ig heavy chain genes must primarily be the result of changes in the trans-acting factors responsible for polyadenylation.
Mechanisms of Poly(A) Site Choice in B Cells
The relative levels of polyadenylation, splicing and transcription termination factors might be expected to play a role in the modulated expression of the two forms of mRNA from the Ig gene. When the gene for the 64 kDa subunit of CstF, driven by an actin promoter, was overexpressed 10-fold in a chicken B cell line, an 8-fold shift toward use of the promoter-proximal, secretory-specific poly(A) site in the endogenous Ig µ gene was seen ( 72 ). Increasing the amount of a limiting component of the CstF complex increased the amount of the complex in the nucleus, thereby increasing polyadenylation efficiency by mass action. This same study also showed that there was an increase in the amount of the 64 kDa protein when resting mouse splenic B cells (mature B cells) were stimulated by lipopolysaccharide treatment to grow and secrete Ig (summarized in Table 4 ). Therefore, at least during the transition from the resting B cell to the growing lymphoblast, an increase in 64 kDa CstF protein can play a role in increasing Ig secretory mRNA expression. However, comparisons of continuously growing cell lines which accurately represent Ig expression at various B cell stages have shown that production of the secretory-specific form of mRNA can increase 30- to 100-fold in the absence of a change in the level of 64 kDa protein in the nucleus ( 107 , 108 ) or in the whole cell (unpublished observation). Therefore, a mechanism other than an increase in the amount of the 64 kDa subunit of CstF must be operative in plasma cells and tumor lines derived from them to shift RNA processing towards production of secretory-specific forms of Ig mRNA. This issue was recently discussed ( 39 , 72 ).
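The mass-action argument can be illustrated with a toy equilibrium model. This is a sketch under stated assumptions, not the analysis performed in ( 72 ): the active CstF complex is assumed to form in proportion to its limiting 64 kDa subunit and to bind a poly(A) site 1:1 with dissociation constant Kd, and the function names and Kd values (arbitrary units) are hypothetical. It shows that a 10-fold rise in factor level raises occupancy nearly 10-fold at a weak, far-from-saturated site but barely changes occupancy at a strong, nearly saturated one.

```python
def occupancy(factor: float, kd: float) -> float:
    """Equilibrium fraction of sites bound for simple 1:1 mass-action binding."""
    return factor / (factor + kd)


def occupancy_fold_change(kd: float, basal: float = 1.0, overexpression: float = 10.0) -> float:
    """Fold rise in site occupancy when the factor level is multiplied by `overexpression`."""
    return occupancy(basal * overexpression, kd) / occupancy(basal, kd)


if __name__ == "__main__":
    # Hypothetical Kd values in arbitrary units: a weak site binds far below
    # saturation at the basal factor level, a strong site close to saturation.
    for label, kd in (("weak site", 100.0), ("strong site", 0.1)):
        print(f"{label}: {occupancy_fold_change(kd):.2f}-fold")
```

Run as a script, it reports about a 9.2-fold gain for the weak site and about 1.1-fold for the strong site. In this toy picture, raising a limiting factor acts preferentially on weak sites such as the Ig secretory-specific site; it does not model the plasma-cell shift that occurs without any change in CstF-64 level.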
The binding activity, but not the amount, of several constitutive factors required for cleavage and polyadenylation increases in continuously growing plasma cells producing large amounts of secretory-specific Ig mRNA (see Table 4 ). There is as much as an 8-fold increase in binding to input substrates of the 64 kDa subunit of CstF and the 100 kDa subunit of CPSF, two constitutive polyadenylation factors, in myeloma/plasma cell nuclear extracts as compared with lymphoma (early or memory B cell) extracts ( 107 , 108 ). These increases in binding occur regardless of the sequence of the polyadenylation-competent substrates as long as the substrates contain both an AAUAAA and a downstream element. Another activity was described in early/memory B cell extracts which seems to selectively destabilize complexes formed on weak poly(A) sites such as the immunoglobulin secretory-specific site ( 109 ). This factor is less effective at dissociating RNA-protein complexes formed on the membrane poly(A) site than those formed on the secretory-specific site.
Induction of a novel 28–32 kDa nuclear RNA binding factor in mouse splenic B cells was found to correlate with production of the secretory form of IgM heavy chain ( 110 ). Treatment of cells with both lipopolysaccharide and anti-µ antibodies, which allows for growth but not secretion, caused inhibition of Ig secretory-specific mRNA production but did not decrease induction of the 28–32 kDa RNA binding protein. Instead, another RNA binding protein of 50–55 kDa was produced; this protein binds to both secretory- and membrane-specific µ poly(A) sites in vitro . These proteins have not been identified further, but may represent positive and negative regulators of the secretory-specific polyadenylation complex, based on their binding specificities.
A postulated activator of polyadenylation/cleavage in plasma cells ( 107 , 108 ), which could act on any weak poly(A) site, together with the loss of a distinct inhibitor of the Ig secretory-specific poly(A) site, as postulated ( 109 ), could stabilize the polyadenylation complex formed at the secretory polyadenylation/cleavage site; this would allow the weak secretory site to be used to the exclusion of splicing in the composite exon in plasma cells. In early/memory B cells the secretory-specific poly(A) site cannot effectively compete for polyadenylation factors and the composite in/terminal exon functions as an internal exon; the membrane poly(A) site is then used by default.
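The competition described above can be sketched as two first-order reactions acting on the composite exon: cleavage/polyadenylation at the secretory site (rate `k_polya`) and splicing to the membrane exons (rate `k_splice`). This kinetic-partition model is an assumption introduced for illustration, and all rates are hypothetical. Holding `k_splice` fixed reflects the reported absence of any measurable change in splicing efficiency between B cell stages, so any shift in the secretory:membrane ratio must come from `k_polya`, which a plasma cell activator or the loss of an inhibitor would raise.

```python
def secretory_fraction(k_polya: float, k_splice: float) -> float:
    """Fraction of transcripts cleaved at the secretory poly(A) site when that
    reaction and splicing of the composite exon compete as first-order steps."""
    return k_polya / (k_polya + k_splice)


def secretory_to_membrane_ratio(k_polya: float, k_splice: float) -> float:
    """Assumes every transcript that is spliced instead goes on to use the
    membrane poly(A) site (the default route), so the ratio f / (1 - f)
    simplifies to k_polya / k_splice."""
    return k_polya / k_splice


if __name__ == "__main__":
    K_SPLICE = 1.0  # hypothetical; held constant because splicing efficiency does not change
    # Hypothetical secretory-site rates for the two stages (arbitrary units).
    for stage, k_polya in (("early/memory B cell", 0.5), ("plasma cell", 100.0)):
        f = secretory_fraction(k_polya, K_SPLICE)
        ratio = secretory_to_membrane_ratio(k_polya, K_SPLICE)
        print(f"{stage}: secretory fraction {f:.3f}, secretory:membrane {ratio:.1f}")
```

With these illustrative rates the secretory fraction rises from about 0.33 to about 0.99, a 200-fold change in the secretory:membrane ratio, with no change in the splicing rate.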
Activation of weak polyadenylation sites in plasma cells occurs with a variety of transfected sequences ( 102 ). In addition, examination of early/memory stage B cells shows that they tend to accumulate more mRNA in the nucleus than do plasma cells. The effect seems more pronounced for secretory-specific Ig mRNA than for some other cellular RNAs ( 84 ), perhaps as a consequence of its extremely weak poly(A) site. The shifts in CstF-64 activity described above might increase the cytoplasmic abundance of a group of endogenous transcripts with weak poly(A) sites as well as shift poly(A) site location in complex transcription units like those described in Tables 1 and 2 and perhaps, but less likely, those listed in Table 3 . CD40, described in Table 1 , is a gene with two poly(A) sites whose relative use changes during B cell development ( 111 ) and might represent a member of the group of endogenous transcripts co-regulated with Ig secretory-specific mRNA through alternative poly(A) site choice. Before B cells are activated both mRNAs are produced; after activation the shorter, potentially more stable mRNA is the predominant species. The change in the factor(s) influencing use of the secretory-specific Ig poly(A) site could therefore have a broader effect on differentiation in B cells by influencing the expression of many mature mRNAs through poly(A) site selection.
Other Genes with a Composite Exon
A gene listed in Table 2 that was not influenced by changes in the levels or activities of polyadenylation factors in B cells is the gene for the Ca2+-transport ATPases of the sarcoplasmic or endoplasmic reticulum (SERCAs). Tissue-specific alternative 3′-end processing of SERCA2 pre-mRNA gives rise to two distinct protein isoforms (2a and 2b) which differ in their C-terminal portions ( 112 ). SERCA2a is found in cardiac, smooth and slow twitch skeletal muscle, while SERCA2b is found in smooth muscle and non-muscle tissues. No change in expression pattern was seen when B cells representing different stages of development were transfected with SERCA2 constructs ( 113 ), indicating that expression of the SERCA2 alternatively processed forms may result from tissue-specific alternative splicing instead of regulated polyadenylation or that the factors which are changed in B cells are not the ones necessary to influence SERCA pre-mRNA polyadenylation.
The gene encoding GARS/AIRS/GART ( Table 2 ) seems to make both mRNA products regardless of the tissue type ( 114 ) or B cell stage (L. Souan, Master's Thesis, University of Pittsburgh). Therefore, the changes in processing factors which are able to alter the processing fate of some exons may not operate on all similarly organized genes. The reasons for this remain unclear.
One of the mammalian thyroid hormone receptors is encoded by the erbAα gene, which can give rise to two mRNAs by the composite, in/terminal exon mode ( Table 2 ). These two mRNAs give rise to receptor isoforms with antagonistic functions. The levels of the two mRNAs vary in different tissues and at different developmental stages ( 115 , 116 ). The Rev-erbAα gene is encoded on the DNA strand opposite erbAα and produces an antisense RNA with complementarity to the 3′-end of the mRNA using the second poly(A) site ( 117 ). When expression of erbAα was examined in B cells representing different stages in development it was shown to vary; however, the relative levels of the two forms of erbAα mRNA depended not on B cell stage but rather on the amount of Rev-erbA being expressed ( 118 ). This unusual mechanism for regulating alternative 3′-ends is common in viruses, but much less so in eukaryotic genes.
The Skipped Exon Genes
The genes listed in Table 3 and diagrammed in Figure 1C can encode two or more mRNAs by using classical 3′-terminal exons arranged so that the first can be skipped. The regulated expression of these genes may be sensitive not only to the levels of general splicing and polyadenylation factors but also to gene-specific splicing factors, which can facilitate either the inclusive ( dsx ) or the skip-over (CGRP) splice. Calcitonin and the calcitonin gene-related peptide (CGRP) are produced from a single gene by alternative splicing or polyadenylation; the common exons 1–3 are spliced during processing for both, but inclusion of exon 4 in the final mRNA results in polyadenylation at a site in its 3′-UTR (pA1, Fig. 1C ) to produce calcitonin. To produce CGRP, the processing machinery skips exon 4 and splices exon 3 to exon 5; exon 5 is then joined to exon 6 and polyadenylation occurs after exon 6 (pA2, Fig. 1C ; 119 , 120 ). Studies of mice carrying a calcitonin/CGRP transgene showed that calcitonin-specific inclusion and polyadenylation of exon 4 occur in a variety of tissues, while CGRP expression (skipping) is limited almost exclusively to neuronal cells ( 121 ), suggesting that the calcitonin pattern is the default pathway and that neuronal cells must enhance the exon 3 to exon 5 splice. Neither differential mRNA half-lives nor changes in transcription termination sites can account for the tissue-specific differences in calcitonin/CGRP expression (reviewed in 122 ).
Inclusion of exon 4, with its weak splice site and poly(A) site, to generate calcitonin mRNA in HeLa cells was shown to require an enhancer sequence located within the intron downstream of the exon 4 poly(A) site; this enhancer is distinct from the typical downstream GU- or U-rich elements ( 123 ). The intron enhancer activates cleavage and polyadenylation of precursor RNAs containing the calcitonin poly(A) site, or heterologous poly(A) sites placed in exon 4, at a distance of several hundred nucleotides from the AAUAAA ( 124 ). The enhancer can work with a heterologous gene and contains (Py)nCAGGUAAGAC, a so-called ‘zero length’ exon composed of adjacent 3′ and 5′ splice site consensus elements preceded by a pyrimidine tract; the zero length exon can bind U1 snRNP, alternative splicing factor/splicing factor 2 (ASF/SF2) and pyrimidine tract binding protein. The enhancer of exon 4 inclusion thus activates cleavage and polyadenylation through binding of known splicing factors.
When the calcitonin/CGRP gene was transfected into a B cell line in which the amounts of Ig secretory- and membrane-specific species were about equal, large amounts of partially processed nuclear species accumulated. This was originally interpreted as indicating that the machinery necessary to splice exon 3 to exon 5 was missing, an interpretation that failed to explain why exon 4 was not used ( 125 ). A possible interpretation of these older B cell transfection results is that the intron 4 enhancer was not active in those cells, either because of low precursor RNA binding activity of CstF-64, which also influences Ig processing, or because of low levels of other, as yet unspecified, RNA processing factors that interact with the intron enhancer.
The doublesex ( dsx ) gene in Drosophila ( Table 3 and Fig. 1C ) shows skipping of exon 4 with splicing of exons 1, 2, 3, 5 and 6 in males but splicing of exons 1, 2, 3 and 4 in females ( 126 ). Polyadenylation occurs after exon 6 (pA2) in males and after exon 4 (pA1) in females. The tra and tra-2 gene products are required for female-specific processing of dsx pre-mRNA; the male pattern is the default pathway in the absence of these two gene products. Binding of the tra-2 protein to a region within the female-specific exon is required not only to activate splicing at the weak splice sites in exon 4 but also, independently, for female-specific polyadenylation at exon 4 ( 127 ). Highly cooperative interactions between domains of tra, tra-2 and serine/arginine-rich proteins result in formation of a multiprotein complex on the female-specific exon that enhances its splicing and hence the choice of the poly(A) site at the end of exon 4 ( 128 ). The mechanism of enhancement of polyadenylation has not been elucidated; however, the observations that a mutation in a 3′ splice site can negatively affect polyadenylation ( 129 ) and that an intron enhancer that binds splicing factors can stimulate polyadenylation ( 124 ) argue that at least part of the activation of dsx exon 4 polyadenylation by tra-2 is through an interaction between the splicing and polyadenylation machinery. Enhancement of both female-specific splicing and female-specific polyadenylation is therefore important for regulated expression of this gene.
Viral Systems and CstF-64 Polyadenylation/Cleavage Factor
The adenovirus major late transcription unit can be alternatively processed during the viral infectious cycle (see Table 1 ). Polyadenylation has been shown to regulate L1 versus L3 mRNA production in adenovirus infection ( 24 , 130–134 ). The promoter-proximal L1 poly(A) site is weaker than the promoter-distal L3 poly(A) site ( 135 ), analogous to the arrangement of the Ig secretory- and membrane-specific poly(A) sites. In adenovirus, however, the switch late in infection is toward predominant use of the stronger downstream L3 poly(A) site. The regulation of poly(A) site use in adenovirus shows many similarities to that of the Ig transcription unit in cultured cells (see Table 4 ): there is a change in binding of the 64 kDa subunit of CstF to poly(A) sites with no change in the amount of 64 kDa protein ( 134 ). Late in adenovirus infection the poly(A) site binding activity of the 64 kDa protein decreases, suggesting a decrease in overall polyadenylation efficiency ( 132 ); as a consequence, the stronger, promoter-distal poly(A) site is favored. In late-stage B cells/plasma cells the binding activity of the 64 kDa subunit of CstF increases, implying a general increase in polyadenylation efficiency; consequently, the weaker, promoter-proximal poly(A) site is favored. The change in CstF activity during adenovirus infection indicates that the Ig gene model for poly(A) site choice may have broader relevance.
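The reasoning in this paragraph, that one change in general factor activity favors the weak proximal site in plasma cells and the strong distal site late in adenovirus infection, can be made concrete with a toy kinetic sketch. This is not a model taken from the cited studies. It assumes, hypothetically, that a nascent transcript exposes the promoter-proximal poly(A) site alone for a fixed time window before the distal site is transcribed, that cleavage at the proximal site is a first-order event whose rate is the product of an arbitrary "CstF-64 binding activity" and an arbitrary site strength, and that any transcript not cleaved in that window is processed at the distal site. The function name `proximal_fraction` and all numeric values are illustrative only.

```python
import math


def proximal_fraction(activity: float, proximal_strength: float, window: float = 1.0) -> float:
    """Fraction of transcripts cleaved at the promoter-proximal poly(A) site.

    Hypothetical first-come, first-served model (an assumption, not data):
    the proximal site is available alone for `window` time units, and
    cleavage there occurs at rate activity * proximal_strength. Transcripts
    not cleaved in that window are assumed to use the distal site.
    """
    rate = activity * proximal_strength
    # Probability that a first-order event occurs within the window.
    return 1.0 - math.exp(-rate * window)


# Arbitrary relative strength for a weak promoter-proximal site.
weak_site = 0.5

# Arbitrary "low" and "high" factor activities, standing in for late
# adenovirus infection and for plasma cells respectively.
for label, activity in [("low CstF-64 binding", 0.5), ("high CstF-64 binding", 4.0)]:
    p = proximal_fraction(activity, weak_site)
    print(f"{label}: proximal {p:.2f}, distal {1 - p:.2f}")
```

Under these assumptions, raising factor activity shifts use toward the weak proximal site and lowering it shifts use toward the distal site. The distal site's strength does not appear because the sketch assumes it is always processed eventually; a fuller model would let the two sites compete once both are transcribed.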
Polyadenylation efficiency changes throughout the course of herpes simplex virus type 1 (HSV-1) infection ( 136–143 ). Several HSV genes themselves contain multiple poly(A) sites (see Table 1 ). HSV-1 gene expression is temporally regulated during lytic infection, with immediate early gene products being produced directly after infection in the absence of de novo protein synthesis. Immediate early gene expression is required to produce the early viral proteins, which turn on viral DNA synthesis and transcription of the late genes. ICP27 (also known as IE63) is a nuclear phosphoprotein required for viral replication and for the switch from early to late gene expression ( 139–142 ). ICP27 functions post-transcriptionally to activate cleavage and polyadenylation at weaker, late viral poly(A) sites and to inhibit splicing of host cell pre-mRNA ( 139 ). Activation of the weak late poly(A) sites is due to increased binding of the 64 kDa subunit of CstF to these sites in the presence of ICP27 ( 138 ). Strong poly(A) sites, defined by efficient cleavage in uninfected HeLa nuclear extracts, are not affected by ICP27. A direct interaction between ICP27 and the 64 kDa subunit of CstF has not been demonstrated, so the precise mechanism by which ICP27 increases cleavage and polyadenylation at late viral sites is not known. The HSV-1 system is nonetheless another example in which changes in binding of the general polyadenylation factor CstF play a role in a regulated switch in poly(A) site use.
The organization of the HIV-1 retroviral genome, with flanking long terminal repeats (LTRs) that each contain a poly(A) site, requires the polyadenylation machinery to ignore the 5′ poly(A) site close to the promoter and process only the 3′ poly(A) site far downstream. Other retroviruses, such as Rous sarcoma virus and human T cell leukemia virus type 1, avoid this problem because their transcription start site lies between the first AAUAAA and its downstream element. In HIV-1, the U3 element upstream of the 3′ poly(A) site in the transcribed RNA has been shown to enhance processing in vitro and in vivo ( 144 ), a special case of the upstream elements mentioned earlier. In addition, the major splice donor site inhibits the adjacent 5′-LTR poly(A) signals ( 145 ). Use of the 3′ poly(A) site in HIV-1 therefore results from a combination of enhancement of the active 3′ site and repression of the 5′ site.
Summary and Conclusions
Changes in the level or activity of the 64 kDa subunit of polyadenylation factor CstF can influence expression of viral and Ig heavy chain genes by changing the processing efficiency of weak poly(A) sites. A large number of other genes have multiple poly(A) sites, the use of which may vary in a differentiation or developmentally regulated fashion. The relative strengths of the poly(A) sites of many of these complex transcription units have yet to be determined and even less is known about potential positive and negative regulators of the cleavage/polyadenylation reaction. The evidence emerging from experiments in yeast suggests that other modifying factors influence the constitutive cleavage/polyadenylation machinery ( 146 ) beyond the well-established U1 snRNP protein A. Therefore, it is likely that there are tissue-specific levels of expression of basal polyadenylation/cleavage factors, as well as of modulators of the constitutive polyadenylation factors, in higher eukaryotes. If the levels of the constitutive cleavage/polyadenylation factors or modulators of them vary from tissue to tissue and throughout development, then differential use of multiple poly(A) sites can be achieved, providing ‘a means to an end’ in complex transcription units. Tissue-specific variations in splicing factors can also tip the balance with some genes. Having the mRNA end well is a challenge to the cell. Characterizing RNA processing modulators and their interactions with constitutive polyadenylation and splicing factors to regulate alternative pre-mRNA processing remains a challenge to investigators in this field.
This work was supported by grant GM50145 to C.M. K.L.V. is a member of the MD/PhD Program at the University of Pittsburgh. We thank Drs J.Cohen, S.Phillips and numerous colleagues for comments on the manuscript and useful discussions.