Genes of the recA/RAD51 family play essential roles in homologous recombination in all organisms. Using sequence homologies from eukaryotic members of this family, we have identified fragments of two additional mammalian genes with homology to RAD51. Cloning of the full-length human and mouse cDNAs for both genes showed that the sequences are highly conserved and that the predicted proteins have features characteristic of this gene family. One of the novel genes (R51H2) occurs in two forms in human cDNA that differ extensively at the 3′ end, probably owing to an unusual form of alternative splicing. The new genes (R51H2 and R51H3) were mapped to human chromosomes 14q23–24 and 17q1.2, respectively. Expression studies showed that R51H2 is expressed at lower levels than R51H3, but that both genes are expressed at elevated levels in the testis compared with other tissues. The conservation of gene structure, together with the transcript expression patterns, suggests that these new members of the recA/RAD51 family may also function in homologous recombination-repair pathways.
Genetic recombination is a vital process in all cells, both to create genetic diversity and to repair damage to DNA. In bacteria, the RecA protein has a central role in recombination, mediating the search for homologous regions of DNA molecules and promoting DNA strand exchange (1,2). The Rad51 protein of the yeast Saccharomyces cerevisiae (ScRad51) has structural homology to RecA (3) as well as functional similarities, including the formation of underwound nucleoprotein filaments on duplex DNA and the promotion of homologous pairing and strand exchange reactions in vitro (4,5). In the last few years, functional homologues (orthologues; 6) of the ScRAD51 gene have been isolated from human and mouse tissue (HsRAD51 and MmRAD51, respectively) (7–9), and recent studies with purified HsRad51 show that nucleoprotein filament formation, homologous pairing and strand exchange are all conserved (10,11). Intriguingly, although the RAD51 gene of S.cerevisiae is not essential for survival, deletion of the orthologue in mice causes embryonic lethality, which has led to the suggestion that the mammalian RAD51 gene may have an important role in other aspects of DNA metabolism (12,13). The potential importance of mammalian Rad51 interactions has been highlighted recently by the finding that Rad51 physically interacts with the tumour suppressor protein p53, which has a central role in the control of the cell cycle and apoptosis (14,15), as well as with the breast cancer-susceptibility gene products Brca1 and Brca2 (16,17).
Saccharomyces cerevisiae has two other recombination-repair proteins with structural homology to RecA, ScRad55 and ScRad57 (18); mutations in the corresponding genes confer a temperature-conditional phenotype, and the proteins appear to work in a complex with ScRad51 (19,20). Most recently, ScRad55 and ScRad57 have been shown to form a heterodimer that interacts with ScRad51 in the presence of replication protein A to stimulate DNA strand exchange (21). These observations suggest that the three RecA homologues in S.cerevisiae do not have redundant functions, but rather play distinct roles in the complex process of recombination. No mammalian orthologues of the ScRAD55 and ScRAD57 genes have yet been identified.
In a recent study we cloned a human gene, provisionally named XRCC2, that complements a mutant cell line (irs1) sensitive to agents causing severe types of DNA damage. We found that the predicted XRCC2 protein has structural homology to the RecA/Rad51 family, and more specifically to HsRad51 (22). This finding suggested that, in addition to possible orthologues of ScRAD55 and ScRAD57, further structural homologues of RAD51 may be present in humans. We have therefore used the XRCC2 sequence, and others from the recA/RAD51 family, to search for novel members of this family. We searched the growing collection of expressed-sequence tags (ESTs) in the public nucleotide databases using programs such as TBLASTN (23), which compares a protein query, such as a conserved region of a gene family, against nucleotide sequences translated in all six reading frames. Significant matches can reveal a partial gene sequence, and an EST identified in this way can then be used to isolate a full-length gene copy for structural and functional studies.
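TBLASTN compares a protein query with nucleotide sequences by first translating each nucleotide sequence in all six reading frames. A minimal sketch of that translation step follows, assuming the standard genetic code; the function names are hypothetical, and this is not BLAST's own implementation, which also performs the alignment and statistical scoring that the sketch omits.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in T, C, A, G order, so index = 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons not in the table (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Hypothetical helper: translations in frames +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, peptide in six_frame_translations("ATGGCCTAA").items():
    print(frame, peptide)
```

For the toy input ATGGCCTAA, frame +1 gives MA* (stop shown as `*`) and frame −1, read from the reverse complement TTAGGCCAT, gives LGH; a protein query would then be aligned against each of these translations.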
Materials and Methods
Identification of structural homologues of HsRAD51
Human or mouse ESTs were identified using sub-routines of the BLAST program (23) to search the DDBJ/EMBL/GenBank or EST databases. The queries were based on conserved regions of the predicted protein sequences of members of the RAD51 gene family, such as ScRAD51, HsRAD51, ScRAD55 and HsXRCC2 (DDBJ/EMBL/GenBank accession nos M88470, D14134, U01144 and Y08837, respectively). Provided that the conserved regions chosen for searching (such as the two nucleotide-binding domains; 24,25) were not too small, these searches found sequences that, on re-searching protein databases (e.g. SwissProt), had significant similarity to the RAD51 family (P < 10⁻⁵ for at least one member of this family), without detecting related gene families (such as the functionally diverse AAA family of ATPases; 26). For example, the first of these (accession no. R50193) had 62% identity over 90 amino acids of the HsRad51 sequence, while others had lower levels of homology (e.g. accession no. T92120 had 34% identity over 50 amino acids of HsRad51). These two human ESTs did not overlap and aligned to different parts of the HsRad51 protein, but sequencing showed that they were parts of the same gene. In subsequent searches, ESTs derived from mouse tissue (accession nos AA118958 and AA124439) were found to have a high level of nucleotide identity with R50193 (77% over 300 bp), suggesting that there is a mouse orthologue of this gene. Human and mouse sequences from a different structural homologue of RAD51 were identified in the same way. In this case, overlapping mouse ESTs were identified first (accession nos AA049653 and AA260430, with 39% identity over 50 amino acids of HsRad51), and a human EST was found subsequently (accession no. N51784, with 76% identity over 131 bp of the mouse sequence). On matching this human EST to the mouse orthologue, it was apparent that a region of ∼340 bp, including the first nucleotide-binding domain, was missing from the human clone.
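The acceptance criterion described above, P < 10⁻⁵ for at least one RAD51-family protein when an EST is re-searched against a protein database, can be sketched as a small filter. This is a sketch under stated assumptions: the input format (a mapping from protein name to reported P value), the added rule that the single best significant hit decides the call (one way of excluding ESTs that really belong to a related family such as the AAA ATPases), and all names and P values in the example are hypothetical, not search results from this study.

```python
FAMILY_THRESHOLD = 1e-5  # P < 10^-5, as in the text


def classify_est(hits: dict, family_members: set, threshold: float = FAMILY_THRESHOLD) -> str:
    """Classify an EST from the hits found when its translation is re-searched
    against a protein database.

    `hits` maps protein names to reported P values (hypothetical input format).
    Assumption: the best (lowest-P) significant hit decides the call, so an EST
    whose strongest match lies in a related family is not counted as a candidate.
    """
    significant = {name: p for name, p in hits.items() if p < threshold}
    if not significant:
        return "no significant match"
    best = min(significant, key=significant.get)
    return "RAD51 family" if best in family_members else "other family"


FAMILY = {"ScRad51", "HsRad51", "ScRad55", "HsXrcc2"}
# Hypothetical P values, for illustration only.
print(classify_est({"HsRad51": 3e-20, "ScRad51": 1e-15, "AAA ATPase X": 0.2}, FAMILY))
print(classify_est({"AAA ATPase X": 1e-8, "HsRad51": 1e-3}, FAMILY))
print(classify_est({"HsRad51": 0.01}, FAMILY))
```

The three calls print "RAD51 family", "other family" and "no significant match", respectively.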
This was verified by finding another EST (accession no. D59413) that overlapped N51784 but also contained part of the missing sequence (i.e. was contiguous with the mouse sequence). Note that in the text we use the term ‘structural homologue’ to indicate that these new genes are structurally most closely related to the RAD51 gene (as seen in database searches; see Discussion), but that their functional relationships are as yet unknown.
Cloning and sequencing full-length genes
ESTs, available as I.M.A.G.E. Consortium cDNA clones (27), were obtained from the Resource Centre of the UK Human Genome Mapping Project, Cambridge (I.M.A.G.E. Clone IDs: 576685, 573969, 478233, 746795, 692441, 259579). In addition, a human cDNA library derived from HeLa cells (kindly supplied by Dr R.Legerski, University of Texas, Houston) (28) was screened with probes made by the polymerase chain reaction (PCR), using primers designed to sections of the sequenced ESTs. Probes were cloned and verified by sequencing. Bacteria containing the library, or a sub-library enriched for the gene by PCR analysis of fractions, were screened on Hybond N filters (Amersham) at 2 × 10⁵ bacteria/filter with ³²P-labelled probes. Bacterial colonies hybridizing to the probes were respread onto fresh filters for at least one further round of screening and then purified, and mini-preparations of their DNA were checked for the presence of the gene by PCR with gene-specific primers. Where an EST or isolated cDNA clone lacked a specific segment of the gene (see above), this segment was isolated by PCR with specific primers on a cDNA library and cloned into the pCRII vector (Invitrogen). Other cDNA libraries used to screen for specific PCR products (see Results) were either made in-house (R.Cartwright, unpublished) or made by Dr D.Simmons and kindly supplied by Dr I.D.Hickson (Institute of Molecular Medicine, Oxford).
Sequencing of the genes was performed either manually or using ABI373A (by Alta Bioscience, Birmingham) or ABI377 automated sequencers with dye-termination chemistry, using both vector and gene-specific primers. Sequences were manipulated and translated using programs included in the Wisconsin Sequence Package, Version 8, Genetics Computer Group, Madison, through the UK HGMP Resource Centre.
Filters with human genomic PAC clones (library RPCI1), constructed by P.de Jong, Buffalo (29), were screened using radiolabelled cDNA probes as described above. Filters and positive clones were provided by the UK HGMP Resource Centre. Mini-preparations of PAC DNA were nick-translated with biotin-dUTP (Gibco Life Technologies) and mixed with hybridization buffer in 50% formamide. Primary human fibroblasts were fixed, dropped onto slides and G-banded with trypsin (30); images of mitotic cells were captured using a colour CCD camera (Optivision). The cells were then fixed, denatured in 70% formamide/2× SSC at 70°C for 2 min and hybridized with the probe mixture overnight at 37°C. Slides were washed with 2× SSC in 50% formamide at 37°C. The PAC sequences were labelled indirectly with the fluorochrome FITC through an avidin reaction, and the signal was amplified once using biotinylated anti-avidin. Chromosomes were counterstained with propidium iodide and analysed with a UV fluorescence microscope; images were recaptured and combined with the corresponding G-banded images using Adobe Photoshop.
In addition, for one of the novel genes (R51H3), DNA samples derived from a panel of monochromosomal rodent-human hybrids (31) were screened by PCR with primers designed to the 3′ end of the gene. Primer sequences were: forward 5′-AATCTTCCCGACAGCCAACAGGTT-3′; reverse 5′-AGTGCCAGGTGGCAGTAAACAGCA-3′. These primers gave no product with mouse or hamster genomic DNA.
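For primers such as these mapping primers, a quick check of length, GC content and approximate melting temperature is routine. The sketch below applies the simple Wallace rule, Tm ≈ 2(A + T) + 4(G + C) °C; this rule is an assumption chosen for illustration (the paper does not state how annealing conditions were chosen), and it is only a rough guide for primers longer than about 20 nt.

```python
def gc_content(primer: str) -> float:
    """Fraction of G + C bases (0.0 to 1.0)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C) (Wallace rule).

    Crude estimate only; it was devised for short oligonucleotides and becomes
    unreliable for longer primers such as these 24-mers.
    """
    seq = primer.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


MAPPING_PRIMERS = {  # R51H3 3'-end primers from the text, written 5'->3'
    "forward": "AATCTTCCCGACAGCCAACAGGTT",
    "reverse": "AGTGCCAGGTGGCAGTAAACAGCA",
}

for name, seq in MAPPING_PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {100 * gc_content(seq):.1f}%, Wallace Tm ~{wallace_tm(seq)} C")
```

Both mapping primers are 24-mers; by this estimate the forward primer has 50.0% GC (Wallace Tm ∼72°C) and the reverse primer 54.2% GC (∼74°C), so the pair is reasonably well matched.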
Northern blot and RT-PCR analysis
Multiple-tissue northern blots were purchased from Clontech (human #7759-1) and hybridized in Rapidhyb (Clontech) according to the supplier's protocol for cDNA probes. cDNA probes were made by PCR with primers designed to the sequenced genes, as follows: R51H2, forward primer 5′-TGTGGATCCCTCACAGAGATTACA-3′; reverse primer 5′-TGTGCATCAAAACCCCTTCTG-3′; R51H3, forward primer 5′-GGACCTGGTTTCTGCAGACCTGGA-3′; reverse primer 5′-GCAGCACATCCAGCATCTGGAAGA-3′.
Total RNA was isolated using Trizol reagent (Gibco Life Technologies) from whole tissues of 67-day-old male CBAH/1768.2 fos mice. The RNA concentration was measured spectrophotometrically, and 75 µg was used for poly(A)⁺ RNA isolation. Poly(A)⁺ RNA was isolated using oligo(dT)25 Dynabeads (Dynal) following the supplier's protocol. mRNA was reverse transcribed in the following reaction (20 µl): 1× reaction buffer (10 mM Tris-HCl pH 8.3, 90 mM KCl), 1 mM MnCl2, 200 µM each dNTP, 0.75 µM gene-specific reverse primer (R51H2, 5′-GAGAATCTGCCTTCTCTCTGAATC-3′; GAPDH, 5′-GAGAAACCTGCCAAGTATGATGAC-3′) and 5 U rTth reverse transcriptase (Perkin Elmer). The mixture was heated at 68°C for 10 min and then at 55°C for 30 min. The reaction was placed on ice and the following were added: chelating buffer (Perkin Elmer) to 0.8×, MgCl2 to 1.5 mM, gene-specific forward primer to 0.15 µM (R51H2, 5′-TAAGCTTGTGATTGTTGACTCCAT-3′; GAPDH, 5′-TGATGGTATTCAAGAGAGTAGGGAG-3′) and water to 100 µl. This mixture was incubated at 94°C for 2 min and then cycled at 94°C for 30 s, 57°C for 30 s and 72°C for 30 s, followed by a final extension at 72°C for 7 min. Because the reverse transcriptase activity of rTth depends on Mn²⁺, reactions were also performed in the absence of Mn²⁺ to test for contaminating DNA; none was found.
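In this one-tube format the components of the 20 µl RT reaction are diluted five-fold when the mixture is made up to 100 µl. A short C₁V₁ = C₂V₂ calculation shows, for example, that the 0.75 µM reverse primer ends up at 0.15 µM, the same final concentration as the added forward primer. The 25 mM MgCl₂ stock in the last line is a hypothetical value; the protocol gives only the final concentration.

```python
def diluted(c_initial: float, v_initial: float, v_final: float) -> float:
    """Final concentration after dilution: C2 = C1 * V1 / V2 (units follow C1)."""
    return c_initial * v_initial / v_final


def stock_volume(c_stock: float, c_final: float, v_final: float) -> float:
    """Volume of stock needed to reach c_final in v_final: V1 = C2 * V2 / C1."""
    return c_final * v_final / c_stock


RT_VOLUME_UL = 20.0    # reverse-transcription step
PCR_VOLUME_UL = 100.0  # after adding chelating buffer, MgCl2, forward primer and water

rt_step = {"reverse primer (uM)": 0.75, "MnCl2 (mM)": 1.0, "each dNTP (uM)": 200.0}
for component, conc in rt_step.items():
    print(f"{component}: {conc:g} in RT step -> {diluted(conc, RT_VOLUME_UL, PCR_VOLUME_UL):g} in PCR")

# Hypothetical 25 mM MgCl2 stock (stock concentration not given in the protocol).
print(f"25 mM MgCl2 stock needed for 1.5 mM: {stock_volume(25.0, 1.5, PCR_VOLUME_UL):g} ul")
```

The same arithmetic gives 0.2 mM MnCl2 and 40 µM of each dNTP in the final 100 µl, and 6 µl of the assumed 25 mM MgCl₂ stock.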
Results
Identification of novel human members of the recA/RAD51 family
Searches of the databases, using queries based on conserved regions of the predicted amino acid sequences of the recA/RAD51 gene family, detected several human and mouse ESTs with significant similarity to specific regions. These searches clearly distinguished members of the RecA/Rad51 gene family from members of related families (see Materials and Methods). Two novel genes were detected, which we have provisionally designated RAD51 homologue 2 (R51H2) and homologue 3 (R51H3), because these genes have no obviously closer structural homologues among the known yeast genes in the RecA/Rad51 family. This numbering reflects the fact that XRCC2 was the first human structural homologue of HsRAD51 to be identified (22), and excludes other members of this family that have a close structural homologue in yeast, such as the recently identified human DMC1 gene (32). While this report was in preparation, R51H2 was also identified by others using degenerate PCR based on the sequence of a gene (REC2) from the smut fungus Ustilago maydis, and was provisionally named hREC2 (33) (but see below and Discussion).
Cloning and sequencing of full-length cDNAs
The ESTs identified for human homologue 2 (HsR51H2) did not contain the 5′ end of the gene. A full-length HsR51H2 sequence was identified by probing a human cDNA library; this was sequenced and found to have an open reading frame of 1050 bp, with a start codon in a context (GGCATGG) closely conforming to the consensus identified by Kozak (34). However, when we compared this sequence with that derived from the ESTs in the database and with the sequence recently published as hREC2 by Rice et al. (33), we found that our sequence differed substantially at the 3′ end (Fig. 1A). The difference encompassed the final 14 bp of the open reading frame (five amino acids) and the whole 3′ untranslated region (445 bp), although the stop codon is in the same relative position and has the same sequence (TAG). A canonical polyadenylation signal (AATAAA; Fig. 1A) lies close to the end of our HsR51H2 sequence, whereas this signal was not present in hREC2 (33). We also compared HsR51H2 with its mouse counterpart (Fig. 1B) and found, in contrast, that the overall identity (85.1%) was maintained into the 3′ UTR (84.7% identity over the region that differs between the two forms of the human sequence; Fig. 1B). The published human sequence (hREC2) is only 37.4% identical to the mouse counterpart in the 3′ UTR. The short stretch of coding sequence that differs between the two human forms is compared with the corresponding mouse sequence in Figure 1B.
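The comparison with the Kozak consensus can be made concrete by scoring the two positions usually considered most important for start-codon recognition in vertebrates: a purine (A or G) at position −3 and a G at position +4 relative to the A of ATG. The three-level labels below ('strong', 'adequate', 'weak') are a common convention adopted here as an assumption, not a classification used in the paper.

```python
PURINES = {"A", "G"}


def kozak_strength(context: str) -> str:
    """Score a start context written as 3 nt + ATG + 1 nt (positions -3 to +4).

    Assumption: only the two key positions are scored, a purine at -3 and a G at +4.
    Both present -> 'strong', one -> 'adequate', neither -> 'weak'.
    """
    ctx = context.upper()
    if len(ctx) != 7 or ctx[3:6] != "ATG":
        raise ValueError("expected a 7-nt context of the form NNNATGN")
    score = int(ctx[0] in PURINES) + int(ctx[6] == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


# Start contexts quoted in the text for HsR51H2 and for human and mouse R51H3.
for label, ctx in [("HsR51H2", "GGCATGG"), ("human R51H3", "AACATGG"), ("mouse R51H3", "ACTATGG")]:
    print(label, ctx, kozak_strength(ctx))
```

All three contexts reported in the text carry both a purine at −3 and a G at +4, which is consistent with their description as closely conforming to the consensus.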
To test for the prevalence of the two forms of the human R51H2 sequence, we designed PCR primers to amplify each form independently from cDNA libraries of different origins. As shown in Figure 2, both forms were detected in the library we used to derive the full-length HsR51H2 sequence (lanes marked HeLa). Both forms were also found in libraries from some other sources (testis, placenta, HF19, CP; Fig. 2), although in some of these the R51H2 product was somewhat less abundant than the hREC2 product, whereas other libraries contained only one form (R51H2 in DX3; hREC2 in HT29).
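Isoform-specific PCR of this kind relies on a common forward primer in the shared coding sequence and a reverse primer that anneals only within one of the divergent 3′ ends. A minimal in-silico PCR sketch of that logic follows, assuming perfect primer matches; the template and primer sequences are short hypothetical stand-ins, not the R51H2 or hREC2 sequences or the primers actually used.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length of the product expected from exact primer matches, or None.

    The forward primer must occur in `template` (5'->3'); the reverse primer anneals
    to the opposite strand, so its reverse complement must occur further downstream.
    Assumption: perfect matches only; real primers tolerate some mismatches.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return site + len(reverse) - start


# Hypothetical toy sequences: a shared coding region followed by two divergent 3' ends.
SHARED = "ATGCCGTTAGCAAGGTC"
TEMPLATE_A = SHARED + "TTGACCGGAATTCAGC"
TEMPLATE_B = SHARED + "CCTAGGACTTGAAGCT"
FORWARD = "ATGCCGTTAG"                            # common primer in the shared region
REVERSE_A = reverse_complement(TEMPLATE_A[-10:])  # anneals only to end A
REVERSE_B = reverse_complement(TEMPLATE_B[-10:])  # anneals only to end B

for name, template in [("form A", TEMPLATE_A), ("form B", TEMPLATE_B)]:
    print(name, predicted_amplicon(template, FORWARD, REVERSE_A),
          predicted_amplicon(template, FORWARD, REVERSE_B))
```

Each form yields a product only with its own reverse primer, so a library containing both forms gives both products, as seen for the HeLa library.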
For the other novel gene, R51H3, the human and mouse sequences are 73.9% identical at the nucleotide level (including 5′ and 3′ UTRs; not shown) and 82.3% identical in their predicted protein sequences (Fig. 3A). The predicted open reading frame of the human nucleotide sequence is 984 bp, while the mouse sequence has an extra 3 bp. The translation initiation context (human AACATGG, mouse ACTATGG) is in good agreement with the consensus (34). As with R51H2, the 5′ UTR is relatively GC-rich. A putative short nuclear localization sequence, as well as the nucleotide-binding domains characteristic of the RecA/Rad51 family, are marked in Figure 3A. The first domain has the conserved [GxxxxGKT] structure, with additional glycines at alternate sites, as commonly found in other adenine nucleotide-binding proteins (24), while the second domain has the characteristic hydrophobic β-sheet preceded by a glycine three residues upstream. One of the less common variants (35) of the poly(A) signal sequence (AATATA) is present in both the human and mouse genes (not shown).
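The motif and signal searches described above can be expressed with a regular expression and simple string searches. The sketch below uses the [GxxxxGKT] pattern exactly as written in the text (x = any residue; the broader Walker A consensus also allows S in the last position) and looks for the canonical AATAAA signal and the AATATA variant; the protein and UTR strings are hypothetical toy sequences, not R51H3 data.

```python
import re

WALKER_A = re.compile(r"G.{4}GKT")    # [GxxxxGKT], x = any residue
POLYA_SIGNALS = ("AATAAA", "AATATA")  # canonical signal and the variant noted in the text


def find_walker_a(protein: str) -> list:
    """(1-based start, matched text) for each non-overlapping [GxxxxGKT] match."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]


def find_polya_signals(utr: str) -> list:
    """(1-based start, signal) for every occurrence of either signal, sorted by position."""
    seq = utr.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        pos = seq.find(signal)
        while pos != -1:
            hits.append((pos + 1, signal))
            pos = seq.find(signal, pos + 1)
    return sorted(hits)


# Hypothetical toy sequences, for illustration only.
print(find_walker_a("MSQLGEFRTGKTQLCH"))
print(find_polya_signals("CCAATATAGGTTAATAAACC"))
```

On these toy inputs the motif is found at residue 5 (GEFRTGKT), and the UTR contains AATATA at position 3 and AATAAA at position 13.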
Figure 3B marks the regions of the human R51h2 and R51h3 proteins that are conserved relative to HsRad51. Further areas of sequence conservation are seen, especially in the region of ∼75 amino acids surrounding the first nucleotide-binding domain; these conserved sequences are found in RecA-like proteins in the archaea, prokaryotes and eukaryotes (36). A second region of significant protein sequence conservation includes the second nucleotide-binding domain and ∼50 amino acids downstream (Fig. 3B); this region includes the sequence that in the RecA protein appears to be responsible for homologous DNA pairing (37,38).
Genomic DNA segments, isolated from a human PAC library, were used to localize the new genes by FISH on normal human metaphases. Because there may be two different forms of the homologue 2 gene (see above), three PACs identified with the HsR51H2 probe were labelled and tested for chromosomal position. All three PACs mapped to the same chromosomal region, 14q23–24 (Fig. 4, top), in agreement with the recently published data for hREC2. The PACs isolated by probing with HsR51H3 cDNA mapped to human chromosome 17 (Fig. 4, bottom); screening a panel of monochromosomal hybrid DNAs with primers specific for the 3′ end of HsR51H3 gave a product only with the chromosome 17 DNA, confirming this assignment (data not shown). Combined G-banded and fluorescent images localized HsR51H3 to 17q1.2. The chromosomal positions of these new members of the recA/RAD51 family differ from those of both HsRAD51 (chromosome 15q) (8) and XRCC2 (chromosome 7q36.1) (39,40).
Expression of mRNA
The tissue specificity of human gene expression was investigated by northern blot analysis with cDNA probes (Fig. 5A). Probes specific to HsR51H2 hybridized to an mRNA of ∼1.8 kb, but this transcript was expressed at a very low level in all tissues, with a possible increase in testis. In view of this low overall expression, RT-PCR was carried out on mRNA from mouse tissues using gene-specific primers. To control for the amount of mRNA present, primers specific to the housekeeping gene GAPDH were used on the same samples. Figure 5B shows that amplified products were present in all tissues except lung. Although RT-PCR is at best semi-quantitative, after correction for the amounts of mRNA present we found reproducibly elevated levels of R51H2 product in the spleen and especially in the testis.
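Correcting RT-PCR product levels for the amounts of mRNA present amounts to dividing each target signal by the GAPDH signal from the same sample, then expressing the ratios relative to a chosen tissue. A minimal sketch follows; the band intensities are made-up numbers for illustration, not measurements from Fig. 5B, and such ratios are only meaningful while both amplifications remain in the exponential phase, which is one reason RT-PCR of this kind is at best semi-quantitative.

```python
def relative_expression(target: dict, gapdh: dict, calibrator: str) -> dict:
    """Target/GAPDH ratio for each tissue, rescaled so that `calibrator` equals 1.0."""
    ratios = {tissue: target[tissue] / gapdh[tissue] for tissue in target}
    baseline = ratios[calibrator]
    return {tissue: ratio / baseline for tissue, ratio in ratios.items()}


# Made-up band intensities (arbitrary units), for illustration only.
r51h2_bands = {"liver": 200.0, "spleen": 450.0, "testis": 900.0}
gapdh_bands = {"liver": 1000.0, "spleen": 900.0, "testis": 1000.0}

for tissue, value in relative_expression(r51h2_bands, gapdh_bands, "liver").items():
    print(f"{tissue}: {value:.2f}")
```

With these illustrative numbers, the spleen and testis come out at 2.50 and 4.50 times the liver level; note that the spleen's raw band is 2.25 times the liver's, and the correction for its lower GAPDH signal raises the ratio.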
Probes specific to HsR51H3 hybridized to an mRNA of ∼1.7 kb (Fig. 5A), which again was present in all tissues; in the prostate, small intestine and colon, the probe also hybridized to an additional transcript of ∼7 kb, presumably representing pre-mRNA. The 1.7 kb transcript is 5–10-fold more abundant in the testis than in the other tissues.
Discussion
We have identified two novel human members of the recA/RAD51 recombination-repair gene family, in addition to the XRCC2 gene we recently identified (22), and found that these structural homologues are highly conserved between humans and rodents. The sequence of one of these structural homologues, HsR51H2, and that of its mouse counterpart have recently been described independently by others under the names hREC2 and mREC2, respectively (33). However, there are discrepancies between our human gene data (HsR51H2) and those for hREC2. Firstly, the sequence of the gene we isolated differs from hREC2 from a few codons before the stop codon through to the end of the 3′ UTR. In assessing the significance of this finding, we compared our human sequence with the mouse sequence and found that they were highly homologous (Fig. 1B), whereas this was not true for hREC2 and mREC2. Since only one form of the mouse gene has been found, and this was present in libraries derived from normal 4-week-old mice, this raised the possibility that our human sequence (R51H2) represents the ‘correct’ form of this gene. However, the hREC2 form was sequenced from an I.M.A.G.E. clone derived from breast tissue, and cDNAs representing parts of this sequence from at least one other library (foetal liver/spleen) are also present in the EST database, suggesting that this form is not an artefact. PCR with primers spanning the junction where the sequences diverge showed that different cell lines and tissues contain either one of the gene forms or both (Fig. 2). These findings suggest that both forms of the R51H2 gene are transcribed, perhaps to different extents in different tissues, and that the isoforms represent alternative splicing events at the 3′ end of the gene. Precedents for this intriguing finding are rare; one example is provided by genes with more than one 3′-terminal exon arranged in tandem, which can act as alternative sites of transcription termination in different tissues (41).
We are currently examining genomic DNA from this region to determine whether this type of gene structure accounts for our findings with R51H2.
A further difference lies in the expression pattern of R51H2 relative to that reported for hREC2. We found that R51H2 expression was too low in northern analysis to assess clearly whether a human tissue-specific pattern occurs, whereas Rice et al. (33) reported higher levels of hREC2 expression in several human tissues (placenta, lung, liver, pancreas, spleen, etc., but not testis). They contrasted this pattern with that found for RAD51 in the mouse, which is more abundant in testis, ovary, spleen and thymus (8). In our experiments, the R51H2 transcript was readily detectable by RT-PCR in mRNA from several different mouse tissues. These data suggest that MmR51H2 is expressed at the highest levels in the testis (Fig. 5B), but has significant expression in other tissues, indicating a broader pattern of tissue expression than that of MmRAD51. R51H3 also has a relatively broad tissue expression pattern (Fig. 5A), but again is expressed at the highest levels in testis. The elevated expression of both R51H2 and R51H3 in the testis is suggestive of a role in meiotic recombination, while their expression in other tissues suggests an additional role in mitotic cells.
In searching the protein databases with the full coding sequences of R51H2 and R51H3, we found that they have no significant similarity to gene families other than the RecA/RAD51 family. Using BLASTX, R51H2 finds highly significant matches to regions of various Rad51 orthologues (P values as low as 4e-31), with ScRad51 at P = 6e-26 and HsRad51 at P = 5e-25. We found no significant similarity between the R51h2 protein and the Rec2 gene product of U.maydis, which is approximately twice the size of other proteins in the RecA/Rad51 family (42), although, as expected, R51h2 does have significant similarity to the U.maydis Rad51 orthologue (43). In similar searches, R51H3 finds ScRad51 at P = 5e-10 and HsRad51 at P = 2e-8. Thus, the predicted protein of R51H2 is more closely related to Rad51 proteins than is that of R51H3. However, when the amino acid sequences are compared directly along the whole length of the proteins, both R51h2 and R51h3 show lower identity/similarity to RecA/Rad51 members than do functional homologues of Rad51 from different species (compare ScRad51 and HsRad51; Table 1). The identity/similarity values for the new genes are close to those found when other accepted members of the family, such as Rad55 and Rad57, are compared with each other and with Rad51 (Table 1). At present it is not clear whether orthologues of Rad55 and Rad57 exist in mammals, or how well their structure would be conserved if they do. Therefore, until further functional data are available, these novel members of the RecA/Rad51 family are best described as structural homologues of RAD51, since on sequence criteria alone they cannot be considered simply as HsRad51 isoforms, or as orthologues of ScRad55 and ScRad57.
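Identity and similarity values of the kind summarized in Table 1 differ in whether conservative substitutions are counted. The sketch below scores a pair of pre-aligned sequences; the residue groups that define 'similar' and the use of all alignment columns (including gap columns) as the denominator are assumptions for illustration, not the criteria used to compile Table 1, and the example alignments are hypothetical.

```python
# Illustrative groups of physico-chemically similar residues (an assumption).
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple:
    """Percent identity and percent similarity over all alignment columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns count in the length but score nothing
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    n = len(aligned_a)
    return 100 * identical / n, 100 * similar / n


print(identity_similarity("GEFRTGKT", "GEYKSGRT"))
print(identity_similarity("GEFRTGKT", "GDLRT-KT"))
```

The first toy alignment is 50% identical but 100% similar under these groups; the second, with one gap column, scores 62.5% identity and 75% similarity. The gap between the two measures illustrates why both are reported when judging how distant the new proteins are from Rad51.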
The presence of more than one structural homologue of RAD51 in human tissues raises the question of what their functions may be. There is clear evidence for conservation of specific residues in the central parts of these proteins, even in comparison with prokaryotic RecA proteins (residues underlined in the consensus of Fig. 3B). The crystal structure of Escherichia coli RecA has been solved (37), and mapping the sequences of other members of the recA/Rad51 family onto the RecA structure has led to the suggestion that family members share common mechanisms for binding and hydrolysing ATP, leading to conformational change, as well as the ability to form filaments (44). On the basis of these structural analyses and our preliminary tissue expression data, which show increased expression in germ-cell tissue (testis), we suggest that the new genes may also function in recombination-repair pathways. As in E.coli (45,46), there may be several pathways that effect homologous recombination in human cells, involving groups of proteins with similar (and potentially overlapping) functions. These proteins would be organized to meet the diverse requirements of the cell for genetic exchange in a number of important processes, such as meiotic chromosome exchange and segregation, and the repair of chromosome damage.
Acknowledgements
The work of the authors was supported in part by European Communities contract F14P-CT95-0010. A.M.D. is grateful for an MRC Research Studentship.