Cerebral cavernous malformations (CCM) are congenital vascular anomalies of the brain that can cause significant neurological disabilities, including intractable seizures and hemorrhagic stroke. One locus for autosomal dominant CCM (CCM1) maps to chromosome 7q21–q22. Recombination events in linked family members define a critical region of ∼2 Mb, and a shared disease haplotype associated with a presumed founder effect in families of Mexican-American descent points to a potentially smaller region of interest. Using a genomic sequence-based positional cloning strategy, we have identified KRIT1, encoding a protein that interacts with the Krev-1/rap1a tumor suppressor, as the CCM1 gene. Seven different KRIT1 mutations have been identified in 23 distinct CCM1 families. The identical mutation is present in 16 of 21 Mexican-American families analyzed, substantiating a founder effect in this population. Other Mexican-American and non-Hispanic Caucasian CCM1 kindreds harbor other KRIT1 mutations. Identification of a common Mexican-American mutation has potential clinical significance for presymptomatic diagnosis of CCM in this population. In addition, these data point to a key role for the Krev-1/rap1a signaling pathway in angiogenesis and cerebrovascular disease.
Cerebral cavernous malformations (CCM) are congenital vascular anomalies of the brain, comprising 10–20% of all vascular malformations of the central nervous system, with an incidence in the general population of 0.1–0.5% (1,2). Histopathological studies show focal, well-circumscribed, thin-walled vascular spaces. The grossly dilated vascular spaces are lined by a single layer of endothelium, usually without any intervening neural parenchyma. They are also devoid of elastic tissue, smooth muscle and mature vessel wall elements (1–3).
Although cavernous malformations of the brain are congenital, they usually present clinically between the third and fifth decades of life (2,4,5). Therefore, many affected individuals are clinically quiescent in early life. Patients most often present with intractable seizures, intracerebral hemorrhage, recurrent headaches and focal neurological deficits (2,4,5). Hemorrhagic stroke, due to massive intracerebral hemorrhage of the vascular lesion, can be fatal and is often the first and only clinical presentation (6). CCM lesions therefore behave as intracerebral vascular tumors that present as space-occupying lesions, epileptogenic foci and sites for recurrent hemorrhage.
Both sporadic and familial forms of CCM have been identified. A familial form exhibits autosomal dominant inheritance with incomplete penetrance (OMIM 116860). However, a recent report suggests that up to 75% of the so-called sporadic cases are in reality members of an affected family, with asymptomatic vascular lesions in relatives masking the autosomal dominant segregation pattern (7). This suggests that the population incidence of familial CCM may be significantly underestimated.
Genetic linkage studies revealed a locus for CCM (CCM1) on chromosome 7q (8–10). Two additional loci, mapping to chromosomes 7p and 3q, respectively, have also been identified (11). Crossovers in multigenerational families define an ∼4 cM critical region for CCM1, bordered by D7S2410 and D7S689 (12). The apparent high prevalence of the disease in families of Mexican-American heritage (1,13) and evidence for a common disease haplotype in affected kindreds from this population (12,14) suggest a possible founder effect.
Provided with a convincing genetic localization of the CCM1 gene and with a strong genomic infrastructure for this region of chromosome 7, including a complete physical and rapidly evolving sequence map, we sought to identify the CCM1 gene by a positional cloning strategy. Here we report the outcome of this effort, which has successfully identified KRIT1 as the CCM1 gene.
Refinement of the CCM1 critical region
Two previously unpublished non-Hispanic families were ascertained. Family 2 is an American Caucasian family. Although it showed a clear autosomal dominant inheritance pattern for CCM, initial linkage analysis of this family was ambiguous. In part, this reflected the fact that a number of the family members were of uncertain affection status (unpublished data). Family 55, a small Canadian Caucasian family, initially showed weak linkage to the CCM1 locus (lod 0.74 at θ = 0) and strongly negative linkage to both the CCM2 and CCM3 loci (11; unpublished data). Families 2 and 55 did not allow additional refinement of the established CCM1 critical interval (12).
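For orientation, a lod score compares the likelihood of the family data at a given recombination fraction θ with the likelihood under free recombination (θ = 0.5). The sketch below uses only the textbook formula for phase-known, fully informative meioses with complete penetrance. That is a simplifying assumption for illustration, not the calculation used in this study, which had to deal with incomplete penetrance and uncertain affection status. The function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    Simplifying assumptions (not those of the study): complete penetrance,
    known phase, and every meiosis scorable as recombinant or non-recombinant.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # A single recombinant excludes theta = 0 outright.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2.0)
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical small family: three informative meioses, no recombinants.
small_family_lod = two_point_lod(recombinants=0, meioses=3, theta=0.0)
```

Under this idealized model each non-recombinant meiosis adds log10(2), about 0.30, at θ = 0, which illustrates why a small kindred on its own yields only weak evidence for linkage.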
Our previous studies revealed a common disease haplotype among three apparently unrelated Mexican-American CCM1 families (12). To utilize potential ancestral crossovers for further refinement of the critical interval, we collected 18 additional CCM families of Mexican-American descent. A series of new microsatellite-based genetic markers was developed from the available genomic sequence (see below) and used to analyze the available CCM1 families. Haplotype construction for the Mexican-American families confirmed the shared disease haplotype (Fig. 1), consistent with the findings of others (15,16). Although most Mexican-American families shared alleles over a broad interval encompassing the entire critical region based on crossovers, two families (12 and 1200) appeared to share a smaller portion of the interval (Fig. 1), suggesting that the lineage of this genomic segment derives from ancestral crossovers. The smallest common region, spanning ∼450 kb, was bordered by the new marker 085–222 and D7S689 (Fig. 1). Thus, genetic mapping suggests two regions of interest: a larger ∼2 Mb region based on recombinants in families and bordered by D7S2410 and D7S689, and an ∼450 kb region contained within the telomeric end of the larger region and bordered by 085–222 at its centromeric end (Fig. 1).
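The logic of delimiting a region by haplotype sharing can be sketched as follows: given an ordered set of markers, find the contiguous run around an anchor marker over which a family's disease chromosome carries the same alleles as the common haplotype. The allele values, marker count and function name below are hypothetical and are not taken from Fig. 1. A match in this sketch shows identity by state only and does not establish descent from a common ancestor.

```python
from typing import Optional, Sequence, Tuple


def shared_segment(founder: Sequence[int],
                   family: Sequence[int],
                   anchor: int) -> Optional[Tuple[int, int]]:
    """Return (first, last) marker indices of the contiguous run of matching
    alleles that contains `anchor`, or None if the anchor alleles differ."""
    if len(founder) != len(family):
        raise ValueError("haplotypes must cover the same ordered markers")
    if founder[anchor] != family[anchor]:
        return None
    first = anchor
    while first > 0 and founder[first - 1] == family[first - 1]:
        first -= 1
    last = anchor
    while last < len(founder) - 1 and founder[last + 1] == family[last + 1]:
        last += 1
    return first, last


# Hypothetical alleles at six ordered markers (centromeric to telomeric).
common_haplotype = [3, 5, 2, 7, 4, 1]
family_b = [6, 5, 2, 7, 4, 2]   # shares an internal block only
family_c = [6, 1, 9, 7, 4, 1]   # shares only the telomeric end

block_b = shared_segment(common_haplotype, family_b, anchor=2)
block_c_mid = shared_segment(common_haplotype, family_c, anchor=2)
block_c_tel = shared_segment(common_haplotype, family_c, anchor=4)
```

Taking the intersection of such blocks across families narrows the interval, but only if every family truly carries the ancestral chromosome.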
Physical mapping and sequencing of the CCM1 critical region
We reported previously the construction of a complete yeast artificial chromosome (YAC)-based physical map of the region of human chromosome 7 encompassing the D7S2410-D7S689 interval (12). This map and its associated sequence-tagged sites (STSs) were used to develop a corresponding bacterial clone [bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC)] contig. The genetically defined CCM1 critical region is ∼2.0 Mb in size and represented by a minimal set of 20 overlapping BAC/PAC clones (Fig. 2). These were subjected to systematic sequencing at the Washington University Genome Sequencing Center.
Absence of mutations within the narrowest critical region
All known and computer-predicted exons mapping within the narrowest (450 kb) region suggested by the apparent shared disease haplotype in Mexican-American families were analyzed. Although >120 potential exons were evaluated, no CCM-associated mutations were identified. We recognized that the narrow region is defined primarily by two families that might not share the common ancestral mutation. Since the 7q telomeric boundary of the Mexican-American disease haplotype is the same as that identified by crossovers in Caucasian families, we extended our efforts in the centromeric direction.
KRIT1 mutations in Mexican-American families
One of the known genes within the larger candidate region is KRIT1, which encodes a novel Krev-1/rap1a binding protein (17). Alignment of the KRIT1 cDNA sequence with the genomic sequence (BAC RG161K23) revealed that the ORF is distributed over 12 exons extending over 37.7 kb of genomic DNA (Fig. 2e). Probands from Mexican-American families 10 and 200 were found to harbor a single base transition (742C→T) in exon 6 of KRIT1, changing a Gln to a premature termination codon (Fig. 3a). Significantly, the proband from family 12 (the third Mexican-American family in the initial screening panel) does not harbor this mutation. This base substitution was not detected in 45 control individuals (90 chromosomes) of Mexican-American descent. Co-segregation of the mutation with disease status was assessed by conformation-sensitive gel electrophoresis (18). The mutation was present only in affected individuals or in unaffected family members carrying the disease haplotype (and therefore assumed to be asymptomatic mutation carriers). This mutation was identified in a majority (16 of 21) of Mexican-American families analyzed (families 10 and 200, together with families 11, 13, 14, 15, 16, 123, 152, 128, 300, 500, 600, 800, 900 and 1100).
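As a rough guide to what the control screen can exclude, the sketch below computes the largest population allele frequency that is still compatible, at a chosen confidence level, with seeing the variant on none of the sampled chromosomes. It assumes the control chromosomes are independent draws from the population (a simple binomial model). This calculation is an illustration and is not part of the reported analysis.

```python
def max_allele_frequency(chromosomes: int, confidence: float = 0.95) -> float:
    """Upper bound on allele frequency given zero observations.

    Solves (1 - p) ** chromosomes = 1 - confidence for p, assuming
    independent sampling of chromosomes (binomial model).
    """
    if chromosomes <= 0:
        raise ValueError("chromosomes must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / chromosomes)


# 45 control individuals correspond to 90 chromosomes.
upper_bound = max_allele_frequency(90)
```

With 90 chromosomes the bound is a little over 3%, so the screen argues against a common polymorphism but cannot by itself exclude a rare one. Co-segregation with disease and the nature of the change (a premature stop codon) carry the rest of the argument.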
Distinct mutations in KRIT1 were identified in other Mexican-American families (Table 1). Family 400 carries a single base pair deletion (1089delA) in exon 8, resulting in a frameshift and premature termination. Family 12 carries a single base pair deletion (1314delT) in exon 10, leading to a frameshift and premature termination. Family 1200 carries a single base pair transversion (990T→A) in exon 8, converting a Tyr to a termination codon. Mutations in KRIT1 were not identified in two families from this group: one Mexican-American and one collected entirely in Mexico. Although both families share some alleles for relevant markers within the CCM1 critical region with the other Mexican-American families, neither family shares an extensive portion of the disease haplotype with the others (data not shown). Both families are too small to conclusively exhibit or exclude linkage to the CCM1 locus.
Analysis of the genetic haplotype data for the Mexican-American families reveals a correlation between the KRIT1 mutations and the common disease haplotype for the interval between markers D7S1789 and 552–1166 (Fig. 1). All families with the 742C→T mutation share the common disease haplotype in this interval; for some families, the shared haplotype extends for additional markers mapping in both directions (Fig. 1). These data demonstrate that a founder mutation does exist in this population and that the majority of Mexican-American CCM1 chromosomes are descended from a common ancestor.
Further analysis of haplotypes and KRIT1 mutations reveals that family 400 appears to share the common disease haplotype between D7S1789 and 552–1166, a region that includes KRIT1, but harbors a unique KRIT1 mutation (Fig. 1). This apparent sharing of the disease haplotype must reflect identity by state and not by descent. Families 12 and 1200 also harbor unique KRIT1 mutations and share a decreasing portion of the common disease haplotype that does not include the region immediately surrounding KRIT1. This also likely reflects identity by state and not by descent. Importantly, inclusion of the latter two families in defining the most delimited common disease haplotype erroneously diverted the initial gene search towards the region just telomeric of KRIT1.
KRIT1 mutations in non-Hispanic Caucasian families
There is no evidence for a common disease haplotype in non-Hispanic Caucasian families (12; unpublished data). Not surprisingly, distinct KRIT1 mutations were found in each of the four Caucasian CCM1 families (Table 1). Family 20 carries the same mutation (1089delA in exon 8; Fig. 3b) as that present in family 400 of the Mexican-American kindreds, although the families are unrelated by disease haplotype and ancestry. The mutation consists of the loss of a single A in a run of six A residues, which may explain its independent origin. Families 1 and 2 harbor different splice site mutations. Family 1 exhibits a single base pair transition at the exon-intron border in intron 4 (IVS4+1G→A) that disrupts the invariant splice donor site from gt to at (Fig. 3c). Family 2 exhibits a single base pair transition at the intron-exon border in intron 8 (IVS8-2A→G) that disrupts the invariant splice acceptor site from ag to gg. A single base pair transversion (281C→G in exon 3) in family 55 changes a Ser to a termination codon.
In total, seven distinct KRIT1 mutations were found among the CCM1 kindreds examined (Table 1). Mutations fall into one of three classes: frameshifts, premature chain terminations and splice site mutations (Fig. 4). The combined data suggest that these are all loss-of-function alleles. Thus far, no missense mutations have been identified.
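The three premature-termination substitutions can be checked against the genetic code with simple coordinate arithmetic. The sketch below assumes that cDNA numbering starts at the A of the initiator ATG, which is consistent with 742C→T corresponding to Q248X as stated elsewhere in the text. The reference codons themselves are not given here, so the code only lists which codons for the stated residue would become a stop codon under each substitution. All function and variable names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, restricted to the residues named in the text.
CODONS_FOR = {
    "Gln": ("CAA", "CAG"),
    "Tyr": ("TAT", "TAC"),
    "Ser": ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC"),
}


def codon_position(cdna_pos: int) -> tuple:
    """Return (codon number, position within codon 1..3).

    Assumes nucleotide 1 is the A of the initiator ATG.
    """
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def nonsense_candidates(residue: str, cdna_pos: int, ref: str, alt: str) -> list:
    """List (reference codon, mutant codon) pairs for `residue` that turn
    into a stop codon when `ref` is replaced by `alt` at `cdna_pos`."""
    _, offset = codon_position(cdna_pos)
    hits = []
    for codon in CODONS_FOR[residue]:
        if codon[offset - 1] != ref:
            continue  # this codon lacks the reference base at that position
        mutant = codon[:offset - 1] + alt + codon[offset:]
        if mutant in STOP_CODONS:
            hits.append((codon, mutant))
    return hits


q_stop = nonsense_candidates("Gln", 742, "C", "T")
y_stop = nonsense_candidates("Tyr", 990, "T", "A")
s_stop = nonsense_candidates("Ser", 281, "C", "G")
```

Each substitution has at least one codon for the stated residue that yields a stop, so the reported amino acid consequences are internally consistent under this numbering assumption.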
Computational analysis of the KRIT1 protein
A BLASTP search of KRIT1 against the non-redundant collection of GenBank coding sequences reveals many significant matches to proteins containing either the ankyrin domain (19) or the so-called FERM domain (20,21). However, the two best matches (apart from the match of KRIT1 with itself) are to two uncharacterized genes. Residues 35–312 of KRIT1 align with the protein encoded by the predicted nematode gene CAB03514 (Z81143) at an E value of ∼10⁻²⁰, and residues 252–472 align with a conceptual translation from a nematode expressed sequence tag (EST) CAB02995 (Z81069) at an E value of ∼10⁻¹³.
To better understand the domain organization of KRIT1, a search of the SMART database of protein domains was conducted. This revealed that KRIT1 contains at least three ankyrin domains, beginning at residue 80 and ending at residue 177, and one FERM (or B41) domain, spanning residues 209–433 (Fig. 4). Furthermore, this domain composition appears to be unique to KRIT1, with no other proteins in the SMART database containing both ankyrin and FERM domains. We note that Serebriiskii et al. (17) originally reported that KRIT1 has four ankyrin domains, but only three are detected using the strict SMART criteria.
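The domain coordinates reported above can be kept in a small lookup structure, which makes it easy to check span lengths and to ask which domain, if any, a given residue falls in. This is a minimal sketch using only the coordinates given in the text and the 529-residue protein length from Materials and Methods. The names are illustrative and this is not a SMART interface.

```python
from typing import Optional

PROTEIN_LENGTH = 529  # residues, per the accession cited in Materials and Methods

# (name, first residue, last residue), inclusive coordinates from the text.
DOMAINS = (
    ("ankyrin repeats", 80, 177),
    ("FERM", 209, 433),
)


def domain_length(first: int, last: int) -> int:
    """Length of an inclusive residue span."""
    return last - first + 1


def domain_at(residue: int) -> Optional[str]:
    """Name of the domain containing `residue`, or None if outside all domains."""
    if not 1 <= residue <= PROTEIN_LENGTH:
        raise ValueError("residue outside the protein")
    for name, first, last in DOMAINS:
        if first <= residue <= last:
            return name
    return None


ankyrin_span = domain_length(80, 177)    # covers the three ankyrin repeats
ferm_span = domain_length(209, 433)
q248_domain = domain_at(248)             # site of the Q248X allele
```

The spans work out to 98 residues for the three ankyrin repeats and 225 residues for the FERM domain. Codon 248, the site of the common Mexican-American nonsense allele, lies within the FERM span, so that allele is predicted to truncate the protein inside this domain.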
The positional cloning-based identification of KRIT1 as the CCM1 gene was facilitated by the availability of the complete genomic sequence of the region (22). The sequence provided new microsatellite markers for refined haplotype construction, detailed information about known genes and clues about computer predicted expressed sequences. Since few known genes mapped to the interval, we relied on gene prediction programs to identify putative exons for mutation screening. Of note, although the CCM1 locus ultimately proved to be a previously known gene, all 12 KRIT1 exons were identified by at least one of the set of gene prediction programs utilized (23).
The strategy to concentrate our efforts on families of Mexican-American descent for the purpose of mapping ancestral crossovers proved to be sound, although the assumption that all such families were related to a common ancestor proved to be incorrect. The inappropriate inclusion of putative ancestral recombination events suggested by two families diverted the initial search ∼200 kb telomeric of KRIT1. These families falsely appeared to share a reduced portion of the common disease haplotype with the majority of the other Mexican-American CCM1 families. Some of the microsatellite markers used for determining disease haplotypes were of low heterozygosity, with alleles for some of the markers being relatively common in the Mexican-American population. Linkage disequilibrium is an alternative approach for fine mapping of disease genes when a founder effect is suspected, with this approach less influenced by the haplotypes of one or two specific families. Such an approach was used by other investigators searching for the CCM1 gene, with their data also predicting that it resided a few cM telomeric of KRIT1 (15). Nonetheless, the great majority of CCM1 alleles in Mexican-American families consist of the identical 742C→T (Q248X) KRIT1 mutation. This observation has potential clinical significance for presymptomatic diagnosis of CCM in the Mexican-American population.
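The effect of common alleles on apparent haplotype sharing can be illustrated numerically. Assuming the markers are independent (linkage equilibrium, an assumption made here only for illustration), the chance that an unrelated chromosome matches the founder haplotype at every marker in a segment is the product of the founder-allele frequencies. The frequencies below are hypothetical and are not estimates from the Mexican-American population.

```python
import math
from typing import Sequence


def chance_match_probability(founder_allele_freqs: Sequence[float]) -> float:
    """Probability that a random chromosome carries the founder allele at
    every listed marker, assuming independence between markers."""
    for freq in founder_allele_freqs:
        if not 0.0 < freq <= 1.0:
            raise ValueError("allele frequencies must lie in (0, 1]")
    return math.prod(founder_allele_freqs)


# Hypothetical short segment of three markers with common founder alleles.
short_segment = chance_match_probability([0.4, 0.5, 0.3])

# Hypothetical longer segment that adds two markers with rarer alleles.
long_segment = chance_match_probability([0.4, 0.5, 0.3, 0.1, 0.2])
```

In this made-up example a three-marker match arises by chance in about 6% of chromosomes, whereas the five-marker match is about 50 times less likely. A short shared segment built from common alleles is therefore weak evidence of descent from the founder.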
A central question of great interest is the function of the KRIT1-encoded protein. Clues about KRIT1 function come partly from its initial isolation as a binding partner of Krev-1/rap1a in the yeast interaction trap assay (17). Krev-1/rap1a was originally isolated due to its ability to partially revert the transformed phenotype of Kirsten sarcoma virus-transformed cells and was later shown to be a member of the Ras family of GTPases with tumor suppressing activity for the ras oncogenes (24–26). The protein residues in both Krev-1/rap1a and Ras that are responsible for interaction with effectors are identical, indicating that these two proteins mediate opposing effects through common downstream partners like Raf-1, B-Raf and RalGDS (26–31). The homology between the Krev-1/rap1a protein and Ras, its ability to interact with most cellular effectors of Ras and its ability to revert the oncogenic effect of Ras in vivo suggest that Krev-1/rap1a behaves as a tumor suppressor in the cell, possibly by sequestering Ras-GAP and acting as a competitive inhibitor of Ras-GAP interaction. Whether the KRIT1 mutations activate or inactivate Krev-1/rap1a signaling remains to be determined. Krev-1/rap1a homologs in Drosophila (Ras3) and yeast (RSR/BUD1) also regulate GTPase signal transduction cascades controlling cellular differentiation, morphology, polarity and cytoskeletal structure (32,33). Some of these same regulatory circuits for these fundamental characteristics of the cell may be conserved in mammalian species.
Other potential clues about KRIT1 function can be gained by analyzing the structural motifs of the protein. The ∼33 residue ankyrin repeat has been described in >400 proteins with diverse functions and, in a number of cases, is known to mediate protein-protein interactions (34,35). The ∼225 residue FERM domain is present in a superfamily of proteins, including the band 4.1 protein originally isolated from human erythrocyte ghosts and the ezrin/radixin/moesin (ERM) protein family that regulates the interaction of plasma membranes with the actin cytoskeleton. However, it should be noted that most FERM-containing proteins possess an N-terminal FERM domain, whereas the FERM domain of KRIT1 resides towards the C-terminus (21). Furthermore, the combination of a FERM domain with ankyrin repeats is apparently unique.
The expression profiles of KRIT1 and Krev-1/rap1a only hint at a role in vascular development. Krev-1/rap1a is expressed in a number of tissues that includes the small blood vessels of many organs (36). Previously described northern analysis of KRIT1 showed weak expression in heart, muscle and brain, but not in a number of other tissues (17). Our analysis shows varying levels of expression in heart, brain, placenta, lung, liver, skeletal muscle, kidney and pancreas (data not shown), suggesting that KRIT1 expression is not restricted to vascular or cerebrovascular tissue. The KRIT1 expression pattern during development has yet to be reported. As CCMs may be present at birth, they may develop in utero, decades before any clinical presentation. A better understanding of KRIT1 expression in a developmental and anatomical context is required to establish why the CCM phenotype is restricted to the vasculature of the brain.
A working model would be that cerebral cavernous malformations are benign vascular tumors that develop due to an alteration in an important growth control pathway(s) involving Krev-1/rap1a. In support of this, the Krev-1/rap1a pathway is also altered in tuberous sclerosis type 2 (TSC2), an autosomal dominant condition characterized by benign neurocutaneous tumors. Significantly, TSC2, like CCM, is often associated with seizures. The TSC2 gene product, tuberin, functions as a tumor suppressor protein by acting as a GTPase-activating protein for Krev-1/rap1a (37). Somatic inactivation or loss of heterozygosity of the wild-type tuberin allele in TSC2-associated tumors leads to constitutive activation of Krev-1/rap1a and to unregulated growth (38,39). The focal nature of CCMs suggests that these vascular tumors may develop due to loss of the wild-type KRIT1 allele in the developing cerebral vasculature. A two-hit model, with a central tumor suppressor function of KRIT1, would be similar to the mechanism of tuberin inactivation in TSC2-associated tumors. Alternatively, the etiology of vascular lesion development may relate to a mechanical or environmental trigger in the cerebral vasculature, coupled with the underlying genetic defect in KRIT1. Whatever the mechanism, the data reported here point to a key role for KRIT1 and the Krev-1/rap1a pathway in the complex process of angiogenesis and, potentially, in other forms of cerebrovascular disease.
Materials and Methods
Families and DNA samples
We ascertained and collected DNA samples from 25 multi-generation families with CCM under appropriate IRB-approved informed consent. Twenty-one of these are of Mexican-American descent and four are Caucasian. Clinical details, pedigrees and genetic linkage data for families 1, 20, 10, 200 and 300 have been reported previously (12). Two previously unpublished Caucasian families (2 and 55) are included in these data, as described above. Eighteen new unpublished Mexican-American families were collected ranging in size from seven members (four affected) to a single affected individual. One of these families was collected entirely in Mexico. Diagnosis of most patients with CCM was based on detailed neurological evaluation that included clinical features, familial occurrence, MRI examination and, when possible, post-surgery histopathology.
DNA was purified from whole blood using established methods (Puregene; Gentra Systems, Research Triangle Park, NC). Each sample was assigned a consecutive sample number, entered into a secure database and identified only by that number for all subsequent analyses. Individual affected and unaffected chromosomes 7 were isolated from patient monocytes from a number of affected individuals and maintained as somatic cell hybrids in a rodent cell line (RJK88). DNA from these cell lines was used for exon amplification and mutation analysis when available.
New families were genotyped as described (40). New microsatellite markers (085–411, 085–222, 552–1166 and others not shown) were developed from genomic sequence generated in the candidate interval. The primer sequences for the newer markers are: 085–411, forward, cactttaaaagatcacagaatctc, reverse, aggcacagctgagttatgat; 085–222, forward, catctactagaatgcccagc, reverse, caaatggttccttcctttct; 552–1166, forward, gtctgagcagttaaaatggc, reverse, gttgagtgtccacctgagat. Two-point lod scores were calculated for the largest Caucasian families using LINKAGE v.5.03b (41). Haplotypes were generated both manually and using the Cyrillic software package (Cherwell Scientific, Oxford, UK).
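Basic properties of the listed primers can be tabulated with a few lines of code. The sketch below reports length, GC fraction and a melting temperature estimate from the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is a rough approximation intended for short oligonucleotides and was chosen here only for illustration; the text does not state how the primers were evaluated. Only the 085–411 pair is shown.

```python
def primer_stats(sequence: str) -> dict:
    """Length, GC fraction and Wallace-rule Tm (deg C) for a DNA primer.

    The Wallace rule is a coarse rule of thumb, not a thermodynamic model.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq) or not seq:
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        "wallace_tm_c": 2 * at + 4 * gc,
    }


# Primer pair for marker 085-411, as listed in the text.
forward_085_411 = primer_stats("cactttaaaagatcacagaatctc")
reverse_085_411 = primer_stats("aggcacagctgagttatgat")
```

Comparing the two estimates gives a quick check that a primer pair is reasonably matched before choosing an annealing temperature.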
Construction of a bacterial clone-based physical map
Human DNA-containing BAC (42) and PAC clones (43) were isolated, analyzed and assembled into contigs as described previously (44–46). Specifically, this involved hybridization (47) and PCR screening (48) of available human genomic libraries, PCR analysis of isolated clones to establish their STS content and restriction enzyme digest-based fingerprint analysis (49). For contig expansion, sequence was obtained from the insert ends of clones residing at contig ends and used to isolate additional BACs/PACs. Clones are named with a prefix reflecting their library of origin: RG, Research Genetics human BAC library (Huntsville, AL; see http://www.resgen.com); GS, Genome Systems human BAC library (St Louis, MO; see http://www.genomesystems.com); DJ, Roswell Park Cancer Institute PAC library (see http://bacpac.med.buffalo.edu).
Identification of candidate genes in the CCM1 critical region
Several strategies were used to identify genes within the CCM1 critical region. The genomic sequence of the critical region (Fig. 2a) was analyzed for the presence of known genes or matching ESTs. This was supplemented by the use of the gene prediction programs GRAIL II (50), GENSCAN (51) and MZEF (52). Data were assimilated and organized using a computational approach for analyzing and managing genomic sequence (23).
CCM1 mutation panel
In total, 25 different CCM families were used for mutation analysis, 21 of which were of Mexican-American descent and four non-Hispanic Caucasian. An initial smaller screening panel included the probands of the four non-Hispanic CCM1 families, expected to harbor distinct mutations, and those from only three of the 21 Mexican-American families, assumed to harbor the same mutation. The largest of both groups of families have been shown to be linked to the CCM1 locus (8,10,12); others were smaller but consistent with linkage, while a few of the Mexican-American kindreds were too small for genetic linkage analysis. Each individual exon (known or computer predicted) was amplified by PCR and subjected to direct DNA sequencing.
PCR amplification of exons
KRIT1 exon-containing fragments were PCR amplified from genomic DNA (Table 2). Primers were designed from the genomic sequence flanking the exons using the program PRIMER v.3.0 (http://www-genome.wi.mit.edu), ensuring that the amplified product contained the putative splice junctions (Table 3). All the fragments were amplified using the following amplification cycle: initial denaturation at 95°C for 5 min followed by 35 cycles of 94°C for 30 s, 56°C for 60 s and 72°C for 60 s.
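The cycling conditions above can be written down as a small data structure, which is convenient for record keeping and for estimating run time. The step labels (denaturation, annealing, extension) are the conventional reading of the three temperatures and are an assumption of this sketch. No final extension or ramp time is included because none is specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str       # conventional label, assumed rather than quoted
    temp_c: int     # block temperature in degrees Celsius
    seconds: int    # programmed hold time


INITIAL_DENATURATION = Step("initial denaturation", 95, 5 * 60)
CYCLE_STEPS = (
    Step("denaturation", 94, 30),
    Step("annealing", 56, 60),
    Step("extension", 72, 60),
)
CYCLE_COUNT = 35


def programmed_hold_seconds() -> int:
    """Total programmed hold time, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE_STEPS)
    return INITIAL_DENATURATION.seconds + CYCLE_COUNT * per_cycle


total_seconds = programmed_hold_seconds()
total_minutes = total_seconds / 60
```

The programmed holds add up to about an hour and a half. Actual instrument time would be longer because of ramping between temperatures.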
Mutation detection and analysis
PCR-amplified products were separated in low melting point agarose gels and specific fragments were excised, crushed, freeze-thawed and purified by spinning and retrieving the supernatant. Direct DNA sequencing of the amplified genomic DNA fragment was carried out with a ThermoSequenase Cycle Sequencing kit (Amersham Life Science, Cleveland, OH) utilizing 33P terminator labeling. Sequencing reaction products were resolved on denaturing polyacrylamide gels and loaded so as to have all the individual ddATP, ddCTP, ddGTP or ddTTP lanes together in sets. Segregation of the mutation with the affected phenotype was assessed either by PCR amplification and sequencing of the appropriate exon for each family or by conformation-sensitive gel electrophoresis of the amplified fragments in high resolution 15% polyacrylamide gels containing ethylene glycol (10%) and formamide (15%) (18).
Computer modeling of KRIT1 structure
The KRIT1 protein sequence (accession no. NP_004903; 529 residues) was retrieved from www.ncbi.nlm.nih.gov/Entrez/ and used for all subsequent analyses. Sequence similarity searching was carried out using the BLAST family of programs (53). Domain analysis was performed using SMART (the Simple Modular Architecture Research Tool) available at http://coot.embl-heidelberg.de/SMART (54,55).
GenBank accession numbers
KRIT1 mRNA, U90268; BAC, RG161K23 and AC000120.
We are grateful to the patients and their families for their participation in this study and the Washington University Genome Sequencing Center for their generation and open dissemination of human genomic sequence data. We also thank Marco Marra, John McPherson, Brad Barbazuk and Tammy Kucaba for collaborative assistance in bacterial clone-based physical mapping and Blaine Hart (University of New Mexico) for neuroimaging and diagnostic confirmation. This work was supported by American Heart Association Scientist Development Grant 9630236N to E.W.J. and Grant-in-Aid 95006970 to D.A.M. D.A.M. is an Established Investigator of the American Heart Association.