The molecular background of human autoimmunity is poorly understood. Although many autoimmune diseases have a genetic basis, the actual disease appearance results from a complex interplay between genes and environment, and these diseases are thus typical multifactorial diseases. Even with the molecular tools provided by the Human Genome Project, it remains a challenge to identify the predisposing DNA variants behind such multifactorial traits. Two strategies have been suggested to provide short-cuts to the dissection of the genetic background of complex autoimmune diseases: (i) identification of genes in rare human diseases with a strong autoimmune component or (ii) unravelling loci causing phenotypes resembling autoimmune diseases in inbred mouse strains. Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED) is a monogenic autosomal disease with a recessive inheritance pattern, characterized by multiple autoimmune endocrinopathies, chronic mucocutaneous candidiasis and ectodermal dystrophies. Since it is the only known human autoimmune disease inherited in a Mendelian fashion, it provides an excellent model to analyse the genetic component of human autoimmunity. The causative gene for APECED was isolated recently by a traditional positional cloning strategy by two independent groups. The cDNA for the APECED gene proved to originate from a novel gene, AIRE, which is expressed predominantly in thymus, pancreas and adrenal cortex. Multiple mutations in AIRE have been identified in APECED patients. The predicted proline-rich AIRE polypeptide harbours two PHD-type zinc finger motifs and contains a putative nuclear targeting signal, suggesting its involvement in the regulation of transcription. In the future, functional analysis of the AIRE protein both in vitro and in vivo will provide valuable insight not only into the molecular pathogenesis of APECED but also into the aetiology of autoimmunity in general.
Clinical Phenotype of APECED
Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED; OMIM 240300) is a rare autoimmune disease affecting mainly the endocrine glands (1). Two distinct autoimmune polyglandular syndromes have been characterized, originally referred to as type I (APECED disease) and type II (Schmidt's complex) (2). APECED has no known HLA association, in contrast with the Schmidt's complex which is associated with the HLA-A1,B8,DR3 haplotype (3). The two syndromes have a different aetiology: APECED is clearly a monogenic disease, whereas it cannot be stated with certainty whether the basis of Schmidt's complex is genetic, or, if genetic, whether a single gene change is involved (4). The syndromes can be distinguished from each other according to the clinical manifestations specific to each particular disease. Schmidt's complex is characterized by a combination of hypothyroidism and primary adrenocortical failure (Addison's disease), and these may be associated with insulin-dependent diabetes mellitus (IDDM), myasthenia gravis and pernicious anaemia (3). The typical triad for APECED disease is the presence of hypoparathyroidism, primary adrenocortical failure and chronic mucocutaneous candidiasis. However, the clinical phenotype of APECED is composed of a highly variable combination of autoimmune reactions towards different endocrine and non-endocrine organs, resulting in failure of parathyroid glands, adrenal cortex, gonads, pancreatic β-cells and gastric parietal cells. These are associated with ectodermal manifestations such as dystrophies of dental enamel and nails. The clinical features of APECED have been reviewed in detail by Ahonen et al. (5), and are summarized in Table 1. The symptoms typically begin in childhood, and most patients have multiple disease components. All patients need life-long counselling and monitoring for development of new symptoms which may appear as late as during the fifth decade of life. 
The most life-threatening complications are oral squamous cell carcinoma and fulminant autoimmune hepatitis. Maintenance of normocalcaemia is demanding in many patients with hypoparathyroidism. Keratopathies may lead to cataract and even blindness, and infertility is a frequent problem among female patients (5,6).
Immunological Features of APECED
The autoimmune manifestations observed in APECED patients are characterized by lymphocytic infiltrations in target organs and by the presence of a wide variety of tissue-specific circulating autoantibodies. The target organs eventually will be destroyed and the corresponding endocrinological functions annihilated (6).
The adrenal cortex is one of the characteristic targets of tissue destruction in APECED, and both adrenal antibodies (AAs) and steroidal cell antibodies (SCAs) have been demonstrated in the patients' sera. AAs bind only to the adrenal cortex, whereas SCAs also bind to ovary, testis and placenta. AAs and SCAs have a prognostic significance, since they herald the development of adrenal failure and also precede ovarian failure (7). In APECED patients, all identified autoantigens in the adrenal cortex and the gonads are cytochrome P450 enzymes involved in steroid synthesis: steroid 17α-hydroxylase (P450c17) (8), cholesterol side chain-cleaving enzyme (P450scc) and steroid 21-hydroxylase (P450c21) have all been identified as reactive with serum samples from the patients (9,10) (Table 2). Reactivity against P450c21 correlates with the presence of AAs, indicating that it is the major antigen in adrenal autoimmune diseases, whereas autoantibodies to P450c17 and P450scc represent components of SCAs (13).
The islets of Langerhans in the pancreas are another target in APECED. Antibodies against aromatic L-amino acid decarboxylase (AADC) (12) and glutamic acid decarboxylase (GAD65) (13,14), as well as pancreatic cytoplasmic islet cell antibodies (ICAs) (15), are common in APECED patients and often present in high titres. However, although these antibodies are generally predictive of the appearance of IDDM, they seem to have no value in the prediction of IDDM in APECED patients (16). In APECED patients suffering from chronic hepatitis, cytochrome P450 enzymes 1A2 and 2A6 have been shown to be the major hepatocellular autoantigens (17,18). The value of the antibodies recognizing these antigens in predicting the life-threatening component of hepatitis is not known. Presumably they are useful at least in monitoring the activity of hepatitis and the response to immunosuppressive therapy.
A specific but uncharacterized defect in the T-cell activity of APECED patients has been suggested by the presence of chronic mucocutaneous candidal infections (19). Weak and delayed hypersensitivity reactions to Candida antigens would support this hypothesis further (20). Despite the cutaneous anergy to these antigens and the inability to eradicate C. albicans from body surfaces, the patients have high levels of protective antibodies against major Candida antigens (21). This, together with the apparently normal vaccination responses, would indicate intact antibody formation in APECED patients (22).
APECED represents a unique autoimmune syndrome in many aspects. As described above, the clinical phenotype in individual families represents a spectrum of pathological features and often includes most known tissue-specific autoimmune diseases. In addition, the disease is not associated with the HLA region on chromosome 6. Consequently, unravelling the molecular pathology of APECED should provide a short-cut for dissection of the molecular background of autoimmune phenomena in general. An understanding of the molecular pathogenesis of APECED should shed light on the HLA-independent processes in the destruction of multiple target cells and tissues in autoimmune disorders (7,23).
Genetics of APECED
Although APECED cases are reported worldwide, the incidence of the disease is especially high in Finland (1 in 25 000), among Iranian Jews (1 in 9000) and in the Sardinian population (1,17,24). APECED belongs to the Finnish disease heritage, and the predicted homogeneity of the mutations in these diseases offers a valuable advantage for both the mapping and positioning of the causative genes. We identified linkage to chromosome 21q22.3 in Finnish family material using a random mapping approach with 250 markers. The obligatory recombinations initially restricted the locus to a 2.6 cM region between markers D21S49 and D21S171 (25). When bi-allelic polymorphisms in the critical region were analysed, an ancient recombination in the haplotype became evident in one Finnish family with marker D21S25. This narrowed the APECED critical region to 350 kb, emphasizing the power of monitoring for ancient recombinations in disease chromosomes of isolated populations (26,27). By monitoring the segregation of the haplotypes of APECED chromosomes, we were able to demonstrate locus homogeneity for APECED also among patients from other populations (28). This indicated that despite the high phenotype diversity in families from different populations, the disease is caused by a spectrum of different mutations in one single locus.
Positional Cloning of the APECED Gene
Human chromosome 21 has been a model chromosome for the Human Genome Project because of its small size and because of the extensive research focused on this chromosome largely due to Down syndrome. However, despite several physical maps published on chromosome 21, the available YAC maps still show major gaps and deleted clones, in particular in the 21q22.3 region critical for APECED (29–31). To clone the APECED region into a sequence-ready map, we used P1, PAC and BAC libraries as well as a chromosome 21-specific cosmid library. We observed that the redundancy of clones in the APECED region tends to be lower than that reported for particular libraries in general, possibly reflecting a local compositional bias or the occurrence of recombination-prone sequences.
In the physical mapping, we took advantage of the different resolution levels offered by fluorescence in situ hybridization (FISH)-based visual mapping techniques (32,33). All identified clones were hybridized systematically to metaphase chromosomal targets to ensure correct chromosomal localization and to exclude chimerism of the clones. For the actual ordering and orientation of the physical clones, we utilized the fibre FISH technique and hybridized the clones on extended DNA fibres. The fibre FISH technique, combined with an ultrasensitive tyramide-based detection method, also allowed precise mapping of candidate transcripts on the critical physical clones (Fig. 1) (27).
The only known gene that was localized to the 350 kb refined APECED region was the gene encoding liver-type phosphofructokinase (PFKL) (34), but this gene was excluded as causative for APECED. We applied different gene identification methods to the bacterial-based genomic clone templates spanning the APECED candidate region. Direct cDNA selection (35), exon trapping (36) and large-scale genomic sequencing followed by computer-assisted searches were utilized to identify positional candidates for APECED. One novel gene in the region, C21orf3 (EMBL accession no. Z93322), identified by exon trapping and mapping only a few kilobases distal to PFKL, was soon also excluded as the APECED gene (37). Genomic sequencing was performed on two cosmids localized just proximal to PFKL, and 87 kb of genomic sequence was analysed by gene prediction software, such as GRAIL (38) and Genie (39). Several gene models were predicted in the genomic sequence, and notably one in the immediate vicinity of PFKL was recognized by several gene-finding programs. This gene, which proved to be the APECED gene, was actually identified as a group of predicted exons, totally devoid of expressed sequence tag (EST) signatures in the databases. The physical location of this gene model made it an excellent candidate for the causative gene of APECED. A 316 bp genomic probe spanning two of the predicted exons was obtained by PCR and allowed the identification of a single strand conformation polymorphism (SSCP) shift in the Finnish APECED patients compared with the controls (40). This initial finding justified further analyses of this novel gene.
The AIRE Gene
The cDNA corresponding to the APECED gene was isolated from a human adult thymus cDNA library and a 3′-untranslated region (3′-UTR) extension PCR product was used to deduce the composite cDNA sequence of 2245 bp (EMBL accession no. Z97990) (Fig. 2). The gene shows a relatively compact structure, with 14 exons spanning 11.9 kb of genomic DNA, and the exon-intron boundaries follow the canonical GT-AG rule (41). A putative promoter featuring a TATA box, a GC box and a CpG island was identified immediately upstream of the first AIRE exon. The 3′ end of the AIRE gene overlaps with the promoter region of the PFKL gene transcribed from the same strand (42). The cDNA sequence exhibits a high GC content of 68.8%, and contains an open reading frame (ORF) of 1635 bp. The likely initiator ATG codon is found at nucleotide 121, followed by an in-frame STOP codon at nucleotide 1756. Thus, the novel gene is predicted to encode a protein of 545 amino acids, and the mRNA is most abundant in thymus, pancreas, adrenal cortex and testis (40). Interestingly, pancreas, adrenal cortex and testis are among the tissues against which high titres of autoantibodies are detected in the sera of patients (Table 2), and the role of thymus is evident for the T-cell-mediated immunity disturbed in APECED. Mutations initially found in this cDNA in the Finnish APECED patients established its causative role in APECED, and the gene was named AIRE for autoimmune regulator.
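The relationship between the quoted cDNA coordinates and the predicted protein length can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that the quoted positions refer to the first base of the ATG and of the STOP codon in 1-based cDNA numbering, and the function names are hypothetical helpers, not part of any published pipeline.

```python
def orf_protein_length(start_nt: int, stop_nt: int) -> int:
    """Amino acids encoded between the first base of the start codon and
    the first base of the stop codon (1-based cDNA coordinates, assumed)."""
    coding_nt = stop_nt - start_nt
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return coding_nt // 3


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Coordinates quoted in the text for the AIRE cDNA.
AIRE_ATG = 121
AIRE_STOP = 1756

coding_bp = AIRE_STOP - AIRE_ATG
print(coding_bp, orf_protein_length(AIRE_ATG, AIRE_STOP))  # 1635 545
```

Under these assumptions the coding region is 1635 bp, which corresponds to the 545 residues quoted for the predicted protein. The `gc_content` helper could be applied to the deposited sequence (Z97990) to reproduce the reported GC percentage; no sequence is embedded here.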
An independent study by Nagamine et al. published in the same issue of Nature Genetics also reported this same gene to be defective in APECED (43). The major difference between the two papers was that Nagamine et al. identified multiple mRNA species, resulting from alternative splicing. Also, the tissue distribution of steady-state transcripts differed slightly in the two reports. We identified expression of an ∼2 kb transcript in a wide variety of tissues, with the highest expression levels in thymus, pancreas and adrenal cortex, whereas Nagamine et al. identified transcripts only in thymus, lymph node, fetal liver and appendix.
The APECED Mutations
In our sample panel of DNAs from APECED patients from different populations, we have screened systematically all 14 exons of the AIRE gene amplified from genomic DNA with SSCP (44), and the fragments showing alterations between patients and controls have been sequenced. All changes detected have been monitored against a control panel of 500 unrelated Finns and 60 unrelated Europeans including 32 CEPH parents. To date, multiple mutations in patients with different ethnic background have been identified (Fig. 2).
The most common mutation worldwide is a C→T transition at nucleotide 889 in exon 6. This mutation changes an Arg into a premature STOP codon and is predicted to lead to a truncated 256 residue protein lacking both PHD finger domains. This mutation was denoted as APECEDFin major since it is found in 85% of the Finnish disease chromosomes. The carrier frequency of the Fin major mutation was observed to be 1:250 in Finland. Surprisingly, the same mutation was also found in an Italian patient in heterozygous form and in a German patient in homozygous form, yet these patients carry different haplotypes from the Finnish major haplotype. One CEPH parent in the control panel was found to be heterozygous for this mutation, further suggesting the relatively high prevalence of this mutation in populations outside Finland (40). So far, a total of 12 APECED mutations have been identified, representing both single nucleotide substitutions and small insertions and deletions (Fig. 2). The mutations are spread throughout the coding sequence; the only suggestive clustering is observed in the region encoding the first PHD zinc finger domain. Based on information of individual mutations available so far, no correlation exists between the mutation type or location and the clinical phenotype of the patients.
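The consequence of the nucleotide 889 change can be worked through numerically. The sketch below assumes that the mutation is numbered in the same cDNA coordinates as the initiator ATG at nucleotide 121 (stated earlier in this review); the helper names are hypothetical and the standard genetic code is assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
ARG_CODONS = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]


def codon_number(nt_pos: int, atg_pos: int) -> int:
    """1-based codon index of a cDNA position, counting the ATG as codon 1."""
    offset = nt_pos - atg_pos
    if offset < 0:
        raise ValueError("position lies upstream of the start codon")
    return offset // 3 + 1


def base_in_codon(nt_pos: int, atg_pos: int) -> int:
    """Position (1, 2 or 3) of the nucleotide within its codon."""
    return (nt_pos - atg_pos) % 3 + 1


ATG_POS = 121   # from the text
MUT_POS = 889   # from the text

codon = codon_number(MUT_POS, ATG_POS)
base = base_in_codon(MUT_POS, ATG_POS)

# Which Arg codons become a stop codon when a C at that base changes to T?
arg_to_stop = [
    c for c in ARG_CODONS
    if c[base - 1] == "C" and c[:base - 1] + "T" + c[base:] in STOP_CODONS
]

truncated_length = codon - 1  # residues translated before the premature stop
print(codon, base, arg_to_stop, truncated_length)  # 257 1 ['CGA'] 256
```

With these inputs the change falls on the first base of codon 257, the only arginine codon that a C→T transition converts into a stop is CGA (giving TGA), and translation would end after 256 residues, in agreement with the truncated product described above.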
The Predicted APECED Protein
The conceptual APECED protein consists of 545 amino acid residues. The theoretical molecular weight for this protein is 57.7 kDa and the calculated pI is 7.53. The protein is proline rich (11.7%), and no apparent charge clusters or periodicity patterns can be identified. The predicted secondary structure consists mainly of coils. This is in agreement with the high proline content of the protein. No cell sorting signals are apparent, yet a putative bipartite nuclear targeting signal is found between amino acids 113 and 133, suggesting nuclear localization of the protein (40) (Fig. 2).
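The composition figures quoted above are simple ratios, and a short sketch makes them easy to sanity-check. The function name and the toy sequence are hypothetical; only the values 545 residues, 11.7% proline and 57.7 kDa come from the text.

```python
def residue_percent(protein: str, residue: str) -> float:
    """Percentage of a given one-letter amino acid in a protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return 100.0 * protein.count(residue.upper()) / len(protein)


# Values quoted in the text.
LENGTH_AA = 545
PROLINE_PERCENT = 11.7
MASS_DA = 57_700

# Approximate number of prolines implied by the quoted percentage.
approx_prolines = round(PROLINE_PERCENT / 100 * LENGTH_AA)

# Mean residue mass implied by the quoted molecular weight.
mean_residue_mass = MASS_DA / LENGTH_AA

print(approx_prolines, round(mean_residue_mass, 1))

# Toy sequence (hypothetical) to show the helper in use.
print(residue_percent("PPAPGA", "P"))  # 50.0
```

The implied mean residue mass of roughly 106 Da is a little below the value of about 110 Da commonly used as a rule of thumb for proteins, which is at least compatible with a sequence enriched in small residues such as proline.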
Sequence comparison with known proteins in the databases indicated that the APECED protein harbours two zinc finger motifs (Fig. 2). Two cysteine-rich regions of 42 amino acids specify a Cys4-His-Cys3 double-paired finger motif of the plant homeodomain (PHD) type, originally described in plants (45,46). Spacing of the essential cysteine and histidine residues is perfectly conserved in the two PHD fingers of the APECED protein. This particular motif is also found in nuclear proteins involved in the regulation of transcription, such as Kruppel-associated box A (KRAB-A)-interacting protein (KRIP-1), transcription intermediary factor 1 (TIF1) and Mi-2, the major autoantigen in dermatomyositis patients (47–49). It is noteworthy that the sequence homology between the APECED protein and these other proteins is strictly limited to the Cys4-His-Cys3 motif. Of these proteins, only the APECED protein and Mi-2 contain two PHD motifs, whereas the other homologous proteins contain only one PHD finger in addition to some other type of zinc finger motif. In the predicted APECED protein and in the Mi-2 protein, the two PHD fingers are the only zinc finger motifs present. The functional significance of the PHD motif in these proteins remains to be elucidated. Although experimental data on the function of the APECED protein are still lacking, the features of the predicted protein suggest that this novel protein may be involved in the regulation of transcription. The putative nuclear location, the high proline content and the presence of the two PHD domains are compatible with this hypothesis.
Lessons from Cloning the APECED Gene
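A Cys4-His-Cys3 arrangement of this kind can be searched for with a regular expression. The sketch below is illustrative only: the spacer lengths in the pattern are placeholder assumptions chosen to fit a 42-residue finger, not a curated PHD profile, and the test sequence is synthetic rather than the AIRE sequence.

```python
import re

# Four Cys, one His, three Cys, separated by spacers of assumed length.
# The spacer ranges are hypothetical and would need tuning on real data.
C4HC3 = re.compile(r"C.{2}C.{8,12}C.{2}C.{4}H.{2}C.{10,14}C.{2}C")


def find_c4hc3(protein: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of non-overlapping matches."""
    return [(m.start() + 1, m.end()) for m in C4HC3.finditer(protein.upper())]


# Synthetic 42-residue finger: alanines as filler between the ligand residues.
finger = (
    "C" + "A" * 2 + "C" + "A" * 10 + "C" + "A" * 2 + "C"
    + "A" * 4 + "H" + "A" * 2 + "C" + "A" * 12 + "C" + "A" * 2 + "C"
)
toy_protein = "MK" + finger + "GS" + finger + "LE"

print(len(finger), find_c4hc3(toy_protein))  # 42 [(3, 44), (47, 88)]
```

Because `re.finditer` reports non-overlapping matches only, a production search would more sensibly rely on a profile-based motif database; the regular expression is meant only to make the ligand order concrete.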
Since there was no knowledge of the biochemical defect behind the clinical phenotype of APECED, a positional cloning strategy had to be applied to isolate and characterize the defective gene in this disease. The autosomal recessive mode of inheritance was determined earlier in an extensive series of Finnish APECED families, and the large family material, well characterized by a few clinicians, offered a good basis for the linkage studies. All families used in the initial linkage mapping were of Finnish origin and thus genetic heterogeneity in this family material was unlikely. Furthermore, the concept of one major mutation in Finnish diseases (founder effect) has been proven at the molecular level in multiple Finnish diseases (50). We could thus utilize both the linkage disequilibrium (LD) and the ancient haplotype sharing in our locus restriction (27). All Finnish diseases mapped to date reveal an LD with markers on disease chromosomes and, in the case of some disease alleles, the LD can be detected over surprisingly long genetic distances, reflecting the young age of the disease-causing mutation (26,51,52). In the case of APECED, the genetic interval revealing LD was 2.6 cM, suggesting that the major mutation was introduced into this population ∼50–60 generations ago.
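The link between the extent of LD and the age of a founder mutation can be illustrated with the simplest decay model, in which a marker at recombination fraction θ from the mutation remains on the ancestral haplotype with probability (1 − θ)^g after g generations. This is a textbook approximation that ignores drift, population growth and marker mutation; it is not the calculation used to derive the estimate above, and the function names are hypothetical.

```python
import math


def ancestral_fraction(theta: float, generations: float) -> float:
    """Expected fraction of disease chromosomes that still carry the
    founder allele at a marker with recombination fraction theta."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    return (1.0 - theta) ** generations


def generations_for_fraction(theta: float, fraction: float) -> float:
    """Invert the decay model: generations needed to reach a given fraction."""
    if not 0.0 < theta < 1.0 or not 0.0 < fraction <= 1.0:
        raise ValueError("theta must be in (0, 1) and fraction in (0, 1]")
    return math.log(fraction) / math.log(1.0 - theta)


# For short intervals, 1 cM is approximately a recombination fraction of 0.01.
THETA = 2.6 / 100

for g in (50, 60):
    print(g, round(ancestral_fraction(THETA, g), 3))
```

In this model a younger mutation leaves a larger fraction of disease chromosomes with the founder haplotype intact at a given distance, which is why LD extending over several centimorgans points to a relatively recent introduction of the mutation.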
Construction of the physical map across the APECED region was a demanding task due to the poor representation of this chromosomal region in genomic libraries. This might be due to the high GC content or abundant repetitive sequences of the region. The fibre FISH techniques proved to be of great value throughout our cloning effort, especially in the localization and size estimation of uncloned gaps (27).
The identification of the APECED gene was possible by a so-called ‘cloning in silico’ approach although no known ESTs could be identified for this gene in public databases. The APECED gene was first identified from the genomic sequence based on exon prediction data, and the full-length cDNA was isolated by screening of a thymus cDNA library. However, when the cDNA sequence was compared with the genomic sequence of the AIRE gene, it became apparent that all gene identification software had performed poorly in predicting the structure of this gene. The absence of the AIRE gene from the EST databases is somewhat surprising, since the gene is transcribed in multiple tissues. Further, we could not identify the gene in our cDNA selection experiment, although it lay within the PAC clone that was used as a genomic template for the cDNA selection. However, this gene was identified by exon trapping by Nagamine et al. (43). These results demonstrate that, to identify and clone a gene definitively, different approaches must still be used in parallel. Moreover, our results indicate that the currently available tools for cloning ‘in silico’ still need to be developed for more efficient and reliable genomic sequence annotation.
The target molecules interacting with the APECED protein remain to be identified in order to understand the molecular processes that trigger the specific tissue destruction characteristic of APECED and other autoimmune diseases. Initial experiments should be focused on confirming the predicted role of the AIRE product as a potential new transcription factor. This information could help us to understand the development of the selective breaking of immunological tolerance in APECED patients. It is interesting that organs showing a high transcription level of AIRE include thymus and fetal liver, both tissues involved in the maturation of lymphocytes. Future studies will shed light on the development of the pathological immune response seen in APECED and will also provide a basis for the design of novel therapeutic strategies.
The AIRE gene represents the first known case of a single gene defect sufficient alone to cause an autoimmune disease. The presence of a similar type of zinc finger domain in the Mi-2 protein, the major autoantigen identified in some autoimmune dermatomyositis patients, may indicate the functional significance of the PHD finger domains in autoimmunity, most probably via defective DNA-binding function of the proteins. The AIRE gene and the corresponding protein now provide a key tool to clarify some molecular mechanisms involved in human autoimmunity.
We are grateful to all APECED patients and families. The constant support and collaboration of Drs Jaakko Perheentupa and Aarno Palotie is highly appreciated. These studies have been supported by the Academy of Finland, the Sigrid Juselius Foundation, the Hjelt Fond of the Pediatric Foundation, the EEC (grant BMH4-CT96-0554) and the Deutsches Humangenomprojekt (DHGP/BMBF grant OIKW 9608 Teilprojekt 3).