We have characterized a newly identified 16.6 kb deletion which removes a significant proportion of the human α-globin cluster including the ψζ1, αD, ψα1 and α2-globin genes but leaves the duplicated α1 gene intact. This complicated rearrangement results from a combination of slippage and strand switching at sites of microhomology during replication. Functional analysis shows that expression of the remaining α1 gene is increased, rather than down-regulated, by this deletion. This could be related to its proximity to the remote upstream α-globin regulatory elements or to reduced competition for these elements in the absence of the dominant α2-globin gene. The finding of a very mild phenotype associated with such an extensive deletion in the α-globin cluster implies that much of the DNA removed by the deletion is likely to be functionally unimportant. These findings suggest that, other than the upstream regulatory elements and promoter-proximal elements, there are unlikely to be additional positive cis-acting sequences in the α-globin cluster.
The human α-globin cluster lies close to the telomere of chromosome 16 (16p13.3) and includes an embryonic gene (ζ2), two fetal/adult α genes (α2 and α1), two minor globin-like genes (αD, θ) and two pseudogenes (ψζ1 and ψα1) arranged in the order: telomere-ζ2-ψζ1-αD-ψα1-α2-α1-θ-centromere (Fig. 1 A). Expression of the α-like globin genes is controlled by remote sequences (multispecies conserved regulatory elements, MCS-R1 to 4) lying 10–48 kb upstream of the α-globin genes, of which MCS-R2 appears to be the major regulatory element ( 1 ).
Natural mutations of the human α-globin cluster and its regulatory elements have provided important insights into the general mechanisms by which mutations may arise and influence gene regulation ( 2–9 ). These chromosome rearrangements in turn provide important information relating genome structure to function. Normal individuals have two α-globin genes (αα) on each copy of chromosome 16 (αα/αα) producing sufficient α-globin chains to balance the β-like globin chains encoded on chromosome 11, thus making the predominant adult (HbA, α2β2) and fetal (HbF, α2γ2) hemoglobin tetramers. Variant chromosomes with no (- -), or one (-α) α-globin gene are common causes of α-thalassemia, associated with tetramers of excess β [HbH (hemoglobin H), β4] and γ (Hb Bart’s, γ4) globin chains. Typically, individuals with α-thalassemia are anemic, with small red cells (reduced mean corpuscular volume, MCV) containing reduced amounts of Hb (reduced mean corpuscular hemoglobin, MCH). The α-thalassemias are amongst the most common of all human genetic diseases ( 10 ). Rare cases of α-thalassemia result from deletions removing one or more of the regulatory elements ( 1 ).
In addition to mutations that cause thalassemia there are many naturally occurring SNPs (single nucleotide polymorphisms) and CNVs (copy number variations) at the α-globin cluster that do not cause thalassemia or any red cell changes. These mutations either cause only minor changes in α-globin expression or no significant change at all. Such point mutations and rearrangements are useful because they define regions of the chromosome that have no influence on α-globin expression, thus complementing and validating experiments designed to understand how the α-globin cluster is normally regulated within its natural chromosomal context in vivo ( 11 ).
Here we describe a deletion which removes 16.6 kb from the human α-globin cluster, including the ψζ1, αD, ψα1 and α2 genes together with a substantial amount of intergenic non-coding DNA, but leaving the α1 gene intact. Detailed analysis of the breakpoints of the deletion revealed a complex rearrangement in which the 5′ and 3′ breakpoints are joined by an inverted segment of DNA derived from the ψζ1 gene. It seems most likely that this has arisen via the process of slippage and strand switching during replication, a mechanism recently highlighted by Chen et al. ( 12 ) and Lee et al. ( 13 ). We show that this extensive deletion from the α-globin cluster does not down-regulate expression of the remaining α1 gene. The mutation adds significantly to data from previously reported rearrangements of the α-globin cluster and transgenic experiments, demonstrating that, apart from the upstream MCS-R elements and the sequences in the immediate vicinity of the globin genes themselves, there are no additional positive cis-acting elements in the 20 kb containing this multigene family.
Identification of a family with a novel rearrangement of the α-globin cluster
In neonates with α-thalassemia, the imbalance in globin chain synthesis results in an excess of free γ-globin chains which form Hb Bart’s. During the switch from γ- to β-globin expression, an elevated level of Hb Bart’s can therefore be detected at birth, providing a sensitive indicator of α-thalassemia ( 11 ). The propositus (EN), of Vietnamese origin, was first identified via a cord blood survey of an ethnically diverse UK population ( 14 ). The level of Hb Bart’s in the cord blood was 4.9%, which lies at the upper end of the levels seen in individuals missing a single α-globin gene (-α/αα).
Using multiplex ligation-dependent probe amplification (MLPA) as an initial screen for DNA deletions, we identified reduction in gene copy number at probes: HBZP, HBAP2, HBAP1 and ψα1−α2 indicating a deletion of at least 8 kb from the α-globin cluster (Fig. 1 A and B). Southern blot analysis of DNA from the propositus demonstrated an abnormal pattern consistent with a deletion. This was confirmed by analyzing an interspecific hybrid containing the abnormal copy of chromosome 16 in a mouse erythroleukemia (MEL) cell background. The most informative results were derived from digests using Bgl II, Hin dIII and Spe I analyzed with ζ and α probes (Fig. 1 A and C). The 5′ breakpoint of the deletion was localized to a region of 5.8 kb between ζ2 and ψζ1 flanked by Hin dIII (co-ordinate 147782) and Bgl II (co-ordinate 153170) restriction sites while the 3′ breakpoint was delimited to a 0.78 kb region between α2 and α1 flanked by Bgl II (co-ordinate 165301) and Spe I sites (co-ordinate 166073) (Fig. 1 A and C). The position of the breakpoints was further refined by designing various primers throughout these breakpoint regions which were tested (in combination) until a specific and reproducible amplification product spanning the breakpoint was obtained (Fig. 1 D). A family study showed that this rearrangement was carried by both the propositus (EN) and the mother (BV), who have very mild hematological defects (Table 1 ). The red cell indices are at the mild end of the range observed with a single α-globin gene deletion (-α/αα) and are bordering the normal range (Table 1 ). The mild phenotype of EN at age nine contrasts with the high level of Hb Bart’s (indicative of a more severe phenotype) detected at birth, suggesting that this defect may have somewhat different effects during fetal and adult life. 
As a result of the very mild adult phenotype, we would anticipate that had this patient not been detected in a neonatal screen for Hb Bart’s at birth, she would have most likely evaded detection by conventional screening strategies in later life.
|Parameter||BV (mother)||EN (propositus)||TN (father)|
|Age (at testing)||31||9||26|
|RBC (× 10^12/l)||4.75||4.9||5.23|
|HbH inclusions||0||0||Not tested|
|Genotype||-α 16.6 /αα||-α 16.6 /αα||αα/αα|
The data show the hematological results detected in the propositus (EN, at age 9), her mother (BV, at age 31) and her father (TN, at age 26). The mother (BV) and the propositus (EN) are mildly anemic and lie within the ranges observed in individuals with the -α/αα genotype (frequently referred to as silent α-thalassemia) which also overlaps the normal range ( 33 ).
Sequence analysis across the deletion breakpoints
The PCR product, which specifically amplifies this rearrangement (Fig. 1 D), was sequenced. Comparison with the wild-type sequence shows that the 5′ deletion breakpoint lies between 148978 and 148982 and the 3′ breakpoint between 165614 and 165622. Surprisingly, the breakpoints were found to be joined by an insertion of unknown sequence of ∼360 bp. Subsequent analysis showed the insertion to contain a series of imperfect tandemly repetitive elements with high homology (99%) to a region in the ψζ1 gene (co-ordinates 153221–153582) which contains a 248 bp simple tandem repeat (GGGGA) n (Fig. 2 ). The insertion is also related to a similar region in the ζ2 gene (89% homology). It is of interest that the insert at the breakpoint lies in the opposite orientation to that seen in the wild-type ψζ1 pseudogene. Close inspection of the 5′ and 3′ breakpoints revealed some ambiguity; the extent of the deletion is between 16632 and 16644 bp and the insertion is between 361 and 373 bp (Fig. 2 A). It was not possible to define these sizes more precisely because of microhomology between sequences flanking the deletion and sequences flanking the repeat elements of the inserted DNA (Fig. 2 A and B). It seems most likely that this rearrangement (designated -α 16.6 ) arose via an error in replication repaired by switching of DNA strands as described in Discussion.
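The quoted size range of the deletion follows directly from the uncertainty in the two breakpoint positions. The short sketch below reproduces that arithmetic. It assumes that the size is taken as the plain difference between the 3′ and 5′ coordinates, which is a convention chosen here for illustration; the function name is hypothetical.

```python
# Breakpoint windows, using the cluster coordinates quoted in the text.
FIVE_PRIME = (148978, 148982)   # 5' breakpoint lies somewhere in this window
THREE_PRIME = (165614, 165622)  # 3' breakpoint lies somewhere in this window


def deletion_size_range(five_prime, three_prime):
    """Return (smallest, largest) deletion size in bp for two breakpoint windows.

    Assumption: size = 3' coordinate minus 5' coordinate (no inclusive +1).
    """
    smallest = three_prime[0] - five_prime[1]  # latest 5' end, earliest 3' end
    largest = three_prime[1] - five_prime[0]   # earliest 5' end, latest 3' end
    return smallest, largest


min_bp, max_bp = deletion_size_range(FIVE_PRIME, THREE_PRIME)
print(f"deletion size: {min_bp}-{max_bp} bp")  # deletion size: 16632-16644 bp
```

The 12 bp spread between the two limits is the sum of the widths of the two breakpoint windows (4 bp and 8 bp), and matches the 12 bp spread reported for the insertion (361 to 373 bp).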
To ensure that the phenotype resulted solely from this rearrangement, the remaining α-globin genes and the major upstream regulatory element (MCS-R2) were also sequenced and found to be wild-type, as were the β-globin genes.
Functional analysis of the -α 16.6 chromosome
Although the 16.6 kb deletion removes a significant portion of the α-globin cluster (including the ψζ1, αD, ψα1 and α2 genes), it leaves the α1 gene intact. The hematological data in the two family members who carry this deletion (Table 1 ) are consistent with the mild type of α-thalassemia which occurs in individuals with three functional α-globin genes (-α/αα) rather than four (αα/αα) ( 11 ), implying that the remaining α1-globin gene on the abnormal chromosome (-α 16.6 ) is expressed normally.
To investigate expression from the α1 gene directly we established interspecific (MEL × chromosome 16) hybrids containing single copies of either the normal (αα) or abnormal (-α 16.6 ) chromosome 16 from the patient, BV. Such hybrids were induced into terminal erythroid maturation (and globin gene transcription). Since some hybrid cells may lose a whole or partial copy of chromosome 16 during cell division, Southern blot analysis was used to estimate human α-globin gene copy numbers at the time of harvesting each culture (data not shown).
An RNA protection assay (RPA) identifying human and mouse specific transcripts was used to estimate the relative amounts of human and mouse α-globin in each hybrid culture ( 15 ). These values were then corrected for the estimated human and mouse gene copy numbers per cell. Abundant amounts of human α-globin RNA were detected in hybrids containing the -α 16.6 chromosome, confirming that the remaining α1 gene on this chromosome was active (Fig. 3 A). Quantitation of these data (Fig. 3 B) provisionally suggested that this α1 gene was expressed at higher levels than the normal α1- and α2-globin genes. More importantly, finding no significant down-regulation of the remaining α1 gene is consistent with the mild phenotype observed in adults (Table 1 ).
Chromatin structure and recruitment of RNA polymerase II at the -α 16.6 chromosome α1 gene
To investigate the functional status of the remaining α1 gene in more detail, we looked for markers of gene activation across the α-globin cluster on the mutant -α 16.6 chromosome.
Gene activation is usually associated with histone acetylation (H3ac and H4ac) at the promoter of the gene in question ( 16 ). We therefore determined the pattern and quantified the relative levels of histone acetylation across both normal (αα) and abnormal (-α 16.6 ) chromosomes in induced MEL hybrids at specific points of interest by chromatin immunoprecipitation (ChIP) (Fig. 4 A). In general, the pattern of H3ac and H4ac over the intact α-globin gene and the surrounding region on the -α 16.6 chromosome was very similar to that observed on the normal chromosome (αα) (Fig. 4 B and C). These patterns of histone modification were also consistent with our previous analyses of histone acetylation across the human α-globin cluster in primary erythroid cells, with H3ac primarily occurring at the promoter and body of the gene and H4ac occurring at both the gene and the upstream regulatory elements ( 17 ). The bodies of the nearby, ubiquitously expressed housekeeping genes ( c16orf8 and LUC7L ) demonstrated relatively low levels of acetylation (serving as a negative control) whereas the promoter of one such gene ( c16orf35 ; probe HS-14 on Fig. 4 ) was acetylated (serving as a positive control).
We next analyzed the recruitment of RNA polymerase II (Pol II) at the α-globin cluster by ChIP. Pol II was present at the α-globin promoter on the -α 16.6 chromosome. Furthermore, the level of enrichment was significantly higher than on the normal chromosome 16 (Fig. 4 D) consistent with the higher level of RNA expression per gene seen from this chromosome. As expected, comparatively low levels of enrichment were observed at the other points of interest across the entire α-globin gene cluster, upstream regulatory elements and surrounding genes of both wild-type and mutant chromosomes (Fig. 4 D). These results therefore confirm that Pol II is recruited to the activated, single, intact α1 gene on the -α 16.6 chromosome and may be increased with respect to each of the two α genes on a normal chromosome (αα).
Although the globin gene clusters are amongst the most intensively studied of all loci, the details of their tissue-specific and developmentally regulated expression are still not clear. This underscores the complexity and consequent difficulties in establishing the general principles underlying mammalian gene regulation. This problem is compounded by observations demonstrating that some of the mechanisms (both in cis and trans ) by which specific genes are regulated differ between human and experimental models, such as mouse ( 18 , 19 ). Furthermore, the interpretation of transgenic models may be compromised by factors such as chromosomal position effects and variation in gene copy numbers. Careful analysis of natural variants in human populations therefore continues to provide an invaluable and essential approach to complement experimental systems.
Superficially, the human α-globin cluster, which spans ∼30 kb comprising five genes and two pseudogenes, appears complex, and yet here we have shown that removal of 16.6 kb including the ψζ1, αD, ψα1 and α2 genes does not down-regulate the remaining α1 gene, suggesting that this region contains no positive cis-acting elements. This observation prompted us to review previously published deletions that either have no discernible effect on α-globin expression, or remove one of the duplicated α-globin genes but have no effect on the remaining gene (Fig. 5 A). Together these mutations suggest that the only critical regulatory sequences in this region are those spanning the structural gene and its promoter (∼3.5 kb of the 30 kb).
The other critical cis-acting sequences lie outside the multigene family. These are the four highly conserved non-coding sequences referred to as upstream regulatory elements (MCS-R1 to 4). Natural deletions of these elements which cause α-thalassemia all remove MCS-R1 and R2 ( 20 ), and experimental data suggest that MCS-R2 is the major regulatory element ( 1 ). None of the other elements has enhancer activity, alone or in combination, and their role (if any) is not yet clear.
It has recently been shown that during erythropoiesis, the upstream elements and the α-globin promoters become progressively bound by key transcription factors to form multiprotein/DNA complexes which eventually physically interact as Pol II is recruited to the α-globin promoters ( 21 ). A similar interaction has also been observed at the β-globin cluster ( 22 ), in this case between the upstream elements (referred to as the β-globin locus control region, β-LCR) and the promoters of the five β-like genes. It has been proposed that the β-LCR can only interact with a single promoter at any time, creating competition between multiprotein complexes at each promoter for access to the β-LCR. If such competition for the α-globin upstream elements occurred between the different adult α-globin promoters (α2, α1), this could explain why we observed increased loading of Pol II and RNA transcription (per gene copy) from the single α-globin gene in the -α 16.6 chromosome: in this case, all of the activity of the upstream elements would be directed to one gene rather than two (Fig. 5 B). In support of this, it has been previously suggested that removal of a single α-globin gene in the common deletions that cause α-thalassemia (-α 3.7 and -α 4.2 ) may be associated with a compensatory increase in the expression of the remaining α1 gene ( 23 ). Although this scenario is most likely, we cannot exclude the possibility that the deletion removes repressive cis-acting elements. This might be resolved in the future by performing similar analysis using chromosomes carrying the common -α 4.2 deletion.
Natural mutations may also help to elucidate other nuclear processes including replication and recombination. Most frequently α-thalassemia results from deletion of one (-α) or both (- -) of the duplicated α-globin genes. Such chromosomes occur at high frequencies in all tropical and subtropical regions of the world because heterozygotes (-α/αα, -α/-α and - -/αα) are thought to be at an advantage in the presence of endemic falciparum malaria ( 24 ). This is balanced by the severe phenotypes of compound heterozygotes (- -/-α, HbH disease) and homozygotes (- -/- -, Hb Bart’s hydrops fetalis). To date, >50 deletions associated with α-thalassemia have been described, and they provided some of the first examples of non-allelic homologous recombination and illegitimate recombination in the human genome. The -α 16.6 chromosome described here provides an example of a complex rearrangement in which a region between two breakpoints has been repaired by an inverted sequence derived from the ψζ1 gene (Fig. 2 ). The structure of this rearrangement is very similar to that found in the breakpoint of the - - MED deletion, first described almost 20 years ago ( 6 ). At that time we proposed that the - - MED deletion had arisen by repairing a double strand break (arising during replication) with an Okazaki fragment from the lagging strand of a closely opposed replication loop. This mechanism has more recently been set out in greater detail by Lee et al. ( 13 ), who proposed that a replication Fork Stalling and Template Switching (FoSTeS) event can occur when the nature of the sequence causes a replication fork to stall or pause such that the lagging strand disengages and switches to a nearby template at another active replication fork where there is a region of microhomology.
It therefore seems likely that the -α 16.6 rearrangement has arisen by a similar mechanism as illustrated in Figure 6 and that this may be a common mechanism underlying deletions, insertions, transpositions and duplications in the human genome.
Over the past few years, in the field of human genetics, there has been considerable interest in understanding the origins and influence of CNV on gene expression. The detailed analysis of a single locus continues to shed new light on such effects and gives some insight into the potential complexity associated with this phenomenon.
MATERIALS AND METHODS
Patient consent was obtained in accordance with standard ethics approval guidelines. Full blood counts, examination of peripheral blood films, high performance liquid chromatography (Biorad Variant™, Biorad, Hercules, USA) analysis of hemoglobin and identification of HbH inclusions were performed as previously described ( 14 , 25 ).
Epstein-Barr virus (EBV)-transformed B-lymphocyte cell lines were established from the mother of the propositus (BV) and a number of normal individuals. Interspecific hybrids containing either the abnormal (-α 16.6 ) or a normal chromosome 16 were established by fusing EBV-transformed lymphocytes from the patient and from normal individuals to MEL cells as previously described ( 26 ). These cells were induced for 3 days to undergo terminal differentiation using HMBA (25 mM) and Hemin (0.3M) as previously described ( 15 ).
Characterization of -α 16.6 deletion breakpoints
DNA was extracted with phenol and chloroform using standard methods. The α-globin genotypes were determined as previously described ( 27 ). MLPA (Service XS, Leiden, The Netherlands) was performed as described previously ( 28 ). Further mapping of the -α 16.6 breakpoint by Southern blot analysis was performed with a variety of restriction enzymes (including Bgl II, Hin dIII and Spe I) using α and ζ probes. Known DNA sequences around the 5′ and 3′ breakpoints were used to design forward 506 (5′-CTCATCCAGACTCTCCAGCTG-3′) and reverse 507 (5′-CAAGTACACACAGAGGTGC-3′) PCR primers which amplified across the deletion. The 1091 bp breakpoint fragment generated was sequenced directly using the same primers, in addition to a nested primer, 505 (5′-CCATCTATCAACAGAAGCAA-3′). The patient’s remaining α-globin genes and the major upstream regulatory element (MCS-R2) were also amplified and directly sequenced. The primers and DNA sequence are available on request.
Determination of human α-globin gene copy number
Southern blot analysis was performed to calculate the ratio of human α-globin gene to mouse α-globin gene in the cultures ( 15 ). DNA from MEL hybrids was digested with Pst I and the subsequent blot was hybridized concurrently with probes specific for the human α- and mouse α-globin genes ( 15 ). The intensities of the human α and mouse α signals were measured using the Storm™ Phosphorimager (GE Healthcare Life Sciences, Amersham, UK) and a ratio of human α:mouse α signal was determined for each culture.
Total RNA from induced MEL hybrid cells was prepared with TRI reagent (Sigma, St Louis, USA). RPA was performed using 32P-labeled mouse and human α-globin Sp6-derived cDNA probes (SP6/T7 transcription kit; Roche, Basel, Switzerland) as described previously ( 15 , 29 ). The products were resolved by polyacrylamide electrophoresis, and the specific ‘protected’ fragments were quantitated using the Storm™ Phosphorimager (GE Healthcare Life Sciences). Human α-globin expression was calculated as a ratio of human α to mouse α RNA signal, after correction for the number of labeled nucleotides in each probe. The values obtained were adjusted for human α-globin content using Southern blot data.
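The normalization described above can be written out as a small calculation. The sketch below is an illustration only: the function name, its parameters and all input values are hypothetical, and it assumes that each correction is a simple division (band intensity by the number of labeled nucleotides, then the RNA ratio by the human:mouse gene ratio obtained from the Southern blot).

```python
def human_alpha_per_gene_copy(human_signal, mouse_signal,
                              human_labeled_nt, mouse_labeled_nt,
                              human_to_mouse_gene_ratio):
    """Human:mouse alpha-globin RNA ratio, adjusted for gene copy number.

    All parameters are illustrative; none of the values come from the paper.
    """
    # Correct each protected-fragment intensity for its labeled nucleotides.
    human_corrected = human_signal / human_labeled_nt
    mouse_corrected = mouse_signal / mouse_labeled_nt
    # Ratio of human to mouse alpha-globin RNA in the culture.
    rna_ratio = human_corrected / mouse_corrected
    # Adjust for how many human genes are present per mouse gene.
    return rna_ratio / human_to_mouse_gene_ratio


# Made-up phosphorimager readings and probe label counts.
value = human_alpha_per_gene_copy(
    human_signal=1200.0, mouse_signal=3000.0,
    human_labeled_nt=40, mouse_labeled_nt=50,
    human_to_mouse_gene_ratio=0.25,
)
print(value)  # 2.0
```

Dividing by the gene ratio is what allows cultures that have lost copies of chromosome 16 to be compared on a per-gene basis.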
ChIP was performed as previously described ( 21 ) using the ChIP assay kit (Upstate/Millipore, Billerica, USA). Each ChIP was performed on 1.0 × 10^7 MEL hybrid cells. Immunoprecipitation of the DNA/protein complexes was performed directly on fresh sonicated chromatin overnight at 4°C with antibodies against the protein of interest ( 21 , 30 ). Antibodies used were: anti-diacetylated histone 3 (Upstate 06-599), anti-tetra-acetylated histone 4 (Upstate 06-866) and anti-Pol II (H-224) (Santa Cruz Biotechnology, Santa Cruz, USA; sc-9001). Quantitative PCR (q-PCR) of immunoprecipitated DNA was performed in real time on the ABI 7000 Sequence Detection System (Applied Biosystems, Foster City, USA) using forward and reverse primers and FAM-TAMRA modified probes (manufactured by Eurogentec, Liege, Belgium) ( 21 , 30 ). The quantitative data from q-PCR analysis were expressed as fold enrichment of bound material over input (relative to endogenous mouse Gapdh). Fold enrichments were calculated by a modification of the comparative CT method (ΔΔCT) described by Applied Biosystems.
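For orientation, the standard comparative CT calculation can be sketched as follows. This is the textbook 2^(-ΔΔCT) form and not the specific modification used in this study, which is not detailed here. It assumes perfect doubling of product per cycle; the function name and all CT values are hypothetical.

```python
def fold_enrichment(ct_ip_target, ct_input_target,
                    ct_ip_control, ct_input_control):
    """Fold enrichment of a target amplicon in ChIP relative to a control amplicon.

    Standard comparative CT (delta-delta CT) calculation, assuming 100% PCR
    efficiency (product doubles every cycle).
    """
    # Delta CT: bound (IP) material relative to input, for each amplicon.
    delta_ct_target = ct_ip_target - ct_input_target
    delta_ct_control = ct_ip_control - ct_input_control
    # Delta-delta CT: target relative to the control amplicon (e.g. Gapdh).
    delta_delta_ct = delta_ct_target - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up CT values: the target lags input by 2 cycles, the control by 5.
fold = fold_enrichment(ct_ip_target=24.0, ct_input_target=22.0,
                       ct_ip_control=30.0, ct_input_control=25.0)
print(fold)  # 8.0
```

A lower CT means more starting template, so a target amplicon that falls fewer cycles behind its input than the control does is reported as enriched.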
This work was supported by the Medical Research Council, The National Health Service (UK) and the Oxford Biomedical Research Centre.
The authors would like to thank Dr Adrian Stephens, Dr Roger Amos and Dr M. Beekes for providing blood samples for the study; Sue Butler for technical assistance; and Liz Rose and Nicki Gray for their help in preparing the manuscript. They are also grateful to Professor Bill Wood and Dr Richard Gibbons for their comments on the manuscript.
Conflict of Interest statement. None declared.