The eighth edition of the haemophilia B database ( http://www.umds.ac.uk/molgen/haemBdatabase.htm ) lists in an easily accessible form all known factor IX mutations due to small changes (base substitutions and short additions and/or deletions of <30 bp) identified in haemophilia B patients. The 1713 patient entries are ordered by the nucleotide number of their mutation. Where known, details are given on: factor IX activity, factor IX antigen in circulation, presence of inhibitor and origin of mutation. References to published mutations are given and the laboratories generating the data are indicated.
Haemophilia B, or Christmas disease, is an X-linked recessive disorder due to mutations causing a marked deficit of coagulation factor IX, a glycoprotein of 415 amino acid residues that is normally present in plasma and is an essential component of the clotting cascade. The disease affects one in ∼30 000 males and only very rarely females. The reduced genetic fitness of affected individuals has resulted in the fairly rapid loss of mutations from the population and continued renewal of the population pool of haemophilia B genes. Thus unrelated patients usually carry mutations of independent origin, although clusters of defects identical by descent do occur (for general reviews see 1, 2).
The purpose of this database is to assemble, in an accessible summary form that is updated yearly, molecular data on the causative mutations of haemophilia B patients worldwide. It is not intended to replace primary publications, although it does contain a significant amount of unpublished work. We have continued our database numbering system (3), giving all patients a unique Patient Identity Number (PIN or ID number). As in previous years, we have included repeat observations of the same mutation, as well as molecularly unique mutations.
The factor IX gene lies on the long arm of the X chromosome at Xq27.1 and its entire sequence of 33 kb is known (4). It contains eight exons (a–h) encoding six major domains of factor IX. These are: (i) exon a, a hydrophobic signal peptide which targets the protein for secretion from the hepatocyte into the blood stream; (ii) exons b and c, a propeptide and gla domain, the latter containing 12 γ-carboxyglutamyl residues. This post-translational modification is required for the correct folding and calcium binding of factor IX; (iii) exon d, a type B, or first epidermal growth factor-like domain, which shows homology to epidermal growth factor (EGF) and, in addition, contains conserved carboxylate residues including a β-hydroxyaspartate at amino acid 64. This domain binds an additional Ca2+ with high affinity (5); (iv) exon e, a type A, or second epidermal growth factor-like (EGF) domain, which lacks the conserved carboxylate residues of the EGF type B domain; (v) exon f, an activation domain, within which factor XIa, and VIIa plus tissue factor, cleave twice, converting factor IX to IXa; (vi) exons g and h, the serine protease or catalytic domain, responsible for the proteolysis of factor X to Xa. This region is homologous to other well studied serine proteases (e.g. chymotrypsin) and it is thought likely that his (221), asp (269) and ser (365) all participate in the classical catalytic mechanism.
Factor IX is initially synthesised in the liver as a precursor molecule that is 46, 41 or 39 amino acids longer at its N-terminus than the 415-residue mature factor IX found in plasma (it is not known which, although 39 is probable; 6). Processing steps occur in the hepatocyte prior to secretion and sequentially remove the hydrophobic signal peptide and the propeptide. In addition to the γ-carboxylation of the 12 N-terminal glutamyl residues carried out by a vitamin K-dependent carboxylase, and the partial β-hydroxylation of aspartate 64, N-linked carbohydrate side chains are added at residues 157 and 167 and O-linked carbohydrates at serines 53 and 61 and threonines 159 and 169 (7–9). The crystal structure of porcine factor IX has been determined recently (10).
There are 1713 patient entries in this eighth edition of the database compared with 1535 patients last year (11). Besides point mutations, these show 132 short (defined as <30 nt) deletions or additions or both, made up of 99 deletions, 25 additions and eight examples involving both additions and deletions. There are also 21 double mutations and one triple mutation. Twenty patients are known to have developed inhibitors, two are somatic mosaics and 29 are females (either affected or non-symptomatic carriers). The list excludes the 29 patients with partial or complete gene deletions or more complex rearrangements quoted by Thompson, 1990 (12) and others observed since (13, 14). Complete population studies suggest an incidence of 2–5% for these gross defects in haemophilia B patients. Of the 1713 patients listed, 652 show unique molecular events probably causing the disease, while the remainder are repeats. Many of these repeats occur at CG doublets and involve a CG→TG or CA change. As discussed before (3), such sites are believed to be genuine ‘hotspots’ for mutation. However, it is now becoming clear that the high number of repeat observations at some CG doublets, particularly those causing mild disease (e.g. 87 examples at 31 008), is caused, at least in part, by founder effects. A founder effect is also responsible for the many repeats of a mutation at residue 31 311 that is not part of a CG (12, 15). There are many examples in the database of repeating mutations, and those with more than five repeats are shown in Figure 1. Probably some, but not all, of these will have a common origin. The database attempts to offer a view of the spectrum of mutations causing haemophilia B that is as accurate as possible, and this is helped by the fact that about one third of all mutations have been detected as a result of full population studies. However, some bias cannot be completely avoided.
Obviously there is an over-representation of severe haemophilia-causing mutations as these tend to be the first analysed and the most likely to come to notice ( Fig. 2 ). We also expect under-representation of double mutants as not all laboratories have done ‘complete’ gene screens.
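The CG→TG or CA changes described above can be recognised mechanically from the base change and its two flanking bases. The following is a minimal sketch only, not the procedure used to compile the database: the function name `is_cpg_transition` and the three-base `context` argument are assumptions made for illustration.

```python
def is_cpg_transition(context: str, alt: str) -> bool:
    """Return True if a substitution is a transition at a CG doublet.

    context: three bases (5'->3'), with the mutated base in the middle.
    alt:     the base that replaces the middle base.
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    context = context.upper()
    alt = alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("context must be 3 bases and alt a single base")

    five_prime, ref, three_prime = context

    # C->T where the C is followed by G: the doublet changes CG -> TG.
    if ref == "C" and three_prime == "G" and alt == "T":
        return True

    # G->A where the G is preceded by C: the doublet changes CG -> CA
    # (equivalent to a C->T change on the opposite strand).
    if ref == "G" and five_prime == "C" and alt == "A":
        return True

    return False


if __name__ == "__main__":
    print(is_cpg_transition("ACG", "T"))  # C->T within CG
    print(is_cpg_transition("CGA", "A"))  # G->A within CG
    print(is_cpg_transition("TGA", "A"))  # G->A, but not within CG
```

A flag of this kind identifies candidate hotspot sites only; as noted above, repeats at such sites may also reflect founder effects.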
The distribution of mutants according to protein domains and control regions within the gene shows that mutations have been detected in all regions except the poly(A) site. The contribution of different base substitutions to the total of point mutations is shown in Figure 3, while the breakdown of mutations according to class is shown in Figure 4. Remarkably, there are now 18 molecularly unique mutants occurring within a short region of the promoter, and these are invaluable in studying gene regulation (16–19). Missense mutations within exons give valuable information as to the importance of particular amino acid residues. The number of missense mutations (as a percentage of the total number of missense mutations) in each exon is shown in Figure 5 alongside the percentage of total amino acids encoded by that exon. As expected, missense mutations are under-represented in exons a and f, because most of their amino acids (prepeptide and activation peptide) are of little functional importance, whereas they are over-represented in exons d and h, illustrating the importance of these exons (calcium binding EGF and catalytic domain, respectively). Perhaps surprisingly, mutations are also under-represented in exon g, suggesting weaker structural constraints on its amino acid sequence. The present list contains 389 different amino acid substitutions (probably detrimental). In all, 113 residues of factor IX show two or more amino acid substitutions and 92 show only one. These include the active site serine (amino acid 365), and the proposed active site aspartate (amino acid 269) and histidine (amino acid 221). Mutations at nine of the 12 γ-carboxyglutamyl residues have now been detected, confirming their critical role for the function of factor IX. Amino acid substitutions have also been found for every one of the 22 cysteines of circulating factor IX and therefore mutations have been found that compromise each one of the disulphide bridges of the mature protein, thus confirming the importance of such structures.
This year 118 missense mutations were added, of which 27 are new.
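The counts quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are ours, and the derived totals (for example the number of repeat observations) are our own subtraction rather than values reported by the database.

```python
# Figures quoted in the text.
entries_this_edition = 1713
entries_last_edition = 1535
unique_events = 652
deletions, additions, combined = 99, 25, 8
residues_with_two_or_more = 113
residues_with_one = 92

# Derived values (simple sums and differences, not reported figures).
short_indels = deletions + additions + combined            # stated as 132
new_entries = entries_this_edition - entries_last_edition
repeat_observations = entries_this_edition - unique_events
residues_with_substitutions = residues_with_two_or_more + residues_with_one

print("short deletions/additions:", short_indels)
print("entries added since last edition:", new_entries)
print("repeat observations:", repeat_observations)
print("residues with at least one substitution:", residues_with_substitutions)
```

The first sum reproduces the stated total of 132 short deletions and additions; the remaining values follow from the quoted totals.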
This year's database is now available on the World Wide Web (http://www.umds.ac.uk/molgen/haemBdatabase.htm), but is too large to publish as a printed table. Copies on disk (in both text and database format) can be obtained from the nearest country coordinator if internet access is not possible. The data are arranged in 10 fields. The ‘Patient’ field gives the code for the patient provided by the contributor: this is usually unique, except when the contributors have not given separate codes to patients with the same mutation (e.g. Vancouver, Fr) or have given no code for their patient (‘unnamed’). The PIN field distinguishes all patient entries. Thus a patient with a double mutation has two database entries, both with the same name and PIN. Occasionally, PINs are removed (and not replaced) when two or more patients previously thought to be unrelated and uniquely reported are subsequently shown to be related or to have been reported more than once. Factor IX coagulation (FIX:C) and antigen assays are given (as iu/dl or %) in the ‘Clotting’ and ‘Antigen’ fields. The nucleotide numbering is as in ref. 4, and the amino acid numbering as in ref. 20. The base change is given in the ‘mutation’ field: a minus or plus sign signifies deletion and addition, respectively. The amino acid change is given where relevant, or ‘-X’ to signify an amino acid deleted. A yes/no ‘CpG’ field replaces previous years' footnote 4 indicating transitions at CpG sites. The comments field contains further assorted information such as Frameshift, Double (meaning double mutant, cross referenced), N (normal variant), Gla (γ-carboxy glutamic acid), Bm (prolonged bovine prothrombin time; 21), Inhibitors (declared presence of), de novo [mutation originating in mother, maternal grandmother (MGM) or maternal grandfather (MGF)]. The ‘reference’ field indicates contributors quoted in detail in a separate file.
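For readers who wish to handle the disk or web copy programmatically, the field layout described above might be modelled as follows. This is a sketch under stated assumptions, not a specification of the distributed files: the attribute names, their types, the exact text layout of the ‘mutation’ field and the example entries are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One database entry; attribute names are assumed, not official."""
    patient: str                      # contributor's patient code
    pin: int                          # Patient Identity Number
    clotting: Optional[str]           # FIX:C, iu/dl or %
    antigen: Optional[str]            # factor IX antigen, iu/dl or %
    nucleotide: int                   # nucleotide number of the mutation
    mutation: str                     # base change; '-' deletion, '+' addition
    amino_acid_change: Optional[str]  # e.g. '-X' for a deleted amino acid
    cpg: bool                         # transition at a CpG site (yes/no)
    comments: str                     # e.g. 'Frameshift', 'Double'
    reference: str                    # contributor


def mutation_kind(mutation: str) -> str:
    """Classify a 'mutation' field by its minus/plus signs (assumed layout)."""
    text = mutation.replace("->", "→")  # do not mistake an ASCII arrow for '-'
    has_minus = "-" in text
    has_plus = "+" in text
    if has_minus and has_plus:
        return "addition and deletion"
    if has_minus:
        return "deletion"
    if has_plus:
        return "addition"
    return "substitution"


def multiple_mutation_pins(entries: List[Entry]) -> Dict[int, List[Entry]]:
    """Group entries by PIN and keep PINs that occur more than once."""
    by_pin: Dict[int, List[Entry]] = defaultdict(list)
    for entry in entries:
        by_pin[entry.pin].append(entry)
    return {pin: rows for pin, rows in by_pin.items() if len(rows) > 1}


# Hypothetical entries for illustration only; not taken from the database.
example_entries = [
    Entry("example-1", 9001, None, None, 100, "C→T", None, True, "Double", "lab A"),
    Entry("example-1", 9001, None, None, 200, "-A", None, False, "Double", "lab A"),
    Entry("example-2", 9002, None, None, 300, "+T", None, False, "Frameshift", "lab B"),
]

if __name__ == "__main__":
    for entry in example_entries:
        print(entry.pin, entry.mutation, mutation_kind(entry.mutation))
    print(sorted(multiple_mutation_pins(example_entries)))
```

Grouping by PIN reflects the convention that a patient with a double mutation appears as two entries sharing the same name and PIN.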
The database was compiled by the central coordinators (Giannelli and Green) from separate lists updating the previous year's list prepared by coordinators for the different countries. This year these were as follows: Giannelli and Green representing the UK, Sweden and Iceland; Sommer representing USA; Poon representing Canada; Ludwig and Schwaab representing Germany; Reitsma representing The Netherlands; Goossens representing France; Yoshioka representing Japan; Figueiredo representing South America; and Brownlee, the rest of the world. New data or notification of errors or omissions should be sent to the individual country coordinators.