Classic M protein serotyping has been invaluable during the past 60 years for the determination of relationships between different group A streptococcal (GAS) strains and the varied clinical manifestations inflicted by these organisms worldwide. Nonetheless, during the past 20 years, the difficulties of continued expansion of the serology-based Lancefield classification scheme for GAS have become increasingly apparent. By use of a less demanding sequence-based methodology that closely adheres to previously established strain criteria while being predictive of known M protein serotypes, we recently added types emm94–emm102 to the Lancefield scheme. Continued expansion by the addition of types emm103 to emm124 is now proposed. As with types emm94–emm102, each of these new emm types was represented by multiple independent isolates recovered from serious disease manifestations, each was M protein nontypeable with all typing sera stocks available to international GAS reference laboratories, and each demonstrated antiphagocytic properties in vitro by multiplying in normal human blood.
In 1928, Dr. Rebecca Lancefield described a system for typing group A streptococci (GAS) based on the variable antigenic properties of a heat-stable surface protein. Antibodies to this protein (M) were shown to be type (strain) specific and antibacterial through opsonic activity. The presence of the surface M protein appeared to allow the bacterium to survive in the human host. From 1928 through the late 1950s, Lancefield described M type strains with corresponding type-specific antisera M1 to M50. M type 51 was the last M type validated at the meeting of the International Subcommittee on Nomenclature of Bacteria Subcommittee on Streptococci and Pneumococci in 1966 through a procedure agreed upon by Dr. R. Lancefield (Rockefeller University, New York), Dr. J. Rotta (Streptococcus WHO Collaborating Center, Prague), Dr. M. T. Parker (Streptococcus laboratory in Colindale, England), and Dr. M. Moody (Streptococcus laboratory at the Center for Disease Control, Atlanta). This agreement included several criteria for designating new M types of GAS. Lancefield extracts from new M serotype strains could not react specifically with any known M type antisera after absorption with other known types. The precipitating type-specific antisera had to be prepared and characterized by the original investigator and by 1 of the 4 reference laboratories. The test strain had to demonstrate a 32-fold (5 generation) increase in cell count after 3-h rotation in normal human blood. Furthermore, specific M antisera against the strain raised in rabbits added to the human blood had to show inhibition of the growth of the immunizing strain in an in vitro bactericidal test. The M serotype also had to be identified from multiple independent and clinically significant isolates.
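The growth criterion above is a doubling count: a 32-fold increase corresponds to log2(32) = 5 generations. The following is a minimal sketch of that arithmetic, assuming viable counts are available before and after the 3-h rotation; the function names and the example counts are hypothetical and are not taken from the reference laboratories' protocols.

```python
import math


def generations(initial_count: float, final_count: float) -> float:
    """Number of doublings implied by the change in viable count."""
    if initial_count <= 0 or final_count <= 0:
        raise ValueError("counts must be positive")
    return math.log2(final_count / initial_count)


def meets_growth_criterion(initial_count: float, final_count: float,
                           min_fold: float = 32.0) -> bool:
    """True if the fold increase reaches the 32-fold threshold cited in the text."""
    if initial_count <= 0:
        raise ValueError("initial count must be positive")
    return final_count / initial_count >= min_fold


# Hypothetical counts (CFU/mL) before and after rotation in blood.
print(generations(100, 3200))             # 5.0
print(meets_growth_criterion(100, 3200))  # True
```

A strain reported as multiplying 6 generations would correspond to a 64-fold increase under the same calculation.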
From 1965 through 1976, 30 additional M-type strains (M serotypes 52 to 81) were characterized by the 4 reference laboratories. Although the M type designations M52 to M81 are currently commonly used, 2 of these designations were premature, because the reference strains of M types 44 and 27 were recently associated with the M61 and M77 serotypes, respectively. Not coincidentally, the deduced M27 peptide sequence is identical to M77 and the M44 N terminal sequence is identical to M61 [6, 7].
From 1976 through 1990, several investigators described new “provisional” M-type strains, from which individual investigators had prepared type-specific precipitating antiserum that also contained opsonic antibodies . These results were not verified by a second laboratory, which prevented their inclusion into the Lancefield classification system. In the late 1980s, there was a documented increase in severe invasive disease caused by GAS [9–11], and it became apparent that many isolates could not be typed with available M-typing antisera. During this time, investigators determined that the dissimilar N termini of M proteins were M-type specific and contained opsonic epitopes [12–15]. It was soon shown that 5′ emm sequences could be efficiently and reliably used to predict M serotypes [5–7, 16–19]. Although GAS strains are often associated with as many as 2 additional emm-like genes, cumulative data from emm gene cloning and expression experiments, and the correlation of sequence-based data with available serotype information, suggest that, in these strains, it is the emm gene that encodes M serotype specificity and type-specific opsonic epitopes (for examples, see [5, 15, 20–22]). It is important to note that most classic M protein gene (emm) designations (e.g., types emm1 to emm81) and all recent emm sequence designations have relied solely on sequencing results of amplicons generated by use of oligonucleotide primers specific for what is thought to be the M protein gene, and not other emm-like genes [5–7]. Although there is nearly always a 1 : 1 relationship between 5′ emm sequence type and M serotype [5, 7, 16], the terms "M types" and "emm types" refer to serotypes and 5′ emm sequence types, respectively.
Workshops were held in 1997 and 1999 by 6 international GAS reference laboratory representatives to standardize an emm typing system as an extension of the Lancefield M typing system and to establish a culture exchange process [23–25]. A Web site was established with an emm sequence database that could be used for emm typing (http://www.cdc.gov/ncidod/biotech/infotech_hp.html). The workshop participants agreed that 12 of the 15 provisional M types fit the required serologic and sequence-based criteria for new M and emm type designations (M82 to M93 and emm82 to emm93). At present, there are 83 GAS M serotypes unequivocally acknowledged by the authors to be both serologically unique and encoded by unique emm gene sequences. These are designated within the sequence of M1 to M93, all of which are described at the aforementioned Web site. Certain M serotypes that were not included were found to be from non–group A organisms, to be identical to another group A type, or to correspond to an existing M serotype.
Investigators had previously reported several new emm gene sequence types likely to represent new M serospecificities because of their dissimilarity to known emm sequences corresponding to conventional and provisional M serotypes (for examples, see [17, 18]). A working committee reported on an exchange of 13 strains of GAS reportedly having novel emm sequence types, 9 of which had emm sequences that passed uniform criteria for new types and were designated types emm94 to emm102. Ideally, serologic characterization of these strains will be performed in the future, allowing for both M serotype and emm–sequence type designations. Although serologic characterization of M proteins provides extremely valuable information, it is no longer practical for timely and much-needed extension of the Lancefield GAS classification system. Therefore, it is expedient to recognize potential new M virulence protein serotypes by virtue of new emm sequence types alone.
GAS-mediated disease continues to be a major problem worldwide. By some estimates, there are millions of cases of GAS pharyngitis, which have led to billions of dollars in medical expenses and work stoppage in the United States alone. Approximately 10,000–15,000 cases of invasive GAS disease occur annually in the United States, which are associated with a 10%–13% mortality rate (see http://www.cdc.gov/ncidod/dbmd/abcs/ for invasive and noninvasive disease incidence). Throughout the past 50 years, numerous reports concerning group A streptococcal epidemiology and pathogenesis have relied on the M serotype to designate strain type. This was reasonable on the basis of the general consensus that the M protein is the key virulence factor of GAS. In addition, multilocus sequence typing has verified that M serotype (or emm sequence type) generally reflects GAS clonal type. For these reasons, we think that it is important to keep a uniform M protein/emm gene nomenclature. Here we report on an exchange of strains between the 6 group A streptococcal reference laboratories resulting in 22 additional sequence types designated emm103 to emm124.
Materials and Methods
Strains. GAS strains were provided by investigators in Papua New Guinea (D. Lehman), Malaysia (F. Jamal), India (A. Kalia), Thailand (C. Pruksakorn), New Zealand (D. Martin), the United States (Centers for Disease Control and Prevention [CDC]), Egypt (S. El Tayeb), and Argentina (H. Lopardo).
Serologic testing. Cultures were distributed among the 6 laboratories, T typed, and tested for opacity factor (OF) reaction. Lancefield extracts were prepared and tested for M precipitin reactions by using available antisera (see Efstratiou et al. for a complete list of available M antisera at each laboratory). All laboratories followed the same procedures for T typing, anti-opacity factor (AOF) typing, and M typing. Cultures were tested for survival in a blood rotation test that was based on Dr. Lancefield's method for testing survival and multiplication in human blood (method 2).
Sequence typing. emm and sof PCR and sequencing were performed as described elsewhere [5, 6]. emm sequences were compared with the CDC data bank (http://www.cdc.gov/ncidod/biotech/infotech_hp.html) and with GenBank (http://www.ncbi.nlm.nih.gov/BLAST/). To represent a new emm type, the 5′ gene sequence encompassing the first 160 bases from a designated primer annealing site sequence (encoding the membrane export leader and the type-specific tip of the mature M protein) must be unique, with <95% sequence identity to other known emm sequences. A single deletion or insertion of up to 7 codons not shifting the reading frame is tolerated within an emm sequence type; however, any alteration of the reading frame for >7 codons results in a different emm sequence type designation. For new emm type designations, identical emm amplicon sequencing results were obtained by at least 2 reference laboratories.
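The uniqueness criterion can be illustrated with a small sketch. A new type must be dissimilar from every known type over the 5′ window, so the sketch flags a query as a candidate only when its identity to each known sequence falls below the 95% figure. This is an illustration under stated assumptions, not the reference laboratories' procedure: sequences are assumed to be pre-aligned and of equal length over the window, the codon insertion/deletion rule is not modeled, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def is_candidate_new_type(query: str, known: dict,
                          window: int = 160, threshold: float = 95.0) -> bool:
    """True if the first `window` bases of the query are below `threshold`
    percent identity to the same window of every known sequence."""
    q = query[:window]
    return all(percent_identity(q, ref[:window]) < threshold
               for ref in known.values())


# Toy 20-base sequences (hypothetical), so the window is shortened to 20.
known = {"typeA": "ACGTACGTACGTACGTACGT"}
one_mismatch = "ACGTACGTACGTACGTACGA"   # 19/20 identical = 95%
two_mismatches = "TCGTACGTACGTACGTACGA"  # 18/20 identical = 90%
print(is_candidate_new_type(one_mismatch, known, window=20))    # False
print(is_candidate_new_type(two_mismatches, known, window=20))  # True
```

In practice the comparison would follow an alignment step (for example, a BLAST search against the sequence database), which this sketch leaves out.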
Dendrogram construction. The dendrogram presented in figure 1 was constructed by sequential use of the PileUp (gap creation and extension penalties of 8 and 2, respectively), Distances (uncorrected distance), and GrowTree (neighbor-joining) programs from the Wisconsin Package, version 10.1 (Genetics Computer Group).
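The distance and tree-building steps can be sketched in a few lines. The code below is not the Wisconsin Package implementation; it is a minimal illustration that assumes the sequences are already aligned to equal length (the role PileUp plays above), computes uncorrected distances as the fraction of differing positions, and applies the standard neighbor-joining pair-selection rule. Branch lengths are omitted, and the taxon names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of differing positions."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighbor_joining(names, dist):
    """Return the tree topology as nested tuples (no branch lengths).

    `dist` maps frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
                 for i in nodes}
        # Join the pair that minimizes the neighbor-joining Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)]
                   - total[p[0]] - total[p[1]])
        new = (i, j)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))]
                    - d[frozenset((i, j))])
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)


# Hypothetical aligned sequences: A/B are close, C/D are close.
seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "GGGGCCCC", "D": "GGGGCCCT"}
dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)}
print(neighbor_joining(seqs, dist))  # (('A', 'B'), ('C', 'D'))
```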
General findings concerning new emm types. All strains were originally isolated from 1995 through 1998, were M nonserotypeable with existing M antisera, and multiplied ⩾6 generations in the blood rotation test, indicating functional antiphagocytic properties (data not shown). This was important because, before preparing M-type–specific antiserum, it must be shown that the strain can survive in blood in the absence of type-specific bactericidal antibodies. Homologies of the deduced N-terminal 50 M-protein residues of these new types to those of the closest corresponding Lancefield types ranged from 40% to 82% sequence identity (table 1).
The number of clinical isolates identified from various studies with the same emm sequence type as the reference strain ranged from 2 to 106 (table 1). The assignment of new emm types is given in the first column of table 1. The details of the sources of the cultures and limited clinical information concerning isolates within these and other emm types (as well as alleles within emm types) can be found at the Web site http://www.cdc.gov/ncidod/biotech/infotech_hp.html, which is updated every 6–12 months.
The results of T typing and OF reactions [27, 28] were in general agreement between the 6 reference laboratories (table 2), with only emm type reference strains for st2035 and st436 displaying some inconsistencies.
All strains, regardless of OF phenotype, were tested for the presence of the sof gene by PCR. Only 1 of the 12 OF-negative strains (type st2267) was sof PCR positive. The remaining 10 OF-positive strains were associated with unique 5′ sof sequence types, with the exceptions of the st3018 and st1160 reference strains, which carried sof8 and sof2967 sequences, respectively. The sof8 and sof2967 genes are consistently found in reference strains and clinical isolates of the corresponding emm types (emm8 and st2967). The AOF test relies on the fact that antisera against GAS serum OF generally inhibit OF activity in a strain-specific manner. The observation that the st3018 reference strain was AOF nontypeable is inconclusive, because none of the laboratories had AOF-8 typing sera.
Similar to observations based on the OF phenotype, the 5′ emm sequences from these new types were divisible into either sof-positive or sof PCR–negative clusters (figure 1). Within the dendrogram shown, the closest deduced Lancefield serotype M protein matches to each of the 22 sequence types (table 1, column 5) have been added as reference points.
Information concerning individual new emm sequence types. st2034 (emm103) was originally found in a blood isolate recovered from a patient in Papua New Guinea and was also associated with an epidemic of skin infections in 16 patients living in Hawaii. In addition, st2034 isolates have been recovered from patients with skin, throat, or invasive infections in 5 other countries (table 1). Consistent with the M nontypeability of type emm103 (st2034) isolates, the closest match to the Emm103 50 N-terminal residues (the M87 N-terminal 50 residues) shares only 66% identity (table 1). All 5 type st2034 isolates tested, which were from 5 different geographic areas that represent 4 different countries, shared the same unique 5′ sof2034 sequence.
st2035 (emm104) was originally found in a blood isolate recovered from a patient living in Papua New Guinea (strain SS1664 in table 2). At the CDC, we have identified 5 widely geographically separated st2035 isolates (table 1) recovered from blood, wound, throat, and impetigo lesions [29, 30]. Nine additional type st2035 isolates have been identified from patients living in New Zealand. Two different sof sequence types have been associated with different emm type st2035 isolates. The unique sof1457 5′ sequence was found in the reference strain for st2035 described in table 2. As shown in table 2, the results of OF phenotype testing of this strain were inconsistent between the 6 laboratories. The AOF type 4 consistently found for the reference strain by 2 laboratories likely reflects shared epitopes between the Sof1457 and Sof4 proteins. Because the sequences that encode the enzymatic domains of these large proteins are incomplete, this relationship remains unclear.
st4529 (emm105) was originally found in a blood isolate recovered from a Malaysian patient. Four other isolates from wound, skin, and blood culture sources from 3 additional geographic locations were found to have the same emm type (table 1).
st4532 (emm106) was originally found in a blood isolate (strain SS1416) recovered from a Malaysian patient, and it has subsequently been identified from skin infections in Egyptian patients and Nepalese children.
st4264 (emm107) was recently identified from a knee aspirate isolate recovered from a patient living in Malaysia. An additional st4264 isolate was recovered from a patient in New Zealand. It is interesting that one laboratory identified SS-1551 (st4264) as being inhibited by AOF serum for provisional type TR2407, originally isolated from a skin infection of a patient in Trinidad (table 2). Another laboratory found strain SS-1551 to be AOF type 89. Although the sof4264 and sof89 5′ sequences are distinct, the complete sof gene sequences of these type strains may help to elucidate the basis of these AOF typing results. In addition, the emm and sof gene sequences from the TR2407 strain will be obtained.
st4547 (emm108) was originally identified in a throat isolate recovered from a patient with acute pharyngitis living in Malaysia. Isolates with this type have also been recovered from patients with skin or throat infections in 4 additional countries, which include a throat isolate recovered from a patient with rheumatic fever and a sibling of a patient with rheumatic fever (D.R.M., unpublished data).
st3018 (emm109) was originally identified from a Malaysian patient with a skin infection. Blood, skin lesion, and throat isolates of type st3018 have also been recovered from patients living in New Zealand and Egypt.
st4935 (emm110) was originally found in throat culture isolates recovered from asymptomatic patients living in India. Type st4935 isolates were also identified from various specimens (ankle fluid, blood, brain, throat) obtained from patients living in 4 additional countries (table 1). These isolates appear to represent genetically heterogeneous strains because they have diverse sof gene sequences and T-agglutination patterns (authors' unpublished data).
Four laboratories identified strain SS-1422 (st4935) as AOF type 63. The sof63 and sof4935 sequences are only partially complete, but they do share high homology over the 3′ boundaries of their partial sequences, which include sequence encoding ∼20% of the Sof putative enzymatic domains (see GenBank accessions AF133806 and AF139754 for sof63 and sof4935 sequences, respectively).
st4973 (emm111) was originally isolated from the throat of an asymptomatic patient living in India. Four additional isolates with this same type were recovered from samples of the throat, CSF, and ascitic fluid of Indian patients. st4973 was also identified from blood and throat cultures of patients in Brazil and California.
stcmuk16 (emm112) was originally found in throat isolates recovered from patients living in Thailand. Additional isolates from sterile and nonsterile sites with this emm type were recovered from patients residing in the United States, Brazil, and Russia. The 5′ sof sequence of strain SS-1550 (stcmuk16) is unique and was also found in the Brazil stcmuk16 isolates (authors' unpublished data), which suggests high genetic relatedness between geographically widely separated stcmuk16 isolates.
The st2267 (emm113) sequence is nearly identical to the 560-base accession AF078068 that was submitted to GenBank by C. S. Chiou and S. W. Lin. The isolate with this emm sequence was reported to be associated with a case of scarlet fever in Taiwan. Ten isolates were identified from patients living in New Zealand (1 from a blood culture, 2 from skin swabs, and 7 from throat cultures). The st2267 reference strain has a unique OF gene, sof2267, but none of the 6 laboratories was able to detect positive OF reactions (table 2).
st2967 (emm114) was originally identified from a blood isolate (SS-1357) recovered from a patient living in San Francisco during 1995. By the end of 2000, we had identified 99 additional isolates from the Active Bacterial Core Surveillance (ABCs) of invasive disease in the United States, which represented ∼3% of all invasive GAS in a 7-state ABCs population-based study during the years of 1995–2000. We also identified st2967 from isolates recovered from 4 other countries. These isolates have been demonstrated to be serologically distinct from M and AOF types of accepted Lancefield reference strains by the Edmonton Laboratory group, which has prepared M and AOF antisera to the reference type st2967 strain. The sof2967 gene (partial 2587 base GenBank accession AF13749) and the AOF type 2967 are shared between SS-1357, randomly selected st2967 isolates from varied geographic locations, and also clinical isolates of the distinct emm type st1160 (emm124 in table 1). st2967 isolates have been shown to be uniformly bacitracin resistant (M.L., unpublished data).
st2980 (emm115) was originally found in a blood isolate (SS-1366) recovered from a patient living in San Francisco during 1995. Eleven additional isolates (at least 4 of which were from sterile sites) have been identified from the United States, Brazil, and Russia.
st2370 (emm116) was originally identified from a cerebrospinal fluid isolate (strain SS-1467) recovered from a patient living in San Francisco during 1997. Two additional isolates (blood and wound isolates) were identified from a GAS outbreak investigation in Maryland. Six st2370 isolates were identified from patients living in New Zealand and Nepal.
Type st436 (emm117) was originally identified from a blood isolate (SS-1363) recovered from a patient living in Connecticut as part of the ABCs surveillance system during 1995. Four additional sterile site cultures have been identified from 3 different states in the United States.
st448 (emm118) was originally identified from a wound culture of a patient living in Connecticut during 1995 and from 4 other surveillance states in the United States. We have also identified st448 from patients in Brazil, Egypt, Nepal, and New Zealand. Isolates from the United States with this emm type have been typed with components of the T-3/13/B complex (as the reference strain in table 2) or have been T nontypeable, whereas st448 isolates from Brazil were type T-6. In addition, the sof sequence type from the isolates from the United States (sof448) is distinct from that of the Brazil isolates (sof3894). st448 isolates from both the United States and Brazil cross-react with anti-M49 serum in gel diffusion tests (nonidentical line of diffusion with M type 49 reference extracts), which is consistent with the high similarity between the mature N termini of Emm118 and M49.
st3365 (emm119) was identified from blood cultures of patients with invasive disease living in Connecticut and Brazil. An additional throat isolate was recovered from a child in Nepal.
st1135 (emm120) was identified from skin lesions on patients living in Egypt.
st1161 (emm121) was identified from 2 impetigo isolates recovered from patients living in Egypt.
st1432 (emm122) was identified from an impetigo lesion and throat isolates recovered from patients living in Egypt and Nepal.
st6949 (emm123) was identified from sterile site isolates recovered from patients in Argentina, San Francisco, and New Zealand.
st1160 (emm124) was identified from throat and skin isolates recovered from patients living in Egypt and New Zealand. st1160 was also found in a corneal isolate from a patient in Malaysia. st1160 is an unusual type in that the reference strain and another independent isolate were typed as M2 and AOF-ST2967 by use of antisera (including AOF typing serum prepared against the proposed emm114 reference strain) prepared in the Edmonton laboratory. The other 5 laboratories found this reference strain to be M nontypeable. Nonetheless, a potential basis of the Edmonton serotyping results is readily apparent from observed sequence homologies. Although their N-terminal tips are divergent, mature Emm124 residues 21–62 share sequence identity with the corresponding residues 13–54 of the M2 molecule. The 5′ sequence of the sof gene from strain SS1536 (st1160), as well as the restriction cleavage profile of the entire structural gene, is identical to sof2967.
The majority of the new emm sequence types presented here appear to represent virulent clones or M protein types that are widely geographically disseminated. The importance of establishing a system for tracking new types of GAS cannot be overemphasized. So-called new sequence types have recently been found to share near-complete emm sequence identity and M serospecificity with strains serologically characterized decades ago, clearly demonstrating the need for a system to validate and properly document new types [5, 6, 7, 23]. Sequence-based recognition of new types has allowed the identification of prevalent types that have almost certainly gone unrecognized for many years. For example, emm92/M92 was recently identified in isolates from California and New Zealand [18, 23, 33]. This sequence type cannot be said to have recently emerged, because we have found isolates with this type that have been in the CDC culture collection for as long as 40 years. It is striking that emm92/M92 constituted 2% of invasive isolates in the United States in a 7-state, population-based surveillance study conducted during the years of 1995–1999. Such information is vital to researchers currently formulating multivalent M-protein vaccines representative of common circulating GAS isolates. On the basis of population-based emm typing data during the years of 1995–2000, a vaccine composed of 25 type-specific components could theoretically protect against ∼90% of invasive GAS in the United States (see http://www.cdc.gov/ncidod/biotech/images/emmdistr.tif).
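The coverage estimate above is a cumulative-frequency calculation: rank emm types by isolate count and sum the share of the most common ones. A minimal sketch follows, assuming per-type isolate counts from a surveillance data set are available. The type labels and counts shown are hypothetical placeholders, not the ABCs data, and the calculation assumes that each vaccine component protects only against its own type.

```python
def cumulative_coverage(type_counts: dict, n_components: int) -> float:
    """Fraction of isolates whose emm type is among the n most common types."""
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("no isolates")
    top = sorted(type_counts.values(), reverse=True)[:n_components]
    return sum(top) / total


# Hypothetical isolate counts per emm type.
counts = {"typeA": 50, "typeB": 30, "typeC": 15, "typeD": 5}
print(cumulative_coverage(counts, 2))  # 0.8
```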
It has been previously noted that emm sequences are generally phylogenetically divisible into OF-positive and OF-negative types. Studies of mga loci have revealed mosaic structures of emm and emm-like genes indicative of intraspecies horizontal recombination events. Together, these 2 observations indicate that there may be natural barriers preventing recombination between these 2 broad divisions of GAS. The 22 new sequence types that are presented here follow this trend in that they and each of their closest Lancefield M type matches are divisible into sof (serum OF gene)–positive and –negative groups on the basis of phylogenetic analysis of N-terminal M-protein segments (figure 1).
Although emm types do usually appear to correspond with specific clones, emm type and other individual typing parameters should not be used independently in designating strain types. For example, infrequently, isolates sharing identical M serotypes and emm sequence types can be distinguished by different T agglutination patterns, sof genes, pulsed-field gel electrophoresis (PFGE) profiles, multilocus sequence profiles, amplified-fragment length polymorphisms, or some combination of these [5, 26, 35, 36]. Within a few of the emm types described in this work, we have found a fair variety of different T agglutination patterns among isolates, and for certain types, we have also found different sof sequence types (described under the aforementioned individual types). Even more rarely, one can find apparent strain heterogeneity within isolates sharing identical emm and T types. For example, to date, we have examined 2 independent emm11 (M11), T11, OF-positive isolates (from the United States and Argentina) that, unlike the majority of emm11 isolates, are AOF type 25 and carry the sof25 gene (authors' unpublished observations). We find that 5′ emm and sof sequences generally exhibit little allelic variation within types, so the combination of both sequence types serves as a useful genetic marker. On the basis of PFGE and multilocus sequence data, it appears that isolates sharing identical emm and sof 5′ sequences are representative of clonal populations, in contrast to those that share only emm or sof 5′ sequence types (authors' unpublished observations) [5, 26]. Within the set of new emm sequence types represented here, we have found conservation of both emm and sof sequence types among geographically widely separated isolates for types st2034 (emm103), stcmuk16 (emm112), and st2967 (emm114). In contrast, we have noted that emm sequence types st4935, st448, and st2035 are shared between different GAS genotypes distinguishable by distinct sof sequence types.
For the other 5 emm types presented here that are associated with the sof-positive trait, only the reference strains have been subjected to sof sequence analysis.
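Using the emm and sof 5′ sequence types together as a marker amounts to grouping isolates by the pair and flagging emm types that occur with more than one sof type. The sketch below illustrates this under stated assumptions: the record layout and function names are hypothetical, and the isolate list is an illustrative sample built from type pairings mentioned in the text, not the study's isolate records.

```python
from collections import defaultdict


def sof_types_by_emm(isolates):
    """Map each emm type to the set of sof types seen among its isolates."""
    seen = defaultdict(set)
    for emm_type, sof_type in isolates:
        seen[emm_type].add(sof_type)
    return dict(seen)


def heterogeneous_emm_types(isolates):
    """emm types associated with more than one sof sequence type."""
    return sorted(emm for emm, sofs in sof_types_by_emm(isolates).items()
                  if len(sofs) > 1)


# Illustrative (emm type, sof type) pairs.
isolates = [
    ("st2034", "sof2034"),
    ("st2034", "sof2034"),
    ("st448", "sof448"),
    ("st448", "sof3894"),
]
print(heterogeneous_emm_types(isolates))  # ['st448']
```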
In table 1, it is shown that the deduced N termini of Emm124 and M2 share 82% identity and that, similarly, Emm108 and M70 share 84% identity. Although >80% sequence identity between the N terminal sequences of serologically distinct M types is not common, this level of homology does occur between such serologically distinct pairs as M9/M34, M11/M85, and M48/M75, which display ⩾84% identity in this region. Although we use the sequence-based emm type definition described in the Materials and Methods section, we are also expanding the emm sequence database by documenting all alleles of established types that contain any amino acid sequence changes in the mature N terminal 50 residues. Some common types frequently display multiple alleles of this nature, whereas within several common types, new alleles have not been encountered. For example, at present, type emm5 has 14 alleles of this nature documented, whereas type emm2 is represented by a single 50-residue sequence.
It is our hope that ongoing molecular surveillance studies clearly identifying widely disseminated virulent GAS strains will provide useful groundwork for future elucidation of virulence properties of different strain types and development of effective vaccines. We urge investigators who discover potential new emm sequence types from GAS infection isolates to submit their strains to 1 of the 6 reference laboratories listed in the authorship of this article. Many newly identified GAS strains are widely geographically disseminated, which increases the importance of establishing a globally accepted and readily expandable modification of the Lancefield classification scheme.
We thank Theresa Hoenes (1995–1997), Raji Viswanathan (1997–1998), Holly Starling (1998), Zhongya Li (1998–present), and Varja Sakota (2000–present) for their excellent work in the CDC streptococcal genetics laboratory during the past 6 years. Special thanks to the CDC Biotechnology Core Facility Branch computing group—most notably to Scott Sammons, Elizabeth Neuhaus, and Sarah Mckneally—for their assistance in constructing and maintaining the Web site (http://www.cdc.gov/ncidod/biotech/infotech_hp.html).