- Split View
-
Views
-
CiteCitation
Pedro Soares, Jean Alain Trejaut, Jun-Hun Loo, Catherine Hill, Maru Mormina, Chien-Liang Lee, Yao-Ming Chen, Georgi Hudjashov, Peter Forster, Vincent Macaulay, David Bulbeck, Stephen Oppenheimer, Marie Lin, Martin B. Richards; Climate Change and Postglacial Human Dispersals in Southeast Asia, Molecular Biology and Evolution, Volume 25, Issue 6, 1 June 2008, Pages 1209–1218, https://doi.org/10.1093/molbev/msn068
Download citation file:
© 2019 Oxford University Press
Close -
Share
Abstract
Modern humans have been living in Island Southeast Asia (ISEA) for at least 50,000 years. Largely because of the influence of linguistic studies, however, which have a shallow time depth, the attention of archaeologists and geneticists has usually been focused on the last 6,000 years—in particular, on a proposed Neolithic dispersal from China and Taiwan. Here we use complete mitochondrial DNA (mtDNA) genome sequencing to spotlight some earlier processes that clearly had a major role in the demographic history of the region but have hitherto been unrecognized. We show that haplogroup E, an important component of mtDNA diversity in the region, evolved in situ over the last 35,000 years and expanded dramatically throughout ISEA around the beginning of the Holocene, at the time when the ancient continent of Sundaland was being broken up into the present-day archipelago by rising sea levels. It reached Taiwan and Near Oceania more recently, within the last ∼8,000 years. This suggests that global warming and sea-level rises at the end of the Ice Age, 15,000–7,000 years ago, were the main forces shaping modern human diversity in the region.
Introduction
Between the 1970s and the 1990s, archaeologists and linguists developed a model for the settlement of Island Southeast Asia (ISEA) that became widely accepted. The “out of Taiwan” model suggested that settlement occurred in 2 waves of dispersal. The first of these, a late Pleistocene arrival of so-called “Australo-Melanesian” people ∼50,000 years ago (Barker et al. 2007), led to the initial peopling of both Sahul (Hudjashov et al. 2007) and the ancient continent of Sundaland, which comprised Indo-China, Malaysia, and western Indonesia before the sea-level rises that began ∼19,000 years ago (fig. 1). These people were then replaced or assimilated in the mid-Holocene by the “Austronesian expansion”—a maritime dispersal of rice agriculturalists from southern China via Taiwan into ISEA between ∼5,500 and 4,000 years ago, who brought with them the Austronesian languages now spoken throughout the region (Bellwood 1997; Blust 1999; Diamond and Bellwood 2003). This view implies that these were the 2 most important demographic events shaping modern human diversity in the region, with the latter generally accorded far greater attention than the former.
Map of Taiwan and Southeast Asia showing both modern coastlines (dark shading) and the 120-m depth contour below sea level (pale shading), indicating the extent of Sundaland at the LGM. The small circles in the map indicate sampling points in this study and additional published data points used in the analyses. Map outline kindly provided by H. Voris and C. Simpson, Field Museum of Natural History, Chicago, IL (Voris 2000).
Map of Taiwan and Southeast Asia showing both modern coastlines (dark shading) and the 120-m depth contour below sea level (pale shading), indicating the extent of Sundaland at the LGM. The small circles in the map indicate sampling points in this study and additional published data points used in the analyses. Map outline kindly provided by H. Voris and C. Simpson, Field Museum of Natural History, Chicago, IL (Voris 2000).
The benchmark status of a mid-Holocene out of Taiwan dispersal has been questioned by genetic studies, especially those focusing on variation in the nonrecombining mitochondrial DNA (mtDNA) and Y chromosome (Richards et al. 1998; Kayser et al. 2000, 2001; Su et al. 2000; Capelli et al. 2001; Trejaut et al. 2005). In particular, from a detailed consideration of more than 1,000 mtDNA control region sequences of samples from throughout ISEA and Taiwan, we have recently shown that, at most, only ∼20% of modern mtDNAs in ISEA were introduced at the time of the Neolithic (Hill et al. 2007).
These results could be compatible with a version of out of Taiwan that allows for a greater level of assimilation of indigenous Pleistocene lineages during the Austronesian expansion than had previously been anticipated (Hurles et al. 2002; Paz 2002). However, some scholars have suggested a more radical view in which the key demographic processes involved not (or not only) the spread of farming but late glacial and postglacial dispersals as a result of climate change and sea-level rises (Meacham 1984–1985; Oppenheimer 1998; Solheim 2006). We have recently shown that the majority of the mtDNAs in the region indeed coalesce to founder ages between 5,000 and 25,000 years ago, suggesting independent genetic evidence for such a view (Hill et al. 2007). Climate change at the end of the last Ice Age has been shown to have had an important effect on human genetic diversity in western Europe, where increased glaciation drove hunter-gatherer groups into southern refuge areas (Torroni et al. 1998; Torroni, Bandelt, et al. 2001; Achilli et al. 2004; Gamble et al. 2005; Pereira et al. 2005). The potential impact in Southeast Asia was at least as profound: around half of the land area of the continent of Sundaland was lost to the sea between 15,000 and 7,000 years ago, with a concomitant doubling of the length of coastline as the resulting archipelago was formed (Bird et al. 2005; Sathiamurthy and Voris 2006). Moreover, it seems likely that the sea level rose in a series of rapid steps that could have had a catastrophic impact on human populations (Blanchon and Shaw 1995; Pelejero et al. 1999; Hanebuth et al. 2000).
Here we use the sequence variation in complete mtDNA genomes to investigate the possibility that a signal of late glacial and postglacial dispersals exists in ISEA (Richards and Macaulay 2001). Although many mtDNA clusters coalesce in the time window of interest (Hill et al. 2007), we have focused particularly on mtDNA haplogroup E. This haplogroup accounts for ∼15% of mtDNA lineages in ISEA and Taiwan and is almost entirely restricted to this region; it therefore presents a particularly clear and highly informative distribution (Hill et al. 2007). Although rarely found elsewhere, it nests within a larger haplogroup, M9, that includes a sister clade to E, M9a'b, much of which is restricted to Japan and the East Asian mainland.
Material and Methods
Subjects
We completely sequenced 50 haplogroup E mtDNAs: 25 from 9 locations throughout Indonesia/East Malaysia, 9 from 4 locations in the Philippines, 14 from 8 aboriginal groups in Taiwan (including several who no longer speak Austronesian languages), and 2 from the north coast of Papua New Guinea (PNG). Samples were selected on the basis of hypervariable segment I (HVS-I) control region variation and the results of restriction typing for diagnostic sites. Haplogroup E includes an HVS-I motif with transitions at positions 16223, 16362, and 16390 with respect to the reference sequence (Andrews et al. 1999); in addition, we tested for the diagnostic loss of restriction site −7598 HhaI in the coding region. We also sequenced 3 “pre-M9a” lineages identified on the basis of control region polymorphisms at nucleotide positions (nps) 16234 and 16362 shared with M9a. The human DNA samples used in the UK were anonymized unlinked samples in accordance with UK ethical guidelines, and the work was approved by the University of Leeds, Faculty of Biological Sciences Ethics Committee. The Taiwan samples were collected with the informed consent of the subjects and the work approved by the Human Experiment Committee of the Mackay Memorial Hospital of Taipei.
We included one published E sequence from the Philippines in the analysis (Ingman and Gyllensten 2003) and a further published coding region sequence for the age estimates (Herrnstadt et al. 2002); we also included a further 13 published M9a sequences (Ingman et al. 2000; Tanaka et al. 2004), an unclassified M9* sequence (diverging independently from the root of M9) previously identified in Malaysia (Macaulay et al. 2005), and an M9b sequence (Kong et al. 2006).
Sequence Analysis
We performed the complete sequencing as previously described (Torroni, Rengo, et al. 2001), either at the University of Leeds using an ABI 16-capillary 3130XL DNA Analyzer or in Taipei using an ABI 48-capillary 3730 DNA Analyzer.
Phylogenetic and Statistical Analyses
For the phylogenetic analysis, we used (unweighted) the reduced median (RM) network algorithm of the package Network 4.1 (Bandelt et al. 1995), with all characters included with the exception of size differences in the polyC tracts at positions 309 and 315. A previously published M9* sequence from Malaysia (Macaulay et al. 2005) served as a suitable outgroup, and the RM network equates to the single most parsimonious tree in this case with the exception of a handful of characters. These are 3 fast-evolving control region characters, at positions 153, 16390, and 16519, which generate reticulations in the RM network. In the E2a clade, the positions 16185 and 5460 also generated a reticulation. The mutation at position 16185 appears independently far less frequently than that at 5460 in a database of ∼2,000 complete sequences, so we assumed a single event at this site in E2a.
Time depths were estimated from the more slowly evolving coding region of the molecule between nps 577 and 16023. We obtained the time estimates by converting the averaged distance (ρ) of the haplotypes of a clade from the respective root haplotype, accompanied by a heuristic estimate of standard error, σ, following Saillard et al. (2000). We also obtained maximum likelihood (ML) estimates from the coding region where possible (i.e., for clades defined by coding region mutations), using PAML 3.13 (Yang 1997), assuming the HKY85 mutation model with gamma-distributed rates (approximated by a discrete distribution with 32 categories). We converted ρ in the coding region and mutational distance in ML to time using a mutation rate estimate of 1.26 × 10−8 base substitutions per nucleotide per year, corresponding to one substitution per 5,140 years in the coding region (Mishmar et al. 2003).
We used HVS-I information to augment the age estimates in some cases, using a transition rate of 1 in 20,180 years for the region between nps 16090 and 16365 (Forster et al. 1996). Table 1 shows a comparison of overall time depths of nodes in the gene tree estimated from both coding-region sequences and HVS-I data, indicating that the age estimates are comparable. Additionally, we compared the coding region ML age estimates with the ρ estimates (table 2). The ML age estimates did not differ substantially from the ones calculated with ρ.
Comparison of Coding Region and Control Region HVS-I Age Estimates and Standard Error (SE) within Subclades and Paragroups of Haplogroup E Using ρ
| Clade/Paragroup | Segment | n | Age Estimate (SE) |
| E1a* | HVS-I | 40 | 5,550 (2,300) |
| Coding | 5 | 7,700 (2,000) | |
| E1a1a | HVS-I | 202 | 9,200 (2,500) |
| Coding | 13 | 10,300 (2,500) | |
| E1b | HVS-I | 41 | 6,750 (2,550) |
| Coding | 9 | 6,850 (2,050) | |
| E2a | HVS-I | 29 | 8,350 (3,100) |
| Coding | 11 | 6,700 (1,800) | |
| HVS-I | 70 | 4,600 (2,700) | |
| E2b | Coding | 6 | 4,300 (2,300) |
| Clade/Paragroup | Segment | n | Age Estimate (SE) |
| E1a* | HVS-I | 40 | 5,550 (2,300) |
| Coding | 5 | 7,700 (2,000) | |
| E1a1a | HVS-I | 202 | 9,200 (2,500) |
| Coding | 13 | 10,300 (2,500) | |
| E1b | HVS-I | 41 | 6,750 (2,550) |
| Coding | 9 | 6,850 (2,050) | |
| E2a | HVS-I | 29 | 8,350 (3,100) |
| Coding | 11 | 6,700 (1,800) | |
| HVS-I | 70 | 4,600 (2,700) | |
| E2b | Coding | 6 | 4,300 (2,300) |
Comparison of Coding Region and Control Region HVS-I Age Estimates and Standard Error (SE) within Subclades and Paragroups of Haplogroup E Using ρ
| Clade/Paragroup | Segment | n | Age Estimate (SE) |
| E1a* | HVS-I | 40 | 5,550 (2,300) |
| Coding | 5 | 7,700 (2,000) | |
| E1a1a | HVS-I | 202 | 9,200 (2,500) |
| Coding | 13 | 10,300 (2,500) | |
| E1b | HVS-I | 41 | 6,750 (2,550) |
| Coding | 9 | 6,850 (2,050) | |
| E2a | HVS-I | 29 | 8,350 (3,100) |
| Coding | 11 | 6,700 (1,800) | |
| HVS-I | 70 | 4,600 (2,700) | |
| E2b | Coding | 6 | 4,300 (2,300) |
| Clade/Paragroup | Segment | n | Age Estimate (SE) |
| E1a* | HVS-I | 40 | 5,550 (2,300) |
| Coding | 5 | 7,700 (2,000) | |
| E1a1a | HVS-I | 202 | 9,200 (2,500) |
| Coding | 13 | 10,300 (2,500) | |
| E1b | HVS-I | 41 | 6,750 (2,550) |
| Coding | 9 | 6,850 (2,050) | |
| E2a | HVS-I | 29 | 8,350 (3,100) |
| Coding | 11 | 6,700 (1,800) | |
| HVS-I | 70 | 4,600 (2,700) | |
| E2b | Coding | 6 | 4,300 (2,300) |
Comparison of Coding Region Age Estimates and Respective Standard Errors (SEs) of the M9 Haplogroup and Subclades within M9 Using ML and ρ
| Clade | ML Age estimate (SE) | ρ Age estimate (SE) |
| M9 | 52,550 (7,600) | 49,100 (10,000) |
| M9a | 17,550 (4,450) | 15,050 (4,000) |
| E | 31,050 (6,750) | 33,150 (8,200) |
| E1 | 14,950 (3,900) | 16,950 (4,900) |
| E1a | 9,850 (1,850) | 11,700 (3,450) |
| E1a1 | 7,550 (1,500) | 8,900 (2,150) |
| E1b | 6,600 (1,900) | 6,850 (2,050) |
| E2 | 8,000 (2,050) | 9,450 (3,800) |
| E2a | 6,350 (1,500) | 6,700 (1,800) |
| Clade | ML Age estimate (SE) | ρ Age estimate (SE) |
| M9 | 52,550 (7,600) | 49,100 (10,000) |
| M9a | 17,550 (4,450) | 15,050 (4,000) |
| E | 31,050 (6,750) | 33,150 (8,200) |
| E1 | 14,950 (3,900) | 16,950 (4,900) |
| E1a | 9,850 (1,850) | 11,700 (3,450) |
| E1a1 | 7,550 (1,500) | 8,900 (2,150) |
| E1b | 6,600 (1,900) | 6,850 (2,050) |
| E2 | 8,000 (2,050) | 9,450 (3,800) |
| E2a | 6,350 (1,500) | 6,700 (1,800) |
Comparison of Coding Region Age Estimates and Respective Standard Errors (SEs) of the M9 Haplogroup and Subclades within M9 Using ML and ρ
| Clade | ML Age estimate (SE) | ρ Age estimate (SE) |
| M9 | 52,550 (7,600) | 49,100 (10,000) |
| M9a | 17,550 (4,450) | 15,050 (4,000) |
| E | 31,050 (6,750) | 33,150 (8,200) |
| E1 | 14,950 (3,900) | 16,950 (4,900) |
| E1a | 9,850 (1,850) | 11,700 (3,450) |
| E1a1 | 7,550 (1,500) | 8,900 (2,150) |
| E1b | 6,600 (1,900) | 6,850 (2,050) |
| E2 | 8,000 (2,050) | 9,450 (3,800) |
| E2a | 6,350 (1,500) | 6,700 (1,800) |
| Clade | ML Age estimate (SE) | ρ Age estimate (SE) |
| M9 | 52,550 (7,600) | 49,100 (10,000) |
| M9a | 17,550 (4,450) | 15,050 (4,000) |
| E | 31,050 (6,750) | 33,150 (8,200) |
| E1 | 14,950 (3,900) | 16,950 (4,900) |
| E1a | 9,850 (1,850) | 11,700 (3,450) |
| E1a1 | 7,550 (1,500) | 8,900 (2,150) |
| E1b | 6,600 (1,900) | 6,850 (2,050) |
| E2 | 8,000 (2,050) | 9,450 (3,800) |
| E2a | 6,350 (1,500) | 6,700 (1,800) |
The reconstructed tree was scaled against the ρ coding region point age estimates of the major nodes. The estimates for 2 of the clades, E1a1 and E1a1a, were unfeasible because the point estimate for the ancestor node E1a1 (8,900 ± 2,150) was lower than the derived node E1a1a (10,300 ± 2,500). Both E1a and E1a1 present HVS-I motifs corresponding to the root type of E (with transitions at positions 16223, 16362, and 16390). Observed HVS-I variation in these lineages was completely restricted, albeit rarely, to ISEA. The samples presented in the complete sequence tree for the node E1a1 are from Taiwan, most likely the sink (or descendant) population, and almost lack HVS-I variation. This founder effect in Taiwan has therefore probably reduced the overall age estimate of the node E1a1. So the more frequent and star-like E1a1a clade should present a more reliable age estimate than its ancestor node and is therefore the one scaled in the tree.
Haplotype diversity was calculated as 1 − Σixi2, where xi is the relative frequency of the ith haplotype in the sample (Torroni, Bandelt, et al. 2001).
Phylogeographic Distributions
We used the database of published mtDNA HVS-I sequences and our own unpublished data in order to draw more robust conclusions about frequency distributions than is possible using the numerically more limited complete genome data. Haplogroup E includes an HVS-I motif with transitions at positions 16223, 16362, and 16390. E1a1a has an additional transition at position 16291, E1b is distinguished in HVS-I by a transition at position 16261, E2 by a transition at position 16051, and E2b by a transition at position 16185. M9a carries transitions at positions 16234, 16316, and 16362; 16234 and 16362 occur on the “pre-M9a” branch that diverges from the ancestor of M9 ∼50,000 years ago. Given these motifs, and the size of the HVS-I database, we can have considerable confidence that our frequency estimates are broadly accurate. M9b is more difficult because of its scarcity: it has only been identified twice: once as a complete sequence from western China (Kong et al. 2006) and once as the HVS-I sequence (motif: 16051, 16209, 16223, and 16362) in a Yao speaker from southeast China (Wen, Li, Gao, et al. 2004).
We displayed frequency distributions using the Kriging algorithm of Surfer 8 using the geographic points indicated in figure 1. The haplogroup E distributions were based on a total of 1,704 samples from throughout ISEA (Hill et al. 2006, 2007; authors’ unpublished data), 184 from West Papua (authors’ unpublished data), and 962 from Taiwan (Tajima et al. 2003; Trejaut et al. 2005; authors’ unpublished data). The frequencies displayed in Taiwan correspond solely to Austronesian-speaking aboriginals. The distribution of the haplogroup E sequences is shown in supplementary table S1 (Supplementary Material online). All the haplogroup E sequences identified on the basis of their HVS-I motif in ISEA were further checked by sequencing for position 10834 (defining E1). Samples with the control region transition at position 16051 were checked by sequencing for the position 8730 (defining E2a) and 8 Indonesian and 4 Taiwanese from the E1 branch that lacked either one of the informative control region variants (at nps 16291 or 16261) were checked for np 6620 position (defining E1a). Published data from China, Thailand, Malaysia, West Papua, and Australia were also included in the analyses (van Holst Pellekaan et al. 1998; Fucharoen et al. 2001; Kivisild et al. 2002; Tommaseo-Ponzetta et al. 2002; Yao et al. 2002a; Yao et al. 2002b; Yao and Zhang 2002; Wen, Li, Gao, et al. 2004; Wen, Li, Lu, et al. 2004; Wen, Xie, et al. 2004). There are 3 potential haplogroup E sequences in a Chinese data set of 2,618 sequences (i.e., 0.1%), 2 in southern China (Kivisild et al. 2002; Yao et al. 2002b) and 1 in western China (Yao et al. 2000) which may be “accidentals” from ISEA/Taiwan. The East Asian data sets were additionally used to build a frequency distribution map for M9a.
Results
Haplogroup M9
The phylogenetic tree of M9 with reconstructed character evolution is shown in figure 2. Although M9a is restricted to mainland Asia and Japan, the deepest branches in the clade encompassing it (pre-M9a) are found in Indo-China, China, and Taiwan and the next deepest in ISEA. M9b is the result of the earliest split from pre-M9a within M9 but its phylogeographic usefulness is minimal due to its apparent extreme rarity, with only 2 instances to date in southern and western China (Wen, Li, Gao, et al. 2004; Kong et al. 2006). Along with the finding of an unclassified M9* lineage in the aboriginal inhabitants of Malaysia (Macaulay et al. 2005) and the insular distribution of haplogroup E itself this distribution suggests that M9 as a whole is most likely to have had a Southeast Asian origin, ∼50,000 years ago. M9a itself appears to have spread north into the East Asian mainland ∼15,000 years ago, after the last glacial maximum (LGM). This is supported by the phylogeography of HVS-I variation: to date almost all the deeper branching lineages within the published pre-M9a HVS-I data are found in southern China and Southeast Asia and all northeast Asian lineages belong to M9a, itself as characterized by a transition at np 16316.
Reconstructed phylogeny of 69 complete mtDNA genome sequences from haplogroup M9. Samples are labeled and shaded according to region and age estimates (±standard errors) are indicated at the nodes. Mutations are transitions at the np indicated unless otherwise specified. Three major phases of warming and rapid sea-level rise are labeled as “floods.” Numbers 1–16 correspond to the published sequences with GenBank accession numbers AY963582, DQ272112, AP008710, AF346973, AP008702, AP008815, AP008677, AP008863, AP008629, AP008766, AY255153, AP008704, AP008353, AP008378, AP008860, and AY289070, respectively.
Reconstructed phylogeny of 69 complete mtDNA genome sequences from haplogroup M9. Samples are labeled and shaded according to region and age estimates (±standard errors) are indicated at the nodes. Mutations are transitions at the np indicated unless otherwise specified. Three major phases of warming and rapid sea-level rise are labeled as “floods.” Numbers 1–16 correspond to the published sequences with GenBank accession numbers AY963582, DQ272112, AP008710, AF346973, AP008702, AP008815, AP008677, AP008863, AP008629, AP008766, AY255153, AP008704, AP008353, AP008378, AP008860, and AY289070, respectively.
It is worth noting that M9 is only defined by a single nucleotide change, at position 4491, and its status as a clade may therefore be open to some doubt. A database of ∼2,000 published mtDNA coding region sequences suggests that position 4491 evolves slowly in the global mtDNA tree, where we just detected 6 occurrences in more than 10,000 mutations. Nevertheless, it is possible that the Malaysian outgroup labeled M9* may in fact be the result of an independent mutation at position 4491. The link between haplogroups E and pre-M9a is slightly stronger as it also includes a mutation at the control region site 16362, although this position is fast evolving (Bandelt et al. 2006). The inferred ancestry of pre-M9a in Southeast Asia provides an additional phylogeographic argument for a common ancestry. None of our principal arguments to follow rely on the inferred reality of M9 as a clade, but the phylogenetic link would strengthen the case for their common ancestry in the original Sunda population.
Haplogroup E
Haplogroup E itself has a likely time depth of more than 30,000 years but falls into 2 major subclades, E1 and E2, which are ∼17,000 and ∼9,500 years old, respectively (fig. 2). They are each subdivided into 2 further principal subclades, E1a, E1b, E2a, and E2b, which range from ∼4,500 to ∼11,500 years old. These subclades have a very distinctive geographic distribution, which is highly informative about the demographic history of the region. Although all 4 subclades are found in ISEA, so that ISEA lineages occur throughout the tree, only 2 of the 4 (E1a [fig. 3a] and E2b [fig. 3d]) are found in Taiwan. The remaining subclades, E1b (fig. 3b) and E2a (fig. 3c), are both largely restricted to ISEA, albeit extending occasionally to New Guinea and the Malay Peninsula, suggesting that both arose in ISEA and dispersed fairly recently east and west. Of the 2 subclades shared with Taiwan, E2b, which is only 4,300 (±2,300) years old, falls into 2 branches, both present in ISEA, with the Taiwanese lineages forming a clear, shallower founder subclade. In other words, the diversity of E2b seen in Taiwan is a subset of the clade found in ISEA. This implies an origin in ISEA and subsequent migration to Taiwan.
Spatial frequency distributions created using the Kriging algorithm of the Surfer package of haplogroups (a) E1a, (b) E1b, (c) E2a, (d) E2b, and (e) M9a.
Spatial frequency distributions created using the Kriging algorithm of the Surfer package of haplogroups (a) E1a, (b) E1b, (c) E2a, (d) E2b, and (e) M9a.
Therefore, it is clear that 3 of the 4 subclades of haplogroup E arose in ISEA. The fourth and oldest subclade, E1a, is 11,700 (±3,450) years old, and Taiwanese and ISEA lineages are interleaved within it so that a direction of dispersal is less obvious. However, the overall diversity translates into an age estimate that is somewhat higher in ISEA (13,100 ± 3,700 years) than in Taiwan (10,750 ± 4,200 years). Furthermore, 1 of the 2 major subclades of E1a, E1a2 (at 9,400 ± 2,850 years), is only present in ISEA, suggesting an origin in this region. It is also worth noting that there is very low diversity within the control region in the E1a* sequences (i.e., excluding E1a1a) in Taiwan (haplotype diversity = 0.131 and an estimate of 1,400 ± 1,000 years using ρ), suggesting a recent founder effect. The derived and most common clade E1a1a presents control region diversity in Taiwan that approximates the diversity seen in what would have been Sundaland, using both haplotype diversity (table 3) and ρ statistics. It should be noted that the value in Taiwan is elevated by the presence of 2 high–frequency derived haplotypes that, together with the root type, account for more than 80% of the samples of this clade in Taiwan (supplementary table S1, Supplementary Material online), again strongly suggesting a founder effect.
Heterozygosity of HVS-I Sequences (from positions 16051 to 16400) within Subclades and Paragroups of Haplogroup E by Region (with the highest value for each subclade in bold)
| Clade/Paragroup | Taiwan (n) | ISEAa (n) | Philippines (n) | Sundalandb (n) | Wallaceac (n) |
| E1a* | 0.131 (29) | 0.792 (12) | — (1) | 0.480 (5) | 0.722 (6) |
| E1a1a | 0.659 (73) | 0.572 (129) | 0.573 (40) | 0.751 (34) | 0.353 (55) |
| E1b | — | 0.521 (41) | 0.560 (5) | 0.782 (15) | 0.177 (21) |
| E2a | — | 0.599 (29) | 0.198 (9) | 0.406 (8) | 0.819 (12) |
| E2b | 0.277 (57) | 0.722 (14) | 0.645 (12) | 0.500 (2) | — |
| Clade/Paragroup | Taiwan (n) | ISEAa (n) | Philippines (n) | Sundalandb (n) | Wallaceac (n) |
| E1a* | 0.131 (29) | 0.792 (12) | — (1) | 0.480 (5) | 0.722 (6) |
| E1a1a | 0.659 (73) | 0.572 (129) | 0.573 (40) | 0.751 (34) | 0.353 (55) |
| E1b | — | 0.521 (41) | 0.560 (5) | 0.782 (15) | 0.177 (21) |
| E2a | — | 0.599 (29) | 0.198 (9) | 0.406 (8) | 0.819 (12) |
| E2b | 0.277 (57) | 0.722 (14) | 0.645 (12) | 0.500 (2) | — |
ISEA is defined as the Philippines, Indonesia, and East Malaysia.
Sundaland is defined as Malaysia, Sumatra, Borneo, Java, and Bali.
Wallacea is defined as Sulawesi, Lombok, Sumba, and eastern Indonesia.
Heterozygosity of HVS-I Sequences (from positions 16051 to 16400) within Subclades and Paragroups of Haplogroup E by Region (with the highest value for each subclade in bold)
| Clade/Paragroup | Taiwan (n) | ISEAa (n) | Philippines (n) | Sundalandb (n) | Wallaceac (n) |
| E1a* | 0.131 (29) | 0.792 (12) | — (1) | 0.480 (5) | 0.722 (6) |
| E1a1a | 0.659 (73) | 0.572 (129) | 0.573 (40) | 0.751 (34) | 0.353 (55) |
| E1b | — | 0.521 (41) | 0.560 (5) | 0.782 (15) | 0.177 (21) |
| E2a | — | 0.599 (29) | 0.198 (9) | 0.406 (8) | 0.819 (12) |
| E2b | 0.277 (57) | 0.722 (14) | 0.645 (12) | 0.500 (2) | — |
| Clade/Paragroup | Taiwan (n) | ISEAa (n) | Philippines (n) | Sundalandb (n) | Wallaceac (n) |
| E1a* | 0.131 (29) | 0.792 (12) | — (1) | 0.480 (5) | 0.722 (6) |
| E1a1a | 0.659 (73) | 0.572 (129) | 0.573 (40) | 0.751 (34) | 0.353 (55) |
| E1b | — | 0.521 (41) | 0.560 (5) | 0.782 (15) | 0.177 (21) |
| E2a | — | 0.599 (29) | 0.198 (9) | 0.406 (8) | 0.819 (12) |
| E2b | 0.277 (57) | 0.722 (14) | 0.645 (12) | 0.500 (2) | — |
ISEA is defined as the Philippines, Indonesia, and East Malaysia.
Sundaland is defined as Malaysia, Sumatra, Borneo, Java, and Bali.
Wallacea is defined as Sulawesi, Lombok, Sumba, and eastern Indonesia.
This problem is also amenable to dissection by an exploratory founder analysis that assumes either Taiwan or ISEA as the source in turn. This procedure estimates the time of arrival of lineages in the putative sink population by subtracting the diversity already present in the putative source at the time of the dispersal (Richards et al. 2000). Applying such a reciprocal founder analysis, the founder age for E1a in Taiwan assuming an ISEA source would be 6,550 (±1,850) years, whereas a dispersal in the opposite direction would yield an age of 10,000 (±2,400) years in ISEA, reflecting the higher proportion of private variants in the latter compared with Taiwan. This again suggests that ISEA is the more likely source.
We further assessed the lower diversity in Taiwan by simply calculating age estimates for each of the major clades or paragroups in the tree separately for Taiwan and ISEA (the latter including PNG) (table 4). The only case in which the age estimate is higher in Taiwan is haplogroup E2b. However, in that case, the diversity in Taiwan is restricted to the E2b1 node, corresponding clearly to a shallow founder clade. Moreover, the HVS-I haplotype diversity of E2b in Taiwan is extremely low (table 3).
Comparison of ρ Age Estimates and Respective Standard Errors (SEs) within Haplogroup E for Taiwan and ISEA, Including PNG, Using the Coding Region Molecular Clock
| Clade/Paragroup | Age estimate (SE) | |
| Taiwan | ISEA | |
| E1a* | 6,150 (3,250) | 8,800 (2,550) |
| E1a1a | 9,400 (2,850) | 11,000 (3,800) |
| E1b | — | 6,850 (2,100) |
| E2a | — | 7,500 (2,100) |
| E2b | 5,140 (2,950) (E2b1) | 3,400 (3,400) |
| Clade/Paragroup | Age estimate (SE) | |
| Taiwan | ISEA | |
| E1a* | 6,150 (3,250) | 8,800 (2,550) |
| E1a1a | 9,400 (2,850) | 11,000 (3,800) |
| E1b | — | 6,850 (2,100) |
| E2a | — | 7,500 (2,100) |
| E2b | 5,140 (2,950) (E2b1) | 3,400 (3,400) |
Comparison of ρ Age Estimates and Respective Standard Errors (SEs) within Haplogroup E for Taiwan and ISEA, Including PNG, Using the Coding Region Molecular Clock
| Clade/Paragroup | Age estimate (SE) | |
| Taiwan | ISEA | |
| E1a* | 6,150 (3,250) | 8,800 (2,550) |
| E1a1a | 9,400 (2,850) | 11,000 (3,800) |
| E1b | — | 6,850 (2,100) |
| E2a | — | 7,500 (2,100) |
| E2b | 5,140 (2,950) (E2b1) | 3,400 (3,400) |
| Clade/Paragroup | Age estimate (SE) | |
| Taiwan | ISEA | |
| E1a* | 6,150 (3,250) | 8,800 (2,550) |
| E1a1a | 9,400 (2,850) | 11,000 (3,800) |
| E1b | — | 6,850 (2,100) |
| E2a | — | 7,500 (2,100) |
| E2b | 5,140 (2,950) (E2b1) | 3,400 (3,400) |
We then explored the diversity statistics from the more numerous control region data in further detail, in order to further test this conclusion. We used HVS-I haplotype diversity within each subclade (and the paragroup E1a*, which includes the phylogenetically deeper parts of E1a) to assess variation between regions and the possible impact of recent founder effects. Table 3 shows the diversity of each subclade in both ISEA (divided into the former Sunda area, the Philippines, and the islands of Wallacea to the east, which were never connected to the mainland) and Taiwan. This shows that the higher diversity for each of the clades shared between Taiwan and ISEA is always found in one of the regions of ISEA and never in Taiwan.
Spatial frequency distributions suggest that the deeper branches of E1a and E2a may have diversified in Wallacea, where these haplogroups now occur at the highest frequencies (fig. 3a and c). However, caution is warranted as the sample sizes of these haplogroups are low. The 2 subclades with higher sample sizes (E1a1a and E1b) have a higher diversity in Sundaland than in Wallacea or the Philippines. This pattern is particularly striking in the most common clade, E1a1a. Furthermore, all subclades of haplogroup E are present in Borneo but not in Wallacea.
This suggests that the most likely location for the “homeland” of haplogroup E as a whole—that is to say, the place in which the defining mutations arose—is somewhere in the vicinity of northeast Sundaland. However, the diversity and frequency patterns suggest that it is quite possible that the subclades E2a and E1a* may have arisen in northwest Wallacea. In this case, the area of diversification of haplogroup E lineages would be distributed around the coastlines of the Sulawesi and Sulu Seas, as might be expected for lineages carried by a mobile, maritime-oriented population.
Wherever the precise origin of haplogroup E may lie, spatial frequency distribution and diversity strongly suggest that the haplogroup arose in ISEA and some of its subclades spread subsequently to Taiwan. Furthermore, we can infer a timescale for these processes using the coding region molecular clock. Early subclades are largely restricted to ISEA and date to the early Holocene. These are E1a1a1 (age 7,700 ± 3,150 years), E1a2 (age 9,400 ± 2,850 years), E1b (age 6,850 ± 2,150 years), and E2a (age 6,700 ± 1,800 years). This implies that the diversity of haplogroup E has been accumulating within ISEA for a minimum of 7,000–10,000 years.
Applying a founder analysis to the subclades found in Taiwan, assuming ISEA as the source, yielded arrival time estimates of 6,550 ± 1,850 years ago for E1a and 5,150 ± 2,950 years ago for E2b. As these values were quite similar, we also applied a founder analysis to the entire haplogroup, combining data from both subclades, effectively assuming a single dispersal. This overall founder age for Taiwan came to 6,250 ± 1,600 years. These results indicate that haplogroup E evolved in situ in ISEA and was involved in range expansions from ∼12,000 years ago, reaching Taiwan roughly 4,000–8,000 years ago.
We can also consider the control region database from a phylogeographic point of view to add additional detail to this picture. Control region sequence patterns suggest range expansions across eastern Indonesia in the mid-Holocene. The derived E1a1a lineages (which are distinguished by a diagnostic transition at np 16291 in the control region) present 2 clear zones of diversity across ISEA: western Indonesia (13,450 ± 5,600 years) and Wallacea (5,000 ± 1,700 years). The diversity seen in the Philippines amounts to 6,600 (±3,800) years. Therefore, it seems likely that much of the Philippines and Wallacea were settled at least in part during the mid-Holocene following diversification in the Sulu Sea/Sulawesi Sea region.
Several haplogroup E (mainly E1b) lineages in Near Oceania suggest subsequent gene flow further east (Friedlaender et al. 2007), perhaps along a sphere of interaction or “voyaging corridor” between Southeast Asia and the Solomon Islands that may have been established by ∼6,000 years ago (Terrell and Welsch 1997) or earlier (O'Connor and Veth 2005). A few putative haplogroup E lineages observed in the control region database for China also suggest possible very minor (and probably very recent) gene flow toward the mainland.
Discussion
Haplogroup E allows us to illuminate demographic processes in ISEA prior to the proposed Austronesian expansion from China/Taiwan. This haplogroup probably evolved within the descendants of the first settlers of Sundaland, who arrived carrying haplogroup M >50,000 years ago (Macaulay et al. 2005). Haplogroup E evolved, most likely from haplogroup M9, ∼35,000 years ago and was caught up in a dramatic series of dispersals and expansions that began in eastern Sundaland/northwest Wallacea from ∼12,000 years ago. This may well have been part of the same set of processes that also led to the dispersal of its probable sister haplogroup, M9a, north along the Pacific rim toward Korea and Japan from ∼15,000 years ago (fig. 3e). This pattern confirms the suggestion of postglacial expansion and dispersal proposed for haplogroup E on the basis of HVS-I sequence variation by Hill et al. (2007). Indeed, the HVS-I data for ISEA suggested expansions and dispersals carrying other mtDNA haplogroups in addition to haplogroup E at the same time (Hill et al. 2007). These will need to be similarly tested at the level of complete genome sequencing in order to resolve the signals clearly. The present data clearly demonstrate that these demographic processes took place and that they had a significant impact on the mtDNA pool of modern Southeast Asians.
The most plausible explanation for the spread of these genetic signatures is the impact on coastal-dwelling populations of the rapid global warming and sea-level rises that led to the inundation of the Sunda shelf by meltwater at the end of the last Ice Age (Oppenheimer 1998; Lin et al. 2005). Although sea levels began to rise gradually after the end of the LGM ∼19,000 years ago, it is likely that there were 3 major episodes of accelerated sea-level rise and flooding, probably due to ice sheet collapse and sometimes referred to as Catastrophic Rise Events 1–3, at ∼14,500, 11,500, and probably also ∼7,500 years ago (Blanchon and Shaw 1995; Oppenheimer 1998; Pelejero et al. 1999; Hanebuth et al. 2000; Voris 2000; Lambeck and Chappell 2001; Bird et al. 2005). It has been suggested that these episodes triggered major displacements of human groups living on the Sunda coastline and had an important role in shaping subsequent life in the region, in particular its maritime orientation and the development of sailing technology (Oppenheimer 1998; Solheim 2006).
With the opening up of approximately twice as much coastline as there had previously been, the coastal populations of Sundaland (and perhaps also northwestern Wallacea) would have been well placed to expand into this rapidly growing habitat. Conversely, the hinterland-based populations would have faced severe land pressures from both the drowning of their ancestral lands and the encroachment of rainforest over the savannas and monsoon forests to which they had adapted (Bird et al. 2005). A scenario of overall population growth, combined with intense competition at optimal resource locations on the growing coastline, would have provided the same motivations for dispersal that are conventionally associated with agriculturalist expansion. The fact that haplogroup E presents only 3 branches beyond ∼12,000 years ago, and only 2 long branches from ∼17,000 to ∼33,000 years ago, may however provide a glimpse of the scale of the preceding catastrophe: the heavy lineage losses may perhaps suggest a population decline before the more recent reexpansions.
These results open up a barely explored perspective on the population history of Homo sapiens in Southeast Asia. They provide a window onto major, environmentally related demographic changes that took place before the proposed Neolithic/Austronesian dispersal that is usually the main point of reference for the prehistory of this region. Moreover, archaeological evidence provides independent support for a large-scale dispersal across ISEA (or central/eastern Sundaland and Wallacea, as the region then was) during the 15,000 years after the LGM.
Prior to the LGM, there is no very clear distinction between the stone artifact assemblages of mainland Southeast Asia (MSEA—then, northwestern Sundaland) and ISEA. However, following the LGM, 2 geographically discrete traditions or “technocomplexes” emerged. The Hoabinhian appeared across MSEA (including the Malay Peninsula) and the northern two-thirds of Sumatra. It was based on pebble tools flaked along their edges, usually on one face but occasionally bifacially (Solheim 1994; Bellwood 1997; Bulbeck 2003; Forestier 2007). In contrast, the “flake–blade technocomplex” was based on flakes detached from rotated multiplatform cores. This appears to have emerged as a distinctive stone tool technology approximately 25, 000–30,000 years ago restricted to the islands and coastlines of the Sulu Sea region. After it had spread to northern Borneo ∼18,000 years ago, it evidently dispersed throughout ISEA following the LGM (Bellwood 1997; Tanudirjo 2001; Mahirta 2003; Arifin 2004; Veth et al. 2005; Mijares 2006; Forestier 2007; Mijares 2007).
The expansion of the Hoabinhian from South China/Vietnam has already been related to the arrival of the R9b and N9a mtDNA haplogroups into the Malay Peninsula (Hill et al. 2006). Here, we now find a very close relationship between the geographic extent of post-LGM flake–blade industries and the haplogroup E lineages, again suggesting a close match between archaeological and genetic reconstructions.
Although Pleistocene assemblages assigned to the flake–blade technocomplex lack distinct stone tool types, post-LGM dispersal of Hoabinhian populations into ISEA, or flake–blade technocomplex populations into MSEA/Sumatra, would be expected to leave the mark of anomalous assemblages within their regional context—which are yet to be recorded archaeologically. The archaeology of Taiwan, however, does reveal the incursion of a distinct stone tool technology, which is particularly similar to North Luzon Holocene industries (cf., Mijares 2007). Although most Taiwan flaked stone industries of the last ∼15,000 years are assigned to the Changpanian culture, which is related to the Hoabinhian (Bellwood 1997; Rolett et al. 2002), Zhang (2000) describes the appearance of a flake–blade industry, based on flakes struck from multiplatform cores, at 4 coastal sites in east Taiwan, all dated to 5,000–6,000 years ago. This would fit well with our estimated arrival from the south of E1a lineages in Taiwan.
Both the genetic and archaeological evidence therefore suggest that a maritime-oriented culture was in operation around the coastlines of what are now the Sulu and Sulawesi Seas by the LGM, which would have been preadapted to benefit from rising sea levels (Meacham 1984–1985; Oppenheimer 1998; Solheim 2006). The signature of dispersals from this region at the end of the Ice Age is evident both in the distribution of mtDNA haplogroup E lineages and the expansion of the flake–blade technocomplex.
These results only account for ∼15% of ISEA mtDNA lineages, but preliminary analyses based on HVS-I diversity suggest that many others may share a similar history (Hill et al. 2007). Complete genome analysis, such as we have carried out here, will be needed to test this further. It also remains to be seen whether a similar signal will be detected in the Y chromosome variation of Southeast Asians; the distribution of the main Y chromosome haplogroups in ISEA and Taiwan (Capelli et al. 2001) would be consistent with this scenario, but greater phylogenetic resolution is necessary to identify the directions of dispersal. A recent study of autosomal short tandem repeat variation in Taiwan and the western Pacific (Friedlaender et al. 2008) indicates the possibility of an ISEA component in the diversity of remote Pacific islanders that needs to be explored further.
The discovery of a signature of climate change in the human genome of Southeast Asians echoes similar findings for mtDNA haplogroups V and H in Europe. These haplogroups were carried from a glacial refuge in southwest Europe into western, central, and northern Europe following the retreat of the glaciers from ∼15,000 years ago (Richards et al. 2000; Torroni, Bandelt, et al. 2001; Gamble et al. 2004; Pereira et al. 2005). Our results therefore reemphasize the critical role of climate change—in particular global warming after the LGM—in shaping the evolution of the modern human gene pool.
We thank the Bradshaw Foundation, the British Academy, and the National Science Council of Taiwan (grant 95-2627-H-195-001) for financial support. P.S. was supported by a Marie Curie Early Stage Training Grant. We also thank John Clegg, A. S. M Sofro, and the Taipei County Labor Department for the samples, Harold Voris and Clara Simpson for the map used in figure 1, Toomas Kivisild for critical comments, and Alessandro Achilli for advice on the use of Surfer.



