Abstract

Separating the confounding effects of long-term population history from gene flow can be difficult. Here, we address the question of what inferences about gene flow can be made from mitochondrial sequence data in three closely related species of mosquitoes, Anopheles dirus species A, C, and D, from southeast Asia. A total of 84 sequences of 923 bp of the mitochondrial cytochrome oxidase I gene were obtained from 14 populations in Thailand, Myanmar, and Bangladesh. The genealogy of sequences obtained from two populations of An. dirus C indicates no contemporary gene flow between them. The FST value of 0.421 therefore probably represents a recent common history, perhaps involving colonization events. Anopheles dirus A and D are parapatric, yet no differentiation was seen either within or between species. The starlike genealogy of their haplotypes, smooth unimodal mismatch distributions, and excess of low frequency mutations indicate population expansion in An. dirus A and D. This, rather than widespread gene flow, explains their low within-species FST values (0.018 and 0.022). The greater genetic diversity of An. dirus D suggests that expansion occurred first in species D and subsequently in species A. The current geographical separation and low hybrid fitness of these species also argue against ongoing interspecific gene flow. They suggest instead either historical introgression of mtDNA from An. dirus D into species A followed by independent range expansions, or a selective sweep of mtDNA that originated in An. dirus D. While not excluding contemporary gene flow, historical population processes are sufficient to explain the data in An. dirus A and D. The genealogical relationships between haplotypes could not be used to make inferences of gene flow because of extensive homoplasy due to hypervariable sites and possibly also recombination. However, it is concluded that this approach, rather than the use of fixation indices, is required in the future to understand contemporary gene flow in these mosquitoes. The implications of these results for understanding gene flow in another important and comparable group of malaria vector mosquitoes in Africa, the An. gambiae complex, are also discussed.

Introduction

Mitochondrial DNA (mtDNA) variation is an important and widely implemented tool in indirect studies of gene flow. Inferences of contemporary migration events are often made from the geographical pattern of genetic variation, although this can also be influenced by a population history of colonization events or range expansions or contractions (Templeton 1998<$REFLINK> and references therein). In this paper, we consider what inferences about gene flow can be drawn from mtDNA sequence data for a group of anopheline mosquitoes. Many anopheline mosquitoes are vectors of human malaria, and one of the most effective means of controlling malaria is controlling vector populations. Understanding gene flow in anopheline mosquitoes is therefore important, particularly given the worldwide spread of insecticide resistance. In addition, large research efforts are being directed toward the goal of genetically modifying mosquitoes of the Anopheles gambiae complex, which contains several major African vectors, such that they are incapable of transmitting malarial parasites (Collins 1994<$REFLINK> ). An appreciation of the nature and extent of the barriers to gene flow in anopheline mosquitoes is therefore critical if refractory genes are to be spread effectively through vector populations. In southeast Asia, some of the major malaria vectors are members of the Anopheles dirus complex, the focus of this study. An additional interest in studying the An. dirus complex is that it is largely restricted to areas of tropical evergreen rainforest in southeast Asia that are rapidly diminishing. Members of the An. dirus complex may therefore serve as useful indicator species of the effects of forest loss and fragmentation on biodiversity.

The An. dirus complex comprises at least seven species (Peyton 1989<$REFLINK> ), but this study concerns only species A, C, and D. Anopheles dirus A and D have widespread distributions with a narrow zone of overlap along the Thai-Myanmar border (fig. 1 ). From this area, An. dirus A extends eastward across Thailand to Laos, Vietnam, and Cambodia, and An. dirus D extends westward through Myanmar toward Bangladesh and Assam (Rosenberg and Maheswary 1982<$REFLINK> ; Tun-Lin et al. 1985<$REFLINK> ; Baimai et al. 1988a<$REFLINK> ). Anopheles dirus C appears to be associated with rocky limestone outcrops that occur in the central region of the Thai-Myanmar border and southward into peninsular Thailand (Baimai et al. 1988a<$REFLINK> ). Consequently, An. dirus C has a patchy distribution. Limited allozyme evidence from the An. dirus complex in Thailand suggests little genetic population structure (Green et al. 1992<$REFLINK> ). In An. dirus D, there is a cline of chromosomal inversion frequencies from Thailand through Myanmar into Bangladesh (Baimai et al. 1988b<$REFLINK> ), but it is not known whether or not this is maintained by selection.

Cytogenetic studies of mitotic and polytene chromosomes (Baimai, Harrison, and Nakavachara 1980<$REFLINK> ; Baimai, Poopittayasataporn, and Kijchalao 1988<$REFLINK> ), allozyme data (Green et al. 1992<$REFLINK> ), and crossing studies (Baimai et al. 1987<$REFLINK> ) indicate that An. dirus A and C are very closely related. In their polytene chromosomes, the banding pattern is almost completely homosequential, and in F1 hybrids there is >90% synapsis. The F1 progeny are fertile and viable, with the exception of sterile males that are produced by the cross of female An. dirus A with male An. dirus C. By the same criteria, An. dirus D appears to be more distantly related. In the crossing studies between An. dirus D and An. dirus A or C, the only viable F1 progeny produced were females from crosses of male An. dirus A × female An. dirus D. However, experimental numbers were fairly low, and it is difficult to extrapolate from these laboratory crosses what might happen in nature. The only hybrid recorded in natural populations is that of An. dirus C-D, which was not formed in the small-scale laboratory crosses (Walton et al. 1999b<$REFLINK> ).

Materials and Methods

Mosquito Collection and Identification

Mosquitoes were collected from 11 sites in Thailand, Myanmar, and Bangladesh in an attempt to represent the geographic distributions of the species (fig. 1 ). Along the Thai-Myanmar border, there were sympatric populations of An. dirus A and D at two sites (4 and 6) and sympatric populations of species C and D at site 5, making a total of 14 populations. With the exception of Myanmar, adult females were collected from all sites and assumed to be unrelated. In Myanmar, mosquito larvae were collected from small pools of water and reared to adulthood. Since broods of larvae are temporally and spatially structured, sibling relationships between the individuals cannot be excluded. Mosquitoes were preserved by desiccation, and total DNA was extracted from individuals using a “salting-out” protocol (Sunnucks and Hales 1996<$REFLINK> ; Walton et al. 1999a<$REFLINK> ). Since these species of An. dirus are essentially isomorphic, species identification was performed using an allele-specific PCR method (Walton et al. 1999a<$REFLINK> ).

Amplification and Sequencing of mtDNA

A 1,230-bp fragment of the mitochondrial cytochrome oxidase I (COI) gene was amplified using An. dirus–specific primers D1664-86 (5′-GTTCCTTTAATATTAGGAGCACC-3′) and D2893-70 (5′-AATAGATGAAGATAATTGTATAGG-3′). The primer names indicate the homologous positions on the An. gambiae mitochondrial genome (Beard, Hamm, and Collins 1993<$REFLINK> ). The PCR reactions were performed in 50-μl volumes on a Hybaid Touchdown thermal cycler. Each reaction mix included DNA equivalent to ∼1/100 of a mosquito; 1 μM each primer, 200 μM dNTP, 1.5 mM MgCl2, 20 mM (NH4)2SO4, 75 mM Tris-HCl (pH 9.0), 0.01% (w/v) Tween, and 2.5 U AmpliTaq DNA polymerase (Applied Biosystems). Initial denaturation was for 5 min at 94°C and was followed by 35 cycles of amplification (20 s at 94°C, 30 s at 45°C, and 1 min at 72°C) and a final incubation of 3 min at 72°C. The products were purified on columns (Promega) and 923 bp (positions 1943–2865 in An. gambiae) were sequenced in both directions using two overlapping forward primers, D1908-1932 (5′-TTATTACTACTGTAATTAATATACG-3′) and DCOI-2376-99 (5′-TTTTTAGTTGATTAGCAACATTAC-3′), and two reverse primers, D2451-29 (5′-ACAAATCCAAATGCTCAAAGTAT-3′) and the reverse PCR primer, D2893-70. Sequencing reactions were performed with TaqFS dye-terminator fluorescent chemistry (Applied Biosystems) on an Applied Biosystems Model 373 automated sequencer. DNA sequences were assembled using the SeqEd multiple-sequence editor program (ABI 1992<$REFLINK> ).

Analysis of mtDNA Variation

The DNA sequences were aligned using PILEUP and translated into amino acid sequences using the Drosophila mtDNA genetic code in TRANSLATE (Genetics Computer Group 1994<$REFLINK> ). The sequence was confirmed as that of mitochondrial cytochrome oxidase I by comparison with the An. gambiae sequence (Beard, Hamm, and Collins 1993<$REFLINK> ). Pairwise distances between haplotypes were estimated in ARLEQUIN 1.1 (Schneider et al. 1997<$REFLINK> ). Due to the low level of diversity, no corrections were made for multiple substitutions. Analysis of population genetic structure was carried out using analysis of molecular variance (AMOVA; Excoffier, Smouse, and Quattro 1992<$REFLINK> ) in ARLEQUIN 1.1. AMOVA takes into account the number of molecular differences between haplotypes in an analysis-of-variance framework equivalent to F statistics. The significance of the FST statistics was tested by permutation. Isolation by distance was assessed by a Mantel test in NTSYS-pc (Numerical Taxonomy and Multivariate Analysis System; Rohlf 1990<$REFLINK> ). Minimum spanning haplotype networks were constructed with the aid of MINSPAN in NTSYS-pc. The limits of parsimony in the networks were estimated using a program package in Mathematica described in Templeton, Crandall, and Sing (1992)<$REFLINK> .

Coalescent theory makes predictions of the pattern of segregating sites that can be informative about the evolutionary history of a sample of alleles. The frequency distributions of the numbers of segregating sites in all possible pairwise comparisons, known as mismatch distributions, were calculated in ARLEQUIN 1.1. Slatkin and Hudson (1991)<$REFLINK> showed that the mismatch distributions of stable populations have a ragged profile due to stochastic lineage loss. In contrast, an exponentially growing population has a smooth unimodal distribution approaching a Poisson distribution. This reflects an underlying starlike genealogy in which all of the coalescent events occurred in a narrow time window.

The population parameter 𝛉 = 2Nμ, where N is the effective population size of females and μ is the neutral mutation rate per site per generation, was estimated by 𝛉π and 𝛉S. 𝛉π is based on the average pairwise number of differences between sequences (Tajima 1983<$REFLINK> ), and 𝛉S is estimated from the number of segregating sites in a population (Watterson 1975<$REFLINK> ). A neutral infinite-sites model and a constant-sized population that has reached drift-mutation equilibrium are assumed for both estimators. Within-population estimates were made in ARLEQUIN 1.1. Within- species estimates and tests of neutrality (Fu 1997<$REFLINK> ) were made on the web page of Y.-X. Fu (http://hgc.sph.uth.tmc.edu/cgi-bin/upblue.pl). Tests of neutrality detect deviations from the pattern of polymorphism expected from a neutral model of evolution. Among the most commonly used of these is Tajima's (1989)<$REFLINK> test statistic, which compares the two estimators of 𝛉, 𝛉π and 𝛉S. The other statistics used (with the exception of Fs) are similar in form but use the information in the sample differently. Fu's (1997)<$REFLINK>Fs statistic is negative if there is an excess of alleles over that expected for a given estimate of 𝛉π. The program FLUCTUATE (Kuhner, Yamato, and Felsenstein 1998<$REFLINK> ) was used to make simultaneous, phylogenetic estimates of present-day 𝛉, 𝛉0, and the population growth rate, g, assuming an exponential model of growth using a maximum-likelihood approach.

Results

Mitochondrial Sequence Data

The analysis is based on 923 bp of the mitochondrial cytochrome oxidase I gene sequence obtained from 84 individuals (approximately six individuals per species per site; table 1 ). Six individuals (three each from sites 2 and 3 in Myanmar) that were duplicates of other haplotypes at the same site were excluded in case they were related individuals, in order to make population structure estimation conservative. The An. dirus sequences were very AT-rich (71.0% A and T bases). Nucleotide substitutions were identified at 74 of the 923 sites, the majority of which were transitions (80.2%). At sites 84, 327, 777, and 912, there were three segregating bases, and at site 630, all four bases were segregating. Only two substitutions resulted in amino acid changes: those at position 40 (V→M) in haplotype 34 and at position 912 (Q→H) in haplotype 19. Sequence differences among the haplotypes are shown in table 2 . The reference sequence for haplotype 1 is deposited in EMBL with the accession number AJ271384.

The distribution of haplotypes among the 14 populations is shown in table 1 . Haplotype diversity was very high, with 70 haplotypes found among 84 individuals. This is in accord with the generally high levels of nucleotide diversity estimated by 𝛉π and 𝛉S. Some differences in diversity can be seen between populations. Notably, population 5 of An. dirus C has considerably less diversity than the other population of species C and all those of species A and D. The populations of An. dirus A are significantly less diverse than those of An. dirus D. The among-populations estimates of molecular diversity are 𝛉π = 0.00424 ± 0.000152 and 𝛉S = 0.00484 ± 0.00025 for An. dirus A and 𝛉π = 0.00637 ± 0.00022 and 𝛉S = 0.00720 ± 0.00035 for An. dirus D.

Population Genetic Structure

The within-species FST value for An. dirus C (table 3 ) indicates that the two An. dirus C populations are genetically quite distinct. In contrast, in An. dirus A and D, there appears to be little genetic differentiation among the populations within each species, even over large geographic distances. In a hierarchical AMOVA, pairwise comparisons were made between four groups: species A and D each comprised a group of six populations, and the genetically distinct species C populations were treated as separate groups. The comparisons of species A or D with either of the species C populations yielded very high FST values (0.680–0.800), indicating that An. dirus C is distinct from both An. dirus A and An. dirus D, contrary to expectations. Although significantly greater than 0, there was surprisingly little differentiation between species A and D (FST = 0.048). The greatest distance between an An. dirus D and an An. dirus A population (sites 1 and 10, respectively) is ∼2,000 km, which is greater than the ∼700-km distance between the two species C populations. However, there was no indication from a Mantel test of any isolation by distance among all A populations (r = −0.013, P = 0.38), all D populations (r = 0.059, P = 0.74), or all A and D populations combined (r = 0.049, P = 0.74).

Genealogical Relationships Among An. dirus Haplotypes

The minimum spanning haplotype network for An. dirus C (fig. 2A ) is well resolved and implies that only one mutation has occurred twice within the network. Population 7C has two quite divergent lineages. The An. dirus C network is linked to the combined network of species A and D (fig. 2B ) through one of the missing intermediate haplotypes via seven fixed differences.

The relationships of species A and D haplotypes (fig. 2B ) form a starburst pattern that is highly characteristic of a population expansion. (In the sense used here, population expansion could apply equally well to a demographic expansion of the mosquitoes themselves or to a selective sweep of their mtDNA.) Based on the number of haplotypes and the level of variation, the probability of parsimonious connections (i.e., no unobserved mutations) across the network is estimated to be >97% (Templeton, Crandall, and Sing 1992<$REFLINK> ). Despite this, a very high level of homoplasy is apparent in the network, with many mutational events being inferred repeatedly and sometimes resulting in closed loops. Although there are many equally or slightly longer parsimonious alternatives, the overall starlike topology of the network remains unchanged, as does the level of homoplasy. A root to the An. dirus C network cannot be inferred with any reliability. Many lineages are represented by only a single haplotype, and many missing intermediates have been inferred, since only a fraction of the haplotypes that exist have been detected. Three noncore haplotypes, 21, 38, and 17, are shared between An. dirus A and D. There are also several lineages (those leading to haplotypes 21, 46, 24, 47, 54, and 51) that contain haplotypes from both species. There are no clear-cut geographic associations among haplotypes on the same lineages. Possible exceptions are haplotypes 38 (species A and D), 39 (A), and 40 (A) from site 5; haplotypes 11 (D) and 12 (D), both of which are found at sites 2 and 3 in Myanmar; and haplotypes 48 (A), 50 (A), and 51 (D) from site 4.

Pattern of Segregating Sites in An. dirus A and D

Anopheles dirus A and D both have smooth unimodal mismatch distributions (fig. 3 ), consistent with population expansion. No simple statistical test can be made of a fit of the distributions to a Poisson distribution, as not only are the pairwise comparisons not independent, but the alleles share a common history. Despite this proviso, there are strong deviations from a Poisson distribution for both species A and species D (fig. 3 ), with the means being considerably larger than the variances. The frequency spectrum of segregating sites, equivalent to mutations in the genealogy, is shown in figure 4 . There is an excess of low-frequency mutations compared with that expected for a constant-size population.

Tests of Neutrality

An excess of low-frequency mutations, as observed in figure 4 , can be the result of demographic expansion, genetic hitchhiking, or background selection. Several tests of neutrality designed to detect these evolutionary forces (Fu 1997<$REFLINK> ) were performed on An. dirus A and D (table 4 ). The effect of multiple hits was minimized by inferring the minimum number of mutations that must have occurred at sites that are segregating for more than two bases. Multiple mutations to the same base at a site have not been inferred from the parsimony network due to the uncertainty there. Since the ability to detect deviations from neutral expectations increases with sample size, the individuals of all the populations from each species were pooled. This is reasonable given that there is essentially no genetic differentiation within An. dirus A or D and that the effect of population structure, if any did exist, would be an apparent excess of older mutations (Fu 1996<$REFLINK> ). However, all the test statistics reported here have negative values, indicating an excess of low-frequency mutations. Most of the values were highly significant in An. dirus D but not in An. dirus A. However, the FS test was extremely significant in both cases.

Estimates of the Effective Population Size

When there has been population growth, conventional estimates of 𝛉 (𝛉S and 𝛉π) will underestimate the current effective population size. For An. dirus A and D, where there is evidence of population growth, we therefore used the program FLUCTUATE (Kuhner, Yamato, and Felsenstein 1998<$REFLINK> ) to estimate the present-day value of 𝛉, called here 𝛉0. The algorithm is sensitive to the starting values of 𝛉0, so to obtain reasonable initial estimates, several preliminary runs were made using five short chains of 400 steps and two long chains of 2,000 steps with a wide range of starting 𝛉0 values. Three longer searches were then made for each species using 20 short chains of 400 steps and 5 long chains of 20,000 steps using initial 𝛉0 = 0.4 for An. dirus A and initial 𝛉0 = 0.7 for An. dirus D. Different starting genealogies were used for each long search. The median values of 𝛉0 and the corresponding growth rates are shown in table 5 . Consistently lower estimates of 𝛉0 were obtained for An. dirus A than for species D, and the estimates of growth rates were consistently higher for An. dirus A than for An. dirus D. Although the 𝛉0 confidence intervals of each species overlap, there is no overlap of the g confidence intervals. The values for 𝛉0 and g are both very large. Consequently, the estimates of current female effective population sizes from 𝛉0 are extremely large, approximately two to three orders of magnitude larger than those for An. dirus C. Assuming a constant size for species C, the female effective population size, N, is 6.50 × 105 or 7.10 × 105 (from 𝛉π and 𝛉S, respectively) for population 5, and N is 3.32 × 106 or 2.85 × 106 (from 𝛉π and 𝛉S, respectively) for population 7.

Discussion

Homoplasy

Extensive homoplasy was observed in An. dirus A and D and places some limitations on interpretation of the data. The homoplasy could indicate the presence of hypervariable sites, particularly given the strong AT bias. The presence of five sites that segregate at more than two bases is clear evidence of some site hypervariability. However, to account for all of the observed homoplasy by site hypervariability, each of the 39 apparently nonsingleton mutational events would have to occur on average about twice (to make 78 mutational connections in the network). It is reasonable to suppose that the closely related species An. dirus C would have a similar distribution of hypervariable sites, yet only 4 of its 16 observed mutations were in common with those of species A and D. Even if this low number of shared polymorphic sites were all due to hypervariable sites, rather than to retained ancestral polymorphism, this finding suggests that hypervariable sites cannot be the sole cause of homoplasy. As an example, mutations 1–4, which occur throughout the network (fig. 2B ), do not segregate at more than two bases, and none of these mutations occurs in An. dirus C. Another potential cause of homoplasy is recombination. Although mitochondrial sequences are usually considered strictly maternally inherited and undergo no recombination, evidence has been put forward recently of mtDNA recombination in humans (Eyre-Walker, Smith, and Maynard Smith 1999<$REFLINK> ). A minimum of three recombination events would be required to account for the closed loops in the network (fig. 2B ).

Population Expansion in An. dirus A and D

There was considerable deviation from a Poisson fit in the mismatch distributions (fig. 3 ) due, at least in part, to homoplasy. Nevertheless, the smooth, unimodal distributions are consistent with some sort of population expansion, as simulations have shown that stable populations almost never produce this type of profile (Harpending 1994<$REFLINK> ). Nor can such a profile be the result of pooling data from structured subpopulations (Rogers et al. 1996<$REFLINK> ). This expansion can be visualized in the starlike haplotype network (fig. 2B ). The six core haplotypes (56, 55, 31, 60, 30, and 59) that have many mutational connections, tend to be at a slightly higher frequency, and come from a wide geographical area are probably ancestral (Crandall and Templeton 1993<$REFLINK> ).

An excess of low-frequency mutations relative to that expected in a population of constant size, particularly in An. dirus D (fig. 4 ), was detected by the tests of neutrality. Mutations present at one or a few copies are likely to be of recent origin, whereas those that occur early in the genealogy are likely to be present at higher frequencies. Fu (1997)<$REFLINK> has shown that his Fs test is particularly powerful at detecting population growth and hitchhiking, whereas the D and F tests of Fu and Li (1993)<$REFLINK> , which essentially test for an excess of singleton versus nonsingleton mutations, are more powerful than Fs and Tajima's (1989)<$REFLINK> test at detecting background selection. The highly significant Fs statistics for An. dirus A and D suggest that population expansion (demographic or selective sweep) is more likely to be the cause of deviation from neutral expectations than background selection. However, high Fs values could also indicate the presence of recombination, as this would increase the number of alleles relative to the number of pairwise differences between sequences.

The estimates of exponential growth rates, g, from the FLUCTUATE program also indicate a large population expansion. However, exponential growth is only one possible mode of growth, and there are many others that could have produced the observed unimodal mismatch distributions (Slatkin and Hudson 1991<$REFLINK> ). The presence of sites 72, 528, 756, and 780 segregating at high frequencies (13–24 out of 72 sequences) in species A and D is not consistent with a simple model of exponential growth, as this is unlikely to result in high-frequency mutations. This implies that the corresponding mutations at these sites (numbered 1–4 in fig. 2B ) existed in the population prior to expansion. This is consistent with these older mutations linking together core haplotypes. If a smaller, more stable population existed for some time before expansion, rare recombination would become less improbable as an explanation for homoplasy involving these sites. Expansion from a population that was sufficiently large to contain some variation already could also contribute to the deviation from the Poisson fit in the mismatch distributions. In estimating 𝛉0 and g, FLUCTUATE assumes a constant rate of exponential growth over the entire length of the genealogy. The fact that An. dirus D has, if anything, a higher value of 𝛉0 but a lower growth rate than species A most likely indicates a longer genealogy for species D in which population expansion occurred earlier, enabling greater diversity to accumulate in species D, rather than implying a higher growth rate for species A.

The time to coalescence in an exponentially expanding population is ī/2u (Slatkin and Hudson 1991<$REFLINK> ), where ī is the average number of pairwise differences between sequences and u is the per-sequence mutation rate per generation. Omitting the sites with mutations 1–4 that are likely to have been present before a rapid growth phase, this relation gives the start of population expansion very approximately as 135,000 years ago in An. dirus A and 250,000 years ago in An. dirus D (assuming 10 generations per year and a mutation rate of 10−8 per site per year; Powell et al. 1986<$REFLINK> ). The human population may also have increased in size, and estimates from mtDNA data put the start of expansion at 60,000–120,000 years ago (Rogers and Harpending 1992<$REFLINK> ) or 262,500 years ago (Slatkin and Hudson 1991<$REFLINK> ). If the mosquito expansions are due to demographic expansion rather than a selective sweep, they may have occurred at approximately the same time as the human population expansion. This could be due to a habitat change that allowed both human and mosquito populations to expand, or, alternatively, the expansion of the human population could have resulted in a habitat change for the mosquitoes that allowed them to expand.

Gene Flow and Population History

When gene flow is inferred indirectly from F statistics, the assumption of equilibrium between drift and migration is often reasonable, since migration is a relatively fast homogenizing force. However, the genetic signature of a population expansion can remain for a long time (Rogers and Harpending 1992<$REFLINK> ), obscuring true ecological population structure that may exist. For An. dirus A and D, there is compelling evidence of population expansion. In addition, since the rate of approach to equilibrium is dependent on effective population size (Nei and Feldman 1972<$REFLINK> ), it would take an extremely long time for the very large population sizes of An. dirus A and D to reach migration-drift equilibrium. It would therefore be misleading to infer migration rates from the FST values.

In contrast to An. dirus A and D, the two C populations show considerable population structure. The fixed difference between the two species C populations indicates that the population 5 sample has a distinct genealogy in which all of the alleles coalesce more recently than the time of any connection they have to alleles in the other population. Hey (1991)<$REFLINK> has shown that for these sample sizes even a single fixed difference is significant and implies an absence of gene flow between these two populations. An FST value of 0.421 therefore probably reflects a recent shared ancestry rather than indicating low levels of gene flow. The genetic structure evident within and between populations of An. dirus C indicates that this species has a population history very different from that of species A and D. This may be due to the patchy distribution of An. dirus C, making it more likely to have been influenced by colonization events. If species A and D have undergone demographic expansion, this is more likely to have been due to contiguous range expansion.

Genetic Introgression

The genetic similarity between An. dirus A and D, along with their recent, common genealogy, was completely contrary to the expectation of a sister relationship between species A and C, with species D being more distant. This could be explained by the retention of ancestral polymorphism between species A and D if our current understanding of the species relationships is incorrect. However, in addition to evidence already cited (cytological, hybrid fitness, and allozyme) to support a close An. dirus A and C relationship, sequence data from nuclear loci also concur with this (unpublished data). An alternative explanation of these observations is that mtDNA has introgressed from An. dirus D into species A. This is also consistent with laboratory cross-mating studies in which female hybrids were formed from species A (male) × species D (female) crosses but not from the reverse pairing of the same number of individuals (Baimai et al. 1987<$REFLINK> ). The low fitness of these hybrids suggests that any route for interspecific gene flow must be severely restricted.

Since there are no highly divergent lineages of what could be original species A mtDNA within An. dirus A, this implies that species D mtDNA has replaced all of the species A mtDNA. It is unlikely that all of the An. dirus D haplotypes would be passed into species A, and this effective genetic bottlenecking could have resulted in the lower diversity observed in An. dirus A. Given the restricted route to gene flow, a selective sweep of mtDNA that originated from An. dirus D seems more plausible than demographic expansion. This is also in accord with the indication of a population expansion occurring in An. dirus D before species A. However, the earlier suggestion that the population expansion in An. dirus D, as well as that in An. dirus A, may have stemmed from more than one haplotype would seem to counter a selective sweep. An alternative argument is that introgression could have occurred when An. dirus A had a small population size, such that drift would have a strong effect, and shared a restricted distribution with species D, followed by independent range expansion.

Explanations of low between-species differentiation in terms of ongoing, rather than historical, introgression seem highly implausible. First, the scale of hybridization events would have to be very large to overcome the low hybrid fitness. Second, opportunities for introgression would be limited to the narrow area of overlap of species A and D (coupled with extensive within-species gene flow) and/or arise as a result of long- distance migration. Third, despite their apparent intermingling, there does seem to be some separation between species A and D haplotypes in the network, suggestive of independent recent evolutionary histories: lineages arising from core haplotypes 30 and 55 are composed entirely of species D haplotypes, and 9 of the 10 lineages arising from core haplotype 31 are found in species A.

Comparison with the African Malaria Vectors Anopheles gambiae and Anopheles arabiensis

Findings strikingly similar to those reported here for An. dirus A and D have been reported in a similar mtDNA study on a pair of closely related mosquito species, An. gambiae and An. arabiensis, from east, west, and south Africa (Besansky et al. 1997<$REFLINK> ). The very low levels of genetic differentiation within and between these species were interpreted in terms of both extensive gene flow and recent range expansion from relatively large stable populations. It was suggested that it is at least theoretically possible that equilibrium exists between gene flow and genetic drift across Africa within each species. Lehmann et al. (1996)<$REFLINK> also suggested that intraspecific gene flow on a continental scale is plausible via the considerable active dispersal ability of An. gambiae and long-distance migration facilitated by humans. However, given the evidence for population expansion (Besansky et al. 1997<$REFLINK> ) and the large effective population sizes of these mosquitoes (Lehmann et al. 1998<$REFLINK> ), migration-drift equilibrium seems to us unlikely. As in An. dirus A and D, this would preclude the inference of gene flow from population structure.

The mtDNA data of An. gambiae and An. arabiensis were also interpreted as suggesting extensive interspecific gene flow. Anopheles gambiae and An. arabiensis differ from An. dirus A and D in that they are largely sympatric across sub-Saharan Africa. In addition, their female hybrids are fertile (Coluzzi et al. 1979<$REFLINK> ). Extensive ongoing interspecific gene flow is therefore more feasible between An. gambiae and An. arabiensis. In the starlike network of the An. gambiae and An. arabiensis haplotypes, haplotypes of one species connect to apparently younger haplotypes in another species that could imply interspecific gene flow (Besansky et al. 1997<$REFLINK> ). However, as in An. dirus A and D, homoplasy is also evident in the An. gambiae data with many alternative connections in the network, so it is difficult to make any convincing inferences from the relationships between particular haplotypes.

In conclusion, historical introgression appears to be sufficient to explain the observations in An. dirus A and D without the need to infer large-scale contemporary interspecific gene flow and, indeed, some reason to think that it is unlikely. Ongoing extensive interspecific gene flow is plausible in An. gambiae and An. arabiensis but unlikely in An. dirus. The similarity of the two mtDNA data sets therefore suggests that the hypothesis of only historical introgression (without ongoing introgression) between species cannot as yet be rejected in either case. Even within species, the genetic data for An. dirus A and D, and, by analogy, An. gambiae and An. arabiensis, seem to be adequately explained by historical population processes without any need to infer extensive gene flow. Even if they have not reached migration-drift equilibrium, information on gene flow may be contained within the genealogical relationships of the haplotypes. However, before a phylogenetic approach can be used, much greater certainty is required with regard to the relationships between haplotypes. The problem of homoplasy encountered here is familiar in the analysis of human mtDNA, in which there is both site hypervariability (Wakeley 1993<$REFLINK> ) and possibly also recombination (Eyre-Walker, Smith, and Maynard Smith 1999<$REFLINK> ). In addition to obtaining more mtDNA sequences to gain greater certainty about the haplotype relationships, it is clearly essential to obtain data from the same populations on nuclear loci to be able to distinguish between the hypotheses of a selective sweep and demographic expansion. Achieving a greater understanding of the evolutionary history of these mosquitoes is important if reliable estimates of gene flow, which are critical to the control of malaria, are to be made in the future.

Eleftherios Zouros, Reviewing Editor

1

Present address: Department of Biological Sciences, University of Notre Dame.

2

Keywords: Anopheles dirus, mitochondrial DNA cytochrome oxidase I homoplasy population expansion malaria vectors

3

Address for correspondence and reprints: Catherine Walton, School of Biology, University of Leeds, Leeds LS2 9JT, United Kingdom. E-mail: gencw@leeds.ac.uk.

Table 1 Summary Data for Haplotypes Among Sites and Species

Table 2 Variable Positions in the 923 bp of the COI Gene for the 70 Haplotypes of Anopheles dirus Species A, C, and D

Table 2 Continued

Table 2 Continued

Table 3 FST Estimates for Within- and Between-Species Comparisons

Table 4 Tests of Neutrality

Table 5 FLUCTUATE Estimates of 𝛉0 and the Exponential Growth Rate

Fig. 1.—The distributions of Anopheles dirus A, C, and D in southeast Asia. The northern distribution limits of An. dirus species A and D are uncertain. Only the known sites are marked for An. dirus C, which has a patchy distribution in Thailand, but the species could extend westward into Myanmar. The collection sites are numbered 1–11

Fig. 1.—The distributions of Anopheles dirus A, C, and D in southeast Asia. The northern distribution limits of An. dirus species A and D are uncertain. Only the known sites are marked for An. dirus C, which has a patchy distribution in Thailand, but the species could extend westward into Myanmar. The collection sites are numbered 1–11

Fig. 2.—Haplotype networks for Anopheles dirus species C (A) and species A and D (B). The size of a circle is proportional to the number of haplotypes observed once, twice, or three times, and a unit branch length represents one mutation. 0 = postulated haplotypes that were not observed. A, Haplotype network for population 5C (open circles) and population 7C (filled circles) of species C. The network is connected to that of species A and D through the postulated ancestral haplotype by seven fixed mutational differences. B, Haplotype network for species A (open circles) and D (filled circles). Gray circles represent haplotypes shared by An. dirus A and D. Numbers 1–5 along branches identify sites with homoplastic substitutions along the closed loops discussed in the text: 1—780 A/G; 2—72 A/G; 3—756 T/A; 4—528 A/G; 5—159 C/T

Fig. 2.—Haplotype networks for Anopheles dirus species C (A) and species A and D (B). The size of a circle is proportional to the number of haplotypes observed once, twice, or three times, and a unit branch length represents one mutation. 0 = postulated haplotypes that were not observed. A, Haplotype network for population 5C (open circles) and population 7C (filled circles) of species C. The network is connected to that of species A and D through the postulated ancestral haplotype by seven fixed mutational differences. B, Haplotype network for species A (open circles) and D (filled circles). Gray circles represent haplotypes shared by An. dirus A and D. Numbers 1–5 along branches identify sites with homoplastic substitutions along the closed loops discussed in the text: 1—780 A/G; 2—72 A/G; 3—756 T/A; 4—528 A/G; 5—159 C/T

Fig. 3.—Observed mismatch distributions among haplotypes in species A and D indicating their fit to a Poisson distribution. Mean distance was greater in D (6.00) than in A (4.00). Both distributions differ from Poisson in the same direction; the variance is much lower than the mean (χ2: for species D, s2 = 3.73 and P < 0.001; for species A, s2 = 2.39 and P < 0.001). Note that for an exponentially expanding population, only an approximate Poisson distribution is expected and this test therefore only shows the deviation of a fit to a Poisson distribution (see text)

Fig. 3.—Observed mismatch distributions among haplotypes in species A and D indicating their fit to a Poisson distribution. Mean distance was greater in D (6.00) than in A (4.00). Both distributions differ from Poisson in the same direction; the variance is much lower than the mean (χ2: for species D, s2 = 3.73 and P < 0.001; for species A, s2 = 2.39 and P < 0.001). Note that for an exponentially expanding population, only an approximate Poisson distribution is expected and this test therefore only shows the deviation of a fit to a Poisson distribution (see text)

Fig. 4.—Frequency spectrum for the numbers of segregating sites in An. dirus A and D. Diamonds and triangles show the number of expected sites in each frequency class under a model of constant population size (see Harpending et al. 1998<$REFLINK> ). There is uncertainty in the ancestral state at a site; for example, a mutation that occurred early in the genealogy and currently occupies all but one site cannot be distinguished from a mutation that has occurred recently and occupies only one site. The expected distribution was therefore folded at one half the sample size. The resulting number was multiplied by the function 1 − (xn + (1 − x)n) (where x = frequency of mutant sites) to remove ascertainment bias due to the small sample size relative to the number of alleles likely to be present in the population (Harpending et al. 1998<$REFLINK> ). Due to the interdependence of sites, there is no simple test of the constant-size hypothesis

Fig. 4.—Frequency spectrum for the numbers of segregating sites in An. dirus A and D. Diamonds and triangles show the number of expected sites in each frequency class under a model of constant population size (see Harpending et al. 1998<$REFLINK> ). There is uncertainty in the ancestral state at a site; for example, a mutation that occurred early in the genealogy and currently occupies all but one site cannot be distinguished from a mutation that has occurred recently and occupies only one site. The expected distribution was therefore folded at one half the sample size. The resulting number was multiplied by the function 1 − (xn + (1 − x)n) (where x = frequency of mutant sites) to remove ascertainment bias due to the small sample size relative to the number of alleles likely to be present in the population (Harpending et al. 1998<$REFLINK> ). Due to the interdependence of sites, there is no simple test of the constant-size hypothesis

The mosquito collections on which this work was based would not have been possible without the helpful support of the staff of the malaria centers of Thailand and Mr. Sawadwongporn and Mr. Somsak. We would also like to thank Professor Sornchai Looareesuwan for the samples from Ratchaburi in Thailand, N. P. Maheswary for the samples from Bangladesh, Brian Holloway and the staff of the Biotechnology Core Facility at the Centers for Disease Control and Prevention in Atlanta, Ga., for synthesizing oligonucleotide primers, and Alan Templeton for providing his Mathematica package. This work was funded by a Wellcome Trust (London) fellowship in Biodiversity to C. Walton with the sponsorship of R. K. Butlin.

literature cited

ABI.
1992
. SeqEd DNA Sequence Editor. Version 1.0.3. ABI, Foster City, Calif.
Baimai, V., R. G. Andre, B. A. Harrison, U. Kijchalao, and L. Panthusiri.
1987
. Crossing and chromosomal evidence for two additional sibling species within the taxon Anopheles dirus Peyton and Harrison (Diptera: Culicidae) in Thailand.
Proc. Entomol. Soc. Wash.
 
89
:
157
–166.
Baimai, V., B. A. Harrison, and V. Nakavachara.
1980
. The salivary gland chromosomes of Anopheles (Cellia) dirus (Diptera: Culicidae) of the Southeast Asian Leucosphyrus Group.
Proc. Entomol. Soc. Wash.
 
82
:
319
–328.
Baimai, V., U. Kijchalao, P. Sawadwongporn, and C. A. Green. 1988a. Geographic distribution and biting behaviour of four species of the Anopheles dirus complex (Diptera: Culicidae) in Thailand. Southeast Asian J. Trop. Med. Public Health 19:151–161.
Baimai, V., A. Poopittayasataporn, and U. Kijchalao.
1988
. Cytological differences and chromosomal rearrangements in four members of the Anopheles dirus complex (Diptera: Culicidae) in Thailand. Genome 30:372–379.
Baimai, V., M.-M. Thu, M. Paing, and N. P. Maheswary. 1988b. Distribution and chromosomal polymorphism of the malaria vector Anopheles dirus species D. Southeast Asian J. Trop. Med. Public Health 19:661–665.
Beard, C. B., D. M. Hamm, and F. H. Collins.
1993
. The mitochondrial genome of the mosquito Anopheles gambiae: DNA sequence, genome organisation, and comparisons with mitochondrial sequences of other insects.
Insect Mol. Biol.
 
2
:
103
–124.
Besansky, N. J., T. Lehmann, T. Fahey, D. Fontenille, L. E. O. Braack, W. A. Hawley, and F. H. Collins.
1997
. Patterns of mitochondrial variation within and between African malaria vectors, Anopheles gambiae and An. arabiensis, suggest extensive gene flow. Genetics 147:1817–1828.
Collins, F. H.
1994
. Prospects for malaria control through the genetic manipulation of its vectors. Parasitol. Today 10:370–371.
Coluzzi, M., A. Sabatini, V. Petrarca, and M. A. Di Deco.
1979
. Chromosomal differentiation and adaptation to human environments in the Anopheles gambiae complex.
Trans. R. Soc. Trop. Med. Hyg.
 
73
:
483
–487.
Crandall, K. A., and A. R. Templeton.
1993
. Empirical tests of some predictions from coalescent theory with applications to intraspecific phylogeny reconstruction. Genetics 134:959–969.
Excoffier, L., P. E. Smouse, and J. M. Quattro.
1992
. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491.
Eyre-Walker, A., N. H. Smith, and J. Maynard Smith.
1999
. How clonal are human mitochondria? Proc.
R. Soc. Lond. B Biol. Sci.
 
266
:
477
–483.
Fu, Y.-X.
1996
. New statistical tests of neutrality for DNA samples from a population. Genetics 143:557–570.
———.
1997
. Statistical tests of neutrality of mutations against population growth hitchhiking and background selection. Genetics 147:915–925.
Fu, Y.-X., and W. H. Li.
1993
. Statistical tests of neutrality of mutations. Genetics 133:693–709.
Genetics Computer Group.
1994
. Program manual for the Wisconsin package. Version 8. Madison, Wis.
Green, C. A., L. E. Munstermann, S. G. Tan, S. Panyim, and V. Baimai.
1992
. Population genetic evidence for species A, B, C and D of the Anopheles dirus complex in Thailand and enzyme electromorphs for their identification.
Med. Vet. Entomol.
 
6
:
29
–36.
Harpending, H. C.
1994
. Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution.
Human Biol.
 
66
:
591
–600.
Harpending, H. C., M. A. Batzer, M. Gurven, L. B. Jorde, A. R. Rogers, and S. T. Sherry.
1998
. Genetic traces of ancient demography. Proc. Natl. Acad. Sci. USA 95:1961–1967.
Hey, J.
1991
. The structure of genealogies and the distribution of fixed differences between DNA sequence samples from natural populations. Genetics 128:831–840.
Kuhner, M. K., J. Yamato, and J. Felsenstein.
1998
. Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429–434.
Lehmann, T., W. A. Hawley, H. Grebert, and F. H. Collins.
1998
. The effective population size of Anopheles gambiae in Kenya: implications for population structure.
Mol. Biol. Evol.
 
15
:
264
–276.
Lehmann, T., W. A. Hawley, L. Kamau, D. Fontenille, F. Simard, and F. H. Collins.
1996
. Genetic differentiation of Anopheles gambiae populations from East and West Africa: comparison of microsatellite and allozyme loci. Heredity 77:192–208.
Nei, M., and M. W. Feldman.
1972
. Identity of genes by descent within and between populations under mutation and migration pressures.
Theor. Popul. Biol.
 
3
:
460
–465.
Peyton, E. L.
1989
. A new classification for the Leucosphyrus group of Anopheles (Cellia). Mosq.
Syst.
 
21
:
197
–205.
Powell, J. R, A. Caccone, G. D. Amato, and C. Yoon.
1986
. Rates of nucleotide substitution in Drosophila mitochondrial DNA and nuclear DNA are similar. Proc. Natl. Acad. Sci. USA 83:9090–9093.
Rogers, A. R., A. E. Fraley, M. J. Bamshad, W. S. Watkins, and L. B. Jorde.
1996
. Mitochondrial mismatch analysis is insensitive to the mutational process.
Mol. Biol. Evol.
 
13
:
895
–902.
Rogers, A. R., and H. C. Harpending.
1992
. Population growth makes waves in the distribution of pairwise genetic differences.
Mol. Biol. Evol.
 
9
:
552
–569.
Rohlf, F. J.
1990
. NTSYS-pc numerical taxonomy and multivariate analysis system. Version 1.60. Exeter, Setauket, New York.
Rosenberg, R., and N. P. Maheswary.
1982
. Forest malaria in Bangladesh.
II. Transmission by Anopheles dirus. Am. J. Trop. Med. Hyg.
 
31
:
183
–191.
Schneider, S., J.-M. Kueffer, D. Roessli, and L. Excoffier.
1997
. ARLEQUIN 1.1. A software for population genetic data analysis. URL: http://anthropologie.unige.ch/arlequin.
Slatkin, M., and R. R. Hudson.
1991
. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555–562.
Sunnucks, P., and D. F. Hales.
1996
. Numerous transposed sequences of mitochondrial cytochrome oxidase I-II in aphids of the genus Sitobian (Hemiptera: Aphididae).
Mol. Biol. Evol.
 
13
:
510
–524.
Tajima, F.
1983
. Evolutionary relationship of DNA sequences in finite populations. Genetics 123:437–460.
———.
1989
. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Templeton, A. R.
1998
. Nested clade analyses of phylogeographic data: testing hypotheses about gene flow and population history.
Mol. Ecol.
 
7
:
381
–397.
Templeton, A. R., K. A. Crandall, and C. F. Sing.
1992
. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132:619–633.
Tun-Lin, W., M.-M. Thu, S.-M. Than, and M.-M. Mya.
1995
. Hyperendemic malaria in a forested, hilly Myanmar village.
J. Am. Mosq. Control Assoc.
 
11
:
401
–407.
Wakely, J.
1993
. Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA.
J. Mol. Evol.
 
37
:
613
–623.
Walton, C., J. M. Handley, C. Kuvangkadilok, F. H. Collins, R. E. Harbach, V. Baimai, and R. K. Butlin. 1999a. Identification of five species of the Anopheles dirus complex from Thailand, using allele-specific polymerase chain reaction. Med. Vet. Entomol. 13:24–32.
Walton, C., R. G. Sharpe, S. J. Pritchard, N. J. Thelwell, and R. K. Butlin. 1999b. Molecular identification of mosquito species. Biol. J. Linn. Soc. 68:241–256.
Watterson, G. A.
1975
. On the number of segregation sites in genetic models without recombination.
Theor. Popul. Biol.
 
7
:
256
–276.