Abstract
The current study examined temporal and spatial distribution patterns of anopheline malaria vectors in a highland site and determined the number of houses to be sampled to achieve the targeted precision level. Adult mosquito sampling was conducted seasonally in May and August 2002 in a 3 by 3-km² area, and in November 2002 and February 2003 in an expanded 4 by 4-km² area in Kakamega District, western Kenya. Anopheles gambiae Giles was the predominant malaria vector species, constituting 84.6% of the specimens, whereas Anopheles funestus Giles constituted 15.4% of the vector populations. An. gambiae abundance increased by six- to eight-fold in the long rainy season over the dry seasons, but An. funestus abundance peaked 3 mo after the long rainy season. For both species, the coefficient of variation was larger than 1, suggesting that the distribution of mosquito adults was aggregated. Mosquito clustering occurred in houses <400 m from a valley bottom. The negative binomial distribution was accepted in one sample period (August 2002) for An. gambiae and in two sampling periods (May and August 2002) for An. funestus. Taylor’s power law analyses indicated that An. gambiae distribution was more aggregated in the wet seasons than in the dry seasons, whereas the degree of aggregation of An. funestus was similar in all four seasons. The minimum number of houses required to estimate anopheline female abundance within the commonly acceptable precision level (0.2) should be 17 houses per km² for An. gambiae and 42 houses per km² for An. funestus. The potential factors causing aggregated anopheline mosquito distribution are discussed.
Malaria is a major public health problem in Africa (Breman 2001). Determination of malaria transmission intensity and efficacy of vector control requires accurate information on the abundance of malaria vectors. However, estimation of vector abundance and monitoring of vector population dynamics should rest on field sampling plans designed for an acceptable precision level, so that the data are reliable and valid, i.e., close to the true values. Surveying biting malaria vector abundance in a house can be achieved through a variety of sampling methods, such as the indoor pyrethrum spray catch (PSC) method (WHO 1975, Mnzava and Kilama 1986, Githeko et al. 1993, Ribeiro et al. 1996, Lindblade et al. 2000, Minakawa et al. 2002a, van der Hoek et al. 2003), the human-baited catch method (Somboon et al. 1995, 1998, Le Goff et al. 1997), and light traps (Githeko et al. 1994, Davis et al. 1995, Mbogo et al. 1999). One common question facing field medical entomologists is what sample size is appropriate to estimate vector abundance with a targeted precision level. Although sample size may be limited by labor, sensitivity of collection methods, and other logistics, most published studies on malaria vector ecology use arbitrary sample sizes and often overlook statistical considerations. Furthermore, the environmental factors surrounding the houses may be spatially heterogeneous and may have significant effects on the spatial distribution of the vectors. For example, the number of sleepers, the house roof materials (grass thatch, iron, or tile roof; Carter et al. 2000, van der Hoek et al. 2003), and use of insecticide-impregnated bed nets have significant effects on the number of mosquitoes caught by the PSC method (Bogh et al. 1998, Yohannes et al. 2000, Guillet et al. 2001, Jawara et al. 2001). Availability of mosquito larval habitats and distance to larval habitats also affect mosquito abundance (Sadanandane et al. 1993, Minakawa et al. 2002b).
Thus, sample size determination needs to take environmental heterogeneity into consideration.
Prior information on mosquito distribution pattern is useful for developing adequate mosquito sampling plans (Reisen and Lothrop 1999). The negative binomial distribution is commonly used to describe aggregated distributions of insect populations (Southwood and Henderson 2000); however, insect distribution may vary among seasons, and the typical negative binomial distribution may not describe the distribution pattern well. Several empirical methods have been proposed to formulate sampling plans that do not require knowledge of the population distribution (Southwood and Henderson 2000). For example, Taylor’s power law can be used to determine sample size and to design a sampling plan based on the sample mean and variance (Taylor 1961). Indeed, this method has been frequently used for agricultural insect pest management (Pichett and Gilstrap 1986, Ferrer and Shepard 1987, Smith and Hepworth 1992, Cho et al. 1995, Nestel et al. 1995, Naranjo et al. 1997, Sanchez et al. 2002) and mosquito sampling plan development (Service 1971, Mackey and Hoy 1978, Sandoski et al. 1987, Ritchie and Johnson 1991, Pitcairn et al. 1994, Reisen and Lothrop 1999, Lindblade et al. 2000). Pitcairn et al. (1994) used Taylor’s power law and Iwao’s patchiness regression methods to measure spatial distribution patterns of Anopheles freeborni Aitken and Culex tarsalis Coquillett larvae in rice fields in northern California, and they found that the degree of aggregation for both species was highest among the first instars. Reisen and Lothrop (1999) compared sampling plans that required no prior knowledge of mosquito spatial distribution with those based on known distribution patterns, and they found that a stratified random design based on the known spatial distribution pattern was the most accurate method for estimating Cx. tarsalis adult abundance.
In addition to the mean mosquito abundance per house, information on the proportion of houses with or without malaria vectors is useful for a quick assessment of a vector control measure because this information can be obtained with substantially less labor than the complete count sampling method.
Since the late 1980s, a series of malaria epidemics have occurred in the east African highlands (Lindsay and Martens 1998, Malakooti et al. 1998, Mouchet et al. 1998, Lindblade et al. 1999, Bødker et al. 2000, Shanks et al. 2000). In the western Kenya highlands, the number of districts prone to malaria epidemics increased from three in 1991 to 15 in 2001 (Githeko and Ndegwa 2001), and the primary malaria vector species are An. gambiae, An. arabiensis, and An. funestus (Minakawa et al. 2002a). High case mortality rates as a consequence of low immunity against malaria infection in the human population of the highlands have attracted considerable public attention. The World Health Organization calls for the development of a malaria early warning system (MEWS) for the African highlands. Monitoring the dynamics of vector populations and malaria transmission intensity is essential to the MEWS. Due to high environmental heterogeneity and low mosquito vector abundance in the highlands, estimation of mosquito vector abundance with high statistical confidence may require sampling a large number of houses. However, sampling a large number of houses may incur more expense and thus may not be practical under some circumstances. In such cases, the precision level of the estimation should be determined before the experimentation. The objective of the present study was to determine the spatial distribution patterns of the vector population and to use the information for sample size calculation for subsequent ecological studies. The purpose of sample size calculation was to determine the number of houses required to estimate mosquito abundance within the targeted precision level. Sample size determination is critical for monitoring vector population dynamics and for assessing the efficacy of vector control measures in the highlands.
Materials and Methods
Study Area.
Adult mosquito sampling was conducted in the long rainy season (8–15 May), dry season (18 July–14 August), and short rainy season (11 November–3 December) in 2002, and in the dry season (13–28 February) in 2003, in Iguhu village, Kakamega District, Western Province, Kenya. Iguhu is located at 34°45′ E and 0°10′ N, and the elevation of the study area ranges from 1,420 to 1,600 m above sea level. The first two samplings covered an area of 3 by 3-km²; the study area was expanded to 4 by 4 km² for the last two samplings to obtain information on the spatial heterogeneity of vector abundance in a larger area. The 4 by 4-km² study area consists of ≈2,500 households and a human population of ≈11,000. The 1960–1999 average annual rainfall was 1,977 mm, and the annual mean minimum/maximum temperature is 13.8/28.0°C, with the hottest season in January–February and the coolest season in July–August. The study area covers a valley through which the Yala River runs (Fig. 1). The study area includes a mosaic of land use types. The hills are mostly maize land dotted with patches of tea plantation, and several swamps are located along the Yala River valley. A natural forest is located on the east side of the 4 by 4-km² study area, constituting ≈15% of the total area.
Study area and An. gambiae (A) and An. funestus (B) abundance (number of female mosquitoes per house) distribution during the November 2002 survey.
Mosquito Sampling.
Adult mosquitoes were sampled using the indoor pyrethrum spray collection method (WHO 1975). The number of houses sampled varied among the seasons, ranging from 80 to 300 houses. The number of sampled houses for the first two seasons was arbitrarily chosen because there was no prior information on vector abundance and spatial heterogeneity. As a consequence of study area expansion, the number of sampled houses was increased during the last two sampling periods. Because residential houses are scattered in the study area, houses were randomly selected for vector abundance sampling. Anopheline mosquito abundance (i.e., the number of anopheline female mosquitoes per house) in individual houses was recorded, and the global positioning system (GPS) coordinates of each house were recorded using differential GPS (Hightower et al. 1998).
An. gambiae sensu lato (s.l.) was distinguished from An. funestus s.l. based on morphological characters (Gillies and De Meillon 1968). One hundred mosquitoes in the An. gambiae complex and 50 mosquitoes in the An. funestus complex from the May 2002 collection were randomly selected and further identified using the rDNA-polymerase chain reaction (PCR) method (Scott et al. 1993, Koekemoer et al. 2002).
Spatial Distribution.
The coefficient of variation was calculated for An. gambiae and An. funestus for each sampling season by dividing the sample variance by the sample mean. The spatial distribution of the malaria vectors was tested for goodness-of-fit to the Poisson and negative binomial distributions. The Poisson distribution describes random patterns in which, for any given average abundance (λ) in the study area, the probability of finding x mosquitoes in a house is $$P_x=\frac{\lambda^x}{x!}e^{-\lambda}$$. Parameter λ was estimated from the sample mean. The negative binomial distribution describes aggregated patterns. The probability that a house contains x mosquitoes is calculated by the equation $$P_x=\frac{(k+x-1)!}{x!(k-1)!}p^xq^{-k-x}$$, where k measures the degree of aggregation, p is the ratio of the mean to k ($$p=\bar{x}/k$$), and q = 1 + p. These parameters can be estimated from the sample mean ($$\bar{x}$$) and variance ($$S^2$$) based on the equation $$\hat{k}=\bar{x}/\hat{p}$$, where $$\hat{p}=(S^2/\bar{x})-1$$. The goodness-of-fit of the experimental data to the negative binomial and Poisson distributions was tested using the χ² test (Zar 1999). The negative binomial distribution is accepted only if the probability of being accepted is >0.95. The aggregated distribution of malaria vectors in areas with a high environmental heterogeneity can be described by Taylor’s power law, $$s^2 = am^b$$, where $$s^2$$ is the sample variance and m is mean abundance (Taylor 1961). Parameter a is a scaling factor, and b provides a description of the spatial structure (e.g., the degree of aggregation) of a species in a particular environment (Southwood and Henderson 2000). When the spatial distribution of individuals is random, variance and mean are equal and b = 1. Spatial distribution is considered regular when b < 1 and aggregated when b > 1. The degree of aggregation increases with increasing b value (Taylor 1961, Taylor et al. 1978, Ritchie et al. 1991, Pitcairn et al. 1994, Nestel et al. 1995, Strong et al. 1997, Sanchez et al. 2002). Whether similar b values recur across seasons for the mosquito vectors in the highlands is of particular interest to us. To estimate the parameters a and b, we divided the study area into 20 polygons for the four sampling seasons according to topography (e.g., streams, rivers, ridges, and valleys). The number of sampled houses in each polygon was similar within a sampling season, but it varied among sampling seasons due to the variable total number of houses sampled. Parameters a and b were estimated by linear regression of the log-transformed sample variance on the log-transformed sample mean of abundance in these polygons. The t-test was used to determine whether the estimated b value differed from 1 for each species and each season; the F-test was used to compare the b values among sampling seasons for both An. gambiae and An. funestus (Zar 1999).
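The two estimation steps described above (moment estimates of the negative binomial parameters, and the log-log regression for Taylor’s power law) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `nb_moment_estimates` and `taylor_power_law_fit` are hypothetical, base-10 logarithms are assumed (the text does not state the base, and the base does not affect b), and polygons with a zero mean or zero variance would have to be excluded before taking logarithms.

```python
import math


def nb_moment_estimates(mean, variance):
    """Moment estimates (p_hat, k_hat) of the negative binomial parameters.

    Uses p_hat = S^2 / x_bar - 1 and k_hat = x_bar / p_hat from the text.
    """
    if variance <= mean:
        # p_hat would be <= 0; the data are not over-dispersed.
        raise ValueError("variance must exceed the mean for an aggregated fit")
    p_hat = variance / mean - 1.0
    k_hat = mean / p_hat
    return p_hat, k_hat


def taylor_power_law_fit(means, variances):
    """Fit s^2 = a * m^b by least squares on log10(variance) vs. log10(mean).

    Returns (a, b), with a back-transformed to the original scale.
    """
    xs = [math.log10(m) for m in means]
    ys = [math.log10(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope: aggregation index
    a = 10 ** (y_bar - b * x_bar)       # intercept, back-transformed
    return a, b


if __name__ == "__main__":
    # Synthetic per-polygon means and variances (illustrative only, not study data).
    polygon_means = [0.5, 1.0, 2.0, 4.0, 8.0]
    polygon_variances = [2.0 * m ** 1.5 for m in polygon_means]
    print(taylor_power_law_fit(polygon_means, polygon_variances))
    print(nb_moment_estimates(mean=2.0, variance=10.0))
```

A slope b above 1 from such a fit would indicate aggregation; testing whether b differs from 1 additionally requires the standard error of the slope, which is omitted here for brevity.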
Sample Size Estimation.
The minimum sample size (n) required for detecting the population mean abundance with a targeted precision level and at a significance level α can be determined based on mosquito spatial distribution patterns. The precision level is defined as $$D=SE/\bar x$$, where $$\bar x$$ is the sample mean abundance and SE is the standard error of the mean abundance. The allowable precision level in ecological research is typically 10–25% (Southwood and Henderson 2000). In the case of the Poisson distribution, the minimum sample size (n) can be calculated using the formula $$n = (t/D)^2/\bar x$$, where t is the critical value of the t distribution at the fixed type I error, with the degrees of freedom with which the sample mean ($$\bar x$$) was calculated. If the mosquito spatial distribution is a negative binomial distribution, n can be calculated using $$n = (t/D)^2(\bar x+k)/(\bar x k)$$, where k is the parameter describing the negative binomial distribution. If the mosquito distribution pattern cannot be described by either the Poisson distribution or the negative binomial distribution, n can be calculated based on Taylor’s power law by using the formula $$n = a\bar x^{b-2}(t/D)^2$$, where a and b are the parameters obtained from Taylor’s power law (Southwood and Henderson 2000).
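The three sample size formulas above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the critical value t is passed in by the caller (for example, about 1.96 for a two-sided α of 0.05 with large degrees of freedom, which is an assumption rather than a value stated in the text), and the results are returned unrounded so that the caller can round up to whole houses.

```python
def n_poisson(mean, precision, t):
    """Minimum sample size under a Poisson (random) distribution:
    n = (t / D)^2 / x_bar."""
    return (t / precision) ** 2 / mean


def n_negative_binomial(mean, k, precision, t):
    """Minimum sample size under a negative binomial distribution:
    n = (t / D)^2 * (x_bar + k) / (x_bar * k)."""
    return (t / precision) ** 2 * (mean + k) / (mean * k)


def n_taylor(mean, a, b, precision, t):
    """Minimum sample size from Taylor's power law:
    n = a * x_bar^(b - 2) * (t / D)^2."""
    return a * mean ** (b - 2) * (t / precision) ** 2


if __name__ == "__main__":
    # Illustrative inputs only; a, b, k, and the means are not study values.
    t_crit = 1.96  # assumed large-sample critical value for alpha = 0.05
    for d in (0.1, 0.2, 0.3):
        print(d, n_taylor(mean=2.0, a=3.0, b=1.5, precision=d, t=t_crit))
```

Two properties follow from the formulas and are easy to confirm with these functions: halving D quadruples n in all three cases, and with a = 1 and b = 1 the Taylor formula reduces to the Poisson formula.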
Results
Malaria Vector Temporal Dynamics.
During the four sampling periods, a total of 2,696 female anopheline mosquitoes were collected. An. gambiae s.l. was the predominant malaria vector species, constituting 84.6% of the specimens, whereas 15.4% of the specimens were An. funestus s.l. PCR analysis found that all An. gambiae s.l. specimens were An. gambiae sensu stricto (s.s.), and all An. funestus s.l. specimens belonged to An. funestus s.s. The frequency distribution of the number of anopheline female mosquitoes caught in each house is shown in Table 1. The mean An. gambiae abundance was 8.9 mosquitoes per house in the long rainy season of May, but 1.6–3.2 mosquitoes per house during the dry and short rainy seasons (Table 1). In contrast, An. funestus abundance was 0.6 mosquitoes per house in May and peaked (2.1 mosquitoes per house) in August, 3 mo after the long rainy season. For both species, the coefficient of variation (CV) was >1, suggesting that the distribution of mosquito adults was aggregated. Figure 1 shows an example of mosquito distribution in the November 2002 sampling for An. gambiae and An. funestus. Houses with higher numbers of adult mosquitoes were those within 400 m of the Yala River, near the valley bottom, for both An. gambiae and An. funestus in all four sampling seasons.
Frequency distribution of An. gambiae and An. funestus female adults in each house in four sampling periods and the summary statistics
Malaria Vector Spatial Distribution.
χ² tests rejected the Poisson distribution for An. gambiae and An. funestus in all four sampling seasons, indicating nonrandom distribution of malaria vectors in our study area (data not shown). We used a conservative criterion for accepting the negative binomial distribution: it was accepted only if the probability of being accepted was >0.95. By this criterion, the negative binomial distribution was accepted for An. gambiae in one sampling period (August 2002) and for An. funestus in two sampling periods (May and August 2002). The k values ranged from 0.14 to 0.19 (Table 2).
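For readers who wish to reproduce this kind of goodness-of-fit comparison, the sketch below computes Poisson and negative binomial probabilities and a χ² statistic from observed and expected house counts. It is a hedged illustration, not the authors' procedure: the function names are hypothetical, and the negative binomial is written in the moment parameterization with $$p=\bar{x}/k$$ and q = 1 + p, which is an assumption chosen because it is consistent with the estimators $$\hat{k}=\bar{x}/\hat{p}$$ and $$\hat{p}=(S^2/\bar{x})-1$$ given in the methods. The log-gamma function is used so that non-integer k values (such as those between 0.14 and 0.19) can be handled.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam (lam > 0)."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def neg_binomial_pmf(x, mean, k):
    """P(X = x) for a negative binomial with the given mean and aggregation k.

    Assumed parameterization: p = mean / k, q = 1 + p, and
    P_x = Gamma(k + x) / (x! * Gamma(k)) * p^x * q^(-k - x).
    """
    p = mean / k
    q = 1.0 + p
    log_coef = math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
    return math.exp(log_coef + x * math.log(p) - (k + x) * math.log(q))


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic over matching frequency classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Illustrative values only (not study data): expected number of houses
    # holding 0, 1, 2, or 3 mosquitoes out of 100 sampled houses.
    n_houses = 100
    expected = [n_houses * neg_binomial_pmf(x, mean=2.0, k=0.5) for x in range(4)]
    print(expected)
```

In practice the upper tail is pooled into a single class so that the expected frequencies sum to the number of houses, and the statistic is compared with a χ² critical value whose degrees of freedom account for the number of estimated parameters.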
Test of goodness of fit to the negative binomial distribution for Anopheles adult mosquitoes in Iguhu, Kakamega District, western Kenya highland
The relationship between the log-transformed sample variances and log-transformed sample mean abundances in the polygons fits Taylor’s power law very well for both An. gambiae and An. funestus in all four sampling seasons. The minimum r² of the regression was >0.92 (P < 0.001) for all four sampling seasons and for both species (Table 3). The values of parameter b in Taylor’s power law were significantly >1.0 for both An. gambiae and An. funestus in all four sampling seasons (P < 0.05), indicating aggregated distributions of adult mosquitoes of both species in all four seasons, consistent with the nonrandom distribution indicated by the coefficient of variation. The aggregation of anopheline adults occurred in houses near the Yala River. For example, in the rainy season (May 2002) the average An. gambiae abundance was 14.6 mosquitoes per house among the houses <200 m from the Yala River, and 5.8 and 0.4 mosquitoes per house for houses at a distance of 600–800 m and 1,200–1,400 m from the river, respectively. For An. gambiae, the b values of the two rainy seasons (May and November 2002) were significantly larger than those of the two dry seasons (August 2002 and February 2003) (t = 5.91, df = 39, P < 0.001; Table 3), suggesting that An. gambiae distribution was more aggregated in the wet seasons than in the dry seasons. However, for An. funestus, the b values were similar in the four sampling seasons (F = 2.80; df = 3, 40; P > 0.10), indicating a similar degree of aggregation.
Taylor's power law parameters of An. gambiae and An. funestus distribution
Sample Size Determination.
Because the negative binomial distribution was accepted for the August 2002 sampling for An. gambiae and for the May and August 2002 samplings for An. funestus, the negative binomial distribution was used to calculate the minimum sample size. To achieve a 0.3 precision level with 0.05 type I error for the average mosquito abundance, the estimated minimum sample size is 251 houses for An. gambiae in August and 362 and 325 houses for An. funestus in May and August, respectively. If the targeted precision level is reduced to 0.2, then a minimum of 564 houses is needed for An. gambiae in August, and 814 houses and 732 houses for An. funestus in May and August, respectively.
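As a rough consistency check on these figures (not part of the original analysis), note that under the negative binomial formula the minimum sample size scales with $$1/D^2$$ when t, $$\bar x$$, and k are held fixed. Writing $$n_{D}$$ for the minimum sample size at precision level D (a notation introduced here only for illustration), tightening the precision level from 0.3 to 0.2 multiplies n by

$$\frac{n_{0.2}}{n_{0.3}}=\left(\frac{0.3}{0.2}\right)^2=2.25.$$

Applying this factor to the rounded values above gives $$251\times 2.25\approx 565$$, $$362\times 2.25\approx 815$$, and $$325\times 2.25\approx 731$$, which agree with the reported 564, 814, and 732 houses to within about one house; the small differences presumably reflect rounding of intermediate values.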
Because Taylor’s power law described the aggregated distribution of An. gambiae and An. funestus very well, we determined the relationship between the minimum sample size and average mosquito abundance in residents’ houses at various targeted precision levels by using Taylor’s power law. Figure 2 presents the relationship using the parameter values estimated from the four sampling seasons (Table 3), which provides an approximation to the minimum sample sizes applicable to any sampling seasons. We found that to achieve the 0.2 precision level in the mean mosquito abundance, one needs to survey 69, 251, 274, and 184 houses for An. gambiae (Fig. 2A) and 391, 212, 672, and 603 houses for An. funestus (Fig. 2B) in May, August, and November 2002 and February 2003, respectively. Sample size would be quadrupled if the precision level were reduced to 0.1 from 0.2 for An. gambiae and An. funestus (Fig. 2).
Minimum sample size requirements for malaria vectors of An. gambiae (A) and An. funestus (B) at three precision levels calculated from the Taylor's power law based on parameters for all seasons combined.
Discussion
In this study, we examined the temporal and spatial distribution of malaria vectors in a western Kenya highland site. We found that An. gambiae abundance increased by six- to eight-fold in the long rainy season (May) over the dry seasons (August and February). However, An. funestus was most abundant in August, 3 mo after the long rainy season. For both species, the ratio of sample variance to mean was >1, suggesting that the distribution of adult mosquitoes was aggregated. The estimated b values of Taylor’s power law suggest that the degree of aggregation of An. gambiae was greater in the rainy season than in the dry season. The higher degree of aggregation in the rainy season resulted from the disproportionately large variance in mosquito abundance relative to the mean when the mean abundance was higher. For example, although larval habitats were more limited to the bottom of the Yala River valley in the dry season than in the rainy season (e.g., 82% of the mosquitoes were collected from the area <300 m from the Yala River in August versus 63% in May), the coefficient of variation was lower in the dry season than in the rainy season (5.1 in August versus 22.2 in May).
Several factors likely influence adult mosquito distribution, including availability of suitable larval habitats, distance to larval habitats from residents’ houses, and the house roof types. In a country-wide mosquito survey in Eritrea comprising 302 villages, Shililu et al. (2003) found that at least 80% of the anopheline mosquitoes were collected from grass-thatched houses. In our study site, we found that mosquito clustering occurred around the Yala River. Most houses (83.7%) use iron sheets as roofing materials, and ≈10% of houses use bed nets, but none of the bed nets were treated with insecticides. Moreover, the houses with iron-sheet roofs or with bed nets are randomly distributed in the study site. Thus, house roof type and bed net use are unlikely to be important mechanisms for mosquito aggregation. A more plausible explanation is that the clumped distribution of mosquito larval habitats in the valley bottom along the Yala River causes higher adult abundance in the houses near the river. This is especially true during the dry season, when the larval breeding sites shrink to the bottom of the valley. Minakawa et al. (2002b) examined the relationship between anopheline adult abundance and environmental factors, such as human distribution and abundance, cowshed distribution and abundance, and distance to the nearest larval habitats from the sampled house in a lowland site, and they found that distance from a house to the nearest larval habitats was the only variable that showed a significant correlation with An. gambiae abundance in both rainy and dry seasons. More than 90% of anopheline adults were found in the houses within 300 m of the nearest larval habitats. Similarly, Ribeiro et al. (1996) suggested that the mosquito breeding habitats along an irrigation canal in their study site were the major factor causing the aggregated distribution of An. gambiae.
Topography and land cover have significant effects on the availability of suitable larval habitats in the highlands. The east African highland region contains numerous valleys and basin-like depressions in a plateau where malaria transmission intensity ranges from low to a level as high as in the lowlands (Lindsay et al. 1998). There are far more aquatic habitats in the valley bottom than in the hill areas because rainwater on the hills often runs off and cannot accumulate. We observed a few hoofprints and temporary water ponds in the hilly area, but these habitats generally dried up quickly. The topographical characteristics in the highlands are different from those in the basin region of Lake Victoria. The flat plain in the lake basin region renders mosquito larval habitats widely spread, whereas larval habitat distribution in the highlands is more concentrated along the valley bottom (S. Munga, A.K.G., and G.Y., unpublished data). Land cover can affect the development of anopheline larvae. For example, we did not find any An. gambiae larvae in the aquatic habitats inside a forest in our study site, despite the fact that these habitats were close to residents’ houses, whereas An. gambiae larvae were highly abundant in the surrounding agricultural fields (S. Munga, A.K.G., and G.Y., unpublished data). The canopy cover reduces the exposure of aquatic habitats to sunlight and thus lowers water temperature (Beschta and Taylor 1988) and reduces algal abundance. Whether a habitat contains anopheline larvae also depends on other factors such as human host distribution and the oviposition preference of gravid female mosquitoes (Wekesa et al. 1996). The nonrandom distribution of suitable larval habitats, such as stream edges for An. funestus and cultivated swamps for An. gambiae, can, in turn, affect the aggregated distribution of adult mosquitoes.
We used untransformed, rather than transformed, data to determine the distribution patterns (Mackey and Hoy 1978, Sandoski et al. 1987, Mogi et al. 1990, Pitcairn et al. 1994, Lindblade et al. 2000) because transformation reduces variance and consequently affects the sample size calculation. In other analyses (e.g., to examine the factors regulating mosquito abundance), mosquito abundance data can be transformed (e.g., logarithmically) to normalize the data and stabilize the variance (Southwood and Henderson 2000). In our study, we tested whether the abundance data fit the Poisson or negative binomial distribution; transformation is thus not appropriate. When vector abundance data did not fit the Poisson and negative binomial distributions, we used Taylor’s power law to determine sample size because Taylor’s power law provided a good fit for the relationship between sample variance and mean abundance in our study area for both An. gambiae and An. funestus for all four sampling seasons. Taylor’s power law was used to estimate the minimum sample size required to achieve a desired precision level in average abundance with an acceptable type I error level for the simple random sampling method. We did not use the systematic stratified random sampling method (Keating et al. 2003) because our study area comprises a complex mosaic of land use patterns and topography, so that systematic strata cannot be clearly defined. For the simple random sampling method, we found that the minimum sample size (n) varied among seasons and between species due to large between-species and among-season differences in the average mosquito abundance and in the variance of mosquito abundance. Generally, with the same targeted precision level, fewer houses need to be surveyed for An. gambiae than for An. funestus. If the targeted precision level is 0.2, a total of 274 houses would be required in our 4 by 4-km² area (17 houses per km²) for An. gambiae, and 672 houses (42 houses per km²) for An. funestus, even in the season with the lowest mosquito abundance. However, 63–73% fewer houses can meet the targeted precision level in the long rainy season, when mosquito abundance is much higher for An. gambiae. Similarly, 46–68% fewer houses are required in the dry season than in the long rainy and short rainy seasons for An. funestus. If the targeted precision level is reduced to 0.1 from 0.2, the minimum sample size is nearly quadrupled. Whether the parameters in Taylor’s power law (a and b) obtained from the current study are applicable to sample size estimation in other areas with different ecological or topographic conditions remains to be tested.
Acknowledgements
We thank M. Abuom, H. Atieli, L. A. Atieli, B. Ingosi, W. Makosa, J. Maritim, V. Misigo, D. Musavi, and F. Oduk for field assistance. We are grateful to the two anonymous reviewers for their critical suggestions. This paper is published with permission from the Director, Kenya Medical Research Institute. This study was supported by National Institutes of Health grant R01-AI50243.


