Malaria Host Candidate Genes Validated by Association With Current, Recent, and Historical Measures of Transmission Intensity

Summary
Genetic association within several malaria candidate genes was examined in 24 villages of northeast Tanzania, using different measures of malaria transmission intensity. We demonstrate that the classic hemoglobinopathies were associated with measures of transmission intensity that had longer time scales.

Malaria caused by the parasite Plasmodium falciparum continues to have a significant impact on public health in tropical areas of the world. Although major advances have been made in malaria control, there is increasing parasite drug resistance and mosquito insecticide resistance, and no fully efficacious vaccine exists. A better understanding of the factors associated with immunity to and protection from malaria may contribute to the development of vaccines and other therapies. It is widely recognized that malaria has exerted strong genetic selection on the human genome [1]. Mutations leading to sickle cell trait (hemoglobin S [HbS]) were central to the hypothesis, proposed by Haldane in the 1940s [2], that the high prevalence of hemoglobinopathies in southern European and African populations was due to the protection against malaria in carriers of these traits. This hypothesis has subsequently been validated for several hemoglobinopathies, indirectly by the concordance between the prevalence of malaria parasites and frequencies of mutated alleles [3,4] and directly through protection from disease in carriers [4][5][6][7]. The different geographic distributions of HbS, α-thalassemia, glucose-6-phosphate dehydrogenase (G6PD) deficiency, ovalocytosis, and the Duffy-negative blood group are examples of the general principle that different genetic variants have arisen and been selected in different populations (see [8] for a review). The most striking example is the β-globin HBB gene, in which 3 different coding single-nucleotide polymorphisms (SNPs) confer protection against malaria: Glu6Val (HbS), Glu6Lys (hemoglobin C), and Glu26Lys (hemoglobin E). The mechanisms by which the various hemoglobinopathies protect individuals from malaria are not fully understood [9,10].
There is increasing evidence that other traits, particularly those relevant to immunity and inflammation, may also be selected by malaria infection; these include polymorphisms in genes encoding tumor necrosis factor (TNF, MHC class III region; reviewed in [11]); Toll-like receptors (TLR4, TLR9) [12]; CD40 ligand (CD40L) [13]; interferon γ (IFNG; reviewed in [14]); and nitric oxide synthase type 2 (NOS2A; reviewed in [15]). These associations are consistent with the observation that disease severity (and thus malaria-related mortality risk) is associated with the strength of the inflammatory response to malaria infection [16].
The bulk of the evidence linking genotype to malaria susceptibility stems from studies of severe and/or complicated disease, where the genetic association might be the strongest owing to the close temporal relationship between genotype, consequence (infection or inflammation), and effect (disease or death). Conversely, genetic traits that reduce the risk of becoming patently infected with malaria [17], such as those operating at the pre-erythrocytic stage of infection, are less likely to be detected because their effect is too small or because controls tend to be drawn from those with patent but nonsevere infection. However, these traits may become apparent in large population-based studies. Landmark studies by Flint et al [4,5] in Melanesia were the first to use the natural cline in malaria, associated with latitude and altitude, to demonstrate a positive correlation between malaria transmission and the prevalence of α-thalassemia, an observation replicated in northeast Tanzania, where a negative correlation was observed between altitude (as a proxy for malaria transmission) and the prevalence of HbS and α-thalassemia heterozygotes [3]. Integrated approaches have now shown that the frequency of different genetic polymorphisms varies with malaria transmission intensity at a regional and at a global scale [18][19][20].
Malaria transmission intensity can be defined in a variety of ways, including the incidence of clinical disease, the frequency of being bitten by infected mosquitoes, and the environmental suitability for transmission [21]. These definitions are linked to different epidemiological measures that capture distinct aspects of transmission, including risk of exposure (geographic, climatic, and mosquito-related variables), actual exposure (serological data), infection (parasitological variables) and disease (clinical data). These measures are associated with differing time scales, ranging from generations (for risk) to years (for exposure), months or weeks (for infection) or days (for disease). In turn, these different measures may be associated with different genetic markers, depending on the clinical or parasitological consequences of the respective genetic effects. In the current study, we combine a candidate-gene approach with different measures of malaria transmission to investigate malaria genetic associations in populations living in northeast Tanzania.

Ethical Approvals
Ethical approval was obtained from the London School of Hygiene and Tropical Medicine, Kilimanjaro Christian Medical College in Tanzania, and the Tanzanian National Medical Research Institute.

Study Sites
The study was conducted in the Kilimanjaro and Tanga regions of northeast Tanzania. Cross-sectional age-stratified malariometric surveys were conducted in 24 villages in 6 transects (Figure 1 and Table 1) after the short rainy season in November 2001 and again after the long rains the following June (2002), as described elsewhere [22]. These surveys generated between 194 and 435 samples from unique individuals 1-45 years old in each village. To control for population structure, each transect was selected to approximately represent a specific ethnic group (Table 1).
A finger-prick blood sample was collected from each participant for assessment of parasitemia, antimalarial antibody responses and genotype. Written consent was provided by all participants or by their guardians, and clinical illnesses were treated in accordance with national guidelines.

Blood Slide Examination
The presence of P. falciparum parasites was determined by microscopic examination of Giemsa-stained blood smears. Asexual (blood-stage) parasite density was calculated relative to 200 white blood cells [22]. Each slide was read by 2 independent microscopists, and noncongruent readings were resolved by a third reader.

Seroconversion Rate Calculation
Serological characterization of the samples through enzyme-linked immunosorbent assays and calculation of the seroconversion rate (SCR) (ie, the rate at which individuals become seropositive per year) have been published elsewhere [23]. Briefly, indirect enzyme-linked immunosorbent assays were used to detect immunoglobulin G antibodies to the blood-stage malaria antigens merozoite surface protein 1₁₉ (K1-Wellcome genotype) and apical membrane antigen 1 (3D7), produced as described elsewhere [24,25]. Antibody responses in serum from Europeans with no previous exposure to malaria were used to define a cutoff for seropositivity (mean optical density plus 3 standard deviations) for each antigen. Seroprevalence was calculated for each village as the respective proportion of seropositive individuals. SCRs were estimated for each antigen and village using reversible catalytic models, in which each individual transits between seronegativity and seropositivity over time under the assumption of stable and constant malaria transmission intensity.
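The seropositivity cutoff and the reversible catalytic model described above can be illustrated with a minimal sketch. It assumes the standard form of the model, in which expected seroprevalence at age a is λ/(λ + ρ) × (1 − exp[−(λ + ρ)a]), with λ the SCR and ρ a seroreversion rate. The function names, the use of the sample standard deviation, the grid search, and the choice to hold ρ fixed are illustrative assumptions, not the fitting procedure used in [23].

```python
import math
from statistics import mean, stdev


def seropositivity_cutoff(negative_control_ods):
    """Cutoff = mean optical density of unexposed controls + 3 SD.

    Assumption: the sample standard deviation is used.
    """
    return mean(negative_control_ods) + 3 * stdev(negative_control_ods)


def expected_seroprevalence(age, scr, srr):
    """Expected proportion seropositive at a given age (years).

    scr: seroconversion rate per year (lambda)
    srr: seroreversion rate per year (rho), a hypothetical parameter name
    """
    total = scr + srr
    return scr / total * (1.0 - math.exp(-total * age))


def log_likelihood(scr, srr, age_groups):
    """Binomial log-likelihood for (age, n_positive, n_tested) tuples."""
    ll = 0.0
    for age, n_pos, n_tested in age_groups:
        p = expected_seroprevalence(age, scr, srr)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep the logarithms finite
        ll += n_pos * math.log(p) + (n_tested - n_pos) * math.log(1 - p)
    return ll


def fit_scr(age_groups, srr, grid=None):
    """Crude grid-search maximum likelihood estimate of the SCR.

    Assumption: srr is held fixed; a real analysis would estimate both rates.
    """
    if grid is None:
        grid = [i / 1000 for i in range(1, 1001)]
    return max(grid, key=lambda scr: log_likelihood(scr, srr, age_groups))
```

Under this form of the model, seroprevalence rises with age toward the plateau λ/(λ + ρ), which is why age-stratified sampling is needed to estimate the SCR.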

DNA Extraction and Genotyping
DNA was extracted from the archived, pelleted blood samples using Nucleon kits (Hologic) according to the manufacturer's instructions. SNP genotypes were determined by means of primer-extension mass spectrometry using the Sequenom iPlex MassARRAY platform (Agena Bioscience) at the Wellcome Trust Centre for Human Genetics in Oxford. The α3.7-thalassemia deletion was identified using polymerase chain reaction amplification, as described elsewhere [3]. A total of 275 SNP assays were designed, incorporating a core set of 65 SNPs (as described elsewhere [26]), a further 137 autosomal SNPs selected in genes reported to be associated with antibody production [27], and another 73 SNPs identified in a previous study [28].

Statistical Analysis
SNPs were removed from the analysis if monomorphic, if the allele frequency was <1%, if >10% of their genotypes were missing, or if there was evidence of extreme deviation from Hardy-Weinberg equilibrium (P value < .001; χ² test). Because the sample and actual population sizes are close in many villages (mostly those in the highlands owing to their geographic isolation), genetic association analyses were conducted using aggregated data from each village: mean altitude, parasite prevalence, SCR, genotype distributions, male-female ratio, and transect of residence. Genotype frequencies of each SNP were summarized for each village and then rescaled using the log-additive transformation [29], with the heterozygous genotype frequencies used as reference, that is, (log[f_AA/f_Aa], log[f_aa/f_Aa]), where f_AA and f_aa are the village-level frequencies of the 2 homozygous genotypes and f_Aa is that of the heterozygous genotype.
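The quality control rules and the rescaling step can be sketched as follows. This is an illustration under stated assumptions: "allele frequency" is read as minor allele frequency, the Hardy-Weinberg test is the 1-degree-of-freedom χ² test on genotype counts, and the log-additive transformation is taken to be the log ratio of each homozygous genotype frequency to the heterozygous frequency. All function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Apply the four exclusion rules to the genotype counts of one SNP."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    p = (2 * n_aa + n_ab) / (2 * n)
    maf = min(p, 1 - p)
    if maf == 0:  # monomorphic
        return False
    if maf < 0.01:  # allele frequency below 1%
        return False
    if n_missing / (n + n_missing) > 0.10:  # more than 10% missing genotypes
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= 0.001


def log_ratio_transform(f_aa, f_ab, f_bb):
    """Log ratios of homozygous to heterozygous genotype frequencies.

    Assumption: all three frequencies are strictly positive.
    """
    return (math.log(f_aa / f_ab), math.log(f_bb / f_ab))
```

For example, `log_ratio_transform(0.25, 0.5, 0.25)` returns two equal values of log(0.5), because both homozygous genotypes are half as frequent as the heterozygous genotype.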
Missing genotype frequencies of the α-thalassemia locus from 11 villages [30] plus missing SCRs from 3 other villages (Table 1) were jointly imputed via multiple imputation using chained equations [31]. Multiple imputation was performed at the village level and not at the individual level as described elsewhere [30]. Altitude, log odds of parasite rate, sex log ratio, and transect of residence were included in the imputation models as fixed covariates. Postimputation estimates and standard errors were based on 100 imputed data sets and determined as described elsewhere [32]. Evidence for genetic association of a given SNP with each malaria transmission measure was assessed by multivariate linear regression based on a multivariate normal distribution, using the log-additive genotype distributions as dependent variables and the logarithm of the male-female ratio, the transect of residence, and the different malaria transmission measures as covariates. Models were tested with SCR and altitude on either a linear or a logarithmic scale, choosing the scale that provided the best fit for the data. Log odds of parasite prevalence were used in the analysis. The Wilks likelihood ratio was then used to compare models with or without a given malaria transmission measure. For consistency, −log10(P value) was considered a measure of statistical significance. High values of this measure provided evidence for an association between a given genetic marker and the respective transmission intensity measure. Because the data refer to a set of candidate genes for which there is a priori evidence of association with malaria susceptibility, a significance level of 1% (ie, a −log10[P value] of 2) was used in each individual association test. Finally, the fitted regression models were checked by residual analyses (eg, normality checks or trends in the residuals).
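As an illustration of how estimates from the 100 imputed data sets might be combined, the sketch below applies Rubin's rules, a common pooling method for multiple imputation. The text defers to [32] for the method actually used, so the choice of Rubin's rules and the function name are assumptions.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, std_errors):
    """Pool one coefficient across imputed data sets using Rubin's rules.

    estimates:  the coefficient estimated in each imputed data set
    std_errors: the corresponding standard errors
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates) if m > 1 else 0.0  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

The between-imputation term inflates the standard error when the imputed data sets disagree, which is consistent with the larger uncertainty expected for villages whose α-thalassemia frequencies or SCRs were imputed.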
A principal component analysis of altitude, SCR, and parasite rate enabled the derivation of alternative "latent" measures of malaria transmission. Genetic association was assessed again, as described above. Association analyses involving X chromosome SNPs were performed on male and female data separately. The underlying false discovery rates were finally estimated for the different analyses (Supplementary Table S1). These rates ranged from 0.08 to 0.19; thus, one could not rule out that some of the detected association signals, namely, those close to the significance threshold, were due to chance. All analyses were carried out with R software, using 2 packages: MICE (for data imputation) and Genetics (for genetic analysis) (see https://cran.r-project.org/).
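The derivation of latent transmission measures can be sketched with a small principal component analysis of village-level altitude, SCR, and parasite rate. The sketch assumes that the variables are standardized first (so the analysis operates on the correlation matrix) and uses power iteration with deflation in place of a full eigendecomposition. The text does not specify either choice, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Center and scale one variable (assumes it is not constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(x - mu) / sd for x in values]


def correlation_matrix(columns):
    """Correlation matrix of the variables, one list of village values each."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)), v


def principal_components(columns, n_components=2):
    """Return (fraction of variance explained, loadings) per component."""
    matrix = correlation_matrix(columns)
    k = len(matrix)
    total = sum(matrix[i][i] for i in range(k))
    results = []
    for _ in range(n_components):
        value, vector = leading_eigenpair(matrix)
        results.append((value / total, vector))
        # Deflate: remove the component just found before extracting the next.
        matrix = [[matrix[i][j] - value * vector[i] * vector[j]
                   for j in range(k)] for i in range(k)]
    return results
```

Village scores on each component are then the dot product of the standardized variables with the corresponding loadings, and these scores can replace a single transmission measure as the covariate in the association models.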

RESULTS
A total of 8241 samples were genotyped for an initial set of 275 SNPs by Sequenom technology. After implementation of quality control protocols, data from 8096 individuals across 175 high-quality SNPs were available for the final analysis. For α3.7-thalassemia, 2997 samples were successfully genotyped from 13 villages (Mgome in Tanga and 4 villages each from Kilimanjaro, South Pare, and West Usambara 2), and data imputation was performed in the remaining 11 villages.

Malaria Transmission
Genetic variation analysis confirmed previous observations of significant inverse gradients of α-thalassemia and HbS frequencies with altitude (−log10[P value], 5.30 and 4.33, respectively; Table 2 and Supplementary Figure S1A) [3]. The odds of being a noncarrier versus an HbS carrier increased by approximately 0.16 per 100-m increase in altitude, whereas the corresponding odds for noncarriers of the α-thalassemia trait increased by approximately 0.08 per 100 m when imputed data were used. Similar genetic effects for the α-thalassemia trait were obtained from the complete data of 13 villages but with increased standard errors. Strong associations were found between genotype frequency of these 2 traits and SCR and parasite prevalence (−log10[P value], >2.61; Table 2 and Supplementary Figure S1B and S1C). Significant associations were detected between all malaria transmission measures and SNP rs3211938 in CD36 (−log10[P value], >2.58; Table 2 and Supplementary Figure S1) and several point mutations within the X-linked G6PD locus (Table 3 and Supplementary Figure S2). For G6PD, the associations were most pronounced among female participants and for current transmission intensity (parasite prevalence) compared with historical or recent measures of malaria transmission. These associations are in agreement with recent findings from the same area, showing that heterozygous women are more protected against severe malaria than men [33].

Association Between Variation Within Immune Response Genes and Current Malaria Transmission Levels
Variation in the frequency of a number of SNPs located within immune response-associated genes was statistically associated with parasite rate (Table 2). These SNPs included 5 within DDC (which encodes dopa decarboxylase/aromatic L-amino acid decarboxylase, an essential component of the dopamine/serotonin/tryptamine pathway); SNPs in genes encoding interleukin 3, interleukin 13, and CTLA-4; and 2 SNPs in PDLIM4 (believed to play a role in bone development and homeostasis). Note, however, that the signals of association are moderate compared with those for sickle cell, α-thalassemia, and G6PD.

Association With Principal Components of Malaria Transmission Measures
Although altitude, SCR, and parasite prevalence capture different time scales of malaria transmission, these measures were highly correlated with each other (Supplementary Table S2). Altitude was inversely correlated with both SCR and parasite rate (R < −0.688), whereas SCR was positively correlated with parasite rate (R = 0.568). These correlations were further explored using 2 principal components that accounted for 95% of the total variation of these measures (Supplementary Table S2). These principal components could be interpreted as 2 independent measures of the malaria propensity of a village across historical, recent, and current infection. The respective derived data were then tested for genetic association (Figure 2 and Table 4). HbS, α-thalassemia, CD36, and several SNPs in the G6PD locus (in either female or male participants) were associated with the first principal component, and genetic markers at the DDC and PDLIM4 loci showed borderline associations with it. For the second principal component, there was an enrichment of immune response genes (IL3, TNF, TLR4, and CR1), although the corresponding associations were not strong (2.00 < −log10[P value] < 3.00). Of note, the genetic variation at the G6PD locus could be explained by both principal components.

DISCUSSION
Geographic variation in the frequency of hemoglobinopathies provided the first evidence that these traits might protect against disease or death caused by malaria. They tend to be rare in areas of low malaria transmission and more common in areas of higher transmission. In this study, we have used local altitude-dependent variations in malaria transmission intensity to both confirm and identify additional human genetic polymorphisms relevant for individuals living in the study area. The present study focused on a set of genetic polymorphisms previously implicated in resistance to malaria. Replication of many of these associations has been difficult because, although some polymorphisms are likely to be genuinely (and causally) associated with malaria resistance, other associations may have arisen by chance (eg, in small studies or owing to population structure or instability of malaria infection measures) or may be closely linked to causal variants in some populations but not in others [34]. Possible population structure was controlled by a study design comprising transects that represented specific ethnic groups. Well-matched age distributions and sample sizes ensured similar precision of the aggregated data across villages.
The use of altitude and SCR in the genetic analysis reduced the chance of detecting sporadic associations due to the intrinsic instability of the parasite rate in estimating the underlying malaria transmission intensity. Altitude, through the effect of temperature on sporogonic development of malaria parasites in the mosquito, is a stable proxy for historical, recent, and ongoing malaria exposure stretching back to the original settlement of these villages some 4000-5000 years ago [35].
SCR is a more direct measure of exposure to infection but is estimated from a mathematical model that assumes constant and stable malaria transmission intensity over time. Such a model, although fitting the data well, might mask possible slow (linear) trends in disease transmission intensity occurring over time [36]. Estimates of SCR should then be seen as averages of recent transmission intensities. This averaging effect might reduce the power to identify genetic associations with more subtle effects. Notwithstanding this limitation, SCR analysis suggests that malaria transmission intensities in the study villages could be considered approximately stable for about 40 years before sampling in 2001 [23], and probably before this [37]. The same likely holds true for the resulting genetic association due to malaria, given that there have been no major migrations or admixture events in the recorded history of this area.
The approach presented here was validated by strong genetic associations of HbS, α-thalassemia, and G6PD deficiency, each of which showed marked variation with the different malaria transmission intensities. As anticipated, geographic variation in the frequency of sickle cell trait and α-thalassemia was most highly correlated with altitude and is thus a stable marker of malaria exposure over many generations. Moreover, both these traits were also highly associated with SCR and parasite prevalence, suggesting stable genetic associations over time. Interestingly, G6PD deficiency was most significantly associated with parasite rate (more strongly in female than in male participants) rather than with altitude or SCR. This may reflect the fact that mutations in this gene protect from severe clinical outcomes rather than infection per se [2].
Given the geographic variation in the prevalence of classic malaria resistance traits and their strong association with transmission intensity, it was surprising that relatively few of the other "malaria-associated" polymorphisms showed any such variation and/or association. Of the 70 genes analyzed, only 6 (CD36, DDC, IL3, IL13, CTLA4, and PDLIM4) showed any direct association with malaria transmission measures. Interestingly, polymorphisms in all of these loci were significantly associated with parasite prevalence, but only 2 (CD36 and DDC) were associated with altitude and only 1 (CD36) with SCR. Although located in the same chromosome, these genes are not linked to each other. Three other loci (CR1, TNF, and TLR4) showed borderline associations with principal component 2 but not with any particular measure of malaria transmission. Village-level prevalence of the CD36 polymorphism (rs3211938) was associated with all 3 transmission intensity measures and with the principal component representing putative long-term effects of malaria exposure. CD36 is a ubiquitously expressed scavenger receptor and a major receptor for the Plasmodium falciparum erythrocyte membrane protein 1 family of erythrocyte surface proteins, responsible for sequestration and rosetting of malaria-infected red blood cells [38]. Although an early study suggested an association between CD36 mutations and susceptibility to severe malaria [39], this link has not been substantiated, and an extensive multiple-country analysis indicated that the high prevalence of the rs3211938 polymorphism in African populations is most likely maintained by factors other than malaria [40].
However, the consistent association of rs3211938 with 3 different measures of malaria transmission intensity suggests that CD36 may indeed contribute to protection against malaria, but perhaps through its role in macrophages as a nonopsonic mediator of phagocytosis or through its role in hemostasis and thrombosis, rather than as an endothelial receptor for infected erythrocytes.
Several linked polymorphisms in the DDC locus were associated with parasite prevalence and the principal component pertaining to variation between intermediate and current malaria exposure. This result is in agreement with 2 recent studies from Tanzania [28] and The Gambia [41] that detected a possible role for DDC in severe malaria. Because these various polymorphisms are all in linkage disequilibrium with each other, the corresponding genetic effect might reflect haplotypes protecting from malaria, as demonstrated for the G6PD locus in a study from Tanzania [33]. A detailed analysis of individual haplotype associations was beyond the scope of the current work.
Genetic variations in several immune-response genes (CR1, TNF, TLR4, IL3, IL13, and CTLA4), as well as in RAD50, SLC22A4, and PDLIM4, were found to be moderately associated with parasite prevalence and the alternative measure of current transmission intensity based on the second principal component. Because these polymorphisms were associated only with these measures of transmission intensity, one can question their possible replication in future studies. In this regard, parasite rate can be affected by the study design used, time of sampling, seasonal effects on malaria transmission, and sensitivity/specificity of the diagnostic test. Therefore, this measure should be considered less robust than altitude and SCR, which capture longer time scales of transmission intensity.
In summary, genetic associations concerning common red blood cell polymorphisms were confirmed using different measures of malaria transmission intensities. The remaining associations would seem less robust and less likely to be replicated, because they are related to parasite rate and an alternative measure of current transmission intensity. The adopted approach therefore shows the advantage of finding genetic associations that are likely to persist over time.

Supplementary Data
Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.