GWAS of self-reported mosquito bite size, itch intensity and attractiveness to mosquitoes implicates immune-related predisposition loci

Abstract Understanding the interaction between humans and mosquitoes is a critical area of study due to the phenomenal burdens on public health from mosquito-transmitted diseases. In this study, we conducted the first genome-wide association studies (GWAS) of self-reported mosquito bite reaction size (n = 84,724), itchiness caused by bites (n = 69,057), and perceived attractiveness to mosquitoes (n = 16,576). In total, 15 independent significant (P < 5×10−8) associations were identified. These loci were enriched for immunity-related genes that are involved in multiple cytokine signalling pathways. We also detected suggestive enrichment of these loci in enhancer regions that are active in stimulated T-cells, as well as within loci previously identified as controlling central memory T-cell levels. Egger regression analysis between the traits suggests that perception of itchiness and attractiveness to mosquitoes is driven, at least in part, by the genetic determinants of bite reaction size. Our findings illustrate the complex genetic and immunological landscapes underpinning human interactions with mosquitoes.


Introduction
Being bitten by mosquitoes is a common, but generally minor irritation. However, mosquitoes are vectors for many infectious agents that affect humans and have the potential to transmit deadly diseases like malaria, dengue, yellow fever and Zika virus-associated microcephaly (1,2). Understanding the interactions between mosquitoes and humans, including genetic factors, may lead to the development of methods to reduce the spread of mosquito-borne infections and predict risk of infection.
Following a mosquito bite, mosquito salivary antigens elicit a cutaneous hypersensitivity reaction (3), predominated by mast cell degranulation of pruritic mediators, lymphocyte recruitment and localised inflammation (4). Longstanding evidence has suggested a natural history of sensitisation and subsequent desensitisation to mosquito saliva antigen in humans (5). As well as this within-individual variation, levels of mosquito antigen hypersensitivity are known to vary between individuals with bite reaction sizes ranging from mild wheals to large papules. It has been suggested that bite reaction size correlates with the level of pruritus. Previous studies have also shown that mosquitoes prefer to bite some individuals over others who are in close proximity (6). Variation in host attractiveness to mosquitoes is thought to be mediated by body odour composition. Mosquitoes have evolved odorant receptors that are acutely sensitive to compounds unique to human odour (7). Twin studies have suggested that host attractiveness to mosquitoes is a highly heritable trait (8), suggesting that there may be a genetic basis to both hypersensitivity to bites and attractiveness, but no factors have been so far described (8,9).
To identify genetic factors related to mosquito bite traits, we analysed data from three self-reported phenotypes: mosquito bite reaction size (hereon described as 'bite size'), itchiness caused by mosquito bites, and perceived attractiveness to mosquitoes. We identified a high degree of pleiotropy between these traits, and our GWAS identified a collective total of 15 independent genome-wide significant (GWS) loci, all of which map to immune genes. Further annotation with epigenetic functional data and mining of immune cell phenotyping datasets identified evidence for a prominent role for activated T cells, and overlap with predisposition factors identified for allergic diseases. Using Egger regression, we demonstrate evidence for a causal relationship between bite size and perception of both itch intensity and self-reported attractiveness to mosquitoes.

Description of GWAS studies and cohort
We designed three questionnaires to capture variance in mosquito bite size, itch intensity of mosquito bites and perceived attractiveness to mosquitoes (Supplementary Material, Note), and deployed them to research participants and customers of 23andMe Inc., a personal genetics company (10) that had previously developed a range of health and lifestyle surveys. Bite size was captured using one web-based questionnaire on a six point scale (Supplementary Material, Table S1); itch intensity was captured using one web-based questionnaire on a four point scale (Supplementary Material, Table S2); and mosquito attractiveness was captured using one web-based questionnaire (Supplementary Material, Table S3) where participants scored their perceived attractiveness to mosquitoes as being either more or less attractive, compared to other people. Participant responses of 'I'm not sure' to any question were removed from phenotypic and genotypic analyses. Table 1 illustrates final participant numbers for each of the three mosquito-related traits, broken down by gender and age.
Cross-correlation of phenotypic responses using linear regression (Materials and Methods) for bite size and itch intensity indicated a high positive correlation (adjust_R ¼ 0.51) after adjusting for age and sex, implying each variable explains approximately 26% (adjust_R 2 ¼ 0.26) of the variability of the other (Supplementary Material, Table S4 and Fig. S1). Cross-comparison of mosquito bite size and mosquito attractiveness phenotypes (Supplementary Material, Table S5 and Fig. S2) indicated a positive correlation (adjust_R ¼ 0.49) after adjusting for age and sex, implying that 24% (adjust_R 2 ¼ 0.24) of the variance is explained by each other. The cross-comparison between itch intensity and attractiveness phenotypes (Supplementary Material, Table S6 and Fig.  S3), showed a high positive correlation (adjust_R ¼ 0.63) after adjusting for age and sex, implying that over 39% (adjust_R 2 ¼ 0.39) of the variance is explained by each other.
Mosquito bite size data were analysed by linear regression (Materials and Methods) and after applying quality controls we identified 10 GWS (P < 5Â10 À8 ) associations, as illustrated in the Manhattan plot (Fig. 1). Quantile-Quantile plots for this and all other GWAS are provided. Index single nucleotide polymorphisms (SNPs) and nearby genes are listed in Table 2 (regional plots are provided in Supplementary Material, Note, and association test results are provided in Supplementary Material, Data S1). Bite size was scored in a positive direction, meaning the effect size identified for the most significant association (rs377070, P ¼ 1.2Â10 À36 , effect size -0.07) predicted a 0.07 point reduction in size on a 6-point ordinal scale.
Since we observed a high correlation between the bite size and itch intensity traits, to enrich for genetic factors specifically driving the itch response, we conducted a GWAS using linear regression and adjusting for bite size (Materials and Methods). In addition and differing to the analysis of other traits reported, we sought to reduce any genetic contribution from immunerelated loci that may play a transitive role in underlying predisposition to immune conditions, hypothesising that this may reveal drivers for the sensory aspect to perception of itch. To address this, we excluded data from 15,667 individuals who selfreported as positive for any one of the 19 autoimmune-related conditions (Supplementary Material, Table S7), reducing the total dataset to 69,057 individuals (Supplementary Material, Table S8). GWAS analysis revealed six GWS associations (P < 5Â10 À8 ) ( Fig. 2 and Table 3, Supplementary Material, Note and Data S1). We consider this maximally adjusted itch GWAS as the most representative dataset describing the genetic determinants for itch response to mosquito bites. Regarding the effect magnitude for the itch intensity GWAS, the most significant association at 5q31.1 (rs2248116, P ¼ 3.5Â10 À21 , effect size; 0.034) corresponds to a 0.034 point reduction in itch on a four point ordinal scale, as this phenotype was scored in a negative direction. Results from the interim itch intensity GWAS without and with bite size adjustment are provided in Supplementary Material, Figures S4 and S5, Tables S9 and S10, respectively.
We also carried out ordinal regression analyses on the bite size and itch intensity datasets, using proportional and partial proportional odds models on associations P < 1Â10 À5 (Materials and Methods). Compared to the results determined using linear regression, ordinal analysis produced highly concordant p values for itch intensity and bite size (Supplementary Material, Fig. S6), demonstrating that both approaches were similarly powered for analysing ordinal traits.
Inspection of participant responses revealed that females (in comparison to males) were more likely to report a severe itch response rather than a mild or non-noticeable itch response, and chose the largest mosquito bite size answer option. Females were also more likely to report being more attractive to mosquitoes than others, compared to males (Pearson's chisquared test, P < 2.2Â10 À16 for all). We explored this gender bias further in the itch intensity GWAS by conducting two additional linear regression analyses, using data from male (n ¼ 41,343, Supplementary Material,  Table S14, Data S1). An association at 6p21.32 was the only locus among the five GWS associations to survive an interaction test (P ¼ 0.0045, significant against five tests). A comparison of the effect sizes suggested that this locus in the HLA region confers an approximately three-fold greater effect on itch intensity in females compared to males (effect size -0.040 versus -0.017 respectively, Supplementary Material, Table S15).
We conducted a GWAS for mosquito attractiveness using logistic regression on 16,576 unrelated European individuals (Supplementary Material, Table S3), grouped into cases (less attractive, n ¼ 7,296) or controls (more attractive, n ¼ 9,280), and identified three GWS associations (Fig. 3, Table 4, Supplementary Material, Data S1 and Note). Attractiveness was scored in a negative direction; an odds ratio >1 indicates that the effect allele confers a protective effect against being bitten (reduced attractiveness), and an odds ratio <1 indicates an increased perceived risk of being bitten of being bitten (increased attractiveness).
A comparison of effect size and P values between traits for the top index significant SNPs of each trait (mosquito bite size variation, itch intensity from mosquito bites and perceived attractiveness to mosquitoes) is shown in Supplementary Material, Table S29, where the summary statistics for the top hits of each trait are highlighted in gold, and the corresponding summary statistics for the other two traits are shown in adjacent columns as labelled. While it can be seen that many loci are significant across each trait, there are some which are specific to both bite size and itch intensity.

Heritability and cross-trait correlations
We investigated the heritability of the measured traits by applying LD Score regression (11) on GWAS summary statistics to determine the fraction of heritability (cumulative variance) explained by the GWS loci (P < 5Â10 À8 ) that were identified by each mosquito-trait GWAS (Supplementary Material, Table S16, Materials and Methods).
For mosquito bite size, we calculated the heritability to be 5.7% (h 2 ¼0.0572, 0.01 SE), with GWS bite size loci explaining approximately 13.1% of this variance (or 0.75% of the total variance). For itch intensity, the heritability was 4.3% (h 2 ¼0.0425, 0.01 SE), with GWS itch loci accounting for approximately 10% of this variance (0.43% of the total). For mosquito attractiveness the heritability was 9.1% (h 2 ¼ 0.091, 0.04 SE), with GWS attractiveness loci accounting for 11.1% of this variance (1.0% of the total) (Materials and Methods). Cross-trait LD Score correlation analysis (12) identified a positive genotypic correlation between mosquito bite size and itch intensity (R ¼ 0.56, 0.14 SE, P ¼ 7.14Â10 À5 , Supplementary Material, Table S16, Materials and Methods). Genetic determinants for mosquito attractiveness were also positively correlated with mosquito bite size (R ¼ 0.97, 0.19 SE, P ¼ 5.18Â10 À7 ) and itch intensity (R ¼ 0.94, 0.18 SE, P ¼ 8.16Â10 À8 ). Region, cytogenetic band; chr, chromosome; position, build 37 map position of the SNP; SNP quality is average r 2 from imputation; alleles A and B are assigned based on their alphabetical order; EAF (B allele), the frequency of the effect allele, which is denoted here as allele B, across all study participants; effect size, magnitude of effect for the B allele; CI, confidence interval; P, k adjusted significance level; gene context, gene(s) spanning or flanking (<1Mb away from) the index SNP: brackets indicate the position of the SNP, and dashes indicate distance to a flanking gene (À, >1 kb; À, >10kb; À À, >100kb). conditions. The grey horizontal line corresponds to P ¼ 5 Â 10 À8 , and results above this threshold are shown in dark grey. Gene labels are annotated as the nearby genes to the significant SNPs.

MR analysis
To find evidence in support of causal relationships between the three traits, we applied a Mendelian Randomisation (MR) approach called MR Egger regression (13) (Supplementary Material, Table S17 and Supplementary Material, Fig. S9, Materials and Methods). Our most significant finding was for the test for bite size affecting attractiveness, where we observed a negative intercept estimate (-0.09, P ¼ 0.0047) and a significant positive slope coefficient (4.33, P ¼ 1.8Â10 À5 ). After constraining at the origin, the magnitude of the slope decreased to 2.28, but the significance level increased to P ¼ 4.0Â10 À7 . In the test for bite size affecting itch intensity, we also observed a negative but insignificant intercept estimate and a significantly positive slope coefficient (0.43, P ¼ 0.016). After constraining at the origin, the slope estimate again decreased modestly while increasing dramatically in significance (0.34, P ¼ 1.88Â10 À6 ). These results provide evidence of a causal relationship by virtue of a significant slope both with and without constraint through the origin, and by intercepts that are respectively of modest and no significance. While the remaining pairwise tests all failed to demonstrate a significant slope or intercept when not constrained through the origin, each did exhibit at least a modestly significant positive slope when constrained through the origin, suggesting some degree of pleiotropy or causality between each trait (Fig. 4).

Genetic association analyses
We investigated whether mosquito trait loci also predispose to other traits and/or diseases by annotating our findings The analysis was adjusted for bite size and excluding responders positive for common immune-related conditions). Region, cytogenetic band; chr, chromosome; position, build 37 map position of the SNP; SNP quality is average r 2 from imputation; alleles A and B are assigned based on their alphabetical order; EAF (B allele), the frequency of the effect allele, which is denoted here as allele B, across all study participants; effect size, magnitude of effect for the B allele; CI, confidence interval; P, k adjusted significance level; gene context, gene(s) spanning or flanking (<1Mb away from) the index SNP: brackets indicate the position of the SNP, and dashes indicate distance to a flanking gene (À, >1 kb; À, >10kb; À À, >100kb).   Table S25). Additionally, to inform mechanistic hypotheses, we annotated index SNPs in high LD (r 2 > 0.8) with variants in regulatory regions identified by public databases, namely co-localisation with chromatin states, conservations, and regulatory motif alterations, using the web-based computational tool HaploReg (15,16) (Supplementary Material, Table S26). This information is summarised in Table 5.
We identified significant associations for all three mosquito-related traits at 6p21.32, which harbours the major histocompatibility complex (MHC) locus that encodes the highly polymorphic classical human leukocyte antigen (HLA) class I and class II genes, which are essential for self versus non-self-immune recognition. For mosquito bite size, the index SNP (rs3134995; P ¼ 1.9 Â 10 À24 ) is in LD (r 2 ¼ 0.51) with the nonsynonymous variant rs9260 (M230V) in HLA-DQA1, plus variants that are eQTLs for eight HLA genes (Table 5) and the variant rs3129720 previously identified as a risk factor for multiple sclerosis (19), and hypothyroidism (20). We identified two independent GWS associations at 6p21.32 for mosquito itch intensity. The variant rs12055445 (P ¼ 1.5 Â 10 À10 ) correlates with expression of ten HLA genes and the other variant (rs2523614, P ¼ 1.4 Â 10 À9 ) was also identified as an eQTL for HCG27, HLA-B, and PSORS1C1 (Table 5). We also identified one association near HLA-DQB1 that was significantly associated with itch intensity only in female responders. For mosquito attractiveness, the index GWS variant (rs9268659; P ¼ 3.5Â10 À9 ) is in high LD (r 2 > 0.8) with a HLA-DRA variant rs7192 (L242V) and other variants that are eQTLs for five HLA genes (Supplementary Material, Table S25). However, each mosquito trait was associated with an independent set (LD r 2 < 0.1) of HLA variants, reflecting the highly complex patterns of recombination and structural variation at 6p21.32, as well as the gene density (21). Previous studies have identified the MHC region as a prominent susceptibility locus for many immunerelated diseases (22).
We observed several independent GWS associations in genomic regions associated with cytokines and their receptors. A GWS association for mosquito bite size was identified at 22q12.3 co-localising with the CSF2RB gene that encodes the common b chain receptor component for the IL3, IL5 and CSF2 cytokines (rs5750339, P¼ 9.2 Â 10 À21 ). Sub-significant associations were also detected in the itch intensity and attractiveness GWAS (itch: rs5756391, P ¼ 3.20 Â 10 À7 ; attractiveness: rs5750339, P¼ 8.03Â10 À4 ). The bite size variant is an eQTL for CSF2RB in lymphoblastoid cells (Table 5). Mosquito bite size and itch intensity were both significantly correlated with variants located downstream of the IFNG locus, encoding interferon-c (IFN-c), an important innate immunity cytokine (bite size: rs2906856, P¼ 7.9Â10 À12 ; itch intensity: rs3814244, P¼ 1.9Â10 À9 , r 2 ¼ 1). Variants at this locus correlate with reduced expression of IFNG-AS1, a non-coding RNA suggested to positively control IFNG expression post-transcriptionally (30). Finally, we identified a bite size association located intronic to DARS (rs6754311, P ¼ 1.6 Â 10 À10 ), a gene that encodes an aspartyl-tRNA synthetase. Further inspection shows that this index variant colocalises with enhancer histone modifications and is an eQTL for DARS, MCM6, UBXN4, and CXCR4 in a range of cell types. CXCR4 encodes the C-X-C chemokine receptor type 4, a receptor for SDF-1, an important chemotactic factor for lymphocytes. CXCR4 is primarily expressed by naïve and T H 2 cells, eosinophils and mast cells, where it plays a significant role in T H 2-type allergic diseases (31).
Four mosquito trait loci were mapped to transcription factor genes all of which have links to immune system function. Mosquito bite size was associated with a region located 5' of IKZF1, which encodes the Ikaros transcription factor (rs62447171, P ¼ 3.5 Â 10 À8 ). Ikaros is upregulated in lymphoid cells and regulates T H 2 differentiation (32). The index variant co-localises with active histone marks in lymphocytes and is predicted to disrupt a putative binding site for the regulatory transcription factor complex, AP-1. This index SNP is also in high LD with a systemic lupus erythematosus risk factor (33) (rs4917014, r 2 ¼ 0.699). Bite size was associated with an intronic variant of FOXK1 (rs7793919, P ¼ 2.8 Â 10 À8 ). This variant co-localises with enhancer histone marks in embryonic and mature cell types, and is an eQTL for FOXK1 in peripheral blood leukocytes. FOXK1 is a member of the human Forkhead-box family, a group with diverse roles including control of lymphocyte development and regulation (34). A further bite size association was located in a dense haematopoietic enhancer region at 6p21.1, situated 3' to RUNX2 (rs62447171, P ¼ 1.1 Â 10 À10 ). RUNX2 is a key transcriptional regulator for T cell development (35). A sub-GWS bite size association was detected at the STAT6 locus (rs3024971, P¼ 5.9 Â 10 À8 ). This SNP is an eQTL for the genes encoding STAT6 and STAT2, transcription factors important in the mammalian cytokine response. Descriptions of sub-GWS genetic associations are listed in the Supplementary Material, Note.

Mosquito-related trait loci are enriched for immunerelated enhancers and T lymphocyte loci
To further explore the regulatory nature of the mosquitorelated trait associations, we mapped GWS associations (P < 5Â10 À8 ) and their proxies onto sequences annotated with ChromHMM epigenetic markers specifically for active enhancers in published reference tissues and cell types (15,16) (Materials and Methods). For the '7_Enh' annotation across all tissues, the mosquito bite size loci are more enriched for 'Primary T helper cells PMA-I stimulated' than any other tissue (10 of 19 lead SNPs) with an uncorrected P value of 0.018, but it does not reach significance after correcting for multiple tissues (Fisher's exact, Supplementary Material, Table S27).
The overlap with loci known to explain variation in levels of T lymphocyte cell subtypes was investigated (36). Although no single immune cell subset obtained significance at a Bonferroni corrected P value of 3.5Â10 À4 , central memory T lymphocytes emerged as a theme among immune cell subsets (P < 0.05) (Supplementary Material, Tables S28 and S30). Five of eight itch subsets, five of 17 subsets for bite size, and two of 12 subsets for attractiveness reflected associations with central memory T cell subsets.

Pathway analyses
We evaluated whether any biological pathways were enriched in our GWAS results using PASCAL, a tool which integrates signals from multiple SNPs in order to map trait associated genomic regions to genes and annotated pathways (37). First, we applied protein-protein interaction (PPI) network analysis (Materials and Methods) to the PASCAL gene lists in order to find subnetworks of physically and functionally interacting genes with consistently strong associations with our traits. The highest scoring subnetwork for each trait is shown in Figure 5. For the bite size and itch intensity traits, the top scoring subnetworks are significantly larger than observed given a random permutation of the association signals (P < 0.05). As we discuss below, a number of cytokine/receptor pairs are identified within the subnetworks, especially the bite size network, which includes CSF2 and CSF2RB, IL21 and IL21R, and IL4 and IL4R.
The top enriched annotated pathways identified by PASCAL include the cytokine receptor interaction pathway from KEGG (v 2 FDR ¼ 7.36Â10 À6 ), the BIOCARTA 'NKT' pathway which describes the selective expression of chemokine receptors during T-cell polarization (v 2 FDR, P ¼ 0.0003) and the REACTOME interleukin receptor SHC signalling pathway (R-HSA-912526, v 2 FDR, Table S31). The BIOCARTA IL3 signalling pathway was significantly enriched in both the itch intensity and attractiveness datasets (both v 2 FDR, P ¼ 0.01, Supplementary Material , Tables S32 and S33).

Discussion
Our analysis shows that mosquito bite size, itch intensity and perceived attractiveness to mosquitoes are all highly correlated traits, at the phenotypic and genotypic level. The primary aim of this GWAS was discovery, and we identified many GWS loci associated with these traits, almost all of which mapped to immune-related genes and pathways, many with wellcharacterised roles in the acquired immune response to antigenic stimulation. Since this is the first study of this type and scale and there are no comparable studies with sufficient power for replication, we sought to validate our findings by identifying evidence of pleiotropy with other immune phenotypes, annotating loci with epigenetic regulatory information generated in relevant cell types, and identifying biological links with mechanistic pathways, some of which are already linked to mammalian responses to mosquito saliva antigens by in vitro and in vivo experiments.
The heritability of attractiveness to mosquitoes was previously calculated to be 62% (8). In this previous study, an olfactomer was used to assess female Aedes aegypti mosquito preferences for body odour emitted by identical compared to non-identical twin pairs. We used LD Score regression and calculated the heritability for each mosquito-related trait to be below 10%. We estimated that the proportion of variance explained by GWAS factors explains only a fraction of the heritability, suggesting that a large proportion of genetic liability remains undiscovered. LD Score regression relies on the strength of genetic association from all common SNPs (partly determined by sample size), and our estimates are a likely underestimate compared to the h 2 parameter predicted by twin studies that consider all genetic variation (38). Despite being derived from a smaller sample size, our analysis suggested that attractiveness was marginally more heritable than bite size or itch intensity. Attractiveness also shared a greater genetic component with bite size and itch intensity, than they did with one another. However, the itch intensity results were adjusted by bite size during analysis, and this will have reduced the shared genotypic component available for capture using cross-trait analysis. A high degree of genetic correlation (12) was detected between the mosquito traits; however, interpretation of this analysis is confounded by the complete overlap in subjects between the bite size analysis and the itch intensity analysis, and also substantial overlap with the attractiveness analysis, together with the fact that these traits likely share variation whether genotypically or environmentally driven. In the latter case, even stochastic false-positive signals may correlate with LD score in each trait, and likewise such a signal would also likely correlate across traits.
Our study also indicated that compared to males, females report greater bite sizes, itchier bites, and greater perceived attractiveness to mosquitoes compared to other individuals. We identified one genetic locus in the HLA region relevant to this difference (rs201452941). This contrasts with previous studies that failed to determine any gender difference in self-reported frequency of bite and symptoms (9) and in experimental models examining mosquito preference for body odours collected from different people (6,39). The itch intensity GWAS was designed with the aim of preferentially identifying non-immune related itch-specific loci, and we identified one such GWS loci (rs4499342, near ACTL9) that was not significantly associated (<5Â10 À8 ) with bite size nor attractiveness. In this analysis, we excluded 18.7% participants who had previously reported a common immune condition. This percentage was greater than current estimates of between 7.6-9.4% Europeans recorded as affected by autoimmune disease in hospitalisation registries (40), although this is not directly comparable as the diseases classed as autoimmune differ to our study. Differences may also reflect the phenotyping ascertainment by self-reporting rather than medical diagnosis. In addition, we acknowledge that we have not captured individuals that may develop an immune disease in the future.
Desensitisation to mosquito bites correlates with increasing age, but for the general population this takes many years to occur or may not even happen, as individuals prone to allergic responses naturally avoid getting bitten. The majority of our study participants were older than 30 years (88.9% of the bite size cohort) and could therefore be placed in a sensitizing stage. There are two causal hypotheses we consider when thinking about the interaction of the bite size and itch intensity traits and the perceived attractiveness trait. In the first hypothesis, we suggest that individuals who are predisposed to be more attractive to mosquitoes would be bitten more often, therefore increasing their exposure to mosquito antigens and sensitising the immune system to manifest bigger bite sizes and a more severe itch over time. Alternatively, one might hypothesise that it is simply an individual's predisposition to increased bite size and itchiness that drives the perception of being bitten, which results in a greater perceived bite frequency and perceived attractiveness to mosquitoes compared to others. To distinguish between these two hypotheses, we applied an MR approach to identify evidence for causality in either direction between the three traits. There was a high degree of pleiotropy between all three traits, but Egger regression suggests that there is strong evidence for causality running in the direction predicted by the second hypothesis of bite size to attractiveness (P ¼ 4Â10 À7 ), along with bite size to itch intensity (P ¼ 1.88Â10 À6 ) (Fig. 4). Interestingly, in addition to the evidence for causality of bite size on attractiveness, the significantly negative intercept value from Egger regression might be interpreted as negative pleiotropy running counter to the causal relationship ( Supplementary  Material, Fig. S9A), meaning that genetic factors predisposing to greater bite size would independently reduce attractiveness to mosquitoes. However, a more plausible explanation for this apparent causal relationship is that bite size is driving the perception of bite frequency but not actual bite frequency. As bite size was measured as a five-point ordinal trait, it is plausible that there would be a minimum threshold of increased bite size necessary to affect the perception of bite frequency. We advise caution in over interpreting this finding and note that the lack of evidence for causality in the reverse direction (of attractiveness on bite size or itch) could be due to the reduced power of the attractiveness dataset due to the smaller sample size.
The control of mammalian body odour production, which may mediate attractiveness to mosquitoes, has been linked to MHC genes (41), which encode cell-surface glycoproteins that present peptides to antigen receptors of T cells, an interaction pivotal for the maintenance of self-tolerance, and protection against pathogens and tumours (42). It is these MHC-derived peptides that undergo metabolism by skin microflora to produce a particular composition of odour (43,44). One study supporting this link demonstrated that carriers of the HLA allele Cw*07 were more attractive to Anopheles gambiae mosquitoes compared to carriers of other HLA allelic profiles (45). Unfortunately, we were unable to test for this association due to the density of genotyping and imputing in this region. Further investigation is required to determine whether the mosquito attractiveness genetic factors identified in this study will translate to a measurable difference in mosquito host preference in a controlled environment. Our mosquito trait GWAS results demonstrated significant associations at the MHC region, which has been previously identified as a major risk locus for other immune and allergic sensitivity phenotypes (46,47), suggesting shared genetic aetiology.
There is mounting evidence demonstrating that mosquitoes have evolved to skew host haemostasis and immune responses to create a more favourable environment for pathogen transmission, at least in mouse models (48). Mosquito saliva antigens appear to induce CD4þ helper T cells to differentiate into CD8þ helper T cells, which in turn is thought to facilitate more efficient transmission of a number of arboviruses that would otherwise be neutralised by T H 1 cytokines (49)(50)(51). The suppression of host immune defences brought about by this process is thought to be broadly beneficial to the mosquito population, as there would be a reduction in the host immunity development of salivary antigens such as anti-coagulants, which are essential for blood feeding. In mouse models, exposure to mosquito saliva antigens induces a drop in IFN-c expression (marker of a T H 1 phenotype), and stimulates a rise in IL-4 and IL-10 expression (markers of a T H 2 phenotype) (50)(51)(52)(53)(54). Whether a similar immune reaction occurs in humans is not clear; however, the genetic predisposition factors identified by our GWAS implicate many of these players in the human inflammatory response to mosquito antigens.
We detected enrichment of GWS mosquito trait loci mapping to active enhancer regions present in stimulated T cells, and evidence for overlap with loci controlling levels of central memory T cells, a more differentiated T H 2 cell subset. A top locus identified by the mosquito-trait GWAS was the central memory T cell cytokine IL21 (55) and the PPI network-based analysis (Fig. 5) also implicated the IL21 receptor (IL21R). The association at IL21 is substantiated biologically by previous work demonstrating upregulation of IL21 and its receptor IL21R in skin lesions from psoriasis and atopic dermatitis (AD) patients (56). An IL21 blocking antibody has been shown to reduce epidermal thickness and inflammation in a human psoriasis xenograft SCID model (57).
Pathway analysis identified a number of other network hubs and cytokine/receptor pairs including IRF1, CSF2, CSF2RB, IL-3, and HLA-B. One such cytokine/receptor pair is IL4 and IL4R. Levels of the T H 2 cytokine IL-4 are known to be modulated in response to mosquito saliva antigen in vitro (49) and pathway analysis on our GWAS results placed IL-4 and its receptor IL-4R central to the bite size network (rs2070874, P ¼ 2.74Â10 À5 and rs3024662, P ¼ 4.4Â10 À4 , respectively). Another pairing identified by mosquito trait GWAS was CSF2RB, which encodes the common b chain receptor component for the IL3, IL5 and CSF2 cytokines, the genes of which were all implicated by an independent GWS association at 5q31.1. The bite size subnetwork also highlighted variation mapping to LTA and LTB (rs9267485, P ¼ 2.4Â10 À17 ), which encode lymphotoxins TNF-b and TNF-C, respectively (58). TNFA is another hub central for bite size (rs9267485, P ¼ 2.4Â10 À17 ) and TNF-a has been previously identified as a target for immune modulation by Aedes aegypti antigen in rat mast cells (59). Downstream of many of these cytokines and cytokine receptors lie transcriptional regulators such as STAT3. STAT3 was placed central to the itch intensity network (rs9891119, P ¼ 2.55Â10 À5 ) and was recently identified as a critical mediator for chronic itch in mouse models of AD (60).
The hypersensitivity reaction observed in response to mosquito saliva antigens is central to the pathogenesis of common allergic diseases. GWAS for atopy, defined by elevated antigenspecific IgE in blood serum, has identified several susceptibility loci (29,61), many of which overlap with those identified by GWAS for self-reported allergies (47). Topical pruritus is a common complaint for AD and the genetic landscape for this phenotype has been well characterised (17,62). Many of these loci map to proteins implicated in innate immunity and allergic inflammation (63), with some overlap with genetic factors for mosquito-related traits. For both the bite size and itch intensity traits one GWS loci mapped to IFN-c, and whilst not genetically predisposing to AD, patients with AD treated with topical IFN-c demonstrated an improvement in pruritus and oedema, supporting IFN-c as a key mediator of itch response to antigenic stimulation (64).
Web-based surveys capturing phenotypic information on common traits in large recontactable cohorts have proved useful for identifying novel genetic associations by GWAS (10) and are also suitable for replicating genetic associations previously identified in datasets compiled using stricter clinical diagnostic records (65). In this study, we relied on the participants' assessment of their bite response and perceived mosquito attractiveness, which potentially introduced confounders because the questionnaires used were not standardized or validated. We anticipate some bias of effect estimates to the null due to reporting error and this should be taken into consideration when interpreting effect sizes and odds ratios identified in this study. We encourage others replicate these findings in independent datasets, including those of non-European descent. We also encourage experimental validation, for example through studies examining mosquito preference for a particular genotype in a controlled setting and by exploring whether candidate proteins modulate the immune system in the presence of mosquito saliva antigens.
Our findings provide a further understanding of the complex interaction between humans and mosquitoes, and may provide useful information for identifying individuals at risk for mosquito-borne diseases, monitoring disease-vector populations, and developing novel methods for attracting and repelling mosquito populations.

Data availability
The full GWAS summary statistics for the 23andMe discovery data set may be requested from 23andMe, Inc. and received subject to the execution of 23andMe's standard data transfer agreement, which includes clauses intended to protect the privacy of 23andMe participants, among other matters. Please contact David Hinds (dhinds@23andme.com) for more information on how to apply for access to the data.

23andMe cohort
Participants in the 23andMe dataset were drawn from the customer base of 23andMe, Inc., a personal genetics company. All individuals included in the analyses provided informed consent and answered surveys online according to 23andMe's human subjects protocol, which was reviewed and approved by Ethical & Independent Review Services, a AAHRPP-accredited institutional review board. The protocol includes methods described in this paper, including online recruitment and data collection, and filtering data for genome-wide analyses. The 23andMe research program, which aims to use genetic and self-reported information to make and support scientific discoveries, is described in the consent document available online (https:// www.23andme.com/about/consent/). Individuals were included in the analysis based on selection for having >97% European ancestry, as determined by analysis of local ancestry via comparison to the CEU, YRI and JPT þ CHB HapMap 2 populations (66). A maximal set of unrelated individuals was chosen for each analysis using a segmental identity-bydescent (IBD) estimation algorithm (67). Individuals were defined as related if they shared more than 700cM IBD, including regions where the two individuals share either one or both genomic segments identical-by-descent. This level of relatedness (roughly 20% of the genome) corresponds approximately to the minimal expected sharing between first cousins in an outbred population. DNA extraction and genotyping were performed on saliva samples by the National Genetics Institute (NGI), a CLIA licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples have been genotyped on one of four genotyping platforms. The V1 and V2 platforms were variants of the Illumina HumanHap550þ BeadChip, including about 25,000 custom SNPs selected by 23andMe, with a total of about 560,000 SNPs. The V3 platform was based on the Illumina OmniExpressþ BeadChip, with custom content to improve the overlap with our V2 array, with a total of about 950,000 SNPs. The V4 platform in current use is a fully custom array, including a lower redundancy subset of V2 and V3 SNPs with additional coverage of lower-frequency coding variation, and about 570,000 SNPs. Samples that failed to reach 98.5% call rate were reanalyzed. Individuals whose analyses failed repeatedly were recontacted by 23andMe customer service to provide additional samples.
Participant genotype data were imputed against the March 2012 'v3' release of 1000 Genomes reference haplotypes (68). We phased and imputed data for each genotyping platform separately. First, we used Beagle (69) (version 3.3.1) to phase batches of 8000-9000 individuals across chromosomal segments of no more than 1,000 genotyped SNPs, with overlaps of 200 SNPs. We excluded SNPs with Hardy-Weinberg equilibrium P < 10 À20 , call rate < 95%, SNPs with minor allele frequency <0.001, or SNPs with large allele frequency discrepancies compared to European 1000 Genomes reference data. The Hardy Weinberg equilibrium test was performed on the full 23andMe cohort (approximately 500,000 Europeans). This threshold was considered conservative and selected as a P value of 1.0e-20 with a sample size of 500,000 represents a magnitude of deviation from equilibrium that would give a P value of 0.003 with a sample size of 50,000. Frequency discrepancies were identified by computing a 2Â2 table of allele counts for European 1000 Genomes samples and 2000 randomly sampled 23andMe customers with European ancestry, and identifying SNPs with a chi squared P < 10 À15 . We imputed each phased segment against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 (70), using five rounds and 200 states for parameter estimation. For the non-pseudoautosomal region of the X chromosome, males and females were phased together in segments, treating the males as already phased; the pseudoautosomal regions were phased separately. We then imputed males and females together using minimac, as with the autosomes, treating males as homozygous pseudo-diploids for the non-pseudoautosomal region. 23andMe participants were able to fill out web-based questionnaires whenever they logged into their 23andMe accounts.
We performed genome-wide tests for each association relating to the mosquito bite size and itch intensity phenotypes, using linear regression assuming an additive model for allelic effects, adjusting for age, gender, and five principal components of the genotype data matrix. The mosquito bite size phenotype was captured as an ordered variable with assigned numerical values scored as "less_than_pencil" <" pencil" <" m_m"<" dime" <" quarter"<"more_than_quarter"; 0 < 1<2 < 3<4 < 5, respectively.
The mosquito attractiveness phenotype was analysed by logistic regression adjusting for age, gender, and five principal components of the genotype data matrix. The cases were defined as answering "less_than_people_around_me" and controls defined as answering "more_than_people_around_me." The answer option 'same as people around me' was not available.
We applied a genomic control inflation correction of 1.07 to bite size GWAS result, and 1.042 to itch intensity GWAS result (k ¼ 1.067 and 1.049, to the itch intensity GWASs without removing cases having autoimmune conditions, unadjusted and adjusted by bite size, respectively). We applied a genomic control inflation factor of 1.027 to mosquito attractiveness GWAS.
Following quality controls, a total of 13,520,550 SNPs were available for GWAS for bite size, 13,519,193 SNPs for the representative itch intensity GWAS (with 13,520,550 SNPs both available for the bite size unadjusted and bite size adjusted itch intensity GWASs without removing cases having autoimmune conditions). The itch intensity GWAS performed in males only tested 13,515,082 validated SNPs and in females only tested 13,515,836 validated SNPs. Following quality controls, a total of 7,433,384 SNPs were available for GWAS for attractiveness.

Cross-correlation analysis on phenotypic responses
For bite size against itch intensity, bite size survey responses were scored in numerical fashion, and cross-compared with that from itch intensity, using linear regression analysis, adjusting for age and gender, with data from 84724 unrelated Europeans who answered both web-based surveys. The partial correlation between bite size and itch intensity, controlling for age and gender was calculated as: , where R 2 is the coefficient of determination from the linear regression of bite size against itch intensity, age and gender and R 2 ðÀiÞ is the coefficient of determination from the linear regression of bite size against only age and gender. The partial correlation can be interpreted as the proportion of the remaining unexplained variance that is accounted for by adding "itch intensity" to the existing model.
For bite size against attractiveness, bite size survey responses were scored in numerical fashion, and cross-compared against responses from attractiveness using linear regression analysis, adjusting for age and gender, with data from 7353 unrelated Europeans who answered both web-based surveys. The similar partial correlation between bite size and attractiveness, controlling for age and gender was calculated For itch intensity against attractiveness, itch intensity survey responses were scored in numerical fashion and crosscompared against responses from attractiveness, using linear regression analysis, adjusting for age and gender, with data from 7353 unrelated Europeans who answered both web-based surveys. The similar partial correlation coefficient between bite size and attractiveness, controlling for age and gender was calculated.

Ordinal regression GWAS
We applied ordinal models, a proportional odds model and a partial proportional odds model, to the mosquito bite size and itch intensity phenotypes, restricting the analysis to SNPs with P < 1Â10 À5 in the linear regression GWASs. To apply a proportional odds model a cut-off value was defined and used to create odds ratios that describe the odds of being in equal to or lower than categories than the chosen cut-off versus the odds of being in greater categories. For example, for the itch intensity phenotype, we chose a cut-off value "very_badly", "badly" or "mildly." We applied the proportional odds model to both phenotypes and compared the P values calculated in this model versus those calculated in the linear regression model. P values from the two models were plotted against one another (Supplementary Material, Fig. S6B), and showed high concordance (r 2 ¼0.946 for bite size and r 2 0.994 for itch intensity). For the partial proportional odds models, we relaxed the proportional odds assumption for age, sex and genotypes, and compared the P values to those from the linear regression model. Again, the P values were highly concordant (r 2 ¼0.965 for bite size and r 2 ¼ 0.973 for itch intensity) (Supplementary Material, Fig. S6C).

LD score regression
We used the LD score regression method (11,12) to estimate heritability of the three mosquito-related traits as well as to determine the genetic correlation between all pair-wise combinations of traits. The intercepts of the cross-trait analyses were not constrained, as sample overlap was present among all GWAS analyses performed. All other options were left with the default settings, and we utilized the set of SNPs used to calculate the LD Score files (utilized for the regression analyses) to assess strand ambiguity, remove variants that are not SNPs, and remove SNPs with duplicated rs numbers.
The fraction of variance explained by genome-wide significant loci was calculated by adding the r 2 value across independent loci having adjusted P < 5.0Â10 À8 , where r 2 was estimated from each SNP's z-statistic (beta/SE) via r 2 ¼ zstat 2 / (zstat 2 þ N -2), N being the number of subjects. This was then divided by heritability to get the fraction of heritable variance explained. (13) is a weighted linear regression analysis that estimates a causal relationship between two traits analysed in a pairwise fashion, by taking the most significant summarised genetic association estimates for one trait (the putative causal trait), and regressing these effect estimates against effect estimates for those same variants in some other trait (the putative affected trait). MR Egger regression weights by the inverse variance of the genetic effects on the putative affected trait (by analogy to a meta-analysis), and further it allows for pleiotropy by permitting the regression line to have a non-zero intercept. Under certain assumptions (13), a significantly non-zero intercept can be interpreted as evidence of pleiotropy, while a significant slope can be interpreted as evidence of a causal effect. Note that this represents a parsing of the more traditional measure, in which the regression line is constrained to pass through the origin. Where the slope of the line through the origin is significant (here, using the same inverse-variance weights), this is evidence that there is either pleiotropy or a causal relationship (or any mix thereof), but the distinction between the two is only resolved by removing the constraint on the intercept.

MR Egger regression
Effect sizes for top GWS associations (<5Â10 À8 ) for mosquito bite size, itch intensity (adjusted for size and excluding individuals with prior common immune condition), and attractiveness were used as instrumental variables in Egger regression, where we used only peaks at each loci (pruning away SNPs within 300Kb of a more significant peak, or with r 2 > 0.1 from a more significant peak).
In order to match the positive directionality of the bite size GWAS (where a positive effect size denotes greater bite size), we reversed the effect signs for both itch intensity and attractiveness results, so that a positive effect corresponds to increased itch intensity and greater attractiveness.
MR Egger regression further requires that we choose the effect allele, which increases the putative causal trait, by analogy to the reason that a meta-analysis uses the same definition of reference group across studies.
We then executed MR Egger regression for each pair of traits, in each case looking up the effect sizes for the effect alleles (chosen as described above) of the GWS pruned loci from the putative causal trait in the putative affect trait. We then conducted two linear regressions weighted by inverse variance of the effect size on the putative affected trait, as described by Bowden et al. (13); one through the origin (which hence does not distinguish between pleiotropy and causality), and one not constrained through the origin (where the intercept is interpreted to represent pleiotropy and the slope to represent causality). Note that aside from the obvious considerations of power, an insignificant slope does not necessarily eliminate the possibility of pleiotropy with inconsistent mutual directions across loci.

Assessing enrichment for immune-related enhancers
Lead mosquito-related trait SNPs (P < 5Â10 À8 ) were pruned according to a peak height-dependent distance cut off; SNPs were pruned based on LD (r 2 < 0.1) within a 500Kb window using 1000G human reference genome, where SNPs outside 500Kb from a more significant SNP were automatically considered independent, regardless of LD. Each lead variant was annotated using HaploReg v4.1 (15,16) specifically for '7_Enh' enhancer marks (71) ascribed to any SNP in LD with that lead SNP, using an r 2 of 0.8 according to the European population in 1000 Genomes. Background counts were calculated by applying the same procedure to significant (P < 5Â10 À8 ) GWAS associations from the NHGRI GWAS Catalogue (14). Significance was calculated using Fisher's exact test, comparing the frequency of each enhancer mark in each trait compared to the frequency for all other GWAS association.

Enrichment of GWAS loci explaining variation in T lymphocyte cell subtypes
To test for significant overlap of mosquito-related trait loci with loci known to explain variation in levels of T lymphocyte cell subtypes, we used publically available immunophenotyping GWAS data generated from the SardiNIA study from 144 T lymphocyte subsets (36). A Wilcoxon test was applied to test for an enrichment in -log(P) rankings of mosquito trait SNPs with P values < 1Â10 À5 compared to SNPs genotyped in SardiNIA that did not meet this level of significance for a mosquito trait. A Bonferroni corrected P value of P < 3.5Â10 À4 was applied to determine significant enrichment.

Pathway analyses
We applied PASCAL (Pathway scoring algorithm) (37) to the summary statistics from each mosquito trait GWAS with default parameter. PASCAL assigns SNPs to genes if they are located within a 50kb window either side of the gene body; it corrects for LD effects, using 1000 Genomes data. We estimated gene scores using the sum of chi squares method. The gene scores were used for the protein-protein interaction (PPI) analysis (see below). The PASCAL pathway scoring method recalculates scores by aggregating P values across pathway genes that are in LD and thus cannot be treated independently. It fuses the genes and calculates a pathway score that takes the full LD structure of the corresponding genes into account. Using pathways from the Reactome (72), Biocarta, and KEGG (73) databases we used PASCAL to compute chi-squared and empirically sampled pathway scores. To find high scoring PPI subnetworks we mapped PASCAL gene scores onto their encoded proteins in the StringDB v10 (74) Human PPI network. We restricted the network to PPIs with >90% confidence and removed nodes with a degree >99th percentile of the degree distribution. Any proteins without a PASCAL score were assigned a score taken from a random uniform distribution between 0 and 1. We then used the BioNet R/Bioconductor package (75) to fit a beta-uniform mixture model to the gene score distribution and BioNet's fast heuristic approach to find the highest scoring subnetwork. This process was repeated 100 times for each trait and only those nodes appearing in >60% of the highest scoring subnetworks and that are within the largest connected component were retained.

Supplementary Material
Supplementary Material is available at HMG online.