Human genomics of the humoral immune response against polyomaviruses

Abstract Human polyomaviruses are widespread in humans and can cause severe disease in immunocompromised individuals. To identify human genetic determinants of the humoral immune response against polyomaviruses, we performed genome-wide association studies and meta-analyses of qualitative and quantitative immunoglobulin G responses against BK polyomavirus (BKPyV), JC polyomavirus (JCPyV), Merkel cellpolyomavirus (MCPyV), WU polyomavirus (WUPyV), and human polyomavirus 6 (HPyV6) in 15,660 individuals of European ancestry from three independent studies. We observed significant associations for all tested viruses: JCPyV, HPyV6, and MCPyV associated with human leukocyte antigen class II variation, BKPyV and JCPyV with variants in FUT2, responsible for secretor status, MCPyV with variants in STING1, involved in interferon induction, and WUPyV with a functional variant in MUC1, previously associated with risk for gastric cancer. These results provide insights into the genetic control of a family of very prevalent human viruses, highlighting genes and pathways that play a modulating role in human humoral immunity.


Introduction
The variability of the humoral response to antigenic stimulation has been documented for a long time (Grundbacher 1974). While it is partly due to demographic and environmental factors, a substantial part of this variation is heritable and can thus be attributed to differences in the host genome. For example, human genetic factors were estimated to account for 35 per cent (±6 per cent) and 21 per cent (±5 per cent) of the variance in Helicobacter pylori and Toxoplasma gondii antibody levels and 60 per cent (±11 per cent) and 57 per cent (±11 per cent) of the variance in cytomegalovirus and Epstein-Barr virus (EBV) serostatus, respectively (Rubicz et al. 2011). This high heritability makes serological phenotypes highly promising targets for human genomic studies. Indeed, specific genetic variants have already been associated with variation in immunoglobulin G (IgG) levels. Most variants identified to date are located within the major histocompatibility complex locus on chromosome 6 (Rubicz et al. 2013;Hammer et al. 2015;Scepanovic et al. 2018). This highly polymorphic complex notably encodes the class I and class II human leukocyte antigen (HLA) proteins, which present antigenic epitopes to effector cells of the immune system-a key role in the crossroad between innate and acquired immune responses.
Polyomaviruses are small (40-50 nm), double-stranded DNA viruses belonging to the Polyomaviridae family, with an average genome size of 5,000 base pairs (Gardner et al. 1971;Padgett et al. 1971). First discovered in rodents in the 1950s, the first human polyomavirus, namely, JC polyomavirus (JCPyV), was discovered in 1965 in a patient with progressive multifocal leukoencephalopathy (PML) (Gross 1953;Zurhein and Chou 1965).
Serological surveys indicate that human polyomaviruses are ubiquitous, with most of the world's population infected by several of them during childhood (Cook 2016). In most cases, polyomavirus infections are asymptomatic. After acute infection, they silently persist in healthy individuals. However, they can lead to symptoms and even severe disease in immunocompromised individuals. As an example, reactivation of JCPyV can cause PML in patients with low CD4 + T-cell counts, such as individuals with AIDS or on chronic immunosuppressive therapy, and BK polyomavirus (BKPyV) is known to cause renal disease in kidneytransplanted patients (Gardner et al. 1971;Padgett et al. 1971;Crum-Cianflone et al. 2007;Trofe-Clark and Sawinski 2016). A better understanding of the genetic control of the human immune response against polyomaviruses may provide new insights into pathologic mechanisms. We therefore searched for associations between human genetic variation and both differences in IgG levels and serostatus against five human polyomaviruses: BKPyV, JCPyV, human polyomavirus 6 (HPyV6), Merkel cell polyomavirus (MCPyV), and WU polyomavirus (WUPyV), using genome-wide association study (GWAS) and metaanalysis in a total of 15,660 healthy individuals of European ancestry from three studies from Switzerland, the United Kingdom, and Germany.

GWASs and meta-analyses
We performed a total of nine independent GWASs using either a case-control study design (serostatus: antibody-positive versus antibody-negative) or a continuous, quantitative approach (IgG levels in seropositive individuals) to search for human genetic determinants of the antibody response to BKPyV and JCPyV in the three cohorts, to HPyV6 and WUPyV in CoLaus and GRAS, and to MCPyV in GRAS and UKB (Fig. 1). We did not run a case-control analysis for WUPyV because of the very high seroprevalence of that virus (96 per cent). Meta-analyses were conducted using GWAS summary statistics. For each of the five phenotypes, metaanalytic inflation factors λ ranged from 0.99 to 1.01 across the tested cohorts, indicating proper control of population stratification ( Supplementary Fig. S1). In total, we identified four genomewide significant loci, mapping to the HLA (for HPyV6, JCPyV, and MCPyV), FUT2 (for BKPyV and JCPyV), STING1 (for MCPyV), and MUC1 (for WUPyV) regions (Fig. 2).

Role of HLA variation in antibody response to polyomaviruses
Significant associations were observed in the HLA class II region on chromosome 6 for HPyV6, JCPyV, and MCPyV (Fig. 2), with the lead variants being rs9272624 (P = 9.62 × 10 −14 ), rs112463084 (P = 2.18 × 10 −20 ), and rs147658078 (P = 8.09 × 10 −20 ), respectively. To further dissect the association signals, we imputed four-digit HLA alleles and tested them for association with anti-HPyV6, anti-JCPyV, and anti-MCPyV VP1 IgG levels. The regional association plots are shown in Supplementary Fig. S2, and the resulting association P-values are listed in Table 2. We observed strong associations with quantitative seroreactivity for HLA-DRB1 and HLA-DQA1 alleles for all three human polyomaviruses. The size and directionality of the effects were virus-specific.  SNPs are plotted on the x-axis according to their physical position on each chromosome, and the strength of the association with antibody levels is indicated on the y-axis (as −log10 P-value for IgG response and as +log10 P-value for serostatus). The dashed line marks the Bonferroni-corrected genome-wide significance threshold of 5.6 × 10 −09 . For each locus, the SNP with the most significant association is plotted in red and labeled. Serostatus was not tested for WUPyV due to very high seroprevalence. Chromosome X is not shown since data were not available for all studies.

FUT2 non-secretor status is associated with stronger antibody responses to BKPyV and JCPyV
We observed significant associations in the FUT2 gene on chromosome 19 for BKPyV and JCPyV (Fig. 2). The fucosyltransferase 2 protein, encoded by FUT2, is expressed in epithelial cells and adds fucose to type 1 glycoprotein chains to form soluble ABO blood group (ABH) antigens that can be found in secretions and fluids (Kelly et al. 1995;Rouquier et al. 1995). The top associated singlenucleotide polymorphisms (SNPs) for BKPyV (rs681343, P = 6.40 × 10 −16 ) and JCPyV (rs12462111, P = 4.84 × 10 −10 ) are in high linkage disequilibrium (LD, r 2 > 0.7), with the common FUT2 nonsense variant rs601338 (P = 6.81 × 10 −16 and P = 1.89 × 10 −09 , respectively). The homozygous genotype for the rs601338 minor allele (AA) is classically referred to as the 'non-secretor' variant, as individuals homozygous for this null allele do not secrete blood group antigens at epithelial surfaces (Kelly et al. 1995;Rouquier et al. 1995). Among the three cohorts, 24 per cent of individuals were FUT2 non-secretors (AA) and 76 per cent were FUT2 secretors (GG or GA). Overall, we observed that non-secretors had significantly higher IgG levels against both polyomaviruses in comparison to secretors (Fig. 3).

Genomic variation in STING1 is associated with antibody response to MCPyV
We observed genome-wide significant associations with anti-MCPyV IgG levels in the STING1 gene on chromosome 5 (Fig. 2). STING1 encodes the endoplasmic reticulum-resident membrane protein STING that functions as a major regulator of innate immunity. Upon viral infection and activation of systolic cyclic GMP-AMP synthase, STING mediates type I interferon (IFN-I) production by infected cells to protect them and neighboring cells from local infection (Barber 2011). The top associated variant in meta-analysis using GRAS and UKB was rs7444313 (P = 5.95 × 10 −15 ), a strong eQTL for STING1 in the GTEx data set, whose G allele was associated with a stronger IgG response against MCPyV VP1 (GTEx Consortium 2013).

Alternative splicing of MUC1 is associated with humoral immune response to WUPyV
We observed a genome-wide significant association with anti-WUPyV IgG levels on chromosome 1, mapping to the MUC1 gene that encodes the Mucin1 protein (lead SNP rs12411216, P = 2.70 × 10 −30 ; Fig. 2). This variant is in perfect LD (r 2 = 1), with the splicing variant rs4072037 (P = 4.08 × 10 −30 ) located in the second exon of the MUC1 gene. The rs4072037 T allele was associated with a stronger IgG response against WUPyV VP1. The same allele was previously associated with an increased risk of gastric cancer (Zheng et al. 2013;Liu et al. 2014). Because infections with H. pylori and EBV have been implicated in the pathogenesis of gastric cancers, we searched for potential associations of rs4072037 with the humoral response to these pathogens (Shibata and Weiss 1992;Tokunaga et al. 1993;Takada 2000;Li et al. 2003). Using GWAS data available in CoLaus, we performed serological analyses for six different H. pylori antigens and four EBV antigens but did not find evidence for an rs4072037-dependent differential antibody response (Supplementary Table S1).

Relative abundance of short MUC1 isoforms depends on rs4072037 allele status
The SNP rs4072037 has been functionally investigated previously and was shown to cause alternative splicing at the 5 ′ end of exon 2, leading to the absence or presence of nine amino acids in the signal peptide region (Zheng et al. 2013). Due to the difficulty in determining full exon connectivity from short-read sequencing data, the distribution of isoforms remained unclear. Therefore, we have decided to use nanopore sequencing technology, allowing us to investigate sequencing long reads. We obtained twelve stomach RNA samples from GTEx: six from homozygous carriers of the rs4072037 major allele T and six from homozygous carriers of the minor allele C (GTEx Consortium 2013). We amplified the MUC1 region and sequenced all barcoded amplicons on a single flow cell on the MinION sequencing device. In total, we obtained 1,121 reads spanning the full-length MUC1 isoforms, i.e., having assignable barcodes at both ends for downstream analysis (Supplementary Figs. S3-S4; Supplementary Table S2). We exclusively observed an isoform with a short exon 2 in the TT individuals, whereas a longer isoform was always present in the CC individuals, thereby confirming the modulating effect of rs4072037 genotype on MUC1 splicing.

Similar results using a case-control study design
We then investigated whether the genetic architecture of antibody levels against human polyomaviruses is similar to that of serostatus of the same polyomaviruses. Serological status was defined based on the manufacturers' defined cutoff of 250 MFI (see the 'Materials and methods' section). Individuals with levels of IgG antibodies above the threshold were considered seropositive (cases), while those below the threshold were seronegative (controls) ( Table 1). Case-control GWASs using logistic regression were performed independently in the three cohorts and meta-analyzed (Willer, Li, and Abecasis 2010). Q-Q plots revealed no inflation of the test statistics for HPyV6 (λ = 0.99) and JCPyV (λ = 0.98), while MCPyV (λ = 0.90) and BKPyV (λ = 0.72) showed varying degrees of deflation ( Supplementary Fig. S1). Chicago plots for the genomewide significant loci are presented in Fig. 2. We again observed evidence of an involvement of HLA class II variation for JCPyV (rs9271025, P = 4.31 × 10 −46 ) and MCPyV (rs73401330, P = 8.49 × 10 −29 ) ( Table 2). Strongest associations with seropositivity were observed for HLA-DQB1 allele. We also replicated the associations observed in FUT2 for JCPyV (rs2432132, P = 1.70 × 10 −15 ) and STING1 for MCPyV (rs55792153, P = 2.97 × 10 −09 ). Associations with FUT2 and HLA variants were not replicated for BKPyV and HPyV6, respectively. In addition, the case-control analyses did not reveal any additional associated region, suggesting that there are no SNPs purely associated with susceptibility.

Discussion
Human polyomaviruses are highly prevalent and can cause severe disease in immunocompromised individuals. They present a wide range of tissue tropisms and appear to cause neurologic, renal, and skin diseases (Feng et al. 2008;Ferenczy et al. 2012;Kuppachi et al. 2016). While their molecular mechanisms of infection are well-known, the potential impact of human genetic variation on their life cycle and pathogenicity remains understudied. For this study, we were interested in gaining insight into the genetic underpinning of the humoral immune responses to five polyomaviruses: BKPyV, JCPyV, HPyV6, MCPyV, and WUPyV. Their seroprevalence was high in all three cohorts included in our study (55-96 per cent), consistent with previously published surveys and confirming that human polyomavirus infections are common in the general population (Hammer et al. 2015;Kamminga et al. 2018). Genes within the HLA region are prime candidates for susceptibility to viral infection and regulation of the immune response. HLA has been associated with multiple infectious diseases such as HIV, hepatitis B, hepatitis C, and tuberculosis (Sanchez-Mazas 2020). In this study, we confirm the involvement of HLA variation in the modulation of humoral immune response to two polyomaviruses, JCPyV and MCPyV (Hammer et al. 2015;Scepanovic et al. 2018;Kachuri et al. 2020). Specifically, we observed the strongest associations with quantitative antibody levels for HLA-DRB1 and HLA-DQA1 alleles and with seropositivity for HLA-DQB1 allele.
Our study also shows that FUT2 variation (rs601338 G>A) plays a role in the determination of antibody levels against BKPyV and JCPyV VP1. The FUT2 enzyme catalyzes the addition of a fucose residue to type 1 glycoprotein chains to form ABH blood group antigens that are released in secretions and at mucosal surfaces (Marionneau et al. 2001). Consistent with previous reports, 78 per cent of our study population carried at least one functional allele, which is sufficient to render an individual FUT2 secretor, whereas 22 per cent of individuals were found to be homozygous for the nonsense variant, leading to defective FUT2 and the absence of ABH antigens in secretions (FUT2 non-secretors) (Kelly et al. 1995;Azad, Wade, and Timpson 2018). Interestingly, nonsecretors had significantly higher IgG titers against BKPyV and JCPyV compared to secretors, suggesting higher antigen exposure and/or stronger immune response. One hypothetic explanation is that the binding of polyomaviruses to host cell sialic acid receptors could be impaired in the presence of FUT2-dependent blood antigens on the mucosal surfaces. Consequently, non-secretors might be more permissive to viral propagation and replication than secretors. The same FUT2 variant is known to have an impact on several infections: non-secretors are protected against norovirus and rotavirus but are at higher risk of infection with other pathogens, including, e.g., Candida albicans, Streptococcus pneumoniae, and mumps (Thorven et al. 2005 The capacity to generate soluble ABH blood group antigens, dependent on the functionality of the FUT2 enzyme, represents an important modulator of infectious disease susceptibility in humans. Our analyses confirm the implication of the STING1 gene in the determination of antibody levels against MCPyV VP1, which was recently reported by Kachuri et al. (2020). The variants most associated with anti-MCPyV IgG levels are strong regulators of the expression levels of STING1. The encoded protein, STING, plays a pivotal role in anti-viral innate immunity by activating IFN-I and proinflammatory cytokines. Individuals carrying the homozygous genotype for the rs7444313 minor allele (GG) had significantly higher anti-MCPyV IgG levels. A recent study by Kennedy et al. reported an association between the same variant and the intensity of the immune response following primary smallpox vaccination (Kennedy et al. 2020). The same study also showed that expression of the correlated non-synonymous rs1131769 variant (R232H) in the STING1 gene leads to a decrease in IFN-α expression levels. This result substantiates a previous finding showing reduced IFN-β transcription for carriers of the R232H variant (Zhang et al. 2013). Evidence suggests that this amino acid change may have a modifying effect on the function of STING. Due to a weaker binding between cGAMP and the H232 variant, STING might be less effective, resulting in a lower immune response. In turn, this might allow higher viral replication, resulting in increased anti-MCPyV IgG levels. Interestingly, rs7444313 has significantly different allele frequencies in European populations (ref. allele: G = 0.28) relative to African populations (ref. allele: G = 0.83), suggesting a possible role of infection-driven selection in this genomic region.
We observed a very strong association signal with anti-WUPyV IgG response in MUC1, the gene encoding the Mucin1 protein.
There is no established role of WUPyV in disease thus far. However, it was detected in the trachea of immunocompromised children, in close proximity to MUC5AC-expressing cells (Siebrasse et al. 2016). Physiologically, cell surface mucins form a network that contributes to the mucosal barrier to infection (Hattrup and Gendler 2008). MUC1, in particular, consists of a large extracellular O-glycosylated polypeptide backbone that extends 200-500 nm above the apical surface (Hollingsworth and Swanson 2004). The minor allele (T) of the top associated SNP, rs4072037, which associates with higher IgG levels, is known to alter the physiological functions of MUC1 because it leads to an alternative splicing of the 5 ′ -region of exon 2. As a consequence, a shorter protein isoform is produced, which impairs the protective function of MUC1 in the gastric mucosa (Xu et al. 2009). The rs4072037 T/T genotype has been previously associated with an increased risk of gastric cancer (Xu et al. 2009;Jia et al. 2010), most likely due to a concomitant increase in the risk of gastritis due to H. pylori infection (Vinall et al. 2002). The newly discovered association with humoral response to WUPyV is most likely coincidental. Indeed, the absence of any trace of WUPyV in cancer samples suggests that this virus is unlikely to play any role in gastric carcinogenesis (Toptan et al. 2016). However, our results suggest once again that MUC1 plays an essential role in the protection of gastric mucosa and that a common, genetically encoded alteration of MUC1 has a deleterious impact on that important physiological function.
For human polyomaviruses with seroprevalence below 95 per cent, we investigated the relationship between the genetic contribution to serostatus and humoral immune response in seropositive individuals. Whereas the quantitative approach is purely looking at immune response in infected individuals, the casecontrol study design also has the potential to uncover genetic loci involved in susceptibility to infection. Our meta-analysis results suggest very similar genetic contributions to the continuous (quantitative) and binary (case-control) phenotypes, with comparable statistical strength. The same variable genetic loci in HLA class II, FUT2 and STING1, were found to be associated with JCPyV6 and MCPyV serostatus and IgG levels. These similarities in the results could be explained by the impossibility of being able to differentiate between individuals who were never infected and individuals who were infected but had an extremely weak antibody response.
Here, we report the first meta-analyses aiming to identify human genetic determinants of the humoral immune response against multiple polyomaviruses. Statistically combining data from CoLaus, GRAS, and a subset of the UKB allowed us to run analyses on a total of 15,660 individuals of European ancestry. This provided greater statistical power and a more accurate estimate of the underlying effects, also reducing the risk of false-negative results. Some limitations need, however, to be considered. First, we did not have serological data for all five polyomaviruses in each cohort. Second, the meta-analyses relied on summary statistics derived from independently computed GWAS, which might result in biases due to differences in the design, analysis, and conduct of the individual studies. Third, our analyses were restricted to individuals of European ancestry, which limits the generalizability of our results. Finally, as mentioned above, individuals with very low IgG levels were excluded from the quantitative association analyses, due to the impossibility of distinguishing them from truly seronegative (i.e., never infected) study participants; this resulted in reduced statistical power.
In summary, we here report strong associations of HLA class II, FUT2, STING1, and MUC1 genetic variants with the intensity of the humoral IgG response to multiple human polyomaviruses. Together, these results demonstrate the modulating contribution of host genetic variation to the individual response against some of the most prevalent human viruses.

CoLaus study
The Cohorte Lausannoise (CoLaus) is a population-based prospective study that started in 2003 in Lausanne, Switzerland (Firmann et al. 2008). It includes 6,188 individuals of European descent (47.5 per cent male) initially aged 35-75 years (mean ± SD: 51.1 ± 10.9), who were randomly selected from the general population and continuously undergo a follow-up every 5 years. Detailed phenotypic information was obtained from every study participant through questionnaires, a physical assessment, and biological measurement of blood and urine markers. The institutional Ethics Committee of the University of Lausanne, which afterward became the Ethics Commission of Canton Vaud (www.cer-vd.ch), approved the baseline CoLaus study (reference 16/03, decisions of 13 January and 10 February 2003), and written consent was obtained from all participants.

The GRAS data collection
The Göttingen Research Association for Schizophrenia (GRAS) data collection consists of 2,363 immunocompetent adults of European ancestry, comprising 1,147 anonymized blood donors (62.0 per cent male, mean age ± SD: 37.5 ± 13.2) and 1,216 individuals with psychiatric diagnoses (64.9 per cent male, mean age ± SD: 40.6 ± 13.5) (Begemann et al. 2010;Ribbe et al. 2010). All study participants provided informed consent, including genetic testing, and the GRAS data collection has been approved by the ethical committee of the Georg-August-University of Göttingen (master committee) as well as by the respective local regulatory/ethical committees of all collaborating centers (Ribbe et al. 2010).

UK biobank
The UK Biobank (UKB) is a population-based prospective study whose recruitment process has been described previously (Sudlow et al. 2015). Briefly, half a million men and women aged 40-69 years (45.6 per cent male, mean age ± SD: 56.5 ± 8.1) attended one of the twenty-two UKB assessment centers located throughout England, Scotland, and Wales between 2006 and 2010. All participants completed a touchscreen questionnaire and verbal interview and had a range of physical measurements and blood, urine, and saliva samples taken for long-term storage. A subset of 20,000 individuals attended a repeat assessment between 2012 and 2013. This project was conducted with approved access to UKB data under application numbers 50085 (PI: Fellay) and 43920. All UKB participants provided informed consent at recruitment. Ethics approval for the UKB study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382).

Serological analyses
To assess humoral responses to polyomaviruses, serum samples were independently shipped by the three studies to the Infections and Cancer Epidemiology Division at the German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ) in Heidelberg. Seroreactivity against BKPyV, HPyV6, JCPyV, MCPyV, and WUPyV viral capsid protein 1 (VP1) was measured at serum dilution 1:1,000 using multiplex serology based on glutathione-S-transferase-VP1 fusion capture immunosorbent assays combined with fluorescent bead technology (Waterboer et al. 2005). Antigen preparation and methodology, as well as robustness of seroprevalence estimates, have been described previously for human polyomaviruses (Gossai et al. 2016). For the quantitative IgG level association analyses including only seropositive individuals, the results above the assay cutoff of 250 mean fluorescence intensity (MFI) were used. For the case-control study design, the results were expressed as a binary output (serostatus: anti-polyomavirus IgG-positive or IgG-negative) based on the same assay cutoff. Using the same assay, seroreactivity against six H. pylori antigens (HP1564 OMP, HP10 GroEL, HP547 CagA, HP887 VacA, HP73 UreaseA, and HP875 Catalase) and four EBV antigens (VCA p18, EBNA, Zebra, and EA-D) was measured in CoLaus.

DNA genotyping and statistical analyses
4.3.1 CoLaus study DNA samples from 5,399 CoLaus participants were genotyped for 799,653 SNPs using the BB2 GSK-customized Affymetrix Axiom Biobank array. Quality control procedures described by Anderson et al. were applied (Anderson et al. 2010). A total of 4,781 individuals were included for further analyses. Variants were excluded if they were missing in >5 per cent of the subjects or if they deviated significantly from Hardy-Weinberg equilibrium (HWE, P < 10 × 10 −7 ), resulting in 697,923 SNPs kept for phasing and imputation.
To statistically estimate the haplotypes, the genotypes were phased using EAGLE2 v2.0.5 software and the Haplotype Reference Consortium (HRC) r1.1 reference panel (Loh et al. 2016;McCarthy et al. 2016). Genotype imputation was performed using two independent reference panels: the HRC reference panel and the merged 1000 Genomes Phase 3 and UK10K reference panel (Birney and Soranzo 2015;Walter et al. 2015;Loh et al. 2016). Both phasing and imputation were performed using the Sanger Imputation Service (https://imputation.sanger.ac.uk). The imputed data set (N = 89 million SNPs) consisted of (1) SNPs imputed with HRC reference panel only, (2) SNPs imputed with the merged reference panel only, and (3) SNPs with the highest imputation information (INFO) score if imputed with both reference panels. To filter out poorly imputed or rare genotypes, SNPs with INFO score <0.8, minor allele frequency (MAF) <1 per cent, or significant deviation from HWE (P < 10 × 10 −7 ) were excluded, resulting in a final number of 9,031,264 SNPs available for association analysis. We also imputed 111 four-digit classical HLA alleles using SNP2HLA and the T1DGC Immunochip/HLA reference panel (Jia et al. 2013).
Genome-wide association analyses were performed using mixed models in GCTA (Yang et al. 2011). HLA association analyses were performed using generalized linear models in PLINK v2 (Chang et al. 2015). We assumed an additive model for SNP associations and a dominant model for associations with HLA alleles. The top three principal components of ancestry, sex, and age were used as covariates in all analyses.

The GRAS data collection
Genome-wide SNP genotyping of GRAS study participants was performed using an Axiom® myDesign™ Genotyping Array (Affymetrix, Santa Clara, CA, USA). The quality control steps have been described previously (Hammer et al. 2015). Imputation of unobserved genotypes was performed using the 1000 Genomes Project Phase 1 v3 haplotypes as reference panel. Genotypes were pre-phased with MaCH v19 and subsequently imputed by Minimac (Howie et al. 2012). SNPs with a reported INFO score <0.8 or an MAF <5 per cent, as well as markers on sex chromosomes, were excluded from downstream analyses. SNPs were also filtered on the basis of missingness (excluded if <95 per cent genotyping rate) and marked deviation from HWE (excluded if P < 5 × 10 −07 ). We then used linear regression models in PLINK v1.9 to test for association between ∼6 million SNPs and IgG responses using a continuous, quantitative approach (log-normalized IgG levels in seropositive samples) (Chang et al. 2015). The first three principal components of ancestry, calculated using GCTA (v1.24), as well as sex and age, were included as covariates in all analyses (Yang et al. 2011).

UK Biobank
Genotyping and imputation of UKB individuals have been fully described by Bycroft et al. (2018). Briefly, samples were genotyped on either the UK BiLEVE Axiom array (Affymetrix) or UKB Axiom array (Applied Biosystems). Genotypes were phased using SHAPEIT3 and the 1000 Genome Phase 3 data set as a reference and then imputed using IMUTE4 using the Haplotype Reference Consortium data, 1,000 Genomes Phase 3, and UK10K data as references. Serological markers of infection were measured for a random subset of 9,611 unrelated UKB participants (44.0 per cent male, mean age ± SD: 56.5 ± 8.2) (Mentzer et al. 2019). Association analyses were carried out using rank-based inverse normal transformed MFI values and linear mixed models as implemented in BOLT-LMM (Loh et al. 2015(Loh et al. , 2018, using age at recruitment and genetic sex as covariates and variants with an MAF >0.01 and an INFO score >0.3.

Meta-analyses
Meta-analyses of the GWAS results obtained in the different cohorts were performed using a fixed-effect model in METAL (Willer, Li, and Abecasis 2010). For BKPyV and JCPyV, summary statistics of all three studies were used. For HPyV6 and WUPyV, only the results of CoLaus and GRAS studies were used, and for MCPyV, only the results of UKB and GRAS were used, as the data were not available in the UKB and CoLaus, respectively (Fig. 1). The meta-analyses followed the sample size-based strategy, considering P-value, sample size, and direction of effect. The Bonferroni-corrected significance level threshold of P = 5.6 × 10 −09 (P = 5 × 10 −08 divided by the eight GWAS performed) was applied. Similarly, for HLA alleles, we determined significance thresholds correcting for the number of alleles, the viruses tested, and the two study designs (P = 4.3 × 10 −05 ).

Inference of FUT2 secretor status
FUT2 secretor status was determined from the rs601338 genotype, where the wild-type rs601338 (G) encodes the secretor allele, whereas A is the nonsense 'non-secretor' allele. Individuals with homozygous major (GG) or heterozygous (GA) genotypes were considered as secretors and compared with homozygous minor (AA) non-secretors (Kelly et al. 1995).

MUC1 full-length mRNA and nanopore sequencing
A total of twelve RNA samples from gastrointestinal tissue (200-225 ng each) were obtained from the Genotype-Tissue Expression (GTEx) project (GTEx Consortium 2013). Samples were selected based on expression values of the MUC1 gene in the GTEx expression data set and based on the genotype of the associated variant. Nine donors were male, and three donors were female. Retrotranscription was performed using SMARTer™ PCR cDNA Synthesis Kit from Clontech. We used the following forward and reverse primers to amplify the complementary (cDNA) of the MUC1 transcript: 5 ′ -GCGCCTGCCTGAATCTGTTC-3 ′ and 5 ′ -CCCACATGAGCTTCCACACAC-3 ′ . For the polymerase chain reaction, we used LongAmp™ Taq 2X Master Mix (Super-Script from Life Technologies/Q5 High-Fidelity from New England BioLabs) and the following protocol: 30 seconds at 94 • C for initial denaturation, followed by 30 cycles at 94 • C for 10 seconds, 55 • C for 60 seconds, 65 • C for 50 seconds, and final extension at 65 • C for 10 minutes. To pool the twelve samples for sequencing, we used native EXP-PBC001 barcoding kit on R9.4 flow cell. The following adapter sequences for MinION sequencing were added to the forward and reverse MUC1 primers, respectively: 5 ′ -TTTCTGTTGGTGCTGATATT-3 ′ and 5 ′ -ACTTGCCTGTCGCTCTATCTTC-3 ′ . We used the SQK-MAP006 protocol for library preparation for nanopore sequencing on MinION (Oxford Nanopore Technologies, ONT). Briefly, the combined end repairing and dA-tailing step was performed using the NEB-Next Ultra™ II End Repair/dA-Tailing Module (New England Bio-Labs). Samples were incubated for 5 minutes at 20 • C followed by 5 minutes at 65 • C. The reactions were purified by using Agencourt AMPure XP beads (Beckman Coulter) and further ligated to HPadapter using Blunt/TA Ligase Master Mix (New England BioLabs) with 10 minutes of incubation at room temperature. Adapted DNA was purified using Dynabeads MyOne Streptavidin C1 (Life Technologies) and washed to remove unbound DNA. The captured cDNA library was eluted by resuspending the beads in ONT's Elution Buffer for 10 minutes at 37 • C, and the beads were pelleted using a magnetic rack, leaving the supernatant containing the library. We used poretools (Loman and Quinlan 2014) to extract FASTA sequences from the ONT fast5 format, with a minimum length of 600 base pairs and forward and reverse strand sequence (2d reads). We then used nanocorrect (Loman, Quick, and Simpson 2015) for computational error correction and aligned all corrected reads to a list of MUC1 isoforms using the LAST aligner (Kiełbasa et al. 2011). Each read was assigned to the isoform with the best alignment score.