Genetic Susceptibility to Astrovirus Diarrhea in Bangladeshi Infants

Abstract
Background. Astroviral infections commonly cause acute nonbacterial gastroenteritis in children globally. However, these infections often go undiagnosed outside of research settings. There is no treatment available for astrovirus, and Astroviridae strain diversity presents a challenge to potential vaccine development.
Methods. To address our hypothesis that host genetic risk factors are associated with astrovirus disease susceptibility, we performed a genome-wide association study of astrovirus infection in the first year of life from children enrolled in 2 Bangladeshi birth cohorts.
Results. We identified a novel region on chromosome 1 near the loricrin gene (LOR) associated with astrovirus diarrheal infection (rs75437404; meta-analysis P = 8.82 × 10−9; A allele odds ratio, 2.71) and on chromosome 10 near the prolactin releasing hormone receptor gene (PRLHR) (rs75935441; meta-analysis P = 1.33 × 10−8; C allele odds ratio, 4.17). The prolactin-releasing peptide has been shown to influence feeding patterns and energy balance in mice. In addition, several single-nucleotide polymorphisms in the chromosome 1 locus have previously been associated with expression of innate immune system genes PGLYRP4, S100A9, and S100A12.
Conclusions. This study identified 2 significant host genetic regions that may influence astrovirus diarrhea susceptibility and should be considered in further studies.

Diarrhea is the eighth leading cause of death across all ages and the second leading cause of death among children <5 years of age [1,2]. There were 70.6 deaths per 100 000 children in this age group globally in 2016, compared with 1.3 deaths per 100 000 children within high-income countries [3]. Classic human astroviruses alone account for 2%-9% of all cases of acute nonbacterial gastroenteritis in children across the world, and an analysis of a Bangladeshi birth cohort found that 15%-20% of all diarrhea samples in the first year of life were positive for astrovirus [4,5]. Astroviral infections are characterized by 2-3 days of watery diarrhea, but unlike many other enteric pathogens, astrovirus causes little histological change to intestinal epithelia, including inflammatory responses and cell death [4]. The virus can be recovered from the feces of asymptomatic children, suggesting the possibility of prolonged viral shedding or possibly a mechanism that allows astrovirus to remain in the gastrointestinal tract while causing epithelial barrier dysfunction [6-10].
Beyond gastrointestinal infections, there are reported cases of immunocompromised patients with encephalitis and meningitis, where astrovirus RNA has been detected in cerebrospinal fluid [10,11]. Coupled with other known target organs in animals and the ability to infect across species, the burden of disease associated with astroviral infections may be higher than expected. In patients with gastroenteritis-associated astrovirus infections, the only available therapy is fluid replacement to avoid dehydration, as treatment for the virus does not exist.
Astrovirus is most commonly spread from person to person, especially via fecal-to-oral transmission and drinking water routes [12,13]. There are 2 types of astrovirus, classic and novel, that are determined by genetic similarity; novel astroviruses are phylogenetically distant from classic human astroviruses [11]. Both classic and novel astroviruses circulate globally, with classic viruses more prevalent in developing countries [4,10]. The persistence of astrovirus in high-income settings suggests that prevention of disease requires strategies beyond hygiene improvements [14]. Of the 3 open reading frames in the astrovirus genome, classic astroviruses share 64%-84% of capsid amino acid similarities and 93%-95% of nucleotide similarities in part of the second open reading frame [10]. Novel astroviruses belong to different clades than classic astroviruses, sharing up to 54% of amino acid identity with classic astroviruses [10]. The wide range of genetic diversity within Astroviridae makes potential vaccine development especially difficult.
We lack an understanding of the risk factors associated with astrovirus infections and disease severity. To identify host genetic risk factors associated with astrovirus disease susceptibility that may clarify disease mechanisms and suggest vaccine targets, we performed genome-wide association studies in children enrolled in 2 birth cohorts, the Performance of Rotavirus and Oral Polio Vaccines in Developing Countries (PROVIDE) study and the Cryptosporidiosis and Enteropathogens in Bangladesh Birth Cohort (CBC) study in Dhaka, Bangladesh [15,16]. We meta-analyzed the results and identified 2 novel regions on chromosomes 1 and 10 significantly associated with astrovirus diarrheal infections in infants.

METHODS
The PROVIDE study protocol was approved by the Research Review Committee and Ethics Review Committee at the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b) and the institutional review boards of the University of Virginia and the University of Vermont before implementation. The Ethics and Research Review Committees at icddr,b approved the CBC study. For both studies, informed written consent was obtained from the participants or from the parents or guardians of all participants.

PROVIDE Study Design
The PROVIDE study, including children from the Mirpur area of Dhaka, Bangladesh, aimed to evaluate the efficacy of oral and injectable vaccines using a randomized controlled clinical trial 2 × 2 factorial design [15]. From 2011 to 2016, the 700 children in the birth cohort and their mothers were followed up for the child's first 2 years of life, with biweekly diarrhea surveillance conducted in the homes by field research assistants. Active episodes of diarrhea were referred to the study clinic for evaluation and treatment. For each episode, a diarrhea stool specimen was collected. Height-for-age z (HAZ) and weight-for-age z (WAZ) scores were collected every 3 months.

CBC Study Design
The CBC study investigated the disease burden of cryptosporidiosis and its effect on children's growth in urban and rural Bangladesh [16]. A total of 500 children from Mirpur, Dhaka, and 258 children from Mirzapur, a rural subdistrict located near Dhaka, were enrolled at birth. From 2014 to 2018, twice-weekly in-home visits were conducted by field research assistants to collect data regarding diarrhea and disease in children, and HAZ and WAZ scores were collected every 3 months. The study clinic was available to children and caregivers for development of symptoms of any illness. Stool samples were collected monthly and during episodes of diarrhea. Only a subset of children from the Mirpur site had polymerase chain reaction (PCR) testing of diarrheal samples (n = 220) and were therefore included in this study.

Case and Control Definitions
Stool samples collected in both cohorts were tested for the presence of pathogens via real-time reverse-transcription PCR (RT-PCR) using TaqMan Array Cards [17]. Bar graphs displaying the distribution of RT-PCR cycle threshold (Ct) values for astrovirus in diarrhea samples were plotted using R v3.5.1 (Supplementary Figure 1). Case patients with diarrhea attributable to astrovirus were defined as children with diarrheal samples collected within the first year of life that resulted in RT-PCR Ct values for astrovirus >0 and <30. Children were defined as controls if they had ≥1 diarrhea sample available for testing from the first year of life but all RT-PCR Ct values for astrovirus were ≥30.
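The case and control definitions above can be written as a small decision rule. The following Python sketch is illustrative only: the function name, the use of None for a sample with missing astrovirus data, and the choice to let a qualifying positive sample take precedence over a missing one are assumptions, not details given in the text.

```python
from typing import Iterable, Optional

CT_CUTOFF = 30.0  # Ct threshold from the case definition in the text


def classify_child(ct_values: Iterable[Optional[float]]) -> Optional[str]:
    """Classify one child from the astrovirus Ct values of first-year diarrheal samples.

    None entries stand for samples with missing astrovirus data (assumed encoding).
    Returns "case", "control", or None when the child fits neither definition.
    """
    samples = list(ct_values)
    observed = [ct for ct in samples if ct is not None]
    # Case: any first-year diarrheal sample with 0 < Ct < 30.
    if any(0 < ct < CT_CUTOFF for ct in observed):
        return "case"
    # Control: at least one sample, none missing, and every Ct >= 30.
    if samples and len(observed) == len(samples) and all(
        ct >= CT_CUTOFF for ct in observed
    ):
        return "control"
    # No diarrheal sample, missing data, or Ct values outside both definitions.
    return None
```

Under these assumptions, a child with Ct values of 35.0 and 28.4 is a case, a child with 35.0 and 40.0 is a control, and a child with no samples is excluded.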

Genotyping Array
In the PROVIDE study, children were genotyped on the Expanded Multi-Ethnic Genotyping Array (MEGA-EX) from Illumina, and those from the CBC study were genotyped on Illumina's Infinium Multiethnic Global Array (MEGA). These genetic data were phased with SHAPEIT software (version 2) and imputed with IMPUTE software (version 2.3.2) with 1000 Genomes Project phase 3 data as the reference. Standard quality control metrics were used for the genome-wide data. Single-nucleotide polymorphism (SNP) filters included genotype missingness <5%, minor allele frequency (MAF) >0.05, and Hardy-Weinberg equilibrium P > 10−5. The PROVIDE cohort had 10 792 283 initial variants, and the total genotyping rate was 0.99. A total of 590 340 variants were removed owing to missing genotype data, 1 487 431 were removed owing to minor allele threshold, and 131 were removed owing to Hardy-Weinberg exact test, leaving 8 777 081 variants.
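The three SNP filters listed above can be sketched as a per-variant check. This is a minimal illustration, not the pipeline used in the study: the Variant structure and the function names are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, or 2) with None for a missing call, and the Hardy-Weinberg exact-test P value is assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_MISSINGNESS = 0.05  # genotype missingness must be < 5%
MIN_MAF = 0.05          # minor allele frequency must be > 0.05
MIN_HWE_P = 1e-5        # Hardy-Weinberg equilibrium P must be > 1e-5


@dataclass
class Variant:
    """Hypothetical container for one variant's genotype calls."""
    rsid: str
    genotypes: Sequence[Optional[int]]  # alt-allele counts 0/1/2; None = missing
    hwe_p: float                        # precomputed HWE exact-test P value


def missingness(variant: Variant) -> float:
    """Fraction of samples with no genotype call."""
    if not variant.genotypes:
        return 1.0
    return sum(g is None for g in variant.genotypes) / len(variant.genotypes)


def minor_allele_frequency(variant: Variant) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in variant.genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(variant: Variant) -> bool:
    """Apply the three filters named in the text."""
    return (
        missingness(variant) < MAX_MISSINGNESS
        and minor_allele_frequency(variant) > MIN_MAF
        and variant.hwe_p > MIN_HWE_P
    )
```

In practice these filters are applied by dedicated genetics software; the sketch only makes the thresholds explicit.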
In the CBC study, there were 10 942 212 initial variants and a total genotyping rate of 0.99. A total of 532 850 variants were lost owing to missing genotype data, 1 528 417 were removed owing to minor allele threshold, and 8 were removed owing to Hardy-Weinberg exact test, resulting in 8 880 118 variants that passed the quality control filters. Eighteen individuals in the CBC study were identified as outliers in the principal component analysis (PCA) based on their PCA score. Those with PCA scores falling >3 standard deviations above or below the median PCA value were removed (Supplementary Figure 2A and 2B). The genomic inflation factor, or λ, showed no inflation (λ = 1.003 and 1.033 for PROVIDE and CBC studies, respectively) (Supplementary Figure 3).
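The genomic inflation factor reported above is conventionally the median of the observed association test statistics divided by the median expected under the null hypothesis. The sketch below assumes that convention with 1-degree-of-freedom chi-square statistics recovered from two-sided P values; the text does not state how λ was computed, and the function name is illustrative.

```python
from statistics import NormalDist, median
from typing import Iterable

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution: the squared 75th percentile
# of the standard normal, approximately 0.455.
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values: Iterable[float]) -> float:
    """Return lambda = median(observed chi-square) / median(expected chi-square)."""
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"P value out of range: {p}")
        # A two-sided P value maps to z = inv_cdf(p / 2); chi-square (1 df) = z**2.
        chi2_stats.append(_STD_NORMAL.inv_cdf(p / 2.0) ** 2)
    if not chi2_stats:
        raise ValueError("at least one P value is required")
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2
```

A value near 1, as reported for both cohorts, indicates that the bulk of the test statistics follow the null distribution; values well above 1 would suggest confounding such as population stratification.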

Association Analysis
Genome-wide association analyses [18] using the Homo sapiens (human) genome assembly GRCh37 (hg19) from the Genome Reference Consortium were performed separately for each study, using logistic regression with an additive model within SNPTEST version 2. Manhattan plots were constructed for each cohort using the ggplot2 package in R software, version 3.6.1, and highlights of regions of interest were created using LocusZoom tools at locuszoom.org with hg19/1000 Genomes South Asian as the reference genome build.
Data from the PROVIDE and CBC studies were combined for a fixed-effects meta-analysis using METAL software [19]. Input from both cohorts was filtered on MAF >5% and IMPUTE2 (INFO) score >0.7, retaining SNPs that had been imputed with high certainty in the data set. Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) was used to annotate genomic regions of interest and identify variants in linkage disequilibrium [20]. This annotation included expression quantitative trait loci (eQTLs) from several databases, including the Genotype-Tissue Expression project (GTEx) and the eQTLGen Consortium [21]. Conditional analyses were performed at each associated locus using SNPTEST version 2 in each cohort and then combined for an overall fixed-effects meta-analysis with METAL software.
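A fixed-effects meta-analysis of per-cohort results for a single SNP can be illustrated with inverse-variance weighting, one common fixed-effects scheme. This is a sketch under that assumption; METAL offers more than one weighting scheme and the text does not state which was used. The function name and the inputs (per-cohort log odds ratios and standard errors) are illustrative.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Combine per-cohort log odds ratios by inverse-variance weighting.

    Returns (pooled beta, pooled standard error, z score, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and the same length")
    # Each cohort is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Made-up inputs for two cohorts, not values from the study.
beta, se, z, p = fixed_effects_meta([1.0, 0.8], [0.30, 0.45])
print(f"pooled OR = {math.exp(beta):.2f}, P = {p:.2e}")
```

The more precise cohort (smaller standard error) pulls the pooled estimate toward its own effect size, which is why the larger cohort tends to dominate a two-cohort meta-analysis.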

RESULTS
Within the PROVIDE study, 119 children were identified as having ≥1 astrovirus-associated diarrheal case in the first year of life, and 314 children did not have an associated astroviral infection. Children with astrovirus-associated diarrhea had a younger mean age (99.3 days) at the first diarrheal episode than children without astrovirus infection (123.6 days). Similarly, there were 58 case patients with astrovirus-associated diarrhea and 96 controls in the CBC study, with a mean age of first diarrheal episode of 117.7 days for case patients and 125.2 days for controls. In both cohorts, children not identified as case patients or controls either left the study early, did not have diarrhea, or had ≥1 sample with missing data for astrovirus. The distribution of HAZ and WAZ scores did not differ between case patients and controls in either cohort at birth or at 12 months of age (Table 1). Case patients and controls both had a mean mild diarrheal severity score in the PROVIDE study and a mean moderate diarrheal severity score in the CBC study (Table 1).

Genetic Associations
Figure 1 shows the association results for each cohort and the meta-analysis. The top association (rs75437404; meta-analysis P = 8.82 × 10−9; MAF, 16.7%) was identified in a noncoding region on chromosome 1 near the loricrin gene (LOR) (Table 2). The LOR gene is 2.4 kb in size, and the peak association region around this SNP spans 88 kb (Figure 2A) [22]. Children with ≥1 copy of the A allele at SNP rs75437404 were 2.7 times as likely to have an astrovirus-associated diarrheal infection within the first year of life as children with the T allele. This top associated SNP, rs75437404, has an MAF of 17.2% in the PROVIDE and 15.2% in the CBC study and is in linkage disequilibrium (r2 > 0.8) with 13 variants spanning an intergenic region 15.3 kb upstream of LOR. A conditional analysis incorporating rs75437404 in the model attenuated the association, suggesting that the region is in strong linkage disequilibrium and the alleles are correlated (Figure 2C).
We also identified another region, on chromosome 10 (Figure 1A, 1C), that includes the prolactin releasing hormone receptor gene (PRLHR). The peak association region around PRLHR spans 51 kb, including the 9.9-kb gene (Figure 2B) [22]. SNP rs75935441 is in the 3' untranslated region of the PRLHR gene (meta-analysis P = 1.33 × 10−8; MAF, 7.5%) and is part of an enhancer sequence [23]. Children with ≥1 copy of the C allele at rs75935441 were approximately 4 times as likely to have an astrovirus diarrheal infection as children with the T allele (odds ratio [OR], 4.17 [95% confidence interval, 2.55-6.83]). The conditional analysis including rs75935441 in the model attenuated the signal (Figure 2D).
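The odds ratio, confidence interval, and P value reported for rs75935441 are linked through the standard error of the log odds ratio. The sketch below assumes the interval is a Wald interval, symmetric on the log scale, which the text does not state; the function name is illustrative. Under that assumption, the reported interval of 2.55-6.83 around an OR of 4.17 implies a P value on the order of 10−8, in line with the reported meta-analysis P value.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(
    odds_ratio: float, ci_low: float, ci_high: float
) -> Tuple[float, float, float, float]:
    """Recover (log OR, standard error, z score, two-sided P) from an OR and 95% CI.

    Assumes a Wald interval: log(OR) +/- Z_95 * SE.
    """
    log_or = math.log(odds_ratio)
    # The interval width on the log scale equals 2 * Z_95 * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, se, z, p_value


# Values reported in the text for the C allele of rs75935441.
log_or, se, z, p_value = wald_from_or_ci(4.17, 2.55, 6.83)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p_value:.2e}")
```

The same relationship can be used as a quick consistency check on any reported odds ratio and interval, provided the interval was constructed on the log scale.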

DISCUSSION
We identified 2 significant regions across the genome that were associated with astrovirus diarrheal infection in the first year of life. The first region is upstream of the LOR gene and is associated with expression of immune system genes PGLYRP4, S100A12, and S100A9. The second region encompasses the PRLHR gene and is associated with expression of the gene CACUL1.
LOR encodes loricrin, which contributes to the protective barrier function of the epidermis and is expressed almost exclusively in mammalian stratified epithelia [24]. It is also found in macrophages in various tissues [25], but the relationship to astrovirus is not evident. However, within the chromosome 1 locus, the presence of chromatin interactions coupled with identification of eQTLs indicates that several of these variants are likely regulatory in nature. Both S100A12 and S100A9 are in the "MyD88-dependent cascade initiated on endosome" SuperPath [22]. Astrovirus enters cells via endocytosis, and when Toll-like receptors (TLRs) bind viral nucleic acid inside the cell, they initiate signaling cascades that include MyD88 and type 1 interferon [26,27]. The S100A9 protein regulates TLR3 activation [28,29], and the S100A12 protein is an endogenous activator of TLR4 [30]. Both proteins are calgranulins and, in addition to TLR activation, are also involved in protecting the body against damage from inflammation [31,32]. The same allele associated with higher expression of these genes in whole blood is associated with increased odds of astrovirus-positive diarrhea in the first year of life. The C allele of rs12125683 is also associated with higher expression of PGLYRP4 in esophagus mucosa. PGLYRP4 is a peptidoglycan recognition protein, involved in the innate immune response to bacteria [22]. Higher expression of this gene could contribute to astrovirus susceptibility via changes to the gastrointestinal microbiome [33].
The PRLHR gene is also in a region significantly associated with astrovirus diarrheal infections. This gene encodes the receptor for prolactin-releasing peptide, which has been identified as a target for obesity treatment [34]. In mouse and rat models, prolactin-releasing peptide influences feeding patterns and energy balance, and it has shown diet-suppressing effects [34,35]. However, we further stratified by WAZ and HAZ scores to account for any underlying malnutrition at birth or at 12 months of age that may predispose an individual to an infection, and there were no differences between case patients and controls (Table 1). Thus, the association between PRLHR and astrovirus diarrhea infections is not likely to be mediated by malnutrition. Further investigation to identify the potential association between PRLHR and the astrovirus replication cycle or its pathogenic mechanism is warranted.
In addition to the effects on PRLHR, we found SNPs associated with increased risk of astrovirus-positive diarrhea that were also associated with lower expression of the gene CACUL1 in whole blood. This gene promotes cell proliferation, and higher expression has been linked to increased invasion of gastric cancer cells [36]. While this gene is linked to Helicobacter pylori activity [36], it is unclear how it may be involved in astrovirus pathogenesis. Additional chromatin interactions at this locus suggest the potential for regulation of the genes ENO4, KCNK18, and EIF3A, but there are not yet data showing differences in gene expression based on genotypes found in the region.
In conclusion, we identified 2 genetic regions associated with astrovirus infection susceptibility. Both regions warrant additional exploration for their potential association with immune and gastric cells and may elucidate astrovirus infection pathways.

Figure 1. Single-nucleotide polymorphism (SNP) associations with astrovirus-associated diarrhea in the first year of life. Manhattan plots show −log10 P values for SNP associations by cohort, and dashed lines indicate thresholds for genome-wide significance (5 × 10−7). A, Performance of Rotavirus and Oral Polio Vaccines in Developing Countries (PROVIDE) study. B, Cryptosporidiosis and Enteropathogens in Bangladesh Birth Cohort (CBC) study. C, Meta-analysis.

Figure 3. Circos plots of chromatin interactions.Outermost rings show results from the genome-wide association study meta-analysis for chromosomes 1 (A) and 10 (B).The next ring in each plot shows base pair coordinates along the chromosome (in megabases), with the region of interest colored blue.The innermost ring is annotated with gene symbols for all genes with a chromatin interaction (orange), an expression quantitative trait locus (green), or both (red).

Table 1. Characteristics of Children in Performance of Rotavirus and Oral Polio Vaccines in Developing Countries (PROVIDE) and Cryptosporidiosis and Enteropathogens in Bangladesh Birth Cohort (CBC) Studies
Abbreviations: CBC, Cryptosporidiosis and Enteropathogens in Bangladesh Birth Cohort; HAZ, height-for-age z; PROVIDE, Performance of Rotavirus and Oral Polio Vaccines in Developing Countries; VIP, ventilated improved pit; WAZ, weight-for-age z.
a Data represent no. (%) of children unless otherwise specified.

Table 2. Genome-wide Significant Associated Single-Nucleotide Polymorphism From Stratified Analyses and Meta-analysis
Abbreviations: CBC, Cryptosporidiosis and Enteropathogens in Bangladesh Birth Cohort; CI, confidence interval; MAF, minor allele frequency; OR, odds ratio; PROVIDE, Performance of Rotavirus and Oral Polio Vaccines in Developing Countries; SNP, single-nucleotide polymorphism.