A Genome-wide Association Study Identifies SERPINB10, CRLF3, STX7, LAMP3, IFNG-AS1, and KRT80 As Risk Loci Contributing to Cutaneous Leishmaniasis in Brazil

Abstract Background Our goal was to identify genetic risk factors for cutaneous leishmaniasis (CL) caused by Leishmania braziliensis. Methods Genotyping 2066 CL cases and 2046 controls using Illumina HumanCoreExomeBeadChips provided data for 4 498 586 imputed single-nucleotide variants (SNVs). A genome-wide association study (GWAS) using linear mixed models took account of genetic diversity/ethnicity/admixture. Post-GWAS positional, expression quantitative trait locus (eQTL) and chromatin interaction mapping was performed in Functional Mapping and Annotation (FUMA). Transcriptional data were compared between lesions and normal skin, and cytokines measured using flow cytometry and Bioplex assay. Results Positional mapping identified 32 genomic loci associated with CL, none achieving genome-wide significance (P < 5 × 10−8). Lead SNVs at 23 loci occurred at protein coding or noncoding RNA genes, 15 with eQTLs for functionally relevant cells/tissues and/or showing differential expression in lesions. Of these, the 6 most plausible genetic risk loci were SERPINB10 (Pimputed_1000G = 2.67 × 10−6), CRLF3 (Pimputed_1000G = 5.12 × 10−6), STX7 (Pimputed_1000G = 6.06 × 10−6), KRT80 (Pimputed_1000G = 6.58 × 10−6), LAMP3 (Pimputed_1000G = 6.54 × 10−6), and IFNG-AS1 (Pimputed_1000G = 1.32 × 10−5). LAMP3 (Padjusted = 9.25 × 10−12; +6-fold), STX7 (Padjusted = 7.62 × 10−3; +1.3-fold), and CRLF3 (Padjusted = 9.19 × 10−9; +1.97-fold) were expressed more highly in CL biopsies compared to normal skin; KRT80 (Padjusted = 3.07 × 10−8; −3-fold) was lower. Multiple cis-eQTLs across SERPINB10 mapped to chromatin interaction regions of transcriptional/enhancer activity in neutrophils, monocytes, B cells, and hematopoietic stem cells. Those at IFNG-AS1 mapped to transcriptional/enhancer regions in T, natural killer, and B cells. The percentage of peripheral blood CD3+ T cells making antigen-specific interferon-γ differed significantly by IFNG-AS1 genotype. Conclusions This first GWAS for CL identified multiple genetic risk loci including a novel lead to understanding CL pathogenesis through regulation of interferon-γ by IFNG antisense RNA 1.

American cutaneous leishmaniasis (ACL) caused by Leishmania braziliensis has multiple presentations including cutaneous (CL), mucosal (ML), and disseminated (DL) leishmaniasis. ML and DL are generally preceded by CL. The common CL form of disease is associated with localized skin lesions, mainly ulcers, on exposed body parts. While normally self-limiting, the degree of pathology and rate of healing varies, with lesions leaving lifelong scars. Not all infected individuals go on to develop disease. Subclinical infection is associated with Leishmaniaspecific cellular immune responses, measured as delayed-type hypersensitivity (DTH) skin test responses [1]. Leishmania antigen-stimulated peripheral blood lymphocytes also produce interferon gamma (IFN-γ) and tumor necrosis factor (TNF) in subclinical infection, but at lower levels than CL [1]. In a longitudinal study, IFN-γ was associated with protection, but a positive skin-test response was not [2]. Indeed, a positive DTH response has high sensitivity for diagnosis of L. braziliensis CL [1,3]. All forms of ACL are associated with exaggerated cellular immunity. In CL, there is a positive correlation between the frequency of CD4 + T cells expressing IFN-γ and TNF and lesion size [4], with higher levels in ML than CL [5]. The outcome of L. braziliensis infection is determined by a fine balance between e516 • cid 2021:72 (15 May) • Castellucci et al proinflammatory IFN-γ and TNF and anti-inflammatory interleukin-10 (IL-10) [3,5].
Here we perform the first well-powered genome-wide association study for L. braziliensis CL, combining analysis across 2 cohorts comprising 2066 cases and 2046 controls. Integrative post-GWAS analysis [19] is used to positionally map genomic loci associated with CL, with functional annotation and experimental studies used to identify plausible genes that act as genetic risk factors for CL.

Ethical Considerations, Sampling, and Clinical Data Collection
The study, approved by the Hospital Universitário Professor Edgard Santos Ethical Committee (018/2008 and 22/2012) and the Brazilian National Ethical Committee (CONEP-305/2007; CONEP-1258513.1.000.5537), complied with principles of the Helsinki declaration. All participants or parents/guardians signed written consents. Post-quality control (QC) genotype data are lodged in the European Genome-Phenome Archive (accession number EGAS00001004596). CL cases were ascertained at the Public Health Post, Corte de Pedra, Bahia, Brazil, where L. braziliensis is the confirmed species [9][10][11][12]. CL is defined as presence of chronic ulcerative lesions without mucosal involvement (ML) or dissemination to ≥10 sites (DL). ML and DL cases were excluded due to insufficient power. All CL cases had confirmed parasite detection and/or minimally met 2 of 3 criteria: positive leishmania-specific DTH, positive leishmania serology, and leishmania histopathology. Endemic controls were attendants of cases with no current/previous history of CL, DL, or ML, including no scars. Samples were collected in 2 phases: 2008-2010 and 2016-2017. Blood bank controls were collected during 2015-2017 at the HEMOBA Foundation, Salvador. Demographic data (age, sex) were recorded. Blood (8 mL) was taken by venipuncture into dodecyl citrate acid-containing vacutainers (Becton Dickinson). Genomic DNA was prepared using proteinase K and salting-out and shipped to the United Kingdom for genotyping at Cambridge Genomic Services.

SNV Imputation and GWAS
Imputation was performed using the multiethnic 1000 Genomes Project phase 3 reference panel (1000G): 84.8 million variants, 2504 samples, and 26 populations. The 293 563 post-QC genotyped SNVs common across phases 1 and 2 were imputed using the Michigan Imputation Server version 1.0.4 [21]. Imputed SNVs with information metric <0.8 or genotype probability <0.9 were excluded. Remaining variants were converted to genotype calls and filtered for <5% missingness and minor allele frequency >0.005. Imputation accuracy was assessed as the squared Pearson correlation between imputed SNV dosage and known allele dosage (r 2 > 0.5).
Genome-wide association analysis was performed using a linear mixed model in FaST-LMM version 2.07 under an additive model [22]. Population structure and relatedness were controlled using the genetic similarity matrix, computed from 32 696 phase 1 and 45 569 phase 2 linkage disequilibrium (LD)pruned array variants. Systematic confounding was assessed using quantile-quantile (Q-Q) plots and an inflation factor (denoted λ; median observed/median theoretical χ 2 distributions). Manhattan plots were generated in R using mhtplot in the genetic analysis package "gap. " Regional association plots were created using LocusZoom [23]. The 32 696 phase 1 and 45 569 phase 2 LD-pruned variants were matched to HapMap populations and Principal Component Analysis plots prepared in R.

Post-GWAS Annotation in FUMA
Functional Mapping and Annotation (FUMA) [19] was used to characterize regions of association based on positional, expression quantitative trait loci (eQTL) and chromatin interaction mapping. Summary statistics from the combined GWAS were loaded into FUMA. SNP2GENE was used to identify independent significant SNVs based on 1000G multiethnic LD data. SNP2GENE mapping used the default GWAS P < 10 −5 plus 1 manually entered seed hit at P = 1.32 × 10 −5 . Independent significant SNVs and SNVs in LD with them were annotated for consequences on gene function using ANNOVAR, potential regulatory functions (Regulome DB score), and 15-core chromatin state predicted by ChromHMM for 127 tissue/cell types. Effects of SNVs on gene expression were determined using eQTLs from multiple tissue/cell types of healthy donors from databases: eQTLgen (44 different tissue types); BIOSQTL (BIO_eQTL_gene level, whole peripheral blood, 2116 healthy donors); DICE (B and T cells, monocytes, natural killer cells); and GTEx version 8 (whole blood; cultured fibroblasts; skin exposed and not sun exposed).

Expression Analysis in CL Lesions
RNA expression for mapped genes was examined using published microarray data [24] comparing CL lesion biopsies (n = 25) with normal skin (n = 10) from nonendemic unexposed donors (GEO database: GSE55664). Between-group comparisons were made on log-transformed data using the GEO2R tool with Benjamini and Hochberg false discovery rate-adjusted P values.

Characteristics of the Study Population
Demographic and clinical details comparing cases and controls are provided in Supplementary Table 1. The younger age of endemic controls was counterbalanced by older age of blood bank controls. Phase 1 and 2 CL cases were matched for lesion number/size and DTH, with no correlation between lesion and DTH sizes. Blood bank controls fell within genetic heterogeneity of endemic controls ( Supplementary Figures 1 and 2), with all controls matched to cases. A few outliers occurred in phase 1 endemic controls, which also showed greater heterogeneity in phase 2. Comparison against HapMap populations showed predominant admixture between white and African ethnicities. Linear mixed models used in association analyses take account of genetic heterogeneity.

Genome-wide Association Study
Manhattan and Q-Q plots for genotyped data for phases 1/2 ( Supplementary Figures 3 and 4) showed no systematic bias (λs 0.998/1). A Manhattan plot for the combined imputed genotype data ( Figure 1) shows no hits at P < 5 × 10 −8 . Four approaches were used to identify susceptibility genes: (1) integrative post-GWAS annotation in FUMA [19]; (2) analysis of transcriptional data comparing lesions with normal skin [24]; (3) review of gene function for relevance to parasite biology/ immunopathology; and (4) analysis of genotypic differences in T-cell responses.

Transcriptional Analyses of Putative Susceptibility Genes
Additional evidence (Table 1) to support genes as candidates was sought by comparing expression in CL lesions vs normal skin [24]. Six genes were expressed at higher level in lesions (LAMP3, STX7, CALCR, CRLF3, PPP6R1, CHKB), 5 genes at lower levels (ZNF385D, MCCC1, PCMTD1, TSPAN9,  Figure 2). The lead SNV and others in strong LD lie upstream of KRT80 in a region of strong transcriptional/ enhancer activity and act as eQTLs in cultured fibroblasts, sun and non-sun-exposed skin. Other genes supported by both eQTL and lesion expression were in tandem with genes positionally mapped to the same lead SNV (Table 1), namely LAMP3/MCCC1, PXDNL/PCMTD1, and CHKB/MAPK8IP2. For PXDNL/PCMTD1, the lead SNV was intronic in PXDNL but neither it nor numerous mapped SNVs in LD with it were eQTL for PXDNL itself. Rather, they acted as eQTLs for PCMTD1 expression in fibroblasts (Supplementary Figure 5). For CHKB/MAPK8IP2, the lead SNV lies upstream of both genes transcribed in opposing directions. Mapped SNV act as eQTLs for MAPK8IP2 in sun and not-sun-exposed skin but not for CHKB (Supplementary Figure 6). For LAMP3/MCCC1, the lead SNV and SNVs in LD with it map predominantly within LAMP3 (Supplementary Figure 7). While they act as cis-eQTLs for MCCC1 expression, data from CL lesions ( Table 1) suggest stronger upregulation of LAMP3 compared to downregulation of MCCC1.
Ten genes showed no differential expression in lesions (Table 1). This included SERPINB10 across which multiple cis-eQTLs mapped to chromatin states of transcriptional/enhancer activity in neutrophils, monocytes, B cells, and hematopoietic stem cells (Supplementary Figure 8). Four noncoding RNA genes (Table 1) not present on the chips used for CL lesion data [24] included IFNG-AS1 which had 10 eQTLs across a chromatin state region of transcriptional/enhancer activity in immune cells (whole blood: T cells, B cells, hematopoietic stem cells) that were associated with expression of IFNG-AS1, IFNG, and IL26 (Figure 3).

Relevance to the Biology and Immunopathology of CL Disease
In summary, of 32 positional mapped genomic loci, 23 occurred at protein coding or noncoding RNA genes of which 15 had eQTLs for expression in relevant cells/tissues and/or showed differential expression in CL lesions. To determine which genes in these 15 loci might act as CL susceptibility genes, we reviewed gene function in relation to parasite biology and CL immunopathology (Supplementary Table 3). A plausible functional role for 12 genes was not found, including PCMTD1 for which eQTL and lesion expression data were strong. A role for these genes cannot be discounted, but 6 genes had plausible links to CL pathogenesis (Table 2 and Figure 4): LAMP3 and STX7 play a role in lysosome function; KRT80 and CRLF3 relate to skin perturbations; SERPINB10 and IFNG-AS1 play central roles in immune responses.

DISCUSSION
Our GWAS provides the first hypothesis-free insights into genetic risk factors for L. braziliensis CL. Despite prior evidence for genetic regulation of leishmaniasis [7], and the robust well-powered study undertaken, no signals of association achieved genome-wide significance (P < 5 × 10 −8 ) and only modest support was found for previous candidate gene studies (Supplementary Data). We therefore employed integrative approaches [19] to prioritize 6 genes as plausible genetic risk loci for CL. Two genes relate to intracellular localization of Leishmania parasite in phagolysosomes [27]. LAMP3 encodes lysosomalassociated membrane protein 3, also known as dendritic cell LAMP or DCLAMP [28]. Expression of DCLAMP increases in activated dendritic cells, localizing to the MHC class II compartment immediately before translocation of class II to  [28]. LAMP3 was expressed at 5.9-fold higher levels in lesions compared to normal skin, supporting its role in CL. Since dendritic cells are the most potent antigenpresenting cells that induce primary T-cell responses, it is likely that variation at LAMP3 will relate to presentation of Leishmania antigens to T cells. STX7 encodes syntaxin 7, which influences vesicle trafficking to lysosomes, including phagosome-lysosome fusion [29]. Variants at STX7 could influence delivery of Leishmania phagosomes to lysosomes of macrophages.
Two other susceptibility genes, KRT80 and CRLF3, likely relate to skin perturbations and/or the wound healing response. Keratin 80 is a type II epithelial keratin with biased expression in skin keratinocytes [30]. Pathogens invading skin cause keratinocytes to produce chemokines which attract monocytes, natural killer cells, T cells, and dendritic cells [31]. KRT80 and multiple other keratins (KRT77/81/4/39/32/33B) were expressed at lower levels in lesions compared to normal skin. Whether this reflects a paucity of keratinocytes in lesions, or specific downregulation of gene expression in keratinocytes within lesions, requires further investigation. Keratinocytes play a role in wound healing, are potent producers of IL-10 and transforming growth factor-β [32], and can change to a sclerotic phenotype by gene silencing of Fli1 [33]. Although association of human CL and FLI1 [9,10] was not replicated here, Fli1 is a confirmed murine CL susceptibility gene [34]. Our novel associations continue to focus on molecules/cells involved in wound healing. In contrast, the cytokine receptor-like factor 3 CRLF3 is expressed in normal skin, and shows pathologically enhanced expression in premalignant actinic keratosis and malignant squamous cell carcinoma [35]. CRLF3 appears to be similarly dysregulated in CL lesions.
SERPINB10 and IFNG-AS1 are highlighted as central regulators of immune responses. Serpin family B member 10 is a peptidase inhibitor expressed in bone marrow [36] in the monocytic lineage and can inhibit TNF-induced apoptosis [37]. Epithelial SERPINB10 contributes to allergic eosinophilic inflammation [38]. However, despite eQTL data showing SERPINB10 expressed in sun-exposed skin, we found no evidence for its expression in lesions, consistent with more central roles in immune regulation. IFNG antisense RNA 1 fine-tunes the magnitude of IFN-γ responses [26]. It is expressed in mouse and human T-helper1 cells and positively regulates Ifng expression [39,40]. Transient overexpression of Ifng-as1 is associated with increased IFN-γ and reduced susceptibility to Salmonella enterica [39]. Conversely, deletion of Ifng-as1 in mice compromises host defence against Toxoplasma gondii by reducing Ifng expression. Discordant expression of IFNG and IFNG-AS1 is seen in long-lasting memory T cells, where high IFNG-AS1 associated with low IFNG suggests feedback inhibition [26]. We observed that the IFNG-AS1 genotype was associated with downstream effects on percentage of IFN-γ-producing CD3 + T cells and highly correlated TNF-producing CD3 + T cells following antigen stimulation. Individuals homozygous for the disease-associated allele at 7 IFNG-AS1 associated SNVs had significantly lower percentages of IFN-γ/TNF T cells, suggesting that lower IFN-γ and TNF regulated by IFNG-AS1 causes increased disease risk. These 2 proinflammatory cytokines are important activators of macrophages for anti-leishmanial activity.
Our GWAS identified novel genetic risk factors for CL that provide interesting leads to further understanding CL pathogenesis, including through regulation of IFN-γ responses.

Supplementary Data
Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author. Details of all post-genome-wide association study candidate genes are provided in Supplementary Table 3.
Abbreviations: Chr, chromosome; CI, confidence interval; rsID, reference SNP cluster identity; SE, standard error. a Associated allele (ancestral/minor) for risk or protection as indicated by the odds ratio.