Natural genetic variation in Drosophila melanogaster reveals genes associated with Coxiella burnetii infection

Abstract The gram-negative bacterium Coxiella burnetii is the causative agent of Query (Q) fever in humans and coxiellosis in livestock. Host genetics are associated with C. burnetii pathogenesis both in humans and animals; however, it remains unknown if specific genes are associated with severity of infection. We employed the Drosophila Genetics Reference Panel to perform a genome-wide association study to identify host genetic variants that affect host survival to C. burnetii infection. The genome-wide association study identified 64 unique variants (P < 10−5) associated with 25 candidate genes. We examined the role each candidate gene contributes to host survival during C. burnetii infection using flies carrying a null mutation or RNAi knockdown of each candidate. We validated 15 of the 25 candidate genes using at least one method. This is the first report establishing involvement of many of these genes or their homologs with C. burnetii susceptibility in any system. Among the validated genes, FER and tara play roles in the JAK/STAT, JNK, and decapentaplegic/TGF-β signaling pathways which are components of known innate immune responses to C. burnetii infection. CG42673 and DIP-ε play roles in bacterial infection and synaptic signaling but have no previous association with C. burnetii pathogenesis. Furthermore, since the mammalian ortholog of CG13404 (PLGRKT) is an important regulator of macrophage function, CG13404 could play a role in host susceptibility to C. burnetii through hemocyte regulation. These insights provide a foundation for further investigation regarding the genetics of C. burnetii susceptibility across a wide variety of hosts.


Introduction
Coxiella burnetii is the causative agent of Query (Q) fever, a zoonotic disease that poses a serious threat to both human and animal health (Maurin and Raoult 1999). Because of its morbidity, low infectious dose, and the environmental stability of C. burnetii, the US NIH and CDC classify it as a Category B priority pathogen (Madariaga et al. 2003). Humans primarily become infected from sheep, goats, and cattle through inhalation of contaminated aerosols (Marrie et al. 1996;McQuiston et al. 2002;Schimmer et al. 2008). Therefore, reducing bacterial load in the livestock is critical to preventing Q fever outbreaks. C. burnetii is endemic worldwide and sporadic outbreaks have recently been reported in the United States (Karakousis et al. 2006;Anderson et al. 2013;Kersh et al. 2013;Sondgeroth et al. 2013;Dahlgren et al. 2015). A recent large outbreak of Q Fever on a goat farm in the Netherlands cost 307 million Euros in public health management efforts and agricultural interventions (Schimmer et al. 2008;Roest et al. 2011aRoest et al. , 2011bvan Asseldonk et al. 2013). To date, no commercial Q fever vaccine is available for humans or animals in the United States, and antibiotic therapy is the only option for treating human infection. Culling infected or at-risk animals is often employed to control outbreaks (Roest et al. 2011b(Roest et al. , 2012(Roest et al. , 2013. Additionally, the lack of animal models with genetic malleability and the strict requirements in BSL3 animal facilities towork with Select Agent phase I virulent strains of C. burnetii make it difficult to study host-pathogen interactions in vivo. Host genetics influence the development of C. burnetii infection in both humans and other animals (Ghigo et al. 2002;Leone et al. 2004;Raoult et al. 2005;Meghari et al. 2008;Delaby et al. 2012;De Lange et al. 2014). Experimental studies in human and mouse cells correlate defective monocyte/macrophage activation and migration with ineffective granuloma formation, and overexpression of interleukin (IL)-10 is present in patients with chronic Q fever (Meghari et al. 2008;Delaby et al. 2012;Bewley 2013;Mehraj et al. 2013;Ka et al. 2014). Two recent studies genotyped human populations and revealed that genetic variation in innate immune genes, such as those encoding pattern recognition receptors and IFNG, are associated with susceptibility to Q fever (Wielders et al. 2015;Ammerdorffer et al. 2016). Despite this importance, specific genetic variants associated with susceptibility to C. burnetii infection remain largely unknown. In addition, it is unknown how host genetic factors affect bacterial load and shedding in susceptible reservoir hosts, namely livestock.
Previous studies profiled mammalian host responses to C. burnetii infection. These studies show that the bacteria downregulate the host innate immune response during acute infection and determine that the resolution of Q fever is associated with the reestablishment of type I interferon (IFN) signaling (Ghigo et al. 2002;Faugaret et al. 2014;Gorvel et al. 2014). Directed studies in humans reveal that single-nucleotide polymorphisms (SNPs) in innate immune receptors and signaling genes such as TLR1, STAT1, IFNG, and MYD88 are associated with acute or chronic Q fever (Schoffelen et al. 2015;Wielders et al. 2015). Since these studies used a targeted approach to examine SNPs in only a set of candidate genes, we aimed to undertake a global, genome-wide analysis to identify gene variants associated with C. burnetii infection using Drosophila melanogaster as the host model.
We recently demonstrated that adult D. melanogaster is susceptible to infection with the BSL2 NMII clone 4 strain of C. burnetii and that this strain replicates in flies (Bastos et al. 2017). Importantly, this previous work established D. melanogaster as a suitable model for studying host-pathogen interactions during C. burnetii infection despite the bacteria being a mammalian pathogen and not a natural pathogen of insects. Additionally, mechanisms of immunity differ between mammals and insects, most notably the lack of an adaptive immune response in insects. Furthermore, while many components of innate immunity are conserved between mammals and insects, such as TLR/Toll, NFjB/Relish, and JAK/STAT signaling (reviewed in Sheehan et al. 2018;Tafesh-Edwards and Eleftherianos 2020), insects lack an intact cGAS/STING signaling axis, as a cytosolic DNA sensor has yet to be identified in insects (Goto et al. 2018;Hua et al. 2018;Martin et al. 2018). Application of the D. melanogaster/C. burnetii model to the Drosophila genetics reference panel (DGRP), a fully sequenced, inbred panel of fly lines derived from a natural population, provides an efficient platform for genotype-to-phenotype associations via a genome-wide association study (GWAS) (Mackay et al. 2012;Huang et al. 2014). The DGRP has already been used to reveal genes associated with susceptibility/tolerance to other bacterial pathogens (Bou Sleiman et al. 2015;Howick and Lazzaro 2017;Wang et al. 2017).
In this study, we identified genetic variants in D. melanogaster that were associated with susceptibility or tolerance to C. burnetii infection. Specifically, from the different GWA analyses performed, we obtained a list of 64 unique variants associated with 25 candidate genes that were selected for validation studies. Our analyses reveal genes with sex-specific effects on susceptibility and tolerance that have functions associated with actin binding, transcriptional response, and regulation of G-proteins. We found that multiple genes within the decapentaplegic (DPP) pathway, which is homologous to the TGF-b pathway in mammals (Gelbart 1989) were associated with susceptibility to infection. Similarly, we identified Rho GEFs and TGF-b, which are associated with the development of the Coxiella-containing vacuole and pathogenesis in humans, respectively (Benoit et al. 2008a(Benoit et al. , 2008bAguilera et al. 2009;Pennings et al. 2015;Salinas et al. 2015;Weber et al. 2016). Importantly, all the candidate genes identified here have mammalian orthologs or highly conserved functions that allow for extrapolation to mammalian systems.
Of the 25 candidate genes we identified in the GWAS, 15 genes significantly affected host survival during C. burnetii infection in D. melanogaster null mutants, RNAi knockdown flies, or both. We also examined the effect of candidate SNPs using regulatory element analysis (modENCODE) and found that some were within transcription factor binding hot spots, putative enhancers, novel splicing, branch point variation, and codon usage variation that could explain how the variants affect host gene expression and ultimately infection outcome. Altogether, we utilize the DGRP to identify host genetic variants associated with sex-specific susceptibility or tolerance to C. burnetii infection. The human orthologs of the genes we identified may also play important roles in human immune cell regulation, highlighting the conserved nature of gene function between insect and mammalian models of C. burnetii infection.

Materials and methods
Drosophila melanogaster and C. burnetii stocks Fly stocks were obtained from the Bloomington Drosophila Stock Center, the Vienna Drosophila Resource Center, Exelixis at Harvard Medical School, and the Kyoto Stock Center. Fly stocks were maintained at room temperature in standard meal agar fly food at 25 C and 65 C humidity. All fly strains are listed in Supplementary Table S1. Coxiella burnetii Nine Mile phase II (NMII) clone 4 RSA439 was propagated in Acidified Citrate Cysteine Medium 2 as previously described (Omsland et al. 2009). Coxiella burnetii stocks were quantified using quantitative realtime PCR to measure bacterial genome equivalent (GE), as previously described (Coleman et al. 2004).

Fly infections and hazard ratio phenotype determination
Each DGRP line was separated into groups of 40 male and 40 female adult flies (2-7 days old) (Supplementary Tables S2 and S3) and injected with phosphate-buffered saline (PBS) or 10 5 GEs of C. burnetii diluted in PBS to establish infection. We infected flies at a multiplicity of infection of 10 5 GE C. burnetii/fly based on the previous study establishing D. melanogaster as a model for C. burnetii infection (Bastos et al. 2017). For injections, flies were anesthetized with CO 2 and injected with 23 nL of bacteria or PBS using a pulled glass capillary and an automatic nanoliter injector (Drummond Scientific, Broomall, PA, USA), as previously described . Individual flies were injected at the ventrolateral surface of the fly thorax and placed into new vials. Male and female flies were housed in separate vials. After injection, survival was monitored daily for 30 days with the flies maintained at 25 C and 68% humidity. We used Prism v8.0 (GraphPad Software, Inc.) to determine hazard ratios and P-values [log-rank (Mantel-Cox) test] for survival curves for males and females from each DGRP line. All survival analyses take the full 30-day trial into account, and raw data for the DGRP lines are found in Supplementary File S1. Lines having less than 3% mortality in the mock-infected group were not included in downstream analyses (Chow et al. 2013).

Genome-wide association using hazard ratios and candidate gene analyses
To determine phenotype-to-genotype association, hazard ratios were log 10 transformed and submitted to the dgrp2 webtool (http://dgrp2.gnets.ncsu.edu/), which adjusts the phenotype for the effects of Wolbachia infection and major inversions (Mackay et al. 2012;Huang et al. 2014). Three separate analyses were run using male, female, and combined hazard ratios for the DGRP lines (Supplementary Tables S2-S4). R was used to create Quantile-quantile (Q-Q) plots from log-transformed hazard ratios and obtain r 2 values and genomic inflation values (k). For male and female analyses, 193 male and 195 female log-transformed hazard ratios were submitted (Supplementary Tables S2 and S3, respectively). SNPs and small indel variants with a P-value (mixedeffects model) below 10 À5 were considered genome-wide suggestive candidate variants and further analyzed. For the combined analysis, both male and female hazard ratios were submitted for 191 DGRP lines. Candidate variants with P-values (mixed-effects model) less than 10 À5 for the average trait or the difference (female-male) trait were selected for further study. Since the nominal genome-wide suggestive P-value of 10 À5 has been employed in other GWA studies (Stranger et al. 2011;Chow et al. 2016;Fadista et al. 2016;Zhang et al. 2016;Gibson et al. 2019), this threshold justifies our choice for the purposes of this study and choosing candidate genes to evaluate further. The DGRP genome assembly (BDGP R5/dm3) was used to identify variants in candidate genes. Human orthologs of candidate genes were identified using the DRSC Integrative Ortholog Prediction Tool (DIOPT v8.0) reported in Flybase (Hu et al. 2011). The ortholog with the highest weighted score is reported in Table 1. Predicted functions for each candidate gene were gathered using Flybase (FB2019_02). Regulatory annotation summaries for each SNP and indel were compiled using Flybase (FB2019_02) and modENCODE utilizing variant coordinates converted to the BDGP R6/dm6 reference assembly. To define regulatory annotations, we reviewed publicly available data within modENCODE tracks, such as all noncoding features including transcription factor binding sites (TFBS), histone ChIP-seq data, chromatin domain segmentation, and small RNA-seq tracks. Relative male: female ratio gene expression results were calculated using RNA-seq data (Graveley et al. 2011) in Flybase. Sex-specific expression data of the exon nearest to the gene variant were averaged for 1-and 5-day-old flies.

Validation of candidate genes
To empirically determine the effect of knockout or knockdown of the candidate gene on severity to infection, 40 adult flies from each null mutant and RNAi knockdown for each candidate gene were injected with PBS or C. burnetii, as stated previously. RNAi knockdown was performed using straight-winged progeny from crosses between the CyO-balanced Act5C-GAL4 driver line and the corresponding dsRNA-containing RNAi lines (Supplementary  Table S1). Sibling progeny flies carrying the CyO balancer were used as control flies. Genetic background strains for each null mutant strain were used as control flies. We conducted all survival experiments for each candidate gene twice independently and we used Prism v8.0 (GraphPad Software, Inc.) to determine hazard ratio and P-value [log-rank (Mantel-Cox) test] for each independent survival experiment to ensure that the two experiments were not significantly different from each other (data not shown). We then combined the results from both experiments to determine single hazard ratios and generate survival graphs Significance levels of combined survival experiments for each genotype were binned into one of three categories: no significance (P > 0.01), low significance (0.01 > P > 0.0001), or high significance (P < 0.0001) (Supplementary Tables S6 and S7). We considered a gene to validate if the significance level changed between control and null mutant or RNAi knockdown genotypes for the sexes corresponding to GWA analysis type. For validated gene candidates in the average category, significance levels changed for both sexes, and validated gene candidates in the difference category changed for one sex but not the other.

Splicing, branch point variation, and codon usage analysis
The Ensembl project (http://uswest.ensembl.org/index.html) and the Human Splicing Finder (http://www.umd.be/HSF/) were used to determine splicing and branch point variation from curated sequences to determine codon usage fraction based on frequency of amino acids per thousand.

Data availability
Strains and stocks are available upon request. Genomic sequence for the DGRP is available at http://dgrp.gnets.ncsu. edu/. Supplementary material and all raw survival data (Supplementary Files S1 and S2) are available at FigShare: https://doi.org/10.25386/genetics.13490244. The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables.

Susceptibility to C. burnetii infection is dependent on host genetic background
Previously, we determined that flies deficient in the IMD signaling pathway genes, PGRP-LC and Relish, exhibit increased susceptibility to C. burnetii infection. We also determined that the gene eiger contributed to decreased tolerance to C. burnetii infection in flies, as eiger mutant flies were less susceptible to C. burnetii infection (Bastos et al. 2017). Therefore, we hypothesized that susceptibility to C. burnetii infection in Drosophila is associated with host genetics, and that the broad base genetic variation in the DGRP could identify other candidate genes that effect susceptibility to C. burnetii infection via a GWAS. To determine the susceptibility of each DGRP line to infection, adult males and females of each line were mock-infected or infected with C. burnetii. We then monitored survival and calculated hazard ratios that were used as input for the GWA analysis ( Figure 1). In total, we calculated 193 and 195 hazard ratios for males and females, respectively. The survival curves reveal an approximately log-normal distribution of hazard ratios ranging from À0.719 to 1.643 for male flies (0.191-44.01, non-log-transformed) and À0.714 to 1.200 for female flies (0.1932-15.85, non-log-transformed) (Supplementary Tables S2 and S3, Figure S1A), which indicates that genetic polymorphisms in the DGRP lines affect susceptibility to C. burnetii infection. Interestingly, male flies are more susceptible than female flies overall to C. burnetii infection, with a mean hazard ratio of 1.90 for male flies and 1.56 for female flies (P ¼ 0.0015) (Supplementary Figure S1A). Notably, we observe three distinct survival phenotypes for both male and female flies among all DGRP lines. These survival phenotypes are defined by the hazard ratio, which compares the mortality rate of C. burnetii-infected flies to mock-infected flies. Hazard ratio analysis has been used to examine flies' mortality rate to West Nile virus compared to mock-infection and to Pseudomonas entomophila based on route of infection (Martins et al. 2013;Ahlers et al. 2019). In general, susceptible DGRP lines display increased mortality to C. burnetii infection compared to mock-infection and positive log 10 hazard ratios. Tolerant DGRP lines show no change in survival between C. burnetii and mock-infection and have log 10 hazard ratios close to zero. We also observe that certain DGRP lines exhibit decreased mortality compared to the mock-infected group, as noted by the negative log 10 hazard ratio in Supplementary Tables S2  and S3. Negative hazard ratios following microbial infection are not uncommon. This means that the genetic background of the particular DGRP line results in increased survival compared to the mock/injury control (Martins et al. 2013;Ahlers et al. 2019). A study on dietary restriction also shows negative hazard ratios (McCracken et al. 2020).

GWA analyses of DGRP hazard ratios reveal candidate gene variants
The DGRP facilitates rapid GWA analyses using a quantitative phenotype via submission of a data set to the online webtool (Mackay et al. 2012). To determine polymorphisms in the DGRP population that affect susceptibility to C. burnetii, we submitted hazard ratios for analysis. We log 10 -transformed the hazard ratios prior to submission for GWA analysis (Supplementary Figure  S1A) to yield an approximately normal distribution (Shapiro-Wilk test, P > 0.1) due to GWA analyses relying on parametric tests (Chow et al. 2013). We determined that hazard ratios are significantly positively correlated between male and female flies (P ¼ 4.99 Â 10 À7 ), but with an r 2 value of 0.121, which indicates a weak correlation and potential sex-dependent genotypes (Supplementary Figure S1B). Thus, we submitted hazard ratios as separate files for male and female analyses, and a single, combined file in order to identify polymorphisms that may be sexdependent and to increase power for polymorphisms that are sex-independent. We termed the sex-independent analysis the average analysis, which results in top candidate variants that affect both sexes while the sex-dependent analysis which we termed difference analysis, results in top candidates that affect one sex but not the other. In total, we submitted 193, 195, and 191 hazard ratios for males, females, and average and difference, respectively (Supplementary Tables S2-S4).
We tested a total of 1,893,791 polymorphisms in the male analysis, 1,897,049 polymorphisms in the female analysis, and 1,889,141 polymorphisms in the combined analysis. These analyses were not sufficiently powered to detect polymorphisms at a Bonferroni-corrected P-value of 2.64 Â 10 À8 . Therefore, we employed a genome-wide suggestive P-value threshold of 10 À5 which has been used for studies employing the DGRP (He et al. 2014;Chow et al. 2016;Kelsey and Clark 2017;Schmidt et al. 2017;Lavoy et al. 2018;Mackay and Huang 2018;Palu et al. 2020;Talsness et al. 2020). Using this P-value, we obtained a total of 69 associated polymorphisms from the GWA analyses, which included five duplicate variants (Figure 2 and Supplementary Table  S5). Q-Q plots revealed no significant inflation due to dataset distribution, lambda values ranged from 0.993 (females) to 1.002 (difference), and P-values derived from these analyses appear to be reduced overall based on the lines from the Q-Q plots and lambda values below 1 (Supplementary Figure S2, A-D). Another interpretation of the Q-Q plots and GWAS significance values is that there is random association of our genome-wide suggestive variants. However, previous work has shown that weak signals in DGRP studies produce meaningful results. For example, candidate gene association and phenotypic correlation can be preserved among GWAS from different labs and using different populations of inbred fly lines (Everman et al. 2019;Pitchers et al. 2019). Furthermore, weak DGRP signals have been used to identify genes that genuinely regulate the phenotypic output of the DGRP studies (Chow et al. 2016;Palu and Chow 2018;Ahlers et al. 2019;Palu et al. 2020). Nevertheless, we calculated FDR corrections that would result in an equivalent P-value cutoff of P < 10 À5 for each of our GWA analyses (Benjamini and Hochberg 1995; Pinheiro et al. 2020). For the female GWAS, the FDR correction is 0.88; for the male, it is 0.94; for the average, it is 0.99; for the difference, it is 0.78. As described in Everman et al. (2019), an FDR equivalent for a P < 10 À5 cutoff ranged from 0.49 to 0.82 for previously published studies (Mackay et al. 2012;Everman and Morgan 2018). One reason our FDR corrections are higher is because our study uses additional DGRP lines than those cited above, which results in a greater number of tests (N) and additional variants with MAFs > 0.05. Nevertheless, the very lenient P-value cutoff we used gives us the opportunity to test more candidate genes in our validation experiments to rule out false positives that may have been selected using our lenient cutoff. While the DGRP may be underpowered, the ease with which one can perform empirical validation using Drosophila genetics makes the model as a whole very useful. Importantly, our goal is not to claim the associations reported here as definitive markers for host susceptibility to C. burnetii infection but to broadly identify candidate genes for future mechanistic studies in the context of C. burnetii or other pathogenic infections.
Of the 64 unique polymorphisms identified from the GWA, 14 variants are intergenic (21.9%), three of which are within 200 base pairs upstream of nearest gene; 39 are within introns (60.9%); eight are within exons (12.5%); one is within the 5 0 UTR (1.6%); and two are in antisense-coding RNA within exon/introns (3.1%) (Supplementary Table S5). Of the eight SNPs within exons, six are silent and two are missense mutations. The 64 unique variants correspond to 31 unique genes that we narrowed to 25 candidate genes for validation experiments. Candidate genes were chosen based on stocks available and the location and type of gene disruption used in the available stocks. Due to these limitations, we did not perform validation experiments for CG42455, side, MtnB, Or92a, Cpr100A, or CG32694. We also report the relative male: female ratio gene expression data of each candidate gene using information available on Flybase (FB2019_02) ( Table 1). We used the DGRP genome assembly (BDGP R5/dm3) to gather putative functions and regulatory annotations for each gene using Flybase and modENCODE and found that 12 are in Figure 1 Experimental design schematic. Groups of 40 males and females per DGRP line were injected with PBS or C. burnetii at 10 5 bacteria/fly and host survival monitored for 30 days to obtain hazard ratios. The hazard ratios of all DGRP lines were log-transformed and used as input for a GWAS. TFBS (48%); nine are within regions predicted to be transcriptionally silent (36%); one is within a long noncoding RNA (4%); and three are in enhancers only (12%) ( Table 1). Lastly, we report the human ortholog with the highest weighted score from the DRSC Integrative Ortholog Prediction Tool (DIOPT) on Flybase.

Validation of candidate genes
We next tested the 25 candidate genes from the different GWA analyses (Table 1) by infecting and monitoring survival during C. burnetii infection for 30 days in flies carrying a null mutation in the candidate gene or knocked down for the candidate gene by RNAi. We defined validation of candidate genes when the null mutant or RNAi knockdown line that has a different threshold of survival P-value significance than its genetic control, as described in the Materials and methods section and Supplementary Tables S6 and S7. Of the 25 candidate genes, 6 validated in null mutants only (24%), five in RNAi knockdown only (20%), 4 in both null mutants and RNAi (16%), and 10 did not validate with either method (40%) ( Figure 3A). Survival of w 1118 males and females (Figure 3, B-B 0 ) during C. burnetii infection was used as the genetic control for several null mutants, including RhoGEF64C MB04730 ( Figure 3C), tara 1 (Figure 3D), and CG13404 f07827b ( Figure 3E). We selected these candidate genes to represent how we determined validation based on P-value and survival trend for validating genes from different categories, i.e. null-only, RNAi-only, or both. w 1118 females ( Figure 3B) are not susceptible to C. burnetii infection (P ¼ 0.0333) but w 1118 males (Figure 3B 0 ) are highly susceptible (P < 0.0001) which corroborates our previous work (Bastos et al., 2017). We selected the candidate gene RhoGEF64C MB04730 from male-only GWA (Figure 3, C-C 0 ) and we observe that survival in null mutants ( Figure 3C) is overall tolerant (P ¼ 0.0014) compared to w 1118 males ( Figure 3B 0 ). In contrast, there is no significant change in survival between control and RNAiknockdown flies (Figure 3C 0 ) (control, P ¼ 0.0374; RNAi, P ¼ 0.0130). Thus, RhoGEF64C MB04730 males validated only in null mutants. The candidate gene tara was selected from the femaleonly GWA and we observe that in null mutants ( Figure 3D) and RNAi knockdown flies ( Figure 3D 0 ), the absence of the gene results in decreased survival compared to control genotypes. Specifically, tara 1 females are susceptible to infection (P < 0.0001) Figure 2 Genome-wide association analyses with hazard ratios reveal candidate genes. Manhattan plots for (A) male, (B) female, (C) average, and (D) difference GWA analyses using mixed-effect P-values for all four traits from dgrp2 webtool. Highlighted gene variants with P-values below 10 À5 are labeled with associated candidate genes. compared to w 1118 females ( Figure 3B) and tara RNAi females ( Figure 3D 0 ) are also susceptible (P ¼ 0.0025) compared to control (P ¼ 0.0123). Thus, tara validated for females in both null mutants and RNAi knockdown flies. We selected the candidate gene CG13404 f07827b from female-only GWA and we observe that null mutants ( Figure 3E) are not susceptible to infection (P ¼ 0.2737) like w 1118 females ( Figure 3B). In contrast, CG13404 RNAi females ( Figure 3E 0 ) are susceptible to infection (P < 0.0001) while control genotype females are not (P ¼ 0.3914). Thus, CG13404 validated only in RNAi knockdown flies. Finally, to determine the relative effects on survival in males and females for all candidate genes, we calculated the relative hazard ratio of the null mutant (Supplementary Figure S3A) or RNAi knockdown (Supplementary Figure S3B) flies as compared to their respective genetic controls. Relative hazard ratios greater than one mean that the loss of that gene reduces survival to C. burnetii infection while relative hazard ratios less than one mean that that the loss of that gene improves survival to C. burnetii infection.   Komar 2016;Zhou et al. 2016;Jeacock et al. 2018). Therefore, we used data available from the ENCODE project to determine regulatory annotations for the variants in genes that validated in host survival experiments. Table 2 summarizes the splicing and branch point analysis in terms of percent variation from wild type and codon usage as a fraction of frequency of amino acid (SNP) per thousand over frequency of amino acid (wild type) per thousand. Several SNPs varied at the predicted mRNA splicing sites, branch points, or codon usage compared to wild-type sequences such as the variants affecting the validated genes CG34351, DIP-e, Pura, tara, FER, and IP3K2. The insertion (3R_5218712) within FER results in a À92.69% difference from wild-type splicing and has no variation in branch point splicing from wild type. This change in splicing for the FER indel indicates RNAi males (C 0 ), tara 1 (D) or control and tara RNAi females (D 0 ), or CG13404 f07827a (E), or CG13404 RNAi females (E 0 ) following mock or C. burnetii infection. Each survival curve represents two independent experiments of at least 40 flies that were combined for a final survival curve, Statistical significance (Log-rank test) from the mock-infected group is indicated. the site is broken but a 3 base pair insertion offsets the destruction. Similarly, the SNP (3R_12079260) within tara differs 64.82% from wild-type splicing which also indicates a new splice site creation with no destruction. The frequency of the wild-type DIP-e codon is 3.72-times higher than that of the DIP-e SNP (2L_6394872), which suggests that decreased abundance of tRNA codon availability for the transcript variant may affect its translation and thus its function during C. burnetii infection. Changes in codon usage fraction for shn and CG13404 may affect these gene variants albeit to a lesser extent.

Discussion
In this study, we describe the application of an unbiased GWAS using the DGRP to reveal variants in genes associated with C. burnetii infection. We show that 15 genes conferred a significant difference in host survival during C. burnetii infection in null mutants, RNAi knockdown, or both gene disruption methods. We also compiled regulatory annotations of the variants in validating genes and show that certain gene variants affect splicing and codon usage, which may in turn downregulate gene expression. These data support the use of previously generated null mutant or RNAi knockdown gene disruption fly stocks to validate our candidate genes. While C. burnetii is not a natural D. melanogaster pathogen, there is evidence that it is an endosymbiont of tick arthropods and may have co-evolved in these animals (Duron et al. 2015). Previous studies show that few genome-wide significant variants are identified when using a non-natural versus natural D. melanogaster pathogen (Magwire et al. 2012), or when the pathogen has low prevalence in D. melanogaster (Chapman et al. 2020). Similarly, our GWA analyses did not identify any variants that met the expected Bonferonni-corrected P-value significance level. Instead, this study provides a platform to examine potential host factors that regulate C. burnetii infection given the limited genetic tools available for the bacteria. It is currently not possible to perform a genetic screen in mammals using the BSL2 strain of C. burnetii due to the inability of the strain to infect wildtype animals. Furthermore, a genetic screen using a Select Agent strain would encounter its own logistical roadblocks. Using our innovative approach, we uncover genes and processes that are relevant to bacterial infections in general. Additionally, directed studies of these genes could be performed using a Select Agent strain of C. burnetii. For these reasons, we do not take the validating genes in this study to be absolute but rather a first step toward understanding their cross-species role in the host response to infection. Previous studies have shown that C. burnetii infection differentially affects male and female mice (Leone et al. 2004;Textoris et al. 2010), and our results corroborate these studies. First, male and female hazard ratios differ significantly among the DGRP lines in the initial screen (Supplementary Figure S1A). We interpret the difference in hazard ratios between sexes as a conserved phenotype with mammalian organisms. Secondly, expression of the candidate genes in adult flies differs between sexes (Table 1). For example, the expression of RhoGEF64C and DIP-e are 7.0 and 3.8-times more expressed in males than females, which is a potential reason why they validated in the male-only category, but not a definite cause. Validating genes from the difference category have higher differential sex expression, such as CG42741 and CG42673, which are 10 and 4.5-times more expressed in males than females, respectively. In contrast, validating genes in average category have closer relative expression, such as Pura and IP3K2, which are only 1.5 and 2.4-times more expressed in males than females, respectively. These results suggest that gene expression may drive sex-specific differences in host survival to C. burnetii infection. Lastly, genes we identified and validated as sex-specific, such as CG13404 and FER, have known functions in immunity, as discussed below.
Host survival in CG13404 RNAi-knockdown female flies indicated they were significantly more susceptible to infection compared to control genotype. The human ortholog of CG13404 is the plasminogen receptor gene PLGRKT, which is important for macrophage polarization and efferocytosis, two key components of inflammation regulation (Vago et al. 2019). The absence of Plg-R KT causes defective plasminogen binding and inflammatory macrophage migration in both male and female mice pups, but only female PLGRKT À/À pups die 2 days after birth (Miles et al. 2017). Our results corroborate this study because CG13404 was a top candidate from the female-only GWA. We hypothesize the role of Plg-R KT in macrophage regulation connects CG13404 to immunity. In D. melanogaster immunity, hemocytes are the professional phagocytic cells. They are present in flies during both larval and adult stages, and they recognize, engulf, and destroy dying host cells and pathogens (Hoffmann 2003;Yano et al. 2008;Regan et al. 2013). Hemocytes are critical for innate immune signaling by mediating the secretion of antimicrobial peptides (AMPs) in response to pathogens through the Toll, JAK/STAT, and Immune deficiency (Imd) pathways (Hoffmann 2003;Lemaitre and Hoffmann 2007). Recent studies in our lab show that hemocytes support C. burnetii replication and induce Imd-specific AMPs (Bastos et al. 2017;Hiroyasu et al. 2018). However, our screen did not identify any genes in the classical Imd or Toll pathway. Nevertheless, the involvement of CG13404, the fly ortholog of Plg-R KT , during C. burnetii infection may exemplify conserved, sex-specific differences in mammalian macrophage and fly hemocyte regulation.
FER expression leads to activation of the DPP-mediated pathway, which has recently been shown to improve survival of Klebsiella pneumoniae through STAT3 when overexpressed (Murray 2006;Dolgachev et al. 2018;Li et al. 2019). Coxiella burnetii induces expression of STAT3 and IL-10 during murine infection (Murray 2006;Textoris et al. 2010;Millar et al. 2015). Textoris et al. show that male mice have increased gene expression of STAT3 and IL-10 during infection which may account for the higher susceptibility of Q fever observed in men. Our study corroborates this study because FER was a top candidate in the female-only GWAS. We hypothesize that the absence of FER in females disrupts the immune response required to control infection and leads to significantly decreased host survival.
In addition to FER, the other three genes that validated in both gene disruption methods reveal new connections between immunity and C. burnetii infection. Tara encodes a transcriptional coregulator that interacts with chromatin remodeling complexes, cell cycle proteins, the JNK signaling pathway, and plays a role in ataxin-1-induced degeneration (Fernandez-Funez et al. 2000;Calgaro et al. 2002;Branco et al. 2008;Afonso et al. 2015). The human ortholog of tara is SERTAD1, which is also a transcriptional co-regulator (Biswas et al. 2010;Savitz et al. 2013). Interestingly, induction of SERTAD1 is expressed independently of IFN during Nipah virus infection (Glennon et al. 2015). IFN induction is tissue-dependent during C. burnetii infection (Hedges et al. 2016); therefore, it is plausible that tara is targeted by the bacteria during infection. DIP-e encodes a protein belonging to the immunoglobulin superfamily of defective proboscis extension response (Dpr) and Dpr-interacting proteins (DIP), which form a complex network of cell surface receptors in synaptic specificity. The human ortholog of DIP-e is OPCML, an immunoglobulin protein best characterized as a tumor suppressor (Cui et al. 2008;Birtley et al. 2019), and there is currently no reported role for either gene during bacterial infections. CG42673 remains uncharacterized, but another DGRP GWAS reports that loss of function of CG42673 in blood cells significantly impairs the cellular immune response to Staphylococcus aureus (Nazario-Toole 2016). Interestingly, this study also shows that dpr10 significantly affects S. aureus phagosome maturation while our own top candidate, dpr6 validated by RNAi knockdown. It is possible that CG42673 functions as an enhancer like its human ortholog NOS1AP (Grossmann et al. 2015;Hein et al. 2015) and regulates C. burnetii infection.
While some gene candidates validated as expected, other candidates validated as per our statistical tests, but in opposite directions. For example, we observed differences between the DGRP predictive effect of top candidates and validating genes. The GWA output predicts that the RhoGEF64C SNP (3L_4738164) has increased host susceptibility to infection (effect ¼ À0.1709, Supplementary Table S5). However, survival of RhoGEF64C MB04730 null mutant males was significantly improved compared to control genotype ( Figure 3C) during C. burnetii infection. The opposite survival trend in the null mutant flies is likely if the SNP is a gain-of-function mutation, which is difficult to test but worth pursuing in further studies. Similarly, the SNP (X_14160126) in CG13404 (effect ¼ 0.1213) predicts decreased host susceptibility to infection but RNAi knockdown females were significantly more susceptible compared to the control genotype ( Figure 3E 0 ). One explanation for these opposing results is the possibility of gene product threshold effects, and overall susceptible or tolerant phenotypes during infection must be tested at the host level with subsequent functional experiments. Furthermore, the effect of the gene variants cannot be inferred from GWA alone, and knockout or knockdown of the associated gene may yield a different phenotype than that which was predicted for the variant in question. The use of null mutants or gene knockdown by RNAi is common practice to validate DGRP candidate gene variants (Howick and Lazzaro 2017;Palu et al. 2019Palu et al. , 2020, but the most precise way to test the effect that a gene variant has is to use gene-editing technology to knock-in the specific gene variant (Yoo et al. 2020). While we did not test the effect of each individual allele variant on host survival during C. burnetii infection, we conclude that the presence or absence of the genes in which the variants lie affects the outcome of infection.
In conclusion, this study builds on our previously developed framework utilizing D. melanogaster as an animal model to dissect the innate immune response to C. burnetii infection (Bastos et al. 2017;Hiroyasu et al. 2018). We observe that C. burnetii infection significantly depends on host genetic background of the fly. In contrast, genetic studies in relevant natural hosts such as ticks, livestock, and humans are severely limited. Thus, we propose that the validating genes in this study can be used to test new hypotheses regarding host responses, taking into consideration the genes' function in flies, their regulatory annotations, and their orthologs' function in humans or other animals. These studies may reveal novel mechanisms of transmission among different host species or help identify at-risk human and livestock populations through genotyping efforts.

Funding
This investigation was supported by funds from Washington State University and the National Institutes of Health Public Health Service grant R21AI128103 (to A.G.G.). R.D.O. was supported by NIH Training Grant T32AI007025. This investigation's contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

Conflicts of interest
None declared.