Identification of Essential Genes and Fluconazole Susceptibility Genes in Candida glabrata by Profiling Hermes Transposon Insertions

Within the budding yeasts, the opportunistic pathogen Candida glabrata and other members of the Nakaseomyces clade have developed virulence traits independently from C. albicans and C. auris. To begin exploring the genetic basis of C. glabrata virulence and its innate resistance to antifungals, we launched the Hermes transposon from a plasmid and sequenced more than 500,000 different semi-random insertions throughout the genome. With machine learning, we identified 1278 protein-encoding genes (25% of total) that could not tolerate transposon insertions and are likely essential for C. glabrata fitness in vitro. Interestingly, genes involved in mRNA splicing were less likely to be essential in C. glabrata than their orthologs in S. cerevisiae, whereas the opposite is true for genes involved in kinetochore function and chromosome segregation. When a pool of insertion mutants was challenged with the first-line antifungal fluconazole, insertions in several known resistance genes (e.g., PDR1, CDR1, PDR16, PDR17, UPC2A, DAP1, STV1) and 15 additional genes (including KGD1, KGD2, YHR045W) became hypersensitive to fluconazole. Insertions in 200 other genes conferred significant resistance to fluconazole, two-thirds of which function in mitochondria and likely down-regulate Pdr1 expression or function. Knockout mutants of KGD2 and IDH2, which consume and generate alpha-ketoglutarate in mitochondria, exhibited increased and decreased resistance to fluconazole through a process that depended on Pdr1. These findings establish the utility of transposon insertion profiling in forward genetic investigations of this important pathogen of humans.

aided by CRISPR/Cas9 technology (Enkler et al. 2016;Vyas et al. 2018). Previously, a consortium of researchers has replaced 15% of the non-essential genes with DNA barcodes and a nourseothricinresistance marker (NATr) (Schwarzmüller et al. 2014). Systematic screens of this collection have yielded insights into a number of biological processes including relative fitness, colony morphology, biofilm formation, and resistance to three different classes of clinical antifungals (fluconazole, caspofungin, amphotericin B). However, the collection is highly biased toward regulatory genes of interest to the community. In a pioneering approach, thousands of random Tn7 transposon insertions also were generated in vitro and integrated into the C. glabrata genome, which were then arrayed and screened individually for susceptibility to antifungals (Castano et al. 2003;Sanchez et al. 2019).
Transposon insertion profiling in eukaryotes has dramatically improved in recent years. Next-generation DNA sequencing technology facilitates en masse analyses of very large pools of insertion mutants (Guo et al. 2013;Bronner et al. 2016;Michel et al. 2017;Zhang et al. 2018). Typically, genomic DNA adjacent to each insertion site is directly amplified by PCR and sequenced using Illumina technology, and then the reads are mapped to precise sites in the genome and tabulated. In contrast to CRISPR/Cas9 screening approaches that are user-guided, transposon insertion profiling enables direct observation and quantitation of each mutation in the population and full coverage of the genome. In S. cerevisiae, millions of independent insertions have been obtained using derivatives of the maize Activator/Dissociation (mini-Ac/Ds) transposon (Michel et al. 2017) and the housefly Hermes transposon (Gangadharan et al. 2010). In C. albicans, a mini-Ac/Ds transposon (Segal et al. 2018) and a PiggyBac transposon (Gao et al. 2018) have yielded hundreds of thousands of defined insertions. In the fission yeast Schizosaccharomyces pombe, which is distantly related to all the budding yeasts, both Hermes (Guo et al. 2013;Lee et al. 2020) and PiggyBac (Li et al. 2011) have produced high-density insertion pools. Remarkably, all the transposons inserted with high enough density to reliably distinguish most non-essential genes from essential genes, where severe fitness defects of insertion mutants cause their depletion from the pool (Levitan et al. 2020).
In this study, we adapt the Hermes transposon for insertion profiling in a clinical isolate of C. glabrata. Over 500,000 different insertion sites were identified using the quantitative insertion-site sequencing (QIseq) method (Bronner et al. 2016), allowing for the first description of its essentialome as well as comparison with S. cerevisiae. We show that Hermes inserts in C. glabrata with sequence bias and nucleosome bias similar to S. cerevisiae but with a much stronger centromere bias, suggesting a difference in chromosome architecture. Interestingly, genes involved in kinetochore function were more likely to be essential in C. glabrata than S. cerevisiae. We also identify hundreds of genes that alter susceptibility to fluconazole, including many previously identified genes. These findings illustrate the power of in vivo transposon mutagenesis when coupled with nextgeneration DNA sequencing for functional genomics research. They also extend our understanding of drug resistance mechanisms that operate in an important pathogen of humans, thus facilitating development of improved and novel antifungal strategies.

Plasmids, organisms, and culture conditions
The plasmid pCU-MET3-Hermes was constructed by double digestion of pCU-MET3 (Zordan et al. 2013) with SpeI and XhoI and 3-way ligation with a 2.1 kb SpeI-NotI and a 2.3 kb NotI-XhoI fragment from pSG36 (Gangadharan et al. 2010). DNA sequencing confirmed that Hermes transposase coding sequences were placed downstream of the methionine-repressible MET3 promoter of C. glabrata on the centromeric plasmid bearing the Hermes-NATr transposon. This plasmid was introduced into strain BG14, a ura3derivative of C. glabrata BG2 (Cormack and Falkow 1999), by transformation using the lithium acetate protocol (Cormack and Falkow 1999). Individual transformants were purified and stored frozen at -80°until use.
To generate pools of Hermes-NATr insertion mutants, single colonies of BG14 [pCU-MET3-Hermes] were inoculated into 100 mL synthetic SCD medium lacking uracil, cysteine, and methionine and shaken for 3 days at 30°in either a single 500 mL glass culture flask (pools 1 and 2) or forty 16 · 150 mm glass culture tubes which were then pooled (pool 3). The cells were then pelleted, resuspended in 600 mL SCD medium containing 0.1 mg/mL nourseothricin and 1 mg/mL 5-fluoroorotic acid and shaken overnight at 30°in 2 L culture flasks. This step was repeated once more. Finally, 60 mL of these partially enriched cultures were pelleted and resuspended in 600 mL of the same medium and cultured as before. The highly enriched cultures were pelleted, resuspended in 15% glycerol, and frozen in aliquots at -80°. Aliquots were thawed and sequenced according to the QIseq protocol below or diluted 10-fold into 10 mL fresh SCD medium, grown overnight at 30°, and diluted 100-fold into 300 mL fresh SCD medium containing or lacking 128 mg/mL fluconazole. These cultures were shaken at 30°for 24 hr, then the cells were pelleted, washed 1x in SCD medium to remove residual drugs, resuspended in 300 mL fresh SCD medium, and cultured at 30°f or an additional 2 days. Cells were then pelleted, resuspended in 15% glycerol, and frozen in aliquots at -80°as before.
The IDH2 gene (CAGL0I07227g) and KGD2 (CAGL0E01287g) genes were knocked out in strains BG14 and CGM1094, an isogenic derivative of BG14 that carries a pdr1Δ mutation (Orta-Zavalza et al. 2013), using a variation of the standard protocol (Schwarzmüller et al. 2014). Briefly, 59 and 39 homology arms located upstream and downstream of the IDH2 and KGD2 genes were PCR amplified from BG14 genomic DNA, fused to an intervening PCR product of URA3 from S. cerevisiae, and transformed into BG14. The Ura+ colonies were screened by PCR using primers listed in Supplemental Table S3 to identify mutants where IDH2 and KGD2 were deleted and replaced with URA3. A verified idh2Δ mutant (named AGY07), idh2Δ pdr1Δ double mutant (AGY04), kgd2Δ mutant (AGY15), and kgd2Δ pdr1Δdouble mutant (AGY12), and the parent strains were grown to saturation in SCD medium, diluted 2000-fold into fresh medium containing variable concentrations of fluconazole in 96-well dishes, mixed, and incubated at 30°for 20 hr. Optical density at 650 nm was then measured for 3 technical replicates. IC50 was calculated independently for each replicate by fitting the data to a standard 3-parameter sigmoid equation using non-linear regression (Kaleidagraph v4.5 software) and the 3 values were averaged (6 SD).
Genomic DNA extraction, QIseq, data processing and visualization To extract genomic DNA, 100 mg of thawed cell pellets were washed three times in 1 mL deionized water and extracted using Quick-DNA Fungal/Bacterial Miniprep kit (Zymo Research). A total of 2.4 mg of purified gDNA was fragmented by sonication to average size of 350 bp in four separate aliquots using a Diagenode Picoruptor. The fragmented DNA was then end repaired, ligated to splinkerette adapters (Supplemental Table S3), size selected with AMPure xp beads, and PCR amplified in separate reactions using transposon-specific and adapter-specific primers as detailed previously (Bronner et al. 2016). Samples were then PCR amplified twice (Supplemental Table S3) to enrich for insertion sites and to attach Illumina P5 and P7 adapters that were not indexed (pool Cg-1) or indexed (pools Cg-2, Cg-3). PCR products were purified with AMPure XP beads (Beckman Coulter Life Sciences), mixed with phiX-174, loaded into MiSeq instrument (Illumina) and 75 bp of each end was sequenced using custom primers specific for Hermes right inverted repeat and P7 (Supplemental Table  S3). Detailed protocols and primer sequences are available upon request. DNA sequence reads (Cg-2 and Cg-3) were de-multiplexed using CutAdapt (Martin 2011) mapped to the C. glabrata CBS138 reference genome (version 32) using Bowtie2 (Langmead and Salzberg 2012), and any mapped reads with a quality score ,= 20 or a mismatch at nucleotide +1 were removed to eliminate ambiguous mappings. The number of DNA sequence reads that map to each unique site in the genome was tabulated using a custom script in Python. Data from multiple sequencing runs were visualized using the IGV genome browser (Robinson et al. 2011) after scaling the number of reads at each site according to the following formula: reads · 20 + 100 (Michel et al. 2017).

Computational methods
The libraries prepared from 3 different pools were sequenced to different levels (ranging from 7.2 to 21.5 million mapped reads), and therefore required normalization before they can be combined and analyzed. For normalization, we developed a back-sampling approach that simultaneously estimates the frequency of jackpots (the frequency of reads that occur at one or more over-represented sites) and midLC (mid-library complexity, or the number of reads that produce 2x midLC unique insertion sites after eliminating jackpots). Briefly, the list of mapped reads was sampled randomly at multiple different depths (100, 400, 1600, 64000, etc.) in triplicate up to the maximum and the number of unique sites at each depth was recorded. The frequency of unique sites at each depth (y-axis) was charted against depth (x-axis) and the data were fit to a 3-parameter sigmoid equation y = (1-jackpot) / (1+(x/midLC)^slope) using non-linear regression (Kaleidagraph v4.5 software). The output produced estimates of jackpot frequency (percentage of high frequency sites), mid-library complexity or midLC (after excluding jackpots, the read depth where half map to unique sites and the remainder map to those same sites), and slope factor. In all three libraries, jackpots were low (, 0.04%) while midLC varied over a 1.6 fold range (Supplemental Figure S4). The library derived from pool Cg-1 had the highest number of mapped reads and the lowest midLC, with a ratio between the two values of 259. In contrast, the more complex libraries from pools Cg-2 and Cg-3 were sequenced to 65.1 and 63.3 times the midLC, respectively, or about 4-fold lower depth than Cg-1. For comparison, libraries prepared from Hermes insertion pools in haploid and diploid S. cerevisiae (Levitan et al. 2020) yielded midLC's that averaged 1.06-and 1.74-fold higher than C. glabrata libraries (Supplemental Figure S4). A compilation of six mini-Ac/Ds libraries from S. cerevisiae (Michel et al. 2017) exhibited far higher midLC and jackpots (Supplemental Figure S4). Thus, midLC, jackpots, and depth of sequencing can vary significantly between species, transposons, and methodologies.
To identify essential genes in C. glabrata, the datasets obtained from pools Cg-2 and Cg-3 were normalized to correct for 4.0-fold under-sequencing relative to that of Cg-1 and then combined. The combined dataset was analyzed by an 8-feature machine learning algorithm that was previously used to identify essential genes in C. albicans, S. pombe, and S. cerevisiae (Levitan et al. 2020). As listed in Supplemental Table S4, each feature were weighted based on a validated training dataset of 692 essential and 1756 nonessential genes from S. cerevisiae (Levitan et al. 2020).
To quantify insertion site sequence bias, raw data from pools 1, 2, and 3 were compiled into a single table and nucleotides at positions +2 and +7 were imported from the reference genome. The read counts at each of the 16 possible sites were summed for each pool and compared to all such sites in the CBS138 reference genome. To quantify centromere bias, the read counts at each insertion site from pool 1 were binned into 1 kb segments from the centromere and averaged across all 26 chromosome arms of C. glabrata and all 32 arms of S. cerevisiae (Levitan et al. 2020).
To calculate z-scores of each gene, the log(2)ratio of experiment/ control was divided by the local standard deviation, which was calculated as follows. The data from two biological replicates of control culture conditions were tabulated for each gene and used to calculate average and log(2)ratio. The table was sorted from highest to lowest average and then the 40-gene running average and running standard deviation of the log(2)ratios were calculated. The running standard deviation (y-axis) was charted against the average (x-axis) and fit to a power function y = m1 + m2 ⋆ x^m3 using non-linear regression (Kaleidagraph) for all genes with average read counts .= 6. The parameters obtained from the best-fits were then used to calculate local standard deviation of the log(2)ratios for each gene in the experimental and control datasets after slight normalization of the replicates.
Multiple sequence alignment using MUSCLE and phylogenetic tree analysis using PhyML were implemented using SEAVIEW v4 (Gouy et al. 2010).

Data availability
Raw DNA sequence data were deposited at the Sequence Read Archive (NCBI) with the bioproject ID PRJNA625944. Tables of mapped sequence reads as well as unmappable BG2 and CBS138 sites are available for download from the authors upon request. The .bed files used for IGV genome browser will be available upon request and downloadable from the Candida Genome Database (http://www.candidagenome.org). Supplemental material available at figshare: https://doi.org/10.25387/g3.12445373.

RESULTS
Hermes transposition in C. glabrata Our strategy for launching the Hermes-NATr transposon and enriching for insertions in C. glabrata was based on prior studies in S. cerevisiae (Gangadharan et al. 2010) and S. pombe (Guo et al. 2013). Briefly, a centromere-containing plasmid bearing a methionine-repressible MET3 promoter and a counter-selectable URA3 gene (Zordan et al. 2013) was modified to express the Hermes transposase and to contain a Hermes transposon in which a NATr expression cassette replaced the natural transposase gene. Upon induction of transposase expression, the transposon can be excised from the plasmid and inserted into the genome. The plasmid launchpad is frequently lost and the cells become resistant to 5-FOA while remaining resistant to nourseothricin. After transformation of C. glabrata strain BG14, a derivative of clinical isolate BG2 (Cormack and Falkow 1999), and growth in synthetic complete medium lacking uracil, we observed a much greater number of insertion mutants (simultaneously resistant to both 5-FOA and nourseothricin) in the absence of methionine and cysteine than in their presence. The number of insertion mutants increased 50-fold or more after the cultures reached stationary phase on day 1 (Supplemental Fig. S1). Thus, after 3 days of induction, the vast majority of insertion mutants are likely to be independent of one another with relatively few insertions that occurred during the exponential growth phase and proliferated to a disproportionate frequency (i.e., jackpots).
In three separate cultures, less than 0.04% of cells in the population acquired an insertion into the genome, suggesting that instances of re-excision and re-insertion in the same cell line are extremely rare. Insertion mutants were enriched to high frequency (.99%) by passaging several times in medium containing nourseothricin and 5-FOA (see Materials and Methods). The enriched pools were frozen in aliquots for future use. Genomic DNA was extracted from each pool, sheared, repaired, ligated to adaptors, and amplified by PCR with custom primer pairs that target the adaptor and the right inverted repeat of the Hermes transposon as described previously (Bronner et al. 2016;Levitan et al. 2020). The resulting libraries of PCR products were then directly sequenced on an Illumina MiSeq instrument using a custom primer that hybridized to the end of the transposon. Sequence reads were filtered for quality and mapped to the reference genome of C. glabrata strain CBS138 (see Materials and Methods). Sequence reads that map to the same insertion site were tallied and visualized using the IGV genome browser together with annotated genes and other genomic features ( Figure 1). The results appeared comparable to previous findings in S. cerevisiae, where insertions were distributed semi-randomly throughout the lengths of all 13 chromosomes of C. glabrata. Overall, the three pools contained between 164,000 and 335,000 defined insertions, with a combined total of 513,123 unique insertion sites. Approximately 1.4% of the CBS138 reference genome was unmappable using short reads due to repetitive sequences and another 17% contained essential genes (see below). Excluding these regions, an average of one insertion was observed every 19 bp.

Insertion biases
Hermes transposons are known to insert preferentially at sites with T at position +2 and A at position +7 (Gangadharan et al. 2010). In the three C. glabrata pools of Hermes-NATr insertions, 72% of all mapped sequence reads were located at such TA sites, a frequency that is 7.8-fold higher than that of TA sites in the genome ( Figure 2A). Near-cognate TG and CA sites exhibited 1.15-and 1.12-fold enrichment, while the nine non-cognate (non-T and non-A) sites exhibited 10-to 162-fold underrepresentation ( Figure 2A). In spite of this 1,250-fold range of insertion site bias, all annotated genes of C. glabrata contain multiple preferred TA sites and near-cognate sites. These findings suggest that with sufficient diversity of insertions in the initial pool, complexity of the library, and depth of sequencing, all genes can be profiled.
Hermes and other transposons also insert more frequently into DNA that is physically close to the site of their excision. As Hermes-NATr was launched from a centromere-containing plasmid and centromeres cluster together within the nucleus of C. glabrata (Descorps-Declère et al. 2015) similar to S. cerevisiae (Kitamura et al. 2007), we expect to observe an insertion bias toward all centromeres. To quantify the magnitude of this bias, the read counts along all 26 chromosome arms of C. glabrata were tabulated in 1 kb increments from the centromere and compared to data from 32 chromosome arms of S. cerevisiae obtained using a similar centromerecontaining plasmid (Levitan et al. 2020). Though insertions were biased toward centromeres in both species, the effect appeared much stronger and longer in C. glabrata relative to S. cerevisiae ( Figure 2B). This difference suggests potential variation between the species in architecture and/or packaging of chromosomes or plasmids in the nucleus.
Transposon insertions in vivo are also biased toward nucleosomefree regions (Guo et al. 2013;Gangadharan et al. 2010), such as promoter regions upstream of coding sequences. By aligning all nonessential genes at the start codons and tabulating read counts at every nearby position, a clear bias toward the 59 non-coding region was observed relative to coding sequences ( Figure 2C, black). Within the first 100 codons, non-essential genes exhibited .30-fold higher read counts than essential genes ( Figure 2C, red) that are identified in the next section. Interestingly, in the adjacent 59 non-coding region between -200 and -1 relative to the start codon, non-essential genes also exhibited substantially higher read counts than essential genes ( Figure 2C). These findings suggest Hermes-NATr insertions in Figure 1 Hermes-NATr insertions visualized in C. glabrata. IGV browser representations of insertions in segments of chromosomes J (A) and B (B). Each row contains tick marks representing mapped insertion sites within a particular sequenced library that have been scaled to reflect read counts at that site. Segments of the CBS138 reference genome that are unmappable with short reads of CBS138 and BG2 genomic DNA are depicted in the top two tracks and highlighted (blue circles). Black bars indicate the positions of coding sequences and arrows indicated direction of transcription. Numbers at bottom indicate essentiality scores. Essential genes (red boxes) and PAN1 (black box) are indicated.
C. glabrata have little or no enhancer and promoter activities, which is similar to Hermes-NATr in S. cerevisiae (Levitan et al. 2020) and distinct from mini-Ac/Ds in S. cerevisiae (Michel et al. 2017).

Identification of essential genes
Essential genes perform functions that are critical for growth and survival in the laboratory and their identification can facilitate development of new antifungals. Generally, transposon insertions within the coding sequences of essential genes greatly diminish competitive fitness, leading to depletion of sequence reads relative to those in the surrounding non-essential DNA. To discover essential genes in C. glabrata, we combined the data from all three pools after normalizing for the 4-fold under-sequencing of libraries from pools Cg-2 and Cg-3 relative to Cg-1 (using a novel statistic termed midLC; see Materials and Methods) and then we implemented a machine learning approach that had been applied successfully to multiple transposon insertion datasets from multiple species (Levitan et al. 2020). The algorithm was trained on a set of confirmed essential and non-essential protein-encoding genes from S. cerevisiae using eight classification features that represent different aspects of essentiality. Essentiality scores ranging from 0 (non-essential) to 1 (essential) were calculated for each gene. A histogram of the output revealed a bimodal distribution with maxima in the first and last deciles and relatively few genes in the middle quartiles ( Figure 3A). A total of 1342 out of 5282 protein-encoding genes scored above 0.5 and were classified as essential for competitive fitness in these conditions (Supplemental Table S1). After filtering genes that contain . 50% unmappable segments, this number dropped to 1278 out of 5190 genes (24.6%), which is similar to 1232 out of 5674 genes (21.7%) in S. cerevisiae cultivated under similar conditions (Levitan et al. 2020). Hundreds of genes in C. glabrata and S. cerevisiae exist as duplications from an ancient whole genome duplication (WGD) event while thousands exist as singletons (Wolfe and Shields 1997). When only 1-to-1 orthologous genes are considered (3880 genes total) (Byrne and Wolfe 2005), 84% of essential genes in C. glabrata were also essential in S. cerevisiae.
To explore possible instances of essentialome evolvability, we focused on 1001 genes of the 1-to-1 orthology group that were confirmed as inviable (essential) in S. cerevisiae through conventional approaches (Winzeler et al. 1999). We found 93 genes of C. glabrata that scored below 0.5 (non-essential) while scoring 0.5 or higher in S. cerevisiae (essential) and annotated as inviable at the Saccharomyces Genome Database ( Figure 3B). Gene ontology (GO) analyses (Eden et al. 2009) revealed enrichment of spliceosomal complex (GO:0005681, P-value = 1.8E-4, FDR q-value = 1.2E-1) encompassing 15 genes ( Figure 3B, blue squares) and transcriptional mediator complex (GO:0016592, P-value = 8.6E-4, FDR q-value = 2.9E-1) encompassing 5 genes. C. glabrata retains less than half as many introns as S. cerevisiae (Neuvéglise et al. 2011), so perhaps the level of importance of some spliceosomal subunits has been relaxed in this species. Another interesting example is the SPS complex, where all 3 components (SSY1, PTR3, SSY5) were scored essential in S. cerevisiae but not C. glabrata ( Figure 3B, yellow diamonds). The SPS complex is only essential in laboratory strains of S. cerevisiae that have amino acid auxotrophies, such as BY4741 (his3Δ, leu2Δ, met15Δ) and is nonessential in prototrophic strains (Ljungdahl 2009). Our findings in the prototrophic BG14 strain of C. glabrata therefore confirm the S. cerevisiae (prototroph) data and expose an annotation error in S. cerevisiae genome database. Last, the ortholog of S. cerevisiae VHT1 in C. glabrata (CAGL0K04565g) scored as non-essential when it is critical for uptake of the essential vitamin H (biotin) present in the medium. Surprisingly, BLAST searches revealed a duplicate of CAGL0K04565g termed CAGL0K04609g nearby on chromosome K and this duplicate gene scored as essential. Phylogenetic analyses of these genes and their orthologs from other species indicate that the duplication is preserved in one sub-clade of the Nakaseomyces (C. glabrata, C. bracarensis, C. nivariensis, N. delphensis) but not in the other sub-clade (C. castellii, N. bacillisporus) or in any other species The number of sequencing reads at each nucleotide position relative to the start codon were tabulated for all 1-to-1 non-essential genes (black) and essential genes (red) and divided by the number of genes in each set.
of yeasts (not shown). In the four species with duplicates, the ancestral CAGL0K04565g orthologs evolved more than twice as fast as the duplicated CAGL0K04609g orthologs (Supplemental Figure 2), suggesting that the duplicates are more likely to retain the essential function in biotin transport. Recently, individual knockouts of the CAGL0K04565g and CAGL0K04609g genes in C. glabrata showed that only the latter was essential for growth in these conditions (Sprenger et al. 2020), thus validating our assessments with transposon mutagenesis.
On the other end of the spectrum, a total of 138 genes were scored as essential in C. glabrata and not S. cerevisiae where they also have been confirmed as non-essential using gene knockouts (Supplemental Table S1). One such gene (NPT1) has been shown to be essential in C. glabrata but not S. cerevisiae in similar culture conditions (Ma et al. 2007). GO term analyses suggest polyamine biosynthesis (GO:0006596, P-value 5.23E-6, FDR 2.29E-2) involving the SPE1, SPE2, and SPE3 genes is one such crucial process ( Figure 3C, blue diamonds). Though mutations in these genes cause spermidine auxotrophy in S. cerevisiae (Whitney and Morris 1978), transposon insertions in these genes apparently did not diminish growth in the spermidine-free medium used here, possibly because such mutants were surrounded by cells that were able to synthesize and excrete spermidine. Insertions in the SPE genes of C. glabrata caused fitness defects as expected. Another major GO term component enriched in this set was defined by eight genes that contribute to the kinetochore (GO:0000776, P-value = 5.76E-5, FDR q-value = 6.39E-3; Figure 3C, yellow squares). Essentiality of one such kinetochore gene (CIN8) can be explained by the loss of the functionally redundant KIP1 gene in C. glabrata. In S. cerevisiae, CIN8 is non-essential in wild-type strains essential in kip1Δ strains (Hoyt et al. 1992). Essentiality of another kinetochore gene (CBF1) has been confirmed in C. glabrata (Stoyan et al. 2001) though it is clearly non-essential in S. cerevisiae. A possible explanation for essentiality of CBF1 and the other kinetochore genes in this GO category (BUB1, BUB2, BUB3, CTF19, MCM21, MCM22, KAR3 as well as CIN8) is a potential deficiency of BIR1 function in C. glabrata. BIR1 (CAGL0M09152g) is currently annotated as a pseudogene in the C. glabrata reference genome (Dujon et al. 2004) because of an in-frame stop codon that occurs at position 588, which would eliminate the strongly conserved C-terminal domain that is essential for chromosome segregation in S. cerevisiae (Yoon and Carbon 1999) and mammalian cells (Carmena et al. 2012). In S. cerevisiae, bir1 hypomorphic mutations exhibit synthetic lethality with all these kinetochore genes (Yoon and Carbon 1999;Makrantoni et al. 2017). However, BIR1 is not likely to be a pseudogene in C. glabrata because the C-terminal domain is well-conserved in the +1 reading frame, a single mRNA is expressed that spans the entire gene (Linde et al. 2015), the mRNA contains a 7 base pair programmed +1 ribosomal frameshift sequence (Farabaugh et al. 2006) just upstream of the in-frame stop codon, and all of these features are conserved in all five sequenced species of the Nakaseomyces clade (Supplemental Fig. S3). If full-length Bir1 is expressed through a programmed +1 ribosomal frameshifting mechanism in these species, this mechanism could contribute to chromosome instability, Figure 3 Essentiality comparisons between species. (A) Histogram of essentiality scores for all protein-encoding genes of C. glabrata. The 1-to-1 orthology group was split into two groups based on SGD annotations for inviable (B) and viable (C). Dark gray circles indicate individual othologous genes. Spliceosomal complex genes (blue squares) and SPS genes (yellow diamonds) were scored as essential in S. cerevisiae but not C. glabrata. Spermidine biosynthesis genes (blue diamonds) and kinetochore complex genes (yellow squares) were scored as essential in C. glabrata but not S. cerevisiae. V-ATPase genes (red circles) and PAN1 (black circle) are indicated. which previously has been associated with acquired fluconazole resistance (Marichal et al. 1997) and persistent infections (Poláková et al. 2009).
Some duplicates from the WGD event have been fully retained in both species, forming the 2-to-2 orthology group with 606 genes. Another 236 singleton genes of C. glabrata remain duplicated in S. cerevisiae while 182 duplicated genes in C. glabrata are singletons in S. cerevisiae, forming the 1-to-2 and 2-to-1 orthology groups. In both species, the frequency of essentiality of these 1-to-2 and 2-to-1 singletons (30%) was similar to that of the 1-to-1 group (29%). But when a duplicate exists in these orthology groups of both species, the frequency of essentiality is much lower (7.5%), suggesting that duplicated genes in both species often function redundantly in essential processes. Thus, essentiality of the singleton in the 1-to-2 and 2-to-1 orthology groups can help expose redundant essentiality in the species with non-essential duplicates. In one example, the singleton AUR1 gene in S. cerevisiae scored essential while neither of the duplicates in C. glabrata scored essential, though both species remain highly susceptible to the compound aureobasidin A that directly inhibits the products of AUR1 genes (Zhong et al. 2000).
The 1-to-0 and 0-to-1 orthology groups consisted of 33 and 126 genes, respectively, after excluding BIR1 and all genes that were not present in the common ancestor of both species. Of these, only one gene in C. glabrata (CAGL0M04543g) and only three genes in S. cerevisiae scored over 0.5. All four are likely to be false positives due to their small size (228 -540 bp) or borderline score, but validation will be necessary to rule out the possibility of divergent essentiality in the two species.
A total of 82 genes in the 1-to-1 orthology group scored as essential in both C. glabrata and S. cerevisiae but were annotated as viable at SGD (Supplemental Table S1). Four such genes involved in mating type determination are false positives because they exist as near perfect duplicates and are thus almost completely unmappable with short-read Illumina sequencing. We found 1.4% of the CBS138 reference genome was unmappable even with perfect 75 bp sequence matches to the reference genome. To help identify such false positives, the number of unmappable sites within each gene was tabulated for each gene in both species (Supplemental Table S1). Interestingly, dozens of other highly mappable genes scored as essential in both C. glabrata and S. cerevisiae have been validated as non-essential in S. cerevisiae with knockout mutations (see SGD). Twelve of these mutants encode critical subunits of the V-ATPase or its assembly factors ( Figure 3C, red circles), and one gene (VPH2) has been confirmed as non-essential in C. glabrata by knockout mutation (Nishikawa et al. 2016). These findings suggest that the V-ATPase is non-essential in some culture conditions and yet essential for competitive fitness in the conditions used to generate pools of transposon mutants. Note that some knockout-confirmed essential genes score as non-essential in both species, often because the gene products contain large non-essential domains at their C-termini adjacent to essential N-terminal domains (e.g., PAN1 in Figure 1A, and Figure 3B, black circle). Altogether, transposon insertion profiling and analysis by machine learning can generate interesting insights into genes and processes that are essential for fitness, though care must be exercised to identify false positives and false negatives.
Genes that regulate susceptibility to fluconazole The large pools of Hermes insertion mutants and the profiling methods described above provide an unprecedented opportunity to identify genes that impact fitness of C. glabrata in new conditions. To identify genes that regulate susceptibility of C. glabrata to fluconazole, we grew pool Cg-1 to stationary phase, diluted aliquots 100-fold into fresh medium containing or lacking fluconazole (128 mg/mL), and shook the planktonic cultures for 24 hr at 30°. Cells were then pelleted, washed free of drugs, and cultured in an equal volume of fresh medium for an additional 48 hr. These conditions were chosen to expose genes that alter cell survival and recovery rates, in addition to genes that alter sensitivity and resistance to the antifungal. Genomic DNA from these cultures was then isolated, converted into Illumina libraries, and sequenced as before. For each annotated gene, the total number of reads that mapped within the coding sequences were tabulated (Supplemental Table S2). Each gene was then represented as a single point in a 2-dimensional plot of experimental (fluconazole) vs. control conditions ( Figure 4A). Most genes appear very close to the main diagonal and therefore did not play a strong role in susceptibility to fluconazole. Genes below the main diagonal promoted resistance (i.e., confer hypersensitivity when disrupted with transposons) to fluconazole whereas genes above the main diagonal promoted sensitivity to fluconazole. Significance of each gene was calculated as z-score relative to the local standard deviation (see Materials and Methods).
Resistance of C. glabrata to fluconazole is known to be conferred, in part, by a drug-responsive transcription factor (Pdr1) that induces the expression of a drug efflux pump (Cdr1) and homologs of phosphatidylinositol transfer protein (Pdr16, Pdr17) (Sanglard et al. 1999;Kaur et al. 2004;Tsai et al. 2006;Culakova et al. 2013;Paul et al. 2014). We observed PDR1, CDR1, PDR16, and PDR17 among the strongest outliers below the main diagonal ( Figure 4A, blue squares) with z-scores of -2.7,-4.4, -10, and -3.1, respectively. We also observed UPC2A far below the main diagonal (z-score = -18). Upc2A also confers fluconazole resistance by inducing expression of the direct target of the drug (Erg11) as well as other enzymes that function in the ergosterol biosynthetic pathway (Nagi et al. 2011;Whaley et al. 2014). DAP1, encoding a heme-binding protein important for Erg11 function and fluconazole resistance (Hosogaya et al. 2013), also conferred fluconazole resistance (z-score = -4.5). In S. cerevisiae, Dap1 physically interacts with the product of YHR045w (Tarassov et al. 2008), and yhr045wΔ mutants correlated strongly with dap1Δ mutants (Pearson correlation of 0.86 (Kim and Cunningham 2015)) in terms of their chemical interaction profiles (Hoepfner et al. 2014). Insertions in YHR045w gene of C. glabrata (CAGL0J00297g) also resulted in hypersensitivity to fluconazole (z-score = -3.6). Last, the STV1 gene appeared below the main diagonal (z-score of -10.1), confirming a previous report of fluconazole hypersensitivity when stv1 is disrupted with Tn7 transposon (Kaur et al. 2004). These findings demonstrate that genes conferring innate fluconazole resistance in C. glabrata can be identified readily through insertion profiling.
Another 14 genes were found to promote innate resistance to fluconazole by at least 2-fold with z-scores ,= -3 ( Figure 4A, blue circles). The most significant component revealed by GO term analysis of this group involved KGD1 and KGD2 (z-score = -5.7, -7.0; yellow-filled blue circles), whose products in S. cerevisiae form the alpha-ketoglutarate dehydrogenase complex that synthesizes succinyl-CoA in the TCA cycle Tzagoloff 1989, 1990). The hypersensitive phenotypes of insertions in KGD1 and KGD2 are in striking contrast to the phenotypes of insertions in IDH1 and IDH2 (z-score = +8.4 and +5.4; yellow-filled red circles), whose products together form an enzyme complex that synthesizes the alpha-ketoglutarate that is consumed by the Kgd1-Kgd2 complex. These findings suggest that production of alpha-ketoglutarate in mitochondria may somehow diminish the intrinsic resistance to fluconazole (see Figure 4B).
In addition to IDH1 and IDH2, 198 other genes above the main diagonal were significant (z-score .= 3). The striking majority of this group (135 genes) encode proteins required for mitochondrial functions such as translation, respiration, TCA cycle (excepting KGD1 and KGD2), and lipid biosynthesis (Figure 4, red circles). Previous studies showed that mutations disrupting PGS1, SUV3, and MRPL4 genes, which are critical for diverse mitochondrial functions, also confer resistance to fluconazole, likely through hyperactivation of Pdr1 (Kaur et al. 2004;Batova et al. 2008). These three genes (z-scores = 13.9, 7.0, and 4.3, respectively) fall within the minor diagonal formed by the other mitochondrial genes ( Figure 4A). Previous studies showed that complete loss of the mitochondrial chromosome causes activation of Pdr1, up-regulation of Cdr1 and Pdr16, and resistance to fluconazole (Sanglard et al. 2001;Paul et al. 2014). By generating new pools of transposon insertions in pdr1Δ knockout mutants, rho0 mutants, and other backgrounds, it should be possible to disentangle the regulatory networks that converge on Pdr1 and to identify novel Pdr1-independent regulators of fluconazole susceptibility (Coorey et al. 2015).
Numerous additional genes contributed substantially to fluconazole susceptibility. Out of 65 non-mitochondrial genes that are significantly resistant to fluconazole when disrupted by transposons, the top three genes encode subunits of the SKI complex (SKI2, SKI3, and SKI8; z-scores = 12. 8, 9.5, and 11.4; red squares), which recruits the exosome and facilitates 39 to 59 degradation of mRNAs (Brown et al. 2000). We were unable to find any precedence in the literature for involvement of the SKI complex in susceptibility to fluconazole in any species. GO term analysis of the remainder of this group yielded only one significant process: Glycosphingolipid Biosynthesis (GO:0006688, P-value = 4.7E-4), which contained catalytic and regulatory subunits of MIPC synthase (encoded by CSG2, SUR1).
An additional gene of interest that lies below the diagonal is CNA1 (z-score of -3.0; Figure 4), which encodes the catalytic subunit of the calcium-activated protein phosphatase known as calcineurin (Miyazaki et al. 2010). The CNB1 gene, which encodes the regulatory subunit of calcineurin, contained no insertions in the starting pool probably because it is very small (528 bp) and distant from the centromere (911 kb). Calcineurin plays a minor role in resistance to fluconazole and other azoles (Miyazaki et al. 2010;Schwarzmüller et al. 2014). However, calcineurin plays a major role in the promotion of cell survival during exposure to high fluconazole in C. glabrata (Onyewu et al. 2003;Kaur et al. 2004;Miyazaki et al. 2010) and many other yeasts including S. cerevisiae (Bonilla et al. 2002) and C. albicans (Sanglard et al. 2003;Onyewu et al. 2003). This calcineurindependent "tolerance" to azoles also can be blocked by calcineurin inhibitors, such as FK506 and cyclosporine, thus converting fluconazole from a fungistat to a fungicide (Juvvadi et al. 2017).

Figure 4
Identification of fluconazole susceptibility genes in C. glabrata. (A) Pool Cg-1 was split into 4 portions and duplicates were regrown in the absence and presence of 128 mg/mL fluconazole as described in Methods. Libraries were prepared, sequenced, and mapped, and then read counts within the coding sequences of each gene were tabulated and averaged across the duplicates. Each dot represents one annotated gene. Key genes required for innate resistance to fluconazole are labeled (blue). Mitochondrial genes that cause significant fluconazole resistance when disrupted are indicated (red circles). Other genes mentioned in the text are highlighted. (B) A portion of the TCA cycle from S. cerevisiae and hypothetical inhibition of fluconazole resistance proteins by alpha-ketoglutarate accumulation, which could occur in kgd1-and kgd2-insertion mutants but not idh1-and idh2-insertion mutants. (C) Growth of pdr1Δ idh2Δ and pdr1Δ kgd2Δ double mutants, single mutants, and wild-type parental strain was measured after 20 hr incubation in SCD medium at 30°following a 2000-fold dilution from stationary phase pre-cultures. Each data point indicates the average of 3 technical replicates. Standard deviations were too small to be displayed.
Because the culture conditions involved high doses and long exposure times, we expect calcineurin-deficient mutants to exhibit significant loss of viability during the exposure to fluconazole and thus lower regrowth upon re-culturing in fresh medium lacking fluconazole. The hypersensitive mutants lacking Stv1 or Pdr1 did not exhibit significant cell death in response to fluconazole (Kaur et al. 2004), suggesting these proteins promote resistance mechanisms and not tolerance mechanisms. On the other hand, calcineurin promotes resistance of C. glabrata to the echinocandin-class antifungal caspofungin by inducing expression of the drug targets (Miyazaki et al. 2010;Singh-Babak et al. 2012). Calcineurin also promotes cell survival in serum and virulence in mouse models of disseminated candidiasis and ocular infection (Chen et al. 2012). Hermes insertion profiling in C. glabrata may shed new light on the mechanisms that regulate all these processes.

DISCUSSION
Efficient and comprehensive methods of functional genomics are critical to understand the principles and molecular mechanisms that distinguish pathogens such as C. glabrata from non-pathogenic model organisms such as S. cerevisiae. Here we develop the Hermes transposon from housefly as a powerful new tool for functional genomics research in C. glabrata. Unlike CRISPR/Cas9 methods, where mutations are user-guided and not directly sequenced, Hermes insertions in vivo are unguided, inexpensive to generate, and easy to profile directly, thus probing the functions of all regions of the genome without guidance. In spite of modest bias of Hermes insertions toward nTnnnnAn sequences and nucleosome-free regions, nearly all annotated genes of C. glabrata and S. cerevisiae can be disrupted if the pool size is large enough to achieve sufficient coverage. Interestingly, while transposons tend to reinsert close to the site from which they were launched, we observed a significant difference between C. glabrata and S. cerevisiae in the degree of this form of bias. By launching Hermes from a centromere-containing plasmid in both species, which would be clustered in the nucleus close to all the chromosomal centromeres, we observed a much stronger and lengthier bias of insertions near centromeres of C. glabrata relative to S. cerevisiae. This difference probably reflects some previously unappreciated difference in chromosome architecture or compaction, which potentially could be visualized using chromosome conformation capture and 3D representation (Duan et al. 2010).
Since their last common ancestor, C. glabrata has lost hundreds more ancestral genes than S. cerevisiae (Dujon et al. 2004), all of which are non-essential in S. cerevisiae except for BIR1. BIR1 is likely annotated incorrectly as a pseudogene in C. glabrata because the in-frame stop codon, a sequence conforming to a programmed +1 ribosomal frameshift, and the essential C-terminal domain are strongly conserved in all six species of the Nakaseomyces clade. Programmed +1 ribosomal frameshifting occurs broadly in eukaryotes and is utilized to regulate the expression of antizyme (Oaz1 in yeasts), thus coupling polyamine biosynthesis to cellular concentration of spermidine (Ivanov and Atkins 2007). While more research will be needed to establish the regulation of BIR1 expression through +1 ribosomal frameshifting, the implications in C. glabrata are noteworthy. BIR1 overexpression has been shown to increase virulence of Aspergillus fumigatus in mouse models of lung infection (Shlezinger et al. 2017), possibly even when truncated to lack the essential C-terminal domain. Overexpression of only the N-terminal portion of Bir1 accelerates the growth rate of wild-type S. cerevisiae in culture (Li et al. 2000). Perhaps more importantly, Bir1 assembles via its C-terminal domain into the chromosomal passenger complex, which is crucial for proper segregation of chromosomes in mitosis from yeast to human (Carmena et al. 2012). Thus, the regulated expression of Bir1 with or without its C-terminal domain could potentially alter virulence traits or the rate of chromosomal aneuploidy (Ravichandran et al. 2018), which have been previously associated with acquisition of fluconazole resistance (Marichal et al. 1997) and host infection by C. glabrata (Poláková et al. 2009). Deficiency of BIR1 function in S. cerevisiae also causes synthetic lethal interactions with several different kinetochore genes, perhaps explaining why these same genes have become essential in C. glabrata but not S. cerevisiae.
With the assistance of machine learning (Levitan et al. 2020), the essentialome of C. glabrata has been defined for the first time. While several kinetochore genes appeared to be essential in C. glabrata and not S. cerevisiae, 15 essential genes in S. cerevisiae involved in mRNA splicing were scored as non-essential in C. glabrata. This could reflect the diminished number of introns and intron-containing genes in C. glabrata relative to S. cerevisiae (Neuvéglise et al. 2011;Linde et al. 2015). However, other explanations cannot be ruled out without validation and further testing. In the 1-to-1 orthology group, 84% of essential genes in S. cerevisiae were also scored by machine learning as essential in C. glabrata. Even so, false positives occurred due to repetitive sequences and small gene sizes while false negatives may also occur if essential genes are located in regions of the genome that are sparsely inserted with transposons or if the essential gene products contain large non-essential domains at their 39 ends (e.g., PAN1). Calling of essential genes from transposon insertion datasets could be improved by also incorporating datasets from diploids, where very few heterozygous gene knockouts exhibit strong fitness defects (Sanchez et al. 2019;Levitan et al. 2020). In any case, this experimental approach to essentialome determination is complementary to more traditional approaches such as genome-wide knockout, silencing, and protein destabilization approaches.
Until now, methods for quantifying the complexity of the sequencing libraries and normalizing different experiments have been rudimentary. To improve the situation, we developed a midLC statistic that empirically quantifies the complexity of a library independent of jackpots (insertions that occur at early phases of culture and are over-represented) and the degree of over-sequencing (see Materials and Methods). The degree of library over-sequencing can be quantified relative to the midLC and then used to normalize datasets from different runs prior to analyses. Accurate measures of library complexity and jackpot frequency can help define bottlenecks in the protocols that would lower complexity and diminish the usable capacity of sequencing runs.
The power of transposon mutagenesis and insertion profiling was also illustrated through the identification of non-essential genes that become essential in other conditions. Original efforts made use of the bacterial Tn7 transposon that was first inserted into purified genomic DNA in vitro and then recombined into the C. glabrata genome (Kaur et al. 2004). Individual clones were then screened manually for resistance and hypersensitivity to fluconazole (Kaur et al. 2004) and caspofungin (Rosenwald et al. 2016). Now, much larger pools of insertion mutants can be generated and screened by combining in vivo transposition with deep sequencing technology. With this approach, we confirm large contributions of Upc2A, Pdr1, Pdr16, Pdr17, and Cdr1 to the innate fluconazole resistance of C. glabrata through their ability to increase fluconazole efflux or fluconazole target expression. We also report dozens of other genes whose products contribute to fluconazole resistance (e.g., Stv1, Kgd1-2 complex) and hypersensitivity (e.g., Ski2-3-8 complex, mitochondrial functions including Idh1-2). Our findings suggest that alphaketoglutarate biosynthesis by Idh1-2 in mitochondria naturally diminishes Pdr1 function in fluconazole resistance. Though these experiments with gene knockout mutants validate our findings with transposon insertions, more direct experiments will be necessary to determine precisely how alpha-ketoglutarate (or a related molecule) interacts with Pdr1. The conditions employed herein only revealed the genes with rather strong contributions to fluconazole susceptibility. With modifications to the experimental design, more genes with more subtle contributions will surely become apparent. By iterating the process in mutants lacking Pdr1 (or others), networks of gene function can be inferred.
The pools of Hermes-NATr insertions described here can immediately be utilized for exploring C. glabrata susceptibility to any number of clinical and experimental antifungals. Additionally, new pools can be generated in the extremely diverse sub-clades that define the C. glabrata species group (Carrete et al. 2018), enabling exploration of their vast phenotypic diversity. The pCU-MET3-Hermes launchpad developed here may even be useful for pool generation in other species of the Nakaseomyces clade, thereby enhancing our understanding of genomic and phenotypic plasticity that underlays the evolution of virulence in this group of emerging pathogens (Gabaldón and Carrete 2016).