Population genetics and microevolution of clinical Candida glabrata reveals recombinant sequence types and hyper-variation within mitochondrial genomes, virulence genes, and drug targets

Abstract Candida glabrata is the second most common etiological cause of worldwide systemic candidiasis in adult patients. Genome analysis of 68 isolates from 8 hospitals across Scotland, together with 83 global isolates, revealed insights into the population genetics and evolution of C. glabrata. Clinical isolates of C. glabrata from across Scotland are highly genetically diverse, including at least 19 separate sequence types that have been recovered previously in globally diverse locations, and 1 newly discovered sequence type. Several sequence types had evidence for ancestral recombination, suggesting transmission between distinct geographical regions has coincided with genetic exchange arising in new clades. Three isolates were missing MATα1, potentially representing a second mating type. Signatures of positive selection were identified in every sequence type including enrichment for epithelial adhesins thought to facilitate fungal adhesin to human epithelial cells. In patent microevolution was identified from 7 sets of recurrent cases of candidiasis, revealing an enrichment for nonsynonymous and frameshift indels in cell surface proteins. Microevolution within patients also affected epithelial adhesins genes, and several genes involved in drug resistance including the ergosterol synthesis gene ERG4 and the echinocandin target FKS1/2, the latter coinciding with a marked drop in fluconazole minimum inhibitory concentration. In addition to nuclear genome diversity, the C. glabrata mitochondrial genome was particularly diverse, with reduced conserved sequence and conserved protein-encoding genes in all nonreference ST15 isolates. Together, this study highlights the genetic diversity within the C. glabrata population that may impact virulence and drug resistance, and 2 major mechanisms generating this diversity: microevolution and genetic exchange/recombination.


Introduction
Candida is the most prominent genus of the Debaryomycetaceae family, with over 400 genetically and phenotypically diverse species currently described (Daniel et al. 2014).Many of these species are harmless commensals of the mucous membranes and digestive tracts of healthy individuals.Approximately 30 Candida species are of clinical importance in humans.Most of these species that are capable of causing disease in humans belong to the CTG-Serine clade, including Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Candida lusitaniae, Candida guilliermondii, and Candida auris, while others such as Candida glabrata and Candida bracarensis belong to the genetically distant Nakaseomyces clade (Daniel et al. 2014;Turner and Butler 2014).In adult patients, C. glabrata is the second most commonly isolated species after C. albicans, which together cause approximately 3 quarters of all systemic candidiasis (Perlroth et al. 2007;Rajendran et al. 2016a).Infections caused by these species range from mild vulvovaginal candidiasis (VVC or thrush) to severe, drug resistant and difficult to treat invasive infections affecting single organs or the blood stream (candidemia) with or without dissemination to the heart, brain, kidneys, and other parts of the body (Mullick et al. 2006).Bloodstream infections caused by Candida spp.are associated with mortality rates of 30-60% (Hirano et al. 2015;Lamoth et al. 2018).Candidemia is associated with diverse risk factors including neutropenia, chemotherapy, diabetes, old age, compromised immune function, prolonged antibiotic and steroid treatment, and intravenous catheters that can harbor fungal biofilms (Yapar 2014).Pathogenic Candida species including C. glabrata have also exhibited alarming increases in resistance against all major classes of antifungal drugs, hindering effective treatments and resulting in increasing mortality rates (Cowen et al. 2002;Odds et al. 2007;Pfaller et al. 2012;Rajendran et al. 2016a;Barber et al. 2019).
Candida glabrata typically grows in the yeast form and is considered to have evolved an infection strategy based on stealth and evasion without causing severe damage in murine models (Brunke and Hube 2013).This ability of C. glabrata and some of its relatives in the Nakaseomyces clade to infect humans is thought to have evolved recently (Gabaldo ´n et al. 2013), as several of its closest relatives have to date been exclusively isolated environmentally (Caenorhabditis castellii, Nakaseomyces baccilisporus, and Nakaseomyces delphensis) (Gabaldo ´n et al. 2013).Pathogenicity in the Nakaseomyces correlates with the number of epithelial adhesins (EPA) encoded in their genomes, which facilitate adherence and colonization of human epithelial cells (Cormack et al. 1999).In contrast to C. albicans, the pathogenicity of Nakaseomyces species does not coincide with number or presence of Phospholipase-B and Superoxide Dismutase genes (Ibrahim et al. 1995;Martchenko et al. 2004;Cafarchia et al. 2008;Bink et al. 2011).Many Candida genes involved in virulence are therefore likely to have diverse functions, some of which may not be conserved between distant clades.
Like many fungal pathogens, C. glabrata's niche(s) and life cycle are poorly understood.Candida glabrata is increasingly identified among clinical samples where it is responsible for an increasing proportion of cases of candidemia (Beck-Sagu e and Jarvis 1993; Rajendran et al. 2016a).Candida glabrata has also been identified environmentally, including as a component of the mycobiota of yellow-legged gulls (Al-Yasiri et al. 2016), in droppings and cloaca swabs of birds of prey, migratory birds and passeriformes (Cafarchia et al. 2008), and other potentially transitory sources including spontaneously fermenting coffee beans (de Melo Pereira et al. 2014).Concerted efforts for sampling are required to determine the true ecological distribution of C. glabrata, as they are for several other important Candida species such as C. auris (Arora et al. 2021).Furthermore, the relatedness of global isolates and their routes of transmission (either patient to patient, or between patient and the environment) requires further studies comparing genotypes to collected metadata including the location of isolation.
High levels of genetic heterogeneity have been identified in the C. glabrata population, as molecular methods have identified diverse strains, clades and sequence types (STs) both inter-and intra-nationally (Biswas et al. 2018;Carret e et al. 2018).As of January 2022, the multi locus sequence type (MLST) database for C. glabrata included 233 STs from 1,414 isolates from 29 countries, based on the sequence identity for 6 genetic loci (Gabaldo ´n et al. 2020).All isolates of C. glabrata reported to date have been haploid, with occasional aneuploids e.g.transitory disomies of chromosomes E and G (Carret e et al. 2018).
Genetic heterogeneity within many fungal populations is shaped by their ability to switch between clonal and sexual recombination (Drenth et al. 2019).The ability for C. glabrata to undergo a sexual cycle remains unknown, with all reported attempts in the laboratory to encourage mating thus far unsuccessful.Candida glabrata has therefore been regarded as an asexual species, despite the presence of well-conserved mating loci (Wong et al. 2003;Gabaldo ´n and Fairhead 2019) and 14 examples of phylogenetic incompatibilities from multi locus sequencing (Dodgson et al. 2005).More recent genomic analysis from 34 globally isolated C. glabrata strains revealed evidence of population admixture, suggesting a thus far undiscovered sexual cycle (Carret e et al. 2018).Greater sampling efforts and genomic analyses are therefore required to fully explore signatures of adaptation, virulence, and recombination.
In this study, we explore the population genetics and microevolution of C. glabrata using comparative genomic analysis of 68 clinical C. glabrata isolates from 8 hospitals across Scotland, combined with 83 publicly available and globally isolated genomes, finding evidence of recombinant STs, hypervariable mitochondrial genomes, as well as variation in virulence genes and drug targets between STs and between serial isolates from prolonged or recurrent infection.

Library preparation, sequencing, and antifungal tests
Candida glabrata was collected from blood in 2012 from 8 hospitals in Scotland (Supplementary Table 1).These isolates were collected as part of a retrospective study of all cases of Candida blood stream infections carried out within Scotland under NHS Caldicott Guardian approval from March 2012 to February 2013, as described previously (Rajendran et al. 2016a(Rajendran et al. , 2016b)).
Genomic and mitochondrial DNA was extracted from 68 isolates using the QIAamp DNA mini extraction kit (Qiagen) according to the manufacturer's instructions.A small modification was made prior to extraction which was to mechanical disrupt the yeast.This was achieved by bead-beating the cells with sterile acid-washed 0.5 mm diameter glass beads (Thistle Scientific) for 3 Â 30 s.Following isolation and extraction using the QIAamp columns the DNA was eluted into elution buffer before storage at À20 C and transport to the sequencing facility.
Library preparation was performed by the Centre for Genome-Enabled Biology and Medicine at the University of Aberdeen.Briefly, gDNA quality was assessed on a Tapestation 4200 with a high sensitivity genomic DNA tape (Agilent) and quantified by fluorimetry using Quant-IT dsDNA high-sensitivity (HS) assay (Thermo Fisher).Dual indexed Illumina libraries were prepared from 1 ng purified gDNA using an Illumina Nextera XT DNA library preparation kit and Nextera XT v2 indices, which were purified from free adapters using AMPure XP beads (Beckman Coulter).Libraries were quantified using Quant-IT dsDNA HS assay (Thermo Fisher) and average fragment size was calculated on Tapestation 4200 (Agilent), then equimolar pooled at 10 nM.Concentration of the pool was verified by qPCR (Kapa library quantification kit, Roche) on QuantStudio 6 using SYBR green, and 1.8 pM of the library pool was sequenced on an Illumina NextSeq500 with 150 bp paired-end reads and 8 bp index reads to average alignment depths of 41.9X (Supplementary
To identify aneuploid chromosomes, depth of coverage was calculated for each of 206 fungal samples.Sorted BAM files prepared in the pre-processing phase of SNP calling were passed to Samtools v.1.2(Li et al. 2009) and mpileup files were generated.Read depth was normalized by total alignment depth and plotted against the location in the genome using 10 kb nonoverlapping sliding windows.To identify structural variation, assembly de novo was achieved using Spades v3.12 (Bankevich et al. 2012) using default parameters.

Phylogenetic and population genetic analysis
To construct species-specific phylogenetic trees, all sites that were either a homozygous reference or SNP in every isolate were identified using ECATools (https://github.com/rhysf/ECATools)and concatenated into a FASTA file.Our rooted tree C. bracarensis included 1,198 phylogenetically informative sites, while the unrooted C. glabrata tree included 34,980 phylogenetically informative sites.Phylogenetic trees were constructed with RAxML PThreads v.7.7.8 (Stamatakis 2006) using the general-timereversible model and CAT rate approximation with 100 bootstrap support, both with rooting to C. bracarensis AGP (Correia et al. 2006) or midpoint rooting without C. bracarensis.We constructed a tree using the same models with 1000 bootstrap support for all Saccharomycetaceae species that had a genome assembly in NCBI or JGI Mycocosm.We also constructed neighbor-joining trees using PAUP v4.0b10 (Swofford n.d.) and a NeighborNet Network with SplitsTree v.4.15.1 (Huson 1998).Trees were visualized using FigTree v. 1.4.4 (http://tree.bio.ed.ac.uk/software/fig tree/).
We applied Weir's estimator (Hudson and Kaplan 1985) of Wright's Fixation Index (F ST ) according to the equations given in Multilocus 1.3 (Agapow and Burt 2001) using nonoverlapping sliding windows.The scripts have been made available online (https://github.com/rhysf/FSTwindows).

Selection and microevolutionary analysis
The direction and magnitude of natural selection for each ST were assessed by measuring the rates of nonsynonymous substitution (dN), synonymous substitution (dS), and omega (x ¼ dN/dS) using the yn00 program of PAML (Yang 2007), which implements the Yang and Nielsen method, taking into account codon bias (Yang and Nielsen 2000).Further GC corrections were not applied.The program was run on every gene in each isolate using the standard nuclear code translation table.To examine the functional significance of genes with x > 1, we evaluated their Pfam domains and gene ontology (GO) terms for statistical enrichment (genes with x > 1 vs, the remaining genes) using the 2-tailed Fisher exact test with Storey false discovery rate (FDR)-corrected P-values (q) of < 0.05.GO terms were acquired using Blast2Go v6.0.1 (Conesa et al. 2005, p. 2) using Blastp-fast to the NCBI BLAST nr-database (E-value < 1E-5).
Genes of interest were defined including both FKS and 12 ERG pathway genes, as well as all genes listed in Table 1 of Weig et al. (2004), which included adhesins including EPA genes, aspartic proteases, phospholipases, cell wall biogenesis, structural wall proteins, regulatory, efflux pumps.This gene list was then screened for genes with either signature of positive selection or those undergoing microevolutionary changes (nonsynonymous and frameshift indels).

Recombinant sequence types in Scottish clinical samples
Clinical isolates of C. glabrata from across Scotland are highly genetically diverse.Using whole-genome sequencing, we analyzed the genomes for 68 isolates of C. glabrata from 47 separate patients across 8 Scottish hospitals, generating the largest panel of C. glabrata genome sequences to date.These 68 isolates belonged to 20 separate sequence-types (STs) of C. glabrata, which represent genetically related sub-populations based on alleles from 6 loci/genes.One isolate (CG57 from a single patient in Forth Valley Royal Hospital) belonged to a new ST that has not been previously identified anywhere else (ST204) (Fig. 1, Supplementary Tables 1 and 2).Variant calling using the diploid model of GATK found few examples of heterozygosity (<0.41 per kb for every isolate) suggesting all isolates were haploid (Supplementary Table 3).Our panel of C. glabrata isolates was supplemented with a further 83 genomes from 3 recent studies of global C. glabrata isolates (Biswas et al. 2018;Carret e et al. 2018;Xu et al. 2020) Our WGS-based calculation of C. glabrata p was the highest of any species in the Saccharomycetaceae that had both an available genome assembly and a calculation of p (albeit those are based on ITS sequences and fewer strains than we had) (Irinyi et al. 2015) (Fig. 3a).However, C. glabrata genetic diversity was typical among the Saccharomycotina (mean/x ¼ 0.0055, median ¼ 0.0039, standard deviaton/r ¼ 0.0055) (Fig. 3b).Nucleotide diversity within the population was present across the nuclear genome (Fig. 3c), with window length having some impact on the result [smaller window lengths (5 kb) resulted in higher average p in approximately half of the genome: chromosomes A-F, M, and H].Most of the allelic diversity across the 151 C. glabrata isolates came from the nuclear genome (min.¼ 0.09 SNPs/kb, max.¼ 6.54 SNPs/kb, x ¼ 5.55 SNPs/kb) compared with the mitochondrial genome (min.¼ 0.05 SNPs/kb, max.¼ 3.64 SNPs/kb, x ¼ 1.21 SNPs/kb).Indeed, a significant difference between nuclear SNPs/kb and mitochondrial SNPs/kb was found using a 2-tailed t-test for all 151 genomes (P ¼ 5.6147E-111).
Several C. glabrata STs had evidence of genetic recombination.Our neighbor-net network tree of all isolates suggested historic gene-flow between several STs including for example ST7, ST55, and ST162 (Fig. 2).Genomic regions with low Wright's fixation index (F ST ) values, consistent with genetic exchange, were also identified from pairwise comparisons (n ¼ 435) across 5-and 10kb nonoverlapping windows of all STs (Supplementary Fig. 2).F ST values calculated from 5-kb windows were slightly lower than those calculated from 10-kb windows (averaging À0.046 for each ST pairwise comparison), indicating that window length impacts this measure of genetic variation.Twelve pairwise comparisons from 10-kb windows had F ST < 0.9 across the genome (Supplementary Fig. 3), with the lowest for ST18 and ST26 (F ST ¼ 0.64).In addition, ST7, ST55, and ST162 had lower F ST values across the genome (F ST ¼ 0.65-0.83)than other pairwise comparisons demonstrating incompatible phylogenetic signals between these STs (Supplementary Fig. 2).
PCA of whole-genome SNPs revealed little evidence of clustering of several C. glabrata STs, which is consistent with gene flow between them (Fig. 4a).For unsupervised model-based clustering with ADMIXTURE, we first identified that K ¼ 20 had the lowest cross-validation error (Fig. 4b), and was therefore used for subsequent analysis.Two isolates were consistently (6 independent Admixture runs) found to have evidence for mixed ancestry: ST177 CG1 and our newly discovered ST204 CG57 (Fig. 4c, Supplementary Fig. 4).Other isolates were found to have Mating types and mating-type switching are poorly understood in C. glabrata, although it is thought that Mating-type regulatory protein a2 is expressed in all MTLa strains and not in MTLa strains (Srikantha et al. 2003;Butler et al. 2004).MATa2 (CAGL0B01265g) was present in all Scottish isolates (breadth of coverage; BOC > 87%).However, MATa1 appeared to be absent or partially absent in 3/9 ST6 isolates (CG12 ¼ 16% BOC, CG121 ¼ 18% BOC, CG42 ¼ 12.5% BOC), while present in the remaining 6 ST6 isolates, and all the other STs (BOC 100%).The functional relevance of MATa1 deletion or truncation is unclear but may be a hallmark of the rarer of the 2 mating types.

Hypervariable mitochondrial genomes among sequence types
F ST analysis highlighted the mitochondrial genome of C. glabrata as hyper-variable (Supplementary Table 2, Supplementary Fig. 6).Forty-three genes were identified in !10 pairwise F ST comparisons, including all 11 mitochondrial protein-encoding genes.To explain this enrichment of low F ST mitochondrial genes, we studied the 151 genome alignments.While the nuclear genome had 97.3-99.4% BOC, the mitochondrial genome had 20.4-99.9%BOC, with 42% of isolates (n ¼ 63/151) containing >10% ambiguous mitochondrial bases (2 kb) (Supplementary Table 2) (here, we define ambiguous as sites with too few reads aligning to be called by GATK, or reads that cannot be called by GATK due to not passing variant filtration).Surprisingly, a pattern of low and/or patchy read coverage was identified in every isolate including the ST15 reference isolate CBS138 (Supplementary Fig. 6), indicating that the reference mitochondrial sequence assembly (Koszul et al. 2003) may have a high error rate, and given additional differences identified in nonreference isolates, that C. glabrata mitochondrial genomes are highly heterogenous.
The mitochondrial genome for some C. glabrata isolates appears reduced in size and encodes fewer protein-encoding Fig. 2. A NeighborNet network using SplitsTree, with ST labels replacing isolate names at the nodes.Green ¼ found in Scotland, purple ¼ not found in genes (Fig. 5).As many as 22/37 (59%) mitochondrial encoded genes were entirely absent in at least 1 isolate, including Cg1, Cg1II, and Cg1III (putative endonucleases of exons and introns in the mitochondrial COX1 gene), ATP8 and ATP9 (subunits 8 and 9 of the enzyme complex required for ATP synthesis), RPM1/RPR1 (RNA component of mitochondrial RNAse P), VAR1 (putative mitochondrial ribosomal protein of the small subunit,) and most of the tRNA genes (15/23).Nine separate STs had absent mitochondrial genes.Normalized depth of coverage was variable across the genes, with < 1 average normalized depth across all isolates for Cg1, Cg1II, and Cg1III, ATP8, RPM1, VAR1, and tRNA-Met1.While nonuniform coverage in terms of depth and breadth was found across all mitochondrial genomes belonging to all datasets, our newly sequenced isolates have the lowest mean breadth across mitochondrial genes (x ¼ 92.18, r ¼ 20.We used assembly de novo to further explore the mitochondrial sequence for isolate WM03.450 (ST83), which had the greatest number of ambiguous bases across its mitochondrial genome (16 kb/80%).Our WM03.450Illumina-based assembly (12.9 Mb; N.contigs ¼ $3 thousand; N 50 ¼ 85 kb) is 562 kb longer than the CBS138 reference sequence, indicating substantial genomic differences between these isolates and STs.Aligning our assembly to the reference CBS138 mitochondrial genome using Blastn identified 10 contig matches with a combined alignment length of only 1.9 kb (mean 157 nt per contig), suggesting the low alignment is not due to conserved nucleotide sequences that have undergone large rearrangements.Aligning the assembly to the 11 mitochondrial protein sequences using Blastx identified only 6/11 genes across 6 separate contigs, 4 of which were <364 nt length, and 2 that are 10.4 and 81.5 kb.Conversely, assembly de novo and Blastx of our Illumina reads for the reference isolate CBS138 against the published CBS138 genome identified all 11 mitochondrial genes present on 4 contigs, with 18.9 kb total sequence length, of which Blastn aligned 9.3 kb to the published mitochondrial assembly.Together, these analyses suggest that whole gene deletions in the C. glabrata mitochondria are common.

Signatures of selection identified among sequence types
In contrast to the C. glabrata mitochondrial genome, we found that gene deletions in the nuclear genome are rare.Indeed, fewer than 6 presence/absence (P/A) polymorphisms (strictly defined as zero reads aligning) were identified per isolate ($0.1% of 5,210 protein-encoding genes) (Supplementary Table 4).Of these, 2 consecutive nuclear-encoded genes (CAGL0A02255g and CAGL0A02277g) on chromosome A were entirely absent of read coverage in 25 out of the 68 Scottish isolates (37%), which included all representatives of 11 separate STs (ST4, 7, 8, 24, 25, 55, 67, 83, 177, and our newly described 204).These STs do not cluster phylogenetically, ruling out a single evolutionary event causing this deletion.The 2 genes have identical nucleotide sequences and encode the same amino acid sequence, which is conserved across a range of other Ascomycota species, as well as having sequence similarity to the K62 Killer Preprotoxin protein encoded by the Saccharomyces paradoxus L-A virus M62 satellite (BLASTp E-value ¼ 1e-36), suggesting a possible viral origin.CAGL0F09273g is a separate, putative adhesin-like protein (adhesin cluster V) with a "hyphally regulated cell wall protein N-terminal" PFAM that is lost in 11 isolates including all ST4 (CG68A and CG77), 4 ST7 (CG157, CG48A, CG48F, and CG78), 3 ST8 (CG127, CG52, and CG82), ST19 CG119, ST24 CG166, and ST147 CG133.Again, this gene must have been lost multiple times, given its presence in several ST7 and ST8 isolates.This gene is the last gene on chromosome F, has an p ¼ 0.00244, which is lower than the overall average across the genome, and has previously been reported to undergo CCNV within serial clinical isolates (Carret e et al. 2019), suggesting it is able to undergo variation within microevolutionary timescales, which may impact the adhesive ability of these C. glabrata isolates.
Between 61 and 85 genes with a signature of positive selection [dN/dS ¼ x, and x > 1 (Kryazhimskiy and Plotkin 2008)] were found in each ST apart from the reference ST (ST15 CG151), for which only a single gene with x > 1 (x ¼ 1.0019) was identified (Supplementary Table 5).Apart from the reference ST, STs had between 4 and 14 genes with x > 2, showing stronger signatures of diversifying or positive selection.Of the 2,083 total genes with x > 1 across all clades, 608 were unique genes (11.6% of all genes) i.e. they had this signature in multiple clades, owing to either ancestry or selection acting on the same gene families.To explore selection, we took an unbiased approach using PFAM and GOterm enrichment comparing the numbers of each term in those 608 genes compared with the remaining nonselected genes, as well as a targeted approach for genes of interest (see Materials and methods: Selection and microevolutionary analysis) including adhesins, proteases, efflux pumps, FKS, and ERG pathway genes.
Genes with signatures of positive selection within the C. glabrata population targets diverse genes and gene functions.Our unbiased approach for enrichment of functional domains in 608 gene products with signatures of positive selection identified only 3 significantly enriched [2-tailed Fisher exact test with FDR-corrected P-values (q) of < 0.05] PFAM domains and 16 GO terms (Table 1).The enriched PFAM domains were (1) Flocculin repeat (PF00624.20;q ¼ 7.42E-17), (2) GLEYA domain (PF10528.11;q ¼ 0.01), and (3) Armadillo repeat (PF00514.25;q ¼ 0.02).Flocculin is a sub-telomeric gene family involved in flocculation or cell aggregation in S. cerevisiae (Teunissen and Steensma 1995), while GLEYA domains are present in C. glabrata EPA proteins.Thirty Flocculin PFAM domains were assigned to only 6 genes in C. glabrata, 2 of which have x > 1: CAGL0I07293g and CAGL0I00220g, and together account for 23/30 Flocculin repeat PFAMs.Enriched GO-terms covered a range of possible biological functions Our targeted approach highlighted 21/129 genes of interest that have x > 1, with at least 1 found in every ST apart from the reference ST15 and ST46 (Supplementary Table 6).Notably, none of the aspartic proteases, phospholipases, cell wall biogenesis, efflux pumps, ergosterol biosynthesis pathway genes or FKS genes were found to have hallmarks of positive selection, implying these are conserved within the population.Several genes with x > 1 were found in multiple STs, including adhesive protein CAGL0J01727g (adhesin cluster VI) that is under positive selection in 7 STs (18, 26, 36, 45, 147, 177, and 204) and adhesive protein CAGL0I07293g (adhesin cluster V) under positive selection in 7 mostly distinct STs (3,8,25,83,123,136,and 177).C. glabrata encodes 17 putative adhesive proteins without N-terminal signal peptides, casting doubt on their role in adhesin.One of these is a

Candida glabrata nosocomial in-patient microevolution targets pathogenicity factors and drug targets
Our Scottish C. glabrata panel included 7 sets of between 2 and 9 isolates from recurrent cases of candidiasis.To explore the microevolution of C. glabrata within a human host, and the effects of antifungal treatment (fluconazole, nystatin, and posaconazole) on fungal genetics, we documented all genetic changes between serial isolates (Table 2, Supplementary Table 7).Although exact dates of isolation were not documented, phylogenetic analysis (Supplementary Fig. 1, Fig. 6) confirmed these serial isolates were highly related, with between 64 and 140 mutations (1.13468 Â 10 À5 per base pair) identified between pairs of serial isolates (Fig. 6).While the mutation rate or generation time for C. glabrata is not known (Carret e et al. 2019), this small number of mutations likely suggests recent clonal origins appropriate for We documented 1,995 mutations between all serial isolates, which were either in protein-coding regions (Coding) or Intergenic and intron regions (noncoding).
Mutations identified between serial isolates were mostly in protein-coding sequence (CDS) regions (between 53 and 127 mutations per pairs of serial isolates, collectively adding up to 1,741/1,995 total mutations ¼ 87%), despite protein-coding regions taking up only 7.9/12.3Mb (64%) (Fig. 6, b and c).The remaining serial mutations were within intergenic regions (236 mutations; 12%) and intronic regions (18 mutations; 1%).Intronic regions had the highest count of serial mutations after accounting for the total sequence in introns (Fig. 6c), albeit with 3 found per pair of serial isolates.Hypergeometric tests revealed that the number of mutations in coding sequence compared with noncoding sequence was higher than expected by chance, suggesting a highly significant enrichment of mutations in protein-coding genes (P ¼ 3e-120).
Several genes of interest (see Materials and methods: Selection and microevolutionary analysis) had microevolutionary changes (n ¼ 29/129) (Supplementary Table 8).Notably, 1 of the 2 newly acquired nonsense mutations was identified in FKS2 (Case 6 J-K), coinciding with a substantial drop in fluconazole MIC (Table 3).The other was in the uncharacterized CAGL0K04631g at an earlier time point in the same patient (Case 6 D-E).
The longer 42 nt deletion from Case 6 (A-B) reverts back to reference in Case 6 (B-C), suggesting either (1) a nondescendent isolate (intra-host variation), (2) a false-negative reference in 6C or (3) a false-positive deletion in 6A.The same 42 nt deletion, along with a new insertion at the site of the previous synonymous mutation appears in Case 6 (C-D), thereby suggesting the variant is real and option c less likely.That 42 nt deletion reverts back to reference in Case 6 (D-E), and appears again in Case 6 (E-H).By Case 6 (H-I), the gene has a new synonymous mutation, and in Case 6 (I-J) it has accumulated a new 15 nt deletion.EPA3 is therefore a hot-spot of variation.Another EPA gene that accumulated a large number of mutations was AWP12, which accumulated 5 nonsynonymous mutations and 1 synonymous mutation (Case 6H-I).
Other genes that had accumulated mutations between serial isolates included those encoding an aspartic protease YPS5, several structural wall proteins belonging to the Srp1/Tip family, regulatory protein PDR1, the ergosterol synthesis gene ERG4 (a nonsynonymous mutation in Case 3A-B), and both FKS1 and FKS2.Therefore, C. glabrata genes that are antifungal targets and gene families involved in drug-resistance and pathogenicity can therefore undergo rapid mutation within a human host.

Discussion
In this study, we sequenced and analyzed the largest panel of C. glabrata genomes to date.These isolates were collected from blood-stream infections of patients at several Scottish hospitals in 2012.Our 68 genomes were analyzed alongside 83 further publicly available and globally isolated genomes (Biswas et al. 2018;Carret e et al. 2019;Xu et al. 2020), revealing greater genetic diversity than previously recognized, including a nucleotide diversity of 0.00665, which is much higher than has been calculated for the distantly related C. albicans at 0.00298 (Irinyi et al. 2015).Surprisingly, we found that only one of our Scottish isolates had evidence of aneuploidy, despite many having been treated with  2019).Chromosome C in CG46 had lower depth of coverage compared with the rest of the genome, perhaps due to chromosome loss in a subset of cells.The patient that CG46 was isolated from was initially treated with Fluconazole.Following resistance to Fluconazole being detected, the patient was subsequently treated with Caspofungin, suggesting a potential link between those antifungal treatments and the observed aneuploidy.We found that the mitochondrial genome of C. glabrata was hyper-diverse compared with its nuclear genome for many isolates, including several long deletions spanning one or more genes, with the potential to impact many important biological functions including drug resistance and persistence (Garcia-Rubio et al. 2021).High levels of variation in mitochondrial genomes within the major fungal phyla have previously been noted in terms of gene order, genome size, composition of intergenic regions, presence of repeats, introns, associated ORFs, and evidence for mitochondrial recombination in all fungal phyla (Aguileta et al. 2014).This variation is lacking in Metazoa (Aguileta et al. 2014).Our results suggest some of these types of mitochondrial genetic diversity are likely present within the C. glabrata population.
Isolates in this study belonged to 29 separate STs of C. glabrata, each of which was separated by large number of variants.However, as many as 193 MLST STs have been documented (Jolley et al. 2018).Therefore, it is likely that the true genetic diversity of C. glabrata is much higher than we have been able to calculate with whole-genome sequences (albeit the largest panel studied to date).Indeed, several further STs may yield further evidence of recombination or lack of, and may ultimately require a new effort to group STs into lineages (also dependent on the frequency of recombination that erode these divisions).The genetic diversity of C. glabrata in hospitals around Scotland is extremely high, with representatives from 20 STs.Such high genetic diversity (and many of the same STs) have also been found from genome sequencing and phylogenetic analysis of isolates collected in other countries such as Australia (Biswas et al. 2018), suggesting they must have been transported across or between continents, perhaps by anthropogenic or even natural means [for example its association with birds (Cafarchia et al. 2008;Al-Yasiri et al. 2016) and food (de Melo Pereira et al. 2014)].Greater sampling and genotyping of clinical and environmental isolates will be required for understanding ancestry or endemicity.
Candida glabrata has long been regarded as a haploid asexual yeast, although evidence has recently emerged of a cryptic sexual cycle (Wong et al. 2003;Dodgson et al. 2005; Gabaldo ´n and Fairhead 2019).Our genome sequencing and population genetics supports this work, revealing compelling evidence that at least 12 STs stem from recombination between other STs.However, further work remains to document and describe individual recombinant isolates.Providing genetic recombination between isolates is naturally occurring, the mechanisms of genetic exchange are also unknown, although likely relate to the conserved matingtype locus, which play a central role in the sexual cycle of diverse fungi (Fraser and Heitman 2003).Here, we show that the MATa1 gene was absent or partially absent in 3 isolates belonging to ST6, which could potentially impact or even be a hallmark of a rarer second mating type of C. glabrata.Together, genetic recombination among C. glabrata isolates appears much more common than previously recognized, and likely contributes to increased genetic diversity.
The nuclear genome for isolates belonging to every ST (apart from the reference ST15 that was included as a control) included evidence of positive or diversifying selection.Signatures of positive selection were found enriched in genes with diverse functions, including several with repeat domains, as well as EPA genes with GLEYA domains.EPA genes are a large sub-telomeric family of virulence-related surface glycoprotein-encoding genes encoded by several other pathogens including Plasmodium, Trypanosoma, and Pneumocystis (De Las Peñas et al. 2003).Such gene differences between STs of C. glabrata may result in clinically relevant phenotypic differences.
In host microevolutionary changes between serial isolates were enriched within coding-sequences, which is a surprise, given the expectation for intergenic regions to be more permissive to mutations due to relaxed selection within intergenic regions and purifying selection within coding sequence.The reason for this abundance of serial mutations in coding sequence is unclear, although it could potentially be technical (e.g.false-negative variants within repetitive sequences) or biological (e.g.drug exposure and host immune pressure).Alternatively, enriched mutations in genes could potentially be driven by processes such as DNA polymerase induced mutations, or differences in chromatin states (e.g.heterochromatin could lead to increased exposure to DNA damaging agents resulting in higher mutation rates, or conversely, greater surveillance and correction of mutations in euchromatin regions by cellular DNA repair enzymes, Makova and Hardison 2015).
Selection may explain why we identified similar numbers of nonsynonymous mutations to synonymous mutations, given random mutations are expected to be nonsynonymous in $2/3 nucleotides of each codon.Furthermore, accumulations of deleterious mutations could be occurring in the serial isolates due to small population sizes, although population size estimates could not be calculated accurately from the metadata.
Genes with GLEYA domains including EPA genes were significantly enriched for frameshift and nonsynonymous mutations in the coding sequence between serial isolates.Combined with our finding of positive selection in EPA genes across STs, suggests that EPA genes are undergoing variation at both longer time frames and microevolutionary time-scales.
Genes encoding several important drug targets also underwent mutations between serial isolates, including a nonsynonymous mutation in the ergosterol biosynthesis pathway gene ERG4, and a nonsense mutation in the 1,3-b-glucan synthase component FKS2 [mutations in these genes can confer resistance to azoles (Feng et al. 2017) and echinocandins (Katiyar et al. 2012, p. 1), respectively].Indeed, the nonsense mutation in FKS2 coincided with a marked drop in fluconazole MIC for isolate CG93K, suggesting a possible link.
Our study highlights the need for further sampling and genomic analysis of C. glabrata in order to better inform the population structure and mechanisms underlying its increasing emergence, pathogenicity and multi-drug resistance.While we have largely focused on differences among the conserved regions of the C. glabrata ST15 CBS138 genome using an alignment-based strategy, our discoveries of a hyper-diverse mitochondrial sequence highlight the value for future long-read sequencing and assemblies to characterize the pan-genomes of C. glabrata and structural genomic diversity that exists among and perhaps within STs, and to explore the mechanisms driving those changes.Furthermore, given the genetic diversity between STs that we document, it would likely be valuable to sequence and assemble additional high-quality reference sequences for the purposes of increasing variant-calling accuracy and quantifying gene content between different STs.Given the low and patchy alignment depth across the ST15 CBS138 mitochondrial sequence for that same isolate, a review and update for the published CBS138 mitochondrial genome is likely required as well.Indeed, high ($0.5-1%)frequencies of structural variation in the nuclear genomes of C. glabrata isolates was recently found using de novo assemblies from long single-molecule real-time reads (Xu et al. 2021).
The rapidity that C. glabrata can mutate important genes and gene families, both via microevolution and putative recombination highlights an obstacle for future drug-development, given that individual gene targets are able to mutate within short time spans, and substantial diversity already present between STs.In addition, the epidemiology of C. glabrata is poorly understood.Future sampling and genomic comparison studies are necessary to identify the routes and mechanisms of its spread and evolution.
, as well as sequences from the outgroup C. bracarensis (Correia et al. 2006), which has also been identified from clinical settings and is the closest known relative of C. glabrata (Gabaldo ´n et al. 2013).Phylogenetic analysis of our Scottish collection along with the worldwide C. glabrata isolates revealed high genetic diversity among the 29 separate STs represented by our combined panel (Fig. 2, Supplementary Fig. 1).Allelic diversity among C. glabrata isolates [mean nucleotide diversity (p) ¼ 0.00665, r ¼ 0.047 was higher than previously reported [p ¼ 0.00485 based on the N. Helmstetter et al. | 3 Internal Transcribed Spacer (ITS) of 29 strains (Irinyi et al. 2015)].

Fig. 1 .
Fig. 1.Candida glabrata isolates were collected across 8 health boards across Scotland in 2012, belonging to 20 separate sequence types, including the newly described ST204.Duplicate isolates stemming from the same patient at different time points have been excluded.

Fig. 3 .
Fig. 3. a) A RAxML phylogenetic tree with 1000 bootstrap support of all Saccharomycetaceae species that had a genome assembly in NCBI or JGI Mycocosm and nucleotide diversity (p).Note: C. glabrata is calculated from whole-genome sequence data presented in this study, while the other species are based on ITS sequences only (Irinyi et al. 2015).b) p (based on ITS sequences only) for all Saccharomycotina and non-Saccharomycotina that are listed in the ISHAM ITS reference DNA barcoding database (Irinyi et al. 2015).c) nonoverlapping 5, 10, and 20 kb windows of xp (p for all sites in the genome divided by window length).

Fig. 4 .
Fig. 4. Population genetics of C. glabrata ST. a) PCA of whole-genome SNPs using SmartPCA revealed little evidence of sub-clustering among STs (isolates are calculated and plotted individually, but labeled by their ST alone for clarity).SmartPCA failed to calculate the eigenvalues for some isolates including those belonging to ST4. b) The CV error from running unsupervised ADMIXTURE for variant-sites across the C. glabrata population, testing K-values between 1 and 35.K ¼ 20 provided the lowest CV error.c) ADMIXTURE plot for all isolates using K ¼ 20, revealing several isolates with evidence of mixed ancestry.Isolates are ordered according to the neighbor-joining tree constructed with PAUP in Supplementary Fig. 1.

Fig. 5 .
Fig. 5. Breadth of coverage and depth of coverage across each of the 37 mitochondrial encoded genes for all 151 isolates compared in this study (Each point represents an isolate).a) Breadth of coverage as a % across each gene.b) The normalized depth of coverage for each gene (total read depth for each gene/total read depth across both nuclear and mitochondrial genomes).c) Breadth of coverage as a % across each gene, categorized by sequence types (ST)s.d) Normalized depth of coverage for each gene, categorized by ST.

Fig. 6 .
Fig. 6.Microevolutionary changes across 7 sets of C. glabrata isolates.a) a RAxML phylogenetic tree of the serial isolates using the general-timereversible model and CAT rate approximation with 100 bootstrap support.Branch lengths indicate the mean number of changes per site.b) The number of serial mutations total (All), those within protein-coding sequence (CDS), intergenic and intronic regions.c) Those same serial mutations per kb [calculated as the count of serial mutations divided by the total length of the feature (where All ¼ whole genome) and multiplied by 1,000].d) Serial mutations within CDS categorized by their effect on the sequencing: Insertion/Deletion (Indel), synonymous mutation (Syn.), nonsynoynmous mutation (Non.Syn.), and nonsense mutation.e) Those same serial mutations within CDS per kb.

Table 2
).This data was supplemented with paired-end illumina reads from Carret e et al. (2018), and isolate CBS138 from Xu et al. (2020).Minimum inhibitory concentration (MIC) tests for fluconazole were performed at the Mycology Reference Laboratory, Public Health England, Bristol, according to standard Clinical and Laboratory Standards Institute (CLSI) broth microdilution M27 guidelines (Alexander and Clinical and Laboratory Standards Institute 2017).

Table 2
Summary of microevolution across 7 sets of between 2 and 9 C. glabrata isolates from recurrent cases of candidiasis.

Table 3
MIC values of fluconazole for each of the serial isolates.