Generating genomic platforms to study Candida albicans pathogenesis

Abstract The advent of the genomic era has made elucidating gene function on a large scale a pressing challenge. ORFeome collections, whereby almost all ORFs of a given species are cloned and can be subsequently leveraged in multiple functional genomic approaches, represent valuable resources toward this endeavor. Here we provide novel, genome-scale tools for the study of Candida albicans, a commensal yeast that is also responsible for frequent superficial and disseminated infections in humans. We have generated an ORFeome collection composed of 5099 ORFs cloned in a Gateway™ donor vector, representing 83% of the currently annotated coding sequences of C. albicans. Sequencing data of the cloned ORFs are available in the CandidaOrfDB database at http://candidaorfeome.eu. We also engineered 49 expression vectors with a choice of promoters, tags and selection markers and demonstrated their applicability to the study of target ORFs transferred from the C. albicans ORFeome. In addition, the use of the ORFeome in the detection of protein–protein interaction was demonstrated. Mating-compatible strains as well as Gateway™-compatible two-hybrid vectors were engineered, validated and used in a proof of concept experiment. These unique and valuable resources should greatly facilitate future functional studies in C. albicans and the elucidation of mechanisms that underlie its pathogenicity.


INTRODUCTION
Over the last decade, there has been an exponential growth in the quantity of available genome sequence data due to the very rapid progress in sequencing technology. In 2004, the genome sequence of the human fungal pathogen Candida albicans was released as Assembly 19 (1). With the challenge of working with a heterozygous diploid organism, new computational methods had to be developed and resulted in the release in 2013 of Assembly 22, an assembly of a completely phased diploid genome sequence for the standard C. albicans reference strain SC5314 (2). This opened new perspectives to understand the genetic basis and functional mechanisms that underlie pathogenesis and evolution in this organism.
Although C. albicans gene sequences have been available to the community for more than a decade (3,4), only 1670 out of the 6198 predicted protein-coding genes had been characterized as of 31 January 2018 according to the Candida Genome Database (5). The growing availability of whole-genome datasets has encouraged a shift towards functional genomics and systems biology approaches, enabling analysis of high-throughput whole-genome assays to better understand biological networks. In this context, major efforts have been made to generate large-scale deletion (6-10) or overexpression (11-13) mutant collections. Genome-wide ORF libraries or ORFeomes represent useful resources for the implementation of approaches used to elucidate gene function (14). Besides the development of collections of overexpression mutants (12), ORFeomes facilitate approaches aimed at evaluating protein subcellular localization and identifying protein-protein interactions (yeast two-hybrid) both at steady state and in response to environmental stimuli (15).
Previously, we reported the development of a first-generation C. albicans ORFeome, in which 644 full-length ORFs, encoding signaling proteins (transcription factors and kinases), as well as cell wall proteins and proteins involved in DNA processes (repair, replication and recombination), were cloned in the Gateway™ vector pDONR207 (32). Here, we describe the generation and validation of the latest C. albicans ORFeome collection, which consists of 5099 ORFs in pDONR207, allowing the transfer of the cloned genes into a variety of C. albicans Gateway™-compatible expression vectors. To this end, we also provide a panel of 15 expression vectors that differ in the combination of promoter/selection marker they are carrying, and give different options regarding the tagging of the cloned gene product. These 15 expression vectors have been validated using the filament-specific transcription factor gene UME6. In addition, we provide a secondary collection of 34 expression vectors with additional options regarding the choice of the promoter and the tagging of the cloned gene product. Finally, among the numerous applications of ORFeome collections, we present a powerful association of the C. albicans ORFeome and a C. albicans-adapted two-hybrid system (33) that will allow systematic protein-protein interaction screening in C. albicans.
Escherichia coli strains were routinely cultured at 30°C or 37°C in LB or 2YT supplemented with 10 μg/ml gentamicin, 50 μg/ml kanamycin or 50 μg/ml ticarcillin. Solid media were obtained by adding 2% agar.
DH5α E. coli cells competent for transformation with DNA were prepared according to Hanahan et al. (34).
The ORFeome collection is being distributed by the Centre International de Ressources Microbiennes (CIRM). The 49 expression vectors are available from Addgene, while two-hybrid strains and vectors are now accessible from the Belgian Co-ordinated Collections of Micro-organisms (BCCM).

Generation of an entry clone collection
Primer design. The DNA sequence of C. albicans was obtained from the Candida Genome Database Release 21 (www.candidagenome.org), before Assembly 22 was released. Gene-specific forward primers were designed by adding the sequence 5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTTG-3′ to the 5′ end of the first 30 nt of each ORF. Gene-specific reverse primers were designed by adding the sequence 5′-GGGGACCACTTTGTACAAGAAAGCTGGGTC-3′ to the 5′ end of the last 30 nt of each ORF excluding the stop codon.
Polymerase chain reaction
Validation of entry clones by DNA sequencing and bioinformatics analysis. Sanger sequencing of the 5′-end of each BP clone was performed with the 207VER-F oligonucleotide (see Supplementary Table S2 for oligonucleotide sequences except those used to amplify ORFs) to confirm that the expected ORF had been cloned. The 3′-ends were also sequenced with the 207VER-R oligonucleotide to ensure that the oligonucleotides did not carry deletions. All inserts validated by Sanger sequencing were systematically subjected to full-length sequence analysis using Illumina technology. Plasmid DNAs were fragmented using a Bioruptor sonicator. Sequencing libraries were prepared according to the KAPA LTP Library Preparation Kit protocol (KAPA Biosystems, catalogue number KK8232). The libraries were not made in duplicates and therefore pools 1-5 were sequenced only once. If nonsense or frameshift mutations were observed in a cloned ORF, another colony was checked or the ORF reamplified and cloned again. If these attempts were unsuccessful, cloning of the ORF was abandoned. Clones containing contiguous deletions or insertions of multiples of 3 bp were accepted. Missense mutations and those located within introns were accepted. It should be noted that further analysis of each mutation-containing ORF is required to determine whether or not mutations affect the function of the ORF.
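The primer design rule described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: the function names are hypothetical, and the reverse primer is assumed to anneal as the reverse complement of the last 30 coding nucleotides, which the text implies but does not state explicitly.

```python
# Illustrative sketch of the Gateway primer design rule (hypothetical helper names).
ATTB1_TAIL = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTG"   # forward tail quoted in the text
ATTB2_TAIL = "GGGGACCACTTTGTACAAGAAAGCTGGGTC"    # reverse tail quoted in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_gateway_primers(orf, n=30):
    """Return (forward, reverse) primers for an ORF given 5'->3' with its stop codon."""
    orf = orf.upper()
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF is expected to end with a stop codon")
    body = orf[:-3]                      # the stop codon is excluded from the reverse primer
    if len(body) < n:
        raise ValueError("ORF is shorter than the gene-specific primer length")
    forward = ATTB1_TAIL + body[:n]
    # Assumption: the gene-specific part of the reverse primer is the reverse
    # complement of the last n coding nucleotides.
    reverse = ATTB2_TAIL + reverse_complement(body[-n:])
    return forward, reverse


if __name__ == "__main__":
    toy_orf = "ATG" + "GCT" * 20 + "TAA"   # toy sequence, not a real C. albicans ORF
    fwd, rev = design_gateway_primers(toy_orf)
    print(fwd)
    print(rev)
```

Omitting the stop codon from the reverse primer is what allows the same entry clone to be used for C-terminal fusions in the destination vectors.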
The fastq files have been deposited into NCBI-SRA and are available under the BioProject PRJNA472959. In Supplementary Table S1, the sequencing pool (pools 1-5) is given for each cloned ORF. Because of a server crash, the dataset for pool 2 was lost, but the reads could be recovered from CLC format and the headers reconstructed from the sequencing information provided by the Institut Pasteur sequencing platform.

CandidaOrfDB database
CandidaOrfDB was created to integrate the data of the C. albicans ORFeome project and is available at http://candidaorfeome.eu. The database is a relational database using the SQL database server Oracle with a web interface developed using J2EE. The alignment algorithm uses the biojava3-alignment Java library, implemented with a Needleman-Wunsch aligner, configured with a gap open penalty of 5 and a gap extension penalty of 2. The substitution matrix used is named 'nuc-4_4' and was created by Todd Lowe (https://github.com/sbliven/biojava/blob/master/biojava3-alignment/src/main/resources/nuc-4_4.txt). The alignment algorithm was specifically customized for CandidaOrfDB to work on segments of 500 codons, which provides an optimized alignment performance for C. albicans average sequence length.
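To make the alignment settings concrete, the following Python sketch computes a global (Needleman-Wunsch) alignment score with affine gap penalties. It is a stand-alone illustration, not the CandidaOrfDB code: a flat +5/-4 match/mismatch score is assumed as a simplified stand-in for the nuc-4_4 matrix (which also scores IUPAC ambiguity codes), a gap of length k is assumed to cost open + k × extend, and the 500-codon segmentation is not reproduced.

```python
# Global alignment score with affine gaps (Gotoh formulation of Needleman-Wunsch).
NEG_INF = float("-inf")


def nw_affine_score(a, b, match=5, mismatch=-4, gap_open=5, gap_extend=2):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + j * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap costs gap_open + gap_extend; extending costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(nw_affine_score("ACGTACGT", "ACGTCGT"))
```

With these settings a single-nucleotide deletion costs 7, more than a mismatch (a 9-point drop relative to a match), so isolated substitutions are reported as SNPs rather than as paired insertions and deletions.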

Destination plasmids
All destination plasmids constructed in this study were derived from CIp-PTET-GTW (12) and were confirmed by Sanger sequencing. Plasmid sequences have been submitted to GenBank. Accession numbers are given in Table 2. All oligonucleotides used for PCR and/or sequencing are listed in Supplementary Table S2.
SP cloning. First, a spacer sequence (SP) was amplified from the E. coli kanR gene borne on pCR-topo-blunt plasmid (Invitrogen) with primers SP3 and SP4. The PCR product was cloned in the SacII site of CIp-PTET-GTW, yielding CIp-PTET-GTW-SP, also referred to as pCA-Dest1100. This 390 bp-long sequence is flanked by SP1 and SP2, the Illumina paired-end sequencing primers 1 and 2, and can be used to insert molecular barcodes if expression plasmids are to be used in signature-tagged mutagenesis approaches.
Epitope tags. We then inserted in-frame tags either upstream of attR1 or downstream of attR2 to allow N-terminal or C-terminal protein tagging, respectively. For the cloning of N-terminal tags, we used pUC-attR1-CmR, containing a PciI-BglII fragment from CIp-GTW cloned into the PciI and BamHI sites of pUC18. CIp-GTW was obtained after deleting PTET from CIp-PTET-GTW with Acc65I and HpaI. For cloning of the 3xHA coding sequence, two oligonucleotides (oligo1 PciHABsrG and oligo2 PciHABsrG) containing three HA epitopes were hybridized, gel purified and cloned into pUC-attR1-CmR cut with PciI and BsrGI. The resulting plasmid was then used as a PCR template with primers GTW02 and GTW03. The resulting PCR product was cut with BspEI and HpaI, and cloned into the same sites of pCA-Dest1100 to yield pCA-Dest1110. For cloning of the GFP tag, the GFP gene was amplified from pFA-GFP-URA3 (36) with primers GTW04 and GTW05. The PCR product was cut with PciI and EcoRV and ligated into the same sites of pUC-attR1-CmR. The resulting plasmid was then used as a PCR template with primers GTW07 and GTW03. The resulting PCR product was cut with ScaI and BspEI and ligated into pCA-Dest1100 cut with HpaI and BspEI, yielding pCA-Dest1120. For cloning of the TAP-tag, the TAP-tag coding region was PCR amplified from pFA-TAP-URA3 (12) with primers GTW13 and GTW14. The resulting PCR product was then cut with PciI and BsrGI, and ligated into the same sites of pUC-attR1-CmR. The resulting plasmid was then digested with EcoRV and BspEI and the smallest restriction fragment was ligated into pCA-Dest1100 cut with HpaI and BspEI to yield pCA-Dest1130.
For the cloning of C-terminal tags, we generated pUC-attR2 by inserting a SalI-HindIII fragment from pCA-Dest1100 into the same sites of pUC18. For cloning of the 3xHA epitope, pCaMPY-3xHA (37) was used as a PCR template with primers GTW20 and GTW21, and the resulting PCR product cut with BsrGI and NsiI was ligated in the same sites of pUC-attR2. The resulting plasmid was cut with SalI and NsiI and the fragment cloned in the same sites of pCA-Dest1100, yielding pCA-Dest1101. For cloning the GFP tag, the GFP gene was amplified from pFA-GFP-URA3 (36) with primers GTW11 and GTW12. The resulting PCR product was cut with NsiI and EcoRV and ligated into the same sites of pUC-attR2. The resulting plasmid was then cut with SalI and NsiI and the GFP-bearing fragment ligated into the same sites of pCA-Dest1100, yielding pCA-Dest1102. For cloning of the TAP-tag, the TAP-tag coding region was amplified from CIp10-PPCK1-GTW-TAPtag (12) with primers GTW15 and GTW16. The PCR product was cut with BsrGI and NsiI and cloned in the same sites of pUC-attR2. The resulting plasmid was digested with SalI and NsiI, and the TAP-containing fragment ligated in the same sites of pCA-Dest1100, yielding pCA-Dest1103.
Selection markers. In all plasmids except the PTET series, the auxotrophic marker URA3 was replaced with the NAT1 marker conferring nourseothricin resistance. Briefly, an XbaI-DraIII fragment or a SpeI-NaeI fragment was cut from pUC-NAT1 and inserted into the similarly cut vectors. pUC-NAT1 was built by amplifying NAT1 under the control of the PTEF1 promoter with oligonucleotides UZ50 and UZ51, using pFA6-SAT1 (39) as a template, and cloning the AatII-cut fragment into the same site of pUC18.
The 49 expression vectors are available from Addgene (https://www.addgene.org).
(12), respectively, according to Walther and Wendland (41). Transformants were selected for prototrophy or nourseothricin resistance, respectively, and verified by PCR using primer CIpUL with (i) primer CIpUR for the URA3-bearing plasmids and (ii) primer CgSAT1-rev for the SAT1-bearing plasmids, which yield 1 kb and 1.6 kb products, respectively, if integration of the OE plasmid has occurred at the RPS1 locus.
Induction of the Tet-On system. Overexpression from PTET was achieved by the addition of anhydrotetracycline (ATc, 3 μg/ml; Fisher Bioblock Scientific) in YPD at 30°C (42).
Overexpression experiments were carried out in the dark, as ATc is light sensitive.
Microscope analysis for filamentation. Cells were observed with a Leica DM RXA microscope (Leica Microsystems) with a ×40 oil-immersion objective.

Analysis of TAP- and 3HA-tagged proteins by western blots.
For the PTET and PTDH3 strains, a 30 ml culture in YPD or YPD+ATc3 was inoculated at OD600 = 0.2 or 2 × 10^6 cells/ml with a freshly grown colony and incubated at 30°C with shaking until OD600 reached ∼1.
For the 3HA-tagged strains, 750 μl of cultures were collected by centrifugation, resuspended in 25 μl of water and diluted in 25 μl of 2× Laemmli sample buffer. After 10 min at 100°C, proteins were stored at −80°C or directly submitted to electrophoresis.
For the TAP-tagged strains, 20 ODs of exponentially growing cells were collected by centrifugation and resuspended in lysis buffer (9 M urea, 1.5% w/v dithiothreitol, 2-4% w/v CHAPS, and 1.5 M Tris pH 9.5). Homogenization of the cells was achieved using a FastPrep bead beater. After homogenization, the lysed cells were centrifuged at 13 000 rpm for 10 min at room temperature, and the supernatant containing the solubilized proteins was used directly or stored at −80°C.
Proteins were separated on an Invitrogen 8% Nu-Page gel and transferred onto nitrocellulose. TAP- and 3HA-tagged proteins were detected using peroxidase-coupled anti-peroxidase and anti-HA-peroxidase antibodies (Sigma and Roche, respectively) and an ECL kit (GE Healthcare).

Construction of opaque mating-compatible two-hybrid strains and Gateway™-compatible two-hybrid vectors
Mating-compatible strains were generated by deletion of the MTLa or MTLα locus using the SAT1 flipper cassette (39) in the two-hybrid strain background SC2H3 (33). To delete the MTLa locus, flanking regions were amplified with primers MTLa-5 F and MTLa-5 R, and with MTLa-3 F and MTLa-3 R. To delete the MTLα locus, homologous regions were amplified with primers MTLα-5 F and MTLα-5 R, and with MTLα-3 F and MTLα-3 R. Both fragments for each locus were cloned into pSAT1 (39), at SacI/NotI and XhoI/KpnI sites respectively. SacI/KpnI deletion constructs were transformed in SC2H3 (33) and transformants were selected on YPD medium containing 200 μg/ml of nourseothricin. Positive SC2H3a and SC2H3α strains were subsequently grown in maltose medium for induction of the recombinase and loss of the SAT1 flipper cassette.
For induction of opaque switching, a WOR1 fragment was amplified with primers WOR1F and WOR1R and cloned into the pNIM1 vector (42) at SalI and BglII sites. SC2H3a and SC2H3α strains were transformed with pNIM1-WOR1 with selection on nourseothricin-containing medium. The resulting strains, SC2H3a-pWOR1 and SC2H3α-pWOR1, were grown in the presence of doxycycline (50 μg/ml) to induce the opaque state (43). Opaque mating-compatible strains were finally transformed with prey or bait plasmid and selected on SC-ARG or SC-LEU respectively.
Bait and prey two-hybrid vectors, pC2HB and pC2HP, respectively (33) were converted into Gateway™ destination vectors using the Gateway™ Vector Conversion System to generate pC2HB-GC and pC2HP-GC, respectively. The genes encoding the bait protein Hst7 (Orf19.469) and the prey Cek1 (Orf19.2886) were transferred by LR reactions from the donor collection of vectors into pC2HB-GC and pC2HP-GC destination vectors, respectively.

Mating approach and protein-protein interaction detection
Opaque cells of SC2H3a and SC2H3α, expressing VP16-Cek1 (Arg+) and lexA-Hst7 (Leu+) respectively, were crossed as follows. Opaque cells of both types were mixed at 1 × 10^6 cells each in Spider medium (44) and incubated at 23°C for 24 h. Resulting tetraploid cells were selected on SC-LEU-ARG medium for 48 h at 30°C, and transferred to SC-HIS-MET medium for protein-protein interaction detection. Chromosome loss was induced on presporulation medium at 37°C for 10 days as described previously (45).

Establishment of a sequence-validated Candida albicans ORFeome
Our objective was to develop a C. albicans ORFeome encompassing as many of the 6205 ORFs predicted in Assembly 21 of the C. albicans genome as possible (46). To this aim, forward and reverse oligonucleotides with, respectively, attB1 and attB2 sequences at their 5′ ends were synthesized for each of the 6205 ORFs (Supplementary Table S1) and used in independent PCR reactions with genomic DNA of C. albicans strain SC5314 (47). The resulting PCR products were cloned in pDONR207 using Gateway™ recombinational cloning (31). The resulting plasmids were then subjected to Sanger sequencing at the 5′ and 3′ ends of the cloned ORFs and to Illumina sequencing throughout the cloned ORFs.
Among the 6205 ORF sequences (from Assembly 21) used for oligonucleotide design, 6 are no longer present in Assembly 22 while 20 are annotated as mitochondrial genes in Assembly 22. Among the 6179 remaining ORFs, 5099 (83%) were successfully cloned and full-length sequence-validated. Sequences were validated by comparing the entire sequence of each cloned ORF against the reference sequences for haplotype A and B of the C. albicans SC5314 genome available at the Candida Genome Database (CGD; version A22-s07-m01-r18; (48)). We defined four groups: (i) 4061 sequences that are 100% identical with a reference sequence (No SNP, no DIP; where SNP stands for Single Nucleotide Polymorphism and DIP stands for Deletion/Insertion Polymorphism); (ii) 108 sequences that do not carry any SNP but carry DIPs (No SNP, DIPs); (iii) 843 sequences that carry SNPs but no DIP (SNPs, no DIP); and (iv) 87 sequences that carry both SNPs and DIPs (SNPs and DIPs) (Figure 1A and Supplementary Table S1). DIPs, multiples of 3, vary from 3 to 33 nt in length. ORFs with DIPs non-multiples of three were also retained when present within introns.
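The four-group classification and the bookkeeping behind the 83% figure can be restated as a brief Python sketch. The function name and dictionary are illustrative only; the counts are those reported above.

```python
# Illustrative restatement of the four validation groups and the reported counts.
def classify_clone(n_snps, n_dips):
    """Assign a sequence-validated clone to one of the four groups defined in the text."""
    if n_snps == 0 and n_dips == 0:
        return "No SNP, no DIP"
    if n_snps == 0:
        return "No SNP, DIPs"
    if n_dips == 0:
        return "SNPs, no DIP"
    return "SNPs and DIPs"


REPORTED_COUNTS = {
    "No SNP, no DIP": 4061,
    "No SNP, DIPs": 108,
    "SNPs, no DIP": 843,
    "SNPs and DIPs": 87,
}

# 6205 designed ORFs minus 6 absent from Assembly 22 and 20 mitochondrial genes
candidate_orfs = 6205 - 6 - 20
validated_orfs = sum(REPORTED_COUNTS.values())
success_rate = 100 * validated_orfs / candidate_orfs

if __name__ == "__main__":
    print(candidate_orfs, validated_orfs, round(success_rate, 1))
```

The four groups sum to the 5099 validated ORFs, and 5099 of 6179 candidates corresponds to roughly 82.5%, reported as 83%.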
Overall, 4061/6179 variation-free C. albicans ORFs (65.7%) are now available to the community. Among the 930 SNP-containing ORFs, 460 contain 1 SNP, 131 contain 2 SNPs, 78 contain 3 SNPs, 74 contain 4 SNPs and 187 ORFs contain 5 or more SNPs (up to 72 SNPs). For 36% of the validated ORFs, there was no difference between the two alleles in the reference sequences (HapA/B; Figure 1B). When nucleotide differences were present in the two alleles of the reference ORF sequence, we did not notice any bias in the haplotype that was cloned: haplotype A ORFs were cloned in 33% of cases (HapA; Figure 1B) and haplotype B ORFs were cloned in 31% of cases (HapB; Figure 1B). On chromosomes R, 1, 2, 3, 4, 5 and 7, ORF cloning was uniformly successful, with a cloning success rate ranging from ∼82% to ∼86%. Chromosome 6 ORFs were slightly underrepresented (76.9%) (Figure 1C). Although ORFs up to 1.5 kb could be cloned with a success rate around 87% and ORFs between 1.5 and 4 kb could be cloned with a success rate around 77%, we observed a decreased success rate for the longer ORFs (Figure 1D). GC content did not seem to be the cause for either lack of PCR amplification or cloning failure, as we did not see a statistically significant difference in GC content when comparing successfully cloned and failed ORFs (data not shown).
Analysis of the 6 875 211 nucleotides of the 5099 sequence-validated ORFs revealed 3029 SNPs. The ratio of non-synonymous to synonymous changes due to SNPs was around 1. These nucleotide substitutions could be either real SNPs between our strain and the reference strain, or due to poor quality of some reference sequences, or mutations in the primers or PCR-induced mutations. Overall, the resulting maximum error rate amounted to 4.4 × 10^−4, i.e. 1 SNP every 2270 cloned nucleotides. Notably, the C. albicans SC5314 genome sequence available at CGD (version A22-s07-m01-r18) contained 272 ORFs with sequence ambiguities for at least one of the alleles. These ambiguities included stretches of 'Ns' or IUPAC code nucleotides (Y, R, S, M, K, W). The ORFeome-generated sequencing data resolved sequence ambiguities for 110 of these ORFs (Supplementary Tables S3 and S4). In this study, we also detected recombinant haplotypes for 116 of the cloned ORFs (Supplementary Table S5) that displayed characteristics of both reference haplotypes.
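The maximum error rate quoted above follows directly from the two reported totals, as the short Python check below shows. It treats every SNP as a potential cloning error, which is the upper bound described in the text; variable names are illustrative.

```python
# Arithmetic behind the reported maximum error rate (upper bound: every SNP is an error).
TOTAL_NUCLEOTIDES = 6_875_211   # nucleotides in the 5099 sequence-validated ORFs
TOTAL_SNPS = 3029

max_error_rate = TOTAL_SNPS / TOTAL_NUCLEOTIDES        # SNPs per cloned nucleotide
nucleotides_per_snp = TOTAL_NUCLEOTIDES / TOTAL_SNPS   # average spacing between SNPs

# Distribution of SNP counts among SNP-containing ORFs, as reported in the text
SNPS_PER_ORF = {"1": 460, "2": 131, "3": 78, "4": 74, "5 or more": 187}
snp_containing_orfs = sum(SNPS_PER_ORF.values())

if __name__ == "__main__":
    print(f"{max_error_rate:.2e}", round(nucleotides_per_snp), snp_containing_orfs)
```

The SNP-count distribution sums to 930 ORFs, matching the 843 ORFs with SNPs only plus the 87 ORFs with both SNPs and DIPs.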
Taken together, our study provides an extensive, fully sequence-validated C. albicans ORFeome.

CandidaOrfDB: a database for the C. albicans ORFeome
CandidaOrfDB was created to integrate the data of the C. albicans ORFeome project and is available at http://candidaorfeome.eu. CandidaOrfDB enables the scientific community to search for availability and quality of the clones. All data pertinent to the cloning process are stored in the database. This information includes the name and sequence of the primers used to amplify the ORFs from C. albicans SC5314 genomic DNA, the coordinates of the primers in their 96-well storage plates and any sequencing information available on the clones. SNPs and DIPs, their location in the ORF and their influence on the corresponding amino acid sequences are shown (Figure 2). The nucleotide sequence of the cloned ORF and the corresponding amino acid sequence are displayed with highlights of SNPs or DIPs relative to the closest haplotype sequence (Figure 2). The reference sequence data have been extracted from CGD (48). The database can be queried through the ORF number (orf19.xxxx or 19.xxxx; (5)), the Assembly 22 name (Ci_XXXXXW or Ci_XXXXXC; (5)) or the gene name.
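The three query types can be told apart by their format alone, as in the Python sketch below. This is not the CandidaOrfDB implementation: the regular expressions are assumptions inferred from the identifier formats quoted above (chromosome index 1-7 or R, five digits, W or C strand, optional allele suffix), and the function name is hypothetical.

```python
import re

# Assumed identifier patterns, inferred from the formats quoted in the text.
ORF19_RE = re.compile(r"^(?:orf)?19\.\d+(?:\.\d+)?$", re.IGNORECASE)
ASSEMBLY22_RE = re.compile(r"^C[1-7R]_\d{5}[WC](?:_[AB])?$", re.IGNORECASE)


def query_kind(term):
    """Guess whether a search term is an orf19 number, an Assembly 22 name or a gene name."""
    term = term.strip()
    if ORF19_RE.match(term):
        return "orf19"
    if ASSEMBLY22_RE.match(term):
        return "assembly22"
    return "gene_name"   # anything else is treated as a gene name


if __name__ == "__main__":
    for example in ("orf19.469", "19.2886", "C1_00010W", "UME6"):
        print(example, "->", query_kind(example))
```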

A collection of destination vectors for exploiting the C. albicans ORFeome
The C. albicans ORFeome described above was developed in pDONR207 as this allows subsequent transfer of the cloned ORFs to Gateway™-adapted destination vectors. While destination vectors for ORF expression in hosts such as E. coli or S. cerevisiae are available (49,50), only a few destination vectors for ORF expression in C. albicans have been reported (12). Therefore, we set out to establish a collection of Gateway™-adapted destination vectors for constitutive or conditional expression of untagged or tagged ORFs in C. albicans. Our primary collection of 15 C. albicans destination vectors derived from CIp10S (12) provides a choice of two promoters, either the constitutive promoter PTDH3 or the inducible promoter PTET, and the option for N- or C-terminal fusion to various epitope tags (3xHA or TAP) (Table 2 and Figure 3). Integration of the destination vectors and their derivatives at the RPS1 locus in the C. albicans genome is promoted by StuI or I-SceI linearization. These CIp10-derived vectors carry either the auxotrophic marker URA3 or the NAT1 marker that confers resistance to nourseothricin and can be used with clinical isolates. However, PTET-bearing plasmids cannot carry the NAT1 marker since the transactivator needed for tetracycline-mediated induction of PTET is borne on the NAT1-carrying pNIMX plasmid (12). All vectors harbor a so-called spacer sequence (SP) originating from the E. coli kanamycin resistance gene, for barcode insertion and subsequent Illumina-based barcode sequencing. A second set of 34 plasmids was also generated with the constitutive promoter PACT1 or the inducible promoter PPCK1, and the option for N- or C-terminal fusion to GFP (Table 2).
All 49 destination vectors were designated with a standardized nomenclature, pCA-DESTijkl, whereby i stands for the transformation marker (1, URA3; 2, NAT1); j stands for the promoter (1, PTET; 2, PPCK1; 3, PTDH3; 4, PACT1); k stands for N-terminal tagging (0, no tag; 1, 3xHA; 2, GFP; 3, TAP); and l stands for C-terminal tagging (0, no tag; 1, 3xHA; 2, GFP; 3, TAP) (Table 2 and Figure 3). All destination vectors are available from Addgene (https://www.addgene.org).
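The four-digit nomenclature lends itself to a simple lookup, sketched below in Python. The function and dictionary names are illustrative; the digit-to-feature mapping is taken from the text, and a name that decodes successfully is not necessarily one of the 49 vectors that were actually built.

```python
import re

# Digit-to-feature mapping taken from the nomenclature described in the text.
MARKERS = {"1": "URA3", "2": "NAT1"}
PROMOTERS = {"1": "TET", "2": "PCK1", "3": "TDH3", "4": "ACT1"}
TAGS = {"0": None, "1": "3xHA", "2": "GFP", "3": "TAP"}
NAME_RE = re.compile(r"^pCA-DEST(\d)(\d)(\d)(\d)$", re.IGNORECASE)


def decode_vector_name(name):
    """Decode a pCA-DESTijkl name into marker, promoter and tag options."""
    match = NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a pCA-DESTijkl name: {name!r}")
    i, j, k, l = match.groups()
    try:
        return {
            "marker": MARKERS[i],
            "promoter": PROMOTERS[j],
            "n_terminal_tag": TAGS[k],
            "c_terminal_tag": TAGS[l],
        }
    except KeyError as exc:
        raise ValueError(f"digit {exc.args[0]!r} is outside the nomenclature") from exc


if __name__ == "__main__":
    print(decode_vector_name("pCA-Dest1110"))
```

For example, pCA-Dest1110 decodes to a URA3-marked, TET promoter vector with an N-terminal 3xHA tag and no C-terminal tag, consistent with its construction described in the Methods.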
In order to validate that the primary set of 15 destination vectors could drive constitutive or conditional expression of untagged or tagged ORFs, the UME6 ORF was transferred into each of them using LR Clonase™. UME6 encodes a transcription factor whose overexpression has been shown to force filamentation in C. albicans (51,52). The validation criteria for the destination vectors were (i) success of the LR reaction, (ii) integration in the C. albicans genome at the RPS1 locus, (iii) filamentation phenotype and/or (iv) detection of the 3xHA- or TAP-tagged Ume6 protein by western blot. The LR reaction was successfully used to transfer UME6 from the pDONR207::UME6 plasmid to all 15 destination vectors. The resulting expression plasmids were successfully integrated at the C. albicans RPS1 locus. The five strains containing PTET were induced overnight at 30°C by adding 3 μg/ml anhydrotetracycline (ATc3) to the culture medium, while the 10 strains containing PTDH3 were grown overnight at 30°C in YPD. As previously described (12,51,52), overexpression of Ume6 led to hyperfilamentation. This phenotype was observed for all strains, regardless of the tag used (Figure 4A and B). Nevertheless, the C-terminal TAP-tag appeared to hinder Ume6 function. Indeed, protein tags can potentially alter protein localization and/or protein function. One strength of our system is that it provides a choice of tagging that could minimize the effect of the tag on a specific protein function. 3xHA- or TAP-tagged Ume6 was detected in the relevant strains and under the appropriate growth conditions (Figure 5A and B). We noticed extra bands migrating faster than the 3HA-tagged Ume6 protein, which are likely to result from protein degradation. The 3HA-tagged Ume6 protein was extracted by resuspending the cells in 2× Laemmli sample buffer and boiling at 100°C for 10 min. This rapid protocol might be responsible for the observed protein degradation.
Taken together, these data present a collection of 49 available destination vectors and validate 15 of those for application with the C. albicans ORFeome.

Proof of concept of a two-hybrid matrix approach of protein-protein interaction detection via mating in C. albicans
The availability of the ORFeome collection is a prerequisite to large, systematic two-hybrid (2H) screenings in C. albicans. The Candida 2H (C2H) system was developed for targeted one-to-one protein-protein interaction detection (33). Hence, there was a need to engineer the system for high-throughput screening based on rapid and convenient recombination cloning systems for the generation of prey/bait arrays of proteins, and to use a mating approach to circumvent the low efficiency of transformation of this fungus. The former was obtained by adapting the bait and prey recipient vectors for Gateway™-mediated transfer of the ORFeome library, yielding pC2HB-GC and pC2HP-GC, respectively. The latter involved the combined processes of white-opaque switching and the preparation of mating-compatible 2H strains. The pleiomorphic fungus C. albicans is characterized by a parasexual cycle, with the possible mating of diploids and generation of tetraploids, the absence of meiosis and the reversion to the diploid state by chromosome loss (reviewed in (53)). An essential prerequisite to mating in C. albicans is the phenotypic switch from white to opaque, a stable but reversible switch regulated by environmental stimuli such as low temperatures, carbon dioxide and nutrients. The epigenetic switch is tightly linked to mating since the a1/α2 complex, encoded at the mating type-like (MTL) loci, acts as a repressor of opaque switching (54). In that context, the two-hybrid strain SC2H3 originating from the diploid a/α SN152 strain (55) was deleted for the MTLa or MTLα locus to construct SC2H3α and SC2H3a strains. The efficient switch to opaque in the hemizygote strains was achieved by the doxycycline-controlled expression of WOR1, a major regulator of the white to opaque switch (43,56).
Nucleic Acids Research, 2018, Vol. 46, No. 14 6943
Figure 2. Snapshot of the CandidaOrfDB interface. An example of an ORF page is shown. In the 'ORF details' box, the different ID names, the length and the chromosome location of the ORF of interest are displayed. The haplotype assigned to the cloned ORF is noted (A, B or A/B if there are no allelic differences or an equal number of differences against both haplotypes). The coordinates of introns are indicated when present. The summary results of the sequence analysis against the reference sequence are presented. The 'SNP(s)' box shows a table that lists the sequence differences between the cloned ORF and the reference sequences (Haplotypes A and B from Assembly 22, and Assembly 21 sequences). The 'Nucleotide and Protein sequences' boxes display the sequences with a color code for synonymous and non-synonymous SNPs. All sequences can be downloaded. The 'Resources' box displays links towards information that is relevant to each resource, i.e. oligonucleotide sequences for the BP clones, barcode sequence for the overexpression plasmids and the C. albicans overexpression strains, position of the clone in the different plates of the collection, plasmid sequences. The 'Restrictions on the cloned sequence' box indicates the existence of restriction sites, as well as the size of the expected fragments for enzymes that are used with regard to subsequent applications of these donor plasmids.
A proof of principle of the whole procedure for protein-protein interaction detection is shown in Figure 6. The MAP kinase kinase Hst7 was used as bait and linked to the DNA binding domain lexA, as part of the pC2HB-GC vector. Similarly, the Cek1 kinase was expressed as a prey protein, fused to the activation domain VP16, a component of the pC2HP-GC vector. pC2HB-Hst7 and pC2HP-Cek1 were transformed into opaque SC2H3α and SC2H3a strains, respectively. Mating of the resulting transformants was achieved through selection for leucine and arginine prototrophy. Upon interaction of the two proteins, the reconstituted transcriptional module promoted the expression of the HIS1 marker, allowing only the mating-derived strains to grow on histidine-free medium, whereas the original diploid strains could not (Figure 6B). It is possible that some strains reverted to the diploid state, but we did not perform FACS analysis to assess this as it is not relevant for the protein-protein interaction analysis. All transformants were able to grow in the absence of arginine and leucine, indicating the presence of the prey and bait vectors respectively. In addition, all these strains were able to grow in the absence of histidine, indicative of the targeted protein-protein interaction.

DISCUSSION
A partial C. albicans Gateway™-adapted ORFeome (ORFeomeV1; 644 ORFs) and the corresponding C. albicans overexpression strains have already been successfully used to investigate morphogenesis, biofilm formation and genome dynamics in C. albicans (12,32,57). In this study, we have now generated the C. albicans ORFeomeV2 within the Gateway™ recombination cloning system, as well as a collection of destination vectors suitable for expression in C. albicans.
This new C. albicans ORFeome resource encompasses 5099 ORFs (83% of the annotated ORFs) and will be made available to the community upon request. The resource is supported by the CandidaOrfDB database, which provides information on the individual plasmids and their sequences. The C. albicans ORFeomeV2 is of unprecedented quality, as all ORFs in the collection are fully sequenced. This remains an exception in large-scale ORFeome projects, where ORFs are rarely sequenced in their entirety and only the 5′- and 3′-ends of the cloned ORFs are sequenced to confirm identity and the absence of frameshift mutations in the primers (16,17,19,22,24,30). Nevertheless, it should be noted that further analysis of each mutation-containing ORF is required to determine whether or not mutations affect the function of the ORF. Given these high standards, the success rate of 83% that we achieved with the C. albicans ORFeomeV2 is quite remarkable.
In eukaryotes, ORFeomes are usually cloned from full-length cDNA libraries to account for the presence of introns and therefore splicing variants. However, only 6% of C. albicans genes have introns (58), rendering the generation of the C. albicans ORFeome more straightforward than for pluricellular eukaryotes. Indeed, ORFs were simply amplified from the genomic DNA of the reference strain SC5314 and therefore, 360 ORFs in the C. albicans ORFeomeV2 harbor introns, a matter that should be taken into consideration when expressing ORFs in a prokaryotic host and, possibly, eukaryotic hosts other than C. albicans. A second aspect to be taken into consideration when using the ORFeome in hosts other than C. albicans is the unusual codon usage in this species (59). Indeed, in C. albicans, CUG is decoded as a serine instead of a leucine the majority of the time. Hence, improper translation in hosts with a standard genetic code may distort protein structure and function.
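The consequence of the CUG reassignment for heterologous expression can be illustrated with a minimal Python sketch. It is not part of the ORFeome pipeline. It assumes, as a simplification, that CUG (CTG in the DNA sequence) is always read as serine in C. albicans, whereas decoding as serine occurs only the majority of the time. It also assumes an intron-free ORF. The function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third position varies fastest).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Simplifying assumption: CUG (CTG in DNA) is always decoded as serine.
CTG_CLADE_CODE = dict(STANDARD_CODE, CTG="S")


def codons(dna):
    """Split an in-frame DNA sequence into codons, dropping any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna, code):
    """Translate an intron-free ORF with the given codon table."""
    return "".join(code[c] for c in codons(dna))


def ctg_codon_positions(dna):
    """Return 1-based codon positions that would be mistranslated in a standard host."""
    return [n for n, c in enumerate(codons(dna), start=1) if c == "CTG"]


toy_orf = "ATGCTGTTGTAA"  # hypothetical sequence, not taken from the collection
print(translate(toy_orf, STANDARD_CODE))   # as read by a standard-code host
print(translate(toy_orf, CTG_CLADE_CODE))  # as read under the serine assumption
print(ctg_codon_positions(toy_orf))
```

For the toy sequence, the two translations are MLL* and MSL*, and codon 2 is reported as the position affected. Listing CTG codons in this way gives a quick estimate of how many residues of a given ORF would differ when it is expressed in a host that uses the standard code.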
Sanger sequencing of both 5′ and 3′ ends was performed to confirm ORF identity and exclude clones containing primer or recombination errors. Previous studies have reported mutation rates in primers of 3-10% (27,30). Similar rates were observed in this work. Unlike other Gateway™ recombinational cloning projects (16,30), we did not see major differences in cloning efficiency for ORFs up to 4 kb. A decrease in success rate was observed only for ORFs >4 kb. This size bias could be attributed to increased difficulty in amplifying the ORF by PCR, a reduced efficiency of the Gateway™ BP Clonase™ reactions with long PCR products, and the error rate of the PCR polymerase, which increases with longer products. In addition, we corrected the sequences of 110 ORFs with sequence ambiguities (N-tracts and IUPAC code nucleotides) according to the Candida Genome Database and identified 116 ORFs that display a recombinant haplotype. Recombinant haplotypes could be explained by template switching during PCR cycles or by incorrectly phased SNPs in reference sequences. The latter possibility could be addressed by performing long-read sequencing of the C. albicans SC5314 genome.
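What is meant by a recombinant haplotype can be made concrete with a short Python sketch. It is an illustration under stated assumptions and not the analysis used in this work. It assumes that the clone and the two phased reference alleles have already been aligned without gaps and have equal length. The function name, the labels A and B, and the toy sequences are hypothetical.

```python
def classify_haplotype(clone, hap_a, hap_b):
    """Assign a cloned ORF to allele A, allele B, or a recombinant of the two.

    Only positions where the two reference alleles differ are informative.
    Clone bases matching neither allele are ignored here; they would be
    treated as candidate mutations in a separate step.
    """
    if not (len(clone) == len(hap_a) == len(hap_b)):
        raise ValueError("sequences must be pre-aligned and of equal length")

    seen = set()
    for c, a, b in zip(clone.upper(), hap_a.upper(), hap_b.upper()):
        if a == b:
            continue  # homozygous position, carries no phase information
        if c == a:
            seen.add("A")
        elif c == b:
            seen.add("B")

    if not seen:
        return "uninformative"
    if seen == {"A", "B"}:
        return "recombinant"
    return seen.pop()


# Toy alleles differing at two positions (hypothetical sequences).
hap_a = "ACGTACGT"
hap_b = "ATGTACCT"
print(classify_haplotype("ACGTACGT", hap_a, hap_b))
print(classify_haplotype("ATGTACCT", hap_a, hap_b))
print(classify_haplotype("ACGTACCT", hap_a, hap_b))
```

The three toy clones are classified as A, B and recombinant, respectively. A call of this kind cannot by itself distinguish template switching during PCR from incorrectly phased SNPs in the reference, since both produce the same mixed pattern in the clone.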
However, many challenges remain to be addressed. For heterozygous genes, only one allele is present in the ORFeome. Because functional differences have been reported between the two alleles of a heterozygous gene (60)(61)(62), it would be relevant to also clone and validate the second allele in these instances. In addition, oligonucleotides have been designed based on annotation of the haploid set of Assembly 19 of the C. albicans genome. As a consequence, genes of the MTLα locus have not been included in the ORFeome. Despite our efforts, the collection is still missing 1081 clones. The next version of the C. albicans ORFeome could extend gene coverage by adding the missing ORFs, allelic variants or strain-specific variations.
ORFeomes are essential to bridge the gap between genome annotation and systems biology and allow large-scale gene and protein characterization. The development of a collection of C. albicans constitutive or conditional overexpression strains is one of the applications of the C. albicans ORFeome that we have successfully explored (12,32,57). In this respect, the C. albicans ORFeome project is currently completing its second and third phases, which consist of transferring the 5099 cloned ORFs into barcoded destination vectors and generating a collection of C. albicans barcoded overexpression mutants, each mutant carrying one of the 5099 cloned ORFs under the control of the inducible PTET promoter. CandidaOrfDB already provides information on the available overexpression plasmids and strains. In this study, we have paved the way for a second application of the C. albicans ORFeome through the development of tools for a two-hybrid (2H) matrix approach to protein–protein interaction detection via mating in C. albicans. Our data show that mating of diploid C. albicans expressing a bait fused to the LexA DNA-binding domain and a prey fused to the VP16 activation domain, respectively, allows protein–protein interactions to be tested. In S. cerevisiae, large-scale 2H screens can be performed in a matrix design whereby haploid strains expressing baits and preys are mated and protein–protein interactions are scored in the resulting diploids. Our toolkit now enables the implementation of such a matrix design for large-scale 2H screens in C. albicans. To this aim, a collection of 1500 prey clones has already been generated. As mentioned above, C. albicans codon usage is unusual, which limits the development of applications of the C. albicans ORFeome in species that use standard decoding such as S. cerevisiae. Our development of tools for large-scale 2H screens in C. albicans circumvents this limitation, opening the path for the characterization of the C. albicans interactome. Defining the C. albicans interactome will undoubtedly impact our understanding of C. albicans pathogenesis in humans.
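The scoring logic of a matrix 2H design, including empty-vector controls, can be outlined in a few lines of Python. This is a hedged sketch rather than a description of an implemented screening pipeline. The data structure, the function name and the boolean growth readouts are hypothetical; the toy values simply mirror the qualitative outcome of the Hst7/Cek1 proof of concept.

```python
EMPTY = "empty"  # hypothetical label for an empty bait or prey vector


def call_interactions(growth):
    """Call bait-prey interactions from growth readouts of mated strains.

    `growth` maps (bait, prey) to a pair of booleans:
    (growth on mating-selective medium, growth on histidine-free medium).
    Baits or preys that activate the reporter with an empty partner vector
    are flagged as auto-activating and excluded from the calls.
    """
    auto_baits = {b for (b, p), (mated, his) in growth.items()
                  if p == EMPTY and mated and his}
    auto_preys = {p for (b, p), (mated, his) in growth.items()
                  if b == EMPTY and mated and his}

    hits = []
    for (bait, prey), (mated, his) in sorted(growth.items()):
        if EMPTY in (bait, prey):
            continue  # control crosses are never reported as interactions
        if mated and his and bait not in auto_baits and prey not in auto_preys:
            hits.append((bait, prey))
    return hits


# Illustrative readouts only: the cross grows on both media,
# the two empty-vector controls grow on the mating-selective medium only.
toy_growth = {
    ("Hst7", "Cek1"): (True, True),
    ("Hst7", EMPTY): (True, False),
    (EMPTY, "Cek1"): (True, False),
}
print(call_interactions(toy_growth))
```

With these toy readouts the only reported pair is Hst7 with Cek1. Requiring growth on the mating-selective medium before scoring the reporter separates failed matings from true negatives, which matters once the matrix grows to hundreds of baits and preys.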
In summary, high-level gene coverage coupled with the versatility of the Gateway™ recombinational cloning, C. albicans-adapted functional genomics tools and full access to clones, make the C. albicans ORFeomeV2 a unique and valuable resource for the scientific community that should greatly facilitate future functional studies in C. albicans.
Figure 6 (caption, continued). Proof of principle using Hst7 as a bait and Cek1 as a prey, previously shown to interact (31). Cells of each type were spotted in a dilution series and growth was monitored on SC-leu-arg, which allowed growth of the tetraploid and diploid offspring, and on SC-met-his, which allowed detection of PPI only in those cells that express both bait and prey proteins. As negative controls, tetraploids derived from the crossing of bait-expressing strains with empty prey vector transformed strains, and from the crossing of prey-expressing strains with strains carrying the empty bait vector, grew on SC-leu-arg but not on SC-met-his.

DATA AVAILABILITY
Plasmid sequences have been submitted to GenBank. Accession numbers are given in Table 2.