Identification of essential genes in Caenorhabditis elegans through whole-genome sequencing of legacy mutant collections

Abstract It has been estimated that 15%–30% of the ∼20,000 genes in C. elegans are essential, yet many of these genes remain to be identified or characterized. With the goal of identifying unknown essential genes, we performed whole-genome sequencing on complementation pairs from legacy collections of maternal-effect lethal and sterile mutants. This approach uncovered maternal genes required for embryonic development and genes with apparent sperm-specific functions. In total, 58 putative essential genes were identified on chromosomes III–V, of which 52 genes are represented by novel alleles in this collection. Of these 52 genes, 19 (40 alleles) were selected for further functional characterization. The terminal phenotypes of embryos were examined, revealing defects in cell division, morphogenesis, and osmotic integrity of the eggshell. Mating assays with wild-type males revealed previously unknown male-expressed genes required for fertilization and embryonic development. The result of this study is a catalog of mutant alleles in essential genes that will serve as a resource to guide further study toward a more complete understanding of this important model organism. As many genes and developmental pathways in C. elegans are conserved and essential genes are often linked to human disease, uncovering the function of these genes may also provide insight to further our understanding of human biology.


Introduction
Essential genes are those required for the survival or reproduction of an organism, and therefore encode elements that are foundational to life. This class of genes has been widely studied for a number of reasons. Essential genes are often well conserved and can offer insight into the principles that govern common biological processes (Hughes 2002;Jordan et al. 2002;Georgi et al. 2013). Researching these genes and their functions has important implications in understanding the cellular and developmental processes that form complex organisms, including humans. In addition, identifying genes that are lethal when mutated opens up new avenues through which drug development approaches can target parasites, pathogens, and cancer cells (e.g., Doyle et al. 2010;Shi et al. 2015;Vyas et al. 2015;Zhang et al. 2018). Finally, the concept of a minimal gene set that is comprised of all genes necessary for life has been the subject of much investigation and has recently been of particular interest in the field of synthetic biology (reviewed in Auslä nder et al. 2017).
Studying essential genes in humans is complicated by practical and ethical considerations. Accordingly, model organisms have played a key role in identifying and understanding essential genes, and efforts have been made to identify all essential genes in a few model organisms. Systematic genome-wide studies of gene function in Saccharomyces cerevisiae have uncovered more than 1100 essential genes, many of which have phylogenetically conserved roles in fundamental biological processes such as cell division, protein synthesis, and metabolism (Winzeler et al. 1999;Giaever et al. 2002;Yu et al. 2006;Li et al. 2011). While an important contribution, this is only a fraction of all the essential genes in multicellular organisms. In more complex model organisms, identifying all essential genes in the genome has not been so straightforward. The discovery of RNA interference (RNAi; Fire et al. 1998) enabled researchers to employ genome-wide reverse genetic screens to examine the phenotypic effects of gene knockdown (Fraser et al. 2000;Kamath et al. 2003). In general, this has been an effective, high-throughput method for identifying many genes with essential functions (Gö nczy et al. 2000;Sö nnichsen et al. 2005). However, there are limitations to using RNAi to screen for all essential genes, including incomplete gene knock down, off-target effects, and RNAi resistance in a certain tissue or cell types; thus, many genes of biological importance escape identification in high-throughput RNAi screens. This highlights the motivation to obtain null alleles for every gene in the genome, which has been the goal of several model organism consortia (Bradley et al. 2012;C. elegans Deletion Mutant Consortium 2012;Varshney et al. 2013), though it has not yet been achieved for any metazoan.
Caenorhabditis elegans has been an important model in developmental biology for decades, and the ability to freeze and store populations of C. elegans indefinitely allows investigators to share their original mutant strains with others around the world. In the first few decades of C. elegans research, dozens of forward genetics screens were used to uncover mutants in hundreds of essential genes (e.g., Herman 1978;Meneely and Herman 1979;Rogalski et al. 1982;Howell et al. 1987;Clark et al. 1988;Baillie 1988, 1991;Kemphues et al. 1988;McKim et al. 1988McKim et al. , 1992Howell and Rose 1990;Stewart et al. 1998;Gö nczy et al. 1999). These early studies generated what we refer to here as legacy collections. The alleles were often mapped to a region of the genome through deficiency or linkage mapping. However, the process of identifying the molecular nature of the genetic mutations one-by-one using traditional methods was slow and laborious before the genome sequence was complete (The C. elegans Sequencing Consortium 1998) and next-generation sequencing technologies were developed (reviewed in Metzker 2010; Goodwin et al. 2016).
As whole-genome sequencing (WGS) has become widely adopted, methods for identifying mutant alleles have evolved to take advantage of these technological advances (Sarin et al. 2008;Smith et al. 2008Smith et al. , 2016Srivatsan et al. 2008;Blumenstiel et al. 2009;Schneeberger et al. 2009;Doitsidou et al. 2010;Flibotte et al. 2010;Zuryn et al. 2010). With WGS becoming increasingly affordable over time, mutant collections can now be mined for data in efficient ways that were not possible two decades ago. Performing WGS on a single mutant genome is often insufficient to identify a causal variant due to the abundance of background mutations in any given strain, particularly one that has been subjected to random mutagenesis (Denver et al. 2004;Hillier et al. 2008;Sarin et al. 2008;Flibotte et al. 2010). However, when paired with additional strategies such as deletion or SNP-based mapping or bulk segregant analysis, WGS becomes a valuable tool to expedite gene identification. Furthermore, if multiple independently derived allelic mutants exist, an even simpler approach can be taken. By sequencing two or more mutants within a complementation group and looking for mutations in the same gene, the need for additional mapping or crossing schemes is greatly reduced (Schneeberger and Weigel 2011;Nordströ m et al. 2013).
In the legacy mutant collections described above, where large numbers of mutants are isolated, it is feasible to obtain complementation groups with multiple alleles for many loci. In addition, the abundance of mutants obtained in these large-scale genetic screens suggests that some legacy mutant collections may harbor strains for which the mutations remain unidentified. If such collections are coupled with thorough annotations, they are valuable resources that can be mined with WGS. Indeed, some investigators have recently used such WGS-based approaches to uncover novel essential genes from legacy collections (Jaramillo-Lambert et al. 2015;Qin et al. 2018). These projects bring us closer to identifying all essential genes in C. elegans and also contribute to the ongoing efforts to obtain null mutations in every gene in the genome.
There are currently 3755 C. elegans genes that have been annotated with lethal or sterile phenotypes from RNAi knockdown studies (data from WormBase version WS275). In comparison, the number of genes currently represented by lethal or sterile mutant alleles is 1885 (data from WormBase version WS275). These numbers should be considered minimums, as the database annotations are not necessarily up to date. The discrepancy in these numbers could be illustrative of the comparatively timeconsuming and laborious nature of isolating and identifying mutants. In addition, some of the genes identified as essential in RNAi screens may belong to paralogous gene families whose redundant functions are masked in single gene knockouts. Although the total number of essential genes in C. elegans is unknown, extrapolation from saturation mutagenesis screens has led to estimates that approximately 15%-30% of the $20,000 genes in this organism are essential (Clark et al. 1988;Howell and Rose 1990;Johnsen and Baillie 1997;The C. elegans Deletion Mutant Consortium 2012). This suggests the possibility that there are many essential genes in C. elegans that remain unidentified and/or lack representation by a null allele.
In this study, we use WGS to revisit two C. elegans legacy mutant collections isolated more than 25 years ago. These collections are a rich resource for essential gene discovery; they comprise 75 complementation groups in which at least two alleles with sterile or maternal-effect lethal phenotypes have been found. With these collections, we sought to identify novel essential genes and to conduct a preliminary characterization of their roles in fertilization and development. Wild-type male rescue assays are used to attribute some mutant phenotypes to sperm-specific genetic defects. In addition, we examine arrested embryos using differential interference contrast (DIC) microscopy and document their terminal phenotypes. This work comprises a catalog of 125 alleles with mutations in 58 putative essential genes on chromosomes III-V. Of these 58 genes, 52 are represented by novel alleles in this collection. We present several genes which are reported here for the first time as essential genes and mutant alleles for genes that have only previously been studied with RNAi knockdown. This work aims to help accelerate research efforts by identifying essential genes and providing an entry point into further investigations of gene function. Advancing our understanding of essential genes is imperative to reaching a more comprehensive knowledge of gene function in C. elegans and may provide insight into conserved processes in developmental biology, parasitic nematology, and human disease.
Complementation analysis of legacy mutants was performed by crossing 10 males of one mutant strain to 4 hermaphrodites of another strain. The presence of males with homozygous markers indicated successful crossing, and homozygous hermaphrodite progeny were transferred to new plates to determine whether viable offspring were produced and thus complementation occurred. Failure to complement was verified with additional homozygous animals or by repeating the cross. Complementation tests between CRISPR-Cas9 deletion strains and legacy mutants were performed by crossing heterozygous CRISPR-Cas9 deletion (GFP/þ) males to heterozygous legacy mutant hermaphrodites. Twenty GFP hermaphrodite F1s were singled on new plates and those segregating viable Dpy and/or Unc progeny indicated complementation between the two alleles.

DNA extraction
Balanced heterozygous strains were grown on 100 mm nematode growth medium (NGM) agar plates (standard recipe with 3 times concentration of peptone) seeded with OP50 and harvested at starvation. Genomic DNA was extracted using a standard isopropanol precipitation technique previously described (Au et al. 2019). DNA quality was assessed with a NanoDrop 2000c Spectrophotometer (Thermo Scientific) and DNA concentration was measured using a Qubit 2.0 Fluorometer and dsDNA Broad Range Assay kit (Life Technologies).
Whole-genome sequencing and analysis pipeline DNA library preparation and WGS were carried out by The Centre for Applied Genomics (The Hospital for Sick Children, Toronto, Canada). Between 20 and 33 C. elegans mutant strains were run together on one lane of an Illumina HiSeq X to generate 150-bp paired-end reads.
Sequencing analysis was done using a modified version of a previously designed custom pipeline (Flibotte et al. 2010;Thompson et al. 2013). Reads were aligned to the C. elegans reference genome (WS263; wormbase.org) using the short-read aligner BWA version 0.7.16 (Li and Durbin 2009). Single nucleotide variants (SNVs) and small insertions or deletions (indels) were called using SAMtools toolbox version 1.6 ). To eliminate unreliable calls, variants at genomic locations for which the canonical N2 strain has historically had low read depth or poor quality (Thompson et al. 2013) were removed as potential candidates. The variant calls were annotated with a custom Perl script and labeled heterozygous if represented by 20%-80% of the reads at that location. The remaining candidates were then subjected to a series of custom filters: (i) any variants that appeared in more than three strains from the same collection were removed; (ii) homozygous mutations were removed; (iii) only mutations affecting coding exons (indels, missense, and nonsense mutations) or splice sites (defined as the first two and last two base pairs in an intron) were kept, while all variants from other noncoding regions were removed; and (iv) only mutations on the chromosome to which the mutation had originally been mapped were selected, while variants on all other chromosomes were removed.
For each pair of strains belonging to a complementation group, the final list of candidate mutations was compared and the gene or genes in common were identified. In cases where there was only one gene in common on both lists, this gene was designated the putative essential gene. For complementation groups with multiple candidate genes in common, additional information such as the nature of the mutations and existing knowledge about the genes was used to select a single candidate gene, when possible. When there was no gene candidate in common within a pair of strains, the list of variants was reanalyzed to look for larger deletions and rearrangements. If available, two additional alleles were sequenced to help identify the gene.

Validation of gene candidates
To validate the candidate gene candidates derived from WGS analysis, the genomic position of each candidate gene was corroborated with the legacy data from deficiency mapping experiments. Approximate boundaries for the deletions were estimated from the map coordinates of genes known to lie internal or external to the deletions according to data from WormBase (WS275).
For further validation of select gene candidates, deletion mutants were generated in an N2 wild-type background using a CRISPR-Cas9 genome editing strategy previously described  excise the gene of interest and replace it with a selection cassette expressing G418 drug resistance and pharyngeal GFP (loxP þ Pmyo-2::GFP::unc-54 3 0 UTR þ Prps-27::neoR::unc-543 0 UTR þ loxP vector, provided by Dr. John Calarco, University of Toronto, Canada). Guide RNAs were designed using the C. elegans Guide Selection Tool (genome.sfu.ca/crispr) and synthesized by Integrated DNA Technologies (IDT). Repair templates were generated by assembling homology arms (450-bp gBlocks synthesized by IDT) and the selection cassette using the NEBuilder Hifi DNA Assembly Kit (New England Biolabs). Cas9 protein (generously gifted from Dr. Geraldine Seydoux) was assembled into a ribonucleoprotein (RNP) complex with the guide RNAs and tracrRNA (IDT) following the manufacturer's recommendations. PD1074 animals were injected using standard microinjection techniques (Mello et al. 1991;Kadandale et al. 2009) with an injection mix consisting of: 50 ng/ml repair template, 0.5 mM RNP complex, 5 ng/ml pCFJ104 (Pmyo-3::mCherry), and 2.5 ng/ml pCFJ90 (Pmyo-2::mCherry). Injected animals were screened according to the protocol described in Norris et al. (2015) and genomic edits were validated using the PCR protocol described in Au et al. (2019). Complementation tests between CRISPR-Cas9 alleles and legacy mutant alleles were performed to verify gene identities, as described above.

Analysis of orthologs, GO, and expression patterns
Previously reported phenotypes from RNAi experiments or mutant alleles were retrieved from WormBase (WS275) and GExplore (genome.sfu.ca/gexplore; Hutter et al. 2009;Hutter and Suh 2016). Life stage-specific gene expression data from the modENCODE project (Hillier et al. 2009;Gerstein et al. 2010Gerstein et al. , 2014Boeck et al. 2016) were also accessed through GExplore. Visual inspection of these data revealed genes with maternal expression patterns (high levels of expression in the early embryo and hermaphrodite gonad) as well as those predominantly expressed in males.
Human orthologs of C. elegans genes were determined using Ortholist 2 (ortholist.shaye-lab.org; Kim et al. 2018). For maximum sensitivity, the minimum number of programs predicting a given ortholog was set to one. For genes with no human orthologs, NCBI BLASTp (blast.ncbi.nlm.nih.gov; Altschul et al. 1990) was used to examine distributions of homologs across species and potential nematode-specificity. Protein sequences from the longest transcript of each gene were used to query the nonredundant protein sequences (nr) database, with default parameters and a maximum of 1000 target sequences. The results were filtered with an E-value threshold of 10 À5 .
Gene ontology (GO) term analysis was performed using PANTHER version 16.0 (Thomas et al. 2003). The list of 58 candidate genes was used for an overrepresentation test, with the set of all C. elegans genes as a background list. Overrepresentation was analyzed with a Fisher's Exact test and P-values were adjusted with the Bonferroni multiple testing correction.

Temperature sensitivity and mating assays
To assay temperature sensitivity, heterozygous strains were propagated at 15 and homozygous L4 animals were isolated on 60 mm NGM plates (2 Â 6/plate or 3 Â 3/plate). After 1 week at 15 , plates were screened for the presence of viable homozygous progeny. If present, L4 homozygotes were transferred to new plates at 25 and screened after 3 days to confirm lethality or sterility.
Mating assays were carried out using PD1074 males and mutant hermaphrodites. Three L4-stage homozygous mutant hermaphrodites were isolated and crossed with 10 PD1074 males on each of three 60 mm NGM plates. Control plates consisted of three L4 hermaphrodite mutants without males. Mating assays were carried out at 25 C and observations were taken after 3 days, noting the absence or presence of viable cross progeny.

Microscopy
The terminal phenotypes of dead eggs from maternal-effect lethal mutants were observed using DIC microscopy. Young adult homozygous mutants were dissected to release their eggs in either M9 buffer with Triton X-100 (0.5%; M9þTX) or distilled water and embryos were left to develop at 25 C overnight ($16 h). Embryos were mounted on 2% agarose pads and visualized using a Zeiss Axioplan 2 equipped with DIC optics. Images of representative embryos were captured using a Zeiss Axiocam 105 Color camera and ZEN 2.6 imaging software (Carl Zeiss Microscopy). For embryos incubated in distilled water, an osmotic integrity defective (OID) phenotype was noted for embryos that burst or swelled and filled the eggshell, as described by Sö nnichsen et al. (2005).

Identification of 58 putative essential genes
WGS was performed on a total of 157 strains, with depth of coverage ranging between 21x and 65x (average ¼ 38x). A minimum of two alleles for each of 75 complementation groups were sequenced and a total of 58 putative essential genes were identified (Table 2). Literature searches revealed that 49 of these genes have been annotated with lethal or sterile phenotypes from either mutant alleles or RNAi studies. Furthermore, 46 of the 157 alleles have been previously mentioned in publications with some phenotypic description (Vatcher et al. 1998;Gö nczy et al. 1999Gö nczy et al. , 2001Molin et al. 1999;Kaitna et al. 2002;Brauchle et al. 2003;Cockell et al. 2004;Delattre et al. 2004;Sonneville and Gö nczy 2004;Bischoff and Schnabel 2006;Langenhan et al. 2009;Nieto et al. 2010;von Tobel et al. 2014). Although 18 of these alleles have been previously sequenced, we were unaware of this when initially analyzing the data, and these alleles therefore served as a blind test set to validate our analysis approach. Eight of the nine genes represented in this set of 18 previously sequenced alleles were correctly identified by our pipeline. The gene cul-2 (Sonneville and Gö nczy 2004) escaped identification due to an intronic mutation in one allele that did not pass our filtering criteria but was found upon manual inspection of the sequencing data. A complete list of previously published and sequenced alleles can be found in Supplementary File S2 with their associated publications.
There were 17 complementation groups that had no common gene candidates in the mapping region after our initial analysis. Three of these allele pairs were later shown to be allelic with other complementation groups and were assigned gene candidates accordingly (see below, Table 4, and Supplementary File S2). We were unable to confidently assign gene candidates for the remaining 14 complementation groups. However, Supplementary File S2 contains the full list of common gene mutations (in both coding and noncoding regions) for each complementation group. This list may be used in conjunction with additional genetic assays to elucidate identities for these genes in the future.
While the list of 58 genes includes many known essential genes, among the known genes are alleles that are novel genetic       variants. Nineteen genes from this collection which were not previously studied or were not represented by lethal or sterile mutants were designated genes of interest (GOI; Table 3). These 19 GOI, represented by 40 alleles, were further characterized as part of this study. They include 14 genes (28 alleles) with a maternal-effect lethal phenotype and 5 genes (12 alleles) with a sterile phenotype.

Validation of candidate gene assignments
After isolation, the mutant alleles were each localized to a chromosomal region through deficiency mapping. These data were used to corroborate the candidate genes derived from WGS analysis and to resolve complementation groups with more than one initial gene candidate. There were 53 complementation groups with only one common gene candidate when coding and splicing variants were restricted to the mapped region. This was considered to be strong evidence that we correctly identified the essential genes. For the five groups which had more than one gene candidate in the mapped region, the nature of the mutations and existing knowledge about the genes in question were used to select a single candidate (Supplementary File S2). For the majority of complementation groups, the genomic position of the assigned gene is in agreement with the deficiency genetic mapping data ( Figure 1). However, with limited information available, it was not possible to assign precise map coordinates to the molecular lesions of the deficiency strains which were used for mapping. For three complementation groups, there is an apparent conflict between the deficiency mapping data and the gene candidates proposed through our analysis. These complementation groups were found to not map under any of the tested deficiencies, but were assigned gene candidates whose genomic coordinates fall into regions covered by the tested deficiencies (alleles of bckd-1A, top-3, and unc-112; Figure 1). In addition, two of these groups were assigned the same gene candidate as another, purportedly distinct, complementation group (Table 4). From WGS analysis, bckd-1A was the initial gene candidate for two different complementation groups, yet only one of these groups had been mapped to a deletion (tDf5) that covers the bckd-1A locus. Similarly, top-3 was the assigned gene candidate for three different complementation groups, only one of which was mapped under a deficiency (tDf5) encompassing that gene. By performing complementation tests with select alleles (Table 4), we concluded that the two bckd-1A groups are not distinct, and indeed they contain mutations in the same gene. One of the groups (gene-35) originally identified as top-3 is a double mutant which fails to complement gene-15 (top-3) and gene-34 (unknown gene).
Three candidate genes (nstp-2, C34D4.4, and F56D5.2) were selected for additional validation by generating a deletion of the gene in a wild-type background using CRISPR-Cas9 genome editing (Norris et al. 2015;Au et al. 2019). These genes were chosen because they were expected to be of interest to the broader research community. The deletion alleles have been verified with the PCR protocol described by Au et al. (2019). Guide RNA sequences and deletion-flanking sequences are listed in Supplementary File S3. Complementation testing between the newly generated CRISPR-Cas9 deletion mutants and the legacy mutant strains confirmed that the mutations are allelic, and the genes assigned to the legacy strains are correct (Supplementary File S3).

Human orthologs, gene ontology, and expression patterns
Of the 58 essential gene candidates, 47 genes have predicted human orthologs ( Table 2). Many of these genes in humans have been implicated in disease and are associated with OMIM disease phenotypes (Online Mendelian Inheritance in Man; omim.org). BLASTp searches revealed that the set of 19 GOI contains three nematode-specific genes (F56D5.2, perm-5, and T22B11.1) that have homologs in parasitic species, and two uncharacterized genes (D2096.12 and Y54G2A.73) that do not have homology outside the Caenorhabditis genus.
To gain insight into the functions of the putative essential genes, an overrepresentation test was used to elucidate the most prominent GO terms associated with them. The biological process terms overrepresented in the set of 58 genes include such terms as organelle organization (GO:0006996), nuclear division (GO:0000280), cellular metabolic process (GO:0044237), and DNA repair (GO:0006281), as shown in Figure 2. In the molecular function category, binding (GO:0005488) and catalytic activity (GO:0003824) are overrepresented by 41 genes (adjusted P ¼ 1.2E-07) and 28 genes (adjusted P ¼ 1.8E-03), respectively. A complete list of overrepresented GO terms and associated genes can be found in Supplementary File S3.
To examine the timing of gene expression throughout the life cycle, gene expression data from the modENCODE project (Hillier et al. 2009;Gerstein et al. 2010Gerstein et al. , 2014Boeck et al. 2016) was retrieved from GExplore (genome.sfu.ca/gexplore; Hutter et al. 2009;Hutter and Suh 2016) for the 19 GOI (Supplementary File S4). For 10 of the GOI, these data show high levels of gene expression in the early embryonic stages as well as in adulthood, and particularly in the hermaphrodite gonad. This expression pattern is characteristic of a maternal-effect gene, for which gene products are passed on to the embryo from the parent. Five genes have a maternal gene expression pattern as well as expression throughout other stages of the life cycle, indicating an additional, zygotic role for the gene. Seven genes have elevated expression levels in males and L4-stage hermaphrodites. These genes are suspected to be involved in sperm production or fertilization, and the associated strains were subjected to mating assays (see below).

Temperature sensitivity and mating assays for genes of interest
The 40 alleles associated with the 19 GOI were further examined to gain insight into the phenotypic consequences of their mutations. Each allele was assayed for temperature sensitivity, as some of the original mutant screening was carried out at 25 C. Five alleles (marked with a [ts] phenotype in Table 3) were deemed temperature-sensitive and could proliferate as homozygotes at a permissive temperature of 15 C, while being maternaleffect lethal or sterile at a restrictive temperature of 25 C. Curiously, four of these temperature-sensitive alleles were the results of stop codons, not missense mutations.
Seven candidate genes (16 alleles) were hypothesized to be involved in male fertility, based on the production of unfertilized oocytes by hermaphrodites and/or predominantly male gene expression patterns. These 16 strains were assayed for their ability to be rescued through mating with wild-type males. 14 of the strains were rescued by the mating assay, while two strains failed to rescue (Table 5). Phenotypic rescue through mating was consistent among alleles of the same gene in five of the seven genes, while two genes had conflicting results among the pair of alleles in their complementation groups (F56D5.2 and nstp-2).

Terminal phenotypes of maternal-effect lethal embryos
Using DIC microscopy, the terminal phenotypes of 28 maternaleffect lethal strains (a subset of the 40 GOI strains) were  observed. Representative images were selected and compiled into a catalog of terminal phenotypes (Supplementary File S5). Ten strains showed an OID phenotype (as described in Sö nnichsen et al. 2005) in nearly all embryos after incubation in distilled water, while three additional strains had only some embryos that exhibited this phenotype (Table 3). The OID phenotype was evident in embryos that filled the eggshell completely [for example, dgtr-1(t2043), Figure 3A] and eggs that burst in their hypotonic surroundings. Early embryonic arrest was observed in embryos from the two dlat-1 mutant alleles (t2035 and t2056), which arrested most often with only one to four cells (e.g., Figure 3B). Eleven strains had embryos that terminated with approximately 100-200 cells [for example, ZK688.9(t1433), Figure 3C]; while four strains developed into two-or threefold stage embryos that did not hatch and exhibited clear morphological defects, such as nstp-2(t1835) with a lumpy body wall and constricted nose tip ( Figure 3D).

Revisiting legacy mutant collections with WGS
In this study, we focused on reexamining legacy collections of C. elegans mutants isolated before the complete genome sequence was published (The C. elegans Sequencing Consortium 1998) and long before massively parallel sequencing was widely available. With major advances in sequencing technology in the past 30 years (reviewed in Goodwin et al. 2016), WGS has become affordable and accessible, making it possible to revisit past projects with new approaches and advanced capabilities. We have sequenced paired alleles from 75 complementation groups on chromosomes III-V, from which we identified 58 putative essential genes (Table 2). While WGS is a powerful tool, it does not stand alone as a solution to identifying mutant alleles. This study has shown the power of having multiple alleles in a complementation group when faced with the abundance of genomic variants found in WGS analysis. Indeed, when we sequenced four single alleles, which had no complementation pairs, we were unable to designate a single mutation as the variant responsible for maternaleffect lethality (data not shown). Our approach to gene identification proved to be effective and was validated by a combination of different methods. The blind test set of 18 previously sequenced alleles from which eight of nine genes were readily identified serves as an important validation of our analysis pipeline and gives confidence in the results we obtained. In addition, the deficiency mapping data, gene expression patterns from the modENCODE project, GO term analysis, and phenotypes documented from previous experiments provide evidence to support the gene candidates we assigned in these mutant collections.
The CRISPR-Cas9 deletion alleles we generated for selected gene candidates provide additional validation and will be made available to the research community to serve as useful tools for future studies. While the mutant alleles from the original study have been outcrossed, the genetic balancer background and additional mutations that persist can complicate phenotypic analysis. In contrast, these new CRISPR-Cas9 deletion strains were made in a wild-type background, which makes it much easier to handle them and interpret their mutant phenotypes. Furthermore, the pharyngeal GFP expression introduced by the gene-editing approach acts as a dominant and straightforward marker for tracking the alleles in a heterozygous population. This is useful, as the homozygous animals do not produce viable progeny.
The complementation groups that could not be assigned gene candidates in our analysis may have been complicated by variants in noncoding regions, poor sequencing coverage, or inaccurate complementation pairing, among other possibilities. In future work, tracking down the genes we were unable to identify will require repeating complementation tests and re-tooling the analysis approach.
Among the 19 GOI are four temperature-sensitive alleles, all the result of nonsense mutations. While unusual, temperaturesensitive nonsense alleles are not unprecedented and have been found in several organisms (e.g., Golden and Riddle 1984;Samson et al. 1995). Both nonsense alleles of T22B11.1 are temperaturesensitive. This makes us suspicious that perhaps the wild-type process this gene is involved in is itself temperature-dependent. This idea stems from the observation that all alleles of the dauer constitutive genes daf-4 and daf-7 are temperature-sensitive (Golden and Riddle 1984). These genes have both amber stop alleles and missense alleles and all are temperature-sensitive. If T22B11.1 were indeed involved in a temperature-dependent process we would expect a deletion allele to also be temperaturesensitive. The gene trcs-1 also has a temperature-sensitive nonsense allele and, in addition, it has a leaky temperature-sensitive missense allele. Again, the product of this gene may be involved in a temperature-dependent process. One requires a different explanation for cept-2 where we have identified two alleles and only one, the nonsense allele, is temperature-sensitive. In Drosophila, the elav gene has temperature-sensitive alleles that are nonsense alleles and yet they make full-length proteins (Samson et al. 1995). A detailed study of this gene and its gene product concluded that, at some low level, an alternative amino acid is substituted at the stop site, allowing for a full length but unstable protein (Samson et al. 1995). This type of information suppression suggests we may observe a low-abundance, fulllength protein product for cept-2.

GO analysis reveals common themes and gaps in our knowledge
The underlying biological themes of the 58 putative essential genes were revealed by examining their GO terms. The biological processes represented in Figure 2 help to confirm the nature of this set, as a collection of genes that are required for essential functions such as cell division, metabolism, and development. Performing GO-term analysis also revealed that a number of the genes in this collection lacked sufficient annotation to be interpreted this way. We found four genes about which there is little to nothing known (D2096.12, F56D5.2, T22B11.1, and Y54G2A.73).
For example, F56D5.2 is a gene with no associated GO terms, no known protein domains, and no orthologs in other model organisms. These wholly uncharacterized genes are intriguing candidates which may help uncover new biological processes and biochemical pathways that are evidently fundamental to life for this organism.

Examining expression patterns leads to discovery of genes involved in male fertility
The life stage-specific expression patterns (Supplementary File S4) provide some insight into the roles the genes in this collection play in development. Fifteen of the nineteen GOI are highly expressed in the early embryo and hermaphrodite gonad, which suggests that the gene product is passed on to the embryo from the parent. Five of these maternal genes also have elevated expression during late embryonic and larval stages, which suggests they are pleiotropic. The zygotic functions of these genes must be nonessential or else a zygotic lethal, rather than maternal-effect lethal, phenotype would be observed. We also identified four genes that are most highly expressed in males and L4 hermaphrodites, as well as three genes that have prominent male expression in addition to characteristic maternal expression patterns. Mating assays confirmed that these maleexpressed genes have an essential role in male fertility. Studies have shown that genes expressed in sperm are largely insensitive to RNAi (Fraser et al. 2000;Gö nczy et al. 2000;Reinke et al. 2004;del Castillo-Olivares et al. 2009;Zhu et al. 2009;Ma et al. 2014), making these types of genes particularly difficult to identify in high-throughput RNAi screens. With the availability of RNA-seq data across different life stages for nearly every gene in the C. elegans genome (Hillier et al. 2009;Gerstein et al. 2010Gerstein et al. , 2014Boeck et al. 2016;Tintori et al. 2016;Packer et al. 2019), screening for characteristic gene expression patterns may be a useful approach for identifying sterile and maternal-effect lethal genes that remain to be discovered.
We propose that the seven male-expressed genes are involved in sperm production and/or function (Table 5). These genes are mostly uncharacterized, and this is the first reporting of their involvement in male fertility. While the mutant hermaphrodites lay unfertilized oocytes (5 genes) or dead eggs (2 genes), this phenotype could be rescued in 14 of the 16 alleles by the introduction of wild-type sperm through mating. The two alleles that could not be rescued had allele pairs in the same complementation groups that were rescued in the mating assay. One of these discrepancies, between F56D5.2(t1744) and F56D5.2(t1791), was resolved when we found a second mutation in a nearby essential gene that was likely responsible for the inability of one strain to be rescued (data not shown). The presence of additional lethal mutations in the genome is unsurprising given the nature of chemical mutagenesis, and it reinforces the advantage of having multiple alleles for a gene when interpreting mutant phenotypes.

Interpreting terminal phenotypes of maternaleffect lethal mutants
The catalog of terminal phenotypes (Supplementary File S5) created in this study provides a window into the roles the maternaleffect genes play in development. Some of these phenotypes corroborate previously observed phenotypes from RNAi studies. For example, RNAi knockdown experiments have shown that DLAT-1 is an enzyme involved in metabolic processes required for cell division in one-cell C. elegans embryos (Rahman et al. 2014). We uncovered two alleles of dlat-1 in this study (t2035 and t2056) in which most embryos arrest at the one-to four-cell stage ( Figure 3B). The mutant alleles presented here can confirm previously reported phenotypes and serve as new genetic tools for continuing the study of essential gene function.
We also identified alleles for six genes that exhibit an OID phenotype, resulting in embryos that filled the eggshell completely or burst in distilled water. More than 100 genes have been identified in RNAi screens as important for the osmotic integrity of developing embryos (reviewed in Stein and Golden 2018). Some of these genes have roles in lipid metabolism (Rappleye et al. 2003;Benenati et al. 2009), cellular trafficking (Rappleye et al. 1999), and chitin synthesis (Johnston et al. 2006). Four of the six genes identified with OID mutants in this study have been previously implicated in osmotic sensitivity: dgtr-1 is involved in lipid biosynthesis (Carvalho et al. 2011;Olson et al. 2012), trcs-1 is involved in lipid metabolism and membrane trafficking (Green et al. 2011); perm-5 is predicted to have lipid binding activity; and F21D5.1 is an ortholog of human PGM3, an enzyme involved in the hexosamine pathway which generates substrates for chitin synthase. We found OID mutants for two additional genes that were not previously characterized with this phenotype, bckd-1A and D2096.12. bckd-1A is a component of the branched-chain alpha-keto dehydrogenase complex, which is involved in fatty acid biosynthesis (Kniazeva et al. 2004); this may be indicative of a role in generating or maintaining the lipid-rich permeability barrier. D2096.12 is a Caenorhabditis-specific gene with no known protein domains. Elucidating the function of this uncharacterized gene may lead to new insights about the biochemistry of eggshell formation and permeability in C. elegans embryos.
Most of the mutant strains we examined with DIC microscopy arrested around the 100-to 200-cell stage as a seemingly disorganized group of cells (e.g., Figure 3C). Others developed into twofold or later stage embryos that moved inside the eggshell but did not hatch (e.g., Figure 3D). The terminal phenotypes documented here reveal how long the embryo can persist without the maternal contribution of gene products, and the developmental defects that ensue. Future studies might make use of fluorescent markers and automated cell lineage tracking (e.g., Thomas et al. 1996;Schnabel et al. 1997;Bao et al. 2006;Wang et al. 2019) as well as single-cell transcriptome data (Tintori et al. 2016;Packer et al. 2019) to further investigate these essential genes.

Relevance beyond C. elegans
In this collection of 58 putative essential genes, there are 47 genes (81%) with human orthologs; a two-fold enrichment when compared to all C. elegans genes, 41% of which have human orthologs (Kim et al. 2018). This is in line with previous findings that essential genes are more often phylogenetically conserved than nonessential genes ( Biological process GO terms overrepresented in the set of 58 putative essential genes. Bar length represents the number of genes in the set associated with each GO term. Overrepresentation was analyzed using PANTHER version 16.0 (Thomas et al. 2003) and P-values were adjusted with the Bonferroni multiple testing correction. Results were filtered to include terms with adjusted P < 0.05 and edited to exclude redundant terms. A list of overrepresented GO terms and associated genes can be found in Supplementary File S3. genetic disorders by providing new opportunities to interrogate gene function, explore genetic interactions, and screen prospective therapeutics. Nematode-specific genes that are essential are important to nematode biology in general and are particularly relevant in parasitic nematology. We found three genes in our GOI list (F56D5.2, perm-5, and T22B11.1) that have orthologs in parasitic nematode species and not in other phyla. With growing anthelminthic drug resistance around the world (Jabbar et al. 2006), novel management strategies are needed to combat parasitic nematodes, which infect crops, livestock, and people worldwide (Nicol et al. 2011;Wolstenholme et al. 2004;Hotez et al. 2008). Essential genes are desirable targets for drug development, yet identifying such genes in parasites experimentally is difficult (Kumar et al. 2007;Doyle et al. 2010). Thus, as a free-living nematode, C. elegans is a widely used model for genetically intractable parasitic species (Bü rglin et al. 1998;Hashmi et al. 2001). Our identification of novel essential genes with orthologs in parasitic nematodes may provide new opportunities to explore management strategies.
It is our hope that the alleles and phenotypes presented here will serve as a starting point and guide future research to elucidate the specific roles these genes play in embryogenesis. All of the alleles presented in this study are available to the research community through the Caenorhabditis Genetics Center (cgc.umn.edu) and we anticipate they will serve as a valuable resource in the years to come. The wealth of material uncovered in this specific legacy collection will hopefully inspire similar explorations of other frozen mutant collections.

Data availability
The raw sequence data from this study have been deposited in the NCBI Sequence Read Archive (SRA; ncbi.nlm.nih.gov/sra) under accession number PRJNA628853. Supplemental material is available at figshare: Supplementary File S1 (Detailed experimental methods from the generation of Collection B); Supplementary File S2 (Gene candidate selection, Alleles and associated publications, Common gene hits with deficiency mapping for each complementation group); Supplementary File S3 (CRISPR-Cas9 deletion alleles and associated sequences, GO terms and associated genes); Supplementary File S4 (Life stage-specific expression patterns); Supplementary File S5 (Terminal phenotypes of maternal-effect lethal embryos). Figshare DOI: https://doi.org/ 10.25386/genetics.14702139 F56D5.2(t1744) to reveal an additional mutation in a nearby essential gene. They also thank the two anonymous reviewers for their helpful comments on the manuscript.