Identifying the C. elegans vulval transcriptome

Development of the C. elegans vulva is a classic model of organogenesis. This system, which starts with six equipotent cells, encompasses diverse types of developmental events, including developmental competence, multiple signaling events that control precise and faithful patterning of three cell fates, execution and proliferation of specific cell lineages, and a series of sophisticated morphogenetic events. Early events have been subjected to extensive mutational and genetic investigations and later events to cell biological analyses. We infer the existence of dramatically changing profiles of gene expression that accompany the observed changes in development. Yet except for the serendipitous discovery of several transcription factors expressed in dynamic patterns in vulval lineages, our knowledge of the transcriptomic landscape during vulval development is minimal. This study describes the composition of a vulva-specific transcriptome. We used tissue-specific harvesting of mRNAs via immunoprecipitation of epitope-tagged poly(A)-binding protein, PAB-1, heterologously expressed from a promoter known to express GFP in vulval cells throughout their development. The identified transcriptome was small but tightly interconnected. From this data set we identified several genes with known functions in development of the vulva and validated more with promoter-GFP reporters of expression. For one target, lag-1, promoter-GFP expression was limited, but a fluorescent tag on the endogenous protein revealed extensive expression. Thus, we have identified a transcriptome of the C. elegans vulva as a launching pad for exploration of the functions of these genes in organogenesis.


Organogenesis involves an elaborate series of developmental events that encompass much of the spectrum of developmental biology. This process is presumed to be accompanied by

The C. elegans vulva is a classic system for the genetic investigation of organogenesis. Thus far, most analysis has focused on the initial patterning of the six vulval precursor cells (VPCs). These roughly equipotent cells are located in an anterior-to-posterior line along the ventral midline of the animal (Fig. 1A). Vulval [...] Tan et al. 1998; Miller et al. 2000). Furthermore, the promoter of lin-31 drives GFP expression chiefly in the VPCs (Tan et al. 1998). Thus, we used the lin-31 promoter to transgenically express bait protein in the VPCs throughout development.

The C. elegans ortholog of poly(A)-binding protein 1, PAB-1, specifically binds poly(A) tails of mature mRNAs and can be used to immunoprecipitate mRNAs from whole-RNA

We prepared two independent transgenic strains expressing our vulva-specific pulldown construct ("+PAB-1" biological replicates; DV3507 and DV3509) and one control strain in which we deleted sequences encoding PAB-1 (DV3520; "-PAB-1" negative control). We performed each immunoprecipitation in duplicate (technical replicates), processing a total of six samples. We obtained approximately 90M mappable reads for each biological and technical replicate and approximately 30M mappable reads for our negative control strain DV3520. More than 90% of the total reads could be mapped across all samples (Supplemental Figure S1A). The results obtained with our biological replicates correlate well (Supplemental Figure S1B-C).
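One common way to quantify how well two replicates agree is a Pearson correlation on log-transformed read counts. The sketch below illustrates that idea only; the text does not state which measure underlies Supplemental Figure S1B-C, and the function names and per-gene counts here are invented for illustration, not taken from the study.

```python
import math


def log_counts(counts):
    """Log2-transform raw counts with a pseudocount of 1 to tolerate zeros."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene read counts for two replicates (toy values, NOT study data).
replicate_a = [0, 12, 150, 9, 480, 33]
replicate_b = [1, 10, 170, 7, 455, 40]

r = pearson(log_counts(replicate_a), log_counts(replicate_b))
print(f"Pearson r on log2 counts: {r:.3f}")
```

The log transform is a design choice: it keeps a handful of very highly expressed genes from dominating the coefficient.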

The C. elegans vulva transcriptome

Using our PAT-Seq approach, we were able to map 1,671 protein-coding genes in the C. elegans vulva, which corresponds to 8.2% of all C. elegans protein-coding genes (20,362 protein-coding genes; WS250; Fig. 2A; Table S1).
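As a quick arithmetic check of the proportion reported above, the following Python sketch recomputes the share of protein-coding genes detected. Both counts come from the text; the variable names are illustrative only.

```python
# Counts as stated in the text (WS250 annotation).
detected_genes = 1671          # protein-coding genes mapped in the vulva data set
total_protein_coding = 20362   # protein-coding genes annotated in WS250

fraction = detected_genes / total_protein_coding
percent = round(100 * fraction, 1)
print(f"{percent}% of annotated protein-coding genes detected")
```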

In addition to the well-known vulval marker lin-31, other genes whose mutation confers lineage-defective phenotypes during development of the vulva were identified by our sequencing effort, including [...] Table S1).

We also identified 23 genes that, when mutated, confer defective locomotion (Uncoordinated; "Unc"). Some of these, like UNC-31, an ortholog of the human CADPS (calcium-dependent [...] and HMP-2, respectively, rather than the same protein as in other systems (Eisenmann 2005)).

The gene network shaped by our identified genes, although small, is highly interconnected

Unfortunately, our PAT-Seq method was not designed to identify miRNAs, and more experiments need to be performed to validate the presence of these miRNAs in this tissue.

Our PAT-Seq analysis, based on a comparison of data sets generated with Plin-31::GFP::PAB-1::3xFLAG ("+PAB-1") vs. the Plin-31::GFP::3xFLAG ("-PAB-1") control, generated a set of genes potentially expressed in VPCs. Notably, these genes are not expected to be expressed exclusively in VPCs and may also be expressed in other tissues.

To validate our approach, we selected candidate genes identified in this study for analysis with promoter::GFP transgenes to ascertain whether they are expressed in VPCs. We cloned sequences upstream of the ATG initiator methionine codon for several genes into vector pPD95.67 with 2xNLS::GFP (NLS, nuclear localization signal) and generated extrachromosomal arrays harboring these clones (Fig. 3A). Given the interests of our research program, we focused on genes potentially regulating signaling and/or developmental biology, with some randomly selected genes included.

We speculated that we were unable to detect lag-1 expression in the germline and embryos, while still detecting it in the vulva, because this gene possesses four splice variants that differ at their 5' ends but share the same 3' end (Fig. 4A). We hypothesized that each of these isoforms may have a distinct tissue distribution, and that our cloned promoter region, which was specific to the a isoform, drove strong vulval expression but was not sufficient to drive expression of other lag-1 isoforms that may be expressed in the germline and embryos. To further explore the expression in the germline and embryos that was "missing" from our cloned lag-1a putative promoter sequence, we used CRISPR technology to tag the endogenous lag-1 gene at the 3' end with sequences encoding mNeonGreen (mNG) fluorescent protein and a 3xFLAG epitope.

We expected to detect full-length protein fusions regardless of the use of different promoters at the 5' end. Specifically, we used the "self-excising cassette" (SEC) method for two-step positive-[...] uterine lineages throughout larval development (Fig. 5). We also observed dynamic LAG-1 expression in various embryonic cells (Fig. 6). LAG-1::mNeonGreen expression was also observed broadly throughout the animal at various stages (Fig. 7A,B). In conjunction with the

Unfortunately, there are no available C. elegans vulva datasets with which to compare our results, and we cannot conclusively pinpoint all the genes expressed in this organ. Importantly, we have sequenced two independently generated transgenic animal lines (biological replicates DV3507 and DV3509), with a technical replicate each, and subtracted the genes identified in the sequencing results of our negative control (DV3520), which is unable to bind poly(A) tails, thereby isolating transcripts specific to the vulva. Our PCA (Supplemental Fig. S1) shows that our two biological replicates correlate well with each other, suggesting little contamination.
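The subtraction step described above can be pictured as a set operation. The sketch below is a minimal illustration under stated assumptions: it requires a gene to appear in both biological replicates (the text does not say whether an intersection or a union was used), it treats detection as simple presence or absence rather than a count threshold, and the gene identifiers are placeholders rather than results from the study.

```python
def candidate_genes(replicate_a, replicate_b, negative_control):
    """Return genes present in both replicates but absent from the negative control.

    Assumption: intersection of replicates, then subtraction of control genes.
    """
    shared = set(replicate_a) & set(replicate_b)
    return shared - set(negative_control)


# Placeholder identifiers only (hypothetical, NOT data from the study).
dv3507_genes = {"gene-1", "gene-2", "gene-3", "gene-5"}   # "+PAB-1" replicate
dv3509_genes = {"gene-1", "gene-2", "gene-4", "gene-5"}   # "+PAB-1" replicate
dv3520_genes = {"gene-2", "gene-6"}                       # "-PAB-1" control

candidates = candidate_genes(dv3507_genes, dv3509_genes, dv3520_genes)
print(sorted(candidates))
```

A real pipeline would more likely compare normalized read counts between bait and control rather than bare gene lists, so this shows the logic of the subtraction only.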

Using transgenes harboring promoter::GFP transcriptional fusion reporters, we were also able to validate putative targets identified in our study (e.g., lag-1, toe-1) as being expressed in the VPCs. For others (e.g., shc-1, mbl-1 and F23A7.4) we were unable to detect expression in VPCs, perhaps because they are false positives or because the promoter::GFP transgenes do not reflect the full expression patterns of these genes. One validated target, lag-1, exhibited limited expression via promoter::GFP fusion analysis (Fig. 3), but our CRISPR tagging of the endogenous protein revealed spatiotemporally broad and dynamic expression (Figs. 5-7).

A caveat to our analysis is that the lin-31 promoter sequences derived from the plasmid pB255 (Tan et al. 1998) also drive expression of GFP in two to three small cells, perhaps neurons, each in the head and tail. We have been unable to identify these cells, though we could likely do so using

Another caveat is the fusion of 3° VPCs to the hyp7 syncytium after initial patterning of VPC fates. VPCs are specialized hypodermal cells surrounded by nonspecialized hypodermis, called hyp7, a syncytium composed of many fused hypodermal cells. 1° and 2° cells (Fig. 1A) go through a stereotyped series of cell divisions, but non-vulval 3° cells divide once and fuse to the surrounding syncytium. The release of "+PAB-1" protein into the general hyp7 syncytium may result in identification of transcripts specific to hyp7. However, we expect the concentration of "+PAB-1" protein in hyp7 after fusion of the 3° daughters to be relatively low, and GFP in hyp7 was not observed after the 3° fusion at the L3 stage. Whereas the lin-31 promoter used to express "+PAB-1" protein could be improved, we foresee no way of working around this limitation of our approach.

A final limitation to our approach is the complexity of the vulval system over time. Our bait "+PAB-1" and control "-PAB-1" proteins were expressed from the L1 to young adult stages (Fig. 1C)

Yet it is important that we pilot this technology in the vulval system to be able to refine our analysis in the future. A more specific vulva promoter driving bait "+PAB-1" protein and control "-

The recovered pellets were thawed on ice and suspended in 2 mL of lysis buffer (150 mM NaCl,

CRISPR tagging strategy using the SEC approach (Dickinson et al. 2015). Detection primers are denoted by "QZ" (see Supplementary Table 4). C) PCR detection of wild-type and homozygous insertion bands. D) Western blot detection using anti-FLAG antibody of endogenous LAG-1::mNeonGreen::3xFLAG protein; the tag portion of the protein is predicted to be 28.8 kD. Isoform D with tag is predicted to be 116.5 kD, while isoforms A, B and C are predicted to be 103.7, 103.5 and 98.5 kD, respectively. We detected two general band species but due to gel smiling it was